People love graphs and plots but I once the data begins to collect many factors and volume increases then it's a must for pivot tables. Now, the main reason I love R is because you can do the same-thing multiple ways. Depending on how I am feeling and the size of the data my methods change.
Pivot Methods
# lets build some data roughly 20 million records will suffice.
test =structure(list(ID = c(23L, 23L, 23L, 23L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L),
CT = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 3L),
.Label = c("A", "B", "D"), class = "factor")), .Names = c("ID", "CT"),
class = "data.frame", row.names = c(NA, -18L))
for(i in 1:20){
test = rbind(test,test)
print(formatC(nrow(test), format="d", big.mark=','))
}
# R df table function
start = Sys.time()
df = as.data.frame.matrix(table(test$ID,test$CT))
df$ID = rownames(df)
df = df[ ,c(4,2,3,1)]
end = Sys.time()
end - start
# reshape cast
start = Sys.time()
test$CTVal = 1
subjmeans <- cast(test, ID~CT)
end = Sys.time()
end - start
Pivot Methods
- sql (sqldf)
- data frame table function
- reshapes cast
- combination of data.table and reshape
# lets build some data roughly 20 million records will suffice.
test =structure(list(ID = c(23L, 23L, 23L, 23L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L),
CT = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 3L),
.Label = c("A", "B", "D"), class = "factor")), .Names = c("ID", "CT"),
class = "data.frame", row.names = c(NA, -18L))
for(i in 1:20){
test = rbind(test,test)
print(formatC(nrow(test), format="d", big.mark=','))
}
# R df table function
start = Sys.time()
df = as.data.frame.matrix(table(test$ID,test$CT))
df$ID = rownames(df)
df = df[ ,c(4,2,3,1)]
end = Sys.time()
end - start
# reshape cast
start = Sys.time()
test$CTVal = 1
subjmeans <- cast(test, ID~CT)
end = Sys.time()
end - start