ดูเหมือนว่าpurrrแพ็คเกจ (0.2.2) เวอร์ชันปัจจุบันจะเป็นวิธีที่เร็วที่สุด:
by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out
ลองเปรียบเทียบโซลูชันที่น่าสนใจที่สุด:
data("Batting", package = "Lahman")
x <- Batting[1:10000, 1:10]
library(benchr)
library(purrr)
benchmark(
split = split(x, seq_len(.row_names_info(x, 2L))),
mapply = .mapply(function(...) structure(list(...), class = "data.frame", row.names = 1L), x, NULL),
purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out
)
Rsults:
Benchmark summary:
Time units : milliseconds
expr n.eval min lw.qu median mean up.qu max total relative
split 100 983.0 1060.0 1130.0 1130.0 1180.0 1450 113000 34.3
mapply 100 826.0 894.0 963.0 972.0 1030.0 1320 97200 29.3
purrr 100 24.1 28.6 32.9 44.9 40.5 183 4490 1.0
นอกจากนี้เรายังได้ผลลัพธ์เดียวกันด้วยRcpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List df2list(const DataFrame& x) {
std::size_t nrows = x.rows();
std::size_t ncols = x.cols();
CharacterVector nms = x.names();
List res(no_init(nrows));
for (std::size_t i = 0; i < nrows; ++i) {
List tmp(no_init(ncols));
for (std::size_t j = 0; j < ncols; ++j) {
switch(TYPEOF(x[j])) {
case INTSXP: {
if (Rf_isFactor(x[j])) {
IntegerVector t = as<IntegerVector>(x[j]);
RObject t2 = wrap(t[i]);
t2.attr("class") = "factor";
t2.attr("levels") = t.attr("levels");
tmp[j] = t2;
} else {
tmp[j] = as<IntegerVector>(x[j])[i];
}
break;
}
case LGLSXP: {
tmp[j] = as<LogicalVector>(x[j])[i];
break;
}
case CPLXSXP: {
tmp[j] = as<ComplexVector>(x[j])[i];
break;
}
case REALSXP: {
tmp[j] = as<NumericVector>(x[j])[i];
break;
}
case STRSXP: {
tmp[j] = as<std::string>(as<CharacterVector>(x[j])[i]);
break;
}
default: stop("Unsupported type '%s'.", type2name(x));
}
}
tmp.attr("class") = "data.frame";
tmp.attr("row.names") = 1;
tmp.attr("names") = nms;
res[i] = tmp;
}
res.attr("names") = x.attr("row.names");
return res;
}
ตอนนี้เปรียบเทียบกับpurrr:
benchmark(
purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out,
rcpp = df2list(x)
)
ผล:
Benchmark summary:
Time units : milliseconds
expr n.eval min lw.qu median mean up.qu max total relative
purrr 100 25.2 29.8 37.5 43.4 44.2 159.0 4340 1.1
rcpp 100 19.0 27.9 34.3 35.8 37.2 93.8 3580 1.0
splitแต่ละองค์ประกอบจะมีประเภทdata.frame with 1 rows and N columnsแทนlist of length N