R - Efficient apply or mapply for multiple matrix arguments by row
I have two matrices that I want to apply a function to, row by row:
    matrixa
              gsm83009 gsm83037 gsm83002 gsm83029 gsm83041
    100001_at 5.873321 5.416164 3.512227 6.064150 3.713696
    100005_at 5.807870 6.810829 6.105804 6.644000 6.142413
    100006_at 2.757023 4.144046 1.622930 1.831877 3.694880

    matrixb
              gsm82939 gsm82940 gsm82974 gsm82975
    100001_at 3.673556 2.372952 3.228049 3.555816
    100005_at 6.916954 6.909533 6.928252 7.003377
    100006_at 4.277985 4.856986 3.670161 4.075533

I've found several similar questions, but not a whole lot of answers: "mapply matrices" and "multi matrix row-wise mapply?". The code I have now splits the matrices by row into lists, but having to split them makes it rather slow and not much faster than a for loop, considering I have 9000 rows in each matrix:
    scores <- mapply(t.test.stat,
                     split(matrixa, row(matrixa)),
                     split(matrixb, row(matrixb)))

The function itself is simple, just finding the t-value:
    t.test.stat <- function(x, y) {
      return( (mean(x) - mean(y)) / sqrt(var(x)/length(x) + var(y)/length(y)) )
    }
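As a quick sanity check, the statistic above has the same form as the Welch (unequal-variance) t statistic, which is what `t.test()` reports by default. The sketch below compares the two on made-up data; the vectors `x` and `y` and the names `manual` and `builtin` are illustrative assumptions, not part of the question.

```r
# The statistic from the question, repeated so this block runs on its own.
t.test.stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# Hypothetical sample data with the same group sizes as the question (5 and 4).
set.seed(1)
x <- rnorm(5)
y <- rnorm(4)

manual <- t.test.stat(x, y)

# t.test() defaults to var.equal = FALSE, i.e. the Welch test.
builtin <- unname(t.test(x, y)$statistic)

# TRUE when the two values agree up to floating-point tolerance.
isTRUE(all.equal(manual, builtin))
```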
Splitting the matrices isn't the biggest contributor to evaluation time.
    set.seed(21)
    matrixa <- matrix(rnorm(5 * 9000), nrow = 9000)
    matrixb <- matrix(rnorm(4 * 9000), nrow = 9000)

    # Split and apply in one step
    system.time(
      scores <- mapply(t.test.stat,
                       split(matrixa, row(matrixa)),
                       split(matrixb, row(matrixb)))
    )
    #    user  system elapsed
    #    1.57    0.00    1.58

    # Split first, then time only the mapply call
    sma <- split(matrixa, row(matrixa))
    smb <- split(matrixb, row(matrixb))
    system.time( scores <- mapply(t.test.stat, sma, smb) )
    #    user  system elapsed
    #    1.14    0.00    1.14

Look at the output from Rprof to see that most of the time is, not surprisingly, spent evaluating t.test.stat (mean, var, etc.). Basically, there's quite a bit of overhead from function calls.
    Rprof()
    scores <- mapply(t.test.stat, sma, smb)
    Rprof(NULL)
    summaryRprof()

You may be able to find faster generalized solutions, but none will approach the speed of the vectorized solution below.
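For reference, one generalized alternative is to loop over row indices with `vapply` rather than splitting the matrices first. This is only a sketch under assumptions: the name `scores_index` and the reduced 200-row matrices are made up here to keep the example quick. It still calls t.test.stat once per row, so the per-call overhead described above remains.

```r
# The statistic from the question, repeated so this block runs on its own.
t.test.stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# Smaller stand-ins for the 9000-row matrices (assumed sizes, for a quick run).
set.seed(21)
matrixa <- matrix(rnorm(5 * 200), nrow = 200)
matrixb <- matrix(rnorm(4 * 200), nrow = 200)

# The split-based approach from the question.
scores_split <- mapply(t.test.stat,
                       split(matrixa, row(matrixa)),
                       split(matrixb, row(matrixb)))

# Index-based alternative: no split(), one function call per row.
scores_index <- vapply(
  seq_len(nrow(matrixa)),
  function(i) t.test.stat(matrixa[i, ], matrixb[i, ]),
  numeric(1)
)

# mapply keeps the names produced by split(), so drop them before comparing.
isTRUE(all.equal(unname(scores_split), scores_index))
```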
Since your function is simple, you can take advantage of the vectorized rowMeans function to do this instantaneously (though it's a bit messy):
    system.time({
      nca <- ncol(matrixa)
      ncb <- ncol(matrixb)
      ans <- (rowMeans(matrixa) - rowMeans(matrixb)) /
        sqrt( rowMeans((matrixa - rowMeans(matrixa))^2) * (nca / (nca - 1)) / nca +
              rowMeans((matrixb - rowMeans(matrixb))^2) * (ncb / (ncb - 1)) / ncb )
    })
    #    user  system elapsed
    #       0       0       0
    head(ans)
    # [1]  0.8272511 -1.0965269  0.9862844 -0.6026452 -0.2477661  1.1896181

Update:
Here's a "cleaner" version using a rowvars function:
    # Row-wise sample variance built from rowMeans
    rowvars <- function(x, na.rm=FALSE, dims=1L) {
      rowMeans((x - rowMeans(x, na.rm, dims))^2, na.rm, dims) * (ncol(x) / (ncol(x) - 1))
    }
    ans <- (rowMeans(matrixa) - rowMeans(matrixb)) /
      sqrt( rowvars(matrixa)/ncol(matrixa) + rowvars(matrixb)/ncol(matrixb) )
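The reason the n/(n-1) factor appears: rowMeans of the squared deviations divides by n, which gives the population variance, while var() divides by n-1. Multiplying by ncol(x)/(ncol(x)-1) converts one into the other. The sketch below checks that on small made-up matrices, and also checks that the vectorized result matches a row-by-row evaluation of t.test.stat. The names `vectorized` and `rowwise`, the simplified one-argument `rowvars`, and the 200-row size are assumptions for illustration.

```r
# The statistic from the question, repeated so this block runs on its own.
t.test.stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# Simplified rowvars (no na.rm/dims handling), same arithmetic as the answer.
rowvars <- function(x) {
  rowMeans((x - rowMeans(x))^2) * (ncol(x) / (ncol(x) - 1))
}

# Smaller stand-ins for the 9000-row matrices (assumed sizes, for a quick run).
set.seed(21)
matrixa <- matrix(rnorm(5 * 200), nrow = 200)
matrixb <- matrix(rnorm(4 * 200), nrow = 200)

# Vectorized t statistic for every row at once.
vectorized <- (rowMeans(matrixa) - rowMeans(matrixb)) /
  sqrt(rowvars(matrixa) / ncol(matrixa) + rowvars(matrixb) / ncol(matrixb))

# Row-by-row reference computed with the original function.
rowwise <- vapply(
  seq_len(nrow(matrixa)),
  function(i) t.test.stat(matrixa[i, ], matrixb[i, ]),
  numeric(1)
)

# Both comparisons are TRUE up to floating-point tolerance.
isTRUE(all.equal(rowvars(matrixa), apply(matrixa, 1, var)))
isTRUE(all.equal(vectorized, rowwise))
```

Note that this simplified check assumes no missing values; with NAs present, ncol(x) no longer equals the number of observations per row, so the correction factor would need adjusting.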