R - Efficient apply or mapply for multiple matrix arguments by row

- September 15, 2012

I have two matrices that I want to apply a function to, row by row:

matrixa
           gsm83009  gsm83037  gsm83002  gsm83029  gsm83041
100001_at  5.873321  5.416164  3.512227  6.064150  3.713696
100005_at  5.807870  6.810829  6.105804  6.644000  6.142413
100006_at  2.757023  4.144046  1.622930  1.831877  3.694880

matrixb
          gsm82939 gsm82940 gsm82974 gsm82975
100001_at 3.673556 2.372952 3.228049 3.555816
100005_at 6.916954 6.909533 6.928252 7.003377
100006_at 4.277985 4.856986 3.670161 4.075533

I've found several similar questions, but not a whole lot of answers: "mapply matrices", "multi matrix row-wise mapply?". The code I have now splits the matrices by row into lists, but having to split them makes it rather slow and not much faster than a for loop, considering I have 9000 rows in each matrix:

scores <- mapply(t.test.stat, split(matrixa, row(matrixa)), split(matrixb, row(matrixb)))
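To make the splitting step concrete, here is a minimal sketch on a tiny made-up matrix `m` (not part of the original data). It shows that `split(m, row(m))` yields a list with one element per row, which is what `mapply` then iterates over:

```r
m <- matrix(1:6, nrow = 2)   # illustrative 2 x 3 matrix; rows are 1 3 5 and 2 4 6

rows <- split(m, row(m))     # row(m) labels every cell with its row index

length(rows)                 # one list element per row of m
rows[["1"]]                  # values of the first row
rows[["2"]]                  # values of the second row
```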

The function itself is simple; it just computes the t-value:

t.test.stat <- function(x, y) {
    return( (mean(x) - mean(y)) / sqrt(var(x)/length(x) + var(y)/length(y)) )
}
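For reference, this is the same statistic that R's built-in `t.test` reports under its default `var.equal = FALSE` (Welch) setting. A minimal sketch checking that on made-up data (the vectors `x` and `y` are illustrative and not from the question):

```r
# Same definition as in the question.
t.test.stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

set.seed(1)
x <- rnorm(5)  # illustrative sample sizes matching the 5- and 4-column matrices
y <- rnorm(4)

manual  <- t.test.stat(x, y)
builtin <- unname(t.test(x, y)$statistic)  # Welch test is t.test's default

all.equal(manual, builtin)
```

Calling `t.test` itself for every row would be slower still, since it also computes degrees of freedom, a p-value and a confidence interval.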

Splitting the matrices isn't the biggest contributor to evaluation time.

set.seed(21)
matrixa <- matrix(rnorm(5 * 9000), nrow = 9000)
matrixb <- matrix(rnorm(4 * 9000), nrow = 9000)

system.time( scores <- mapply(t.test.stat,
    split(matrixa, row(matrixa)), split(matrixb, row(matrixb))) )
#    user  system elapsed
#    1.57    0.00    1.58

sma <- split(matrixa, row(matrixa))
smb <- split(matrixb, row(matrixb))
system.time( scores <- mapply(t.test.stat, sma, smb) )
#    user  system elapsed
#    1.14    0.00    1.14

Look at the output of Rprof to see that most of the time is, not surprisingly, spent evaluating t.test.stat (mean, var, etc.). Basically, there's quite a bit of overhead from the function calls.

Rprof()
scores <- mapply(t.test.stat, sma, smb)
Rprof(NULL)
summaryRprof()

You may be able to find faster generalized solutions, but none will approach the speed of the vectorized solution below.

Since your function is simple, you can take advantage of the vectorized rowMeans function to do this almost instantaneously (though it's a bit messy):

system.time({
nca <- ncol(matrixa)
ncb <- ncol(matrixb)
ans <- (rowMeans(matrixa)-rowMeans(matrixb)) /
  sqrt( rowMeans((matrixa-rowMeans(matrixa))^2)*(nca/(nca-1))/nca +
        rowMeans((matrixb-rowMeans(matrixb))^2)*(ncb/(ncb-1))/ncb )
})
#    user  system elapsed
#      0       0       0

head(ans)
# [1]  0.8272511 -1.0965269  0.9862844 -0.6026452 -0.2477661  1.1896181
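The vectorized expression relies on two things. Subtracting a vector of length `nrow(m)` from a matrix recycles down the columns, so each row has its own mean removed. The mean of the squared deviations, multiplied by n/(n-1), is then the usual sample variance. A small sketch on a made-up matrix `m` (not from the answer) that compares this with one `var()` call per row:

```r
set.seed(42)
m <- matrix(rnorm(12), nrow = 3)   # illustrative 3 x 4 matrix
n <- ncol(m)

centered   <- m - rowMeans(m)      # row i has rowMeans(m)[i] subtracted
vectorized <- rowMeans(centered^2) * (n / (n - 1))
looped     <- apply(m, 1, var)     # one var() call per row

all.equal(vectorized, looped)
```

The vectorized form replaces thousands of small R-level function calls with a handful of whole-matrix operations, which is where the speedup comes from.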

Update
Here's a "cleaner" version using a rowvars function:

rowvars <- function(x, na.rm=FALSE, dims=1L) {
  rowMeans((x-rowMeans(x, na.rm, dims))^2, na.rm, dims)*(ncol(x)/(ncol(x)-1))
}
ans <- (rowMeans(matrixa)-rowMeans(matrixb)) /
  sqrt( rowvars(matrixa)/ncol(matrixa) + rowvars(matrixb)/ncol(matrixb) )
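As a sanity check, the vectorized result can be compared with the original mapply approach. The sketch below uses smaller made-up matrices (`a` and `b`, 100 rows each, chosen only to keep it quick) and a simplified `rowvars` without the `na.rm` and `dims` arguments, so treat it as an illustration under those assumptions:

```r
t.test.stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# Simplified row-wise sample variance (assumes no missing values).
rowvars <- function(x) {
  rowMeans((x - rowMeans(x))^2) * (ncol(x) / (ncol(x) - 1))
}

set.seed(21)
a <- matrix(rnorm(5 * 100), nrow = 100)   # illustrative data
b <- matrix(rnorm(4 * 100), nrow = 100)

# Row-by-row version from the question.
slow <- mapply(t.test.stat, split(a, row(a)), split(b, row(b)))

# Vectorized version from the answer.
fast <- (rowMeans(a) - rowMeans(b)) /
  sqrt(rowvars(a) / ncol(a) + rowvars(b) / ncol(b))

# mapply keeps the list names "1", "2", ...; drop them before comparing.
all.equal(unname(slow), fast)
```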
