r - Writing a function to analyze a subset within a dataframe -


I am trying to write a function to aggregate or subset a data frame by a particular column, and then count the proportion of values in another column within that data frame that fall within certain ranges.

Specifically, the relevant parts of my data frame, allmutations, look like this:

gennumber   sel
1          -0.00351647088810292
1           0.000728499401888683
1           0.0354633950503043
1           0.000209700229276244
2           6.42307549736376e-05
2          -0.0497259605114181
2          -0.000371856995145525

Within each generation (gennumber), I'd like to count the proportion of values in "sel" that are greater than 0.001, between -0.001 and 0.001, and less than -0.001. On the entire data set, I've been doing this:

ben <- allmutations$sel > 0.001          # computed over all generations
bencount <- length(which(ben == TRUE))   # number of values above the cutoff
totalmu <- length(ben)                   # length(ben) = total # of mutants
tot.pben <- bencount / totalmu           # proportion

What is the best way to do this operation for each value in gennumber? Also, is there an easy way to get the proportion of values in the range -0.001 < sel < 0.001? I couldn't figure out how to do it, so I "cheated" and took the absolute value of the column and looked for values less than 0.001. I can't help feeling there must be a better way, though.

Thanks for any help you can give, and please let me know if I can provide any clarification.

dput() of the data:

structure(list(gennumber = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
    sel = c(-0.00351647088810292, 0.000728499401888683, 0.0354633950503043,
    0.000209700229276244, 6.42307549736376e-05, -0.0497259605114181,
    -0.000371856995145525)),
    .Names = c("gennumber", "sel"), class = "data.frame",
    row.names = c(NA, -7L))

For the first part, assuming your data are in dat, first split the data by gennumber:

sdat <- with(dat, split(dat, gennumber)) 

Then write a custom function to do the comparison you want:

foo <- function(x, cutoff = 0.001) {
    # proportion of values in the second column (sel) above the cutoff
    sum(x[,2] > cutoff) / length(x[,2])
}

and sapply() it over the individual chunks of data in sdat:

sapply(sdat, foo) 

which gives:

> sapply(sdat, foo)
   1    2 
0.25 0.00 

for this sample of data.
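As an aside, the same per-generation proportion can be sketched without an explicit split, since the mean of a logical vector is the proportion of TRUE values. This is only an alternative sketch, not part of the approach above; it rebuilds the sample data as dat and uses the hypothetical name p.above for the result.

```r
# Sample data from the question, rebuilt so the sketch is self-contained
dat <- data.frame(
  gennumber = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  sel = c(-0.00351647088810292, 0.000728499401888683, 0.0354633950503043,
          0.000209700229276244, 6.42307549736376e-05, -0.0497259605114181,
          -0.000371856995145525)
)

# mean() of a logical vector is the proportion of TRUE values;
# tapply() applies it separately within each level of gennumber.
# p.above is a hypothetical name used only for this illustration.
p.above <- with(dat, tapply(sel > 0.001, gennumber, mean))
p.above  # generation 1: 0.25, generation 2: 0
```

The result is indexed by generation, just like the sapply() output.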

For the second part, we can extend the above function foo() to accept an upper and a lower limit and do the computation:

bar <- function(x, upr, lwr) {
    # proportion of values in the second column (sel) strictly between lwr and upr
    sum(lwr < x[,2] & x[,2] < upr) / length(x[,2])
}

which gives [showing how to pass in the arguments]:

> sapply(sdat, bar, lwr = -0.001, upr = 0.001)
        1         2 
0.5000000 0.6666667 
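If all three ranges from the question are wanted at once, one possible sketch is to bin sel with cut() and tabulate by generation. The labels "below", "within" and "above" and the names cls and props are made up for this illustration. Note one assumption that differs slightly from the strict inequalities above: cut() closes intervals on the right by default, so a value exactly equal to 0.001 would land in "within" and a value exactly equal to -0.001 in "below".

```r
# Sample data from the question, rebuilt so the sketch is self-contained
dat <- data.frame(
  gennumber = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  sel = c(-0.00351647088810292, 0.000728499401888683, 0.0354633950503043,
          0.000209700229276244, 6.42307549736376e-05, -0.0497259605114181,
          -0.000371856995145525)
)

# Bin sel into three classes; intervals are (-Inf, -0.001], (-0.001, 0.001], (0.001, Inf).
# The labels are hypothetical names chosen for this example.
cls <- cut(dat$sel, breaks = c(-Inf, -0.001, 0.001, Inf),
           labels = c("below", "within", "above"))

# Cross-tabulate generation by class, then convert counts to row proportions
# (margin = 1 means each generation's row sums to 1).
props <- prop.table(table(dat$gennumber, cls), margin = 1)
props
```

Each row of props is one generation, and the "within" column should match the bar() results for this sample.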
