R - Writing a function to analyze a subset within a data frame
I am trying to write a function to aggregate or subset a data frame by a particular column, and then count the proportion of values in another column of that data frame that fall within certain ranges.
Specifically, the relevant parts of my data frame, allmutations, look like this:
```
gennumber  sel
1          -0.00351647088810292
1           0.000728499401888683
1           0.0354633950503043
1           0.000209700229276244
2           6.42307549736376e-05
2          -0.0497259605114181
2          -0.000371856995145525
```

Within each generation (gennumber), I would like to count the proportion of values in “sel” that are greater than 0.001, between -0.001 and 0.001, and less than -0.001. On the entire data set, I've been doing this:
```r
ben <- allmutations$sel > 0.001          # this generation's
bencount <- length(which(ben == TRUE))
totalmu <- length(ben)                   # length(ben) = total number of mutants
tot.pben <- bencount / totalmu           # proportion
```

What is the best way to do this operation for each value in gennumber? Also, is there an easy way to get the proportion of values in the range -0.001 < sel < 0.001? I couldn't figure out how to do it, so I “cheated” and took the absolute value of the column and looked for values less than 0.001. I can't help feeling there must be a better way, though.
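A side note on the snippet above (not part of the original post): because R treats TRUE and FALSE as 1 and 0 in arithmetic, the count-then-divide steps reduce to mean() of the logical vector. For strict inequalities, the abs() shortcut selects exactly the same values as the two-sided comparison, so it is not really cheating. The sketch below rebuilds the sample data from the dput() output as allmutations; if sel can contain NA, you would want mean(..., na.rm = TRUE), which drops NAs from the denominator as well.

```r
# Rebuild the sample data (same values as the dput() output in the question)
allmutations <- data.frame(
  gennumber = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  sel = c(-0.00351647088810292, 0.000728499401888683, 0.0354633950503043,
          0.000209700229276244, 6.42307549736376e-05, -0.0497259605114181,
          -0.000371856995145525)
)

# TRUE/FALSE count as 1/0, so the mean of a logical vector is a proportion
tot.pben <- mean(allmutations$sel > 0.001)

# Two-sided comparison vs. the abs() shortcut: same values are selected
p.mid     <- mean(allmutations$sel > -0.001 & allmutations$sel < 0.001)
p.mid.abs <- mean(abs(allmutations$sel) < 0.001)

tot.pben   # 0.1428571 (1 of 7 values)
p.mid      # 0.5714286 (4 of 7 values)
```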
Thanks for any help you can give, and please let me know if I can provide any clarification.
dput() of the data:
```r
structure(list(gennumber = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
               sel = c(-0.00351647088810292, 0.000728499401888683,
                       0.0354633950503043, 0.000209700229276244,
                       6.42307549736376e-05, -0.0497259605114181,
                       -0.000371856995145525)),
          .Names = c("gennumber", "sel"), class = "data.frame",
          row.names = c(NA, -7L))
```
For the first part, assuming the data are in dat, first split the data by gennumber:
```r
sdat <- with(dat, split(dat, gennumber))
```

Then write a custom function for the comparison you want:
```r
# Proportion of values in the second column (sel) above the cutoff
foo <- function(x, cutoff = 0.001) {
    sum(x[, 2] > cutoff) / length(x[, 2])
}
```

and sapply() it over the individual chunks of data in sdat:
```r
sapply(sdat, foo)
```

which gives:
```
> sapply(sdat, foo)
   1    2
0.25 0.00
```

for this sample of the data.
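The split() + sapply() pattern above can also be written with base R's tapply() or aggregate(), which do the grouping in a single call. This is an alternative sketch, not part of the original answer; it rebuilds the sample data as dat and uses mean() on a logical vector in place of foo()'s sum()/length().

```r
dat <- data.frame(
  gennumber = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  sel = c(-0.00351647088810292, 0.000728499401888683, 0.0354633950503043,
          0.000209700229276244, 6.42307549736376e-05, -0.0497259605114181,
          -0.000371856995145525)
)

# tapply(): group sel by gennumber and apply the function to each group;
# returns a vector named by generation
p.tap <- tapply(dat$sel, dat$gennumber, function(s) mean(s > 0.001))

# aggregate() with a formula: same computation, returned as a data frame
p.agg <- aggregate(sel ~ gennumber, data = dat, FUN = function(s) mean(s > 0.001))

p.tap
p.agg
```

Which one to use is mostly a matter of taste: tapply() gives a named vector like sapply(sdat, foo), while aggregate() keeps gennumber as a column, which is handy if you want to merge the result back into other data frames.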
For the second part, you can extend the function foo() above to accept an upper and a lower limit and do the computation:
```r
# Proportion of values in the second column (sel) strictly between lwr and upr
bar <- function(x, upr, lwr) {
    sum(lwr < x[, 2] & x[, 2] < upr) / length(x[, 2])
}
```

which gives (showing how to pass in the arguments):
```
> sapply(sdat, bar, lwr = -0.001, upr = 0.001)
        1         2
0.5000000 0.6666667
```
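To get all three proportions per generation in one table, one option (an alternative to calling foo() and bar() separately, not from the original answer) is to bin sel with cut() and row-normalise a contingency table with prop.table(). The bin labels "below", "between" and "above" are illustrative names. Note the boundary handling: cut() uses right-closed intervals by default, so a value of exactly 0.001 counts as "between" and exactly -0.001 counts as "below", whereas bar() excludes both endpoints from the middle bin.

```r
dat <- data.frame(
  gennumber = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  sel = c(-0.00351647088810292, 0.000728499401888683, 0.0354633950503043,
          0.000209700229276244, 6.42307549736376e-05, -0.0497259605114181,
          -0.000371856995145525)
)

# Assign each value to a bin: (-Inf, -0.001], (-0.001, 0.001], (0.001, Inf)
bins <- cut(dat$sel, breaks = c(-Inf, -0.001, 0.001, Inf),
            labels = c("below", "between", "above"))

# Count values per generation and bin, then divide each row by its row total
props <- prop.table(table(dat$gennumber, bins), margin = 1)
props
```

For the sample data, generation 1 should come out as 0.25 / 0.50 / 0.25 and generation 2 as 1/3, 2/3, 0, which agrees with the foo() and bar() results.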