Skip to content

keep_variable vs base R  #31

@mblue9

Description

@mblue9

Heya, I just found I do not seem to get the same result with the base R and tidybulk code for the keep_variable genes e.g.

these gene ids

counts_scaled %>% 
	
	# extract 500 most variable genes
	keep_variable( .abundance = counts_scaled, top = 500)

are not the same as these gene ids

dgList <- SE2DGEList(airway)
group <- factor(dgList$samples$dex)
keep.exprs <- filterByExpr(dgList, group=group)
dgList <- dgList[keep.exprs,, keep.lib.sizes=FALSE]
dgList <- calcNormFactors(dgList)
logcounts <- cpm(dgList, log=TRUE)
var_genes <- apply(logcounts, 1, var)
select_var <- names(sort(var_genes, decreasing=TRUE))[1:500]
highly_variable_lcpm <- logcounts[select_var,]

and using your code on the tidybulk website (below) I get same 500 as with base R but again different to tidybulk

s <- rowMeans((logcounts-rowMeans(logcounts))^2)
o <- order(s,decreasing=TRUE)
x <- logcounts[o[1L:500],,drop=FALSE]

I thought it might be to do with the filtering so I tried adding in filter as below (I should probably add that filter into the workshop code) but I'm still getting about 23 genes different to the base R code

counts_scaled %>% 
filter(!lowly_abundant) %>% 
keep_variable( .abundance = counts_scaled, top=500)

Do you know why that might be?

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions