-
Notifications
You must be signed in to change notification settings - Fork 10
Open
Description
Heya, I just found I do not seem to get the same result with the base R and tidybulk code for the keep_variable genes e.g.
these gene ids
counts_scaled %>%
# extract 500 most variable genes
keep_variable( .abundance = counts_scaled, top = 500)
are not the same as these gene ids
dgList <- SE2DGEList(airway)
group <- factor(dgList$samples$dex)
keep.exprs <- filterByExpr(dgList, group=group)
dgList <- dgList[keep.exprs,, keep.lib.sizes=FALSE]
dgList <- calcNormFactors(dgList)
logcounts <- cpm(dgList, log=TRUE)
var_genes <- apply(logcounts, 1, var)
select_var <- names(sort(var_genes, decreasing=TRUE))[1:500]
highly_variable_lcpm <- logcounts[select_var,]
and using your code on the tidybulk website (below) I get same 500 as with base R but again different to tidybulk
s <- rowMeans((logcounts-rowMeans(logcounts))^2)
o <- order(s,decreasing=TRUE)
x <- logcounts[o[1L:500],,drop=FALSE]
I thought it might be to do with the filtering so I tried adding in filter as below (I should probably add that filter into the workshop code) but I'm still getting about 23 genes different to the base R code
counts_scaled %>%
filter(!lowly_abundant) %>%
keep_variable( .abundance = counts_scaled, top=500)
Do you know why that might be?
Metadata
Metadata
Assignees
Labels
No labels