keep_variable vs base R 

Heya, I just found I do not seem to get the same result with the base R and tidybulk code for the keep_variable genes e.g. 

these gene ids
```
counts_scaled %>% 
	
	# extract 500 most variable genes
	keep_variable( .abundance = counts_scaled, top = 500)
```

are not the same as these gene ids
```
dgList <- SE2DGEList(airway)
group <- factor(dgList$samples$dex)
keep.exprs <- filterByExpr(dgList, group=group)
dgList <- dgList[keep.exprs,, keep.lib.sizes=FALSE]
dgList <- calcNormFactors(dgList)
logcounts <- cpm(dgList, log=TRUE)
var_genes <- apply(logcounts, 1, var)
select_var <- names(sort(var_genes, decreasing=TRUE))[1:500]
highly_variable_lcpm <- logcounts[select_var,]
```
and using your code on the tidybulk website (below) I get same 500 as with base R but again different to tidybulk
```
s <- rowMeans((logcounts-rowMeans(logcounts))^2)
o <- order(s,decreasing=TRUE)
x <- logcounts[o[1L:500],,drop=FALSE]
```
I thought it might be to do with the filtering so I tried adding in filter as below (I should probably add that filter into the workshop code) but I'm still getting about 23 genes different to the base R code
```
counts_scaled %>% 
filter(!lowly_abundant) %>% 
keep_variable( .abundance = counts_scaled, top=500)
```
Do you know why that might be?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

keep_variable vs base R #31

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

keep_variable vs base R #31

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions