General question about usage #20

Hi Mark and Bida,

Thank you for your prompt response regarding the generation of motif sets. I opened a new issue to ask additional questions about the usage.

1. According to the help manual, vamos includes four subcommands:
```
vamos --contig [-b in.bam] [-r vntrs_region_motifs.bed] [-o output.vcf] [-s sample_name] [-t threads]
vamos --read [-b in.bam] [-r vntrs_region_motifs.bed] [-o output.vcf] [-s sample_name] [-t threads] [-p phase_flank]
vamos --somatic [-b in.bam] [-r vntrs_region_motifs.bed] [-o output.vcf] [-s sample_name] [-t threads] [-p phase_flank]
vamos -m [verison of efficient motif set]
```

So far, only `contig` and `read` have been introduced in the documentation. Is `somatic` intended for detecting somatic instability? I would greatly appreciate more details on the purpose and use cases of the `somatic` and `m` subcommands.

2. Mark mentioned that the tool partitions reads before annotating tandem repeats. Does vamos report read-level information for each haplotype? If so, is it possible to examine the supporting reads for each allele? This would help in refining the motif set and improving the annotation by rerunning with additional or more appropriate motifs.

3. I applied vamos to several mock samples, but the genotyping results did not match the expected values.

For example, for a mock sample with 16 repeat units (RU), I ran the following command: `vamos --read -b C9ORF72_1.sorted.bam -r C9ORF72.tsv -s C9ORF72_1 -o C9ORF72_1.vcf`

The corresponding C9ORF72.tsv file was:
```
chr9	27573528	27573546	GGCCCC
```
However, the VCF output was:
```
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	C9ORF72_1
chr9	27573528	.	N	<VNTR>	.	PASS	END=27573546;RU=GGCCCC;SVTYPE=VNTR;ALTANNO_H1=0-0-0-0-0-0;LEN_H1=6;	GT	1/1
```
In this case, the reported length (LEN_H1=6) is inconsistent with the expected repeat count of 16 units.
I am wondering if this discrepancy could be due to an issue with my input or parameter settings. Could you please advise whether there is anything I might need to modify or check?
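
To make the mismatch concrete, here is a minimal Python sketch that parses the record above and compares the annotated unit count with the expected one. It is only an illustration under stated assumptions: I am assuming that `ALTANNO_H1` is a dash-separated list of indices into the comma-separated `RU` list, and that the region span can be taken as `END` minus `POS`. The helper names `parse_info` and `summarize_record` are my own and are not part of vamos.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict of key -> value."""
    fields = {}
    for item in info.strip(";").split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def summarize_record(line, expected_units):
    """Summarize haplotype 1 of a single vamos VCF record.

    Assumption: ALTANNO_H1 holds dash-separated indices into the
    comma-separated RU motif list.
    """
    cols = line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    motifs = info["RU"].split(",")
    indices = [int(i) for i in info["ALTANNO_H1"].split("-")]
    units = [motifs[i] for i in indices]
    return {
        "annotated_units": len(units),
        "annotated_bp": sum(len(u) for u in units),
        "len_h1": int(info["LEN_H1"]),
        # Assumption: region span is END minus POS.
        "region_bp": int(info["END"]) - int(cols[1]),
        "expected_units": expected_units,
        "matches_expected": len(units) == expected_units,
    }


record = (
    "chr9\t27573528\t.\tN\t<VNTR>\t.\tPASS\t"
    "END=27573546;RU=GGCCCC;SVTYPE=VNTR;ALTANNO_H1=0-0-0-0-0-0;LEN_H1=6;\t"
    "GT\t1/1"
)
summary = summarize_record(record, expected_units=16)
print(summary)
```

Under these assumptions, the record annotates 6 copies of `GGCCCC` (36 bp) where 16 copies (96 bp) were expected, while the region given in `C9ORF72.tsv` spans only 18 bp.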

4. In the VCF output, I noticed that the ID column always shows a dot (.). Is there any way to customize or assign a specific identifier to each VNTR record in the output?
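
If there is no built-in option, a post-processing step outside vamos could fill in the ID column. The sketch below is one hypothetical way to do this: the function name `assign_ids` and the `chrom_pos_end` naming scheme are my own choices, not anything vamos provides.

```python
def assign_ids(vcf_lines, template="{chrom}_{pos}_{end}"):
    """Return VCF lines with '.' IDs replaced by a generated identifier.

    Header lines (starting with '#') and blank lines pass through unchanged.
    The identifier scheme is a hypothetical choice for illustration.
    """
    out = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            out.append(line.rstrip("\n"))
            continue
        cols = line.rstrip("\n").split("\t")
        # Pull END from the INFO column (column 8) if present.
        end = ""
        for item in cols[7].split(";"):
            if item.startswith("END="):
                end = item[len("END="):]
        # Only overwrite missing IDs, never existing ones.
        if cols[2] == ".":
            cols[2] = template.format(chrom=cols[0], pos=cols[1], end=end)
        out.append("\t".join(cols))
    return out


example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tC9ORF72_1",
    "chr9\t27573528\t.\tN\t<VNTR>\t.\tPASS\t"
    "END=27573546;RU=GGCCCC;SVTYPE=VNTR;ALTANNO_H1=0-0-0-0-0-0;LEN_H1=6;\t"
    "GT\t1/1",
]
result = assign_ids(example)
print(result[1].split("\t")[2])
```

A native way to set the ID at genotyping time would still be preferable, since it avoids rewriting the VCF after every run.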

5. As mentioned earlier, the tool is currently limited to diploid genotyping. Does this imply that mosaicism detection is also not supported under the current framework? If so, are there any recommended strategies or alternative approaches for detecting or visualizing mosaic repeat structures using vamos?

I would greatly appreciate any insights or suggestions regarding the questions above.

Best,
Hsin

