Question about “duplicate in list” error and memory usage when running qpDstat

Dear Author,

We are currently running qpDstat from the AdmixTools package using the following command:

/data/01/user157/software/AdmixTools/bin/qpDstat -p parDstat_alltest > test.out


Our parameter file is as follows:

indivname:    snp.eigenstrat.ind
snpname:      snp.eigenstrat.lower.chr.snp
genotypename: snp.eigenstrat.chr.geno
poplistname:  test.popfilename.txt
printsd:      YES
f4mode:       NO
blgSize:      .01

The .ind, .geno, and .snp files are as follows:

<img width="711" height="452" alt="Image" src="https://github.com/user-attachments/assets/2ec9d771-3839-4414-a6e1-7abd0cf4d803" />

<img width="255" height="422" alt="Image" src="https://github.com/user-attachments/assets/5b6a0df8-55a2-44fa-af86-9494a08f0d6e" />

<img width="538" height="459" alt="Image" src="https://github.com/user-attachments/assets/ef09a28a-6e96-4e35-ba72-f35b0c2b15a7" />

We have 14 species within a single genus, and for each triplet of ingroup species, we used a species from another genus (M_un) as the outgroup.
In test.popfilename.txt, we therefore listed all possible three-species combinations (each with the same outgroup) and included all three permutations of each triplet, as shown in the screenshot below.

<img width="579" height="787" alt="Image" src="https://github.com/user-attachments/assets/74f445f0-d0a5-4404-b688-2284dfa01638" />
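For reference, the construction described above can be sketched in a few lines of Python. This is only an illustration: apart from A_ca and M_un, the species labels are placeholders, and the column order (three ingroup names followed by the outgroup) is an assumption rather than a transcription of the screenshot.

```python
from itertools import combinations


def dstat_quadruples(ingroup, outgroup):
    """Return one 4-population line per arrangement of each ingroup triplet."""
    lines = []
    for a, b, c in combinations(ingroup, 3):
        # Three arrangements per triplet: (a, b | c), (a, c | b), (b, c | a).
        for w, x, y in ((a, b, c), (a, c, b), (b, c, a)):
            lines.append(f"{w} {x} {y} {outgroup}")
    return lines


# Placeholder labels: only A_ca and M_un are real names from our data.
ingroup = ["A_ca"] + [f"sp{i:02d}" for i in range(2, 15)]  # 14 ingroup species
quads = dstat_quadruples(ingroup, "M_un")
print(len(quads))  # 364 triplets x 3 arrangements = 1092 lines
```

With 14 ingroup species this gives 364 triplets and 1092 lines, so every ingroup name necessarily appears on many different lines of the file.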

However, during execution we encountered the following error:

fatalx:
(loadlist) duplicate in list: A_ca
/slurmState/slurmSpool/slurmd/job1133823/slurm_script: line 10: 3990729 Aborted (core dumped)


Could you please let us know what might be causing this “duplicate in list” error?
We checked our population list file, and we are sure that no population name is duplicated within a single line.
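A check of this kind can be sketched as follows. The function name and the sample rows are hypothetical. The sketch reports two things: lines that repeat a name internally, and names that start more than one line. The second check rests on an unconfirmed assumption, namely that the file given as poplistname might be read as one population name per line, in which case a name that starts several lines would look like a duplicate.

```python
from collections import Counter


def check_pop_lines(lines):
    """Return (line numbers with an internal repeat, names starting >1 line)."""
    within_line = []
    first_col = Counter()
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(set(fields)) != len(fields):
            within_line.append(lineno)
        first_col[fields[0]] += 1
    repeated_first = sorted(name for name, n in first_col.items() if n > 1)
    return within_line, repeated_first


# Hypothetical rows; sp02 and sp03 are placeholder labels.
sample = ["A_ca sp02 sp03 M_un", "A_ca sp03 sp02 M_un", "sp02 sp03 A_ca M_un"]
within_line, repeated_first = check_pop_lines(sample)
print(within_line)     # lines with a repeated name
print(repeated_first)  # names that begin more than one line
```

On a real file, the same function can be called as `check_pop_lines(open("test.popfilename.txt"))`.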

In addition, our dataset includes 278,518,514 SNPs, which seems to require a very large amount of memory.
We currently allocate 2 CPUs and 250 GB of memory, but in some runs qpDstat still exceeds memory limits.
Do you have any recommendations on the typical memory requirements or efficient settings for datasets of this size (e.g., adjusting blgSize or using a subset of SNPs)?
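To make the "subset of SNPs" option concrete, here is a minimal sketch of regular thinning. It assumes the .geno file is in text EIGENSTRAT format, with one line per SNP in the same order as the .snp file, and not a packed binary format. The function name and the demo rows are made up, and whether thinning like this is appropriate for D-statistics is part of what we are asking.

```python
import io


def thin_eigenstrat(snp_in, geno_in, snp_out, geno_out, k):
    """Copy every k-th SNP row from paired .snp/.geno streams; return rows kept."""
    kept = 0
    # Row i of the .snp stream is assumed to describe row i of the .geno stream.
    for i, (snp_line, geno_line) in enumerate(zip(snp_in, geno_in)):
        if i % k == 0:
            snp_out.write(snp_line)
            geno_out.write(geno_line)
            kept += 1
    return kept


# Tiny in-memory demo with made-up rows (3 individuals, 5 SNPs).
snp = io.StringIO("".join(f"rs{i} 1 0.0 {100 * (i + 1)} A G\n" for i in range(5)))
geno = io.StringIO("012\n210\n902\n111\n000\n")
snp_thin, geno_thin = io.StringIO(), io.StringIO()
n_kept = thin_eigenstrat(snp, geno, snp_thin, geno_thin, k=2)
print(n_kept)  # 3 rows kept (indices 0, 2, 4)
```

Because the function streams line by line, it can be pointed at ordinary file handles opened with `open(...)` without loading the whole dataset into memory.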

Thank you very much for your time and for maintaining such an important tool for population genetic analysis.

Best regards,
Na Wan
