As an emerging high-resolution long-read chromosome conformation capture technique, HiFi-C currently lacks dedicated software. Researchers must instead adapt pipelines designed for Pore-C, which may not fully exploit HiFi-C data and can compromise downstream analyses such as scaffolding. To address this gap, we developed 3D-HiFi, a tool designed for efficient genome scaffolding with HiFi-C data. 3D-HiFi also supports Pore-C data.
Given the substantial length of HiFi-C reads, 3D-HiFi fragments each read in silico at restriction enzyme cleavage sites before alignment.
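
The in silico fragmentation step can be illustrated with a minimal Python sketch. This is not 3D-HiFi's code. It assumes that each cut falls at the enzyme's known cleavage position within the recognition site: ^GATC for MboI/DpnII, A^AGCTT for HindIII and CATG^ for NlaIII. The helper names `digest` and `fragment_record` and the `_frag` suffix for fragment read names are hypothetical.

```python
# Hypothetical sketch of in silico restriction digestion of one long read.
# Function names and the "_frag<i>" naming scheme are illustrative assumptions,
# not 3D-HiFi's actual implementation.

# Offset of the cut ("^") within each recognition site.
CUT_OFFSETS = {
    "GATC": 0,    # MboI / DpnII: ^GATC
    "AAGCTT": 1,  # HindIII: A^AGCTT
    "CATG": 4,    # NlaIII: CATG^
}


def digest(seq: str, site: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) intervals of the fragments produced by cutting seq at site."""
    seq, site = seq.upper(), site.upper()
    offset = CUT_OFFSETS[site]
    cuts = []
    hit = seq.find(site)
    while hit != -1:
        cuts.append(hit + offset)
        hit = seq.find(site, hit + 1)
    # Consecutive boundaries delimit fragments; empty or too-short ones are dropped.
    bounds = [0] + cuts + [len(seq)]
    return [
        (start, end)
        for start, end in zip(bounds, bounds[1:])
        if end - start >= min_len
    ]


def fragment_record(name: str, seq: str, qual: str, site: str, min_len: int = 1):
    """Yield FASTQ-style (name, seq, qual) tuples, one per fragment."""
    for i, (start, end) in enumerate(digest(seq, site, min_len)):
        yield f"{name}_frag{i}", seq[start:end], qual[start:end]


if __name__ == "__main__":
    read = "AAGATCTTGATCCC"
    for record in fragment_record("read1", read, "I" * len(read), "GATC"):
        print(*record)
```

The function returns intervals rather than sequences. This lets the same slice be applied to both the bases and the quality string of a FASTQ record.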
Software:
3D-HiFi has been tested and validated on servers running Linux.
```
# (1) Download 3D-HiFi from GitHub
$ git clone https://github.com/FrasergenBioinformatics/3D-HiFi.git

# (2) Resolve dependencies
# We strongly recommend using conda to install dependencies.
$ conda env create -f environment.yml
# Activate the 3D-HiFi conda environment
$ conda activate 3D-HiFi  # or: source /path/to/conda/bin/activate 3D-HiFi

# (3) Install 3d-dna
$ git clone https://github.com/aidenlab/3d-dna.git
# Grant execute permissions to the 3d-dna scripts
$ cd 3d-dna
$ chmod +x *.sh */*
```

```
usage example: 3D-HiFi -r PATH/contig.fa -i PATH/HiFi-C.fq.gz -p '-x map-hifi' -t 30 -c 10000 -e GATC -d PATH/3d-dna -o your_species

options:
  -h, --help                                    show this help message and exit
  -r, --ref                                     contig-level genome assembly (FASTA)
  -i, --fq_in <fastq file>                      HiFi-C/Pore-C reads
  -p, --map_params <minimap2 align parameter>   minimap2 alignment parameters (for Pore-C data, set "-x map-ont")
  -t, --threads                                 number of threads
  -c, --chunk_size                              number of records per processing chunk; increase it for large datasets
  -e, --enzyme_site                             enzyme recognition site: GATC (MboI/DpnII), AAGCTT (HindIII), CATG (NlaIII)
  -d, --_3ddna_path                             path to the 3d-dna directory
  -o, --output_prefix                           output prefix
  -a, --polyploid                               enable polyploid mode to rescue collapsed contigs (default: disabled)
```

The following table compares the performance of 3D-HiFi with wf-pore-c and Cphasing across several datasets. It reports valid reads, contact pairs, contacts per read, wall time and memory usage.

| Dataset | Software | Valid reads | Contact pairs | Contacts/Reads | Wall time | RAM |
|---|---|---|---|---|---|---|
| Ceratitis_capitata (27X) | 3D-HiFi | 1,185,394 | 11,508,130 | 8.75 | 32min | 42G |
| | wf-pore-c | 1,163,280 | 3,264,519 | 2.48 | 2.5h | 39G |
| | Cphasing | 1,015,477 | 5,888,352 | 4.48 | 48min | 42G |
| Anopheles_coluzzii (56X) | 3D-HiFi | 2,282,982 | 64,712,163 | 27.35 | 1.2h | 70G |
| | wf-pore-c | 2,159,056 | 9,216,303 | 3.90 | 3.5h | 65G |
| | Cphasing | 2,258,817 | 50,310,732 | 21.27 | 1.2h | 67G |
| Homo_sapiens (26X) | 3D-HiFi | 9,198,589 | 720,830,448 | 78.28 | 1d5h | 147G |
| | wf-pore-c | 8,743,704 | 46,912,773 | 5.09 | 1d7h | 61G |
| | Cphasing | 9,144,776 | 364,842,740 | 39.62 | 17.9h | 128G |
| Plecia_longiforceps (49X) | 3D-HiFi | 15,223,238 | 143,783,824 | 6.03 | 4.7h | 96G |
| | wf-pore-c | 13,474,187 | 35,116,293 | 1.47 | 11.2h | 86G |
| | Cphasing | 12,793,521 | 69,399,906 | 2.91 | 3.8h | 88G |
| Rosa_hybrida (23X) | 3D-HiFi | 13,338,018 | 539,607,578 | 37.38 | 16.2h | 82G |
| | wf-pore-c | 10,956,145 | 37,981,725 | 2.63 | 1d2h | 53G |
| | Cphasing | 7,846,147 | 53,071,524 | 3.68 | 5.6h | 71G |
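
In the table, the contact-pair counts are many times larger than the read counts. This is because a single long read can carry many aligned fragments, and each pair of those fragments can count as a contact. The sketch below assumes the common all-pairs decomposition, in which every pair of aligned fragments from the same read becomes one contact. Whether 3D-HiFi applies exactly this rule, or filters pairs further, is an assumption. The `Fragment` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Fragment:
    """One aligned restriction fragment of a read (hypothetical record layout)."""
    contig: str
    pos: int     # alignment position on the contig
    strand: int  # 0 = forward, 16 = reverse (an assumed convention)


def pairwise_contacts(fragments: list[Fragment]) -> list[tuple[Fragment, Fragment]]:
    """Decompose one multi-fragment read into all pairwise contacts.

    A read with k aligned fragments yields k * (k - 1) / 2 pairs.
    """
    # Sort so every pair is emitted in a consistent (contig, position) order.
    ordered = sorted(fragments, key=lambda f: (f.contig, f.pos))
    return list(combinations(ordered, 2))


if __name__ == "__main__":
    read_fragments = [
        Fragment("ctg2", 500, 0),
        Fragment("ctg1", 100, 16),
        Fragment("ctg1", 9000, 0),
        Fragment("ctg3", 42, 0),
    ]
    for a, b in pairwise_contacts(read_fragments):
        print(a.contig, a.pos, b.contig, b.pos)
```

Four aligned fragments give six contacts. Any filtering applied before this step, for example on mapping quality, reduces the number of fragments per read and therefore the number of pairs.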
Primary Output Files and Their Specifications
```
.
├── 01.split_minimap
│   ├── your_species.paf          # minimap2 alignments
│   ├── your_species.len          # contig lengths (with --polyploid)
│   ├── contig.depth              # average depth per contig (with --polyploid)
│   ├── collapsed.contig.list     # list of collapsed contigs (with --polyploid)
│   └── contig.dup.fasta          # contig genome with rescued contigs (with --polyploid)
├── 02.paf2mnd
│   ├── your_species.mnd.txt
│   ├── your_species.mnd.sort.txt
│   ├── dups.txt
│   ├── merged_nodups.txt         # deduplicated mnd file
│   └── tmp
├── 02.02.paf2mnd_dup             # with --polyploid
│   ├── your_species.mnd.txt
│   ├── your_species.mnd.dup.txt  # new mnd file (with --polyploid)
│   ├── your_species.mnd.sort.txt
│   ├── dups.txt
│   ├── merged_nodups.txt         # deduplicated mnd file
│   └── tmp
├── 03.3ddna
│   ├── contig.0.asm
│   ├── contig.0_asm.scaffold_track.txt
│   ├── contig.0_asm.superscaf_track.txt
│   ├── contig.0.cprops
│   ├── contig.0.assembly         # Juicebox input for manual correction
│   ├── contig.0.hic              # Juicebox input for manual correction
│   ├── contig.cprops
│   └── contig.mnd.txt
├── 03.3ddna_dup                  # with --polyploid, use the results in this directory for manual correction
│   ├── temp.contig.dup.0.asm_mnd.txt
│   ├── contig.dup.0_asm.scaffold_track.txt
│   ├── contig.dup.0_asm.superscaf_track.txt
│   ├── contig.dup.0.assembly     # Juicebox input for manual correction
│   └── contig.dup.0.hic          # Juicebox input for manual correction
└── read.summary                  # read mapping statistics
```
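
With `--polyploid`, the outputs above include three related files:

- `contig.depth`: the average depth per contig.
- `collapsed.contig.list`: a list of collapsed contigs.
- `contig.dup.fasta`: a contig genome with the collapsed contigs rescued.

The README does not state the detection rule. The sketch below illustrates one common heuristic, stated as an assumption rather than 3D-HiFi's method. A contig whose average depth is well above the genome-wide median is treated as a collapse of several haplotypes, and extra copies of it are added. The two-column layout of `contig.depth`, the 1.5x threshold and the `_dup` naming are all hypothetical.

```python
import statistics


def read_depths(path: str) -> dict[str, float]:
    """Parse 'contig<whitespace>mean_depth' lines (assumed layout of contig.depth)."""
    depths: dict[str, float] = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue
            try:
                depths[fields[0]] = float(fields[1])
            except ValueError:
                continue  # skip a header line or malformed record
    return depths


def find_collapsed(depths: dict[str, float], threshold: float = 1.5) -> dict[str, int]:
    """Map each suspected collapsed contig to an estimated copy number.

    The baseline is the median per-contig depth. A contig at >= threshold x
    baseline is flagged; its copy number is the rounded depth ratio (minimum 2).
    """
    baseline = statistics.median(depths.values())
    if baseline <= 0:
        return {}
    collapsed: dict[str, int] = {}
    for contig, depth in depths.items():
        ratio = depth / baseline
        if ratio >= threshold:
            collapsed[contig] = max(2, round(ratio))
    return collapsed


def add_copies(seqs: dict[str, str], collapsed: dict[str, int]) -> dict[str, str]:
    """Return a contig set with (copies - 1) extra duplicates of each collapsed contig."""
    rescued = dict(seqs)
    for contig, copies in collapsed.items():
        for i in range(1, copies):
            rescued[f"{contig}_dup{i}"] = seqs[contig]
    return rescued
```

The median is used as the baseline because it is robust to the few high-depth collapsed contigs the heuristic is meant to detect.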
For detailed instructions regarding chromosome ordering, orientation, and visualization, please see automated_orient_visualization_pipeline.
Feel free to raise an issue on the issue page.
Note: Please ask questions on the issue page first; the answers may also help other users.
For additional help, please send an email to wanghuan@frasergen.com or zhengshang@frasergen.com.
If you use 3D-HiFi in your work, please cite:
