FrasergenBioinformatics/3D-HiFi

3D-HiFi: A fast and accurate pipeline for chromosome-scale scaffolding from HiFi-C data

Introduction

  As an emerging high-resolution long-read chromosome conformation capture technique, HiFi-C currently lacks dedicated software tools. Researchers have to adapt pipelines designed for Pore-C, which may fail to exploit the full potential of HiFi-C data and can degrade downstream analyses such as scaffolding. To address this issue, we have developed 3D-HiFi, a tool specifically designed for efficient scaffolding using HiFi-C data. 3D-HiFi also supports Pore-C data.

Overview

  Given the substantial length of HiFi-C reads, we employed an analysis strategy involving in silico fragmentation at restriction enzyme cleavage sites prior to alignment.
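The fragmentation idea can be illustrated with a minimal Python sketch. This is not the 3D-HiFi implementation: the function name `digest_read` is hypothetical, and the cut position (at the start of each recognition site, so the site stays with the downstream fragment) is an assumption made only for illustration.

```python
import re
from typing import List


def digest_read(seq: str, site: str = "GATC") -> List[str]:
    """Split a read into fragments at every occurrence of the recognition site.

    Assumption: the cut is placed at the start of the site, so each site stays
    attached to the fragment that follows it. The real pipeline may cut elsewhere.
    """
    seq = seq.upper()
    # A zero-width lookahead reports every site position without consuming bases.
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]
    # Fragment boundaries: read start, every internal cut, read end.
    bounds = [0] + [c for c in cuts if c > 0] + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


fragments = digest_read("AAGATCTTTTGATCCC")
print(fragments)
```

Each fragment would then be aligned separately, so that one long read can yield several contacts between the loci it spans.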

[Figure: 3D-HiFi workflow]

Dependencies

Software:

Installation

3D-HiFi has been tested and validated on servers running Linux.

# (1) Download 3D-HiFi from GitHub
$ git clone https://github.com/FrasergenBioinformatics/3D-HiFi.git
# (2) Resolve dependencies
# We strongly recommend using conda to install dependencies. 
$ conda env create -f environment.yml
# Activate the 3D-HiFi conda environment
$ conda activate 3D-HiFi # or: source /path/to/conda/bin/activate 3D-HiFi
# (3) Install 3d-dna
$ git clone https://github.com/aidenlab/3d-dna.git
# Give executable permissions to the 3d-dna scripts
$ cd 3d-dna
$ chmod +x *.sh */*

Quick start

usage_example: 3D-HiFi -r PATH/contig.fa -i PATH/HiFi-C.fq.gz -p '-x map-hifi' -t 30 -c 10000 -e GATC -d PATH/3d-dna -o your_species

options:
  -h, --help           show this help message and exit
  -r, --ref            contig-level genome assembly
  -i, --fq_in          <fastq file> HiFi-C/Pore-C data
  -p, --map_params     <minimap2 alignment parameters> (for Pore-C data, set "-x map-ont")
  -t, --threads        number of threads
  -c, --chunk_size     number of records per processing chunk; for large datasets, you can increase `chunk_size`
  -e, --enzyme_site    enzyme recognition site: GATC (MboI/DpnII), AAGCTT (HindIII), CATG (NlaIII)
  -d, --_3ddna_path    path to the 3d-dna software
  -o, --output_prefix  output prefix
  -a, --polyploid      enable polyploid mode, which can rescue collapsed contigs (default: disabled)
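For scripted runs, the command line above could be assembled programmatically. The sketch below is only an illustration: `build_3dhifi_command` is a hypothetical helper, the paths are the placeholders from the usage example, and only the flags listed in the options are used. It shows how the minimap2 preset is switched to `-x map-ont` for Pore-C data while being passed as a single argument.

```python
import shlex
from typing import List


def build_3dhifi_command(ref: str, fastq: str, three_d_dna: str, prefix: str,
                         pore_c: bool = False, threads: int = 30,
                         chunk_size: int = 10000,
                         enzyme_site: str = "GATC") -> List[str]:
    """Return the argument list for a 3D-HiFi run (hypothetical helper)."""
    # The whole minimap2 preset is one argument to -p, so it stays one list item.
    preset = "-x map-ont" if pore_c else "-x map-hifi"
    return [
        "3D-HiFi",
        "-r", ref,
        "-i", fastq,
        "-p", preset,
        "-t", str(threads),
        "-c", str(chunk_size),
        "-e", enzyme_site,
        "-d", three_d_dna,
        "-o", prefix,
    ]


cmd = build_3dhifi_command("PATH/contig.fa", "PATH/Pore-C.fq.gz",
                           "PATH/3d-dna", "your_species", pore_c=True)
# shlex.join quotes the preset so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real ones.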

Comparison

The following table compares 3D-HiFi with two other tools (wf-pore-c and Cphasing) on several biological datasets, reporting valid reads, number of pairs, contacts per read, wall time, and RAM usage.

| Dataset | Software | Valid reads | Pairs number | Contacts/Reads | Wall time | RAM |
| --- | --- | --- | --- | --- | --- | --- |
| Ceratitis_capitata (27X) | 3D-HiFi | 1,185,394 | 11,508,130 | 8.75 | 32min | 42G |
| | wf-pore-c | 1,163,280 | 3,264,519 | 2.48 | 2.5h | 39G |
| | Cphasing | 1,015,477 | 5,888,352 | 4.48 | 48min | 42G |
| Anopheles_coluzzii (56X) | 3D-HiFi | 2,282,982 | 64,712,163 | 27.35 | 1.2h | 70G |
| | wf-pore-c | 2,159,056 | 9,216,303 | 3.90 | 3.5h | 65G |
| | Cphasing | 2,258,817 | 50,310,732 | 21.27 | 1.2h | 67G |
| Homo_sapiens (26X) | 3D-HiFi | 9,198,589 | 720,830,448 | 78.28 | 1d5h | 147G |
| | wf-pore-c | 8,743,704 | 46,912,773 | 5.09 | 1d7h | 61G |
| | Cphasing | 9,144,776 | 364,842,740 | 39.62 | 17.9h | 128G |
| Plecia_longiforceps (49X) | 3D-HiFi | 15,223,238 | 143,783,824 | 6.03 | 4.7h | 96G |
| | wf-pore-c | 13,474,187 | 35,116,293 | 1.47 | 11.2h | 86G |
| | Cphasing | 12,793,521 | 69,399,906 | 2.91 | 3.8h | 88G |
| Rosa_hybrida (23X) | 3D-HiFi | 13,338,018 | 539,607,578 | 37.38 | 16.2h | 82G |
| | wf-pore-c | 10,956,145 | 37,981,725 | 2.63 | 1d2h | 53G |
| | Cphasing | 7,846,147 | 53,071,524 | 3.68 | 5.6h | 71G |

Output Files

Primary Output Files and Their Specifications

.
├── 01.split_minimap
│   ├── your_species.paf        # minimap2 result
│   ├── your_species.len        # contig sizes (only with --polyploid)
│   ├── contig.depth            # average depth per contig (only with --polyploid)
│   ├── collapsed.contig.list   # list of collapsed contigs (only with --polyploid)
│   └── contig.dup.fasta        # contig genome after rescue (only with --polyploid)
├── 02.paf2mnd
│   ├── your_species.mnd.txt
│   ├── your_species.mnd.sort.txt
│   ├── dups.txt
│   ├── merged_nodups.txt        # nodups mnd file
│   └── tmp
├── 02.02.paf2mnd_dup            # if you set --polyploid parameter
│   ├── your_species.mnd.txt
│   ├── your_species.mnd.dup.txt # new mnd file if you set --polyploid parameter
│   ├── your_species.mnd.sort.txt
│   ├── dups.txt
│   ├── merged_nodups.txt        # nodups mnd file
│   └── tmp
├── 03.3ddna
│   ├── contig.0.asm
│   ├── contig.0_asm.scaffold_track.txt
│   ├── contig.0_asm.superscaf_track.txt
│   ├── contig.0.cprops
│   ├── contig.0.assembly  # Juicebox input for manual correction
│   ├── contig.0.hic       # Juicebox input for manual correction
│   ├── contig.cprops
│   └── contig.mnd.txt
├── 03.3ddna_dup          # with --polyploid, use the results in this directory for manual correction
│   ├── temp.contig.dup.0.asm_mnd.txt
│   ├── contig.dup.0_asm.scaffold_track.txt
│   ├── contig.dup.0_asm.superscaf_track.txt
│   ├── contig.dup.0.assembly  # Juicebox input for manual correction
│   └── contig.dup.0.hic       # Juicebox input for manual correction
└── read.summary          # read mapping statistics
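
As a convenience, the pair of files to load into Juicebox can be located from the layout above. The helper below is a hypothetical sketch, not part of 3D-HiFi: it assumes the file names shown in the tree (which correspond to an input named contig.fa) and only builds the paths, without checking that the files exist.

```python
from pathlib import Path
from typing import Tuple


def juicebox_inputs(out_dir: str, polyploid: bool = False) -> Tuple[Path, Path]:
    """Return the (.assembly, .hic) paths to open in Juicebox.

    Assumption: the output directory follows the tree shown above, with
    03.3ddna_dup used when the run was made with --polyploid.
    """
    root = Path(out_dir)
    if polyploid:
        base = root / "03.3ddna_dup" / "contig.dup.0"
    else:
        base = root / "03.3ddna" / "contig.0"
    # Append the extensions as text; Path.with_suffix would replace the ".0".
    return Path(f"{base}.assembly"), Path(f"{base}.hic")


assembly, hic = juicebox_inputs("results", polyploid=True)
print(assembly)
print(hic)
```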

Get help

Help

For detailed instructions regarding chromosome ordering, orientation, and visualization, please see automated_orient_visualization_pipeline.

Feel free to raise an issue on the issue page.

Note: Please ask questions on the issue page first. They are also helpful to other users.

Contact

For additional help, please send an email to wanghuan@frasergen.com or zhengshang@frasergen.com.

Citation

If you use 3D-HiFi in your work, please cite:
