The ZiBRA Pipeline

During the Zika project we developed a fully integrated sequencing pipeline, which is described in our paper in Nature Protocols.

This page describes the bioinformatics pipeline from that paper in more detail.

Installing Docker

For Linux:

https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-16-04

For Windows:

Ideally Docker for Windows:

https://docs.docker.com/docker-for-windows/

Some users may need to use Docker Toolbox:

https://docs.docker.com/toolbox/toolbox_install_windows/

For Mac:

https://docs.docker.com/docker-for-mac/

Quick start

MinION pipeline

docker pull zibra/zibra:latest
docker run -t -i zibra/zibra:latest /bin/bash

# run on the test sample
mkdir whoref
cd whoref
wget https://s3.climb.ac.uk/nanopore/Zika_Control_Material_R9.4_2D.tar
tar xvf Zika_Control_Material_R9.4_2D.tar
fast5_to_consensus.sh ZikaAsian WHO 20161118_Zika/downloads/pass/NB08
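The quick-start call processes a single barcode directory (NB08). To cover every barcode in a run, the same command can be repeated once per directory. The following is a minimal sketch, not part of the pipeline. It assumes the three arguments are the primer scheme, a sample name and the read directory (as the quick-start call suggests) and that barcode directories are named NB*. The helper name print_consensus_cmds is hypothetical; it only prints the commands so they can be reviewed before anything is run.

```shell
# Hypothetical helper: print one fast5_to_consensus.sh command per barcode
# directory (NB01, NB02, ...) found under the given "pass" directory.
# Assumption: the arguments are <scheme> <sample name> <read directory>.
print_consensus_cmds() {
    scheme="$1"
    pass_dir="$2"
    for dir in "$pass_dir"/NB*; do
        [ -d "$dir" ] || continue          # skip when no barcode directory matched
        sample=$(basename "$dir")          # reuse the barcode name as the sample name
        printf 'fast5_to_consensus.sh %s %s %s\n' "$scheme" "$sample" "$dir"
    done
}

# Example: print_consensus_cmds ZikaAsian 20161118_Zika/downloads/pass
```

Once the printed commands look right, the output could be piped to sh to execute them.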

Local offline basecalling

We currently recommend Oxford Nanopore’s Albacore for local basecalling. On Windows or Mac, run the basecaller on the native operating system rather than inside the Docker container, because it is faster there.

We do not currently recommend Albacore's demultiplexing, because it is rather lenient: it requires only a single copy of the barcode.

To basecall an R9.4 1D dataset on Windows with eight threads, with the reads stored in C:\data\reads\run and the basecalls saved to C:\Users\nick\data\run, run:

read_fast5_basecaller.py --input C:\data\reads\run --worker_threads 8 -c r94_450bps_linear.cfg -s C:\Users\nick\data\run -r -o fast5

Demultiplexing

We currently prefer Ryan Wick’s Porechop. A special version, bundled with the Docker file, is used to make it compatible with nanopolish.

To demultiplex reads stored in /data/run, run something like:

poretools fasta /data/run > run.fasta
porechop --untrimmed -i run.fasta -b porechop-run --barcode_threshold 75 --threads 16 --check_reads 1000 --barcode_diff 2 --require_two_barcodes
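After demultiplexing, it can be useful to check how many reads ended up in each barcode bin. The sketch below assumes that Porechop has written one FASTA file per bin (for example BC01.fasta, plus none.fasta for unassigned reads) into the porechop-run directory given with -b. The function name count_reads is hypothetical and is not part of the pipeline.

```shell
# Hypothetical helper: print "<file name><TAB><number of FASTA records>"
# for every .fasta file in the directory given as the first argument.
count_reads() {
    for f in "$1"/*.fasta; do
        [ -e "$f" ] || continue            # skip when the directory holds no .fasta files
        # Each FASTA record starts with a ">" header line, so count those lines.
        printf '%s\t%s\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
    done
}

# Example: count_reads porechop-run
```

A bin with very few reads relative to the others may point to a barcode that was not used in the run.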

Mounting a local directory

You will probably want to mount a local directory where your reads are kept.

On Windows, if your data is in C:\Users\nick\data, you would run:

docker run -t -v //C/Users/nick/data:/data -i zibra/zibra:latest /bin/bash

On Mac:

docker run -t -v /Users/nick/data:/data -i zibra/zibra:latest /bin/bash
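The Windows example rewrites C:\Users\nick\data as //C/Users/nick/data for the -v option. As a minimal sketch of that rewrite, assuming a Bash-like shell is available on the Windows machine, the hypothetical helper to_docker_path below drops the colon after the drive letter, prefixes the result with // and turns backslashes into forward slashes.

```shell
# Hypothetical helper: convert a Windows path such as C:\Users\nick\data
# into the //C/Users/nick/data form used with "docker run -v" above.
to_docker_path() {
    # printf '%s' leaves the backslashes untouched; sed then
    #   1) rewrites the leading "C:" as "//C"
    #   2) replaces every backslash with a forward slash
    printf '%s\n' "$1" | sed -e 's|^\([A-Za-z]\):|//\1|' -e 's|\\|/|g'
}

# Example: to_docker_path 'C:\Users\nick\data'
```

The result can then be used as the left-hand side of the mount, for example -v "$(to_docker_path 'C:\Users\nick\data')":/data.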

Then use Porechop inside the container to demultiplex the reads, as described in the Demultiplexing section above.

Illumina pipeline - Quick start

docker pull zibra/zibra:latest
docker run -t -i zibra/zibra:latest /bin/bash
wget --no-check-certificate ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR512/007/SRR5122847/SRR5122847_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR512/007/SRR5122847/SRR5122847_2.fastq.gz
illumina_pipeline.sh SRR5122847 SRR5122847_1.fastq.gz SRR5122847_2.fastq.gz ZikaAsian
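To run the Illumina pipeline on a different paired-end run, the two download URLs need to be rebuilt for that accession. The sketch below is a hedged illustration only. The directory layout (first six characters of the accession, then "00" followed by its last digit) is inferred from the two SRR5122847 URLs above, and only 10-character accessions of that shape are handled. The function name ena_fastq_urls is hypothetical.

```shell
# Hypothetical helper: print the two FTP URLs for a paired-end run.
# Assumption: layout inferred from the SRR5122847 example, i.e.
#   vol1/fastq/<first 6 chars>/00<last digit>/<accession>/<accession>_N.fastq.gz
ena_fastq_urls() {
    acc="$1"
    if [ "${#acc}" -ne 10 ]; then
        echo "unsupported accession length: $acc" >&2
        return 1
    fi
    prefix=$(printf '%s' "$acc" | cut -c1-6)    # e.g. SRR512
    last=$(printf '%s' "$acc" | cut -c10)       # e.g. 7
    base="ftp://ftp.sra.ebi.ac.uk/vol1/fastq/$prefix/00$last/$acc"
    printf '%s/%s_1.fastq.gz\n' "$base" "$acc"
    printf '%s/%s_2.fastq.gz\n' "$base" "$acc"
}

# Example: ena_fastq_urls SRR5122847
```

The printed URLs can be handed to wget, for example with ena_fastq_urls SRR5122847 | xargs -n 1 wget. It is worth checking that both URLs exist before relying on the inferred layout.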

Credits

The ZiBRA Pipeline was developed with contributions from:

Nick Loman (MinION pipeline)
Jared Simpson (nanopolish SNP calling)
Matt Loose (nanopore demultiplexing script)
Karthik Gangavarapu (Illumina pipeline)
Nate Grubaugh (Illumina pipeline)
Kristian Andersen (Illumina pipeline)
Trevor Bedford (help with Docker and useful fixes)

It relies on a whole heap of open source software; thank you to all contributors.