NUCwave

NUCwave is a wavelet-based bioinformatic tool that generates nucleosome occupancy maps (NOMs) from sequence reads produced by:

  • Chromatin digestion with micrococcal nuclease (MNase-seq)
  • Chemical cleavage of chromatin (CC-seq)
  • Chromatin immunoprecipitation (ChIP-seq) and fragmentation by sonication

NUCwave can process datasets from both sequencing protocols:

  • Single-read (SR)
  • Paired-end (PE)

NUCwave requires only two inputs:

  • The dataset of aligned sequence reads
  • The sequence of the reference genome

NUCwave is implemented in Python and is distributed under the GPLv3 license.

Citing NUCwave

If you use NUCwave, please cite:

Luis Quintales, Enrique Vázquez and Francisco Antequera (2015)
Briefings in Bioinformatics 16: 576-587

Download

Source Code

Dependencies:

Manual

nucwave_sr.py

Direct generation of nucleosome maps from single-read MNase-seq, ChIP-seq, and CC-seq (chemical cleavage) experiments. For chemical cleavage experiments, the -c option must be used (see Example 3). Paired-end experiments can also be analyzed as single-read experiments (see Example 3).

$ python nucwave_sr.py -h
usage: python nucwave_sr.py [-h] [-w] [-c] [-o OUTDIR] -g GENOMEFILE -a
                            ALIGNFILE -p PREFIXNAME

optional arguments:
  -h, --help      show this help message and exit
  -w, --wigfiles  write intermediate wig files
  -c, --chemical  chemical cleavage
  -o OUTDIR       output directory
  -g GENOMEFILE   FASTA genome file
  -a ALIGNFILE    BOWTIE alignment file
  -p PREFIXNAME   Prefix name for result files

Alignment file format

Each line of the input alignment file must contain 4 fields separated by a tab character:

strand  chromosome  genomic_coordinate  sequence

Example:

+     chrX    239356  NAAGCAAACACCTTTCTTAAGCTGTTGGTGCAAAAAAGGA
-     chrIII  189611  CGTTCAGAAATGGCAGAACCTACAGTGACAGATTTGCGTN
-     chrXI   252551  AGAGATACTTTCTAAACTTAGATTGTCACCAGAAAATCCN
-     chrXV   427200  TTCGATTTCGTTTATATCAGACATTTTTAGTTTCTATTCN
+     chrII   727006  NATACTTCTTTAGTTAATTGTTTAACAGTTTTGGGGTCAT
-     chrXV   58063   CGCTAGTATCAGCGGGTCTAGAATTTGATCCGGTTTCCAN
-     chrXV   32376   CCCTACTAACTAATTCATAGCAAAATTCAGAACTTATCCN
+     chrXV   818987  NGAAGATGATAAGGTTTTACGTACCAGCAACGGTGGTAAC
+     chrIV   1087492 NGGCAAAATAAACTAGTCTAGCTAATATTCGACTAAATTG
+     chrXVI  138142  NCGATGGATTAATGATTACTTGTGAAAAATTAGAAAAAAC
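
The four-field format above can be checked before running NUCwave. The following is a minimal sketch, not part of NUCwave itself: the names Read and parse_alignment_lines are hypothetical, and the only rules enforced are the ones stated above (four tab-separated fields, a strand of + or -, and an integer coordinate).

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    """One line of the alignment file (hypothetical helper type)."""
    strand: str
    chromosome: str
    coordinate: int
    sequence: str


def parse_alignment_lines(lines: Iterable[str]) -> Iterator[Read]:
    """Yield one Read per non-empty line; raise ValueError on malformed lines."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 4:
            raise ValueError(
                f"line {number}: expected 4 tab-separated fields, got {len(fields)}"
            )
        strand, chromosome, coordinate, sequence = fields
        if strand not in ("+", "-"):
            raise ValueError(f"line {number}: strand must be '+' or '-'")
        if not coordinate.isdigit():
            raise ValueError(f"line {number}: coordinate must be an integer")
        yield Read(strand, chromosome, int(coordinate), sequence)
```

Passing an open file handle, for example parse_alignment_lines(open("SRR063958.bowtie")), walks the file line by line without loading it into memory.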

Output file extensions

Extension        Content
cut_p            Cleavage site (+ strand)
cut_m            Cleavage site (- strand)
depth_p          Depth coverage (+ strand)
depth_m          Depth coverage (- strand)
depth_p_wl       Depth coverage wavelet denoised (+ strand)
depth_m_wl       Depth coverage wavelet denoised (- strand)
depth_c          NOM (nucleosome occupancy map)
depth_c_wl       NOM wavelet denoised
depth_c_wl_norm  NOM wavelet denoised and normalized
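
To make the "depth coverage" rows of the table concrete, here is a generic sketch of per-strand depth coverage, meaning the number of reads that cover each position. It is only an illustration and not NUCwave's own code: the function name strand_depth is hypothetical, coordinates are assumed to be 0-based leftmost positions (as Bowtie 1 writes them), and reads are counted over their sequenced length only, without any extension.

```python
from itertools import accumulate


def strand_depth(reads, chromosome_lengths):
    """Return {(chromosome, strand): [depth at each position]}.

    reads: iterable of (strand, chromosome, coordinate, sequence) tuples.
    chromosome_lengths: dict mapping chromosome name to its length.
    ASSUMPTION: coordinate is the 0-based leftmost position of the read.
    """
    # One difference array per chromosome and strand: +1 where a read
    # starts, -1 just past where it ends.
    diffs = {
        (chrom, strand): [0] * (length + 1)
        for chrom, length in chromosome_lengths.items()
        for strand in "+-"
    }
    for strand, chrom, start, sequence in reads:
        end = min(start + len(sequence), chromosome_lengths[chrom])
        diff = diffs[(chrom, strand)]
        diff[start] += 1
        diff[end] -= 1
    # The running sum of each difference array is the depth per position.
    return {key: list(accumulate(diff))[:-1] for key, diff in diffs.items()}
```

The difference-array approach touches each read once, so the cost grows with the number of reads plus the genome length rather than with their product.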

nucwave_pe.py

Direct generation of nucleosome maps from paired-end MNase-seq, ChIP-seq, and CC-seq (chemical cleavage) experiments. For chemical cleavage experiments, this program generates a map of linker positions (see Example 3).

$ python nucwave_pe.py -h
usage: python nucwave_pe.py [-h] [-w] [-o OUTDIR] -g GENOMEFILE -a ALIGNFILE -p PREFIXNAME

optional arguments:
  -h, --help      show this help message and exit
  -w, --wigfiles  write intermediate wig files
  -o OUTDIR       output directory
required arguments:
  -g GENOMEFILE   FASTA genome file
  -a ALIGNFILE    Alignment file
  -p PREFIXNAME   Prefix name for result files

Alignment file format

Each line of the input alignment file must contain 4 fields separated by a tab character:

strand  chromosome  genomic_coordinate  sequence

The two paired-end reads of the same sequenced fragment must be on consecutive lines.

Example:

+       chrX    537898  TTTGATATTTTGTCATATTATCCTATTATTTTATCAATCC
-       chrX    538002  TATGTGATAATATACTAGTAACATGAATACTACTAAATGA
+       chrIII  71367   ACGATGATTCAGTTCGCCTTCTATCCTTTGTTTACGTATT
-       chrIII  71476   TCAATCCTTCTTTTGCTTCCATATTTACCATGTGGACCCT
+       chrX    88383   TGACACCTTTTCCCAAAACTTCTGTGAAGTTTTGCTCAAT
-       chrX    88505   GTTTCAATCTTATGGAATTCACAATGAAGCATCCCTTCCT
+       chrXIV  616010  TGAAGTTGCCAAGAGGCTTCAAAACATGATGCTCAGCTCC
-       chrXIV  616119  CTAGGGTTATAGTGTTCAGACTTGAGGTTGAACATATCCA
+       chrXV   1069765 TGAACGCCATACTGAGTAGAGTTAAACAATTGGTTAGCTA
-       chrXV   1069877 TAGCAAAAGGAACCATAGCATCACCAATGGATTGAGGGTT
+       chrII   669920  AGTTCGTATTGCAAACTGAAGAGTATGCTTCTTTTTAGTG
-       chrII   670006  TCCCATAAAATCCTCTCTTTCATAGAAGTCCGATTTCATT
+       chrXII  729928  TGAATCCCACTGGACTGTTTGTGGTGCGTTTGCTGCCGCT
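
The requirement that mates sit on consecutive lines can be checked, and fragment sizes estimated, with a short script. This is a sketch under stated assumptions rather than NUCwave's own logic: fragment_sizes is a hypothetical name, and the coordinate of each read is assumed to be its leftmost position on the forward strand (as in Bowtie 1 output), so the fragment spans from the + mate's coordinate to the - mate's coordinate plus its read length.

```python
from collections import Counter


def fragment_sizes(lines):
    """Return a Counter of fragment sizes from consecutive +/- mate lines.

    A trailing unpaired line (odd number of records) is ignored.
    """
    records = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    sizes = Counter()
    # Walk the records two at a time: mates are expected on consecutive lines.
    for first, second in zip(records[0::2], records[1::2]):
        same_chromosome = first[1] == second[1]
        opposite_strands = {first[0], second[0]} == {"+", "-"}
        if not (same_chromosome and opposite_strands):
            raise ValueError(f"not a valid pair: {first} / {second}")
        plus, minus = (first, second) if first[0] == "+" else (second, first)
        # ASSUMPTION: coordinates are leftmost positions on the forward strand.
        size = int(minus[2]) + len(minus[3]) - int(plus[2])
        sizes[size] += 1
    return sizes
```

Under this assumption, the first pair in the example above gives 538002 + 40 - 537898 = 144 bp, which is in the expected range for a mononucleosomal fragment.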

Output file extensions

Extension            Content
cut_p                Cleavage site (+ strand)
cut_m                Cleavage site (- strand)
depth_complete_PE    Depth coverage for complete sequence
PE_center            Complete sequence center count
depth_trimmed_PE     NOM (Depth coverage for trimmed sequence)
depth_wl_trimmed_PE  NOM wavelet denoised and normalized
historeadsize        Frequency of PE fragment sizes

Example 1: MNase-seq single-reads

Source

Original publication: Tsui K, Dubuis S, Gebbia M, Morse RH, Barkai N, Tirosh I, Nislow C. Evolution of nucleosome occupancy: conservation of global properties and divergence of gene-specific patterns. Mol Cell Biol 2011; 31:4348-55. PMID: 21896781

NGS data:

Pipeline

To reproduce this pipeline, Bowtie 1 must be installed beforehand. You are free to use another aligner to generate the input files for NUCwave, as long as they follow the alignment file format described in the Manual.
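
If another aligner writes SAM output, its records can be converted to the four-field format. The sketch below is an illustration only: sam_to_nucwave is a hypothetical name, and since this page does not state whether NUCwave expects 0-based or 1-based coordinates, the code assumes the 0-based offsets that Bowtie 1 writes and therefore subtracts 1 from the 1-based SAM POS field.

```python
def sam_to_nucwave(sam_lines):
    """Yield 'strand<TAB>chromosome<TAB>coordinate<TAB>sequence' lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & (0x4 | 0x100 | 0x800):
            continue
        strand = "-" if flag & 0x10 else "+"
        chromosome = fields[2]
        # ASSUMPTION: 0-based coordinate, as in Bowtie 1 default output.
        coordinate = int(fields[3]) - 1
        sequence = fields[9]
        yield "\t".join((strand, chromosome, str(coordinate), sequence))
```

For paired-end data, the SAM file would additionally need to keep both mates of a pair on consecutive lines, which this sketch does not enforce.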

$ wget http://nucleosome.usal.es/nucwave/data/S288C_reference_sequence_R64-1-1_20110203.fsa

More info: The S. cerevisiae reference genome was downloaded from SGD, and the chromosome names in the FASTA headers were replaced with chrI-chrXVI.
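
For a genome downloaded from elsewhere, the same kind of header replacement can be scripted. This is a minimal sketch, assuming the sequences appear in chromosome order; int_to_roman and rename_fasta_headers are hypothetical names, and any extra sequence (for example a mitochondrial genome) would simply receive the next numeral and may need a custom name instead.

```python
def int_to_roman(number):
    """Convert a positive integer to a Roman numeral (16 -> 'XVI')."""
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    result = ""
    for value, symbol in numerals:
        while number >= value:
            result += symbol
            number -= value
    return result


def rename_fasta_headers(lines, prefix="chr"):
    """Replace the n-th FASTA header with '>chr' + Roman numeral of n."""
    index = 0
    for line in lines:
        if line.startswith(">"):
            index += 1
            yield f">{prefix}{int_to_roman(index)}\n"
        else:
            yield line  # sequence lines pass through unchanged
```

The chromosome names written here must match the names that appear in the second field of the alignment file.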

$ bowtie-build S288C_reference_sequence_R64-1-1_20110203.fsa SC

More info: The Bowtie index only needs to be built once for each reference genome. For a complete description of its use, refer to the bowtie-build manual page.

$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA023/SRA023538/SRX025556/SRR063958.fastq.bz2
$ bunzip2 SRR063958.fastq.bz2

More info: The DNA Data Bank of Japan repository provides compressed FASTQ files for download.
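
On systems without bunzip2, the decompression step can be done with Python's standard library. A minimal sketch follows; decompress_bz2 is a hypothetical helper name, and unlike bunzip2 it leaves the compressed file in place.

```python
import bz2
import shutil


def decompress_bz2(source_path, target_path, chunk_size=1024 * 1024):
    """Stream-decompress a .bz2 file to target_path in fixed-size chunks."""
    with bz2.open(source_path, "rb") as source, open(target_path, "wb") as target:
        shutil.copyfileobj(source, target, chunk_size)
```

For this example, decompress_bz2("SRR063958.fastq.bz2", "SRR063958.fastq") would produce the FASTQ file that Bowtie reads in the next step.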

$ bowtie --suppress 1,6,7,8 -t -p 6 -m 1 -v 2 SC SRR063958.fastq SRR063958.bowtie 2> SRR063958.bowtieout
More info: The Bowtie parameters strongly influence the final alignment and must be adapted to your experiment. For a complete description of all parameters, refer to the Bowtie manual. The parameters used in this example are:
  • -m 1: Suppress all alignments for a read if more than one reportable alignment exists for it.
  • -v 2: Report alignments with at most 2 mismatches.
  • --suppress 1,6,7,8: Suppress the output columns that are not used in the NUCwave input file, leaving strand, chromosome, coordinate and sequence.
  • -p 6: Launch 6 parallel search threads.
  • -t: Print the wall-clock time taken by each phase.
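
If an alignment was already produced with Bowtie 1's default output (eight tab-separated columns: read name, strand, reference name, 0-based offset, read sequence, qualities, a count of other alignments, and mismatch descriptors), the effect of --suppress 1,6,7,8 can be reproduced afterwards. The following is a small sketch with a hypothetical function name, suppress_columns.

```python
def suppress_columns(line, suppressed=(1, 6, 7, 8)):
    """Drop the given 1-based columns from one tab-separated line."""
    fields = line.rstrip("\n").split("\t")
    kept = [
        field
        for number, field in enumerate(fields, start=1)
        if number not in suppressed
    ]
    return "\t".join(kept)
```

Applied to every line of a default Bowtie 1 output file, this leaves the four fields that NUCwave expects: strand, chromosome, coordinate and sequence.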
$ python nucwave_sr.py -g S288C_reference_sequence_R64-1-1_20110203.fsa -a SRR063958.bowtie -o Tsui2011_results -p Tsui2011 -w
Reading and processing genome
  Time taken: 0 seconds
Reading and processing alignment file
  Number of reads processed: 27145510
  Time taken: 81 seconds
Writing intermediate files: cut points per strand
  Time taken: 39 seconds
Depth coverage calculation
  Time taken: 15 seconds
Depth coverage denoising
  Time taken: 3 seconds
Writing intermediate files: depth coverage per strand
  Time taken: 36 seconds
Mean depth coverage calculation
Writing intermediate files: Depth coverage wavelet denoised per strand
  Mean depth coverage: 46
  Time taken: 81 seconds
Shift calculation
  Shifting value 42 (from 19069 measures)
  Time taken: 11 seconds
Signal integration from strand signals
  Time taken: 7 seconds
Writing intermediate files: Integrated depth coverage
  Time taken: 10 seconds
Depth coverage wavelet denoising
  Time taken: 2 seconds
Mean depth coverage calculation
Writing intermediate files: depth coverage wavelet denoised
  Mean depth coverage: 92
  Time taken: 39 seconds
Writing results file: depth coverage wavelet denoised and normalized
  Time taken: 16 seconds

More info: Computer used: MacPro3.1 with a Quad-Core Intel Xeon (2.8 GHz) and 12 GB of RAM. Without the -w argument, only the final wavelet-denoised and normalized NOM file is written.

More info: To explore the wig files locally, we recommend installing the Integrated Genome Browser. Select the species (Saccharomyces cerevisiae) and genome version (S_cerevisiae_Apr_2011), then open the generated wig files you want.
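
The wig files can also be read programmatically. The sketch below is a generic reader for the wiggle text format, not NUCwave code: read_wig is a hypothetical name, this page does not state which wiggle flavor NUCwave writes, so both fixedStep and variableStep declarations are handled, and the optional span parameter is ignored.

```python
from collections import defaultdict


def read_wig(lines):
    """Return {chromosome: {position: value}} from wiggle-format text."""
    signal = defaultdict(dict)
    mode = chrom = None
    position = step = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # skip blank, track, browser and comment lines
        if line.startswith(("fixedStep", "variableStep")):
            parts = line.split()
            mode = parts[0]
            options = dict(item.split("=", 1) for item in parts[1:])
            chrom = options["chrom"]
            position = int(options.get("start", 1))
            step = int(options.get("step", 1))
            continue
        if mode == "fixedStep":
            # One value per line; the position advances by `step`.
            signal[chrom][position] = float(line)
            position += step
        elif mode == "variableStep":
            # Each line holds "position value".
            pos, value = line.split()
            signal[chrom][int(pos)] = float(value)
        else:
            raise ValueError("data line found before a step declaration")
    return signal
```

Passing an open file handle to read_wig returns a dictionary per chromosome, which is convenient for small regions; for whole genomes a list or array per chromosome would use less memory.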

Example 2: MNase-seq paired-end reads

Source

Original publication: Cole HA, Howard BH, Clark DJ. Activation-induced disruption of nucleosome position clusters on the coding regions of Gcn4-dependent genes extends into neighbouring genes. Nucleic Acids Res 2011; 39:9521-35. PMID: 21880600

NGS data:

Pipeline

To reproduce this pipeline, Bowtie 1 must be installed beforehand. You are free to use another aligner to generate the input files for NUCwave, as long as they follow the alignment file format described in the Manual.

$ wget http://nucleosome.usal.es/nucwave/data/S288C_reference_sequence_R64-1-1_20110203.fsa

More info: The S. cerevisiae reference genome was downloaded from SGD, and the chromosome names in the FASTA headers were replaced with chrI-chrXVI.

$ bowtie-build S288C_reference_sequence_R64-1-1_20110203.fsa SC

More info: The Bowtie index only needs to be built once for each reference genome. For a complete description of its use, refer to the bowtie-build manual page.

$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA029/SRA029255/SRX038807/SRR094649_1.fastq.bz2
$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA029/SRA029255/SRX038807/SRR094649_2.fastq.bz2
$ bunzip2 SRR094649_1.fastq.bz2
$ bunzip2 SRR094649_2.fastq.bz2

More info: The DNA Data Bank of Japan repository provides compressed FASTQ files for download. For PE sequencing, two files are provided, one for each read end.

$ bowtie --suppress 1,6,7,8 -t --fr -p 6 -m 1 -v 2 SC -1 SRR094649_1.fastq -2 SRR094649_2.fastq SRR094649.bowtie 2> SRR094649.bowtieout
More info: The Bowtie parameters strongly influence the final alignment and must be adapted to your experiment. For a complete description of all parameters, refer to the Bowtie manual. The parameters used in this example are:
  • -m 1: Suppress all alignments for a read if more than one reportable alignment exists for it.
  • -v 2: Report alignments with at most 2 mismatches.
  • --suppress 1,6,7,8: Suppress the output columns that are not used in the NUCwave input file, leaving strand, chromosome, coordinate and sequence.
  • -p 6: Launch 6 parallel search threads.
  • -t: Print the wall-clock time taken by each phase.
  • -1 and -2: The FASTQ files containing mate 1 and mate 2 of each paired-end read.
  • --fr: A valid paired-end alignment has mate 1 in forward orientation upstream of mate 2 in reverse-complement orientation.
$ python nucwave_pe.py -g S288C_reference_sequence_R64-1-1_20110203.fsa -a SRR094649.bowtie -o Cole2011_results -p Cole2011 -w
Reading and processing genome
      Time taken: 0 seconds
Reading and processing alignment file
  Number of reads processed: 19208336
  Time taken: 142 seconds
Writing intermediate files: fragment size histogram
  Time taken: 0 seconds
Writing intermediate files: cut points per strand
  Time taken: 37 seconds
Writing intermediate files: depth coverage for complete PE reads
  Time taken: 21 seconds
Writing intermediate files: PE center count
  Time taken: 19 seconds
Depth coverage for trimmed PE reads
Writing intermediate files: depth coverage for trimmed PE reads
  Time taken: 23 seconds
Depth coverage wavelet denoising
  Time taken: 2 seconds
Mean depth coverage calculation
  Mean depth coverage: 67
  Time taken: 20 seconds
Writing results file: Depth coverage wavelet denoised and normalized
  Time taken: 18 seconds

More info: Computer used: MacPro3.1 with a Quad-Core Intel Xeon (2.8 GHz) and 12 GB of RAM. Without the -w argument, only the final wavelet-denoised and normalized NOM file is written.

More info: To explore the wig files locally, we recommend installing the Integrated Genome Browser. Select the species (Saccharomyces cerevisiae) and genome version (S_cerevisiae_Apr_2011), then open the generated wig files you want.

Example 3: CC-seq paired-end reads

Source

Original publication: Brogaard K, Xi L, Wang JP, Widom J. A map of nucleosome positions in yeast at base-pair resolution. Nature 2012; 486:496-501. PMID: 22722846

NGS data:

Pipeline

To reproduce this pipeline, Bowtie 1 must be installed beforehand. You are free to use another aligner to generate the input files for NUCwave, as long as they follow the alignment file format described in the Manual.

$ wget http://nucleosome.usal.es/nucwave/data/S288C_reference_sequence_R64-1-1_20110203.fsa

More info: The S. cerevisiae reference genome was downloaded from SGD, and the chromosome names in the FASTA headers were replaced with chrI-chrXVI.

$ bowtie-build -C S288C_reference_sequence_R64-1-1_20110203.fsa SCcolor

More info: The Bowtie index only needs to be built once for each reference genome. For a complete description of its use, refer to the bowtie-build manual page. Note that for an ABI SOLiD experiment, the color-space index must be created with the -C option.

$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA050/SRA050596/SRX127430/SRR438677_1.fastq.bz2
$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA050/SRA050596/SRX127430/SRR438677_2.fastq.bz2
$ bunzip2 SRR438677_1.fastq.bz2
$ bunzip2 SRR438677_2.fastq.bz2

More info: The DNA Data Bank of Japan repository provides compressed FASTQ files for download. For PE sequencing, two files are provided, one for each read end.

$ bowtie --suppress 1,6,7,8 -t --fr -p 6 -m 1 -v 2 -C SCcolor -1 SRR438677_1.fastq -2 SRR438677_2.fastq SRR438677.bowtie 2> SRR438677.bowtieout
More info: The Bowtie parameters strongly influence the final alignment and must be adapted to your experiment. For a complete description of all parameters, refer to the Bowtie manual. The parameters used in this example are:
  • -m 1: Suppress all alignments for a read if more than one reportable alignment exists for it.
  • -v 2: Report alignments with at most 2 mismatches.
  • --suppress 1,6,7,8: Suppress the output columns that are not used in the NUCwave input file, leaving strand, chromosome, coordinate and sequence.
  • -p 6: Launch 6 parallel search threads.
  • -t: Print the wall-clock time taken by each phase.
  • -1 and -2: The FASTQ files containing mate 1 and mate 2 of each paired-end read.
  • -C: Align in color space; the index must be a color-space index.
  • --fr: A valid paired-end alignment has mate 1 in forward orientation upstream of mate 2 in reverse-complement orientation.
$ python nucwave_pe.py -g S288C_reference_sequence_R64-1-1_20110203.fsa -a SRR438677.bowtie -o Brogaard2012_results -p Brogaard2012 -w
Reading and processing genome
      Time taken: 0 seconds
Reading and processing alignment file
  Number of reads processed: 31778353
  Time taken: 227 seconds
Writing intermediate files: fragment size histogram
  Time taken: 0 seconds
Writing intermediate files: cut points per strand
  Time taken: 37 seconds
Writing intermediate files: depth coverage for complete PE reads
  Time taken: 21 seconds
Writing intermediate files: PE center count
  Time taken: 18 seconds
Depth coverage for trimmed PE reads
Writing intermediate files: depth coverage for trimmed PE reads
  Time taken: 24 seconds
Depth coverage wavelet denoising
  Time taken: 2 seconds
Mean depth coverage calculation
  Mean depth coverage: 110
  Time taken: 20 seconds
Writing results file: Depth coverage wavelet denoised and normalized
  Time taken: 20 seconds

More info: Without the -w argument, only the final wavelet-denoised and normalized file is written. For CC-seq data, nucwave_pe.py produces a map of linker centers.

$ python nucwave_sr.py -g S288C_reference_sequence_R64-1-1_20110203.fsa -a SRR438677.bowtie -o Brogaard2012_results -p Brogaard2012 -w -c
Reading and processing genome
  Time taken: 1 seconds
Reading and processing alignment file
  Number of reads processed: 63821504
  Time taken: 204 seconds
Writing intermediate files: cut points per strand
  Time taken: 38 seconds
Depth coverage calculation
  Time taken: 16 seconds
Depth coverage denoising
  Time taken: 3 seconds
Writing intermediate files: depth coverage per strand
  Time taken: 34 seconds
Mean depth coverage calculation
Writing intermediate files: Depth coverage wavelet denoised per strand
  Mean depth coverage: 74
  Time taken: 82 seconds
Shift calculation
  Shifting value 27 (from 14157 measures)
  Time taken: 11 seconds
Signal integration from strand signals
  Time taken: 7 seconds
Writing intermediate files: Integrated depth coverage
  Time taken: 10 seconds
Depth coverage wavelet denoising
  Time taken: 2 seconds
Mean depth coverage calculation
Writing intermediate files: depth coverage wavelet denoised
  Mean depth coverage: 148
  Time taken: 40 seconds
Writing results file: depth coverage wavelet denoised and normalized
  Time taken: 17 seconds

More info: Computer used: MacPro3.1 with a Quad-Core Intel Xeon (2.8 GHz) and 12 GB of RAM. Without the -w argument, only the final wavelet-denoised and normalized file is written. For CC-seq data, nucwave_sr.py -c produces the NOM.

More info: To explore the wig files locally, we recommend installing the Integrated Genome Browser. Select the species (Saccharomyces cerevisiae) and genome version (S_cerevisiae_Apr_2011), then open the generated wig files you want.

Example 4: ChIP-seq single-reads

Source

Original publication: Perales R, Erickson B, Zhang L, Kim H, Valiquett E, Bentley D. Gene promoters dictate histone occupancy within genes. EMBO J 2013; 32:2645-56. PMID: 24013117

NGS data:

Pipeline

To reproduce this pipeline, Bowtie 1 must be installed beforehand. You are free to use another aligner to generate the input files for NUCwave, as long as they follow the alignment file format described in the Manual.

$ wget http://nucleosome.usal.es/nucwave/data/S288C_reference_sequence_R64-1-1_20110203.fsa

More info: The S. cerevisiae reference genome was downloaded from SGD, and the chromosome names in the FASTA headers were replaced with chrI-chrXVI.

$ bowtie-build S288C_reference_sequence_R64-1-1_20110203.fsa SC

More info: The Bowtie index only needs to be built once for each reference genome. For a complete description of its use, refer to the bowtie-build manual page.

$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA098/SRA098024/SRX335654/SRR953022.fastq.bz2
$ bunzip2 SRR953022.fastq.bz2

More info: The DNA Data Bank of Japan repository provides compressed FASTQ files for download.

$ bowtie --suppress 1,6,7,8 -t -p 6 -m 1 -v 2 SC SRR953022.fastq SRR953022.bowtie 2> SRR953022.bowtieout
More info: The Bowtie parameters strongly influence the final alignment and must be adapted to your experiment. For a complete description of all parameters, refer to the Bowtie manual. The parameters used in this example are:
  • -m 1: Suppress all alignments for a read if more than one reportable alignment exists for it.
  • -v 2: Report alignments with at most 2 mismatches.
  • --suppress 1,6,7,8: Suppress the output columns that are not used in the NUCwave input file, leaving strand, chromosome, coordinate and sequence.
  • -p 6: Launch 6 parallel search threads.
  • -t: Print the wall-clock time taken by each phase.
$ python nucwave_sr.py -g S288C_reference_sequence_R64-1-1_20110203.fsa -a SRR953022.bowtie -o Perales2013_results -p Perales2013 -w
Reading and processing genome
  Time taken: 1 seconds
Reading and processing alignment file
  Number of reads processed: 20102329
  Time taken: 60 seconds
Writing intermediate files: cut points per strand
  Time taken: 38 seconds
Depth coverage calculation
  Time taken: 16 seconds
Depth coverage denoising
  Time taken: 3 seconds
Writing intermediate files: depth coverage per strand
  Time taken: 35 seconds
Mean depth coverage calculation
Writing intermediate files: Depth coverage wavelet denoised per strand
  Mean depth coverage: 42
  Time taken: 84 seconds
Shift calculation
  Shifting value 60 (from 4966 measures)
  Time taken: 10 seconds
Signal integration from strand signals
  Time taken: 7 seconds
Writing intermediate files: Integrated depth coverage
  Time taken: 10 seconds
Depth coverage wavelet denoising
  Time taken: 1 seconds
Mean depth coverage calculation
Writing intermediate files: depth coverage wavelet denoised
  Mean depth coverage: 86
  Time taken: 40 seconds
Writing results file: depth coverage wavelet denoised and normalized
  Time taken: 16 seconds

More info: Computer used: MacPro3.1 with a Quad-Core Intel Xeon (2.8 GHz) and 12 GB of RAM. Without the -w argument, only the final wavelet-denoised and normalized NOM file is written.

More info: To explore the wig files locally, we recommend installing the Integrated Genome Browser. Select the species (Saccharomyces cerevisiae) and genome version (S_cerevisiae_Apr_2011), then open the generated wig files you want.