NUCwave¶
NUCwave is a wavelet-based bioinformatic tool that generates nucleosome occupancy maps (NOMs) from sequence reads generated by:
- Chromatin digestion with micrococcal nuclease (MNase-seq)
- Chemical cleavage of chromatin (CC-seq)
- Chromatin immunoprecipitation (ChIP-seq) and fragmentation by sonication
NUCwave can process datasets produced with either sequencing protocol:
- Single-read (SR)
- Paired-end (PE)
NUCwave requires only two inputs:
- The dataset of aligned sequence reads
- The sequence of the reference genome
NUCwave is implemented in Python and is distributed under the GPLv3 license.
Citing NUCwave¶
If you use NUCwave, please cite:
Luis Quintales, Enrique Vázquez and Francisco Antequera (2015) Briefings in Bioinformatics 16: 576-587

Manual¶
nucwave_sr.py¶
Direct generation of nucleosome maps from single-read MNase-seq, ChIP-seq, and CC-seq (chemical cleavage) experiments.
For chemical cleavage experiments, the -c option must be used (see Example 3).
Paired-end experiments can also be analyzed as single-read (see Example 3).
$ python nucwave_sr.py -h
usage: python nucwave_sr.py [-h] [-w] [-c] [-o OUTDIR] -g GENOMEFILE -a
ALIGNFILE -p PREFIXNAME
optional arguments:
-h, --help show this help message and exit
-w, --wigfiles write intermediate wig files
-c, --chemical chemical cleavage
-o OUTDIR output directory
-g GENOMEFILE FASTA genome file
-a ALIGNFILE BOWTIE alignment file
-p PREFIXNAME Prefix name for result files
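The options listed above can also be assembled from a script. The following is a minimal sketch, not part of NUCwave: build_nucwave_sr_command is a hypothetical helper that only builds the argument list from the flags shown in the help output. It assumes that nucwave_sr.py is in the current directory and that python is the interpreter you want to use.

```python
def build_nucwave_sr_command(genome, align, prefix, outdir=None,
                             wigfiles=False, chemical=False,
                             script="nucwave_sr.py"):
    """Return the argument list for a nucwave_sr.py run.

    Hypothetical helper: it only mirrors the flags printed by
    `python nucwave_sr.py -h` and does not run anything itself.
    """
    cmd = ["python", script, "-g", genome, "-a", align, "-p", prefix]
    if outdir is not None:
        cmd += ["-o", outdir]      # output directory
    if wigfiles:
        cmd.append("-w")           # write intermediate wig files
    if chemical:
        cmd.append("-c")           # chemical cleavage experiment
    return cmd


cmd = build_nucwave_sr_command(
    "S288C_reference_sequence_R64-1-1_20110203.fsa",
    "SRR063958.bowtie",
    "Tsui2011",
    outdir="Tsui2011_results",
    wigfiles=True,
)
print(" ".join(cmd))
# To launch the run: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.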
Alignment file format¶
Each line of the input alignment file must contain 4 fields separated by a tab character:
strand chromosome genomic_coordinate sequence
Example:
+ chrX 239356 NAAGCAAACACCTTTCTTAAGCTGTTGGTGCAAAAAAGGA
- chrIII 189611 CGTTCAGAAATGGCAGAACCTACAGTGACAGATTTGCGTN
- chrXI 252551 AGAGATACTTTCTAAACTTAGATTGTCACCAGAAAATCCN
- chrXV 427200 TTCGATTTCGTTTATATCAGACATTTTTAGTTTCTATTCN
+ chrII 727006 NATACTTCTTTAGTTAATTGTTTAACAGTTTTGGGGTCAT
- chrXV 58063 CGCTAGTATCAGCGGGTCTAGAATTTGATCCGGTTTCCAN
- chrXV 32376 CCCTACTAACTAATTCATAGCAAAATTCAGAACTTATCCN
+ chrXV 818987 NGAAGATGATAAGGTTTTACGTACCAGCAACGGTGGTAAC
+ chrIV 1087492 NGGCAAAATAAACTAGTCTAGCTAATATTCGACTAAATTG
+ chrXVI 138142 NCGATGGATTAATGATTACTTGTGAAAAATTAGAAAAAAC
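To check an alignment file before running NUCwave, the four-field format can be parsed with a few lines of Python. This is an illustrative sketch, not NUCwave's own reader. The function names are hypothetical, and five_prime_end assumes that the coordinate is the leftmost aligned base on the forward strand (the convention Bowtie uses for its offset column), so the 5' end of a minus-strand read lies at the right edge of the alignment.

```python
from collections import Counter


def parse_alignment_line(line):
    """Split one alignment line into (strand, chromosome, coordinate, sequence)."""
    strand, chrom, coord, seq = line.rstrip("\n").split("\t")
    if strand not in ("+", "-"):
        raise ValueError("unexpected strand field: %r" % strand)
    return strand, chrom, int(coord), seq


def five_prime_end(strand, coord, seq):
    """Position of the read's 5' end.

    ASSUMPTION: coord is the leftmost aligned base on the forward strand.
    """
    return coord if strand == "+" else coord + len(seq) - 1


def count_cut_sites(lines):
    """Count reads per (chromosome, strand, 5' end position)."""
    cuts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        strand, chrom, coord, seq = parse_alignment_line(line)
        cuts[(chrom, strand, five_prime_end(strand, coord, seq))] += 1
    return cuts


example = [
    "+\tchrX\t239356\tNAAGCAAACACCTTTCTTAAGCTGTTGGTGCAAAAAAGGA",
    "-\tchrIII\t189611\tCGTTCAGAAATGGCAGAACCTACAGTGACAGATTTGCGTN",
]
for key, count in sorted(count_cut_sites(example).items()):
    print(key, count)
```

A line with fewer or more than four tab-separated fields raises a ValueError, which makes malformed input easy to spot.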
Output file extensions¶
Extension | Content |
---|---|
cut_p | Cleavage site (+ strand) |
cut_m | Cleavage site (- strand) |
depth_p | Depth coverage (+ strand) |
depth_m | Depth coverage (- strand) |
depth_p_wl | Depth coverage wavelet denoised (+ strand) |
depth_m_wl | Depth coverage wavelet denoised (- strand) |
depth_c | NOM (nucleosome occupancy map) |
depth_c_wl | NOM wavelet denoised |
depth_c_wl_norm | NOM wavelet denoised and normalized |
nucwave_pe.py¶
Direct generation of nucleosome maps from paired-end MNase-seq, ChIP-seq, and CC-seq (chemical cleavage) experiments. For chemical cleavage experiments, this program generates a map of linker positions (see Example 3).
$ python nucwave_pe.py -h
usage: python nucwave_pe.py [-h] [-w] [-o OUTDIR] -g GENOMEFILE -a ALIGNFILE -p PREFIXNAME
optional arguments:
-h, --help show this help message and exit
-w, --wigfiles write intermediate wig files
-o OUTDIR output directory
required arguments:
-g GENOMEFILE FASTA genome file
-a ALIGNFILE Alignment file
-p PREFIXNAME Prefix name for result files
Alignment file format¶
Each line of the input alignment file must contain 4 fields separated by a tab character:
strand chromosome genomic_coordinate sequence
The two paired-end reads of the same sequenced fragment must be on consecutive lines.
Example:
+ chrX 537898 TTTGATATTTTGTCATATTATCCTATTATTTTATCAATCC
- chrX 538002 TATGTGATAATATACTAGTAACATGAATACTACTAAATGA
+ chrIII 71367 ACGATGATTCAGTTCGCCTTCTATCCTTTGTTTACGTATT
- chrIII 71476 TCAATCCTTCTTTTGCTTCCATATTTACCATGTGGACCCT
+ chrX 88383 TGACACCTTTTCCCAAAACTTCTGTGAAGTTTTGCTCAAT
- chrX 88505 GTTTCAATCTTATGGAATTCACAATGAAGCATCCCTTCCT
+ chrXIV 616010 TGAAGTTGCCAAGAGGCTTCAAAACATGATGCTCAGCTCC
- chrXIV 616119 CTAGGGTTATAGTGTTCAGACTTGAGGTTGAACATATCCA
+ chrXV 1069765 TGAACGCCATACTGAGTAGAGTTAAACAATTGGTTAGCTA
- chrXV 1069877 TAGCAAAAGGAACCATAGCATCACCAATGGATTGAGGGTT
+ chrII 669920 AGTTCGTATTGCAAACTGAAGAGTATGCTTCTTTTTAGTG
- chrII 670006 TCCCATAAAATCCTCTCTTTCATAGAAGTCCGATTTCATT
+ chrXII 729928 TGAATCCCACTGGACTGTTTGTGGTGCGTTTGCTGCCGCT
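The pairing rule can be illustrated with a short sketch that walks through consecutive lines and derives a fragment length for each pair. This is not NUCwave's implementation, and all function names are hypothetical. It assumes that the plus-strand mate comes first, as in the example above, and that both coordinates are the leftmost aligned base on the forward strand (Bowtie's convention), so the fragment ends at the minus-strand coordinate plus the read length.

```python
from collections import Counter


def parse_alignment_line(line):
    """Split one alignment line into (strand, chromosome, coordinate, sequence)."""
    strand, chrom, coord, seq = line.rstrip("\n").split("\t")
    return strand, chrom, int(coord), seq


def iter_pairs(lines):
    """Yield (plus_read, minus_read) for consecutive lines that form a pair.

    A "+" read immediately followed by a "-" read on the same chromosome
    is treated as one fragment; reads without a mate are skipped.
    """
    pending = None
    for line in lines:
        if not line.strip():
            continue
        read = parse_alignment_line(line)
        if (pending is not None and pending[0] == "+"
                and read[0] == "-" and pending[1] == read[1]):
            yield pending, read
            pending = None
        else:
            pending = read


def fragment_length(plus_read, minus_read):
    """Fragment length, ASSUMING leftmost-base coordinates for both mates."""
    return minus_read[2] + len(minus_read[3]) - plus_read[2]


def size_histogram(lines):
    """Frequency of fragment lengths over all complete pairs."""
    return Counter(fragment_length(p, m) for p, m in iter_pairs(lines))


example = [
    "+\tchrX\t537898\tTTTGATATTTTGTCATATTATCCTATTATTTTATCAATCC",
    "-\tchrX\t538002\tTATGTGATAATATACTAGTAACATGAATACTACTAAATGA",
    "+\tchrXII\t729928\tTGAATCCCACTGGACTGTTTGTGGTGCGTTTGCTGCCGCT",
]
for length, count in sorted(size_histogram(example).items()):
    print(length, count)
```

The last example read has no mate on the following line, so it does not contribute to the histogram.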
Output file extensions¶
Extension | Content |
---|---|
cut_p | Cleavage site (+ strand) |
cut_m | Cleavage site (- strand) |
depth_complete_PE | Depth coverage for complete sequence |
PE_center | Complete sequence center count |
depth_trimmed_PE | NOM (Depth coverage for trimmed sequence) |
depth_wl_trimmed_PE | NOM wavelet denoised and normalized |
historeadsize | Frequency of PE fragment sizes |
Example 1: MNase-seq single-reads¶
Source¶
Original publication: Tsui K, Dubuis S, Gebbia M, Morse RH, Barkai N, Tirosh I, Nislow C. Evolution of nucleosome occupancy: conservation of global properties and divergence of gene-specific patterns. Mol Cell Biol 2011; 31:4348-55. PMID: 21896781
NGS data:
Pipeline¶
To reproduce this pipeline, Bowtie 1 must be installed beforehand. Note, however, that you are free to use your preferred aligner to generate the input files for nucwave.
$ wget http://nucleosome.usal.es/nucwave/data/S288C_reference_sequence_R64-1-1_20110203.fsa
The S. cerevisiae reference genome was downloaded from SGD, and the chromosome names in the FASTA headers were replaced with chrI-chrXVI.
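If you prepare your own reference file instead of downloading the one above, the header replacement can be scripted. The sketch below is only an illustration: it assumes that each original header carries a tag of the form [chromosome=I], which is an assumption about the SGD header layout rather than something stated here, and rename_header and rename_fasta are hypothetical names. Headers without such a tag are left unchanged.

```python
import re

# ASSUMPTION: original headers contain a tag such as "[chromosome=IV]".
CHROM_TAG = re.compile(r"\[chromosome=([IVX]+)\]")


def rename_header(header):
    """Return '>chrN' if a chromosome tag is found, else the header unchanged."""
    match = CHROM_TAG.search(header)
    if match:
        return ">chr" + match.group(1)
    return header


def rename_fasta(lines):
    """Yield FASTA lines with header lines rewritten and sequence lines kept."""
    for line in lines:
        line = line.rstrip("\n")
        yield rename_header(line) if line.startswith(">") else line


example = [
    ">ref|NC_001133| [org=Saccharomyces cerevisiae] [chromosome=I]",
    "CCACACCACACCCACACACCCACACACCACACCACACACCACACC",
]
for line in rename_fasta(example):
    print(line)
```

Whatever naming you choose, the chromosome names in the FASTA file must match those in the second column of the alignment file.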
$ bowtie-build S288C_reference_sequence_R64-1-1_20110203.fsa SC
Remember that the Bowtie index needs to be built only once for each reference genome. For a complete description of its use, refer to the manual page for bowtie-build.
$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA023/SRA023538/SRX025556/SRR063958.fastq.bz2
$ bunzip2 SRR063958.fastq.bz2
The DNA Data Bank of Japan repository allows the download of compressed FASTQ files.
$ bowtie --suppress 1,6,7,8 -t -p 6 -m 1 -v 2 SC SRR063958.fastq SRR063958.bowtie 2> SRR063958.bowtieout
The parameters chosen to run Bowtie are critical for the final alignment and must be adapted to your experiment. For a complete description of all parameters, refer to the Bowtie manual. For this example the parameters are:
- -m 1: Suppress all alignments for a particular read if more than 1 reportable alignment exists for it.
- -v 2: Report alignments with at most 2 mismatches.
- --suppress 1,6,7,8: Suppress the output columns not used in the NUCwave input file.
- -p 6: Launch 6 parallel search threads.
- -t: Print the amount of wall-clock time taken by each phase.
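If you already have a Bowtie 1 output file written without --suppress, the same four columns can be extracted afterwards. The sketch below assumes Bowtie's default eight-column, tab-separated output (read name, strand, reference name, 0-based offset, sequence, qualities, number of other alignments, mismatch descriptors). suppress_columns is a hypothetical helper, not part of Bowtie or NUCwave, and the input line is made up for illustration.

```python
def suppress_columns(bowtie_line, suppressed=(1, 6, 7, 8)):
    """Drop the given 1-based columns from a tab-separated Bowtie output line.

    With the default (1, 6, 7, 8), what remains is
    strand, chromosome, coordinate and sequence.
    """
    fields = bowtie_line.rstrip("\n").split("\t")
    kept = [f for i, f in enumerate(fields, start=1) if i not in suppressed]
    return "\t".join(kept)


# Made-up example line in Bowtie's default output layout.
full_line = "read_1\t-\tchrIII\t189611\tCGTTCAGAAA\tIIIIIIIIII\t0\t9:A>N"
print(suppress_columns(full_line))
```

For output from a different aligner, the column numbers will differ; only the resulting four-field layout matters to NUCwave.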
$ python nucwave_sr.py -g S288C_reference_sequence_R64-1-1_20110203.fsa -a SRR063958.bowtie -o Tsui2011_results -p Tsui2011 -w
Reading and processing genome
Time taken: 0 seconds
Reading and processing alignment file
Number of reads processed: 27145510
Time taken: 81 seconds
Writing intermediate files: cut points per strand
Time taken: 39 seconds
Depth coverage calculation
Time taken: 15 seconds
Depth coverage denoising
Time taken: 3 seconds
Writing intermediate files: depth coverage per strand
Time taken: 36 seconds
Mean depth coverage calculation
Writing intermediate files: Depth coverage wavelet denoised per strand
Mean depth coverage: 46
Time taken: 81 seconds
Shift calculation
Shifting value 42 (from 19069 measures)
Time taken: 11 seconds
Signal integration from strand signals
Time taken: 7 seconds
Writing intermediate files: Integrated depth coverage
Time taken: 10 seconds
Depth coverage wavelet denoising
Time taken: 2 seconds
Mean depth coverage calculation
Writing intermediate files: depth coverage wavelet denoised
Mean depth coverage: 92
Time taken: 39 seconds
Writing results file: depth coverage wavelet denoised and normalized
Time taken: 16 seconds
Computer used: MacPro3.1 with a Quad-Core Intel Xeon (2.8GHz) and 12 GB of RAM.
Note that without the -w argument, only the final NOM wavelet denoised and normalized file is written.
To explore the wig files locally, we recommend installing the Integrated Genome Browser. Select the species (Saccharomyces cerevisiae) and the genome version (S_cerevisiae_Apr_2011), then open the generated wig files you want to view.
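The wig files can also be read from a script for further analysis. The following sketch parses the two declaration styles defined by the standard wiggle format, fixedStep and variableStep. Which of the two NUCwave writes is not stated here, so the parser accepts both. parse_wig is a hypothetical helper, and for simplicity it ignores the optional span parameter.

```python
def parse_wig(lines):
    """Yield (chromosome, position, value) tuples from wig-formatted lines.

    Handles fixedStep and variableStep declarations; the optional
    'span' parameter is ignored in this sketch.
    """
    mode = chrom = None
    pos = step = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines, header lines and comments
        if line.startswith(("fixedStep", "variableStep")):
            parts = line.split()
            mode = parts[0]
            opts = dict(p.split("=", 1) for p in parts[1:])
            chrom = opts["chrom"]
            pos = int(opts.get("start", 1))
            step = int(opts.get("step", 1))
            continue
        if mode == "fixedStep":
            yield chrom, pos, float(line)
            pos += step
        elif mode == "variableStep":
            position, value = line.split()
            yield chrom, int(position), float(value)
        else:
            raise ValueError("data line before any step declaration: %r" % line)


example = [
    "track type=wiggle_0 name=demo",
    "fixedStep chrom=chrI start=1 step=1",
    "0.5",
    "1.5",
]
for record in parse_wig(example):
    print(record)
```

Because parse_wig is a generator, a whole-genome file can be processed line by line without loading it into memory.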
Example 2: MNase-seq paired-end reads¶
Source¶
Original publication: Cole HA, Howard BH, Clark DJ. Activation-induced disruption of nucleosome position clusters on the coding regions of Gcn4-dependent genes extends into neighbouring genes. Nucleic Acids Res 2011; 39:9521-35. PMID: 21880600
NGS data:
Pipeline¶
To reproduce this pipeline, Bowtie 1 must be installed beforehand. Note, however, that you are free to use your preferred aligner to generate the input files for nucwave.
$ wget http://nucleosome.usal.es/nucwave/data/S288C_reference_sequence_R64-1-1_20110203.fsa
The S. cerevisiae reference genome was downloaded from SGD, and the chromosome names in the FASTA headers were replaced with chrI-chrXVI.
$ bowtie-build S288C_reference_sequence_R64-1-1_20110203.fsa SC
Remember that the Bowtie index needs to be built only once for each reference genome. For a complete description of its use, refer to the manual page for bowtie-build.
$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA029/SRA029255/SRX038807/SRR094649_1.fastq.bz2
$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA029/SRA029255/SRX038807/SRR094649_2.fastq.bz2
$ bunzip2 SRR094649_1.fastq.bz2
$ bunzip2 SRR094649_2.fastq.bz2
The DNA Data Bank of Japan repository allows the download of compressed FASTQ files. For PE sequencing, two files are provided, one for each read end.
$ bowtie --suppress 1,6,7,8 -t --fr -p 6 -m 1 -v 2 SC -1 SRR094649_1.fastq -2 SRR094649_2.fastq SRR094649.bowtie 2> SRR094649.bowtieout
The parameters chosen to run Bowtie are critical for the final alignment and must be adapted to your experiment. For a complete description of all parameters, refer to the Bowtie manual. For this example the parameters are:
- -m 1: Suppress all alignments for a particular read if more than 1 reportable alignment exists for it.
- -v 2: Report alignments with at most 2 mismatches.
- --suppress 1,6,7,8: Suppress the output columns not used in the NUCwave input file.
- -p 6: Launch 6 parallel search threads.
- -t: Print the amount of wall-clock time taken by each phase.
- -1 and -2: The FASTQ files containing the first and second mates of each pair.
- --fr: The expected mate orientations for a valid paired-end alignment against the forward reference strand (upstream mate forward, downstream mate reverse-complemented).
$ python nucwave_pe.py -g S288C_reference_sequence_R64-1-1_20110203.fsa -a SRR094649.bowtie -o Cole2011_results -p Cole2011 -w
Reading and processing genome
Time taken: 0 seconds
Reading and processing alignment file
Number of reads processed: 19208336
Time taken: 142 seconds
Writing intermediate files: fragment size histogram
Time taken: 0 seconds
Writing intermediate files: cut points per strand
Time taken: 37 seconds
Writing intermediate files: depth coverage for complete PE reads
Time taken: 21 seconds
Writing intermediate files: PE center count
Time taken: 19 seconds
Depth coverage for trimmed PE reads
Writing intermediate files: depth coverage for trimmed PE reads
Time taken: 23 seconds
Depth coverage wavelet denoising
Time taken: 2 seconds
Mean depth coverage calculation
Mean depth coverage: 67
Time taken: 20 seconds
Writing results file: Depth coverage wavelet denoised and normalized
Time taken: 18 seconds
Computer used: MacPro3.1 with a Quad-Core Intel Xeon (2.8GHz) and 12 GB of RAM.
Note that without the -w argument, only the final NOM wavelet denoised and normalized file is written.
To explore the wig files locally, we recommend installing the Integrated Genome Browser. Select the species (Saccharomyces cerevisiae) and the genome version (S_cerevisiae_Apr_2011), then open the generated wig files you want to view.
Example 3: CC-seq paired-end reads¶
Source¶
Original publication: Brogaard K, Xi L, Wang JP, Widom J. A map of nucleosome positions in yeast at base-pair resolution. Nature 2012; 486:496-501. PMID: 22722846
NGS data:
Pipeline¶
To reproduce this pipeline, Bowtie 1 must be installed beforehand. Note, however, that you are free to use your preferred aligner to generate the input files for nucwave.
$ wget http://nucleosome.usal.es/nucwave/data/S288C_reference_sequence_R64-1-1_20110203.fsa
The S. cerevisiae reference genome was downloaded from SGD, and the chromosome names in the FASTA headers were replaced with chrI-chrXVI.
$ bowtie-build -C S288C_reference_sequence_R64-1-1_20110203.fsa SCcolor
Remember that the Bowtie index needs to be built only once for each reference genome. For a complete description of its use, refer to the manual page for bowtie-build. Note that for an ABI SOLiD experiment, the color-space index must be created with the -C option.
$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA050/SRA050596/SRX127430/SRR438677_1.fastq.bz2
$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA050/SRA050596/SRX127430/SRR438677_2.fastq.bz2
$ bunzip2 SRR438677_1.fastq.bz2
$ bunzip2 SRR438677_2.fastq.bz2
The DNA Data Bank of Japan repository allows the download of compressed FASTQ files. For PE sequencing, two files are provided, one for each read end.
$ bowtie --suppress 1,6,7,8 -t --fr -p 6 -m 1 -v 2 -C SCcolor -1 SRR438677_1.fastq -2 SRR438677_2.fastq SRR438677.bowtie 2> SRR438677.bowtieout
The parameters chosen to run Bowtie are critical for the final alignment and must be adapted to your experiment. For a complete description of all parameters, refer to the Bowtie manual. For this example the parameters are:
- -m 1: Suppress all alignments for a particular read if more than 1 reportable alignment exists for it.
- -v 2: Report alignments with at most 2 mismatches.
- --suppress 1,6,7,8: Suppress the output columns not used in the NUCwave input file.
- -p 6: Launch 6 parallel search threads.
- -t: Print the amount of wall-clock time taken by each phase.
- -1 and -2: The FASTQ files containing the first and second mates of each pair.
- -C: Use the color-space index.
- --fr: The expected mate orientations for a valid paired-end alignment against the forward reference strand (upstream mate forward, downstream mate reverse-complemented).
$ python nucwave_pe.py -g S288C_reference_sequence_R64-1-1_20110203.fsa -a SRR438677.bowtie -o Brogaard2012_results -p Brogaard2012 -w
Reading and processing genome
Time taken: 0 seconds
Reading and processing alignment file
Number of reads processed: 31778353
Time taken: 227 seconds
Writing intermediate files: fragment size histogram
Time taken: 0 seconds
Writing intermediate files: cut points per strand
Time taken: 37 seconds
Writing intermediate files: depth coverage for complete PE reads
Time taken: 21 seconds
Writing intermediate files: PE center count
Time taken: 18 seconds
Depth coverage for trimmed PE reads
Writing intermediate files: depth coverage for trimmed PE reads
Time taken: 24 seconds
Depth coverage wavelet denoising
Time taken: 2 seconds
Mean depth coverage calculation
Mean depth coverage: 110
Time taken: 20 seconds
Writing results file: Depth coverage wavelet denoised and normalized
Time taken: 20 seconds
Note that without the -w argument, only the final wavelet denoised and normalized file is written. For CC-seq, nucwave_pe.py produces the map of linker centers.
$ python nucwave_sr.py -g S288C_reference_sequence_R64-1-1_20110203.fsa -a SRR438677.bowtie -o Brogaard2012_results -p Brogaard2012 -w -c
Reading and processing genome
Time taken: 1 seconds
Reading and processing alignment file
Number of reads processed: 63821504
Time taken: 204 seconds
Writing intermediate files: cut points per strand
Time taken: 38 seconds
Depth coverage calculation
Time taken: 16 seconds
Depth coverage denoising
Time taken: 3 seconds
Writing intermediate files: depth coverage per strand
Time taken: 34 seconds
Mean depth coverage calculation
Writing intermediate files: Depth coverage wavelet denoised per strand
Mean depth coverage: 74
Time taken: 82 seconds
Shift calculation
Shifting value 27 (from 14157 measures)
Time taken: 11 seconds
Signal integration from strand signals
Time taken: 7 seconds
Writing intermediate files: Integrated depth coverage
Time taken: 10 seconds
Depth coverage wavelet denoising
Time taken: 2 seconds
Mean depth coverage calculation
Writing intermediate files: depth coverage wavelet denoised
Mean depth coverage: 148
Time taken: 40 seconds
Writing results file: depth coverage wavelet denoised and normalized
Time taken: 17 seconds
Computer used: MacPro3.1 with a Quad-Core Intel Xeon (2.8GHz) and 12 GB of RAM.
Note that without the -w argument, only the final wavelet denoised and normalized file is written. For CC-seq, nucwave_sr.py -c produces the NOM.
To explore the wig files locally, we recommend installing the Integrated Genome Browser. Select the species (Saccharomyces cerevisiae) and the genome version (S_cerevisiae_Apr_2011), then open the generated wig files you want to view.
Example 4: ChIP-Seq single reads¶
Source¶
Original publication: Perales R, Erickson B, Zhang L, Kim H, Valiquett E, Bentley D. Gene promoters dictate histone occupancy within genes. EMBO J 2013; 32:2645-56. PMID: 24013117
NGS data:
Pipeline¶
To reproduce this pipeline, Bowtie 1 must be installed beforehand. Note, however, that you are free to use your preferred aligner to generate the input files for nucwave.
$ wget http://nucleosome.usal.es/nucwave/data/S288C_reference_sequence_R64-1-1_20110203.fsa
The S. cerevisiae reference genome was downloaded from SGD, and the chromosome names in the FASTA headers were replaced with chrI-chrXVI.
$ bowtie-build S288C_reference_sequence_R64-1-1_20110203.fsa SC
Remember that the Bowtie index needs to be built only once for each reference genome. For a complete description of its use, refer to the manual page for bowtie-build.
$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA098/SRA098024/SRX335654/SRR953022.fastq.bz2
$ bunzip2 SRR953022.fastq.bz2
The DNA Data Bank of Japan repository allows the download of compressed FASTQ files.
$ bowtie --suppress 1,6,7,8 -t -p 6 -m 1 -v 2 SC SRR953022.fastq SRR953022.bowtie 2> SRR953022.bowtieout
The parameters chosen to run Bowtie are critical for the final alignment and must be adapted to your experiment. For a complete description of all parameters, refer to the Bowtie manual. For this example the parameters are:
- -m 1: Suppress all alignments for a particular read if more than 1 reportable alignment exists for it.
- -v 2: Report alignments with at most 2 mismatches.
- --suppress 1,6,7,8: Suppress the output columns not used in the NUCwave input file.
- -p 6: Launch 6 parallel search threads.
- -t: Print the amount of wall-clock time taken by each phase.
$ python nucwave_sr.py -g S288C_reference_sequence_R64-1-1_20110203.fsa -a SRR953022.bowtie -o Perales2013_results -p Perales2013 -w
Reading and processing genome
Time taken: 1 seconds
Reading and processing alignment file
Number of reads processed: 20102329
Time taken: 60 seconds
Writing intermediate files: cut points per strand
Time taken: 38 seconds
Depth coverage calculation
Time taken: 16 seconds
Depth coverage denoising
Time taken: 3 seconds
Writing intermediate files: depth coverage per strand
Time taken: 35 seconds
Mean depth coverage calculation
Writing intermediate files: Depth coverage wavelet denoised per strand
Mean depth coverage: 42
Time taken: 84 seconds
Shift calculation
Shifting value 60 (from 4966 measures)
Time taken: 10 seconds
Signal integration from strand signals
Time taken: 7 seconds
Writing intermediate files: Integrated depth coverage
Time taken: 10 seconds
Depth coverage wavelet denoising
Time taken: 1 seconds
Mean depth coverage calculation
Writing intermediate files: depth coverage wavelet denoised
Mean depth coverage: 86
Time taken: 40 seconds
Writing results file: depth coverage wavelet denoised and normalized
Time taken: 16 seconds
Computer used: MacPro3.1 with a Quad-Core Intel Xeon (2.8GHz) and 12 GB of RAM.
Note that without the -w argument, only the final NOM wavelet denoised and normalized file is written.
To explore the wig files locally, we recommend installing the Integrated Genome Browser. Select the species (Saccharomyces cerevisiae) and the genome version (S_cerevisiae_Apr_2011), then open the generated wig files you want to view.