Coverage

Coverage is computed for annotated regions in the reference database. Each sequence in the reference database is represented by an array of integers. For each alignment, the array is incremented by 1 at the start position and decremented by 1 at the end position. Coverage over each reference sequence is then the cumulative sum over its array. This calculation has a time complexity of O(n + N), where n is the total number of bases in the reference database and N is the number of mapped reads. This is substantially faster than the naïve approach, which increments the coverage of every mapped base for each mapped read and has a time complexity of O(n + N*M), where M is the maximum number of bases per read.
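The increment/decrement scheme described above can be illustrated with a minimal sketch. This is not Tentacle's own code: the function name coverage_from_alignments and the (start, end) tuple layout are assumptions made for illustration, and plain Python lists stand in for the NumPy arrays that Tentacle uses. Coordinates follow the convention of 0-based starts and non-inclusive ends.

```python
from itertools import accumulate


def coverage_from_alignments(length, alignments):
    """Return per-base coverage for one reference sequence.

    `alignments` holds hypothetical (start, end) pairs with 0-based
    starts and non-inclusive ends.
    """
    # One extra slot so that an alignment ending at the last base can
    # still record its decrement.
    deltas = [0] * (length + 1)
    for start, end in alignments:
        deltas[start] += 1  # coverage rises where the alignment starts
        deltas[end] -= 1    # and falls just past its last base
    # The running sum turns the +1/-1 markers into per-base coverage.
    return list(accumulate(deltas))[:length]


coverage = coverage_from_alignments(10, [(0, 4), (2, 6), (5, 10)])
print(coverage)  # [1, 1, 2, 2, 1, 2, 1, 1, 1, 1]
```

Each alignment costs two array updates regardless of its length, and the cumulative sum is a single pass over the array, which is where the O(n + N) bound comes from. With NumPy arrays, numpy.cumsum performs the same running sum.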

Modifying how coverage is computed

The Tentacle modules that compute coverage are located in tentacle/coverage. They use the mapping data in the contigCoverage data structure, which is populated in tentacle/parsers/index_references.py. Check the code in that file to see how the dictionary is laid out. Essentially, the dictionary holds a NumPy array of integers for each sequence in the reference file. After the mapper output has been processed, each position in the array holds the number of times that position was covered by a read.

To change how the statistics are computed, edit the files in the coverage module. The functions they contain are described below.

Functions in the coverage module

Coverage

This module contains all the functions required to manipulate the contig coverage data structure.

Tentacle coverage module.

date: 2014-04-30

exception tentacle.coverage.coverage.Error(msg)

Bases: exceptions.Exception

Base class for exceptions in this module.

Attributes:
msg: error message

exception tentacle.coverage.coverage.ParseError(msg)

Bases: tentacle.coverage.coverage.Error

Raised for parsing errors.

tentacle.coverage.coverage.determine_if_read_is_inside_region(contig_data, contig, rstart, rend, options, logger)

Determines if a read lies within an annotated region of a contig.

tentacle.coverage.coverage.update_contig_data(contig_data, contig, rstart, rend, options, logger)

Updates mapping data for contig.

Use 0-based starting positions and non-inclusive end positions (like Python).
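Many mapper output formats report 1-based positions with inclusive ends, so coordinates may need converting before they match this convention. The source does not say whether Tentacle's parsers already do this. The sketch below only illustrates the arithmetic, and the helper name to_half_open is hypothetical.

```python
def to_half_open(start, end):
    """Convert 1-based, inclusive coordinates to 0-based, non-inclusive ones.

    Hypothetical helper for illustration; not part of Tentacle.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end, got %d..%d" % (start, end))
    # Shift the start down by one; the inclusive 1-based end already equals
    # the non-inclusive 0-based end.
    return start - 1, end


rstart, rend = to_half_open(3, 7)
print(rstart, rend)               # 2 7
print("ACGTACGTAC"[rstart:rend])  # GTACG
```

With this convention, rend - rstart is the number of covered bases, and the pair can be used directly as a Python slice.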

Statistics

This module contains the function that computes statistics across annotated regions of the reference sequences.

Tentacle coverage module.

date: 2014-04-30

tentacle.coverage.statistics.compute_region_statistics(region)

Compute general statistics of reads mapped to a region of a contig.

Input:
region: a NumPy array of ints with the number of reads mapped to the region.

Output:
median (float): the median number of reads mapped to the region.
mean (float): the mean number of reads mapped to the region.
stdev (float): the standard deviation of the number of reads mapped to the region.
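As a rough sketch of what such a function computes, the three statistics can be reproduced with Python's standard statistics module. This is not the Tentacle implementation: the name region_statistics is hypothetical, and using the population standard deviation is an assumption, since the source does not say which variant is used.

```python
import statistics


def region_statistics(region):
    """Return (median, mean, stdev) of the per-base counts in a region.

    Hypothetical stand-in for illustration; `region` is any sequence of
    integer counts.
    """
    values = [float(v) for v in region]
    median = statistics.median(values)
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divides by the number
    # of values rather than by one less).
    stdev = statistics.pstdev(values)
    return median, mean, stdev


median, mean, stdev = region_statistics([2, 4, 4, 4, 5, 5, 7, 9])
print(median, mean, stdev)  # 4.5 5.0 2.0
```

An empty region raises statistics.StatisticsError in this sketch, so callers would need to handle zero-length regions separately.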

Compute and write coverage statistics

This module contains a single function responsible for formatting the output.

Tentacle coverage module: compute coverage statistics

author: Fredrik Boulund <fredrik.boulund@chalmers.se>
date: 2014-04-30
purpose: Writes results to file, and computes coverage statistics

tentacle.coverage.compute_and_write_coverage_statistics.compute_and_write_coverage_statistics(annotationFilename, contig_data, outFilename, options, logger)

Computes coverage for each annotated region. Writes results to file.

Input:
annotationFilename: filename of annotation file
contig_data: the contig_data dictionary
outFilename: output filename
options: options namespace
logger: a logger object

Output:
None: writes directly to file

Raises:
ParseError: on parsing errors
