ngsci package

Submodules

ngsci.calculator module

ngsci.calculator.complexity_index(reads: list, read_length: int, denominator: int)

Calculates the fully normalized complexity index for a list of deduplicated (unique) reads.

Parameters:
  • reads (list) – A list of deduplicated pysam.AlignedSegment objects
  • read_length (int) – The read length sampled from the BAM file.
  • denominator (int) – The denominator derived from read_length (as returned by denominator_calc).
Returns:

the complexity index of the group of reads.

Return type:

int
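
The reference above does not spell out how the index is computed. The following is a minimal sketch under two assumptions: the index is the pairwise summed dissimilarity of the unique reads divided by the denominator, and the dissimilarity of two reads is the number of bases covered by only one of them. `Read` is a hypothetical stand-in for pysam.AlignedSegment that keeps only `reference_start`/`reference_end` (attribute names borrowed from pysam). This is an illustration, not ngsci's implementation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for pysam.AlignedSegment, keeping only alignment coordinates."""
    reference_start: int  # 0-based leftmost aligned position (inclusive)
    reference_end: int    # one past the rightmost aligned position (exclusive)


def dissimilarity(read1: Read, read2: Read) -> int:
    """Assumed definition: number of bases covered by exactly one of the two reads."""
    overlap = max(0, min(read1.reference_end, read2.reference_end)
                  - max(read1.reference_start, read2.reference_start))
    len1 = read1.reference_end - read1.reference_start
    len2 = read2.reference_end - read2.reference_start
    return len1 + len2 - 2 * overlap


def complexity_index(reads: list, read_length: int, denominator: int) -> float:
    """Sketch: pairwise summed dissimilarity divided by the precomputed denominator."""
    # read_length is kept only to mirror the documented signature; this sketch ignores it.
    summed = sum(dissimilarity(a, b) for a, b in combinations(reads, 2))
    return summed / denominator
```

For example, `complexity_index([Read(0, 10), Read(2, 12), Read(5, 15)], 10, 40)` sums the pairwise values 4, 10 and 6 to 20 and returns 0.5. Because of the true division this sketch returns a float, whereas the reference above lists int.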

ngsci.calculator.denominator_calc(read_length: int)

Calculates the denominator of the complexity index as the product of max_summed_dissimilarity(read_length) and read_length itself.

Parameters: read_length (int) – The read length of the sequencing library.
Returns: the denominator of the complexity index, including its normalization factors
Return type: int

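
The product described above can be sketched as follows. Because the closed form of max_summed_dissimilarity is deferred to the whitepaper, this sketch takes it as a callable argument. That extra parameter is an assumption of the illustration, not part of ngsci's signature.

```python
from typing import Callable


def denominator_calc(read_length: int,
                     max_summed_dissimilarity: Callable[[int], int]) -> int:
    """Denominator = max_summed_dissimilarity(read_length) * read_length.

    max_summed_dissimilarity is injected (hypothetical parameter) because its
    closed form is only given in the whitepaper.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return max_summed_dissimilarity(read_length) * read_length
```

With a placeholder such as `lambda n: 1000`, `denominator_calc(100, lambda n: 1000)` returns 100000; in real use the callable would be the library's own max_summed_dissimilarity.
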
ngsci.calculator.dissimilarity(read1: pysam.libcalignedsegment.AlignedSegment, read2: pysam.libcalignedsegment.AlignedSegment)

Calculates the dissimilarity between two reads.

Parameters:
  • read1 (pysam.AlignedSegment) – The first read to be compared
  • read2 (pysam.AlignedSegment) – The second read to be compared
Returns:

The number of non-overlapping (unique) bases between the two reads.

Return type:

int
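
One way to read "non-overlapping/unique bases" is as the symmetric difference of the reference positions each read aligns to. The sketch below assumes that definition. Its `Read` class is hypothetical and only mimics pysam's `AlignedSegment.get_reference_positions()`, which returns the reference positions that have an aligned read base.

```python
class Read:
    """Hypothetical stand-in for pysam.AlignedSegment.

    Only get_reference_positions() is mimicked; it returns the reference
    positions that have an aligned read base.
    """

    def __init__(self, positions):
        self._positions = list(positions)

    def get_reference_positions(self):
        return self._positions


def dissimilarity(read1, read2) -> int:
    """Assumed definition: number of reference positions aligned in only one of the reads."""
    pos1 = set(read1.get_reference_positions())
    pos2 = set(read2.get_reference_positions())
    return len(pos1 ^ pos2)  # symmetric difference = non-overlapping bases
```

Working on position sets rather than start/end coordinates keeps introns and deletions in spliced or gapped alignments from being counted as covered bases.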

ngsci.calculator.max_summed_dissimilarity(read_length: int)

Calculates the max_summed_dissimilarity for a given read_length. Both the algebraic formulation and the equivalent triangular-number formulation are fairly complicated, so I’d suggest reading the whitepaper for the full derivation.

Parameters: read_length (int) – The read length of the sequencing library.
Returns: the maximum possible summed dissimilarity
Return type: int

ngsci.calculator.summed_dissimilarity(reads: list)

Calculates the summed dissimilarity across a group of reads.

Parameters: reads (list) – A list of pysam.AlignedSegment reads over which to sum dissimilarities.
Returns: the summed dissimilarity of the group of reads
Return type: int
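
Assuming the sum runs over every unordered pair of reads, the summation can be sketched with itertools.combinations. The per-pair dissimilarity is passed in as a function so the sketch does not commit to a particular definition. That parameter, and the plain integers standing in for reads in the usage below, are hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def summed_dissimilarity(reads: Sequence[T],
                         dissimilarity: Callable[[T, T], int]) -> int:
    """Sum a pairwise dissimilarity over every unordered pair of reads.

    Assumption: all n * (n - 1) / 2 unordered pairs contribute once each.
    """
    return sum(dissimilarity(a, b) for a, b in combinations(reads, 2))
```

For example, `summed_dissimilarity([0, 2, 5], lambda a, b: abs(a - b))` returns 10 (pairs contribute 2, 5 and 3). For n reads this evaluates n(n - 1)/2 pairs, so the cost grows quadratically with read depth.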

ngsci.parser module

class ngsci.parser.BamReader(bamfile: str)

Bases: object

Variables:
  • bamfile – The SAM/BAM/CRAM file to process
  • strand_specific – The strand-specific chemistry of the library (F, FR, or RF)
fetch(chrom: str, start: int, step: int, strand_specific=False)

Calculates complexity indices across a region defined by a start site and the ‘step’ bases downstream of it (5’->3’ on the forward strand). Complexities can optionally be calculated in a strand-specific manner.

Parameters:
  • chrom (str) – The chromosome to calculate complexities from.
  • start (int) – The start site in the chromosome to fetch from.
  • step (int) – The step size, i.e. the number of complexities to calculate downstream of the start site.
  • strand_specific (bool) – Whether or not to calculate complexities in a strand-specific manner.
Returns:

a 2-tuple of complexity indices for the positive and negative strands, respectively

Return type:

tuple
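
A rough sketch of the per-base loop that fetch describes, operating on an in-memory list of reads rather than a BAM file. It assumes duplicates are reads with identical start and end coordinates on the same strand, takes strand directly from `is_reverse` (ignoring F/FR/RF chemistry), and returns one list of indices per strand. `Read`, `split_by_strand`, `fetch_sketch` and the `index_fn` callback are hypothetical; in practice the reads would come from something like pysam's `AlignmentFile.fetch(contig, start, stop)`.

```python
from collections import namedtuple

# Hypothetical stand-in for pysam.AlignedSegment; the attribute names match pysam's.
Read = namedtuple("Read", ["reference_start", "reference_end", "is_reverse"])


def split_by_strand(reads, position):
    """Return deduplicated reads covering `position`, split into (forward, reverse)."""
    forward, reverse = {}, {}
    for read in reads:
        if not (read.reference_start <= position < read.reference_end):
            continue  # read does not cover this base
        key = (read.reference_start, read.reference_end)
        bucket = reverse if read.is_reverse else forward
        bucket.setdefault(key, read)  # keep the first read seen per coordinate pair
    return list(forward.values()), list(reverse.values())


def fetch_sketch(reads, start, step, index_fn):
    """Compute one index per base in [start, start + step) for each strand."""
    forward_indices, reverse_indices = [], []
    for pos in range(start, start + step):
        fwd, rev = split_by_strand(reads, pos)
        forward_indices.append(index_fn(fwd))
        reverse_indices.append(index_fn(rev))
    return forward_indices, reverse_indices
```

Passing `len` as `index_fn` simply counts unique reads per strand, which is a convenient way to check the strand split before plugging in a real complexity function.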

Module contents