ngsci package¶

Submodules¶

ngsci.calculator module¶

ngsci.calculator.complexity_index(reads: list, read_length: int, denominator: int)¶

Calculates the fully-normalized complexity index for a list of deduplicated/unique reads.

Parameters:	reads (list) – A list of deduplicated pysam.AlignedSegment objects.
	read_length (int) – The read_length sampled from the BAM file.
	denominator (int) – The denominator estimated from the read_length.
Returns:	the complexity index of the group of reads.
Return type:	int
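
The signatures in this module suggest that the denominator is computed once per read length and then handed to complexity_index. The sketch below chains the two calls under that assumption. The helper name index_for_reads is hypothetical and not part of ngsci, and reads is assumed to be an already deduplicated list of pysam.AlignedSegment objects that share one read length.

```python
def index_for_reads(reads, read_length):
    """Hypothetical helper: chain denominator_calc and complexity_index.

    Assumes `reads` is an already deduplicated list of
    pysam.AlignedSegment objects that all share `read_length`.
    """
    # Imported lazily so this sketch can be defined without ngsci installed.
    from ngsci import calculator

    # The denominator depends only on the read length, so compute it once.
    denominator = calculator.denominator_calc(read_length)
    return calculator.complexity_index(reads, read_length, denominator)
```

Computing the denominator outside complexity_index means it can be reused across many groups of reads from the same library.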

ngsci.calculator.denominator_calc(read_length: int)¶

Calculates the denominator of the complexity index as the product of the max_summed_dissimilarity for that read_length and the read_length itself.

Parameters:	read_length (int) – The length of reads of the sequencing library.
Returns:	the denominator including normalization factors for the complexity index
Return type:	int
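
The product described above can be sketched as follows. This is illustrative only: denominator_sketch and placeholder_max are hypothetical names, and placeholder_max is a made-up stand-in, not the real max_summed_dissimilarity formula (which is given in the whitepaper).

```python
from typing import Callable


def denominator_sketch(read_length: int,
                       max_dissimilarity: Callable[[int], int]) -> int:
    """Illustrative only: product of the maximum summed dissimilarity
    for `read_length` and `read_length` itself."""
    return max_dissimilarity(read_length) * read_length


def placeholder_max(read_length: int) -> int:
    # NOT the ngsci formula; a made-up stand-in so the sketch runs.
    return 2 * read_length


print(denominator_sketch(100, placeholder_max))  # (2 * 100) * 100 = 20000
```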

ngsci.calculator.dissimilarity(read1: pysam.libcalignedsegment.AlignedSegment, read2: pysam.libcalignedsegment.AlignedSegment)¶

Calculates the dissimilarity between two reads.

Parameters:	read1 (pysam.AlignedSegment) – The first read to be compared.
	read2 (pysam.AlignedSegment) – The second read to be compared.
Returns:	Length of non-overlapping/unique bases
Return type:	int

ngsci.calculator.max_summed_dissimilarity(read_length: int)¶

Calculates the max_summed_dissimilarity for a given read_length. The algebraic formulation is a little complicated, and the equivalent triangular-number formulation is no simpler. I’d suggest reading the whitepaper for the full formulation.

Parameters:	read_length (int) – The length of reads of the sequencing library.
Returns:	maximum possible summed dissimilarity
Return type:	int

ngsci.calculator.summed_dissimilarity(reads: list)¶

Calculates the summed dissimilarity across a group of reads.

Parameters:	reads (list) – A list of pysam.AlignedSegment reads to calculate dissimilarities for.
Returns:	the summed dissimilarity of the group of reads
Return type:	int
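
One way to picture a summed dissimilarity is as a pairwise score added up over every unordered pair in the group. The sketch below shows that reading, which is an assumption: the documentation does not state which pairs ngsci actually sums over. The name summed_pairwise is hypothetical, and the toy score (absolute offset between integer start positions) is a placeholder, not ngsci's dissimilarity.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def summed_pairwise(items: Sequence[T],
                    pairwise: Callable[[T, T], int]) -> int:
    """Assumed reading: add up `pairwise` over every unordered pair."""
    return sum(pairwise(a, b) for a, b in combinations(items, 2))


# Toy data: integers standing in for read start positions, with the
# absolute offset as a placeholder pairwise score (not ngsci's definition).
starts = [100, 103, 110]
print(summed_pairwise(starts, lambda a, b: abs(a - b)))  # 3 + 10 + 7 = 20
```

Under this reading, a group with fewer than two reads has no pairs and sums to 0.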

ngsci.parser module¶

class ngsci.parser.BamReader(bamfile: str)¶

Bases: object

Variables:	bamfile – The SAM/BAM/CRAM file to process.
	strand_specific – The F/FR/RF strand_specific chemistry.

fetch(chrom: str, start: int, step: int, strand_specific=False)¶

Calculates the complexity index across a region specified by a start site and ‘step’ bases in the 5’->3’ direction on the forward strand. The complexity index can be calculated in a strand-specific manner.

Parameters:	chrom (str) – The chromosome to calculate complexities from.
	start (int) – The start site in the chromosome to fetch from.
	step (int) – The step size or number of complexities to calculate downstream from the start site.
	strand_specific (bool) – Whether or not to calculate complexities in a strand-specific manner.
Returns:	a 2-membered tuple of complexity indices for the positive and negative strands, respectively
Return type:	tuple
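
A minimal usage sketch of the call documented above follows. It relies only on the BamReader constructor and the fetch signature shown here. The wrapper name strand_complexities is hypothetical, and any file name or coordinates passed to it are placeholders.

```python
def strand_complexities(bamfile, chrom, start, step):
    """Hypothetical wrapper around BamReader.fetch as documented above."""
    # Imported lazily so the sketch can be defined without ngsci installed.
    from ngsci.parser import BamReader

    reader = BamReader(bamfile)
    # fetch is documented to return a 2-membered tuple:
    # positive-strand indices first, negative-strand indices second.
    positive, negative = reader.fetch(chrom, start, step, strand_specific=True)
    return positive, negative
```

A call such as strand_complexities("sample.bam", "chr1", 10000, 100) would then cover 100 bases downstream of position 10000, assuming that file exists and is readable by ngsci.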
