ngsci package
Submodules
ngsci.calculator module
-
ngsci.calculator.complexity_index(reads: list, read_length: int, denominator: int)
Calculates the fully normalized complexity index for a list of deduplicated (unique) reads.
Parameters: - reads (list) – A list of deduplicated pysam.AlignedSegment objects
- read_length (int) – The read length sampled from the BAM file.
- denominator (int) – The normalization denominator derived from read_length (see denominator_calc).
Returns: the complexity index of the group of reads.
Return type: int
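complexity_index expects reads that have already been deduplicated. Below is a minimal sketch of one way to do that. It assumes reads count as duplicates when they share alignment start, end and strand. That key is an assumption and not necessarily ngsci's criterion. `Read` and `deduplicate` are hypothetical names; `Read` mimics only the pysam.AlignedSegment attributes `reference_start`, `reference_end` and `is_reverse`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for pysam.AlignedSegment, keeping only the fields used here."""
    reference_start: int
    reference_end: int
    is_reverse: bool = False


def deduplicate(reads):
    """Keep the first read seen for each (start, end, strand) key.

    Assumption: reads sharing alignment start, end and strand are duplicates;
    ngsci may use a different criterion.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read.reference_start, read.reference_end, read.is_reverse)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


reads = [Read(100, 150), Read(100, 150), Read(105, 155), Read(100, 150, True)]
print(len(deduplicate(reads)))  # 3
```

Input order is preserved, so the first occurrence of each duplicate group is the one that is kept.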
-
ngsci.calculator.denominator_calc(read_length: int)
Calculates the denominator of the complexity index as the product of max_summed_dissimilarity(read_length) and read_length itself.
Parameters: read_length (int) – The read length of the sequencing library.
Returns: the denominator, including normalization factors, for the complexity index
Return type: int
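The documented formula for the denominator can be written out directly. This sketch uses the hypothetical name `denominator_sketch` and takes the maximum-summed-dissimilarity function as an argument. That avoids guessing its closed form, which is only given in the whitepaper. The lambda in the example is a toy stand-in, not the real function.

```python
from typing import Callable


def denominator_sketch(read_length: int,
                       max_summed_dissimilarity: Callable[[int], int]) -> int:
    """Return max_summed_dissimilarity(read_length) * read_length.

    Hypothetical helper: the maximum-summed-dissimilarity function is passed in
    rather than reimplemented, because its closed form is not given here.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return max_summed_dissimilarity(read_length) * read_length


# Toy stand-in for max_summed_dissimilarity, for illustration only.
print(denominator_sketch(50, lambda n: 1000))  # 50000
```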
-
ngsci.calculator.dissimilarity(read1: pysam.libcalignedsegment.AlignedSegment, read2: pysam.libcalignedsegment.AlignedSegment)
Calculates the dissimilarity between two reads.
Parameters: - read1 (pysam.AlignedSegment) – The first read to be compared
- read2 (pysam.AlignedSegment) – The second read to be compared
Returns: the number of non-overlapping (unique) bases between the two reads
Return type: int
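One reading of "non-overlapping/unique bases" is the number of reference positions covered by exactly one of the two reads. That is the size of the symmetric difference of their aligned intervals. The sketch below assumes this interpretation and treats each read as a contiguous half-open interval, since pysam's `reference_end` points one past the last aligned base. `Read` and `dissimilarity_sketch` are hypothetical names. ngsci's exact definition may differ, for example for spliced or soft-clipped reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for pysam.AlignedSegment (half-open reference interval)."""
    reference_start: int
    reference_end: int


def dissimilarity_sketch(read1: Read, read2: Read) -> int:
    """Count reference positions covered by exactly one of the two reads.

    Assumption: 'non-overlapping/unique bases' means the symmetric difference
    of the two aligned intervals; gaps such as splice junctions are ignored.
    """
    len1 = read1.reference_end - read1.reference_start
    len2 = read2.reference_end - read2.reference_start
    # Overlap of two half-open intervals, clamped at zero when they are disjoint.
    overlap = max(0, min(read1.reference_end, read2.reference_end)
                  - max(read1.reference_start, read2.reference_start))
    return len1 + len2 - 2 * overlap


print(dissimilarity_sketch(Read(100, 150), Read(110, 160)))  # 20
print(dissimilarity_sketch(Read(100, 150), Read(100, 150)))  # 0
print(dissimilarity_sketch(Read(100, 150), Read(200, 250)))  # 100
```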
-
ngsci.calculator.max_summed_dissimilarity(read_length: int)
Calculates the maximum summed dissimilarity for a given read_length. The algebraic formulation is somewhat involved, as is the equivalent formulation in terms of triangular numbers. I'd suggest reading the whitepaper for the full formulation.
Parameters: read_length (int) – The read length of the sequencing library.
Returns: the maximum possible summed dissimilarity
Return type: int
-
ngsci.calculator.summed_dissimilarity(reads: list)
Calculates the summed dissimilarity across a group of reads.
Parameters: reads (list) – A list of pysam.AlignedSegment reads across which to calculate dissimilarities.
Returns: the summed dissimilarity of the group of reads
Return type: int
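The sketch below assumes the summed dissimilarity is the sum of pairwise dissimilarities over every unordered pair of reads, which it computes with `itertools.combinations`. The pairwise measure is the symmetric-difference interpretation of dissimilarity, which is itself an assumption. All names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for pysam.AlignedSegment (half-open reference interval)."""
    reference_start: int
    reference_end: int


def dissimilarity_sketch(read1: Read, read2: Read) -> int:
    # Assumed definition: bases covered by exactly one of the two reads.
    overlap = max(0, min(read1.reference_end, read2.reference_end)
                  - max(read1.reference_start, read2.reference_start))
    return ((read1.reference_end - read1.reference_start)
            + (read2.reference_end - read2.reference_start) - 2 * overlap)


def summed_dissimilarity_sketch(reads) -> int:
    """Sum dissimilarity over every unordered pair of reads (assumed aggregation)."""
    return sum(dissimilarity_sketch(a, b) for a, b in combinations(reads, 2))


reads = [Read(0, 10), Read(2, 12), Read(5, 15)]
print(summed_dissimilarity_sketch(reads))  # 20
```

Because every pair is visited, the cost grows quadratically with the number of reads in the group. A group with fewer than two reads sums to 0.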
ngsci.parser module
-
class ngsci.parser.BamReader(bamfile: str)
Bases: object
Variables: - bamfile – The SAM/BAM/CRAM file to process.
- strand_specific – The strand-specific chemistry of the library (F, FR or RF).
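The F/FR/RF chemistries are not defined here. The sketch below assumes the convention Trinity uses for its `--SS_lib_type` option:

- F: single-end, with reads on the sense strand.
- FR: paired-end, with read 1 on the sense strand.
- RF: paired-end, with read 1 on the antisense strand (as in dUTP libraries).

`Read` and `transcript_strand` are hypothetical. Only the flag names `is_reverse` and `is_read1` come from pysam.AlignedSegment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in exposing the pysam.AlignedSegment flags used below."""
    is_reverse: bool
    is_read1: bool = True


def transcript_strand(read: Read, chemistry: str) -> str:
    """Return '+' or '-' for the strand of the originating transcript.

    Assumed convention (as in Trinity's --SS_lib_type): 'F' single-end sense;
    'FR' read 1 sense, read 2 antisense; 'RF' read 1 antisense, read 2 sense.
    """
    if chemistry == "F":
        flip = False
    elif chemistry == "FR":
        flip = not read.is_read1
    elif chemistry == "RF":
        flip = read.is_read1
    else:
        raise ValueError(f"unsupported chemistry: {chemistry!r}")
    # An antisense read aligned forward comes from a '-' strand transcript, and vice versa.
    on_reverse = read.is_reverse != flip
    return "-" if on_reverse else "+"


print(transcript_strand(Read(is_reverse=False, is_read1=True), "RF"))  # -
print(transcript_strand(Read(is_reverse=True, is_read1=False), "FR"))  # +
```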
-
fetch(chrom: str, start: int, step: int, strand_specific=False)
Calculates the complexity index at each position of a region that begins at a start site and extends ‘step’ bases in the 5’->3’ direction on the forward strand. The complexity index can optionally be calculated in a strand-specific manner.
Parameters: - chrom (str) – The chromosome to calculate complexities from.
- start (int) – The start site in the chromosome to fetch from.
- step (int) – The step size, i.e. the number of complexities to calculate downstream of the start site.
- strand_specific (bool) – Whether or not to calculate complexities in a strand-specific manner.
Returns: a 2-tuple of complexity indices for the positive and negative strands, respectively
Return type: tuple
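The per-position, per-strand shape of fetch's return value can be sketched as a loop over positions `start` through `start + step - 1`, an assumed reading of the region bounds. At each position, the loop scores the reads covering it separately for each alignment strand. This is a simplified, hypothetical model that covers only the strand-specific path:

- It works on an in-memory list of reads rather than a BAM file. Real code would more likely pull reads with pysam's `AlignmentFile.fetch`.
- It splits reads by `is_reverse` alone.
- `score` is a placeholder for the complexity-index calculation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for pysam.AlignedSegment."""
    reference_start: int
    reference_end: int  # one past the last aligned base, as in pysam
    is_reverse: bool = False


def fetch_sketch(reads: Sequence[Read], start: int, step: int,
                 score: Callable[[List[Read]], float]) -> Tuple[list, list]:
    """Score the reads covering each position in [start, start + step), per strand.

    `score` is a placeholder for the complexity-index calculation.
    """
    forward, reverse = [], []
    for pos in range(start, start + step):
        covering = [r for r in reads if r.reference_start <= pos < r.reference_end]
        forward.append(score([r for r in covering if not r.is_reverse]))
        reverse.append(score([r for r in covering if r.is_reverse]))
    return forward, reverse


reads = [Read(0, 3), Read(1, 4, True), Read(2, 5)]
# Using len as a toy score: the number of covering reads on each strand.
print(fetch_sketch(reads, 0, 4, len))  # ([1, 1, 2, 1], [0, 1, 1, 1])
```

Filtering the whole read list at every position is only suitable for illustration. For paired-end stranded libraries, the alignment strand would also need to be mapped to the transcript strand according to the chemistry before splitting.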