CLI Internals¶
This page describes the internal API used by the Command Line Pipeline. These functions and objects are not meant for interactive usage, so the page is mainly useful if you want to change the behavior of the molecule counting pipeline.
velocyto.counter module¶
velocyto.transcript_model module¶
- class velocyto.transcript_model.TranscriptModel(trid: str, trname: str, geneid: str, genename: str, chromstrand: str)[source]¶
Bases:
object
A simple object representing a transcript model as a list of vcy.Feature objects
- trid¶
- trname¶
- geneid¶
- genename¶
- chromstrand¶
- list_features¶
- property start: int¶
Note: this should be accessed only after the creation of the transcript model is finished, i.e. after append_exon has been called to add all the exons/introns.
- property end: int¶
Note: this should be accessed only after the creation of the transcript model is finished, i.e. after append_exon has been called to add all the exons/introns.
- ends_upstream_of(read: velocyto.read.Read) → bool[source]¶
- append_exon(exon_feature: velocyto.feature.Feature) → None[source]¶
Append an exon and create an intron when needed
- Parameters
exon_feature (vcy.Feature) – A feature object representing an exon to add to the transcript model.
- chop_if_long_intron(maxlen: int = 1000000) → None[source]¶
Modify a transcript model by chopping the 5’ region upstream of a very long intron, so that extremely long introns do not mask the counting of internal genes
- Parameters
maxlen (int, default=vcy.LONGEST_INTRON_ALLOWED) – transcript models that contain one or more intronic intervals of len == maxlen will be chopped
- Returns
Nothing. It calls _remove_upstream_of or _remove_downstream_of on the transcript model,
and the model is renamed by appending _mod to both trid and trname
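The way append_exon builds the feature list can be sketched as follows. This is a minimal illustration, not velocyto's implementation: the classes SketchFeature and SketchTranscriptModel, the kind codes, and the assumptions of inclusive coordinates and exons arriving in genomic order are all hypothetical. It also shows why start and end are only meaningful once every exon has been appended.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical kind codes chosen for this sketch
EXON, INTRON = ord("e"), ord("i")


@dataclass
class SketchFeature:
    start: int
    end: int
    kind: int


@dataclass
class SketchTranscriptModel:
    trid: str
    chromstrand: str
    list_features: List[SketchFeature] = field(default_factory=list)

    @property
    def start(self) -> int:
        # Only meaningful once all exons have been appended
        return self.list_features[0].start

    @property
    def end(self) -> int:
        # Changes every time a new exon is appended
        return self.list_features[-1].end

    def append_exon(self, exon: SketchFeature) -> None:
        """Append an exon, filling the gap to the previous exon with an intron."""
        if self.list_features:
            previous = self.list_features[-1]
            # Assumes inclusive coordinates and exons given in genomic order
            intron = SketchFeature(previous.end + 1, exon.start - 1, INTRON)
            self.list_features.append(intron)
        self.list_features.append(exon)


model = SketchTranscriptModel("tr1", "1+")
for exon_start, exon_end in [(100, 200), (300, 400), (1000, 1100)]:
    model.append_exon(SketchFeature(exon_start, exon_end, EXON))

kinds = [chr(f.kind) for f in model.list_features]
```

In this sketch the list alternates exon, intron, exon, so a model with n exons holds 2n - 1 features.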
velocyto.segment_match module¶
- class velocyto.segment_match.SegmentMatch(segment: Tuple[int, int], feature: velocyto.feature.Feature, is_spliced: bool = False)[source]¶
Bases:
object
- segment¶
- feature¶
- is_spliced¶
- property maps_to_intron: bool¶
- property maps_to_exon: bool¶
- property skip_makes_sense: bool¶
True if the SKIP in the segment matches some extremity of the feature and can therefore be interpreted as a splice event
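One way to read the skip_makes_sense description is sketched below. The function name, the tolerance parameter, and the choice to return True for unspliced segments are assumptions made for illustration and are not taken from the velocyto source.

```python
from typing import Tuple


def skip_matches_feature_end(
    segment: Tuple[int, int],
    feature_start: int,
    feature_end: int,
    is_spliced: bool,
    tolerance: int = 0,  # hypothetical slack allowed around the boundary
) -> bool:
    """Check whether a segment boundary coincides with a feature boundary."""
    if not is_spliced:
        # Assumption: with no SKIP there is nothing to explain
        return True
    starts_at_boundary = abs(segment[0] - feature_start) <= tolerance
    ends_at_boundary = abs(segment[1] - feature_end) <= tolerance
    return starts_at_boundary or ends_at_boundary


# A spliced segment that ends exactly where the exon (100, 200) ends
at_boundary = skip_matches_feature_end((150, 200), 100, 200, is_spliced=True)
# A spliced segment that stops in the middle of the exon
mid_feature = skip_matches_feature_end((150, 180), 100, 200, is_spliced=True)
```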
velocyto.feature module¶
- class velocyto.feature.Feature(start: int, end: int, kind: int, exin_no: str, transcript_model: Optional[Any] = None)[source]¶
Bases:
object
A simple class representing an annotated genomic feature (e.g. exon, intron, masked repeat)
- start¶
- end¶
- transcript_model¶
- kind¶
- exin_no¶
- is_validated¶
- property is_last_3prime: bool¶
- get_downstream_exon() → Any[source]¶
Use only for introns. Returns the vcy.Feature corresponding to the neighbouring exon downstream
Note
In a 15-exon transcript model: downstream of intron10 is exon11, or the interval with index 20, if strand is “+”. Downstream of intron10 is exon10, or the interval with index 10, if strand is “-”
- get_upstream_exon() → Any[source]¶
Use only for introns. Returns the vcy.Feature corresponding to the neighbouring exon upstream
Note
In a 15-exon transcript model: upstream of intron10 is exon9, or the interval with index 18, if strand is “+”. Upstream of intron10 is exon11, or the interval with index 8, if strand is “-”
- ends_upstream_of(read: velocyto.read.Read) → bool[source]¶
The following situation happens:
                               Read
                           *|||segment|||-?-||segment|||????????
???????|||||Ivl|||||||||*
- doesnt_start_after(segment: Tuple[int, int]) → bool[source]¶
One of the following situations happens:
*||||||segment|||||????????
- contains(segment: Tuple[int, int], minimum_flanking: int = 5) → bool[source]¶
One of the following situations happens:
-----||||||segment|||||-----
|||||||||||||Ivl||||||||||||||||
-----||||||segment|||||-----
|||||||||||||Ivl||||||||||||||||
-----||||||segment|||||-----
|||||||||||||Ivl||||||||||||||||
where --- indicates the minimum flanking
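A containment test with a flanking tolerance could be sketched as below. This is one plausible reading of the diagram, in which the flanks at either end of the segment may hang over the interval edge while the core of the segment must lie inside. The function name and the exact inequalities are assumptions, not velocyto's code.

```python
from typing import Tuple


def contains_sketch(
    ivl_start: int,
    ivl_end: int,
    segment: Tuple[int, int],
    minimum_flanking: int = 5,
) -> bool:
    """True if the segment, trimmed by minimum_flanking at both ends, lies in the interval."""
    left_ok = segment[0] + minimum_flanking >= ivl_start
    right_ok = segment[1] - minimum_flanking <= ivl_end
    return left_ok and right_ok


# Interval 100..200: a 2-base overhang is tolerated, a 10-base overhang is not
small_overhang = contains_sketch(100, 200, (98, 150))
large_overhang = contains_sketch(100, 200, (90, 150))
```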
velocyto.indexes module¶
- class velocyto.indexes.TransciptsIndex(trascript_models: List[velocyto.transcript_model.TranscriptModel])[source]¶
Bases:
object
- transcipt_models¶
- tidx¶
- maxtidx¶
- property scan_not_terminated: bool¶
Returns False once the whole chromosome has been scanned
- find_overlapping_trascript_models(read: velocyto.read.Read) → Set[velocyto.transcript_model.TranscriptModel][source]¶
Finds all the Transcript models the Read overlaps with
- Parameters
read (vcy.Read) – the read object to be analyzed
- Returns
matched_transcripts – the TranscriptModel objects the read overlaps with. The kind of overlap is one of vcy.MATCH_INSIDE (1), vcy.MATCH_OVER5END (2), vcy.MATCH_OVER3END (4)
- Return type
set of vcy.TranscriptModel
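The attributes tidx and maxtidx and the scan_not_terminated property suggest a forward-only scan over sorted transcript models. The sketch below illustrates that idea under stated assumptions: models are plain (start, end, name) tuples sorted by start, reads arrive sorted by position, and the class and method names are hypothetical. It does not reproduce the match-kind constants or the real implementation.

```python
from typing import List, Set, Tuple

Model = Tuple[int, int, str]  # (start, end, name), inclusive coordinates


class SketchTranscriptsIndex:
    def __init__(self, models: List[Model]) -> None:
        self.models = sorted(models)  # sorted by start, then end
        self.tidx = 0
        self.maxtidx = len(self.models) - 1

    @property
    def scan_not_terminated(self) -> bool:
        # False once the index has moved past the last model
        return self.tidx <= self.maxtidx

    def find_overlapping(self, read_start: int, read_end: int) -> Set[str]:
        # Models that end before this read can never match a later read,
        # because reads are assumed to arrive sorted by position
        while self.scan_not_terminated and self.models[self.tidx][1] < read_start:
            self.tidx += 1
        matched: Set[str] = set()
        i = self.tidx
        # Look ahead without moving tidx; stop at the first model starting after the read
        while i <= self.maxtidx and self.models[i][0] <= read_end:
            if self.models[i][1] >= read_start:
                matched.add(self.models[i][2])
            i += 1
        return matched


index = SketchTranscriptsIndex([(100, 200, "A"), (150, 400, "B"), (500, 600, "C")])
hits = [index.find_overlapping(s, e) for s, e in [(120, 130), (180, 190), (450, 520), (700, 710)]]
```

Because the index only moves forward, each model is skipped at most once per chromosome, which keeps the scan roughly linear in the number of reads plus models.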
- class velocyto.indexes.FeatureIndex(ivls: List[velocyto.feature.Feature] = [])[source]¶
Bases:
object
Search helper class used to find the intervals that a read spans
- property last_interval_not_reached: bool¶
- has_ivls_enclosing(read: velocyto.read.Read) → bool[source]¶
Finds out whether there are intervals that fully contain all the read segments
- Parameters
read (vcy.Read) – the read object to be analyzed
- Returns
response – True if such an interval has been found
- Return type
bool
- mark_overlapping_ivls(read: velocyto.read.Read) → None[source]¶
Finds the overlap between Read and Features and marks intronic features if spanned
- Parameters
read (vcy.Read) – the read object to be analyzed
- Returns
Nothing. It marks the vcy.Feature object (is_validated = True) if there is evidence of exon-intron spanning
- find_overlapping_ivls(read: velocyto.read.Read) → Dict[velocyto.transcript_model.TranscriptModel, List[velocyto.segment_match.SegmentMatch]][source]¶
Finds the possible overlaps between Read and Features and returns a mapping record derived from a single read
- Parameters
read (vcy.Read) – the read object to be analyzed
- Returns
mapping_record – A record of the mappings by transcript model. Every entry contains a list of segment matches, each of which in turn contains information on the segment and the feature
- Return type
Dict[vcy.TranscriptModel, List[vcy.SegmentMatch]]
Note
It is possible that a segment overlaps an exon and an intron at the same time (spanning segment).
It is not possible that a segment overlaps two exons at the same time. In that case the read is split into two segments and the Read attribute is_spliced == True.
- Notice that the name of the function might be confusing: if there is a non-valid overlap, an empty mapping record will be returned.
- Also notice that returning an empty mapping record will cause the suppression of the counting of the molecule.
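The shape of a mapping record and the cases described in the note can be illustrated with a simplified sketch. Everything here is hypothetical: features are plain (start, end, label) tuples, transcripts are keyed by a string id, and the validity rules that can empty a real mapping record are not modelled.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]
Feature = Tuple[int, int, str]  # (start, end, label), inclusive coordinates


def overlapping_features(segment: Segment, features: List[Feature]) -> List[Feature]:
    """Return every feature sharing at least one base with the segment."""
    return [f for f in features if segment[0] <= f[1] and segment[1] >= f[0]]


def mapping_record_sketch(
    segments: List[Segment], transcripts: Dict[str, List[Feature]]
) -> Dict[str, List[Tuple[Segment, Feature]]]:
    """Group (segment, feature) matches by transcript; transcripts with no match are omitted."""
    record: Dict[str, List[Tuple[Segment, Feature]]] = {}
    for trid, features in transcripts.items():
        matches = [(seg, feat) for seg in segments for feat in overlapping_features(seg, features)]
        if matches:
            record[trid] = matches
    return record


transcripts = {"tr1": [(100, 200, "exon1"), (201, 299, "intron1"), (300, 400, "exon2")]}

# One segment crossing the exon1/intron1 boundary (spanning segment)
spanning = mapping_record_sketch([(190, 210)], transcripts)
# A spliced read: two segments, each on a different exon
spliced = mapping_record_sketch([(150, 200), (300, 350)], transcripts)
# A read outside every transcript gives an empty record
unmapped = mapping_record_sketch([(900, 950)], transcripts)
```

A caller following the note above would skip counting a molecule whenever the record is empty, for example with `if not unmapped: ...`.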
velocyto.molitem module¶
- velocyto.molitem.dictionary_union(d1: DefaultDict[Any, List], d2: DefaultDict[Any, List]) → DefaultDict[Any, List][source]¶
Set union (|) operation on default dictionaries
- Parameters
d1 (defaultdict) – First default dict
d2 (defaultdict) – Second default dict
- Returns
A dictionary whose keys are the set union of the keys of d1 and d2.
If the same key is present in both, the entries are combined using __add__
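The described behaviour can be sketched as follows. This is a minimal reimplementation written from the description above, with a hypothetical function name; the actual velocyto code may differ in details such as key order or whether the inputs are copied.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List


def dictionary_union_sketch(
    d1: DefaultDict[Any, List], d2: DefaultDict[Any, List]
) -> DefaultDict[Any, List]:
    """Union of the keys; lists under a shared key are concatenated with +."""
    keys = set(d1) | set(d2)
    # .get avoids inserting missing keys into the input defaultdicts
    return defaultdict(list, ((k, d1.get(k, []) + d2.get(k, [])) for k in keys))


d1 = defaultdict(list, {"a": [1], "b": [2]})
d2 = defaultdict(list, {"b": [3], "c": [4]})
union = dictionary_union_sketch(d1, d2)
```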
- velocyto.molitem.dictionary_intersect(d1: DefaultDict[Any, List], d2: DefaultDict[Any, List]) → DefaultDict[Any, List][source]¶
Set intersection (&) operation on default dictionaries
- Parameters
d1 (defaultdict) – First default dict
d2 (defaultdict) – Second default dict
- Returns
A dictionary whose keys are the set intersection of the keys of d1 and d2.
For each key present in both, the entries are combined using __add__
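A matching sketch for the intersection, again written from the description rather than from the velocyto source and using a hypothetical function name:

```python
from collections import defaultdict
from typing import Any, DefaultDict, List


def dictionary_intersect_sketch(
    d1: DefaultDict[Any, List], d2: DefaultDict[Any, List]
) -> DefaultDict[Any, List]:
    """Keep only keys present in both inputs; concatenate their lists with +."""
    keys = set(d1) & set(d2)
    return defaultdict(list, ((k, d1[k] + d2[k]) for k in keys))


d1 = defaultdict(list, {"a": [1], "b": [2]})
d2 = defaultdict(list, {"b": [3], "c": [4]})
common = dictionary_intersect_sketch(d1, d2)
```

Only keys found in both inputs are ever indexed, so the input defaultdicts are not modified by the lookup.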
- class velocyto.molitem.Molitem[source]¶
Bases:
object
Object that represents a molecule in the counting pipeline
- mappings_record¶
- add_mappings_record(mappings_record: DefaultDict[velocyto.transcript_model.TranscriptModel, List[velocyto.segment_match.SegmentMatch]]) → None[source]¶
velocyto.gene_info module¶
velocyto.read module¶
- class velocyto.read.Read(bc: str, umi: str, chrom: str, strand: str, pos: int, segments: List, clip5: Any, clip3: Any, ref_skipped: bool)[source]¶
Bases:
object
Container for reads from a SAM alignment file
- bc¶
- umi¶
- chrom¶
- strand¶
- pos¶
- segments¶
- clip5¶
- clip3¶
- ref_skipped¶
- property is_spliced: bool¶
- property start: int¶
- property end: int¶
- property span: int¶
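The properties listed for Read are not documented on this page. A plausible way to derive them from segments and ref_skipped is sketched below. The class name and every property body are assumptions, including the inclusive-coordinate span, and should be checked against the source before relying on them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SketchRead:
    segments: List[Tuple[int, int]]  # aligned blocks, sorted by position
    ref_skipped: bool                # True if the alignment contains a reference skip

    @property
    def is_spliced(self) -> bool:
        # Assumption: a reference skip is taken as evidence of splicing
        return self.ref_skipped

    @property
    def start(self) -> int:
        return self.segments[0][0]

    @property
    def end(self) -> int:
        return self.segments[-1][1]

    @property
    def span(self) -> int:
        # Assumption: inclusive coordinates, hence the + 1
        return self.end - self.start + 1


read = SketchRead(segments=[(150, 200), (300, 350)], ref_skipped=True)
summary = (read.start, read.end, read.span, read.is_spliced)
```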