CLI Internals

This page describes the internal API used by the command line pipeline. These functions and objects are not meant for interactive use; the page is mainly useful if you want to change the behavior of the molecule counting pipeline.

velocyto.counter module

velocyto.transcript_model module

class velocyto.transcript_model.TranscriptModel(trid: str, trname: str, geneid: str, genename: str, chromstrand: str)[source]

Bases: object

A simple object representing a transcript model as a list of vcy.Feature objects

trid
trname
geneid
genename
chromstrand
list_features
property start: int

Note: this should be accessed only after the creation of the transcript model is finished, i.e. after append_exon has been called to add all the exons/introns.

property end: int

Note: this should be accessed only after the creation of the transcript model is finished, i.e. after append_exon has been called to add all the exons/introns.

ends_upstream_of(read: velocyto.read.Read)bool[source]
intersects(segment: Tuple[int, int], minimum_flanking: int = 5)bool[source]
append_exon(exon_feature: velocyto.feature.Feature)None[source]

Append an exon and create an intron when needed

Parameters

exon_feature (vcy.Feature) – A feature object representing an exon to add to the transcript model.
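As a rough illustration of "create an intron when needed", the sketch below appends exons in coordinate order and fills each gap between consecutive exons with an intron. ToyFeature and ToyTranscript are hypothetical stand-ins, not velocyto's classes, and the inclusive coordinates are an assumption. It also shows why start and end are only meaningful once all exons have been appended.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyFeature:
    """Hypothetical stand-in for vcy.Feature (inclusive coordinates assumed)."""
    start: int
    end: int
    kind: str  # "e" for exon, "i" for intron


@dataclass
class ToyTranscript:
    """Hypothetical stand-in for TranscriptModel."""
    list_features: List[ToyFeature] = field(default_factory=list)

    def append_exon(self, exon: ToyFeature) -> None:
        # If an exon is already present, fill the gap with an intron first
        if self.list_features:
            prev = self.list_features[-1]
            if exon.start > prev.end + 1:
                self.list_features.append(
                    ToyFeature(prev.end + 1, exon.start - 1, "i"))
        self.list_features.append(exon)

    @property
    def start(self) -> int:
        # Only meaningful once all exons have been appended
        return self.list_features[0].start

    @property
    def end(self) -> int:
        # Only meaningful once all exons have been appended
        return self.list_features[-1].end


tm = ToyTranscript()
for s, e in [(100, 200), (301, 400), (501, 650)]:
    tm.append_exon(ToyFeature(s, e, "e"))

kinds = "".join(f.kind for f in tm.list_features)
print(kinds, tm.start, tm.end)
```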

chop_if_long_intron(maxlen: int = 1000000)None[source]

Modify a transcript model by chopping the 5’ region upstream of a very long intron, so that an extremely long intron does not mask the counting of internal genes.

Parameters

maxlen (int, default=vcy.LONGEST_INTRON_ALLOWED) – transcript models that contain one or more intronic intervals of len == maxlen will be chopped

Returns

  • Nothing; it will call _remove_upstream_of or _remove_downstream_of on the transcript model.

  • The name of the transcript model will be changed by appending _mod to both trid and trname.
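The chopping idea can be sketched on a plain list of features. This is a minimal illustration and not velocyto's implementation: the tuple representation, the "length exceeds maxlen" test, the choice of the long intron closest to the 3’ end, and returning a new list instead of modifying the model in place are all assumptions.

```python
from typing import List, Tuple

# Each toy feature is (start, end, kind) with kind "e" (exon) or "i" (intron).
ToyFeature = Tuple[int, int, str]


def chop_if_long_intron(features: List[ToyFeature], strand: str,
                        maxlen: int = 1_000_000) -> List[ToyFeature]:
    """Drop the 5' part of a toy transcript that lies beyond a very long intron.

    Assumptions: features are sorted by coordinate, an intron is "long" when
    its length exceeds maxlen, and the long intron closest to the 3' end wins.
    """
    long_idx = [i for i, (s, e, k) in enumerate(features)
                if k == "i" and e - s + 1 > maxlen]
    if not long_idx:
        return features
    if strand == "+":
        # 5' is on the left: keep what follows the last long intron
        return features[long_idx[-1] + 1:]
    # 5' is on the right: keep what precedes the first long intron
    return features[:long_idx[0]]


feats = [(1, 100, "e"), (101, 2_000_100, "i"), (2_000_101, 2_000_200, "e"),
         (2_000_201, 2_000_300, "i"), (2_000_301, 2_000_400, "e")]
plus = chop_if_long_intron(feats, "+")
minus = chop_if_long_intron(feats, "-")
print(len(plus), len(minus))
```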

velocyto.segment_match module

class velocyto.segment_match.SegmentMatch(segment: Tuple[int, int], feature: velocyto.feature.Feature, is_spliced: bool = False)[source]

Bases: object

segment
feature
is_spliced
property maps_to_intron: bool
property maps_to_exon: bool
property skip_makes_sense: bool

True if the SKIP in the segment matches some extremity of the feature and can therefore be interpreted as a splice event.

velocyto.feature module

class velocyto.feature.Feature(start: int, end: int, kind: int, exin_no: str, transcript_model: Optional[Any] = None)[source]

Bases: object

A simple class representing an annotated genomic feature (e.g. exon, intron, masked repeat)

start
end
transcript_model
kind
exin_no
is_validated
property is_last_3prime: bool
get_downstream_exon()Any[source]

To use only for introns. Returns the vcy.Feature corresponding to the neighbour exon downstream

Note

In a 15-exon transcript model: downstream of intron10 is exon11, or the interval with index 20, if the strand is “+”. Downstream of intron10 is exon10, or the interval with index 10, if the strand is “-”.

get_upstream_exon()Any[source]

To use only for introns. Returns the vcy.Feature corresponding to the neighbour exon upstream

Note

In a 15-exon transcript model: upstream of intron10 is exon9, or the interval with index 18, if the strand is “+”. Upstream of intron10 is exon11, or the interval with index 8, if the strand is “-”.

ends_upstream_of(read: velocyto.read.Read)bool[source]

The following situation happens:

                                   Read
                               *|||segment|||-?-||segment|||????????
    ???????|||||Ivl|||||||||*

doesnt_start_after(segment: Tuple[int, int])bool[source]

One of the following situations happens:

                *||||||segment|||||????????
        ||||Ivl|||||
    *|||||||||||||Ivl||||||||||????????????
                  *|||||||||||||Ivl||||||||||????????????
                      *|||||||||||||Ivl||||||||||????????????

intersects(segment: Tuple[int, int], minimum_flanking: int = 5)bool[source]
contains(segment: Tuple[int, int], minimum_flanking: int = 5)bool[source]

One of the following situations happens:

    -----||||||segment|||||-----
        |||||||||||||Ivl||||||||||||||||

      -----||||||segment|||||-----
      |||||||||||||Ivl||||||||||||||||

          -----||||||segment|||||-----
      |||||||||||||Ivl||||||||||||||||

where ----- indicates the minimum flanking
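One possible reading of the diagram is that a segment counts as contained even if it sticks out of the interval by up to minimum_flanking bases on either side. The standalone function below sketches that reading; the tuple-based interval and the exact inequalities are assumptions, not a copy of velocyto's method.

```python
from typing import Tuple


def contains(ivl: Tuple[int, int], segment: Tuple[int, int],
             minimum_flanking: int = 5) -> bool:
    """Toy check: the segment may overhang the interval by at most
    minimum_flanking bases on either side (assumed reading of the diagram)."""
    return (segment[0] + minimum_flanking >= ivl[0]
            and segment[1] - minimum_flanking <= ivl[1])


ivl = (100, 200)
inside = contains(ivl, (120, 180))          # fully inside the interval
small_overhang = contains(ivl, (97, 150))   # starts 3 bases before the interval
large_overhang = contains(ivl, (90, 150))   # starts 10 bases before the interval
print(inside, small_overhang, large_overhang)
```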

start_overlaps_with_part_of(segment: Tuple[int, int], minimum_flanking: int = 5)bool[source]

The following situation happens:

    ---|||segment||---
          |||||||||||||Ivl||||||||||||||||

where --- indicates the minimum flanking

end_overlaps_with_part_of(segment: Tuple[int, int], minimum_flanking: int = 5)bool[source]

The following situation happens:

                               ---|||segment||---
    |||||||||||||Ivl||||||||||||||||

where --- indicates the minimum flanking
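The two partial-overlap checks can be read as "the segment crosses one boundary of the interval with more than minimum_flanking bases on each side of it". The sketch below encodes that reading with plain tuples; the inequalities are an assumption based on the diagrams, not a copy of the library code.

```python
from typing import Tuple

Span = Tuple[int, int]


def start_overlaps_with_part_of(ivl: Span, segment: Span,
                                minimum_flanking: int = 5) -> bool:
    # The segment must extend past the interval start on both sides,
    # by more than minimum_flanking bases each
    return (segment[0] + minimum_flanking < ivl[0]
            and segment[1] - minimum_flanking > ivl[0])


def end_overlaps_with_part_of(ivl: Span, segment: Span,
                              minimum_flanking: int = 5) -> bool:
    # Same test, applied to the interval end
    return (segment[0] + minimum_flanking < ivl[1]
            and segment[1] - minimum_flanking > ivl[1])


ivl = (100, 200)
crosses_start = start_overlaps_with_part_of(ivl, (80, 120))
barely_crosses = start_overlaps_with_part_of(ivl, (80, 103))
crosses_end = end_overlaps_with_part_of(ivl, (180, 220))
inside_only = end_overlaps_with_part_of(ivl, (120, 180))
print(crosses_start, barely_crosses, crosses_end, inside_only)
```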

velocyto.indexes module

class velocyto.indexes.TransciptsIndex(trascript_models: List[velocyto.transcript_model.TranscriptModel])[source]

Bases: object

transcipt_models
tidx
maxtidx
property scan_not_terminated: bool

Returns False once the whole chromosome has been scanned

find_overlapping_trascript_models(read: velocyto.read.Read)Set[velocyto.transcript_model.TranscriptModel][source]

Finds all the Transcript models the Read overlaps with

Parameters

read (vcy.Read) – the read object to be analyzed

Returns

matched_transcripts – The TranscriptModel objects the read overlaps with. The kind of overlap is one of vcy.MATCH_INSIDE (1), vcy.MATCH_OVER5END (2), vcy.MATCH_OVER3END (4)

Return type

set of vcy.TranscriptModel
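The attributes tidx and maxtidx suggest a forward scan over transcript models sorted by position. The sketch below shows one way such a scan could work, assuming reads arrive sorted by start coordinate. ToyTranscriptsIndex, the tuple representations, and the method name find_overlapping are hypothetical and not part of velocyto.

```python
from typing import List, Set, Tuple

# Toy transcript model: (name, start, end); toy read: (start, end).
Model = Tuple[str, int, int]


class ToyTranscriptsIndex:
    """Hypothetical sweep over transcript models sorted by start position."""

    def __init__(self, models: List[Model]) -> None:
        self.models = sorted(models, key=lambda m: m[1])
        self.tidx = 0
        self.maxtidx = len(self.models) - 1

    @property
    def scan_not_terminated(self) -> bool:
        return self.tidx <= self.maxtidx

    def find_overlapping(self, read: Tuple[int, int]) -> Set[str]:
        # Skip models that end upstream of the read: they cannot match this
        # read or any later one (reads are assumed sorted by start)
        while self.scan_not_terminated and self.models[self.tidx][2] < read[0]:
            self.tidx += 1
        matched: Set[str] = set()
        i = self.tidx
        # Look ahead at models that start before the read ends
        while i <= self.maxtidx and self.models[i][1] <= read[1]:
            if self.models[i][2] >= read[0]:
                matched.add(self.models[i][0])
            i += 1
        return matched


index = ToyTranscriptsIndex([("A", 100, 500), ("B", 300, 900), ("C", 1200, 1500)])
first = index.find_overlapping((350, 400))
second = index.find_overlapping((1000, 1100))
third = index.find_overlapping((1600, 1700))
print(first, second, third, index.scan_not_terminated)
```

Once the last read has passed every model, scan_not_terminated becomes False and the caller can stop scanning that chromosome.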

class velocyto.indexes.FeatureIndex(ivls: List[velocyto.feature.Feature] = [])[source]

Bases: object

Search helper class used to find the intervals that a read spans

property last_interval_not_reached: bool
reset()None[source]

Sets the current feature to the first feature

has_ivls_enclosing(read: velocyto.read.Read)bool[source]

Finds out whether there are intervals that fully contain all the read segments

Parameters

read (vcy.Read) – the read object to be analyzed

Returns

response – True if one has been found

Return type

bool

mark_overlapping_ivls(read: velocyto.read.Read)None[source]

Finds the overlap between the Read and the Features and marks intronic features if they are spanned

Parameters

read (vcy.Read) – the read object to be analyzed

Returns

Nothing; it marks the vcy.Feature object (is_validated = True) if there is evidence of exon-intron spanning

find_overlapping_ivls(read: velocyto.read.Read)Dict[velocyto.transcript_model.TranscriptModel, List[velocyto.segment_match.SegmentMatch]][source]

Finds the possible overlaps between the Read and the Features and returns a mapping record derived from that single read

Parameters

read (vcy.Read) – the read object to be analyzed

Returns

mapping_record – A record of the mappings by transcript model. Every entry contains a list of segment matches that in turn contains information on the segment and the feature

Return type

Dict[vcy.TranscriptModel, List[vcy.SegmentMatch]]

Note

  • It is possible that a segment overlaps an exon and an intron at the same time (spanning segment).

  • It is not possible that a segment overlaps two exons at the same time. In that case the read is split into two segments and the Read attribute is_spliced == True.

  • The name of the function might be confusing: if there is no valid overlap, an empty mapping record is returned.

  • Returning an empty mapping record suppresses the counting of the molecule.
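The shape of a mapping record can be illustrated with toy classes. In this sketch ToyFeature, ToySegmentMatch and find_overlapping are hypothetical, transcript models are represented by their names, and the overlap test ignores minimum flanking. It shows a spanning segment producing two matches for one transcript, and an empty record for a read that overlaps nothing.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


@dataclass(frozen=True)
class ToyFeature:
    start: int
    end: int
    kind: str        # "e" for exon, "i" for intron
    transcript: str  # name of the owning transcript model


@dataclass
class ToySegmentMatch:
    segment: Tuple[int, int]
    feature: ToyFeature


def find_overlapping(segments: List[Tuple[int, int]],
                     features: List[ToyFeature]
                     ) -> DefaultDict[str, List[ToySegmentMatch]]:
    """Build a toy mapping record: transcript name -> list of segment matches."""
    record: DefaultDict[str, List[ToySegmentMatch]] = defaultdict(list)
    for seg in segments:
        for feat in features:
            # Plain interval overlap (no minimum flanking in this sketch)
            if seg[0] <= feat.end and seg[1] >= feat.start:
                record[feat.transcript].append(ToySegmentMatch(seg, feat))
    return record


features = [ToyFeature(100, 200, "e", "tx1"),
            ToyFeature(201, 300, "i", "tx1"),
            ToyFeature(301, 400, "e", "tx1")]
spanning = find_overlapping([(190, 210)], features)
nothing = find_overlapping([(1000, 1050)], features)
kinds = [m.feature.kind for m in spanning["tx1"]]
counted = bool(nothing)  # an empty record means the molecule is not counted
print(kinds, counted)
```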

velocyto.molitem module

velocyto.molitem.dictionary_union(d1: DefaultDict[Any, List], d2: DefaultDict[Any, List])DefaultDict[Any, List][source]

Set union (|) operation on default dictionary

Parameters
  • d1 (defaultdict) – First default dict

  • d2 (defaultdict) – Second default dict

Returns

  • A dictionary with the key the set union of the keys.

  • If same key is present the entry will be combined using __add__
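A minimal sketch of the described behaviour, written independently of the library source: every key from either input is kept, and lists under a shared key are concatenated with +. The use of .get to avoid touching the inputs is a design choice of this sketch.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List


def dictionary_union(d1: DefaultDict[Any, List],
                     d2: DefaultDict[Any, List]) -> DefaultDict[Any, List]:
    """Toy union: keep every key; concatenate lists for keys found in both."""
    result: DefaultDict[Any, List] = defaultdict(list)
    for key in set(d1) | set(d2):
        # .get avoids inserting missing keys into the input defaultdicts
        result[key] = d1.get(key, []) + d2.get(key, [])
    return result


d1 = defaultdict(list, {"a": [1], "b": [2]})
d2 = defaultdict(list, {"b": [3], "c": [4]})
union = dictionary_union(d1, d2)
print(dict(union))
```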

velocyto.molitem.dictionary_intersect(d1: DefaultDict[Any, List], d2: DefaultDict[Any, List])DefaultDict[Any, List][source]

Set intersection (&) operation on default dictionary

Parameters
  • d1 (defaultdict) – First default dict

  • d2 (defaultdict) – Second default dict

Returns

  • A dictionary with the key the set intersection of the keys.

  • If same key is present the entry will be combined using __add__
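The intersection can be sketched the same way, again independently of the library source: only keys present in both inputs survive, and their lists are concatenated with +.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List


def dictionary_intersect(d1: DefaultDict[Any, List],
                         d2: DefaultDict[Any, List]) -> DefaultDict[Any, List]:
    """Toy intersection: keep only shared keys; concatenate their lists."""
    result: DefaultDict[Any, List] = defaultdict(list)
    for key in set(d1) & set(d2):
        result[key] = d1[key] + d2[key]
    return result


d1 = defaultdict(list, {"a": [1], "b": [2]})
d2 = defaultdict(list, {"b": [3], "c": [4]})
common = dictionary_intersect(d1, d2)
print(dict(common))
```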

class velocyto.molitem.Molitem[source]

Bases: object

Object that represents a molecule in the counting pipeline

mappings_record
add_mappings_record(mappings_record: DefaultDict[velocyto.transcript_model.TranscriptModel, List[velocyto.segment_match.SegmentMatch]])None[source]

velocyto.gene_info module

class velocyto.gene_info.GeneInfo(genename: str, geneid: str, chromstrand: str, start: int, end: int)[source]

Bases: object

A simple object that stores basic info on a gene. Parsed from the .gtf file and used to build the row_attrs of the loom file

genename
geneid
chrom
strand
start
end

velocyto.read module

class velocyto.read.Read(bc: str, umi: str, chrom: str, strand: str, pos: int, segments: List, clip5: Any, clip3: Any, ref_skipped: bool)[source]

Bases: object

Container for reads from sam alignment file

bc
umi
chrom
strand
pos
segments
clip5
clip3
ref_skipped
property is_spliced: bool
property start: int
property end: int
property span: int
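The properties are listed without descriptions. The sketch below shows one plausible way they could be derived from segments and ref_skipped; ToyRead is a hypothetical stand-in with only a few of the fields, and the inclusive-coordinate span is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyRead:
    """Hypothetical stand-in for velocyto.read.Read with only a few fields."""
    pos: int
    segments: List[Tuple[int, int]]
    ref_skipped: bool

    @property
    def is_spliced(self) -> bool:
        # Assumed: a read is spliced when the alignment skips reference bases
        return self.ref_skipped

    @property
    def start(self) -> int:
        # First base of the first aligned segment
        return self.segments[0][0]

    @property
    def end(self) -> int:
        # Last base of the last aligned segment
        return self.segments[-1][1]

    @property
    def span(self) -> int:
        # Inclusive coordinates assumed
        return self.end - self.start + 1


read = ToyRead(pos=100, segments=[(100, 149), (1050, 1099)], ref_skipped=True)
print(read.start, read.end, read.span, read.is_spliced)
```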