Fingerprint module

Use UMI to count transcripts

umicount.dedup_fingerprint.get_fingerprint(read)

Get the fingerprint id from the read’s name. The read’s name is assumed to contain the pattern FP:XXX; where XXX is the fingerprint id.

Args:
read: A list of twelve elements where each element refers to a field in the BED format.
Returns:
A string containing the fingerprint id
>>> read = ['chrX', '100', '200', 'FP:0012', '12', '+', '100', '110', '255,0,0', '2', '21,25', '0,75']
>>> get_fingerprint(read)
'0012'
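The lookup described above can be sketched with a regular expression. This is only an illustration under stated assumptions, not the module's actual implementation: the helper name extract_fingerprint is hypothetical, and the sketch assumes the id runs from "FP:" up to the next ";" or the end of the name.

```python
import re

# Hypothetical helper, not part of umicount: it only illustrates the
# FP:XXX naming convention described above.
FP_PATTERN = re.compile(r'FP:([^;]+)')

def extract_fingerprint(read):
    """Return the fingerprint id found in the name field (index 3) of a BED12 read."""
    match = FP_PATTERN.search(read[3])
    if match is None:
        # Assumption: a name without an FP: tag is treated as an error.
        raise ValueError('no FP: tag in read name: %r' % read[3])
    return match.group(1)

read = ['chrX', '100', '200', 'FP:0012', '12', '+', '100', '110',
        '255,0,0', '2', '21,25', '0,75']
print(extract_fingerprint(read))
```

Stopping at ";" lets the same pattern work when the name carries other tags after the fingerprint.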
umicount.dedup_fingerprint.print_read_to_bed12(key, reads)

Merge the reads by blocks and print a single read in the BED12 format on stdout. It assumes that the reads are on the same TSS and contain fingerprint information in the read’s name.

Args:

key: A tuple that contains the chromosome, barcode and fingerprint information.

reads: A list of reads (each read is itself a list of BED12 fields) from the same TSS that have a similar barcode and fingerprint.

>>> reads = []
>>> reads.append(['chrX', '100', '200', 'FP:0012', '12', '+', '100', '110', '255,0,0', '2', '20,25', '0,75'])
>>> reads.append(['chrX', '100', '300', 'FP:0012', '12', '+', '100', '110', '255,0,0', '3', '20,25', '0,175'])
>>> print_read_to_bed12(('chrX', '0012'), reads) 
chrX    100 300     FP:0012 2       +       100     120     255,0,0 3       20,25,25        0,75,175
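The block merge shown in the example can be sketched as an interval union. This is an assumption about the behaviour inferred from the example output, not the module's actual code: the helper name merge_blocks is hypothetical, it expects a non-empty list of reads, and it only computes the start, end and block fields (not the score or the thick coordinates).

```python
def merge_blocks(reads):
    """Union the blocks of several BED12 reads (hypothetical helper).

    Returns (chromStart, chromEnd, blockCount, blockSizes, blockStarts).
    """
    intervals = []
    for read in reads:
        chrom_start = int(read[1])
        sizes = [int(s) for s in read[10].split(',') if s]
        starts = [int(s) for s in read[11].split(',') if s]
        # Convert each block to absolute (begin, end) coordinates.
        for size, start in zip(sizes, starts):
            begin = chrom_start + start
            intervals.append((begin, begin + size))

    # Sort, then merge blocks that overlap or touch.
    intervals.sort()
    merged = [list(intervals[0])]
    for begin, end in intervals[1:]:
        if begin <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([begin, end])

    chrom_start = merged[0][0]
    chrom_end = merged[-1][1]
    block_sizes = ','.join(str(end - begin) for begin, end in merged)
    block_starts = ','.join(str(begin - chrom_start) for begin, _ in merged)
    return chrom_start, chrom_end, len(merged), block_sizes, block_starts

reads = [
    ['chrX', '100', '200', 'FP:0012', '12', '+', '100', '110',
     '255,0,0', '2', '20,25', '0,75'],
    ['chrX', '100', '300', 'FP:0012', '12', '+', '100', '110',
     '255,0,0', '3', '20,25', '0,175'],
]
print(merge_blocks(reads))
```

For the two reads above, the sketch yields the same start, end, block count, block sizes and block starts as the printed BED12 line.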