The bed12 module
Defines a set of generic functions to parse and process BED12 files.
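Every function in this module takes a `read`: a list of twelve strings, one per BED12 field. As a minimal sketch, assuming tab-separated input, such a list could be built from a raw line as follows. The helper name `parse_bed12_line` is hypothetical and is not part of `umicount.bed12`.

```python
def parse_bed12_line(line):
    """Split one tab-separated BED12 line into a list of twelve strings.

    Hypothetical helper for illustration; not part of umicount.bed12.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError("expected 12 fields, got %d" % len(fields))
    return fields


line = "chrX\t100\t200\ttoto\t12\t+\t100\t110\t255,0,0\t2\t21,25\t0,75\n"
read = parse_bed12_line(line)
print(read)
```

All fields stay as strings here, which matches the `read` lists used in the doctests on this page.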
-
umicount.bed12.blocks_to_absolute_start_end(read)
Calculate the absolute start and end of each block from a BED12 line.
- Args:
- read: A list of twelve elements where each element refers to a field in the BED format.
- Returns:
- A list of tuples, where each tuple contains the absolute start and end coordinates of a block.
>>> read = ['chrX', '100', '200', 'toto', '12', '+', '100', '110', '255,0,0', '2', '21,25', '0,75']
>>> blocks_to_absolute_start_end(read)
[(100, 121), (175, 200)]
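The conversion amounts to adding the read's start (field 2) to each relative block start (field 12), then adding the block size (field 11) to get the end. The following is a sketch of that arithmetic, not the actual `umicount` implementation; the function name `blocks_to_absolute_start_end_sketch` is hypothetical, and ignoring a trailing comma in the block fields is an assumption.

```python
def blocks_to_absolute_start_end_sketch(read):
    """Illustrative re-implementation; the real umicount code may differ."""
    chrom_start = int(read[1])
    # Assumption: skip empty items so a trailing comma ("21,25,") is tolerated.
    sizes = [int(s) for s in read[10].split(",") if s]
    starts = [int(s) for s in read[11].split(",") if s]
    # absolute start = chromStart + relative start; absolute end = start + size
    return [
        (chrom_start + offset, chrom_start + offset + size)
        for offset, size in zip(starts, sizes)
    ]


read = ['chrX', '100', '200', 'toto', '12', '+', '100', '110',
        '255,0,0', '2', '21,25', '0,75']
print(blocks_to_absolute_start_end_sketch(read))  # [(100, 121), (175, 200)]
```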
-
umicount.bed12.get_chrom(read)
Get the chromosome from a BED12 line.
- Args:
- read: A list of twelve elements where each element refers to a field in the BED format.
- Returns:
- The chromosome name as a string.
>>> read = ['chrX', '100', '200', 'toto', '12', '+', '100', '110', '255,0,0', '2', '21,25', '0,75']
>>> get_chrom(read)
'chrX'
-
umicount.bed12.get_end(read)
Get the end position from a BED12 line.
- Args:
- read: A list of twelve elements where each element refers to a field in the BED format.
- Returns:
- An integer representing the end position of the read.
>>> read = ['chrX', '100', '200', 'toto', '12', '+', '100', '110', '255,0,0', '2', '21,25', '0,75']
>>> get_end(read)
200
-
umicount.bed12.get_start(read)
Get the start position from a BED12 line.
- Args:
- read: A list of twelve elements where each element refers to a field in the BED format.
- Returns:
- An integer representing the start position of the read.
>>> read = ['chrX', '100', '200', 'toto', '12', '+', '100', '110', '255,0,0', '2', '21,25', '0,75']
>>> get_start(read)
100
-
umicount.bed12.get_strand(read)
Get the strand from a BED12 line.
- Args:
- read: A list of twelve elements where each element refers to a field in the BED format.
- Returns:
- A single character representing the strand of the read.
>>> read = ['chrX', '100', '200', 'toto', '12', '+', '100', '110', '255,0,0', '2', '21,25', '0,75']
>>> get_strand(read)
'+'
-
umicount.bed12.get_tss(read)
Get the Transcription Start Site (TSS) from a BED12 line.
- Args:
- read: A list of twelve elements where each element refers to a field in the BED format.
- Returns:
- An integer: the start position if the read is on the plus strand, or the end position if the read is on the minus strand.
>>> read = ['chrX', '100', '200', 'toto', '12', '+', '100', '110', '255,0,0', '2', '21,25', '0,75']
>>> get_tss(read)
100
>>> read = ['chrX', '100', '200', 'toto', '12', '-', '100', '110', '255,0,0', '2', '21,25', '0,75']
>>> get_tss(read)
200
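The strand rule can be sketched as below. This is an illustration of the documented behaviour, not the actual `umicount` code; the name `get_tss_sketch` is hypothetical, and raising `ValueError` for a strand other than `+` or `-` is an assumption, since the documentation does not say how such reads are handled.

```python
def get_tss_sketch(read):
    """Illustrative re-implementation; the real umicount code may differ."""
    start, end, strand = int(read[1]), int(read[2]), read[5]
    if strand == "+":
        return start  # transcription begins at the left edge of the read
    if strand == "-":
        return end    # transcription begins at the right edge of the read
    # Assumption: anything else (e.g. '.') is rejected rather than guessed.
    raise ValueError("unsupported strand: %r" % strand)


plus = ['chrX', '100', '200', 'toto', '12', '+', '100', '110',
        '255,0,0', '2', '21,25', '0,75']
minus = plus[:5] + ['-'] + plus[6:]
print(get_tss_sketch(plus), get_tss_sketch(minus))  # 100 200
```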
-
umicount.bed12.merge_overlapping_blocks(reads)
Merge blocks if they overlap.
- Args:
- reads: A list of reads in the BED12 format.
- Returns:
- Two lists: the first contains the block sizes and the second contains the block starts. All values are integers.
>>> reads = []
>>> reads.append(['chrX', '100', '200', 'toto', '12', '+', '100', '110', '255,0,0', '2', '20,25', '0,75'])
>>> reads.append(['chrX', '100', '200', 'toto', '12', '+', '100', '110', '255,0,0', '3', '10,10,25', '0,15,75'])
>>> merge_overlapping_blocks(reads)
([25, 25], [0, 75])
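One way to obtain this result is a standard sort-and-sweep interval merge over the absolute block coordinates of all reads. The sketch below is an illustration under stated assumptions, not the actual `umicount` implementation: the name `merge_overlapping_blocks_sketch` is hypothetical, blocks that merely touch are merged as well, and the returned starts are taken relative to the smallest start position among the reads.

```python
def merge_overlapping_blocks_sketch(reads):
    """Illustrative re-implementation; the real umicount code may differ."""
    if not reads:
        return [], []
    # Collect every block of every read as an absolute (start, end) interval.
    intervals = []
    for read in reads:
        chrom_start = int(read[1])
        sizes = [int(s) for s in read[10].split(",") if s]
        starts = [int(s) for s in read[11].split(",") if s]
        for offset, size in zip(starts, sizes):
            intervals.append((chrom_start + offset, chrom_start + offset + size))
    # Sweep over intervals sorted by start, extending the last merged block
    # whenever the next one begins at or before its end.
    # Assumption: touching blocks (start == previous end) are merged too.
    intervals.sort()
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Assumption: starts are reported relative to the smallest read start.
    origin = min(int(read[1]) for read in reads)
    block_sizes = [end - start for start, end in merged]
    block_starts = [start - origin for start, _ in merged]
    return block_sizes, block_starts


reads = [
    ['chrX', '100', '200', 'toto', '12', '+', '100', '110',
     '255,0,0', '2', '20,25', '0,75'],
    ['chrX', '100', '200', 'toto', '12', '+', '100', '110',
     '255,0,0', '3', '10,10,25', '0,15,75'],
]
print(merge_overlapping_blocks_sketch(reads))  # ([25, 25], [0, 75])
```

In the example, the blocks 100-120, 100-110 and 115-125 collapse into a single block 100-125, while both reads share the block 175-200, giving sizes `[25, 25]` and starts `[0, 75]`.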