cell2net.preprocessing.fragments_to_coverage

cell2net.preprocessing.fragments_to_coverage(df_fragments, chrom_sizes, normalize=True, scaling_factor=1.0, cut_sites=False, extend_cut_sites=0)

Convert fragment data to genome coverage signal.

This function processes fragment data and generates genome coverage or cut-site signal, which can be used for creating BigWig files or similar outputs.

Parameters:

df_fragments (DataFrame) – A Polars DataFrame containing fragment data. Must include the columns: ‘Chromosome’, ‘Start’, and ‘End’.
chrom_sizes (dict[str, int]) – Dictionary mapping chromosome names to their respective sizes.
normalize (bool (default: True)) – If True, normalize the coverage values to Reads Per Million (RPM).
scaling_factor (float (default: 1.0)) – A scaling factor applied to the signal values. Only used if normalize is True.
cut_sites (bool (default: False)) – If True, use the 1 bp Tn5 cut sites (the start and end of each fragment) instead of the whole fragment length for the coverage calculation.
extend_cut_sites (int (default: 0)) – If cut_sites is True, extend each cut site by this number of base pairs both upstream and downstream.

Yields:

A tuple containing:

chroms (numpy.ndarray): Chromosome names for each coverage interval.
starts (numpy.ndarray): Start positions of coverage intervals.
ends (numpy.ndarray): End positions of coverage intervals.
values (numpy.ndarray): Signal values for each coverage interval.

Notes

The df_fragments DataFrame is partitioned by chromosome for efficient processing.
The chrom_sizes dictionary defines the size of each chromosome and is used to initialize arrays.
If cut_sites is True, the coverage is computed at the fragment boundaries rather than the entire fragment range.
Normalization scales the signal to RPM, and an additional scaling factor can further adjust the signal values.
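
The notes above can be illustrated with a small, self-contained sketch. It is not the cell2net implementation: the helper names coverage_intervals and rpm_scale are hypothetical, it uses plain Python lists rather than NumPy arrays, and it assumes 0-based, end-exclusive coordinates, that the second cut site sits at End - 1, that zero-coverage runs are omitted, and that RPM divides by the total number of fragments. Treat each of these as an assumption to check against the library.

```python
from itertools import groupby


def coverage_intervals(fragments, chrom_size, cut_sites=False, extend=0):
    """Return (start, end, value) runs of constant coverage on one chromosome.

    fragments: list of (start, end) pairs, assumed 0-based and end-exclusive.
    """
    if cut_sites:
        # Assumed convention: 1 bp cut sites at `start` and `end - 1`,
        # each widened by `extend` bp on both sides.
        spans = []
        for s, e in fragments:
            spans.append((s - extend, s + 1 + extend))
            spans.append((e - 1 - extend, e + extend))
    else:
        spans = list(fragments)

    # Difference array: +1 where a span opens, -1 where it closes.
    diff = [0] * (chrom_size + 1)
    for s, e in spans:
        s, e = max(s, 0), min(e, chrom_size)  # clip to chromosome bounds
        if s < e:
            diff[s] += 1
            diff[e] -= 1

    # Prefix sum turns the difference array into per-base depth.
    depth, running = [], 0
    for d in diff[:chrom_size]:
        running += d
        depth.append(running)

    # Run-length encode the depth, skipping zero-coverage runs.
    intervals, pos = [], 0
    for value, run in groupby(depth):
        length = sum(1 for _ in run)
        if value:
            intervals.append((pos, pos + length, value))
        pos += length
    return intervals


def rpm_scale(intervals, n_fragments, scaling_factor=1.0):
    """Scale raw counts to reads per million (assumed denominator: n_fragments)."""
    factor = scaling_factor * 1_000_000 / n_fragments
    return [(s, e, v * factor) for s, e, v in intervals]


overlapping = coverage_intervals([(100, 150), (120, 170)], chrom_size=1000)
cuts = coverage_intervals([(100, 150)], chrom_size=1000, cut_sites=True)
scaled = rpm_scale([(100, 150, 1)], n_fragments=2)
```

Here overlapping fragments produce a run of depth 2 where they intersect, while cut_sites=True leaves signal only at the fragment boundaries.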

Examples

>>> import polars as pl
>>> import cell2net as cn
>>> df_fragments = pl.DataFrame(
...     {"Chromosome": ["chr1", "chr1", "chr2"], "Start": [100, 200, 300], "End": [150, 250, 350]}
... )
>>> chrom_sizes = {"chr1": 1000, "chr2": 500}
>>> results = cn.pp.fragments_to_coverage(df_fragments, chrom_sizes, normalize=False)
>>> for chroms, starts, ends, values in results:
...     print(chroms, starts, ends, values)
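
One possible way to consume the yielded tuples is to format them as bedGraph lines, a plain-text precursor to BigWig. The helper to_bedgraph_lines below is hypothetical and not part of cell2net, and the input lists are hand-written stand-ins for one yielded tuple rather than actual function output.

```python
def to_bedgraph_lines(chroms, starts, ends, values):
    """Format parallel interval sequences as tab-separated bedGraph lines."""
    return [
        f"{c}\t{int(s)}\t{int(e)}\t{float(v):g}"
        for c, s, e, v in zip(chroms, starts, ends, values)
    ]


# Hand-written stand-in for one (chroms, starts, ends, values) tuple.
lines = to_bedgraph_lines(
    ["chr1", "chr1"], [100, 200], [150, 250], [1.0, 1.0]
)
print("\n".join(lines))
```

The same helper should also accept NumPy arrays, since each element is converted with int() or float() before formatting.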