cell2net.preprocessing.add_peaks
- cell2net.preprocessing.add_peaks(mdata, mod_name='atac', delimiter='-', peak_len=256, chr_var_key='chr', start_var_key='start', end_var_key='end', summit_var_key='summit')
Add peak metadata to an ATAC-seq modality in a MuData object.
This function parses peak information from variable names in the AnnData object of a specified modality within a MuData object. It computes the genomic coordinates (chromosome, start, end, and summit) for each peak and adds them as metadata in the .var attribute of the AnnData object.
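The parsing and standardization described above can be sketched in plain Python for a single variable name. This is a minimal sketch, not cell2net's implementation: `parse_peak_name` is a hypothetical helper, and it assumes the standardized window is centred on the integer midpoint (summit) of the original peak. The package's own rounding or window placement may differ.

```python
def parse_peak_name(name: str, delimiter: str = "-", peak_len: int = 256):
    """Hypothetical helper: parse 'chrom{d}start{d}end' and standardize the window.

    Assumption: the peak_len window is centred on the integer midpoint (summit).
    """
    parts = name.split(delimiter)
    if len(parts) != 3:
        raise ValueError(f"expected chrom{delimiter}start{delimiter}end, got {name!r}")
    chrom, start, end = parts[0], int(parts[1]), int(parts[2])
    summit = (start + end) // 2          # integer midpoint of the original peak
    new_start = summit - peak_len // 2   # half of the window to the left of the summit
    new_end = new_start + peak_len       # keeps the length exactly peak_len
    return chrom, new_start, new_end, summit


for peak in ["chr1-100-200", "chr2-300-400"]:
    print(parse_peak_name(peak))
```

Computing `new_end` from `new_start` rather than from the summit keeps the window length exactly `peak_len` even when `peak_len` is odd.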
- Parameters:
mdata (MuData) – A MuData object containing the ATAC-seq modality to be updated.
mod_name (str, default: 'atac') – The name of the modality containing the peak data.
delimiter (default: '-') – The delimiter used to split the variable names in the AnnData object.
peak_len (int, default: 256) – The standardized length of the peaks. The midpoint of each peak is computed, and the start and end positions are adjusted so that every peak has this length.
chr_var_key (str, default: 'chr') – The key under which chromosome names are stored in the .var attribute.
start_var_key (str, default: 'start') – The key under which the start positions of peaks are stored in the .var attribute.
end_var_key (str, default: 'end') – The key under which the end positions of peaks are stored in the .var attribute.
summit_var_key (str, default: 'summit') – The key under which the summit (midpoint) positions of peaks are stored in the .var attribute.
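To illustrate how the four `*_var_key` parameters name the output columns, the sketch below applies the same logic to a bare `.var` DataFrame with pandas. It is a hypothetical stand-in, not the package's code: `annotate_var` returns a modified copy, whereas `add_peaks` modifies the MuData object in place and returns None, and centring the window on the summit is again an assumption.

```python
import pandas as pd


def annotate_var(
    var: pd.DataFrame,
    delimiter: str = "-",
    peak_len: int = 256,
    chr_var_key: str = "chr",
    start_var_key: str = "start",
    end_var_key: str = "end",
    summit_var_key: str = "summit",
) -> pd.DataFrame:
    """Hypothetical sketch: return a copy of `var` with peak-coordinate columns."""
    # Split every peak name in the index into chrom / start / end fields.
    parts = var.index.to_series().str.split(delimiter, expand=True)
    if parts.shape[1] != 3 or parts.isna().any().any():
        raise ValueError("every peak name must have exactly three fields")

    start = parts[1].astype(int)
    end = parts[2].astype(int)
    summit = (start + end) // 2  # integer midpoint of the original peak

    out = var.copy()
    # Column names come from the *_var_key arguments.
    out[chr_var_key] = parts[0].to_numpy()
    # Assumption: the peak_len window is centred on the summit.
    out[start_var_key] = (summit - peak_len // 2).to_numpy()
    out[end_var_key] = out[start_var_key] + peak_len
    out[summit_var_key] = summit.to_numpy()
    return out


var = pd.DataFrame(index=["chr1-100-200", "chr2-300-400"])
print(annotate_var(var, chr_var_key="chrom", summit_var_key="peak_summit"))
```

Working on whole columns at once avoids a Python-level loop over peaks, which matters for ATAC-seq matrices with hundreds of thousands of peaks.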
- Return type:
None
- Returns:
None. The function modifies the MuData object in place by adding the computed peak metadata to the .var attribute of the specified modality.
- Raises:
AssertionError – If the specified modality (mod_name) is not found in the MuData object.
Notes
The variable names in the AnnData object are expected to follow the format chromosome{delimiter}start{delimiter}end (e.g., “chr1-100-200”).
The peak summit is calculated as the midpoint of the start and end positions, and the peak length is standardized to peak_len.
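Because variable names that deviate from the chromosome{delimiter}start{delimiter}end format cannot be split into three fields, it may help to screen them before calling add_peaks. The helper below is hypothetical and not part of cell2net; its `start <= end` test is an extra sanity check that this documentation does not state.

```python
def find_malformed_peaks(names, delimiter="-"):
    """Hypothetical helper: return the names that do not match chrom{d}start{d}end."""
    malformed = []
    for name in names:
        parts = name.split(delimiter)
        ok = (
            len(parts) == 3              # exactly chrom, start, end
            and parts[0] != ""           # non-empty chromosome name
            and parts[1].isdecimal()     # start is a non-negative integer
            and parts[2].isdecimal()     # end is a non-negative integer
            and int(parts[1]) <= int(parts[2])  # extra sanity check (assumption)
        )
        if not ok:
            malformed.append(name)
    return malformed


print(find_malformed_peaks(["chr1-100-200", "chr1:100-200", "chr2-400-300"]))
```

In practice the input could be `mdata[mod_name].var_names`; an empty result means every name splits cleanly with the chosen delimiter.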
Examples
>>> from mudata import MuData
>>> import anndata as ad
>>> import pandas as pd
>>> import cell2net as cn
>>> data = ad.AnnData(var=pd.DataFrame(index=["chr1-100-200", "chr2-300-400"]))
>>> mdata = MuData({"atac": data})
>>> cn.pp.add_peaks(mdata, mod_name="atac", peak_len=256)
>>> print(mdata["atac"].var)
    chr  start  end  summit
0  chr1     72  328     150
1  chr2    272  528     350