cell2net.preprocessing.add_peaks

cell2net.preprocessing.add_peaks(mdata, mod_name='atac', delimiter='-', peak_len=256, chr_var_key='chr', start_var_key='start', end_var_key='end', summit_var_key='summit')

Add peak metadata to an ATAC-seq modality in a MuData object.

This function parses peak information from variable names in the AnnData object of a specified modality within a MuData object. It computes the genomic coordinates (chromosome, start, end, and summit) for each peak and adds them as metadata in the .var attribute of the AnnData object.

Parameters:
  • mdata (MuData) – A MuData object containing the ATAC-seq modality to be updated.

  • mod_name (str (default: 'atac')) – The name of the modality containing the peak data. Defaults to “atac”.

  • delimiter (default: '-') – The delimiter used to split the variable names in the AnnData object. Defaults to “-”.

  • peak_len (int (default: 256)) – The standardized length of the peaks. The midpoint of each peak is computed, and the start and end positions are adjusted to match this length. Defaults to 256.

  • chr_var_key (str (default: 'chr')) – The key under which chromosome names will be stored in the .var attribute. Defaults to “chr”.

  • start_var_key (str (default: 'start')) – The key under which the start positions of peaks will be stored in the .var attribute. Defaults to “start”.

  • end_var_key (str (default: 'end')) – The key under which the end positions of peaks will be stored in the .var attribute. Defaults to “end”.

  • summit_var_key (str (default: 'summit')) – The key under which the summit (midpoint) positions of peaks will be stored in the .var attribute. Defaults to “summit”.

Return type:

None

Returns:

None. The function modifies the MuData object in place, adding the computed peak metadata to the .var attribute of the specified modality.

Raises:

AssertionError – If the specified modality (mod_name) is not found in the MuData object.
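Because a missing modality surfaces as a bare AssertionError, callers may prefer to check for it up front and fail with a more descriptive message. The following is a minimal sketch of such a guard. It relies only on the fact that a MuData object exposes its modalities through the dict-like .mod attribute; the helper name require_modality and the SimpleNamespace stand-in are hypothetical and are not part of cell2net.

```python
from types import SimpleNamespace


def require_modality(mdata, mod_name="atac"):
    """Raise a descriptive KeyError if `mod_name` is not a modality of `mdata`.

    Hypothetical helper for illustration; not part of cell2net.
    """
    if mod_name not in mdata.mod:
        available = ", ".join(sorted(mdata.mod)) or "none"
        raise KeyError(f"Modality {mod_name!r} not found; available: {available}")


# Stand-in object exposing a `.mod` mapping, used here instead of a real MuData
# so that the sketch runs without any third-party packages installed.
fake_mdata = SimpleNamespace(mod={"rna": object()})

try:
    require_modality(fake_mdata, "atac")
except KeyError as err:
    print(err)
```

With a real MuData object, the same check would be run immediately before calling add_peaks.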

Notes

  • The variable names in the AnnData object are expected to follow the format chromosome{delimiter}start{delimiter}end (e.g., “chr1-100-200”).

  • The peak summit is calculated as the midpoint of the start and end positions, and the peak length is standardized to peak_len.
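The parsing and length standardization described in these notes can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper name standardize_peak is hypothetical, the summit is taken with integer division, and the standardized window is assumed to extend peak_len // 2 bases to the left of the summit.

```python
def standardize_peak(name, delimiter="-", peak_len=256):
    """Parse 'chrom{delimiter}start{delimiter}end' and re-center the peak.

    Hypothetical helper for illustration; not part of cell2net.
    """
    # Split from the right so that chromosome names that themselves contain
    # the delimiter are kept intact.
    chrom, start, end = name.rsplit(delimiter, 2)
    start, end = int(start), int(end)

    # Summit: midpoint of the original interval (integer division is an assumption).
    summit = (start + end) // 2

    # Assumed convention: a window of exactly `peak_len` bases around the summit.
    new_start = summit - peak_len // 2
    new_end = new_start + peak_len

    return {"chr": chrom, "start": new_start, "end": new_end, "summit": summit}


for peak in ["chr1-100-200", "chr2-300-400"]:
    print(standardize_peak(peak))
# {'chr': 'chr1', 'start': 22, 'end': 278, 'summit': 150}
# {'chr': 'chr2', 'start': 222, 'end': 478, 'summit': 350}
```

Note that this sketch does not clamp coordinates, so a peak close to the start of a chromosome can yield a negative start. Whether add_peaks clamps such values is not stated here.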

Examples

>>> from mudata import MuData
>>> import anndata as ad
>>> import pandas as pd
>>> import cell2net as cn
>>> data = ad.AnnData(var=pd.DataFrame(index=["chr1-100-200", "chr2-300-400"]))
>>> mdata = MuData({"atac": data})
>>> cn.pp.add_peaks(mdata, mod_name="atac", peak_len=256)
>>> print(mdata["atac"].var)
    chr  start  end  summit
0  chr1     22  278     150
1  chr2    222  478     350