cell2net.preprocessing.add_dna_sequence
- cell2net.preprocessing.add_dna_sequence(mdata, ref_fasta, mod_name='atac', chr_var_key='chr', start_var_key='start', end_var_key='end', sequence_var_key='dna_sequence')
Add sequences to peak metadata in a MuData object.
This function retrieves DNA sequences for genomic regions specified in the .var attribute of the AnnData object within a MuData object. The sequences are fetched from a reference FASTA file and added as metadata under the specified key.
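To make the retrieval step concrete, the following is a minimal pure-Python sketch of looking up one sequence per region from FASTA text. It is not cell2net's implementation: the helpers `read_fasta` and `fetch_sequences` are hypothetical names, the toy genome is made up, and the sketch assumes 0-based, half-open coordinates (BED style), which this page does not specify. It also loads whole sequences into memory rather than using a FASTA index.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence name to sequence string."""
    sequences = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if name is not None:
                sequences[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def fetch_sequences(genome, chroms, starts, ends):
    """Return one sequence per (chrom, start, end) region.

    ASSUMPTION: coordinates are 0-based and half-open, as in BED files.
    The real function may use a different convention.
    """
    return [genome[c][s:e] for c, s, e in zip(chroms, starts, ends)]


# Toy reference with two short "chromosomes" (illustrative data only).
toy_fasta = io.StringIO(">chr1\nACGTACGTAC\nGGTTAACC\n>chr2\nTTTTCCCCGGGGAAAA\n")
genome = read_fasta(toy_fasta)

# These three lists play the role of the chr/start/end columns in .var.
peak_sequences = fetch_sequences(genome, ["chr1", "chr2"], [2, 4], [6, 12])
print(peak_sequences)
```

In the real function the resulting list would correspond to the column written to .var under sequence_var_key, with one entry per peak.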
- Parameters:
mdata (MuData) – A MuData object containing the modality with peak metadata.
ref_fasta (str) – Path to the reference FASTA file. This file must be indexed (e.g., with samtools faidx).
mod_name (str, default: 'atac') – The name of the modality containing peak data.
chr_var_key (str, default: 'chr') – The key in .var that contains chromosome names.
start_var_key (str, default: 'start') – The key in .var that contains the start positions of peaks.
end_var_key (str, default: 'end') – The key in .var that contains the end positions of peaks.
sequence_var_key (str, default: 'dna_sequence') – The key under which the retrieved DNA sequences will be stored in .var.
- Return type:
None
- Returns:
None. The function modifies the MuData object in place, adding the DNA sequences to the .var attribute of the selected modality under sequence_var_key.
- Raises:
AssertionError – If the specified modality (mod_name) is not found in the MuData object.
FileNotFoundError – If the ref_fasta file does not exist or is not properly indexed.
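Since a missing file or index is reported as FileNotFoundError, it can be convenient to check both before calling the function. The sketch below assumes the index sits next to the FASTA with a `.fai` suffix, which is where `samtools faidx` writes it; `check_indexed_fasta` is a hypothetical helper, not part of cell2net, and the function's own checks may differ.

```python
import tempfile
from pathlib import Path


def check_indexed_fasta(ref_fasta):
    """Raise FileNotFoundError if the FASTA or its .fai index is missing.

    ASSUMPTION: the index lives at '<ref_fasta>.fai', the default location
    used by `samtools faidx`.
    """
    fasta = Path(ref_fasta)
    index = Path(str(fasta) + ".fai")
    if not fasta.is_file():
        raise FileNotFoundError(f"Reference FASTA not found: {fasta}")
    if not index.is_file():
        raise FileNotFoundError(
            f"FASTA index not found: {index} (run `samtools faidx {fasta}`)"
        )
    return fasta


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    fasta = Path(tmp) / "reference.fasta"
    fasta.write_text(">chr1\nACGT\n")

    # Without an index the check fails.
    try:
        check_indexed_fasta(fasta)
        missing_index_detected = False
    except FileNotFoundError:
        missing_index_detected = True

    # Write a hand-made index (name, length, offset, line bases, line width).
    Path(str(fasta) + ".fai").write_text("chr1\t4\t6\t4\t5\n")
    index_found = check_indexed_fasta(fasta) == fasta
```

Running the check first gives a clearer error message that names which of the two files is missing.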
Examples
>>> from mudata import MuData
>>> import anndata as ad
>>> import pandas as pd
>>> import cell2net as cn
>>> data = ad.AnnData(var=pd.DataFrame({
...     "chr": ["chr1", "chr2"],
...     "start": [100, 200],
...     "end": [150, 250]
... }))
>>> mdata = MuData({"atac": data})
>>> cn.pp.add_dna_sequence(mdata, ref_fasta="reference.fasta")
>>> print(mdata["atac"].var["dna_sequence"])
0    ATCGTTGAC...
1    TGGCCAATA...