cell2net.preprocessing.match_motif
- cell2net.preprocessing.match_motif(mdata, motifs, atac_mod='atac', pseudocounts=0.0001, p_value=5e-05, sequence_var_key='dna_sequence', key_added='motif_match')
Matches transcription factor motifs to accessible DNA sequences and links them with expressed genes.
This function identifies transcription factor (TF) binding motifs relevant to the genes expressed in the scRNA-seq data. It takes the DNA sequences of accessible regions from the ATAC-seq modality, scans them for TF motifs with the MOODS library, and stores the results as a sparse matrix in the ATAC modality (mdata[atac_mod].varm[key_added]).
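The scanning step can be pictured with the minimal sketch below. It is not the cell2net or MOODS implementation: MOODS uses a much faster filtering algorithm, and the helper name `scan_regions`, the matrix layout (four rows for A, C, G, T, each holding one log-odds score per motif position), the precomputed score thresholds and the forward-strand-only scan are all assumptions made for illustration. A region gets a 1 for a motif as soon as any window reaches that motif's threshold, so the output is binary.

```python
def scan_regions(sequences, matrices, thresholds):
    """Hypothetical helper: dense binary matrix, rows = regions, columns = motifs.

    matrices[j] is a log-odds matrix with four rows (A, C, G, T), each holding
    one score per motif position; thresholds[j] is that motif's score cut-off.
    Only the forward strand is scanned in this sketch.
    """
    index = {nt: i for i, nt in enumerate("ACGT")}
    result = []
    for seq in sequences:
        seq = seq.upper()
        row = []
        for matrix, threshold in zip(matrices, thresholds):
            width = len(matrix[0])
            hit = 0
            # Slide a window of the motif's width along the sequence.
            for start in range(len(seq) - width + 1):
                window = seq[start:start + width]
                if any(nt not in index for nt in window):
                    continue  # skip windows containing N or other ambiguity codes
                score = sum(matrix[index[nt]][k] for k, nt in enumerate(window))
                if score >= threshold:
                    hit = 1  # one significant window is enough for a 1
                    break
            row.append(hit)
        result.append(row)
    return result
```

If SciPy is installed, `scipy.sparse.csr_matrix(scan_regions(...))` converts the dense result into a sparse matrix of the kind stored in `.varm`.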
- Parameters:
mdata (MuData) – Multimodal data object containing both RNA and ATAC modalities.
motifs (Iterable) – A collection of motif objects. Each motif must have the attributes name, matrix_id, and counts, holding the motif's name, unique identifier, and per-position nucleotide counts, respectively.
atac_mod (str (default: 'atac')) – Key of the ATAC modality in mdata. This modality must contain the DNA accessibility data and, in its .var, the DNA sequence of each region (see sequence_var_key).
pseudocounts (float (default: 0.0001)) – Small value added to the motif counts so that no nucleotide probability is zero, which would otherwise produce infinite log-odds scores.
p_value (float (default: 5e-05)) – P-value threshold for motif matching. Lower values give fewer, more stringent matches.
sequence_var_key (str (default: 'dna_sequence')) – Column of mdata[atac_mod].var that holds the DNA sequence of each region.
key_added (str (default: 'motif_match')) – Key under which the motif match matrix is stored in mdata[atac_mod].varm.
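To illustrate the motif interface and the role of pseudocounts, the sketch below defines a hypothetical `SimpleMotif` class exposing name, matrix_id and counts, plus a hypothetical `log_odds` helper that turns its counts into a log-odds scoring matrix against a background distribution. Biopython's JASPAR motif objects appear to expose attributes with these names, but that is not confirmed by this page. The uniform background, the base-2 logarithm and the row layout (A, C, G, T) are assumptions; cell2net delegates this step to MOODS, whose exact conventions may differ.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

NUCLEOTIDES = "ACGT"


@dataclass
class SimpleMotif:
    """Hypothetical stand-in for a motif object with the attributes match_motif expects."""
    name: str
    matrix_id: str
    counts: Dict[str, List[float]]  # nucleotide -> count at each motif position


def log_odds(motif, background=(0.25, 0.25, 0.25, 0.25), pseudocounts=0.0001):
    """Hypothetical helper: convert a count matrix into a base-2 log-odds matrix.

    Returns four rows (A, C, G, T), each holding one score per motif position.
    """
    length = len(motif.counts["A"])
    rows = [[0.0] * length for _ in NUCLEOTIDES]
    for pos in range(length):
        # Add the pseudocount to every cell so that no probability is exactly zero.
        column = [motif.counts[nt][pos] + pseudocounts for nt in NUCLEOTIDES]
        total = sum(column)
        for i, count in enumerate(column):
            # Score = log2(motif probability / background probability).
            rows[i][pos] = math.log2((count / total) / background[i])
    return rows
```

With `pseudocounts=0.0`, a nucleotide never observed at a position would make `math.log2` fail on a zero probability, which is exactly the situation the pseudocount prevents.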
- Return type:
None
- Returns:
Results are added to the mdata object in place:
mdata[atac_mod].varm[key_added]: Sparse matrix indicating motif matches for each accessible DNA sequence.
- Raises:
AssertionError – If the column holding the DNA sequences (sequence_var_key, "dna_sequence" by default) is missing from mdata[atac_mod].var.
AssertionError – If the number of motifs does not match the number of overlapping genes after filtering.
ValueError – If the background parameter is not one of the predefined choices (“even”, “subject”, or “genome”).
Notes
The function computes motif log-odds scores from the motif counts and the background nucleotide frequencies.
Motif matching is performed on accessible DNA sequences using the MOODS library, which allows for efficient scanning and p-value thresholding.
The resulting sparse matrix is binary (0 or 1), where 1 indicates the presence of a significant motif match.
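How a p-value cut-off such as p_value=5e-05 can become a score threshold is sketched below with a small dynamic program: the distribution of the total log-odds score under the background model is built position by position, and the threshold is the lowest score whose upper-tail probability stays within the p-value. MOODS performs this conversion itself with its own routines; the function name `score_threshold`, the score discretization (`resolution`) and the matrix layout (rows A, C, G, T) are assumptions of this sketch.

```python
import math


def score_threshold(matrix, p_value, background=(0.25, 0.25, 0.25, 0.25), resolution=0.01):
    """Hypothetical helper: lowest grid score t with P(score >= t) <= p_value.

    matrix has four rows (A, C, G, T), each holding one log-odds score per
    position; background gives the probability of A, C, G, T.
    """
    # Distribution of the discretized total score, as {integer_bin: probability}.
    dist = {0: 1.0}
    for pos in range(len(matrix[0])):
        new = {}
        for bin_, prob in dist.items():
            for i in range(4):
                key = bin_ + round(matrix[i][pos] / resolution)
                new[key] = new.get(key, 0.0) + prob * background[i]
        dist = new
    # Walk down from the highest score, accumulating the tail probability.
    tail = 0.0
    threshold_bin = None
    for bin_ in sorted(dist, reverse=True):
        if tail + dist[bin_] > p_value:
            break
        tail += dist[bin_]
        threshold_bin = bin_
    if threshold_bin is None:
        return math.inf  # even the best score is more common than p_value allows
    return threshold_bin * resolution
```

A coarser `resolution` keeps the distribution small at the cost of a less exact threshold.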
Examples
Match TF motifs to accessible regions and associate them with expressed genes:
>>> match_motif(
...     mdata,
...     motifs=motif_list,
...     rna_mod="rna",
...     atac_mod="atac",
...     pseudocounts=0.0001,
...     p_value=5e-05,
...     background="even",
...     key_added="motif_match",
... )
Access overlapping motifs and genes:
>>> mdata.uns["motifs"]
Access motif match results:
>>> mdata["atac"].varm["motif_match"]
Customize background nucleotide frequencies:
>>> match_motif(mdata, motifs=motif_list, background="subject")