cell2net.preprocessing.match_motif

cell2net.preprocessing.match_motif(mdata, motifs, atac_mod='atac', pseudocounts=0.0001, p_value=5e-05, sequence_var_key='dna_sequence', key_added='motif_match')

Matches transcription factor motifs to accessible DNA sequences and links them with expressed genes.

This function identifies transcription factor (TF) binding motifs that are relevant to genes expressed in scRNA-seq data. It scans the accessible DNA sequences from the ATAC-seq data for TF motifs with the MOODS library and stores the results as a sparse matrix in the ATAC modality.

Parameters:
  • mdata (MuData) – Multimodal data object containing both RNA and ATAC modalities.

  • motifs (Iterable) – A collection of motif objects. Each motif must have attributes name, matrix_id, and counts representing the motif’s name, unique identifier, and nucleotide frequencies respectively.

  • atac_mod (str (default: 'atac')) – Key for the ATAC modality in mdata. This modality should contain the DNA accessibility data and the DNA sequence of each accessible region in .var["dna_sequence"].

  • pseudocounts (float (default: 0.0001)) – Small value added to motif counts to avoid division by zero in log-odds computations.

  • p_value (float (default: 5e-05)) – P-value threshold for motif matching. Lower values result in stricter matches.

  • sequence_var_key (str (default: 'dna_sequence')) – Name of the column in mdata[atac_mod].var that holds the DNA sequence of each accessible region.

  • key_added (str (default: 'motif_match')) – Key under which the resulting motif match matrix is stored in mdata[atac_mod].varm.
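
The motifs parameter expects objects that expose name, matrix_id, and counts. The sketch below shows that shape with a hypothetical stand-in class (ToyMotif is not part of cell2net, and the counts are made up). Biopython's JASPAR motif objects expose attributes with these names, but whether match_motif accepts a plain stand-in like this one depends on the implementation, so treat it as an illustration of the expected attributes only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyMotif:
    """Hypothetical stand-in for a motif object; not part of cell2net."""

    name: str                        # TF name
    matrix_id: str                   # unique identifier of the motif matrix
    counts: Dict[str, List[float]]   # per-nucleotide counts, one value per position


# A made-up 4-bp motif; the numbers are illustrative only.
motif_list = [
    ToyMotif(
        name="TOY1",
        matrix_id="TOY0001.1",
        counts={
            "A": [8.0, 0.0, 1.0, 9.0],
            "C": [1.0, 0.0, 8.0, 0.0],
            "G": [0.0, 10.0, 0.0, 1.0],
            "T": [1.0, 0.0, 1.0, 0.0],
        },
    )
]

# Each motif carries the three attributes the parameter description asks for.
for motif in motif_list:
    length = len(motif.counts["A"])
    print(motif.matrix_id, motif.name, "length:", length)
```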

Return type:

None

Returns:

Results are added to the mdata object in place:

  • mdata[atac_mod].varm[key_added]: Sparse matrix indicating motif matches for each accessible DNA sequence.

Raises:
  • AssertionError – If the DNA sequence column ("dna_sequence" by default) is missing from mdata[atac_mod].var.

  • AssertionError – If the number of motifs does not match the number of overlapping genes after filtering.

  • ValueError – If the background parameter is not one of the predefined choices ("even", "subject", or "genome").

Notes

  • The function computes motif log-odds scores from the motif counts (with pseudocounts added) relative to the background nucleotide frequencies.

  • Motif matching is performed on accessible DNA sequences using the MOODS library, which allows for efficient scanning and p-value thresholding.

  • The resulting sparse matrix is binary (0 or 1), where 1 indicates the presence of a significant motif match.
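
The notes above can be illustrated with a small pure-Python sketch. It is not the cell2net or MOODS implementation: it assumes a uniform background, uses the natural logarithm, scans only the forward strand, and applies a hand-picked score threshold (MOODS instead derives the score threshold from the p-value). The helper names log_odds_matrix and has_match are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocounts=0.0001, background=None):
    """Convert per-position counts into log-odds scores against a background."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # uniform background (assumption)
    length = len(counts["A"])
    matrix = []
    for pos in range(length):
        # Pseudocounts keep every frequency above zero, so log() is defined.
        total = sum(counts[b][pos] + pseudocounts for b in BASES)
        column = {}
        for b in BASES:
            freq = (counts[b][pos] + pseudocounts) / total
            column[b] = math.log(freq / background[b])
        matrix.append(column)
    return matrix


def has_match(sequence, matrix, score_threshold):
    """Return 1 if any window scores at or above the threshold, else 0."""
    width = len(matrix)
    seq = sequence.upper()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows with N or other ambiguous symbols
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= score_threshold:
            return 1
    return 0


# Made-up counts for a 4-bp motif whose consensus is AGCA.
counts = {
    "A": [8.0, 0.0, 1.0, 9.0],
    "C": [1.0, 0.0, 8.0, 0.0],
    "G": [0.0, 10.0, 0.0, 1.0],
    "T": [1.0, 0.0, 1.0, 0.0],
}
matrix = log_odds_matrix(counts)

# One binary entry per sequence, analogous to one column of the match matrix.
sequences = ["TTAGCATT", "CCCCCCCC"]
hits = [has_match(s, matrix, score_threshold=4.0) for s in sequences]
print(hits)  # [1, 0]
```

The first sequence contains the consensus AGCA and scores just under 5, so it is marked 1; the second scores far below the threshold and is marked 0. Without pseudocounts, the zero counts in this toy motif would lead to log(0).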

Examples

Match TF motifs to accessible regions and associate them with expressed genes:

>>> match_motif(
...     mdata,
...     motifs=motif_list,
...     rna_mod="rna",
...     atac_mod="atac",
...     pseudocounts=0.0001,
...     p_value=5e-05,
...     background="even",
...     key_added="motif_match",
... )

Access overlapping motifs and genes:

>>> mdata.uns["motifs"]

Access motif match results:

>>> mdata["atac"].varm["motif_match"]

Customize background nucleotide frequencies:

>>> match_motif(mdata, motifs=motif_list, background="subject")