cell2net.interpretation.peak_to_gene

cell2net.interpretation.peak_to_gene(mdata, attr, n_resamples=100, confidence_level=0.95, random_state=42, groupby=None)

Extracts peak-to-gene links based on peak accessibility attributions.

This function assigns peak-level attributions to their corresponding genes based on the peak_to_gene mapping in a MuData object. It computes the average attribution for each peak, either across all cells or grouped by a specified metadata column.
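The averaging step described above can be sketched with NumPy and pandas. This is a minimal illustration on toy data, not the library's implementation: the peak names, the `cell_type` labels, and the long-format layout of the grouped result are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy inputs (all hypothetical): 6 cells x 3 peaks
rng = np.random.default_rng(0)
attr = rng.random((6, 3))
peaks = ["peak_1", "peak_2", "peak_3"]
cell_type = np.array(["B", "B", "T", "T", "T", "B"])

# groupby=None case: one average per peak across all cells
avg_all = pd.DataFrame({"peak": peaks, "avg_attr": attr.mean(axis=0)})

# groupby case: one average per (group, peak) pair, in long format
avg_by_group = (
    pd.DataFrame(attr, columns=peaks)
    .assign(cell_type=cell_type)
    .groupby("cell_type")
    .mean()
    .reset_index()
    .melt(id_vars="cell_type", var_name="peak", value_name="avg_attr")
)
```

With two groups and three peaks, `avg_by_group` holds six rows, one per group and peak combination, while `avg_all` holds one row per peak.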

Parameters:
  • mdata (MuData) –

    A MuData object containing multi-modal single-cell data. It must have:

    • mdata["rna"]: RNA modality with gene names in var_names.

    • mdata.uns["peak_to_gene"]: A mapping between peaks and genes with a column "peak".

    • mdata.obs: Cell metadata, required if groupby is specified.

  • attr (ndarray) – A 2D array of peak-level attributions with shape (n_cells, n_peaks). Rows correspond to cells, and columns correspond to peaks.

  • groupby (str | None (default: None)) – The name of a column in mdata.obs to group cells by. If None, attributions are averaged across all cells.

Return type:

DataFrame

Returns:

A DataFrame summarizing peak-to-gene attributions with the following columns:

  • "peak": Peak identifiers.

  • "gene": The associated gene (from the first gene in mdata["rna"].var_names).

  • "avg_attr": Average attribution for each peak.

  • Additional column(s) for group labels if groupby is specified.
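As one way to work with a result of this shape, the sketch below ranks peaks within each gene by "avg_attr". The DataFrame is built by hand with invented values that only mimic the documented columns; it is not output from peak_to_gene.

```python
import pandas as pd

# Hand-made stand-in for the returned DataFrame (values are invented)
df = pd.DataFrame(
    {
        "peak": ["peak_1", "peak_2", "peak_3", "peak_4"],
        "gene": ["gene_1", "gene_1", "gene_2", "gene_2"],
        "avg_attr": [0.12, 0.23, 0.05, 0.31],
    }
)

# Keep the highest-attribution peak for each gene
top_peaks = (
    df.sort_values("avg_attr", ascending=False)
    .groupby("gene")
    .head(1)
    .sort_values("gene")
    .reset_index(drop=True)
)
```

Raising the argument of `head` keeps more candidate peaks per gene.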

Raises:

AssertionError

  • If groupby is specified but not found in mdata.obs.

  • If the length of the groupby column does not match the number of cells in attr.
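The two conditions above can be checked before calling the function. The helper below is a hypothetical sketch (`check_groupby` is not part of cell2net) that mirrors the documented assertions on a plain pandas DataFrame standing in for mdata.obs.

```python
from typing import Optional

import numpy as np
import pandas as pd


def check_groupby(obs: pd.DataFrame, attr: np.ndarray, groupby: Optional[str]) -> None:
    """Hypothetical helper mirroring the two documented assertion conditions."""
    if groupby is None:
        return  # nothing to validate when averaging over all cells
    assert groupby in obs.columns, f"'{groupby}' not found in mdata.obs"
    assert len(obs[groupby]) == attr.shape[0], (
        "groupby column length does not match the number of cells in attr"
    )


# Stand-in for mdata.obs with three cells
obs = pd.DataFrame({"cell_type": ["B", "T", "T"]})
check_groupby(obs, np.zeros((3, 5)), "cell_type")  # passes silently
```

Running such a check early gives a readable message when attr was computed on a different subset of cells than mdata contains.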

Notes

  • If groupby is None, the function computes average attributions across all cells.

  • If groupby is specified, the function computes group-specific average attributions.

  • mdata.uns["peak_to_gene"] must contain the peak-to-gene mapping, including a "peak" column.

Examples

>>> import numpy as np
>>> from mudata import MuData
>>> from cell2net.interpretation import peak_to_gene
>>> mdata = MuData(...)  # Load a MuData object with "rna" and uns["peak_to_gene"]
>>> attr = np.random.rand(100, 5000)  # Example attributions for 100 cells and 5000 peaks
>>> # Compute average attribution across all cells
>>> df = peak_to_gene(mdata, attr)
>>> print(df.head())
     peak    gene  avg_attr
0  peak_1  gene_1  0.123456
1  peak_2  gene_1  0.234567
>>> # Compute group-specific average attributions
>>> df_grouped = peak_to_gene(mdata, attr, groupby="cell_type")
>>> print(df_grouped.head())
     peak    gene cell_type  avg_attr
0  peak_1  gene_1   B_cells  0.123456
1  peak_2  gene_1   T_cells  0.234567