cell2net.interpretation.tf_to_gene

cell2net.interpretation.tf_to_gene(mdata, attr, groupby=None, n_tfs=10)

Aggregate transcription factor (TF) attributions and link them to genes for each group.

This function computes the mean TF attribution for each group defined by the groupby column of mdata.obs, links every TF to a single gene (the first entry of mdata["rna"].var_names), and keeps the n_tfs TFs with the highest mean attribution in each group. The result is returned as a pandas DataFrame.

Parameters:

mdata (MuData) –
A MuData object containing the metadata and RNA dataset. The object must have:
- obs (metadata) with the column specified by groupby.
- uns["tfs"] containing the list of transcription factors.
- ["rna"].var_names containing gene names.
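attr (ndarray) – A 2D NumPy array of shape (n_cells, n_total_tfs) containing TF attributions. Each row corresponds to a cell, and each column corresponds to a TF. The number of columns is the total number of TFs (20 in the example below) and is distinct from the n_tfs parameter, which only controls how many TFs are kept per group.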
groupby (str | None (default: None)) – The column name in mdata.obs to group cells by (e.g., cell type, cluster ID).
n_tfs (int | None (default: 10)) – The number of top transcription factors to select for each group based on their mean regulation values.

Return type:

DataFrame

Returns:

A pandas DataFrame with the following columns:

tf: The transcription factor name.
gene: The linked gene (from mdata["rna"].var_names[0]).
groupby: The group name (e.g., cell type or cluster). In the example output, this column is named after the value passed as groupby (cell_type).
attribution: The mean attribution value of the TF within the group.

The DataFrame is grouped by the groupby column, with the top n_tfs TFs included for each group.

Raises:

AssertionError – If the groupby column is not present in mdata.obs, or if the length of the groupby column does not match the number of rows in attr.
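The two documented preconditions can be checked before calling the function, which gives a more descriptive error than a bare AssertionError. The following is a minimal sketch only: check_inputs is a hypothetical helper, not part of cell2net, and a plain pandas DataFrame stands in for mdata.obs.

```python
import numpy as np
import pandas as pd


def check_inputs(obs, attr, groupby):
    """Hypothetical pre-check mirroring the documented preconditions."""
    # Precondition 1: the grouping column must exist in the metadata table.
    if groupby not in obs.columns:
        raise ValueError(f"column {groupby!r} not found in obs")
    # Precondition 2: one group label per row (cell) of the attribution array.
    if len(obs[groupby]) != attr.shape[0]:
        raise ValueError(
            f"obs has {len(obs[groupby])} rows but attr has {attr.shape[0]}"
        )


# Stand-in for mdata.obs: four cells in two groups.
obs = pd.DataFrame({"cell_type": ["A", "A", "B", "B"]})
attr = np.zeros((4, 3))  # 4 cells, 3 TFs

check_inputs(obs, attr, "cell_type")  # passes silently
```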

Notes

The function assumes that mdata.uns["tfs"] contains a list of transcription factor names, and mdata["rna"].var_names[0] provides the associated gene name.
Within each group, TF regulation values are aggregated by their mean, and the top n_tfs with the highest mean regulation are retained.
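The aggregation described in these notes can be approximated with plain pandas. This is a sketch under stated assumptions, not cell2net's actual implementation: top_tfs_per_group and its arguments are hypothetical names, the group labels are passed in directly instead of being read from mdata.obs, and "highest" is taken to mean the largest signed mean rather than the largest absolute value.

```python
import numpy as np
import pandas as pd


def top_tfs_per_group(attr, groups, tf_names, gene, n_tfs=10):
    """Hypothetical sketch of the group-mean / top-n selection."""
    # One row per cell, one column per TF, plus the group label.
    df = pd.DataFrame(attr, columns=list(tf_names))
    df["group"] = np.asarray(groups)
    # Mean attribution of every TF within each group.
    means = df.groupby("group").mean()
    # Long format: one row per (group, TF) pair.
    long = means.stack().rename("attribution").reset_index()
    long.columns = ["group", "tf", "attribution"]
    long["gene"] = gene
    # Sort by descending mean inside each group, then keep the first n_tfs.
    top = (
        long.sort_values(["group", "attribution"], ascending=[True, False])
        .groupby("group")
        .head(n_tfs)
        .reset_index(drop=True)
    )
    return top[["tf", "gene", "group", "attribution"]]


# Toy data: 4 cells, 3 TFs, 2 groups (all values are made up).
attr = np.array(
    [
        [0.1, 0.9, 0.5],
        [0.3, 0.7, 0.5],
        [0.8, 0.2, 0.4],
        [0.6, 0.4, 0.0],
    ]
)
groups = ["A", "A", "B", "B"]
top = top_tfs_per_group(attr, groups, ["TF1", "TF2", "TF3"], gene="Gene1", n_tfs=2)
```

Sorting before calling head keeps the selection step explicit; if the magnitude of regulation matters more than its sign, the sort key would need to be the absolute mean instead.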

Examples

>>> import numpy as np
>>> from mudata import MuData
>>> from cell2net.interpretation import tf_to_gene
>>> mdata = MuData(...)  # MuData object with metadata and RNA data
>>> attr = np.random.rand(100, 20)  # Example attribution array (100 cells, 20 TFs)
>>> groupby = "cell_type"
>>> df = tf_to_gene(mdata, attr, groupby, n_tfs=5)
>>> df.head()  # illustrative output; actual values depend on the data
    tf   gene cell_type  attribution
0  TF1  Gene1     Type1       0.1234
1  TF2  Gene1     Type1       0.1123
2  TF3  Gene1     Type1       0.0987
3  TF4  Gene1     Type1       0.0876
4  TF5  Gene1     Type1       0.0765