Cell2Net Model Interpretation: Transcription Factor Attribution#

This tutorial demonstrates how to interpret trained Cell2Net models by analyzing transcription factor (TF) contributions to gene expression predictions. Using Integrated Gradients attribution, we estimate which transcription factors contribute most to the predicted expression of each target gene across different immune cell types.

Overview#

Cell2Net model interpretation reveals the regulatory logic learned during training by:

  1. TF Attribution Analysis: Quantifying how much each transcription factor contributes to gene expression predictions

  2. Cell Type Specificity: Analyzing TF importance across different PBMC cell populations

  3. Regulatory Networks: Constructing TF-to-gene regulatory relationships from model interpretations

  4. Biological Validation: Connecting computational predictions to known regulatory mechanisms

Biological Significance#

TF attribution analysis provides insights into:

  • Regulatory Hierarchies: Which TFs are master regulators vs. downstream effectors

  • Cell Type Programs: How different immune cells use distinct TF combinations

  • Therapeutic Targets: Key regulatory nodes for intervention strategies

Technical Framework#

We employ the Integrated Gradients attribution method because:

  • Model-agnostic: Works with any differentiable model architecture

  • Faithful attribution: Satisfies completeness and sensitivity axioms

  • Gradient-based: Efficiently computes feature importance through backpropagation

  • Baseline comparison: Measures importance relative to neutral reference state
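For reference, the standard definition of Integrated Gradients can be written as below. The symbols are notation introduced here for illustration and are not Cell2Net identifiers: $F$ is the model, $x$ the input, $x'$ the baseline, and $m$ the number of integration steps.

```latex
\mathrm{IG}_i(x)
  = (x_i - x'_i) \int_0^1
    \frac{\partial F\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\, d\alpha
  \;\approx\; (x_i - x'_i)\, \frac{1}{m} \sum_{k=1}^{m}
    \frac{\partial F\bigl(x' + \tfrac{k}{m}(x - x')\bigr)}{\partial x_i}

\sum_i \mathrm{IG}_i(x) = F(x) - F(x') \qquad \text{(completeness)}
```

The sum on the right is the discretized path integral, so a larger $m$ gives a closer approximation at the cost of more gradient evaluations.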

import warnings
warnings.filterwarnings("ignore")

import os
import numpy as np
import mudata as md
import cell2net as cn
from tqdm import tqdm
import pandas as pd
md.set_options(pull_on_update=False)

2. Set Up File Paths and Output Directory#

Configure the input and output paths for TF attribution analysis:

  • data_dir: Path to the prepared multiome dataset file (.h5mu) with peak-to-gene associations

  • in_dir: Directory containing trained Cell2Net models from previous tutorial step

  • out_dir: New directory for storing TF attribution results and regulatory networks

This organized structure ensures:

  • Computational reproducibility: Clear tracking of model inputs and interpretation outputs

  • Result organization: Systematic storage of attribution matrices and TF-gene relationships

  • Analysis pipeline: Seamless integration with downstream regulatory network analysis

data_dir = "./02_prepare_data/mdata.h5mu"
in_dir = "./03_train_cell2net"
out_dir = "./05_to_gene"

os.makedirs(out_dir, exist_ok=True)
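Before starting a long attribution run, it can help to confirm that the inputs are actually present. The snippet below is a minimal sketch of such a check; `missing_inputs` is a hypothetical helper written for this tutorial, not part of Cell2Net.

```python
from pathlib import Path


def missing_inputs(data_path, model_dir):
    """Return the input paths that do not exist (empty list if all are present)."""
    candidates = [Path(data_path), Path(model_dir)]
    return [str(p) for p in candidates if not p.exists()]


if __name__ == "__main__":
    # Paths taken from this tutorial; adjust them to your own layout.
    missing = missing_inputs("./02_prepare_data/mdata.h5mu", "./03_train_cell2net")
    if missing:
        print("Missing inputs:", ", ".join(missing))
```

An empty result means both the dataset file and the model directory were found.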

3. Load Multiome Dataset#

Load the prepared multiome dataset containing:

  • RNA expression: Gene expression measurements across PBMC cell types

  • ATAC accessibility: Chromatin accessibility profiles for regulatory regions

  • Peak-to-gene associations: Regulatory links between accessible peaks and target genes

  • Sequence information: DNA sequences around peaks for motif analysis

  • Cell annotations: Cell type labels for cell-type-specific TF analysis

This dataset provides the foundation for interpreting how Cell2Net models learned cell-type-specific regulatory relationships from sequence and accessibility features.

mdata_bulk = md.read_h5mu(data_dir)

4. Extract Target Gene List#

Extract the list of genes that have trained Cell2Net models available for interpretation:

  • Gene selection: Only genes with sufficient peak-to-gene associations and trained models

  • Regulatory focus: Genes with well-characterized regulatory regions in the PBMC dataset

  • Model availability: Ensures consistency between training and interpretation phases

Each gene in this list represents a regulatory target with learned Cell2Net models that can be interpreted to reveal transcription factor contributions and cell-type-specific regulatory mechanisms.

genes = mdata_bulk.uns['peak_to_gene']['gene'].unique().tolist()
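To make the extraction step concrete, the sketch below applies the same `unique().tolist()` call to a toy table. The `peak` column name, the coordinates, and the links themselves are made up for illustration; only the `gene` column is taken from the tutorial.

```python
import pandas as pd

# Toy stand-in for mdata_bulk.uns["peak_to_gene"]; the "peak" column name
# and all values are hypothetical.
peak_to_gene = pd.DataFrame(
    {
        "peak": ["chr1:100-600", "chr1:900-1400", "chrX:50-550", "chr1:1700-2200"],
        "gene": ["ISG15", "ISG15", "CLIC2", "ISG15"],
    }
)

# unique() keeps first-appearance order, so the gene list is deterministic.
genes = peak_to_gene["gene"].unique().tolist()

# Number of linked peaks per gene, useful for spotting sparsely linked genes.
peaks_per_gene = peak_to_gene["gene"].value_counts()

print(genes)
print(peaks_per_gene.to_dict())
```

Because each gene appears once in the list regardless of how many peaks are linked to it, the attribution loop runs once per gene.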

5. Transcription Factor Attribution Analysis#

This is the main computational loop that performs TF attribution analysis for each trained Cell2Net model:

Model Loading and Setup#

For each gene, we:

  1. Initialize Cell2Net model with the same architecture used during training

  2. Load trained weights from the saved model checkpoints

  3. Transfer to GPU for efficient gradient computation during attribution

Attribution Computation#

The cn.ip.tf_attr() function implements Integrated Gradients attribution:

  • n_steps=100: Number of integration steps for stable gradient estimation

  • multiply_by_inputs=True: Scale attributions by input values for interpretability

  • batch_size=2: Process two samples at a time during attribution

Technical Details#

Integrated Gradients Method:

  • Computes gradients along a straight path from baseline (neutral) to actual input

  • Satisfies the completeness axiom (attributions sum to the difference between the prediction at the input and the prediction at the baseline)

  • Provides stable, robust feature importance scores
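The method described above can be sketched in a few lines of NumPy. This is an illustrative approximation on a toy model, not the Cell2Net implementation: the function `integrated_gradients`, the softplus model, the zero baseline, and the midpoint quadrature rule are all assumptions made for this example. The `n_steps` and `multiply_by_inputs` arguments mirror the parameter names used in this tutorial.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, n_steps=100, multiply_by_inputs=True):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    grad_fn(z) must return dF/dz evaluated at z, with the same shape as z.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoints of n_steps equal sub-intervals of [0, 1].
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    # Average the gradient along the straight path from baseline to x.
    avg_grad = np.mean(
        [grad_fn(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    # Scaling by (x - baseline) yields contributions that sum to
    # F(x) - F(baseline); without it, only the averaged gradient is returned.
    return avg_grad * (x - baseline) if multiply_by_inputs else avg_grad


# Toy differentiable model (hypothetical): F(x) = softplus(w . x)
w = np.array([0.5, -0.3, 0.8])


def model(x):
    return np.log1p(np.exp(w @ x))


def model_grad(x):
    return w / (1.0 + np.exp(-(w @ x)))  # sigmoid(w . x) * w


x = np.array([1.0, 2.0, -1.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(model_grad, x, baseline, n_steps=100)

# Completeness check: attributions should sum to the prediction difference.
completeness_gap = attr.sum() - (model(x) - model(baseline))
print(attr, completeness_gap)
```

The completeness gap shrinks as `n_steps` grows, which is one practical way to judge whether the chosen number of steps is sufficient.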

Expression-Level Analysis:

  • Attributions computed for each TF motif across all regulatory peaks

  • Results capture how TF expression contributes to gene expression

  • Cell-type-specific analysis reveals regulatory context dependence

Output Generation#

For each gene, we save:

  1. Raw attributions (.npy): Full attribution matrices for detailed analysis

  2. TF-gene relationships (.csv): Top TFs per cell type with importance scores
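As a rough illustration of how a raw attribution matrix could be reduced to a per-cell-type table of top TFs, consider the sketch below. It assumes a (samples × TFs) attribution array and ranks TFs by signed mean attribution; `top_tfs_per_group` is a hypothetical helper, and `cn.ip.tf_to_gene` may aggregate or rank differently.

```python
import numpy as np
import pandas as pd


def top_tfs_per_group(attr, tf_names, groups, gene, n_tfs=10):
    """Summarize a (samples x TFs) attribution matrix into top TFs per group."""
    df = pd.DataFrame(attr, columns=tf_names)
    df["group"] = list(groups)
    long = df.melt(id_vars="group", var_name="tf", value_name="attr")
    stats = (
        long.groupby(["group", "tf"])["attr"]
        .agg(mean_attr="mean", std_attr="std")
        .reset_index()
    )
    stats["gene"] = gene
    # Rank by mean attribution within each group and keep the n_tfs largest.
    stats = stats.sort_values(["group", "mean_attr"], ascending=[True, False])
    return stats.groupby("group").head(n_tfs).reset_index(drop=True)


# Toy attributions: 4 samples x 3 TFs, all values made up.
attr = np.array(
    [
        [1.0, 0.2, -0.5],
        [0.8, 0.4, -0.3],
        [0.1, 0.9, 0.0],
        [0.3, 0.7, 0.2],
    ]
)
tf_names = ["KLF2", "ELF1", "JUNB"]
groups = ["B cell", "B cell", "pDC", "pDC"]

table = top_tfs_per_group(attr, tf_names, groups, gene="ISG15", n_tfs=2)
print(table)
```

Ranking by signed mean favors activating TFs; ranking by absolute mean would instead surface strong repressors as well, so the choice depends on the question being asked.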

Biological Interpretation#

The attribution scores reveal:

  • Positive attributions: TFs whose input increases the predicted gene expression, suggesting an activating role

  • Negative attributions: TFs whose input decreases the predicted gene expression, suggesting repression or competition for binding

  • Cell-type specificity: Different TF importance across immune cell populations

  • Regulatory logic: How combinations of TFs work together to control genes

This systematic analysis reveals the regulatory code learned by Cell2Net models and provides mechanistic insights into immune cell gene regulation.

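for gene in tqdm(genes):
    # Skip genes whose attributions were already saved, so the loop can be resumed
    if os.path.exists(f"{out_dir}/{gene}.npy"):
        continue

    # Rebuild the model with the same setup used during training
    model = cn.pd.model.Cell2Net(mdata=mdata_bulk,
                                 gene=gene,
                                 covariates=['total_counts_rna_log', 'total_counts_atac_log'])

    # Load the trained weights and move the model to the GPU
    model.load(dir_path=f"{in_dir}/model")
    model.to_device('cuda:0')

    # Compute Integrated Gradients attributions for this gene
    tf_attr = cn.ip.tf_attr(model,
                            batch_size=2,
                            n_steps=100,
                            multiply_by_inputs=True)

    # Summarize the top 10 TFs per cell type, then save raw and summarized results
    df = cn.ip.tf_to_gene(model.mdata, tf_attr, groupby="cell_type_v2", n_tfs=10)
    np.save(f"{out_dir}/{gene}.npy", tf_attr)
    df.to_csv(f"{out_dir}/{gene}.csv", index=False)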

6. Compile TF-Gene Regulatory Networks#

Aggregate individual gene attribution results into a comprehensive regulatory network dataset:

Data Integration#

  • Load individual results: Read TF attribution CSV files for each analyzed gene

  • Concatenate datasets: Combine all TF-gene relationships into single dataframe

  • Network construction: Build comprehensive regulatory network from attribution scores

Network Structure#

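The compiled dataset contains:

  • tf: Transcription factor names from JASPAR2024 database

  • gene: Target gene symbols

  • cell_type_v2: PBMC cell type where regulation occurs

  • mean_attr: Mean attribution score for the TF in that cell type, a quantitative measure of TF importance for gene regulation

  • std_attr: Standard deviation of the attribution score

The table has no explicit rank column; in the output shown in this tutorial, rows within each gene and cell type are ordered by decreasing mean_attr.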

This integrated network provides a systems-level view of transcription factor regulation across the PBMC immune system.

df_list = []
for gene in genes:
    df = pd.read_csv(f"{out_dir}/{gene}.csv")

    df_list.append(df)
df_p2g = pd.concat(df_list).reset_index(drop=True)
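If the attribution loop was interrupted, some per-gene CSV files may not exist and `pd.read_csv` will raise an error. The sketch below shows one way to make the compilation step tolerant of that; `load_gene_tables` is a hypothetical helper, not a Cell2Net function.

```python
import os

import pandas as pd


def load_gene_tables(out_dir, genes):
    """Concatenate per-gene CSV files and report genes whose file is missing."""
    tables, missing = [], []
    for gene in genes:
        path = os.path.join(out_dir, f"{gene}.csv")
        if os.path.exists(path):
            tables.append(pd.read_csv(path))
        else:
            missing.append(gene)
    # ignore_index=True gives one unique row index across all genes.
    combined = pd.concat(tables, ignore_index=True) if tables else pd.DataFrame()
    return combined, missing
```

Calling `load_gene_tables(out_dir, genes)` returns the combined table together with the list of genes that still need to be processed.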

Consolidate Network Data#

Create the final consolidated regulatory network by combining all individual gene attribution results into a single comprehensive dataset for analysis and visualization.

df = pd.concat(df_list)

Inspect the Regulatory Network#

df
          tf   gene  cell_type_v2  mean_attr  std_attr
0       KLF2  ISG15        B cell   1.415844  0.468231
1       ELF1  ISG15        B cell   0.975179  0.298402
2      MEF2C  ISG15        B cell   0.851608  0.448217
3      IKZF3  ISG15        B cell   0.699268  0.239014
4       JUNB  ISG15        B cell   0.691353  0.323894
...      ...    ...           ...        ...       ...
75    ARID3A  CLIC2           pDC   0.167850  0.062825
76       MAX  CLIC2           pDC   0.154308  0.050194
77   CREB3L2  CLIC2           pDC   0.151772  0.049553
78       MGA  CLIC2           pDC   0.150544  0.062157
79      JUNB  CLIC2           pDC   0.141468  0.071961

154160 rows × 5 columns

7. Save Comprehensive Regulatory Network#

Export the complete TF-gene regulatory network to CSV format for downstream analysis and biological interpretation:

Saved Dataset Structure#

  • Comprehensive coverage: All TF-gene relationships across PBMC cell types

  • Quantitative scores: Attribution-based importance measures for each regulatory edge

  • Cell-type resolution: Context-specific regulatory relationships for each immune cell population

  • Standardized format: Ready for network analysis, visualization, and biological validation

Applications of the Regulatory Network#

Biological Discovery:

  • Master regulators: Identify TFs with broad regulatory influence across immune genes

  • Cell-type programs: Understand how different TFs drive cell-type-specific expression

  • Regulatory modules: Group genes by shared TF regulatory patterns

  • Disease mechanisms: Connect regulatory disruptions to immune disorders

Computational Analysis:

  • Network topology: Analyze regulatory network structure and connectivity

  • Pathway enrichment: Link TF targets to biological pathways and processes

  • Comparative analysis: Compare regulatory networks across conditions or diseases

  • Predictive modeling: Use regulatory relationships for gene expression prediction

Experimental Design:

  • Target prioritization: Select key TFs for experimental validation

  • Perturbation experiments: Design TF knockout/overexpression studies

  • Drug discovery: Identify regulatory nodes for therapeutic intervention

  • Biomarker development: Use TF activity signatures for cell state classification

This comprehensive regulatory network represents the interpretable regulatory logic learned by Cell2Net models and provides a valuable resource for understanding immune cell gene regulation.

# index=False avoids writing the repeated per-gene row index as an extra column
df.to_csv(f"{out_dir}/tf_to_gene.csv", index=False)
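As one example of the biological discovery applications listed above, candidate master regulators can be shortlisted by counting how many distinct target genes each TF is linked to within a cell type. The sketch below uses a toy network with the same column names as the compiled table; all values, including the placeholder gene GENE_X, are made up.

```python
import pandas as pd

# Toy network with the tutorial's column names; values are hypothetical.
network = pd.DataFrame(
    {
        "tf": ["KLF2", "ELF1", "KLF2", "JUNB", "KLF2", "ELF1"],
        "gene": ["ISG15", "ISG15", "CLIC2", "CLIC2", "GENE_X", "GENE_X"],
        "cell_type_v2": ["B cell"] * 6,
        "mean_attr": [1.4, 0.9, 0.6, 0.1, 0.8, 0.2],
    }
)

# Count distinct targets per TF and cell type, then rank by breadth of
# influence, breaking ties by average attribution.
summary = (
    network.groupby(["cell_type_v2", "tf"])
    .agg(n_targets=("gene", "nunique"), mean_attr=("mean_attr", "mean"))
    .reset_index()
    .sort_values(
        ["cell_type_v2", "n_targets", "mean_attr"],
        ascending=[True, False, False],
    )
    .reset_index(drop=True)
)
print(summary)
```

Because the compiled table only keeps the top TFs per gene and cell type, the target counts reflect how often a TF ranks highly, not every gene it may influence.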

Analysis Complete#

The Cell2Net transcription factor interpretation analysis is now complete! The interpreted regulatory networks provide mechanistic insights into how Cell2Net learned immune cell gene regulation and offer valuable hypotheses for experimental validation and therapeutic development.