Preprocessing: pp

ATAC-seq matrix

preprocessing.binarize(data[, atac_mod, layer])

Binarize the data matrix of an AnnData or MuData object.
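Binarizing sets every nonzero entry to 1. This is a common step for scATAC-seq count matrices, where a peak is usually treated as simply open or closed in a given cell. The minimal sketch below shows the operation on a dense NumPy array. `binarize_counts` is a hypothetical helper, not the package's `binarize`, and it does not reproduce the `atac_mod`/`layer` selection.

```python
import numpy as np

def binarize_counts(X):
    """Return a 0/1 float32 copy of a cell-by-peak count matrix.

    Any positive count becomes 1 and zeros stay 0; the input is not modified.
    (Hypothetical helper, not the library's binarize.)
    """
    return (np.asarray(X) > 0).astype(np.float32)

counts = np.array([[0, 2, 5],
                   [1, 0, 0]])
print(binarize_counts(counts))
# [[0. 1. 1.]
#  [1. 0. 0.]]
```

For a SciPy sparse matrix, you can get the same result in place by calling `X.eliminate_zeros()` and then setting the stored values `X.data[:] = 1`. This avoids densifying a large matrix.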

ATAC-seq fragment

preprocessing.calculate_depth(chrom_size, ...)

Calculate genome depth for a given chromosome.
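A standard way to compute per-base depth along a chromosome of length `chrom_size` uses a difference array. Add 1 at each fragment start, subtract 1 at each fragment end, and take the cumulative sum. The sketch assumes fragments are 0-based half-open `[start, end)` intervals, as in BED-style fragment files. `chromosome_depth` is a hypothetical helper, and `calculate_depth` may count differently (for example, Tn5 insertion sites instead of whole fragments).

```python
import numpy as np

def chromosome_depth(chrom_size, fragments):
    """Per-base depth on one chromosome from half-open [start, end) fragments.

    (Hypothetical helper; fragments extending past the chromosome are clipped.)
    """
    diff = np.zeros(chrom_size + 1, dtype=np.int64)
    for start, end in fragments:
        start, end = max(start, 0), min(end, chrom_size)
        if start < end:
            diff[start] += 1   # a fragment begins covering here
            diff[end] -= 1     # and stops covering here
    # The running sum of +1/-1 markers is the number of fragments covering each base.
    return np.cumsum(diff[:-1])

depth = chromosome_depth(10, [(1, 4), (3, 6), (8, 12)])
print(depth.tolist())  # [0, 1, 1, 2, 1, 1, 0, 0, 1, 1]
```

The difference array makes the cost linear in chromosome length plus the number of fragments. Incrementing every covered base separately would cost time proportional to total fragment length.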

preprocessing.collapse_consecutive_values(X)

Collapse consecutive identical values in an array.
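Collapsing consecutive identical values is run-length encoding. Applied to a per-base depth track, it turns millions of positions into a short list of constant-value intervals, which is the interval form used by bedGraph and BigWig files. The vectorized NumPy sketch below illustrates this. `run_length_encode` and its `(starts, ends, values)` return layout are assumptions, not the documented return format of `collapse_consecutive_values`.

```python
import numpy as np

def run_length_encode(x):
    """Collapse runs of identical values (hypothetical helper).

    Returns (starts, ends, values) such that x[starts[i]:ends[i]] are all values[i].
    """
    x = np.asarray(x)
    if x.size == 0:
        empty = np.array([], dtype=np.int64)
        return empty, empty, x
    # A run begins at index 0 and wherever a value differs from its predecessor.
    change = np.flatnonzero(x[1:] != x[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [x.size]))
    return starts, ends, x[starts]

starts, ends, values = run_length_encode([0, 0, 3, 3, 3, 1, 0, 0])
print(starts.tolist(), ends.tolist(), values.tolist())
# [0, 2, 5, 6] [2, 5, 6, 8] [0, 3, 1, 0]
```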

preprocessing.fragments_to_coverage(...[, ...])

Convert fragment data to genome coverage signal.

preprocessing.fragment_to_bigwig(...[, ...])

Convert fragment file to BigWig format.

preprocessing.split_fragments(...)

Split a fragment file into multiple group-specific fragment files based on cell barcodes.
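Splitting by barcode comes down to looking up each fragment's barcode in a barcode-to-group table. The sketch below assumes the 10x Genomics column layout (chrom, start, end, barcode, read count; tab-separated; `#` header lines). It collects lines in memory for clarity, whereas a real implementation would stream each group to its own (typically gzipped) output file. `split_fragment_lines` is a hypothetical helper, not the signature of `split_fragments`.

```python
import io
from collections import defaultdict

def split_fragment_lines(fragment_file, barcode_to_group):
    """Group fragment lines by the group assigned to their cell barcode.

    Header lines starting with '#' are skipped, and fragments whose barcode
    is not in `barcode_to_group` are dropped. (Hypothetical helper.)
    """
    groups = defaultdict(list)
    for line in fragment_file:
        if not line.strip() or line.startswith("#"):
            continue
        barcode = line.rstrip("\n").split("\t")[3]  # 4th column holds the barcode
        group = barcode_to_group.get(barcode)
        if group is not None:
            groups[group].append(line if line.endswith("\n") else line + "\n")
    return dict(groups)

fragments = io.StringIO(
    "chr1\t100\t250\tAAAC-1\t1\n"
    "chr1\t180\t400\tTTTG-1\t2\n"
    "chr2\t50\t120\tAAAC-1\t1\n"
)
by_group = split_fragment_lines(fragments, {"AAAC-1": "Tcell", "TTTG-1": "Bcell"})
print({g: len(lines) for g, lines in by_group.items()})  # {'Tcell': 2, 'Bcell': 1}
```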

Gene

preprocessing.get_gene_tss_coord(gene_gtf[, ...])

Extract transcription start site (TSS) coordinates for genes from a GTF file.
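A gene's TSS depends on its strand. In GTF, coordinates are 1-based and inclusive, and the TSS is the `start` column for `+` genes and the `end` column for `-` genes. The sketch below reads only `gene` records and keeps the 1-based coordinate. `gene_tss_from_gtf` is hypothetical, and `get_gene_tss_coord` may use transcript records, a different coordinate convention, or a different output type.

```python
def gene_tss_from_gtf(lines):
    """Map gene name -> (chrom, tss, strand) from the 'gene' records of a GTF.

    (Hypothetical helper; a later record with the same gene name overwrites earlier ones.)
    """
    tss = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Attribute column looks like: gene_id "G1"; gene_name "GENE_A";
        attrs = {}
        for item in fields[8].split(";"):
            key, _, value = item.strip().partition(" ")
            if key:
                attrs[key] = value.strip('"')
        name = attrs.get("gene_name", attrs.get("gene_id"))
        # TSS is the 5' end: start on the plus strand, end on the minus strand.
        tss[name] = (chrom, start if strand == "+" else end, strand)
    return tss

gtf = [
    'chr1\tHAVANA\tgene\t1000\t5000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n',
    'chr1\tHAVANA\tgene\t8000\t9500\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";\n',
    'chr1\tHAVANA\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n',
]
print(gene_tss_from_gtf(gtf))
# {'GENE_A': ('chr1', 1000, '+'), 'GENE_B': ('chr1', 9500, '-')}
```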

preprocessing.add_gene_tss_coord(mdata, gene_gtf)

Add the TSS coordinates of genes to mdata[mod_names].uns.

Motif

preprocessing.get_motifs_from_jaspar([...])

Fetch transcription factor motifs from the JASPAR database.

preprocessing.filter_motifs_by_genes(motifs, ...)

Filter motifs by matching their names to expressed gene names in the RNA modality.
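Name matching between motifs and genes needs two small decisions. The first is case, because JASPAR mixes human-style (`GATA1`) and mouse-style (`Gata1`) names. The second is dimers written as `FOS::JUN`. The sketch below works on plain motif-name strings rather than motif objects, matches case-insensitively, and keeps a dimer only when every partner is expressed. All three choices are assumptions, and `filter_motifs_by_genes` may handle dimers differently. `filter_motif_names` is hypothetical.

```python
def filter_motif_names(motif_names, gene_names):
    """Keep motif names whose TF name(s) all appear among the expressed genes.

    (Hypothetical helper.) Matching is case-insensitive, and dimer names such
    as "FOS::JUN" are split on "::" so that every partner must be expressed.
    """
    expressed = {g.upper() for g in gene_names}
    kept = []
    for name in motif_names:
        partners = [p.upper() for p in name.split("::")]
        if all(p in expressed for p in partners):
            kept.append(name)
    return kept

print(filter_motif_names(["CTCF", "FOS::JUN", "Gata1", "SOX2"], ["CTCF", "FOS", "GATA1"]))
# ['CTCF', 'Gata1']
```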

preprocessing.match_motif(mdata, motifs[, ...])

Match transcription factor motifs to accessible DNA sequences and link them to expressed genes.
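Matching a motif to a sequence is typically done by scanning the sequence with a position weight matrix (PWM) of log-odds scores. The sketch below converts a count matrix to log2 odds against a uniform 0.25 background and returns the best window score on either strand. `max_motif_score`, the background, and the pseudocount are assumptions. `match_motif` may instead use a dedicated motif scanner with p-value thresholds.

```python
import numpy as np

BASES = "ACGT"

def encode(seq):
    """One-hot encode in A, C, G, T column order; other characters become zero rows."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        j = BASES.find(base.upper())
        if j >= 0:
            x[i, j] = 1.0
    return x

def max_motif_score(seq, pfm, background=0.25, pseudocount=1e-3):
    """Best log-odds score of a motif anywhere in `seq`, on either strand.

    pfm: (W, 4) count or probability matrix in A, C, G, T column order.
    Returns -inf if `seq` is shorter than the motif. (Hypothetical helper.)
    """
    pfm = np.asarray(pfm, dtype=float) + pseudocount
    probs = pfm / pfm.sum(axis=1, keepdims=True)
    pwm = np.log2(probs / background)  # (W, 4) log-odds weights
    x = encode(seq)
    width = pwm.shape[0]
    best = -np.inf
    # Reversing both axes of an A, C, G, T one-hot matrix gives the reverse complement.
    for strand in (x, x[::-1, ::-1]):
        for i in range(strand.shape[0] - width + 1):
            best = max(best, float((strand[i:i + width] * pwm).sum()))
    return best

# Hypothetical count matrix for the motif GATA (rows = motif positions).
gata = [[0, 0, 10, 0], [10, 0, 0, 0], [0, 0, 0, 10], [10, 0, 0, 0]]
# CCTATCCC contains GATA on the reverse strand, so it outscores CCCCCCCC.
print(max_motif_score("CCTATCCC", gata) > max_motif_score("CCCCCCCC", gata))  # True
```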

preprocessing.tf_to_gene(mdata[, rna_mod, ...])

Link transcription factors (TFs) to target genes based on TF binding sites.

Peaks

preprocessing.add_peaks(mdata[, mod_name, ...])

Add peak metadata to an ATAC-seq modality in a MuData object.

preprocessing.peak_to_gene(mdata[, rna_mod, ...])

Link peaks to genes based on proximity to transcription start sites (TSS).
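Proximity-based linking pairs each peak with every gene whose TSS lies within a fixed genomic window. The sketch below uses a hypothetical 100 kb window and reports the gap between the TSS and the nearest peak edge, or 0 if the TSS falls inside the peak. It assumes peaks and TSS positions use the same coordinate convention. `link_peaks_to_genes` is hypothetical, and `peak_to_gene` may use a different window or distance weighting. The double loop is for readability. For genome-scale data, sorting by position or using an interval index would be the usual choice.

```python
def link_peaks_to_genes(peaks, tss, max_distance=100_000):
    """Link each peak to every gene whose TSS lies within `max_distance` bp.

    peaks: list of (peak_id, chrom, start, end) with half-open [start, end).
    tss: dict gene -> (chrom, position). Returns (peak_id, gene, distance) tuples.
    (Hypothetical helper.)
    """
    links = []
    for peak_id, chrom, start, end in peaks:
        for gene, (g_chrom, pos) in tss.items():
            if g_chrom != chrom:
                continue
            if start <= pos < end:
                dist = 0                      # TSS inside the peak
            elif pos < start:
                dist = start - pos            # TSS upstream of the peak
            else:
                dist = pos - (end - 1)        # TSS downstream of the last peak base
            if dist <= max_distance:
                links.append((peak_id, gene, dist))
    return links

peaks = [("p1", "chr1", 900, 1100), ("p2", "chr1", 50_000, 50_500), ("p3", "chr2", 0, 500)]
tss = {"GENE_A": ("chr1", 1000), "GENE_B": ("chr1", 200_000)}
print(link_peaks_to_genes(peaks, tss))
# [('p1', 'GENE_A', 0), ('p2', 'GENE_A', 49000)]
```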

Sequences

preprocessing.add_dna_sequence(mdata, ref_fasta)

Add the DNA sequence of each peak, read from the reference FASTA (ref_fasta), to the peak metadata in a MuData object.

preprocessing.add_variants_to_sequence(mdata)

Add genomic variants to DNA sequences from peak regions to generate personalized haplotype sequences.

preprocessing.dinucleotide_shuffle_one_hot(one_hot)

Shuffle a one-hot encoded DNA sequence while preserving its dinucleotide composition.

preprocessing.dinucleotide_shuffle_str(seq)

Shuffle a DNA sequence while preserving its dinucleotide composition.
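Dinucleotide-preserving shuffles are commonly used to build background sequences that keep local base composition. One exact method is the Altschul-Erickson algorithm. It treats every adjacent pair as a directed edge between letters and draws a random Eulerian path through that multigraph. The string sketch below follows that approach, using rejection sampling for the "last exit edge" tree, which is cheap for a four-letter alphabet. It is not necessarily what `dinucleotide_shuffle_str` implements. `dinucleotide_shuffle` is hypothetical. A one-hot variant could decode the matrix, shuffle the string, and re-encode it.

```python
import random
from collections import Counter

def _reaches(v, last_edge, edges, target):
    """Follow the chosen last exit edges from v; True if they reach target without a cycle."""
    seen = set()
    while v != target:
        if v in seen:
            return False
        seen.add(v)
        v = edges[v][last_edge[v]]
    return True

def dinucleotide_shuffle(seq, rng=None):
    """Shuffle `seq` while preserving its exact dinucleotide counts (hypothetical helper)."""
    rng = rng or random.Random()
    if len(seq) <= 2:
        return seq
    edges = {}  # letter -> list of letters that follow it in seq
    for a, b in zip(seq, seq[1:]):
        edges.setdefault(a, []).append(b)
    first, final = seq[0], seq[-1]

    # Pick one exit edge per letter (except the final letter) to be used last;
    # retry until these edges form a tree pointing to the final letter.
    while True:
        last_edge = {v: rng.randrange(len(out)) for v, out in edges.items() if v != final}
        if all(_reaches(v, last_edge, edges, final) for v in last_edge):
            break

    # Shuffle the remaining exit edges of each letter and put the chosen one at the end.
    order = {}
    for v, out in edges.items():
        rest = list(out)
        tail = [rest.pop(last_edge[v])] if v in last_edge else []
        rng.shuffle(rest)
        order[v] = rest + tail

    # Walk the Eulerian path from the first letter, consuming edges in order.
    used = {v: 0 for v in order}
    result, current = [first], first
    for _ in range(len(seq) - 1):
        nxt = order[current][used[current]]
        used[current] += 1
        result.append(nxt)
        current = nxt
    return "".join(result)

seq = "ACGTTGCAACGGATCCATGA"
shuffled = dinucleotide_shuffle(seq, random.Random(0))
print(Counter(zip(seq, seq[1:])) == Counter(zip(shuffled, shuffled[1:])))  # True
```

The tree condition guarantees that the walk never leaves a letter's last exit edge unused while other edges remain. As a result, every dinucleotide of the input is used exactly once, and the output starts and ends with the same letters as the input.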

preprocessing.one_hot_to_seq(one_hot)

Convert a one-hot encoded DNA matrix back to a nucleotide sequence.

preprocessing.random_seq(seq_len[, bases])

Generate a random nucleotide sequence of a specified length.

preprocessing.seq_to_one_hot(seq)

One-hot encode a DNA sequence, handling unknown bases.
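One-hot encoding and decoding are inverse mappings between a sequence of length L and an (L, 4) matrix. The sketch below assumes A, C, G, T column order, maps unknown bases such as `N` to all-zero rows, and decodes all-zero rows back to `N`. Both conventions are assumptions: some tools use 0.25 in every column for `N` instead. `encode_dna` and `decode_dna` are hypothetical stand-ins for `seq_to_one_hot` and `one_hot_to_seq`.

```python
import numpy as np

ALPHABET = "ACGT"

def encode_dna(seq):
    """Return an (L, 4) float32 array in A, C, G, T column order.

    Unknown characters such as N become all-zero rows. (Hypothetical helper.)
    """
    codes = np.array([ALPHABET.find(b.upper()) for b in seq], dtype=np.int64)
    one_hot = np.zeros((len(seq), 4), dtype=np.float32)
    known = np.flatnonzero(codes >= 0)
    one_hot[known, codes[known]] = 1.0
    return one_hot

def decode_dna(one_hot):
    """Invert encode_dna; rows that contain no 1 decode to N. (Hypothetical helper.)"""
    one_hot = np.asarray(one_hot)
    letters = np.array(list(ALPHABET))[one_hot.argmax(axis=1)]
    letters[one_hot.max(axis=1) == 0] = "N"
    return "".join(letters.tolist())

x = encode_dna("ACgtN")
print(x.shape)        # (5, 4)
print(decode_dna(x))  # ACGTN
```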

preprocessing.update_sequence_with_variants(...)

Update reference DNA sequences with genomic variants based on genotype information.
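Building personalized haplotype sequences means substituting alternate alleles into the reference sequence, one copy per haplotype, according to the genotype. The sketch below handles only phased single-nucleotide variants. It converts 1-based VCF positions to offsets within a peak whose first base sits at 0-based `peak_start`, and it checks that the reference allele matches. `apply_snvs` and its `(pos, ref, alt, genotype)` tuple layout are hypothetical. `update_sequence_with_variants` may also handle indels, unphased genotypes, or multi-allelic sites, all of which this sketch skips.

```python
def apply_snvs(ref_seq, peak_start, variants):
    """Build the two haplotype sequences of a peak from phased SNVs (hypothetical helper).

    ref_seq: reference sequence of the peak; its first base is at 0-based position peak_start.
    variants: iterable of (pos, ref, alt, genotype) with 1-based VCF positions and
    phased genotypes such as "0|1". Non-SNVs and variants outside the peak are skipped;
    only allele "1" is applied, so unphased or multi-allelic genotypes leave the sequence unchanged.
    """
    haplotypes = [list(ref_seq), list(ref_seq)]
    for pos, ref, alt, genotype in variants:
        offset = pos - 1 - peak_start  # 1-based VCF position -> 0-based offset in the peak
        if len(ref) != 1 or len(alt) != 1 or not 0 <= offset < len(ref_seq):
            continue
        if ref_seq[offset].upper() != ref.upper():
            raise ValueError(f"reference mismatch at {pos}: {ref_seq[offset]} != {ref}")
        for h, allele in enumerate(genotype.split("|")[:2]):
            if allele == "1":
                haplotypes[h][offset] = alt
    return ["".join(h) for h in haplotypes]

ref = "ACGTACGTAC"  # peak whose first base is at 0-based position 1000
variants = [(1003, "G", "T", "0|1"), (1005, "A", "C", "1|1"), (2000, "A", "G", "1|0")]
print(apply_snvs(ref, 1000, variants))  # ['ACGTCCGTAC', 'ACTTCCGTAC']
```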