Preprocessing: pp#

ATAC-seq matrix#

`preprocessing.binarize`(data[, atac_mod, layer])	Binarize the data matrix in an AnnData or MuData object.

ATAC-seq fragment#

`preprocessing.calculate_depth`(chrom_size, ...)	Calculate genome depth for a given chromosome.
`preprocessing.collapse_consecutive_values`(X)	Collapse consecutive identical values in an array.
`preprocessing.fragments_to_coverage`(...[, ...])	Convert fragment data to genome coverage signal.
`preprocessing.fragment_to_bigwig`(...[, ...])	Convert fragment file to BigWig format.
`preprocessing.split_fragments`(...)	Split a fragment file into multiple group-specific fragment files based on cell barcodes.
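
The idea behind turning fragments into a coverage signal and then collapsing consecutive identical values can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the helper names `coverage_from_fragments` and `collapse_runs`, the half-open `(start, end)` fragment convention, and the `(start, end, value)` output layout are all assumptions made for the example.

```python
from itertools import groupby


def coverage_from_fragments(fragments, chrom_size):
    """Per-base depth for one chromosome (hypothetical helper).

    Assumes fragments are half-open (start, end) intervals, 0-based.
    """
    # Difference array: +1 where a fragment starts, -1 where it ends.
    diff = [0] * (chrom_size + 1)
    for start, end in fragments:
        start = max(0, start)
        end = min(chrom_size, end)
        if start >= end:
            continue  # skip empty or out-of-range fragments
        diff[start] += 1
        diff[end] -= 1

    # A running sum over the difference array gives the depth per base.
    depth, running = [], 0
    for delta in diff[:chrom_size]:
        running += delta
        depth.append(running)
    return depth


def collapse_runs(values):
    """Collapse consecutive identical values into (start, end, value) runs."""
    runs, pos = [], 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        runs.append((pos, pos + length, value))
        pos += length
    return runs


fragments = [(0, 4), (2, 6), (8, 10)]
depth = coverage_from_fragments(fragments, chrom_size=10)
runs = collapse_runs(depth)
print(depth)  # [1, 1, 2, 2, 1, 1, 0, 0, 1, 1]
print(runs)   # [(0, 2, 1), (2, 4, 2), (4, 6, 1), (6, 8, 0), (8, 10, 1)]
```

Collapsed runs of this kind map naturally onto interval-based signal formats such as bedGraph or BigWig, which store one value per interval instead of one value per base.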

Gene#

`preprocessing.get_gene_tss_coord`(gene_gtf[, ...])	Extract transcription start site (TSS) coordinates for genes from a GTF file.
`preprocessing.add_gene_tss_coord`(mdata, gene_gtf)	Add the TSS coordinates of genes to mdata[mod_names].uns.
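
As a rough sketch of what extracting TSS coordinates from a GTF file involves: GTF coordinates are 1-based and inclusive, and the TSS of a gene is its start on the `+` strand and its end on the `-` strand. The function name `tss_from_gtf_lines`, the use of the `gene_name` attribute as the key, and the returned dictionary layout are assumptions for illustration; the library function may use different fields and return types.

```python
import re


def tss_from_gtf_lines(lines):
    """Map gene name -> (chromosome, TSS position) from GTF lines (hypothetical helper)."""
    tss = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; keep only gene features.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Assumption: genes are identified by the gene_name attribute.
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match is None:
            continue
        # TSS is the 5' end of the gene: start on '+', end on '-'.
        tss[match.group(1)] = (chrom, start if strand == "+" else end)
    return tss


example_lines = [
    "#!genome-build example",
    "\t".join(["chr1", "src", "gene", "1000", "5000", ".", "+", ".",
               'gene_id "G1"; gene_name "GENE_A";']),
    "\t".join(["chr1", "src", "exon", "1000", "1200", ".", "+", ".",
               'gene_id "G1"; gene_name "GENE_A";']),
    "\t".join(["chr1", "src", "gene", "7000", "9000", ".", "-", ".",
               'gene_id "G2"; gene_name "GENE_B";']),
]
gene_tss = tss_from_gtf_lines(example_lines)
print(gene_tss)  # {'GENE_A': ('chr1', 1000), 'GENE_B': ('chr1', 9000)}
```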

Motif#

`preprocessing.get_motifs_from_jaspar`([...])	Fetch transcription factor motifs from the JASPAR database.
`preprocessing.filter_motifs_by_genes`(motifs, ...)	Filter motifs by matching their names to expressed gene names in the RNA modality.
`preprocessing.match_motif`(mdata, motifs[, ...])	Match transcription factor motifs to accessible DNA sequences and link them with expressed genes.
`preprocessing.tf_to_gene`(mdata[, rna_mod, ...])	Link transcription factors (TFs) to target genes based on TF binding sites.

Peaks#

`preprocessing.add_peaks`(mdata[, mod_name, ...])	Add peak metadata to an ATAC-seq modality in a MuData object.
`preprocessing.peak_to_gene`(mdata[, rna_mod, ...])	Link peaks to genes based on proximity to transcription start sites (TSS).
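
Proximity-based linking of peaks to genes can be illustrated with a small standalone sketch. It is not the library's algorithm: the function name `link_peaks_to_genes`, the use of the peak midpoint, the `max_distance` parameter, and its default value are all assumptions chosen for the example.

```python
def link_peaks_to_genes(peaks, tss, max_distance=100_000):
    """Link each peak to genes whose TSS lies within max_distance bp (hypothetical helper).

    peaks: list of (chrom, start, end)
    tss:   dict of gene name -> (chrom, TSS position)
    """
    links = []
    for chrom, start, end in peaks:
        center = (start + end) // 2  # assumption: distance is measured from the peak midpoint
        for gene, (gene_chrom, gene_pos) in tss.items():
            distance = abs(center - gene_pos)
            if gene_chrom == chrom and distance <= max_distance:
                links.append((f"{chrom}:{start}-{end}", gene, distance))
    return links


peaks = [("chr1", 900, 1100), ("chr1", 500000, 500200), ("chr2", 900, 1100)]
tss = {"GENE_A": ("chr1", 1000), "GENE_B": ("chr1", 9000)}
links = link_peaks_to_genes(peaks, tss, max_distance=10_000)
print(links)
# [('chr1:900-1100', 'GENE_A', 0), ('chr1:900-1100', 'GENE_B', 8000)]
```

The nested loop keeps the sketch short; for genome-scale inputs, sorting TSS positions per chromosome or using an interval index would avoid comparing every peak with every gene.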

Sequences#

`preprocessing.add_dna_sequence`(mdata, ref_fasta)	Add sequences to peak metadata in a MuData object.
`preprocessing.add_variants_to_sequence`(mdata)	Add genomic variants to DNA sequences from peak regions to generate personalized haplotype sequences.
`preprocessing.dinucleotide_shuffle_one_hot`(one_hot)	Shuffle a one-hot encoded DNA sequence while preserving its dinucleotide composition.
`preprocessing.dinucleotide_shuffle_str`(seq)	Shuffle a DNA sequence while preserving its dinucleotide composition.
`preprocessing.one_hot_to_seq`(one_hot)	Convert a one-hot encoded DNA matrix back to a nucleotide sequence.
`preprocessing.random_seq`(seq_len[, bases])	Generate a random nucleotide sequence of a specified length.
`preprocessing.seq_to_one_hot`(seq)	One-hot encode a DNA sequence while handling unknown bases.
`preprocessing.update_sequence_with_variants`(...)	Update reference DNA sequences with genomic variants based on genotype information.
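
To make the one-hot round trip concrete, here is a minimal standalone sketch of encoding and decoding a DNA sequence. It does not call the library: the names `seq_to_one_hot_sketch` and `one_hot_to_seq_sketch`, the `ACGT` column order, and the choice to encode unknown bases as an all-zero row (decoded back as `N`) are assumptions, and the library may handle unknown bases differently.

```python
BASES = "ACGT"  # assumption: column order of the one-hot matrix


def seq_to_one_hot_sketch(seq):
    """Encode a DNA string as a list of 4-element rows (hypothetical helper).

    Unknown bases such as 'N' become an all-zero row.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0] * len(BASES)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def one_hot_to_seq_sketch(one_hot):
    """Decode rows back to a DNA string; all-zero rows become 'N'."""
    return "".join(BASES[row.index(1)] if 1 in row else "N" for row in one_hot)


encoded = seq_to_one_hot_sketch("acgtN")
decoded = one_hot_to_seq_sketch(encoded)
print(encoded)  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
print(decoded)  # ACGTN
```

Note that the round trip normalizes case and maps every unknown symbol to `N`, so it is lossless only for uppercase sequences over `A`, `C`, `G`, `T`, and `N`.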