cell2net.preprocessing.dinucleotide_shuffle_one_hot
- cell2net.preprocessing.dinucleotide_shuffle_one_hot(one_hot)
Shuffle a one-hot encoded DNA sequence while preserving its dinucleotide composition.
This function converts a one-hot encoded DNA sequence into its nucleotide representation, shuffles it while maintaining the same dinucleotide composition, and then converts the shuffled sequence back into one-hot encoding.
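The conversion between one-hot rows and nucleotides described above can be sketched as follows. This is an illustration only, not the library's code: the helper names `one_hot_to_sequence` and `sequence_to_one_hot` are hypothetical, and the column order A, C, G, T is assumed from the parameter description.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed column order: A, C, G, T


def one_hot_to_sequence(one_hot):
    """Decode an (L, 4) one-hot array into a nucleotide string (hypothetical helper)."""
    one_hot = np.asarray(one_hot)
    if one_hot.ndim != 2 or one_hot.shape[1] != 4:
        raise ValueError("expected an array of shape (L, 4)")
    # Every row must hold exactly one 1 and three 0s.
    is_binary = ((one_hot == 0) | (one_hot == 1)).all(axis=1)
    if not (is_binary & (one_hot.sum(axis=1) == 1)).all():
        raise ValueError("every row must contain exactly one 1 and three 0s")
    return "".join(ALPHABET[i] for i in one_hot.argmax(axis=1))


def sequence_to_one_hot(sequence):
    """Encode a nucleotide string as an (L, 4) float array (hypothetical helper)."""
    indices = np.array([ALPHABET.index(base) for base in sequence], dtype=int)
    return np.eye(4)[indices]


one_hot = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
sequence = one_hot_to_sequence(one_hot)      # decode to a string
round_trip = sequence_to_one_hot(sequence)   # encode back to one-hot
```

Decoding and re-encoding without a shuffle in between should reproduce the input, which makes the round trip a convenient sanity check.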
- Parameters:
one_hot (ndarray) – A 2D array of shape (L, 4), where L is the sequence length and each row is a one-hot encoded nucleotide. Each row should contain exactly one 1 and three 0s; the four columns correspond to the nucleotides “A”, “C”, “G”, and “T”, in that order.
- Return type:
ndarray
- Returns:
A 2D array of shape (L, 4) representing the shuffled sequence in one-hot encoding. The dinucleotide composition of the original sequence is preserved.
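One way to check that dinucleotide composition is preserved is to count overlapping dinucleotides in two one-hot arrays and compare the counts. The sketch below is a hedged illustration: `dinucleotide_counts` is a hypothetical helper, not part of cell2net, and it assumes the column order A, C, G, T.

```python
from collections import Counter

import numpy as np


def dinucleotide_counts(one_hot, alphabet="ACGT"):
    """Count overlapping dinucleotides in an (L, 4) one-hot array (hypothetical helper)."""
    indices = np.asarray(one_hot).argmax(axis=1)
    sequence = "".join(alphabet[i] for i in indices)
    return Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))


eye = np.eye(4)
original = eye[[0, 1, 0, 2, 0]]    # ACAGA
rearranged = eye[[0, 2, 0, 1, 0]]  # AGACA: same dinucleotides, different order
mono_only = eye[[0, 0, 1, 2, 0]]   # AACGA: same nucleotides, different dinucleotides

same = dinucleotide_counts(original) == dinucleotide_counts(rearranged)
different = dinucleotide_counts(original) != dinucleotide_counts(mono_only)
```

The same comparison can be applied to the input and output of the shuffle function to confirm that only the order of dinucleotides has changed.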
Notes
The function assumes the input is a valid one-hot encoding. Behavior is undefined if any row does not contain exactly one 1 and three 0s.
Shuffling is performed on the nucleotide sequence derived from the one-hot input, and the shuffled sequence is converted back to one-hot encoding.
The function uses the dinucleotide_shuffle helper function to handle the shuffling of the nucleotide sequence.
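The source of the `dinucleotide_shuffle` helper is not shown here. As a rough illustration of how a dinucleotide-preserving shuffle can work, the sketch below permutes, for each nucleotide, the order of the positions that follow it, while keeping each nucleotide's final successor in place so that the walk through the sequence always uses every transition. The function name `dinucleotide_shuffle_sketch` is hypothetical, this is not the library's implementation, and this simple scheme does not sample all valid shuffles uniformly.

```python
import random
from collections import Counter


def dinucleotide_shuffle_sketch(seq, rng=None):
    """Return a shuffled copy of seq with the same dinucleotide counts (illustrative only)."""
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq
    # For each nucleotide, list the positions that directly follow its occurrences.
    successors = {}
    for i in range(len(seq) - 1):
        successors.setdefault(seq[i], []).append(i + 1)
    # Shuffle every successor list except its last entry; keeping the last
    # entry fixed guarantees that the walk below consumes every transition.
    for base, positions in successors.items():
        head = positions[:-1]
        rng.shuffle(head)
        successors[base] = head + positions[-1:]
    # Walk from the first position, always taking the next unused successor.
    used = Counter()
    pos = 0
    out = [seq[0]]
    for _ in range(len(seq) - 1):
        base = seq[pos]
        pos = successors[base][used[base]]
        used[base] += 1
        out.append(seq[pos])
    return "".join(out)


def pair_counts(seq):
    """Count overlapping dinucleotides in a string."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


original = "ACGTTGCAACGT"
shuffled = dinucleotide_shuffle_sketch(original, random.Random(0))
```

The result keeps the first and last nucleotides of the input, and its dinucleotide counts match the original, which can be confirmed by comparing `pair_counts(original)` with `pair_counts(shuffled)`.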
Examples
>>> import numpy as np
>>> import cell2net as cn
>>> import random
>>> random.seed(42)
>>> one_hot_sequence = np.array([
...     [1, 0, 0, 0],  # A
...     [0, 1, 0, 0],  # C
...     [0, 0, 1, 0],  # G
...     [0, 0, 0, 1]   # T
... ])
>>> shuffled_one_hot = cn.pp.dinucleotide_shuffle_one_hot(one_hot_sequence)
>>> shuffled_one_hot
array([[0., 1., 0., 0.],  # "C"
       [1., 0., 0., 0.],  # "A"
       [0., 0., 0., 1.],  # "T"
       [0., 0., 1., 0.]]) # "G"