cell2net.preprocessing.dinucleotide_shuffle_str

cell2net.preprocessing.dinucleotide_shuffle_str(seq, random_state=42)

Shuffle a DNA sequence while preserving its dinucleotide composition.

The function splits the input DNA sequence into overlapping dinucleotides (for “ATCG”: AT, TC and CG), shuffles them, and reassembles a sequence that has the same dinucleotide composition in a randomized order.
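A common way to shuffle a sequence while keeping its overlapping dinucleotide counts exactly is the Altschul–Erickson algorithm. It treats each dinucleotide "ab" as a directed edge a → b and walks a random Eulerian path through the resulting graph. The sketch below uses that technique and is not the cell2net source. The function name `euler_dinucleotide_shuffle` is hypothetical, and unlike a plain shuffle it keeps the first and last nucleotides fixed.

```python
import random
from collections import defaultdict


def _exits_form_tree(edges, last_exit, last):
    """Return True if following every vertex's chosen exit edge leads to `last`."""
    for start in last_exit:
        node, steps = start, 0
        while node != last:
            node = edges[node][last_exit[node]]
            steps += 1
            if steps > len(last_exit):  # revisited a vertex: the exits contain a cycle
                return False
    return True


def euler_dinucleotide_shuffle(seq: str, random_state: int = 42) -> str:
    """Hypothetical Altschul-Erickson shuffle preserving overlapping dinucleotide counts."""
    if len(seq) < 2:
        return seq
    rng = random.Random(random_state)  # private RNG, so the global random state is untouched

    # One directed edge a -> b per overlapping dinucleotide "ab".
    edges = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    edges = dict(edges)
    first, last = seq[0], seq[-1]

    # Pick a random "last exit" edge for every vertex except `last`; keep only
    # choices that form a tree pointing at `last`, so the walk cannot strand edges.
    while True:
        last_exit = {v: rng.randrange(len(succ)) for v, succ in edges.items() if v != last}
        if _exits_form_tree(edges, last_exit, last):
            break

    # Shuffle each vertex's remaining edges and put its last-exit edge at the end.
    ordered = {}
    for v, succ in edges.items():
        succ = list(succ)
        exit_edge = succ.pop(last_exit[v]) if v in last_exit else None
        rng.shuffle(succ)
        if exit_edge is not None:
            succ.append(exit_edge)
        ordered[v] = succ

    # Walk the Eulerian path from `first`, consuming each vertex's edges in order.
    out, used, node = [first], defaultdict(int), first
    for _ in range(len(seq) - 1):
        node_next = ordered[node][used[node]]
        used[node] += 1
        out.append(node_next)
        node = node_next
    return "".join(out)
```

Because every vertex's exit edge is consumed last, the walk cannot get stuck before all dinucleotides are used. A sequence whose overlapping dinucleotides are all distinct, such as “ATCG”, has exactly one arrangement with that composition, so this sketch returns it unchanged.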

Parameters:
  • seq (str) – The DNA sequence to shuffle. Must be a string of nucleotides (e.g., “ATCG”). Sequences with fewer than 2 characters are returned unchanged.

  • random_state (int (default: 42)) – Seed for the random number generator, used to make the shuffle reproducible.

Return type:

str

Returns:

A shuffled version of the input sequence with the same dinucleotide composition. If the input sequence has fewer than 2 characters, it is returned as is.

Notes

  • The function ensures that the dinucleotide composition of the shuffled sequence matches that of the input sequence, but the overall sequence order is randomized.

  • Randomization is achieved using the random.shuffle function.
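To check the composition property stated in the Notes, one can count the overlapping dinucleotides of the input and the output and compare the tallies. The helpers below are a hypothetical sketch, not part of cell2net, and they assume that “dinucleotide composition” means counts of overlapping dinucleotides as described above.

```python
from collections import Counter


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides, e.g. "ATCG" -> AT, TC, CG."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def same_dinucleotide_composition(a: str, b: str) -> bool:
    """True if both sequences contain the same overlapping dinucleotides with the same counts."""
    return dinucleotide_counts(a) == dinucleotide_counts(b)


print(same_dinucleotide_composition("AACA", "ACAA"))  # True: both contain AA, AC, CA
print(same_dinucleotide_composition("ACGT", "AGCT"))  # False: same nucleotides, different dinucleotides
```

The second call shows why checking single-nucleotide counts is not enough. Both sequences contain one each of A, C, G and T, but their dinucleotides differ.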

Examples

>>> import cell2net as cn
>>> cn.pp.dinucleotide_shuffle_str("ATCG")
'TACG'
>>> cn.pp.dinucleotide_shuffle_str("A")
'A'
>>> cn.pp.dinucleotide_shuffle_str("")
''