cell2net.preprocessing.split_fragments

cell2net.preprocessing.split_fragments(fragment_files, cell_barcodes, groups, out_dir)

Splits one or more fragment files into group-specific fragment files based on cell barcodes.

This function reads the input fragment files, assigns each fragment to a group by its cell barcode, and writes each group's fragments to a separate file. The output files are compressed with bgzip and indexed with tabix.

Parameters:
  • fragment_files (str | list[str]) –

    Path to the input fragment file, or a list of such paths. Each file can be plain text or gzip-compressed (.gz) and should have the following format, with one fragment per row:

    chr1    10012    10013    TTTGCGACACCCACAG-1    1
    chr1    10066    10198    ACGAATCTCATTTGCT-1    1
    chr1    10066    10478    TCAAGAACAGTAATAG-1    1

  • cell_barcodes (list[str]) – A list of cell barcodes corresponding to the fragments.

  • groups (list[str]) – A list of group names, one per cell barcode. Groups can represent cell types, cell states, or conditions. Must have the same length as cell_barcodes.

  • out_dir (str) – Path to the output directory where the group-specific fragment files will be saved.
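Because cell_barcodes and groups are parallel lists, it can help to derive both from a single mapping so they cannot drift out of sync. The sketch below assumes a hypothetical barcode-to-label dictionary (the labels "B_cell" and "T_cell" and the file paths are illustrative, not from the library); the final call is shown as a comment and follows the signature documented above.

```python
# Hypothetical annotation: cell barcode -> group label (illustrative values).
annotation = {
    "TTTGCGACACCCACAG-1": "B_cell",
    "ACGAATCTCATTTGCT-1": "T_cell",
    "TCAAGAACAGTAATAG-1": "T_cell",
}

# Build the two parallel lists from the same mapping so that
# groups[i] always describes cell_barcodes[i].
cell_barcodes = list(annotation)
groups = [annotation[barcode] for barcode in cell_barcodes]

# The documentation requires both lists to have the same length.
if len(cell_barcodes) != len(groups):
    raise ValueError("cell_barcodes and groups must have the same length")

# With cell2net installed, the call would then look like this
# (paths are placeholders):
#
# import cell2net
# cell2net.preprocessing.split_fragments(
#     fragment_files="fragments.tsv.gz",
#     cell_barcodes=cell_barcodes,
#     groups=groups,
#     out_dir="fragments_by_group",
# )
```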

Return type:

None

Returns:

Nothing is returned. The output is written to group-specific fragment files in out_dir.

Notes

  • For each unique group in groups, a compressed and indexed fragment file is created in the output directory.

  • The files are named as <group>.fragments.tsv.gz.
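The grouping step described above can be illustrated with a minimal standard-library sketch. This is not the library's implementation: the function name split_fragments_sketch is hypothetical, it writes with Python's gzip module instead of bgzip, and it skips tabix indexing entirely. It also assumes that the barcode is the fourth tab-separated column, that lines starting with "#" are header comments, and that fragments whose barcode is not listed are dropped.

```python
import gzip
import os
import tempfile
from contextlib import ExitStack


def open_text(path):
    """Open a plain-text or gzip-compressed (.gz) file for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def split_fragments_sketch(fragment_files, cell_barcodes, groups, out_dir):
    """Hypothetical sketch of the grouping step only (no bgzip, no tabix)."""
    if isinstance(fragment_files, str):
        fragment_files = [fragment_files]
    if len(cell_barcodes) != len(groups):
        raise ValueError("cell_barcodes and groups must have the same length")

    barcode_to_group = dict(zip(cell_barcodes, groups))
    os.makedirs(out_dir, exist_ok=True)

    with ExitStack() as stack:
        # One output handle per unique group, named <group>.fragments.tsv.gz.
        writers = {
            group: stack.enter_context(
                gzip.open(os.path.join(out_dir, f"{group}.fragments.tsv.gz"), "wt")
            )
            for group in sorted(set(groups))
        }
        for path in fragment_files:
            with open_text(path) as handle:
                for line in handle:
                    if line.startswith("#"):
                        continue  # assumed header/comment line
                    fields = line.rstrip("\n").split("\t")
                    if len(fields) < 4:
                        continue  # not a complete fragment record
                    # Assumption: fragments with unlisted barcodes are dropped.
                    group = barcode_to_group.get(fields[3])
                    if group is not None:
                        writers[group].write("\t".join(fields) + "\n")


# Usage on the three example fragments from the parameter description.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "fragments.tsv")
    with open(source, "w") as handle:
        handle.write("chr1\t10012\t10013\tTTTGCGACACCCACAG-1\t1\n")
        handle.write("chr1\t10066\t10198\tACGAATCTCATTTGCT-1\t1\n")
        handle.write("chr1\t10066\t10478\tTCAAGAACAGTAATAG-1\t1\n")

    out_dir = os.path.join(tmp, "by_group")
    split_fragments_sketch(
        source,
        ["TTTGCGACACCCACAG-1", "ACGAATCTCATTTGCT-1", "TCAAGAACAGTAATAG-1"],
        ["B_cell", "T_cell", "T_cell"],  # illustrative group labels
        out_dir,
    )

    # Count the fragments written to each group-specific file.
    demo_counts = {}
    for name in sorted(os.listdir(out_dir)):
        with gzip.open(os.path.join(out_dir, name), "rt") as handle:
            demo_counts[name] = sum(1 for _ in handle)
```

Keeping one open writer per group means each input file is read only once, at the cost of one open file handle per group.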