cell2net.preprocessing.split_fragments
- cell2net.preprocessing.split_fragments(fragment_files, cell_barcodes, groups, out_dir)
Splits one or more fragment files into group-specific fragment files based on cell barcodes.
This function reads each input fragment file, assigns each fragment to a group based on its cell barcode, and writes the fragments of each group to a separate file. The output files are compressed with bgzip and indexed with tabix.
- Parameters:
fragment_files (str | list[str]) – Path or list of paths to the input fragment files. Each file can be plain text or gzip-compressed (.gz) and should have the following format, one fragment per line:
    chr1    10012    10013    TTTGCGACACCCACAG-1    1
    chr1    10066    10198    ACGAATCTCATTTGCT-1    1
    chr1    10066    10478    TCAAGAACAGTAATAG-1    1
cell_barcodes (list[str]) – A list of cell barcodes corresponding to the fragments.
groups (list[str]) – A list of group names, one for each cell barcode. These can represent cell types, cell states, or different conditions. Must have the same length as cell_barcodes.
out_dir (str) – Path to the output directory where the group-specific fragment files will be saved.
- Return type:
- Returns:
The output is written to group-specific fragment files in out_dir.
Notes
For each unique group in groups, a compressed and indexed fragment file is created in the output directory.
The files are named as <group>.fragments.tsv.gz.
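The splitting logic described above can be illustrated with a small, standard-library-only sketch. It is not the cell2net implementation: the helper name split_fragments_sketch is hypothetical, it takes a single tab-separated input file, it assumes the barcode is the fourth column, it holds the selected lines in memory, and it writes ordinary gzip files rather than bgzip-compressed, tabix-indexed ones.

```python
import gzip
import os


def split_fragments_sketch(fragment_file, cell_barcodes, groups, out_dir):
    """Illustrative stand-in for the behaviour described above (hypothetical helper)."""
    if len(cell_barcodes) != len(groups):
        raise ValueError("cell_barcodes and groups must have the same length")

    # Map each barcode to its group; barcodes not listed are skipped.
    barcode_to_group = dict(zip(cell_barcodes, groups))
    # One bucket per unique group, in first-seen order.
    lines_by_group = {group: [] for group in dict.fromkeys(groups)}

    # Read the input as text whether or not it is gzip-compressed.
    opener = gzip.open if fragment_file.endswith(".gz") else open
    with opener(fragment_file, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#") or len(fields) < 4:
                continue  # skip header comments and malformed lines
            group = barcode_to_group.get(fields[3])  # assumed barcode column
            if group is not None:
                lines_by_group[group].append(line)

    # Write one <group>.fragments.tsv.gz per unique group (plain gzip, no index).
    os.makedirs(out_dir, exist_ok=True)
    out_paths = {}
    for group, lines in lines_by_group.items():
        path = os.path.join(out_dir, f"{group}.fragments.tsv.gz")
        with gzip.open(path, "wt") as out:
            out.writelines(lines)
        out_paths[group] = path
    return out_paths
```

In this sketch, fragments whose barcode is not in cell_barcodes are dropped, and a group with no matching fragments still gets an (empty) output file. Whether cell2net behaves the same way in those two cases is not stated in this page.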