bean qc¶
bean qc
: QC of reporter screen data¶
bean qc \
my_sorting_screen.h5ad `# Input ReporterScreen .h5ad file path` \
-o my_sorting_screen_masked.h5ad `# Output ReporterScreen .h5ad file path` \
-r qc_report_my_sorting_screen `# Prefix for QC report` \
--control-condition presort `# "condition" column in the control sample before selection. Mean gRNA editing rates in these samples are reported. ` \
# Inspect the output qc_report_my_sorting_screen.html to tweak QC threshold
bean qc \
my_sorting_screen.h5ad \
-o my_sorting_screen_masked.h5ad \
-r qc_report_my_sorting_screen \
--control-condition presort \
#[--count-correlation-thres 0.7 ...]\
-b # Removes the failing replicates without enough number of good quality samples.
bean qc
supports following quality control and masks samples with low quality. Specifically:
Plots guide coverage and the uniformity of coverage
Guide count correlation between samples
Log fold change correlation when positive controls are provided
Plots editing rate distribution
Identify samples with low guide coverage/guide count correlation/editing rate and mask the sample in
bdata.samples.mask
Identify outlier guides to filter out
Output¶
Above command produces
my_sorting_screen_masked.h5ad
without problematic replicate and guides and with sample masks, andqc_report_my_sorting_screen.[html,ipynb]
as QC report.
Full parameters¶
usage: bean qc [-h] [--count-correlation-thres COUNT_CORRELATION_THRES]
[--edit-rate-thres EDIT_RATE_THRES] [--lfc-thres LFC_THRES]
[-o OUT_SCREEN_PATH] [-r OUT_REPORT_PREFIX] [-b] [-i]
[--no-editing] [--dont-recalculate-edits] [--tiling TILING]
[--replicate-col REPLICATE_COL]
[--sample-covariates SAMPLE_COVARIATES]
[--condition-col CONDITION_COL]
[--target-pos-col TARGET_POS_COL] [--rel-pos-is-reporter]
[--edit-start-pos EDIT_START_POS] [--edit-end-pos EDIT_END_POS]
[--posctrl-col POSCTRL_COL] [--posctrl-val POSCTRL_VAL]
[--lfc-conds LFC_CONDS] [--control-condition CONTROL_CONDITION]
[--reporter-length REPORTER_LENGTH]
[--reporter-right-flank-length REPORTER_RIGHT_FLANK_LENGTH]
bdata_path
Positional Arguments¶
- bdata_path
Path to the ReporterScreen object to run QC on
Named Arguments¶
- -o, --out-screen-path
Path where quality-filtered ReporterScreen object to be written to
- -r, --out-report-prefix
Output prefix of qc report (prefix.html, prefix.ipynb)
- --reporter-length
Length of reporter sequence in the construct.
- --reporter-right-flank-length
Length of the right-flanking nucleotides of protospacer in the reporter.
QC thresholds¶
- --count-correlation-thres
Correlation threshold to mask out.
Default:
0.7
- --edit-rate-thres
Mean editing rate threshold per sample to mask out.
Default:
0.1
- --lfc-thres
Positive guides’ correlation threshold to filter out.
Default:
-0.1
Run options¶
- -b, --remove-bad-replicates
Remove replicates with at least two of its samples meet the QC threshold.
Default:
False
- -i, --ignore-missing-samples
If the flag is not provided, if the ReporterScreen object does not contain all condiitons for each replicate, make fake empty samples. If the flag is provided, don’t add dummy samples.
Default:
False
- --no-editing
Ignore QC about editing. Can be used for QC of other editing modalities.
Default:
False
- --dont-recalculate-edits
When ReporterScreen.layers[‘edit_count’] exists, do not recalculate the edit counts from ReporterScreen.uns[‘allele_count’].
Default:
False
Input .h5ad formatting¶
- --tiling
Specify that the guide library is tiling library without ‘n guides per target’ design
- --replicate-col
Label of column in bdata.samples that describes replicate ID.
Default:
'replicate'
- --sample-covariates
Comma-separated list of column names in bdata.samples that describes non-selective experimental condition. (drug treatment, etc.)
- --condition-col
Label of column in bdata.samples that describes experimental condition. (sorting bin, time, etc.)
Default:
'condition'
- --target-pos-col
Target position column in bdata.guides specifying target edit position in reporter
Default:
'target_pos'
- --rel-pos-is-reporter
Specifies whether edit_start_pos and edit_end_pos are relative to reporter position. If False, those are relative to spacer position.
Default:
False
- --edit-start-pos
Edit start position to quantify editing rate on, 0-based inclusive.
Default:
2
- --edit-end-pos
Edit end position to quantify editing rate on, 0-based exclusive.
Default:
7
- --posctrl-col
Column name in ReporterScreen.guides DataFrame that specifies guide category. To use all gRNAs, feed empty string ‘’.
Default:
'target_group'
- --posctrl-val
Value in ReporterScreen.guides[posctrl_col] that specifies guide will be used as the positive control in calculating log fold change.
Default:
'PosCtrl'
- --lfc-conds
Values in of column in ReporterScreen.samples[condition_label] for LFC will be calculated between, delimited by comma
Default:
'top,bot'
- --control-condition
Values in of column in ReporterScreen.samples[condition_label] for guide-level editing rate to be calculated
Default:
'bulk'