CRISPR Reporter Screen analysis with BEAN¶
In this tutorial, we will cover * Obtaining target variant editing rate from allele count information * ReporterScreen object slicing (indexing) * Allele filtering based on significance
For the basic API of ReporterScreen object, see perturb-tools tutorial of Screen API.
! pip install -q crispr-bean
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting beans==0.1.2
Downloading beans-0.1.2.tar.gz (464 kB)
[K |████████████████████████████████| 464 kB 4.1 MB/s
[?25h Installing build dependencies ... [?25l[?25hdone
Getting requirements to build wheel ... [?25l[?25hdone
Preparing wheel metadata ... [?25l[?25hdone
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from beans==0.1.2) (1.21.6)
Collecting perturb-tools>=0.0.16
Downloading perturb-tools-0.1.4.tar.gz (147 kB)
[K |████████████████████████████████| 147 kB 51.2 MB/s
[?25hCollecting matplotlib>=3.4
Downloading matplotlib-3.5.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
[K |████████████████████████████████| 11.2 MB 51.2 MB/s
[?25hCollecting anndata>=0.7.1
Downloading anndata-0.8.0-py3-none-any.whl (96 kB)
[K |████████████████████████████████| 96 kB 4.9 MB/s
[?25hRequirement already satisfied: pandas>=1.1.2 in /usr/local/lib/python3.7/dist-packages (from perturb-tools>=0.0.16->beans==0.1.2) (1.3.5)
Collecting biopython>=1.79
Downloading biopython-1.79-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (2.3 MB)
[K |████████████████████████████████| 2.3 MB 57.1 MB/s
[?25hCollecting pool-sharq>=0.0.12
Downloading pool_sharq-0.0.12-py3-none-any.whl (7.8 kB)
Requirement already satisfied: plotly in /usr/local/lib/python3.7/dist-packages (from perturb-tools>=0.0.16->beans==0.1.2) (5.5.0)
Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from perturb-tools>=0.0.16->beans==0.1.2) (2022.6.2)
Requirement already satisfied: typing_extensions in /usr/local/lib/python3.7/dist-packages (from anndata>=0.7.1->perturb-tools>=0.0.16->beans==0.1.2) (4.1.1)
Requirement already satisfied: importlib_metadata>=0.7 in /usr/local/lib/python3.7/dist-packages (from anndata>=0.7.1->perturb-tools>=0.0.16->beans==0.1.2) (4.12.0)
Requirement already satisfied: packaging>=20 in /usr/local/lib/python3.7/dist-packages (from anndata>=0.7.1->perturb-tools>=0.0.16->beans==0.1.2) (21.3)
Requirement already satisfied: scipy>1.4 in /usr/local/lib/python3.7/dist-packages (from anndata>=0.7.1->perturb-tools>=0.0.16->beans==0.1.2) (1.7.3)
Requirement already satisfied: natsort in /usr/local/lib/python3.7/dist-packages (from anndata>=0.7.1->perturb-tools>=0.0.16->beans==0.1.2) (5.5.0)
Requirement already satisfied: h5py>=3 in /usr/local/lib/python3.7/dist-packages (from anndata>=0.7.1->perturb-tools>=0.0.16->beans==0.1.2) (3.1.0)
Requirement already satisfied: cached-property in /usr/local/lib/python3.7/dist-packages (from h5py>=3->anndata>=0.7.1->perturb-tools>=0.0.16->beans==0.1.2) (1.5.2)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib_metadata>=0.7->anndata>=0.7.1->perturb-tools>=0.0.16->beans==0.1.2) (3.8.1)
Collecting fonttools>=4.22.0
Downloading fonttools-4.37.3-py3-none-any.whl (959 kB)
[K |████████████████████████████████| 959 kB 44.2 MB/s
[?25hRequirement already satisfied: pyparsing>=2.2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.4->perturb-tools>=0.0.16->beans==0.1.2) (3.0.9)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.4->perturb-tools>=0.0.16->beans==0.1.2) (1.4.4)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.4->perturb-tools>=0.0.16->beans==0.1.2) (0.11.0)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.4->perturb-tools>=0.0.16->beans==0.1.2) (7.1.2)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.4->perturb-tools>=0.0.16->beans==0.1.2) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=1.1.2->perturb-tools>=0.0.16->beans==0.1.2) (2022.2.1)
Collecting licorice>=0.0.2
Downloading licorice-0.0.3-py3-none-any.whl (8.1 kB)
Collecting requests>=2.26.0
Downloading requests-2.28.1-py3-none-any.whl (62 kB)
[K |████████████████████████████████| 62 kB 1.4 MB/s
[?25hRequirement already satisfied: tqdm>=4.62.3 in /usr/local/lib/python3.7/dist-packages (from pool-sharq>=0.0.12->perturb-tools>=0.0.16->beans==0.1.2) (4.64.1)
Collecting pyrequisites>=0.0.2
Downloading pyrequisites-0.0.2-py3-none-any.whl (4.1 kB)
Collecting beautifulsoup4>=4.10.0
Downloading beautifulsoup4-4.11.1-py3-none-any.whl (128 kB)
[K |████████████████████████████████| 128 kB 66.4 MB/s
[?25hCollecting soupsieve>1.2
Downloading soupsieve-2.3.2.post1-py3-none-any.whl (37 kB)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7->matplotlib>=3.4->perturb-tools>=0.0.16->beans==0.1.2) (1.15.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26.0->pool-sharq>=0.0.12->perturb-tools>=0.0.16->beans==0.1.2) (2.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26.0->pool-sharq>=0.0.12->perturb-tools>=0.0.16->beans==0.1.2) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26.0->pool-sharq>=0.0.12->perturb-tools>=0.0.16->beans==0.1.2) (2022.6.15)
Requirement already satisfied: charset-normalizer<3,>=2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26.0->pool-sharq>=0.0.12->perturb-tools>=0.0.16->beans==0.1.2) (2.1.1)
Requirement already satisfied: tenacity>=6.2.0 in /usr/local/lib/python3.7/dist-packages (from plotly->perturb-tools>=0.0.16->beans==0.1.2) (8.0.1)
Building wheels for collected packages: beans, perturb-tools
Building wheel for beans (PEP 517) ... [?25l[?25hdone
Created wheel for beans: filename=beans-0.1.2-cp37-cp37m-linux_x86_64.whl size=506725 sha256=43eb6b2b41a112f7ea2d37a17e1e9bd91dffd4599710f3958769f84e59b3e6b8
Stored in directory: /root/.cache/pip/wheels/68/81/58/49b78bc024e2a5312dc830797a25c15c7acda4225b5b25d232
Building wheel for perturb-tools (setup.py) ... [?25l[?25hdone
Created wheel for perturb-tools: filename=perturb_tools-0.1.4-py3-none-any.whl size=54209 sha256=e383c22165c92cf7c22605aec1da8689f60d55d1c1ea530a925aa89e5ee6608d
Stored in directory: /root/.cache/pip/wheels/35/75/47/47a3d25b1958a24d7d3ee488b8956a9f43bda0bb0761b4d67c
Successfully built beans perturb-tools
Installing collected packages: soupsieve, licorice, requests, pyrequisites, fonttools, beautifulsoup4, pool-sharq, matplotlib, biopython, anndata, perturb-tools, beans
Attempting uninstall: requests
Found existing installation: requests 2.23.0
Uninstalling requests-2.23.0:
Successfully uninstalled requests-2.23.0
Attempting uninstall: beautifulsoup4
Found existing installation: beautifulsoup4 4.6.3
Uninstalling beautifulsoup4-4.6.3:
Successfully uninstalled beautifulsoup4-4.6.3
Attempting uninstall: matplotlib
Found existing installation: matplotlib 3.2.2
Uninstalling matplotlib-3.2.2:
Successfully uninstalled matplotlib-3.2.2
Successfully installed anndata-0.8.0 beautifulsoup4-4.11.1 beans-0.1.2 biopython-1.79 fonttools-4.37.3 licorice-0.0.3 matplotlib-3.5.3 perturb-tools-0.1.4 pool-sharq-0.0.12 pyrequisites-0.0.2 requests-2.28.1 soupsieve-2.3.2.post1
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import bean as be
!gdown 18Azb8YmmMvFZo9urc2TxZr540xXWOv_v
Downloading...
From: https://drive.google.com/uc?id=18Azb8YmmMvFZo9urc2TxZr540xXWOv_v
To: /content/bean_count_072121_ABE_topbot_LDLvar.h5ad
100% 40.2M/40.2M [00:00<00:00, 160MB/s]
bdata = be.read_h5ad("bean_count_072121_ABE_topbot_LDLvar.h5ad")
Data Structure¶
ReporterScreen object is a shallow wrapper around AnnData. More comprehensive data wrangling documentation can be found in their documentation.
bdata
ReporterScreen.X
: guide countReporterScreen.guides
: guide metadata. Reference toAnnData.obs
ReporterScreen.samples
: sample/condition metadata. Reference toAnnData.var
.ReporterScreen.layers["X_bcmatch"]
: barcode-matched guide countsReporterScreen.layers["edits"]
: edit countsReporterScreen.uns["allele_counts"]
: allele counts per guide and conditionReporterScreen.uns["edit_counts"]
: edit counts per guide and condition
.guides
attribute contains the information about each guide.
bdata.guides
.samples
attribute contains the sample and condition specific
information.
bdata.samples
Per-guide allele count information is stored in
.uns['allele_counts']
.
bdata.uns["allele_counts"]
guide | allele | rep1_bot | rep2_bot | rep3_VPA_bot | rep4_VPA_bot | rep1_bulk | rep2_bulk | rep3_VPA_bulk | rep4_VPA_bulk | rep1_top | rep2_top | rep3_VPA_top | rep4_VPA_top | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | LDLR_SA_3_g5 | 26 | 13 | 9 | 13 | 60 | 21 | 27 | 31 | 32 | 26 | 43 | 43 | |
1 | LDLR_SA_3_g5 | 0:13:+:A>G | 6 | 16 | 11 | 24 | 29 | 21 | 28 | 22 | 27 | 11 | 20 | 13 |
2 | LDLR_SA_3_g5 | -12:1:+:A>G,0:13:+:A>G | 2 | 16 | 1 | 24 | 16 | 5 | 9 | 4 | 17 | 12 | 7 | 3 |
3 | LDLR_SA_3_g5 | 0:13:+:A>G,7:20:+:A>G | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 0 |
4 | LDLR_SA_3_g5 | -12:1:+:A>G,-8:5:+:A>G,0:13:+:A>G,10:23:+:A>G | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
165506 | 2:164588224GAG_Maj_ABE_10_g3 | -10:1:+:C>T,-8:3:+:A>T | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
165507 | 2:164588224GAG_Maj_ABE_10_g3 | -8:3:+:A>T,-3:8:+:C>A,-1:10:+:G>A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
165508 | rs4921914_Min_ABE_501_g4 | -9:3:+:A>G,-8:4:+:A>G,-3:9:+:A>G,2:14:+:A>G,4:... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
165509 | rs191388787_Maj_ABE_121_g2 | -7:3:+:A>G,-5:5:+:A>G,-3:7:+:A>G | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
165510 | rs113408797_Maj_ABE_41_g1 | -8:1:+:A>G,-1:8:+:A>G,3:12:+:A>G,5:14:+:A>G,17... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
165511 rows × 14 columns
Per-variant aggregated count information is stored in
.uns['edit_counts']
.
bdata.uns["edit_counts"]
Changing column names¶
ReporterScreen.guides
and ReporterScreen.var
are equivalent to
AnnData.obs
and AnnData.var
, which are Pandas DataFrames. Column
names can be changed as in Pandas:
bdata.guides = bdata.guides.rename(columns={"Reporter":"reporter"})
New columns can be assigned likewise:
bdata.samples["lower_quantile"] = [
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.7,
0.7,
0.7,
0.7,
]
Subsetting & addition¶
Works as anndata, supports allele & edit count operations.
Subsetting & selection¶
bdata.samples["replicate"], bdata.samples["sort"] = zip(*bdata.samples.index.map(lambda s: s.rsplit("_", 1)))
bdata_subset = bdata[:10,bdata.samples.sort == "bulk"]
bdata.uns["allele_counts"] = bdata.uns['allele_counts'].loc[bdata.uns['allele_counts'].allele.map(str) != ""]
Getting edit rates from allele counts¶
bdata.uns['allele_counts']
guide | allele | rep1_bot | rep2_bot | rep3_VPA_bot | rep4_VPA_bot | rep1_bulk | rep2_bulk | rep3_VPA_bulk | rep4_VPA_bulk | rep1_top | rep2_top | rep3_VPA_top | rep4_VPA_top | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | LDLR_SA_3_g5 | 0:13:+:A>G | 6 | 16 | 11 | 24 | 29 | 21 | 28 | 22 | 27 | 11 | 20 | 13 |
2 | LDLR_SA_3_g5 | -12:1:+:A>G,0:13:+:A>G | 2 | 16 | 1 | 24 | 16 | 5 | 9 | 4 | 17 | 12 | 7 | 3 |
3 | LDLR_SA_3_g5 | 0:13:+:A>G,7:20:+:A>G | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 0 |
4 | LDLR_SA_3_g5 | -12:1:+:A>G,-8:5:+:A>G,0:13:+:A>G,10:23:+:A>G | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | LDLR_SA_3_g5 | -12:1:+:A>G,0:13:+:A>G,7:20:+:A>G | 1 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 1 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
165506 | 2:164588224GAG_Maj_ABE_10_g3 | -10:1:+:C>T,-8:3:+:A>T | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
165507 | 2:164588224GAG_Maj_ABE_10_g3 | -8:3:+:A>T,-3:8:+:C>A,-1:10:+:G>A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
165508 | rs4921914_Min_ABE_501_g4 | -9:3:+:A>G,-8:4:+:A>G,-3:9:+:A>G,2:14:+:A>G,4:... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
165509 | rs191388787_Maj_ABE_121_g2 | -7:3:+:A>G,-5:5:+:A>G,-3:7:+:A>G | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
165510 | rs113408797_Maj_ABE_41_g1 | -8:1:+:A>G,-1:8:+:A>G,3:12:+:A>G,5:14:+:A>G,17... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
162065 rows × 14 columns
bdata.uns["edit_counts"] = bdata.get_edit_from_allele()
bdata.get_edit_mat_from_uns("A", "G", target_pos_col = "target_pos")
New edit matrix saved in .layers['edits']. Returning old edits.
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]])
window_edit_rate= bdata.get_guide_edit_rate(normalize_by_editable_base = True,
edited_base = "A",
editable_base_start = 3,
editable_base_end = 8,
bcmatch_thres = 5,
prior_weight = 1,
return_result = True)
plt.hist(window_edit_rate, bins=30)
(array([579., 364., 299., 249., 221., 214., 191., 134., 159., 133., 97.,
90., 83., 76., 70., 67., 48., 53., 45., 56., 38., 37.,
32., 21., 24., 15., 8., 7., 1., 2.]),
array([4.13052458e-04, 2.89016995e-02, 5.73903465e-02, 8.58789936e-02,
1.14367641e-01, 1.42856288e-01, 1.71344935e-01, 1.99833582e-01,
2.28322229e-01, 2.56810876e-01, 2.85299523e-01, 3.13788170e-01,
3.42276817e-01, 3.70765464e-01, 3.99254111e-01, 4.27742758e-01,
4.56231405e-01, 4.84720052e-01, 5.13208699e-01, 5.41697346e-01,
5.70185993e-01, 5.98674640e-01, 6.27163287e-01, 6.55651934e-01,
6.84140582e-01, 7.12629229e-01, 7.41117876e-01, 7.69606523e-01,
7.98095170e-01, 8.26583817e-01, 8.55072464e-01]),
<BarContainer object of 30 artists>)