crisprme's Introduction

CRISPRme

CRISPRme is a tool for comprehensive off-target assessment, available as an online web application, an offline (local) web application, and a command-line tool. It integrates human genetic variant datasets with orthogonal genomic annotations to predict and prioritize CRISPR-Cas off-target sites at scale. The method considers both single-nucleotide variants (SNVs) and indels, accounts for bona fide haplotypes, accepts spacer:protospacer mismatches and bulges, and is suitable for population and personal genome analyses. CRISPRme takes care of every step in the process, including downloading the data, executing the complete search, and presenting an exhaustive report with tables and figures within an interactive web-based GUI.

The software has the following main functionalities:

  • complete-search performs a search from scratch with the given inputs (including gRNA, reference genome, and genetic variants).
  • targets-integration integrates the search results with GENCODE data to identify genes close to the candidate off-targets and collect the top-ranking candidates in terms of CFD score, CRISTA score, or number of mismatches/bulges.
  • web-interface starts a local instance of the web interface accessible from any browser.

Installation

CRISPRme can be installed either via Conda (Linux only) or via Docker (all operating systems, including OSX and Windows).

Installation via Conda

If conda is not already available on your machine, the next section describes how to obtain a fresh conda distribution. If conda is already available, skip the next section and go to the Create CRISPRme conda environment section.

Obtaining a fresh conda distribution

If conda is not already available in your environment, you can get a fresh miniconda distribution. To do so, open a new terminal window and type:

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Press ENTER when requested and answer yes when required. Conda will set up all its directories in your HOME path for ease of use.

Close the current terminal window and reopen it to allow the system to start conda. If you see in the new window something similar to

(base) user@nameofPC:~$

conda was correctly installed and it is ready to run.

The next step is a one-time setup of the conda channels. To set up the channels, type in your terminal:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Create CRISPRme conda environment

To create the conda environment for CRISPRme, it is suggested to use mamba. mamba is a drop-in replacement for conda that uses a faster dependency solving library and parts reimplemented in C++ for speed. To install mamba, in your terminal window type:

conda install mamba -n base -c conda-forge

Once mamba is installed, you are ready to build the CRISPRme environment. To build the environment, type:

mamba create -n crisprme python=3.8 crisprme -y

To activate the environment, type:

conda activate crisprme

To test the installation, type in your terminal window:

crisprme.py

If you see all of CRISPRme's functionalities listed, you have successfully installed CRISPRme and it is ready to be used on your machine.
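The same check can be scripted. The sketch below only tests whether the `crisprme.py` entry point is visible on the `PATH` of the active environment, using the standard library; the helper name `find_crisprme` is hypothetical and not part of CRISPRme.

```python
import shutil
from typing import Optional


def find_crisprme(executable: str = "crisprme.py") -> Optional[str]:
    """Return the full path of the CRISPRme entry point, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_crisprme()
    if path is None:
        print("crisprme.py not found: is the crisprme conda environment active?")
    else:
        print(f"crisprme.py found at {path}")
```

If the script reports that the entry point is missing, activating the environment with `conda activate crisprme` and running it again is the first thing to try.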

Updating CRISPRme conda installation

If you want to update an older CRISPRme installation to the latest version, we suggest updating as:

mamba install crisprme==<latest_version>

For example:

mamba install crisprme==2.1.2

You can find the latest release indicated at the top of our README.

Installation via Docker

OSX and Windows users are advised to run CRISPRme via Docker. Follow the links to install Docker on OSX or Windows, then follow the on-screen instructions.

If you plan to use CRISPRme via Docker on a Linux-based OS, read and follow the instructions in the next section; otherwise, skip it.

Install Docker on Linux machines

Open a new terminal window and type:

sudo apt-get update

To allow package installation and updates over HTTPS channels, type:

sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common

To add the Docker key, type:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

To set the right Docker version for your system, type:

sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"

To make sure everything is set to install Docker, refresh the repository index again by typing:

sudo apt-get update

To install Docker, type:

sudo apt-get install docker-ce docker-ce-cli containerd.io

To test your Docker installation, type:

sudo docker run hello-world

If the "Hello from Docker!" message shown in fig1 is printed, Docker is running correctly on your machine.

To complete Docker's setup, a few more steps are required. First, create a Docker group. To do so, type:

sudo groupadd docker

To add your current user to the Docker group, type:

sudo usermod -aG docker $USER

Repeat the above command for every user you plan to add to the Docker group (NB: on most systems you must be a sudo user to do so).

To make all changes effective, you will need to restart the machine or the environment.

To test if the Docker group has been correctly configured, open a new terminal window and type:

docker run hello-world

If the above "Hello from Docker!" message is printed, Docker has been successfully installed and properly set up on your machine.

Install CRISPRme docker image

Once Docker is installed, open a new terminal window and type:

docker pull pinellolab/crisprme

This command will download and install the CRISPRme Docker image on your machine.

Test CRISPRme

To test your CRISPRme installation, open a new terminal window and type:

wget https://www.dropbox.com/s/urciozkana5md0z/crisprme_test.tar.gz?dl=1 -O crisprme_test.tar.gz
tar -xvf crisprme_test.tar.gz

This downloads and extracts a folder containing the data needed to run and test CRISPRme.

Once downloaded, enter the folder by typing:

cd crisprme_test

If you installed CRISPRme via conda, test your conda installation by typing:

bash crisprme_auto_test_conda.sh

Otherwise, if you installed CRISPRme via Docker, test your Docker installation by typing:

bash crisprme_auto_test_docker.sh

After starting, the tests will download the required test data, then CRISPRme will start its analysis. NB: depending on your hardware, the time the test takes to complete may vary widely.

Once the folder has been downloaded and extracted, you will have a ready-to-use CRISPRme directory tree. NB: DO NOT CHANGE ANY FOLDER NAME, to avoid losing data or being forced to recompute indexes and dictionaries. YOU MUST USE THE DEFAULT FOLDERS TO STORE THE DATA, since the software has been designed to recognize only files and folders in its own folder tree (see Usage section).

Usage

CRISPRme is designed to work with and recognize its own specific directory tree structure. See the following image for a detailed explanation of CRISPRme's folder structure: fig2


CAVEAT. Before running CRISPRme make sure that your system has >= 64 GB of memory available.
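A rough pre-flight check for this caveat can be sketched in Python. The sketch assumes a POSIX system that exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES` through `os.sysconf` (as Linux does), and it reports total physical memory, not the memory that is free at the moment. The function names are hypothetical.

```python
import os

REQUIRED_GB = 64  # threshold taken from the caveat above


def total_memory_gb() -> float:
    """Total physical memory in GiB, read via POSIX sysconf (assumed available)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    n_pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * n_pages / 1024**3


def has_enough_memory(total_gb: float, required_gb: float = REQUIRED_GB) -> bool:
    """True when the given amount of memory meets the required threshold."""
    return total_gb >= required_gb


if __name__ == "__main__":
    total = total_memory_gb()
    status = "OK" if has_enough_memory(total) else "below the suggested minimum"
    print(f"Total memory: {total:.1f} GiB ({status})")
```

Because the kernel reserves part of the installed RAM, a machine sold with 64 GB may report slightly less, so the result is best read as a guide and not as a hard gate.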

The following sections will describe the main functionalities of CRISPRme, listing their input data, and the expected output.

Complete-search function

complete-search performs a complete search from scratch, returning all the results and post-analysis data.

Input:

  • Directory containing a reference genome (FASTA format). The reference genome must be separated into single chromosome files (e.g. chr1.fa, chr2.fa, etc.).
  • Text file storing path to the VCF directories [OPTIONAL]
  • Text file with a list of guides (1 to N)
  • Text file with a single PAM sequence
  • BED file with annotations, containing a list of genetic regions with a function associated
  • Text file listing the paths to the samplesID files (1 to N), one for each VCF dataset used [OPTIONAL]
  • Base editor window, used to specify the window in which to search for susceptibility to a given base editor [OPTIONAL]
  • Base editor nucleotide(s), used to specify the base(s) to check for the chosen editor [OPTIONAL]
  • BED file extracted from Gencode data to find gene proximity of targets
  • Maximal number of allowed bulges of any kind to compute in the index genome
  • Threshold of mismatches allowed
  • Size of DNA bulges allowed
  • Size of RNA bulges allowed
  • Merge range, the window (in bp) within which nearby targets are merged into a single one, keeping the highest-scoring target; this is necessary to reduce the inflation of targets due to bulges
  • Sorting criteria to use while merging targets based on CFD/CRISTA scores (scores have highest priority)
  • Sorting criteria to use while merging targets based on fewest mismatches+bulges
  • Output directory, in which all the data will be produced
  • Number of threads to use in computation
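The merge range input can be illustrated with a small sketch. This is not CRISPRme's implementation: it assumes each target is reduced to a chromosome, a position, and a single score (higher is better), and it greedily groups targets whose positions lie within `merge_range` bp of the previous target, keeping the best-scoring one per group. The `Target` type and `merge_close_targets` function are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Target(NamedTuple):
    chrom: str
    pos: int
    score: float  # e.g. a CFD score; higher is assumed to be better


def merge_close_targets(targets: List[Target], merge_range: int) -> List[Target]:
    """Collapse targets closer than merge_range bp, keeping the highest-scoring one."""
    best: List[Target] = []
    last_chrom: Optional[str] = None
    last_pos: Optional[int] = None
    for target in sorted(targets, key=lambda t: (t.chrom, t.pos)):
        same_cluster = (
            bool(best)
            and target.chrom == last_chrom
            and target.pos - last_pos <= merge_range
        )
        if same_cluster:
            # Keep only the best-scoring representative of the cluster.
            if target.score > best[-1].score:
                best[-1] = target
        else:
            best.append(target)
        last_chrom, last_pos = target.chrom, target.pos
    return best


if __name__ == "__main__":
    demo = [
        Target("chr1", 100, 0.2),
        Target("chr1", 102, 0.9),
        Target("chr1", 200, 0.5),
        Target("chr2", 101, 0.4),
    ]
    for kept in merge_close_targets(demo, merge_range=3):
        print(kept)
```

In this toy input the two chr1 targets at positions 100 and 102 fall within the 3 bp window, so only the one scoring 0.9 survives. The sorting-criteria inputs listed above suggest the real tool can also rank by mismatches and bulges, which this sketch does not model.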

Output

  • bestMerge targets file, containing the highest-scoring targets in terms of CFD score and the targets with the lowest combination of mismatches and bulges (with preference for the lowest mismatch count); each genomic position is represented by one target
  • altMerge targets file, containing all the targets discarded from the bestMerge file; each genomic position can be represented by more than one target
  • Parameters data file, containing all the parameters used in the search
  • Count and distribution files, containing the count data used by the web tool to generate the main tables and views
  • Integrated results and database, containing all the tabulated data with gene proximity analysis and a database representation for rapid querying in the web-tool GUI
  • Directory with raw targets, containing the un-processed results from the search, useful to recover any possible target found during the search
  • Directory with images, containing all the images generated to be used with the web-tool

Example

  • via conda:
    crisprme.py complete-search --genome Genomes/hg38/ --vcf list_vcf.txt/ --guide sg1617.txt --pam PAMs/20bp-NGG-spCas9.txt --annotation Annotations/gencode_encode.hg38.bed --samplesID list_samplesID.txt --be-window 4,8 --be-base A --gene_annotation Gencode/gencode.protein_coding.bed --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --sorting-criteria-scoring mm+bulges --sorting-criteria mm+bulges,mm --output sg1617/ --thread 4
    
  • via Docker:
    docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crisprme crisprme.py complete-search --genome Genomes/hg38/ --vcf list_vcf.txt/ --guide sg1617.txt --pam ./PAMs/20bp-NGG-SpCas9.txt --annotation ./Annotations/encode+gencode.hg38.bed --samplesID list_samplesID.txt --be-window 4,8 --be-base A --gene_annotation ./Annotations/gencode.protein_coding.bed --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --output sg1617/ --thread 4
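When the search is launched from a script, the command line can be assembled from a dictionary of options. The sketch below is only a convenience wrapper: the flag names and values are copied from the conda example above, the inputs marked as optional are left out, and the helper name `build_complete_search_cmd` is hypothetical.

```python
import shlex
from typing import Dict, List


def build_complete_search_cmd(options: Dict[str, str]) -> List[str]:
    """Turn a {flag: value} mapping into an argument list for crisprme.py."""
    cmd = ["crisprme.py", "complete-search"]
    for flag, value in options.items():
        cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    options = {
        "--genome": "Genomes/hg38/",
        "--guide": "sg1617.txt",
        "--pam": "PAMs/20bp-NGG-spCas9.txt",
        "--annotation": "Annotations/gencode_encode.hg38.bed",
        "--gene_annotation": "Gencode/gencode.protein_coding.bed",
        "--bMax": "2",
        "--mm": "6",
        "--bDNA": "2",
        "--bRNA": "2",
        "--merge": "3",
        "--output": "sg1617/",
        "--thread": "4",
    }
    # Print the command for inspection; it is not executed here.
    print(shlex.join(build_complete_search_cmd(options)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the CRISPRme directory tree once the printed command looks right.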
    

Targets-integration function

targets-integration returns an integrated_result file with paired empirical targets from an integrated_results file.

Input

  • Integrated results from a search, containing the processed targets
  • BED file containing empirically verified off-targets (OT), for example from GUIDE-seq, CIRCLE-seq, or other sequencing protocols
  • Output directory, in which the integrated result file with empirical data will be created

Output

  • Directory containing the integrated result, with each target paired with an existing empirical target (if found)

Example

  • via conda:
    crisprme.py targets-integration --targets *integrated_results.tsv --empirical_data empirical_data.tsv --output dir/
    
  • via Docker:
    docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crisprme crisprme.py targets-integration --targets *integrated_results.tsv --empirical_data empirical_data.tsv --output dir/
    

gnomAD-converter function

gnomAD-converter converts a set of gnomADv3.1 VCFs into compatible VCFs.

Input

  • gnomAD_VCFdir, used to specify the directory containing gnomADv3.1 original VCFs
  • samplesID, used to specify the pre-generated samplesID file necessary to introduce samples into gnomAD variant
  • thread, the number of threads used in the process (default is ALL available minus 2)

Output

  • original gnomAD directory with the full set of gnomAD VCFs converted to compatible format

Example

  • via conda:
    crisprme.py gnomAD-converter --gnomAD_VCFdir gnomad_dir/ --samplesID samplesIDs/hg38_gnomAD.samplesID.txt --thread 4
    
  • via Docker:
    docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crisprme crisprme.py gnomAD-converter --gnomAD_VCFdir gnomad_dir/ --samplesID samplesIDs/hg38_gnomAD.samplesID.txt --thread 4
    

Generate-personal-card function

generate-personal-card generates a personal card for a specified input sample.

Input

  • result_dir, directory containing the results from which to extract the targets used to generate the card
  • guide_seq, sequence of the guide to use in order to extract the targets
  • sample_id, ID of the sample to use in order to generate the card

Output

  • Set of plots generated with personal and private targets containing the variant CFD score and the reference CFD score
  • Filtered file with private targets of the sample directly extracted from integrated file

Example

  • via conda:
    crisprme.py generate-personal-card --result_dir Results/sg1617.6.2.2/ --guide_seq CTAACAGTTGCTTTTATCACNNN --sample_id NA21129
    
  • via Docker:
    docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crisprme crisprme.py generate-personal-card --result_dir Results/sg1617.6.2.2/ --guide_seq CTAACAGTTGCTTTTATCACNNN --sample_id NA21129
    

Web-interface function (only via conda)

web-interface starts a local server to use CRISPRme's web interface.

Example

  • via conda:
    crisprme.py web-interface
    

Citation

If you use CRISPRme in your research, please cite our paper (shareable link to full text):

Cancellieri S, Zeng J, Lin LY, Tognon M, Nguyen MA, Lin J, ... Giugno R, Bauer DE, Pinello L. (2022). Human genetic diversity alters off-target outcomes of therapeutic gene editing. Nature Genetics, 1-10. https://doi.org/10.1038/s41588-022-01257-y

License

AGPL-3.0 (academic research only).

For-profit institutions must purchase a license before using CRISPRme. Contact [email protected] for further details.

crisprme's People

Contributors

lindayqlin, manueltgn, samuelecancellieri

crisprme's Issues

Test Installation Run Failure

Describe the bug

Terminal output:

$ bash crisprme_auto_test_conda.sh
starting download and unzip of data
unzip gencode+encode annotations
start download VCF data and genome (this may take a long time due to connection speed)
download 1000G VCFs
crisprme_auto_test_conda.sh: line 26: 40706 Abort trap: 6           wget -c -q ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr$i.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz
download hg38
start testing
Launching job /Users/sdm8/Desktop/crisprme_test/crisprme_test/Results/sg1617.6.2.2. The stdout is redirected in log_verbose.txt and stderr is redirected in log_error.txt
Traceback (most recent call last):
  File "/Users/sdm8/opt/anaconda3/envs/crisprme/bin/crisprme.py", line 934, in <module>
    complete_search()
  File "/Users/sdm8/opt/anaconda3/envs/crisprme/bin/crisprme.py", line 673, in complete_search
    raise OSError(f"\nCRISPRme run failed! See {os.path.join(outputfolder, 'log_error.txt')} for details\n")
OSError:
CRISPRme run failed! See /Users/sdm8/Desktop/crisprme_test/crisprme_test/Results/sg1617.6.2.2/log_error.txt for details

log_error.txt

[W::bcf_sr_add_reader] No BGZF EOF marker; file '/Users/sdm8/Desktop/crisprme_test/crisprme_test/VCFs/hg38_1000G/ALL.chr7.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz' may be truncated
mv: rename /Users/sdm8/Desktop/crisprme_test/crisprme_test/Genomes/variants_genome/SNPs_genome/hg38_enriched/ to ./hg38+hg38_1000G/: No such file or directory
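The first warning in this log indicates that the chr7 VCF was truncated, which is consistent with the aborted `wget` shown in the terminal output. A minimal sketch for spotting such files before starting a run: a complete BGZF file ends with a fixed 28-byte empty block defined in the SAM/BAM specification, so its absence points to an incomplete download. The function names are hypothetical.

```python
import os

# 28-byte empty BGZF block that terminates every complete BGZF file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"
    "42430200" "1b00" "0300" "00000000" "00000000"
)


def ends_with_bgzf_eof(tail: bytes) -> bool:
    """True if the given bytes end with the BGZF end-of-file marker."""
    return tail.endswith(BGZF_EOF)


def file_has_bgzf_eof(path: str) -> bool:
    """Read only the last 28 bytes of a file and look for the marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(size - len(BGZF_EOF))
        return ends_with_bgzf_eof(handle.read())
```

Any `.vcf.gz` file for which `file_has_bgzf_eof` returns `False` is a candidate for re-download before the test is repeated.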

To Reproduce

I had to install like this without specifying python version as described in README because I kept getting an error message pertaining to version one of the program:

Encountered problems while solving:
- package crisprme-1.0.1-0 requires crispritz, but none of the providers can be installed

Installed via:

mamba install -c bioconda crisprme
## missing module Bio -
mamba install biopython

I had to download the test data off of my VPN (federal govt network). I downloaded the test data via:

wget https://www.dropbox.com/s/urciozkana5md0z/crisprme_test.tar.gz?dl=1 -O crisprme_test.tar.gz
tar -xvf crisprme_test.tar.gz

Environment (please complete the following information, ONLY applicable if running CRISPRme via command line):

  • Python version 3.7.12
  • CRISPRme version 2.1.1
  • CRISPRitz version 1.2.1
  • axel version 2.17.11
  • gdown version 4.7.1
  • numpy version 1.20.0
  • dash version 1.10.0
  • dash-bootstrap-components version 0.10.0
  • dash-core-components version 1.9.0
  • dash-daq version 0.4.0
  • dash-html-components version 1.0.3
  • dash-renderer version 1.3.0
  • dash-table version 4.6.2
  • flask version 1.1.3
  • flask-caching version 1.7.1
  • flask-compress version 1.5.0
  • fontconfig version 2.13.1
  • freetype version 2.10.1
  • future version 0.18.2
  • gettext version 0.19.8.1
  • gunicorn version 20.0.4
  • werkzeug version 1.0.1
  • pandas version 1.2.5

Getting an error running Docker test

I am running on a clean Docker image with the test script. It gets far into the process and then hits an error.

log_error.txt contains :

./merge_close_targets_cfd.sh: line 34: 3282 Killed python remove_contiguous_samples_cfd.py $fileIn $fileOut $thresh $chrom $position $total $true_guide $snp_info $cfd $sort_pivot $sorting_criteria_scoring $sorting_criteria
CRISPRme ERROR: contigous SNP removal failed (script: ./merge_close_targets_cfd.sh line 31)
./merge_close_targets_cfd.sh: line 34: 3281 Killed python remove_contiguous_samples_cfd.py $fileIn $fileOut $thresh $chrom $position $total $true_guide $snp_info $cfd $sort_pivot $sorting_criteria_scoring $sorting_criteria
CRISPRme ERROR: contigous SNP removal failed (script: ./merge_close_targets_cfd.sh line 31)
Traceback (most recent call last):
  File "/opt/conda/opt/crisprme/PostProcess/remove_contiguous_samples_cfd.py", line 662, in <module>
    merge_targets()
  File "/opt/conda/opt/crisprme/PostProcess/remove_contiguous_samples_cfd.py", line 618, in merge_targets
    int(target_data[input_args[4]]),
ValueError: invalid literal for int() with base 10: '+'
CRISPRme ERROR: contigous SNP removal failed (script: ./merge_close_targets_cfd.sh line 31)

mv: cannot stat '/Test123/Results/sg1617_2/final_results_sg1617_2.bestMerge.txt.bestCFD.txt.trimmed': No such file or directory

I can add the full verbose log file if that would help, but here is the end that corresponds to the error...

Sorting file
Sorting file
Sorting file
Sorting done in 0 seconds
Sorting done in 0 seconds
Sorting done in 0 seconds
Merging contiguous targets
Merging contiguous targets
Merging contiguous targets

Any help tracing this down would be helpful, I am looking forward to running the tool on a real sequence!

Let me know if any other information would be helpful...

Tom

single chromosome run

Hi,

Is it possible to run CRISPRme on a single chromosome? Ideally keeping all chromosomes in the same Genomes/ folder.

Thanks!

gnomAD-converter error on v4.1 (joint) VCF dataset

Thanks for updating the gnomAD-converter to support the gnomAD v4 data.
It would be nice to also support the gnomAD joint v4.1 VCF dataset, which differs slightly in the INFO column.
The joint dataset uses the AC_joint_ prefix rather than AC_, which triggers an error. This can be fixed on the user end by pre-processing the VCF file, but I thought it would be nice if crisprme could directly support the joint version.
Additionally, gnomAD VCFs started to label the "oth" group as "remaining", which triggers an error if the old hg38_gnomAD.samplesID.txt is used. (I only have the v4.1 version, so I am not sure whether the v4.0 version kept the original format used in v3.)
It would be helpful to other users if you could include an additional samplesID file for v4.1 gnomAD VCFs that replaces "oth" with "remaining".

see an example below:
gnomAD v4.1 genomes VCF:
chrY 2790041 . C A . PASS AC=1;AN=34333;AF=2.91265e-05;AC_XY=1;AF_XY=2.91265e-05;AN_XY=34333;nhomalt_XY=0;nhomalt=0;AC_afr_XY=0;AF_afr_XY=0;AN_afr_XY=8816;nhomalt_afr_XY=0;AC_afr=0;AF_afr=0;AN_afr=8816;nhomalt_afr=0;AC_ami_XY=0;AF_ami_XY=0;AN_ami_XY=210;nhomalt_ami_XY=0;AC_ami=0;AF_ami=0;AN_ami=210;nhomalt_ami=0;AC_amr_XY=0;AF_amr_XY=0;AN_amr_XY=3877;nhomalt_amr_XY=0;AC_amr=0;AF_amr=0;AN_amr=3877;nhomalt_amr=0;AC_asj_XY=0;AF_asj_XY=0;AN_asj_XY=774;nhomalt_asj_XY=0;AC_asj=0;AF_asj=0;AN_asj=774;nhomalt_asj=0;AC_eas_XY=0;AF_eas_XY=0;AN_eas_XY=1275;nhomalt_eas_XY=0;AC_eas=0;AF_eas=0;AN_eas=1275;nhomalt_eas=0;AC_fin_XY=1;AF_fin_XY=0.000277008;AN_fin_XY=3610;nhomalt_fin_XY=0;AC_fin=1;AF_fin=0.000277008;AN_fin=3610;nhomalt_fin=0;AC_mid_XY=0;AF_mid_XY=0;AN_mid_XY=72;nhomalt_mid_XY=0;AC_mid=0;AF_mid=0;AN_mid=72;nhomalt_mid=0;AC_nfe_XY=0;AF_nfe_XY=0;AN_nfe_XY=13652;nhomalt_nfe_XY=0;AC_nfe=0;AF_nfe=0;AN_nfe=13652;nhomalt_nfe=0;AC_raw=1;AF_raw=2.70117e-05;AN_raw=37021;nhomalt_raw=0;AC_remaining_XY=0;AF_remaining_XY=0;AN_remaining_XY=480;nhomalt_remaining_XY=0;AC_remaining=0;AF_remaining=0;AN_remaining=480;nhomalt_remaining=0;AC_sas_XY=0;AF_sas_XY=0;AN_sas_XY=1567;nhomalt_sas_XY=0;AC_sas=0;AF_sas=0;AN_sas=1567;nhomalt_sas=0;faf95_XY=0;faf95=0;faf95_afr_XY=0;faf95_afr=0;faf95_amr_XY=0;faf95_amr=0;faf95_eas_XY=0;faf95_eas=0;faf95_nfe_XY=0;faf95_nfe=0;faf95_sas_XY=0;faf95_sas=0;faf99_XY=0;faf99=0;faf99_afr_XY=0;faf99_afr=0;faf99_amr_XY=0;faf99_amr=0;faf99_eas_XY=0;faf99_eas=0;faf99_nfe_XY=0;faf99_nfe=0;faf99_sas_XY=0;faf99_sas=0;age_hist_het_bin_freq=0|0|0|0|0|0|0|0|0|0;age_hist_het_n_smaller=0;age_hist_het_n_larger=0;age_hist_hom_bin_freq=0|0|0|0|0|0|0|1|0|0;age_hist_hom_n_smaller=0;age_hist_hom_n_larger=0;FS=.;MQ=60;QUALapprox=427;QD=32.8462;SOR=0.836;VarDP=13;AS_FS=.;AS_MQ=60;AS_QUALapprox=427;AS_QD=32.8462;AS_SB_TABLE=0,0|6,7;AS_SOR=0.835568;AS_VarDP=13;inbreeding_coeff=-2.70124e-05;AS_culprit=AS_MQ;AS_VQSLOD=-1.8603;negative_train_site;allele_type=snv;n_alt_alleles=1;variant_t
ype=snv;non_par;gq_hist_alt_bin_freq=0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0;gq_hist_all_bin_freq=0|0|0|0|24819|6661|2091|581|133|44|3|0|1|0|0|0|0|0|0|0;dp_hist_alt_bin_freq=0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;dp_hist_alt_n_larger=0;dp_hist_all_bin_freq=0|0|9021|17651|6629|891|127|12|2|0|0|0|0|0|0|0|0|0|0|0;dp_hist_all_n_larger=0;ab_hist_alt_bin_freq=0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;cadd_raw_score=0.281023;cadd_phred=4.068;pangolin_largest_ds=0.02;phylop=4.323;VRS_Allele_IDs=ga4gh:VA.at1nxhs_qHpWgbCnyGBM3ba2LlAVRQAu,ga4gh:VA.67hbcHI15qnRWi-Z9I0ay-rGN3bPKQ4-;VRS_Starts=2790040,2790040;VRS_Ends=2790041,2790041;VRS_States=C,A;vep=A|upstream_gene_variant|MODIFIER|SRY|ENSG00000184895|Transcript|ENST00000383070|protein_coding||||||||||1|2359|-1||SNV|HGNC|HGNC:11311|YES|NM_003140.3|||P1|CCDS14772.1|ENSP00000372547||Ensembl|||||||||||||,A|non_coding_transcript_exon_variant|MODIFIER|RNASEH2CP1|ENSG00000237659|Transcript|ENST00000454281|processed_pseudogene|1/1||ENST00000454281.1:n.215C>A||215|||||1||1||SNV|HGNC|HGNC:24117|YES||||||||Ensembl|||||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000679518|processed_transcript||3/6|ENST00000679518.1:n.106+15302C>A|||||||1||1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|downstream_gene_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000679825|processed_transcript||||||||||1|2471|1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|non_coding_transcript_exon_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000680285|processed_transcript|4/4||ENST00000680285.1:n.612C>A||612|||||1||1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|downstream_gene_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000680845|processed_transcript||||||||||1|2407|1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000681787|processed_transcript||3/7|ENST00000681787.1:n
.106+15302C>A|||||||1||1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|XGY2|ENSG00000288686|Transcript|ENST00000681940|processed_transcript||3/4|ENST00000681940.1:n.106+15302C>A|||||||1||1||SNV|HGNC|HGNC:34022|||||||||Ensembl|||||||||||||,A|upstream_gene_variant|MODIFIER|SRY|6736|Transcript|NM_003140.3|protein_coding||||||||||1|2359|-1||SNV|EntrezGene|HGNC:11311|YES|ENST00000383070.2|||||NP_003131.1||RefSeq|||||||||||||
gnomAD v4.1 joint vcf:
chrY 2782439 . G C . PASS AC_joint=2;AN_joint=34075;AF_joint=5.86941e-05;grpmax_joint=sas;AC_genomes=2;AN_genomes=34075;AF_genomes=5.86941e-05;grpmax_genomes=sas;AC_joint_XY=2;AF_joint_XY=5.86941e-05;AN_joint_XY=34075;nhomalt_joint_XY=0;nhomalt_joint=0;AC_joint_afr_XY=0;AF_joint_afr_XY=0;AN_joint_afr_XY=8801;nhomalt_joint_afr_XY=0;AC_joint_afr=0;AF_joint_afr=0;AN_joint_afr=8801;nhomalt_joint_afr=0;AC_joint_ami_XY=0;AF_joint_ami_XY=0;AN_joint_ami_XY=216;nhomalt_joint_ami_XY=0;AC_joint_ami=0;AF_joint_ami=0;AN_joint_ami=216;nhomalt_joint_ami=0;AC_joint_amr_XY=0;AF_joint_amr_XY=0;AN_joint_amr_XY=3716;nhomalt_joint_amr_XY=0;AC_joint_amr=0;AF_joint_amr=0;AN_joint_amr=3716;nhomalt_joint_amr=0;AC_joint_asj_XY=0;AF_joint_asj_XY=0;AN_joint_asj_XY=773;nhomalt_joint_asj_XY=0;AC_joint_asj=0;AF_joint_asj=0;AN_joint_asj=773;nhomalt_joint_asj=0;AC_joint_eas_XY=0;AF_joint_eas_XY=0;AN_joint_eas_XY=1328;nhomalt_joint_eas_XY=0;AC_joint_eas=0;AF_joint_eas=0;AN_joint_eas=1328;nhomalt_joint_eas=0;AC_joint_fin_XY=0;AF_joint_fin_XY=0;AN_joint_fin_XY=3422;nhomalt_joint_fin_XY=0;AC_joint_fin=0;AF_joint_fin=0;AN_joint_fin=3422;nhomalt_joint_fin=0;AC_joint_mid_XY=0;AF_joint_mid_XY=0;AN_joint_mid_XY=73;nhomalt_joint_mid_XY=0;AC_joint_mid=0;AF_joint_mid=0;AN_joint_mid=73;nhomalt_joint_mid=0;AC_joint_nfe_XY=1;AF_joint_nfe_XY=7.303e-05;AN_joint_nfe_XY=13693;nhomalt_joint_nfe_XY=0;AC_joint_nfe=1;AF_joint_nfe=7.303e-05;AN_joint_nfe=13693;nhomalt_joint_nfe=0;AC_joint_raw=2;AF_joint_raw=5.40862e-05;AN_joint_raw=36978;nhomalt_joint_raw=0;AC_joint_remaining_XY=0;AF_joint_remaining_XY=0;AN_joint_remaining_XY=474;nhomalt_joint_remaining_XY=0;AC_joint_remaining=0;AF_joint_remaining=0;AN_joint_remaining=474;nhomalt_joint_remaining=0;AC_joint_sas_XY=1;AF_joint_sas_XY=0.000633312;AN_joint_sas_XY=1579;nhomalt_joint_sas_XY=0;AC_joint_sas=1;AF_joint_sas=0.000633312;AN_joint_sas=1579;nhomalt_joint_sas=0;AC_grpmax_joint=1;AF_grpmax_joint=0.000633312;AN_grpmax_joint=1579;nhomalt_grpmax_joint=0;faf95_joint_XY=9.72e-0
6;faf99_joint_XY=3.64e-06;faf95_joint=9.72e-06;faf99_joint=3.64e-06;faf95_joint_afr_XY=0;faf99_joint_afr_XY=0;faf95_joint_afr=0;faf99_joint_afr=0;faf95_joint_amr_XY=0;faf99_joint_amr_XY=0;faf95_joint_amr=0;faf99_joint_amr=0;faf95_joint_eas_XY=0;faf99_joint_eas_XY=0;faf95_joint_eas=0;faf99_joint_eas=0;faf95_joint_mid_XY=0;faf99_joint_mid_XY=0;faf95_joint_mid=0;faf99_joint_mid=0;faf95_joint_nfe_XY=0;faf99_joint_nfe_XY=0;faf95_joint_nfe=0;faf99_joint_nfe=0;faf95_joint_sas_XY=0;faf99_joint_sas_XY=0;faf95_joint_sas=0;faf99_joint_sas=0;age_hist_het_bin_freq_joint=0|0|0|0|0|0|0|0|0|0;age_hist_het_n_smaller_joint=0;age_hist_het_n_larger_joint=0;age_hist_hom_bin_freq_joint=0|0|0|0|0|1|0|0|0|0;age_hist_hom_n_smaller_joint=0;age_hist_hom_n_larger_joint=0;gq_hist_alt_bin_freq_joint=0|0|0|0|1|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0;gq_hist_all_bin_freq_joint=0|0|0|0|24426|6771|2136|534|105|33|6|0|0|0|0|0|0|0|0|0;dp_hist_alt_bin_freq_joint=0|0|0|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;dp_hist_alt_n_larger_joint=0;dp_hist_all_bin_freq_joint=0|0|8887|17718|6484|802|108|10|2|0|0|0|0|0|0|0|0|0|0|0;dp_hist_all_n_larger_joint=0;ab_hist_alt_bin_freq_joint=0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;outside_broad_capture_region;outside_ukb_capture_region;outside_broad_calling_region;outside_ukb_calling_region;not_called_in_exomes;AC_genomes_XY=2;AF_genomes_XY=5.86941e-05;AN_genomes_XY=34075;nhomalt_genomes_XY=0;nhomalt_genomes=0;AC_genomes_afr_XY=0;AF_genomes_afr_XY=0;AN_genomes_afr_XY=8801;nhomalt_genomes_afr_XY=0;AC_genomes_afr=0;AF_genomes_afr=0;AN_genomes_afr=8801;nhomalt_genomes_afr=0;AC_genomes_ami_XY=0;AF_genomes_ami_XY=0;AN_genomes_ami_XY=216;nhomalt_genomes_ami_XY=0;AC_genomes_ami=0;AF_genomes_ami=0;AN_genomes_ami=216;nhomalt_genomes_ami=0;AC_genomes_amr_XY=0;AF_genomes_amr_XY=0;AN_genomes_amr_XY=3716;nhomalt_genomes_amr_XY=0;AC_genomes_amr=0;AF_genomes_amr=0;AN_genomes_amr=3716;nhomalt_genomes_amr=0;AC_genomes_asj_XY=0;AF_genomes_asj_XY=0;AN_genomes_asj_XY=773;nhomalt_genomes_asj_XY=0;AC_genomes_a
sj=0;AF_genomes_asj=0;AN_genomes_asj=773;nhomalt_genomes_asj=0;AC_genomes_eas_XY=0;AF_genomes_eas_XY=0;AN_genomes_eas_XY=1328;nhomalt_genomes_eas_XY=0;AC_genomes_eas=0;AF_genomes_eas=0;AN_genomes_eas=1328;nhomalt_genomes_eas=0;AC_genomes_fin_XY=0;AF_genomes_fin_XY=0;AN_genomes_fin_XY=3422;nhomalt_genomes_fin_XY=0;AC_genomes_fin=0;AF_genomes_fin=0;AN_genomes_fin=3422;nhomalt_genomes_fin=0;AC_genomes_mid_XY=0;AF_genomes_mid_XY=0;AN_genomes_mid_XY=73;nhomalt_genomes_mid_XY=0;AC_genomes_mid=0;AF_genomes_mid=0;AN_genomes_mid=73;nhomalt_genomes_mid=0;AC_genomes_nfe_XY=1;AF_genomes_nfe_XY=7.303e-05;AN_genomes_nfe_XY=13693;nhomalt_genomes_nfe_XY=0;AC_genomes_nfe=1;AF_genomes_nfe=7.303e-05;AN_genomes_nfe=13693;nhomalt_genomes_nfe=0;AC_genomes_raw=2;AF_genomes_raw=5.40862e-05;AN_genomes_raw=36978;nhomalt_genomes_raw=0;AC_genomes_remaining_XY=0;AF_genomes_remaining_XY=0;AN_genomes_remaining_XY=474;nhomalt_genomes_remaining_XY=0;AC_genomes_remaining=0;AF_genomes_remaining=0;AN_genomes_remaining=474;nhomalt_genomes_remaining=0;AC_genomes_sas_XY=1;AF_genomes_sas_XY=0.000633312;AN_genomes_sas_XY=1579;nhomalt_genomes_sas_XY=0;AC_genomes_sas=1;AF_genomes_sas=0.000633312;AN_genomes_sas=1579;nhomalt_genomes_sas=0;AC_grpmax_genomes=1;AF_grpmax_genomes=0.000633312;AN_grpmax_genomes=1579;nhomalt_grpmax_genomes=0;faf95_genomes_XY=9.72e-06;faf99_genomes_XY=3.64e-06;faf95_genomes=9.72e-06;faf99_genomes=3.64e-06;faf95_genomes_afr_XY=0;faf99_genomes_afr_XY=0;faf95_genomes_afr=0;faf99_genomes_afr=0;faf95_genomes_amr_XY=0;faf99_genomes_amr_XY=0;faf95_genomes_amr=0;faf99_genomes_amr=0;faf95_genomes_eas_XY=0;faf99_genomes_eas_XY=0;faf95_genomes_eas=0;faf99_genomes_eas=0;faf95_genomes_nfe_XY=0;faf99_genomes_nfe_XY=0;faf95_genomes_nfe=0;faf99_genomes_nfe=0;faf95_genomes_sas_XY=0;faf99_genomes_sas_XY=0;faf95_genomes_sas=0;faf99_genomes_sas=0;age_hist_het_bin_freq_genomes=0|0|0|0|0|0|0|0|0|0;age_hist_het_n_smaller_genomes=0;age_hist_het_n_larger_genomes=0;age_hist_hom_bin_freq_genomes=0|0|0|0|0|1|0|0
|0|0;age_hist_hom_n_smaller_genomes=0;age_hist_hom_n_larger_genomes=0;gq_hist_alt_bin_freq_genomes=0|0|0|0|1|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0;gq_hist_all_bin_freq_genomes=0|0|0|0|24426|6771|2136|534|105|33|6|0|0|0|0|0|0|0|0|0;dp_hist_alt_bin_freq_genomes=0|0|0|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;dp_hist_alt_n_larger_genomes=0;dp_hist_all_bin_freq_genomes=0|0|8887|17718|6484|802|108|10|2|0|0|0|0|0|0|0|0|0|0|0;dp_hist_all_n_larger_genomes=0;ab_hist_alt_bin_freq_genomes=0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0
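
The user-side pre-processing mentioned above could look roughly like the sketch below. It assumes that renaming the `AC_joint`/`AN_joint`/`AF_joint`/`nhomalt_joint` keys to their unprefixed forms, and `remaining` back to `oth`, is enough for the converter; that assumption is untested here. The function names are hypothetical, and header lines are passed through unchanged, so the `##INFO` definitions would need the same renaming to stay consistent.

```python
import re

# Keys such as AC_joint, AC_joint_afr, nhomalt_joint_XY lose the "_joint" part.
_JOINT_KEY = re.compile(r"^(AC|AN|AF|nhomalt)_joint(?=_|$)")
# The "remaining" group label is mapped back to the older "oth" label.
_REMAINING = re.compile(r"_remaining(?=_|$)")


def rename_info_key(key: str) -> str:
    key = _JOINT_KEY.sub(r"\1", key)
    return _REMAINING.sub("_oth", key)


def rewrite_info(info: str) -> str:
    """Rename keys in a semicolon-separated INFO string; values and flags are kept."""
    fields = []
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        fields.append(rename_info_key(key) + sep + value)
    return ";".join(fields)


def rewrite_vcf_line(line: str) -> str:
    """Rewrite the INFO column (8th field) of a VCF record; skip header lines."""
    if line.startswith("#"):
        return line
    cols = line.rstrip("\n").split("\t")
    cols[7] = rewrite_info(cols[7])
    return "\t".join(cols) + "\n"
```

Keys that merely contain `joint` elsewhere, such as `grpmax_joint` or `AC_grpmax_joint`, are left untouched by the anchored pattern.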

Thank you.

Is the dropbox test data not accessible anymore?

Describe the bug
Trying to download test data to test my installation:

$ wget https://www.dropbox.com/s/urciozkana5md0z/crisprme_test.tar.gz?dl=1 -O crisprme_test.tar.gz
--2023-04-04 15:06:23--  https://www.dropbox.com/s/urciozkana5md0z/crisprme_test.tar.gz?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.8.18
Connecting to www.dropbox.com (www.dropbox.com)|162.125.8.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/dl/urciozkana5md0z/crisprme_test.tar.gz [following]
--2023-04-04 15:06:23--  https://www.dropbox.com/s/dl/urciozkana5md0z/crisprme_test.tar.gz
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com/cd/0/get/B5gpbbCTnzZCiZiieBea8pMWQGbJ23D63L7ejpA053ciEAlNmWDQHGedfiSn7kZEdplAXkoEaZRB9OtFblQ19ZPUT5UjucX52-9NzWR3aBcMec9NK5PvnCdXdFxAJoGgK_BsGDswJW0BdEgwdASC2OveifZdcFvCQ_IMrmBucgU5MBchSE4NLe_s8dmw2Ge2llM/file?dl=1# [following]
--2023-04-04 15:06:24--  https://uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com/cd/0/get/B5gpbbCTnzZCiZiieBea8pMWQGbJ23D63L7ejpA053ciEAlNmWDQHGedfiSn7kZEdplAXkoEaZRB9OtFblQ19ZPUT5UjucX52-9NzWR3aBcMec9NK5PvnCdXdFxAJoGgK_BsGDswJW0BdEgwdASC2OveifZdcFvCQ_IMrmBucgU5MBchSE4NLe_s8dmw2Ge2llM/file?dl=1
Resolving uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com (uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com)... 127.0.0.1
Connecting to uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com (uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com)|127.0.0.1|:443... failed: Connection refused.

Tried via browser as well.

This address: https://www.dropbox.com/s/urciozkana5md0z/crisprme_test.tar.gz?dl=1

This site can’t be reached
ucbbeffe6fcc5ac3d70c313358f9.dl.dropboxusercontent.com refused to connect.
Try:

Checking the connection
Checking the proxy and the firewall
ERR_CONNECTION_REFUSED

Tried going up a directory and I can see the directory crisprme_test but when I try to download it produces an error. Same as when I go into the directory and try to download any of the files. A red error message at the top says "There was an error downloading your file."

A few questions...

Hi all,

Thanks for the tool! I have a few questions

  1. Are the _alt and _random files from the hg38 genome used as part of the search with the standard VCF sets? If so, can you explain how they are identified in the output?

  2. I am interested in using some other VCF datasets. Can you provide any information on what the tool supports for VCF files? I am planning to format them the same way as your script for cleaning up the gnomAD 3.1 dataset, but I would prefer not to convert them to multi-allelic records as you are doing because of the information loss. Is it required?

Thanks,

Tom

having issue with conda installation

Dear team,

We are trying to install this software and are somehow stuck at this step.

conda create -n crisprme python=3.8 crisprme -y

Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: |

Could you please help us solve this?

Regards.
Najeeb

Question about web portal vs. running CRISPRme locally

Hello,

Thank you for creating this useful tool. I had a quick question: if I ran CRISPRme with parameters X on the web portal and then ran it locally with the same parameters X, should I expect the integrated_results file that is output to be identical? I was specifically wondering if there was any output truncation that happened on the web portal for efficiency even with the integrated_results file (which I think is the complete off-target set).

Thank you

Number of threads don't change

Thank you for this great tool!

I run this command:

nohup crisprme.py complete-search --genome Genomes/hg38/ --vcf list_vcf.txt/ --guide New_guides.txt --pam PAMs/20bp-NGG-SpCas9.txt --annotation Annotations/encode+gencode.hg38.bed --samplesID list_samplesID.txt --gene_annotation Annotations/gencode.protein_coding.bed --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --output New --thread 64 &

I use --thread 64, but the code is still only using 4 cores. How do I change the number of cores utilized?

Also, what does --merge do? I couldn't find that in the --help section.

Thanks again,
-Davood

MemoryError in Enrichment step

Describe the bug
When running the complete-search command there is a MemoryError in the Enrichment step. The folder Genomes/hg38+hg38_1000G, which contains the enriched reference FASTA files (e.g. chr3.enriched.fa), is missing the files for the two largest chromosomes: chr1.enriched.fa and chr2.enriched.fa. Both chr1.fa and chr2.fa are present in the Genomes/hg38 folder.

To Reproduce
Running final command in the crisprme_auto_test_conda.sh.

command:
crisprme.py complete-search --genome Genomes/hg38/ --vcf list_vcf.txt/ --guide sg1617.txt --pam PAMs/20bp-NGG-SpCas9.txt --annotation Annotations/encode+gencode.hg38.bed --samplesID list_samplesID.txt --gene_annotation Annotations/gencode.protein_coding.bed --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --output sg1617.6.2.2 --thread 4

Expected behavior
I expected a list of off-target sites and some images. Instead, some files are created, but none of them includes a full list of the off-target sites. No images are created in the img directory.

Environment (please complete the following information, ONLY applicable if running CRISPRme via command line):

  • Python version: 3.8
  • CRISPRme version: 2.1.1
  • CRISPRitz version: 2.6.6
  • axel version: 2.17.11
  • gdown version: 4.7.1
  • numpy version: 1.20.0
  • dash version: 1.10.0
  • dash-bootstrap-components version: 0.10.0
  • dash-core-components version: 1.9.0
  • dash-daq version: 0.4.0
  • dash-html-components version: 1.0.3
  • dash-renderer version: 1.3.0
  • dash-table version: 4.6.2
  • flask version: 1.1.3
  • flask-caching version: 1.7.1
  • flask-compress version: 1.5.0
  • fontconfig version: 2.13.1
  • freetype version: 2.10.1
  • future version: 0.18.2
  • gettext version: 0.19.8.1
  • gunicorn version: 20.0.4
  • werkzeug version: 1.0.1
  • pandas version: 1.2.5

Additional context
Running on a c4.4xlarge EC2 instance with a 2 TB volume.

log_error_no_check.txt
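
The symptom described above (reference FASTA files present, enriched counterparts missing) can be checked mechanically. The following is a minimal sketch, assuming only the naming pattern visible in the report (`chrN.fa` and `chrN.enriched.fa`); the function name is hypothetical and is not part of CRISPRme.

```python
def missing_enriched(reference_files, enriched_files):
    """Return chromosomes that have a reference FASTA but no enriched FASTA.

    Assumes the naming pattern seen in the report: 'chr1.fa' in the reference
    folder and 'chr1.enriched.fa' in the enriched folder.
    """
    suffix = ".enriched.fa"
    enriched = {name[: -len(suffix)] for name in enriched_files if name.endswith(suffix)}
    missing = []
    for name in reference_files:
        if name.endswith(".fa"):
            chrom = name[: -len(".fa")]
            if chrom not in enriched:
                missing.append(chrom)
    return sorted(missing)


# Made-up file listings that mirror the situation in the report.
reference = ["chr1.fa", "chr2.fa", "chr3.fa"]
enriched = ["chr3.enriched.fa"]
print(missing_enriched(reference, enriched))
```

With real folders, the two lists could come from `os.listdir("Genomes/hg38")` and `os.listdir("Genomes/hg38+hg38_1000G")`.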

Testing docker application fails

Dear Samuele,

thank you for such an interesting tool!
I have tried to run it on Docker, but I am unable to test the Docker application successfully.

This is the error message I get:
<3>WSL (4903) ERROR: CreateProcessEntryCommon:577: execvpe /bin/bash failed 2
<3>WSL (4903) ERROR: CreateProcessEntryCommon:586: Create process not expected to return

Docker is fully functional and working with other images. I have stuck to

Thank you in advance for your help
Malte

run failed v2.1.1

Hi!

I am trying to run CRISPRme on chr22 with no VCF files, but the run fails with the error below. Is there any required parameter that I might be missing? Thanks!

To Reproduce

$ crisprme.py complete-search --genome Genomes/hg38_chr22/ --guide sg1617.txt --pam PAMs/20bp-NGG-SpCas9.txt --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --output output --thread 12
--annotation not used
Launching job /data2/crisprme_test/Results/output. The stdout is redirected in log_verbose.txt and stderr is redirected in log_error.txt
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/bin/crisprme.py", line 934, in <module>
    complete_search()
  File "/home/ubuntu/miniconda3/envs/crisprme/bin/crisprme.py", line 673, in complete_search
    raise OSError(f"\nCRISPRme run failed! See {os.path.join(outputfolder, 'log_error.txt')} for details\n")
OSError:
CRISPRme run failed! See /data2/crisprme_test/Results/output/log_error.txt for details

$ head -50 /data2/crisprme_test/Results/output/log_error.txt
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3081, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'rsID'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./remove_n_and_dots.py", line 29, in <module>
    chunk['rsID'] = chunk['rsID'].str.replace('.', 'NA')
  File "/home/ubuntu/miniconda3/envs/crisprme/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/ubuntu/miniconda3/envs/crisprme/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3083, in get_loc
    raise KeyError(key) from err
KeyError: 'rsID'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3081, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'rsID'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./remove_n_and_dots.py", line 29, in <module>
    chunk['rsID'] = chunk['rsID'].str.replace('.', 'NA')
  File "/home/ubuntu/miniconda3/envs/crisprme/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/ubuntu/miniconda3/envs/crisprme/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3083, in get_loc
    raise KeyError(key) from err
KeyError: 'rsID'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3081, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'rsID'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):


Missing Indels? Are Indel Variants Still Supported?

I noticed that I was not getting any indels back when running the current docker against my own guides, so I ran it against the test guides and still did not get any indels. This included indels that are in the supplemental material deposited with the original paper.

Are indels no longer supported?

To Reproduce

If running CRISPRme via command line, type the command line call to CRISPRme returning the error

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crisprme crisprme.py complete-search --genome Genomes/hg38/ --vcf vcf_hg38_gnomAD.txt/ --guide sg1617.txt --pam ./PAMs/20bp-NGG-SpCas9.txt --annotation ./Annotations/encode+gencode.hg38.bed --samplesID samplesID_hg38_gnomAD.txt --gene_annotation ./Annotations/gencode.protein_coding.bed --bMax 1 --mm 4 --bDNA 1 --bRNA 1 --merge 3 --output runReference/ --thread 12

If running CRISPRme via the website, please fill the form below:

  1. Spacer sequences
    sg1617
    CTAACAGTTGCTTTTATCACNNN

  2. Cas protein
    cas9

  3. PAM
    NGG

  4. Genome
    hg38

  5. Variants dataset
    gnomad3.1.2

  6. Thresholds
    Mismatches: 4
    DNA Bulges: 1
    RNA Bulges: 1

Expected behavior
In the original paper, there is a hit at chr13, location 87474539, positive strand. It includes the variant : chr13_87474546_AAGACCC_A which is present in the gnomad 3.1.2 dataset (https://gnomad.broadinstitute.org/variant/13-87474546-AAGACCC-A?dataset=gnomad_r3). It matches with 4 mismatches and 1 DNA bulge. When I manually align it, I get the same answer. However, when I run the current docker image, I do not get this hit (and nothing close to it). Nor do I get any indel hits, which seems highly unlikely.

I verified that the VCF has the appropriate line

chr13 87474546 rs1168703497 AAGACCC A . PASS AF=0.00144928 GT 0/0 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/1 0/0
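
The check described above (confirming that the VCF line is an indel and that some samples carry it) can be sketched in a few lines. This is a minimal illustration, assuming whitespace-separated columns in the standard VCF order; the function name is hypothetical and the code is not taken from CRISPRme.

```python
def summarize_vcf_line(line):
    """Summarize one VCF data line: variant ID, indel status, and carrier count."""
    fields = line.split()
    chrom, pos, _rsid, ref, alt = fields[:5]
    genotypes = fields[9:]  # sample columns follow the FORMAT column
    # A record is an indel when the REF and ALT alleles differ in length.
    is_indel = len(ref) != len(alt)
    # A sample is a carrier if any allele in its genotype is neither 0 nor missing.
    carriers = sum(
        1
        for gt in genotypes
        if any(allele not in ("0", ".") for allele in gt.replace("|", "/").split("/"))
    )
    return {
        "variant_id": f"{chrom}_{pos}_{ref}_{alt}",
        "is_indel": is_indel,
        "deleted_bases": max(len(ref) - len(alt), 0),
        "carriers": carriers,
        "samples": len(genotypes),
    }


line = (
    "chr13 87474546 rs1168703497 AAGACCC A . PASS AF=0.00144928 GT "
    "0/0 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/1 0/0"
)
print(summarize_vcf_line(line))
```

For the line quoted above, this reports a 6-base deletion carried by 2 of the 10 listed samples, so the variant is present in the input as the report states.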

CrisprMe Website

Hi, I'm really excited to use CrisprMe and think it will be really useful for my project. I was hoping to use the website since I had tested it out before but recently any jobs I submit don't seem to be updating. I was wondering if you knew if/when the website will be up and running again to process jobs? Thanks!

Questions on Crisprme capabilities

Hello,
We're looking into CRISPRme and I'm wondering if we could get clarification on a few details.

(1) If there are multiple indels within one guide's length of each other that together introduce a new off-target, will CRISPRme find it?
(2) How does CRISPRme deal with PAMless enzymes? Making a TST containing the entire genome seems computationally expensive.
(3) Is there an allele frequency filter that is applied? If not, how does the computation time not balloon with larger reference panels?
(4) Is it correct that bulges are not allowed in the PAM?

Thank you,
Katie

Run with converted gnomAD v4.0.0 dataset fails

Running against the gnomAD v4.0.0 (converted with CRISPRme) fails in the Integrating Results phase.

Error message

Traceback (most recent call last):
File "/opt/conda/opt/crisprme/PostProcess/./resultIntegrator.py", line 492, in
if float(elem) == 0:
ValueError: could not convert string to float: 'rs635634'
CRISPRme ERROR: result integration failed (script: /opt/conda/opt/crisprme/PostProcess/post_process.sh line 45)
CRISPRme ERROR: postprocessing failed - reference (script: /opt/conda/opt/crisprme/PostProcess/submit_job_automated_new_multiple_vcfs.sh line 848)

Some details

I ran a guide with a set of parameters against the hg38_1000G VCFs successfully and wanted to also run the same against gnomAD VCFs. I downloaded the VCFs and converted them and then ran again with the same parameters but updated the sampleIDs to be the gnomAD sample IDs and VCFs to be the gnomAD VCFs. I get the error that is above. I am running on a fresh Ubuntu VM with 128 GB of RAM against the latest docker. I have reproduced the error twice.
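
The traceback shows `float(elem)` being called on the string 'rs635634', which suggests that an rsID-like value reached a branch expecting a number. The sketch below shows one defensive way to tell numeric fields from non-numeric ones; it is an illustration only, with a hypothetical helper name, and is not the fix used in `resultIntegrator.py`.

```python
def parse_float(value):
    """Return value as a float, or None when it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


# The last element mimics the value reported in the traceback above.
for elem in ["0", "0.00144928", "rs635634"]:
    number = parse_float(elem)
    if number is None:
        print(f"non-numeric field: {elem!r}")
    elif number == 0:
        print(f"zero value: {elem!r}")
    else:
        print(f"numeric value: {number}")
```

Logging the non-numeric fields in this way would show which column of the converted data ends up where a frequency is expected.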

Y Chromosome 1000G Project

Hi CRISPRme Team,
Quick question: I noticed the installation scripts do not download the 1000G Project VCF for the Y chromosome.
Is this expected behavior? Thank you so much!

run with gnomAD converted VCF fails

Describe the bug
Hi!

I am trying to run CRISPRme with a converted gnomAD VCF but the run fails. I am using v2.1.0 by conda.
The Genome folder contains chr22 only, and the VCFs folder contains the corresponding VCF for chr22 converted using the gnomAD-converter. A run without VCFs finished without errors.

To Reproduce
The full command line is the following

crisprme.py complete-search --genome Genomes/hg38_chr22 --guide sg1617.txt --pam PAMs/20bp-NGG-SpCas9.txt --annotation Annotations/encode+gencode.hg38.bed --gene_annotation Annotations/gencode.protein_coding.bed  --mm 6 --output sg1617_gnomad_chr22_top --thread 12 --vcf list_vcf_gnomad_chr22.txt --samplesID list_samplesID_gnomad.txt

I get the following error message

$ cat Results/sg1617_gnomad_chr22_top/log_error_no_check.txt
Traceback (most recent call last):
  File "./process_summaries.py", line 136, in <module>
    dict_samples[sample][3] += 1
KeyError: 'raw'
Traceback (most recent call last):
  File "./process_summaries.py", line 136, in <module>
    dict_samples[sample][3] += 1
KeyError: 'raw'
Traceback (most recent call last):
  File "./process_summaries.py", line 136, in <module>
    dict_samples[sample][3] += 1
KeyError: 'raw'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CFD.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CRISTA.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_fewest.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CFD.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CRISTA.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_fewest.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CFD.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CRISTA.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_fewest.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CFD.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CRISTA.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_fewest.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CFD.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CRISTA.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_fewest.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CFD.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CRISTA.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_fewest.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CFD.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CRISTA.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_fewest.txt'
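
The first error, `KeyError: 'raw'`, indicates that a sample or population name ('raw') was looked up in a dictionary that does not contain it. One way to investigate is to compare the sample columns of the VCF header with the IDs in the samplesID file. The sketch below assumes a tab-separated samplesID table whose first column is the sample ID and whose header line starts with '#'; that layout and all names in the example data are assumptions made for illustration.

```python
def sample_ids_from_table(lines):
    """Collect sample IDs from the first column, skipping '#' header lines."""
    ids = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            ids.add(line.split()[0])
    return ids


def samples_from_vcf_header(header_line):
    """Return the sample names from a '#CHROM ...' VCF header line."""
    columns = header_line.rstrip("\n").split("\t")
    return columns[9:]  # sample columns follow the FORMAT column


def missing_samples(vcf_header_line, table_lines):
    """List VCF sample names that are absent from the samplesID table."""
    known = sample_ids_from_table(table_lines)
    return [name for name in samples_from_vcf_header(vcf_header_line) if name not in known]


# Made-up data: a VCF header with three sample columns and a table listing two.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tafr\tnfe\traw"
table = ["#SAMPLE_ID\tPOPULATION_ID", "afr\tAFR", "nfe\tNFE"]
print(missing_samples(header, table))
```

Any name printed by `missing_samples` would raise a `KeyError` in code that indexes a dictionary built from the table alone.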

Impact of variations for On target

Hi Crisprme team,

Thanks for the amazing tool! I am using the latest version 2.10.0 on conda. I am testing Crisprme for now, but will most likely get an industry license very soon. Please let me know how that works.

I have a few questions/requests for now.

  1. In which files/tables can I find information about the off-targets' closest gene(s), and whether the off-target is in a coding region, an intronic region, or a regulatory element? What about gene-type information, such as whether it is a tumor suppressor, or whether there is a PAM creation? Where is this information stored? Right now, I am only using the 1000G variants.

  2. Are the reported CFD scores based on homology only, or are the scores more sophisticated and dependent on where the mismatch occurs?

  3. Lastly and more importantly, I would like to see the impact of genetic variation on on-target specificity. Will genetic variation reduce the chance of being on target? Are there sub-populations in which the on-target PAM is disabled? This is the type of information I would like to see in the output. I hope this is rather easy to implement.

Looking forward to hearing from you,
-Davood

Understanding outputs

Hi @samuelecancellieri ,

Thank you and your colleagues for developing such an invaluable tool.

I had a first trial with CRISPRme using docker. It seemed to work and produced lots of results, which I have been trying to understand.

To be exact, I would love to understand the columns in *altMerge.txt.bestCFD.txt , *bestMerge.txt and final_results_*.bestMerge.txt.bestCFD.txt.*.

I am particularly interested in understanding the columns coming from the *altMerge.txt file, which are listed below.

Moreover, if I am going to select the most likely off-targets, which file and which metrics should I rely on to make choices?

Thanks a lot in advance. Your help will be greatly appreciated.

#######################columns from *altMerge.txt file ##########
PAM_gen
Var_uniq
#Seq_in_cluster
CFD_ref
Highest_CFD_Risk_Score
Highest_CFD_Absolute_Risk_Score
MMBLG_PAM_gen
MMBLG_CFD
MMBLG_CFD_ref
MMBLG_CFD_Risk_Score
MMBLG_CFD_Absolute_Risk_Score

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 48:cluster.dict.py

I have run into a problem. The file
hg38+hg38_1000G_test_20bp-NGG-SpCas9.txt_guides.txt_gencode_encode.hg38.bed_6_2_2_chrX_KI270881v1_alt.total.cluster.txt.tmp_sort.txt contains the special symbol
<90> in its third column, and I get the error below. I would like to know whether this problem can be solved; any ideas on how to solve it would be much appreciated.
image

run command line

crisprme.py complete-search --genome Genomes/hg38 --vcf list_vcf.txt/ --guide sg1617.txt --pam PAMs/20bp-NGG-SpCas9.txt --annotation Annotations/gencode_encode.hg38.bed --samplesID list_samplesID.txt --gene_annotation Gencode/gencode.protein_coding.bed --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --output sg1617.6.2.2_new --thread 58

I downloaded the CRISPRme docker image using the command line:
docker pull pinellolab/crisprme
File format:
The third column contains the special symbol <90>.
image
Any help tracing this down would be helpful, I am looking forward to running the tool on a real sequence!

Let me know if any other information would be helpful...
khl
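
To locate bytes such as 0x90 that trigger a `UnicodeDecodeError`, the file can be read in binary mode and decoded line by line. The following is a minimal sketch with a hypothetical function name; the sample bytes are made up and only imitate a stray 0x90 in the third column.

```python
def find_undecodable_lines(raw_bytes, encoding="utf-8"):
    """Return (line_number, byte_offset, byte_value) for each line that fails to decode."""
    problems = []
    for line_number, raw_line in enumerate(raw_bytes.splitlines(), start=1):
        try:
            raw_line.decode(encoding)
        except UnicodeDecodeError as err:
            # err.start is the offset of the first offending byte within the line.
            problems.append((line_number, err.start, raw_line[err.start]))
    return problems


# Made-up content: the second line carries a stray 0x90 byte in its third column.
sample = b"chrX\t100\tACGT\nchrX\t200\tAC\x90GT\n"
for line_number, offset, value in find_undecodable_lines(sample):
    print(f"line {line_number}, offset {offset}: byte 0x{value:02x}")
```

For a real file, `raw_bytes` could be obtained with `open(path, "rb").read()`; very large files would be better processed in chunks.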

Usage for nucleases other than cas9

Can crisprme be used for prediction for non-cas9 nucleases?

I understand crisprme does not compute the CFD score in these cases, but that crisprme uses CFD to select targets when deduplicating for the bestMerge file. What does the tool do when CFD is not computable?

docker image version

I pulled the newest docker image but somehow it says v.2.1.3

 docker images
REPOSITORY             TAG       IMAGE ID       CREATED         SIZE
pinellolab/crisprme    latest    a15be9acd5d6   38 hours ago    1.85GB
pinellolab/crispritz   latest    7d74a7bc0e48   3 months ago    3.22GB
hello-world            latest    d2c94e258dcb   15 months ago   13.3kB
zma@aws$ docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crisprme crisprme.py --version
v2.1.3

Is there a chance the latest docker image is actually an old one?

CRISPRme output

Hi,

I ran CRISPRme for a guide sequence and included 1000 Genomes VCF files to find out how variants affect the OT sites. I found some sites that didn't have any variants mapping to them but were reported as having a lower number of mismatches/bulges in the "MMBLG_Mismatches | MMBLG_Bulge_Size | MMBLG_Total" columns compared to the "Mismatches | Bulge_Size | Total" columns.
The OT sequence is the same in the "DNA" and "MMBLG_DNA" columns, but the alignment to the guide is different.
aggCACTAG-aTTGACaCACAGG vs aggCACTAGA-TTGACaCACAGG
All variation related rows have "n" value for this example, so there is no variant mapping to this genomic region.
Is this a bug or should I interpret the results in a different way?

Thank you,
Meltem
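
The two alignments quoted above can be compared by counting their events directly. The sketch below assumes that lowercase letters mark mismatches and that '-' marks a bulge position in the aligned string; that reading of the notation is an assumption, and the function name is hypothetical.

```python
def count_alignment_events(aligned_dna):
    """Count mismatches (lowercase letters) and bulge positions ('-')."""
    mismatches = sum(1 for base in aligned_dna if base.islower())
    bulges = aligned_dna.count("-")
    return mismatches, bulges


# The two alignments of the same site quoted in the question above.
for aligned in ["aggCACTAG-aTTGACaCACAGG", "aggCACTAGA-TTGACaCACAGG"]:
    mismatches, bulges = count_alignment_events(aligned)
    print(aligned, "mismatches:", mismatches, "bulges:", bulges, "total:", mismatches + bulges)
```

Under that assumption, the same genomic sequence scores differently depending on where the bulge is placed, so the difference between the two sets of columns may come from the alignment itself rather than from a variant.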

Error message: no results when using web based CRISPRme with Cas12a

Describe the bug
Status report
Indexing genome(s): Not available
Searching spacer: Not available
Post processing: Not available
Merge targets: Not available
Annotating and generating images: Not available
Integrating results: Not available
Populating database: Not available

The selected result encountered some errors, please remove it and try to submit again

To Reproduce

If running CRISPRme via command line, type the command line call to CRISPRme returning the error

If running CRISPRme via the website, please fill the form below:

  1. Spacer sequences
    AGACAGATATTTGCATTGAGATA

  2. Cas protein
    Cas12a

  3. PAM
    TTTV-23bp-Cas12a

  4. Genome
    Hg38

  5. Variants dataset (OPTIONAL)
    plus 1000 Genomes Project variants
    plus HGDP variants

  6. Thresholds
    Mismatches: 6
    DNA Bulges: 2
    RNA Bulges: 2

  7. Base editing (OPTIONAL)
    Start: 7
    Stop: 15
    Nucleotide: A

  8. Annotation

Expected behavior
I get an error and no results

Screenshots
If running CRISPRme via website, add screenshots to help explain your problem.

Screenshot 2024-01-30 at 15 44 28 Screenshot 2024-01-30 at 15 44 45

Additional context
I ran your test dataset with Cas9 and had no issues. I wonder if it has something to do with the Cas12a; I also tried different gRNAs for Cas12a with the same outcome

PAM input format

Hi,

What is the format of the file containing the PAM, e.g. PAMs/20bp-NGG-spCas9.txt? I can't seem to find a description of this file.

Thanks!
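For readers with the same question, here is a minimal sketch of how such a file could be parsed. It assumes the layout described in the CRISPRitz documentation: a single line holding the PAM padded with one N per guide position, then a space, then the PAM length (for example 20 Ns followed by NGG, then 3). The function name `parse_pam_line` is hypothetical, and the layout should be checked against the PAM files shipped with the test data before relying on it.

```python
def parse_pam_line(line):
    """Split a PAM line into (guide_length, pam) under the assumed layout.

    Assumption: "<Ns + PAM> <PAM length>", with the PAM at the 3' end.
    This is an illustration, not CRISPRme's own parser.
    """
    sequence, length_text = line.split()
    pam_length = int(length_text)
    if not 0 < pam_length < len(sequence):
        raise ValueError("PAM length must be positive and shorter than the sequence")
    pam = sequence[-pam_length:]            # trailing characters are the PAM
    guide_length = len(sequence) - pam_length
    return guide_length, pam


# Build the example line programmatically to avoid miscounting the Ns.
example_line = "N" * 20 + "NGG" + " 3"
print(parse_pam_line(example_line))
```

The sketch only covers a PAM placed after the guide; PAMs placed before the guide (as for Cas12a) are not handled here.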

gnomAD-converter error in v2.1.5 (conda install): pysam missing in the conda build recipe

I installed the newest v2.1.5 version and tested the new gnomAD-converter function for gnomAD v4 VCFs and noticed a bug:

 crisprme.py gnomAD-converter --gnomAD_VCFdir . --samplesID ../../../../samplesIDs/hg38_gnomAD.samplesID.txt --keep
Traceback (most recent call last):
  File "/mnt/efs/home/zma/anaconda3/envs/crisprme2.1.5/opt/crisprme/src/convert_gnomAD_vcfs.py", line 13, in <module>
    import pysam
ModuleNotFoundError: No module named 'pysam'

Installing pysam solved this issue.
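A quick way to confirm whether a module such as pysam is visible to the active environment before running the converter is sketched below. The helper name `missing_modules` is hypothetical and the check uses only the Python standard library; it does not inspect the conda recipe itself.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Run this inside the activated CRISPRme environment.
print(missing_modules(["pysam"]))
```

If `pysam` is listed, installing it into the same environment (for example with `conda install -c bioconda pysam` or `pip install pysam`) should make the import succeed.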

Question about altMerge output

Hi there,

I have a question about how *.altMerge.txt is generated. We're interested in this file because we need an exhaustive list of all variants that could result in a hit at a specific locus, not just the top-scoring one.

I ran CRISPRme with the gnomAD data (6 mm, 1 bulge). I notice that within each cluster the rows have identical values in the first 24 columns but different values in the MMBLG* columns. They also have identical CRISTA* columns.

Could you shed some light on how this file is generated? Should all the sites contained in the MMBLG* columns be considered alternate sites?

Thanks
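To inspect the pattern described above, the rows can be grouped by their leading columns so the differing MMBLG* values in each cluster are listed side by side. This is a minimal sketch that assumes the file is tab-separated with a header row; the function name `group_by_leading_columns` is hypothetical and the code says nothing about how CRISPRme itself builds the file.

```python
import csv
import io
from collections import defaultdict


def group_by_leading_columns(handle, n_key_columns=24):
    """Group rows that share their first n_key_columns values.

    Returns a dict mapping the shared values (as a tuple) to a list of
    dicts holding the remaining columns of each row.
    Assumption: tab-separated text with one header line.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rest_names = header[n_key_columns:]
    clusters = defaultdict(list)
    for row in reader:
        key = tuple(row[:n_key_columns])
        clusters[key].append(dict(zip(rest_names, row[n_key_columns:])))
    return dict(clusters)


# Tiny synthetic table with 2 key columns instead of 24, for illustration only.
demo = io.StringIO("chrom\tpos\tMMBLG_x\nchr1\t100\ta\nchr1\t100\tb\nchr2\t5\tc\n")
for key, rows in group_by_leading_columns(demo, n_key_columns=2).items():
    print(key, rows)
```

For a real file, open it with `open(path, newline="")` and keep the default of 24 key columns, or adjust it to match the header.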

Position and Cluster_Position

Hi!

Thanks for the quick reply on the other issues! I managed to run it successfully on chr22.

I am inspecting output.bestMerge.txt. What does Position correspond to, and how does it differ from Cluster_Position?
How can I get the start and end coordinates of the sequence in the DNA?

Thanks!

Update Dockerfile to pull v2.1.2

Update and Fix Dockerfile:

  • Pull CRISPRme v2.1.2 instead of v2.1.1
  • Replace conda with mamba (use mamba miniforge image)
  • Force CRISPRitz = v2.6.6
  • Update crisprme_auto_test_docker.sh
  • Test new build

Test run failed - docker

Hello,
I tried to launch the test command:
docker run -v ${PWD}:/DATA -w /DATA -i scancellieri/crisprme crisprme.py complete-search --genome Genomes/hg38/ --vcf list_vcf.txt/ --guide sg1617.txt --pam PAMs/20bp-NGG-SpCas9.txt --annotation Annotations/encode+gencode.hg38.bed --samplesID list_samplesID.txt --gene_annotation Annotations/gencode.protein_coding.bed --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --output sg1617.6.2.2 --thread 4

But I received this error:

The folder specified for --vcf does not exist

This is what the test directory looks like (command tree -L 2)

├── Annotations
│   ├── encode+gencode.hg38.bed
│   └── gencode.protein_coding.bed
├── clean_all.sh
├── crisprme_auto_test_conda.sh
├── crisprme_auto_test_docker.sh
├── crisprme_auto_test_download_essentials.sh
├── crisprme_auto_test_no_download.sh
├── Dictionaries
├── Genomes
│   ├── hg38
│   └── hg38.chromFa.tar.gz
├── list_samplesID.txt
├── list_vcf.txt
├── PAMs
│   └── 20bp-NGG-SpCas9.txt
├── Results
├── samplesIDs
│   ├── hg38_1000G.samplesID.txt
│   ├── hg38_gnomAD.samplesID.txt
│   └── hg38_HGDP.samplesID.txt
├── sg1617.txt
└── VCFs
    ├── hg38_1000G
    └── hg38_HGDP

I then modified the command to --vcf VCFs/hg38_1000G, but then got a new error: The folder specified for --pam does not exist

What am I doing wrong?
Thank you!
Paola
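When chasing "does not exist" errors like the ones above, it can help to check how each path argument resolves from the working directory that gets mounted into the container. The sketch below is a standalone pre-flight check using only the Python standard library; the helper name `classify_paths` is hypothetical and it does not reproduce CRISPRme's own validation or say which arguments must be files or folders.

```python
import os


def classify_paths(paths):
    """Map each path to 'directory', 'file', or 'missing' as seen from the current directory."""
    report = {}
    for path in paths:
        if os.path.isdir(path):
            report[path] = "directory"
        elif os.path.isfile(path):
            report[path] = "file"
        else:
            report[path] = "missing"
    return report


# Paths copied from the command above; run from the test directory.
arguments = [
    "Genomes/hg38/",
    "list_vcf.txt/",
    "sg1617.txt",
    "PAMs/20bp-NGG-SpCas9.txt",
    "list_samplesID.txt",
]
for path, kind in classify_paths(arguments).items():
    print(f"{kind:10s} {path}")
```

Note that on Linux a regular file written with a trailing slash, such as `list_vcf.txt/`, is typically reported as missing, so comparing the output with the `tree` listing can reveal such mismatches.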
