
crisprme's People

Contributors

lindayqlin, manueltgn, samuelecancellieri


crisprme's Issues

Understanding outputs

Hi @samuelecancellieri ,

Thank you and your colleagues for developing such an invaluable tool.

I ran a first trial of CRISPRme using Docker. It seemed to work and produced many results, which I have been trying to understand.

To be exact, I would love to understand the columns in *altMerge.txt.bestCFD.txt, *bestMerge.txt and final_results_*.bestMerge.txt.bestCFD.txt.*.

I am particularly interested in the columns from the *altMerge.txt file, which are listed below.

Moreover, if I want to select the most likely off-targets, which file and which metrics should I rely on?

Thanks a lot in advance; your help will be greatly appreciated.

Columns from the *altMerge.txt file:
PAM_gen
Var_uniq
#Seq_in_cluster
CFD_ref
Highest_CFD_Risk_Score
Highest_CFD_Absolute_Risk_Score
MMBLG_PAM_gen
MMBLG_CFD
MMBLG_CFD_ref
MMBLG_CFD_Risk_Score
MMBLG_CFD_Absolute_Risk_Score

Questions on Crisprme capabilities

Hello,
We're looking into CRISPRme and I'm wondering if we could get clarification on a few details.

(1) If there are multiple indels within one guide's length of each other that together introduce a new off-target, will CRISPRme find it?
(2) How does CRISPRme deal with PAMless enzymes? Building a TST that contains the entire genome seems computationally expensive.
(3) Is there an allele frequency filter that is applied? If not, how does the computation time not balloon with larger reference panels?
(4) Is it correct that bulges are not allowed in the PAM?

Thank you,
Katie

Question about web portal vs. running CRISPRme locally

Hello,

Thank you for creating this useful tool. I had a quick question: if I ran CRISPRme with parameters X on the web portal and then ran it locally with the same parameters X, should I expect the integrated_results file that is output to be identical? I was specifically wondering if there was any output truncation that happened on the web portal for efficiency even with the integrated_results file (which I think is the complete off-target set).

Thank you

having issue with conda installation

Dear team,

We are trying to install this software and are somehow stuck at this step.

conda create -n crisprme python=3.8 crisprme -y

Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: |

Could you please help us solve this?

Regards.
Najeeb

Impact of variations for On target

Hi Crisprme team,

Thanks for the amazing tool! I am using the latest version 2.10.0 on conda. I am testing Crisprme for now, but will most likely get an industry license very soon. Please let me know how that works.

I have a few questions/requests for now.

  1. In which files/tables can I find information about the off-targets' closest gene(s), and whether the off-target lies in a coding region, an intronic region, or a regulatory element? What about gene-type information, such as whether the gene is a tumor suppressor, or whether there is a PAM creation? Where is this information stored? Right now, I am only using the 1000G variants.

  2. Are the reported CFD scores based on homology only, or are they more sophisticated and dependent on where the mismatch occurs?

  3. Lastly, and more importantly, I would like to see the impact of genetic variation on on-target specificity. Will genetic variation reduce the chance of being on target? Are there sub-populations in which the on-target PAM is disabled? This is the type of information I would like to see in the output. I hope this is rather easy to implement.

Looking forward to hearing from you,
-Davood

Test Installation Run Failure

Describe the bug

Terminal output:

$ bash crisprme_auto_test_conda.sh
starting download and unzip of data
unzip gencode+encode annotations
start download VCF data and genome (this may take a long time due to connection speed)
download 1000G VCFs
crisprme_auto_test_conda.sh: line 26: 40706 Abort trap: 6           wget -c -q ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr$i.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz
download hg38
start testing
Launching job /Users/sdm8/Desktop/crisprme_test/crisprme_test/Results/sg1617.6.2.2. The stdout is redirected in log_verbose.txt and stderr is redirected in log_error.txt
Traceback (most recent call last):
  File "/Users/sdm8/opt/anaconda3/envs/crisprme/bin/crisprme.py", line 934, in <module>
    complete_search()
  File "/Users/sdm8/opt/anaconda3/envs/crisprme/bin/crisprme.py", line 673, in complete_search
    raise OSError(f"\nCRISPRme run failed! See {os.path.join(outputfolder, 'log_error.txt')} for details\n")
OSError:
CRISPRme run failed! See /Users/sdm8/Desktop/crisprme_test/crisprme_test/Results/sg1617.6.2.2/log_error.txt for details

log_error.txt

[W::bcf_sr_add_reader] No BGZF EOF marker; file '/Users/sdm8/Desktop/crisprme_test/crisprme_test/VCFs/hg38_1000G/ALL.chr7.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz' may be truncated
mv: rename /Users/sdm8/Desktop/crisprme_test/crisprme_test/Genomes/variants_genome/SNPs_genome/hg38_enriched/ to ./hg38+hg38_1000G/: No such file or directory
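
The "No BGZF EOF marker" warning above usually means a bgzip-compressed file was cut short, which fits the aborted wget in the terminal output. As a rough sketch (not part of CRISPRme), the downloaded VCFs could be checked for the 28-byte empty block that bgzip appends at the end of every file. The function names and the "*.vcf.gz" glob pattern are illustrative assumptions.

```python
from pathlib import Path

# The 28-byte empty BGZF block that bgzip writes as an end-of-file marker
# (defined in the SAM/BAM format specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 "
    "02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker."""
    p = Path(path)
    size = p.stat().st_size
    if size < len(BGZF_EOF):
        return False
    with p.open("rb") as handle:
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF


def truncated_vcfs(folder):
    """List *.vcf.gz files in `folder` that lack the EOF marker (hypothetical helper)."""
    return sorted(
        str(f) for f in Path(folder).glob("*.vcf.gz") if not has_bgzf_eof(f)
    )
```

Any file reported by truncated_vcfs("VCFs/hg38_1000G") would be a candidate for re-downloading before the search is launched again. A missing marker is only a hint of truncation, not proof of it.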

To Reproduce

I had to install like this without specifying python version as described in README because I kept getting an error message pertaining to version one of the program:

Encountered problems while solving:
- package crisprme-1.0.1-0 requires crispritz, but none of the providers can be installed

Installed via:

mamba install -c bioconda crisprme
## missing module Bio -
mamba install biopython

I had to download the test data off my VPN (federal government network). I downloaded it via:

wget https://www.dropbox.com/s/urciozkana5md0z/crisprme_test.tar.gz?dl=1 -O crisprme_test.tar.gz
tar -xvf crisprme_test.tar.gz

Environment (please complete the following information, ONLY applicable if running CRISPRme via command line):

  • Python version 3.7.12
  • CRISPRme version 2.1.1
  • CRISPRitz version 1.2.1
  • axel version 2.17.11
  • gdown version 4.7.1
  • numpy version 1.20.0
  • dash version 1.10.0
  • dash-bootstrap-components version 0.10.0
  • dash-core-components version 1.9.0
  • dash-daq version 0.4.0
  • dash-html-components version 1.0.3
  • dash-renderer version 1.3.0
  • dash-table version 4.6.2
  • flask version 1.1.3
  • flask-caching version 1.7.1
  • flask-compress version 1.5.0
  • fontconfig version 2.13.1
  • freetype version 2.10.1
  • future version 0.18.2
  • gettext version 0.19.8.1
  • gunicorn version 20.0.4
  • werkzeug version 1.0.1
  • pandas version 1.2.5

Error message: no results when using web based CRISPRme with Cas12a

Describe the bug
Status report
Indexing genome(s): Not available
Searching spacer: Not available
Post processing: Not available
Merge targets: Not available
Annotating and generating images: Not available
Integrating results: Not available
Populating database: Not available

The selected result encountered some errors, please remove it and try to submit again

To Reproduce

If running CRISPRme via the website, please fill the form below:

  1. Spacer sequences
    AGACAGATATTTGCATTGAGATA

  2. Cas protein
    Cas12a

  3. PAM
    TTTV-23bp-Cas12a

  4. Genome
    Hg38

  5. Variants dataset (OPTIONAL)
    plus 1000 Genomes Project variants
    plus HGDP variants

  6. Thresholds
    Mismatches: 6
    DNA Bulges: 2
    RNA Bulges: 2

  7. Base editing (OPTIONAL)
    Start: 7
    Stop: 15
    Nucleotide: A

  8. Annotation

Expected behavior
I get an error and no results


Additional context
I ran your test dataset with Cas9 and had no issues. I wonder if it has something to do with Cas12a; I also tried different gRNAs for Cas12a, with the same outcome.

A few questions...

Hi all,

Thanks for the tool! I have a few questions

  1. Are the _alt and _random files from the hg38 genome used as part of the search with the standard VCF sets? If so, can you explain how they are identified in the output?

  2. I am interested in using some other VCF datasets. Can you provide any information on what the tool supports for VCF files? I am planning to format them the same way your script for cleaning up the gnomAD 3.1 dataset does, but I would prefer not to convert them to multi-allelic records as you do, because of the information loss. Is that required?

Thanks,

Tom

Number of threads doesn't change

Thank you for this great tool!

I run this command:

nohup crisprme.py complete-search --genome Genomes/hg38/ --vcf list_vcf.txt/ --guide New_guides.txt --pam PAMs/20bp-NGG-SpCas9.txt --annotation Annotations/encode+gencode.hg38.bed --samplesID list_samplesID.txt --gene_annotation Annotations/gencode.protein_coding.bed --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --output New --thread 64 &

I use --thread 64, but the code is still only using 4 cores. How do I change the number of cores utilized?

Also, what does --merge do? I couldn't find that in the --help section.

Thanks again,
-Davood

MemoryError in Enrichment step

Describe the bug
When running the complete-search command, there is a MemoryError in the Enrichment step. The folder Genomes/hg38+hg38_1000G, which contains the enriched reference FASTAs (e.g. chr3.enriched.fa), is missing the files for the two largest chromosomes: chr1.enriched.fa and chr2.enriched.fa. Both chr1.fa and chr2.fa are present in the Genomes/hg38 folder.
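
A quick way to see which chromosomes were lost during enrichment is to compare the two folders by file name. The sketch below assumes the naming seen in this report (chrN.fa in the reference folder, chrN.enriched.fa in the enriched folder). The function name is hypothetical and not part of CRISPRme.

```python
from pathlib import Path


def missing_enriched(reference_dir, enriched_dir):
    """Return chromosome names that have a reference FASTA but no enriched FASTA."""
    reference = {
        p.name[: -len(".fa")] for p in Path(reference_dir).glob("*.fa")
    }
    enriched = {
        p.name[: -len(".enriched.fa")]
        for p in Path(enriched_dir).glob("*.enriched.fa")
    }
    return sorted(reference - enriched)
```

For the layout described here, missing_enriched("Genomes/hg38", "Genomes/hg38+hg38_1000G") would be expected to list chr1 and chr2.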

To Reproduce
Running final command in the crisprme_auto_test_conda.sh.

command:
crisprme.py complete-search --genome Genomes/hg38/ --vcf list_vcf.txt/ --guide sg1617.txt --pam PAMs/20bp-NGG-SpCas9.txt --annotation Annotations/encode+gencode.hg38.bed --samplesID list_samplesID.txt --gene_annotation Annotations/gencode.protein_coding.bed --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --output sg1617.6.2.2 --thread 4

Expected behavior
I expected a list of off-target sites and some images. Instead, some files are created, but none includes a full list of the off-target sites. No images are created in the img directory.

Environment (please complete the following information, ONLY applicable if running CRISPRme via command line):

  • Python version: 3.8
  • CRISPRme version: 2.1.1
  • CRISPRitz version: 2.6.6
  • axel version: 2.17.11
  • gdown version: 4.7.1
  • numpy version: 1.20.0
  • dash version: 1.10.0
  • dash-bootstrap-components version: 0.10.0
  • dash-core-components version: 1.9.0
  • dash-daq version: 0.4.0
  • dash-html-components version: 1.0.3
  • dash-renderer version: 1.3.0
  • dash-table version: 4.6.2
  • flask version: 1.1.3
  • flask-caching version: 1.7.1
  • flask-compress version: 1.5.0
  • fontconfig version: 2.13.1
  • freetype version: 2.10.1
  • future version: 0.18.2
  • gettext version: 0.19.8.1
  • gunicorn version: 20.0.4
  • werkzeug version: 1.0.1
  • pandas version: 1.2.5

Additional context
Running on a c4.4xlarge EC2 instance with a 2 TB volume.

log_error_no_check.txt

Getting an error running Docker test

I am running the test script on a clean Docker image. It gets far into the process and then hits an error.

log_error.txt contains :

./merge_close_targets_cfd.sh: line 34: 3282 Killed python remove_contiguous_samples_cfd.py $fileIn $fileOut $thresh $chrom $position $total $true_guide $snp_info $cfd $sort_pivot $sorting_criteria_scoring $sorting_criteria
CRISPRme ERROR: contigous SNP removal failed (script: ./merge_close_targets_cfd.sh line 31)
./merge_close_targets_cfd.sh: line 34: 3281 Killed python remove_contiguous_samples_cfd.py $fileIn $fileOut $thresh $chrom $position $total $true_guide $snp_info $cfd $sort_pivot $sorting_criteria_scoring $sorting_criteria
CRISPRme ERROR: contigous SNP removal failed (script: ./merge_close_targets_cfd.sh line 31)
Traceback (most recent call last):
  File "/opt/conda/opt/crisprme/PostProcess/remove_contiguous_samples_cfd.py", line 662, in <module>
    merge_targets()
  File "/opt/conda/opt/crisprme/PostProcess/remove_contiguous_samples_cfd.py", line 618, in merge_targets
    int(target_data[input_args[4]]),
ValueError: invalid literal for int() with base 10: '+'
CRISPRme ERROR: contigous SNP removal failed (script: ./merge_close_targets_cfd.sh line 31)

mv: cannot stat '/Test123/Results/sg1617_2/final_results_sg1617_2.bestMerge.txt.bestCFD.txt.trimmed': No such file or directory

I can add the full verbose log file if that would help, but here is the end that corresponds to the error...

Sorting file
Sorting file
Sorting file
Sorting done in 0 seconds
Sorting done in 0 seconds
Sorting done in 0 seconds
Merging contiguous targets
Merging contiguous targets
Merging contiguous targets

Any help tracking this down would be appreciated; I am looking forward to running the tool on a real sequence!

Let me know if any other information would be helpful...

Tom

single chromosome run

Hi,

Is it possible to run CRISPRme on a single chromosome? Ideally keeping all chromosomes in the same Genomes/ folder.

Thanks!

CrisprMe Website

Hi, I'm really excited to use CrisprMe and think it will be really useful for my project. I was hoping to use the website since I had tested it out before but recently any jobs I submit don't seem to be updating. I was wondering if you knew if/when the website will be up and running again to process jobs? Thanks!

run failed v2.1.1

Hi!

I am trying to run CRISPRme on chr22 with no VCF files, but the run fails with the error below. Is there any required parameter that I might be missing? Thanks!

To Reproduce

$ crisprme.py complete-search --genome Genomes/hg38_chr22/ --guide sg1617.txt --pam PAMs/20bp-NGG-SpCas9.txt --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --output output --thread 12
--annotation not used
Launching job /data2/crisprme_test/Results/output. The stdout is redirected in log_verbose.txt and stderr is redirected in log_error.txt
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/bin/crisprme.py", line 934, in <module>
    complete_search()
  File "/home/ubuntu/miniconda3/envs/crisprme/bin/crisprme.py", line 673, in complete_search
    raise OSError(f"\nCRISPRme run failed! See {os.path.join(outputfolder, 'log_error.txt')} for details\n")
OSError:
CRISPRme run failed! See /data2/crisprme_test/Results/output/log_error.txt for details

$ head -50 /data2/crisprme_test/Results/output/log_error.txt
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3081, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'rsID'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./remove_n_and_dots.py", line 29, in <module>
    chunk['rsID'] = chunk['rsID'].str.replace('.', 'NA')
  File "/home/ubuntu/miniconda3/envs/crisprme/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/ubuntu/miniconda3/envs/crisprme/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3083, in get_loc
    raise KeyError(key) from err
KeyError: 'rsID'
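
The traceback shows remove_n_and_dots.py indexing chunk['rsID'] on a table that has no such column, which is plausible for a run without VCFs. The following is only a defensive sketch of the idea, not the project's actual fix. It uses the standard csv module instead of pandas, and it assumes a tab-separated table with a header row in which a lone '.' marks a missing rsID. The function name is hypothetical.

```python
import csv
import io


def replace_dot_rsids(text, column="rsID", placeholder="NA"):
    """Replace lone '.' entries in `column` of a tab-separated table.

    If the column is absent, the table is returned unchanged instead of
    raising KeyError.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    fields = reader.fieldnames or []
    if column not in fields:
        return text
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row[column] == ".":
            row[column] = placeholder
        writer.writerow(row)
    return out.getvalue()
```

The same guard in pandas would be a membership test on the DataFrame's columns before the assignment.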


Is the dropbox test data not accessible anymore?

Describe the bug
Trying to download test data to test my installation:

$ wget https://www.dropbox.com/s/urciozkana5md0z/crisprme_test.tar.gz?dl=1 -O crisprme_test.tar.gz
--2023-04-04 15:06:23--  https://www.dropbox.com/s/urciozkana5md0z/crisprme_test.tar.gz?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.8.18
Connecting to www.dropbox.com (www.dropbox.com)|162.125.8.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/dl/urciozkana5md0z/crisprme_test.tar.gz [following]
--2023-04-04 15:06:23--  https://www.dropbox.com/s/dl/urciozkana5md0z/crisprme_test.tar.gz
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com/cd/0/get/B5gpbbCTnzZCiZiieBea8pMWQGbJ23D63L7ejpA053ciEAlNmWDQHGedfiSn7kZEdplAXkoEaZRB9OtFblQ19ZPUT5UjucX52-9NzWR3aBcMec9NK5PvnCdXdFxAJoGgK_BsGDswJW0BdEgwdASC2OveifZdcFvCQ_IMrmBucgU5MBchSE4NLe_s8dmw2Ge2llM/file?dl=1# [following]
--2023-04-04 15:06:24--  https://uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com/cd/0/get/B5gpbbCTnzZCiZiieBea8pMWQGbJ23D63L7ejpA053ciEAlNmWDQHGedfiSn7kZEdplAXkoEaZRB9OtFblQ19ZPUT5UjucX52-9NzWR3aBcMec9NK5PvnCdXdFxAJoGgK_BsGDswJW0BdEgwdASC2OveifZdcFvCQ_IMrmBucgU5MBchSE4NLe_s8dmw2Ge2llM/file?dl=1
Resolving uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com (uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com)... 127.0.0.1
Connecting to uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com (uc9473414cea25b32ec2d6ecc563.dl.dropboxusercontent.com)|127.0.0.1|:443... failed: Connection refused.
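
In the wget log above, the dropboxusercontent.com host resolves to 127.0.0.1. A loopback address for a public download host usually points to local DNS filtering (a hosts-file entry, proxy, or network policy) rather than to the file having been removed, although the log alone cannot confirm that. A small sketch for checking this from Python follows; the function names are illustrative.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if `address` (e.g. '127.0.0.1') is a loopback IP."""
    return ipaddress.ip_address(address).is_loopback


def resolves_to_loopback(hostname, port=443):
    """Resolve `hostname` and report whether any result is a loopback address.

    This performs a real DNS lookup, so it needs a working resolver.
    """
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # info[4][0] is the address string; drop any IPv6 scope suffix ("%eth0").
    return any(is_loopback(info[4][0].split("%")[0]) for info in infos)
```

If resolves_to_loopback returns True for the download host on the affected machine but False on another network, the block is local to that machine or network.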

Tried via browser as well.

This address: https://www.dropbox.com/s/urciozkana5md0z/crisprme_test.tar.gz?dl=1

This site can’t be reached
ucbbeffe6fcc5ac3d70c313358f9.dl.dropboxusercontent.com refused to connect.
Try:

Checking the connection
Checking the proxy and the firewall
ERR_CONNECTION_REFUSED

Tried going up a directory and I can see the directory crisprme_test but when I try to download it produces an error. Same as when I go into the directory and try to download any of the files. A red error message at the top says "There was an error downloading your file."

Test run failed - docker

Hello,
I tried to launch the test command:
docker run -v ${PWD}:/DATA -w /DATA -i scancellieri/crisprme crisprme.py complete-search --genome Genomes/hg38/ --vcf list_vcf.txt/ --guide sg1617.txt --pam PAMs/20bp-NGG-SpCas9.txt --annotation Annotations/encode+gencode.hg38.bed --samplesID list_samplesID.txt --gene_annotation Annotations/gencode.protein_coding.bed --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --output sg1617.6.2.2 --thread 4

But received error:

The folder specified for --vcf does not exist

This is what the test directory looks like (command tree -L 2)

├── Annotations
│   ├── encode+gencode.hg38.bed
│   └── gencode.protein_coding.bed
├── clean_all.sh
├── crisprme_auto_test_conda.sh
├── crisprme_auto_test_docker.sh
├── crisprme_auto_test_download_essentials.sh
├── crisprme_auto_test_no_download.sh
├── Dictionaries
├── Genomes
│   ├── hg38
│   └── hg38.chromFa.tar.gz
├── list_samplesID.txt
├── list_vcf.txt
├── PAMs
│   └── 20bp-NGG-SpCas9.txt
├── Results
├── samplesIDs
│   ├── hg38_1000G.samplesID.txt
│   ├── hg38_gnomAD.samplesID.txt
│   └── hg38_HGDP.samplesID.txt
├── sg1617.txt
└── VCFs
    ├── hg38_1000G
    └── hg38_HGDP

I then modified the command to use --vcf VCFs/hg38_1000G, but then got a new error:

The folder specified for --pam does not exist

What am I doing wrong?
Thank you!
Paola

run with gnomAD converted VCF fails

Describe the bug
Hi!

I am trying to run CRISPRme with a converted gnomAD VCF but the run fails. I am using v2.1.0 by conda.
The Genome folder contains chr22 only, and the VCFs folder contains the corresponding VCF for chr22 converted using the gnomAD-converter. A run without VCFs finished without errors.

To Reproduce
The full command line is the following

crisprme.py complete-search --genome Genomes/hg38_chr22 --guide sg1617.txt --pam PAMs/20bp-NGG-SpCas9.txt --annotation Annotations/encode+gencode.hg38.bed --gene_annotation Annotations/gencode.protein_coding.bed  --mm 6 --output sg1617_gnomad_chr22_top --thread 12 --vcf list_vcf_gnomad_chr22.txt --samplesID list_samplesID_gnomad.txt

I get the following error message

$ cat Results/sg1617_gnomad_chr22_top/log_error_no_check.txt
Traceback (most recent call last):
  File "./process_summaries.py", line 136, in <module>
    dict_samples[sample][3] += 1
KeyError: 'raw'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CFD.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_CRISTA.txt'
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/crisprme/opt/crisprme/PostProcess/populations_distribution.py", line 87, in <module>
    with open(sys.argv[1]) as summary:
FileNotFoundError: [Errno 2] No such file or directory: '/data2/crisprme_test/Results/sg1617_gnomad_chr22_top/.sg1617_gnomad_chr22_top.PopulationDistribution_fewest.txt'
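
The KeyError: 'raw' above is raised by dict_samples[sample]. One possible reading (an assumption, not something the log confirms) is that the converted VCF has a sample column named "raw" that does not appear in the samples ID file. The sketch below compares the sample names on a VCF "#CHROM" header line with a collection of known IDs. The function names and the example population labels are hypothetical, and loading the IDs from the samples ID file is left out because its exact layout is not shown here.

```python
def vcf_samples(header_line):
    """Extract sample names from a VCF '#CHROM' header line."""
    fields = header_line.rstrip("\n").split("\t")
    # The first nine columns are fixed (CHROM through FORMAT); samples follow.
    return fields[9:]


def unknown_samples(header_line, known_ids):
    """Return VCF sample names that are absent from `known_ids`."""
    known = set(known_ids)
    return [s for s in vcf_samples(header_line) if s not in known]


# Example with made-up sample labels:
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tAFR\traw\n"
print(unknown_samples(header, ["AFR", "AMR"]))
```

Any name this reports would need either an entry in the samples ID file or removal from the VCF before the run.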

Y Chromosome 1000G Project

Hi CRISPRme Team,
Quick question: I noticed the installation scripts do not download the 1000G Project VCF for the Y chromosome.
Is this expected behavior? Thank you so much!

Update Dockerfile to pull v2.1.2

Update and Fix Dockerfile:

  • Pull CRISPRme v2.1.2 instead of v2.1.1
  • Replace conda with mamba (use mamba miniforge image)
  • Force CRISPRitz = v2.6.6
  • Update crisprme_auto_test_docker.sh
  • Test new build

Run with converted gnomAD v4.0.0 dataset fails

Running against the gnomAD v4.0.0 dataset (converted with CRISPRme) fails in the Integrating Results phase.

Error message

Traceback (most recent call last):
  File "/opt/conda/opt/crisprme/PostProcess/./resultIntegrator.py", line 492, in
    if float(elem) == 0:
ValueError: could not convert string to float: 'rs635634'
CRISPRme ERROR: result integration failed (script: /opt/conda/opt/crisprme/PostProcess/post_process.sh line 45)
CRISPRme ERROR: postprocessing failed - reference (script: /opt/conda/opt/crisprme/PostProcess/submit_job_automated_new_multiple_vcfs.sh line 848)

Some details

I ran a guide with a set of parameters against the hg38_1000G VCFs successfully and wanted to run the same search against the gnomAD VCFs. I downloaded the gnomAD VCFs, converted them, and ran again with the same parameters, changing only the sampleIDs and VCFs to the gnomAD ones. That run produces the error above. I am running on a fresh Ubuntu VM with 128 GB of RAM using the latest Docker image, and I have reproduced the error twice.
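
The traceback shows `float(elem)` being called on the string `'rs635634'`, which looks like a variant ID rather than a number. Below is a minimal sketch of how that conversion could be guarded. The helper name `is_zero` is hypothetical and not part of CRISPRme. It only illustrates the failure mode and does not explain why an rsID ends up in that field.

```python
def is_zero(elem: str) -> bool:
    """Return True only if elem parses as a float equal to 0.

    Hypothetical helper, not taken from resultIntegrator.py.
    """
    try:
        return float(elem) == 0
    except ValueError:
        # Non-numeric fields such as 'rs635634' cannot be zero.
        return False


print(is_zero("0.0"))
print(is_zero("rs635634"))
```

A guard like this would let the run continue, but if the field is expected to hold a frequency, the unexpected rsID probably still needs to be traced back to the converted VCFs.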

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 48:cluster.dict.py

I have run into a problem. The file
hg38+hg38_1000G_test_20bp-NGG-SpCas9.txt_guides.txt_gencode_encode.hg38.bed_6_2_2_chrX_KI270881v1_alt.total.cluster.txt.tmp_sort.txt contains a special symbol,
<90>, in the third column, and I get the error reported in the title. I would like to know whether this problem can be solved; any ideas on how to fix it would be much appreciated.

run command line

crisprme.py complete-search --genome Genomes/hg38 --vcf list_vcf.txt/ --guide sg1617.txt --pam PAMs/20bp-NGG-SpCas9.txt --annotation Annotations/gencode_encode.hg38.bed --samplesID list_samplesID.txt --gene_annotation Gencode/gencode.protein_coding.bed --bMax 2 --mm 6 --bDNA 2 --bRNA 2 --merge 3 --output sg1617.6.2.2_new --thread 58

8. Annotation

I downloaded the CRISPRme Docker image using the command line:
docker pull pinellolab/crisprme
File format
The third column contains a special symbol, <90>.
Any help tracing this down would be appreciated; I am looking forward to running the tool on a real sequence!

Let me know if any other information would be helpful...
khl
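
To narrow down where the `<90>` byte appears, one option is to read the intermediate file in binary mode and report every line that does not decode as UTF-8. This is a minimal sketch; the function name `find_undecodable_lines` is hypothetical and not part of CRISPRme, and the path passed to it would be the `*.tmp_sort.txt` file mentioned above.

```python
from typing import List, Tuple


def find_undecodable_lines(path: str, encoding: str = "utf-8") -> List[Tuple[int, int]]:
    """Return (line_number, byte_offset_in_line) for each line that fails to decode."""
    bad = []
    # Binary mode avoids the UnicodeDecodeError that text mode would raise.
    with open(path, "rb") as handle:
        for lineno, raw in enumerate(handle, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as err:
                # err.start is the offset of the first offending byte in this line.
                bad.append((lineno, err.start))
    return bad
```

Calling `find_undecodable_lines("path/to/file.tmp_sort.txt")` returns an empty list for a clean file. Otherwise it gives the line numbers to inspect, which may help tell whether the bytes come from the input data or from an intermediate step.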

CRISPRme output

Hi,

I ran CRISPRme for a guide sequence and included the 1000 Genomes VCF files to find out how variants affect the OT sites. I found some sites that had no variants mapping to them but were reported with fewer mismatches/bulges in the "MMBLG_Mismatches | MMBLG_Bulge_Size | MMBLG_Total" columns than in the "Mismatches | Bulge_Size | Total" columns.
The OT sequence is the same in the "DNA" and "MMBLG_DNA" columns, but its alignment to the guide is different:
aggCACTAG-aTTGACaCACAGG vs aggCACTAGA-TTGACaCACAGG
All variation related rows have "n" value for this example, so there is no variant mapping to this genomic region.
Is this a bug or should I interpret the results in a different way?

Thank you,
Meltem
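
The two alignment strings quoted above can be compared directly. The sketch below checks that they contain the same bases once the gap character is removed, and counts lowercase letters in each. Treating lowercase letters as mismatched positions is an assumption about the output convention, and both helper names are hypothetical.

```python
def ungapped(aligned: str) -> str:
    """Drop gap characters and normalise case to recover the underlying sequence."""
    return aligned.replace("-", "").upper()


def lowercase_count(aligned: str) -> int:
    """Count lowercase bases (assumed here to mark mismatches against the guide)."""
    return sum(1 for base in aligned if base.islower())


first = "aggCACTAG-aTTGACaCACAGG"
second = "aggCACTAGA-TTGACaCACAGG"

print(ungapped(first) == ungapped(second))
print(lowercase_count(first), lowercase_count(second))
```

Under that assumption, the same genomic sequence yields a different mismatch count depending only on where the single gap is placed, which matches the difference described between the two sets of columns.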

Question about altMerge output

Hi there,

I have a question about how *.altMerge.txt is generated. We're interested in this file because we need an exhaustive list of all variants that could result in a hit at a specific locus, not just the top-scoring one.

I ran CRISPRme with the gnomAD data (6 mm, 1 bulge). I notice that within each cluster the rows share identical values in the first 24 columns but differ in the MMBLG* columns. They also have identical CRISTA* columns.

Could you shed some light as to how this file is generated? Should all the sites contained in the MMBLG* be considered alternate sites?

Thanks

Testing docker application fails

Dear Samuele,

thank you for such an interesting tool!
I have tried to run it on Docker, but I am unable to test the Docker application successfully.

This is the error message I get:
<3>WSL (4903) ERROR: CreateProcessEntryCommon:577: execvpe /bin/bash failed 2
<3>WSL (4903) ERROR: CreateProcessEntryCommon:586: Create process not expected to return

Docker is fully functional and working with other images. I have stuck to

Thank you in advance for your help
Malte

PAM input format

Hi,

What is the format of the file containing the PAM, e.g. PAMs/20bp-NGG-spCas9.txt? I can't seem to find a description of this file.

Thanks!
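
A minimal sketch of reading such a file follows. It assumes the layout is a single line holding a run of Ns as long as the guide, followed by the PAM, then a space and the PAM length (for a 20 bp guide with an NGG PAM: twenty Ns, then `NGG`, then ` 3`). This layout is an assumption and should be checked against the files shipped in `PAMs/`. The function name `parse_pam_line` is hypothetical.

```python
from typing import Tuple


def parse_pam_line(line: str) -> Tuple[str, int]:
    """Split an assumed '<Ns + PAM> <PAM length>' line into (pam, guide_length)."""
    sequence, length_field = line.split()
    pam_len = int(length_field)
    if pam_len <= 0 or pam_len >= len(sequence):
        raise ValueError(f"unexpected PAM length: {pam_len}")
    # The PAM is taken from the 3' end of the sequence in this sketch.
    pam = sequence[-pam_len:]
    guide_len = len(sequence) - pam_len
    return pam, guide_len


example = "N" * 20 + "NGG" + " 3"
print(parse_pam_line(example))
```

Only the case of a PAM at the 3' end is handled here; other arrangements are deliberately rejected rather than guessed at.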

Position and Cluster_Position

Hi!

Thanks for the quick reply on the other issues! I managed to run it successfully on chr22.

I am inspecting output.bestMerge.txt. What does Position correspond to, and how does it differ from Cluster_Position?
How can I get the start and end coordinates of the sequence in the DNA column?

Thanks!
