vanheeringen-lab / anansnake
Automate long-running ANANSE analyses with snakemake
License: Apache License 2.0
Hi.
I have been facing this error many times with different data.
There is a log file under the deseq2 folder:
Error: package ‘ggplot2’ could not be loaded
In addition: Warning message:
package ‘ggplot2’ was built under R version 4.2.3
Execution halted
I am not sure which environment it refers to, because I have installed ggplot2 in the anansnake conda env.
Here is the message from the command console.
Activating conda environment: .snakemake/conda/ce7d794d0661510a0c65df6077b773c0_
[Fri Mar 24 20:34:59 2023]
Error in rule deseq2:
jobid: 9
input: /mnt/d/32_publication_scANANSE/scANANSE/analysis/samplefile.tsv, /mnt/d/32_publication_scANANSE/scANANSE/analysis/RNA_Counts.tsv
output: /mnt/d/32_publication_scANANSE/scANANSE/analysis/deseq2/hg38-anansesnake_CD4-Naive_average.diffexp.tsv
log: /mnt/d/32_publication_scANANSE/scANANSE/analysis/deseq2/log_anansesnake_CD4-Naive_average.txt (check log file(s) for error details)
shell:
outdir=$(dirname /mnt/d/32_publication_scANANSE/scANANSE/analysis/deseq2/hg38-anansesnake_CD4-Naive_average.diffexp.tsv)
# for the log
mkdir -p $outdir
deseq2science anansesnake_CD4-Naive_average /mnt/d/32_publication_scANANSE/scANANSE/analysis/samplefile.tsv /mnt/d/32_publication_scANANSE/scANANSE/analysis/RNA_Counts.tsv $outdir --assembly hg38 > /mnt/d/32_publication_scANANSE/scANANSE/analysis/deseq2/log_anansesnake_CD4-Naive_average.txt 2>&1
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
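The console output above shows the deseq2 rule activating `.snakemake/conda/ce7d794d0661510a0c65df6077b773c0_`, which suggests the rule runs in an environment that Snakemake created for it rather than in the `anansnake` environment where ggplot2 was installed by hand. A minimal sketch for checking what each of those environments contains is below. It assumes the usual conda layout for R packages (`lib/R/library/<package>/DESCRIPTION`); the function names are hypothetical and not part of anansnake.

```python
from pathlib import Path


def r_package_info(env_dir, package="ggplot2"):
    """Return the Version and Built fields of an R package in a conda env.

    Assumes the conda layout <env>/lib/R/library/<package>/DESCRIPTION.
    Returns None when the package is not installed in that environment.
    """
    description = Path(env_dir) / "lib" / "R" / "library" / package / "DESCRIPTION"
    if not description.is_file():
        return None
    info = {}
    for line in description.read_text(errors="replace").splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("Version", "Built"):
            info[key] = value.strip()
    return info


def scan_snakemake_envs(workdir=".", package="ggplot2"):
    """Map each environment directory under .snakemake/conda to its package info."""
    conda_root = Path(workdir) / ".snakemake" / "conda"
    if not conda_root.is_dir():
        return {}
    return {
        env.name: r_package_info(env, package)
        for env in sorted(conda_root.iterdir())
        if env.is_dir()  # skip the environment files stored next to the directories
    }


if __name__ == "__main__":
    for name, info in scan_snakemake_envs().items():
        print(name, info)
```

Run it from the directory that contains `.snakemake`. A `None` entry means ggplot2 is absent from that environment, and a `Built` field naming a different R version than the environment's own R would be consistent with the warning in the log.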
@siebrenf @JGASmits
Hi,
I'm encountering problems with running the example file.
The error reads as follows:
could not find 'rna_samples' in /Users/pediatrics/Desktop/R_projects/scANANSE/example/rna_samples.tsv
SystemExit in line 29 of /Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/configuration.smk:
1
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/Snakefile", line 4, in
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/configuration.smk", line 29, in
Can either of you help me get past this problem?
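One possible reading of the message above is that the path configured for `rna_samples` does not point to an existing file. The source of `configuration.smk` is not shown here, so this is an assumption. Under that assumption, a quick pre-flight check of the file-valued config keys could look like the sketch below. The key names are taken from the Config printout in a log further down this page; `missing_inputs` is a hypothetical helper, not an anansnake function.

```python
from pathlib import Path

# Config keys that hold file paths, as printed in the "Config" section of an anansnake log.
FILE_KEYS = ("rna_samples", "rna_tpms", "rna_counts", "atac_samples", "atac_counts")


def missing_inputs(config, base_dir=".", keys=FILE_KEYS):
    """Return {key: resolved_path} for every key whose file does not exist.

    A key that is absent from the config maps to None.
    Relative paths are resolved against base_dir (an assumption of this sketch).
    """
    missing = {}
    for key in keys:
        value = config.get(key)
        if value is None:
            missing[key] = None
            continue
        path = Path(value).expanduser()
        if not path.is_absolute():
            path = Path(base_dir) / path
        if not path.is_file():
            missing[key] = str(path)
    return missing
```

Calling `missing_inputs(config)` with the parsed config as a dict before starting the workflow would show whether the file named in the error actually exists at the path the workflow is looking at.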
I submitted a job based on anansnake and it doesn't seem to be progressing. This is the log file of the job after 120 hours on 6 cores. The result was similar after 40 hours using 40 cores.
Config
rna_samples : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/samplefile.tsv
rna_tpms : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/TPM.tsv
rna_counts : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/RNA_Counts.tsv
atac_samples : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/samplefile.tsv
atac_counts : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv
genome : hg38
result_dir : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6
contrasts : ['anansesnake_B-cells_average', 'anansesnake_agg-type_average', 'anansesnake_NK-cells_average', 'anansesnake_Myeloid-cells_average']
database : gimme.vertebrate.v5.0
jaccard : 0.1
edges : 500000
padj : 0.05
plot_type : png
tmp_dir : None
Resources
mem_mb : 60000
_cores : 6
deseq2 : 1
Conditions
B-cells :
RNA-seq samples: ['B-cells']
ATAC-seq samples: ['B-cells']
average :
RNA-seq samples: ['average']
ATAC-seq samples: ['average']
agg-type :
RNA-seq samples: ['agg-type']
ATAC-seq samples: ['agg-type']
NK-cells :
RNA-seq samples: ['NK-cells']
ATAC-seq samples: ['NK-cells']
Myeloid-cells :
RNA-seq samples: ['Myeloid-cells']
ATAC-seq samples: ['Myeloid-cells']
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 6
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=60000, deseq2=1
Job stats:
job              count    min threads    max threads
-------------  -------  -------------  -------------
all                  1              1              1
binding              5              1              1
influence            4              1              1
maelstrom            1              6              6
motif2factors        1              6              6
network              5              1              1
pfmscorefile         1              6              6
plot                 4              1              1
total               22              1              6
Select jobs to execute...
[Mon Mar 25 18:18:31 2024]
rule motif2factors:
input: /beegfs/desy/user/nourisaj/genomes/hg38
output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/log_hg38_m2f.txt
jobid: 5
reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
threads: 6
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
[Mon Mar 25 18:18:44 2024]
Finished job 5.
1 of 22 steps (5%) done
Select jobs to execute...
[Mon Mar 25 18:18:45 2024]
rule maelstrom:
input: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm, /beegfs/desy/user/nourisaj/genomes/hg38
output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38-maelstrom
log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/log_hg38_maelstrom.txt
jobid: 25
reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38-maelstrom; Input files updated by another job: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
threads: 6
resources: tmpdir=/tmp, mem_mb=40000
Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
[Mon Mar 25 19:20:17 2024]
Finished job 25.
2 of 22 steps (9%) done
Select jobs to execute...
[Mon Mar 25 19:20:17 2024]
rule pfmscorefile:
input: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm, /beegfs/desy/user/nourisaj/genomes/hg38
output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv
log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/log_hg38_pfmscorefile.txt
jobid: 6
reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv; Input files updated by another job: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
threads: 6
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
[Mon Mar 25 19:56:53 2024]
Finished job 6.
3 of 22 steps (14%) done
Select jobs to execute...
[Mon Mar 25 19:56:53 2024]
rule binding:
input: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv, /beegfs/desy/user/nourisaj/genomes/hg38
output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/binding/Myeloid-cells.h5
log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/binding/log_Myeloid-cells.txt
jobid: 23
benchmark: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/benchmarks/binding_Myeloid-cells.txt
reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/binding/Myeloid-cells.h5; Input files updated by another job: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
wildcards: condition=Myeloid-cells
resources: tmpdir=/tmp, mem_mb=40000
Activating conda environment: .snakemake/conda/d744163a4690c04ba52f3bf00737fc7a_
slurmstepd: error: *** JOB 6311320 ON max-wn050 CANCELLED AT 2024-03-30T18:18:31 DUE TO TIME LIMIT ***
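To see where the time went, the bracketed timestamps in the log above can be paired with the `rule ...:` and `Finished job N.` lines. The sketch below does that for a Snakemake log in the format shown here; `job_timeline` and `unfinished_jobs` are hypothetical helper names, and the parsing assumes an English locale for the day and month abbreviations.

```python
import re
from datetime import datetime

STAMP = re.compile(r"^\[(\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4})\]$")
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def job_timeline(log_text):
    """Return {jobid: (rule, started, finished)}; finished is None if the job never ended."""
    started, finished, rule_of = {}, {}, {}
    stamp = current_rule = rule_start = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = STAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), STAMP_FORMAT)
        elif line.startswith("rule ") and line.endswith(":"):
            current_rule, rule_start = line[len("rule "):-1], stamp
        elif line.startswith("jobid:") and current_rule is not None:
            job = int(line.split(":", 1)[1])
            started[job], rule_of[job] = rule_start, current_rule
            current_rule = None
        elif line.startswith("Finished job "):
            finished[int(line[len("Finished job "):].rstrip("."))] = stamp
    return {job: (rule_of[job], started[job], finished.get(job)) for job in started}


def unfinished_jobs(timeline, cutoff):
    """Return {jobid: (rule, time spent until cutoff)} for jobs without a finish line."""
    return {
        job: (rule, cutoff - start)
        for job, (rule, start, end) in timeline.items()
        if end is None
    }
```

Applied to the log above with the SLURM cancellation time as the cutoff (`datetime.fromisoformat("2024-03-30T18:18:31")`), the first three rules account for under two hours in total, while the `binding` job for `Myeloid-cells` started at 19:56:53 on Mar 25 and had been running for a little over 118 hours when the job was cancelled. That points at the `binding` step, and its own log file (`binding/log_Myeloid-cells.txt`) is probably the next place to look.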
Thanks for the work.
I am running the example provided, but I have one question: can it not create the .snakemake folder?
I checked the folder, and there are conda envs inside. The "Downloading and installing remote packages" step can take more than 1 hour, and then it will most likely throw an error saying that some R package is missing.
This has made me very blue because I do not know which packages are going to be used.
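One way to find out which packages each rule environment is supposed to contain is to read the environment files that Snakemake keeps next to the environments it builds. The sketch below assumes that a copy of each file is stored as `.snakemake/conda/<hash>.yaml`, which is how the Snakemake versions known to the editor behave. The helper names are hypothetical, and the parser is deliberately naive (no YAML library, nested `pip:` lists are flattened).

```python
from pathlib import Path


def snakemake_env_files(workdir="."):
    """List the environment YAML files stored under .snakemake/conda (assumed layout)."""
    conda_root = Path(workdir) / ".snakemake" / "conda"
    return sorted(conda_root.glob("*.yaml")) if conda_root.is_dir() else []


def declared_dependencies(env_file):
    """Naively collect the '- item' entries listed under 'dependencies:'."""
    deps, in_deps = [], False
    for raw in Path(env_file).read_text().splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue
        if not raw[0].isspace() and not line.startswith("-"):
            # A new top-level key starts or ends the dependencies section.
            in_deps = line.startswith("dependencies:")
            continue
        if in_deps and line.strip().startswith("- "):
            deps.append(line.strip()[2:].strip())
    return deps


if __name__ == "__main__":
    for env_file in snakemake_env_files():
        print(env_file.name, declared_dependencies(env_file))
```

Comparing this list with the package named in the error message shows whether the package was declared but failed to install, or was never part of the environment.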
Hello!
Is it absolutely essential to use mamba for this purpose?
I was trying to incorporate anansnake to run scANANSE but ran into the following error:
Error in rule maelstrom:
jobid: 40
input: /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/Peak_Counts.tsv, /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/gimme/mm10.gimme.vertebrate.v5.0.pfm, /Users/pediatrics/Desktop/R_projects/scANANSE/data/mm10
output: /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/gimme/mm10-maelstrom
log: /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/gimme/log_mm10_maelstrom.txt (check log file(s) for error message)
conda-env: /Users/pediatrics/.snakemake/conda/52b8cb6f4a78ac804c0afe54b1ecb2c2_
RuleException:
CalledProcessError in line 80 of /Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/gimme.smk:
Command 'source /Users/pediatrics/anaconda3/envs/anansnake/bin/activate '/Users/pediatrics/.snakemake/conda/52b8cb6f4a78ac804c0afe54b1ecb2c2_'; set -euo pipefail; python /Users/pediatrics/.snakemake/scripts/tmpj_rlppgz.maelstrom.py' returned non-zero exit status 1.
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/gimme.smk", line 80, in __rule_maelstrom
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-08-23T152751.937220.snakemake.log
Can you help me understand where the problem lies?
Thank you!
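The console output only reports that the rule exited with a non-zero status; the actual error is in the file named on the `log:` line ("check log file(s) for error message"). A small sketch for pulling those paths out of pasted Snakemake output and showing the end of each file is below. The function names are hypothetical, and the regular expressions only cover the message format seen on this page.

```python
import re
from collections import deque

ERROR_RULE = re.compile(r"^Error in rule (\S+):$")
LOG_LINE = re.compile(r"^log: (.+?)(?: \(check log file\(s\) for error \w+\))?$")


def failed_rule_logs(snakemake_output):
    """Return {rule: [log paths]} for every 'Error in rule ...' block in the output."""
    logs, rule = {}, None
    for raw in snakemake_output.splitlines():
        line = raw.strip()
        match = ERROR_RULE.match(line)
        if match:
            rule = match.group(1)
            continue
        match = LOG_LINE.match(line)
        if match and rule is not None:
            logs.setdefault(rule, []).extend(match.group(1).split(", "))
            rule = None  # only the first log line after the error header belongs to it
    return logs


def tail(path, n=20):
    """Return the last n lines of a text file."""
    with open(path, errors="replace") as handle:
        return list(deque(handle, maxlen=n))
```

For the output above, `failed_rule_logs` would return the path ending in `gimme/log_mm10_maelstrom.txt` for the `maelstrom` rule, and `tail` on that path should show the Python traceback or tool error that caused the failure.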
Hello @JGASmits,
Although the dry run appears to have worked, the real one did not.
Here is the log file:
(anansnake) iMac-Pro:anansnake pediatrics$ less Complete log: .snakemake/log/2023-08-31T143302.657342.snakemake.log
Complete: No such file or directory
log:: No such file or directory
Press RETURN to continue
plot_type : png
Resources
mem_mb : 48000
_cores : 12
deseq2 : 1
Conditions
group2 :
RNA-seq samples: ['1k-cell-1', '1k-cell-2', 'GSM1483740']
ATAC-seq samples: ['GSM3756606', 'GSM3756607', 'GSM3756608']
group1 :
RNA-seq samples: ['128-cell-1', '128-cell-2', 'GSM1483739']
ATAC-seq samples: ['GSM3756599', 'GSM3756600']
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 12
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=48000, deseq2=1
Job stats:
job              count    min threads    max threads
-------------  -------  -------------  -------------
all                  1              1              1
binding              2              1              1
deseq2               2              1              1
influence            2              1              1
maelstrom            1             12             12
motif2factors        1             12             12
network              2              1              1
pfmscorefile         1             12             12
plot                 2              1              1
total               14              1             12
Select jobs to execute...
[Thu Aug 31 14:33:05 2023]
rule motif2factors:
input: /Users/pediatrics/anansnake/GRCz11
output: /Users/pediatrics/anansnake/example/outdir/gimme/GRCz11.gimme.vertebrate.v5.0.pfm
log: /Users/pediatrics/anansnake/example/outdir/gimme/log_GRCz11_m2f.txt
jobid: 5
reason: Missing output files: /Users/pediatrics/anansnake/example/outdir/gimme/GRCz11.gimme.vertebrate.v5.0.pfm
threads: 12
resources: tmpdir=/var/folders/2c/zzjsgs_53vqflzjl28hf1x7r0000gn/T
Activating conda environment: .snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_
Activating conda environment: .snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_
[Thu Aug 31 14:33:36 2023]
Error in rule motif2factors:
jobid: 5
input: /Users/pediatrics/anansnake/GRCz11
output: /Users/pediatrics/anansnake/example/outdir/gimme/GRCz11.gimme.vertebrate.v5.0.pfm
log: /Users/pediatrics/anansnake/example/outdir/gimme/log_GRCz11_m2f.txt (check log file(s) for error message)
conda-env: /Users/pediatrics/anansnake/.snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_
RuleException:
CalledProcessError in line 24 of /Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/gimme.smk:
Command 'source /Users/pediatrics/anaconda3/envs/anansnake/bin/activate '/Users/pediatrics/anansnake/.snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_'; set -euo pipefail; python /Users/pediatrics/anansnake/.snakemake/scripts/tmp5kcimtt8.motif2factors.py' returned non-zero exit status 1.
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/gimme.smk", line 24, in __rule_motif2factors
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-08-31T143302.657342.snakemake.log
Your input would be much appreciated!
Thank you!