modulome-workflow's People

Contributors

avsastry, kevin-rychel, sapoudel

modulome-workflow's Issues

nextflow jumps to last process

I had a couple of runs recently that jumped straight to the last process (assemble_tmp) or the one before it (multiqc) without running the other required processes. As far as I can tell, this is caused by the .ifEmpty([]) statements on the input lines. They seem to create a bypass around the outputs of the previous processes (possibly when there are many cores/samples). What is the reason for having the ifEmpty step?

Convert QC/QA notebooks to scripts

We should minimize notebook usage as much as possible. Notebooks can cause issues if the cells are not run in the exact order, and they require more manual work than scripts. We can convert the QC/QA notebooks into scripts that output everything the user needs to run QC/QA (e.g. the cluster figure, the Pearson correlation between replicates, etc.), and the user can change the script's input parameters to adjust the QC thresholds.
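
As a rough sketch of what one such script could look like, the code below exposes the replicate-correlation threshold as a command-line parameter instead of a notebook cell. It is a hypothetical example, not existing workflow code: the option name `--min-pearson`, its default of 0.95, the positional arguments, and the rule "flag both samples of any replicate pair below the threshold" are all assumptions, and loading the log-TPM and metadata tables is left out.

```python
import argparse
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def flag_low_correlation(expr, groups, min_r):
    """Return samples that appear in any replicate pair correlated below min_r.

    expr:   sample name -> expression vector (e.g. log-TPM over the same genes)
    groups: condition name -> list of replicate sample names
    (Hypothetical, simplified rule: both samples of a failing pair are flagged.)
    """
    flagged = set()
    for samples in groups.values():
        for a, b in combinations(samples, 2):
            r = pearson(expr[a], expr[b])
            if not r >= min_r:  # written this way so nan also counts as a failure
                flagged.update((a, b))
    return flagged


def parse_args(argv=None):
    """Command-line options; names and default threshold are assumptions."""
    parser = argparse.ArgumentParser(description="Replicate-correlation QC (sketch)")
    parser.add_argument("expression", help="log-TPM table")
    parser.add_argument("metadata", help="sample metadata table")
    parser.add_argument("--min-pearson", type=float, default=0.95,
                        help="minimum Pearson R between replicates (assumed default)")
    return parser.parse_args(argv)
```

Keeping the threshold on the command line means a rerun with a different cutoff is a single command, with no cells to re-execute in order.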

Unfinished OptICA step

I am having an issue on both a local MacBook and a Linux virtual machine on Azure: the OptICA step never finishes and seems to hang indefinitely. For instance, I ran this on a dataset of 164 samples:
bash ./run_ica.sh -n 16 -o ../data/interim/ -v ../data/processed_data/log_tpm_norm.csv

Here is the output, where it hangs:

Computing dimension 160 of 164

##################################

Setting up...
0.25 seconds elapsed

Running ICA...
Completed run 1 of 7 on Processor 0
2.10 minutes elapsed
Completed run 2 of 7 on Processor 0
2.08 minutes elapsed
Completed run 3 of 7 on Processor 0
1.85 minutes elapsed
Completed run 4 of 7 on Processor 0
1.58 minutes elapsed
Completed run 5 of 7 on Processor 0
2.07 minutes elapsed
Completed run 6 of 7 on Processor 0
52.93 seconds elapsed
Completed run 7 of 7 on Processor 0
1.60 minutes elapsed

All ICA runs complete!
12.33 minutes elapsed

So in this case I get the A and M files for dimension 150, but not for 160. I see the same problem with the following command, where dimension 152 does not complete:
bash ./run_ica.sh -n 16 -m 152 -s 2 -o ../data/interim/ -v ../data/processed_data/log_tpm_norm.csv
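
To pin down exactly which dimensions are missing after a run like this, a small check over the output directory can help. This is a hypothetical helper, not part of the workflow: it assumes run_ica.sh writes one subdirectory per dimension (e.g. `../data/interim/150/`) containing files named `A.csv` and `M.csv`, and that the sweep visits step, 2×step, … up to the maximum dimension, which matches "Computing dimension 160 of 164" with a step of 10. Adjust the file names if the actual layout differs.

```python
from pathlib import Path


def expected_dimensions(max_dim, step):
    """Dimensions a sweep would visit, assuming step, 2*step, ..., <= max_dim."""
    return list(range(step, max_dim + 1, step))


def missing_dimensions(outdir, dims, required=("A.csv", "M.csv")):
    """Return the dimensions whose (assumed) output files are not all present."""
    outdir = Path(outdir)
    return [
        d for d in dims
        if not all((outdir / str(d) / name).is_file() for name in required)
    ]


if __name__ == "__main__":
    # Example for the 164-sample run above (assumed default step of 10).
    print(missing_dimensions("../data/interim/", expected_dimensions(164, 10)))
```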

Thanks for any help!
/Mathias

Details of machine:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal

Linux avm-sdt-nilmat-ica 5.15.0-1054-azure #62~20.04.1-Ubuntu SMP Wed Jan 17 12:22:56 UTC 2024 x86_64 GNU/Linux

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz

Memory:
       total   used    free  shared  buff/cache  available
Mem:  128756   3437  122362       6        2956     124229

error during step 2 (processing of raw data)

Can someone please help with this error?

N E X T F L O W ~ version 23.10.1
Launching main.nf [scruffy_wright] DSL2 - revision: 53840c131a
ERROR ~ No signature of method: groovyx.gpars.dataflow.DataflowBroadcast.into() is applicable for argument types: (Script_e1bcc410eabc93ca$_runScript_closure1) values: [Script_e1bcc410eabc93ca$_runScript_closure1@50a1af86]
Possible solutions: find(), any(), bind(java.lang.Object), with(groovy.lang.Closure), print(java.io.PrintWriter), print(java.lang.Object)

-- Check script 'main.nf' at line: 65 or see '.nextflow.log' file for more details

`fasterq-dump` image needs an update

I think the NCBI APIs are no longer compatible with the version of sra-tools in the fasterq-dump container, so the container needs to be updated.

Updating the SRA Toolkit version to 3.0.6 worked for me:

FROM ubuntu:18.04

# Metadata
LABEL maintainer="Anand Sastry <[email protected]>"

# Set noninteractive mode
ENV DEBIAN_FRONTEND=noninteractive

# SRA Toolkit version (bumped to 3.0.6); change it here to update the image
ARG SRATOOLKIT_VERSION=3.0.6

# Install pigz and the SRA Toolkit
USER root
RUN apt-get update && apt-get install -y procps pigz wget libxml-libxml-perl && rm -rf /var/lib/apt/lists/*
RUN wget -q http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/${SRATOOLKIT_VERSION}/sratoolkit.${SRATOOLKIT_VERSION}-ubuntu64.tar.gz -O /tmp/sratoolkit.tar.gz \
    && tar zxf /tmp/sratoolkit.tar.gz -C /opt/ \
    && rm /tmp/sratoolkit.tar.gz

RUN mkdir -p /ncbi/public/sra /ncbi/public/refseq && chmod -R 777 /ncbi

ENV PATH="/opt/sratoolkit.${SRATOOLKIT_VERSION}-ubuntu64/bin/:${PATH}"

"Pre-requisite software" not listed anywhere

Your README states that all "Pre-requisite software" required for using the workflows without Docker is listed under each respective workflow.
However, no such list is given anywhere; each workflow simply refers to your Docker images.
Could you please provide a list of dependencies for installing this WITHOUT using Docker?
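
Until such a list exists, a quick way to see which command-line tools are missing on a non-Docker machine is to look them up on PATH. The list below is only an illustration assembled from tool names that appear in the issues on this page (nextflow, prefetch, fasterq-dump, pigz, multiqc); it is not an official or complete dependency list for the workflow.

```python
import shutil

# Illustrative, incomplete list: tool names taken from the issues on this page,
# NOT the workflow's official dependency list.
TOOLS = ["nextflow", "prefetch", "fasterq-dump", "pigz", "multiqc"]


def missing_tools(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(TOOLS)
    print("missing:", ", ".join(missing) if missing else "none")
```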

Issue with RNA-seq data processing(Step-2)

I am facing an error during the second step of the pipeline. I suspect it is related to prefetch and fasterq-dump while fetching data from SRA. I should also mention that I am running the pipeline with Nextflow 22.10.8, because I get errors with the latest version.
It would be great if someone could help with the following error.

sudo ./nextflow run main.nf -profile local --organism mycobacterium_abscessus --metadata mab.tsv --sequence_dir sequence_dir/ --outdir results
N E X T F L O W ~ version 22.10.8
Launching main.nf [reverent_bartik] DSL1 - revision: ef90b5fca3
executor > local (13)
[27/ad4b19] process > bowtie_build [100%] 1 of 1 ✔
[8d/7bec19] process > gff2bed [100%] 1 of 1 ✔
[56/467f86] process > download_fastq (14) [ 1%] 5 of 306, failed: 5, retries: 5
[- ] process > stage_fastq_single -
executor > local (14)
[27/ad4b19] process > bowtie_build [100%] 1 of 1 ✔
[8d/7bec19] process > gff2bed [100%] 1 of 1 ✔
[41/5bd9cd] process > download_fastq (20) [ 1%] 6 of 307, failed: 6, retries: 6
[- ] process > stage_fastq_single -

The pass fail pie chart is flipped

The pie chart in expression_QC_part1 that shows the final pass/fail counts is flipped. It can be fixed easily by changing the list passed to the reindex function:

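import matplotlib.pyplot as plt

# pass_qc: boolean pandas Series from earlier in the notebook (True = sample passed QC)
# reindex([False, True]) puts the counts in label order; fill_value=0 covers the all-pass/all-fail case
_, _, pcts = plt.pie(pass_qc.value_counts().reindex([False, True], fill_value=0),
                     labels=['Failed', 'Passed'],
                     colors=['tab:red', 'tab:blue'],
                     autopct='%.0f%%', textprops={'size': 16})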
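
The reason the explicit order matters: pandas `value_counts()` sorts by count in descending order by default, so whichever outcome is more common comes first, and a fixed label list like ['Failed', 'Passed'] only lines up by chance. The pure-Python sketch below illustrates the same effect with `collections.Counter.most_common()`, which also orders by frequency, and shows counting in a fixed order instead. The helper name is illustrative only.

```python
from collections import Counter


def counts_in_fixed_order(flags, order=(False, True)):
    """Count boolean QC flags in a fixed order so they always match the labels."""
    counts = Counter(flags)
    return [counts[key] for key in order]  # Counter returns 0 for absent keys


flags = [True] * 8 + [False] * 2  # 8 samples passed, 2 failed

# Frequency order puts True (Passed) first, which mislabels the slices
# if the labels are ['Failed', 'Passed'].
by_frequency = [key for key, _ in Counter(flags).most_common()]

# Fixed order always yields [failed, passed].
fixed = counts_in_fixed_order(flags)
```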

Use JSON file as input metadata

Use a JSON file instead of a CSV/TSV file as the input metadata.

I just talked to some of the people at DTU who developed antiSMASH, and they mentioned that CSV files can lead to unexpected outcomes and errors that become harder to catch as the pipeline scales. Some of our most common errors come from this format, so we should consider switching to JSON. This will require lots of changes:

  1. Integrate JSON into Nextflow
  2. Allow users to manually add entries that are converted to JSON (maybe something like ALE sheets)
  3. Add checks on data types
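
For item 3, JSON keeps numbers and strings distinct after parsing, which makes type checks straightforward compared with CSV/TSV, where every field arrives as text. The sketch below is a hypothetical validator: the field names (`sample_id`, `Run`, `LibraryLayout`, `replicate`), their types, and the allowed layout values are assumptions for illustration, not the workflow's confirmed metadata format.

```python
import json

# Hypothetical schema: field name -> exact Python type expected after json.loads.
SCHEMA = {
    "sample_id": str,
    "Run": str,
    "LibraryLayout": str,
    "replicate": int,
}
ALLOWED_LAYOUTS = {"SINGLE", "PAIRED"}  # assumed allowed values


def validate_metadata(text):
    """Parse a JSON list of sample records and return a list of error messages."""
    records = json.loads(text)
    if not isinstance(records, list):
        return ["top-level value must be a list of sample records"]
    errors = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"record {i}: expected an object, got {type(rec).__name__}")
            continue
        for field, expected in SCHEMA.items():
            if field not in rec:
                errors.append(f"record {i}: missing field {field!r}")
            elif type(rec[field]) is not expected:  # exact match, so True is not accepted as an int
                errors.append(
                    f"record {i}: {field!r} should be {expected.__name__}, "
                    f"got {type(rec[field]).__name__}"
                )
        layout = rec.get("LibraryLayout")
        if isinstance(layout, str) and layout not in ALLOWED_LAYOUTS:
            errors.append(f"record {i}: LibraryLayout must be one of {sorted(ALLOWED_LAYOUTS)}")
    return errors
```

Returning a list of messages rather than stopping at the first problem lets a user fix every bad row in one pass.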

Error executing process > 'multiqc (1)'

nextflow run main.nf -profile local --organism bacillus_subtilis --metadata ../test/test_metadata.tsv --sequence_dir ../test/sequence_files/ --outdir ../test/nf_results/

Error executing process > 'multiqc (1)'

Caused by:
Process multiqc (1) terminated with an error exit status (125)

Command executed:

multiqc -f -c multiqc_config.yaml .
assemble_qc_stats.py multiqc_data

Command exit status:
125

Command output:
(empty)

Command error:
Unable to find image 'avsastry/multiqc-rockhopper:1.0' locally
docker: Error response from daemon: pull access denied for avsastry/multiqc-rockhopper, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.

Any idea regarding the source of this error?