mdu-phl / bohra
A pipeline for bioinformatics analysis of bacterial genomes
License: GNU General Public License v3.0
Gracefully skip isolates where no MLST is found. Thanks to @willpitchers for finding this.
line 404
In the summary table
Add a setup command that sets up the databases and checks that all files and dependencies are installed.
It would be useful to report the minaln, but the responsibility should rest with users to think about what makes sense for them. Instead, state on GitHub / show in case studies the standard/recommended threshold for inclusion based on core minimum alignment (when doing large-scale population genomics). Add an example Snippy config to GitHub.
Check that all binary and database dependencies exist, and maybe their versions / names.
bohra run
```
[INFO:03/05/2020 05:13:47 PM] Starting bohra pipeline using /home/linuxbrew/.linuxbrew/bin/bohra run
[INFO:03/05/2020 05:13:47 PM] You are running bohra in preview mode.
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/bohra", line 8, in <module>
    sys.exit(main())
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/bohra.py", line 136, in main
    args.func(args)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/bohra.py", line 32, in run_pipeline
    R = RunSnpDetection(args)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/SnpDetection.py", line 65, in __init__
    self.log_messages('warning', 'Input file can not be empty, please set -i path_to_input to try again')
AttributeError: 'RunSnpDetection' object has no attribute 'log_messages'
```
Snakemake parameter
Bohra passes args.cpus to the SnpDetection object's self.cpus during initialisation.
Then there is a function, set_snakemake_jobs(), that double-checks whether self.cpus is over the limit:
SnpDetection.py#L141
```python
def set_snakemake_jobs(self):
    '''
    set the number of jobs to run in parallel based on the number of cpus from args
    '''
    if int(self.cpus) < int(psutil.cpu_count()):
        self.jobs = self.cpus
    else:
        self.jobs = 1
```
However, the final command line that runs the Snakemake file does not use self.jobs:
```python
if self.cluster:
    cmd = f"{self.cluster_cmd()} -s {snake_name} -d {wd} {force} {singularity_string} --latency-wait 1200"
else:
    cmd = f"snakemake {dry} -s {snake_name} {singularity_string} -j {self.cpus} -d {self.job_id} {force} --verbose 2>&1"
```
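A minimal fix would be to pass the capped self.jobs value to -j instead of self.cpus. The sketch below reproduces just the relevant logic (names follow SnpDetection.py; os.cpu_count() stands in for psutil.cpu_count() to keep the sketch dependency-free):

```python
import os


class RunSnpDetection:
    """Sketch of the relevant logic only; names follow SnpDetection.py."""

    def __init__(self, cpus):
        self.cpus = cpus
        self.jobs = 1
        self.set_snakemake_jobs()

    def set_snakemake_jobs(self):
        # cap the number of parallel jobs at the machine's CPU count
        if int(self.cpus) < int(os.cpu_count()):
            self.jobs = self.cpus
        else:
            self.jobs = 1

    def build_cmd(self, snake_name, job_id):
        # use the capped self.jobs here, not self.cpus
        return f"snakemake -s {snake_name} -j {self.jobs} -d {job_id}"
```

That way the cap computed by set_snakemake_jobs() actually reaches Snakemake.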
A good convention is to put at the top of each *.py file or module:

```python
import logging

logging.basicConfig(format='[%(asctime)s] %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p', level=logging.INFO)
logger = logging.getLogger(__name__)
```
Then call logger wherever needed (e.g., logger.info("Starting up")).
The Unix standard for a dry run is --dry-run, with -n as the short synonym if needed.
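With argparse that convention looks like the following (flag names only; this is not bohra's actual CLI definition):

```python
import argparse

parser = argparse.ArgumentParser(prog="bohra")
# long form --dry-run per Unix convention, -n as the short synonym;
# argparse exposes the flag as args.dry_run
parser.add_argument("-n", "--dry-run", action="store_true",
                    help="show what would run without executing anything")

args = parser.parse_args(["-n"])
print(args.dry_run)  # True
```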
bohra/bohra/utils/iqtree_generator.sh
Line 4 in 33671b8
double-N please :)
A handful of isolates that we map to a reference, extracting reads that align to a small 5 kb region.
I find that if I run e.g.

```
bohra run --input_file isolates.tab --job_id PA_20200115_1 --reference ref.fa --mask phastaf/phage.bed -mdu -n
```
...and get the warning message:
```
[WARNING:01/15/2020 02:59:44 PM] This may be a re-run of an existing job. Please try again using rerun instead of run OR use -f to force an overwrite of the existing job.
[WARNING:01/15/2020 02:59:44 PM] Exiting....
```
...then running the same command with -f added only repeats the same error message.
push to pypi
Ideally ~SNPs and evol dist
Ideally want these:
Maybe these
line 693
When a report.toml is too large, writing the report.html from this file is far too slow, to the point of not being possible (tested on >1000 isolates).
If the report directory is found, it should be renamed or removed and the report generated again.
It would be useful to include one or more ref genomes in preview mode.
It is a convention to have the tests folder outside the module folder. You can modify tasks.py to run tests before packaging, and stop if one or more tests fail.
```
  File "/home/linuxbrew/.linuxbrew/bin/bohra", line 8, in <module>
    sys.exit(main())
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/bohra.py", line 136, in main
    args.func(args)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/bohra.py", line 33, in run_pipeline
    return(R.run_pipeline())
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/SnpDetection.py", line 920, in run_pipeline
    self.index_reference()
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/SnpDetection.py", line 612, in index_reference
    if '.fa' not in self.ref:
TypeError: argument of type 'PosixPath' is not iterable
```
The offending line should instead use pathlib's glob matching:

```python
if not self.ref.match("*.fa*"):
```
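A quick illustration of the difference, since pathlib.Path supports glob-style matching but not the in operator (the paths here are made up):

```python
from pathlib import Path

ref = Path("ref.fasta")

# '.fa' in ref  -> TypeError: argument of type 'PosixPath' is not iterable

# Path.match tests the path name against a glob pattern instead
print(ref.match("*.fa*"))               # True
print(Path("ref.gbk").match("*.fa*"))   # False
```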
Swap in seqtk in place of fq.
Update README.md for installation options
Include pip, singularity, conda and brew?
```
  File "/home/linuxbrew/.linuxbrew/bin/bohra", line 11, in <module>
    load_entry_point('bohra==1.0.3', 'console_scripts', 'bohra')()
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/bohra.py", line 96, in main
    parser.print_help(sys.stderr)
NameError: name 'parser' is not defined
```
nulla2bohra: ensure that bohra can be rerun over an existing nullarbor folder. Can also be used to update older bohra directories. Must supply the name of the nullarbor directory and your isolates.tab file.
Can you use nullarbor/input.tab so they only provide the folder? It is a copy of their original file from when they last ran it.
You either get None or the full path to the executable. You could then easily allow the user to supply a path to the executable if it is not in $PATH for some reason. You will, of course, still need to use subprocess.run to get the exact version of the tool.
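A sketch of that check using shutil.which; find_tool and tool_version are hypothetical helper names, and the --version flag is a placeholder since version flags vary between tools:

```python
import shutil
import subprocess


def find_tool(name, user_path=None):
    """Return the full path to an executable, or None if it cannot be found.

    user_path lets the user point at an executable that is not in $PATH.
    """
    return shutil.which(user_path or name)


def tool_version(exe, flag="--version"):
    """Run the tool once to capture its exact version string."""
    result = subprocess.run([exe, flag], capture_output=True, text=True)
    # some tools print their version to stderr instead of stdout
    return (result.stdout or result.stderr).strip()
```

The returned version string can then be parsed and compared as described next.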
You can then be a bit clever with regex to parse out versions and use the packaging package to compare:
```python
import re
from packaging import version

version_pat = re.compile(r'\bv?(?P<major>[0-9]+)\.(?P<minor>[0-9]+)\.(?P<release>[0-9]+)(?:\.(?P<build>[0-9]+))?\b')

# note: don't call this variable `version`, or it would shadow the
# packaging module imported above
version_string = "snippy v3.2.1"
m = version_pat.search(version_string)

# you can access individual components
m.group("major")  # "3"
# the whole matching string
m.group()  # "v3.2.1"
# as a dictionary
m.groupdict()  # {'major': '3', 'minor': '2', 'release': '1', 'build': None}
# or tuple
m.groups()  # ('3', '2', '1', None)

# the packaging package offers some comparison tools and it comes with
# setuptools, so no additional requirements needed
min_version = version.parse("v3.2.3")
version.parse(m.group()) >= min_version  # False
min_version = version.parse("v3.2.0")
version.parse(m.group()) >= min_version  # True
```
Add some docstrings to the top of these two files (at least; I would like to see them in all of them). That will help with self-documenting later. Ideally, all functions and class definitions would have one too. They don't have to be long. I generally try to start every function/class definition by writing down in the docstring what the element will do, then what parameters it will take and what output to expect. That helps in organising things, and you can quickly see if the function is trying to do too much.
You can follow the Sphinx model and then add Sphinx to tasks.py to automatically generate some docs too. https://pythonhosted.org/an_example_pypi_project/sphinx.html
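For example, a Sphinx-style (reST field) docstring on a hypothetical function might look like:

```python
def mask_reference(reference, bed_path):
    """Mask regions listed in a BED file on a reference sequence.

    :param reference: path to the reference FASTA file
    :param bed_path: path to a BED file of regions to mask
    :returns: path to the masked FASTA file
    :raises FileNotFoundError: if either input path does not exist
    """
    ...
```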
```
--workdir WORKDIR, -w WORKDIR
                    Working directory, default is current directory
                    (default: /home/linuxbrew)
```
If this is meant to be fast scratch, please use tempfile.gettempdir() instead, which will use $TMPDIR.
This will allow Kraken2 to play nicer with the Snakemake scheduler.
Ion Torrent produces single-end reads (one FASTQ file) and normally needs different settings in BWA-MEM. Do you really support them?
make dependencies brew and conda installable
main() should be short, and really just call on other functions.
Add a zoom functionality to the tree
Hi Kristy,
My command to run bohra:

```
bohra run -c 8 -i ids.tab -j CRL_20210120_ -r GCF_001548355.1_JKo3.fna -p sa -mdu -ma 0 -mc 0
```
Josh
Add in example reports for each type of pipeline
Add the keyword __version__ = "1.0.1" to __init__.py (note the version number is a string). You can then import bohra into setup.py and never have to modify the version in more than one place. Read the bumpversion docs too. This is good practice when deploying Python packages.
You can add __author__, __copyright__ and __license__ variables to __init__.py too.
Modify the nextflow pipeline to allow each process to have customisable resources.
If input files are not accessible: prefill.
If the path is not accessible for assembly and speciation, default to running assembly or speciation.
Add in paths to singularity containers and recipes.
Create a little invoke script to automatically version bump, generate the bundles, and push to PyPI. Example using bumpversion (to automatically bump the version) and twine (to upload to PyPI) below. You can add that to a file called tasks.py and then just run inv to push new versions to PyPI (after pip3 install invoke). You could also make it a bash script, of course.
Read the twine README for some more background: https://github.com/pypa/twine
And invoke: http://www.pyinvoke.org/
```python
'''
Automate deployment to PyPI
'''
import invoke


@invoke.task
def deploy_patch(ctx):
    '''
    Automate deployment:
        rm -rf build/* dist/*
        bumpversion patch --verbose
        python3 setup.py sdist bdist_wheel
        twine check dist/*
        twine upload dist/*
        git push --tags
    '''
    ctx.run("rm -rf build/* dist/*")
    ctx.run("bumpversion patch --verbose")
    ctx.run("python3 setup.py sdist bdist_wheel")
    ctx.run("twine check dist/*")
    ctx.run("twine upload dist/*")
    ctx.run("git push --tags")
```
Add in a cluster.json and a flag for cluster mode with an alternate snakemake command.