
bohra's People

Contributors

abcdtree, andersgs, kristyhoran, pansapiens, willpitchers

bohra's Issues

add in a setup command

Add a setup command that sets up the databases and checks that all files and dependencies are installed.

Update CLI and config with snippy params

It would be useful to report the minaln, but the responsibility should be on users to think about what threshold makes sense for their data. Instead, the GitHub page or the case studies could state a standard or recommended threshold for inclusion based on minimum core alignment (for large-scale population genomics). Add an example config for snippy to GitHub.

"bohra run" no params is a traceback mess

bohra run
[INFO:03/05/2020 05:13:47 PM] Starting bohra pipeline using /home/linuxbrew/.linuxbrew/bin/bohra run
[INFO:03/05/2020 05:13:47 PM] You are running bohra in preview mode.
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/bohra", line 8, in <module>
    sys.exit(main())
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/bohra.py", line 136, in main
    args.func(args)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/bohra.py", line 32, in run_pipeline
    R = RunSnpDetection(args)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/SnpDetection.py", line 65, in __init__
    self.log_messages('warning', 'Input file can not be empty, please set -i path_to_input to try again')
AttributeError: 'RunSnpDetection' object has no attribute 'log_messages'
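The traceback shows RunSnpDetection calling self.log_messages, a method that does not exist on the class. A minimal sketch of one way to avoid both the AttributeError and the raw traceback: define the helper, raise a dedicated exception on bad input, and turn it into an exit code at the CLI boundary. RunSnpDetectionSketch, InputError and run_pipeline here are hypothetical stand-ins, not bohra's actual classes.

```python
import logging

# Matches the "[INFO:03/05/2020 05:13:47 PM]" style seen in bohra's output
logging.basicConfig(format="[%(levelname)s:%(asctime)s] %(message)s",
                    datefmt="%m/%d/%Y %I:%M:%S %p", level=logging.INFO)
logger = logging.getLogger(__name__)


class InputError(Exception):
    """Hypothetical exception for missing or invalid user input."""


class RunSnpDetectionSketch:
    """Hypothetical stand-in for RunSnpDetection, showing only input checking."""

    def __init__(self, input_file: str = ""):
        if not input_file:
            self.log_messages("warning", "Input file can not be empty, please set -i path_to_input to try again")
            raise InputError("no input file supplied")
        self.input_file = input_file

    def log_messages(self, level: str, message: str) -> None:
        # Dispatch to logger.info, logger.warning, logger.error, ... by name
        getattr(logger, level)(message)


def run_pipeline(input_file: str) -> int:
    """Return a process exit code instead of letting the exception escape."""
    try:
        RunSnpDetectionSketch(input_file)
    except InputError:
        return 1
    return 0
```

The entry point can then call sys.exit(run_pipeline(...)), so an empty -i produces the warning and a non-zero exit status rather than a stack trace.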

Small misused variable in bohra

Snakemake parameter
Bohra passes args.cpus to the SnpDetection object as self.cpus during initialisation.
A function, set_snakemake_jobs(), then double-checks whether self.cpus exceeds the number of available CPUs:
SnpDetection.py#L141
def set_snakemake_jobs(self):
    '''
    set the number of jobs to run in parallel based on the number of cpus from args
    '''
    if int(self.cpus) < int(psutil.cpu_count()):
        self.jobs = self.cpus
    else:
        self.jobs = 1
However, the final command line that runs the Snakemake file does not use self.jobs:
if self.cluster:
    cmd = f"{self.cluster_cmd()} -s {snake_name} -d {wd} {force} {singularity_string} --latency-wait 1200"
else:
    cmd = f"snakemake {dry} -s {snake_name} {singularity_string} -j {self.cpus} -d {self.job_id} {force} --verbose 2>&1"
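A minimal sketch of one possible fix: compute the job count once and pass that value to -j. It uses os.cpu_count() instead of psutil so it runs with the standard library only, and, as an assumption, caps an oversized request at the machine's CPU count rather than dropping to 1. The function names are hypothetical.

```python
import os


def snakemake_jobs(requested_cpus) -> int:
    """Cap the requested job count at the number of CPUs on this machine."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    requested = int(requested_cpus)
    # Assumption: too large a request falls back to all CPUs, not to 1
    return max(1, min(requested, available))


def snakemake_cmd(snake_name, job_id, jobs, dry="", force="", singularity_string=""):
    """Build the local snakemake command using the validated job count."""
    return (f"snakemake {dry} -s {snake_name} {singularity_string} "
            f"-j {jobs} -d {job_id} {force} --verbose 2>&1")
```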

Best practice for `logging`

A good convention is to put at the top of each *.py file or module:

import logging

logging.basicConfig(format='[%(asctime)s] %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p', level=logging.INFO)
logger = logging.getLogger(__name__)

Then call logger wherever needed (e.g., logger.info("Starting up"))
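A small sketch of the pattern in use; describe_run is a hypothetical function. It also relies on a documented property of logging.basicConfig: it does nothing if the root logger already has handlers, so when several modules call it, only the first call's settings take effect. Passing values as arguments (rather than pre-formatting the string) lets logging skip the formatting when a message is filtered out.

```python
import logging

# Module-level logger named after the module, e.g. "bohra.SnpDetection"
logger = logging.getLogger(__name__)


def configure_logging(level: int = logging.INFO) -> None:
    # No-op if the root logger already has handlers
    logging.basicConfig(format="[%(levelname)s:%(asctime)s] %(message)s",
                        datefmt="%m/%d/%Y %I:%M:%S %p", level=level)


def describe_run(job_id: str) -> str:
    """Hypothetical function showing a logger call inside module code."""
    logger.info("Starting job %s", job_id)
    return job_id
```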

run/rerun `-f` flag not working

I find that if I run e.g.
bohra run --input_file isolates.tab --job_id PA_20200115_1 --reference ref.fa --mask phastaf/phage.bed -mdu -n
...and get the warning message:

[WARNING:01/15/2020 02:59:44 PM] This may be a re-run of an existing job. Please try again using rerun instead of run OR use -f to force an overwrite of the existing job.
[WARNING:01/15/2020 02:59:44 PM] Exiting....

...then running the same command with -f added only repeats the same error message.
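One way to structure the check so that -f is consulted before the warning is issued. This is a minimal sketch: check_existing_job is a hypothetical helper, and treating "job directory exists" as the re-run signal is an assumption about how bohra detects an existing job.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def check_existing_job(job_dir, force: bool) -> bool:
    """Return True if `bohra run` may proceed in job_dir."""
    if Path(job_dir).exists() and not force:
        logger.warning("This may be a re-run of an existing job. Please try again using rerun "
                       "instead of run OR use -f to force an overwrite of the existing job.")
        return False
    # Either the job directory is new, or -f was given: proceed and overwrite
    return True
```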

rerun bug

If the report directory is found, it should be renamed or removed and the run started again.
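A sketch of this, assuming the report lives in a single directory: reset_report_dir (hypothetical) either deletes the old report or renames it with a timestamp suffix, then recreates an empty directory for the new run.

```python
import shutil
import time
from pathlib import Path


def reset_report_dir(report_dir, remove: bool = False) -> Path:
    """Move aside (or delete) an existing report directory so a rerun starts clean."""
    report_dir = Path(report_dir)
    if report_dir.exists():
        if remove:
            shutil.rmtree(report_dir)
        else:
            # Keep the old report under a name such as report_20200115-145944
            stamp = time.strftime("%Y%m%d-%H%M%S")
            report_dir.rename(report_dir.with_name(f"{report_dir.name}_{stamp}"))
    report_dir.mkdir(parents=True)
    return report_dir
```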

Move `test` to top directory

It is conventional to keep the tests folder outside the package folder. You can then modify tasks.py to run the tests before packaging, and stop if one or more tests fail.
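A minimal standard-library sketch of "test, then package"; the function names are hypothetical. Inside an invoke tasks.py the same effect comes for free, since ctx.run raises an exception on a non-zero exit code by default, so a failing pytest step stops the task before the build step.

```python
import subprocess
import sys


def tests_pass(test_dir: str = "tests") -> bool:
    """Run pytest on test_dir with the current interpreter; True if every test passes."""
    result = subprocess.run([sys.executable, "-m", "pytest", test_dir])
    return result.returncode == 0


def build_if_tests_pass(test_dir: str = "tests") -> bool:
    """Build the sdist and wheel only when the test suite passes."""
    if not tests_pass(test_dir):
        print("Tests failed (or could not run); not packaging.", file=sys.stderr)
        return False
    subprocess.run([sys.executable, "setup.py", "sdist", "bdist_wheel"], check=True)
    return True
```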

Bug when reference is FASTA

  File "/home/linuxbrew/.linuxbrew/bin/bohra", line 8, in <module>
    sys.exit(main())
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/bohra.py", line 136, in main
    args.func(args)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/bohra.py", line 33, in run_pipeline
    return(R.run_pipeline())
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/SnpDetection.py", line 920, in run_pipeline
    self.index_reference()
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/SnpDetection.py", line 612, in index_reference
    if '.fa' not in self.ref:
TypeError: argument of type 'PosixPath' is not iterable

The offending line should be:
if not self.ref.match("*.fa*"):
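For illustration, a Path-aware FASTA check. is_fasta uses the match() call proposed above; is_fasta_strict and the FASTA_SUFFIXES set are assumptions added here, because the "*.fa*" glob also matches names such as reads.fastq.

```python
from pathlib import Path

FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}  # assumption: extensions treated as FASTA


def is_fasta(ref: Path) -> bool:
    # `'.fa' in ref` raises TypeError because Path objects are not iterable;
    # match() applies a glob pattern to the file name instead
    return ref.match("*.fa*")


def is_fasta_strict(ref: Path) -> bool:
    # Compare only the final suffix, so "reads.fastq" is not mistaken for FASTA
    return ref.suffix.lower() in FASTA_SUFFIXES
```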

NameError: name 'parser' is not defined

  File "/home/linuxbrew/.linuxbrew/bin/bohra", line 11, in <module>
    load_entry_point('bohra==1.0.3', 'console_scripts', 'bohra')()
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/bohra/bohra.py", line 96, in main
    parser.print_help(sys.stderr)
NameError: name 'parser' is not defined
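The NameError suggests main() refers to a parser that was created in another scope. A minimal argparse sketch keeping both in one place and printing help when no sub-command is given; build_parser, the placeholder run handler and the single -i option are hypothetical.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bohra")
    subparsers = parser.add_subparsers(title="commands")
    run = subparsers.add_parser("run", help="run the pipeline")
    run.add_argument("--input_file", "-i", default="")
    run.set_defaults(func=lambda args: 0)  # placeholder for the real run_pipeline
    return parser


def main(argv=None) -> int:
    # `parser` lives in the same scope as the code that prints help
    parser = build_parser()
    args = parser.parse_args(argv)
    if not hasattr(args, "func"):
        # No sub-command given: show usage instead of crashing
        parser.print_help(sys.stderr)
        return 1
    return args.func(args)
```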

nulla2bohra requirements

   nulla2bohra         Ensure that bohra can be rerun over an existing
                        nullarbor folder. Can also be used to update older
                        bohra directories. Must supply name of nullarbor
                        directory, and your isolates.tab file

Could you use nullarbor/input.tab so that users only need to provide the folder?
It is a copy of their original isolates file from the last time they ran nullarbor.
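A sketch of the suggestion, assuming input.tab sits at the top level of the nullarbor folder; find_nullarbor_input is a hypothetical name.

```python
from pathlib import Path


def find_nullarbor_input(nullarbor_dir) -> Path:
    """Return the copy of the isolates file that nullarbor keeps in its output folder."""
    input_tab = Path(nullarbor_dir) / "input.tab"
    if not input_tab.is_file():
        raise FileNotFoundError(f"{input_tab} not found; please supply an isolates file")
    return input_tab
```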

Use `shutil.which` instead of `subprocess.run` to find out executables

You either get None or the full path to the executable. You could easily then allow the user to supply a path to the executable if it is not in $PATH for some reason.

You will, of course, still need to use subprocess.run to get the exact version of the tool.

You can then be a bit clever with regex to parse out versions and use the packaging package to compare:

import re
from packaging import version
version_pat = re.compile(r'\bv?(?P<major>[0-9]+)\.(?P<minor>[0-9]+)\.(?P<release>[0-9]+)(?:\.(?P<build>[0-9]+))?\b')
# use a different name here: assigning to `version` would shadow the packaging module
version_string = "snippy v3.2.1"
m = version_pat.search(version_string)
# you can access individual components
m.group("major") # "3"
# the whole matching string
m.group() # "v3.2.1"
# as a dictionary
m.groupdict() # {'major': '3', 'minor': '2', 'release': '1', 'build': None}
# or tuple
m.groups() # ('3', '2', '1', None)

# the packaging package offers version comparison tools; list it as an
# explicit requirement rather than assuming it is installed alongside setuptools
min_version = version.parse("v3.2.3")
version.parse(m.group()) >= min_version # False

min_version = version.parse("v3.2.0")
version.parse(m.group()) >= min_version # True
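Putting shutil.which and the version regex together, a minimal sketch; locate_tool and tool_version are hypothetical helpers. When the argument contains a directory component, shutil.which checks only that exact path, which makes it easy to accept a user-supplied location as well.

```python
import re
import shutil
import subprocess
from typing import Optional

# Same pattern as in the issue above
VERSION_PAT = re.compile(
    r"\bv?(?P<major>[0-9]+)\.(?P<minor>[0-9]+)\.(?P<release>[0-9]+)(?:\.(?P<build>[0-9]+))?\b"
)


def locate_tool(name: str, user_path: Optional[str] = None) -> Optional[str]:
    """Full path to an executable, preferring a user-supplied location; None if not found."""
    return shutil.which(user_path) if user_path else shutil.which(name)


def tool_version(path: str, flag: str = "--version") -> Optional[str]:
    """Run `path flag` and pull the first x.y.z version string out of its output."""
    result = subprocess.run([path, flag], capture_output=True, text=True)
    # Some tools print their version on stderr rather than stdout
    m = VERSION_PAT.search(result.stdout + result.stderr)
    return m.group() if m else None
```

For example, locate_tool("snippy") returns None when snippy is not on $PATH; bohra could then report the missing dependency, or accept a path from the user (through a hypothetical option) and pass it as user_path.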

Add docstring to top `bohra.py` and `__init__.py`

Add docstrings to the top of these two files at least (ideally, I would like to see them in all files). That will help with self-documentation later. Ideally, every function and class definition would have one too. They don't have to be long. I generally start every function or class definition by writing the docstring: what the element will do, what parameters it takes, and what output to expect. That helps in organising things, and you can quickly see if the function is trying to do too much.

You can follow the Sphinx model and then add Sphinx to tasks.py to generate docs automatically. https://pythonhosted.org/an_example_pypi_project/sphinx.html
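A short example of Sphinx-style docstrings (reStructuredText field lists) for a module and a function. isolate_ids is hypothetical, and treating the first tab-separated column of the isolates file as the isolate ID is an assumption made for illustration.

```python
"""Helpers for reading bohra input files (example module docstring)."""


def isolate_ids(tab_path: str) -> list:
    """Return the isolate IDs listed in a tab-delimited isolates file.

    :param tab_path: path to the file; the ID is assumed to be the first column
    :type tab_path: str
    :returns: isolate IDs in file order, skipping blank lines
    :rtype: list
    """
    ids = []
    with open(tab_path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                ids.append(line.split("\t", 1)[0].strip())
    return ids
```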

What is --workdir ?

  --workdir WORKDIR, -w WORKDIR
                        Working directory, default is current directory
                        (default: /home/linuxbrew)

If this is meant to be fast scratch space, please use the tempfile module instead (e.g. tempfile.mkdtemp() or tempfile.TemporaryDirectory()), which respects $TMPDIR.
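A minimal sketch with the standard library tempfile module; run_in_scratch is a hypothetical helper. tempfile.gettempdir() checks $TMPDIR, $TEMP and $TMP before falling back to platform defaults, and TemporaryDirectory removes the folder when the with block exits, so no scratch data is left behind.

```python
import tempfile
from pathlib import Path


def run_in_scratch(work):
    """Call work(scratch_path) in a fresh directory under $TMPDIR, then clean up."""
    with tempfile.TemporaryDirectory(prefix="bohra_") as scratch:
        return work(Path(scratch))
```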

"or Ion Torrent read sets."

Ion Torrent reads are single-end (one FASTQ file) and normally need different settings in BWA MEM.

Do you really support them?

Add __version__ to __init__.py

Add the variable __version__ = "1.0.1" to __init__.py (note that the version number is a string). You can then import bohra in setup.py and never have to modify the version in more than one place. Read the bumpversion docs too.

This is good practice when deploying python packages.

You can add __author__, __copyright__ and __license__ variables to __init__.py too.
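One caveat: importing bohra from setup.py fails if bohra's own dependencies are not installed yet. A commonly used alternative, sketched here with a hypothetical read_version helper, reads __version__ from the file with a regular expression instead of importing the package.

```python
import re
from pathlib import Path


def read_version(init_path) -> str:
    """Read __version__ from a package's __init__.py without importing it."""
    text = Path(init_path).read_text(encoding="utf-8")
    m = re.search(r"""^__version__\s*=\s*["']([^"']+)["']""", text, re.MULTILINE)
    if m is None:
        raise RuntimeError(f"__version__ not found in {init_path}")
    return m.group(1)
```

In setup.py this would be used as version=read_version("bohra/__init__.py").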

Process customisation

Modify the Nextflow pipeline so that each process can have customisable resources.

fail gracefully if files are not accessible

If input files are not accessible:

  1. If reads are missing or not readable, fail before running the workflow.
  2. If the prefill path for assembly or speciation is not accessible, fall back to running assembly or speciation.
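The two checks could be sketched as follows; unreadable_reads and use_prefill are hypothetical helpers, and os.access is used to test read permission.

```python
import os
from pathlib import Path


def unreadable_reads(read_paths):
    """Return the read files that are missing or cannot be read."""
    return [p for p in read_paths
            if not (Path(p).is_file() and os.access(p, os.R_OK))]


def use_prefill(prefill_path) -> bool:
    """Use prefilled assembly/speciation only if the path is readable; else rerun the step."""
    return prefill_path is not None and os.access(prefill_path, os.R_OK)
```

Calling unreadable_reads on every read path in isolates.tab before launching the workflow lets bohra exit early with a message listing exactly which files are the problem.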

Automate push to PyPI

Create a little invoke script to automatically version bump, generate the bundles, and push to PyPI. Example using bumpversion (to automatically bump version) and twine (to upload to PyPI) below.

You can add that to a file called tasks.py and then run inv deploy-patch to push new versions to PyPI (after pip3 install invoke). You could also make it a bash script, of course.

Read twine README for some more background: https://github.com/pypa/twine

And, invoke: http://www.pyinvoke.org/

'''
Automate deployment to PyPI
'''

import invoke


@invoke.task
def deploy_patch(ctx):
    '''
    Automate deployment
    rm -rf build/* dist/*
    bumpversion patch --verbose
    python3 setup.py sdist bdist_wheel
    twine check dist/*
    twine upload dist/*
    git push --tags
    '''
    ctx.run("rm -rf build/* dist/*")
    ctx.run("bumpversion patch --verbose")
    ctx.run("python3 setup.py sdist bdist_wheel")
    ctx.run("twine check dist/*")
    ctx.run("twine upload dist/*")
    ctx.run("git push --tags")
