
mmsplice_mtsplice's People

Contributors

hoeze, muhammedhasan, s6juncheng, wagnernils


mmsplice_mtsplice's Issues

running predictions with tensorflow is slow

  • MMSplice version: 2.3.0
  • Python version: 3.9.7
  • Operating System: Ubuntu 20.04.3 LTS

Description

Running mmsplice on a GPU-enabled machine is very slow. I have an NVIDIA RTX A5000 with 24 GB of memory, and running mmsplice is 10x slower than running on a CPU with 10 cores. Has anyone benchmarked the GPU speedups?

I am running the latest drivers and CUDA version 11.6. TensorFlow detects the GPU just fine.
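
For reference, a minimal sanity-check sketch (assuming a TensorFlow 2.x install; the matrix size and device strings are arbitrary) to confirm the GPU is visible and actually faster for raw compute:

import time
import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))  # should list the RTX A5000

# Time the same matmul on GPU and CPU to confirm the GPU is actually exercised.
x = tf.random.uniform((4096, 4096))
for device in ('/GPU:0', '/CPU:0'):
    with tf.device(device):
        start = time.time()
        tf.linalg.matmul(x, x).numpy()
        print(device, time.time() - start, 'seconds')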

What I Did

import time

import pandas as pd

from mmsplice import MMSplice, predict_save
from mmsplice.vcf_dataloader import SplicingVCFDataloader
from mmsplice.utils import max_varEff

start_time = time.time()
gtf = 'gtf_file_coding.gtf'
# fasta and vcf paths are defined earlier (omitted here)
dl = SplicingVCFDataloader(gtf, fasta, vcf)
model = MMSplice()
output_csv = 'preds.csv'
predict_save(model, dl, output_csv, pathogenicity=True, splicing_efficiency=True)  # also tried a higher batch size
print("Seconds since epoch with GTF =", time.time())

# Summarize with per-variant maximum effect
df = pd.read_csv(output_csv)
df = max_varEff(df)
df.to_csv('preds_max.csv')
print('TOTAL EXECUTION TIME ...')
print("--- %s seconds ---" % (time.time() - start_time))

Problem combining MMSplice module with custom annotation in VEP

  • MMSplice version: VEP plugin, git clone Aug 12 2020
  • Python version: Python 2.7.16
  • Operating System: MacOS High Sierra 10.13.6

Description

I can run VEP in offline mode with the MMSplice plugin, but if I try to add more annotation from a custom file on the same command line, VEP starts running MMSplice, prints only the VCF header, and then exits without any further output or error message. Combining another plugin, such as Ensembl Conservation, with the same custom annotation works fine. It would be great if I could get all the annotation in one go...

What I Did

This seems to work and produces an annotated output file:

vep -i test.vcf -o test.out --vcf --force --offline --assembly GRCh38 --plugin MMSplice --fasta ~/.vep/homo_sapiens/100_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

This does not work and produces an output file that only contains the header section:

vep -i test.vcf -o test.out --vcf --force --offline --assembly GRCh38 --custom hg38.phyloP100way.bw,PhyloP,bigwig --plugin MMSplice --fasta ~/.vep/homo_sapiens/100_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

By contrast, this works as expected:

vep -i test.vcf -o test.out --vcf --force --offline --assembly GRCh38 --custom hg38.phyloP100way.bw,PhyloP,bigwig

And so does:

vep -i test.vcf -o test.out --vcf --force --offline --assembly GRCh38 --plugin Conservation,gerp_conservation_scores.homo_sapiens.GRCh38.bw --custom hg38.phyloP100way.bw,PhyloP,bigwig

BUG: VCF requirements not clearly stated & wrong seqnames derived

First, thanks for your amazing work.
I really like the concept behind your tool :)

Now concerning the bug:
I worked in a fresh Colab notebook.

  • MMSplice version:
    Installation with pip: mmsplice-2.4.0

  • Python version:
    Python 3.10.13

  • Operating System:
    Colab notebook

Description

I wanted to try out mmsplice, so I downloaded your example data and gave it a go.
But even though I used your test data, I constantly received the following error:
ValueError: Fasta chrom names do not match with vcf chrom names.

After hours of wrapping my head around it and hacking on the package code, I found out:

  1. Not only is a VCF file required as input, but also its index (i.e. .vcf.gz.tbi).
    That is never stated in the README; please fix that.
    (A sketch for creating the index is shown further below.)

  2. Also, while hacking around, I realized the parsing of the seqnames seems buggy:

  • It only parses seqnames when an indexed VCF file is provided.
  • It also always includes {'1'} as a seqname for the VCF, no matter what is provided?!
# In your vcf_dataloader.py
def _check_chrom_annotation(self):
    fasta_chroms = set(self.fasta.fasta.keys())
    vcf_chroms = set(self.vcf.seqnames)

    # I've added these two print lines
    print("fasta: ", fasta_chroms, flush=True)
    print("vcf_chroms", vcf_chroms, flush=True)
    if not fasta_chroms.intersection(vcf_chroms):
        raise ValueError(
            'Fasta chrom names do not match with vcf chrom names')

--> Output:
fasta:  {'17'}
vcf_chroms {'1', '17'}
...

The VCF seqnames should only include 17, since I only provided your example data.
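
As a workaround for point 1, here is a minimal sketch of creating the index with pysam (pysam is not an MMSplice dependency; the file name is a placeholder and the VCF must be bgzipped):

import pysam

# Writes test.vcf.gz.tbi next to the bgzipped VCF so that the dataloader can read the seqnames.
pysam.tabix_index('test.vcf.gz', preset='vcf', force=True)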

What I Did

Take a look:
https://colab.research.google.com/drive/1hx6PAYT_lKuEYtnHCq0PN2lyvNBNDud1?usp=sharing

Controlling the number of threads

  • MMSplice version: 2.0.0
  • Python version: 3.6
  • Operating System: Ubuntu

Is there any way to control the number of threads used by MMSplice?
Setting OMP_NUM_THREADS seems to control TensorFlow's thread count;
however, MMSplice still uses all available cores.

All the best & Thanks,
Tim
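
For reference, a minimal sketch of capping the thread pools from Python, assuming a TensorFlow 2.x backend; this constrains TensorFlow and the common BLAS/OpenMP backends, but may not cover every library MMSplice uses:

import os

# These must be set before numpy/tensorflow are imported to take effect.
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['OPENBLAS_NUM_THREADS'] = '4'
os.environ['MKL_NUM_THREADS'] = '4'

import tensorflow as tf

tf.config.threading.set_intra_op_parallelism_threads(4)
tf.config.threading.set_inter_op_parallelism_threads(2)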

Interpretation of Results

Is there somewhere with more information on the output of MMSplice, specifically the pathogenicity and efficiency scores? What do these numbers represent? What is a reasonable cutoff for saying that a variant is likely to significantly affect splicing efficiency or is likely to be pathogenic? Thanks in advance.

writeVCF not implemented. I wrote some implementation

columns = [
    'gene_name',
    'transcript_id',
    'exons',
    'ref_exon',
    'alt_exon',
    'ref_donor',
    'alt_donor',
    'ref_acceptor',
    'alt_acceptor',
    'ref_acceptorIntron',
    'alt_acceptorIntron',
    'ref_donorIntron',
    'alt_donorIntron',
    'delta_logit_psi',
    'pathogenicity',
    'efficiency'
]


def writeVCF(vcf_in, vcf_out, predictions):
    from cyvcf2 import VCF, Writer

    vcf = VCF(vcf_in)
    vcf.add_info_to_header({
        'ID': 'mmsplice',
        'Description': 'MMSplice splice variant effect. Format:' + '|'.join(columns),
        'Type': 'Character',
        'Number': '.'
    })
    w = Writer(vcf_out, vcf)

    for var in vcf:
        ID = f"{var.CHROM}:{var.POS}:{var.REF}:{var.ALT}"
        pred = predictions[predictions.ID == ID]
        if not pred.empty:  # boolean indexing never returns None; check for matching rows instead
            pred_4_var = [
                '|'.join([str(row[k]) for k in columns[:3]]) + '|' +
                '|'.join([format(row[k], ".3f") for k in columns[3:]])
                for ind, row in pred.iterrows()
            ]
            var.INFO['mmsplice'] = '&'.join(pred_4_var)
        w.write_record(var)

    w.close()
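
Hypothetical usage of the function above; the file names are placeholders, and `predictions` is assumed to contain the columns listed above plus an 'ID' column formatted as CHROM:POS:REF:ALT (matching the f-string in the loop):

from mmsplice import MMSplice, predict_all_table
from mmsplice.vcf_dataloader import SplicingVCFDataloader

dl = SplicingVCFDataloader('annotation.gtf', 'genome.fa', 'variants.vcf.gz')
predictions = predict_all_table(MMSplice(), dl, pathogenicity=True, splicing_efficiency=True)

# Annotate the input VCF with the per-transcript predictions.
writeVCF('variants.vcf.gz', 'variants.mmsplice.vcf', predictions)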

FAILED to annotate in VEP

  • MMSplice version: v0.2.7
  • Python version: 3.6
  • Operating System: docker (ubuntu 18.04)

Description

I installed mmsplice on top of the ensemblorg/ensembl-vep image with the following commands:

apt-get update && apt-get install -y software-properties-common && \
    add-apt-repository -y ppa:jonathonf/python-3.6 && apt-get update && \
    apt-get install -y python3.6 python3.6-dev

git clone git://github.com/gagneurlab/MMSplice.git && \
  cd MMSplice && git checkout v0.2.7 && cd .. && \
  cp MMSplice/VEP_plugin/MMSplice.pm ./Plugins/

and added ./Plugins to the PERL5LIB environment variable.

I then tried to run the annotation with vep using the following command.

What I Did

vep --format vcf --cache --offline --force_overwrite --fork 20 --sift b --polyphen b --numbers --biotype --total_length --canonical --ccds --hgvs   -q --refseq --offline --vcf --af_1kg --af --pubmed --plugin MMSplice --af_gnomad --dir_cache {database}   --input_file variants.vcf --output_file vep.vcf  --fasta human_g1k_v37.fasta

It produced the following output:

Using TensorFlow backend.
WARNING: Logging before flag parsing goes to stderr.
W0816 02:18:26.399252 140276516513600 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
/usr/local/lib/python3.6/dist-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator HuberRegressor from version 0.19.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator StandardScaler from version 0.19.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator LogisticRegression from version 0.19.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator Pipeline from version 0.19.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
W0816 02:18:26.448383 140276516513600 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:95: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.
W0816 02:18:26.448706 140276516513600 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:98: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.
W0816 02:18:26.453226 140276516513600 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:102: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
W0816 02:18:26.458684 140276516513600 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
W0816 02:18:26.462359 140276516513600 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
2019-08-16 02:18:26.540076: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-08-16 02:18:26.637516: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2095085000 Hz
2019-08-16 02:18:26.638957: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4731600 executing computations on platform Host. Devices:
2019-08-16 02:18:26.638996: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-08-16 02:18:26.768802: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
W0816 02:18:27.275322 140276516513600 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W0816 02:18:27.928400 140276516513600 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
Traceback (most recent call last):
Uncaught exception from user code:

	-------------------- EXCEPTION --------------------
	MSG:
	ERROR: Forked process(es) died: read-through of cross-process communication detected

	STACK Bio::EnsEMBL::VEP::Runner::_forked_buffer_to_output /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:556
	STACK Bio::EnsEMBL::VEP::Runner::next_output_line /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:361
	STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:202
	STACK toplevel /opt/vep/src/ensembl-vep/vep:223
	Date (localtime)    = Fri Aug 16 02:18:52 2019
	Ensembl API version = 97
	---------------------------------------------------
	Bio::EnsEMBL::Utils::Exception::throw("\x{a}ERROR: Forked process(es) died: read-through of cross-proces"...) called at /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm line 556
	Bio::EnsEMBL::VEP::Runner::_forked_buffer_to_output(Bio::EnsEMBL::VEP::Runner=HASH(0x5615ec5ef8a0), Bio::EnsEMBL::VEP::InputBuffer=HASH(0x5615ed5e4cc8), undef) called at /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm line 361
	Bio::EnsEMBL::VEP::Runner::next_output_line(Bio::EnsEMBL::VEP::Runner=HASH(0x5615ec5ef8a0)) called at /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm line 202
	Bio::EnsEMBL::VEP::Runner::run(Bio::EnsEMBL::VEP::Runner=HASH(0x5615ec5ef8a0)) called at /opt/vep/src/ensembl-vep/vep line 223



  File "/usr/local/bin/mmsplice", line 11, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmsplice/main.py", line 37, in run
    variant = json.loads(sys.stdin.readline().strip())
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 8195 (char 8194)

BUG: Colab notebook is broken

Description

Your provided Colab notebook is not working (anymore?).

What I Did

I just tried to execute the Colab notebook provided in your README file:
https://colab.research.google.com/drive/1Kw5rHMXaxXXsmE3WecxbXyGQJma80Eq6#scrollTo=gevQvR5TqTXI

As soon as the importing cell is reached, the notebook crashes:

from mmsplice.vcf_dataloader import SplicingVCFDataloader
from mmsplice import MMSplice, predict_all_table, predict_save
from mmsplice.utils import max_varEff

--> Output:

WARNING:kipoi_utils.external.related.mixins:Unrecognized fields for DataLoaderDescription: {'postprocessing'}. Available fields are {'args', 'output_schema', 'path', 'type', 'info', 'defined_as', 'dependencies', 'writers'}
WARNING:kipoi_utils.external.related.mixins:Unrecognized fields for DataLoaderDescription: {'postprocessing'}. Available fields are {'args', 'output_schema', 'path', 'type', 'info', 'defined_as', 'dependencies', 'writers'}
WARNING:kipoi_utils.external.related.mixins:Unrecognized fields for DataLoaderDescription: {'postprocessing'}. Available fields are {'args', 'output_schema', 'path', 'type', 'info', 'defined_as', 'dependencies', 'writers'}

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-16-9491026dc097> in <cell line: 1>()
----> 1 from mmsplice.vcf_dataloader import SplicingVCFDataloader
      2 from mmsplice import MMSplice, predict_all_table, predict_save
      3 from mmsplice.utils import max_varEff

14 frames

/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py in <module>
     55 IS_PYSTON = hasattr(sys, "pyston_version_info")
     56 HAS_REFCOUNT = getattr(sys, 'getrefcount', None) is not None and not IS_PYSTON
---> 57 HAS_LAPACK64 = numpy.linalg._umath_linalg._ilp64
     58 
     59 _OLD_PROMOTION = lambda: np._get_promotion_state() == 'legacy'

AttributeError: module 'numpy.linalg._umath_linalg' has no attribute '_ilp64'

MemoryError

  • MMSplice version: 2.2.1
  • Python version: 3.7.1
  • Operating System: Debian 10

Description

Hi,
somehow MMSplice is reserving a huge amount of memory (~400 GB) and crashing while doing so. I think it crashes because of a Linux security setting that does not permit a single application to use that much memory. But I wonder whether the tool really needs this much? We have a powerful server, but I would like to avoid this.

Best regards,
Sebastian

What I Did

I'm using an in-house WGS case and a GTF file taken from NCBI: https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/

I also had to modify the GTF file a bit, because you only mention official support for Ensembl and GENCODE:

  • I replaced the NCBI chromosome names with 1, 2, 3, ..., X, Y
  • I filtered for protein-coding genes with the tool gffread (https://github.com/gpertea/gffread), which also does some cleanup. The command was: gffread GRCh37_latest_genomic.gtf -F -T --keep-genes --keep-exon-attrs -C -o GRCh37_latest_genomic_filtered.gtf
  • The NCBI GTF has no exon_id fields, so I added an artificial exon_id field for each exon to prevent MMSplice from crashing (a sketch of how this could be done follows this list).
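
For reference, a purely illustrative sketch (not the exact script used here) of appending an artificial exon_id attribute to every exon line of a GTF; the file names are placeholders:

# GTF data lines have nine tab-separated fields: the feature type is in column 3
# and the attribute string in column 9.
with open('GRCh37_latest_genomic_filtered.gtf') as fin, \
        open('GRCh37_latest_genomic_filtered.exon_id.gtf', 'w') as fout:
    exon_counter = 0
    for line in fin:
        fields = line.rstrip('\n').split('\t')
        if len(fields) == 9 and fields[2] == 'exon' and 'exon_id' not in fields[8]:
            exon_counter += 1
            fields[8] = fields[8].rstrip().rstrip(';') + f'; exon_id "exon_{exon_counter}";'
            line = '\t'.join(fields) + '\n'
        fout.write(line)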

With this setup, I ran a test on chromosome 1 only and everything was fine. But when I try to use the complete variant and GTF set, I get the following abort:

Traceback (most recent call last):
  File "mmsplicetest.py", line 13, in <module>
    dl = SplicingVCFDataloader(gtf, fasta, vcf, tissue_specific=False)
  File "/test/mmsplice/lib/python3.7/site-packages/mmsplice/vcf_dataloader.py", line 123, in __init__
    pr_exons = self._read_exons(gtf, overhang)
  File "/test/mmsplice/lib/python3.7/site-packages/mmsplice/vcf_dataloader.py", line 140, in _read_exons
    return read_exon_pyranges(gtf, overhang=overhang)
  File "/test/mmsplice/lib/python3.7/site-packages/mmsplice/vcf_dataloader.py", line 29, in read_exon_pyranges
    df_gtf = pyranges.read_gtf(gtf_file).df
  File "/test/mmsplice/lib/python3.7/site-packages/pyranges/readers.py", line 301, in read_gtf
    gr = read_gtf_full(f, as_df, nrows, _skiprows, duplicate_attr)
  File "/test/mmsplice/lib/python3.7/site-packages/pyranges/readers.py", line 337, in read_gtf_full
    df.loc[:, "Start"] = df.Start - 1
  File "/test/mmsplice/lib/python3.7/site-packages/pandas/core/indexing.py", line 670, in __setitem__
    iloc._setitem_with_indexer(indexer, value)
  File "/test/mmsplice/lib/python3.7/site-packages/pandas/core/indexing.py", line 1542, in _setitem_with_indexer
    take_split_path = self.obj._is_mixed_type
  File "/test/mmsplice/lib/python3.7/site-packages/pandas/core/generic.py", line 5232, in _is_mixed_type
    return self._protect_consolidate(f)
  File "/test/mmsplice/lib/python3.7/site-packages/pandas/core/generic.py", line 5194, in _protect_consolidate
    result = f()
  File "/test/mmsplice/lib/python3.7/site-packages/pandas/core/generic.py", line 5231, in <lambda>
    f = lambda: self._mgr.is_mixed_type
  File "/test/mmsplice/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 691, in is_mixed_type
    self._consolidate_inplace()
  File "/test/mmsplice/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 980, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "/test/mmsplice/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1900, in _consolidate
    list(group_blocks), dtype=dtype, can_consolidate=_can_consolidate
  File "/test/mmsplice/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1922, in _merge_blocks
    new_values = np.vstack([b.values for b in blocks])
  File "<__array_function__ internals>", line 6, in vstack
  File "/test/mmsplice/lib/python3.7/site-packages/numpy/core/shape_base.py", line 283, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 6, in concatenate
MemoryError: Unable to allocate 401. GiB for an array with shape (43752, 1231210) and data type object

Error with running MMSplice - keras decode('utf-8')

  • MMSplice version: 2.1.1
  • Python version: 3.6.0 (local) 3.6.12 (docker)
  • Operating System: Linux

Description

I usually run MMSplice with the VEP docker image (as I couldn't get MMSplice running locally the first time), but I have been having issues with the VEP server lately and would rather just run MMSplice without VEP, ideally using docker.

What I Did

I installed MMSplice both locally and with docker and came across the same error message when running this code:

# Import
from mmsplice.vcf_dataloader import SplicingVCFDataloader
from mmsplice import MMSplice, predict_save, predict_all_table
from mmsplice.utils import max_varEff

# example files
gtf = 'tests/data/test.gtf'
vcf = 'tests/data/test.vcf.gz'
fasta = 'tests/data/hg19.nochr.chr17.fa'
output_csv = 'pred.csv'

# Specify model
model = MMSplice()

dl = SplicingVCFDataloader(gtf, fasta, vcf, encode=False, tissue_specific=False)

# Predict and return as a DataFrame
predictions = predict_all_table(model, dl, pathogenicity=True, splicing_efficiency=True)

# Summarize with maximum effect size
predictionsMax = max_varEff(predictions)

predict_save(model, dl, output_csv, pathogenicity=True, splicing_efficiency=True)

Running this gave:

Traceback (most recent call last):
  File "run_local_mmsplice.py", line 13, in <module>
    model = MMSplice()
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 62, in __init__
    custom_objects=custom_objects)
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
    model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

I have since removed the .decode('utf-8') and .decode('utf8') calls from that file, in case that was the sole issue, but now I get another error, which probably stems from removing that code.

Traceback (most recent call last):
  File "run_local_mmsplice.py", line 18, in <module>
    predictions = predict_all_table(model, dl, pathogenicity=True, splicing_efficiency=True)
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 299, in predict_all_table
    natural_scale=natural_scale, ref_psi_version=ref_psi_version)
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 255, in predict_on_dataloader
    natural_scale=natural_scale, ref_psi_version=ref_psi_version)
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 255, in concat
    sort=sort,
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 301, in __init__
    objs = list(objs)
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 215, in _predict_on_dataloader
    batch, dataloader.optional_metadata)
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 130, in _predict_batch
    batch['inputs']['seq'])
  File "/Users/psullivan/Tools/MMSplice_MTSplice/mmsplice/mmsplice.py", line 95, in predict_modular_scores_on_batch
    self.acceptor_intronM.predict(batch['acceptor_intron']),
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/training.py", line 1149, in predict
    x, _, _ = self._standardize_user_data(x)
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "/Users/psullivan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected input_5 to have 3 dimensions, but got array with shape (512, 1)

Any advice would be appreciated!

Adding vcf/gtf chromosome name check

When supplying your own GTF file, there is no option for matching chromosome names between the gtf and vcf files.

When I use the default grch37 or grch38, this is automatically checked using remove_chr_from_chrom_annotation.

But I can't invoke this when I am supplying my own gtf. It would be nice to have this as an option for when my vcf file has chromosome names without the 'chr' prefix (CHROM_NAME = 1) and my gtf file has names with the prefix (CHROM_NAME = chr1). A sketch of a manual workaround follows below.

Thanks
-Nick
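
For reference, a minimal sketch of such a renaming step done outside MMSplice, assuming pyranges is available; the file names are placeholders:

import pyranges as pr

# Strip the 'chr' prefix from the GTF so it matches a VCF without the prefix.
gr = pr.read_gtf('annotation.gtf')
df = gr.df
df['Chromosome'] = df['Chromosome'].astype(str).str.replace('^chr', '', regex=True)
pr.PyRanges(df).to_gtf('annotation.nochr.gtf')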

Variants without predictions

I built the docker image from the latest Dockerfile and annotated a vcf. Around half of the variants were not annotated with a prediction score. I see the following in the VEP_plugin troubleshooting section: "Some of the variants may not have prediction because they are not matched. In this case, empty values are returned." Could you elaborate on this issue? Is there a way for me to match the variants?

How to interpret tissue specific results

Hi,

thank you for this tool, it is very interesting!
I just want to know how to interpret the tissue-specific results, since I can't find any documentation.
Thank you so much!

Identifying the predicted types of splicing events

Hi,
thanks for developing and maintaining this project.

I would like to classify the predicted impact based on the type of AS event that is promoted or suppressed by the mutation. More specifically, for a mutation over an exon, identify whether its 5' end or its 3' end will be spliced differently, or whether an exon skipping event is predicted.
I assume that the donor and acceptor scores could help in this case, but the detail of those scores is not clear from the README file. Could you please elaborate on those scores, and on how I could use the ref and alt values to classify the AS event?
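
For reference, and purely as an illustrative heuristic rather than an official mapping, one could compare the per-module ref and alt scores (the ref_*/alt_* columns mentioned elsewhere on this page), assuming they are present in the prediction table:

def most_affected_module(row):
    # Illustrative only: per-module score change (alt - ref); the module with the
    # largest absolute change hints at which part of the splicing signal is affected.
    deltas = {
        'acceptor': row['alt_acceptor'] - row['ref_acceptor'],
        'donor': row['alt_donor'] - row['ref_donor'],
        'exon': row['alt_exon'] - row['ref_exon'],
    }
    return max(deltas, key=lambda k: abs(deltas[k]))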

pyranges.df is an expensive operation, best to avoid?

PyRanges objects are collections of dataframes, one per chromosome (/strand). When you do gr.df, it concatenates those dataframes into one. This can be slow and memory-consuming, especially if you are going to make a PyRanges out of it again afterwards, since then you need to split the df on chromosome/strand again :)

If the files are potentially large, instead of doing

df = gr.df
df = do_stuff_to_df(df)
gr = pr.PyRanges(df)

you should consider

gr = gr.apply(do_stuff_to_df)
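
For example, with a hypothetical do_stuff_to_df just to show the pattern (the file name is a placeholder):

import pyranges as pr

def do_stuff_to_df(df):
    # operates on one per-chromosome/strand dataframe at a time
    df.loc[:, "Start"] = df.Start - 1
    return df

gr = pr.read_gtf("annotation.gtf")
gr = gr.apply(do_stuff_to_df)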

Congrats on the publication, btw. PyRanges was accepted with a minor revision in Bioinformatics, so be sure to cite it next time :)

model predictions in Google Colab

  • MMSplice version:
  • Python version: 3.7
  • Operating System: Google Colab

Description

Hi, I am trying to run the Google Colab notebook with the test vcf data, and the notebook seems to crash once you try to make the predictions (i.e. after the model is built). I run this command:

predict_save(model, dl, output_csv, pathogenicity=True, splicing_efficiency=True)

This is what the output looks like:

0it [00:00, ?it/s]

WARNING:tensorflow:Model was constructed with shape (None, None, 4) for input KerasTensor(type_spec=TensorSpec(shape=(None, None, 4), dtype=tf.float32, name='input_5'), name='input_5', description="created by layer 'input_5'"), but it was called on an input with incompatible shape (32,).

0it [00:00, ?it/s]

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-58-81048a45f553> in <module>()
----> 1 predict_save(model, dl, output_csv, pathogenicity=True, splicing_efficiency=True)

6 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
   1127           except Exception as e:  # pylint:disable=broad-except
   1128             if hasattr(e, "ag_error_metadata"):
-> 1129               raise e.ag_error_metadata.to_exception(e)
   1130             else:
   1131               raise

ValueError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1621, in predict_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1611, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1604, in run_step  **
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1572, in predict_step
        return self(x, training=False)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/input_spec.py", line 227, in assert_input_compatibility
        raise ValueError(f'Input {input_index} of layer "{layer_name}" '

    ValueError: Exception encountered when calling layer "model_5" (type Functional).
    
    Input 0 of layer "conv" is incompatible with the layer: expected min_ndim=3, found ndim=1. Full shape received: (32,)
    
    Call arguments received:
      • inputs=tf.Tensor(shape=(32,), dtype=string)
      • training=False
      • mask=None


Error while installing mmsplice

  • MMSplice version:
  • Python version:
  • Operating System:

Description

I got an error while installing mmsplice.

What I Did

pip install mmsplice
Collecting mmsplice
  Downloading https://files.pythonhosted.org/packages/75/67/c04e48b46c8d32f948b7d4ccd84ef28f209c64fb6986fe776bc989e47cc9/mmsplice-0.2.7-py2.py3-none-any.whl (450kB)
     |████████████████████████████████| 460kB 17.5MB/s 
Collecting concise (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/73/06/9b680a74cd7bc682c6da7da04704af89b4ea0eb635d917615e0c7401a686/concise-0.6.6.tar.gz (11.3MB)
     |████████████████████████████████| 11.3MB 19.7MB/s 
Collecting tqdm (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/45/af/685bf3ce889ea191f3b916557f5677cc95a5e87b2fa120d74b5dd6d049d0/tqdm-4.32.1-py2.py3-none-any.whl (49kB)
     |████████████████████████████████| 51kB 13.7MB/s 
Collecting gffutils (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/b4/ea/54ca403a8d471849606fb432b9afcf73c8c5105ea2dd87b8d38bd5217c5a/gffutils-0.9.tar.gz (1.5MB)
     |████████████████████████████████| 1.5MB 51.7MB/s 
Collecting sklearn (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/1e/7a/dbb3be0ce9bd5c8b7e3d87328e79063f8b263b2b1bfa4774cb1147bfcd3f/sklearn-0.0.tar.gz
Collecting cyvcf2==0.9.0 (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/3f/21/29c8dc12e91d55e953f230bf9e96af779b224f6d59f4eacc3dcb80753d34/cyvcf2-0.9.0.tar.gz (1.2MB)
     |████████████████████████████████| 1.2MB 60.3MB/s 
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: Collecting cython
      Downloading https://files.pythonhosted.org/packages/df/5e/a43dd5869107788c56b957089a2d9819588e41d6269253590fe81e82d5bc/Cython-0.29.10-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (2.9MB)
    Collecting coloredlogs
      Downloading https://files.pythonhosted.org/packages/08/0f/7877fc42fff0b9d70b6442df62d53b3868d3a6ad1b876bdb54335b30ff23/coloredlogs-10.0-py2.py3-none-any.whl (47kB)
    Collecting click
      Downloading https://files.pythonhosted.org/packages/fa/37/45185cb5abbc30d7257104c434fe0b07e5a195a6847506c074527aa599ec/Click-7.0-py2.py3-none-any.whl (81kB)
    Collecting humanfriendly>=4.7 (from coloredlogs)
      Downloading https://files.pythonhosted.org/packages/90/df/88bff450f333114680698dc4aac7506ff7cab164b794461906de31998665/humanfriendly-4.18-py2.py3-none-any.whl (73kB)
    Installing collected packages: cython, humanfriendly, coloredlogs, click
    Successfully installed click-7.0 coloredlogs-10.0 cython-0.29.10 humanfriendly-4.18
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/pm/y6_x4m551tx4l6b0kwnbh_ch0000gp/T/pip-install-tgv7hfs9/cyvcf2/setup.py", line 64, in <module>
        from Cython.Distutils import build_ext
    ModuleNotFoundError: No module named 'Cython'
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/pm/y6_x4m551tx4l6b0kwnbh_ch0000gp/T/pip-install-tgv7hfs9/cyvcf2/

TSplice architecture

Hello,

Thanks for putting together these tools and the associated repo. I was looking into training the TSplice model on different sets of cell-type-specific splicing data, analogous to the tissue-specific data the available trained models use. Is the TSplice model architecture, as described in the MTSplice paper, available for other training cases? I may have missed it in this repo or others.

Thanks,
Derek

check_chrom_annotation

Check that the gtf, fasta, and vcf files follow a consistent chromosome naming convention. For example, '1' in the vcf doesn't overlap with 'chr1' in the gtf, which leads to a cryptic error.
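
A minimal sketch of such a check, assuming pyfaidx, cyvcf2 and pyranges are installed; the file names are placeholders:

from pyfaidx import Fasta
from cyvcf2 import VCF
import pyranges as pr

fasta_chroms = set(Fasta('genome.fa').keys())
vcf_chroms = set(VCF('variants.vcf.gz').seqnames)
gtf_chroms = set(pr.read_gtf('annotation.gtf').Chromosome.astype(str))

# Fail early with an explicit message instead of a cryptic downstream error.
if not (fasta_chroms & vcf_chroms & gtf_chroms):
    raise ValueError(
        f'Inconsistent chromosome names: fasta={sorted(fasta_chroms)[:3]}, '
        f'vcf={sorted(vcf_chroms)[:3]}, gtf={sorted(gtf_chroms)[:3]}')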

mmsplice doesn't output a separate score for each kind of splice module

  • MMSplice version: 0.2.8
  • Python version: Python-3.6.8
  • Operating System: Linux

Description

I use mmsplice to predict splice variants; however, the output only contains 3 scores: "mmsplice_pathogenicity", "mmsplice_dse" and "mmsplice_dlogitPsi".
I wonder if it can also output scores like "mmsplice_alt_acceptorIntron, mmsplice_alt_acceptor, mmsplice_alt_exon, mmsplice_alt_donor, mmsplice_alt_donorIntron" and so on.

What I Did

My prediction script is this:
predictions = predict_all_table(model, dl, batch_size=1024, split_seq=False, assembly=True, pathogenicity=True, splicing_efficiency=True, progress=True)
predictions.to_csv(out_csv)

My output is like this:
ID,mmsplice_dlogitPsi,exons,mmsplice_pathogenicity,mmsplice_dse
0,17:41197805:ACATCTGCC:['A'],0.04854488531130818,17_41197646_41197819:-,0.9425130312488815,-0.003766304501151105


error running example data - Input sequence acceptor intron length cannot be longer than the input sequence

  • MMSplice version: 2.0.0
  • Python version: 3.6
  • Operating System: CentOS 7

I am trying to run the example (as shown in the "Example Code" section of the README). I get an error on the prediction step:

>>> predictions = predict_all_table(model, dl, pathogenicity=True, splicing_efficiency=True)
0it [00:00, ?it/s][W::vcf_parse] INFO 'ALLELEID' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNDISDB' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNDN' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNHGVS' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNREVSTAT' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNSIG' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNVC' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNVCSO' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'GENEINFO' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'MC' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'ORIGIN' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'RS' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNVI' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'AF_EXAC' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNSIGCONF' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'AF_ESP' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'AF_TGP' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNDISDBINCL' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNDNINCL' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CLNSIGINCL' is not defined in the header, assuming Type=String
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rhalperin/.local/lib/python3.6/site-packages/mmsplice/mmsplice.py", line 277, in predict_all_table
    natural_scale=natural_scale, ref_psi_version=ref_psi_version)
  File "/home/rhalperin/.local/lib/python3.6/site-packages/mmsplice/mmsplice.py", line 225, in predict_on_dataloader
    natural_scale=natural_scale, ref_psi_version=ref_psi_version)
  File "/packages/python/3.6.0/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 255, in concat
    sort=sort,
  File "/packages/python/3.6.0/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 301, in __init__
    objs = list(objs)
  File "/home/rhalperin/.local/lib/python3.6/site-packages/mmsplice/mmsplice.py", line 151, in _predict_on_dataloader
    for batch in dt_iter:
  File "/home/rhalperin/.local/lib/python3.6/site-packages/tqdm/std.py", line 1130, in __iter__
    for obj in iterable:
  File "/home/rhalperin/.local/lib/python3.6/site-packages/mmsplice/exon_dataloader.py", line 278, in batch_iter
    for batch in super().batch_iter(batch_size, **kwargs):
  File "/home/rhalperin/.local/lib/python3.6/site-packages/kipoi_utils/data_utils.py", line 66, in batch_gen
    for x in iterable:
  File "/home/rhalperin/.local/lib/python3.6/site-packages/mmsplice/vcf_dataloader.py", line 142, in __next__
    return self._next(exon, variant, overhang)
  File "/home/rhalperin/.local/lib/python3.6/site-packages/mmsplice/exon_dataloader.py", line 257, in _next
    inputs['seq'] = self.spliter.split(inputs['seq'], overhang, exon)
  File "/home/rhalperin/.local/lib/python3.6/site-packages/mmsplice/exon_dataloader.py", line 109, in split
    assert intronl_len <= len(seq), "Input sequence acceptor intron" \
AssertionError: Input sequence acceptor intron length cannot be longer than the input sequence

Conflicting requirements: setuptools<=39.1.0 required by mmsplice, >=41.0.0 required by tensorflow

  • MMSplice version: 0.2.7
  • Python version: 3.6.3
  • Operating System: CentOS

Description

Installing mmsplice via pip

What I Did

$ pip install mmsplice
Collecting mmsplice
  Downloading https://files.pythonhosted.org/packages/75/67/c04e48b46c8d32f948b7d4ccd84ef28f209c64fb6986fe776bc989e47cc9/mmsplice-0.2.7-py2.py3-none-any.whl (450kB)
     |████████████████████████████████| 460kB 8.9MB/s
Collecting pyfaidx (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/75/a5/7e2569527b3849ea28d79b4f70d7cf46a47d36459bc59e0efa4e10e8c8b2/pyfaidx-0.5.5.2.tar.gz
Collecting tqdm (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/9f/3d/7a6b68b631d2ab54975f3a4863f3c4e9b26445353264ef01f465dc9b0208/tqdm-4.32.2-py2.py3-none-any.whl (50kB)
     |████████████████████████████████| 51kB 7.9MB/s
Collecting tensorflow (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/de/f0/96fb2e0412ae9692dbf400e5b04432885f677ad6241c088ccc5fe7724d69/tensorflow-1.14.0-cp36-cp36m-manylinux1_x86_64.whl (109.2MB)
     |████████████████████████████████| 109.2MB 27.5MB/s
Collecting setuptools<=39.1.0 (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/8c/10/79282747f9169f21c053c562a0baa21815a8c7879be97abd930dbcf862e8/setuptools-39.1.0-py2.py3-none-any.whl (566kB)
     |████████████████████████████████| 573kB 13.2MB/s
Requirement already satisfied: pandas in ./MAJIQ_2.0/lib/python3.6/site-packages (from mmsplice) (0.24.1)
Collecting sklearn (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/1e/7a/dbb3be0ce9bd5c8b7e3d87328e79063f8b263b2b1bfa4774cb1147bfcd3f/sklearn-0.0.tar.gz
Collecting concise (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/73/06/9b680a74cd7bc682c6da7da04704af89b4ea0eb635d917615e0c7401a686/concise-0.6.6.tar.gz (11.3MB)
     |████████████████████████████████| 11.3MB 14.9MB/s
Collecting cyvcf2==0.9.0 (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/3f/21/29c8dc12e91d55e953f230bf9e96af779b224f6d59f4eacc3dcb80753d34/cyvcf2-0.9.0.tar.gz (1.2MB)
     |████████████████████████████████| 1.2MB 9.6MB/s
Collecting keras (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/5e/10/aa32dad071ce52b5502266b5c659451cfd6ffcbf14e6c8c4f16c0ff5aaab/Keras-2.2.4-py2.py3-none-any.whl (312kB)
     |████████████████████████████████| 317kB 21.5MB/s
Collecting kipoi>=0.4.1 (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/93/df/469e851e562855b69c1c47ca801a14fdd82cd8cc2be9158c07f347635cfd/kipoi-0.6.16-py3-none-any.whl (101kB)
     |████████████████████████████████| 102kB 6.5MB/s
Requirement already satisfied: click in ./MAJIQ_2.0/lib/python3.6/site-packages (from mmsplice) (7.0)
Collecting gffutils (from mmsplice)
  Downloading https://files.pythonhosted.org/packages/b4/ea/54ca403a8d471849606fb432b9afcf73c8c5105ea2dd87b8d38bd5217c5a/gffutils-0.9.tar.gz (1.5MB)
     |████████████████████████████████| 1.5MB 11.4MB/s
Requirement already satisfied: six in ./MAJIQ_2.0/lib/python3.6/site-packages (from pyfaidx->mmsplice) (1.11.0)
Collecting protobuf>=3.6.1 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/dc/0e/e7cdff89745986c984ba58e6ff6541bc5c388dd9ab9d7d312b3b1532584a/protobuf-3.9.0-cp36-cp36m-manylinux1_x86_64.whl (1.2MB)
     |████████████████████████████████| 1.2MB 22.6MB/s
Collecting termcolor>=1.1.0 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/8a/48/a76be51647d0eb9f10e2a4511bf3ffb8cc1e6b14e9e4fab46173aa79f981/termcolor-1.1.0.tar.gz
Collecting wheel>=0.26 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/bb/10/44230dd6bf3563b8f227dbf344c908d412ad2ff48066476672f3a72e174e/wheel-0.33.4-py2.py3-none-any.whl
Collecting tensorflow-estimator<1.15.0rc0,>=1.14.0rc0 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/3c/d5/21860a5b11caf0678fbc8319341b0ae21a07156911132e0e71bffed0510d/tensorflow_estimator-1.14.0-py2.py3-none-any.whl (488kB)
     |████████████████████████████████| 491kB 22.5MB/s
Collecting grpcio>=1.8.6 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/f2/5d/b434403adb2db8853a97828d3d19f2032e79d630e0d11a8e95d243103a11/grpcio-1.22.0-cp36-cp36m-manylinux1_x86_64.whl (2.2MB)
     |████████████████████████████████| 2.2MB 23.4MB/s
Collecting wrapt>=1.11.1 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/23/84/323c2415280bc4fc880ac5050dddfb3c8062c2552b34c2e512eb4aa68f79/wrapt-1.11.2.tar.gz
Collecting tensorboard<1.15.0,>=1.14.0 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/91/2d/2ed263449a078cd9c8a9ba50ebd50123adf1f8cfbea1492f9084169b89d9/tensorboard-1.14.0-py3-none-any.whl (3.1MB)
     |████████████████████████████████| 3.2MB 23.8MB/s
Collecting google-pasta>=0.1.6 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/d0/33/376510eb8d6246f3c30545f416b2263eee461e40940c2a4413c711bdf62d/google_pasta-0.1.7-py3-none-any.whl (52kB)
     |████████████████████████████████| 61kB 9.9MB/s
Collecting gast>=0.2.0 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/4e/35/11749bf99b2d4e3cceb4d55ca22590b0d7c2c62b9de38ac4a4a7f4687421/gast-0.2.2.tar.gz
Collecting absl-py>=0.7.0 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/da/3f/9b0355080b81b15ba6a9ffcf1f5ea39e307a2778b2f2dc8694724e8abd5b/absl-py-0.7.1.tar.gz (99kB)
     |████████████████████████████████| 102kB 9.3MB/s
Requirement already satisfied: numpy<2.0,>=1.14.5 in ./MAJIQ_2.0/lib/python3.6/site-packages (from tensorflow->mmsplice) (1.16.4)
Collecting keras-preprocessing>=1.0.5 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/28/6a/8c1f62c37212d9fc441a7e26736df51ce6f0e38455816445471f10da4f0a/Keras_Preprocessing-1.1.0-py2.py3-none-any.whl (41kB)
     |████████████████████████████████| 51kB 9.5MB/s
Collecting keras-applications>=1.0.6 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/71/e3/19762fdfc62877ae9102edf6342d71b28fbfd9dea3d2f96a882ce099b03f/Keras_Applications-1.0.8-py3-none-any.whl (50kB)
     |████████████████████████████████| 51kB 9.7MB/s
Collecting astor>=0.6.0 (from tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/d1/4f/950dfae467b384fc96bc6469de25d832534f6b4441033c39f914efd13418/astor-0.8.0-py2.py3-none-any.whl
Requirement already satisfied: pytz>=2011k in ./MAJIQ_2.0/lib/python3.6/site-packages (from pandas->mmsplice) (2018.9)
Requirement already satisfied: python-dateutil>=2.5.0 in ./MAJIQ_2.0/lib/python3.6/site-packages (from pandas->mmsplice) (2.7.5)
Requirement already satisfied: scikit-learn in ./MAJIQ_2.0/lib/python3.6/site-packages (from sklearn->mmsplice) (0.21.2)
Requirement already satisfied: scipy in ./MAJIQ_2.0/lib/python3.6/site-packages (from concise->mmsplice) (1.3.0)
Requirement already satisfied: matplotlib in ./MAJIQ_2.0/lib/python3.6/site-packages (from concise->mmsplice) (3.1.1)
Collecting hyperopt (from concise->mmsplice)
  Downloading https://files.pythonhosted.org/packages/63/12/704382c3081df3ae3f9d96fe6afb62efa2fa9749be20c301cd2797fb0b52/hyperopt-0.1.2-py3-none-any.whl (115kB)
     |████████████████████████████████| 122kB 25.4MB/s
Collecting descartes (from concise->mmsplice)
  Downloading https://files.pythonhosted.org/packages/e5/b6/1ed2eb03989ae574584664985367ba70cd9cf8b32ee8cad0e8aaeac819f3/descartes-1.1.0-py3-none-any.whl
Collecting shapely (from concise->mmsplice)
  Downloading https://files.pythonhosted.org/packages/38/b6/b53f19062afd49bb5abd049aeed36f13bf8d57ef8f3fa07a5203531a0252/Shapely-1.6.4.post2-cp36-cp36m-manylinux1_x86_64.whl (1.5MB)
     |████████████████████████████████| 1.5MB 24.3MB/s
Collecting gtfparse>=1.0.7 (from concise->mmsplice)
  Downloading https://files.pythonhosted.org/packages/41/5c/8bd2e9020051ccffc60c56ae70b32a3b649ddac1962e9aa641f93542440e/gtfparse-1.2.0.tar.gz
Requirement already satisfied: h5py in ./MAJIQ_2.0/lib/python3.6/site-packages (from keras->mmsplice) (2.9.0)
Collecting pyyaml (from keras->mmsplice)
  Downloading https://files.pythonhosted.org/packages/a3/65/837fefac7475963d1eccf4aa684c23b95aa6c1d033a2c5965ccb11e22623/PyYAML-5.1.1.tar.gz (274kB)
     |████████████████████████████████| 276kB 24.2MB/s
Collecting future (from kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/90/52/e20466b85000a181e1e144fd8305caf2cf475e2f9674e797b222f8105f5f/future-0.17.1.tar.gz (829kB)
     |████████████████████████████████| 829kB 23.9MB/s
Requirement already satisfied: jinja2 in ./MAJIQ_2.0/lib/python3.6/site-packages (from kipoi>=0.4.1->mmsplice) (2.10)
Collecting kipoi-utils>=0.1.12 (from kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/8b/84/cfee68e16d4120d5eb23eb6ee67b34f81cbaf1c77d0a60acb6e25a19ff27/kipoi_utils-0.1.12-py3-none-any.whl
Collecting attrs>=17.4.0 (from kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/23/96/d828354fa2dbdf216eaa7b7de0db692f12c234f7ef888cc14980ef40d1d2/attrs-19.1.0-py2.py3-none-any.whl
Requirement already satisfied: urllib3>=1.21.1 in ./MAJIQ_2.0/lib/python3.6/site-packages (from kipoi>=0.4.1->mmsplice) (1.24.1)
Collecting deprecation>=2.0.6 (from kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/32/e9/01ffbaf3540ad54476cd7066439d629f1dd73b851cc5c0993ce2c12e1cdd/deprecation-2.0.6-py2.py3-none-any.whl
Collecting related>=0.6.0 (from kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/23/6e/5419d1364d9c408cb4943d1301800bf88b8d91e034c355448b00440ca202/related-0.7.2-py2.py3-none-any.whl
Collecting kipoi-conda>=0.1.6 (from kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/4c/ec/f9436cad99756e2f9981f1ceaba5452257563174c59d785e782d4d981857/kipoi_conda-0.1.6-py3-none-any.whl
Collecting enum34 (from kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/af/42/cb9355df32c69b553e72a2e28daee25d1611d2c0d9c272aa1d34204205b2/enum34-1.1.6-py3-none-any.whl
Collecting colorlog (from kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/68/4d/892728b0c14547224f0ac40884e722a3d00cb54e7a146aea0b3186806c9e/colorlog-4.0.2-py2.py3-none-any.whl
Collecting cookiecutter>=1.6.0 (from kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/16/99/1ca3a75978270288354f419e9166666801cf7e7d8df984de44a7d5d8b8d0/cookiecutter-1.6.0-py2.py3-none-any.whl (50kB)
     |████████████████████████████████| 51kB 9.6MB/s
Collecting tinydb (from kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/d7/f9/0e871cbf0da678cf1780609dc6aef26a5ed544c86733fc1ceaf134fce52c/tinydb-3.13.0-py2.py3-none-any.whl
Collecting argh (from gffutils->mmsplice)
  Downloading https://files.pythonhosted.org/packages/06/1c/e667a7126f0b84aaa1c56844337bf0ac12445d1beb9c8a6199a7314944bf/argh-0.26.2-py2.py3-none-any.whl
Collecting argcomplete (from gffutils->mmsplice)
  Downloading https://files.pythonhosted.org/packages/4d/82/f44c9661e479207348a979b1f6f063625d11dc4ca6256af053719bbb0124/argcomplete-1.10.0-py2.py3-none-any.whl
Collecting simplejson (from gffutils->mmsplice)
  Downloading https://files.pythonhosted.org/packages/e3/24/c35fb1c1c315fc0fffe61ea00d3f88e85469004713dab488dee4f35b0aff/simplejson-3.16.0.tar.gz (81kB)
     |████████████████████████████████| 81kB 12.3MB/s
Requirement already satisfied: werkzeug>=0.11.15 in ./MAJIQ_2.0/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->mmsplice) (0.14.1)
Collecting markdown>=2.6.8 (from tensorboard<1.15.0,>=1.14.0->tensorflow->mmsplice)
  Downloading https://files.pythonhosted.org/packages/c0/4e/fd492e91abdc2d2fcb70ef453064d980688762079397f779758e055f6575/Markdown-3.1.1-py2.py3-none-any.whl (87kB)
     |████████████████████████████████| 92kB 13.7MB/s
Requirement already satisfied: joblib>=0.11 in ./MAJIQ_2.0/lib/python3.6/site-packages (from scikit-learn->sklearn->mmsplice) (0.13.2)
Requirement already satisfied: cycler>=0.10 in ./MAJIQ_2.0/lib/python3.6/site-packages (from matplotlib->concise->mmsplice) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in ./MAJIQ_2.0/lib/python3.6/site-packages (from matplotlib->concise->mmsplice) (1.0.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in ./MAJIQ_2.0/lib/python3.6/site-packages (from matplotlib->concise->mmsplice) (2.3.0)
Collecting pymongo (from hyperopt->concise->mmsplice)
  Downloading https://files.pythonhosted.org/packages/fb/4a/586826433281ca285f0201235fccf63cc29a30fa78bcd72b6a34e365972d/pymongo-3.8.0-cp36-cp36m-manylinux1_x86_64.whl (416kB)
     |████████████████████████████████| 419kB 23.3MB/s
Collecting networkx (from hyperopt->concise->mmsplice)
  Downloading https://files.pythonhosted.org/packages/85/08/f20aef11d4c343b557e5de6b9548761811eb16e438cee3d32b1c66c8566b/networkx-2.3.zip (1.7MB)
     |████████████████████████████████| 1.8MB 23.8MB/s
Requirement already satisfied: MarkupSafe>=0.23 in ./MAJIQ_2.0/lib/python3.6/site-packages (from jinja2->kipoi>=0.4.1->mmsplice) (1.1.0)
Requirement already satisfied: packaging in ./MAJIQ_2.0/lib/python3.6/site-packages (from deprecation>=2.0.6->kipoi>=0.4.1->mmsplice) (19.0)
Collecting binaryornot>=0.2.0 (from cookiecutter>=1.6.0->kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/24/7e/f7b6f453e6481d1e233540262ccbfcf89adcd43606f44a028d7f5fae5eb2/binaryornot-0.4.4-py2.py3-none-any.whl
Collecting whichcraft>=0.4.0 (from cookiecutter>=1.6.0->kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/ab/c6/eb4d1dfbb68168bb01c4394420e5e71d5851e64b910838aa0f14ebd5c7a0/whichcraft-0.5.2-py2.py3-none-any.whl
Requirement already satisfied: requests>=2.18.0 in ./MAJIQ_2.0/lib/python3.6/site-packages (from cookiecutter>=1.6.0->kipoi>=0.4.1->mmsplice) (2.21.0)
Collecting poyo>=0.1.0 (from cookiecutter>=1.6.0->kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/e0/16/e00e3001007a5e416ca6a51def6f9e4be6a774bf1c8486d20466f834d113/poyo-0.4.2-py2.py3-none-any.whl
Collecting jinja2-time>=0.1.0 (from cookiecutter>=1.6.0->kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/6a/a1/d44fa38306ffa34a7e1af09632b158e13ec89670ce491f8a15af3ebcb4e4/jinja2_time-0.2.0-py2.py3-none-any.whl
Requirement already satisfied: decorator>=4.3.0 in ./MAJIQ_2.0/lib/python3.6/site-packages (from networkx->hyperopt->concise->mmsplice) (4.3.2)
Requirement already satisfied: chardet>=3.0.2 in ./MAJIQ_2.0/lib/python3.6/site-packages (from binaryornot>=0.2.0->cookiecutter>=1.6.0->kipoi>=0.4.1->mmsplice) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in ./MAJIQ_2.0/lib/python3.6/site-packages (from requests>=2.18.0->cookiecutter>=1.6.0->kipoi>=0.4.1->mmsplice) (2.8)
Requirement already satisfied: certifi>=2017.4.17 in ./MAJIQ_2.0/lib/python3.6/site-packages (from requests>=2.18.0->cookiecutter>=1.6.0->kipoi>=0.4.1->mmsplice) (2018.11.29)
Collecting arrow (from jinja2-time>=0.1.0->cookiecutter>=1.6.0->kipoi>=0.4.1->mmsplice)
  Downloading https://files.pythonhosted.org/packages/a2/6a/a3d20e80ee4fee7c55c022fb28d52239bd01171edd3c137dd1e2ef8b2a20/arrow-0.14.2-py2.py3-none-any.whl
ERROR: tensorboard 1.14.0 has requirement setuptools>=41.0.0, but you'll have setuptools 39.1.0 which is incompatible.
Installing collected packages: setuptools, pyfaidx, tqdm, protobuf, termcolor, wheel, tensorflow-estimator, grpcio, wrapt, absl-py, markdown, tensorboard, google-pasta, gast, keras-preprocessing, keras-applications, astor, tensorflow, sklearn, pyyaml, keras, future, pymongo, networkx, hyperopt, descartes, shapely, gtfparse, concise, cyvcf2, attrs, related, kipoi-utils, deprecation, kipoi-conda, enum34, colorlog, binaryornot, whichcraft, poyo, arrow, jinja2-time, cookiecutter, tinydb, kipoi, argh, argcomplete, simplejson, gffutils, mmsplice
  Found existing installation: setuptools 40.9.0
    Uninstalling setuptools-40.9.0:
      Successfully uninstalled setuptools-40.9.0
  Running setup.py install for pyfaidx ... done
  Running setup.py install for termcolor ... done
  Running setup.py install for wrapt ... done
  Running setup.py install for absl-py ... done
  Running setup.py install for gast ... done
  Running setup.py install for sklearn ... done
  Running setup.py install for pyyaml ... done
  Running setup.py install for future ... done
  Running setup.py install for networkx ... done
  Running setup.py install for gtfparse ... done
  Running setup.py install for concise ... done
  Running setup.py install for cyvcf2 ... done
  Running setup.py install for simplejson ... done
  Running setup.py install for gffutils ... done
Successfully installed absl-py-0.7.1 argcomplete-1.10.0 argh-0.26.2 arrow-0.14.2 astor-0.8.0 attrs-19.1.0 binaryornot-0.4.4 colorlog-4.0.2 concise-0.6.6 cookiecutter-1.6.0 cyvcf2-0.9.0 deprecation-2.0.6 descartes-1.1.0 enum34-1.1.6 future-0.17.1 gast-0.2.2 gffutils-0.9 google-pasta-0.1.7 grpcio-1.22.0 gtfparse-1.2.0 hyperopt-0.1.2 jinja2-time-0.2.0 keras-2.2.4 keras-applications-1.0.8 keras-preprocessing-1.1.0 kipoi-0.6.16 kipoi-conda-0.1.6 kipoi-utils-0.1.12 markdown-3.1.1 mmsplice-0.2.7 networkx-2.3 poyo-0.4.2 protobuf-3.9.0 pyfaidx-0.5.5.2 pymongo-3.8.0 pyyaml-5.1.1 related-0.7.2 setuptools-39.1.0 shapely-1.6.4.post2 simplejson-3.16.0 sklearn-0.0 tensorboard-1.14.0 tensorflow-1.14.0 tensorflow-estimator-1.14.0 termcolor-1.1.0 tinydb-3.13.0 tqdm-4.32.2 wheel-0.33.4 whichcraft-0.5.2 wrapt-1.11.2

MMSplice on non-human VCF files

  • MMSplice version:
  • Python version:
  • Operating System:

Description

Can I run MMSplice on non-human VCF files? I am interested in running it on horse/dog/cow variant files.
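
For reference, a minimal sketch of such a run with the Python API used elsewhere in this repository, assuming you have a GTF annotation, a genome FASTA and a VCF for the species of interest (the horse file names below are placeholders, not files shipped with MMSplice); whether the human-trained modules transfer well to other species is a separate question:

# Hypothetical non-human run; 'equCab3.gtf', 'equCab3.fa' and
# 'horse_variants.vcf.gz' are placeholder paths for a horse annotation,
# genome and variant file.
from mmsplice.vcf_dataloader import SplicingVCFDataloader
from mmsplice import MMSplice, predict_save

dl = SplicingVCFDataloader('equCab3.gtf', 'equCab3.fa', 'horse_variants.vcf.gz')
model = MMSplice()
predict_save(model, dl, 'horse_predictions.csv', pathogenicity=True, splicing_efficiency=True)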

kipoiseq problem

  • MMSplice version:
  • Python version:
  • Operating System:

Description

Describe what you were trying to get done.
Tell us what happened, what went wrong, and what you expected to happen.

What I Did

Paste the command(s) you ran and the output.
If there was a crash, please include the traceback here.

I can't import kipoiseq; the import fails at:

----> 5 import kipoiseq.transforms.functional as F
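
A small debugging sketch (not MMSplice code), assuming the question is whether kipoiseq is installed at all, which version it is, and whether it exposes __version__ as most releases do; MMSplice is known to work with older kipoiseq releases (see the "Breaks with latest versions of kipoiseq" issue below, which pins kipoiseq==0.2.5):

# Check that kipoiseq is importable and which version is installed,
# then try the exact submodule that fails in the traceback above.
import importlib

kipoiseq = importlib.import_module('kipoiseq')
print('kipoiseq version:', kipoiseq.__version__)

try:
    importlib.import_module('kipoiseq.transforms.functional')
    print('kipoiseq.transforms.functional imports fine')
except ImportError as err:
    print('import failed:', err)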

Package dependencies don't overlap & trouble with google collab script

  • MMSplice version: 2.2.0
  • Python version: Python 3.8.8
  • Operating System: Windows Subsystem for Linux/Google Collab

Description

Describe what you were trying to get done.

Setting up the app with

pip install cyvcf2 cython
pip install mmsplice

Tell us what happened, what went wrong, and what you expected to happen:

I got the following error messages about package dependencies:

ERROR: mmsplice 2.2.0 has requirement numpy==1.18.5, but you'll have numpy 1.21.3 which is incompatible.
ERROR: tensorflow 2.6.0 has requirement numpy~=1.19.2, but you'll have numpy 1.18.5 which is incompatible.
ERROR: kipoi 0.6.35 has requirement h5py==2.10.0, but you'll have h5py 3.1.0 which is incompatible.
ERROR: tensorflow 2.6.0 has requirement h5py~=3.1.0, but you'll have h5py 2.10.0 which is incompatible.

So the packages require conflicting versions of numpy and h5py, and hence the setup will not go through.

What I Did

To my understanding, the developers have to fix this under the hood.
So I tried to run MMSplice via the Google Colab link instead, and hit the following issue:

Describe what you were trying to get done.

Run the MMSplice Google Colab script using the provided example VCF.

Tell us what happened, what went wrong, and what you expected to happen:

When running the line:
predict_save(model, dl, output_csv, pathogenicity=True, splicing_efficiency=True)
I get the error (traceback at bottom):
ValueError: Input 0 of layer conv is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received: (32, 1)

What I Did

Regarding the error message, I believe the following could be solutions to the problem, but again they are changes only the authors can make (a standalone illustration of the shape mismatch follows these links):
https://stackoverflow.com/questions/57430717/input-0-of-layer-conv1d-1-is-incompatible-with-the-layer-expected-ndim-3-found
https://stackoverflow.com/questions/66718335/input-0-of-layer-conv1d-is-incompatible-with-the-layer-expected-min-ndim-3-f
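
For context, the ValueError only says that a 2-dimensional batch of shape (32, 1) reached a convolutional layer that expects a (batch, sequence_length, 4) one-hot input. A standalone Keras illustration of the same failure (plain TensorFlow, not MMSplice code; reproducing it does not by itself fix the Colab script):

# Reproduce the shape error outside MMSplice.
import numpy as np
import tensorflow as tf

conv = tf.keras.layers.Conv1D(filters=8, kernel_size=5, name='conv')
conv(np.zeros((32, 100, 4), dtype='float32'))  # (batch, length, channels): accepted
conv(np.zeros((32, 1), dtype='float32'))       # ndim=2: raises the ValueError quoted above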

Paste the command(s) you ran and the output (Google Colab script)

predict_save(model, dl, output_csv, pathogenicity=True, splicing_efficiency=True)

If there was a crash, please include the traceback here (Google Colab script)

0it [00:00, ?it/s]WARNING:tensorflow:Model was constructed with shape (None, None, 4) for input KerasTensor(type_spec=TensorSpec(shape=(None, None, 4), dtype=tf.float32, name='input_5'), name='input_5', description="created by layer 'input_5'"), but it was called on an input with incompatible shape (32, 1).
0it [00:00, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-81048a45f553> in <module>()
----> 1 predict_save(model, dl, output_csv, pathogenicity=True, splicing_efficiency=True)

14 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    992           except Exception as e:  # pylint:disable=broad-except
    993             if hasattr(e, "ag_error_metadata"):
--> 994               raise e.ag_error_metadata.to_exception(e)
    995             else:
    996               raise

ValueError: in user code:

    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:1586 predict_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:1576 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:1286 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2849 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3632 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:1569 run_step  **
        outputs = model.predict_step(data)
    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:1537 predict_step
        return self(x, training=False)
    /usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py:1037 __call__
        outputs = call_fn(inputs, *args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/keras/engine/functional.py:415 call
        inputs, training=training, mask=mask)
    /usr/local/lib/python3.7/dist-packages/keras/engine/functional.py:550 _run_internal_graph
        outputs = node.layer(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py:1020 __call__
        input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
    /usr/local/lib/python3.7/dist-packages/keras/engine/input_spec.py:234 assert_input_compatibility
        str(tuple(shape)))

    ValueError: Input 0 of layer conv is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received: (32, 1)

PSA
For future reference, these are the packages that have to be installed or imported manually when running MMSplice 2.2.0 in the Google Colab script (a quick version check follows this list):

!pip install --upgrade tensorflow
!pip install --upgrade numpy
from tensorflow import keras 
from tensorflow.keras.models import load_model
from tensorflow.keras import models
from tensorflow.keras import backend
from tensorflow.keras import metrics as metrics_module
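
A quick sanity check of the versions behind those conflicts (a sketch only; the pins in the comments are taken from the error messages earlier in this issue):

# Print the versions involved in the reported dependency conflicts.
import numpy, h5py, tensorflow
print('numpy      ', numpy.__version__)       # mmsplice 2.2.0 pins numpy==1.18.5
print('h5py       ', h5py.__version__)        # kipoi 0.6.35 pins h5py==2.10.0
print('tensorflow ', tensorflow.__version__)  # TF 2.6.0 wants numpy~=1.19.2 and h5py~=3.1.0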

Can MMSplice be used for mouse or other non-human organisms?

  • MMSplice version:
  • Python version:
  • Operating System:

Description

Describe what you were trying to get done.
Tell us what happened, what went wrong, and what you expected to happen.

What I Did

Paste the command(s) you ran and the output.
If there was a crash, please include the traceback here.

VEP plugin with PYTHON error

  • MMSplice version: master (git clone from GitHub)
  • Python version: 3.5
  • Operating System: Docker image

Description

I added MMSplice to the VEP Docker image and ran it with the command below, but it fails.

I entered the Docker container:

docker run -it -v `pwd`:`pwd` mmsplicedocker bash

What I Did

 vep -i test.vcf --plugin MMSplice --vcf --force  --offline --cache --dir path-to/VEP/ -o test2.vcf --fasta path-to/human_g1k_v37.fasta

OUTPUT

/usr/local/lib/python3.6/dist-packages/concise/utils/plot.py:115: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.
  min_coords = np.vstack(data.min(0) for data in polygons_data).min(0)
/usr/local/lib/python3.6/dist-packages/concise/utils/plot.py:116: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.
  max_coords = np.vstack(data.max(0) for data in polygons_data).max(0)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-07-26 02:15:11,327 [WARNING] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-07-26 02:15:11.429590: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-07-26 02:15:11.815407: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2294595000 Hz
2019-07-26 02:15:12.031385: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x34b7bd0 executing computations on platform Host. Devices:
2019-07-26 02:15:12.031435: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
2019-07-26 02:15:13,066 [WARNING] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-07-26 02:15:13,494 [WARNING] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-07-26 02:15:18,974 [WARNING] None GT donor: 
2019-07-26 02:15:18,975 [WARNING] None AG acceptor: 
Use of uninitialized value in subtraction (-) at
	/opt/vep/src/Plugins/MMSplice.pm line 232, <__ANONIO__> line 200 (#1)
    (W uninitialized) An undefined value was used as if it were already
    defined.  It was interpreted as a "" or a 0, but maybe it was a mistake.
    To suppress this warning assign a defined value to your variables.
    
    To help you figure out what was undefined, perl will try to tell you
    the name of the variable (if any) that was undefined.  In some cases
    it cannot do this, so it also tells you what operation you used the
    undefined value in.  Note, however, that perl optimizes your program
    and the operation displayed in the warning may not necessarily appear
    literally in your program.  For example, "that $foo" is usually
    optimized into "that " . $foo, and the warning will refer to the
    concatenation (.) operator, even though there is no . in
    your program.
    
Use of uninitialized value in addition (+) at /opt/vep/src/Plugins/MMSplice.pm
	line 232, <__ANONIO__> line 200 (#1)
Use of uninitialized value in subtraction (-) at
	/opt/vep/src/Plugins/MMSplice.pm line 216, <__ANONIO__> line 200 (#1)
Use of uninitialized value in addition (+) at /opt/vep/src/Plugins/MMSplice.pm
	line 217, <__ANONIO__> line 200 (#1)
Use of uninitialized value in concatenation (.) or string at
	/opt/vep/src/Plugins/MMSplice.pm line 340, <__ANONIO__> line 200 (#1)
Traceback (most recent call last):
  File "/usr/local/bin/mmsplice", line 11, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmsplice/main.py", line 32, in run
    variant = json.loads(sys.stdin.readline().strip())
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 17 (char 16)
Use of uninitialized value $result in chomp at /opt/vep/src/Plugins/MMSplice.pm
	line 115, <GEN1> line 54 (#1)
Use of uninitialized value $result in string eq at
	/opt/vep/src/Plugins/MMSplice.pm line 116, <GEN1> line 54 (#1)
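
The Python traceback above fails inside mmsplice/main.py while parsing a line of JSON that the Perl plugin writes to the subprocess's stdin, and the uninitialized-value warnings suggest the Perl side built an incomplete record. A small debugging sketch (not part of MMSplice) that mimics that read loop and echoes whichever payload fails to parse:

# Mimic the stdin read loop from mmsplice/main.py and show the bad payload.
import json
import sys

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        variant = json.loads(line)
    except json.JSONDecodeError as err:
        print('bad JSON at char %d: %r' % (err.pos, line), file=sys.stderr)
    else:
        print('parsed OK')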

Error installing MMSplice

  • MMSplice version: 1.0.3
  • Python version: 3.5.1
  • Operating System: RHEL 7.4

I'm trying to install MMSplice. I run the commands:

pip install cyvcf2 cython
pip install mmsplice

I get this error:
ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory

Conda package for MMsplice

It would be nice to have MMsplice as a single Conda package.
I attempted such an integration some days ago, but unfortunately I couldn't resolve all the issues with bioconda's CI tests, even though the wrapper was successfully built and tested locally.
If this could be something you would consider, the PR for MMsplice wrapper on bioconda can be found here: bioconda/bioconda-recipes#30160

Breaks with latest versions of kipoiseq

I assume you are aware of it since you are developing both in parallel, but with the current release of kipoiseq, mmsplice is unable to run.

Quick fix: pin the dependency to kipoiseq==0.2.5 (I haven't checked exactly which release breaks it).
Alternative: adjust vcf_dataloader.py L78 ff.

vep plugin error

  • MMSplice version: latest
  • Python version: 2.7.x
  • Operating System: Ubuntu 18

Description

I ran MMSplice with the VEP plugin script but got a "Can't locate DBD/mysql.pm" error.
The error message is below:

install_driver(mysql) failed: Can't locate DBD/mysql.pm in @inc (you may need to install the DBD::mysql module) (@inc contains: /home/bio/perl5/lib/perl5/ .) at (eval 37) line 3.
Perhaps the DBD::mysql perl module hasn't been fully installed,
or perhaps the capitalisation of 'mysql' isn't right.
Available drivers: DBM, ExampleP, File, Gofer, Mem, Proxy, Sponge.
at /home/bio/bioapps/ensembl-vep/Bio/EnsEMBL/Registry.pm line 1769.

I tried to install the DBD::mysql module with cpan, but it replied with this error message:

fatal error: xlocale.h: No such file or directory

What I Did

./vep -i vcf_file.vcf --plugin MMSplice --vcf --force --assembly GRCh37 --cache --port 3337

cpan DBD::mysql

"stop iteration" trouble

Hi, I'm trying to run MMSplice using the Google Colab notebook online. For some reason, I wasn't able to install MMSplice from Kipoi directly on my computer, maybe because I have Windows rather than Linux; I got lots of errors I just couldn't deal with. Anyway, I tried to run the model online to assess my VCF, and in this part:

predict_save(model, dl, output_csv, pathogenicity=True, splicing_efficiency=True)

I got this error:

StopIteration Traceback (most recent call last)
<ipython-input-…> in <module>()
----> 1 predict_save(model, dl, output_csv, pathogenicity=True, splicing_efficiency=True)

1 frames
/usr/local/lib/python3.7/dist-packages/mmsplice/mmsplice.py in predict_save(model, dataloader, output_csv, batch_size, progress, pathogenicity, splicing_efficiency)
272 splicing_efficiency=splicing_efficiency)
273
--> 274 return df_batch_writer(df_iter, output_csv)
275
276

/usr/local/lib/python3.7/dist-packages/mmsplice/utils.py in df_batch_writer(df_iter, output)
41
42 def df_batch_writer(df_iter, output):
---> 43 df = next(df_iter)
44 with open(output, 'w') as f:
45 df.to_csv(f, index=False)

StopIteration:

Obviously, I can't run the next lines because something is missing there. I think maybe this is a bug, but I'm not sure (I'm not an expert in programming).
Would you advise me to try installing MMSplice on another computer with Linux instead of running the Google Colab?
I will be very grateful if you can help me solve this issue, and if someone else got the same error, please let me know too.

--Romina--
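
A StopIteration from df_batch_writer means the prediction generator produced no batches at all, i.e. none of the variants in the VCF overlapped an exon from the GTF; a common cause is mismatched chromosome naming between the files ('chr1' vs '1'). A small check you could run before predicting (a sketch only; dl is the SplicingVCFDataloader built earlier in the notebook, and iterating it consumes it, so re-create it afterwards):

# Count what the dataloader actually yields; 0 means no variant/exon overlap,
# so check that the VCF, GTF and FASTA use the same chromosome names.
n_items = sum(1 for _ in dl)
print('dataloader yielded', n_items, 'items')
# dl is now exhausted; build it again before calling predict_save.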

Error when split_seq=False in dataloader

  • MMSplice version: 1.0.1
  • Python version:
  • Operating System:

Description

If split_seq=False is set in the dataloader, model prediction gives an error. It would be better to provide a more informative error message.
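
For anyone hitting this, a minimal sketch of the two calls using the current Python API (file paths are placeholders; split_seq is assumed to be a keyword argument of SplicingVCFDataloader, as in this report, with the default setting being the one that works):

# Placeholder paths; only the split_seq flag differs between the two loaders.
from mmsplice.vcf_dataloader import SplicingVCFDataloader
from mmsplice import MMSplice, predict_save

model = MMSplice()

dl = SplicingVCFDataloader('anno.gtf', 'genome.fa', 'variants.vcf', split_seq=False)
# predict_save(model, dl, 'out.csv')  # fails: the modules expect split sequences

dl = SplicingVCFDataloader('anno.gtf', 'genome.fa', 'variants.vcf')  # default split_seq
predict_save(model, dl, 'out.csv')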

VEP plugin writes Tensorflow startup messages to stdout

I am running the VEP MMSplice plugin with --output_file STDOUT to pipe the results. However, the TensorFlow startup messages also get forwarded and end up in the pipe. They can be removed with | grep -v tensorflow, but that's obviously not ideal.

ValueError: numpy.ndarray size changed, may indicate binary incompatibility.

When I tried to use the MMSplice plugin, it failed with the error message below.

The command I ran and the resulting output:
vep -i ~/data/in.vcf.gz --vcf --no_stats -o ~/data/out.vcf --cache --plugin MMSplice
2022-04-12 17:44:48.311714: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-04-12 17:44:48.311754: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "/home/tomas/anaconda3/bin/mmsplice", line 5, in <module>
    from mmsplice.main import cli
  File "/home/tomas/anaconda3/lib/python3.8/site-packages/mmsplice/__init__.py", line 8, in <module>
    from mmsplice.mmsplice import MMSplice, \
  File "/home/tomas/anaconda3/lib/python3.8/site-packages/mmsplice/mmsplice.py", line 10, in <module>
    from mmsplice.utils import logit, predict_deltaLogitPsi, \
  File "/home/tomas/anaconda3/lib/python3.8/site-packages/mmsplice/utils.py", line 4, in <module>
    from kipoiseq.dataclasses import Variant, Interval
  File "/home/tomas/anaconda3/lib/python3.8/site-packages/kipoiseq/__init__.py", line 11, in <module>
    from . import dataloaders
  File "/home/tomas/anaconda3/lib/python3.8/site-packages/kipoiseq/dataloaders/__init__.py", line 1, in <module>
    from .sequence import *
  File "/home/tomas/anaconda3/lib/python3.8/site-packages/kipoiseq/dataloaders/sequence.py", line 11, in <module>
    from kipoiseq.extractors import FastaStringExtractor
  File "/home/tomas/anaconda3/lib/python3.8/site-packages/kipoiseq/extractors/__init__.py", line 4, in <module>
    from .vcf import *
  File "/home/tomas/anaconda3/lib/python3.8/site-packages/kipoiseq/extractors/vcf.py", line 12, in <module>
    from cyvcf2 import VCF
  File "/home/tomas/anaconda3/lib/python3.8/site-packages/cyvcf2/__init__.py", line 1, in <module>
    from .cyvcf2 import (VCF, Variant, Writer, r_ as r_unphased, par_relatedness,
  File "cyvcf2/cyvcf2.pyx", line 1, in init cyvcf2.cyvcf2
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
Use of uninitialized value $result in chomp at
	/home/tomas/.vep/Plugins/MMSplice.pm line 117, <GEN1> line 25 (#1)
    (W uninitialized) An undefined value was used as if it were already
    defined.  It was interpreted as a "" or a 0, but maybe it was a mistake.
    To suppress this warning assign a defined value to your variables.
    
    To help you figure out what was undefined, perl will try to tell you
    the name of the variable (if any) that was undefined.  In some cases
    it cannot do this, so it also tells you what operation you used the
    undefined value in.  Note, however, that perl optimizes your program
    and the operation displayed in the warning may not necessarily appear
    literally in your program.  For example, "that $foo" is usually
    optimized into "that " . $foo, and the warning will refer to the
    concatenation (.) operator, even though there is no . in
    your program.
    
Use of uninitialized value $result in string eq at
	/home/tomas/.vep/Plugins/MMSplice.pm line 118, <GEN1> line 25 (#1)

What does this error suggest?
Could you please help me with it? Thanks
