admiralenola / scoary
Pan-genome wide association studies
License: GNU General Public License v3.0
Hello.
I am trying to use Scoary to generate a GWAS from my Roary pan genome. However, the code insists that my traits.csv is incorrectly formatted, and then crashes. At first it complained that the box in the top-left should be blank (as per your instructions). However, the box was always blank. Now it is trying to tell me that the strain names are not the same as in the Roary output. However, these names were taken straight from the output and placed into a list. The local bioinformatician and I have been trying to fix this all day, and the problem persists.
The newest error is that the program reports that a strain is missing from the genes file despite no other program having this issue, and the strain in the traits list is straight from the gene file. I can manually find it within seconds by looking at the header for both files.
I have been googling this issue and it does not seem to be a common occurrence. Any advice for this problem?
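When two files appear to contain identical strain names but the program disagrees, invisible characters (a trailing space, a byte-order mark, a stray carriage return) are one possible cause. The sketch below is not part of Scoary; the helper name `diff_names` and the example names are made up. It compares two name lists and reports mismatches through `repr()` so hidden characters become visible.

```python
def diff_names(trait_names, gene_names):
    """Return the names unique to each list, as repr() strings."""
    traits, genes = set(trait_names), set(gene_names)
    only_in_traits = sorted(repr(name) for name in traits - genes)
    only_in_genes = sorted(repr(name) for name in genes - traits)
    return only_in_traits, only_in_genes

# Hypothetical example: one trait name carries a trailing space
only_in_traits, only_in_genes = diff_names(
    ["strainA", "strainB "],   # names read from the traits file
    ["strainA", "strainB"],    # names read from the gene file header
)
print(only_in_traits)
print(only_in_genes)
```

If both lists come back empty, the names really are identical and the cause lies elsewhere.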
If the traits file contains only a subset of the genomes in the Roary file, Scoary currently exits with a KeyError.
If you want to run Scoary on just a subset of the genomes that you ran Roary on (You might be missing phenotypic data for some isolates for example), there are currently two ways of handling this:
Using --restrict_to, pointing to a csv file which lists only the genomes you want to include.
Editing the Roary file by column-wise deletion of the genomes you don't have in your traits file. (Scoary doesn't use the summary statistics in the first columns of the Roary file, so this will not impact the analysis.)
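The column-wise deletion could be scripted along these lines. This is a minimal sketch, not code from Scoary: the function name `subset_roary_columns` is made up, and the default of 14 leading non-isolate columns is an assumption about the Roary layout that should be checked against your own file.

```python
import csv
import io

def subset_roary_columns(rows, keep_isolates, n_meta=14):
    """Keep the first n_meta columns plus the isolate columns named in keep_isolates.

    n_meta is an assumption about how many non-isolate columns the file has.
    """
    rows = iter(rows)
    header = next(rows)
    keep = [i for i, name in enumerate(header) if i < n_meta or name in keep_isolates]
    yield [header[i] for i in keep]
    for row in rows:
        yield [row[i] for i in keep]

# Tiny in-memory example with 3 leading columns instead of a full Roary file
demo = io.StringIO(
    "Gene,Non-unique Gene name,Annotation,iso1,iso2,iso3\n"
    "geneA,,hypothetical protein,a_1,,a_3\n"
)
subset = list(subset_roary_columns(csv.reader(demo), {"iso1", "iso3"}, n_meta=3))
for line in subset:
    print(line)
```

For a real file, pass `csv.reader(open(path, newline=""))` as `rows` and write the result with `csv.writer`.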
Plan:
Although not seen in Roary files, genetic data can also have missing components. Implement handling of this.
Hi @AdmiralenOla, thanks so much for your awesome tool!
I am getting an error and I think it's related to the fact that I have missing data in my matrix. I have it recorded as "NA". I don't want to code it as "0" because the trait is not known to be absent; I simply didn't collect that trait for that strain.
Is there a better way for me to code missing data in the traits.csv file?
Thanks so much! ~ Josh
There is a bug that can sometimes cause the empirical p-values from permutations to become > 1.0 (Maximum 1.03).
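For reference, one common estimator of an empirical p-value is (r + 1) / (n + 1), where r is the number of permuted statistics at least as extreme as the observed one and n is the number of permutations. The sketch below illustrates that estimator only; it is not Scoary's permutation code, and the name `empirical_p` is made up. Because r can never exceed n, the result cannot exceed 1.0.

```python
def empirical_p(observed, permuted):
    """Empirical p-value (r + 1) / (n + 1); always in the range (0, 1]."""
    r = sum(1 for statistic in permuted if statistic >= observed)
    return (r + 1) / (len(permuted) + 1)

print(empirical_p(5, [1, 2, 3, 4]))   # observed beats every permutation
print(empirical_p(0, [1, 2, 3, 4]))   # every permutation is at least as extreme
```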
Hi!
I would like to run Scoary on our cluster.
Is it possible to run it on a cluster?
Do you know whether it supports MPI?
Regards
Hi,
This is my first time using Scoary. I'm on version 1.3.3. I seem to be having trouble proceeding with the run because of the Roary file. I didn't change anything in the gene_presence_absence.csv; it's just as Roary produced it.
Here is the error it throws:
Warning: Could not properly detect the correct names for all columns in the ROARY table.
Traceback (most recent call last):
File "scoary.py", line 22, in
methods.main()
File "/home/adgm/Scoary/scoary/methods.py", line 117, in main
allowed_isolates=allowed_isolates)
File "/home/adgm/Scoary/scoary/methods.py", line 218, in Csv_to_dic_Roary
r[q[genecol]] = {"Non-unique Gene name": q[nugcol], "Annotation": q[anncol]} if roaryfile else {}
IndexError: list index out of range
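Another report on this page attributes a similar "list index out of range" error to a Roary file that was delimited with semicolons; whether that applies here is not known. A quick way to check which delimiter a header line uses is sketched below. The helper `guess_delimiter` is hypothetical and simply counts candidate characters, so it can be fooled by quoted fields that contain commas.

```python
def guess_delimiter(header_line, candidates=(",", ";", "\t")):
    """Return the candidate character that occurs most often in the header line."""
    counts = {delimiter: header_line.count(delimiter) for delimiter in candidates}
    return max(counts, key=counts.get)

print(guess_delimiter('"Gene";"Non-unique Gene name";"Annotation";"No. isolates"'))
print(guess_delimiter('"Gene","Non-unique Gene name","Annotation","No. isolates"'))
```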
Should be fairly simple to allow internal node labels in the newick file. Although this does not impact results, there is no reason for the program to crash.
I am doing a Scoary test with a 5,829 genome Roary file (~250 Mb) and a custom tree. It works fine in the beginning, but crashes (out of memory?) when storing the pairs. The server I use is Ubuntu 14.04 LTS (Biolinux 8) with 4 core-Xeon processor (8 threads) and 32 Gb RAM and 32 Gb swap.
Is there a maximum to the number of genomes for Scoary?
There is a strange bug where Scoary will sometimes crash with the following message:
TypeError: unsupported operand type(s) for +: 'long' and 'numpy.float64'
I don't know why only some systems are seeing this error. In fact, addition of long and numpy.float64 does not throw a TypeError on my 1.11.1 or 1.11.2 versions, but it does on some other systems.
Is it possible to use piggy output (IGR_presence_absence.csv) as input for Scoary?
I've used Scoary to decipher COGs that might have different associations between Host Species, and everything worked like a charm. But now, I'm unsure of what columns I should use to extract the best observations. Sensitivity, Specificity, Odds ratio and the multiple p values that were outputted vary in interpretation (High Odds ratio, Low Sensitivity). If I want to prune the results, which column should I care the most about and filter?
Hi,
I am running Scoary with a manually created k-mer file. While the job finishes OK, the output file is empty, and I don't see any errors. I have 60 different datasets for which the same occurs, so I believe I am either misusing the options, not properly formatting the k-mer file, or exceeding the array.
I used Scoary 1.6.16 with Python 3.6 installed using conda, and the command I use is:
scoary -s 2 -t gr1.csv -g gr1_matrix_kmer.csv --threads 16 \
-o scoary_out_gr1 --delimiter ',' -c I EPW
"gr1.csv" has 1077 rows and looks like:
,serovar_phenotype
DRR106950,0
ERR023784,0
while "gr1_matrix_kmer.csv" has 1077 columns and 718522 rows, where the first column is the k-mer, and remaining columns are the samples (thus the -s 2 option).
I would really appreciate your input on this. If you need any additional information, or you have any ideas why I am not getting any output, please let me know.
Thank you,
Natasha
Hi!
This is probably my issue but I've noticed a few things that are a bit confusing.
I'm running scoary for 535 genomes and have made sure that all genomeIDs in my custom newick file match the genomeIDs in the gene presence/absence file but scoary reports that they don't match?
Reading custom tree file
CRITICAL:
Traceback (most recent call last):
File "/Users/matt/bin/miniconda3/lib/python3.6/site-packages/scoary/methods.py", line 246, in main
sys.exit("CRITICAL: Please make sure that isolates in "
SystemExit: CRITICAL: Please make sure that isolates in your custom tree match those in your gene presence absence file.
CRITICAL: Please make sure that isolates in your custom tree match those in your gene presence absence file.
I checked that the IDs do match in a number of ways, one of which was to run without and have scoary output its own newick file. I noticed that scoary outputs the 'inference' column header as a leaf on the tree (so +1 leaves). I deleted this column and then scoary runs with my custom tree with no issues but for n = 534 genomes?
Thanks in advance for any help with this!
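A mismatch like the extra 'inference' leaf can be spotted by comparing the tree's leaf names with the isolate columns of the gene file. The sketch below is an illustration, not Scoary code: `newick_leaf_names` is a made-up helper that uses a simple regular expression, so it assumes unquoted leaf names without spaces, and the example tree and column list are invented.

```python
import re

def newick_leaf_names(newick):
    """Extract leaf names; in Newick a leaf label directly follows '(' or ','."""
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)

tree = "((A:0.1,B:0.2):0.3,C:0.4);"
gene_columns = ["A", "B", "C", "Inference"]  # made-up isolate columns from the gene file

leaves = set(newick_leaf_names(tree))
not_in_tree = sorted(set(gene_columns) - leaves)
not_in_genes = sorted(leaves - set(gene_columns))
print("columns missing from the tree:", not_in_tree)
print("leaves missing from the gene file:", not_in_genes)
```

Any column that shows up in the first list is being treated as an isolate even though the tree does not contain it.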
An issue for placing images to be linked in the readme.
Isolate_tree_pop_structure_highlysignificant_26May.pdf
Isolate_tree_pop_structure_notsignificant_26May.pdf
Remove enforcing of the columns "Non-unique gene name" and "Annotation" in the output. Some users might have input file with only a single identifier column (Gene ID) before sample info starts, and wants to run with -s 2.
In the current version, this will cause Scoary to fill in the "Non-unique Gene name" and "Annotation" columns with sample data. (Because it automatically assumes that this info can be found in columns 2 and 3). There is really no need to enforce any other columns than Gene ID.
/var/spool/gridengine/execd/cu17/job_scripts/371921: line 11: 9884 Killed /gluster/home/yangtao/miniconda3/bin/scoary -g /gluster/home/yangtao/zhihe_seq/velvet/contigs/prokka/b/chrom/gene_presence_absence1.csv -t /gluster/home/yangtao/zhihe_seq/velvet/contigs/prokka/b/chrom/trait1.csv
there is no result
I am trying to run Scoary on dataset of 26 strains and 6927 genes. The program begins executing until a point where it is killed and returns a segmentation error. Does this mean I do not have sufficient memory to run the analysis?
Do unequal sample sizes (strain counts) per host affect the enrichment analysis (using the --no_pairwise flag)?
I am trying to run scoary on roary output from 3100 samples. I get the following error
RuntimeError: maximum recursion depth exceeded while calling a Python object
I tried both python versions 2.7 and 3.5 but the error remains the same
Hi!
I was having RAM problems running Scoary. I was able to solve them by using one traits.csv file per trait instead of a single traits.csv file with all the different traits. However, I am not sure whether this changes the statistical calculations. My purpose is to find bacterial proteins associated with the isolation source of the bacteria, so I am following an approach similar to the one explained with "cattle, human, sheep and food" on the main page of Scoary on GitHub ("Enrichment of genes in select host groups"). So, the question is:
Can I use a single csv file per trait and run the process different times instead of a unique csv file with all the traits?
Thank you very much in advance
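Splitting a multi-trait table into one table per trait can be scripted. The following is a minimal sketch with a made-up helper name (`split_traits`) and invented example data; it only reshapes the table and says nothing about how Scoary's statistics behave across separate runs.

```python
import csv
import io

def split_traits(rows):
    """Yield (trait_name, table) pairs, each table holding the name column and one trait."""
    rows = list(rows)
    header, body = rows[0], rows[1:]
    for col in range(1, len(header)):
        table = [[header[0], header[col]]] + [[row[0], row[col]] for row in body]
        yield header[col], table

demo = io.StringIO(",cattle,human\nstrain1,1,0\nstrain2,0,1\n")
per_trait = dict(split_traits(csv.reader(demo)))
for trait, table in per_trait.items():
    print(trait, table)
```

Each table can then be written to its own file with `csv.writer` and passed to a separate run.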
Hello,
I ran scoary with "-p 1.0" option to obtain p-values for all genes, but a part of genes in gene_presence_absence.csv were not reported.
Could you please tell me why scoary does not output information for all genes even when I used "-p 1.0"?
As originally reported by @dutchscientist in #53 , Scoary currently throws a
RuntimeError: maximum recursion depth exceeded while calling a Python object
when attempting to perform pairwise comparisons on too large a dataset. The exact threshold is unknown to me. I have run it with ~3500 isolates. This error was reported with ~5800.
Full message:
Storing results: ST45
Calculating max number of contrasting pairs for each nominally significant gene
100.00%Traceback (most recent call last):
File "/usr/local/bin/scoary", line 11, in
load_entry_point('scoary==1.6.9', 'console_scripts', 'scoary')()
File "/usr/local/lib/python2.7/dist-packages/scoary-1.6.9-py2.7.egg/scoary/methods.py", line 244, in main
delimiter=args.delimiter)
File "/usr/local/lib/python2.7/dist-packages/scoary-1.6.9-py2.7.egg/scoary/methods.py", line 813, in StoreResults
num_threads, no_time, delimiter)
File "/usr/local/lib/python2.7/dist-packages/scoary-1.6.9-py2.7.egg/scoary/methods.py", line 920, in StoreTraitResult
Threadresults = list(Threadresults)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 668, in next
raise value
RuntimeError: maximum recursion depth exceeded while calling a Python object
I've been made aware of a bug that sometimes occurs when pruning many isolates from the phylogenetic tree calculated internally. The issue occurs when the following subtree is encountered, and X-es are isolates to prune:
---- [x]
----| ---- [x]
---- |
---- [x]
(Excuse my horrible ASCII drawing)
Hi,
I want to build a Debian package, which usually includes running the test suite. Unfortunately the result file Tetracycline_resistance.results.csv is not part of the download tarball, so the test suite fails.
Kind regards, Andreas.
Hi
I was testing your script and I noticed that it seems to be printing the same value for the Holm-Sidak_p and Benjamini_H_p for all genes.
Best regards
C Mendes
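As a point of comparison, a textbook Benjamini-Hochberg adjustment can be written in a few lines. This is a sketch of the standard procedure, not Scoary's implementation, and the function name is made up. Note that the procedure's monotonicity step legitimately gives neighbouring genes the same adjusted value, although it should not make every gene identical unless the raw p-values warrant it.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest p to largest
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```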
Hi,
Nice idea for utilising Roary output to do GWAS, but I am a little concerned that people are going to be using this and reporting results when it's not appropriate to do so.
For example, you have already pointed out that Fisher's test is just not appropriate for population structure reasons. To an untrained eye P-values are good, ergo there is good evidence for the hypothesis, when in actual fact that is not an appropriate statistic to apply.
It's a neat attempt to deal with this using non-intersecting contrasting pairs, although it would be nice if you had given a definition of what that actually is. You are still faced with the same problem (perhaps worse) of not really dealing with population structure, and of applying a test that hinges on independence of trials. Plus, you have picked p=0.5 for the binomial distribution, but do not justify it! Is it appropriate for all kinds of trees? Is it species specific? Why not 0.3657849, or 0.6903453?
Obviously I wish you luck and hope you find a good solution for dealing with population structure, but for people who are naive about statistics and looking for low P values this is a dangerous tool. It should be made clear that they should talk to a statistician/bioinformatician if they don't know what they are doing and just want to apply it to their Roary analysis.
I hope you would add python bindings to existing tools for doing this from Roary output:
https://github.com/jessiewu/bacterialGWAS
and
https://github.com/sgearle/bugwas
are from the same paper.
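To make the binomial test with p = 0.5 questioned above concrete: if n contrasting pairs are found and k of them support the association, a two-sided binomial test asks how surprising a split at least that lopsided would be if each pair were a fair coin flip. The sketch below is a generic illustration with a made-up function name, not Scoary's code, and it takes the independence of pairs for granted, which is exactly the assumption under discussion.

```python
from math import comb

def binomial_two_sided(k, n):
    """Two-sided binomial test p-value for k successes in n trials at p = 0.5."""
    extreme = max(k, n - k)
    # Probability of a split at least this lopsided in one direction
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    # The distribution is symmetric at p = 0.5, so double the tail and cap at 1
    return min(1.0, 2 * tail)

print(binomial_two_sided(9, 10))
print(binomial_two_sided(5, 10))
```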
@AdmiralenOla does not splitting paralogs (-s in roary) affect Scoary results?
Hi,
In issue #19 you confirm compatibility with Python3. I'd recommend officially switching to Python3, since Python2 is EOL and distributions will stop distributing it soon.
Kind regards, Andreas.
The GUI currently only uses one thread. I was getting the following error
fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
at the multiprocess stage of methods.py. I'm not sure what's causing it, but everything works fine when running with a single thread. (In fact when running with a single thread the multiprocessing.Pool object is never initiated.)
Hi,
I was wondering if there is any way to incorporate the output from Scoary into phandango for visualization. As far as I understand, phandango requires a Manhattan plot format for GWAS data: https://github.com/jameshadfield/phandango/wiki/Input%20data%20formats#manhattan-plots
Any idea on how to proceed?
Thanks!!!
Hi,
I was using scoary for ~3,600 isolates to test trait association on ~22,000 genes; however, even when I specify -p 1.0, scoary only reports ~3,100 genes in the results rather than the complete set of ~22,000 genes. The analysis ran to completion without errors as well.
Here's the log file content:
08/13/2020 10:34:24 AM ==== Scoary started ====
08/13/2020 10:34:24 AM Command: /home/jimmy.liu/.conda/envs/scoary-1.6.16/bin/scoary --threads 32 -g /scratch/jimmy.liu/reference_structure_chewbbaca_res_2020/cluster_157_gwas/allelic_presence_roary.csv -t /scratch/jimmy.liu/reference_structure_chewbbaca_res_2020/cluster_157_gwas/Cluster_157_subset_metadata.csv -o /scratch/jimmy.liu/reference_structure_chewbbaca_res_2020/cluster_157_gwas/ -p 1.0 -m 22416
08/13/2020 10:34:24 AM Reading gene presence absence file
08/13/2020 10:34:49 AM Creating Hamming distance matrix based on gene presence/absence
08/13/2020 10:36:30 AM Building UPGMA tree from distance matrix
08/13/2020 10:38:34 AM Reading traits file
08/13/2020 10:38:34 AM Finished loading files into memory.
08/13/2020 10:38:34 AM ==== Performing statistics ====
08/13/2020 10:38:34 AM -- Filtration options --
08/13/2020 10:38:34 AM Individual (Naive): 1.0
08/13/2020 10:38:34 AM Collapse genes: False
08/13/2020 10:38:34 AM Tallying genes and performing statistical analyses
08/13/2020 10:38:34 AM Gene-wise counting and Fisher's exact tests for trait: grp
08/13/2020 10:39:50 AM Adding p-values adjusted for testing multiple hypotheses
08/13/2020 10:39:50 AM Storing results: grp
08/13/2020 10:39:50 AM Calculating max number of contrasting pairs for each nominally significant gene
08/13/2020 10:41:04 AM Storing results to file
08/13/2020 10:41:04 AM
08/13/2020 10:41:04 AM ==== Finished ====
08/13/2020 10:41:04 AM Checked a total of 22416 genes for associations to 1 trait(s). Total time used: 399 seconds.
08/13/2020 10:41:04 AM No warnings were recorded.
You can find my data here:
Trait file: https://drive.google.com/file/d/18nj3zFWS5OWONIn1xZhM_Uht6siOY6-n/view?usp=sharing
Gene presence/absence file: https://drive.google.com/file/d/1pWaDezegBbhc06yTV2OoiMcr3Es6SeRj/view?usp=sharing
Cheers,
Jimmy
Dear developers,
I wonder if the input gene_presence_absence.csv file for Scoary should contain binary values (1 and 0) rather than Gene ID?
The example of the input file (https://raw.githubusercontent.com/AdmiralenOla/Scoary/master/scoary/exampledata/Gene_presence_absence.csv) contains binary values (1 and 0) indicating the presence and absence of each gene in each sample, like the gene_presence_absence.Rtab file with binary values (1 and 0) from Roary, rather than the gene_presence_absence.csv file with the Gene ID from Roary (https://github.com/haruosuz/mgsa/blob/master/roary/analysis/i95/gene_presence_absence.csv).
Hi,
I keep getting this error message when I ran Scoary on my Roary Output:
CRITICAL:
Traceback (most recent call last):
File "/miniconda3/envs/scoary/lib/python3.6/site-packages/scoary/methods.py", line 268, in main
strains)
File "miniconda3/envs/scoary/lib/python3.6/site-packages/scoary/methods.py", line 568, in Csv_to_dic
sys.exit("Make sure the top-left cell in the traits file "
SystemExit: Make sure the top-left cell in the traits file is either empty or 'Name'. Do not include empty rows
Make sure the top-left cell in the traits file is either empty or 'Name'. Do not include empty rows
This is my trait CSV file, and there seems to be no error in it.
Name,Abortive,Non_Abortive
B197.11581,0,1
B197.7887,0,1
B197.7889,0,1
B197.789,0,1
B197.7927,0,1
What could be the issue? Thanks
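The error message names two conditions: the top-left cell must be empty or 'Name', and there must be no empty rows. A file can look correct in a spreadsheet and still fail one of these, for example because of a trailing blank line or an invisible byte-order mark in front of 'Name' (the latter is an assumption about a possible cause, not something confirmed for this report). The sketch below uses a made-up helper, `check_traits_text`, to look for these problems in the raw text.

```python
import csv
import io

def check_traits_text(text):
    """Return a list of human-readable problems found in a traits CSV string."""
    problems = []
    if text.startswith("\ufeff"):
        problems.append("file starts with a byte-order mark (BOM)")
    rows = list(csv.reader(io.StringIO(text.lstrip("\ufeff"))))
    if rows and rows[0] and rows[0][0].strip() not in ("", "Name"):
        problems.append("top-left cell is %r" % rows[0][0])
    for number, row in enumerate(rows, start=1):
        if not any(cell.strip() for cell in row):
            problems.append("row %d is empty" % number)
    return problems

# Hypothetical file content with a BOM and a blank line at the end
print(check_traits_text("\ufeffName,Abortive,Non_Abortive\nB197.789,0,1\n\n"))
```

To test a real file, read it with `open(path, encoding="utf-8", newline="").read()` and pass the string in.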
How do I use multiple vcf files to create the Scoary csv to be used with the main script? In other words, if I start with multiple vcf files of different isolates mapped to the same reference, how can I get Scoary results (similar to starting from Roary)?
I am unable to upgrade my existing Scoary installation:
pip3 install --upgrade scoary
Collecting scoary
Using cached scoary-1.6.13.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/linuxbrew/pip-build-4n646sv8/scoary/setup.py", line 14, in <module>
long_description=readme(),
File "/tmp/linuxbrew/pip-build-4n646sv8/scoary/setup.py", line 7, in readme
with open('README_pypi.md') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'README_pypi.md'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/linuxbrew/pip-build-4n646sv8/scoary/
Hi, I'm new to using scoary and am running into an issue. Here is the full error that scoary gives me:
Traceback (most recent call last):
File "/home/hcm59/miniconda3/envs/scoary/bin/scoary", line 8, in <module>
sys.exit(main())
File "/home/hcm59/miniconda3/envs/scoary/lib/python3.9/site-packages/scoary/methods.py", line 278, in main
RES_and_GTC = Setup_results(genedic, traitsdic, args.collapse)
File "/home/hcm59/miniconda3/envs/scoary/lib/python3.9/site-packages/scoary/methods.py", line 914, in Setup_results
bh_c_p_v[s_p_v[len(s_p_v)-1][0]] = last_bh = s_p_v[len(s_p_v)-1][1]
IndexError: list index out of range
It seems to be working prior to this, but stops here and doesn't give any output files. I looked in the methods.py script but couldn't find anything obviously wrong.
My data are output from Roary, a phenotype file, both delimited with commas, and a Newick tree file from IQTree.
I found a previous issue that was similar (#23) but it looks like their problem was that their Roary file was delimited with semicolons, but I'm 99% sure mine is commas.
Any help is appreciated! I can send example files too.
Here's the script I used:
scoary -t /path/dog_verified_host_PhenoForScoary.csv \
-g /path/gene_presence_absence_roary.csv \
-o /path \
-n /path/core_gene_alignment.aln-gb.nw \
--delimiter , \
--permute 1000 --threads 10
I'm using scoary in a conda environment that I built on a Linux server. Here are some specifications:
# packages in environment at /home/hcm59/miniconda3/envs/scoary:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
argparse 1.4.0 pypi_0 pypi
ca-certificates 2021.4.13 h06a4308_1
certifi 2020.12.5 py39h06a4308_0
ete3 3.1.2 pypi_0 pypi
ld_impl_linux-64 2.33.1 h53a641e_7
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libstdcxx-ng 9.1.0 hdf63c60_0
ncurses 6.2 he6710b0_1
numpy 1.20.2 pypi_0 pypi
openssl 1.1.1k h27cfd23_0
pip 21.0.1 py39h06a4308_0
python 3.9.2 hdb3f193_0
readline 8.1 h27cfd23_0
scipy 1.6.2 pypi_0 pypi
scoary 1.6.16 pypi_0 pypi
setuptools 52.0.0 py39h06a4308_0
six 1.15.0 py39h06a4308_0
sqlite 3.35.4 hdfb4753_0
tk 8.6.10 hbc83047_0
tzdata 2020f h52ac0ba_0
wheel 0.36.2 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
Thanks!!
-Holly
Update: just found out we had used Panaroo, not Roary, so I will be looking into this and seeing if I can find a solution!!
Hi,
I want to run Scoary on 384 genomes, for which I have 5 antibiotic resistance phenotypes (A,B,C,D,E) and – obviously – the Roary results. However, the following error occurs (macOS 10.12.2, Python 2.7.13, Scoary 1.6.10 installed via pip)
==== Scoary started ====
Reading gene presence absence file
Creating Hamming distance matrix based on gene presence/absence
Building UPGMA tree from distance matrix
Reading traits file
WARNING: Some isolates have missing values for trait C. Missing-value isolates will not be counted in association analysis towards this trait.
ERROR: Some isolates in your gene presence absence file were not represented in your traits file. These will count as MISSING data and will not be included.
Finished loading files into memory.
==== Performing statistics ====
-- Filtration options --
Individual (Naive): 0.05
Collapse genes: False
Tallying genes and performing statistical analyses
Gene-wise counting and Fisher's exact tests for trait: C
0.00%Traceback (most recent call last):
File "/usr/local/bin/scoary", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/site-packages/scoary/methods.py", line 253, in main
RES_and_GTC = Setup_results(genedic, traitsdic, args.collapse)
File "/usr/local/lib/python2.7/site-packages/scoary/methods.py", line 715, in Setup_results
stats = Perform_statistics(traitsdic[trait], genedic[gene])
File "/usr/local/lib/python2.7/site-packages/scoary/methods.py", line 863, in Perform_statistics
if int(traits[t]) == 1 and genes[t] == 1:
ValueError: invalid literal for int() with base 10: ''
The log file ends with the step "Gene-wise counting and Fisher's exact tests"; no further information is given. I am aware of the WARNING (missing data for some traits) and ERROR (more isolates than phenotyping results, for now).
My traits.csv file looks like this:
,A,B,C,D,E
CH2500,1,0,0,0,1
CH2502,NA,0,0,NA,1
...
Cutting the traits.csv into 5 individual files produced 2/5 results, with 3 Scoary runs still failing with the same error message. Any idea what's going on and how to proceed?
Thank you.
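The ValueError shows that `int()` was handed an empty string, so at least one trait cell that reached the statistics step was blank rather than 0, 1 or NA. A small scan of the traits file can locate such cells. The sketch below uses a made-up helper and invented example data, and the set of allowed values is an assumption that should be adjusted to whatever missing-value codes your Scoary version documents.

```python
import csv
import io

def unexpected_trait_values(text, allowed=("0", "1", "NA")):
    """Return (isolate, trait, value) for every cell whose value is not in allowed."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    found = []
    for row in rows[1:]:
        for trait, value in zip(header[1:], row[1:]):
            if value.strip() not in allowed:
                found.append((row[0], trait, value))
    return found

# Hypothetical traits table in which one cell was left blank
demo = ",A,B,C\nCH2500,1,0,0\nCH2502,NA,0,\n"
issues = unexpected_trait_values(demo)
print(issues)
```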
From what I read, scoary is currently not able to work with non-binary traits.
I want to use scoary in order to determine the pangenomic differences between three apparent subspecies of my bacterium of interest. There appears to be a pretty strong signal, as the genomes cluster distinctly in a PCoA based on gene presence / absence data.
Specifically, I would like to find out which genes are differentially prevalent between the three clusters. Can I supply a trait file that has "dummy variables", something like the table below? My approach should work if Scoary simply removes those samples that have no information for a specific trait. What do you think about this?
Sample_name Comp_clust_1_2 Comp_clust_1_3 Comp_clust_2_3
member_cluster_1 0 0 NA/empty
member_cluster_1 0 0 NA/empty
...
...
member_cluster_2 1 NA/empty 0
member_cluster_2 1 NA/empty 0
...
...
member_cluster_3 NA/empty 1 1
member_cluster_3 NA/empty 1 1
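A table of this shape can be generated from a simple isolate-to-cluster mapping. The sketch below is only an illustration of the proposed coding scheme; the function name and sample names are made up, and it uses "NA" for isolates outside the pair being compared, on the assumption that Scoary skips such isolates for that trait.

```python
from itertools import combinations

def pairwise_traits(cluster_of):
    """Map each isolate to {trait_name: '0' | '1' | 'NA'} for every pair of clusters."""
    clusters = sorted(set(cluster_of.values()))
    pairs = list(combinations(clusters, 2))
    table = {}
    for isolate, cluster in cluster_of.items():
        row = {}
        for low, high in pairs:
            name = "Comp_clust_%s_%s" % (low, high)
            # 0 for the first cluster of the pair, 1 for the second, NA otherwise
            row[name] = "0" if cluster == low else "1" if cluster == high else "NA"
        table[isolate] = row
    return table

traits = pairwise_traits({"s1": 1, "s2": 2, "s3": 3})
for isolate, row in traits.items():
    print(isolate, row)
```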
Is it possible to use a self-created presence/absence data file? For example, for virulence genes detected with abricate?
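Other reports on this page describe running Scoary on custom matrices whose first column is the feature identifier and whose remaining columns are samples, using -s 2. Under that assumption, a presence/absence matrix could be built from per-isolate gene lists roughly as follows. The helper name, the "Gene" header label and the example data are all made up, and another entry on this page notes that some versions expect the "Non-unique Gene name" and "Annotation" columns as well.

```python
def presence_absence_rows(genes_by_isolate):
    """Yield a header row, then one row per gene with '1'/'0' for each isolate."""
    isolates = sorted(genes_by_isolate)
    genes = sorted(set().union(*genes_by_isolate.values()))
    yield ["Gene"] + isolates
    for gene in genes:
        yield [gene] + ["1" if gene in genes_by_isolate[name] else "0" for name in isolates]

# Hypothetical detection results: isolate name -> set of detected genes
matrix = list(presence_absence_rows({
    "iso1": {"stx1", "eae"},
    "iso2": {"eae"},
}))
for row in matrix:
    print(",".join(row))
```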
Hello, Ola,
After the run completed, I found that only genes associated with the trait were reported. How can I get the information for all genes when using the GUI, even though most of them were not significant?
Kind regards,
Lanhong
Like any proper bioinformatics software, Scoary should produce a log file.
I am looking at a particular trait (heat resistance) across a set of strains of a given species, and I was wondering if it would be possible to examine a continuous variation in the trait (normalized from 0-1?) or a discrete set of values (low, medium, high? preferably more = better).
Thanks
Getting the following error while running scoary.
Reading gene presence absence file
Traceback (most recent call last):
File "/home/ga23981/src/Scoary-master/scoary.py", line 25, in
methods.main()
File "/home/ga23981/src/Scoary-master/scoary/methods.py", line 215, in main
outdir=args.outdir)
File "/home/ga23981/src/Scoary-master/scoary/methods.py", line 352, in Csv_to_dic_Roary
header = next(csvfile)
_csv.Error: field larger than field limit (131072)
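This error comes from Python's `csv` module, which by default refuses fields longer than 131072 characters; the limit can be raised with `csv.field_size_limit`. The sketch below shows the call in isolation. It has to run in the same Python process that reads the file, so applying it to Scoary would mean editing or wrapping its code, which is an assumption about how one might work around the error rather than a documented option. The helper name is made up.

```python
import csv
import io
import sys

def raise_csv_field_limit():
    """Raise the csv field size limit as far as the platform allows; return the new limit."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # Some platforms use a smaller C long; back off until the value fits
            limit //= 10

new_limit = raise_csv_field_limit()

# A field of 200000 characters now parses without raising _csv.Error
long_row = next(csv.reader(io.StringIO("gene1," + "x" * 200000 + "\n")))
print(new_limit, len(long_row[1]))
```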
Hi Ola,
I'm working with a large number of single-cell amplified genomes, i.e. the individual assemblies are incomplete, ranging from ~30%-95% estimated completeness. This means that I do get reliable gene "presences", but "absences" can mean either true absence or just missed in the assembly.
I was wondering what your thoughts on this kind of data would be with respect to association testing. And do you think Scoary could be used / customized to analyze those data?
Cheers,
Thomas
Hi,
I have previously got scoary to work with roary output but can't get it to work with SNP output created with the SNP2vcf.py script.
My traits file is definitely formatted correctly (it works with roary input)
The SNP file looks OK but I get the error:
CRITICAL: Could not find 92dd9dbb-81ae-4faf-8867-4f27deef779f in the genes file.
CRITICAL:
Traceback (most recent call last):
File "/home/ndm.local/sam/CSOLD/dev/lib/python2.7/site-packages/scoary/methods.py", line 278, in main
RES_and_GTC = Setup_results(genedic, traitsdic, args.collapse)
File "/home/ndm.local/sam/CSOLD/dev/lib/python2.7/site-packages/scoary/methods.py", line 798, in Setup_results
stats = Perform_statistics(traitsdic[trait], genedic[gene])
File "/home/ndm.local/sam/CSOLD/dev/lib/python2.7/site-packages/scoary/methods.py", line 979, in Perform_statistics
sys.exit("Make sure strains are named the same in your "
SystemExit: Make sure strains are named the same in your traits file as in your gene presence/absence file
The vcf file definitely contains the isolate in question so I'm not sure what is going on?? Any ideas? (the names are definitely the same too!)
Scoary currently doesn't work with python3. It seems to freeze and eat up memory when trying to populate the quadtree with pairwise hamming distances.
Hi Ola,
I was giving the latest version a go (the ascii logo is very neat!) and noticed that the non-numeric fields in the output table are not quoted, which might cause problems when parsing the results, especially in the gene product field.
Example of a line which causes my parser to break:
group_3038,,outer membrane pore protein N, non-specific,3,12,3,332,50.0,96.511627907,27.6666666667,0.0011870360825,1.0,0.421032887975,3,3,1,0.125,0.5
(notice the "outer membrane pore protein N, non-specific" bit)
I'm hotfixing this issue by putting an empty string in the "Annotation" field of Roary's output, but I figured you might want to have a look into this potential issue.
Thanks a lot, Marco
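The effect of quoting can be shown with Python's own `csv` module: writing with `csv.QUOTE_NONNUMERIC` wraps every non-numeric field in quotes, so an annotation containing a comma survives a round trip. This is a standalone sketch using a shortened version of the line above, not a patch to Scoary's output code.

```python
import csv
import io

# Shortened example row; the third field contains a comma
row = ["group_3038", "", "outer membrane pore protein N, non-specific", 3, 12]

buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC).writerow(row)
written = buffer.getvalue()
print(written, end="")

# Reading the line back yields five fields again, with the annotation intact
parsed = next(csv.reader(io.StringIO(written)))
print(parsed)
```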
Amongst the list of filter options should be an option that allows users to only see results where: