psipred / fragfold_idp
Fragment based disorder simulation
License: Other
PSIPRED RELEASE NOTES
=====================

PSIPRED Version 4.0

By David Jones, January 2016

*** IMPORTANT *****************************************************

NCBI are now trying to move users to the new BLAST+ package. Please see the README file in the BLAST+ subdirectory for more information on PSIPRED's support for BLAST+. For now the preferred option is to stick with the classic BLAST package as the default. If the tar or rpm file you are downloading from NCBI has "+" in the filename, then you are downloading BLAST+ rather than BLAST.

*******************************************************************

Here are some very brief notes on using the PSIPRED V4 software.

PSIPRED is supplied in source code form - it must be compiled before it can be used. The code should compile on any ANSI C compiler e.g. the GNU C compiler.

Please see the LICENSE file for the license terms for the software. Basically it's free to anyone (including commercial users) as long as you don't want to sell the software or, for example, store the results obtained with it in a database and then try to sell the database. If you do wish to sell the software or use it in a commercial product, then please contact UCL Business (http://www.uclb.com).

PSIPRED is run via a tcsh shell script called "runpsipred" - this is a very simple script which you should be able to convert to Perl or whatever scripting language you like.

If your sequence does not have any homologues in the current data banks, then it is possible to run PSIPRED on a single sequence. In this case, PSIPRED is run via a tcsh shell script called "runpsipred_single". Unfortunately, like every other secondary structure prediction method, PSIPRED does not perform as well on single sequences. Any secondary structure prediction based on a single sequence should be considered as unreliable.

Before running PSIPRED, please check the runpsipred and runpsipred_single scripts to see if the path variables are set to wherever you have installed the program and data files. The default is to assume that the program is installed in the current directory - this is probably NOT what you want!

INSTALLATION
============

Firstly compile the software:

    tcsh% cd to-wherever-you-untarred-PSIPRED
    tcsh% cd src
    tcsh% make
    tcsh% make install

The executables will be placed in the PSIPRED bin directory.

You must also install the PSI-BLAST and Impala software from the NCBI toolkit, and also install appropriate sequence data banks.

The NCBI toolkit can be obtained from URL ftp://ftp.ncbi.nih.gov

PSI-BLAST executables can be obtained from ftp://ftp.ncbi.nih.gov/blast

EXAMPLE USAGE
=============

In this example the target sequence is called "example.fasta":

    tcsh% runpsipred example.fasta
    Running PSI-BLAST with sequence example.fasta ...
    Predicting secondary structure...
    Pass1 ...
    Pass2 ...
    Cleaning up ...
    Final output file: example.horiz
    Finished.

That's it - you can then look at the output:

    tcsh% more example.horiz

SPECIAL OPTIONS
===============

The psipass2 program has several special options which you can use if you wish. For example, the default command is as follows:

    psipass2 weights_p2.dat 1 1.0 1.0 output.ss2 input.ss > output.horiz

Arguments 2,3 & 4 are as follows:

Argument 2: No of filter iterations

This controls the amount of "smoothing" that is carried out on the final prediction. The recommended setting is 1, but it may be worth trying higher values to increase the level of smoothing.

Argument 3&4: Helix/Strand Decision constants

These options control the bias for helix (Arg3) and strand (Arg4) predictions. The default values are equal to 1.0, but if you know your protein is, for example, mostly comprised of beta strands then you can increase the bias towards beta strand prediction.

For example:

    psipass2 weights_p2.dat 1 1.0 1.3 output.ss2 input.ss > output.horiz

increases the bias towards beta strand prediction by approximately 30%.

SEQUENCE DATA BANK
==================

As of PSIPRED V4.0 onwards, we no longer believe it is necessary for the sequence data banks used with PSI-BLAST to be filtered to remove low-complexity regions, transmembrane regions, and coiled-coil segments. The search data bank can therefore be any large non-redundant protein sequence data bank, with UNIREF90 (http://www.uniprot.org/help/uniref) being the recommended one.

CHANGES FROM THE ORIGINAL PSIPRED
=================================

The following is a quick summary of the main changes since the original PSIPRED.

1. The program now makes use of PSI-BLAST binary checkpoint files (using the Impala program makemat) to reduce loss of precision when parsing the original ASCII position specific matrices.

2. By default the 1st pass uses an average of 3 different neural network weight sets - this improves prediction accuracy slightly.

3. In addition to the normal horizontal summary output format, the program now also produces a full table of results which shows the individual coil, helix, strand network outputs.

4. A one-line header is output at the start of the output files to allow THREADER (and other programs) to automatically recognise a PSIPRED prediction.

5. An experimental interface to BLAST+ has been added (V3.0). This will extract PSSM data directly from ASN.1 checkpoint files.

6. Minor formatting bugs in .horiz file output for very long sequences have now been fixed (V3.21).

7. Minor output bug loses singleton residue coil predictions fixed (V3.3).

8. V4.0 released: new neural network architectures.
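The psipass2 command line described under SPECIAL OPTIONS can also be assembled from a script. The following is only a sketch, assuming psipass2 is on the PATH; the function and parameter names (psipass2_command, run_psipass2, helix_bias, strand_bias) are hypothetical and not part of PSIPRED. The argument order follows the example command in the notes above.

```python
import subprocess


def psipass2_command(weights="weights_p2.dat", iterations=1,
                     helix_bias=1.0, strand_bias=1.0,
                     ss2_out="output.ss2", ss_in="input.ss"):
    """Build the psipass2 argument list in the order used in the release notes."""
    return ["psipass2", weights, str(iterations),
            str(helix_bias), str(strand_bias), ss2_out, ss_in]


def run_psipass2(horiz_out="output.horiz", **options):
    """Run psipass2 and redirect its standard output to the .horiz file."""
    with open(horiz_out, "w") as handle:
        subprocess.run(psipass2_command(**options), stdout=handle, check=True)


# Same arguments as the beta-strand example in the notes (strand bias 1.3)
print(" ".join(psipass2_command(strand_bias=1.3)))
```

Passing the arguments as a list avoids shell quoting problems; the output redirection is done by handing an open file to subprocess.run instead of using ">".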
After installing FFIDP anew (this time all seems to work!) I'm still having problems with runFFIDP.py
Running RMSD Ensemble Clustering
PDB file error!
RMSD Ensemble Clustering Non Zero Exit status: 255
When I run rmsdclust_RMSDarray manually (e.g. bin/rmsdclust_RMSDarray 2KJV.ens 2KJV_) it works fine.
python bin/runSeqAnalysis.py --input example_data/2KJV.pdb, if we're assuming the user should run it from the main FFIDP directory.
Got this error on psipred:
> /home/tkosciolek/test_ffidp/psipred/bin/psipass2: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/tkosciolek/test_ffidp/psipred/bin/psipass2)
but obviously it's not your fault that I've got GLIBC_2.12.
Otherwise it works OK.
Some paths seem to be a little bit funky. This is an example from the nfpar file generated by the script: /home/tkosciolek/test_ffidp/fragfold_idp/bin/../output/a2117b92-ee8a-11e6-96ec-6805ca3313b9.ffaln
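One possible way to tidy paths of this shape before writing them into generated files is os.path.normpath from the Python standard library. This is only a sketch: the helper name tidy_path and the example path are hypothetical, and whether the scripts should normalise paths at all is a design decision for the maintainers.

```python
import os.path


def tidy_path(path):
    """Collapse '..' components and repeated separators (purely lexical)."""
    return os.path.normpath(path)


# Hypothetical path with the same shape as the one in the nfpar file
raw = "/home/user/fragfold_idp/bin/../output//example.ffaln"
print(tidy_path(raw))
```

Note that normpath only rewrites the string; it does not check that the path exists and it can change the meaning of a path that goes through symbolic links.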
RMSDclust gives me an error:
> Running RMSD Ensemble Clustering
PDB file error!
RMSD Ensemble Clustering Non Zero Exit status: 255
but the program works fine when I run it manually.
JAVA (PFclust) also produces errors:
> Exception in thread "main" java.lang.UnsupportedClassVersionError: main/Main : Unsupported major.minor version 52.0
Again, that's probably not something we can do anything about.
pip install gives me an error:
Could not find a version that satisfies the requirement PyBrain==0.3.3 (from -r requirements.txt (line 6)) (from versions: 0.2.1, 0.3)
No matching distribution found for PyBrain==0.3.3 (from -r requirements.txt (line 6))
Works fine with version 0.3.
Shouldn't it read something along the lines of: "Switch to your Python2 environment and install ansible"?
OK (but initially I get an error saying there is no numpy and after that pip retries and installs biopython)
Is this really the only way? I pip installed ansible, so I need to find where it got installed first and then run this startup script from there. We should also change the path. Also, I'm not sure it's necessary if I do pip install
This time I got an error on unpacking BLAST+.
Which actually made me realize that you assume people are running this on a Linux machine. Which is fine, but it doesn't say so anywhere. And, for example, I tried to run it on a Mac, hence the error above (I guess).
paths.yml comments:
Why do some variables have dir in their name and some have path?
What are those: hhdb, hhdbversion, hhpdb, blastversion, dynamine_api_key?
What I'm getting at is that we should let users know that they may update some versions, i.e. hhdb, hhdbversion, hhpdb, but rather not blastversion.
Commented out section should probably read: leave commented, instead of uncommented, right?
install.yml in the ansible folder also has some hard-coded (but commented out) paths. Should they be removed?
The readme has an Outputs section that needs to be updated. Do you want to change that to describe the new csv files and what the numbers mean?
finish code in SlidingWindow.py script
I think that the runConsensus.py script is making a wrong assertion:
    if not glob.glob(args.ffidp_path+args.input_name+"/Dynamine_b_*"):
        print("Dynamine results are not available")
        exit(1)
There is a flag --dynamine_path which should be used here, instead of ffidp_path... also there is an indir flag, which adds even more confusion.
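A minimal sketch of what the check could look like if it used the directory given by --dynamine_path. It assumes the results live in <dynamine_path>/<input_name>/Dynamine_b_*, mirroring the glob pattern quoted above; the function name find_dynamine_results is hypothetical and not taken from runConsensus.py.

```python
import glob
import os
import tempfile


def find_dynamine_results(dynamine_path, input_name):
    """Return the first Dynamine_b_* entry for input_name, or raise if none exists."""
    pattern = os.path.join(dynamine_path, input_name, "Dynamine_b_*")
    matches = sorted(glob.glob(pattern))
    if not matches:
        # Fail with the pattern in the message instead of indexing an empty list
        raise FileNotFoundError("Dynamine results are not available: " + pattern)
    return matches[0]


# Demonstration on a throwaway directory (all names are made up)
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "2KJV"))
    try:
        find_dynamine_results(tmp, "2KJV")
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
    open(os.path.join(tmp, "2KJV", "Dynamine_b_demo"), "w").close()
    found = os.path.basename(find_dynamine_results(tmp, "2KJV"))
```

Doing the lookup once and raising when nothing matches also avoids a later "list index out of range" from taking element [0] of an empty glob result.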
Then, I got the PyBrain error, for which we already know the fix... but it gets even more wooly:
input file lengths do not match
Traceback (most recent call last):
  File "bin/runConsensus.py", line 411, in <module>
    out_fp=args.outdir+args.input_name+".consensus"
  File "bin/runConsensus.py", line 58, in run_network
    inp = network_input(ffidp_fp, dm_fp, ss_fp, aln_fp)
  File "bin/runConsensus.py", line 124, in network_input
    _feat = [dm[res], ff[res], ss[res][0], ss[res][1], ss[res][2]]
IndexError: index 0 is out of bounds for axis 0 with size 0
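The traceback suggests that one of the inputs was empty and that the script carried on after printing "input file lengths do not match". One way to stop earlier with a clearer message is sketched below; check_profiles and the keyword names are hypothetical, and nothing here is taken from the actual network_input code.

```python
def check_profiles(**profiles):
    """Check that all per-residue profiles are non-empty and equally long.

    Returns the common length, or raises ValueError naming the offending inputs.
    """
    lengths = {name: len(values) for name, values in profiles.items()}
    empty = sorted(name for name, n in lengths.items() if n == 0)
    if empty:
        raise ValueError("empty input profile(s): " + ", ".join(empty))
    if len(set(lengths.values())) > 1:
        detail = ", ".join(f"{name}={n}" for name, n in sorted(lengths.items()))
        raise ValueError("input file lengths do not match: " + detail)
    return next(iter(lengths.values()), 0)


# Made-up values standing in for the Dynamine, FFIDP and secondary structure inputs
n_residues = check_profiles(dm=[0.8, 0.7], ff=[1.2, 3.4], ss=[(1, 0, 0), (0, 1, 0)])
print(n_residues)
```

Calling a check like this before building the feature vectors would turn the IndexError into an error that names which input is empty or too short.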
module load and now PFClust works! But there are other errors downstream.
name.ens_RMSDarray, while file name is name_RMSDarray
> /home/tkosciolek/test_ffidp/fragfold_idp/bin/../output/output/a2117b92-ee8a-11e6-96ec-6805ca3313b9.ens_RMSDarray
Note output/output - there should be only one output folder.
OK, leaving this for now and moving to the next step.
For this step I switched to Python 2 environment. The script works fine, but the server is not responding properly (both using the script and interactively, through the website). Moving on.
Switched back to the Python 3 environment. The script is not working, because it expects some Dynamine_b_* files and I don't know what they are (the previous step never finished, so I copied 2KJV.dm from example_data to output).
> Traceback (most recent call last):
  File "bin/runConsensus.py", line 395, in <module>
    dynaResults = glob.glob(args.ffidp_path+args.input_name+"/Dynamine_b_*")[0]
IndexError: list index out of range
I ran the script from the main FFIDP directory:
python bin/RSEVAL.py -i tmp/2KJV.pdb_ens -j example_data/2KJV.cons
For some reason, even if the paths are specified correctly, I get an error:
Traceback (most recent call last):
  File "bin/RSEVAL.py", line 75, in <module>
    profile1 = read_profile(args.results_dir+"/"+args.i1)
  File "bin/RSEVAL.py", line 20, in read_profile
    with open(fp, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/tkosciolek/test_ffidp/fragfold_idp/bin/../output//tmp/2KJV.pdb_ens'
Seems like lines 75 and 76 (profile1 and profile2) have the results_dir argument, instead of reading i1 and i2 directly. Is there a reason for that?
When I change RSEVAL.py to:
profile1 = read_profile(args.i1)
profile2 = read_profile(args.i2)
the command on top works no problem.
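If results_dir is meant to stay as a convenience default, a middle ground would be to use the path as given when it is absolute or exists, and only otherwise look inside results_dir. This is a sketch of that idea, not the behaviour of RSEVAL.py; resolve_profile_path is a hypothetical helper and the paths in the example are made up.

```python
import os


def resolve_profile_path(path, results_dir):
    """Use path as given if it is absolute or exists; otherwise look in results_dir."""
    if os.path.isabs(path) or os.path.exists(path):
        return path
    return os.path.join(results_dir, path)


# A bare file name that does not exist locally falls back to the results directory
fallback = resolve_profile_path("no_such_profile.cons", "output")
print(fallback)
```

With a helper like this, both read_profile calls could take resolve_profile_path(args.i1, args.results_dir) and resolve_profile_path(args.i2, args.results_dir), so a command run from the main FFIDP directory with explicit paths would work as well.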
create runConsensus.py
5th step in the process. Takes Dynamine and FFIDP RMSD profile and builds consensus output