Don't apologise! It should have been more obvious in the documentation :)
I'm glad you got it working!
Reproduction
I can't reproduce this error. Using the following commands produces no errors for me:
cazy_webscraper <email> --families PL20 -o cazy_db
cw_get_uniprot_data cazy_db <email> --families --pdb --sequence --ec
The data was downloaded and inserted into the local CAZyme database correctly, which I checked using:
sqlite3 cazy_db.db "SELECT * FROM Uniprots"
sqlite3 cazy_db.db "SELECT * FROM Ecs"
sqlite3 cazy_db.db "SELECT * FROM Pdbs"
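The same check can be scripted. The sketch below relies only on the table names used in the queries above (Uniprots, Ecs, Pdbs) and counts rows with Python's standard sqlite3 module, so it makes no assumptions about column names. count_rows is a hypothetical helper, not part of cazy_webscraper.

```python
import sqlite3

# Table names taken from the sqlite3 queries above.
TABLES = ("Uniprots", "Ecs", "Pdbs")


def count_rows(db_path: str, tables=TABLES) -> dict:
    """Return {table_name: row_count} for each table in a local CAZyme database."""
    counts = {}
    # Using sqlite3's connection as a context manager does not close it,
    # so the connection is closed explicitly in the finally block.
    conn = sqlite3.connect(db_path)
    try:
        for table in tables:
            # Table names cannot be bound as SQL parameters, so they are
            # quoted into the statement; only pass trusted names here.
            (n,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            counts[table] = n
    finally:
        conn.close()
    return counts
```

For example, count_rows("cazy_db.db") returns a dictionary mapping each table name to its row count; a count of zero for Pdbs after running cw_get_uniprot_data with --pdb would suggest that no PDB accessions were inserted.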
Bioservices
The lines of code you are quoting are from bioservices. For reproducibility of work/research, I wouldn't recommend altering the code base of widely used packages such as bioservices. If you're having issues, I would recommend raising an issue in the respective GitHub repo.
The bioservices error 596 typically arises from issues with the new UniProt API (updated last year), as discussed in issue #100. You need to be running bioservices version >= 1.10.4. cazy_webscraper should be handling this.
You might want to check out bioservices issue 224.
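A quick way to confirm that the installed bioservices meets that minimum, sketched with the standard library only: importlib.metadata.version reads the installed package's version string, and a simple numeric tuple comparison stands in for full PEP 440 version parsing (packaging.version.Version would be more robust, but is a third-party dependency). The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version mentioned in the reply above.
MIN_BIOSERVICES = "1.10.4"


def version_tuple(v: str) -> tuple:
    """Turn '1.10.4' into (1, 10, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def bioservices_is_recent_enough(minimum: str = MIN_BIOSERVICES) -> bool:
    """True if installed bioservices >= minimum; False if older or not installed."""
    try:
        installed = version("bioservices")
    except PackageNotFoundError:
        return False
    # Tuples compare element by element, so (1, 9, 9) < (1, 10, 4).
    return version_tuple(installed) >= version_tuple(minimum)
```

Because the comparison is element by element, "1.9.9" correctly sorts below "1.10.4", which a plain string comparison would get wrong.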
Hi! Thanks for using cazy_webscraper.
After building the local CAZyme database with records downloaded from CAZy, did you retrieve the PDB accessions from UniProt using cazy_webscraper?
(Semi-shameless plug of our paper incoming ;) ) To summarise a chunk of the paper (where it's explained better): when building the local CAZyme database, cazy_webscraper parses data from a plain-text file dump that is available from CAZy. The text file only contains:
- NCBI protein version accessions
- taxonomic kingdoms
- source organisms
- CAZy family annotations
Therefore, the resulting database only contains that data. You can check this by using sqlite3 to query the database; the following will return nothing:
sqlite3 -header GH.db "SELECT * FROM Pdbs"
cw_get_pdb_structures retrieves the structure files from the PDB database for the PDB accessions that are in the local CAZyme database.
So you first need to populate the local CAZyme database with PDB accessions from UniProt, using the cw_get_uniprot_data command. Hence the note in the documentation stating:
Note: PDB structure files are retrieved for the PDB accessions in a local CAZyme database created using cazy_webscraper.
I'll add a further note to the documentation to make this clearer; I can see how it isn't obvious.
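A minimal pre-flight check along these lines, assuming only that PDB accessions are stored in the Pdbs table queried above: has_pdb_accessions is a hypothetical helper (not part of cazy_webscraper) that returns False when that table is empty or absent, which is the situation in which cw_get_pdb_structures has nothing to retrieve.

```python
import sqlite3


def has_pdb_accessions(db_path: str) -> bool:
    """Return True if the Pdbs table of a local CAZyme database has at least one row."""
    conn = sqlite3.connect(db_path)
    try:
        # Table name taken from the 'SELECT * FROM Pdbs' query above;
        # LIMIT 1 stops after the first row instead of reading the whole table.
        row = conn.execute("SELECT 1 FROM Pdbs LIMIT 1").fetchone()
    except sqlite3.OperationalError:
        # No Pdbs table at all: treat it as "no PDB accessions".
        return False
    finally:
        conn.close()
    return row is not None
```

On a database built only from the CAZy text dump, such as GH.db above, this would be expected to return False until cw_get_uniprot_data has been run with --pdb.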
@HobnobMancer Thanks for the help with this dumb error! By the way, I realized that the paper explains this workflow very well. Sorry about that.
However, cw_get_uniprot_data gives me an error: the line batch = self.services.http_get(link, frmt="txt") in the file ~/.local/lib/python3.10/site-packages/bioservices/uniprot.py returns an int, when it should return a string. The only workaround I found was to replace the line batch = batch.split("\n")[1:] with batch = str(batch).split("\n")[1:], so that the UniProt retrieval does not stop. Even so, it still gives me a warning like:
WARNING [bioservices.UniProt:596]: status is not ok with Forbidden
Even with this warning, I could get the PDB IDs and the PDB structures as well. Thanks again!
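Why the str() workaround stops the crash can be shown in a few lines. This assumes (it is not confirmed in this thread) that when the response status is not OK, bioservices logs "status is not ok with ..." and http_get returns the integer HTTP status code, here 403 for Forbidden, instead of the response text. Under that assumption, str(403).split("\n")[1:] is an empty list, so the patched line silently drops the failed batch rather than raising AttributeError; records in a skipped batch would then be missing from the database. The hypothetical helper batch_lines (not part of bioservices or cazy_webscraper) makes that skip explicit in calling code, without editing bioservices itself, in line with the advice above.

```python
import logging

logger = logging.getLogger(__name__)


def batch_lines(batch):
    """Return the data lines of a text response, skipping the header line.

    Hypothetical helper: assumes a failed request yields an int status code
    (e.g. 403) rather than the response text.
    """
    if isinstance(batch, int):
        # Failed request: log it and return no lines, instead of
        # parsing str(403) as if it were data.
        logger.warning("UniProt request failed with HTTP status %d; batch skipped", batch)
        return []
    # Same slicing as the original line: drop the first (header) line.
    return batch.split("\n")[1:]
```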