Comments (7)
Oops, looks like lxml is the only XML parser BeautifulSoup can use:
"Right now, the only supported XML parser is lxml. If you don’t have lxml installed, asking for an XML parser won’t give you one, and asking for “lxml” won’t work either."
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use
So the new question becomes, "Would it be possible to have the reader not depend on BeautifulSoup?"
from python-dwca-reader.
Hi John,
Indeed, you've nailed it perfectly: python-dwca-reader depends on BeautifulSoup, and BeautifulSoup needs lxml to parse XML. I've been uncomfortable for a long time with having such a heavy dependency for relatively "peripheral" features.
So one of my medium-term plans was to replace BeautifulSoup with something lighter, or at least make it optional. Do you urgently need to use python-dwca-reader? In the next few days (let's say a week), I can find time to evaluate whether I can publish a new version that doesn't depend on BeautifulSoup. If it's not too hard and it's useful for you, I'd definitely go for it. It's also a good opportunity to test it (and fix it if necessary) on Jython; I don't think that has been done before!
Best,
Nico
from python-dwca-reader.
I am using python-dwca-reader actively, but the Jython context does not have the same urgency as just using the Readers. I thought about forking the repository and making a version in which BeautifulSoup is optional, but it would probably take me longer than next week to get around to it. If you can do it in that same time frame, that is better. I will gladly test it as soon as it is ready.
from python-dwca-reader.
Cool, didn't know you were already using it, happy that my work is useful to others.
I had a quick look, and it does seem possible to make a version of python-dwca-reader that replaces BeautifulSoup/lxml with ElementTree from the standard library... If I'm not mistaken, it is also available in Jython, so we shouldn't be too far from having Jython compatibility... What do you think?
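To make the idea concrete, here is a minimal sketch of reading an archive descriptor with only `xml.etree.ElementTree`. It is not the library's actual code: the `META_XML` sample, the `describe_core` helper and the returned dictionary layout are assumptions made up for illustration, modeled on a typical Darwin Core Archive `meta.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down meta.xml used only for this illustration.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/country"/>
  </core>
</archive>"""

# Prefix-to-URI mapping so the paths below stay readable.
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def describe_core(xml_text):
    """Summarize the <core> section of a descriptor (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    core = root.find("dwc:core", NS)
    location = core.find("dwc:files/dwc:location", NS).text
    # Map each column index to the term it holds.
    fields = {}
    for field in core.findall("dwc:field", NS):
        fields[int(field.get("index"))] = field.get("term")
    return {
        "metadata": root.get("metadata"),
        "row_type": core.get("rowType"),
        "location": location,
        "fields": fields,
    }


summary = describe_core(META_XML)
print(summary["location"])
print(summary["fields"])
```

Nothing here needs a third-party package, which is the point of the proposed switch; whether every call behaves identically under Jython would still need to be checked there.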
from python-dwca-reader.
I think, "Excellent, go for it." Waiting anxiously.
On Fri, Aug 14, 2015 at 11:16 AM, Nicolas Noé wrote:
> Cool, didn't know you were already using it, happy that my work is useful to others.
> I had a quick look, and it seems indeed that it should be possible to make an version of python-dwca-reader that replace BeautifulSoup/lxml by ElementTree from the standard library... If I'm not mistaken, it is also available in Jython, and so we shouldn't be too far from having Jython compatibility... What do you think?
from python-dwca-reader.
Hi John,
I just released a new version (0.7.0) that completely drops the dependency on BeautifulSoup and lxml. All the APIs that used to return BeautifulSoup objects now return xml.etree.ElementTree.Element (from the standard library). Could you have a look?
I only checked very briefly, but it seems to work under Jython!
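For anyone adapting code that used to navigate BeautifulSoup objects, here is a rough sketch of the equivalent lookups on a standard-library `Element`. The `EML` string is a made-up stand-in for whatever metadata element the reader hands back; this example does not call python-dwca-reader itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical EML-like metadata; a real archive's document will differ.
EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <title>Sample dataset</title>
    <creator><individualName><surName>Doe</surName></individualName></creator>
    <creator><individualName><surName>Smith</surName></individualName></creator>
  </dataset>
</eml:eml>"""

# Stand-in for an Element returned by the library.
metadata = ET.fromstring(EML)

# find() takes a path relative to the element and returns the first match.
title = metadata.find("dataset/title").text

# iter() walks the whole subtree, much like a "find all" by tag name.
surnames = [element.text for element in metadata.iter("surName")]

print(title)
print(surnames)
```

Note that `find()` returns `None` when nothing matches, so code reading optional elements should check the result before accessing `.text`.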
from python-dwca-reader.
Confirmed that this works great under Jython and completely solves the issue for me. Closing. Thank you very much.
from python-dwca-reader.