Comments (8)
Even though this optional requirements strategy is documented in PEP 508, I don't believe it's the best approach if we want mainstream adoption of this library.
You suggest adding docs that specify which requirements are necessary for each part of the software, but I don't believe that's a good solution either. We all understand how useful documentation is, but IMHO we should make the software work out of the box whenever possible, following principles such as convention over configuration and batteries included.
You may not consider JSON Schema validation part of the project's core, but to me, if we use an external library in our source code, it should be listed in the project's requirements. Having multiple requirement files (not to be confused with test requirement files) could look a little confusing, especially for new users.
If you say validation is not part of the project's core, we could just provide a default interface and publish the rest in a separate package, stored in another repository. Something that could be installed like:
pip install spidermon
pip install spidermon-json-schema-validator
It's way clearer to me.
What I'm trying to say is that we should try to make things a little easier and more straightforward. It doesn't matter whether you are a new or an experienced developer, you expect things to be easy. If the project provides a JSON Schema validator class, you should be good to go with a simple `pip install spidermon`, especially if it's such a common case among users.
Not to mention the undesirable runtime errors caused by missing dependencies, which you only discover later and then have to spend time tracking down. I know, I know, it wouldn't happen if we read the docs, right? But people do skip them.
Django is a well-known and widely adopted Python web framework. Its code base is huge, and although it ships dozens of classes and features you don't use, you're good to go with a `pip install django`. Other things can be installed as additional packages, but those are not part of the core project and should be listed as external dependencies that plug into the built-in interfaces.
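One way to soften the late runtime errors mentioned above, whichever side of this discussion wins, is to guard optional imports and fail with an actionable message. This is only a minimal sketch: the helper name `require_optional` and the extra name `validation` are hypothetical and not part of spidermon.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency, or fail with an install hint.

    `extra` is a hypothetical extras name used only to build the message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is needed for this feature. "
            f'Install it with: pip install "spidermon[{extra}]"'
        ) from exc
```

A validator module could call `require_optional("jsonschema", "validation")` at the point where the library is first needed, so the user sees the install hint instead of a bare `ModuleNotFoundError`.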
Considering the current implementation of spidermon, item validation is not a core feature (even though it is widely used), so you can still use spidermon (creating monitors and custom actions) for your spider without installing these libraries.
A user can also choose a different library for item validation (other than jsonschema or schematics), and IMHO the user should not be obliged to install non-essential libraries.
Even the `extras_require` currently in place contains more than necessary (for example, `[monitoring]` installs `slackclient` even if I just want to use `boto`).
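To illustrate the point about overly broad extras, here is a minimal sketch of a finer-grained split. The extra names and groupings are assumptions made up for illustration; they are not spidermon's actual `setup.py`.

```python
# Hypothetical layout: extra names and groupings are illustrative only.
EXTRAS_REQUIRE = {
    "validation": ["jsonschema", "schematics"],
    "slack": ["slackclient"],
    "aws": ["boto"],
}

# Convenience extra that pulls in everything, for users who prefer
# a "batteries included" install.
EXTRAS_REQUIRE["all"] = sorted(
    {pkg for pkgs in EXTRAS_REQUIRE.values() for pkg in pkgs}
)

# In setup.py this dict would be passed as
#     setup(..., extras_require=EXTRAS_REQUIRE)
# and a single group installed with, e.g.:
#     pip install "spidermon[aws]"
```

With a split like this, asking for the `aws` extra would not drag in `slackclient`, while an `all` extra would still cover users who want everything in one command.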
Scrapy doesn't include these extra libraries by default, for example:
https://doc.scrapy.org/en/latest/topics/feed-exports.html#topics-feed-storage-s3
I think the best solution would be to list in the docs the libraries required for each `contrib` feature the user wants to use.
After reading @victor-torres's comment I have to agree with him: it's better to have everything included. This makes things easier for developers, and we want them to use the library and monitor their spiders, right?
I think this is a nice suggestion.
We use validation more often than not, right?
@rennerocha @raphapassini I think this is a very good suggestion and we should move towards the unification of these requirements.
I still don't agree that these external libraries should be installed in the user's environment when you type `pip install spidermon`.
Even the jsonschema library doesn't install every library into the environment if it won't be used. If you want more validation formats, you need to install optional requirements.
You mentioned Django, and besides the fact that it is a complete framework (compared to spidermon, a simple extension), it doesn't install everything for the user (you need to install your database libraries, for example). If the majority of users are on PostgreSQL, should Django install PostgreSQL libraries for everyone, even if I don't need them?
Scrapy doesn't install boto. If you want to use S3 storage, you need to install an optional dependency (even though this feature is commonly used).
After `pip install spidermon` you are good to go and can create your monitors and your custom actions. What could be easier than that?
Related to #89: jsonschema is required by PythonExpressionsMonitor, a core feature that is not documented yet (we need more docs!), so now I agree that it needs to be included by default when you install spidermon.
After #100 this is solved.