Comments (5)
Hello, spidermon calculates coverage based on the fields existing on the item. If you are using an item class that always defines all of its fields with some default value like None or "", every field will be counted regardless of whether the item field is empty or not. Items like dict and scrapy.Item, on the other hand, should work fine.
from spidermon.
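The distinction above can be sketched in plain Python (no Scrapy required). The counting helper below is hypothetical, not spidermon's actual implementation; it only illustrates why an item class that defaults every field inflates the numbers:

```python
# Hypothetical sketch of field-coverage counting: a field is "covered"
# when the key exists on the item, even if its value is empty or None.
def field_coverage(items):
    counts = {}
    for item in items:
        for field in item:  # only fields that exist on the item
            counts[field] = counts.get(field, 0) + 1
    return counts

# A plain dict item only carries the fields you actually set:
sparse_items = [{"name": "a"}, {"name": "b", "price": 10}]

# An item class defaulting every field makes all fields "exist":
defaulted_items = [{"name": "a", "price": None}, {"name": "b", "price": 10}]

print(field_coverage(sparse_items))     # 'price' counted only once
print(field_coverage(defaulted_items))  # 'price' counted twice
```

With defaults, 'price' appears 100% covered even though one value is None.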
Yes, there is a difference between non-empty and non-existent, and coverage is calculated based on the latter.
Thank you for the clarification. If I do not load the Item with the variable at all, I still get no spidermon_item_scraped_count
for the variable, so I am a little unsure what is "triggering" these counts (since neither non-existent nor empty values seem to be counted). Could I bother you to briefly explain what would be defined as 'non-existent' in this case? It would be very useful to have item coverage working...
Thanks!
> If I do not load the Item with the variable at all I still get no spidermon_item_scraped_count for the variable

Which is good?

> I am a little unsure what is "triggering" these counts (since neither non-existent nor empty values seem to be counted).

Per your original report, empty values were counted while non-existent ones were not, which is the expected behavior.

> Could I bother you with briefly explaining what would be defined as 'non-existent' in this case?

Non-existent means the item doesn't have the field at all. Note that setting it to None is the same as not setting it at all, as None values are ignored by ItemLoader; not sure if that's related to your questions, though.
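A minimal sketch of that None-skipping behavior, using an illustrative helper rather than Scrapy's real ItemLoader:

```python
# Hypothetical add_value helper mirroring the behavior described above:
# None values are skipped, so the field never exists on the item.
def add_value(item, field, value):
    if value is not None:
        item[field] = value
    return item

loaded = add_value({}, "price", None)
assert "price" not in loaded   # non-existent: not counted in coverage

loaded = add_value({}, "price", "")
assert "price" in loaded       # empty but existent: counted in coverage
```

So a field set to None and a field never set look identical to the coverage stats, while an empty string still counts.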
Thank you for the clarification, I got it working!
The problem was that my pipeline errored out with a KeyError if I did not specify a value, so I was passing None values where the value was not found. Once I understood that, instead of doing (in the pipeline)
thing.variable = item['variable']
I could do
thing.variable = item.get('variable')
the pipeline allowed non-existent values, and item coverage works as you describe.
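The fix above can be sketched as a minimal pipeline; `Thing`, `process_item_strict`, and the field names are hypothetical stand-ins for the user's own code:

```python
# Minimal sketch of the pipeline fix described above. `Thing` and the
# field names are hypothetical stand-ins for the user's own objects.
class Thing:
    pass

def process_item_strict(item):
    thing = Thing()
    thing.variable = item["variable"]   # raises KeyError if the field is absent
    return thing

def process_item_lenient(item):
    thing = Thing()
    thing.variable = item.get("variable")  # None if the field is absent
    return thing

item = {"name": "example"}  # 'variable' intentionally never set
try:
    process_item_strict(item)
except KeyError:
    print("strict pipeline crashes on missing fields")

thing = process_item_lenient(item)
print(thing.variable)  # None; the item itself never gains the field
```

Because the lenient version reads the field without ever writing it back onto the item, the field stays non-existent and spidermon's coverage stats remain accurate.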
Related Issues (20)
- Support different severity levels for Spidermon alerts
- Error when using custom job tags action class
- Create ReadTheDocs configuration build file
- spidermon.utils.zyte.client uses only project setting module for settings.
- Error with SendSmtpEmail: 'NoneType' object has no attribute 'bio_read'
- Support restricting the coverage monitor fields
- Unable to validate date and date-time with jsonschema
- CI is broken in master
- Add handling for datetime aware start time in PeriodicExecutionTimeMonitor
- Change format of content of _validation field
- Periodic item count monitor crashes if it runs before item_scraped_count exists in stats
- Field names containing "_items" crash field coverage
- Default email template fails with latest `scrapinghub` client version
- `SPIDERMON_MONITOR_SKIPPING_RULES` isn't documented
- feature: Allow to filter jobs in ZyteJobsComparisonMonitor by close_reason
- feature: Allow to filter jobs in ZyteJobsComparisonMonitor by spider arguments
- TypeError: can't subtract offset-naive and offset-aware datetimes
- issue: Wrong previous jobs count in ZyteJobsComparisonMonitor
- Scrapinghub jobs.list method called with incorrect filter parameter