Comments (4)
The SPIDERMON_LIST_FIELDS_COVERAGE_LEVELS alternative looks promising to me. We can set the "depth" at which we want to check the coverage, so with an item like:
{
"field_1": "some_value",
"field_2": "some_value",
"field_3": {
"field_3_1": "some_value",
"field_3_2": "some_value",
}
}
If SPIDERMON_LIST_FIELDS_COVERAGE_LEVELS = 1, the stats would look like:
{
"spidermon_field_coverage/dict/field_1/": "some_value",
"spidermon_field_coverage/dict/field_2/": "other_value",
"spidermon_field_coverage/dict/field_3/": "other_value",
}
If SPIDERMON_LIST_FIELDS_COVERAGE_LEVELS = 2, the stats would look like:
{
"spidermon_field_coverage/dict/field_1/": "some_value",
"spidermon_field_coverage/dict/field_2/": "other_value",
"spidermon_field_coverage/dict/field_3/": "other_value",
"spidermon_field_coverage/dict/field_3/field_3_1": "other_value",
"spidermon_field_coverage/dict/field_3/field_3_2": "other_value",
}
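To make the idea concrete, here is a rough sketch of how a depth limit could decide which keys get counted. This is not Spidermon's implementation: `coverage_keys` and `max_levels` are hypothetical names used only for illustration, and the keys are built without the trailing slash.

```python
def coverage_keys(item, max_levels, prefix="spidermon_field_coverage/dict"):
    """Return the stat keys a depth-limited coverage check would count for one item.

    Hypothetical helper for illustration only, not part of Spidermon.
    """
    keys = []

    def walk(value, path, level):
        for name, child in value.items():
            child_path = f"{path}/{name}"
            keys.append(child_path)
            # Only descend into nested dicts while we are above the depth limit.
            if isinstance(child, dict) and level < max_levels:
                walk(child, child_path, level + 1)

    walk(item, prefix, 1)
    return keys


item = {
    "field_1": "some_value",
    "field_2": "some_value",
    "field_3": {"field_3_1": "some_value", "field_3_2": "some_value"},
}

print(coverage_keys(item, max_levels=1))  # top-level fields only
print(coverage_keys(item, max_levels=2))  # also includes the field_3 children
```

With a limit of 1 the nested dict is counted as a single field, so the number of stats grows only with the number of top-level fields.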
What do you think?
from spidermon.
Skipping the coverage of certain patterns may not solve the problem: with poorly structured data, we may get unexpected keys from our targets. Regarding the stats limit in Scrapy Cloud, changing what HubStorageStatsCollector uploads as stats may be a more reliable alternative. It is specific to the platform, so we don't introduce new Zyte-specific code into Spidermon.
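As a sketch of what trimming on the upload side could look like, the function below drops deeply nested coverage keys from a stats dict before it is sent anywhere. It does not touch HubStorageStatsCollector itself; `trim_stats` and its parameters are hypothetical, and where such a filter would be hooked in is left open.

```python
def trim_stats(stats, prefix="spidermon_field_coverage/", max_levels=1):
    """Return a copy of stats without coverage keys nested deeper than max_levels.

    Hypothetical helper for illustration only; keys outside the prefix are kept.
    """
    trimmed = {}
    for key, value in stats.items():
        if key.startswith(prefix):
            # "dict/field_3/field_3_1" -> ["dict", "field_3", "field_3_1"]
            parts = key[len(prefix):].strip("/").split("/")
            depth = len(parts) - 1  # the first part is the item type, e.g. "dict"
            if depth > max_levels:
                continue
        trimmed[key] = value
    return trimmed


stats = {
    "item_scraped_count": 10,
    "spidermon_field_coverage/dict/field_3": 1.0,
    "spidermon_field_coverage/dict/field_3/field_3_1": 0.5,
}
print(trim_stats(stats))
```

The full stats stay available locally, and only the uploaded copy is reduced.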
I like the depth approach the most 👍 Mauricio implemented SPIDERMON_DICT_FIELDS_COVERAGE_LEVELS as a separate setting in his PR, though SPIDERMON_LIST_FIELDS_COVERAGE_LEVELS and SPIDERMON_DICT_FIELDS_COVERAGE_LEVELS could be unified into a single setting if we think that makes sense.
Fixed by #433