Comments (2)
While thinking about the best way to implement this, I came up with a design and would like to get opinions about it.
Since we want the crawler to be reusable, I'm proposing a minimal Scrapy (https://docs.scrapy.org) app that contains:
- A spider: BaseCrawler, which inherits from scrapy.Spider.
- A settings file for handling Scrapy configs.
- An items file for handling ordered saves.
The BaseCrawler from the first item above can implement a YAML/JSON loader, so that anyone else can subclass it and supply their own config and logic.
I imagine this is a bit confusing right now but I can come up with a simple demo of my thoughts and share it for opinions here when I'm done.
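In the meantime, a rough sketch of the idea might look like the following. This is only an illustration under stated assumptions, not a settled design: the config keys `start_urls` and `selectors`, the `load_config` helper, and the `config_path` attribute are all hypothetical names, and only JSON is handled so the sketch needs nothing beyond the standard library and Scrapy.

```python
import json
from pathlib import Path

try:
    import scrapy
    _SpiderBase = scrapy.Spider
except ImportError:
    # Fallback so the config logic can be exercised without Scrapy installed.
    _SpiderBase = object


def load_config(path):
    """Read a JSON crawler config and check the keys this sketch relies on.

    The required keys are an assumption made for this example.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = {"start_urls", "selectors"} - config.keys()
    if missing:
        raise ValueError(f"config is missing keys: {sorted(missing)}")
    return config


class BaseCrawler(_SpiderBase):
    """Reusable spider whose behaviour comes from a config file."""

    name = "base_crawler"
    config_path = None  # hypothetical: subclasses point this at their own config

    def __init__(self, config_path=None, **kwargs):
        super().__init__(**kwargs)
        self.config = load_config(config_path or self.config_path)
        # Scrapy schedules its initial requests from start_urls.
        self.start_urls = list(self.config["start_urls"])

    def parse(self, response):
        # One output field per configured CSS selector.
        item = {"url": response.url}
        for field, css in self.config["selectors"].items():
            item[field] = response.css(css).getall()
        yield item
```

With something like this, a site-specific crawler would only need to subclass BaseCrawler, set its own `name` and `config_path`, and override `parse` where the selectors alone are not enough.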
from iranlowo.
@ruohoruotsi I see there are already plans for a scraper. What is the progress so far? I'm thinking of one that scrapes BBC Yoruba, although it could be subclassed and tuned for other blogs and websites generally.
from iranlowo.