cfpb / cfgov-crawler-app
An Electron app that crawls consumerfinance.gov and gathers interesting data.
It would be helpful to document some useful SQLite queries to run against the output of a crawl.
For example, to find all pages that link to /about-us/ and save the results as CSV:
sqlite> .mode csv
sqlite> .output about_us_links.csv
sqlite> SELECT DISTINCT url, json_each.value link FROM cfpb, json_each(json(cfpb.contentLinks)) WHERE json_each.value LIKE '/about-us/%' ORDER BY url, link;
sqlite> .output stdout
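The same query can also be run from a script instead of typed into the sqlite3 shell, which makes it easier to reuse. Below is a minimal Python sketch using the standard-library `sqlite3` and `csv` modules. Like the shell example, it assumes the crawl output has a `cfpb` table with `url` and `contentLinks` (a JSON array of link paths) columns. `export_links` is a hypothetical helper name. Unlike `.mode csv`, it writes a header row. It also needs an SQLite build with the JSON functions, which have been built in since SQLite 3.38.

```python
import csv
import sqlite3

# Same query as the shell example, with the LIKE pattern as a parameter.
# Assumes a `cfpb` table with `url` and `contentLinks` (JSON array) columns.
QUERY = """
SELECT DISTINCT cfpb.url, json_each.value AS link
FROM cfpb, json_each(json(cfpb.contentLinks))
WHERE json_each.value LIKE ?
ORDER BY cfpb.url, link
"""


def export_links(db_path: str, pattern: str, csv_path: str) -> int:
    """Write (url, link) pairs whose link matches `pattern` to a CSV file.

    Returns the number of data rows written (excluding the header).
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(QUERY, (pattern,)).fetchall()
    finally:
        conn.close()
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "link"])
        writer.writerows(rows)
    return len(rows)
```

For example, `export_links("crawl.sqlite3", "/about-us/%", "about_us_links.csv")`, where `crawl.sqlite3` is a placeholder for the actual crawl database path. Changing the pattern argument reuses the query for any other section of the site.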
If the folder for the database doesn't exist, database creation fails silently and none of the results are saved.
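One possible fix is sketched below, assuming the crawler knows the database path before it opens the database. It creates any missing parent folders first, and it lets any remaining error propagate instead of swallowing it. The sketch uses Python for illustration only. In the Node/Electron app itself, the equivalent step would be `fs.mkdirSync(dir, { recursive: true })` before opening the database. `open_results_db` is a hypothetical name.

```python
import os
import sqlite3


def open_results_db(db_path: str) -> sqlite3.Connection:
    """Open (creating if needed) the results database at `db_path`.

    Missing parent folders are created first, so a missing folder no
    longer prevents the database from being created.
    """
    parent = os.path.dirname(os.path.abspath(db_path))
    # exist_ok=True makes this a no-op when the folder already exists.
    os.makedirs(parent, exist_ok=True)
    # Any other failure raises sqlite3.Error rather than failing silently.
    return sqlite3.connect(db_path)
```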
I've added some initial instructions for adding custom searches. Please check that they make sense!
@chosak and @contolini, do you mind looking it over?
https://github.com/cfpb/cfgov-crawler-app/blob/main/README.md#adding-custom-search-parameters
It would be nice to be able to start a crawl from the command line, without needing to interact with a UI. This would allow the project to be used as part of regularly scheduled automation.
A nice-to-have would be text-based console output, like the graphical interface provides, to show how far along the crawl is. An even nicer-to-have would be the ability to pause and resume the crawl, as we can in the graphical UI.
Presumably there's a good Node CLI package we could use to support this? Ping @mistergone @contolini.
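Here is a rough sketch of what a headless entry point and text progress output might look like. It is written in Python with `argparse` purely for illustration. A real implementation would live in the app's Node code, likely using a Node CLI package. The `--start-url` and `--db` flags and the `parse_args`, `progress_line`, and `report` helpers are all hypothetical. The progress line assumes the crawler can report how many URLs it has crawled and how many are still queued. Pause/resume is not covered.

```python
import argparse
import sys


def parse_args(argv=None):
    # Hypothetical flags for a headless crawl; nothing like this exists yet.
    parser = argparse.ArgumentParser(description="Run a crawl without the UI.")
    parser.add_argument("--start-url", default="https://www.consumerfinance.gov/")
    parser.add_argument("--db", default="crawl.sqlite3", help="output database path")
    return parser.parse_args(argv)


def progress_line(crawled: int, queued: int) -> str:
    """Format a one-line status from crawled and still-queued URL counts."""
    total = crawled + queued  # known so far; grows as new links are found
    pct = 100.0 * crawled / total if total else 0.0
    return f"crawled {crawled} / {total} pages ({pct:.1f}%), {queued} queued"


def report(crawled: int, queued: int, stream=sys.stderr) -> None:
    # "\r" returns to the start of the line so the status updates in place.
    stream.write("\r" + progress_line(crawled, queued))
    stream.flush()
```

A crawl loop would call `report(crawled, queued)` after each page. Writing the status to stderr keeps it separate from any data a scheduled job might capture from stdout.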