Pastebin-Crawler
This project is a crawler that parses new pastes from pastebin.com.
The system crawls pastebin.com/archive every 2 minutes and stores only the new pastes in a storage file.
The system's Paste entity has a few relevant attributes:
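The polling and "only new pastes" behaviour could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `select_new`, `run`, `fetch_archive_ids` and `store` are hypothetical names, and the fetching and storing steps are passed in as callables so the loop itself stays independent of requests, lxml and pyodbc.

```python
import time
from typing import Callable, Iterable, List, Optional, Set

CRAWL_INTERVAL_SECONDS = 120  # "every 2 minutes", as described above


def select_new(paste_ids: Iterable[str], seen: Set[str]) -> List[str]:
    """Return the IDs not seen before, in order, and mark them as seen."""
    fresh = []
    for paste_id in paste_ids:
        if paste_id not in seen:
            seen.add(paste_id)
            fresh.append(paste_id)
    return fresh


def run(fetch_archive_ids: Callable[[], Iterable[str]],
        store: Callable[[str], None],
        cycles: Optional[int] = None,
        sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll the archive, storing each paste ID only the first time it appears.

    fetch_archive_ids and store are assumed helpers supplied by the caller.
    cycles=None means "loop forever"; a number limits the loop (handy in tests).
    """
    seen: Set[str] = set()
    done = 0
    while cycles is None or done < cycles:
        for paste_id in select_new(fetch_archive_ids(), seen):
            store(paste_id)
        done += 1
        # Wait between crawls, but not after the final one.
        if cycles is None or done < cycles:
            sleep(CRAWL_INTERVAL_SECONDS)
```

In a real run, `fetch_archive_ids` would download and parse pastebin.com/archive and `store` would write to the storage file. Note that this sketch keeps the set of seen IDs in memory only, so it forgets them on restart; the actual project may track them differently.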
- Paste ID
- content - represents the paste's text content
- date - represents the paste's creation date
- title
- author
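As an illustration only, the attributes listed above could be modelled with a small dataclass. The class layout and field names here are assumptions, not the project's actual definition, and the sketch uses the standard library `datetime` for the date field where the project itself may use arrow.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Paste:
    """Hypothetical shape of the Paste entity, based on the attribute list."""
    paste_id: str    # Paste ID
    content: str     # the paste's text content
    date: datetime   # the paste's creation date
    title: str
    author: str


# Example values are made up for demonstration.
example = Paste(
    paste_id="abc123",
    content="print('hello')",
    date=datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc),
    title="Untitled",
    author="Guest",
)
```

Making the dataclass frozen keeps a parsed paste immutable once created, which is a design choice of this sketch rather than a requirement.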
Libraries used:
- lxml
- arrow
- requests
- pyodbc
Drivers used:
- MS ACCESS DRIVER
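For orientation, here is one way a pyodbc connection string for the Access driver might be built. This is a sketch under assumptions: the helper name `access_connection_string` and the file name `pastes.accdb` are hypothetical, and the driver name shown is the one registered by the Microsoft Access ODBC driver on Windows, which may differ from what the project actually uses.

```python
# Driver name as registered by the Microsoft Access ODBC driver on Windows.
ACCESS_DRIVER = "Microsoft Access Driver (*.mdb, *.accdb)"


def access_connection_string(db_path: str) -> str:
    """Build an ODBC connection string for an Access database file."""
    return f"DRIVER={{{ACCESS_DRIVER}}};DBQ={db_path};"


# Hypothetical database file name, for illustration only.
conn_str = access_connection_string(r"C:\data\pastes.accdb")
```

The resulting string can then be passed to `pyodbc.connect(conn_str)` on a machine where the driver is installed.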