This script scrapes data from a list of websites and extracts the following information:
- Social Media Links
- Tech Stack (MVC, CMS, JS type, etc.)
- Meta Title
- Meta Description
- Payment Gateways (e.g., PayPal, Stripe, Razorpay)
- Website Language
- Category of Website
The extracted data is stored in a MySQL database. The file 1.csv contains the data exported from that database.
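As a rough illustration of how several of the fields above can be pulled out of a page, here is a minimal sketch. It is not the code from main.py: it uses the standard-library `html.parser` as a dependency-free stand-in for beautifulsoup4, and the names `PageInfoParser`, `extract_page_info`, `SOCIAL_DOMAINS`, and `PAYMENT_KEYWORDS` are hypothetical. Tech stack and category detection are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical lookup tables (assumptions); the real script may use other lists.
SOCIAL_DOMAINS = ("facebook.com", "twitter.com", "instagram.com",
                  "linkedin.com", "youtube.com")
PAYMENT_KEYWORDS = ("paypal", "stripe", "razorpay")


class PageInfoParser(HTMLParser):
    """Collects the title, meta description, language, and social links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.language = ""
        self.social_links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            # Website language, taken from <html lang="...">.
            self.language = attrs.get("lang") or ""
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "a":
            href = attrs.get("href") or ""
            host = urlparse(href).netloc.lower()
            # Match the domain itself or any subdomain such as www.
            if any(host == d or host.endswith("." + d) for d in SOCIAL_DOMAINS):
                self.social_links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_page_info(html):
    """Return a dict with a subset of the fields listed above."""
    parser = PageInfoParser()
    parser.feed(html)
    lowered = html.lower()
    return {
        "meta_title": parser.title.strip(),
        "meta_description": parser.description,
        "language": parser.language,
        "social_links": parser.social_links,
        # Naive keyword search over the raw HTML; may give false positives.
        "payment_gateways": [k for k in PAYMENT_KEYWORDS if k in lowered],
    }
```

Calling `extract_page_info(response.text)` on a page fetched with `requests.get(url)` would return one dictionary per website, ready to be written to the database.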
- Python 3.x
- requests library
- beautifulsoup4 library
- lxml library
- mysql-connector-python library
- python-Wappalyzer library
- Clone the repository.
git clone https://github.com/smileagain6698/scrappy_100.git
- Create the virtual environment:
python -m venv venv
venv\Scripts\activate
- Install the required Python libraries:
pip install requests beautifulsoup4 lxml mysql-connector-python python-Wappalyzer
- Set up your MySQL database.
- Add these extensions to VS Code: SQLTools (database management) and SQLTools MySQL/MariaDB/TiDB Driver.
- Open SQLTools, select MySQL as the database, and update the Connection Settings to match your own database, following the example in the settings.json file. The password is your MySQL database password.
- Test your connection, then choose Save Connection.
- Run the script in a terminal:
python main.py
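
Once the database connection from the setup steps works, writing one scraped record to MySQL could look roughly like the sketch below. This is an assumption about the approach, not the code in main.py: the helper names `build_insert` and `save_row`, and any table or column names passed to them, are hypothetical. The `%s` placeholders follow the parameter style used by mysql-connector-python.

```python
def build_insert(table, row):
    """Return a parameterized INSERT statement and its values.

    Table and column names are interpolated into the SQL text, so they
    must come from trusted code; only the values are passed as parameters.
    """
    columns = list(row)
    if not columns:
        raise ValueError("row must contain at least one column")
    for name in [table, *columns]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    column_sql = ", ".join(f"`{name}`" for name in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO `{table}` ({column_sql}) VALUES ({placeholders})"
    return sql, tuple(row[name] for name in columns)


def save_row(connection, table, row):
    """Insert one row through an open DB-API connection and commit."""
    sql, params = build_insert(table, row)
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        connection.commit()
    finally:
        cursor.close()
```

The connection itself would come from `mysql.connector.connect(host=..., user=..., password=..., database=...)`, using the same values entered in the SQLTools connection settings. List-valued fields such as social links would need to be serialized first, for example with `json.dumps`.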