This Repository

The py_scraping_template repository contains a Python script that scrapes data from a website and inserts the retrieved data into a Firestore database.
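The scraping script itself is not reproduced here. The following is only a minimal sketch of how scraped records could be written to Firestore with the firebase-admin package. The helper names insert_records and connect, the collection name "posts", and the extra scraped_at field are all hypothetical. insert_records accepts any object with a collection(name).add(doc) method, so it can be exercised without a live database.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable


def insert_records(db: Any, collection: str, records: Iterable[Dict[str, Any]]) -> int:
    """Add each scraped record as a new document; return how many were added."""
    count = 0
    for record in records:
        doc = dict(record)  # copy so the caller's dict is not mutated
        # Hypothetical extra field: a timestamp of when the record was stored.
        doc.setdefault("scraped_at", datetime.now(timezone.utc))
        db.collection(collection).add(doc)  # Firestore assigns the document ID
        count += 1
    return count


def connect(cred_path: str = "cred/firebase-adminsdk.json"):
    """Initialize firebase-admin from a service-account key and return a Firestore client."""
    # Imported lazily so the module loads even where firebase-admin is not installed.
    import firebase_admin
    from firebase_admin import credentials, firestore

    firebase_admin.initialize_app(credentials.Certificate(cred_path))
    return firestore.client()
```

A call such as insert_records(connect(), "posts", [{"title": "example"}]) would add one document with an auto-generated ID.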

Setup

The following tools and packages are required to run the scripts in this repository.

  • Docker
  • Docker Compose
  • Python 3.7 or higher
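As an optional convenience, the prerequisites listed above can be checked with a short script. This is a sketch, not part of the repository; it assumes Docker Compose is installed as the standalone docker-compose command, which matches the commands used later in this README.

```python
import shutil
import sys
from typing import List

REQUIRED_COMMANDS = ("docker", "docker-compose")  # standalone Compose binary assumed
MIN_PYTHON = (3, 7)


def missing_prerequisites() -> List[str]:
    """Return the prerequisites that could not be found; an empty list means all are present."""
    missing = [cmd for cmd in REQUIRED_COMMANDS if shutil.which(cmd) is None]
    if sys.version_info < MIN_PYTHON:
        missing.append("python>=%d.%d" % MIN_PYTHON)
    return missing
```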

Step 1: Clone the Repository

Clone this repository to your local machine:

Step 2: Create Directories on the Remote Instance

Run the following command to create directories on the remote instance:

make remote-create-dir

This command creates the following directories:

  • /usr/local/hoge_board_scraping/logs
  • /usr/local/hoge_board_scraping/chrome
  • /usr/local/hoge_board_scraping/cred
  • /usr/local/hoge_board_scraping/script/images
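The Makefile is not shown here, so the following is only a hypothetical sketch of what make remote-create-dir might amount to. It assumes the instance is reachable over plain ssh and that sudo is available, since /usr/local usually requires root. It builds the command without running it.

```python
import shlex
from typing import List

BASE_DIR = "/usr/local/hoge_board_scraping"
SUBDIRS = ("logs", "chrome", "cred", "script/images")


def remote_dirs(base: str = BASE_DIR) -> List[str]:
    """Full paths of the directories listed above."""
    return [f"{base}/{sub}" for sub in SUBDIRS]


def mkdir_command(host: str, base: str = BASE_DIR) -> List[str]:
    """Build an ssh command that creates every directory (mkdir -p is idempotent)."""
    # Assumption: sudo is needed because /usr/local is normally owned by root.
    remote = "sudo mkdir -p " + " ".join(shlex.quote(d) for d in remote_dirs(base))
    return ["ssh", host, remote]
```

For example, subprocess.run(mkdir_command("user@203.0.113.10"), check=True) would run it against a hypothetical host.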

Step 3: Install Docker and Docker Compose on the Remote Instance

Run the following commands to install Docker and Docker Compose on the remote instance:

make remote-install-docker
make remote-install-docker-compose

Step 4: Edit the Files in the ./cred Directory

hoge-board-ipass

Enter your credentials in the following format:

ID={Your ID}
PASSWORD={Your password}
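How the scraping script reads this file is not documented. The following hypothetical sketch shows one way to parse it and fail early when a key is missing. Splitting on the first "=" keeps passwords that contain "=" intact, and skipping lines that start with "#" is an added assumption.

```python
from pathlib import Path
from typing import Dict


def parse_credentials(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines and require both ID and PASSWORD."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumed: blank and comment lines are ignored
            continue
        key, sep, value = line.partition("=")  # split on the first "=" only
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        creds[key.strip()] = value.strip()
    missing = {"ID", "PASSWORD"} - set(creds)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return creds


def load_credentials(path: str = "cred/hoge-board-ipass") -> Dict[str, str]:
    """Read and parse the credentials file (hypothetical helper)."""
    return parse_credentials(Path(path).read_text(encoding="utf-8"))
```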

firebase-adminsdk.json

To obtain the Firebase Admin SDK JSON file, follow these steps:

  1. Go to the Firebase console and select your project.
  2. Click the gear icon at the top left and select "Project settings".
  3. Open the "Service accounts" tab and click "Generate new private key".
  4. A JSON file containing your private key is downloaded to your computer. Save it as ./cred/firebase-adminsdk.json.
  5. Keep the private key in a secure location and do not share it with anyone who should not have access to your Firebase project.
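Before copying the key to the instance, a quick sanity check can catch a wrong or truncated download. This is a sketch, not part of the repository. It relies on the fact that a service-account key file contains, among others, the fields type (set to "service_account"), project_id, private_key and client_email.

```python
import json
from pathlib import Path
from typing import List

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account(data: dict) -> List[str]:
    """Return a list of problems; an empty list means the key looks usable."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not data.get(f)]
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']!r}")
    return problems


def check_service_account_file(path: str = "cred/firebase-adminsdk.json") -> List[str]:
    """Load the JSON key file and validate it (hypothetical helper)."""
    return check_service_account(json.loads(Path(path).read_text(encoding="utf-8")))
```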

Step 5: Copy Files to the Remote Instance

Run the following command to copy the necessary files to the remote instance:

make scp-all

This command copies the following files:

  • docker-compose.yml
  • python-selenium/
  • script/
  • cred/

Step 6: Rebuild and Restart the Python Container

Run the following command to rebuild and restart the Python container:

make rebuild-restart-python

Step 7: Start Scraping

To start scraping, run the following command:

make ssh

This command logs you in to the remote instance. Then run the following commands to start scraping:

cd /usr/local/hoge_board_scraping/
docker-compose up -d

Note

The docker-compose.yml file starts the Selenium Grid and Python containers. It defines the following services:

  • selenium-hub: Selenium Grid hub
  • chrome: Selenium Chrome node
  • python: Python container that runs the scraping script

The python container is built from the python-selenium directory, which contains the packages and dependencies needed to run the scraping script.
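Inside the python container, the script can reach the hub by its service name, because Compose services on the same network resolve each other by name. The following sketch assumes a Selenium 4 Grid, whose /status endpoint returns JSON with value.ready. The SELENIUM_HUB_URL variable and the helper names are hypothetical.

```python
import json
import os
import time
import urllib.request

# Hypothetical variable name; the default host "selenium-hub" is the Compose service name.
DEFAULT_HUB_URL = "http://selenium-hub:4444"


def hub_url() -> str:
    """Return the Grid URL, allowing an override through the environment."""
    return os.environ.get("SELENIUM_HUB_URL", DEFAULT_HUB_URL)


def grid_is_ready(status: dict) -> bool:
    """Interpret the JSON body returned by the Grid's /status endpoint."""
    return bool((status.get("value") or {}).get("ready", False))


def wait_for_grid(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll <url>/status until the Grid reports ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url + "/status", timeout=5) as resp:
                if grid_is_ready(json.load(resp)):
                    return True
        except (OSError, ValueError):
            pass  # hub not reachable yet (URLError is an OSError) or body is not JSON
        time.sleep(interval)
    return False


def make_driver(url: str):
    """Open a remote Chrome session on the Grid; selenium is imported lazily."""
    from selenium import webdriver

    return webdriver.Remote(command_executor=url, options=webdriver.ChromeOptions())
```

Waiting before creating the driver matters because the containers start together. Compose's depends_on, if used, only orders container startup and does not wait for the hub to accept sessions.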

The script directory contains the Python script for scraping the

Contributors

  • t-kurimura