
scraping-on-cloud-run's Introduction

Node.js Scraping Service with Chrome

This project provides a Node.js environment integrated with Google Chrome and Chromedriver for web scraping tasks.

Structure

The project loosely follows the principles of Clean Architecture to keep the code maintainable, scalable, and testable.

  • infrastructure: This is where tools, frameworks, and libraries are configured and called. It is the outermost layer of the architecture.
  • adapters: This layer contains code that converts data between the use cases layer and external agents such as the database or the web.
  • use_cases: This layer contains the business rules of the application. All of the application's use cases reside here.

+----------------------+
|     use_cases        |
| (Business Logic)     |
+----------------------+
          ^
          |
+----------------------+
|      adapters        |
| (Interface Adapters) |
+----------------------+
          ^
          |
+----------------------+
|  infrastructure      |
|(Frameworks & Drivers)|
+----------------------+
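The layering in the diagram can be sketched in TypeScript as follows. This is a minimal illustration, not code from the repository: the names `PageFetcher`, `ExtractTitleUseCase`, `InMemoryPageFetcher`, and `buildExtractTitle` are hypothetical, and the in-memory adapter stands in for a real Chrome-backed one. The point is the direction of the dependencies: the use case only knows an interface, and the outer layers supply the implementation.

```typescript
// use_cases layer: a business rule that depends only on an abstract port.
interface PageFetcher {
  fetchHtml(url: string): Promise<string>;
}

class ExtractTitleUseCase {
  private readonly fetcher: PageFetcher;

  constructor(fetcher: PageFetcher) {
    this.fetcher = fetcher;
  }

  // Returns the trimmed <title> text, or null when no <title> tag is found.
  async execute(url: string): Promise<string | null> {
    const html = await this.fetcher.fetchHtml(url);
    const match = /<title>([^<]*)<\/title>/i.exec(html);
    return match?.[1]?.trim() ?? null;
  }
}

// adapters layer: implements the port against some external source.
// (Hypothetical stand-in; a real adapter might drive Chrome instead.)
class InMemoryPageFetcher implements PageFetcher {
  private readonly pages: Map<string, string>;

  constructor(pages: Map<string, string>) {
    this.pages = pages;
  }

  async fetchHtml(url: string): Promise<string> {
    const html = this.pages.get(url);
    if (html === undefined) {
      throw new Error(`No page stored for ${url}`);
    }
    return html;
  }
}

// infrastructure layer: wires concrete pieces together.
function buildExtractTitle(pages: Map<string, string>): ExtractTitleUseCase {
  return new ExtractTitleUseCase(new InMemoryPageFetcher(pages));
}
```

Because the use case receives its fetcher through the constructor, it can be tested without starting Chrome at all.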

Docker Overview

Node.js

Uses Node.js 18 as the runtime environment.

Google Chrome

Integrated with a specific version of Google Chrome, installed directly from a ZIP binary.

Chromedriver

The matching version of Chromedriver is installed and made available.
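As a rough sketch of how Chrome is usually configured when it runs inside a container, the helper below collects a few common command-line flags. The function name `buildChromeArguments` and the chosen flag set are assumptions for illustration; the repository may pass different options, and the WebDriver client it uses is not stated here.

```typescript
// Hypothetical helper: flags commonly passed to Chrome inside a container.
function buildChromeArguments(headless: boolean): string[] {
  const args = [
    "--no-sandbox",            // the Chrome sandbox often cannot start in containers
    "--disable-dev-shm-usage", // write shared memory files to /tmp instead of /dev/shm
    "--window-size=1280,800",  // fixed viewport so page layout is reproducible
  ];
  if (headless) {
    args.unshift("--headless"); // run without a visible window
  }
  return args;
}

// Assuming the selenium-webdriver package is the client, the flags would be
// applied roughly like this (not executed here):
//   const options = new chrome.Options().addArguments(...buildChromeArguments(true));
```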

Port

The service is exposed on port 8080.
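A minimal sketch of a server that honors this port is shown below. Cloud Run passes the listening port to the container through the `PORT` environment variable, so the sketch reads that variable and falls back to 8080. The names `resolvePort` and `startServer` are hypothetical, and the plain `node:http` server is a stand-in for whatever framework the project actually uses.

```typescript
import { createServer } from "node:http";

const DEFAULT_PORT = 8080;

// Parses a PORT-style value, falling back to 8080 when it is missing or invalid.
function resolvePort(rawPort: string | undefined): number {
  if (rawPort === undefined || rawPort.trim() === "") {
    return DEFAULT_PORT;
  }
  const parsed = Number(rawPort);
  const valid = Number.isInteger(parsed) && parsed > 0 && parsed < 65536;
  return valid ? parsed : DEFAULT_PORT;
}

// Starts a trivial HTTP server that answers every request with "ok".
function startServer(port: number) {
  const server = createServer((_req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok\n");
  });
  // Bind to all interfaces so the port is reachable from outside the container.
  return server.listen(port, "0.0.0.0");
}

// Usage (not executed here):
//   startServer(resolvePort(process.env["PORT"]));
```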

Dependencies

Project dependencies are installed via npm, based on the package.json and package-lock.json (if available).

TypeScript

The project is written in TypeScript; the Docker build includes a step that compiles the TypeScript code.

Service Start

The service is started with npm start.

Getting Started Locally

Follow the steps below to get this project up and running locally:

1. Service Account Creation

Create a new service account for the project and grant it the following permissions:

  • Storage Object Admin
  • Cloud Datastore User

2. Storing the Service Account Key

After creating the service account, you'll receive a JSON key. Save this key as service_account_key.json in the root directory of the repository.
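Before building the image, it can help to confirm that the saved file really is a service account key. The sketch below checks the fields that Google Cloud service account JSON keys carry (`type`, `project_id`, `client_email`, `private_key`). The function name `parseServiceAccountKey` is hypothetical and is not part of the repository.

```typescript
// Subset of the fields found in a Google Cloud service account JSON key.
interface ServiceAccountKey {
  type: "service_account";
  project_id: string;
  client_email: string;
  private_key: string;
}

// Parses the key file contents and throws a descriptive error if a field is missing.
function parseServiceAccountKey(json: string): ServiceAccountKey {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Key file must contain a JSON object");
  }
  const record = data as Record<string, unknown>;
  if (record["type"] !== "service_account") {
    throw new Error('Expected "type" to be "service_account"');
  }
  const requiredFields = ["project_id", "client_email", "private_key"] as const;
  for (const field of requiredFields) {
    const value = record[field];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`Missing or empty field: ${field}`);
    }
  }
  return record as unknown as ServiceAccountKey;
}

// Usage (not executed here), reading the file saved in the repository root:
//   import { readFileSync } from "node:fs";
//   const key = parseServiceAccountKey(readFileSync("service_account_key.json", "utf8"));
```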

3. Building the Docker Image

Use the following command to build the Docker image:

docker build \
  -t scraping_cloud_run \
  --build-arg GOOGLE_APPLICATION_CREDENTIALS_PATH=/usr/app/service_account_key.json \
  --build-arg GOOGLE_CLOUD_PROJECT=google-cloud-project-id \
  .

4. Running the Container

Once the Docker image is built, you can run the container using:

docker run \
  -p 8080:8080 \
  -v "$(pwd)/dist:/usr/app/dist" \
  --name scraping_cloud_run \
  scraping_cloud_run
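Once the container is up, a quick way to confirm that the port mapping works is to request a path from it. The sketch below is an assumption-laden example: `buildLocalUrl` and `checkService` are hypothetical helpers, and which paths the application serves depends on its routes, which are not listed here. It relies on the global `fetch` available in Node.js 18.

```typescript
// Builds a URL for the locally running container (port 8080 is mapped by docker run).
function buildLocalUrl(path: string, port: number = 8080): string {
  const normalized = path.startsWith("/") ? path : `/${path}`;
  return `http://localhost:${port}${normalized}`;
}

// Requests the given path and resolves with the HTTP status code.
async function checkService(path: string): Promise<number> {
  const response = await fetch(buildLocalUrl(path));
  return response.status;
}

// Usage (not executed here; replace the path with one your application serves):
//   checkService("/").then((status) => console.log(`status: ${status}`));
```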

5. Hot Reloading during Local Development (Optional)

If you want to see changes as you develop, you can enable hot reloading. Code changes then take effect without manually rebuilding the Docker image.

To use it:

a. Ensure that you have all project dependencies installed locally by running:

npm install

b. Start the development server with hot reloading enabled by executing:

npm run build

c. The server will now automatically refresh and reflect changes you make to the source files. Navigate to http://localhost:8080/{path} in your browser to see updates in real-time as you code.

GitHub Actions

The .github/workflows/deploy.yml file describes the GitHub Actions workflow for deploying the application to Google Cloud Run.

In that file, replace google-cloud-project-id with your GCP project ID and scraping-app with your Cloud Run service name. Also set the GCP_SA_KEY secret in your GitHub repository to the contents of the GCP service account key JSON file.

scraping-on-cloud-run's People

Contributors

t-kurimura
