Insta-puller

A serverless (compute) approach to scraping Instagram feeds. This application runs on Cloud Run, pulls images and captions from selected Instagram users, and stores them in a Cloud SQL database.

Note that storing data in Cloud SQL isn't a truly serverless data solution.

Setup

🥧 This looks like a lot of setup, but it should only take about 5 minutes. It's just a bunch of copy-and-paste scripts to run in Cloud Shell. 🍰

You can run these steps from any terminal that has gcloud and docker installed, but the easiest way is to run all of the following commands in Cloud Shell. You'll need a personal GitHub.com account. Recommended: create a new GCP project before proceeding.

Prep

Replace <your_github_username> with your account (e.g. davidstanke):

export GITHUB_USER=<your_github_username>
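Later commands interpolate GITHUB_USER into the clone URL and the trigger definitions, so a forgotten or placeholder value only shows up as a confusing failure further on. A minimal guard could look like the following; the function name check_github_user is hypothetical and not part of this repo.

```shell
# Hypothetical helper (not part of this repo): fail early if GITHUB_USER
# is unset, empty, or still the literal placeholder from the instructions.
check_github_user() {
  case "${GITHUB_USER:-}" in
    ""|"<your_github_username>")
      echo "GITHUB_USER is not set to a real account name" >&2
      return 1
      ;;
  esac
  echo "Using GitHub account: ${GITHUB_USER}"
}
```

Run check_github_user right after the export; a non-zero exit status means the variable still needs to be set.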

Duplicate repo

Don't clone this repo directly; instead, click "Use this template" on GitHub to make a copy, and name it instapuller. Then clone your copy of the repo and add a "staging" branch:

git clone https://github.com/${GITHUB_USER}/instapuller && cd instapuller
git checkout -b staging
git push -u origin staging

Alternative setup: you can use Artifact Registry instead of Container Registry:

  • Enable the Artifact Registry API and create a registry.
  • Configure docker (see "Setup instructions" in the Artifact Registry UI).
  • Replace all instances of gcr.io/$PROJECT/instapuller in the commands below with your *.pkg.dev registry path.
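For the Artifact Registry alternative, the path substitution can be sketched as a small rewrite. Artifact Registry Docker paths take the form LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE. The location (us-central1), the repository name (instapuller-repo), and the helper name are assumptions chosen for illustration, not values from this repo.

```shell
# Sketch: rewrite a Container Registry image path to Artifact Registry.
# AR_LOCATION and AR_REPO are assumed values; use your own registry's.
AR_LOCATION="us-central1"
AR_REPO="instapuller-repo"

to_artifact_registry() {
  # $1 = image path such as gcr.io/my-project/instapuller
  echo "$1" | sed -E "s#^gcr\.io/([^/]+)/(.+)\$#${AR_LOCATION}-docker.pkg.dev/\1/${AR_REPO}/\2#"
}

# Example with a made-up project ID:
to_artifact_registry "gcr.io/my-project/instapuller"
```

The repository itself would be created with gcloud artifacts repositories create, and docker credentials configured with gcloud auth configure-docker for the matching *.pkg.dev host.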

Set everything up...

# set some convenience variables
export PROJECT=$(gcloud config list --format 'value(core.project)')
export PROJECT_NUMBER=$(gcloud projects describe "$PROJECT" --format="value(projectNumber)")
export GCB_SERVICE_ACCT="${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com"
export RUN_SERVICE_ACCT="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

# Enable APIs and grant IAM permissions
gcloud services enable cloudbuild.googleapis.com run.googleapis.com sqladmin.googleapis.com sql-component.googleapis.com
gcloud projects add-iam-policy-binding $PROJECT --member=serviceAccount:$GCB_SERVICE_ACCT --role=roles/run.admin
gcloud iam service-accounts add-iam-policy-binding $RUN_SERVICE_ACCT --member=serviceAccount:$GCB_SERVICE_ACCT --role=roles/iam.serviceAccountUser

# Create CloudSQL databases
export PASSWORD=$(openssl rand -base64 15)
gcloud sql instances create instapuller --zone=us-central1-c --root-password=${PASSWORD}
gcloud sql databases create instapuller-prod --instance=instapuller  --charset=utf8mb4
gcloud sql databases create instapuller-staging --instance=instapuller --charset=utf8mb4

# Create initial application container
docker build -t gcr.io/$PROJECT/instapuller .
docker push gcr.io/$PROJECT/instapuller

# Create Cloud Run services
gcloud run deploy instapuller-prod --image=gcr.io/$PROJECT/instapuller --region=us-central1 --platform=managed --allow-unauthenticated --set-env-vars=DB_USER=root,DB_PASS=${PASSWORD},DB_NAME=instapuller-prod,CLOUD_SQL_CONNECTION_NAME=$PROJECT:us-central1:instapuller --set-cloudsql-instances=$PROJECT:us-central1:instapuller

gcloud run deploy instapuller-staging --image=gcr.io/$PROJECT/instapuller --region=us-central1 --platform=managed --allow-unauthenticated --set-env-vars=DB_USER=root,DB_PASS=${PASSWORD},DB_NAME=instapuller-staging,CLOUD_SQL_CONNECTION_NAME=$PROJECT:us-central1:instapuller --set-cloudsql-instances=$PROJECT:us-central1:instapuller

echo -e "======\nHere are the URLs of your Cloud Run services:\n-----\n$(gcloud run services list --platform=managed --format='value(URL)')\n====="
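The deploy commands above pass a Cloud SQL connection name of the form PROJECT:REGION:INSTANCE, while the instance itself is created with a zone (us-central1-c). The relationship between the two can be sketched as follows; the helper names and the project ID my-project are hypothetical.

```shell
# Sketch: derive the Cloud SQL connection name used by the deploy commands.
# Helper names are hypothetical; the PROJECT:REGION:INSTANCE layout is the
# one the commands above pass to Cloud Run.

# Strip the trailing zone suffix: us-central1-c -> us-central1
zone_to_region() {
  echo "${1%-*}"
}

# Join project, region, and instance name with colons.
connection_name() {
  echo "$1:$2:$3"
}

# Example with a made-up project ID:
REGION=$(zone_to_region "us-central1-c")
connection_name "my-project" "$REGION" "instapuller"
```

The authoritative value can be read back from the instance with gcloud sql instances describe instapuller --format='value(connectionName)'.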

Open both URLs in a browser to verify that they work!

NOTE: the first load may be slow because the application creates the database on the first request.
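Because the first request may be slow or time out, a scripted check is easier with a small retry loop. This is a sketch in POSIX shell; the retry helper and the SERVICE_URL variable are hypothetical names, not part of this repo.

```shell
# Hypothetical helper: run a command up to N times, sleeping between tries.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (SERVICE_URL is a placeholder for one of the URLs printed above):
#   retry 5 10 curl --fail --silent --output /dev/null "$SERVICE_URL"
```

The helper returns 0 as soon as the command succeeds and 1 if every attempt fails, so it can gate later steps in a script.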

OPTIONAL: Verify that Cloud Build pipelines work

gcloud builds submit --substitutions=_DEPLOY_ENVIRONMENT=staging,SHORT_SHA=$(date +%Y%m%d_%H%M%S)
gcloud builds submit --substitutions=_DEPLOY_ENVIRONMENT=prod,SHORT_SHA=$(date +%Y%m%d_%H%M%S)

Then revisit the application URLs. They should look unchanged.
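SHORT_SHA is a default substitution that Cloud Build fills in from the commit on triggered builds. A manual gcloud builds submit has no commit to take it from, which is why the commands above pass a timestamp instead. How cloudbuild.yaml uses the value is not shown here. A sketch of generating the stand-in once so it can be inspected and reused (manual_build_tag is a hypothetical name):

```shell
# Hypothetical helper: build a timestamp stand-in for SHORT_SHA,
# in the same YYYYMMDD_HHMMSS format the commands above use.
manual_build_tag() {
  date +%Y%m%d_%H%M%S
}

# Compute the value once so it can be echoed and reused.
TAG=$(manual_build_tag)
echo "Substitutions: _DEPLOY_ENVIRONMENT=staging,SHORT_SHA=${TAG}"
```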

Connect your GitHub repo

For this, you'll use the Cloud Build Triggers page in the GCP console.

See the docs for Connecting to source repositories

  1. Use the "Cloud Build GitHub App" option and grant access if asked to do so.
  2. Select your copy of the instapuller repo
  3. On the "create a push trigger" step, click Skip for now (we'll add the trigger via gcloud)

Add triggers

# On commit to `main`, deploy to prod:
gcloud beta builds triggers create github \
   --repo-name=instapuller \
   --repo-owner=${GITHUB_USER} \
   --branch-pattern="^main$" \
   --build-config="cloudbuild.yaml" \
   --description="On commit to main, deploy to prod service" \
   --substitutions="_DEPLOY_ENVIRONMENT=prod"

# On commit to `staging`, deploy to staging:
gcloud beta builds triggers create github \
   --repo-name=instapuller \
   --repo-owner=${GITHUB_USER} \
   --branch-pattern="^staging$" \
   --build-config="cloudbuild.yaml" \
   --description="On commit to staging, deploy to staging service" \
   --substitutions="_DEPLOY_ENVIRONMENT=staging"
   

Test it out! Make a commit to branch staging and push to GitHub; you should see your changes reflected on your staging service. Merge that branch to main and you should see the changes on prod.
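The branch patterns in the triggers are anchored regular expressions, so only the exact branch names main and staging start a deployment. The following sketch illustrates the matching. Cloud Build evaluates patterns with RE2 syntax; grep -E is used here as a stand-in, which behaves the same for simple anchored patterns like these. The branch_matches helper and the extra branch names are hypothetical.

```shell
# Sketch: check a branch name against a trigger's branch pattern.
branch_matches() {
  # $1 = pattern, $2 = branch name
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Show which trigger, if any, each branch would fire.
for branch in main staging main-hotfix my-staging; do
  if branch_matches '^main$' "$branch"; then
    echo "$branch -> prod trigger"
  elif branch_matches '^staging$' "$branch"; then
    echo "$branch -> staging trigger"
  else
    echo "$branch -> no trigger"
  fi
done
```

Without the ^ and $ anchors, a branch such as main-hotfix would also match the prod pattern.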

Bonus: configure preview environments for each pull request


Running Locally

See docs > runlocally.md

[TODO: document the GCF functions]
