go-google-scraper-challenge's Introduction

Introduction

A project for Nimble Go Internal Certification on Web

Staging Production

Project Setup

Prerequisites

Development

Create an ENV file

A .env file must exist before the development server can start.

  • Copy the .env.example file and rename the copy to .env

Build dependencies

  • air is used for live reloading

  • goose is used for database migration.

  • forego manages Procfile-based applications.

  • goview is used for the front-end part

These tools need to be built as binaries in $GOPATH:

make install-dependencies

Start development server

make dev

The application runs locally at http://localhost:8080

Test

Execute all unit tests:

make test

Migration

Create migration

make migration/create MIGRATION_NAME={migration name}

List the migration status

make migration/status

Migrate the database

make db/migrate

Rollback the migration

make db/rollback

go-google-scraper-challenge's Issues

Refactor the make request functions for controller

Why

There are now many make-request functions (authenticated and unauthenticated, form and JSON requests), and they are getting messy, so we should refactor them.

Acceptance Criteria

  • All tests should pass

[Chore-7] Make sure database in test environment is the same as the development

Why

Currently, no migration runs before the tests. Instead, orm.RunSyncdb makes Beego ORM sync the database tables from the model structs, which might produce a table structure that differs from the one the migrations produce.
So we should either run the migrations before the tests, or generate the schema and load it before running the tests.

  • run migration before test
  • don't use orm.RunSyncdb

[Bug] The table with fewer results stretches out

Issue

When two result tables are displayed and one of them has fewer results, its table rows stretch out.

Expected

The table row height should remain the same regardless of the number of results displayed

Steps to reproduce

  1. Go to the production site
  2. Login as
    email: [email protected]
    password: 1234567890
  3. See the result table on the main page

[Backend] As a user, I can search the URL result by Regex.

Why

As a user, I can search the URL result by Regex

Acceptance Criteria

  • query result URLs by Regex

Sample question this query should be able to answer
How many keywords have URLs in stored reports with 2 or more “/” or 1 or more “>”.
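
The sample question above could be answered along these lines. This is only a minimal sketch: it assumes the stored URLs are available in memory as a map from keyword to URLs, and the names `urlPattern` and `countKeywordsMatching` are hypothetical. A real implementation would more likely push the regular expression down into the database query.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlPattern matches a URL that contains two or more "/" or at least one ">".
var urlPattern = regexp.MustCompile(`/.*/|>`)

// countKeywordsMatching returns how many keywords have at least one URL
// that matches the given pattern.
func countKeywordsMatching(urlsByKeyword map[string][]string, pattern *regexp.Regexp) int {
	count := 0
	for _, urls := range urlsByKeyword {
		for _, u := range urls {
			if pattern.MatchString(u) {
				count++
				break // one matching URL is enough for this keyword
			}
		}
	}
	return count
}

func main() {
	// Hypothetical stored reports: keyword -> URLs found for it.
	reports := map[string][]string{
		"golang": {"https://go.dev/doc"},
		"plain":  {"example.com"},
		"arrow":  {"example.com>docs"},
	}
	fmt.Println(countKeywordsMatching(reports, urlPattern)) // prints 2
}
```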

[API] As a user, I can search the result by keyword.

Why

Users should be able to search the result by keyword

Acceptance Criteria

  • the /reports API should accept a keyword to search the result's keyword
  • respond with a list of search results that match the search keyword
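
One possible shape for this endpoint is sketched below. It uses the standard library's net/http rather than the project's web framework so that it stays self-contained, and it assumes a query parameter named `keyword`, a case-insensitive substring match, and an in-memory slice of a hypothetical `Report` type. None of these details come from the issue itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Report is a hypothetical, trimmed-down search result.
type Report struct {
	ID      int    `json:"id"`
	Keyword string `json:"keyword"`
}

// filterByKeyword keeps the reports whose keyword contains q, ignoring case.
// An empty q matches every report.
func filterByKeyword(reports []Report, q string) []Report {
	q = strings.ToLower(strings.TrimSpace(q))
	matched := []Report{}
	for _, r := range reports {
		if strings.Contains(strings.ToLower(r.Keyword), q) {
			matched = append(matched, r)
		}
	}
	return matched
}

// reportsHandler serves GET /reports?keyword=... from an in-memory slice.
func reportsHandler(reports []Report) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		matched := filterByKeyword(reports, r.URL.Query().Get("keyword"))
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(matched); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	}
}

func main() {
	reports := []Report{{ID: 1, Keyword: "Golang tutorial"}, {ID: 2, Keyword: "Ruby gems"}}

	// Exercise the handler in-process instead of starting a server.
	req := httptest.NewRequest(http.MethodGet, "/reports?keyword=go", nil)
	rec := httptest.NewRecorder()
	reportsHandler(reports)(rec, req)
	fmt.Print(rec.Body.String())
}
```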

[Backend] Google scraper

  • use the scraper lib https://github.com/gocolly/colly
  • scrape the Google search result for the keyword
  • workaround to handle search API rate limit
  • gather the following info
    • Number of AdWords advertisers in the top position.
    • Total number of AdWords advertisers on the page.
    • URLs of the AdWords advertisers in the top position.
    • Number of the non-AdWords results on the page.
    • URLs of the non-AdWords results on the page.
    • Total number of links (all of them) on the page.
    • HTML code of the page/cache of the page.

Use JSON API structure on the API response

Why

Mobile application clients often use a library to parse JSON responses in the JSON API format, so the responses of all API endpoints should follow that format.

Acceptance Criteria

  • Update the /oauth/clients API response to follow JSON API
  • Update the /register API response to follow JSON API
  • Update the /login API response to follow JSON API

Resources

JSON API
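
As an illustration of the target format, here is a minimal sketch of a JSON API document wrapping a login response. The top-level `data` member with `id`, `type` and `attributes` follows the JSON API specification. The resource type `token`, the attribute names and the `renderLogin` helper are assumptions made for this example only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Resource is a JSON API resource object: "id" and "type" plus "attributes".
type Resource struct {
	ID         string      `json:"id"`
	Type       string      `json:"type"`
	Attributes interface{} `json:"attributes"`
}

// Document is a JSON API top-level document carrying primary data.
type Document struct {
	Data Resource `json:"data"`
}

// TokenAttributes is a hypothetical payload for the /login response.
type TokenAttributes struct {
	AccessToken  string `json:"access_token"`
	RefreshToken string `json:"refresh_token"`
}

// renderLogin wraps the tokens in a JSON API document and serializes it.
func renderLogin(id string, tokens TokenAttributes) (string, error) {
	doc := Document{Data: Resource{ID: id, Type: "token", Attributes: tokens}}
	body, err := json.Marshal(doc)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	out, err := renderLogin("1", TokenAttributes{AccessToken: "abc", RefreshToken: "def"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```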

Setup CI

  • add test
  • setup CI on Github Action

[API] As a user, I can see a detail of the search result

Why

As a user, I can see a detail of the search result so that I know

  • Number of AdWords advertisers in the top position.
  • Total number of AdWords advertisers on the page.
  • URLs of the AdWords advertisers in the top position.
  • Number of the non-AdWords results on the page.
  • URLs of the non-AdWords results on the page.
  • Total number of links (all of them) on the page
  • HTML code of the page/cache of the page.

Acceptance Criteria

  • Add an API results/:id

JSON Response should include the following fields

  • id
  • keyword
  • array of non-ad links
  • array of ad link objects
    • position
    • type
  • page cache
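
The fields listed above might map to a response struct like the following sketch. The JSON key names and the extra `link` field on each ad link are assumptions, since the issue only names the fields loosely.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AdLink describes one ad link. "link" is an assumed field; the issue only
// lists position and type.
type AdLink struct {
	Link     string `json:"link"`
	Position string `json:"position"`
	Type     string `json:"type"`
}

// ResultDetail mirrors the fields the detail response should include.
type ResultDetail struct {
	ID         int      `json:"id"`
	Keyword    string   `json:"keyword"`
	NonAdLinks []string `json:"non_ad_links"`
	AdLinks    []AdLink `json:"ad_links"`
	PageCache  string   `json:"page_cache"`
}

// encodeDetail serializes a result detail to JSON.
func encodeDetail(d ResultDetail) (string, error) {
	body, err := json.Marshal(d)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	out, err := encodeDetail(ResultDetail{
		ID:         1,
		Keyword:    "golang",
		NonAdLinks: []string{"https://go.dev"},
		AdLinks:    []AdLink{{Link: "https://ads.example.com", Position: "top", Type: "link"}},
		PageCache:  "cached page placeholder",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```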

[Backend] Store Google search result on the DB

  • add results table in DB
  • store the result to DB

The model structure:

type Result struct {
	Keyword string
	Adwords []Adword
	NonAdLinks []string
}

type Adword struct {
	Link string
	AdType string
}

const (
	TOP_IMG_AD = "TOP_IMG_AD"
	TOP_LINK_AD = "TOP_LINK_AD"
	SIDE_IMAGE_AD = "SIDE_IMAGE_AD"
	BOTTOM_AD = "BOTTOM_AD"
)

[Web] As a user, I can upload a CSV file of keywords

Why

As a user, I can upload a CSV file of keywords

  • the file can be of any size from 1 to 1,000 keywords

Acceptance Criteria

  • add upload form
  • extract keywords from file
  • upload form validation
  • display an error message when a user tries to upload a file with > 1000 keywords
  • style for upload form
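
The extraction and validation steps could look roughly like the sketch below. It assumes that every non-empty CSV cell is one keyword, which the issue does not specify, and the function and error names are hypothetical.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
)

const maxKeywords = 1000

var (
	errNoKeywords      = errors.New("file contains no keywords")
	errTooManyKeywords = errors.New("file contains more than 1,000 keywords")
)

// extractKeywords reads every non-empty cell of a CSV stream as one keyword
// and rejects files with fewer than 1 or more than maxKeywords keywords.
func extractKeywords(r io.Reader) ([]string, error) {
	reader := csv.NewReader(r)
	reader.FieldsPerRecord = -1 // allow rows with differing column counts

	var keywords []string
	for {
		record, err := reader.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		for _, cell := range record {
			if kw := strings.TrimSpace(cell); kw != "" {
				keywords = append(keywords, kw)
			}
		}
	}

	if len(keywords) == 0 {
		return nil, errNoKeywords
	}
	if len(keywords) > maxKeywords {
		return nil, errTooManyKeywords
	}
	return keywords, nil
}

// parseKeywords is a convenience wrapper for in-memory CSV content.
func parseKeywords(content string) ([]string, error) {
	return extractKeywords(strings.NewReader(content))
}

func main() {
	keywords, err := parseKeywords("golang\nruby,python\n")
	fmt.Println(keywords, err)
}
```

In an upload handler, the uploaded file would be passed to `extractKeywords` directly, and a returned error would become the form's error message.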

As a user I can register via API

Why

After the project was revamped from Beego to Gin, the UIs have not been migrated yet, so a registration API is needed to populate user accounts.

Acceptance Criteria

  • Add /register API
  • Provide a way to generate client ID and secret

Revamp the application and replace Beego with Gin

Why

After this IC was paused for a while, Nimble adopted the Gin framework, so we should update this IC to align with the Gin template that will be used to bootstrap client projects.
The existing UI features would be deprioritized in favor of API stories.

Acceptance Criteria

  • Replace the codebase with a fresh new application bootstrapped from the Gin template
  • Move any useful files along to the new codebase (so we can use them later)

Resources

[Web] As a user, I can see a detail of the search result

Why

As a user, I can see a detail of the search result so that I know

  • Number of AdWords advertisers in the top position.
  • Total number of AdWords advertisers on the page.
  • URLs of the AdWords advertisers in the top position.
  • Number of the non-AdWords results on the page.
  • URLs of the non-AdWords results on the page.
  • Total number of links (all of them) on the page
  • HTML code of the page/cache of the page.

Acceptance Criteria

  • Add a link from the result link on the main page to the result detail page

Display

  • Links on the page separated by Adwords and non-Adwords with the total number of them
  • Adwords links by position with the total number of them
  • Non-Adwords links with the total number of them
  • A link to display the page cache

As a user I can log in via website

  • login page
  • login form
  • reuse form style from sign up
  • set current user to session when successfully logged in
  • redirect to main page when successfully logged in
  • display errors if failed to login

[API] As a user, I can upload a CSV file of keywords

Why

Users should be able to upload a CSV file of keywords via API

Acceptance Criteria

  • Add an API endpoint with the POST method which receives a CSV file
  • get keywords from the file and validate that the number of keywords is from 1 to 1,000
  • respond with an error if validation fails
  • save keywords to DB if validation passes

Note: the scraping part would be handled on another backend story

Improve error response rendering

Why

To make the code cleaner, we should centralize the error title, detail and code in one place

Acceptance Criteria

  • Refactor the error response rendering
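
One way to centralize these values is a single catalog keyed by error code, sketched below. Every code, title, detail and status here is a hypothetical placeholder and not the project's actual error set.

```go
package main

import (
	"fmt"
	"net/http"
)

// ErrorCode identifies one kind of API failure.
type ErrorCode string

const (
	CodeInvalidParam ErrorCode = "invalid_param"
	CodeUnauthorized ErrorCode = "unauthorized"
	CodeUnknown      ErrorCode = "unknown"
)

// ErrorInfo groups everything an error response needs in one place.
type ErrorInfo struct {
	Code   ErrorCode
	Title  string
	Detail string
	Status int
}

// errorCatalog is the single source of truth for error responses.
var errorCatalog = map[ErrorCode]ErrorInfo{
	CodeInvalidParam: {CodeInvalidParam, "Invalid parameter", "One or more parameters are invalid.", http.StatusBadRequest},
	CodeUnauthorized: {CodeUnauthorized, "Unauthorized", "A valid access token is required.", http.StatusUnauthorized},
}

// lookupError returns the catalog entry for code, or a generic fallback
// when the code is not registered.
func lookupError(code ErrorCode) ErrorInfo {
	if info, ok := errorCatalog[code]; ok {
		return info
	}
	return ErrorInfo{CodeUnknown, "Internal server error", "Something went wrong.", http.StatusInternalServerError}
}

func main() {
	info := lookupError(CodeUnauthorized)
	fmt.Println(info.Status, info.Title)
}
```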

Fix: Incomplete response from file upload API `POST /results`

Issue

When the file contains more than one keyword, the response contains only one result

Expected

The number of results in the response should be the same as the number of keywords in the file

Steps to reproduce

  1. Call the file upload API POST /results with a file containing more than one keyword
  2. Observe the response

As a user I can login via API

Why

As a user, I can log in via API with a valid client id and secret so that I can use keywords to search

Acceptance Criteria

  • Add an API /login
  • which should return an access token and a refresh token

[Web] Auto log user in after signup

Why

Automatically logging users in after they sign up makes things easier for new users

Acceptance Criteria

  • after signup, the user should stay logged in and be redirected to the main page

Setup CD

  • deploy to staging and production
  • setup the CD on Github Action

[Improvement] Move authenticated request validation to middleware

Why

Currently, a function in the base controller ensures that the request carries a valid authentication header. However, that function must be called in every private controller action, and each action also needs a pair of test cases covering the valid and invalid request header.

Some examples can be found in varun-api and recruitmate-web

Acceptance Criteria

  • Use the middleware instead of the guard function
  • Adjust the routes into route groups
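
The idea can be sketched with the standard library's net/http, used here in place of the project's framework so the example stays self-contained. The bearer-token header format, the `tokenValidator` type and the `/results` route are assumptions for illustration. The private routes are grouped on their own mux and wrapped once, so individual actions no longer call a guard function.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// tokenValidator reports whether a bearer token is acceptable (hypothetical).
type tokenValidator func(token string) bool

// bearerToken extracts the token from an "Authorization: Bearer <token>" value.
func bearerToken(header string) (string, bool) {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) || len(header) == len(prefix) {
		return "", false
	}
	return header[len(prefix):], true
}

// requireAuth wraps next and rejects requests lacking a valid bearer token.
func requireAuth(valid tokenValidator, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token, ok := bearerToken(r.Header.Get("Authorization"))
		if !ok || !valid(token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request through the protected route group and
// returns the resulting status code.
func statusFor(authHeader string) int {
	// All private routes live on one mux that is wrapped a single time.
	private := http.NewServeMux()
	private.HandleFunc("/results", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	handler := requireAuth(func(t string) bool { return t == "secret" }, private)

	req := httptest.NewRequest(http.MethodGet, "/results", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("Bearer secret"), statusFor("")) // prints 200 401
}
```

With this shape, the valid and invalid header cases are tested once against the middleware instead of once per controller action.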

[Chore] Refactor form validation

Why

Refactor the form validations with these constraints in mind:

  • the current app shows only one form error at a time
  • SetError returns the error it has just set; should we handle that return value and check whether the error was actually set?
// SetError Set error message for one field in ValidationError
func (v *Validation) SetError(fieldName string, errMsg string) *Error {
	err := &Error{Key: fieldName, Field: fieldName, Tmpl: errMsg, Message: errMsg}
	v.setError(err)
	return err
}
  • prefer separating validations into functions instead of exposing them in the Valid function
  • prefer setting the error within each validation function
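
The last two preferences, together with reporting every form error, could be approached as in the sketch below. It is plain Go and does not use the validation library quoted above. The form fields, rules and messages are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// FieldError is one validation failure tied to a form field.
type FieldError struct {
	Field   string
	Message string
}

// RegistrationForm is a hypothetical form with two fields.
type RegistrationForm struct {
	Email    string
	Password string
}

// validator checks one rule and returns nil when the rule passes.
type validator func(f RegistrationForm) *FieldError

// validateEmail builds its own error when the email looks invalid.
func validateEmail(f RegistrationForm) *FieldError {
	if !strings.Contains(f.Email, "@") {
		return &FieldError{Field: "Email", Message: "must be a valid email address"}
	}
	return nil
}

// validatePassword builds its own error when the password is too short.
func validatePassword(f RegistrationForm) *FieldError {
	if len(f.Password) < 6 {
		return &FieldError{Field: "Password", Message: "must be at least 6 characters"}
	}
	return nil
}

// Validate runs every validator and collects all failures, so the form can
// display more than one error at a time.
func (f RegistrationForm) Validate() []FieldError {
	var errs []FieldError
	for _, check := range []validator{validateEmail, validatePassword} {
		if e := check(f); e != nil {
			errs = append(errs, *e)
		}
	}
	return errs
}

func main() {
	for _, e := range (RegistrationForm{Email: "nope", Password: "123"}).Validate() {
		fmt.Printf("%s: %s\n", e.Field, e.Message)
	}
}
```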

Acceptance Criteria

  • all test should pass

Setup Cron job to handle scraping

Why

After the application was revamped and Beego was replaced with Gin, the previous Beego task that handled search result scraping no longer works, so it needs to be replaced with a background job.
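
A background job of this kind can be approximated with a ticker loop from the standard library, as in the sketch below. This is not a cron library, and `runEvery`, `scrapePending` and the intervals are hypothetical stand-ins chosen for the example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runEvery calls job once per interval until ctx is cancelled and returns
// how many times the job ran.
func runEvery(ctx context.Context, interval time.Duration, job func()) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	runs := 0
	for {
		select {
		case <-ctx.Done():
			return runs
		case <-ticker.C:
			job()
			runs++
		}
	}
}

// scrapePending runs scrape for every pending keyword and returns the ones
// that failed, so they can be retried on the next tick.
func scrapePending(pending []string, scrape func(string) error) []string {
	var failed []string
	for _, kw := range pending {
		if err := scrape(kw); err != nil {
			failed = append(failed, kw)
		}
	}
	return failed
}

func main() {
	pending := []string{"golang", "gin"}

	// A short timeout keeps the demo finite; a real job would run until shutdown.
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	runEvery(ctx, 10*time.Millisecond, func() {
		pending = scrapePending(pending, func(kw string) error {
			fmt.Println("scraping", kw) // stand-in for the real scraper
			return nil
		})
	})
}
```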

Acceptance Criteria
