Contributors: goose97

google-scraper-ruby's Issues

[Backend] As a User, I can upload CSV keyword files

Backend for #8

Why

Authenticated users should be able to provide keywords to scrape. To achieve that, we provide an interface for users to upload CSV files.

Acceptance Criteria

  • Authenticated users must be able to upload CSV files containing keywords
  • Keyword files should be validated:
    • A keyword file must contain between 1 and 1000 keywords
    • A keyword file must not exceed 5 MB in size
  • After upload, all keywords in the file should be persisted in the database
  • Do NOT process the keyword file. This will be tackled in #10
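A minimal sketch of the validation rules above, in plain Ruby. The method name validate_keyword_csv, the constants, the treatment of every non-empty cell as one keyword, and the reading of 5 MB as 5 * 1024 * 1024 bytes are assumptions for illustration, not part of the issue.

```ruby
require 'csv'

# Hypothetical limits derived from the acceptance criteria.
MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024
KEYWORD_COUNT_RANGE = (1..1000).freeze

# Returns a list of error messages; an empty list means the file is valid.
def validate_keyword_csv(content)
  return ['file exceeds the 5 MB size limit'] if content.bytesize > MAX_FILE_SIZE_BYTES

  # Assumption: every non-empty cell is one keyword.
  keywords = CSV.parse(content).flatten.compact.map(&:strip).reject(&:empty?)
  return [] if KEYWORD_COUNT_RANGE.cover?(keywords.size)

  ["file must contain between 1 and 1000 keywords, got #{keywords.size}"]
rescue CSV::MalformedCSVError => e
  ["file is not valid CSV: #{e.message}"]
end
```

Checking the size before parsing avoids loading an oversized file into the CSV parser at all.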

[UI] As a User, I can view the details of my keywords

Frontend for #21

Why

The keyword list view only shows basic details about the keyword. Users need a way to see the full details of a given keyword.

Acceptance Criteria

  • Users MUST have a way to navigate to the detailed view from the keyword list view
  • Users MUST have a way to navigate to the keyword list view from the detailed view

[API] As a User, I can search across all uploaded keywords and results

API for #25

Why

Provide a tool for users to gain a broader view across all keywords. This will help users extract insights and trends from their keyword data. An API enables third-party applications to leverage this feature.

Acceptance Criteria

  • The API accepts GET method with the path as: links/search
  • The API should respond with:
    • The number of URLs that match the query
    • Their corresponding keywords

[Backend] As a User, I can only see details of my keywords

Why

Users must be authenticated before using the app so that we can segregate users' data. A user can only see the details of their own uploaded keywords.

Acceptance Criteria

  • For an unauthenticated user, redirect them to the sign in page if they visit the keyword details view

[Chore] Set up CD pipelines

Why

To save the time spent on manual deployments whenever code is merged into the develop or main branch

Acceptance Criteria

  • Once a PR is merged into the develop branch, it is automatically deployed to the staging environment
  • Once a PR is merged into the main branch, it is automatically deployed to the production environment

[API] As a User, I can sign in with username and password

Why

Third-party applications need to authenticate before interacting with our application

Acceptance Criteria

  • The API accepts POST method with the path as: accounts/sign_in
  • The API requires username and password
  • Responds with an access token if the credentials are correct. The access token can be used to authenticate subsequent requests
  • Responds with a 4xx error if the credentials don't match

[Backend] As a User, I can view the list of my uploaded keywords

Backend for #12

Why

Users should be able to view their uploaded keywords list

Acceptance Criteria

  • Users should be able to view the list of uploaded keywords

Notes

  • The response will not (yet) contain information about scraping results. This information will be added later once we implement the keyword processing logic

[Backend] As a User, I can view the details of my keywords

Why

The keyword list view only shows basic details about the keyword. Users need a way to see the full details of a given keyword.

Acceptance Criteria

  • Implement a separate view to see keyword details
  • The view MUST provide:
    • Number of AdWords advertisers in the top position
    • Total number of AdWords advertisers on the page
    • URLs of the AdWords advertisers in the top position
    • Number of the non-AdWords results on the page
    • URLs of the non-AdWords results on the page
    • Total number of links (all of them) on the page
    • A view of the page (using the cached HTML in #14)

[Backend] As a User, I can sign up with username and password

Backend for #5

Why

The application requires user authentication to correctly separate data between different users. To achieve that, users must have a way to register themselves to the application.

Acceptance Criteria

  • Users must be able to register an account
  • Upon successful registration, users should be directed to the dashboard page
  • Upon unsuccessful registration, users should be notified of the failure reason

[Backend] As a User, after my keywords have been uploaded, they should be processed immediately

Why

Keywords should be processed immediately after a successful upload

Acceptance Criteria

  • After being persisted in the database, keywords should be processed immediately via a background job
  • The concurrency of the processing pipeline must be constrained
  • A retry mechanism should be implemented
  • The implementation should have test cases

Implementation Details

  • Use Sidekiq for background job processing
  • After a keyword is successfully processed, users should be notified about such events. However, this issue won't cover it
  • After too many failed attempts, a keyword should be marked as permanently failed. Users should be notified about such events. However, this issue won't cover it
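To illustrate what a constrained processing pipeline means, here is a minimal sketch using only Ruby's standard library: a fixed pool of worker threads drains a shared queue, so no more than `concurrency` keywords are in flight at once. In the real application this limit would presumably come from Sidekiq's own concurrency setting. The method name process_with_limit and its block interface are hypothetical.

```ruby
# Processes keywords with at most `concurrency` running at the same time.
# The block stands in for the (not yet implemented) scraping logic.
def process_with_limit(keywords, concurrency:)
  queue = Queue.new
  keywords.each { |keyword| queue << keyword }
  queue.close # pop returns nil once the closed queue is empty

  results = Queue.new
  workers = Array.new(concurrency) do
    Thread.new do
      while (keyword = queue.pop)
        results << [keyword, yield(keyword)]
      end
    end
  end
  workers.each(&:join)

  # Collect the [keyword, result] pairs into a hash.
  Array.new(results.size) { results.pop }.to_h
end
```

A call such as `process_with_limit(%w[ruby rails], concurrency: 2) { |kw| kw.upcase }` returns a hash mapping each keyword to its block result.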

[API] As a User, I can get the details of my keywords

Why

Provide an API for third-parties to fetch details about a given keyword

Acceptance Criteria

  • The API accepts GET method with the path as: keywords/:keyword_id
  • The API MUST return all the information listed in #21
  • Do NOT implement authentication. This will be added later once we set up the authentication pipeline

[UI] As a User, I can search across all uploaded keywords and results

Frontend for #25

Why

Provide a tool for users to gain a broader view across all keywords. This will help users extract insights and trends from their keyword data.

Acceptance Criteria

  • Implement a search box to type query input
  • Implement a dropdown to select query type
  • If the query succeeds, show how many URLs match the given query and their corresponding keywords
  • Notify users in case the query encounters any error or timeout

[UI] As a User, I can view the list of my uploaded keywords

Frontend for #11

Why

Users should be able to view their uploaded keywords list

Acceptance Criteria

  • Provide a function interface
  • The UI should be able to display multiple keywords at once (preferably using a table layout)
  • The UI should implement pagination
  • The UI should support sorting by keyword (optional)

[UI] As a User, I can upload CSV keyword files

Frontend for #9

Why

Authenticated users should be able to provide keywords to scrape. To achieve that, we provide an interface for users to upload CSV files.

Acceptance Criteria

  • Implement a button for users to upload files
  • Implement a file type constraint (only allow CSV files)

[Chore] Setup project using Rails template

Why

Quickly bootstraps the project and ensures it meets the company standard. It also keeps the project up-to-date with existing tools, dependencies and conventions

Template: Nimble Rails template

Acceptance Criteria

  • A project with a bare-bone structure for Rails development is generated
  • Can start the Rails server

Design

N/A

Resources

N/A

[API] As a User, I can only see my keywords

Why

Users must be authenticated before using the app so that we can segregate users' data. A user can only see and interact with their own uploaded keywords.

Acceptance Criteria

  • For an unauthenticated user, return an unauthenticated HTTP error status
  • Use JSON Web Tokens as the authentication scheme

Affected features:

  • List keywords
  • See keyword details
  • Search keyword results
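The JSON Web Token scheme named in the acceptance criteria could look roughly like the sketch below. It is a hand-rolled HS256 example built on Ruby's standard library, meant only to show the moving parts; a real implementation would more likely rely on a maintained JWT library. The helper names, the claims used (sub, exp), and the one-hour lifetime are all assumptions.

```ruby
require 'json'
require 'openssl'

def base64url_encode(bytes)
  [bytes].pack('m0').tr('+/', '-_').delete('=')
end

def base64url_decode(text)
  padded = text + ('=' * ((4 - (text.length % 4)) % 4))
  padded.tr('-_', '+/').unpack1('m0')
end

def sign(data, secret)
  base64url_encode(OpenSSL::HMAC.digest('SHA256', secret, data))
end

# Issues a token for a user; `ttl` (seconds) is a hypothetical default.
def issue_token(user_id, secret, now: Time.now.to_i, ttl: 3600)
  header = base64url_encode(JSON.generate({ alg: 'HS256', typ: 'JWT' }))
  payload = base64url_encode(JSON.generate({ sub: user_id, exp: now + ttl }))
  signing_input = "#{header}.#{payload}"
  "#{signing_input}.#{sign(signing_input, secret)}"
end

# Returns the user id, or nil when the request should be rejected
# with the unauthenticated HTTP error status.
def authenticate(token, secret, now: Time.now.to_i)
  header, payload, signature = token.to_s.split('.')
  return nil unless header && payload && signature
  return nil unless OpenSSL.secure_compare(sign("#{header}.#{payload}", secret), signature)

  claims = JSON.parse(base64url_decode(payload))
  claims['exp'].to_i > now ? claims['sub'] : nil
rescue ArgumentError, JSON::ParserError
  nil
end
```

The signature is verified before the payload is decoded, so tampered or expired tokens never reach the keyword queries.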

[UI] As a User, I can sign out of my account

Frontend for #53

Why

Authenticated users should be able to sign out from their account.

Acceptance Criteria

  • Implement a sign out button
  • The sign out button must be visible on every page (except the sign in and sign up pages), preferably in the page header

[Backend] Scrape keyword data from the Google Search page

Why

Scraping keyword data is our core logic

Acceptance Criteria

  • Given a keyword, it should answer:
    • Number of AdWords advertisers in the top position
    • Total number of AdWords advertisers on the page
    • URLs of the AdWords advertisers in the top position
    • Number of the non-AdWords results on the page
    • URLs of the non-AdWords results on the page
    • All search entries on the page. Each entry includes its kind (ads / non_ads), position (top / bottom / nil) and urls (one entry can have multiple urls)
    • Total number of links (all of them) on the page
    • HTML code of the page/cache of the page
  • All results MUST be persisted in the database
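As a rough sketch of how the answers listed above could be derived once the search entries have been extracted, the example below summarizes a list of entries. The SearchEntry struct mirrors the kind / position / urls description in the criteria, but the struct, the summarize method, and the result keys are hypothetical. HTML parsing is deliberately left out, so the total link count is passed in as a parameter.

```ruby
# One search entry as described above: kind (:ads / :non_ads),
# position (:top / :bottom / nil) and a list of URLs.
SearchEntry = Struct.new(:kind, :position, :urls, keyword_init: true)

# Derives the per-keyword figures from already-extracted entries.
def summarize(entries, total_link_count:)
  ads = entries.select { |entry| entry.kind == :ads }
  top_ads = ads.select { |entry| entry.position == :top }
  non_ads = entries.select { |entry| entry.kind == :non_ads }

  {
    top_ads_count: top_ads.size,
    total_ads_count: ads.size,
    top_ads_urls: top_ads.flat_map(&:urls),
    non_ads_count: non_ads.size,
    non_ads_urls: non_ads.flat_map(&:urls),
    total_link_count: total_link_count
  }
end
```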

[Backend] As a User, my uploaded keywords are processed immediately

Prerequisites: #13 #14

Why

Processing uploaded keywords immediately provides a smoother, snappier interface and enhances the application UX

Acceptance Criteria

  • Uploaded keywords are converted to Sidekiq jobs and enqueued immediately
  • Sidekiq jobs are distributed to workers and processed immediately (the keyword processing logic will be handled by #14)
  • Retry mechanism MUST be implemented
  • If a job exceeds the maximum number of retries, it should be abandoned. Its status is updated in the database accordingly
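The retry-then-abandon behaviour described above can be sketched in plain Ruby as follows. Sidekiq ships its own retry handling, so this only illustrates the intended state transitions. The KeywordJob struct, the limit of 3 attempts, and the status names are assumptions, and the sketch retries immediately rather than after a delay.

```ruby
# Hypothetical retry limit; the issue does not specify a number.
MAX_ATTEMPTS = 3

# In-memory stand-in for a keyword record with a status column.
KeywordJob = Struct.new(:keyword, :status, :attempts)

# Runs the block until it succeeds or the attempt limit is reached.
# Returns the block's result, or nil when the job is abandoned.
def run_with_retries(job, max_attempts: MAX_ATTEMPTS)
  job.status = :processing
  begin
    job.attempts += 1
    result = yield(job.keyword)
    job.status = :succeeded
    result
  rescue StandardError
    retry if job.attempts < max_attempts
    job.status = :failed # abandoned: maximum number of retries exceeded
    nil
  end
end
```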

[Backend] As a User, I can only see my keywords

Why

Users must be authenticated before using the app so that we can segregate users' data. A user can only see and interact with their own uploaded keywords.

Acceptance Criteria

  • For an unauthenticated user, redirect them to the sign in page
  • Add user_id to the keywords table

Affected features:

  • List keywords
  • See keyword details
  • Search keyword results

[Backend] As a User, I can sign in with username and password

Backend for #7

Why

The application requires user authentication to correctly separate data between different users. If one has already registered, they should be able to sign in.

Acceptance Criteria

  • Users must be able to sign in with registered credentials (username and password)
  • Upon successful sign in, users should be directed to the dashboard page
  • Upon unsuccessful sign in, users should be notified of the failure reason

[API] As a User, I can view the list of my uploaded keywords

Why

Allowing third-party applications to interact with us

Acceptance Criteria

  • Provide an API to fetch a user's uploaded keywords
  • Pagination must be implemented
  • Do NOT implement authentication. This will be added later once we set up the authentication pipeline
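One possible shape for a paginated response, sketched over an in-memory array. The paginate method, the default page size of 20, and the data / meta keys are assumptions for illustration; in a Rails application the slicing would normally happen in the database query.

```ruby
# Returns one page of items plus metadata a client can use to navigate.
def paginate(items, page:, per_page: 20)
  page = [page.to_i, 1].max # treat missing or invalid pages as page 1
  total_pages = (items.size.to_f / per_page).ceil

  {
    data: items.slice((page - 1) * per_page, per_page) || [],
    meta: {
      page: page,
      per_page: per_page,
      total_pages: total_pages,
      total_count: items.size
    }
  }
end
```

Requesting a page past the end returns an empty data list rather than an error, which keeps client code simple.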

[Backend] As a User, I can sign out of my account

Backend for #54

Why

Authenticated users should be able to sign out from their account.

Acceptance Criteria

  • Authenticated users must be able to sign out from their account
  • After signing out, users must be redirected to the sign in page

[Backend] As a User, I can search across all uploaded keywords and results

Backend for #26

Why

Provide a tool for users to gain a broader view across all keywords. This will aid users in extracting insights, trends from their keyword data

Acceptance Criteria

  • Support these queries
    • Exact match: e.g. how many times does the apple.com URL appear?
    • Partial match: e.g. how many URLs contain the word ruby?
    • Pattern match: provide a Regex-subset syntax to perform complex queries
  • A query should return how many URLs satisfy the predicate and also what are their corresponding keywords

Notes

Technically, all three types of queries could be done with the pattern match. However, we reserve pattern matches for complex queries only, so as not to confuse basic users.
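The three query types could be sketched like this over an in-memory hash of keyword to URLs. The search_urls method, its type and query parameters, and the result keys are hypothetical. The pattern branch uses Ruby's full Regexp for brevity, whereas the issue asks for a restricted Regex subset, so input would need to be limited further in practice.

```ruby
# `results` maps each keyword to the URLs found on its result page.
def search_urls(results, type:, query:)
  matcher =
    case type
    when :exact then ->(url) { url == query }
    when :partial then ->(url) { url.include?(query) }
    when :pattern
      regex = Regexp.new(query) # assumption: full Regexp, not a subset
      ->(url) { regex.match?(url) }
    else raise ArgumentError, "unknown query type: #{type}"
    end

  matches = results.flat_map do |keyword, urls|
    urls.select(&matcher).map { |url| { url: url, keyword: keyword } }
  end

  # How many URLs satisfy the predicate, and which keywords they belong to.
  { url_count: matches.size, keywords: matches.map { |match| match[:keyword] }.uniq }
end
```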

[API] As a User, I can upload CSV keyword files

Why

Third-party applications need an interface to upload keywords to our system

Acceptance Criteria

  • Provide an API to upload keyword files
  • Keyword files should be validated against the rules specified in #9. Return a 4xx error if the validation fails
  • On success, the API should respond with a hyperlink. This hyperlink can be polled continuously to keep track of the progress

[UI] As a User, I can sign in with username and password

Frontend for #6

Why

The application requires user authentication to correctly separate data between different users. If one has already registered, they should be able to sign in.

Acceptance Criteria

  • Implement a form with two attributes: email and password
  • Enable the submit button only if both inputs are filled (optional)
  • Show validation error in case of authentication error

[UI] As a User, I can sign up with username and password

Frontend for #4

Why

The application requires user authentication to correctly separate data between different users. To achieve that, users must have a way to register themselves to the application.

Acceptance Criteria

  • Implement a form with two attributes: email and password
  • Enable the submit button only if both inputs are filled (optional)
  • Show validation error in case of authentication error.

[Chore] Configure CI test automation

Why

Reduce manual testing time and enforce code testing before merging

Acceptance Criteria

  • Every time a pull request is opened, the test suite must run automatically (via GitHub Actions)

Design

N/A

Resources

N/A

[UI] As a User, my uploaded keywords are processed immediately

Why

Processing uploaded keywords immediately provides a smoother, snappier interface and enhances the application UX

Acceptance Criteria

  • After a file is uploaded successfully, the UI is updated to reflect the new scraping job
  • While the file is processed, the UI is updated in real-time to reflect the current status. For each keyword, it should display:
    • The current status: pending / processing / succeeded / failed
    • A link to see detailed information once the scraping succeeds

Notes

  • Retry keywords are keywords that have failed to scrape and have been scheduled for a retry (we don't support a retry status for now). Error keywords are keywords that have exceeded the maximum number of retries. See #15 for more details

[Chore] Set up Sidekiq for background job

Why

A background job system gives us better control over how we process users' keywords (concurrency control, job persistence, ...)

Acceptance Criteria

  • Can enqueue jobs to Sidekiq
  • Can dequeue jobs from Sidekiq
  • Sidekiq MUST provide persistence, meaning that if Sidekiq crashes, we won't lose any ongoing jobs
