google-scraper-ruby's Issues
[Backend] As a User, I can upload CSV keyword files
Backend for #8
Why
Authenticated users should be able to provide keywords to scrape. To achieve that, we provide an interface for users to upload CSV files.
Acceptance Criteria
- Authenticated users must be able to upload CSV files containing keywords
- Keyword files should be validated:
- A keyword file must contain between 1 and 1000 keywords
- A keyword file must not exceed 5 MB
- After upload, all keywords in the file should be persisted in the database
- Do NOT process the keyword file. This will be tackled in #10
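A minimal sketch of the validation rules above, in plain Ruby with the standard `csv` library. It assumes keywords may appear in any cell of any row, that blank cells are ignored, and that 5 MB means 5 * 1024 * 1024 bytes. `parse_keyword_file` and the constant names are hypothetical; in the Rails app these checks would more likely live in a model or form-object validation.

```ruby
require "csv"

# Hypothetical limits taken from the acceptance criteria.
MAX_FILE_SIZE = 5 * 1024 * 1024 # 5 MB, assuming binary megabytes
MIN_KEYWORDS = 1
MAX_KEYWORDS = 1000

# Returns [keywords, errors]. Every non-blank cell is treated as one keyword.
def parse_keyword_file(content)
  return [[], ["File must not exceed 5 MB"]] if content.bytesize > MAX_FILE_SIZE

  # CSV.parse yields an Array of rows; empty cells come back as nil.
  keywords = CSV.parse(content).flatten.compact.map(&:strip).reject(&:empty?)

  errors = []
  unless keywords.size.between?(MIN_KEYWORDS, MAX_KEYWORDS)
    errors << "File must contain between #{MIN_KEYWORDS} and #{MAX_KEYWORDS} keywords"
  end

  [keywords, errors]
rescue CSV::MalformedCSVError => e
  [[], ["Malformed CSV: #{e.message}"]]
end
```

For example, `parse_keyword_file("ruby,rails\nsidekiq\n")` returns the three keywords with no errors, while an empty file yields a single keyword-count error.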
[UI] As a User, I can view the details of my keywords
Frontend for #21
Why
The keyword list view only shows basic details about each keyword. Users need a way to see the full details of a given keyword.
Acceptance Criteria
- Users MUST have a way to navigate to the detailed view from the keyword list view
- Users MUST have a way to navigate to the keyword list view from the detailed view
[API] As a User, I can search across all uploaded keywords and results
API for #25
Why
Provide a tool for users to gain a broader view across all keywords. This will help users extract insights and trends from their keyword data. An API enables third-party applications to leverage this feature.
Acceptance Criteria
- The API accepts the GET method at the path links/search
- The API should respond with:
- The number of URLs matching the query
- Their corresponding keywords
[Backend] As a User, I can only see details of my keywords
Why
Users must be authenticated before using the app so that we can segregate users' data. A user can only see the details of their own uploaded keywords.
Acceptance Criteria
- For an unauthenticated user, redirect them to the sign in page if they visit the keyword details view
[Chore] Set up CD pipelines
Why
To save the time spent on manual deployments when code is merged into the develop/main branches
Acceptance Criteria
- Once a PR is merged into the develop branch, it is automatically deployed to the staging environment
- Once a PR is merged into the main branch, it is automatically deployed to the production environment
[API] As a User, I can sign in with username and password
Why
Third-party applications need to authenticate before interacting with our application
Acceptance Criteria
- The API accepts the POST method at the path accounts/sign_in
- The API requires a username and password
- Responds with an access token if the credentials are correct. The access token can be used to authenticate subsequent requests
- Responds with a 4xx error if the credentials don't match
[Backend] As a User, I can view the list of my uploaded keywords
Backend for #12
Why
Users should be able to view their uploaded keywords list
Acceptance Criteria
- Users should be able to view the list of uploaded keywords
Notes
- The response will not (yet) contain information about scraping results. This information will be added later, once we implement the keyword processing logic
[Backend] As a User, I can view the details of my keywords
Why
The keyword list view only shows basic details about each keyword. Users need a way to see the full details of a given keyword.
Acceptance Criteria
- Implement a separate view to see keyword details
- The view MUST provide:
- Number of AdWords advertisers in the top position
- Total number of AdWords advertisers on the page
- URLs of the AdWords advertisers in the top position
- Number of the non-AdWords results on the page
- URLs of the non-AdWords results on the page
- Total number of links (all of them) on the page
- A view of the page (using the cached HTML in #14)
[Backend] As a User, I can sign up with username and password
Backend for #5
Why
The application requires user authentication to correctly separate data between different users. To achieve that, users must have a way to register themselves to the application.
Acceptance Criteria
- Users must be able to register an account
- Upon successful registration, users should be directed to the dashboard page
- Upon unsuccessful registration, users should be notified of the failure reason
[Backend] As a User, after my keywords have been uploaded, they should be processed immediately
Why
Successfully uploaded keywords should be processed immediately
Acceptance Criteria
- Once persisted in the database, keywords should be processed immediately via a background job
- The concurrency of the processing pipeline must be constrained
- A retry mechanism should be implemented
- Should have test cases
Implementation Details
- Use Sidekiq for background job processing
- After a keyword is successfully processed, users should be notified of the event. However, this issue won't cover it
- After too many failed attempts, a keyword should be marked as permanently failed. Users should be notified of the event. However, this issue won't cover it
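Sidekiq caps concurrency on its own (for example through the `:concurrency:` key in `config/sidekiq.yml` or the `-c` CLI flag), so the pipeline would most likely be constrained there. To illustrate the idea only, here is a plain-Ruby sketch of a fixed-size worker pool draining a queue; `process_with_bounded_concurrency` is a hypothetical helper, not Sidekiq's API, and the block stands in for the keyword processing logic.

```ruby
# Drains `items` with at most `max_workers` threads running the block at once.
# Returns [results_by_item, peak_concurrency]. Items must not be nil/false,
# because a nil pop is how a worker learns the queue is exhausted.
def process_with_bounded_concurrency(items, max_workers:, &block)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once closed and empty, Queue#pop returns nil instead of blocking

  results = Queue.new
  lock = Mutex.new
  in_flight = 0
  peak = 0

  workers = Array.new(max_workers) do
    Thread.new do
      while (item = queue.pop)
        lock.synchronize do
          in_flight += 1
          peak = [peak, in_flight].max
        end
        results << [item, block.call(item)]
        lock.synchronize { in_flight -= 1 }
      end
    end
  end
  workers.each(&:join)

  pairs = Array.new(results.size) { results.pop }
  [pairs.to_h, peak]
end
```

The pool size, not the number of keywords, bounds how many jobs run at the same time: six keywords with `max_workers: 2` never have more than two in flight.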
[API] As a User, I can get the details of my keywords
Why
Provide an API for third-parties to fetch details about a given keyword
Acceptance Criteria
- The API accepts the GET method at the path keywords/:keyword_id
- The API MUST return all the information listed in #21
- Do NOT implement authentication. This will be added later once we set up the authentication pipeline
[UI] As a User, I can search across all uploaded keywords and results
Frontend for #25
Why
Provide a tool for users to gain a broader view across all keywords. This will help users extract insights and trends from their keyword data
Acceptance Criteria
- Implement a search box for entering the query
- Implement a dropdown to select the query type
- If the query succeeds, show how many URLs match the given query and their corresponding keywords
- Notify users if the query encounters an error or times out
[UI] As a User, I can view the list of my uploaded keywords
Frontend for #11
Why
Users should be able to view their uploaded keywords list
Acceptance Criteria
- Provide a functional interface
- The UI should display multiple keywords at once (preferably using a table layout)
- The UI should implement pagination
- The UI should support sorting by keyword (optional)
[UI] As a User, I can upload CSV keyword files
Frontend for #9
Why
Authenticated users should be able to provide keywords to scrape. To achieve that, we provide an interface for users to upload CSV files.
Acceptance Criteria
- Implement a button for users to upload files
- Should implement a file type constraint (only allow CSV files)
[Chore] Setup project using Rails template
Why
Quickly bootstrap the project and ensure it meets the company standard. It also keeps the project up-to-date with existing tools, dependencies and conventions
Template: Nimble Rails template
Acceptance Criteria
- A project with a bare-bones structure for Rails development is generated
- Can start the Rails server
Design
N/A
Resources
N/A
[API] As a User, I can only see my keywords
Why
Users must be authenticated before using the app so that we can segregate users' data. A user can only see and interact with their own uploaded keywords.
Acceptance Criteria
- For an unauthenticated user, return an unauthenticated HTTP error status (401 Unauthorized)
- Use JSON Web Token (JWT) as the authentication scheme
Affected features:
- List keywords
- See keyword details
- Search keywords result
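A minimal sketch of issuing and verifying an HS256 JSON Web Token with only Ruby's standard libraries, to show what the scheme involves. A Rails app would more likely use the `jwt` gem (`JWT.encode` / `JWT.decode`) and read the token from an `Authorization: Bearer ...` header; the secret, the `user_id` claim and the helper names below are hypothetical. `OpenSSL.secure_compare` assumes Ruby 3.0 or later.

```ruby
require "base64"
require "json"
require "openssl"

# Hypothetical signing secret; a real app would load it from credentials or ENV.
JWT_SECRET = "change-me"

def base64url(data)
  Base64.urlsafe_encode64(data, padding: false)
end

# Builds base64url(header).base64url(payload).base64url(HMAC-SHA256 signature).
def encode_jwt(payload, secret = JWT_SECRET)
  header = { alg: "HS256", typ: "JWT" }
  signing_input = "#{base64url(header.to_json)}.#{base64url(payload.to_json)}"
  signature = OpenSSL::HMAC.digest("SHA256", secret, signing_input)
  "#{signing_input}.#{base64url(signature)}"
end

# Returns the payload Hash, or nil if the token is malformed, its signature
# does not match, or its "exp" claim (seconds since epoch) is in the past.
def decode_jwt(token, secret = JWT_SECRET, now: Time.now.to_i)
  header_b64, payload_b64, signature_b64, extra = token.to_s.split(".")
  return nil if signature_b64.nil? || extra

  expected = base64url(OpenSSL::HMAC.digest("SHA256", secret, "#{header_b64}.#{payload_b64}"))
  # Constant-time comparison so the check does not leak timing information.
  return nil unless OpenSSL.secure_compare(expected, signature_b64)

  payload = JSON.parse(Base64.urlsafe_decode64(payload_b64))
  return nil if payload["exp"] && payload["exp"] < now

  payload
rescue ArgumentError, JSON::ParserError
  nil
end
```

A `nil` result from `decode_jwt` is the case where the affected endpoints would respond with the unauthenticated error status.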
[UI] As a User, I can sign out of my account
Frontend for #53
Why
Authenticated users should be able to sign out from their account.
Acceptance Criteria
- Implement a sign out button
- The sign out button must be visible on every page (except the sign in/sign up pages), preferably in the page header
[Backend] Scrape keyword data from the Google Search page
Why
Scraping keyword data is our core logic
Acceptance Criteria
- Given a keyword, it should answer:
- Number of AdWords advertisers in the top position
- Total number of AdWords advertisers on the page
- URLs of the AdWords advertisers in the top position
- Number of the non-AdWords results on the page
- URLs of the non-AdWords results on the page
- All search entries on the page. Each entry includes its kind (ads / non_ads), position (top / bottom / nil) and URLs (one entry can have multiple URLs)
- Total number of links (all of them) on the page
- HTML code of the page/cache of the page
- All results MUST be persisted in the database
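Once the result page has been parsed into search entries (kind, position, URLs), the required numbers follow directly from them. The sketch below assumes that parsing has already happened (the Google page selectors are not specified here, and a real scraper would likely use an HTML parser such as Nokogiri) and that each ad entry corresponds to one advertiser. `SearchEntry`, `summarize_search_page` and the regex-based link count are hypothetical simplifications.

```ruby
# Hypothetical shape of one parsed search entry, mirroring the criteria above:
# kind is :ads or :non_ads; position is :top, :bottom or nil; urls is an Array.
SearchEntry = Struct.new(:kind, :position, :urls, keyword_init: true)

# Very rough link counter: counts <a ...> tags that carry an href attribute.
# A real scraper would use an HTML parser instead of a regular expression.
LINK_TAG = /<a\s[^>]*\bhref\s*=/i

def summarize_search_page(entries, html)
  ads = entries.select { |entry| entry.kind == :ads }
  top_ads = ads.select { |entry| entry.position == :top }
  non_ads = entries.select { |entry| entry.kind == :non_ads }

  {
    top_ads_count: top_ads.size,
    total_ads_count: ads.size,
    top_ads_urls: top_ads.flat_map(&:urls),
    non_ads_count: non_ads.size,
    non_ads_urls: non_ads.flat_map(&:urls),
    total_links_count: html.scan(LINK_TAG).size
  }
end
```

Keeping the raw entries (rather than only the counts) is what lets the later search feature match individual URLs back to their keywords.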
[Backend] As a User, my uploaded keywords are processed immediately
Why
Processing uploaded keywords immediately provides a smoother, snappier interface and enhances the application UX
Acceptance Criteria
- Uploaded keywords are converted to Sidekiq jobs and enqueued immediately
- Sidekiq jobs are distributed to workers and processed immediately (the keyword processing logic will be handled by #14)
- Retry mechanism MUST be implemented
- If a job exceeds the maximum number of retries, it should be abandoned and its status updated in the database accordingly
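In Sidekiq, the retry limit would typically be set with `sidekiq_options retry: N`, and the `sidekiq_retries_exhausted` hook is a natural place to mark the keyword as failed; Sidekiq re-enqueues failed jobs with its own backoff rather than sleeping in-process. The plain-Ruby sketch below only illustrates that control flow: `RetryingKeywordJob`, `MAX_RETRIES = 3` and the 2/4/8-second backoff are assumptions, and the sleeper is injectable so it can be tested without waiting.

```ruby
# Hypothetical in-process model of "retry, then abandon and record the status".
class RetryingKeywordJob
  MAX_RETRIES = 3 # retries after the first attempt (assumed value)

  attr_reader :status, :attempts

  def initialize(&work)
    @work = work
    @status = :pending
    @attempts = 0
  end

  # Returns the work's result, or nil once retries are exhausted.
  def perform(sleeper: ->(seconds) { sleep(seconds) })
    loop do
      @attempts += 1
      @status = :processing
      begin
        result = @work.call
        @status = :succeeded
        return result
      rescue StandardError
        if @attempts > MAX_RETRIES
          @status = :failed # abandoned: this is what gets persisted
          return nil
        end
        sleeper.call(2**@attempts) # exponential backoff: 2, 4, 8 seconds
      end
    end
  end
end
```

A job that always raises ends with `status == :failed` after four attempts, which matches the "abandoned, status updated" criterion above.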
[Backend] As a User, I can only see my keywords
Why
Users must be authenticated before using the app so that we can segregate users' data. A user can only see and interact with their own uploaded keywords.
Acceptance Criteria
- For an unauthenticated user, redirect them to the sign in page
- Add user_id to the keywords table
Affected features:
- List keywords
- See keyword details
- Search keywords result
[Backend] As a User, I can sign in with username and password
Backend for #7
Why
The application requires user authentication to correctly separate data between different users. If one has already registered, they should be able to sign in.
Acceptance Criteria
- Users must be able to sign in with registered credentials (username and password)
- Upon successful sign in, users should be directed to the dashboard page
- Upon unsuccessful sign in, users should be notified of the failure reason
[API] As a User, I can view the list of my uploaded keywords
Why
Allow third-party applications to interact with our application
Acceptance Criteria
- Provide an API to fetch users' uploaded keywords
- Pagination must be implemented
- Do NOT implement authentication. This will be added later once we set up the authentication pipeline
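A minimal sketch of page-number pagination, shown over an in-memory array for clarity. In the Rails app the same arithmetic would more likely be pushed into the query with ActiveRecord's `limit`/`offset`. `paginate`, `Page`, the default of 20 per page and the cap of 100 are hypothetical choices.

```ruby
# Hypothetical page metadata returned alongside the items.
Page = Struct.new(:items, :page, :per_page, :total_count, :total_pages, keyword_init: true)

# Clamps incoming params (which may arrive as strings) to sane ranges.
def paginate(records, page: 1, per_page: 20)
  per_page = per_page.to_i.clamp(1, 100)
  total_pages = [(records.size.to_f / per_page).ceil, 1].max
  page = page.to_i.clamp(1, total_pages)
  offset = (page - 1) * per_page

  Page.new(
    items: records[offset, per_page] || [],
    page: page,
    per_page: per_page,
    total_count: records.size,
    total_pages: total_pages
  )
end
```

Returning `total_count` and `total_pages` with each page lets API clients build their own navigation without a second request.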
[Backend] As a User, I can sign out of my account
Backend for #54
Why
Authenticated users should be able to sign out from their account.
Acceptance Criteria
- Authenticated users must be able to sign out from their account
- After signing out, users must be redirected to the sign in page
[Backend] As a User, I can search across all uploaded keywords and results
Backend for #26
Why
Provide a tool for users to gain a broader view across all keywords. This will aid users in extracting insights, trends from their keyword data
Acceptance Criteria
- Support these queries
- Exact match: e.g. how many times does the apple.com URL appear?
- Partial match: e.g. how many URLs contain the word ruby?
- Pattern match: provide a Regex-subset syntax to perform complex queries
- A query should return how many URLs satisfy the predicate, as well as their corresponding keywords
Notes
Technically, all three types of queries could be done with the pattern match. However, we reserve pattern matching for complex queries only, so as not to confuse basic users.
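A sketch of the three query types over in-memory rows, returning the URL count and the corresponding keywords. The issue does not define the "Regex-subset syntax", so this swaps in a deliberately small glob-style stand-in (`*` for any run of characters, `?` for one character, everything else literal); `Link`, `pattern_to_regexp` and `search_links` are hypothetical. It also illustrates the note above: the pattern `apple.com` behaves like the exact match, and `*ruby*` like the partial match.

```ruby
# Hypothetical row: one scraped URL and the keyword whose results contained it.
Link = Struct.new(:url, :keyword)

# Translates the assumed glob-style subset into an anchored Regexp.
def pattern_to_regexp(pattern)
  source = pattern.each_char.map do |char|
    case char
    when "*" then ".*"
    when "?" then "."
    else Regexp.escape(char)
    end
  end.join
  Regexp.new("\\A#{source}\\z")
end

def search_links(links, query, type:)
  matcher =
    case type
    when :exact then ->(url) { url == query }
    when :partial then ->(url) { url.include?(query) }
    when :pattern
      regexp = pattern_to_regexp(query)
      ->(url) { regexp.match?(url) }
    else raise ArgumentError, "unknown query type: #{type}"
    end

  matches = links.select { |link| matcher.call(link.url) }
  { url_count: matches.size, keywords: matches.map(&:keyword).uniq }
end
```

Escaping every literal character keeps user input from being interpreted as a full regular expression, which limits both surprises for users and the cost of pathological patterns.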
[API] As a User, I can upload CSV keyword files
Why
Third-party applications need an interface to upload keywords to our system
Acceptance Criteria
- Provide an API to upload keyword files
- Keyword files should be validated using the rules specified in #9; the API should return a 4xx error if validation fails
- On success, the API should respond with a hyperlink that can be polled continuously to track the progress
[UI] As a User, I can sign in with username and password
Frontend for #6
Why
The application requires user authentication to correctly separate data between different users. If one has already registered, they should be able to sign in.
Acceptance Criteria
- Implement a form with two fields: email and password
- Enable the submit button only if both inputs are filled (optional)
- Show a validation error in case of an authentication error
[UI] As a User, I can sign up with username and password
Frontend for #4
Why
The application requires user authentication to correctly separate data between different users. To achieve that, users must have a way to register themselves to the application.
Acceptance Criteria
- Implement a form with two fields: email and password
- Enable the submit button only if both inputs are filled (optional)
- Show a validation error if registration fails
[Chore] Configure CI test automation
Why
Reduce manual testing time and enforce code testing before merging
Acceptance Criteria
- Every time a pull request is opened, the test suite must run automatically (via GitHub Actions)
Design
N/A
Resources
N/A
[UI] As a User, my uploaded keywords are processed immediately
Why
Processing uploaded keywords immediately provides a smoother, snappier interface and enhances the application UX
Acceptance Criteria
- After a file is uploaded successfully, the UI is updated to reflect the new scraping job
- While the file is being processed, the UI is updated in real time to reflect the current status. For each keyword, it should display:
- The current status: pending / processing / succeeded / failed
- A link to the keyword details once scraping succeeds
Notes
(We don't support the retry status for now.)
- retry keywords are keywords which have failed to scrape and have been scheduled for retry
- error keywords are keywords which have exceeded the maximum number of retries. See #15 for more details
[Chore] Set up Sidekiq for background job
Why
A background job system gives us better control over how we process users' keywords (concurrency control, job persistence, ...)
Acceptance Criteria
- Can enqueue jobs to Sidekiq
- Can dequeue jobs from Sidekiq
- Sidekiq MUST provide persistence, meaning that if Sidekiq crashes, we won't lose any ongoing jobs