

jobbuzz's Issues

Scrape company information

Currently only the company name is saved.

Change it so that the following information is also scraped and saved.

  • Logo
  • General information
  • Contact
  • etc
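
As a rough sketch of where this could go, the Company model might grow along these lines (every field beyond the name is hypothetical, not existing code):

type Company struct {
	gorm.Model
	Name        string `json:"name"`
	LogoURL     string `json:"logo_url"`    // scraped logo image URL
	Description string `json:"description"` // general information
	Contact     string `json:"contact"`     // contact details
}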

JSON returned is capitalised

If we use the following:

type Job struct {
	gorm.Model
	Title    string
	Company  string
	Salary   string
	Location string
}

It will return the following JSON from the API:

{
  "ID": 1,
  "CreatedAt": "foo",
  "UpdatedAt": "foo",
  "DeletedAt": "foo",
  "Title": "foo",
  "Company": "foo",
  "Salary": "foo",
  "Location": "foo"
}

Annotating the struct with json:"field" tags won't lowercase the fields embedded from gorm.Model:

type Job struct {
	gorm.Model
	Title    string `json:"title"`
	Company  string `json:"company"`
	Salary   string `json:"salary"`
	Location string `json:"location"`
}

It will return the following JSON from the API:

{
  "ID": 1,
  "CreatedAt": "foo",
  "UpdatedAt": "foo",
  "DeletedAt": "foo",
  "title": "foo",
  "company": "foo",
  "salary": "foo",
  "location": "foo"
}

To lowercase all the fields, we can't embed gorm.Model; we have to declare its fields ourselves with JSON tags:

type Job struct {
	ID        uint           `gorm:"primarykey" json:"id"`
	CreatedAt time.Time      `json:"created_at"`
	UpdatedAt time.Time      `json:"updated_at"`
	DeletedAt gorm.DeletedAt `gorm:"index" json:"deleted_at"`
	Title     string         `json:"title"`
	Company   string         `json:"company"`
	Salary    string         `json:"salary"`
	Location  string         `json:"location"`
}

It will return the following JSON from the API:

{
  "id": 1,
  "created_at": "foo",
  "updated_at": "foo",
  "deleted_at": "foo",
  "title": "foo",
  "company": "foo",
  "salary": "foo",
  "location": "foo"
}

I prefer the above JSON fields since that's the common convention in API responses. What do you think?

Job notification subscription feature

  • User should be able to specify filter parameters

    • keywords
    • location
    • salary
  • Notify daily?

  • Notification medium

    • web push
    • email
    • app push (our own standalone app or something like Pushover?)
  • Create database schema

  • Create system architecture

  • Create sequence diagram

  • Implement
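
For the database schema item above, a rough first sketch of a subscription model, assuming GORM as in the rest of the project (every field here is hypothetical):

type Subscription struct {
	gorm.Model
	UserID    uint   `json:"user_id"`    // owner of the subscription
	Keywords  string `json:"keywords"`   // e.g. comma-separated search terms
	Location  string `json:"location"`
	MinSalary int    `json:"min_salary"` // lower bound for the salary filter
	Medium    string `json:"medium"`     // "web_push", "email" or "app_push"
}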

Refactor scraper to be modular and testable

Separate logic from external dependencies.
External dependencies should have an interface.

Have a top-level function which handles the state and calls smaller functions to fetch more data at each stage.

  • Get job links and company links and add to map
  • Get job details
  • Get company details
  • Return results

WaitGroup details should be hidden in the implementation.
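
A minimal sketch of the shape this could take, assuming the Job type above and a sync import; the Fetcher interface and both function names are hypothetical, and the company-details stage would follow the same pattern:

// Fetcher abstracts the external dependency (HTTP, go-colly, ...)
// so the orchestration logic can be tested against a fake.
type Fetcher interface {
	FetchJobLinks() ([]string, error)
	FetchJobDetails(link string) (Job, error)
}

// ScrapeAll is the top-level function: it owns the state and the
// WaitGroup, so callers never see the concurrency details.
func ScrapeAll(f Fetcher) ([]Job, error) {
	links, err := f.FetchJobLinks()
	if err != nil {
		return nil, err
	}

	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		jobs []Job
	)
	for _, link := range links {
		wg.Add(1)
		go func(link string) {
			defer wg.Done()
			job, err := f.FetchJobDetails(link)
			if err != nil {
				return // per-link errors could be collected here instead
			}
			mu.Lock()
			jobs = append(jobs, job)
			mu.Unlock()
		}(link)
	}
	wg.Wait()
	return jobs, nil
}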

Changing scraper logic to be more resilient

Currently, with go-colly, there is a potential issue where if one page fails to load the whole data set is considered corrupted, because we need the complete set of data to determine which job listings are active or inactive.

go-colly has no retry functionality, and its error handling is not very useful.

I think it might be better for us to fetch the HTML as a string (where we can have our own retry logic) and then use an HTML parser to process the data instead.

This will be more similar to the logic of the scraper in the .NET version.

Get HTML nodes in Go with CSS selectors: https://github.com/PuerkitoBio/goquery

Retry: https://github.com/avast/retry-go
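
A hedged sketch of how those two libraries could fit together; fetchDocument is a hypothetical name, and the caller would pass whatever URL and CSS selectors the scraper needs:

import (
	"fmt"
	"net/http"

	"github.com/PuerkitoBio/goquery"
	"github.com/avast/retry-go"
)

// fetchDocument downloads one page with our own retry logic, then
// hands the HTML to goquery for CSS-selector based processing.
func fetchDocument(url string) (*goquery.Document, error) {
	var doc *goquery.Document
	err := retry.Do(func() error {
		resp, err := http.Get(url)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
		d, err := goquery.NewDocumentFromReader(resp.Body)
		if err != nil {
			return err
		}
		doc = d
		return nil
	}, retry.Attempts(3))
	return doc, err
}

A page that still fails after all retries can then be handled explicitly instead of silently corrupting the whole data set.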

Job uniqueness

When we run the scraper cmd, it automatically creates new jobs based on the jobs returned by the scrapers. Sometimes a job already exists in the DB; how do we prevent it from being inserted again?

One idea: store the job links in the DB as well, then query by link before inserting the job.
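
A sketch of that idea with GORM, assuming a hypothetical Link field on the Job model:

// saveJob only inserts when no row with the same link exists, so
// re-running the scraper won't duplicate jobs already in the DB.
func saveJob(db *gorm.DB, job Job) error {
	return db.Where(Job{Link: job.Link}).FirstOrCreate(&job).Error
}

Pairing this with a unique index on the link column (gorm:"uniqueIndex") would also guard against races between concurrent inserts.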

Removal of inactive job listings

The current implementation adds new job listings but does not remove old ones.

There are two ways to do this:

  1. If the job page becomes inaccessible once the listing is no longer valid, then checking each stored page regularly and marking it appropriately should be fine.
  2. During the scraper job, scrape all job listings, compare them with all entries in the database, and mark the missing ones as inactive (see the sketch below).
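
For option 2, a sketch of the comparison step, assuming hypothetical Link and Active fields on the Job model and that scrapedLinks holds the complete set from the latest run:

// markInactive flags every stored job whose link did not appear in
// the latest full scrape. This only works when scrapedLinks is the
// complete set, which is why a partially failed scrape is unusable.
func markInactive(db *gorm.DB, scrapedLinks []string) error {
	return db.Model(&Job{}).
		Where("link NOT IN ?", scrapedLinks).
		Update("active", false).Error
}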

DB Seeder

It's not an enjoyable development experience to keep scraping the websites, especially with an unreliable internet connection. The suggestion is to have another cmd programme that seeds data into the DB. We can export an SQL file from existing data and create a new cmd programme that imports the SQL file.
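
A minimal sketch of such a seeder cmd, assuming GORM with a MySQL driver (the driver, DSN, and file path are placeholders for whatever the project actually uses):

package main

import (
	"log"
	"os"

	"gorm.io/driver/mysql"
	"gorm.io/gorm"
)

func main() {
	// Placeholder DSN; multiStatements=true lets a single Exec call
	// replay a dump containing many statements.
	dsn := "user:pass@tcp(localhost:3306)/jobbuzz?multiStatements=true"
	db, err := gorm.Open(mysql.Open(dsn), &gorm.Config{})
	if err != nil {
		log.Fatal(err)
	}

	// Read the exported SQL dump and replay it against the DB.
	dump, err := os.ReadFile("seed.sql")
	if err != nil {
		log.Fatal(err)
	}
	if err := db.Exec(string(dump)).Error; err != nil {
		log.Fatal(err)
	}
	log.Println("seeding complete")
}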
