b-open / jobbuzz
Brunei job search database and alert notification
Home Page: https://jobbuzz.org
License: MIT License
to: Int!
from: Int!
per_page: Int!
current_page: Int!
total_page: Int!
total: Int!
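These look like the pagination metadata fields for the job list response (the Int! notation suggests a GraphQL fragment). A minimal sketch of the same fields as a Go response struct, assuming snake_case JSON keys; the type name is hypothetical:
type Pagination struct {
	To          int `json:"to"`
	From        int `json:"from"`
	PerPage     int `json:"per_page"`
	CurrentPage int `json:"current_page"`
	TotalPage   int `json:"total_page"`
	Total       int `json:"total"`
}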
Salary reporting similar to Glassdoor.
This would be a fairly big feature so the details should be discussed before development.
References:
Jobs Brunei
O&G
Currently only the company name is saved.
Change it so that the following information is also scraped and saved.
If we use the following struct:
type Job struct {
	gorm.Model
	Title    string
	Company  string
	Salary   string
	Location string
}
It will return the following JSON from the API:
{
  "ID": 1,
  "CreatedAt": "foo",
  "UpdatedAt": "foo",
  "DeletedAt": "foo",
  "Title": "foo",
  "Company": "foo",
  "Salary": "foo",
  "Location": "foo"
}
Annotating the struct fields with json:"field" tags won't lowercase the fields that come from gorm.Model:
type Job struct {
	gorm.Model
	Title    string `json:"title"`
	Company  string `json:"company"`
	Salary   string `json:"salary"`
	Location string `json:"location"`
}
It will return the following JSON from the API:
{
  "ID": 1,
  "CreatedAt": "foo",
  "UpdatedAt": "foo",
  "DeletedAt": "foo",
  "title": "foo",
  "company": "foo",
  "salary": "foo",
  "location": "foo"
}
To lowercase all the fields, we can't embed gorm.Model; its fields have to be declared explicitly with their own json tags:
type Job struct {
	ID        uint           `gorm:"primarykey" json:"id"`
	CreatedAt time.Time      `json:"created_at"`
	UpdatedAt time.Time      `json:"updated_at"`
	DeletedAt gorm.DeletedAt `gorm:"index" json:"deleted_at"`
	Title     string         `json:"title"`
	Company   string         `json:"company"`
	Salary    string         `json:"salary"`
	Location  string         `json:"location"`
}
It will return the following JSON from the API:
{
  "id": 1,
  "created_at": "foo",
  "updated_at": "foo",
  "deleted_at": "foo",
  "title": "foo",
  "company": "foo",
  "salary": "foo",
  "location": "foo"
}
I prefer the above JSON fields since it's the common convention in API responses. What do you think?
Currently the scraper only fetches jobs on the first page; implement pagination logic so that all jobs are scraped.
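A minimal sketch of that pagination loop, assuming the target site exposes numbered pages; scrapePage is a hypothetical helper, not an existing function in the project:
// scrapeAllPages keeps requesting the next page until a page returns no jobs.
func scrapeAllPages() ([]Job, error) {
	var all []Job
	for page := 1; ; page++ {
		jobs, err := scrapePage(page) // hypothetical helper that scrapes a single page
		if err != nil {
			return nil, err
		}
		if len(jobs) == 0 {
			break // an empty page means we have gone past the last page
		}
		all = append(all, jobs...)
	}
	return all, nil
}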
Add a logging library so that we can use leveled logging
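The issue doesn't name a library; as one option, a small sketch using zap (go.uber.org/zap) for leveled logging:
package main

import "go.uber.org/zap"

func main() {
	// NewProduction returns a JSON logger that emits Info level and above.
	logger, err := zap.NewProduction()
	if err != nil {
		panic(err)
	}
	defer logger.Sync()

	logger.Info("scraper started", zap.Int("page", 1))
	logger.Warn("page returned no jobs", zap.Int("page", 3))
}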
User should be able to specify filter parameters
Notify daily?
Notification medium
Create database schema
Create system architecture
Create sequence diagram
Implement
Separate logic from external dependencies.
External dependencies should have an interface.
Have a top-level function which owns the state and calls smaller functions to fetch more data at each stage.
WaitGroup details should be hidden in the implementation.
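A rough sketch of that shape; Fetcher and fetchAll are hypothetical names, not the project's actual types, and error handling is reduced to a comment:
import "sync"

// Fetcher is the interface an external dependency (HTTP client, scraper, ...) satisfies,
// so the core logic can be tested against a fake implementation.
type Fetcher interface {
	FetchPage(page int) ([]Job, error)
}

// fetchAll owns the state and the WaitGroup; callers never see the concurrency details.
func fetchAll(f Fetcher, pages int) [][]Job {
	results := make([][]Job, pages)
	var wg sync.WaitGroup
	for i := 0; i < pages; i++ {
		wg.Add(1)
		go func(page int) {
			defer wg.Done()
			jobs, err := f.FetchPage(page + 1)
			if err != nil {
				return // a real implementation would collect or log the error
			}
			results[page] = jobs
		}(i)
	}
	wg.Wait()
	return results
}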
Currently with go-colly there is a potential issue: if one page fails to load, the data is considered corrupted, because we need the whole set of data in order to determine which job listings are active or inactive.
There is no retry functionality in go-colly and its error handling is not very useful.
I think it might be better for us to fetch the HTML as a string (where we can have our own retry logic) and then use an HTML parser to process the data instead.
This will be more similar to the logic of the scraper in the .NET version.
Get HTML nodes in Go with CSS selectors: https://github.com/PuerkitoBio/goquery
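A minimal sketch of that approach: fetch the page body with simple retry logic, then hand it to goquery. The attempt count, backoff, and selector are assumptions for illustration:
import (
	"fmt"
	"net/http"
	"time"

	"github.com/PuerkitoBio/goquery"
)

// fetchWithRetry downloads a page and parses it with goquery, retrying on failure.
func fetchWithRetry(url string, attempts int) (*goquery.Document, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err != nil {
			lastErr = err
		} else if resp.StatusCode != http.StatusOK {
			lastErr = fmt.Errorf("unexpected status %d", resp.StatusCode)
			resp.Body.Close()
		} else {
			doc, err := goquery.NewDocumentFromReader(resp.Body)
			resp.Body.Close()
			return doc, err
		}
		time.Sleep(time.Second) // simple backoff before the next attempt
	}
	return nil, fmt.Errorf("failed after %d attempts: %w", attempts, lastErr)
}

// Usage (the selector is illustrative, not the real site's markup):
// doc, err := fetchWithRetry("https://example.com/jobs", 3)
// doc.Find(".job-title").Each(func(i int, s *goquery.Selection) { fmt.Println(s.Text()) })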
When we run the scraper cmd, it automatically creates new jobs based on the jobs returned by the scrapers. Sometimes the job already exists in the DB; how do we prevent it from being inserted again?
One idea: store the job links in the DB as well, then query by link before inserting the job (see the sketch below).
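A minimal sketch of that idea with GORM, assuming a Link field is added to the Job model (the field and function names are illustrative):
import (
	"errors"

	"gorm.io/gorm"
)

// upsertJob inserts the job only if no row with the same link exists yet.
func upsertJob(db *gorm.DB, job Job) error {
	var existing Job
	err := db.Where("link = ?", job.Link).First(&existing).Error
	if err == nil {
		return nil // already in the DB, skip
	}
	if !errors.Is(err, gorm.ErrRecordNotFound) {
		return err
	}
	return db.Create(&job).Error
}
A unique index on the link column would give the same guarantee at the database level and also protect against races between scraper runs.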
Unable to scrape on servers outside of Brunei.
Have a local program scrape and clean the data, then POST it to the server.
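A rough sketch of the local-scraper side, POSTing the cleaned jobs as JSON to the server; the endpoint path and status handling are assumptions, and authentication is left out:
import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// pushJobs sends the scraped jobs to the remote API as a JSON array.
func pushJobs(apiURL string, jobs []Job) error {
	body, err := json.Marshal(jobs)
	if err != nil {
		return err
	}
	resp, err := http.Post(apiURL+"/api/v1/jobs", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}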
Change the API URLs to start with /api/v1/.
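For example, with a gin-style router (assumed here purely for illustration, as is listJobsHandler), grouping keeps the prefix in one place:
import "github.com/gin-gonic/gin"

func main() {
	r := gin.Default()
	// Every route lives under one versioned group, so a future /api/v2 can sit next to it.
	v1 := r.Group("/api/v1")
	v1.GET("/jobs", listJobsHandler) // placeholder for the existing jobs handler
	r.Run()
}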
The current implementation adds new job listings but does not remove old ones.
There are 2 ways to do this.
For that blazing fast job searching
Query example
GET /Job?search=developer&page=2&limit=10
Response example
{ "page": 2, "total": 30, "from": 1, "to": 10, "data": [...] }
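A minimal sketch of how a handler could translate those parameters into a GORM query; the function name is illustrative, and LIKE stands in for whatever full-text mechanism ends up being chosen:
// searchJobs returns one page of jobs whose title matches the search term, plus the total count.
func searchJobs(db *gorm.DB, search string, page, limit int) ([]Job, int64, error) {
	var jobs []Job
	var total int64
	pattern := "%" + search + "%"

	if err := db.Model(&Job{}).Where("title LIKE ?", pattern).Count(&total).Error; err != nil {
		return nil, 0, err
	}
	err := db.Where("title LIKE ?", pattern).
		Offset((page - 1) * limit).
		Limit(limit).
		Find(&jobs).Error
	return jobs, total, err
}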
It's not an enjoyable development experience to keep on scraping the websites, especially when you have an unreliable internet connection. The suggestion is to have another cmd programme that seeds data into the DB. We can export an SQL file from existing data and create a new cmd programme that imports that SQL file.
https://choosealicense.com/
Shall we go with MIT?