
Comments (8)

zhongdeming428 commented on August 16, 2024

Change of target

Scraping data from Lagou (拉勾网) turned out to be relatively difficult, so Zhaopin (智联招聘) was chosen as the scraping target instead.

from mymemorandum.

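zhongdeming428 commented on August 16, 2024

Latest progress

The crawler can now fetch the company information from a single Zhaopin results page:
image
The next step is to capture the complete set of fields and to fetch more pages.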


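zhongdeming428 commented on August 16, 2024

Latest progress

The function that fetches the data for a single page is now implemented. It takes the page number, location, job category, and result count, and returns the data.
image
The screenshot above shows that recommended listings are ignored and only the actual search results are kept.

The main function is currently being implemented.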
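
As a rough illustration of the inputs such a single-page function works with, the sketch below builds the search URL for one page and derives the number of pages from the total result count. The URL pattern and the page size of 60 are taken from the main function posted later in this thread; the helper names `buildSearchUrl` and `pageCount` are hypothetical and not part of the original project.

```javascript
// Hypothetical helpers; the names are illustrative, not from the original project.
const PAGE_SIZE = 60; // results per page, as implied by the main function in this thread

// Build the search URL for one results page.
// `job` and `city` are the values placed in the `sj` and `jl` query parameters.
function buildSearchUrl(page, city, job) {
  const query = `bj=160000&sj=${job}&in=160400&jl=${city}&p=${page}&isadv=0`;
  return encodeURI(`http://sou.zhaopin.com/jobs/searchresult.ashx?${query}`);
}

// Number of pages needed for `jobsCount` results; 0 when the count is missing or invalid.
function pageCount(jobsCount, pageSize = PAGE_SIZE) {
  if (!Number.isFinite(jobsCount) || jobsCount <= 0) return 0;
  return Math.ceil(jobsCount / pageSize);
}
```

Returning 0 for an unparsable count makes the "no pages" case explicit instead of relying on a loop over `NaN` silently doing nothing.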


zhongdeming428 commented on August 16, 2024

Problem

When the single-page function is called in a loop, some of the data tends to get lost. The main function is as follows:

const http = require('http');
const cheerio = require('cheerio');
// `variables`, `getData` and `writeFile` are defined elsewhere in the project.

function main(job) {
    // `const city` is scoped per iteration, so the original IIFE and implicit global are not needed.
    for (const city of Object.values(variables.cities)) {
        const url = encodeURI(`http://sou.zhaopin.com/jobs/searchresult.ashx?bj=160000&sj=${variables.jobs[job]}&in=160400&jl=${city}&p=1&isadv=0`);
        http.get(url, (res) => {
            res.setEncoding('utf-8');
            let html = '';
            res.on('data', (chunk) => {
                html += chunk;
            });
            res.on('end', () => {
                const $ = cheerio.load(html);
                const txt = $('span.search_yx_tj>em').text();
                const jobsCount = parseInt(txt, 10);
                if (Number.isNaN(jobsCount)) {
                    // No result count on the page: report it instead of silently skipping the city.
                    console.error(`No job count found for ${job} in ${city}`);
                    return;
                }
                // 60 results per page; Math.ceil replaces the fractional page count of the original.
                const pagesCount = Math.ceil(jobsCount / 60);
                for (let i = 1; i <= pagesCount; i++) {
                    getData(i, job, city, jobsCount, writeFile);
                }
            });
        }).on('error', (err) => {
            // Without this handler a single failed request emits an unhandled 'error' event and crashes the process.
            console.error(`Request failed for ${job} in ${city}: ${err.message}`);
        });
    }
}

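By the end of a run a lot of data always seems to be missing. I do not know the cause yet, and it needs to be fixed.

The problem is still unresolved...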
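
The thread never establishes the cause of the missing data. One possibility, offered only as a guess, is that the main function above starts every page request at once, so requests that fail or get throttled go unnoticed. The sketch below shows a generic way to cap the number of requests in flight and retry failures. `runLimited` and its parameters are hypothetical names, and the tasks are plain async functions rather than the project's `getData`, whose callback contract is not shown in the thread.

```javascript
// Run async tasks with at most `limit` in flight at a time.
// Each failing task is retried up to `retries` more times before being recorded as failed.
// Hypothetical helper: not part of the original project.
async function runLimited(tasks, limit = 5, retries = 2) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start

  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      for (let attempt = 0; ; attempt++) {
        try {
          results[index] = { ok: true, value: await tasks[index]() };
          break;
        } catch (err) {
          if (attempt >= retries) {
            // Record the failure so that lost pages are visible afterwards.
            results[index] = { ok: false, error: err };
            break;
          }
        }
      }
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

To apply this to the crawler, each task would wrap one page fetch in a Promise. Inspecting the entries with `ok: false` after a run would then show exactly which pages were lost, which is the information missing from the original loop.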


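zhongdeming428 commented on August 16, 2024

Latest progress

The previous problem is still unsolved, but I have moved on, since losing a small share of the data does not matter much to me. As of today the crawler can fetch almost all of the data, and it takes its parameters from the command line, which is very convenient.
image
The third argument is the job category, and an optional fourth argument gives the work location.
Collected results:
image
The data is now stored in a MongoDB database, which holds a little over 26,940 records (all data as of 2018/02/07).
The next task is to build a page that displays the results.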
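
The command-line interface mentioned above (job category as the third argument, optional location as the fourth) could be read along the following lines. This is a minimal sketch under the assumption that "third argument" means `process.argv[2]`, that is, the first value after `node` and the script path. `parseArgs` is a hypothetical name and the real script may handle its arguments differently.

```javascript
// Hypothetical parser for: node <script> <job> [city]
// Assumes argv[0] is the node binary and argv[1] is the script path.
function parseArgs(argv) {
  const [job, city] = argv.slice(2);
  if (!job) {
    throw new Error('usage: node <script> <job> [city]');
  }
  // `city` is optional; null means "no specific location given".
  return { job, city: city || null };
}
```

In the script itself this would be called as `parseArgs(process.argv)`. What the crawler does when no location is given is not stated in the thread, so the sketch only reports `null` and leaves that decision to the caller.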


zhongdeming428 commented on August 16, 2024

Latest progress

The front end is built with React. The chart component is finished and can already render basic charts.
image


zhongdeming428 commented on August 16, 2024

Latest progress

All tasks are complete. The next step is to write up the process.
image


zhongdeming428 commented on August 16, 2024

Summary

The write-up is published on my blog: 我的博客 (My Blog)

The project is essentially finished.

