
reptile's Introduction

reptile

import urllib.request
import ssl
from bs4 import BeautifulSoup
import time

num = 1  # running counter for the books found, starting at 1
start_time = time.time()  # start time, used to measure how long the crawl takes

url = 'https://read.douban.com/columns/category/all?sort=hot&start='

for i in range(0, 10, 10):  # range(start, stop, step); with these values only start=0 is fetched
    # urllib.request sends the HTTP request.
    # The ssl context below skips certificate verification for the HTTPS request
    # (a private helper; acceptable for a quick test only).
    context = ssl._create_unverified_context()
    html = urllib.request.urlopen(url + str(i), context=context)
    # Without the custom context (default certificate verification):
    # html = urllib.request.urlopen(url + str(i))
    # BeautifulSoup parses the page; the 'lxml' parser requires the lxml package.
    bsobj = BeautifulSoup(html, 'lxml')
    # Find tags by class; class_='item store-item' matches that exact class string.
    li_list = bsobj.find_all('li', class_='item store-item')
    for li_node in li_list:
        node = li_node.find('div', class_='info')

        h_div = node.find('h4', class_='title')  # title element
        h_a = h_div.find('a').contents[0]  # title text

        h5_subtitle = node.find('h5', class_='subtitle')
        subtitle = ''  # the subtitle is optional
        if h5_subtitle is not None:
            subtitle = h5_subtitle.contents[0]

        h_intro = node.find('div', class_='intro')
        intro = h_intro.contents[0]

        print('Book %d:' % num, h_a, subtitle, intro)
        num += 1  # count each book, not each page
    time.sleep(1)  # pause between page requests

end_time = time.time()
duration_time = end_time - start_time
print('Elapsed time: %.2f s' % duration_time)
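
The script pages through the listing by changing the start query parameter. The sketch below shows one way to build those page URLs up front using only the standard library. The helper name page_urls and the page size of 10 are assumptions (the page size is inferred from the step of 10 in the loop above), and no request is sent.

```python
from urllib.parse import urlencode

# Base listing URL from the script above, without its query string.
BASE_URL = 'https://read.douban.com/columns/category/all'


def page_urls(pages, page_size=10, sort='hot'):
    """Return listing URLs for the first `pages` pages.

    page_size=10 is an assumption taken from the step used in the script.
    """
    urls = []
    for start in range(0, pages * page_size, page_size):
        # urlencode builds the "sort=...&start=..." query string.
        query = urlencode({'sort': sort, 'start': start})
        urls.append('%s?%s' % (BASE_URL, query))
    return urls


# Print the URLs for the first three pages (start=0, 10, 20).
for page_url in page_urls(3):
    print(page_url)
```

Each returned URL could be passed to urllib.request.urlopen in place of the string formatting used in the loop.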

reptile's People

Contributors

mrjyuhongjiang
