

peewee-scrapy

For study purposes only.

https://quotes.toscrape.com/

- Request the page. After the command runs, Scrapy also prints a log; carry on if the status is 200.

    scrapy shell https://quotes.toscrape.com/

- There are two ways to extract data: response.xpath and response.css.

- response.url

- response.xpath

    response.xpath("//div")
    response.xpath("//div/span")
    response.xpath("//div/span[@class='text']/text()").getall()
    response.xpath("//span[@class='text']/text()").getall()
    response.xpath("//small[@class='author']/text()").getall()

- response.css

    response.css("small.author::text").getall()
    response.css("a.tag::text").getall()

  XPath equivalent of the last query:

    response.xpath("//a[@class='tag']/text()").getall()

Run the spider, optionally exporting the items with -o and sending the log to a file:

    scrapy runspider scrapy-quotes.py
    scrapy runspider scrapy-quotes.py -o quotes.json
    scrapy runspider scrapy-quotes.py -o quotes.csv
    scrapy runspider scrapy-quotes.py -o result.json -t json
    scrapy runspider scrapy-quotes.py -o quotes.json --logfile logfile.text
    scrapy runspider scrapy-quotes.py -o some.json -t json 2> some.text

response.xpath('*//li[@class="next"]/a').get()
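The query above returns the whole a element of the "next" link; appending /@href to the XPath would return only its link target, which is usually a relative path. Inside a spider, response.urljoin or response.follow resolves it. The sketch below shows the same resolution with the standard library only; the helper name next_page_url and the sample href are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urljoin


def next_page_url(current_url: str, next_href: Optional[str]) -> Optional[str]:
    """Resolve the href of the "next" link against the current page URL.

    Returns None when there is no next link (for example on the last page).
    """
    if not next_href:
        return None
    return urljoin(current_url, next_href)


# Hypothetical href, as it might come back from the /@href query.
print(next_page_url("https://quotes.toscrape.com/", "/page/2/"))
# https://quotes.toscrape.com/page/2/
print(next_page_url("https://quotes.toscrape.com/page/10/", None))
# None
```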

    response.xpath('.//div[@class="tags"]/a[@class="tag"]/text()').get()
    response.xpath('//div[@class="tags"]').get()
    response.xpath('//div[@class="tags"]/a').get()

****** AOS FATOS.ORG ******

Select attributes (here, the href of each a.card link):

    response.css('a.card::attr(href)')

    response.css('html')
    Out[2]: []

    response.css('title').getall()
    Out[5]: ['<title>Aos Fatos | Valorize o que é real</title>', '<title>twitter</title>', '<title>Instagram</title>', '<title>Facebook</title>', '<title>WhatsApp</title>', '<title>YouTube</title>']

    response.css('title::text').getall()
    Out[7]: ['Aos Fatos | Valorize o que é real', 'twitter', 'Instagram', 'Facebook', 'WhatsApp', 'YouTube']

response.xpath('*//nav[@class="menu"]').getall()

    In [12]: response.xpath('//nav//ul//li/a').getall()
    Out[12]: ['Verdadeiro', 'Impreciso', 'Exagerado', 'Distorcido', 'Contraditório', 'Insustentável', 'Falso', 'Investigamos', 'Explicamos', 'Nas redes', 'Radar', 'Tempo Real', 'Eleições 2020', 'Análises', 'HQ', 'Aos Fatos Lab', 'Eleições 2018', 'Tudo', 'Especiais', 'Manuais', 'Aos Gráficos']

* Return only the menu links whose href matches "checamos":

    response.xpath('//nav//ul//li/a[re:test(@href,"checamos")]').getall()
    Out[13]: ['Verdadeiro', 'Impreciso', 'Exagerado', 'Distorcido', 'Contraditório', 'Insustentável', 'Falso']
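re:test is a regular-expression extension available in Scrapy's XPath selectors; it keeps a node when the pattern matches anywhere in the given string. The same filtering can be sketched in plain Python on a list of hrefs. The href values below are hypothetical placeholders, not the site's real paths.

```python
import re

# Hypothetical hrefs, standing in for the @href values of the menu links.
hrefs = [
    "/checamos/verdadeiro/",
    "/checamos/falso/",
    "/explicamos/",
    "/radar/",
]

# re.search matches anywhere in the string, like re:test in the XPath.
pattern = re.compile("checamos")
checked = [href for href in hrefs if pattern.search(href)]

print(checked)  # ['/checamos/verdadeiro/', '/checamos/falso/']
```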

fetch('url') requests another URL and replaces the shell's response object with the result.

Text of the h1 inside article:

    response.css('article h1::text').get()

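Publication date, first raw and then with its whitespace removed by split/join (use ' '.join(...) instead to keep single spaces between the words):

    response.css('p.publish_date::text').get()
    ''.join(response.css('p.publish_date::text').get().split())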
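The split/join idiom above can be wrapped in a small helper that also tolerates a missing element, since .get() returns None when nothing matches. A minimal sketch; the function name normalize_whitespace and the sample string are assumptions for illustration.

```python
from typing import Optional


def normalize_whitespace(raw: Optional[str]) -> Optional[str]:
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    if raw is None:
        return None
    # str.split() with no argument splits on any whitespace and drops empties.
    return " ".join(raw.split())


# Hypothetical raw text, as ::text might return it from an indented template.
raw_date = "\n      1 de janeiro de 2020,\n      10h00\n  "
print(normalize_whitespace(raw_date))  # 1 de janeiro de 2020, 10h00
```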

response.xpath('//nav//ul//li/a[re:test(@href,"checamos")]').get()

response.css('article blockquote::text').getall()
