Web Scraping with Python

This project was created to introduce you to web scraping. The code shows how to scrape content from web pages using Selenium. The project was created for educational purposes only.

Getting Started

First, choose a browser. In this example I chose Chrome. Then choose a website to scrape.

My choice was: Investing

Prerequisites

Python 3.x
The Python libraries listed under Installing
Chrome or Mozilla Firefox
ChromeDriver
GeckoDriver (Firefox only)

Installing

Python libraries:

selenium - An API to write functional/acceptance tests using Selenium WebDriver;
lxml - Library for processing XML and HTML;
pandas - A great Python Data Analysis Library;
requests - The only Non-GMO HTTP library for Python, safe for human consumption;
beautifulsoup4 - Library for pulling data out of HTML and XML files.

With: pip install ..

ChromeDriver

You can find instructions here.

GeckoDriver

You can find instructions here
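
Once the drivers are installed, a quick check can confirm that Python is able to find them. The snippet below is a minimal sketch that assumes the executables are named `chromedriver` and `geckodriver` and are expected on your PATH; the helper name `find_drivers` is made up for this example.

```python
import shutil


def find_drivers(names=("chromedriver", "geckodriver")):
    """Map each driver executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Print one line per driver so a missing one is easy to spot.
for name, path in find_drivers().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If a driver shows up as not found, add its folder to PATH or move the executable into a folder that is already on it.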

Tips

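Use time.sleep() to give the site time to load before the script starts reading the page.
You can use other methods to find content on the site:
- find_element_by_id
- find_element_by_name
You can also search for multiple elements:
- find_elements_by_name
- find_elements_by_xpath
Note that Selenium 4.3 and later removed these find_element_by_* helpers; the equivalent calls are find_element(By.ID, ...) and find_elements(By.NAME, ...).
Documentation here.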
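
The tips above can be combined into a small helper. This is only a sketch, not the code from this project: it assumes Selenium 4, where locators are passed as `find_elements(By.XPATH, ...)`, and it polls for the elements instead of using one fixed `time.sleep()`. The names `wait_for_elements`, `element_texts`, and `demo`, the URL, and the XPath are all placeholders chosen for illustration.

```python
import time


def wait_for_elements(driver, by, value, timeout=10.0, poll=0.5):
    """Call driver.find_elements repeatedly until it finds something or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        elements = driver.find_elements(by, value)
        if elements or time.monotonic() >= deadline:
            return elements
        time.sleep(poll)  # give the page a little more time to load


def element_texts(driver, by, value, timeout=10.0):
    """Return the text of every element that matches the locator."""
    return [el.text for el in wait_for_elements(driver, by, value, timeout)]


def demo():
    """Run the helpers against a real browser (needs Selenium and ChromeDriver)."""
    # Imported here so the helpers above can be used and tested without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com")  # placeholder URL (assumption)
        print(element_texts(driver, By.XPATH, "//h1"))  # placeholder XPath
    finally:
        driver.quit()  # always close the browser, even after an error
```

Call `demo()` to try it with Chrome. To search by id or name instead, pass `By.ID` or `By.NAME` with the matching value.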

Next Steps

Learn how APIs work.
Study data structures in Python.
Learn JSON.
Feel free to build your own API to work with Web Scraping.
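
As a starting point for the data structures and JSON items, here is a small sketch using only the standard library. The rows are invented sample data in the shape a scraper might collect from a table, and the function names `rows_to_json` and `rows_from_json` are hypothetical.

```python
import json

# Invented sample rows: a list of dictionaries, one dictionary per table row.
rows = [
    {"name": "Asset A", "last": 10.5, "change_pct": 1.2},
    {"name": "Asset B", "last": 7.25, "change_pct": -0.4},
]


def rows_to_json(rows):
    """Serialize the rows to a JSON string that an API could return."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def rows_from_json(text):
    """Parse a JSON string back into a list of dictionaries."""
    return json.loads(text)


text = rows_to_json(rows)
print(text)

# Once the data is back in Python, the usual list and dict tools apply.
best = max(rows_from_json(text), key=lambda row: row["change_pct"])
print(best["name"])
```

A list of dictionaries maps directly onto a JSON array of objects, which is why it is a convenient shape for scraped tables.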

Author

Murilo Carlos - Inspired by Código Fonte TV
