
python-web-scraper-example's Introduction

Python Web Scraper Sample Program

An example web scraper program built with the requests and beautifulsoup4 modules. Written in Python 3.6.x.


As a gamer who mostly plays RPGs, I've been a big fan of the Pokémon games since I was a kid. Now, as an adult (and much closer to being considered a "Pokémon Master" than I was back then), I find myself wrapped up in the games' more complex details whenever I play. Having accumulated vast amounts of what I'm sure my parents would call "useless info" about the Pokémon games, I usually know exactly what to look for when I need to query the Internet for answers: the male/female gender ratio of a Pokémon species, the Abilities a Pokémon may have, how a caught or hatched Pokémon's Nature influences its battle stats, and of course where a given Pokémon can be found in a particular game.

So, to learn how web scraping works in Python, I decided to create something practical and relevant to these interests. This command-line program asks the user for the name of a Pokémon, and then attempts to return information about where that Pokémon can be found, based on data it scrapes from the Internet. The data comes from Bulbapedia, a popular, independently maintained all-things-Pokémon database site.

Take, for instance, everyone's favorite lovable Electric Rat Pokémon, Pikachu. Pikachu has been in every Pokémon game and is thus a perfect comprehensive example. Below is a screenshot of the game locations table on Pikachu's Bulbapedia page:

[Screenshot: Pikachu game locations table on Bulbapedia]

When asked to retrieve game location data for Pokémon, the program scrapes the table on the above page with help from the Requests and Beautiful Soup 4 modules to grab all of the table's data, and then prints out the same information in a more minimal/compact form:

Generation I
------------
Red/Blue: Viridian Forest, Power Plant
Blue (Japan): Viridian Forest, Power Plant, Rocket Game Corner
Yellow: Starter Pokémon from Professor Oak in Pallet Town

Generation II
-------------
Gold/Silver: Route 2
Crystal: Route 2, Celadon Game Corner

Generation III
--------------
Ruby/Sapphire: Safari Zone
Emerald: Safari Zone
FireRed/LeafGreen: Viridian Forest, Power Plant
Colosseum: Trade
XD: Trade

Generation IV
-------------
Diamond/Pearl: Trophy Garden
Platinum: Trophy Garden
HeartGold/SoulSilver: Viridian Forest
Pal Park: Forest
Pokéwalker: Resort, Yellow Forest, Rally, and Sightseeing

Generation V
------------
Black/White: Poké Transfer
Black 2/White 2: Poké Transfer

Generation VI
-------------
X/Y: Santalune Forest, Route 3, Friend Safari (Electric)
Omega Ruby/Alpha Sapphire: Safari Zone (Zone 1, Tall grass), Any Contest Hall after first contest (Cosplay Pikachu)

Generation VII
--------------
Sun/Moon: Route 1, Hau'oli City (SOS Battle)
Ultra Sun/Ultra Moon: Route 1 (SOS Battle)
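
The scraping step described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: `SAMPLE_HTML` is a hypothetical, heavily simplified stand-in for the real Bulbapedia table (whose markup is nested and may change), and `fetch_html` and `parse_locations` are illustrative names.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real Bulbapedia table is more deeply nested.
SAMPLE_HTML = """
<table id="locations">
  <tr><th colspan="2">Generation I</th></tr>
  <tr><td>Red/Blue</td><td>Viridian Forest, Power Plant</td></tr>
  <tr><td>Yellow</td><td>Starter Pokémon from Professor Oak in Pallet Town</td></tr>
  <tr><th colspan="2">Generation II</th></tr>
  <tr><td>Gold/Silver</td><td>Route 2</td></tr>
</table>
"""


def fetch_html(url):
    """Download a page with Requests (imported here so parsing works offline)."""
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text


def parse_locations(html):
    """Return {generation: [(games, locations), ...]} from the sample markup."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="locations")
    result = {}
    current = None
    for row in table.find_all("tr"):
        header = row.find("th")
        if header is not None:
            # A header row starts a new generation group.
            current = header.get_text(strip=True)
            result[current] = []
            continue
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if current is not None and len(cells) == 2:
            result[current].append((cells[0], cells[1]))
    return result


locations = parse_locations(SAMPLE_HTML)
```

Keeping the download and the parsing in separate functions means the parsing logic can be exercised against saved HTML without touching the live site.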

Requirements to run locally:

Python 3.6.x, plus the requests and beautifulsoup4 modules.

Run the program from the command line using python main.py
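
A command-line entry point of this kind might be structured along the lines below. This is only a sketch under assumptions: `run_prompt_loop` and its `lookup`, `read`, and `write` parameters are hypothetical names, and `lookup` stands in for whatever function actually fetches and formats the data.

```python
def run_prompt_loop(lookup, read=input, write=print):
    """Prompt for Pokémon names until the user types 'exit'.

    `lookup` is a hypothetical callable: it returns the report text for a
    known Pokémon, or None when no page exists for that name.
    """
    write("Welcome to Pokémon game locations search.")
    while True:
        name = read("Please enter a Pokémon name, or 'exit' to exit: ").strip()
        if name.lower() == "exit":
            write("May Pokémon RNG be ever in your favor.")
            break
        if not name:
            continue  # ignore empty input and ask again
        report = lookup(name)
        if report is None:
            write(f"Pokémon {name} does not exist! Check for spelling errors "
                  "or extra characters and try again.")
        else:
            write(report)
```

Passing `read` and `write` as parameters is a design choice for testability: the loop can be driven by scripted input instead of a real terminal.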


Sample program run:

python main.py
>Welcome to Pokémon game locations search.
>Searches website 'https://bulbapedia.bulbagarden.net' for Pokémon game locations.

>Please enter a Pokémon name, or 'exit' to exit: charizarb
Pokémon charizarb does not exist! Check for spelling errors or extra characters and try again.


>Please enter a Pokémon name, or 'exit' to exit: charizard

Charizard locations:
====================

Generation I
------------
Red/Blue: Evolve Charmeleon
Blue (Japan): Evolve Charmeleon
Yellow: Evolve Charmeleon

Generation II
-------------
Gold/Silver: Time Capsule
Crystal: Time Capsule

Generation III
--------------
Ruby/Sapphire: Trade
Emerald: Trade
FireRed/LeafGreen: Evolve Charmeleon
Colosseum: Trade
XD: Trade

Generation IV
-------------
Diamond/Pearl: Trade
Platinum: Trade
HeartGold/SoulSilver: Evolve Charmeleon
Pal Park: Field

Generation V
------------
Black/White: Poké Transfer
Black 2/White 2: Poké Transfer

Generation VI
-------------
X/Y: Evolve Charmeleon
Omega Ruby/Alpha Sapphire: Trade, Event

Generation VII
--------------
Sun/Moon: Trade
Ultra Sun/Ultra Moon: Evolve Charmeleon


>Please enter a Pokémon name, or 'exit' to exit: Salamence

Salamence locations:
====================

Generation III
--------------
Ruby/Sapphire: Evolve Shelgon
Emerald: Evolve Shelgon
FireRed/LeafGreen: Trade
Colosseum: Trade
XD: Citadark Isle (Shadow)

Generation IV
-------------
Diamond: Trade, Event
Pearl: Evolve Shelgon
Platinum: Evolve Shelgon
HeartGold/SoulSilver: Evolve Shelgon
Pal Park: Mountain

Generation V
------------
Black: Trade, Event
White: Evolve Shelgon
Black 2/White 2: Evolve Shelgon

Generation VI
-------------
X/Y: Evolve Shelgon
Omega Ruby/Alpha Sapphire: Evolve Shelgon

Generation VII
--------------
Sun/Moon: Route 3 (SOS Battle)
Ultra Sun/Ultra Moon: Route 3 (SOS Battle)


>Please enter a Pokémon name, or 'exit' to exit: Gallade

Gallade locations:
==================

Generation IV
-------------
Diamond/Pearl: Evolve Kirlia♂
Platinum: Evolve Kirlia♂
HeartGold/SoulSilver: Evolve Kirlia♂

Generation V
------------
Black: Trade
White: Evolve Kirlia♂
Black 2/White 2: Evolve Kirlia♂

Generation VI
-------------
X/Y: Evolve Kirlia♂
Omega Ruby/Alpha Sapphire: Evolve Kirlia♂

Generation VII
--------------
Sun/Moon: Trade
Ultra Sun/Ultra Moon: Evolve Kirlia♂


>Please enter a Pokémon name, or 'exit' to exit: exit

May Pokémon RNG be ever in your favor.
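
The report layout in the sample run above (a title underlined with `=`, then each generation underlined with `-`) could be produced by something like the following. It is a sketch, assuming the scraped data has already been collected into a dict mapping each generation to a list of (games, places) pairs; `format_report` is an illustrative name.

```python
def format_report(name, locations):
    """Render {generation: [(games, places), ...]} as underlined plain text."""
    title = f"{name} locations:"
    lines = [title, "=" * len(title), ""]
    for generation, entries in locations.items():
        lines.append(generation)
        lines.append("-" * len(generation))  # underline matches heading width
        for games, places in entries:
            lines.append(f"{games}: {places}")
        lines.append("")  # blank line between generations
    return "\n".join(lines).rstrip("\n")


sample = {
    "Generation IV": [
        ("Diamond/Pearl", "Evolve Kirlia♂"),
        ("Platinum", "Evolve Kirlia♂"),
    ],
}
print(format_report("Gallade", sample))
```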

Sections of each HTML page parsed in the above example are linked below:

https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pokémon)#Game_locations
https://bulbapedia.bulbagarden.net/wiki/Salamence_(Pokémon)#Game_locations
https://bulbapedia.bulbagarden.net/wiki/Gallade_(Pokémon)#Game_locations
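
The links above share one pattern, so a page URL can be derived from the name the user types. The sketch below assumes single-word names whose article titles simply capitalize the first letter; names with spaces or punctuation would need extra handling. `build_page_url` and `BASE_URL` are illustrative names.

```python
from urllib.parse import quote

BASE_URL = "https://bulbapedia.bulbagarden.net/wiki/"


def build_page_url(name):
    """Build the article URL for a Pokémon name typed by the user."""
    # Assumption: article titles capitalize the first letter, e.g. "Charizard".
    title = name.strip().capitalize() + "_(Pokémon)"
    # Percent-encode the non-ASCII "é"; keep parentheses and underscores readable.
    return BASE_URL + quote(title, safe="()_") + "#Game_locations"


print(build_page_url("charizard"))
```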

At the time of writing, only games up through Generation VII (Ultra Sun, Ultra Moon) had been released, so that is the extent of the data shown in the above output.


Basic initial response and Beautiful Soup HTML extraction code setup borrowed from: https://realpython.com/blog/python/python-web-scraping-practical-introduction/

This application is free to use or build on in any reasonable way for personal/non-commercial use. Just be careful not to scrape too hard or too often: it counts as site traffic!
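
One simple way to keep scraping gentle is to enforce a minimum delay between requests. The class below is a sketch of that idea and is not part of the original program: `Throttle` is a hypothetical helper, and the two-second interval in the usage lines is an arbitrary assumption, not a figure published by the site.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to honor min_interval, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call wait() immediately before each HTTP request.
throttle = Throttle(min_interval=2.0)
throttle.wait()  # the first call returns immediately
```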

Contributors

kakyoin01
