Giter Club home page Giter Club logo

mars_news_scraping's Introduction

Mars_News_Scraping

Web-scrape and data analyse using both automated browsing with Splinter and HTML parsing with Beautiful Soup.

Getting Started

This assignment consists of two technical products. You will submit the following deliverables:

  • Deliverable 1: Scrape titles and preview text from Mars news articles.

  • Deliverable 2: Scrape and analyze Mars weather data, which exists in a table.

Prerequisites

Before you begin, ensure you have the following installed:

Coding Instruction

The instructions for this module are divided into the following parts:

  • Part 1: Scrape Titles and Preview Text from Mars News

  • Part 2: Scrape and Analyze Mars Weather Data

Part 1: Scrape Titles and Preview Text from Mars News

Open the Jupyter Notebook in the starter code folder named part_1_mars_news.ipynb. You will work in this code as you follow the steps below to scrape the Mars News website.

  1. Use automated browsing to visit the Mars news . Inspect the page to identify which elements to scrape.

    • Hint: To identify which elements to scrape, you might want to inspect the page by using Chrome DevTools.
  2. Create a Beautiful Soup object and use it to extract text elements from the website.

  3. Extract the titles and preview text of the news articles that you scraped. Store the scraping results in Python data structures as follows:

    1. Store each title-and-preview pair in a Python dictionary and, give each dictionary two keys: title and preview.
    2. Store all the dictionaries in a Python list.
    3. Print the list in your notebook.
  4. Optionally, store the scraped data in a file (to ease sharing the data with others). To do so, export the scraped data to a JSON file. (Note: there will be no extra points for completing this.)

Part 2: Scrape and Analyze Mars Weather Data

Open the Jupyter Notebook in the starter code folder named part_2_mars_weather.ipynb. You will work in this code as you follow the steps below to scrape and analyze Mars weather data.

  1. Use automated browsing to visit the Mars Temperature Data. Inspect the page to identify which elements to scrape. Note that the URL is https://static.bc-edx.com/data/web/mars_facts/temperature.html.

    • Hint: To identify which elements to scrape, you might want to inspect the page by using Chrome DevTools to discover whether the table contains usable classes.
  2. Create a Beautiful Soup object and use it to scrape the data in the HTML table. Note that this can also be achieved by using the Pandas read_html function. However, use Beautiful Soup here to continue sharpening your web scraping skills.

  3. Assemble the scraped data into a Pandas DataFrame. The columns should have the same headings as the table on the website. Here’s an explanation of the column headings:

    • id: the identification number of a single transmission from the Curiosity rover

    • terrestrial_date: the date on Earth

    • sol: the number of elapsed sols (Martian days) since Curiosity landed on Mars

    • ls: the solar longitude

    • month: the Martian month

    • min_temp: the minimum temperature, in Celsius, of a single Martian day (sol)

    • pressure: The atmospheric pressure at Curiosity's location

  4. Examine the data types that are currently associated with each column. If necessary, cast (or convert) the data to the appropriate datetime, int, or float data types.

    • Hint: You can use the Pandas astype and to_datetime methods to accomplish this task.
  5. Analyze your dataset by using Pandas functions to answer the following questions:

    • How many months exist on Mars?

    • How many Martian (and not Earth) days worth of data exist in the scraped dataset?

    • What are the coldest and the warmest months on Mars (at the location of Curiosity)? To answer this question:

      • Find the average minimum daily temperature for all of the months.

      • Plot the results as a bar chart.

    • Which months have the lowest and the highest atmospheric pressure on Mars? To answer this question:

      • Find the average daily atmospheric pressure of all the months.

      • Plot the results as a bar chart.

    • About how many terrestrial (Earth) days exist in a Martian year? To answer this question:

      • Consider how many days elapse on Earth in the time that Mars circles the Sun once.

      • Visually estimate the result by plotting the daily minimum temperature.

  6. Export the DataFrame to a CSV file.

Reference

The Mars News Website is operated by edX Boot Camps LLC for educational purposes only. The news article titles, summaries, dates, and images were scraped from NASA's Mars News website in November 2022. Images are used according to the JPL Image Use Policy, courtesy NASA/JPL-Caltech.

mars_news_scraping's People

Contributors

dalyalami avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.