
pfas-web-and-pdf-scrape's Introduction

Per- and polyfluoroalkyl substances (PFAS) in drinking water on or near military installations, 2017 and 2021-2023

This repository contains the Python code that scrapes, cleans, and maps data on the concentrations of per- and polyfluoroalkyl substances (PFAS) measured in drinking water systems on or near military installations. This data is publicly available on the Department of Defense (DOD)'s PFAS website: https://www.acq.osd.mil/eie/eer/ecc/pfas/index.html. There are two sets of data, one dated August 31, 2017 and the other dated 2021-2023. The 2021-2023 data was scraped directly from DOD's PFAS website using an API: https://www.acq.osd.mil/eie/eer/ecc/pfas/map/pfasmap.html. The 2017 data was scraped from a PDF file also available on DOD's PFAS website: https://www.denix.osd.mil/derp/denix-files/sites/26/2018/03/FY18-HASC-Brief-on-PFOS-PFOA_Mar2018.pdf. The year "2018" is occasionally used in reference to the 2017 data because the report the data was scraped from is dated "March 2018." The terms military "installation" and "base" are used interchangeably in the code.

The PFAS concentrations displayed are for either perfluorooctanoic acid (PFOA) or perfluorooctane sulfonic acid (PFOS), both of which are members of the PFAS chemical group. For the 2017 data, concentrations were mostly reported as a combination of PFOA and PFOS, or the analyte was unspecified.

The Environmental Protection Agency (EPA) has presented concentration thresholds for evaluating the risk that PFOA and PFOS in drinking water pose to human health: 70 parts per trillion (ppt) for PFOS and PFOA, separately or combined, and, more recently, 4 ppt for either PFOS or PFOA. For the 2021-2023 data, the sample results are compared to both thresholds in separate tables. For the 2017 data, all military installations reported drinking water concentrations, specifically combined PFOS and PFOA concentrations, that were above 70 ppt.

If a cell in the "results" column in a table is blank, no concentrations were detected or reported.
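The threshold comparison and the blank-cell rule can be illustrated with a minimal sketch. This is not the repository's code: the field names (`installation`, `analyte`, `result`), the sample rows, and the helper `exceeds` are hypothetical, and "above" is assumed to mean strictly greater than the threshold.

```python
# Thresholds in parts per trillion (ppt), as described in the text.
EPA_THRESHOLDS_PPT = (70.0, 4.0)


def exceeds(result, threshold_ppt):
    """Return True/False for a numeric result, or None for a blank cell.

    A blank cell means no concentration was detected or reported,
    so it is neither above nor below the threshold.
    """
    if result is None or str(result).strip() == "":
        return None
    return float(result) > threshold_ppt


# Made-up rows for illustration only; not taken from the DOD data.
samples = [
    {"installation": "Example Base A", "analyte": "PFOS", "result": "85"},
    {"installation": "Example Base B", "analyte": "PFOA", "result": "12.5"},
    {"installation": "Example Base C", "analyte": "PFOS", "result": ""},
]

for row in samples:
    flags = {t: exceeds(row["result"], t) for t in EPA_THRESHOLDS_PPT}
    print(row["installation"], row["analyte"], flags)
```

Returning `None` for blank cells keeps "not detected or reported" distinct from "below the threshold," which matters when the two thresholds are tabulated separately.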

Folders and files contained in this repository:

1. 2017 PDF scrape: folder that contains code that scrapes the 2017 data from the PDF file.

2. 2017 spatial: folder that contains code that joins spatial data to the 2017 data, allowing the 2017 concentrations to be mapped.

3. 2021-2023 webscrape: folder that contains code that scrapes the 2021-2023 data from a website using an API and then maps it using geopandas.

4. compare 2017 and 2021-2023: folder that contains code that identifies military installations reported in both the 2017 and 2021-2023 datasets and compares the sample results.

5. PFAS PACT Act data dictionary.xlsx: file that documents each of the tables generated by the code. The code writes several .csv files as it runs. The first tab of the data dictionary describes what each .csv file contains, and the subsequent tabs define the columns in each of those .csv tables.
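The comparison step in item 4 can be sketched as follows. This is an illustration under stated assumptions, not the repository's method: it assumes installations can be matched on a normalized name, and the column name `installation`, the sample rows, and the helpers `normalize` and `installations_in_both` are all hypothetical.

```python
def normalize(name):
    """Lowercase a name and collapse whitespace so minor formatting differences do not block a match."""
    return " ".join(name.lower().split())


def installations_in_both(rows_2017, rows_2021_2023, key="installation"):
    """Return (row_2017, row_2021_2023) pairs whose installation names match."""
    by_name_2017 = {normalize(r[key]): r for r in rows_2017}
    matches = []
    for row in rows_2021_2023:
        name = normalize(row[key])
        if name in by_name_2017:
            matches.append((by_name_2017[name], row))
    return matches


# Made-up rows for illustration only; not taken from the DOD data.
rows_2017 = [
    {"installation": "Example Base A", "result": "120"},
    {"installation": "Example Base B", "result": "75"},
]
rows_2021_2023 = [
    {"installation": "example  base a", "result": "9.1"},
    {"installation": "Example Base C", "result": ""},
]

for old, new in installations_in_both(rows_2017, rows_2021_2023):
    print(old["installation"], "2017:", old["result"], "2021-2023:", new["result"])
```

Name matching of this kind is fragile when the two sources spell or abbreviate installation names differently, so matched and unmatched names are worth reviewing by hand.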

pfas-web-and-pdf-scrape's People

Contributors

plain-jane-gray
