
OpenAQ Platform API


NO LONGER IN USE

This codebase is no longer in use; please see https://github.com/openaq/openaq-api-v2. Version 1 of the OpenAQ API is still available at api.openaq.org/v1, but it has been reimplemented in the same repository as version 2.

Overview

This is the main API for the OpenAQ project.

Starting from index.js, this codebase provides a web-accessible API with endpoints to query the air quality measurements. Documentation can be found at https://docs.openaq.org/.

openaq-fetch takes care of fetching new data and inserting it into the database. The data format is explained in openaq-data-format.

Getting started

Install prerequisites:

Clone this repository locally (see these instructions) and activate the required Node.js version with:

nvm install

This last step can be skipped if the local Node.js version matches the one defined in .nvmrc.

Install module dependencies:

npm install

Development

Initialize development database:

npm run init-dev-db

This task starts a PostgreSQL container as a daemon, runs migrations, and seeds data. Each of these tasks can also be run independently; refer to the scripts in package.json for the available options.

After initialization is finished, start the development server:

npm run dev

Access http://localhost:3004.
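Once the server is up, the v1 endpoints described at https://docs.openaq.org/ can be queried against localhost. A minimal sketch of building such a query URL (the parameter names here are illustrative; see the docs for the full list):

```javascript
// Build a v1 measurements query URL for the local dev server.
// The country/limit parameters follow the public API docs.
function measurementsUrl(base, params) {
  const qs = new URLSearchParams(params).toString();
  return `${base}/v1/measurements?${qs}`;
}

// With Node 18+ you could then: await fetch(measurementsUrl(...))
console.log(measurementsUrl('http://localhost:3004', { country: 'NL', limit: 5 }));
```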

Stop database container after finishing:

npm run stop-dev-db

Testing

Initialize test database:

npm run init-test-db

This task starts a PostgreSQL container as a daemon, runs migrations, and seeds data. After initialization is finished, run the tests:

npm run test

Stop database container after finishing:

npm run stop-test-db

Deploying to production

The server needs to fetch data about locations and cities in the measurement history using AWS Athena. This service must be configured via the following environment variables:

  • ATHENA_ACCESS_KEY_ID: an AWS Access Key that has permissions to create Athena Queries and store them in S3;
  • ATHENA_SECRET_ACCESS_KEY: the corresponding secret;
  • ATHENA_OUTPUT_BUCKET: S3 location (in the form of s3://bucket/folder) where the results of the Athena queries should be stored before caching them;
  • ATHENA_FETCHES_TABLE: the name of the table registered in AWS Athena.

Automatic Athena synchronization is disabled by default. It can be enabled by setting the environment variable ATHENA_SYNC_ENABLED to true. The sync interval can be set using the ATHENA_SYNC_INTERVAL variable, in milliseconds. The default interval is set in config/default.json.

If needed, the synchronization can be fired manually. First, the WEBHOOK_KEY variable must be set to allow access to the webhooks endpoint. Sending a POST request to /v1/webhooks, including the parameters key=<WEBHOOK_KEY> and action=ATHENA_SYNC, will start a sync run. An example with curl:

curl --data "key=123&action=ATHENA_SYNC" http://localhost:3004/v1/webhooks
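The same request can be issued from Node; a sketch using the built-in URLSearchParams and the global fetch available in Node 18+ ('123' is only the development default for WEBHOOK_KEY):

```javascript
// Sketch: trigger a manual Athena sync from Node.
const body = new URLSearchParams({ key: '123', action: 'ATHENA_SYNC' });
console.log(body.toString()); // key=123&action=ATHENA_SYNC
// await fetch('http://localhost:3004/v1/webhooks', { method: 'POST', body });
```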

Other environment variables available:

  • API_URL: Base API URL after deployment. Default: http://:3004
  • NEW_RELIC_LICENSE_KEY: New Relic API key for system monitoring. Default: not set
  • WEBHOOK_KEY: Secret key to interact with openaq-api. Default: '123'
  • USE_REDIS: Use Redis for caching? Default: not set (so not used)
  • USE_ATHENA: Use AWS Athena for aggregations? Default: not set (so not used)
  • REDIS_URL: Redis instance URL. Default: redis://localhost:6379
  • DO_NOT_UPDATE_CACHE: Ignore updating the cache, but still use older cached results. Default: not set
  • AGGREGATION_REFRESH_PERIOD: How long to wait before refreshing cached aggregations (in ms). Default: 45 minutes
  • REQUEST_LIMIT: Max number of items that can be requested at one time. Default: 10000
  • UPLOADS_ENCRYPTION_KEY: Key used to encrypt upload tokens for /upload in the database. Default: 'not_secure'
  • S3_UPLOAD_BUCKET: The bucket to upload external files to for /upload. Default: not set
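A sketch of how such variables are typically consumed in a Node config module, with the defaults taken from the list above (the shape and function name are hypothetical, not the project's actual config code):

```javascript
// Hypothetical config loader mirroring the documented defaults.
function loadConfig(env) {
  return {
    apiUrl: env.API_URL || 'http://:3004',
    webhookKey: env.WEBHOOK_KEY || '123',
    useRedis: env.USE_REDIS !== undefined,
    redisUrl: env.REDIS_URL || 'redis://localhost:6379',
    requestLimit: parseInt(env.REQUEST_LIMIT || '10000', 10)
  };
}

console.log(loadConfig(process.env).requestLimit);
```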

AWS Athena for aggregations

The Athena table, fetches_realtime, represents the fetches from openaq-data and has the following schema:

CREATE EXTERNAL TABLE fetches.fetches_realtime (
  date struct<utc:string,local:string>,
  parameter string,
  location string,
  value float,
  unit string,
  city string,
  attribution array<struct<name:string,url:string>>,
  averagingPeriod struct<unit:string,value:float>,
  coordinates struct<latitude:float,longitude:float>,
  country string,
  sourceName string,
  sourceType string,
  mobile string
 )
 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 LOCATION 's3://EXAMPLE_BUCKET'
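As an illustration, an aggregation over this table might look like the following, shown as the SQL string a Node Athena client would submit (this query is hypothetical; the queries the API actually runs live in the repository):

```javascript
// Hypothetical Athena SQL against the schema above:
// count measurements per country and parameter.
const query = `
  SELECT country, parameter, count(*) AS measurements
  FROM fetches.fetches_realtime
  GROUP BY country, parameter
`;
console.log(query.includes('fetches.fetches_realtime')); // true
```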

Uploads & Generating S3 presigned URLs

Via an undocumented /upload endpoint, the API can generate presigned S3 PUT URLs, so that external clients can authenticate using tokens stored in the database and upload data to be ingested by openaq-fetch. A small utility file called encrypt.js generates encrypted tokens to be manually stored in the database; use it like UPLOADS_ENCRYPTION_KEY=foo node encrypt.js your_token_here.

Dockerfile

There is a Dockerfile included that packages the project as a Docker container, currently used mostly for deployment to AWS ECS. If someone wanted to make it better for local development, that'd be a great PR!

Contributing

There are a lot of ways to contribute to this project, more details can be found in the contributing guide.

Projects using the API

  • openaq-browser site | code - A simple browser to provide a graphical interface to the data.
  • openaq code - An isomorphic JavaScript wrapper for the API.
  • py-openaq code - A Python wrapper for the API.
  • ropenaq code - An R package for the API.

For more projects that use the OpenAQ API, check out the OpenAQ.org Community page.

openaq-api's People

Contributors

andrewharvey, danielfdsilva, dolugen, jflasher, kamicut, nickolasclarke, olafveerman, rocketd0g, russbiggs, sethvincent, sruti, vgeorge, webbkyr


openaq-api's Issues

Belgian sources

Measurements for a lot of Belgian measuring stations: http://www.ircel.be/nl/luchtkwaliteit/metingen
This page shows rolling averages for most parameters.

When you drill down, it's possible to get the actual hourly measurements and not the rolling averages. For example:

  1. go to this page
  2. click on table with detailed info per monitoring site

I have not found a programmatic way to access this data; it might need to be scraped.

Store organization / provider

For proper attribution, it could make sense to store the organization providing the data. This could either be set on source level, or set per measurement by the adapter.

In the case of the Dutch (#25) and the Chilean (#29) data, there is a difference between data provider and the maintainer of the station.

@jflasher @RocketD0g Any idea if and how you want to store this?

Automatically create csv of last day's data and put on S3

Paraphrasing Slack conversation:

Basically, I’d like some way to dump either/both database dumps and daily/weekly/monthly csv dumps to an S3 bucket and make them available for easy download. Want to make it easy for someone to grab all of our data at once, and that’s probably not through the API.

Japanese Sources

Sources for Yokohama:

http://cgi.city.yokohama.lg.jp/kankyou/saigai/data/taiki/all/all_0000_00_001.html
http://www.ihe.pref.miyagi.jp/telem/dayreportitem/?itemSelect=10&day=2015%E5%B9%B410%E6%9C%8804%E6%97%A5

Appears to be hourly but unsure. Joe is contacting Miyagi Prefecture regarding details, potentially existing API, station coordinates.

Tokyo:
Just oxides? Need help with translation:
http://www.ox.kankyo.metro.tokyo.jp/index.php?chiku=1
http://www.ox.kankyo.metro.tokyo.jp/

Main page: http://www.kankyo.metro.tokyo.jp/nature/index.html

Standardize data fields

Which data fields will the platform support and what will they be called?

Currently we have defined names for pm25 and pm10.

Handle dates in a better way

We need to do some thinking about how to best handle dates across the platform. Dates should be stored in UTC in the database, but we probably need to keep some track of timezones for location and whether it supports DST? ugh.

Nicer error message

Return a nicer error message when specifying a source that doesn't exist:

$ node fetch.js --dryrun --source=au.json
--- Dry run for Testing, nothing is saved to the database. ---
/home/olaf/projects/openaq-api/fetch.js:46
    var adapter = findAdapter(source.adapter);
                                    ^
TypeError: Cannot read property 'adapter' of undefined
    at /home/olaf/projects/openaq-api/fetch.js:46:37
    [...]

@jflasher
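A minimal guard along these lines (the helper name and lookup shape are hypothetical, not the actual fetch.js code) would replace the TypeError with a readable message:

```javascript
// Hypothetical fix: validate the source lookup before touching .adapter.
function requireSource(sources, name) {
  const source = sources.find((s) => s.name === name);
  if (!source) {
    throw new Error(`Source "${name}" not found. Check the --source argument.`);
  }
  return source;
}
```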

How to handle negative values?

With the inclusion of #61, we are going to be pulling some negative values into the platform. There may already be some, just noticed with latest data source. Some of the negative values are -0.25 and some are -999.

For right now, I think we just store these as is, but in the future do we throw out measurements with negative values? Do we keep them in the platform and leave it up to others to remove them?

Add timeouts to the requests in adapters

If we don't set timeouts on the requests, Heroku may time us out which feels worse. Makes me wonder if we should have a system-wide request object that gets passed around so we can set defaults in one place?

2015 GBD/WHO template for including data in their global databases

No Immediate Action Intended - Background Info

FYI, a useful template of the type of information collected for the upcoming 2015 WHO and GBD global databases of annual average PM2.5 and PM10 pollution is below (they are primarily searching for 2014 data currently). I can't find the issue, but I think @olafveerman brought up the categorization of sites before (e.g. what are the criteria for residential, urban, industrial, etc.?). It has been indicated that there are no strict criteria for this currently and countries are directed to fill out the template using their best judgement.

http://www.who.int/entity/phe/health_topics/outdoorair/databases/PHE-Template-OAP-database-entries-June2015.xls?ua=1

Documentation

I'm working on some project related documentation. Mostly a glossary of the project (source, station, measurement), the application's flow and some guidelines on how to contribute.

I can imagine this living in a couple of places:

  1. in the wiki of the openaq.github.io repo
    Advantage: easy to edit
  2. as a chapter on the Open AQ website
    Advantage: easy to read, Disadvantage: less easy to contribute
  3. as markdown files in the /docs folder of a repo
    Disadvantage: not easy to read, not easy to contribute

@RocketD0g @jflasher Any thoughts on how you want to set this up?

Turkey - Sources

Map of stations with coordinates and current readings and stations' current data are here (though not on unique urls):

http://www.havaizleme.gov.tr/Default.ltr.aspx

'TÜM İSTASYONLAR' = all stations

Clicking on the stations reveals site coordinates (click 'station description') and pollutant types measured.

(Sidenote: They use an AQI system with breakpoints same as the US EPA)

Comment from Knight Foundation Application - Lodoysamba

This is a constructive comment that was posted on our open Knight Foundation News Challenge (url at bottom).


This is an important work as scientists, researchers, and students should have access to air quality data via an internet portal. This is crucial for timely review and analysis at both the level of individual cities and regions, and at the international level. Superficial reports of atmospheric conditions solely from air quality stations are not adequate. Air quality depends not only on source emissions but also on weather conditions, population activity, and other factors. In some cases, initial data are not always available. In other cases, air quality offices refuse to share their data. Hence, while the environmental scientist aims to analyze and interpret the data, these problems of poor data quality and restricted access injure the scientist’s ability to generate quality measures to reduce air pollution. On the other hand, when comprehensive scientific data is available, policymakers and air quality officers are better able to orient their strategies to reduce air pollution.
An internet forum to warehouse data will help scientists from all nations to learn from one another. Such a forum would lead to improved methods to record data, analyze data, and utilize data more efficiently. By accessing a data warehouse, scientists in less developed countries could quickly learn how reduction measures affect air quality in other cities around the globe.
Furthermore, many students and researchers from non-environmental disciplines could also find value in the data. Mathematicians, for example, could use these types of data sets to improve tools of statistical data analysis.
The prototype http://openaq.org contains data for some analysis, but could use certain enhancements. For instance, the site ought to include information concerning the air quality measuring station type, along with the extent of validation or calibration of recording media to give researchers more confidence in the quality of the information.

prof.S.Lodoysamba, Mongolia

https://www.newschallenge.org/challenge/data/entries/openaq-the-first-open-air-quality-data-hub-for-the-world#c-b367e525a7e574817c19ad24b7b35607

Averaging period

As mentioned in #36, there are a couple of sources that report rolling averages. We seem to agree to store this with every measurement, but how to go about it?

We can either do a general purpose note field that can be used for anything:

{
  parameter: 'pm25',
  value: 4,
  note: '24 hour rolling average'
}

or we can attempt to standardize it in some way:

{
  parameter: 'pm25',
  value: 4,
  averagingPeriod: 24
}

@RocketD0g Do averaging periods tend to fall within a 4 - 24 hour range? Thoughts @jflasher ?
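For comparison, the Athena schema earlier on this page stores averagingPeriod as struct<unit:string,value:float>; the standardized variant could likewise carry an explicit unit (a sketch of one option, not a decided format):

```javascript
// Sketch: averagingPeriod with an explicit unit, matching the
// struct<unit:string,value:float> shape used in the Athena schema.
const measurement = {
  parameter: 'pm25',
  value: 4,
  averagingPeriod: { unit: 'hours', value: 24 }
};
console.log(measurement.averagingPeriod.unit);
```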

Better email handling

Too many emails are getting sent, figure out a more sane way to handle this. They are currently disabled via Heroku scheduler task until this is fixed.

Chile data sources

Sinca is the Chilean AQ information system. It contains measurements from 194 stations, including the one in Valdivias (see #28).

Have to check:

  • the license under which this is published
  • whether there is an API we can use

Comment from Knight Foundation Application - MH

Connecting individuals (including the media, politicians, citizen-scientists, and even other scientists) with scientific data is a constant challenge, and OpenAQ is an outstanding leap forward. By providing data in a programmatic method that ANYONE can utilize, you all are setting the standard for how this should be done. High school student writing about air pollution downwind from a power plant in your neighborhood? Click a button and get data. Reporter writing a story on how pollution in your city compares to another across the world? Click two buttons and download the data. Scientist wanting to do complex queries across multiple locations, adjusting for seasonality and time of day? Incorporate the API into your Python script.

One suggestion for long-term, future work. I agree with many of the other commenters that OpenAQ would make for a fantastic platform to extend to additional types of data. Meteorological, sea ice, and terrestrial flux data, for example, can be difficult data sets to access for both non-scientists and scientists alike. Often this data is squirreled away on a server in a proprietary format, is confusing to access, is not available by API, etc. Your platform could and should set the standard for how all types of scientific data is made easily and quickly accessible to the public.

Comment from Knight Foundation Application - Chisato

As a social scientist conducting research on air pollution management in Ulaanbaatar, Mongolia, I highly commend this Open AQ initiative. Air pollution is certainly the most pressing environmental health issue in this capital city. Since 2012, there have been increased efforts to improve air quality monitoring by installing more air quality monitors throughout Ulaanbaatar. However, more monitors do not necessarily catalyze better data sharing. Even if these technologies produce reliable, real-time air quality data, this data will not make an impact on the research community, development field, and most importantly, the public-at-large if there is no sustainable, user-friendly, robust system set in place for data sharing. After all, air pollution is an inherently social problem. We need to connect the data to people. I believe that Open AQ would provide this foundational connection. Open source sharing of air quality data would create and sustain a global commitment around air pollution issues -- connecting people across cities and regions to examine the various ways to tackle this environmental challenge. The Open AQ initiative also values the socio-cultural dimensions of the air pollution issue. Different cultures with different political-economic systems approach the air pollution problem, its management, and potential solutions in different ways. Open AQ will host its first workshop in Ulaanbaatar this November with the goal of bringing local experts, media, and community members together to develop, dispute, and deploy strategies that would best suit Ulaanbaatar. This demonstrates that the Open AQ initiative will engage with local communities as key members of developing this platform.

As a suggestion, (once the air quality data is calibrated and complete) I recommend including a section or layer on guidelines and policies from different countries. I think it would be beneficial to make information about different interventions (especially related to health) accessible. How is Delhi tackling the air pollution issue? Can Mexico City use the same model? Why are Chinese residents wearing masks but not residents in Jakarta? People across the globe can learn how different governments are tackling the air pollution problem and/or how local communities are using the data to hold different actors and institutions accountable for air pollution reduction. For example, a lot of air pollution protection/air pollution-induced illness information is not readily available or part of the public discourse. In order for people to take ownership over their health as inhabitants of polluted cities, I think that a guidelines "layer" would catalyze more urgency around this issue and strengthen efforts to improve air quality from within communities. I foresee a global "blog" on air pollution issues where people discuss, debate, and learn from each other on how to best tackle the air pollution problem in their own communities.

Validated / unvalidated

@RocketD0g What do you feel about adding a validated / un-validated flag? This might be valuable information, especially when we start adding validated sources.

@jflasher's fine with it. Just checked with him.

How to include alternate unit sources

This just came up when looking to include Chilean data #37. For some of the measurements, they're reporting data in ppb or ppm (it looks like we may also be able to get it in ug/m3, but for the sake of argument, forget about that). We have the unit field in the measurement record for exactly this scenario, but do we actually want to use it? If some of the sources are reporting in ug/m3 and some are reporting in ppb or another alternate unit, it would seem to severely lessen the ability to directly visualize the data next to each other.

Is there an easy way to convert between ppb and ug/m3 or should we even do that?

cc/ @RocketD0g @olafveerman
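For reference, the standard approximation (at 25 °C and 1 atm) converts ppb to µg/m³ by multiplying by the gas's molecular weight and dividing by the molar volume of 24.45 L/mol; whether the platform should apply this automatically is a separate question. A sketch:

```javascript
// ug/m3 = ppb * molecular weight / 24.45 (at 25 C, 1 atm).
function ppbToUgm3(ppb, molecularWeight) {
  return (ppb * molecularWeight) / 24.45;
}

console.log(ppbToUgm3(100, 48).toFixed(1)); // 100 ppb of O3 (MW ~ 48 g/mol)
```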

Rename Heroku app

Or else I will forget what it's doing in 2 months and try to delete it.

Comment from Knight Foundation Application - Langley

As a research scientist working on air pollution issues, more accessible AQ data in different regions of the world is highly critical to understanding sources, transport, and transformation of air pollutants in the atmosphere. A major global health issue, atmospheric particulate formation and transport is still not fully understood by the scientific community, and increasing the geospatial resolution of available AQ measurements for modellers and researchers could really raise our understanding of these issues. Additionally, this platform could help inform the public about local air quality issues and provide needed data for medical workers and journalists.
During my time working in a developing country, I have found it frustrating to try to scour scientific papers for names of scientists that may or may not know where AQ data is kept (when preparing proposals, briefings, and other official reports).
I like the suggestion from Chisato about including a section on guidelines and policies in different countries. This will allow direct impact of regulations to be observed. I am currently working in a developing country attempting to regulate AQ, and it is difficult for the officials to decide what type of AQ monitoring equipment to purchase and which regulations to push initially. Knowing what countries with similar air pollution sources and available resources have done in the past, and how this worked, would really be an asset for developing countries beginning to address AQ issues.
On a more scientific note, if possible and if available, including the meteorological data often captured by AQ monitoring stations (wind direction and wind speed) and a general description of measurement locations would help scientists best utilize these data in models.
