
OpenAQ Platform API


NO LONGER IN USE

This codebase is no longer in use; please see https://github.com/openaq/openaq-api-v2. Version 1 of the OpenAQ API is still available at api.openaq.org/v1, but it has been reimplemented in the same repository as version 2.

Overview

This is the main API for the OpenAQ project.

Starting from index.js, this codebase provides a web-accessible API with endpoints to query the air quality measurements. Documentation can be found at https://docs.openaq.org/.

openaq-fetch takes care of fetching new data and inserting it into the database. The data format is explained in openaq-data-format.

Getting started

Install prerequisites:

Clone this repository locally (see these instructions) and activate the required Node.js version with:

nvm install

This last step can be skipped if the local Node.js version matches the one defined in .nvmrc.

Install module dependencies:

npm install

Development

Initialize development database:

npm run init-dev-db

This task starts a PostgreSQL container as a daemon, runs migrations, and seeds data. Each of these tasks can also be run independently; refer to the scripts in package.json for the available options.

After initialization is finished, start the development server:

npm run dev

Access http://localhost:3004.
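Once the server is up, the v1 endpoints described at https://docs.openaq.org/ can be queried against localhost. A minimal sketch of building such a query URL (the parameter names here are illustrative; see the docs for the full list):

```javascript
// Build a v1 measurements query URL for the local dev server.
// The country/limit parameters follow the public API docs.
function measurementsUrl(base, params) {
  const qs = new URLSearchParams(params).toString();
  return `${base}/v1/measurements?${qs}`;
}

// With Node 18+ you could then: await fetch(measurementsUrl(...))
console.log(measurementsUrl('http://localhost:3004', { country: 'NL', limit: 5 }));
```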

Stop database container after finishing:

npm run stop-dev-db

Testing

Initialize test database:

npm run init-test-db

This task starts a PostgreSQL container as a daemon, runs migrations, and seeds data. After initialization is finished, run the tests:

npm run test

Stop database container after finishing:

npm run stop-test-db

Deploying to production

The server needs to fetch data about locations and cities in the measurement history using AWS Athena. This service must be configured via the following environment variables:

  • ATHENA_ACCESS_KEY_ID: an AWS Access Key that has permissions to create Athena Queries and store them in S3;
  • ATHENA_SECRET_ACCESS_KEY: the corresponding secret;
  • ATHENA_OUTPUT_BUCKET: S3 location (in the form of s3://bucket/folder) where the results of the Athena queries should be stored before caching them;
  • ATHENA_FETCHES_TABLE: the name of the table registered in AWS Athena.

Automatic Athena synchronization is disabled by default. It can be enabled by setting the environment variable ATHENA_SYNC_ENABLED to true. The sync interval can be set using the ATHENA_SYNC_INTERVAL variable, in milliseconds. The default interval is set in config/default.json.

If needed, the synchronization can be fired manually. First, the WEBHOOK_KEY variable must be set to allow access to the webhooks endpoint. Sending a POST request to /v1/webhooks, including the parameters key=<WEBHOOK_KEY> and action=ATHENA_SYNC, will start a sync run. An example with curl:

curl --data "key=123&action=ATHENA_SYNC" http://localhost:3004/v1/webhooks
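The same request can be issued from Node; a sketch using the built-in URLSearchParams and the global fetch available in Node 18+ ('123' is only the development default for WEBHOOK_KEY):

```javascript
// Sketch: trigger a manual Athena sync from Node.
const body = new URLSearchParams({ key: '123', action: 'ATHENA_SYNC' });
console.log(body.toString()); // key=123&action=ATHENA_SYNC
// await fetch('http://localhost:3004/v1/webhooks', { method: 'POST', body });
```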

Other environment variables available:

  • API_URL: Base API URL after deployment. Default: http://:3004
  • NEW_RELIC_LICENSE_KEY: New Relic API key for system monitoring. Default: not set
  • WEBHOOK_KEY: Secret key to interact with openaq-api. Default: '123'
  • USE_REDIS: Use Redis for caching? Default: not set (so not used)
  • USE_ATHENA: Use AWS Athena for aggregations? Default: not set (so not used)
  • REDIS_URL: Redis instance URL. Default: redis://localhost:6379
  • DO_NOT_UPDATE_CACHE: Ignore updating the cache, but still use older cached results. Default: not set
  • AGGREGATION_REFRESH_PERIOD: How long to wait before refreshing cached aggregations (in ms). Default: 45 minutes
  • REQUEST_LIMIT: Max number of items that can be requested at one time. Default: 10000
  • UPLOADS_ENCRYPTION_KEY: Key used to encrypt upload tokens for /upload in the database. Default: 'not_secure'
  • S3_UPLOAD_BUCKET: The bucket to upload external files to for /upload. Default: not set
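A sketch of how such variables are typically consumed in a Node config module, with the defaults taken from the list above (the shape and function name are hypothetical, not the project's actual config code):

```javascript
// Hypothetical config loader mirroring the documented defaults.
function loadConfig(env) {
  return {
    apiUrl: env.API_URL || 'http://:3004',
    webhookKey: env.WEBHOOK_KEY || '123',
    useRedis: env.USE_REDIS !== undefined,
    redisUrl: env.REDIS_URL || 'redis://localhost:6379',
    requestLimit: parseInt(env.REQUEST_LIMIT || '10000', 10)
  };
}

console.log(loadConfig(process.env).requestLimit);
```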

AWS Athena for aggregations

The Athena table, fetches_realtime, represents the fetches from openaq-data and has the following schema:

CREATE EXTERNAL TABLE fetches.fetches_realtime (
  date struct<utc:string,local:string>,
  parameter string,
  location string,
  value float,
  unit string,
  city string,
  attribution array<struct<name:string,url:string>>,
  averagingPeriod struct<unit:string,value:float>,
  coordinates struct<latitude:float,longitude:float>,
  country string,
  sourceName string,
  sourceType string,
  mobile string
 )
 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 LOCATION 's3://EXAMPLE_BUCKET'
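As an illustration, an aggregation over this table might look like the following, shown as the SQL string a Node Athena client would submit (this query is hypothetical; the queries the API actually runs live in the repository):

```javascript
// Hypothetical Athena SQL against the schema above:
// count measurements per country and parameter.
const query = `
  SELECT country, parameter, count(*) AS measurements
  FROM fetches.fetches_realtime
  GROUP BY country, parameter
`;
console.log(query.includes('fetches.fetches_realtime')); // true
```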

Uploads & Generating S3 presigned URLs

Via an undocumented /upload endpoint, the API can generate presigned S3 PUT URLs, so that external clients can authenticate using tokens stored in the database and upload data to be ingested by openaq-fetch. A small utility file called encrypt.js generates encrypted tokens to be manually stored in the database; use it like UPLOADS_ENCRYPTION_KEY=foo node encrypt.js your_token_here.

Dockerfile

There is a Dockerfile included that packages the project as a Docker container, currently used mostly for deployment to AWS ECS. If someone wanted to make it better for local development, that'd be a great PR!

Contributing

There are a lot of ways to contribute to this project, more details can be found in the contributing guide.

Projects using the API

  • openaq-browser site | code - A simple browser to provide a graphical interface to the data.
  • openaq code - An isomorphic JavaScript wrapper for the API.
  • py-openaq code - A Python wrapper for the API.
  • ropenaq code - An R package for the API.

For more projects that use the OpenAQ API, check out the OpenAQ.org Community page.

openaq-api's People

Contributors

andrewharvey, danielfdsilva, dolugen, jflasher, kamicut, nickolasclarke, olafveerman, rocketd0g, russbiggs, sethvincent, sruti, vgeorge, webbkyr


openaq-api's Issues

Belgian sources

Measurements for a lot of Belgian measuring stations: http://www.ircel.be/nl/luchtkwaliteit/metingen
This page shows rolling averages for most parameters.

When you drill down, it's possible to get the actual hourly measurements and not the rolling averages. For example:

  1. go to this page
  2. click on table with detailed info per monitoring site

I have not found a programmatic way to access this data; it might need to be scraped.

Store organization / provider

For proper attribution, it could make sense to store the organization providing the data. This could either be set on source level, or set per measurement by the adapter.

In the case of the Dutch (#25) and the Chilean (#29) data, there is a difference between data provider and the maintainer of the station.

@jflasher @RocketD0g Any idea if and how you want to store this?

Automatically create csv of last day's data and put on S3

Paraphrasing Slack conversation:

Basically, I’d like some way to dump either/both database dumps and daily/weekly/monthly csv dumps to an S3 bucket and make them available for easy download. Want to make it easy for someone to grab all of our data at once, and that’s probably not through the API.

Japanese Sources

Sources for Yokohama:

http://cgi.city.yokohama.lg.jp/kankyou/saigai/data/taiki/all/all_0000_00_001.html
http://www.ihe.pref.miyagi.jp/telem/dayreportitem/?itemSelect=10&day=2015%E5%B9%B410%E6%9C%8804%E6%97%A5

Appears to be hourly but unsure. Joe is contacting Miyagi Prefecture regarding details, potentially existing API, station coordinates.

Tokyo:
Just oxides? Need help with translation:
http://www.ox.kankyo.metro.tokyo.jp/index.php?chiku=1
http://www.ox.kankyo.metro.tokyo.jp/

Main page: http://www.kankyo.metro.tokyo.jp/nature/index.html

Standardize data fields

Which data fields will the platform support and what will they be called?

Currently we have defined names for pm25 and pm10.

Handle dates in a better way

We need to do some thinking about how to best handle dates across the platform. Dates should be stored in UTC in the database, but we probably need to keep some track of timezones for location and whether it supports DST? ugh.

Nicer error message

Return a nicer error message when specifying a source that doesn't exist:

$ node fetch.js --dryrun --source=au.json
--- Dry run for Testing, nothing is saved to the database. ---
/home/olaf/projects/openaq-api/fetch.js:46
    var adapter = findAdapter(source.adapter);
                                    ^
TypeError: Cannot read property 'adapter' of undefined
    at /home/olaf/projects/openaq-api/fetch.js:46:37
    [...]

@jflasher
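A minimal guard along these lines (the helper name and lookup shape are hypothetical, not the actual fetch.js code) would replace the TypeError with a readable message:

```javascript
// Hypothetical fix: validate the source lookup before touching .adapter.
function requireSource(sources, name) {
  const source = sources.find((s) => s.name === name);
  if (!source) {
    throw new Error(`Source "${name}" not found. Check the --source argument.`);
  }
  return source;
}
```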

How to handle negative values?

With the inclusion of #61, we are going to be pulling some negative values into the platform. There may already be some, just noticed with latest data source. Some of the negative values are -0.25 and some are -999.

For right now, I think we just store these as is, but in the future do we throw out measurements with negative values? Do we keep them in the platform and leave it up to others to remove them?

Add timeouts to the requests in adapters

If we don't set timeouts on the requests, Heroku may time us out which feels worse. Makes me wonder if we should have a system-wide request object that gets passed around so we can set defaults in one place?

2015 GBD/WHO template for including data in their global databases

No Immediate Action Intended - Background Info

FYI, a useful template of the type of information collected for the upcoming 2015 WHO and GBD global databases of annual average PM2.5 and PM10 pollution is below (they are primarily searching for 2014 data currently). I can't find the issue, but I think @olafveerman brought up the categorization of sites before (e.g. what are the criteria for residential, urban, industrial, etc.?). It has been indicated that there are no strict criteria for this currently and countries are directed to fill out the template using their best judgement.

http://www.who.int/entity/phe/health_topics/outdoorair/databases/PHE-Template-OAP-database-entries-June2015.xls?ua=1

Documentation

I'm working on some project related documentation. Mostly a glossary of the project (source, station, measurement), the application's flow and some guidelines on how to contribute.

I can imagine this living in a couple of places:

  1. in the wiki of the openaq.github.io repo
    Advantage: easy to edit
  2. as a chapter on the Open AQ website
    Advantage: easy to read, Disadvantage: less easy to contribute
  3. as markdown files in the /docs folder of a repo
    Disadvantage: not easy to read, not easy to contribute

@RocketD0g @jflasher Any thoughts on how you want to set this up?

Turkey - Sources

Map of stations with coordinates and current readings and stations' current data are here (though not on unique urls):

http://www.havaizleme.gov.tr/Default.ltr.aspx

'TÜM İSTASYONLAR' = all stations

Clicking on the stations reveals site coordinates (click 'station description') and pollutant types measured.

(Sidenote: They use an AQI system with breakpoints same as the US EPA)

Comment from Knight Foundation Application - Lodoysamba

This is a constructive comment that was posted on our open Knight Foundation News Challenge (url at bottom).


This is an important work as scientists, researchers, and students should have access to air quality data via an internet portal. This is crucial for timely review and analysis at both the level of individual cities and regions, and at the international level. Superficial reports of atmospheric conditions solely from air quality stations are not adequate. Air quality depends not only on source emissions but also on weather conditions, population activity, and other factors. In some cases, initial data are not always available. In other cases, air quality offices refuse to share their data. Hence, while the environmental scientist aims to analyze and interpret the data, these problems of poor data quality and restricted access injure the scientist’s ability to generate quality measures to reduce air pollution. On the other hand, when comprehensive scientific data is available, policymakers and air quality officers are better able to orient their strategies to reduce air pollution.
An internet forum to warehouse data will help scientists from all nations to learn from one another. Such a forum would lead to improved methods to record data, analyze data, and utilize data more efficiently. By accessing a data warehouse, scientists in less developed countries could quickly learn how reduction measures affect air quality in other cities around the globe.
Furthermore, many students and researchers from non-environmental disciplines could also find value in the data. Mathematicians, for example, could use these types of data sets to improve tools of statistical data analysis.
The prototype http://openaq.org contains data for some analysis, but could use certain enhancements. For instance, the site ought to include information concerning the air quality measuring station type, along with the extent of validation or calibration of recording media to give researchers more confidence in the quality of the information.

prof.S.Lodoysamba, Mongolia

https://www.newschallenge.org/challenge/data/entries/openaq-the-first-open-air-quality-data-hub-for-the-world#c-b367e525a7e574817c19ad24b7b35607

Averaging period

As mentioned in #36, there are a couple of sources that report rolling averages. We seem to agree to store this with every measurement, but how to go about it?

We can either do a general purpose note field that can be used for anything:

{
  parameter: 'pm25',
  value: 4,
  note: '24 hour rolling average'
}

or we can attempt to standardize it in some way:

{
  parameter: 'pm25',
  value: 4,
  averagingPeriod: 24
}

@RocketD0g Do averaging periods tend to fall within a 4 - 24 hour range? Thoughts @jflasher ?
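For comparison, the Athena schema earlier on this page stores averagingPeriod as struct<unit:string,value:float>; the standardized variant could likewise carry an explicit unit (a sketch of one option, not a decided format):

```javascript
// Sketch: averagingPeriod with an explicit unit, matching the
// struct<unit:string,value:float> shape used in the Athena schema.
const measurement = {
  parameter: 'pm25',
  value: 4,
  averagingPeriod: { unit: 'hours', value: 24 }
};
console.log(measurement.averagingPeriod.unit);
```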

Better email handling

Too many emails are getting sent, figure out a more sane way to handle this. They are currently disabled via Heroku scheduler task until this is fixed.

Chile data sources

Sinca is the Chilean AQ information system. It contains measurements from 194 stations, including the one in Valdivias (see #28).

Have to check:

  • the license under which this is published
  • whether there is an API we can use

Comment from Knight Foundation Application - MH

Connecting individuals (including the media, politicians, citizen-scientists, and even other scientists) with scientific data is a constant challenge, and OpenAQ is an outstanding leap forward. By providing data in a programmatic method that ANYONE can utilize, you all are setting the standard for how this should be done. High school student writing about air pollution downwind from a power plant in your neighborhood? Click a button and get data. Reporter writing a story on how pollution in your city compares to another across the world? Click two buttons and download the data. Scientist wanting to do complex queries across multiple locations, adjusting for seasonality and time of day? Incorporate the API into your Python script.

One suggestion for long-term, future work. I agree with many of the other commenters that OpenAQ would make for a fantastic platform to extend to additional types of data. Meteorological, sea ice, and terrestrial flux data, for example, can be difficult data sets to access for both non-scientists and scientists alike. Often this data is squirreled away on a server in a proprietary format, is confusing to access, is not available by API, etc. Your platform could and should set the standard for how all types of scientific data is made easily and quickly accessible to the public.

Comment from Knight Foundation Application - Chisato

As a social scientist conducting research on air pollution management in Ulaanbaatar, Mongolia, I highly commend this Open AQ initiative. Air pollution is certainly the most pressing environmental health issue in this capital city. Since 2012, there have been increased efforts to improve air quality monitoring by installing more air quality monitors throughout Ulaanbaatar. However, more monitors do not necessarily catalyze better data sharing. Even if these technologies produce reliable, real-time air quality data, this data will not make an impact on the research community, development field, and most importantly, the public-at-large if there is no sustainable, user-friendly, robust system set in place for data sharing. After all, air pollution is an inherently social problem. We need to connect the data to people. I believe that Open AQ would provide this foundational connection. Open source sharing of air quality data would create and sustain a global commitment around air pollution issues -- connecting people across cities and regions to examine the various ways to tackle this environmental challenge. The Open AQ initiative also values the socio-cultural dimensions of the air pollution issue. Different cultures with different political-economic systems approach the air pollution problem, its management, and potential solutions in different ways. Open AQ will host its first workshop in Ulaanbaatar this November with the goal of bringing local experts, media, and community members together to develop, dispute, and deploy strategies that would best suit Ulaanbaatar. This demonstrates that the Open AQ initiative will engage with local communities as key members of developing this platform.

As a suggestion, (once the air quality data is calibrated and complete) I recommend including a section or layer on guidelines and policies from different countries. I think it would be beneficial to make information about different interventions (especially related to health) accessible. How is Delhi tackling the air pollution issue? Can Mexico City use the same model? Why are Chinese residents wearing masks but not residents in Jakarta? People across the globe can learn how different governments are tackling the air pollution problem and/or how local communities are using the data to hold different actors and institutions accountable for air pollution reduction. For example, a lot of air pollution protection/air pollution-induced illness information is not readily available or part of the public discourse. In order for people to take ownership over their health as inhabitants of polluted cities, I think that a guidelines "layer" would catalyze more urgency around this issue and strengthen efforts to improve air quality from within communities. I foresee a global "blog" on air pollution issues where people discuss, debate, and learn from each other on how to best tackle the air pollution problem in their own communities.

Validated / unvalidated

@RocketD0g What do you feel about adding a validated / un-validated flag? This might be valuable information, especially when we start adding validated sources.

@jflasher's fine with it. Just checked with him.

How to include alternate unit sources

This just came up when looking to include Chilean data #37. For some of the measurements, they're reporting data in ppb or ppm (it looks like we may also be able to get it in ug/m3, but for the sake of argument, forget about that). We have the unit field in the measurement record for exactly this scenario, but do we actually want to use it? If some of the sources are reporting in ug/m3 and some are reporting in ppb or another alternate unit, it would seem to severely lessen the ability to directly visualize the data next to each other.

Is there an easy way to convert between ppb and ug/m3 or should we even do that?

cc/ @RocketD0g @olafveerman
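For reference, the standard approximation (at 25 °C and 1 atm) converts ppb to µg/m³ by multiplying by the gas's molecular weight and dividing by the molar volume of 24.45 L/mol; whether the platform should apply this automatically is a separate question. A sketch:

```javascript
// ug/m3 = ppb * molecular weight / 24.45 (at 25 C, 1 atm).
function ppbToUgm3(ppb, molecularWeight) {
  return (ppb * molecularWeight) / 24.45;
}

console.log(ppbToUgm3(100, 48).toFixed(1)); // 100 ppb of O3 (MW ~ 48 g/mol)
```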

Rename Heroku app

Or else I will forget what it's doing in 2 months and try to delete it.

Comment from Knight Foundation Application - Langley

As a research scientist working on air pollution issues, more accessible AQ data in different regions of the world is highly critical to understanding sources, transport, and transformation of air pollutants in the atmosphere. A major global health issue, atmospheric particulate formation and transport is still not fully understood by the scientific community, and increasing the geospatial resolution of available AQ measurements for modellers and researchers could really raise our understanding of these issues. Additionally, this platform could help inform the public about local air quality issues and provide needed data for medical workers and journalists.
During my time working in a developing country, I have found it frustrating to try to scour scientific papers for names of scientists that may or may not know where AQ data is kept (when preparing proposals, briefings, and other official reports).
I like the suggestion from Chisato about including a section on guidelines and policies in different countries. This will allow direct impact of regulations to be observed. I am currently working in a developing country attempting to regulate AQ, and it is difficult for the officials to decide what type of AQ monitoring equipment to purchase and which regulations to push initially. Knowing what countries with similar air pollution sources and available resources have done in the past, and how this worked, would really be an asset for developing countries beginning to address AQ issues.
On a more scientific note, if possible and if available, including the meteorological data often captured by AQ monitoring stations (wind direction and wind speed) and a general description of measurement locations would help scientists best utilize these data in models.
