sensors's Issues
Add New Relic monitoring
Percentile stats would be great (95th and/or 99th percentile for service times), as would visibility into individual request types. Right now we only have coarse POST vs. GET stats via log-deranged + Librato, and only mean/max.
We should test first on a dev service with simulated high-rate POSTs. The agent warnings we've seen elsewhere make me nervous. We should look for warnings that the agent's connection has been cut off, and we should monitor the memory usage.
Add administrative routes for inspecting sources and entries
As administrators of the service, we want to see the sources and their activity. We may also want to see the latest overall activity for the system.
Create a notion of sets of sources
A Set has one or more Sources as well as some metadata.
Creating/managing Sets requires some notion of users/permissions, though, which we don't currently have. Alternatively, this can be admin-only functionality until we create a user system.
Requesting an aggregation for a field that doesn't exist gives a 500
Ideally it would give a helpful error message.
The same query with a valid field works fine.
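A minimal sketch of a fix, assuming we keep a whitelist of the numeric fields we store per entry (the field names below are taken from the sample entries elsewhere in this tracker; the function name is hypothetical): reject unknown fields up front with a 400 and a helpful message instead of letting the query fail into a 500.

```javascript
// Hypothetical validation step run before building the aggregation query.
// VALID_FIELDS mirrors the numeric fields we actually store per entry.
const VALID_FIELDS = ['airquality_raw', 'dust', 'humidity', 'light', 'sound', 'temperature', 'uv'];

function checkAggregationField(field) {
  if (!VALID_FIELDS.includes(field)) {
    return {
      status: 400,
      body: { error: 'Unknown field "' + field + '". Valid fields: ' + VALID_FIELDS.join(', ') }
    };
  }
  return null; // field is known; proceed with the aggregation query
}
```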
Support time-bounded queries
We can query pages of data, but we should also support querying based on time boundaries. We should probably enforce a max number of entries, unless we can cleanly stream the data end-to-end and avoid any memory issues.
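A sketch of what a time-bounded query with an enforced cap could look like, assuming a Postgres entries table with source and ts columns (table and column names are assumptions, as is the 1000-row cap):

```javascript
// Hypothetical: parameterized, time-bounded entries query with a hard cap
// on rows so a huge time window can't exhaust memory.
const MAX_ENTRIES = 1000;

function timeBoundedQuery(source, from, until, count) {
  const limit = Math.min(count || MAX_ENTRIES, MAX_ENTRIES);
  return {
    text: 'SELECT data, ts FROM entries WHERE source = $1 AND ts >= $2 AND ts < $3 ORDER BY ts LIMIT $4',
    values: [source, from, until, limit]
  };
}
```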
Stale cache issues when querying the API for large data sets
Hi,
I'm seeing stale cache issues across the API this morning, and I'm apparently not the only one with this problem.
A cursory look at the source code doesn't provide any good explanation to this issue as I'm not seeing any HTTP cache headers set at the application level.
cURL'ing the API, however, indicates the presence of an Etag header, possibly added by Heroku middleware(?).
More importantly, this header doesn't seem to change with the body content. For example, querying the same URL at a two-minute interval gives the following headers:
$ curl -I "https://localdata-sensors.herokuapp.com/api/v1/sources/ci4lr75sf000602ypyfkxnua3/entries?startIndex=0&count=100000000000"
HTTP/1.1 200 OK
Server: Cowboy
Connection: keep-alive
X-Powered-By: Express
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Content-Type
Content-Type: application/json; charset=utf-8
Content-Length: 245773
Etag: W/"EcvTQgeOsE1g83G22g9JuQ=="
Vary: Accept-Encoding
Date: Tue, 20 Jan 2015 13:52:59 GMT
Via: 1.1 vegur
$ curl -I "https://localdata-sensors.herokuapp.com/api/v1/sources/ci4lr75sf000602ypyfkxnua3/entries?startIndex=0&count=100000000000"
HTTP/1.1 200 OK
Server: Cowboy
Connection: keep-alive
X-Powered-By: Express
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Content-Type
Content-Type: application/json; charset=utf-8
Content-Length: 245773
Etag: W/"EcvTQgeOsE1g83G22g9JuQ=="
Vary: Accept-Encoding
Date: Tue, 20 Jan 2015 13:54:20 GMT
Via: 1.1 vegur
You'll notice that while the Date varies, neither the Etag header nor the Content-Length header has changed; yet multiple sensor data points have been added to the database in the meantime:
$ curl "https://localdata-sensors.herokuapp.com/api/v1/sources/ci4lr75sf000602ypyfkxnua3/entries?startIndex=7576&count=8" | python -mjson.tool
[
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 1033.2,
"humidity": 55.6,
"light": 272,
"location": [
6.035054,
46.1558904
],
"sound": 1348,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:07.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 903.93,
"humidity": 55.4,
"light": 301,
"location": [
6.035054,
46.1558904
],
"sound": 1276,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:17.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 903.93,
"humidity": 55.4,
"light": 283,
"location": [
6.035054,
46.1558904
],
"sound": 1324,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:27.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 903.93,
"humidity": 55.4,
"light": 283,
"location": [
6.035054,
46.1558904
],
"sound": 1336,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:37.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 1349.25,
"humidity": 55.3,
"light": 283,
"location": [
6.035054,
46.1558904
],
"sound": 1312,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:47.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 1349.25,
"humidity": 55.3,
"light": 265,
"location": [
6.035054,
46.1558904
],
"sound": 1324,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:57.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 1349.25,
"humidity": 55.2,
"light": 265,
"location": [
6.035054,
46.1558904
],
"sound": 1348,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:54:07.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 996.08,
"humidity": 55.2,
"light": 265,
"location": [
6.035054,
46.1558904
],
"sound": 1320,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:54:17.000Z"
}
]
Note that this only seems to affect large data sets, which also seem to have incorrect Content-Length headers.
Add an aggregation endpoint
We're going to want some server-side aggregation of stats (daily, weekly, monthly to start?). We'll want it by source, and probably additionally grouped by location. If responses are coming in every 10 seconds, that's way more than we want to send over the wire for clients to process.
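A sketch of a daily roll-up query, under the assumption that entries live in a Postgres table with a ts timestamp and a data JSON column holding the numeric fields (the schema and function name are assumptions):

```javascript
// Hypothetical daily roll-up per source. Assumes a Postgres entries table
// with a ts timestamp and a data JSON column holding numeric fields.
function dailyAggregationQuery(source, field) {
  return {
    text: "SELECT date_trunc('day', ts) AS day, " +
          'avg((data->>$2)::numeric) AS mean, ' +
          'min((data->>$2)::numeric) AS min, ' +
          'max((data->>$2)::numeric) AS max ' +
          'FROM entries WHERE source = $1 GROUP BY 1 ORDER BY 1',
    values: [source, field]
  };
}
```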
Out of range query params should trigger a 4XX response
Currently, the count query param is capped at 1000. This is opaque to the user of the API, who gets the same data set whether the count query param is set to 1000 or 100000000.
Instead of silently capping the value, I suggest returning a 4XX response with an appropriate error message when the count query param is bigger than 1000.
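A minimal sketch of the suggested validation (the 422 status, cap value, and function name are illustrative choices, not the project's current behavior):

```javascript
// Hypothetical: validate count and reject out-of-range values instead of
// silently capping them.
const MAX_COUNT = 1000;

function validateCount(raw) {
  const count = parseInt(raw, 10);
  if (!Number.isFinite(count) || count < 1 || count > MAX_COUNT) {
    return { status: 422, error: 'count must be an integer between 1 and ' + MAX_COUNT };
  }
  return { status: 200, count: count };
}
```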
Switch to an ORM
Raw queries were great for quick prototyping, but now we should introduce a layer of abstraction. Sequelize seems like a good option, and I think it uses the same low-level driver we currently use.
We should have tests in place (#1) before we do this.
Get details for a source
A GET to http://localdata-sensors.herokuapp.com/api/v1/sources/ci4omwt7k0003nm0u92mt03al should return details for that source (like name and latlng).
Enhance logging
We should stop logging the no-api-version warning, since we have a large deployed population of devices that fail to reference the version.
We aren't adding much info, if any, to the Heroku router's logs for entry POSTs. We can consider logging some info for source POSTs or GETs, in case we want to see user agent strings, for example.
The Express logs reference the router's internal IP address rather than the IP address of the original client.
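For the client-IP point, a sketch of the usual approach behind the Heroku router (the helper name is hypothetical): the original client arrives as the left-most entry in X-Forwarded-For.

```javascript
// Hypothetical helper: behind the Heroku router the original client IP
// arrives in X-Forwarded-For; the left-most entry is the client, later
// entries are intermediate proxies.
function clientIp(headers, remoteAddress) {
  const xff = headers['x-forwarded-for'];
  if (xff) {
    return xff.split(',')[0].trim();
  }
  return remoteAddress; // no proxy header; direct connection
}
```

In Express specifically, setting app.set('trust proxy', true) makes req.ip resolve from X-Forwarded-For the same way, which the logger can then use.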
Provide timezone info for each data point
Displaying the data with meaningful time indications means figuring out the timezone, and any DST offset, of each device.
This is doable on the fly, but requires relying on a third party to get the information for every timestamp.
Could this be provided by the API itself?
As the time offset might change with DST, this info would need to be provided for each data point.
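One way the API could carry this, sketched below: store a UTC offset in minutes with each entry, captured at measurement time so DST is already baked in (the utcOffset field and function names are hypothetical, not part of the current schema).

```javascript
// Hypothetical: render an entry's UTC timestamp in device-local time using
// a per-entry utcOffset (minutes), avoiding a third-party lookup per point.
function toLocalIso(isoUtc, offsetMinutes) {
  const shifted = new Date(new Date(isoUtc).getTime() + offsetMinutes * 60000);
  return shifted.toISOString().replace('Z', formatOffset(offsetMinutes));
}

// Format an offset in minutes as +HH:MM / -HH:MM.
function formatOffset(minutes) {
  const sign = minutes < 0 ? '-' : '+';
  const abs = Math.abs(minutes);
  return sign +
    String(Math.floor(abs / 60)).padStart(2, '0') + ':' +
    String(abs % 60).padStart(2, '0');
}
```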
Endpoint for finding sources
We will want to find all sources by location, or simply all sources. A GET to /api/v1/sources should return something interesting that helps!
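A sketch of the by-location case as a bounding-box lookup (the sources table layout with lon/lat columns is an assumption, as is the function name):

```javascript
// Hypothetical bounding-box lookup. Assumes sources rows store lon/lat
// columns; with no bbox we simply return all sources.
function sourcesQuery(bbox) {
  if (!bbox) {
    return { text: 'SELECT id, name, lon, lat FROM sources', values: [] };
  }
  return {
    text: 'SELECT id, name, lon, lat FROM sources ' +
          'WHERE lon BETWEEN $1 AND $3 AND lat BETWEEN $2 AND $4',
    values: [bbox.west, bbox.south, bbox.east, bbox.north]
  };
}
```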
Create a test suite
Switch to knex.js
Knex.js has a streaming interface that uses a cursor (via pg-query-stream). Since we're accessing chunks of fairly raw time-series data, rather than objects with more structured relationships, we don't take advantage of Sequelize's higher-level features. Initial investigations suggest that knex might just be nicer to work with, too.
The streaming interface indirectly necessitates the use of the pure-JavaScript pg client (vs. the native bindings), so we should look at the performance before and after.
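The payoff of a cursor-backed stream is that we can serialize rows out one at a time. A sketch of the serialization half, fed by whatever row cursor knex's .stream() / pg-query-stream would hand us (the generator name is hypothetical):

```javascript
// Hypothetical: serialize a row cursor into JSON chunks one row at a time,
// so a large result set never has to sit fully in memory before the
// response starts going out.
function* toJsonChunks(rows) {
  yield '[';
  let first = true;
  for (const row of rows) {
    yield (first ? '' : ',') + JSON.stringify(row);
    first = false;
  }
  yield ']';
}
```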