sensors's Issues
Add New Relic monitoring
Percentile stats would be great (95th and/or 99th percentile for service times), as would visibility into individual request types. Right now we only have coarse POST vs. GET stats via log-deranged + Librato, and only mean/max.
We should test first on a dev service with simulated high-rate POSTs. The agent warnings we've seen elsewhere make me nervous. We should look for warnings that the agent's connection has been cut off, and we should monitor the memory usage.
Add administrative routes for inspecting sources and entries
As administrators of the service, we want to see the sources and their activity. We may also want to see the latest overall activity for the system.
Create a notion of sets of sources
A Set has one or more Sources as well as some metadata.
Creating/managing Sets requires some notion of users/permissions, though, which we don't currently have. Alternatively, this can be admin-only functionality until we create a user system.
Requesting an aggregation for a field that doesn't exist gives a 500
Ideally it would give a helpful error message.
The same query with a valid field works fine.
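A minimal sketch of a fix, assuming we keep a whitelist of the numeric fields we store per entry (the field names below are taken from the sample entries elsewhere in this tracker; the function name is hypothetical): reject unknown fields up front with a 400 and a helpful message instead of letting the query fail into a 500.

```javascript
// Hypothetical validation step run before building the aggregation query.
// VALID_FIELDS mirrors the numeric fields we actually store per entry.
const VALID_FIELDS = ['airquality_raw', 'dust', 'humidity', 'light', 'sound', 'temperature', 'uv'];

function checkAggregationField(field) {
  if (!VALID_FIELDS.includes(field)) {
    return {
      status: 400,
      body: { error: 'Unknown field "' + field + '". Valid fields: ' + VALID_FIELDS.join(', ') }
    };
  }
  return null; // field is known; proceed with the aggregation query
}
```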
Support time-bounded queries
We can query pages of data, but we should also support querying based on time boundaries. We should probably enforce a max number of entries, unless we can cleanly stream the data end-to-end and avoid any memory issues.
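A sketch of what a time-bounded query with an enforced cap could look like, assuming a Postgres entries table with source and ts columns (table and column names are assumptions, as is the 1000-row cap):

```javascript
// Hypothetical: parameterized, time-bounded entries query with a hard cap
// on rows so a huge time window can't exhaust memory.
const MAX_ENTRIES = 1000;

function timeBoundedQuery(source, from, until, count) {
  const limit = Math.min(count || MAX_ENTRIES, MAX_ENTRIES);
  return {
    text: 'SELECT data, ts FROM entries WHERE source = $1 AND ts >= $2 AND ts < $3 ORDER BY ts LIMIT $4',
    values: [source, from, until, limit]
  };
}
```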
Stale cache issues when querying the API for large data sets
Hi,
I'm seeing stale cache issues across the API this morning, and I'm apparently not the only one with this problem.
A cursory look at the source code doesn't provide any good explanation to this issue as I'm not seeing any HTTP cache headers set at the application level.
cURL'ing the API, however, indicates the presence of an Etag header, possibly added by Heroku middleware(?).
More importantly, this header doesn't seem to change with the body content. For example, querying the same URL at a two-minute interval gives the following headers:
$ curl -I "https://localdata-sensors.herokuapp.com/api/v1/sources/ci4lr75sf000602ypyfkxnua3/entries?startIndex=0&count=100000000000"
HTTP/1.1 200 OK
Server: Cowboy
Connection: keep-alive
X-Powered-By: Express
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Content-Type
Content-Type: application/json; charset=utf-8
Content-Length: 245773
Etag: W/"EcvTQgeOsE1g83G22g9JuQ=="
Vary: Accept-Encoding
Date: Tue, 20 Jan 2015 13:52:59 GMT
Via: 1.1 vegur
$ curl -I "https://localdata-sensors.herokuapp.com/api/v1/sources/ci4lr75sf000602ypyfkxnua3/entries?startIndex=0&count=100000000000"
HTTP/1.1 200 OK
Server: Cowboy
Connection: keep-alive
X-Powered-By: Express
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Content-Type
Content-Type: application/json; charset=utf-8
Content-Length: 245773
Etag: W/"EcvTQgeOsE1g83G22g9JuQ=="
Vary: Accept-Encoding
Date: Tue, 20 Jan 2015 13:54:20 GMT
Via: 1.1 vegur
You'll notice that while the Date varies, neither the Etag header nor the Content-Length header has changed; yet multiple sensor data points have been added to the database in the meantime:
$ curl "https://localdata-sensors.herokuapp.com/api/v1/sources/ci4lr75sf000602ypyfkxnua3/entries?startIndex=7576&count=8" | python -mjson.tool
[
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 1033.2,
"humidity": 55.6,
"light": 272,
"location": [
6.035054,
46.1558904
],
"sound": 1348,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:07.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 903.93,
"humidity": 55.4,
"light": 301,
"location": [
6.035054,
46.1558904
],
"sound": 1276,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:17.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 903.93,
"humidity": 55.4,
"light": 283,
"location": [
6.035054,
46.1558904
],
"sound": 1324,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:27.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 903.93,
"humidity": 55.4,
"light": 283,
"location": [
6.035054,
46.1558904
],
"sound": 1336,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:37.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 1349.25,
"humidity": 55.3,
"light": 283,
"location": [
6.035054,
46.1558904
],
"sound": 1312,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:47.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 1349.25,
"humidity": 55.3,
"light": 265,
"location": [
6.035054,
46.1558904
],
"sound": 1324,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:53:57.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 1349.25,
"humidity": 55.2,
"light": 265,
"location": [
6.035054,
46.1558904
],
"sound": 1348,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:54:07.000Z"
},
{
"data": {
"airquality": "Fresh",
"airquality_raw": 17,
"dust": 996.08,
"humidity": 55.2,
"light": 265,
"location": [
6.035054,
46.1558904
],
"sound": 1320,
"temperature": 8,
"uv": 267.74
},
"source": "ci4lr75sf000602ypyfkxnua3",
"timestamp": "2015-01-20T13:54:17.000Z"
}
]
Note that this only seems to affect large data sets, which also seem to have incorrect Content-Length headers.
Add an aggregation endpoint
We're going to want some server-side aggregation of stats (daily, weekly, monthly to start?). We'll want it by source, and probably additionally grouped by location. If responses are coming in every 10 seconds, that's way more than we want to send over the wire for clients to process.
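A sketch of a daily roll-up query, under the assumption that entries live in a Postgres table with a ts timestamp and a data JSON column holding the numeric fields (the schema and function name are assumptions):

```javascript
// Hypothetical daily roll-up per source. Assumes a Postgres entries table
// with a ts timestamp and a data JSON column holding numeric fields.
function dailyAggregationQuery(source, field) {
  return {
    text: "SELECT date_trunc('day', ts) AS day, " +
          'avg((data->>$2)::numeric) AS mean, ' +
          'min((data->>$2)::numeric) AS min, ' +
          'max((data->>$2)::numeric) AS max ' +
          'FROM entries WHERE source = $1 GROUP BY 1 ORDER BY 1',
    values: [source, field]
  };
}
```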
Out of range query params should trigger a 4XX response
Currently, the count query param is capped at 1000. This is opaque to the user of the API, who gets the same data set whether the count query param is set to 1000 or 100000000.
Instead of silently capping the value, I suggest returning a 4XX response with an appropriate error message when the count query param is bigger than 1000.
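A minimal sketch of the suggested validation (the 422 status, cap value, and function name are illustrative choices, not the project's current behavior):

```javascript
// Hypothetical: validate count and reject out-of-range values instead of
// silently capping them.
const MAX_COUNT = 1000;

function validateCount(raw) {
  const count = parseInt(raw, 10);
  if (!Number.isFinite(count) || count < 1 || count > MAX_COUNT) {
    return { status: 422, error: 'count must be an integer between 1 and ' + MAX_COUNT };
  }
  return { status: 200, count: count };
}
```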
Switch to an ORM
Raw queries were great for quick prototyping, but now we should introduce a layer of abstraction. Sequelize seems like a good option, and I think it uses the same low-level driver we currently use.
We should have tests in place (#1) before we do this.
Get details for a source
A GET to http://localdata-sensors.herokuapp.com/api/v1/sources/ci4omwt7k0003nm0u92mt03al should return details for that source (like name and latlng).
Enhance logging
We should stop logging the no-api-version warning, since we have a large deployed population of devices that fail to reference the version.
We aren't adding much info, if any, to the Heroku router's logs for entry POSTs. We can consider logging some info for source POSTs or GETs, in case we want to see user agent strings, for example.
The Express logs reference the router's internal IP address rather than the IP address of the original client.
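For the client-IP point, a sketch of the usual approach behind the Heroku router (the helper name is hypothetical): the original client arrives as the left-most entry in X-Forwarded-For.

```javascript
// Hypothetical helper: behind the Heroku router the original client IP
// arrives in X-Forwarded-For; the left-most entry is the client, later
// entries are intermediate proxies.
function clientIp(headers, remoteAddress) {
  const xff = headers['x-forwarded-for'];
  if (xff) {
    return xff.split(',')[0].trim();
  }
  return remoteAddress; // no proxy header; direct connection
}
```

In Express specifically, setting app.set('trust proxy', true) makes req.ip resolve from X-Forwarded-For the same way, which the logger can then use.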
Provide timezone info for each data point
Displaying the data with meaningful time indications means figuring out the timezone, and any DST offset, of each device.
This is doable on the fly, but requires relying on a third party to get the information for every timestamp.
Could this be provided by the API itself?
As the time offset might change with DST, this info would need to be provided for each data point.
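One way the API could carry this, sketched below: store a UTC offset in minutes with each entry, captured at measurement time so DST is already baked in (the utcOffset field and function names are hypothetical, not part of the current schema).

```javascript
// Hypothetical: render an entry's UTC timestamp in device-local time using
// a per-entry utcOffset (minutes), avoiding a third-party lookup per point.
function toLocalIso(isoUtc, offsetMinutes) {
  const shifted = new Date(new Date(isoUtc).getTime() + offsetMinutes * 60000);
  return shifted.toISOString().replace('Z', formatOffset(offsetMinutes));
}

// Format an offset in minutes as +HH:MM / -HH:MM.
function formatOffset(minutes) {
  const sign = minutes < 0 ? '-' : '+';
  const abs = Math.abs(minutes);
  return sign +
    String(Math.floor(abs / 60)).padStart(2, '0') + ':' +
    String(abs % 60).padStart(2, '0');
}
```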
Endpoint for finding sources
We will want to find all sources by location, or simply all sources. A GET to /api/v1/sources should return something interesting that helps!
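A sketch of the by-location case as a bounding-box lookup (the sources table layout with lon/lat columns is an assumption, as is the function name):

```javascript
// Hypothetical bounding-box lookup. Assumes sources rows store lon/lat
// columns; with no bbox we simply return all sources.
function sourcesQuery(bbox) {
  if (!bbox) {
    return { text: 'SELECT id, name, lon, lat FROM sources', values: [] };
  }
  return {
    text: 'SELECT id, name, lon, lat FROM sources ' +
          'WHERE lon BETWEEN $1 AND $3 AND lat BETWEEN $2 AND $4',
    values: [bbox.west, bbox.south, bbox.east, bbox.north]
  };
}
```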
Create a test suite
Switch to knex.js
Knex.js has a streaming interface that uses a cursor (via pg-query-stream). Since we're accessing chunks of fairly raw time-series data, rather than objects with more structured relationships, we don't take advantage of Sequelize's higher-level features. Initial investigations suggest that knex might just be nicer to work with, too.
The streaming interface indirectly necessitates the use of the pure-JavaScript pg client (vs. the native bindings), so we should look at the performance before and after.
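The payoff of a cursor-backed stream is that we can serialize rows out one at a time. A sketch of the serialization half, fed by whatever row cursor knex's .stream() / pg-query-stream would hand us (the generator name is hypothetical):

```javascript
// Hypothetical: serialize a row cursor into JSON chunks one row at a time,
// so a large result set never has to sit fully in memory before the
// response starts going out.
function* toJsonChunks(rows) {
  yield '[';
  let first = true;
  for (const row of rows) {
    yield (first ? '' : ',') + JSON.stringify(row);
    first = false;
  }
  yield ']';
}
```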