api-traffic-processors's Issues
track user-agent or other headers
Update log parsing for xonacatl
Xonacatl is making subrequests back into fastly. We should not double count these requests. Xonacatl is passing a request header that can be used to distinguish these requests.
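One way to avoid the double count is to drop any log record that carries the marker header before counting. The sketch below is only illustrative: the header name `x-xonacatl-subrequest` and the `record.headers` shape are assumptions, since the issue does not name the actual header or the parsed record layout.

```javascript
// Hypothetical header name; the real header Xonacatl sends is not given in this issue.
const SUBREQUEST_HEADER = 'x-xonacatl-subrequest';

// Returns true when a parsed log record looks like a Xonacatl subrequest.
// Assumes `record.headers` is a plain object keyed by lower-cased header names.
function isSubrequest(record) {
  const headers = record.headers || {};
  return Object.prototype.hasOwnProperty.call(headers, SUBREQUEST_HEADER);
}

// Keep only the records that should be counted as traffic.
function countableRecords(records) {
  return records.filter((record) => !isSubrequest(record));
}
```

Filtering at parse time keeps the subrequests out of every downstream table, at the cost of losing them entirely; flagging them in a column instead would be the alternative if they need to stay queryable.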
pelias bouncer integration confusing cache-hit logic
looks like things that hit apiaxle-bouncer may be called cache-hits, which confuses the logic for determining if things are duplicates or not.
analytics=# select * from api_hits where key = 'mapzen-Lp2EVzY' and ts >= '2017-08-11' and status != '200';
ts | api | key | status | origin | duplicate
-------------------------+---------------+----------------+------------------+---------+-----------
2017-08-11 03:24:18.471 | pelias-search | mapzen-Lp2EVzY | QpsExceededError | apiaxle | f
2017-08-11 03:24:18 | pelias-search | mapzen-Lp2EVzY | 429 | fastly | t
2017-08-11 08:31:34 | pelias-search | mapzen-Lp2EVzY | 429 | fastly | t
2017-08-11 08:05:32 | pelias-search | mapzen-Lp2EVzY | 400 | fastly | t
2017-08-11 01:55:21.875 | pelias-search | mapzen-Lp2EVzY | QpsExceededError | apiaxle | f
2017-08-11 01:55:22 | pelias-search | mapzen-Lp2EVzY | 429 | fastly | t
2017-08-11 05:20:42 | pelias-search | mapzen-Lp2EVzY | 502 | fastly | t
2017-08-11 06:08:49 | pelias-search | mapzen-Lp2EVzY | 400 | fastly | t
2017-08-11 03:24:18.497 | pelias-search | mapzen-Lp2EVzY | QpsExceededError | apiaxle | f
2017-08-11 03:16:40.936 | pelias-search | mapzen-Lp2EVzY | QpsExceededError | apiaxle | f
2017-08-11 03:16:41 | pelias-search | mapzen-Lp2EVzY | 429 | fastly | t
2017-08-11 03:23:58 | pelias-search | mapzen-Lp2EVzY | 400 | fastly | t
2017-08-11 03:24:18 | pelias-search | mapzen-Lp2EVzY | 429 | fastly | t
2017-08-11 08:31:33.785 | pelias-search | mapzen-Lp2EVzY | QpsExceededError | apiaxle | f
2017-08-11 09:02:08 | pelias-search | mapzen-Lp2EVzY | 400 | fastly | t
(15 rows)
analytics=# select * from api_hits_minute where key = 'mapzen-Lp2EVzY' and ts >= '2017-08-11' and status != '200';
ts | api | key | status | source | hits
---------------------+---------------+----------------+------------------+---------+------
2017-08-11 01:55:00 | pelias-search | mapzen-Lp2EVzY | QpsExceededError | apiaxle | 1
2017-08-11 03:16:00 | pelias-search | mapzen-Lp2EVzY | QpsExceededError | apiaxle | 1
2017-08-11 03:24:00 | pelias-search | mapzen-Lp2EVzY | QpsExceededError | apiaxle | 2
2017-08-11 08:31:00 | pelias-search | mapzen-Lp2EVzY | QpsExceededError | apiaxle | 1
(4 rows)
Logs from Routing, Matrix, and Elevation
@migurski what do you think about adding path and query columns to api_hits? that way there'd be basic regex searchability into all services, and there's nothing that needs updating if people's url schemes change.
if we decide we need easier querying we can always add in custom service tables on a case by case basis.
cc: @kdiluca
add batching for apiaxle to kinesis
Update log parsing for tapalcatl
With tapalcatl getting deployed shortly, we should look into ensuring that the logs will get parsed correctly. Fastly will be adding a new header for requests served by tapalcatl.
Investigate null sizes for tile_traffic_v4
The size is not getting parsed correctly for tile_traffic_v4, and we are seeing null values in the column.
map pelias-search to search
add apiaxle status column to api_hits
currently apiaxle has its own version of various status codes, for example 429 will show up as QpdExceededError or QpsExceededError etc.
it would be better to have all 429s show up as status 429, with an optional apiaxle status column that shows the specific apiaxle version.
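A rough sketch of that normalization, assuming a lookup table from apiaxle error names to HTTP status codes. Only the two 429 cases named above are included; the function name and the `apiaxle_status` field name are hypothetical, not part of the current schema.

```javascript
// Illustrative mapping; only the 429 cases are mentioned in this issue.
const APIAXLE_TO_HTTP = {
  QpsExceededError: '429',
  QpdExceededError: '429',
};

// Returns the normalized status plus the original apiaxle name (or null
// when the raw status was not an apiaxle-specific error name).
function normalizeStatus(rawStatus) {
  const status = String(rawStatus);
  if (Object.prototype.hasOwnProperty.call(APIAXLE_TO_HTTP, status)) {
    return { status: APIAXLE_TO_HTTP[status], apiaxle_status: status };
  }
  return { status, apiaxle_status: null };
}
```

Unknown apiaxle names pass through unchanged here, so new error types would still show up in the status column until they are added to the table.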
firehose errors not rendering fully
look like this:
2016-10-06_10:47:30.40877 firehose error 1: null[object Object]
2016-10-06_11:09:12.78511 firehose error 1: null[object Object]
2016-10-06_11:30:55.58640 firehose error 1: null[object Object]
2016-10-06_11:58:20.12794 firehose error 1: null[object Object]
2016-10-06_12:25:54.48580 firehose error 1: null[object Object]
https://github.com/mapzen/api-traffic-processors/blob/master/exporters/kinesisExporter.js#L34
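Output like `null[object Object]` is what JavaScript produces when a null value and a plain object are concatenated into a string, so the log line is probably built that way (an assumption; the linked line is not reproduced here). A minimal sketch of a formatter that renders both values readably, with `formatFirehoseError` and `describe` as hypothetical helper names:

```javascript
const util = require('util');

// Render an arbitrary value for a log line: stack traces for Errors,
// util.inspect output for everything else (including null and plain objects).
function describe(value) {
  if (value instanceof Error) return value.stack || value.message;
  return util.inspect(value, { depth: 4, breakLength: Infinity });
}

// Build a single-line message instead of relying on implicit string coercion.
function formatFirehoseError(label, err, data) {
  return `${label}: err=${describe(err)} data=${describe(data)}`;
}
```

With this in place, a call such as `console.error(formatFirehoseError('firehose error 1', err, data))` would print the contents of the response object rather than `[object Object]`.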
bad paths cause vector-tiles parser to enter nulls in api and key columns
Update vector service log parsing
With new tile urls landing shortly, we'll need to update the log parsing accordingly.
The high level changes are that we need to grow the schema to track:
- username used in the url
- service, ie basemap tile request vs terrain tile
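As a sketch of what the extended parsing might look like, the following pulls the username and service out of a tile path. The path layout `/<username>/<service>/<version>/<layers>/<z>/<x>/<y>.<format>` is an assumption inferred from URLs such as `/mapzen/vector/v1/all/16/235/123.json`; terrain URLs may differ, and `parseTilePath` is a hypothetical name.

```javascript
// Assumed layout: /<username>/<service>/<version>/<layers>/<z>/<x>/<y>.<format>
const TILE_PATH = /^\/([^/]+)\/([^/]+)\/(v\d+)\/([^/]+)\/(\d+)\/(\d+)\/(\d+)\.(\w+)$/;

// Returns the parsed fields, or null when the path does not match the
// assumed layout, so callers can skip or flag the row explicitly.
function parseTilePath(pathname) {
  const m = TILE_PATH.exec(pathname);
  if (!m) return null;
  return {
    username: m[1],
    service: m[2],
    version: m[3],
    layers: m[4],
    z: Number(m[5]),
    x: Number(m[6]),
    y: Number(m[7]),
    format: m[8],
  };
}
```

Returning null for the whole record on a bad path, rather than filling individual columns with nulls, makes unparseable requests easy to route to a separate bucket.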
keyless ips getting parsed as keys
this is ok, but currently it's only happening on the apiaxle side. maybe we should log ips for keyless usage from fastly as well.
add sortkeys and compression to redshift tables
api_hits, vector_traffic_v2, pelias_traffic_v2 could use sortkeys and column compression
add process for populating vector_traffic and pelias_traffic tables
standardize handling of multiple apikeys
fastly, bouncer, and traffic-processors all seem to do slightly different things when handling requests like this:
curl 'https://tile.mapzen.com/mapzen/vector/v1/all/16/235/123.json?api_key=badkey&api_key=mapzen-sHDNySA'
this messes up our stats and can potentially be abused for infinite free requests etc.
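One possible shared rule, sketched below: read every `api_key` value from the query string, pick one deterministically, and flag the request when the values disagree. The "first key wins" policy and the function names are assumptions for illustration; the issue does not say which behavior fastly, bouncer, or traffic-processors should settle on.

```javascript
// Collect every api_key value in the order it appears in the query string.
function extractApiKeys(requestUrl) {
  const url = new URL(requestUrl);
  return url.searchParams.getAll('api_key');
}

// Assumed policy: the first key is the one that gets billed, and requests
// carrying conflicting keys are marked so they can be rejected or audited.
function resolveApiKey(requestUrl) {
  const keys = extractApiKeys(requestUrl);
  if (keys.length === 0) return { key: null, ambiguous: false };
  const distinct = new Set(keys);
  return { key: keys[0], ambiguous: distinct.size > 1 };
}
```

Whatever rule is chosen, the important part is that all three components apply the same one, so the key that gets rate limited is also the key that gets counted.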
retry on failed firehose put
it wasn't clear initially how or if firehose puts were going to fail.
if they fail like this then we should be returning an error to trigger a retry.
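A minimal sketch of surfacing those failures, assuming the put is a Firehose `PutRecordBatch`-style call whose response reports partial failures in `FailedPutCount`. The `putRecordBatch` argument is injected and `putWithErrorPropagation` is a hypothetical name, so nothing here reflects the exporter's current code.

```javascript
// Wraps a batch put so that both transport errors and partial failures
// reach the caller as an error, which lets the caller trigger a retry.
// `putRecordBatch(params, cb)` is injected to keep the sketch SDK-agnostic.
function putWithErrorPropagation(putRecordBatch, params, callback) {
  putRecordBatch(params, (err, data) => {
    if (err) return callback(err);
    if (data && data.FailedPutCount > 0) {
      return callback(new Error(`firehose rejected ${data.FailedPutCount} record(s)`));
    }
    return callback(null, data);
  });
}
```

Retrying the whole batch after a partial failure can re-send records that already succeeded, so a fuller version would resend only the entries marked as failed in the response.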
apiaxle firehose errors need to be retried
diagram for readme