thecacophonyproject / cacophony-api
API server where Cacophony Project recordings are stored & managed
Home Page: https://api.cacophony.org.nz
License: GNU Affero General Public License v3.0
The version of sequelize (and consequently pg) that we use is affected by several security vulnerabilities. It would also be nice to have some of the v4 features available.
Upgrading will probably break a few things, but the tests should hopefully highlight them.
The web UI functionality has been moved out to a different repo, but it still exists here as well.
This field is redundant and less meaningful than the HTTP status code.
This can't be done until the Cacophonometer app stops relying on this field.
With the code deployed to the test server (fd2b628) the queries for next and previous recordings are failing. This is possibly related to the sequelize 4 changes.
Specifically:
Also remove these from the model.
The README says that only config.js needs to be created; however, it appears that server.json & database.json are also required. There is minimal documentation on how to create these files and I am not confident about what they do.
Neither the raw nor processed files have their fileSize recorded in the metadata, so those fields are always null.
The error message makes it clear that the user needs to set up the server, but there are no docs on where to store the public/private key (in config.js, I presume).
Let's go with all lower case as is commonly done for Go projects.
When the JWT is missing or invalid the API should return 401, not 422.
"animal" and "event" are aliases for "what" and "detail". Once we are sure the legacy names are no longer being used, remove them from the API server.
Sometimes when there is bad connectivity, files get uploaded a second time. We want to make sure the second recording gets deleted, to stop confusing our users and to save space and processing time.
It seems that it's only possible to delete tags that you added yourself. If you are a (write) superuser or a device admin you should also be able to delete tags for a device, regardless of who created them.
When MP4 files are downloaded you get a nicely named file based on the timestamp but CPTV files are just called "rawData". It would be great if CPTV files could use the same naming scheme.
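A minimal sketch of timestamp-based naming for CPTV downloads. The exact MP4 naming scheme isn't shown in this issue, so the "YYYYMMDD-HHMMSS" format and the recordingDateTime field are assumptions here:

```javascript
// Build a timestamp-based download name, e.g. "20180607-084926.cptv",
// instead of the fixed "rawData". Format is an assumed stand-in for
// whatever the MP4 downloads actually use.
function downloadName(recordingDateTime, ext) {
  const d = new Date(recordingDateTime);
  const pad = (n) => String(n).padStart(2, "0");
  return (
    `${d.getUTCFullYear()}${pad(d.getUTCMonth() + 1)}${pad(d.getUTCDate())}` +
    `-${pad(d.getUTCHours())}${pad(d.getUTCMinutes())}${pad(d.getUTCSeconds())}` +
    `.${ext}`
  );
}
```

The download handler would then set a Content-Disposition header such as `attachment; filename="<name>"` before streaming the file.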
The API 'where', 'limit', and 'offset' parameters are currently given in the request headers. These should be changed to request query parameters.
The API doc is missing the devices field.
The current Docker setup is great for running the tests but it would be nice to be able to have a Docker container for developing with too. This should:
- use a different container name from cacophony-api-test so that a development container is separate from any test container
- use nodemon to reload the server when the code changes

Once this is done much of the README can be removed, as the Docker setup describes how everything hangs together.
Especially for the processed recordings, if recording content isn't available, a JWT shouldn't be returned for it. Currently a JWT with an empty key is returned.
When a DB update fails during an upload, the client is left hanging for a long time instead of the request failing immediately.
If a user has been associated with a device via DeviceUsers they don't get the same level of access as if they were associated via GroupUsers.
Instead of managing its own log file the API server should just log to stdout and let systemd manage the log output. This is better aligned with modern practice and avoids issues with managing log file permissions and rotation.
The cacophony-web project uses an uppercase JWT query param; the API now requires lowercase as of the following commit:
Fixed by changing to:
const token = ExtractJwt.fromAuthHeaderWithScheme('JWT')(req);
Username and password constraints are not checked in PATCH, so users can set their username or password to a single digit.
Luckily the username is enforced as unique at the DB level, so you can't overwrite users or do anything too silly.
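A sketch of the missing PATCH-side checks. The actual constraints the registration endpoint enforces aren't quoted in this issue, so the minimum lengths below (3 for usernames, 8 for passwords) are assumptions:

```javascript
// Validate only the fields present in a PATCH body, returning a list of
// error messages. Constraint values are assumed, not taken from the repo.
function validatePatch(fields) {
  const errors = [];
  if ("username" in fields && String(fields.username).length < 3) {
    errors.push("Username must be at least 3 characters long.");
  }
  if ("password" in fields && String(fields.password).length < 8) {
    errors.push("Password must be at least 8 characters long.");
  }
  return errors;
}
```

The PATCH handler would reject the request with a 422 when the returned array is non-empty, mirroring whatever the registration endpoint already does.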
I discovered this when playing around with visualising the number of recordings per device.
For the following query: {type: "thermalRaw", DeviceId: "373"}
To this URL: https://api-test.cacophony.org.nz/api/v1/recordings
When I first call the API with limit = 100, it returns 100 rows of recordings and a count of 2402.
When I call the API again with limit = 2402, it returns only 2322 rows, i.e. 80 rows are missing.
I hunted around with different dates and found this situation - clearly a count of 3 but only 2 rows showing:
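A toy illustration of the suspected cause: if the count is taken over joined rows rather than distinct recordings, a one-to-many join (e.g. to tags) inflates it. If that is what's happening, passing `distinct: true` to sequelize's `findAndCountAll` is a likely fix, though that's an assumption until confirmed against the actual query:

```javascript
// One recording with two tags produces two joined rows, so a naive row
// count over the join over-reports the number of recordings.
const joined = [
  { recordingId: 1, tag: "possum" },
  { recordingId: 1, tag: "cat" },
  { recordingId: 2, tag: "rat" },
];

const naiveCount = joined.length; // counts joined rows: 3
const distinctCount = new Set(joined.map((r) => r.recordingId)).size; // counts recordings: 2
```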
Add track based tags to output, the same as Recording query API.
Add a flag to the query to get all track based tag data including positions
Currently they fail with a 5xx.
When we deploy the API server in our test and prod environments we frequently see 502 errors from the Caddy server which proxies requests from the outside world to the API server. This only happens for recording uploads.
We're not sure yet whether there's a problem with Caddy or problem with the API server.
Currently the PATCH /api/v1/users endpoint expects a JSON object containing a single "data" field, which contains a string which is supposed to decode to the expected JSON blob.
There's no reason to have this indirection, instead we should take the fields directly in JSON.
Current:
{
"data": "{\"password\": \"a_new_password\", \"username\": \"a_new_username\"}"
}
New proposed:
{
"password": "a_new_password",
"username": "a_new_username"
}
For an intermediate period we can still support both while downstream services switch. This would be done by checking whether there is a "data" field: if so, use the legacy implementation; if not, use the new one. Once all services are migrated, remove the legacy path.
When adding comments via the UI the API request fails with a 500 error.
Validation error responses (see customErrors.js) have a message field while other responses have a messages field. This is confusing and annoying. We should settle on a single way of returning messages. An array (i.e. messages) is a better choice as it gives the client more flexibility in terms of presenting multiple errors to the user.
To aid the transition period as we update client code, the API server should probably return both message and messages for a while for validation errors. Once all client code has been updated the message field can go.
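The transitional payload could be built in one place, something like this sketch (field contents are illustrative):

```javascript
// Always return "messages" (an array); duplicate the first entry into the
// legacy "message" field until all clients have been updated.
function validationErrorBody(errors) {
  return {
    messages: errors,
    message: errors[0], // legacy field; remove after the transition
  };
}
```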
Currently the recording just gets deleted from the database but the files should also be deleted from the object store too. To avoid keeping the client waiting, it might be nice to return back to HTTP client once the database deletion has been completed and then continue with the object store removals afterwards.
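The "respond first, clean up after" idea could look like this sketch, where deleteFromDb and deleteObjects are hypothetical stand-ins for the real database and object-store calls:

```javascript
// Delete the DB row, respond immediately, and remove the object-store
// files afterwards without making the HTTP client wait.
async function deleteRecording(id, deleteFromDb, deleteObjects) {
  // Assumed to return the object-store keys belonging to the recording.
  const keys = await deleteFromDb(id);

  // Deliberately not awaited: cleanup continues after the response is sent.
  deleteObjects(keys).catch((err) =>
    console.error(`object store cleanup failed for recording ${id}:`, err)
  );

  return { success: true };
}
```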
The apidoc styling we use means the sidebar takes up most of the screen (it doesn't disappear on smaller displays). See if a newer apidoc or an alternative set of templates fixes this.
It's looking like the file processing API is sometimes giving out the same recording to multiple processing instances. This is likely to be because transactions aren't being used properly when setting the processing state for a recording.
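A toy model of the race: two processing instances asking for the next recording must not receive the same one, so claiming and marking must be a single atomic step. In the real server that means a DB transaction with a row lock (in Postgres, something like SELECT ... FOR UPDATE SKIP LOCKED); the in-memory queue below only illustrates the claim-and-remove-together semantics:

```javascript
// Stand-in for an atomic claim: taking an item and removing it from the
// pending set happen together, so no two callers can get the same item.
function makeQueue(ids) {
  const pending = new Set(ids);
  return {
    claim() {
      const next = pending.values().next();
      if (next.done) return null; // nothing left to process
      pending.delete(next.value); // remove in the same step as the take
      return next.value;
    },
  };
}
```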
With the AudioRecordings PUT API the fields that are allowed to be updated are limited. The Recordings PATCH API has no limitations on the fields that may be updated. This is somewhat dangerous.
This will probably fix #23.
In the README it is stated that bcrypt needs to be installed separately from the initial npm install to "fix a bug". This sounds dubious and should be resolved.
The send function in api/V1/responseUtil.js should automatically generate the success boolean based on the HTTP status code instead of requiring that it is passed in. All call sites need to be updated to match.
Doing this then makes #48 easier when we're ready to do that.
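A sketch of what the updated send could look like. The real function's signature and the shape of its data argument aren't quoted here, so both are assumptions; res follows the Express chaining convention:

```javascript
// Derive "success" from the status code (2xx => true) instead of
// requiring callers to pass it in.
function send(res, { statusCode, messages = [] }) {
  const success = statusCode >= 200 && statusCode < 300;
  res.status(statusCode).json({ success, messages });
}
```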
When a file processing job is requested, if file processing fails and doesn't respond, the job should be included again in subsequent API requests so it can be processed.
This can be detected by checking whether the file processing start time is over an hour old, as a file should not take more than an hour to process.
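The staleness check above could be sketched as follows (the one-hour cutoff comes from this issue; the processingStartTime field name is an assumption):

```javascript
const STALE_MS = 60 * 60 * 1000; // one hour, per the issue

// A job is reclaimable if it never started, or if processing started
// more than an hour ago and presumably died.
function isReclaimable(job, now = Date.now()) {
  if (!job.processingStartTime) {
    return true; // never started, so it's available
  }
  return now - new Date(job.processingStartTime).getTime() > STALE_MS;
}
```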
It would be nice if the logs directory was created if not present.
When the client drops the connection during an upload we get this:
Jun 7 08:49:26 cacostore01 node[30755]: info: POST Request: /api/v1/recordings
Jun 7 08:49:26 cacostore01 node[30755]: debug: started streaming upload to bucket...
Jun 7 08:49:56 cacostore01 node[30755]: error: Error: Request aborted
Jun 7 08:49:56 cacostore01 node[30755]: at IncomingMessage.onReqAborted (/srv/cacophony/api/node_modules/multiparty/index.js:183:17)
Jun 7 08:49:56 cacostore01 node[30755]: at emitNone (events.js:105:13)
Jun 7 08:49:56 cacostore01 node[30755]: at IncomingMessage.emit (events.js:207:7)
Jun 7 08:49:56 cacostore01 node[30755]: at abortIncoming (_http_server.js:410:9)
Jun 7 08:49:56 cacostore01 node[30755]: at socketOnClose (_http_server.js:404:3)
Jun 7 08:49:56 cacostore01 node[30755]: at emitOne (events.js:120:20)
Jun 7 08:49:56 cacostore01 node[30755]: at Socket.emit (events.js:210:7)
Jun 7 08:49:56 cacostore01 node[30755]: at TCP._handle.close [as _onclose] (net.js:545:12)
Jun 7 08:49:56 cacostore01 node[30755]: info: POST /api/v1/recordings 500 29935ms
Jun 7 08:49:56 cacostore01 node[30755]: (node:30755) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 22): Error: Request aborted
This isn't too bad but could probably be handled more cleanly.
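One way to handle this more cleanly is to attach an abort listener to the request before the streaming upload starts, so the error is caught deliberately instead of surfacing as an unhandled promise rejection. A minimal sketch (the request is anything that emits "aborted", as multiparty's error above suggests):

```javascript
// Register a handler for the client dropping the connection mid-upload,
// so the server can log it and clean up instead of crashing noisily.
function watchForAbort(req, onAbort) {
  req.on("aborted", () => onAbort(new Error("Request aborted by client")));
}
```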
The total recording count shown on the search page is often completely wrong. It's typically much higher (double?) than the actual number. I suspect this is because of joins to other tables.
Instead of accepting sequelize query expressions, the recordings search API should take the different search criteria as query parameters.
This is more robust and gives us flexibility in terms of moving away from sequelizejs.
It also makes things like this easier: TheCacophonyProject/cacophony-browse#150
There are a lot of errors reported. Some are minor (but worth sorting out); some look more serious.
Currently multiple endpoints expect a field in the request body to be a string-serialised JSON object. This makes sense in GET requests, where the query params must be strings, but in the bodies of POST/PATCH endpoints we should either enforce or support JSON objects as opposed to their string representations.
The affected requests:
- PATCH /api/v1/users - data field in body
- POST /api/v1/tags - tag field in body
- PATCH /api/v1/recordings/:id - updates field in body
- POST /api/v1/recordings/:id/tracks - data and algorithm fields in body
- POST /api/v1/recordings/:id/tracks/:trackId/tags - data field in body
- POST /api/v1/recordings/reprocess/multiple - recordings field in body
- POST /api/v1/schedules - devices and schedule fields in body
- /api/fileProcessing - multiple locations

In some cases these APIs only expect this single string, in which case the expected content in that JSON string should become the new root values in the JSON body.
In other cases, there is other expected input in the body as well. In these cases the same field should be used, but its content should be the actual JSON object, not a string serialisation of it.
If single root field in body:
Current:
{
"data": "{\"password\": \"a_new_password\", \"username\": \"a_new_username\"}"
}
New proposed:
{
"password": "a_new_password",
"username": "a_new_username"
}
If multiple root fields in body:
Current:
{
"data": "{\"password\": \"a_new_password\", \"username\": \"a_new_username\"}",
"otherFields": "blah"
}
New proposed:
{
"data": {
"password": "a_new_password",
"username": "a_new_username"
},
"otherFields": "blah"
}
For an intermediate period we can still support both while downstream services switch. This would be done by checking the type of the field: if it's a string, use the legacy implementation, otherwise use the new one. Once all services are migrated, remove the legacy path.
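The type check described above reduces to one small helper that every affected endpoint could share (assuming the body is already parsed JSON):

```javascript
// Legacy clients send the field as a string-serialised JSON object; new
// clients send the object itself. Accept both during the transition.
function parseJsonField(value) {
  return typeof value === "string" ? JSON.parse(value) : value;
}
```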
There is something to be noted in the priority of fixes. I've already made a PR to fix the PATCH /api/v1/users endpoint, as that's something we could expect to be used more frequently than others. We should prioritise endpoints that will see more frequent use, especially by other developers if they so choose. Internal endpoints such as those related to file processing should still be fixed, but aren't as necessary.
Instead of logging in each API handler method
Perhaps pedantic but it makes sense as json is specifically for persistence and server.json & database.json already exist.
It's currently harder to troubleshoot issues than it should be because request logs don't include the related user name or device id.
https://api.cacophony.org.nz/#api-SignedUrl-GetFile
The spec states that the JWT should be given in the "Authorization" header and begin with "JWT ".
The actual current implementation expects a query param named "jwt", not starting with "JWT ".
We should fix this to match the documented way, as it's better.
This is another multi-step fix: first add support for the header method, then migrate existing clients to it, then remove the query param.
We could just update the spec to match the query param, but a header is the way a JWT should be sent, so we should really fix it properly IMO.
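The first step of the transition (header support with a query-param fallback) could be sketched like this; in the real server passport-jwt's extractors would likely be used instead of this hand-rolled version:

```javascript
// Prefer the documented Authorization header ("JWT <token>"), falling
// back to the legacy "jwt" query param until clients have migrated.
function extractJwt(req) {
  const header = req.headers && req.headers.authorization;
  if (header && header.startsWith("JWT ")) {
    return header.slice(4);
  }
  return (req.query && req.query.jwt) || null;
}
```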