tnc-ca-geo / animl-api

Backend for https://animl.camera
Per Dev Seed's recommendation: the backend has a variety of clients and an increasing number of users, so it probably makes sense to move to a dedicated instance.
If a model detects something it's been trained on, return that; if both models return nothing, return empty; and if there's a conflict (e.g. mira-large thinks it's a fox and mira-small thinks it's a rodent), return both.
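In other words, the merge rule collapses to concatenating the two prediction arrays. A minimal sketch (hypothetical helper, not actual animl-api code):

```js
// Merging MIRA model outputs per the rule above: non-empty predictions pass
// through, two empty arrays yield an empty array, and a conflict returns both.
function mergePredictions(miraLargePreds, miraSmallPreds) {
  return [...miraLargePreds, ...miraSmallPreds];
}
```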
Right now if a user is a superuser, getProjects will return all projects, but the user may not have roles for all of those projects in Cognito, which causes the front end to crash if they then select a project from the project nav for which they have no roles. This might be more appropriate to fix on the front end. Needs investigation.
The following query should return 3 images from that camera, but something about the time comparison is off:

```
"input": {
  ...
  "cameras": ["X8114541"],
  "addedEnd": "2020:10:27 05:05:24",
  "addedStart": "2020:07:27 05:05:24"
}
```
We call moment's `toDate()` method when we build the query, which converts it to UTC (no offset):

```
addedStart: Moment<2020-07-27T05:05:24-07:00>
input.addedStart.toDate(): 2020-07-27T12:05:24.000Z
```
The `dateAdded` is formatted correctly when it's sent to the front end:

```
"2020-10-27 03:03:44"
```

But the time looks strange in the db:

```
2020-10-27T22:03:47.322+00:00
```
Not an issue, but I wanted to document my decision making on this somewhere. @postfalk, this might be relevant to keep in mind as we work on developing serverless best practices.
Originally, we were using .env files to store secrets and sensitive config values (DB connection strings, API keys, etc.). However, this got clunky and hard to maintain because the same secrets are often used by multiple individual stacks (animl-ingest, animl-api, animl-frontend), each of which has its own .env, and in some cases some of those shared values are generated dynamically by the Serverless stack build process (e.g. API Gateway entrypoints, SQS URLs).
Option 1: CloudFormation cross-stack references
At first I tried using CloudFormation Outputs and creating cross-stack references to import them into other serverless stacks. Essentially, within each stack's Serverless config file, you can concat AWS vars to create some of those dynamic strings (e.g. the API Gateway URL, SQS URL), which can then be passed directly into that specific stack's environment and/or Output to make them available to other stacks:
From the animl-api serverless.yml file:
```yaml
...
custom:
  apiUrl: !Sub https://${ApiGatewayRestApi}.execute-api.${AWS::Region}.amazonaws.com/${self:provider.stage}/
  inferenceQueueUrl: !Sub https://sqs.${AWS::Region}.amazonaws.com/${AWS::AccountId}/inferenceQueue-${self:provider.stage}
  imagesUrl: ${cf:animl-ingest.imagesUrl-${opt:stage, self:provider.stage, 'dev'}} # importing values created by other stacks

environment:
  ANIML_API_URL: ${self:custom.apiUrl}
  INFERENCE_QUEUE_URL: ${self:custom.inferenceQueueUrl}
  ANIML_IMAGES_URL: ${self:custom.imagesUrl}

...

Outputs:
  animlApiUrl:
    Value: ${self:custom.apiUrl}
    Export:
      Name: animlApiUrl-${opt:stage, self:provider.stage, 'dev'} # exporting values to be available to other stacks
```
From the animl-ingest serverless.yml file:
```yaml
Outputs:
  imagesUrl:
    Description: Cloudfront distro domain name for image bucket
    Value: { "Fn::GetAtt": ["CloudfrontDistributionAnimldataproduction", "DomainName"] }
    Export:
      Name: imagesUrl-${opt:stage, self:provider.stage, 'dev'}
```
The advantage of this is that if those dynamically created values change, they are automatically pulled into the stack's env and also made available to other stacks. However, there are a bunch of disadvantages:

- other stacks would still have to redeploy to pick up the updated URLs if they were to change
- it creates a convoluted web of outputs and imports that will be hard to maintain
- we'd still have a bunch of other env variables (e.g. API keys) that are not Serverless outputs and still need to be managed
- the frontend doesn't use Serverless at all, so we'd still be manually keeping those values in sync
Clearly, a centralized store for these variables seemed sensible.
Option 2: SSM Parameter Store
After reading up a bit on cloud-based secrets management options, it seemed like the AWS SSM Parameter Store would offer a simple, cheap, secure tool to store and retrieve secrets and shared config data, with the caveat that we'd still need to manually create and manage the keys & values in the AWS console*. Another nice advantage is that while we could pull SSM values in the serverless.yml config files, we can also use the AWS SDK to fetch SSM values at runtime, so we wouldn't have to redeploy stacks that depend on certain SSM parameters every time they change.
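For example, a runtime fetch might look roughly like this (a sketch using the AWS SDK v2 SSM client; the parameter naming convention here is an assumption, not animl's actual scheme):

```js
// Fetch a shared config value from SSM Parameter Store at runtime, so the
// stack doesn't need to be redeployed when the parameter changes.
const { SSM } = require('aws-sdk');
const ssm = new SSM();

async function getConfig(name, stage = process.env.STAGE || 'dev') {
  const { Parameter } = await ssm
    .getParameter({ Name: `${name}-${stage}`, WithDecryption: true })
    .promise();
  return Parameter.Value;
}
```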
Anyhow, that's the route I've taken for now, and so far I'm pretty happy with it.
*Future improvement idea: one way to add dynamically created Serverless Output values to the Parameter Store from the serverless build process might be to use the serverless-stack-output plugin, which allows you to hook in a custom script that takes the output as an argument. We could write an output handler script that uses the AWS SDK's SSM class to put and/or update those dynamically generated outputs in the parameter store.
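Such a handler might look something like this (untested sketch; the plugin invokes the handler with the stack outputs, and the parameter naming is hypothetical):

```js
// scripts/output.js: hypothetical serverless-stack-output handler that
// pushes each stack output into SSM Parameter Store.
const { SSM } = require('aws-sdk');

async function handler(outputs) {
  const ssm = new SSM();
  for (const [key, value] of Object.entries(outputs)) {
    await ssm.putParameter({
      Name: key,
      Value: String(value),
      Type: 'String',
      Overwrite: true,
    }).promise();
  }
}

module.exports = { handler };
```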
Would have to draw the bbox on the image and then embed the image as base64 directly into the email.
Example Python code for how they do baked-in bounding box annotations in the MegaDetector repo: https://github.com/microsoft/CameraTraps/blob/5f0a558e3c923b99685ce2efa57d1bc92fc85de8/visualization/visualization_utils.py#L421
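A Node-side equivalent could look something like the following (a sketch assuming the sharp library and a normalized [ymin, xmin, ymax, xmax] bbox; adjust to whatever format the detector actually returns):

```js
// Draw a bounding box on an image via an SVG overlay, then return an <img>
// tag with the annotated image embedded as a base64 data URI.
const sharp = require('sharp');

async function annotateForEmail(imgBuffer, bbox) {
  const { width, height } = await sharp(imgBuffer).metadata();
  const [ymin, xmin, ymax, xmax] = bbox; // assumed normalized coords
  const svg = `<svg width="${width}" height="${height}">
    <rect x="${xmin * width}" y="${ymin * height}"
          width="${(xmax - xmin) * width}" height="${(ymax - ymin) * height}"
          fill="none" stroke="red" stroke-width="4"/>
  </svg>`;
  const annotated = await sharp(imgBuffer)
    .composite([{ input: Buffer.from(svg) }])
    .jpeg()
    .toBuffer();
  return `<img src="data:image/jpeg;base64,${annotated.toString('base64')}" />`;
}
```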
To support tnc-ca-geo/animl-frontend#19
We can process MIRA requests immediately (there's no API rate limit threshold like the Megadetector API has), so this might make sense to speed up getting results.
After tnc-ca-geo/animl-ml#63 is complete.
After implementing #53, I discovered our approach for merging all of the filters in `buildFilters()` in utils.js had a bug: I was using the spread operator to merge all of the filter objects, but if multiple filter objects had the `$or` key, the earlier ones would get overwritten by later ones. Not sure how I missed this earlier!
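To illustrate (not the actual buildFilters() code):

```js
// Spreading objects clobbers duplicate keys, so only the last $or survives:
const cameraFilter = { $or: [{ cameraSn: 'X8114541' }] };
const labelFilter = { $or: [{ 'labels.category': 'rodent' }] };

const broken = { ...cameraFilter, ...labelFilter };
// => { $or: [{ 'labels.category': 'rodent' }] } (the camera clause is lost)

// One fix: nest the clauses under $and so both are applied
const fixed = { $and: [cameraFilter, labelFilter] };
```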
So that users can validate/invalidate that prediction as well. Set the bounding box to the full image dimensions.
Related: tnc-ca-geo/animl-frontend#20
Borrow DevSeed's tooling & approach from shp2json? https://github.com/tnc-ca-geo/shp2json/blob/main/upload/test/generic.test.js
Check out Apollo Server's mocking and integration testing documentation: https://www.apollographql.com/docs/apollo-server/testing/testing/
E.g., after a user creates an automation rule, review all existing matches in the db, check whether they already have predictions from that model, and if not, add them to the queue.
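A rough sketch of that backfill (the schema fields and helper names are assumptions):

```js
// When a new automation rule is created, queue inference for existing images
// that don't yet have predictions from the rule's model.
// Assumes a mongoose Image model and an addToInferenceQueue() helper in scope.
async function backfillRule(rule) {
  const images = await Image.find({
    projectId: rule.projectId,
    'objects.labels.mlModel': { $ne: rule.model }, // assumed field names
  });
  for (const image of images) {
    await addToInferenceQueue(image, rule.model);
  }
}
```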
Look into using DataLoader for batching and caching results of queries
Use `[Resource][Operation][Input]` naming (`ImageCreateInput` rather than `CreateImageInput`) for alphabetical grouping.

Right now the alert system (animl-api/src/automation/alerts.js) is just a bare-bones proof of concept.
Megadetector API can handle 8 images per request.
Currently, we do dupe Image checking when saving to the db by (1) ensuring the `_id` field, which was generated by an md5 hashing function in animl-ingest, is unique, and (2) by enforcing a unique compound index in MongoDB for images (`cameraId` + `dateTimeOriginal`):

```js
ImageSchema.index(
  { cameraId: 1, dateTimeOriginal: -1 },
  { unique: true, sparse: true }
);
```
This has worked fine the vast majority of the time; however, sometimes two sequential images taken in a burst have the same timestamp b/c their temporal resolution is limited to seconds, not milliseconds. This causes an incorrect duplicate error to be thrown, and the image gets rejected. I think the solution is to remove the `unique` constraint from the secondary compound index and just rely on the `_id`s being unique.
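The proposed index change would look something like this (sketch):

```js
// Keep the compound index for query performance, but drop the unique
// constraint and rely on the md5-based _id for de-duping.
ImageSchema.index({ cameraId: 1, dateTimeOriginal: -1 });
```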
The `dateTimeOriginal` timestamps are in PST and don't have a TZ offset (e.g. `2021:09:25 21:36:12`), and there's no info in the exif data to indicate the timezone. So when we cast the string to a date, moment assumes it's in UTC, and it gets saved to the DB as `2021-09-25T21:36:12.000+00:00`, which is the wrong time.

Meanwhile, we don't convert Created Date or Added Date to the local timezone on the frontend, even though Added Date is correctly created in UTC. Because we don't parse it as PST when filtering and displaying it, it shows up as the UTC time and looks off.

Not sure what to do here... as a first step I'll compare `dateTimeOriginal` from other camera makes and see if they do the same.
When we fix this we'll need to do a scripted update of all images in the prod DB.
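If we do decide the EXIF timestamps should be treated as local time, one possible fix is to parse them explicitly with a zone (sketch, assuming moment-timezone):

```js
// Parse the offset-less EXIF string as America/Los_Angeles instead of
// letting moment default to UTC.
const moment = require('moment-timezone');

const raw = '2021:09:25 21:36:12';
const dt = moment.tz(raw, 'YYYY:MM:DD HH:mm:ss', 'America/Los_Angeles');
console.log(dt.toISOString()); // 2021-09-26T04:36:12.000Z (the correct UTC time)
```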
A "deployment" is a specific camera at a specific location for a certain period of time. Users should not be forced to add "deployments" or set them manually (all cameras should start with a default deployment that doesn't have start/end dates). If a camera is put out in the field and never moved, the default deployment would suffice. But users should also be able to retroactively add & adjust deployments (move the start & end date, change the name, and potentially change permissions).
A first step will be to expand the "Camera" schema & implement resolvers to allow users to add names and other metadata.
Things to think about:
Right now if you try to upload an image that doesn't have a make, or if the make isn't Reconyx or BuckeyeCam, it fails at `getUserSetData()` in `animl-api/src/api/db/models/utils.js`. We should either have a more formal supported-make check and fail sooner, or be more flexible and accommodate images without a make / with alternative makes.
Think about making this work for the ecological / data analysis use case as well as the ML data training use case.
It might be a little tricky with GraphQL... we would likely have to stringify the entire csv and return it as one field in the JSON response, and/or potentially convert it to base64.
We'd also have to make some decisions about the structure of the CSV:
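As a sketch of the stringify-and-return idea (assuming the json2csv package, which isn't necessarily a dependency):

```js
// Build the entire CSV in memory and return it as a single string field
// (optionally base64-encoded) in the GraphQL response.
const { Parser } = require('json2csv');

function imagesToCsvField(imageRecords) {
  const parser = new Parser(); // infers columns from object keys
  const csv = parser.parse(imageRecords);
  return { csv: Buffer.from(csv).toString('base64') };
}
```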
Add data to the Image schema on what models an image is currently queued up and waiting on inference from?
Just a list to think about:
Make the inference handler idempotent (capable of handling duplicate messages). It's rare but a sure bet that we will replay SQS messages that have already been processed, resulting in unnecessary inference and duplicate labels. Do the following:

- Set up a dead-letter queue and move messages there that fail a bunch.
Use representations of "objects" (in the "megadetector detected an object" sense of the word) as the unit of storage, with arrays of potential labels stored within them. When a new label is added (e.g. by MIRA), first see if there's already an object with the same bbox and add the label to that object; else, create a new object.
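A sketch of that merge logic (hypothetical shapes, not actual animl-api code):

```js
// Attach a new label to an existing object if the bboxes match (within a
// small tolerance); otherwise create a new object for it.
function addLabelToImage(image, label) {
  const sameBbox = (a, b) => a.every((v, i) => Math.abs(v - b[i]) < 0.01);
  const match = image.objects.find((obj) => sameBbox(obj.bbox, label.bbox));
  if (match) {
    match.labels.push(label);
  } else {
    image.objects.push({ bbox: label.bbox, locked: false, labels: [label] });
  }
}
```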
Right now the Megadetector API caps us at 10 requests per 5 minutes. I've talked with Dan Morris at MS AI 4 Earth, and he's totally fine with increasing this for us; I just need to get back to him with our expected usage. So that would be an obvious first step.
However, the whole inference worker is structured around this limitation: in order to not inundate the Megadetector with requests once we've maxed out, I have the inference handler.js function poll SQS for new messages every 5 minutes, pull the first 10 out of the queue, and request inference on them. There's a bunch of ways this could be improved:
It's also worth noting that there are really two separate use cases that we need to support but might have different solutions: (1) real time inference from images coming into the system from wireless camera traps, and (2) bulk inference from users uploading images from a hard drive.
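For reference, the current pattern amounts to something like this (simplified sketch; the queue URL env var and helper names are assumptions):

```js
// Poll SQS on a schedule, drain up to 10 messages, and request inference in
// batches of 8 (the Megadetector API's per-request cap).
const { SQS } = require('aws-sdk');
const sqs = new SQS();

async function pollInferenceQueue() {
  const { Messages = [] } = await sqs.receiveMessage({
    QueueUrl: process.env.INFERENCE_QUEUE_URL,
    MaxNumberOfMessages: 10, // SQS max per receive call
  }).promise();
  for (let i = 0; i < Messages.length; i += 8) {
    await requestMegadetectorInference(Messages.slice(i, i + 8)); // assumed helper
  }
}
```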
Images with non-validated labels within locked objects are still passing the labels filter. For example, if you only have the 'rodent' label selected, the query returns locked objects that have a different label validated (e.g. skunk, bird, etc.) while the rodent label's validation is null. There must be something wrong with the query here.
GraphQL doesn't send back HTTP status codes like a regular REST API would, because there is not a 1-to-1 mapping of requests/queries to route handlers (or "resolvers" in graphQL parlance). A single query can require many resolvers to fire and respond, so errors can occur in many places, multiple errors can be returned at once, and partial data might get returned. Because of this, graphQL APIs will almost always return a `200` status, partial data if they can, and an array of error messages if any errors occur, e.g.:
```
{
  "data": {
    "getInt": 12,
    "getString": null
  },
  "errors": [
    {
      "message": "Failed to get string!",
      // ...additional fields...
    }
  ]
}
```
I like the approach Apollo took in Apollo Server 2.0: it returns arrays of errors with an `error[i].extensions` field that can contain useful info, most importantly a standardized `code` field. You can also read more about their strategy here.

Unfortunately, the latest graphql-yoga is not running apollo-server 2.0 under the hood, so I had to implement a hacky formatError() function to make the errors look more like [ApolloErrors](https://github.com/apollographql/apollo-server/blob/main/packages/apollo-server-errors/src/index.ts). We'll see how that goes.
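The shim is roughly of this shape (illustrative sketch, not the exact animl-api implementation):

```js
// Normalize errors thrown by resolvers into an Apollo-style shape with a
// standardized extensions.code field.
function formatError(err) {
  const code =
    (err.originalError && err.originalError.code) || 'INTERNAL_SERVER_ERROR';
  return {
    message: err.message,
    path: err.path,
    extensions: { code },
  };
}
```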
Another thing to consider might be to wrap all resolver functions in a `helmet()` function that catches all db-related errors. See this article for more info.
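That wrapper could be as simple as this (sketch; helmet is just the name floated above, and the body is an assumption):

```js
// Catch-all wrapper: funnel unexpected db errors into a consistent
// GraphQL error instead of letting them leak raw.
const helmet = (resolver) => async (parent, args, context, info) => {
  try {
    return await resolver(parent, args, context, info);
  } catch (err) {
    console.error(err);
    throw new Error(`Internal server error: ${err.message}`);
  }
};
```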
It looks like if animl-ingest doesn't get a 200 response, it will keep retrying the saveImage call to animl-api. When trying to upload p_000309.jpg and p_000310.jpg in tandem, 310 saved successfully, but 309 got stuck (I think in the addToAutomationQueue() step) and never returned a response to animl-ingest, causing it to keep retrying. It saved multiple 309s, which Mongo also shouldn't allow, so there seems to be a separate de-dupe issue as well.

This happened again with p_000536.jpg.
Currently users set automation rules at the View level, but I think this is somewhat of a relic from before we had implemented multi-tenancy, and it also poses issues now that users can configure confidence thresholds & disable classes. For example, if an image belongs to two views, and each view has an automation rule set to request inference from the same ml model but with different class settings, reconciling that would get complicated. We could maybe treat class settings as a front-end filter and always request inference with the same default settings, but I think the simpler option is to move all automation rules to the project level & apply them to all views within that project.
It would entail the following:

- Updating `buildCallstack()` in `src/automation/utils.js` to reflect the new structure. This will become much simpler: we will no longer need to check what view an image belongs to or de-dupe rules.
- Removing automation rules from the view level (and from the `updateView()` resolver, which we should still keep).

Right now the images themselves are unprotected: if someone knew an ID they'd be able to request it.
To fix, create a resolver that checks authentication, reads the image data in from the S3 bucket, encodes it in base64, and returns it to the front end in a JSON object.
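A sketch of such a resolver (hypothetical names; assumes the AWS SDK v2 S3 client and a bucket env var):

```js
// Check auth, fetch the image from S3, and return it base64-encoded so the
// bucket itself never has to be public.
const { S3 } = require('aws-sdk');
const s3 = new S3();

async function getImageData(_, { key }, context) {
  if (!context.user) throw new Error('Not authorized');
  const { Body, ContentType } = await s3
    .getObject({ Bucket: process.env.IMAGES_BUCKET, Key: key })
    .promise();
  return { mimeType: ContentType, base64: Body.toString('base64') };
}
```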
There are a couple places where animl-frontend requests two different updates to the same image in rapid succession. Essentially we've created a race condition, and MongoDB blocks the second update by throwing a versioning error, e.g. `"Error: VersionError: No matching document found for id <_id> version 2"`. Because often the second request is to set the `object.locked` property to `true`, this most commonly manifests as objects not locking after being validated. I'm not yet sure how many times this has occurred. Relevant discussions of the issue:
I think the solution is to use `Model.updateOne()` or `Model.findOneAndUpdate()` rather than `Model.find()` and `Model.save()` in the Image model's `updateObject()` function. Good documentation and explanation of the difference between `.save()` and `.updateOne()` can be found here, and here.
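A simplified sketch of the atomic-update approach (hypothetical field names, not the actual updateObject() implementation):

```js
// Update the matched object atomically with findOneAndUpdate(), sidestepping
// mongoose's version-key check on find() + save().
// Assumes the mongoose Image model is in scope.
async function updateObject(imageId, objectId, diffs) {
  const $set = {};
  for (const [key, value] of Object.entries(diffs)) {
    $set[`objects.$.${key}`] = value; // positional update on the matched object
  }
  return Image.findOneAndUpdate(
    { _id: imageId, 'objects._id': objectId },
    { $set },
    { new: true }
  );
}
```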
Things to test:

- How do we convert the `diffs` to a `$set` update?

The goal is to implement an auth and user management system that supports the following:
We will utilize Cognito groups to organize both the projects and roles.
e.g. `animl/sci_biosecurity/project_owner`
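Under that convention, deriving a user's per-project roles from a decoded token might look like this (sketch; assumes groups arrive in the standard cognito:groups claim):

```js
// Map Cognito groups of the form animl/<project>/<role> to
// { project: [roles] } for authorization checks.
function rolesByProject(decodedToken) {
  const groups = decodedToken['cognito:groups'] || [];
  return groups.reduce((acc, group) => {
    const [prefix, project, role] = group.split('/');
    if (prefix !== 'animl') return acc;
    (acc[project] = acc[project] || []).push(role);
    return acc;
  }, {});
}
```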