animl-api's People

Contributors

dependabot[bot] · ingalls · jbmarsha · miles-po · nathanielrindlaub


animl-api's Issues

Transition to ECS

Per Dev Seed's recommendation. The backend has a variety of clients and an increasing number of users, so it probably makes sense to move to a dedicated instance.

Add MIRA label reconciliation function

If a model detects something it's been trained on, return that; if both models return nothing, return empty; and if there's a conflict (e.g., mira-large thinks it's a fox and mira-small thinks it's a rodent), return both.
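
A minimal sketch of that reconciliation logic (the function name and prediction shape are assumptions, not existing code):

// Hypothetical sketch: each prediction set is assumed to be an array of
// label objects like [{ category: 'fox', conf: 0.91 }].
function reconcileMiraPredictions(largePreds, smallPreds) {
  if (largePreds.length === 0 && smallPreds.length === 0) {
    return []; // both models came back empty
  }
  if (largePreds.length === 0) return smallPreds;
  if (smallPreds.length === 0) return largePreds;

  const largeCategories = new Set(largePreds.map((p) => p.category));
  const agree = smallPreds.every((p) => largeCategories.has(p.category));

  // Agreement: return one set; conflict: return the union of both.
  return agree ? largePreds : [...largePreds, ...smallPreds];
}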

Create script tools for updating db

Nice-to-have features (a sketch of the backup-and-confirm flow follows this list):

  • modular design to support future scripted updates
  • automatically back up the db with mongodump before running
  • print out some of the matched results and a match count
  • require command-line confirmation via a prompt before performing the update
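
A sketch of the backup-then-confirm idea using Node's built-in readline; the mongodump invocation and match count are illustrative:

const readline = require('readline');
const { execSync } = require('child_process');

// Ask a yes/no question on the command line before proceeding.
async function confirm(question) {
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  return new Promise((resolve) => {
    rl.question(`${question} (y/N) `, (answer) => {
      rl.close();
      resolve(answer.toLowerCase() === 'y');
    });
  });
}

(async () => {
  execSync('mongodump --uri="$MONGODB_URI" --out=./backup'); // back up first
  const matchCount = 42; // e.g., await Image.countDocuments(filter)
  console.log(`${matchCount} documents match the filter.`);
  if (await confirm('Proceed with update?')) {
    // ...perform the scripted update here...
  }
})();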

Fix bug with getProjects if user is superuser

Right now if a user is a superuser, getProjects will return all projects, but the user may not have roles for all of those projects in Cognito, which causes the front end to crash if they then select a project from the project nav for which they have no roles. This might be more appropriate to fix on the front end. Needs investigation.

DateAdded query issue

The following query should return 3 images from that camera, but something about the time comparison is off.

    "input": {
        ...
        "cameras": ["X8114541"],
        "addedEnd": "2020:10:27 05:05:24",
        "addedStart": "2020:07:27 05:05:24"
    }

We call moment's toDate() method when we build the query, which converts the moment to a native Date (expressed in UTC, no offset):

addedStart:  Moment<2020-07-27T05:05:24-07:00>
input.addedStart.toDate():  2020-07-27T12:05:24.000Z

The dateAdded is formatted correctly when it's sent to the front end:

"2020-10-27 03:03:44"

But the time looks strange in the db:

2020-10-27T22:03:47.322+00:00
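
One way to sanity-check the comparison (a sketch; the filter field names are assumptions):

const moment = require('moment');

// Parse the filter timestamps the same way the query builder might, then
// inspect the UTC instants actually sent to MongoDB.
const addedStart = moment('2020:07:27 05:05:24', 'YYYY:MM:DD HH:mm:ss');
const addedEnd = moment('2020:10:27 05:05:24', 'YYYY:MM:DD HH:mm:ss');

const filter = {
  cameraSn: { $in: ['X8114541'] },
  dateAdded: { $gte: addedStart.toDate(), $lt: addedEnd.toDate() },
};
console.log(filter.dateAdded); // check the instants being compared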

A note on using SSM Parameter Store for storing secrets

Not an issue, but I wanted to document my decision making on this somewhere. @postfalk, this might be relevant to keep in mind as we work on developing serverless best practices.

Originally, we were using .env files to store secrets and sensitive config values (DB connection strings, API keys, etc.). However, this got clunky and hard to maintain because the same secrets are often used by multiple individual stacks (animl-ingest, animl-api, animl-frontend), each of which has its own .env, and in some cases some of those shared values are generated dynamically by the Serverless stack build process (e.g. API Gateway entrypoints, SQS URLs).

Option 1: CloudFormation cross-stack references
At first I tried using CloudFormation Outputs and creating cross-stack references to import them into other serverless stacks. Essentially, within each stack's Serverless config file, you can concatenate AWS vars to create some of those dynamic strings (e.g. API Gateway URL, SQS URL), which can then be passed directly into that specific stack's environment and/or exported as an Output to be available to other stacks:

From animl-api serverless.yml file:

  ...
  custom:
    apiUrl: !Sub https://${ApiGatewayRestApi}.execute-api.${AWS::Region}.amazonaws.com/${self:provider.stage}/
    inferenceQueueUrl: !Sub https://sqs.${AWS::Region}.amazonaws.com/${AWS::AccountId}/inferenceQueue-${self:provider.stage}
    imagesUrl: ${cf:animl-ingest.imagesUrl-${opt:stage, self:provider.stage, 'dev'}}  # importing values created by other stacks
  environment:
    ANIML_API_URL: ${self:custom.apiUrl}
    INFERENCE_QUEUE_URL: ${self:custom.inferenceQueueUrl}
    ANIML_IMAGES_URL: ${self:custom.imagesUrl}
  ...
  Outputs:
    animlApiUrl:
      Value: ${self:custom.apiUrl}
      Export:
        Name: animlApiUrl-${opt:stage, self:provider.stage, 'dev'} # exporting values to be available to other stacks

From animl-ingest serverless.yml file:

  Outputs:
    imagesUrl:
      Description: Cloudfront distro domain name for image bucket
      Value: { "Fn::GetAtt": ["CloudfrontDistributionAnimldataproduction", "DomainName"] }
      Export:
        Name: imagesUrl-${opt:stage, self:provider.stage, 'dev'}

The advantage of this is that if those dynamically created values change, they are automatically pulled into the stack's env and made available to other stacks. However, there are several disadvantages: other stacks would still have to redeploy to pick up changed URLs, it creates a convoluted web of outputs and imports that would be hard to maintain, we'd still have a bunch of other env variables (e.g. API keys) that aren't Serverless outputs and need managing, and the frontend doesn't use Serverless at all, so we'd still be keeping those values in sync manually.

Clearly, a centralized store for these variables seemed sensible.

Option 2: SSM Parameter Store
After reading up a bit on cloud-based secrets management options, it seemed like AWS SSM Parameter Store would offer a simple, cheap, secure tool for storing and retrieving secrets and shared config data, with the caveat that we'd still need to manually create and manage the keys & values in the AWS console*. Another nice advantage is that while we could pull SSM values into the serverless.yml config files, we can also use the AWS SDK to fetch SSM values at runtime, so we wouldn't have to redeploy stacks that depend on certain SSM parameters every time they change.
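
A minimal sketch of the runtime-fetch approach, assuming the AWS SDK v2 (the parameter name is illustrative):

const SSM = require('aws-sdk/clients/ssm');
const ssm = new SSM({ region: process.env.AWS_REGION });

// Fetch (and decrypt, if it's a SecureString) a parameter at runtime.
async function getParam(name) {
  const { Parameter } = await ssm
    .getParameter({ Name: name, WithDecryption: true })
    .promise();
  return Parameter.Value;
}

// e.g. const dbUrl = await getParam(`/animl/${stage}/db-connection-string`);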

Anyhow, that's the route I've taken for now, and so far I'm pretty happy with it.

*Future improvement idea: one way to add dynamically created Serverless Output values to the Parameter Store from the serverless build process might be to use serverless-stack-output plugin, which allows you to hook in a custom script that takes in the output as an argument. We could write an output handler script that uses the AWS-SDK’s SSM class to put and/or update those dynamically generated outputs in the parameter store.

Image querying bug

After implementing #53, I discovered our approach for merging all of the filters in utils.js buildFilters() had a bug: I was using the spread operator to merge all of the filter objects, but if multiple filter objects had the $or key, the earlier ones would get overwritten by later ones. Not sure how I missed this earlier!
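
A small illustration of the bug and one possible fix (the filter field names here are illustrative):

// The spread-merge bug: a later $or key clobbers an earlier one.
const cameraFilter = { $or: [{ cameraSn: 'X8114541' }] };
const labelFilter = { $or: [{ 'labels.category': 'rodent' }] };

const broken = { ...cameraFilter, ...labelFilter };
// => { $or: [{ 'labels.category': 'rodent' }] }, the camera clause is gone!

// One fix: AND the sub-filters together so every $or survives.
const fixed = { $and: [cameraFilter, labelFilter] };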

Implement retroactive inference

E.g., after a user creates an automation rule, review all existing matching images in the db, check whether or not they already have predictions from that model, and if not, add them to the queue.
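
A hypothetical sketch of that backfill pass; Image, rule.filter, and addToInferenceQueue() are assumptions, not existing code:

async function backfillInference(rule) {
  const images = await Image.find(rule.filter); // images the rule applies to
  for (const image of images) {
    // Skip images that already have a prediction from this model.
    const alreadyInferred = (image.objects || []).some((obj) =>
      obj.labels.some((lbl) => lbl.mlModel === rule.model)
    );
    if (!alreadyInferred) {
      await addToInferenceQueue({ imageId: image._id, model: rule.model });
    }
  }
}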

Bug: prevent false duplicate error from being thrown

Currently, we do dupe Image checking when saving to the db by (1) ensuring the _id field, which was generated by a md5 hashing function in animl-ingest, is unique, and (2) by enforcing a unique compound index in MongoDB for images (cameraId + dateTimeOriginal):

ImageSchema.index(
  { cameraId: 1, dateTimeOriginal: -1 },
  { unique: true, sparse: true }
);

This has worked fine the vast majority of the time; however, sometimes two sequential images taken in a burst have the same timestamp b/c their temporal resolution is limited to seconds, not milliseconds. This causes an incorrect duplicate error to be thrown, and the image gets rejected. I think the solution is to remove the unique constraint from the secondary compound index and just rely on the _ids being unique.
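
Something like the following, i.e. the same index minus the unique constraint:

// Keep the compound index for query performance, but drop `unique` and
// rely on the md5-based _id for de-duplication instead.
ImageSchema.index(
  { cameraId: 1, dateTimeOriginal: -1 },
  { sparse: true }
);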

Bug: Timezone issue with dateTimeOriginal from Buckeye exif data

The dateTimeOriginal timestamps are in PST and don't have a TZ offset (e.g. 2021:09:25 21:36:12), and there's no info in the exif data to indicate the timezone. So when we cast the string to a date, moment assumes it's in UTC, and it gets saved to the DB as 2021-09-25T21:36:12.000+00:00, which is the wrong time.

Meanwhile, we don't convert Created Date or Added Date to the local timezone on the frontend, even though Added Date is correctly created in UTC. Because we don't parse it as PST when filtering and displaying it, it shows up as the UTC time and looks off.

Not sure what to do here... as a first step I'll compare dateTimeOriginal from other camera makes and see if they do the same.
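
If they turn out to be zone-less local times as well, one possible fix (a sketch) would be to parse the exif timestamp with an explicit zone via moment-timezone rather than letting moment default to UTC:

const moment = require('moment-timezone');

const raw = '2021:09:25 21:36:12'; // dateTimeOriginal from Buckeye exif
const dto = moment.tz(raw, 'YYYY:MM:DD HH:mm:ss', 'America/Los_Angeles');
console.log(dto.toISOString()); // 2021-09-26T04:36:12.000Z, the correct instant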

When we fix this we'll need to do a scripted update of all images in the prod DB.

Add "view" mutation handler

  • create view schema ("filters" as string or object?, "name" of view) (one possible shape is sketched after this list)
  • create mutation handler for creating, updating, and deleting view
  • create query handler for getting all views
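
A hypothetical View schema sketch; the field choices reflect the open questions above and are assumptions only:

const mongoose = require('mongoose');

const ViewSchema = new mongoose.Schema({
  name: { type: String, required: true },
  // storing filters as a stringified object keeps the schema flexible
  filters: { type: String, required: true },
});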

Implement concept of "Deployments"

A "deployment" is a specific camera at a specific location for a certain period of time. Users should not be forced to add "deployments" or set them manually (all cameras should start with a default deployment that doesn't have start/end dates). If a camera is put out in the field and never moved, the default deployment would suffice. But users should also be able to retroactively add & adjust deployments (move the start & end date, change the name, and potentially change permissions).

A first step will be to expand the "Camera" schema & implement resolvers to allow users to add names and other metadata.

Things to think about (a possible schema shape is sketched after this list):

  • how to define the start/end date in the schema? Should we include a start date? Or just set end dates, and if there's another deployment's end date before it, consider that the start date?
  • if we start by setting access control on the camera level, how would we move to deployment-level permissions down the road? Deployments are really the natural resource to perform access control on.
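
One possible shape, as a sketch; it assumes an existing Mongoose CameraSchema and leaves the start-vs-end-date question open:

const mongoose = require('mongoose');

const DeploymentSchema = new mongoose.Schema({
  name: { type: String, default: 'default' }, // every camera gets a default deployment
  startDate: { type: Date }, // optional: maybe end dates alone suffice?
  endDate: { type: Date },
});

CameraSchema.add({ deployments: [DeploymentSchema] });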

Implement CSV export

Think about making this work for the ecological / data analysis use case as well as the ML data training use case.

It might be a little tricky with GraphQL... we'd likely have to stringify the entire csv and return it as one field in the JSON response, and/or potentially convert it to base64.
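
A sketch of the stringify-and-return idea; json2csv is one option, and the resolver and model method names are assumptions:

const { Parser } = require('json2csv');

const resolvers = {
  Query: {
    exportCSV: async (_, { filters }, context) => {
      const images = await context.models.Image.queryByFilter(filters);
      const parser = new Parser(); // one column per field by default
      const csv = parser.parse(images);
      // return the whole CSV as a single base64 field in the JSON response
      return { csv: Buffer.from(csv).toString('base64') };
    },
  },
};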

We'd also have to make some decisions about the structure of the CSV:

  • only include validated labels? include all validated labels on an object or just the first validated label in the labels array?
  • create a column for each available label, and for each image row use the number of validated labels present as the label column's value? YES

Make app idempotent

(capable of handling duplicate messages). It's rare but a sure bet that we will replay SQS messages that have already been processed, resulting in unnecessary inference and duplicate labels. Do the following:

  • before adding a label, check image to see whether a label from that model has been applied to it already

Refactor label schema

Use representations of "objects" (in the "megadetector detected an object" sense of the word) as the unit of storage, with arrays of potential labels stored within them. When a new label is added (e.g. by MIRA), first see if there's already an object with the same bbox and add it to that; else create a new object.
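
A sketch of that object-centric shape (field names are assumptions):

const mongoose = require('mongoose');

const LabelSchema = new mongoose.Schema({
  category: String,
  conf: Number,
  mlModel: String, // or a userId for manually applied labels
  validation: { validated: Boolean },
});

const ObjectSchema = new mongoose.Schema({
  bbox: [Number], // e.g. relative [ymin, xmin, ymax, xmax]
  locked: { type: Boolean, default: false },
  labels: [LabelSchema], // all candidate labels for this object
});

// On new label: find an object with a matching bbox, else create one.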

Make inference pipeline more efficient

Right now the Megadetector API caps us at 10 requests per 5 minutes. I've talked with Dan Morris at MS AI 4 Earth, and he's totally fine with increasing this for us, I just need to get back to him with our expected usage. So that would be an obvious first step.

However, the whole inference worker is structured around this limitation: in order to not inundate the Megadetector with requests once we've maxed out, I have the inference handler.js function poll SQS for new messages every 5 minutes, pull the first 10 out of the queue, and request inference on them. There are a bunch of ways this could be improved:

  • set up a separate queue & worker function for MIRA inference requests that runs inference as soon as messages hit SQS (we host it, so there's no request limit)
  • send 8 images per request to Megadetector API
  • host Megadetector ourselves

It's also worth noting that there are really two separate use cases that we need to support but might have different solutions: (1) real time inference from images coming into the system from wireless camera traps, and (2) bulk inference from users uploading images from a hard drive.

Fix label filtering bug

Images with non-validated labels within locked objects are still passing the labels filter. For example, if you only have the 'rodent' label selected, it returns locked objects that have a different label validated (e.g. skunk, bird, etc.) while the rodent label's validation = null. There must be something wrong with the query here.

A note on error handling in GraphQL

GraphQL doesn't send back HTTP status codes like a regular REST API would, because there is not a 1-to-1 mapping of requests/queries to route handlers (or "resolvers" in GraphQL parlance). A single query can require many resolvers to fire and respond, so errors can occur in many places, multiple errors can be returned at once, and partial data might get returned. Because of this, GraphQL APIs will almost always return a 200 status, with partial data if possible and an array of error messages if any occurred, e.g.:

{
  "data": {
    "getInt": 12,
    "getString": null
  },
  "errors": [
    {
      "message": "Failed to get string!",
      // ...additional fields...
    }
  ]
}

I like the approach Apollo took in Apollo Server 2.0: it returns arrays of errors with an errors[i].extensions field that can contain useful info, most importantly a standardized code field. You can also read more about their strategy here.

Unfortunately, the latest graphql-yoga is not running apollo-server 2.0 under the hood, so I had to implement a hacky formatError() function to make the errors look more like ApolloErrors (https://github.com/apollographql/apollo-server/blob/main/packages/apollo-server-errors/src/index.ts). We'll see how that goes.
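
A hypothetical sketch of what such a formatError() shim might look like, reshaping yoga's errors to carry an extensions.code field the way ApolloErrors do:

function formatError(err) {
  // Fall back to a generic code when the underlying error didn't set one.
  const code = (err.originalError && err.originalError.code) || 'INTERNAL_SERVER_ERROR';
  return {
    message: err.message,
    path: err.path,
    locations: err.locations,
    extensions: { code },
  };
}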

Another thing to consider might be to wrap all resolver functions in a helmet() function that catches all db-related errors. See this article for more info.

Duplicate image saving bug

It looks like if animl-ingest doesn't get a 200 response, it will keep retrying the saveImage call to animl-api. When uploading p_000309.jpg and p_000310.jpg in tandem, 310 saved successfully, but 309 got stuck (I think in the addToAutomationQueue() step) and never returned a response to animl-ingest, causing it to keep retrying. It saved multiple 309's, which Mongo shouldn't allow, so there also seems to be a separate de-dupe issue.
This happened again with p_000536.jpg.

Move automation rules to Project level

Currently users set automation rules at the View level, but I think this is somewhat of a relic from before we implemented multi-tenancy, and it also poses issues now that users can configure confidence thresholds & disable classes. For example, if an image belongs to two views, and each view has an automation rule set to request inference from the same ML model but with different class settings, reconciling that would get complicated. We could maybe treat class settings as a front-end filter and always request inference with the same default settings, but I think the simpler option is to move all automation rules to the Project level & apply them to all views within that project.

It would entail the following:

  • Move Automation Rules from View to Project schema, migrate existing automation rules in DB to their respective Projects
  • Update buildCallstack() in src/automation/utils.js to reflect the new structure. This will become much simpler: we will no longer need to check which views an image belongs to or de-dupe rules.
  • Create new mutation resolvers for creating/updating/deleting automation rules (currently we're just using the updateView() resolver, which we should still keep).
  • Update frontend

Restrict access to S3 and serve images through API

Right now the images themselves are unprotected: if someone knew an ID they'd be able to request it.

To fix, create a resolver that checks authentication, reads the image data from the S3 bucket, encodes it in base64, and returns it to the front end in a JSON object.
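
A sketch of that resolver using the AWS SDK v2; the bucket env var, key argument, and context shape are assumptions:

const S3 = require('aws-sdk/clients/s3');
const s3 = new S3();

async function image(_, { key }, context) {
  if (!context.user) throw new Error('Not authenticated'); // auth gate first
  const obj = await s3
    .getObject({ Bucket: process.env.IMAGES_BUCKET, Key: key })
    .promise();
  // Body is a Buffer, so base64-encode it for the JSON response.
  return { mimeType: obj.ContentType, data: obj.Body.toString('base64') };
}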

Fix race condition issue when updating objects

There are a couple places where animl-frontend requests two different updates to the same image in rapid succession.

Essentially we've created a race condition, and MongoDB blocks the second update by throwing a versioning error, e.g.: "Error: VersionError: No matching document found for id <_id> version 2". Because the second request is often to set the object.locked property to true, this most commonly manifests as objects not locking after being validated. I'm not yet sure how many times this has occurred. Relevant discussions of the issue:

I think the solution is to use Model.updateOne() or Model.findOneAndUpdate() rather than Model.find() and Model.save() in the Image model's updateObject() function. Good documentation and explanation of the difference between .save() and .updateOne() can be found here, and here.

Things to test (see the sketch after this list):

  • how to update a field in a nested array (i.e., a single object in an image's array of objects)?
  • how to update multiple fields at once (because we might have multiple object diffs to set)?
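
A sketch answering both questions with one atomic update: the positional $ operator targets the matched element in the objects array, and a single $set can carry multiple fields (the field names are assumptions):

await Image.findOneAndUpdate(
  { _id: imageId, 'objects._id': objectId },
  {
    $set: {
      'objects.$.locked': true,          // a field in a nested array element
      'objects.$.labels': updatedLabels, // a second field in the same $set
    },
  },
  { new: true }
);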

Authorization and multi-tenant user management system

The goal is to implement an auth and user management system that supports the following: 

  • Separate, siloed “projects”, in which read access to images and all other permissions are limited to users who are members of that project
  • Tiered user roles, to further limit/grant permissions within each project:
    • Product Owner/Super user/admin: Natty/Falk created 2 new projects: one for SCI biosecurity, and one for a large carnivore study on the Dangermond preserve. They can create and add users to those projects and set those users' permission levels.
    • Project Manager/Project admin: Juli, the NPS biosecurity manager (and probably Natty & Falk as well), are admins for the SCI biosecurity project; they can add users, configure inference pipelines and other automation rules, see all images that belong to that project, and edit labels. They can create and edit views. If the data model requires that cameras or base stations be registered to projects, the project admins can do this. They can also upload images directly from their computers.
    • Project Contributor: Miles added this one, but I am not sure we need it. Essentially a role permitting image/asset write access.
    • Project Member/Project reviewer: Will is an NPS biosecurity intern for the summer, and part of his job is to help review and edit labels. He will need read access to all images within a project and write access to the labels, and he can create views to help with his review workflow. He cannot edit inference pipelines or create new users.
    • Project Observer: I'm not 100% sure we need this one, but it's basically a read-only role. Perhaps this is useful for ecologists or researchers outside of our organization who want to consume the reviewed, validated data to build a population model, but whom we don't necessarily want editing any labels.

We will utilize Cognito groups to organize both the projects and roles (a sketch of parsing these groups follows the list below).

  • Each group name will contain both the project name/id and role, e.g.: animl/sci_biosecurity/project_owner
  • When a user is authenticated by Cognito, the returned ID token will include a list of all groups (project + role combinations) they are a member of.
  • The front end parses out the projects and roles the user is a member of and builds a list of projects in state, allowing the user to navigate between projects so they know what project they're "logged into" and acting on behalf of.
  • Front end will also show/hide/disable certain functionality based on role.
  • All requests to the API will contain the ID token, but we also need to figure out a way to pass in the current group the user is acting on behalf of... perhaps a custom header?
  • The animl-api resolvers (at the model level) will be responsible for role gatekeeping (making sure the user's role has permission to perform the action)
  • The siloing of data by project will happen at the DB query/filtering level: every entity/record in the DB will be associated with a project, and we will simply append "project = usersCurrentProjectId" to all queries.
  • This requires associating all new entities with a project when they're created.
    • For manually created records, like a view or automation rule, the project would be whatever project the user is currently "logged into"
    • For automatically created image records, users must first "register" that particular camera's serial number with their project. They would use the UI to request adding a new camera; the API would then check that the serial number isn't already associated with a different project, and if not, it associates it with the user's current project, and all new images coming into the system from that camera will be tagged with that project.
    • Future ideas: implement the ability for projects to "release"/"unregister" a serial number in case the camera needs to be moved to a different project, freeing it up for a different project to grab it
    • also maybe implement a way for different projects to request access to those cameras' images or some subset of them... so there's a primary owner of each image but maybe a list of other projects that have read-only access or something? Not sure how that would work exactly.
    • Not sure we need to associate projects w/ label records. It kind of depends on whether an image can be viewed or owned by more than one project. If no, then we don't really need to; if yes, then maybe we do, as different projects might have different labeling goals/needs.
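
A sketch of parsing project + role pairs out of the Cognito ID token's groups claim, assuming group names of the form 'animl/sci_biosecurity/project_owner' as described above:

function parseGroups(idTokenPayload) {
  // Cognito surfaces group membership in the 'cognito:groups' claim.
  const groups = idTokenPayload['cognito:groups'] || [];
  return groups.map((group) => {
    const [org, projectId, role] = group.split('/');
    return { org, projectId, role };
  });
}

// e.g. parseGroups(token)
// => [{ org: 'animl', projectId: 'sci_biosecurity', role: 'project_owner' }]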
