Protect sync button from multiple presses

Tasks

Check time difference from last sync
Set disabled on button if < 15 minutes
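Here is a minimal Python sketch of the 15-minute check. It assumes the timestamp of the most recent sync is available as a timezone-aware UTC `datetime`. The function and constant names are hypothetical, not existing Chowda code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Cooldown between syncs, per the task list above.
SYNC_COOLDOWN = timedelta(minutes=15)


def sync_button_disabled(
    last_sync: Optional[datetime], now: Optional[datetime] = None
) -> bool:
    """Return True if the Sync button should be rendered as disabled.

    `last_sync` is assumed to be a timezone-aware UTC datetime of the most
    recent sync, or None if no sync has run yet.
    """
    if last_sync is None:
        return False  # never synced: allow a sync
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_sync < SYNC_COOLDOWN
```

The template could render `disabled` on the button when this returns True. The sync endpoint should still enforce the same check on the server, because a disabled button alone does not stop repeated requests.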

Sync button: hide from non admins

Hide sync button from non admins

Tasks

Pass UserToken into template
Check in template for 'admin' in user.roles

IngestFlow

Create IngestFlow to ingest all assets from SonyCi

PEP 621

TL;DR: Poetry predates the Python standard for project metadata in pyproject.toml (PEP 621), so it stores metadata in a (now) non-standard way.

This will allow us to use Python's [project.optional-dependencies] table, so we can install "extra" dependencies with pip and won't need an extra package manager for normal (non-developer) images.

e.g.:

pip install chowda[test]

Todo

PEP 621

Migrate pyproject.toml from poetry to PDM
Update docker build stages
Update pytest action for new install process
Update docs action for new install process
Publish to pypi

Because

As a Developer
I can use a migration tool to

Tasks

create / set up Chowda db for tests
create / set up Chowda db for local development
use of SQLModel.metadata.create_all(engine) is removed

In order to

have a single tool for creating / setting up database in test, development, and production environments
have a production-ready tool for managing DB schema changes without data loss or wholesale DB reset operations

avoid concurrency issues in GitHub CI caused by SQLModel.metadata.create_all(), which fails like this when it runs concurrently:

ERROR tests/test_app.py - sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(mediatype, 2200) already exists.

Done when

User story is satisfied (all boxes checked)

Additional context

Most sources recommend Alembic as the migration tool for SQLAlchemy, and therefore for SQLModel, which is built on SQLAlchemy.

Dashboard: Sync button

Add sync button to dashboard.

Does not need to work.

Automate production migration on deployment

(Copied from #76)

k8s migrations

After some research, a fully automated k8s migration process seems more complicated than we initially anticipated:

Strategy

On deploy:

(optional) Backup database
Scale down Chowda to 0 pods
Perform migrations

On success:

Update Chowda deployment image
Scale back up to n pods

On failure:

Rollback db
Scale back up to n pods
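The control flow of this strategy can be sketched as follows. Each step is passed in as a callable. These callables are hypothetical, and in practice each might wrap a kubectl or database command. Passing them in keeps the step order and the failure branch testable on their own.

```python
from typing import Callable, Optional

# A deploy step: a zero-argument callable that raises on failure.
Step = Callable[[], None]


def deploy_with_migrations(
    scale_down: Step,
    migrate: Step,
    update_image: Step,
    scale_up: Step,
    rollback_db: Step,
    backup_db: Optional[Step] = None,
) -> bool:
    """Run the deploy strategy above; return True if migrations succeeded."""
    if backup_db is not None:
        backup_db()      # (optional) back up the database first
    scale_down()         # scale Chowda to 0 pods so nothing writes mid-migration
    try:
        migrate()
    except Exception:
        rollback_db()    # on failure: roll the DB back ...
        scale_up()       # ... and bring the old image back up
        return False
    update_image()       # on success: switch to the new image ...
    scale_up()           # ... and scale back up to n pods
    return True
```

In the failure branch the pods come back up without an image change, so the old code runs against the rolled-back schema.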

Problems

This simplistic strategy has some drawbacks:

Downtime for Chowda (not a big problem)
Can't be run as an init container in the Chowda deployment, because multiple replicas could each try to run the migrations at the same time

Options

bash script
- Using kubectl
- Run migrations manually, when needed
Metaflow
- Running in argo, attached to argo-events
- Needs special permissions to change kube deployments
argo-cd
- Attach to resource hooks: Sync or PreSync
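For the bash-script option, the kubectl calls could look roughly like the sketch below. It is written in Python to match the rest of the project. Several details are assumptions: the deployment name `chowda`, the container name `chowda`, and a one-off `chowda-migrate` pod running `alembic upgrade head`. Database credentials and namespace flags are omitted.

```python
import subprocess
from typing import List

DEPLOYMENT = "deployment/chowda"  # assumed deployment name
CONTAINER = "chowda"              # assumed container name in that deployment


def scale(replicas: int) -> List[str]:
    return ["kubectl", "scale", DEPLOYMENT, f"--replicas={replicas}"]


def migrate(image: str) -> List[str]:
    # One-off pod running Alembic from the new image; assumes the image has
    # alembic installed and can reach the database (env/secrets omitted).
    return [
        "kubectl", "run", "chowda-migrate", f"--image={image}",
        "--restart=Never", "--rm", "-i", "--",
        "alembic", "upgrade", "head",
    ]


def set_image(image: str) -> List[str]:
    return ["kubectl", "set", "image", DEPLOYMENT, f"{CONTAINER}={image}"]


def manual_migration(image: str, replicas: int, dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, execute) the manual migration commands."""
    commands = [scale(0), migrate(image), set_image(image), scale(replicas)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stops at the first failing command
    return commands
```

With the default `dry_run=True` it only returns the commands, which makes them easy to review before anything touches the cluster.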

Tasks

Create ingress yml
Point chowda.wgbh-mla.org to cluster

Add database migrations to deployment

Tasks

Add DB migration tool #58

🐋 Dockerfile migrations #77
Run migrations in production
Automate production migration on deployment #108

Display sync history on Dashboard

As an Admin
When I view the dashboard
I can see a list of entries where

each entry represents an attempt to sync Chowda with Sony Ci
each entry shows the date/time of the attempt, and whether the attempt succeeded or failed
entries are ordered by date/time in descending order
entries are limited to the 10 most recent attempts.
In order to determine whether Chowda has recently been fully synced with Sony Ci
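Here is a sketch of the ordering and limit rules. It assumes each sync attempt is stored with a start time and a success flag. `SyncAttempt` and its fields are hypothetical. In a database query the same rules would be an `ORDER BY ... DESC LIMIT 10`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class SyncAttempt:
    # Hypothetical record of one sync attempt; field names are assumptions.
    started_at: datetime
    succeeded: bool


def recent_sync_history(attempts: Iterable[SyncAttempt], limit: int = 10) -> List[SyncAttempt]:
    """Newest-first list of at most `limit` sync attempts."""
    return sorted(attempts, key=lambda a: a.started_at, reverse=True)[:limit]
```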

Alerting Clammer to costs of batch runs

Because

As a Clammer,
When I have defined a batch and I am just about to "Start" it and I am warned that what I am about to do may cost money,
I am given some very rough estimate (either in dollar amount or just relative to the cost of other batches) of how expensive the batch would be to run.
Rationale: Some batches will be larger than others, and pipelines vary a lot in how computationally intensive they are. It would be easy to start an enormously expensive batch without realizing it, so it would be very helpful to know how large a batch is before starting it.

Done when

The Chowda interface presents some representation of the expense at the appropriate moment.

Additional context

A rough estimate, or even a bit of information that the Clammer can use to make a rough estimate, would be the most valuable thing.
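One way to get a very rough, relative number is sketched below. It rests on loudly stated assumptions. Cost is taken to scale with the number of media files times a per-app weight, and the result is compared with the median of earlier batches. The weights are hypothetical inputs (for example, average minutes per file from past runs). Nothing here reflects real pipeline costs.

```python
from statistics import median
from typing import List, Optional


def estimate_batch_cost(n_media_files: int, app_weights: List[float]) -> float:
    """Rough cost in arbitrary units: files x summed per-app weight.

    `app_weights` is a hypothetical cost factor for each app in the pipeline.
    """
    return n_media_files * sum(app_weights)


def relative_to_history(estimate: float, past_estimates: List[float]) -> Optional[float]:
    """How many times larger this batch is than the median past batch."""
    if not past_estimates:
        return None  # nothing to compare against yet
    m = median(past_estimates)
    return estimate / m if m else None
```

A relative figure such as "about 2x a typical batch" avoids committing to dollar amounts, which fits the "very rough estimate" the story asks for.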

Change row count values

As a Clammer, when I am looking at a list view
I can select the number of rows displayed: 10, 25, 100, 500, 2000, all
In order to more efficiently navigate the section.
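This sketch turns the selected option into a query limit, where `None` stands for "all". The default of 10 and the function name are assumptions.

```python
from typing import Optional

# Allowed page sizes from the story; "all" means no limit.
ROW_COUNT_OPTIONS = ("10", "25", "100", "500", "2000", "all")
DEFAULT_ROW_COUNT = 10  # assumed default


def parse_row_count(value: Optional[str]) -> Optional[int]:
    """Map a selected option to a LIMIT value; None means 'all rows'.

    Unknown or missing values fall back to the default.
    """
    if value not in ROW_COUNT_OPTIONS:
        return DEFAULT_ROW_COUNT
    return None if value == "all" else int(value)
```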

Batch sort by ID descending

On the Batch list page
Sort the batches by id
Descending order

Use session for flash and error msgs to UI

As a developer
I can use the session to temporarily store flash and error messages for UI display
in order to avoid passing them with URL params
which keeps URLs cleaner
and allows flash/error messages to be seen just once
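Here is a minimal sketch of one-time flash messages over a dict-like session, such as the `request.session` provided by Starlette's `SessionMiddleware`. The helper names and the `_flashes` key are hypothetical. Messages are popped when read, which is what makes each one display only once.

```python
from typing import Any, Dict, List, MutableMapping

FLASH_KEY = "_flashes"  # hypothetical session key


def flash(session: MutableMapping[str, Any], message: str, category: str = "info") -> None:
    """Queue a message in the session for the next rendered page."""
    session.setdefault(FLASH_KEY, []).append({"category": category, "message": message})


def pop_flashes(session: MutableMapping[str, Any]) -> List[Dict[str, str]]:
    """Return and clear queued messages, so each is shown only once."""
    return session.pop(FLASH_KEY, [])
```

`SessionMiddleware` keeps session data in a signed cookie, so messages should stay short and JSON-serializable.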

Dashboard page

Done when

/admin shows Dashboard page

Display mmif better

Done When

Done when mmif is pretty.

Update ClamsEvents view

Done When

As a Chowda Admin
I can view Clams Events
and for every Media File I can see:

which batch the media file is part of
which pipeline the file was run through
which app within the pipeline was run
the resulting status of having run the media file through that app

in order to monitor individual media files as they go through a pipeline

`GUID` Resolution

Tasks

Filter out non-guids
Remove suffixes
Add DB links: MediaFile <--> SonyCiAsset
Update IngestFlow to upsert MediaFile
Add documentation example
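This sketch covers the "filter out non-guids" and "remove suffixes" steps. It assumes GUIDs look like `cpb-aacip-` followed by hyphen-separated lowercase alphanumeric groups, as in the asset names listed under De-duplicate `MediaFile.assets`. The regex and function name are assumptions, and real GUIDs may need a broader pattern.

```python
import re
from typing import Optional

# Assumed GUID shape: "cpb-aacip-" + lowercase alphanumeric groups joined by
# hyphens, e.g. "cpb-aacip-00d35d49377". Adjust if real GUIDs differ.
GUID_RE = re.compile(r"cpb-aacip-[0-9a-z]+(?:-[0-9a-z]+)*")


def resolve_guid(filename: str) -> Optional[str]:
    """Strip suffixes such as ".mxf.sha256" or "_proxy.mp4" from a Sony Ci
    asset name and return the bare GUID, or None for non-GUID names."""
    match = GUID_RE.match(filename)
    return match.group(0) if match else None
```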

Pipelines: hide create features from non-admin

Tasks

BaseModelView Clammers can CRUD
AdminModelView Admins can CRUD

Save most recent row count

As a User, when I select the number of rows to show in a section and then later return to that section,
I still see the number I previously selected (e.g., 100)
In order to not have to change it every time.

Forthcoming: jowilf/starlette-admin#298
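Until that lands upstream, one interim approach is to remember the choice per section in the session. This is a hypothetical approach, not an existing starlette-admin feature. The helper names and session key are assumptions.

```python
from typing import Any, MutableMapping

ROW_COUNT_KEY = "row_counts"  # hypothetical session key


def remember_row_count(session: MutableMapping[str, Any], section: str, value: str) -> None:
    """Store the row count the user picked for a list section (e.g. "batch")."""
    session.setdefault(ROW_COUNT_KEY, {})[section] = value


def recalled_row_count(session: MutableMapping[str, Any], section: str, default: str = "10") -> str:
    """Return the previously selected row count for `section`, or `default`."""
    return session.get(ROW_COUNT_KEY, {}).get(section, default)
```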

Create Batch from existing Batch

Done when

As a Chowda Admin
I can create a new Batch from a subset of items in an existing batch
in order to
re-run failed items
or re-run a batch with some additional items
or re-run a batch that succeeded with different parameters
etc.

Broken URL for media file links

Allow Starting a Batch for App Bars Detection

Done when

Can create a Batch from Media File GUIDs
Creating the batch publishes an app-barsdetection event (or whatever it's named)

De-duplicate `MediaFile.assets`

Duplicates

They aren't all duplicates, and the first one isn't always the media.

https://chowda.wgbh-mla.org/admin/media-file/detail/cpb-aacip-00d35d49377

This media file has 12 assets:

cpb-aacip-00d35d49377.mxf.sha256 x2
cpb-aacip-00d35d49377.mp4 (video) x3
cpb-aacip-00d35d49377.mxf.mediainfo.xml x2
cpb-aacip-00d35d49377.mp4.mediainfo.xml
cpb-aacip-00d35d49377.mxf x2
cpb-aacip-00d35d49377_proxy.mp4.mp4
cpb-aacip-00d35d49377.mp4.sha256

Tasks

IngestFlow: kind=video
IngestFlow: error on multiple media files
📼 Asset check mario#16
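This sketch covers de-duplication and the "error on multiple media files" rule, using a minimal stand-in `Asset` type (hypothetical). It treats assets with the same name as duplicates, which is a simplification. As noted above, same-named entries are not always true duplicates, so a real implementation might compare Sony Ci IDs or checksums instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Asset:
    # Minimal stand-in for a Sony Ci asset; field names are assumptions.
    name: str
    kind: str  # e.g. "video" or "other"


def dedupe_assets(assets: List[Asset]) -> List[Asset]:
    """Drop repeated assets with the same name, keeping the first occurrence."""
    seen = set()
    unique = []
    for asset in assets:
        if asset.name not in seen:
            seen.add(asset.name)
            unique.append(asset)
    return unique


def media_asset(assets: List[Asset]) -> Optional[Asset]:
    """Return the single video asset; raise if there is more than one."""
    videos = [a for a in dedupe_assets(assets) if a.kind == "video"]
    if len(videos) > 1:
        raise ValueError(f"multiple video assets: {[a.name for a in videos]}")
    return videos[0] if videos else None
```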

MediaFile -> AAPBAsset

Tasks

Rename model MediaFile -> AAPBAsset
Rename views
Migrate database

Write CLAMS app results to Chowda database for tracking progress

Done when

As a Chowda Admin
When a media file has been processed by a pipeline
I can see the results including:

batch
pipeline
app
result
link to mmif

In order to check the status of individual media files
and access output mmif

Bug: Batches detail - GUID links broken

From a batch detail page
When I click on a media-file GUID
It redirects me to media-file/detail/

It should redirect me to media-file/detail/{{ media.guid }}

Pydantic 2.0

Pydantic 2

Pydantic 2.0 was released last week.

Migration

https://docs.pydantic.dev/latest/migration/

A code transformation tool is available:

pip install bump-pydantic

Tasks

Read Migration guide
Use bump-pydantic tool (if necessary)
Update models (if necessary)
Pin pydantic version to ~2.0
~~Run alembic migrations~~

Create batch via GUID list from txt file

Done when
A Chowda Batch can be created from a list of GUIDs in an uploaded text file

Tasks

Override Batch validate
Change media_files to text area
Return validation errors to user
Add tests
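This sketch parses the uploaded or pasted text into GUIDs and collects the entries that fail validation, which could then be returned to the user. The accepted separators and the GUID pattern are assumptions.

```python
import re
from typing import List, Tuple

# Assumed GUID shape, e.g. "cpb-aacip-00d35d49377"; adjust if needed.
GUID_RE = re.compile(r"cpb-aacip-[0-9a-z]+(?:-[0-9a-z]+)*")


def parse_guid_list(text: str) -> Tuple[List[str], List[str]]:
    """Split text into (valid GUIDs, invalid entries).

    Accepts GUIDs separated by newlines, spaces, or commas, ignores blanks,
    and drops duplicate GUIDs while keeping their first-seen order.
    """
    valid: List[str] = []
    invalid: List[str] = []
    seen = set()
    for token in re.split(r"[\s,]+", text):
        if not token:
            continue  # leading/trailing separators produce empty tokens
        if GUID_RE.fullmatch(token):
            if token not in seen:
                seen.add(token)
                valid.append(token)
        else:
            invalid.append(token)
    return valid, invalid
```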

Filter MediaFile list by list of guids

Done when

As a Chowda Admin (like Kevin)
When viewing Media Files list interface
I can filter by a list of GUIDs
In order to run exports on my custom GUID list

Abort CLAMS batch

Done when

As a Chowda admin
when a batch is running
I can abort it
in order to recover from problems
and save time and resources.

Tasks

Pass batch.id to event and save as metaflow Parameter
In pipeline, update Chowda db with run.id
Give Chowda permissions to stop argo run
Create action for Batch Abort

Feature: Add page title to header

As a user, when viewing any page,
I should be able to see the title of that page in the browser tab
So my browsing history is easier to read.

API: Sync

Create /api/sync to resync all assets from SonyCi

Get all assets from SonyCi, update db
"Remove" chowda assets that are not in SonyCi
- Keep db record, flag as deleted in SonyCi
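The reconciliation step can be sketched with an in-memory dict standing in for the database. Every asset returned by Sony Ci is upserted, and local records that no longer appear there are flagged rather than deleted. `LocalAsset`, its `deleted_in_sonyci` field, and the shape of the remote items are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class LocalAsset:
    # Hypothetical local record; the flag marks assets gone from Sony Ci.
    id: str
    name: str
    deleted_in_sonyci: bool = False


def reconcile(db: Dict[str, LocalAsset], remote: Iterable[dict]) -> Dict[str, int]:
    """Upsert every remote asset and flag (not delete) local ones missing upstream.

    `remote` items are assumed to be dicts with "id" and "name" keys.
    Returns counts of created, updated, and newly flagged records.
    """
    counts = {"created": 0, "updated": 0, "flagged": 0}
    seen = set()
    for item in remote:
        seen.add(item["id"])
        record = db.get(item["id"])
        if record is not None:
            record.name = item["name"]
            record.deleted_in_sonyci = False  # it exists upstream again
            counts["updated"] += 1
        else:
            db[item["id"]] = LocalAsset(id=item["id"], name=item["name"])
            counts["created"] += 1
    for asset_id, record in db.items():
        if asset_id not in seen and not record.deleted_in_sonyci:
            record.deleted_in_sonyci = True  # keep the row, just flag it
            counts["flagged"] += 1
    return counts
```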

Tasks

Create route /api/sony_ci/sync
Publish event
Return error if event is not published successfully
Test: success
Test: fail

BUG: Sync Now button does not display error correctly

As an Admin
When I go to /admin
And click Sync Now
And there was an error starting the Sync
Then I am redirected back to /admin
And I can see a user-friendly error message.

Display Pipeline Statuses

Done When

Each batch accurately displays the pipeline run information from Mario.

Statuses are

unstarted
running
successful / failed

Create a batch from a collection

As a clammer, I can:

Tasks

Select multiple collections and create a single batch from their media_files
Select multiple collections and create multiple batches from their media_files
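Both options can be sketched over a hypothetical mapping from collection name to its media file GUIDs. The single-batch variant drops GUIDs that appear in more than one collection. Whether duplicates should be dropped is an assumption.

```python
from typing import Dict, List


def single_batch(collections: Dict[str, List[str]]) -> List[str]:
    """One batch with the media files of all selected collections,
    duplicates removed, first-seen order kept."""
    return list(dict.fromkeys(guid for files in collections.values() for guid in files))


def batch_per_collection(collections: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """One batch per selected collection, keyed by collection name."""
    return {name: list(dict.fromkeys(files)) for name, files in collections.items()}
```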

Bug: Batches list page breaks

The /batches/list page errors with:

DataTables warning: table id=dt - Requested unknown parameter 'media_files' for row 0, column 4. For more information about this error, please see http://datatables.net/tn/4

BUG: Dashboard should not error if no Sync has been run yet

To reproduce:

Start Chowda with no IngestFlow having been run yet.
Go to Dashboard page at /admin

Expected result

See a message indicating that no Sync has yet been run

Actual result

Error

Queuing system in place

SQS set up

Start Batch should add events to the queue, which get pulled off as the cluster is ready for more jobs.
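The pull model can be sketched with the standard-library `queue.Queue` standing in for SQS. Starting a batch enqueues one event per media file, and a worker only takes as many events as the cluster has free slots. The function names and the `capacity` parameter are hypothetical. With SQS, taking an item roughly corresponds to receiving a message and deleting it once its job has been submitted.

```python
import queue
from typing import Callable, List


def enqueue_batch(q: "queue.Queue[str]", guids: List[str]) -> None:
    """'Start Batch': push one event (here just a GUID) per media file."""
    for guid in guids:
        q.put(guid)


def drain(q: "queue.Queue[str]", capacity: int, run: Callable[[str], None]) -> int:
    """Pull at most `capacity` events, as if the cluster had that many free
    slots. Returns how many jobs were started."""
    started = 0
    while started < capacity:
        try:
            guid = q.get_nowait()
        except queue.Empty:
            break  # nothing left to run
        run(guid)
        q.task_done()
        started += 1
    return started
```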

SonyCi Webhook -> Chowda API -> database

When a new asset is added, modified, or deleted in SonyCi,
It should fire a webhook request to chowda.wgbh-mla.org/api/sonyci
And update the chowda database.
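This sketch shows the core logic of the webhook handler against an in-memory stand-in for the database. The payload shape is an assumption, not Sony Ci's documented format: `type` values such as `asset.created`, plus an `asset` object with an `id`. Deletions flag the record instead of removing it, in line with the API: Sync issue.

```python
from typing import Any, Dict

# Hypothetical payload: {"type": "asset.created" | "asset.updated" |
# "asset.deleted", "asset": {"id": ..., ...}}. Sony Ci's format may differ.


def handle_sonyci_webhook(db: Dict[str, Dict[str, Any]], payload: Dict[str, Any]) -> str:
    """Apply one webhook event to a dict standing in for the Chowda DB."""
    event = payload.get("type")
    asset = payload.get("asset") or {}
    asset_id = asset.get("id")
    if not asset_id:
        raise ValueError("payload has no asset id")
    if event in ("asset.created", "asset.updated"):
        record = db.setdefault(asset_id, {})
        record.update(asset)
        record["deleted_in_sonyci"] = False
        return "upserted"
    if event == "asset.deleted":
        if asset_id in db:
            db[asset_id]["deleted_in_sonyci"] = True  # keep the record, flag it
        return "flagged"
    raise ValueError(f"unsupported event type: {event!r}")
```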

MediaFile edit page missing batch relationships

When I edit a MediaFile attached to a batch,
the edit page does not include the batch relationship.

Export all rows

As a Clammer
When I am in any section and I choose “Export”
I can get the full export, not limited to the number of rows (e.g., 10, 100, etc.) currently shown
In order to have exports suitable for large batches.

User accounts

Done When
As a developer, I can set up user accounts with admin role

Postgres tests

Migrate from SQLite -> Postgres for all environments, including test

🏭 `pydantic-factories` -> `polyfactory`

Because

pydantic-factories has rebranded to polyfactory

Done when

Migrate to the new name in all files that use it.

Chowda displays status of media file in a batch

Done When

As a clammer, when I go to the view batch page I can see a list of media files with their link and current status within the metaflow run.

Database backups

Create db backups

Tasks

Compare RDS vs internal postgres
Enable backups
Test backups

Push IngestFlow to Argo

Add Metaflow code that generates an Argo Events Sensor that triggers a Whisper workflow

Tasks

Push to argo-workflows
Attach to argo-events

:bug: Batches: Can't update `media_files` list

Batches

When a clammer edits a Batch with existing MediaFiles, we get an Internal Server Error with:

File "/Users/harpo/gbh/clams/chowda/.venv/lib/python3.11/site-packages/sqlalchemy/orm/identity.py", line 151, in add
    raise sa_exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: Can't attach instance <MediaFile at 0x108274b90>; another instance with key (<class 'chowda.models.MediaFile'>, ('cpb-aacip-000043a51d5',), None) is already present in this session.

Collections

This does not happen when editing a Collection with media_files, which uses the starlette-admin default RelatedField

wgbh-mla / chowda