chowda's People

Contributors

afred, foglabs, github-actions[bot], mrharpo

chowda's Issues

Protect sync button from multiple presses

Tasks

Sync button: hide from non admins

Hide sync button from non admins

Tasks

IngestFlow

Create IngestFlow to ingest all assets from SonyCi

🌶 PEP 621

PEP 621

TLDR: Poetry predates the Python standard for metadata in pyproject.toml, so it does things in a (now) non-standard way.

Adopting PEP 621 will allow us to use Python's [project.optional-dependencies], so we can install "extra" dependencies with pip and not need an extra package manager for normal (non-developer) images.

e.g:

pip install chowda[test]

Todo

PEP 621

Add DB migration tool

Because

As a Developer
I can use a migration tool

In order to

  • have a single tool for creating / setting up database in test, development, and production environments
  • have a production-ready tool for managing DB schema changes without data loss or wholesale DB reset operations
  • avoid concurrency issues in Github CI due to using SQLModel.metadata.create_all()
    ERROR tests/test_app.py - sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "pg_type_typname_nsp_index"
    DETAIL:  Key (typname, typnamespace)=(mediatype, 2200) already exists.
    

Done when

User story is satisfied (check boxes)

Additional context

Most sources say that Alembic is the best tool for SQLAlchemy, and thus also SQLModel.

Automate production migration on deployment

(Copied from #76)

k8s migrations

After some research, a fully automated k8s migration process seems more complicated than we initially anticipated:

Strategy

On deploy:

  • (optional) Backup database
  • Scale down Chowda to 0 pods
  • Perform migrations

On success:

  • Update Chowda deployment image
  • Scale back up to n pods

On failure:

  • Rollback db
  • Scale back up to n pods
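
The strategy above can be sketched as a small orchestration function. This is only an outline under assumptions: the callables (scale, migrate, rollback, update_image, backup) are hypothetical hooks that would wrap kubectl or a migration tool, and none of them exist in chowda.

```python
from typing import Callable, Optional


def deploy_with_migration(
    scale: Callable[[int], None],
    migrate: Callable[[], None],
    rollback: Callable[[], None],
    update_image: Callable[[], None],
    replicas: int,
    backup: Optional[Callable[[], None]] = None,
) -> bool:
    """Run the deploy strategy; return True if the migration succeeded.

    All callables are hypothetical hooks supplied by the caller.
    """
    if backup is not None:
        backup()  # optional database backup
    scale(0)  # stop every pod so nothing touches the db mid-migration
    try:
        migrate()
    except Exception:
        rollback()  # restore the previous database state
        return False
    else:
        update_image()  # only roll out the new image after a clean migration
        return True
    finally:
        scale(replicas)  # scale back up on success and on failure
```

Passing the steps in as callables keeps the ordering logic testable without a cluster; the same sequence could equally be written as a bash script around kubectl.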

Problems

This simplistic strategy has some drawbacks:

  • Downtime for Chowda (not a big problem)
  • Can't be run as init container in Chowda deployment, because multiple instances could try to run the migration at the same time

Options

  • bash script
    • Using kubectl
    • Run migrations manually, when needed
  • Metaflow
    • Running in argo, attached to argo-events
    • Needs special permissions to change kube deployments
  • argo-cd

Create ingress

Tasks

Display sync history on Dashboard

As an Admin
When I view the dashboard
I can see a list of entries where

  • each entry represents an attempt to sync Chowda with Sony Ci
  • each entry shows the date/time of the attempt, and whether the attempt succeeded or failed
  • entries are ordered by date/time in descending order
  • entries are limited to the 10 most recent attempts

In order to determine whether Chowda has recently been fully sync'd with Sony Ci
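
The ordering and limit described above could be sketched like this. The SyncAttempt record and its field names are assumptions for illustration; the real model may look different.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SyncAttempt:
    # Hypothetical record shape; field names are assumptions.
    attempted_at: datetime
    succeeded: bool


def recent_attempts(attempts: list[SyncAttempt], limit: int = 10) -> list[SyncAttempt]:
    """Return the newest attempts first, capped at `limit` entries."""
    newest_first = sorted(attempts, key=lambda a: a.attempted_at, reverse=True)
    return newest_first[:limit]
```

In a database-backed view the same result would normally come from an ORDER BY ... DESC LIMIT 10 query rather than sorting in Python.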

Alerting Clammer to costs of batch runs

Because

As a Clammer,
When I have defined a batch and am about to "Start" it, I am warned that running it may cost money,
and I am given a very rough estimate (either as a dollar amount or relative to the cost of other batches) of how expensive the batch would be to run.
Rationale: Some batches will be larger than others, and pipelines vary a lot in how computationally intensive they are. It would be easy to start a batch that is far larger than intended, so it would be very helpful to know how large it is.

Done when

The Chowda interface presents some representation of the expense at the appropriate moment.

Additional context

A rough estimate, or even a bit of information that the Clammer can use to make a rough estimate, would be the most valuable thing.
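
One possible shape for a relative estimate, as a sketch only: multiply the number of media files by a per-pipeline weight. The pipeline names and weights below are placeholders, not measured costs; real values would have to come from observed runs.

```python
# Hypothetical relative weights per pipeline; placeholders, not measurements.
PIPELINE_WEIGHT = {"light": 1.0, "heavy": 10.0}


def relative_cost(num_media_files: int, pipeline: str) -> float:
    """Return a unitless cost score, useful only for comparing batches."""
    return num_media_files * PIPELINE_WEIGHT[pipeline]
```

Under these assumed weights, a 100-file batch on the "heavy" pipeline scores ten times higher than the same batch on the "light" one, which is the kind of relative signal the Clammer could be shown before pressing "Start".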

Change row count values

As a Clammer, when I am looking at a list view
I can select the number of rows displayed: 10, 25, 100, 500, 2000, all
In order to more efficiently navigate the section.

Use session for flash and error msgs to UI

As a developer
I can use the session to temporarily store flash and error messages for UI display
in order to avoid passing them with URL params
which keeps URLs cleaner
and allows flash/error messages to be seen just once
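
A minimal sketch of the idea, assuming the session behaves like a dict (as Starlette's request.session does when SessionMiddleware is installed). The key name "_flash" and both function names are hypothetical.

```python
from typing import MutableMapping

FLASH_KEY = "_flash"  # hypothetical session key


def flash(session: MutableMapping, message: str, category: str = "info") -> None:
    """Queue a message in the session for the next page render."""
    session.setdefault(FLASH_KEY, []).append(
        {"category": category, "message": message}
    )


def pop_flashes(session: MutableMapping) -> list:
    """Return queued messages and remove them, so each is shown only once."""
    return session.pop(FLASH_KEY, [])
```

A view would call flash(request.session, "Sync started") before redirecting, and the template layer would call pop_flashes(request.session) when rendering, leaving the URL free of message parameters.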

Update ClamsEvents view

Done When

As a Chowda Admin
I can view Clams Events
and for every Media File I can see:

  • which batch the media file is part of
  • which pipeline the file was run through
  • which app within the pipeline was run
  • the resulting status of having run the media file through that app

in order to monitor individual media files as they go through a pipeline

`GUID` Resolution

Tasks

Save most recent row count

As a User, when I select the number of rows to show in a section and then later return to that section,
The number I previously selected (e.g., 100) is still selected
In order to not have to change that every time.

Forthcoming: jowilf/starlette-admin#298

Create Batch from existing Batch

Done when

As a Chowda Admin
I can create a new Batch from a subset of items in an existing batch
in order to
re-run failed items
or re-run a batch with some additional items
or re-run a batch that succeeded with different parameters
etc.

Allow Starting a Batch for App Bars Detection

Done when

De-duplicate `MediaFile.assets`

Duplicates

They aren't all duplicates, and the first one isn't always the media.

https://chowda.wgbh-mla.org/admin/media-file/detail/cpb-aacip-00d35d49377

This media file has 12 assets:

  • cpb-aacip-00d35d49377.mxf.sha256 x2
  • cpb-aacip-00d35d49377.mp4 (video) x3
  • cpb-aacip-00d35d49377.mxf.mediainfo.xml x2
  • cpb-aacip-00d35d49377.mp4.mediainfo.xml
  • cpb-aacip-00d35d49377.mxf x2
  • cpb-aacip-00d35d49377_proxy.mp4.mp4
  • cpb-aacip-00d35d49377.mp4.sha256
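
The listing above can be reproduced by counting asset names. This sketch assumes that assets with the same name are duplicates, which may not hold (the issue notes they aren't all duplicates), so a real fix might need to compare asset IDs or checksums instead.

```python
from collections import Counter

GUID = "cpb-aacip-00d35d49377"

# Asset names as listed in the issue, repeated per the "x2" / "x3" counts.
asset_names = (
    [f"{GUID}.mxf.sha256"] * 2
    + [f"{GUID}.mp4"] * 3
    + [f"{GUID}.mxf.mediainfo.xml"] * 2
    + [f"{GUID}.mp4.mediainfo.xml"]
    + [f"{GUID}.mxf"] * 2
    + [f"{GUID}_proxy.mp4.mp4"]
    + [f"{GUID}.mp4.sha256"]
)

counts = Counter(asset_names)
# dict.fromkeys keeps the first occurrence of each name, in order.
unique_names = list(dict.fromkeys(asset_names))
duplicated = {name: n for name, n in counts.items() if n > 1}
```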

Tasks

  1. enhancement ➕
    mrharpo

MediaFile -> AAPBAsset

Tasks

Bug: Batches detail - GUID links broken

From a batch detail page
When I click on a media-file GUID
It redirects me to media-file/detail/

It should redirect me to media-file/detail/{{ media.guid }}

Pydantic 2.0

Pydantic 2

Pydantic 2.0 was released last week.

Migration

https://docs.pydantic.dev/latest/migration/

A code transformation tool is available:

pip install bump-pydantic

Tasks

Create batch via GUID list from txt file

Done when
A Chowda Batch can be created by uploading a text file containing a list of GUIDs
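
A small sketch of parsing such an upload, assuming one GUID per line. The function name is hypothetical, and the file format (blank lines ignored, repeats dropped) is an assumption rather than a documented requirement.

```python
def parse_guids(text: str) -> list[str]:
    """Extract GUIDs from uploaded text, one per line.

    Blank lines are skipped and repeated GUIDs are dropped,
    keeping the first occurrence in file order.
    """
    guids = (line.strip() for line in text.splitlines())
    return list(dict.fromkeys(g for g in guids if g))
```

The resulting list could then be matched against existing MediaFile records before the Batch is created, so unknown GUIDs can be reported back to the user.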

Tasks

Filter MediaFile list by list of guids

Done when

As a Chowda Admin (like Kevin)
When viewing Media Files list interface
I can filter by a list of GUIDs
In order to run exports on my custom GUID list

Abort CLAMS batch

Done when

As a Chowda admin
when a batch is running
I can abort it
in order to recover from problems
and save time and resources.

Tasks

Feature: Add page title to header

As a user, when viewing any page,
I should be able to see the title of that page in the browser title
So I can see my browsing history better.

API: Sync

Create /api/sync to resync all assets from SonyCi

  • Get all assets from SonyCi, update db
  • "Remove" chowda assets that are not in SonyCi
    • Keep db record, flag as deleted in SonyCi
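
The reconciliation described above could be sketched as follows, with plain dicts standing in for the database and the SonyCi asset listing. The deleted_in_sonyci flag name and the record shape are assumptions.

```python
def reconcile(local: dict, remote: dict) -> dict:
    """Merge remote assets into local records, keyed by asset id.

    Assets present in `remote` are inserted or updated. Assets that exist
    only in `local` are kept but flagged as deleted in SonyCi.
    """
    result = {}
    for asset_id, record in remote.items():
        result[asset_id] = {**record, "deleted_in_sonyci": False}
    for asset_id, record in local.items():
        if asset_id not in remote:
            # Keep the db record; only mark it as gone upstream.
            result[asset_id] = {**record, "deleted_in_sonyci": True}
    return result
```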

Tasks

Display Pipeline Statuses

Done When

Each batch accurately displays the pipeline run information from Mario.

Statuses are

  • unstarted
  • running
  • successful / failed

Create a batch from a collection

As a clammer, I can:

Tasks

Bug: Batches list page breaks

The /batches/list page errors with:

DataTables warning: table id=dt - Requested unknown parameter 'media_files' for row 0, column 4. For more information about this error, please see http://datatables.net/tn/4

Queuing system in place

SQS set up

Start Batch should add events to the queue, which get pulled off as the cluster is ready for more jobs.
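
A sketch of that flow, using Python's in-process queue.Queue as a stand-in for SQS (the real service would be accessed through an AWS client, which is not shown). The message shape and function names are assumptions.

```python
import queue


def start_batch(q: queue.Queue, batch_id: int, guids: list) -> None:
    """Enqueue one job message per media file in the batch."""
    for guid in guids:
        q.put({"batch_id": batch_id, "guid": guid})


def pull_jobs(q: queue.Queue, free_slots: int) -> list:
    """Take at most `free_slots` jobs off the queue, in FIFO order."""
    jobs = []
    for _ in range(free_slots):
        try:
            jobs.append(q.get_nowait())
        except queue.Empty:
            break  # nothing left to run
    return jobs
```

The cluster side would call pull_jobs with however many jobs it can currently accept, so the queue absorbs bursts instead of the cluster.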

SonyCi Webhook -> Chowda API -> database

When a new asset is added, modified, or deleted in SonyCi,
It should fire a webhook request to chowda.wgbh-mla.org/api/sonyci
And update the chowda database.
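
A rough sketch of how such an event could be applied to the database. The event shape used here ("type" and "asset" keys, with "added" / "modified" / "deleted" values) is entirely hypothetical and is not the actual SonyCi webhook payload; a dict stands in for the database.

```python
def apply_event(db: dict, event: dict) -> None:
    """Apply one webhook event to `db`, a dict keyed by asset id.

    The event shape is a placeholder, not SonyCi's real payload.
    """
    asset = event["asset"]
    kind = event["type"]
    if kind in ("added", "modified"):
        db[asset["id"]] = asset  # insert or overwrite the record
    elif kind == "deleted":
        db.pop(asset["id"], None)  # ignore deletes for unknown assets
    else:
        raise ValueError(f"unknown event type: {kind}")
```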

Export all rows

As a Clammer
When I am in any section and I choose “Export”
I can get the full export, not limited to the number of rows (e.g., 10, 100, etc.) currently shown
In order to have exports suitable for large batches.

User accounts

Done When
As a developer, I can set up user accounts with admin role

Postgres tests

Migrate from SQLite -> Postgres for all environments, including test

Database backups

Create db backups

Tasks

Push IngestFlow to Argo

Add Metaflow code that generates Argo Events Sensor that triggers a Whisper workflow

Tasks

🐛 Batches: Can't update `media_files` list

Batches

When a clammer edits a Batch with existing MediaFiles, we get an Internal Server Error with:

File "/Users/harpo/gbh/clams/chowda/.venv/lib/python3.11/site-packages/sqlalchemy/orm/identity.py", line 151, in add
    raise sa_exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: Can't attach instance <MediaFile at 0x108274b90>; another instance with key (<class 'chowda.models.MediaFile'>, ('cpb-aacip-000043a51d5',), None) is already present in this session.

Collections

This does not happen when editing a Collection with media_files, which uses the starlette-admin default RelatedField
