Comments (23)

jlewi commented on July 17, 2024

Argo v2.2.0 now supports setting a TTL on workflows via the TTLSecondsAfterFinished field.
https://github.com/argoproj/argo/blob/7864ad36788dc78d035d59ddb27ecd979f7216f4/pkg/apis/workflow/v1alpha1/types.go

We need to do a few things:

  1. Upgrade Argo.
  2. Start setting that on our workflows.
  3. Consider adding an admission controller to ensure it's set.
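
Until an admission controller exists, a lighter-weight check in CI could flag workflow manifests that do not set the TTL. The following is a minimal sketch, not the project's tooling: it assumes workflows are rendered to YAML files and that the field is serialized as `ttlSecondsAfterFinished`; the function names are hypothetical. It is a plain text match rather than a YAML parse.

```shell
# Succeeds if the manifest file contains a numeric ttlSecondsAfterFinished.
# Assumption: the TTL is serialized under this key in rendered YAML manifests.
workflow_has_ttl() {
  grep -Eq '^[[:space:]]*ttlSecondsAfterFinished:[[:space:]]*[0-9]+' "$1"
}

# Prints each manifest (passed as arguments) that is missing the TTL.
# Returns non-zero if at least one manifest is missing it.
workflows_missing_ttl() {
  ttl_status=0
  for ttl_manifest in "$@"; do
    if ! workflow_has_ttl "$ttl_manifest"; then
      echo "$ttl_manifest"
      ttl_status=1
    fi
  done
  return "$ttl_status"
}
```

For example, `workflows_missing_ttl rendered/*.yaml` (where `rendered/` is a hypothetical output directory) would list the offending files and fail the CI step.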

from testing.

jlewi commented on July 17, 2024

Navigating to a specific workflow doesn't seem to work either
http://testing-argo.kubeflow.org/timeline/kubeflow-test-infra/kubeflow-presubmit-tf-serving-image-387-12405f1-511-aec5;tab=workflow

jlewi commented on July 17, 2024

I tried manually deleting the older workflows as follows.

Dump all workflows, sorted by age, to a file:

kubectl get wf -a --no-headers --sort-by='.metadata.creationTimestamp' > /tmp/wf.list

Manually delete the lines corresponding to workflows I don't want to delete.

Then run:

awk '{print $1}' /tmp/wf.list | xargs kubectl delete wf
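
The manual editing step above could be replaced by filtering on the creation timestamp. This is a sketch under stated assumptions rather than a tested procedure for this cluster: `select_older_than` is a hypothetical helper, and it relies on timestamps being in the same UTC RFC 3339 format (as Kubernetes reports them) so that a string comparison orders them correctly.

```shell
# Print the names of workflows created strictly before a cutoff.
# stdin: lines of "<name> <creationTimestamp>", e.g. "wf-1 2018-07-01T12:00:00Z"
# $1:    cutoff timestamp in the same UTC format
# Appending "" forces awk to compare the values as strings.
select_older_than() {
  awk -v cutoff="$1" '($2 "") < (cutoff "") { print $1 }'
}

# Possible usage (left commented out; requires cluster access and GNU date/xargs):
#   kubectl get wf -n kubeflow-test-infra --no-headers \
#     -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp \
#     | select_older_than "$(date -u -d '3 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
#     | xargs -r kubectl delete wf -n kubeflow-test-infra
```

Reviewing the output of `select_older_than` before piping it into `kubectl delete` keeps the safety of the manual approach.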

jlewi commented on July 17, 2024

Deleting the workflows appears to have worked, so the problem seems to be the UI not performing well when there are lots of workflows.

/cc @jessesuen @alexmt

jessesuen commented on July 17, 2024

Argo does not have a GC policy/configuration. It should though.

jlewi commented on July 17, 2024

UI is having problems again.

Currently 697 workflows

wf.list.txt

Delete workflows more than a day old

alexmt commented on July 17, 2024

@jlewi, unfortunately, Argo GC is still not implemented. The UI is not performing well because it does not have pagination (because the Kubernetes API does not have pagination yet).

As a workaround, you might try Argo v2.1.0-alpha1 or later. The UI has been migrated to React and can render many more workflows.

jlewi commented on July 17, 2024

@alexmt Great, thank you so much.

jlewi commented on July 17, 2024

1330 workflows

Deleted workflows older than 3 days to make UI more responsive.

lluunn commented on July 17, 2024

Deleted 1391 workflows older than 3d.

jlewi commented on July 17, 2024

Deleted 307 workflows older than 3d.

jlewi commented on July 17, 2024

1726 workflows
Deleting 1601 workflows that are >= 3d old

jlewi commented on July 17, 2024

Running get wf was having problems on my local machine, but it succeeded when I ran it from within the cluster. I think it's probably because there's so much K8s master traffic.

jlewi commented on July 17, 2024

The number of workflows is currently so large that I'm having trouble getting them and deleting them.

Using labels to subselect some of the workflows seems to help.
Unfortunately, we don't necessarily have good labels on most of our workflows.
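
Deleting by label selector keeps each request small. A minimal sketch, assuming workflows carry some useful label; the helper name `delete_wf_by_label` and the label `workflow=presubmit` in the usage note are hypothetical and not taken from this cluster.

```shell
# Delete the workflows in one namespace that match a label selector.
# $1: namespace, $2: label selector (for example "key=value")
# Refuses to run with an empty selector so that it can never match everything.
delete_wf_by_label() {
  if [ -z "$2" ]; then
    echo "refusing to delete without a label selector" >&2
    return 1
  fi
  kubectl delete wf --namespace "$1" --selector "$2"
}
```

For example, `delete_wf_by_label kubeflow-test-infra workflow=presubmit` would remove only the workflows carrying that label, which can be repeated per label value instead of listing every workflow at once.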

jlewi commented on July 17, 2024

I deleted the namespace because deleting individual workflows wasn't working.

kubectl delete namespace kubeflow-test-infra

jlewi commented on July 17, 2024

Waiting for the namespace to be deleted. Hopefully that succeeds.

jlewi commented on July 17, 2024

Namespace was successfully deleted.

Now redeploy test infra

kubectl create namespace kubeflow-test-infra
ks apply kubeflow-ci -c argo

ksonnet app is here:
https://github.com/kubeflow/testing/tree/master/test-infra

jlewi commented on July 17, 2024

Argo UI is back up.

jlewi commented on July 17, 2024

I had to delete and recreate the nfs component

ks delete kubeflow-ci nfs-external
ks apply kubeflow-ci nfs-external

jlewi commented on July 17, 2024

Will be fixed by #300, which includes a cron job to GC Argo workflows.

jlewi commented on July 17, 2024

See #324; it seems like there's a problem with our cron job and we aren't GC'ing the resources.

jlewi commented on July 17, 2024

#267 Issue to monitor cron jobs.

jlewi commented on July 17, 2024

Haven't had problems in a while, so closing this issue.
