Comments (11)

johnhcasallasl commented on July 29, 2024

The Active state is "calculated" on the client side of WMStats, so a direct verification is not possible. However, the input data (a huge JSON document) received by the client is, I think, available via REST. The criterion used to mark a run as "active" is whether any of its workflows is in the "new" state. This could be checked from the cleanup script.
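
A minimal sketch of that check, assuming the JSON has already been fetched and parsed into a list of workflow records. The record layout (a `state` field per workflow) and the name `run_is_active` are hypothetical; the real WMStats document is not shown in this thread and may be structured differently.

```python
from typing import Iterable, Mapping


def run_is_active(workflows: Iterable[Mapping[str, str]]) -> bool:
    """Return True if any workflow of the run is still in the "new" state.

    `workflows` is an assumed, simplified view of the WMStats input data:
    one mapping per workflow, with the workflow state under the key "state".
    """
    return any(wf.get("state") == "new" for wf in workflows)


# Hypothetical example data for two runs.
finished_run = [{"name": "Repack_A", "state": "completed"},
                {"name": "Express_A", "state": "closed"}]
ongoing_run = [{"name": "Repack_B", "state": "completed"},
               {"name": "Express_B", "state": "new"}]

print(run_is_active(finished_run))
print(run_is_active(ongoing_run))
```

A cleanup script could skip streamer deletion for any run where this returns True.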

from t0.

hufnagel commented on July 29, 2024

The cleanup script in its current form (cron job) will likely never be fully integrated into the Tier0 repacking status or send any alarms. As such, what is asked here won't be provided, but we might be able to implement another way to do streamer deletions (from within the Tier0, for instance).

Have to think about it a bit.

drkovalskyi commented on July 29, 2024

I think we must check if it's safe to delete data. If the current tools cannot support that, we have to develop new tools. So let's identify what exactly needs to be done and define a timeline.

hufnagel commented on July 29, 2024

It's desirable. OTOH, unmerged is cleaned up on a 14-day timer. We don't check there whether the data is safe either. And the unmerged space at CERN is in the RAW data path the same way as the streamer buffer.

So it's not like we don't already rely on following procedures in a timely manner...

drkovalskyi commented on July 29, 2024

While I agree that if everything is done in a timely manner everything works fine, I do think we need to protect from a potential unrecoverable data loss by checking if it's safe to delete.

hufnagel commented on July 29, 2024

Will see what we can do. Just saying that unmerged cleanup is the same and we have no hooks there into the Tier0 either (nor do we necessarily want to).

drkovalskyi commented on July 29, 2024

I would say we need the protection for anything that cannot be recovered, i.e. anything that may lead to loss of RAW data. For recoverable data, i.e. all other data tiers and processing types, the current system is good enough.

hufnagel commented on July 29, 2024

RAW goes through unmerged like anything else, therefore it's not recoverable...

drkovalskyi commented on July 29, 2024

OK, just to make sure there is no misunderstanding, we need a system that would prevent deletion of:

  1. streamer files till they are no longer needed to get RAW unmerged files
  2. RAW unmerged files till they are no longer needed to get RAW files
  3. RAW files on EOS till we have a custodial copy on tape.

It's possible that we are talking about multiple tools, but they are all part of one main objective: protect unrecoverable data and create back-pressure in Tier-0 processing.
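
The three rules above could be sketched as a single guard that a deletion tool consults before removing a file. This is only an illustration under stated assumptions: the `Kind` and `FileStatus` types, their field names, and `safe_to_delete` are hypothetical, and how each flag would actually be obtained from the Tier0 is not specified in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    STREAMER = "streamer"          # streamer file in the streamer buffer
    RAW_UNMERGED = "raw_unmerged"  # RAW file in the unmerged area
    RAW_EOS = "raw_eos"            # merged RAW file on EOS


@dataclass(frozen=True)
class FileStatus:
    kind: Kind
    repacked: bool = False  # assumed flag: RAW unmerged output exists for this streamer
    merged: bool = False    # assumed flag: merged RAW output exists for this unmerged file
    on_tape: bool = False   # assumed flag: a custodial tape copy exists


def safe_to_delete(f: FileStatus) -> bool:
    """Allow deletion only once the next copy in the RAW data path exists."""
    if f.kind is Kind.STREAMER:
        return f.repacked
    if f.kind is Kind.RAW_UNMERGED:
        return f.merged
    if f.kind is Kind.RAW_EOS:
        return f.on_tape
    return False  # unknown kinds are never deleted


print(safe_to_delete(FileStatus(Kind.STREAMER, repacked=True)))
print(safe_to_delete(FileStatus(Kind.RAW_EOS, merged=True)))
```

Refusing deletion by default is what produces the back-pressure: if a downstream step stalls, the upstream buffer fills instead of losing data.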

hufnagel commented on July 29, 2024

The first part of this is implemented: I added a run/stream processing completion publication to the Tier0 Data Service. It will take a while before this becomes available in cmsweb, though. Once it does, I can look at using it in the t0streamer cleanup script.

hufnagel commented on July 29, 2024

This has been deployed long ago.
