

G-White-ISB commented on September 14, 2024

from etl.

fedorov commented on September 14, 2024

FYI @wlongabaugh @G-White-ISB


G-White-ISB commented on September 14, 2024

I'm not quite sure what you mean by 'deprecate the content.' Do you mean to push a new commit to the repo with the old scripts deleted and the notebook added in place? That would make sense.

Do we have any protocols worked out for ETL processes in general? I haven't come across anything in our docs. I think ETL code is as critical as any other code if it generates data that may be used in production.


fedorov commented on September 14, 2024

> Do you mean to push a new commit to the repo with the old scripts deleted and the notebook added in place? That would make sense.

Something like that. We should decide whether to use the notebook as the primary implementation or to have a standalone script.

> Do we have any protocols worked out for ETL processes in general? I haven't come across anything in our docs. I think ETL code is as critical as any other code if it generates data that may be used in production.

No, I don't think we have any protocols. Do you have an example of such a protocol? What would you want that protocol to determine?

Our ETL will partially (mostly?) rely on GHC ETL for extracting metadata into BQ tables. The way I see it, our ETL scripts will just shuffle GHC-generated content around.

We most definitely need to keep track of the code used to do that shuffling in a repository, and ideally we should have code reviews for such critical pieces of code. Those are the basics of the protocol I would suggest.

We can/should discuss this at the Friday meeting.


wlongabaugh commented on September 14, 2024

ISB-CGC ETL is migrating to small, task-oriented scripts, one per ETL operation. After trying Jupyter notebooks for ETL, we decided to switch to plain scripts driven by a YAML configuration file that can be archived.
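A minimal sketch of what one such task-oriented, config-driven script could look like. The config keys (`task`, `source_table`, `target_table`, `steps`), the table names, and the archive layout are hypothetical, not the ISB-CGC format. The config is a plain dict here to avoid third-party dependencies; in practice it would be parsed from the YAML file (for example with PyYAML's `yaml.safe_load`). The idea it illustrates is that each run writes out a copy of the exact config it used, so the run can be reproduced later.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical config for one ETL task. In practice this would be parsed from a
# YAML file; a plain dict keeps the sketch free of third-party dependencies.
EXAMPLE_CONFIG = {
    "task": "copy_table",
    "source_table": "project.dataset.source",  # hypothetical table names
    "target_table": "project.dataset.target",
    "steps": ["extract", "transform", "load"],
}

REQUIRED_KEYS = ("task", "source_table", "target_table", "steps")


def validate_config(config: dict) -> None:
    """Fail early if the config is missing any key the task depends on."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"config is missing keys: {missing}")


def archive_config(config: dict, archive_dir: Path) -> Path:
    """Write the exact config used for this run so the run can be reproduced."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(config, sort_keys=True, indent=2)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = archive_dir / f"{config['task']}_{stamp}_{digest}.json"
    path.write_text(payload, encoding="utf-8")
    return path


def run_task(config: dict, archive_dir: Path) -> tuple[Path, list[str]]:
    """Validate and archive the config, then run its steps in order."""
    validate_config(config)
    archived = archive_config(config, archive_dir)
    executed = []
    for step in config["steps"]:
        # Placeholder: a real script would issue the BigQuery work for each step.
        executed.append(f"{step}: {config['source_table']} -> {config['target_table']}")
    return archived, executed
```

Usage: `archived, steps = run_task(EXAMPLE_CONFIG, Path("etl_config_archive"))`. Putting a hash of the serialized config in the archived file name makes it easy to tell whether two runs used identical settings.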


fedorov commented on September 14, 2024

I look forward to learning more details about this to understand what it means.


G-White-ISB commented on September 14, 2024

Along with keeping scripts in a repo, I think we would also want to keep logs of every ETL run, i.e., which scripts were applied to which table on which date, etc. Unfortunately, Bill will be away for today's meeting.
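One way such a run log could be kept, sketched under assumptions: each ETL run appends one JSON line recording the script, the repository commit it came from, the target table, and a UTC timestamp, and existing entries are never rewritten. The field names, the log file, and the commit values are hypothetical; an append-only BigQuery table would serve the same purpose.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class EtlRunRecord:
    script: str  # path of the script in the repository
    commit: str  # hypothetical: commit hash of the script version that ran
    table: str   # table the script was applied to
    applied_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_run(log_path: Path, record: EtlRunRecord) -> None:
    """Append one record as a JSON line; earlier entries are left untouched."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record), sort_keys=True) + "\n")


def runs_for_table(log_path: Path, table: str) -> list[dict]:
    """Return every logged run that touched `table`, in the order it was logged."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r for r in records if r["table"] == table]
```

Answering "which scripts were applied to this table, and when?" then reduces to a filter over the log, for example `runs_for_table(Path("etl_runs.jsonl"), "dataset.table")`.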


fedorov commented on September 14, 2024

On a related note, can we filter out all write queries for every table we maintain and keep them in a permanent log?
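BigQuery records job history in its `INFORMATION_SCHEMA.JOBS*` views, which include a `statement_type` column. Because that history is retained only for a limited period, one approach would be to periodically copy the write jobs for the maintained tables into a permanent log. The filtering step is sketched below over already-fetched job records. The `JobRecord` shape, the flattened `"dataset.table"` destination string, and the exact set of write statement types are assumptions for illustration, not the schema of those views.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed subset of statement types that modify a table.
WRITE_STATEMENT_TYPES = frozenset({
    "INSERT", "UPDATE", "DELETE", "MERGE",
    "CREATE_TABLE", "CREATE_TABLE_AS_SELECT", "DROP_TABLE", "TRUNCATE_TABLE",
})


@dataclass(frozen=True)
class JobRecord:
    # Hypothetical, simplified view of one row of job history.
    job_id: str
    statement_type: str
    destination_table: str  # "dataset.table"; flattened for illustration
    query: str


def write_queries(jobs: Iterable[JobRecord], maintained_tables: set[str]) -> list[JobRecord]:
    """Keep only write jobs whose destination is one of the maintained tables."""
    return [
        job for job in jobs
        if job.statement_type in WRITE_STATEMENT_TYPES
        and job.destination_table in maintained_tables
    ]
```

The filtered records could then be appended to a log that is never truncated, so every write to a maintained table stays traceable after the job history itself has expired.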


fedorov commented on September 14, 2024

This is no longer relevant at this point.

