etl-talend-studio-example

This is an example of an ETL task for change data capture (CDC) using Talend Studio, Java, and MySQL.


Problem Description

  • There are two schemas, Olist and NewMart. The goal is to detect any update in table olist.olist_customers_dataset, but extract only the updates that belong to a customer_number listed in the NewMart database. Output: customer information in JSON format.

Expected Result Example

  • table newmart.cif_client

    client_id
    060e732b5b29e8181a18229c7b0b2b5e

  • table olist.olist_customers_dataset

    customer_unique_id
    060e732b5b29e8181a18229c7b0b2b5e
    290c77bc529b7ac935b93aa66c333dc3

  • Today olist.olist_customers_dataset has 2 updates, on ids 060e732b5b29e8181a18229c7b0b2b5e and 290c77bc529b7ac935b93aa66c333dc3

  • The Talend job detects these updates and converts the information for id 060e732b5b29e8181a18229c7b0b2b5e, the only one of the two listed in newmart.cif_client, into a JSON file with a new structure:

[ {
  "customer_number" : "060e732b5b29e8181a18229c7b0b2b5e",
  "oder_info" : [ {
    "order_id" : "5741ea1f91b5fbab2bd2dc653a5b5099",
    "product_id" : "0be701e03657109a8a4d5168122777fb",
    "price" : "259.90",
    "review_id" : "9a6614162d285301aa3ef6de4be75265",
    "review_score" : "5",
    "review_content" : ":Loja responsável"
  }, {
    "order_id" : "98b737f8bd00d73d9f61f7344aadf717",
    "product_id" : "223d34a3d9334039f5ff9511dc044bbb",
    "price" : "246.62",
    "review_id" : "fd0e493eac47b2e64aec60efcb2b3dc2",
    "review_score" : "5",
    "review_content" : ":Produtos de primeira linha."
  } ]
} ]
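
The nested structure above groups flat order rows under one customer_number. The following is a minimal plain-Java sketch of that grouping, not the job's actual implementation (the job uses the tJSONDoc component). The class name CustomerJson, the OrderRow type, and the sample values are hypothetical; the review fields are omitted for brevity, and the key oder_info is spelled as in the expected output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CustomerJson {

    // One flat row, as a join query might return it (hypothetical shape).
    static final class OrderRow {
        final String customerNumber;
        final String orderId;
        final String productId;
        final String price;

        OrderRow(String customerNumber, String orderId, String productId, String price) {
            this.customerNumber = customerNumber;
            this.orderId = orderId;
            this.productId = productId;
            this.price = price;
        }
    }

    // Wrap a value in double quotes and escape characters JSON does not allow raw.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Group rows by customer (keeping first-seen order) and emit a compact JSON array.
    static String toJson(List<OrderRow> rows) {
        Map<String, List<OrderRow>> byCustomer = new LinkedHashMap<>();
        for (OrderRow r : rows) {
            byCustomer.computeIfAbsent(r.customerNumber, k -> new ArrayList<>()).add(r);
        }
        List<String> customers = new ArrayList<>();
        for (Map.Entry<String, List<OrderRow>> e : byCustomer.entrySet()) {
            List<String> orders = new ArrayList<>();
            for (OrderRow r : e.getValue()) {
                orders.add("{\"order_id\":" + quote(r.orderId)
                        + ",\"product_id\":" + quote(r.productId)
                        + ",\"price\":" + quote(r.price) + "}");
            }
            customers.add("{\"customer_number\":" + quote(e.getKey())
                    + ",\"oder_info\":[" + String.join(",", orders) + "]}");
        }
        return "[" + String.join(",", customers) + "]";
    }

    public static void main(String[] args) {
        // Two order rows for the same customer, taken from the example above.
        List<OrderRow> rows = Arrays.asList(
                new OrderRow("060e732b5b29e8181a18229c7b0b2b5e",
                        "5741ea1f91b5fbab2bd2dc653a5b5099",
                        "0be701e03657109a8a4d5168122777fb", "259.90"),
                new OrderRow("060e732b5b29e8181a18229c7b0b2b5e",
                        "98b737f8bd00d73d9f61f7344aadf717",
                        "223d34a3d9334039f5ff9511dc044bbb", "246.62"));
        System.out.println(toJson(rows));
    }
}
```

The sketch writes compact JSON on a single line; the pretty-printed layout shown in the expected result would need a formatter or a JSON library.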

Environment

  • Talend Studio v7.3
  • MySQL 8.0
  • Talend Studio custom components: tJSONDoc
  • JDK 1.8.0_161 or later

Job Flow

  • Talend Repository layout


  • Set up the environment: import the tJSONDoc custom component into Talend Studio. You can download this component from Talend Exchange. The database DDL file and data are stored in ..\setup_environment. Run the import_data job to import the data from the CSV files into the database.

  • Run the extract_cif job to extract the client_id list from newmart.cif_client. The output, written to ..output\bk, is a CSV file containing the list of client_id values to track for updates.

  • Run the extract_customer job to extract the list of customer_unique_id values that have an updated record today and also appear in the client_id list. This job hashes each record with MD5 and compares the hash with yesterday's data to detect changes.

  • Run the extract_json job to pull all information from the Olist database for the customer_unique_id values that have updates. The job converts the returned records into JSON objects and writes them to a file.
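
The change detection in the extract_customer step can be sketched in plain Java as follows. This is an illustration under assumptions, not the code of the Talend job: the class ChangeDetector, the method names, the column separator, and the sample city values are all hypothetical. Each row is reduced to an MD5 hash, today's hash is compared with yesterday's, and only ids present in the client_id list are kept.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ChangeDetector {

    // Hash all column values of one row into a 32-character hex MD5 digest.
    static String md5(String... columns) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            // Join with a control character (assumed absent from the data)
            // so that ("ab", "c") and ("a", "bc") hash differently.
            byte[] bytes = String.join("\u001F", columns).getBytes(StandardCharsets.UTF_8);
            return String.format("%032x", new BigInteger(1, digest.digest(bytes)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Return the ids whose hash is new or differs from yesterday,
    // restricted to the ids listed in trackedClientIds.
    static Set<String> changedTrackedIds(Map<String, String> yesterday,
                                         Map<String, String> today,
                                         Set<String> trackedClientIds) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, String> row : today.entrySet()) {
            boolean tracked = trackedClientIds.contains(row.getKey());
            boolean differs = !row.getValue().equals(yesterday.get(row.getKey()));
            if (tracked && differs) {
                changed.add(row.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        String tracked = "060e732b5b29e8181a18229c7b0b2b5e";
        String untracked = "290c77bc529b7ac935b93aa66c333dc3";

        // Maps of customer_unique_id -> row hash; the city values are made up.
        Map<String, String> yesterday = new HashMap<>();
        yesterday.put(tracked, md5(tracked, "old city"));
        yesterday.put(untracked, md5(untracked, "old city"));

        Map<String, String> today = new HashMap<>();
        today.put(tracked, md5(tracked, "new city"));
        today.put(untracked, md5(untracked, "new city"));

        Set<String> clientIds = new HashSet<>(Collections.singletonList(tracked));

        // Both rows changed, but only the tracked id is reported.
        System.out.println(changedTrackedIds(yesterday, today, clientIds));
        // prints [060e732b5b29e8181a18229c7b0b2b5e]
    }
}
```

MD5 is used here only as a cheap row fingerprint for comparison, as in the job description; it is not suitable where collision resistance matters for security.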

Contributors

  • sdfdsfsfd
  • icedtea0000