dynamodbexporter's Introduction

Tinybird DynamoDB Export Utilities

This repository contains example solutions for replicating data out of DynamoDB into Tinybird.

Both solutions, scanToExport and DDBStreamCDC, have their own readme files covering usage.

DDBStreamCDC [Recommended]

This solution uses a Python Lambda to combine AWS DynamoDB Export to S3 with AWS DynamoDB Streams, forwarding both snapshots and changes to a Tinybird Datasource with the Keys automatically indexed. The Datasource is then deduplicated to the latest values and made available to users.

A simple implementation pathway is provided, which may be expanded upon.

More detail is available in the dedicated readme.

scanToExport [Legacy]

This solution uses a simple DynamoDB Scan to export a file to an S3 bucket, and then loads that file using the Tinybird Datasource Replace functionality. It is implemented as a Python Lambda.

It is simpler than the CDC solution, but it lacks low-latency updates, and you may find that a Scan is more expensive in practice.

dynamodbexporter's People

Contributors

chaffelson, jcontesti-tinybird, alrocar

dynamodbexporter's Issues

alpha issues

API Endpoint expects trailing slash due to URL Join

We should check that the customer has provided a well-structured Tinybird API endpoint. A regex should suffice.
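A minimal sketch of such a check, assuming the URL is built with `urllib.parse.urljoin` (the issue title suggests a URL join but does not name the call). The `normalize_endpoint` helper, the regex, and the example endpoint path are hypothetical and only illustrate the idea of validating the shape and then enforcing the trailing slash.

```python
import re
from urllib.parse import urljoin

# Loose shape check (an assumption, not a Tinybird rule):
# https scheme, a host, an optional port and an optional path.
_ENDPOINT_RE = re.compile(r"^https://[A-Za-z0-9.-]+(?::\d+)?(?:/[^\s?#]*)?$")


def normalize_endpoint(endpoint: str) -> str:
    """Validate the endpoint shape and make sure it ends with exactly one slash."""
    endpoint = endpoint.strip()
    if not _ENDPOINT_RE.match(endpoint):
        raise ValueError(f"Not a well-structured API endpoint: {endpoint!r}")
    return endpoint.rstrip("/") + "/"


# Without the trailing slash, urljoin replaces the last path segment.
broken = urljoin("https://api.tinybird.co/v0", "events")
fixed = urljoin(normalize_endpoint("https://api.tinybird.co/v0"), "events")
print(broken)  # https://api.tinybird.co/events
print(fixed)   # https://api.tinybird.co/v0/events
```

Normalizing once, when the configuration is read, keeps every later join consistent regardless of how the user typed the endpoint.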

Hash in PK causes string match failure

A DynamoDB Key may contain a # as a field separator in the String. This causes strange behaviour when that field is used in WHERE clauses.

DynamoDB character set to Tinybird character set mapping

DynamoDB allows - in Table names, but Tinybird does not allow them in Datasource Names. We should enforce character remapping when re-using names from source systems.
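One possible remapping is sketched below. DynamoDB Table names may contain letters, digits, `_`, `-` and `.`; the sketch assumes the target name may only contain letters, digits and underscores and must start with a letter, which should be checked against the Tinybird documentation. The function name `to_datasource_name` and the `t_` prefix are hypothetical.

```python
import re


def to_datasource_name(table_name: str) -> str:
    """Map a DynamoDB Table name onto an assumed Tinybird-safe Datasource name."""
    # Replace every character outside [A-Za-z0-9_] (for example '-' and '.').
    name = re.sub(r"[^A-Za-z0-9_]", "_", table_name)
    # Assumption: names must start with a letter, so prefix anything that does not.
    if not re.match(r"[A-Za-z]", name):
        name = "t_" + name
    return name


print(to_datasource_name("orders-prod.v2"))  # orders_prod_v2
print(to_datasource_name("2024-events"))     # t_2024_events
```

The mapping is not reversible and can collide: `my-table` and `my_table` both become `my_table`, so a real implementation would need to detect or reject such clashes.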

Relying on the user setting the DynamoDB Table name in the PITR Export is brittle

We may be able to extract the name reliably from the export manifest, and should move away from relying on it being in the S3 key.
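A sketch of the manifest approach, assuming the export's `manifest-summary.json` has already been downloaded and that it carries the `tableArn` field described in the AWS export documentation. The helper name and the sample values (account ID, region, table name) are made up for illustration.

```python
import json


def table_name_from_manifest(manifest_summary: str) -> str:
    """Return the Table name encoded in the manifest summary's tableArn."""
    summary = json.loads(manifest_summary)
    # Expected ARN shape: arn:aws:dynamodb:<region>:<account>:table/<name>
    parts = summary["tableArn"].split(":", 5)
    if len(parts) != 6:
        raise ValueError(f"Unexpected ARN: {summary['tableArn']!r}")
    kind, _, name = parts[5].partition("/")
    if kind != "table" or not name:
        raise ValueError(f"ARN does not point at a table: {summary['tableArn']!r}")
    return name


# Trimmed, made-up manifest summary containing only the field used here.
sample = json.dumps(
    {"tableArn": "arn:aws:dynamodb:us-east-1:123456789012:table/orders-prod"}
)
print(table_name_from_manifest(sample))  # orders-prod
```

Reading the name from the manifest removes the dependency on whatever prefix the user chose when configuring the export.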

DDBStreamCDC function does not handle SS sets properly

Issue

The error "Object of type set is not JSON serializable" occurs in the https://github.com/tinybirdco/DynamoDBExporter/blob/main/DDBStreamCDC/lambda_function.py function because the JSON encoder does not know how to handle Python sets. DynamoDB string sets (the "SS" type in the export JSON) become Python sets once deserialized (boto3's TypeDeserializer maps "SS" to set), so any item containing one fails to serialize.

JSON export fragment example

"companyAssociations": {
    "SS": [
        "01FW81573Y8GAP9ACYHXJK6RYT",
        "01FW81573YVZP29K5RZWJG02EM"
    ]
},

Solution

To handle this, the custom JSON encoder needs to be extended to also handle sets by converting them to lists, which are JSON serializable.

import base64
import json
from decimal import Decimal

from boto3.dynamodb.types import Binary


class DDBEncoder(json.JSONEncoder):
    """JSON encoder for the Python types produced by DynamoDB deserialization."""

    def default(self, o):
        if isinstance(o, Decimal):
            # DynamoDB numbers arrive as Decimal; float may lose precision.
            return float(o)
        elif isinstance(o, Binary):
            return base64.b64encode(o.value).decode('utf-8')
        elif isinstance(o, bytes):
            return base64.b64encode(o).decode('utf-8')
        elif isinstance(o, set):
            # SS, NS and BS sets become lists; elements are encoded in turn.
            return list(o)
        return super().default(o)
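The failure and the fix can be reproduced with the standard library alone. The sketch below does not use the Lambda's encoder: `encode_extra` is a hypothetical stand-in that covers only string sets, and it sorts the values (rather than calling `list`) purely so the output order is deterministic.

```python
import json


def encode_extra(o):
    """Fallback for json.dumps: turn sets into sorted lists."""
    if isinstance(o, (set, frozenset)):
        return sorted(o)
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


# A deserialized item holding a string set, mirroring the fragment above.
item = {
    "companyAssociations": {
        "01FW81573Y8GAP9ACYHXJK6RYT",
        "01FW81573YVZP29K5RZWJG02EM",
    }
}

try:
    json.dumps(item)
except TypeError as exc:
    error_message = str(exc)  # the plain encoder rejects the set

encoded = json.dumps(item, default=encode_extra)
print(encoded)
```

Number sets would additionally need the Decimal handling shown in the encoder above, since sorting alone leaves Decimal values in the list.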