Selective Event Stream Processor with Azure Event Hub

Mystique enterprise's stores generate numerous sales and inventory events across multiple locations. To efficiently handle this data and enable further processing, Mystique enterprise requires a solution for data ingestion and centralized storage.

Additionally, Mystique enterprise needs the ability to selectively process events based on specific properties. They are interested in filtering events classified as sale_event or inventory_event. This selective processing helps them focus on relevant data for analysis and decision-making. Below is a sample of their event. How can we assist them?

{
  "id": "743da362-69df-4e63-a95f-a1d93e29825e",
  "request_id": "743da362-69df-4e63-a95f-a1d93e29825e",
  "store_id": 5,
  "store_fqdn": "localhost",
  "store_ip": "127.0.0.1",
  "cust_id": 549,
  "category": "Notebooks",
  "sku": 169008,
  "price": 45.85,
  "qty": 34,
  "discount": 10.3,
  "gift_wrap": true,
  "variant": "red",
  "priority_shipping": false,
  "ts": "2023-05-19T14:36:09.985501",
  "contact_me": "github.com/miztiik",
  "is_return": true
}

Event properties,

{
   "event_type": "sale_event",
   "priority_shipping": false
}

Can you provide guidance on how to accomplish this?

🎯 Solution

To meet these needs, Mystique enterprise has chosen Azure Event Hub as the platform of choice; it provides the capabilities needed to ingest and store the data. We can use the Event Hubs Capture feature to persist the streaming events to Azure Blob Storage, giving us a central location for archival. Additionally, we can leverage the partitioning feature of Azure Event Hub to segregate the events based on specific criteria, letting us focus on relevant data for analysis and decision-making.

Event Producer: To generate events, an Azure Function with a managed identity sends them to an event hub within a designated event hub namespace. The event hub is configured with 4 partitions, and we can use those partitions to segregate events based on the event_type property. For instance, sale_events and inventory_events can be directed to different partitions, which lets us process only the events we care about by filtering on partition_id.
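
A minimal sketch of how such a producer could pin an event to a partition and attach the filterable properties, using the azure-eventhub SDK with a managed identity. The namespace, hub name, and the event_type-to-partition mapping below are illustrative assumptions, not values taken from this repo.

# Sketch only: route an event to a partition based on its event_type.
import json
from azure.eventhub import EventHubProducerClient, EventData
from azure.identity import DefaultAzureCredential

# Illustrative names - substitute your own namespace and event hub.
FQ_NAMESPACE = "warehouse-ns.servicebus.windows.net"
EVENT_HUB_NAME = "store-events"

def send_event(evt: dict) -> None:
    producer = EventHubProducerClient(
        fully_qualified_namespace=FQ_NAMESPACE,
        eventhub_name=EVENT_HUB_NAME,
        credential=DefaultAzureCredential(),  # resolves to the function's managed identity
    )
    # Assumed convention: sale_event -> partition "1", inventory_event -> partition "0".
    partition_id = "1" if evt.get("event_type") == "sale_event" else "0"
    with producer:
        batch = producer.create_batch(partition_id=partition_id)
        event_data = EventData(json.dumps(evt))
        # Carry the filterable attributes as application properties.
        event_data.properties = {
            "event_type": evt.get("event_type", "inventory_event"),
            "priority_shipping": evt.get("priority_shipping", False),
        }
        batch.add(event_data)
        producer.send_batch(batch)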

Event Consumer: To process the incoming events, a consumer function is set up with an event hub trigger targeting a specific consumer group. This function handles the events and persists them to an Azure Storage Account and Cosmos DB. To ensure secure and controlled access to the required resources, a scoped managed identity with RBAC (Role-Based Access Control) permissions is used, giving granular, role-based control over resource access. The trigger connection itself is also authenticated with the managed identity, so only authorized entities can interact with the event hub, enhancing the overall security of the system.
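
As a rough illustration of the trigger described above (Python v2 programming model), here is a sketch with hypothetical hub, consumer group, and connection-setting names; the actual function in this repo may differ.

# Sketch only: Event Hubs trigger bound to a dedicated consumer group.
# "EVENT_HUB_CONN" is assumed to be configured for identity-based auth,
# e.g. an EVENT_HUB_CONN__fullyQualifiedNamespace app setting.
import json
import logging
import azure.functions as func

app = func.FunctionApp()

@app.event_hub_message_trigger(
    arg_name="event",
    event_hub_name="store-events",      # assumed hub name
    connection="EVENT_HUB_CONN",        # assumed app-setting prefix
    consumer_group="store-consumers",   # assumed consumer group
)
def store_events_consumer(event: func.EventHubEvent) -> None:
    payload = json.loads(event.get_body().decode("utf-8"))
    # Selective handling; the event type is assumed here to be carried in the body.
    if payload.get("event_type") == "sale_event":
        logging.info("sale_event received: %s", payload.get("id"))
    # Persisting to Blob Storage / Cosmos DB would follow here.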

IMPORTANT Note:

  • You should send events to separate event hubs (streams) rather than use partitions to segregate them. The partitioning feature is intended for load balancing, not for data segregation. For more information, see Partitioning in Event Hubs.

  • If you do not intend to process all events and only want to consume a subset of them, it is recommended to avoid using an event hub trigger for your function. Instead, opt for a pull mechanism and configure event pulling using the SDK of your choice, which lets you selectively retrieve events from specific partitions and gives you more control over the consumption process (a minimal sketch follows below).
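
    A minimal pull-style sketch with the azure-eventhub SDK, reading only from partition 0; the namespace, hub, and consumer group names are illustrative assumptions.

    # Sketch only: pull events from a single partition instead of using a trigger.
    from azure.eventhub import EventHubConsumerClient
    from azure.identity import DefaultAzureCredential

    # Illustrative names - substitute your own.
    FQ_NAMESPACE = "warehouse-ns.servicebus.windows.net"
    EVENT_HUB_NAME = "store-events"
    CONSUMER_GROUP = "$Default"

    def on_event(partition_context, event):
        print(f"partition {partition_context.partition_id}: {event.body_as_str()}")
        # update_checkpoint() only persists if the client was created with a checkpoint store.
        partition_context.update_checkpoint(event)

    client = EventHubConsumerClient(
        fully_qualified_namespace=FQ_NAMESPACE,
        eventhub_name=EVENT_HUB_NAME,
        consumer_group=CONSUMER_GROUP,
        credential=DefaultAzureCredential(),
    )
    with client:
        # partition_id="0" restricts consumption to a single partition;
        # starting_position="-1" reads from the beginning of that partition.
        # receive() blocks - stop it with Ctrl+C or client.close() from another thread.
        client.receive(on_event=on_event, partition_id="0", starting_position="-1")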

By leveraging the capabilities of Bicep, all the required resources can be easily provisioned and managed with minimal effort. Bicep simplifies the process of defining and deploying Azure resources, allowing for efficient resource management.

Architecture: Miztiik Automation - Event Streaming with Azure Event Hub

  1. 🧰 Prerequisites

    This demo, along with its instructions, scripts, and Bicep template, has been designed to be executed in the northeurope region. However, with minimal modifications, you can also run it in other regions of your choice (the specific steps for doing so are not covered here).

  2. ⚙️ Setting up the environment

    • Get the application code

      git clone https://github.com/miztiik/azure-event-hub-partition-processor.git
      cd azure-event-hub-partition-processor
  3. 🚀 Prepare the local environment

    Ensure you have jq, Azure Functions Core Tools (func), Bicep, bash, and the Azure CLI working

    jq --version
    func --version
    bicep --version
    bash --version
    az account show
  4. 🚀 Deploying the Solution

    • Stack: Main Bicep - We will create the following resources:

      • Storage Accounts for storing the events
        • General purpose Storage Account - Used by Azure functions to store the function code
        • warehouse* - Azure Function will store the events here
      • Event Hub Namespace
        • Event Hub Stream, with 4 Partitions
          • Even partitions (0 & 2) - inventory_event
          • Odd partitions (1 & 3) - sale_event (see the routing sketch after this list)
          • Event Hub Capture - Enabled
            • Events will be stored in warehouse* storage account.
      • Managed Identity
        • This will be used by the Azure Function to interact with the event hub
      • Python Azure Function
        • Producer: HTTP Trigger. Customized to send count number of events to the event hub, using parameters passed in the query string. count defaults to 10
        • Consumer: HTTP Trigger. Customized to receive 5 events from the event hub. By default it is configured to receive only inventory_events. You can change the partition id to 1 to receive sale_events
      • Note: There are a few additional resources created, but you can ignore them for now; they aren't required for this demo. Be sure to clean them up later
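
      The even/odd layout above can be expressed as a small routing helper. The snippet below is only a sketch of that convention, not the repo's exact producer code.

      # Sketch only: map an event_type to one of its two reserved partitions.
      import random

      EVENT_TYPE_PARTITIONS = {
          "inventory_event": ("0", "2"),  # even partitions
          "sale_event": ("1", "3"),       # odd partitions
      }

      def pick_partition(event_type: str) -> str:
          """Pick one of the two partitions reserved for this event type,
          at random to spread load across the pair."""
          return random.choice(EVENT_TYPE_PARTITIONS.get(event_type, ("0", "2")))

      # Example: pick_partition("sale_event") returns "1" or "3"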

      Initiate the deployment with the following command,

      # make deploy
      sh deployment_scripts/deploy.sh

      After successfully deploying the stack, check the Resource Groups/Deployments section for the resources.

      (Screenshots: Miztiik Automation - Event Streaming with Azure Event Hub)

  5. 🔬 Testing the solution

    • Trigger the Producer function

      FUNC_URL="https://partition-processor-store-backend-fn-app-004.azurewebsites.net/store-events-producer-fn"
      curl "${FUNC_URL}?count=10"

      You should see an output like this,

      {
         "miztiik_event_processed": true,
         "msg": "Generated 10 messages",
         "resp": {
            "status": true,
            "tot_msgs": 10,
            "bad_msgs": 1,
            "sale_evnts": 4,
            "inventory_evnts": 6,
            "tot_sales": 468.46000000000004
         },
         "count": 10,
         "last_processed_on": "2023-05-26T14:47:34.574772"
      }
    • Trigger the Consumer function

      FUNC_URL="https://partition-processor-store-backend-fn-app-004.azurewebsites.net/store-events-consumer-fn"
      curl ${FUNC_URL}

      You should see an output like this,

       {
          "miztiik_event_processed": true,
          "msg": "",
          "count": 5,
          "last_processed_on": "2023-05-26T14:56:31.093886"
       }

      In this run, the producer generated a total of 10 messages, 4 of them classified as sale_events and 6 as inventory_events. Please note that the numbers may vary in your scenario, especially if you run the producer function multiple times.

      Additionally, when observing the events captured in Blob Storage, you will notice that inventory_events are stored under blob prefixes 0 or 2, while sale_events are stored under blob prefixes 1 or 3. This layout results from configuring the event hub with 4 partitions and allocating 2 partitions to each event type.
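
      To spot-check that layout programmatically, the sketch below lists captured blobs grouped by partition with azure-storage-blob; the account URL and container name are placeholders, and it assumes the default Capture path format of {Namespace}/{EventHub}/{PartitionId}/{Year}/.../{Second}.avro.

      # Sketch only: count Capture blobs per partition (assumed account/container names).
      from collections import Counter
      from azure.identity import DefaultAzureCredential
      from azure.storage.blob import ContainerClient

      ACCOUNT_URL = "https://warehousexxxx.blob.core.windows.net"   # placeholder
      CONTAINER = "event-hub-capture"                               # placeholder

      container = ContainerClient(ACCOUNT_URL, CONTAINER, credential=DefaultAzureCredential())
      per_partition = Counter()
      for blob in container.list_blobs():
          parts = blob.name.split("/")
          if blob.name.endswith(".avro") and len(parts) > 2:
              per_partition[parts[2]] += 1  # third path segment is the partition id

      for partition_id, count in sorted(per_partition.items()):
          print(f"partition {partition_id}: {count} capture blob(s)")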

      (Screenshots: Miztiik Automation - Event Streaming with Azure Event Hub)

  6. 📒 Conclusion

    In this demonstration, we showcased a streamlined process for event ingestion with Azure Event Hub and stream processing with Azure Functions. This enables optimized, targeted handling of events based on specific criteria and facilitates downstream processing using consumer groups, partition_id filters, or event properties.

  7. 🧹 CleanUp

    If you want to destroy all the resources created by the stack, execute the command below to delete the resource group, or delete the stack from the portal.

    # Delete from resource group
    az group delete --name Miztiik_Enterprises_xxx --yes
    # Follow any on-screen prompt

    This is not an exhaustive list; please carry out any other steps that may be applicable to your needs.

📌 Who is using this

This repository aims to teach Bicep to new developers, Solution Architects & Ops Engineers in Azure.

💡 Help/Suggestions or 🐛 Bugs

Thank you for your interest in contributing to our project. Whether it is a bug report, new feature, correction, or additional documentation or solutions, we greatly value feedback and contributions from our community. Start here

👋 Buy me a coffee

Buy me a coffee ☕ via ko-fi.

📚 References

  1. Azure Docs - Event Hub
  2. Azure Docs - Event Hub - Streaming Event Capture
  3. Azure Docs - Event Hub Python Samples
  4. Azure Docs - Event Hub Explorer Tool
  5. Azure Docs - Event Hub Partitions
  6. Azure Docs - Managed Identity
  7. Azure Docs - Managed Identity Caching
  8. GitHub Issue - Default Credential Troubleshooting

๐Ÿท๏ธ Metadata

Level: 200
