Giter Club home page Giter Club logo

pnda-guide's Introduction

PNDA Guide

PNDA is a simple, scalable, open big data platform supporting operational and business intelligence analysis for networks and services. This guide provides an overview of PNDA, and will tell you how to set up and use PNDA in your own environment.

Quick Links

This chapter covers the main components of PNDA, including:

  • Data ingress using Logstash, Open Daylight & the bulk ingest tool
  • Data distribution with Kafka & Zookeeper
  • High velocity stream processing with Spark Streaming
  • High volume batch processing with Spark
  • Free form data exploration with Jupyter
  • Structured query over big data with Impala
  • Handling time series with OpenTSDB & Grafana

This checklist will get you started setting up a fully operational PNDA cluster, with data flowing in and out.

This chapter describes how to provision a PNDA cluster, and includes some background information on SaltStack, OpenStack Heat and AWS CloudFormation.

Background information

For a complete list of all technologies brought together by PNDA

The PNDA console provides a real-time overview of all the components in a PNDA cluster. The home page shows health statistics for each component, color-coded by status. Components are grouped into categories, including data distribution, data processing, data storage, applications, etc.

Other pages on the console let you view detailed metrics, deploy packages, run applications, and set data retention policies.

Kafka is the "front door" of PNDA. It handles ingest of data streams from network sources and distributes data to all interested consumers. This chapter covers how to integrate and develop "producers", which feed data into Kafka.

In addition to streaming ingest via Kafka producers, PNDA also provides an offline bulk ingest tool for those who would like to migrate pre-existing data into the PNDA platform.

Kafka has a simple, clean design that moves complexity traditionally found inside message brokers into its producers and consumers. A Kafka consumer pulls messages from one or more topics using Zookeeper for discovery, issuing fetch requests to the brokers leading the partitions it wants to consume. Rather than the broker maintaining state and controlling the flow of data, each consumer controls the rate at which it consumes messages.

Packages are independently deployable units of application layer functionality, and applications are instances of packages. You can use the PNDA console to deploy packages and manage the application lifecycle. The Deployment Manager documentation explains the structure of packages, and the REST API used to deploy them.

Logs from the various component services that make up PNDA, and the applications that run on PNDA, are collected and stored on the logserver node.

Apache Impala is a parallel execution engine for SQL queries. It supports low-latency access and interactive exploration of data in HDFS and HBase. Impala allows data to be stored in a raw form, with aggregation performed at query time without requiring upfront aggregation of data.

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. In PNDA, it supports exploration and presentation of data from HDFS and HBase.

OpenTSDB is a scalable time series database that lets you store and serve massive amounts of time series data, without losing granularity. Grafana is a graph and dashboard builder for visualizing time series metrics.

A big data infrastructure like PNDA involves a multitude of technologies and tools, and may be deployed in a multi-tenant environment. Providing enterprise grade security for such system is not only complex, but is of primary concern for any production deployment. If you are implementing a client for a PNDA interface or developing a PNDA application, this chapter will cover some security guidelines that you should adhere to when working with individual components.

The PNDA distribution consists of the following source code repositories and sub-projects:

Provisioning

Platform

Producers

Examples

Documentation

Versions

This guide is the latest version and describes the software found on the pndaproject develop branches.

To refer to guides for specific releases, please navigate to the relevant release tag on github.

pnda-guide's People

Contributors

trsmith2 avatar jeclarke avatar jgarnier avatar jubarbot-cisco avatar klyr avatar jobstfeld avatar johnevansuk avatar pndacicd avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.