KU-AIAC557 Data Acquisition Management System

Kathmandu University Department of Computer Science and Engineering

Subject: Data Acquisition Management System

Course Code: AIAC 557

Level: MTech in AI, Year 1, Semester II

Credit Hours: 3

Type: Elective [Theory + Practical]

Course Description

Course Objective

After completion of the course, students should be able to:

  • Acquire data from various sources and ingest it into a data store (data lake, EDW, data mart, delta lake)
  • Work in a cloud ecosystem and be able to complete a certification with one cloud provider (AWS, Google, Azure)
  • Create ETL and ELT pipelines using various data processing tools (SQL, dbt, Dataform (part of BigQuery), Spark) for different applications, including data visualization and reporting (understand BI concepts)
  • Demonstrate understanding of data management issues, data quality, and data governance
  • Perform the basics of ML operations: deploying a model in production (batch/) and monitoring the performance of the model (data drift / model drift)
    • Orchestration (Airflow): create a pipeline and hand it over to the operations team, taking care of data management aspects such as incident management
    • Data governance: creating a data catalogue, tracking data lineage, identifying personal information in data, using standard data models, and using open APIs
  • Gather requirements from business stakeholders and follow them through the design chain: the enterprise architect builds the high-level design, the solution architect produces the mid-level design, and the data engineer builds the solution
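
To make the ETL versus ELT objective above concrete, here is a minimal sketch using only Python's standard-library `sqlite3` module as a stand-in data store. The `RAW_ORDERS` records, the table names, and the helper functions are all hypothetical and exist only for illustration; a real course pipeline would more likely use the tools named above (dbt, Dataform, Spark).

```python
import sqlite3

# Hypothetical raw records standing in for data pulled from a source system.
RAW_ORDERS = [
    {"order_id": 1, "amount": "120.50", "country": "np"},
    {"order_id": 2, "amount": "80.00", "country": "NP"},
    {"order_id": 3, "amount": "15.25", "country": "in"},
]


def run_etl(conn):
    """ETL: transform in Python first, then load only the cleaned rows."""
    cleaned = [
        (r["order_id"], float(r["amount"]), r["country"].upper())
        for r in RAW_ORDERS
    ]
    conn.execute(
        "CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows as-is, then transform with SQL inside the store."""
    conn.execute(
        "CREATE TABLE orders_raw (order_id INTEGER, amount TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders_raw VALUES (:order_id, :amount, :country)", RAW_ORDERS
    )
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               CAST(amount AS REAL) AS amount,
               UPPER(country) AS country
        FROM orders_raw
        """
    )


def revenue_by_country(conn, table):
    """Small reporting query that works on either output table."""
    rows = conn.execute(
        f"SELECT country, SUM(amount) FROM {table} GROUP BY country ORDER BY country"
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_report = revenue_by_country(conn, "orders_etl")
elt_report = revenue_by_country(conn, "orders_elt")
print(etl_report)
print(elt_report)
```

Both paths produce the same report. The difference is where the transformation runs and whether the untransformed data stays available in the store for later reprocessing.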

Prerequisites

  • Python programming: NumPy, Pandas, Matplotlib, REST APIs, web scraping
  • Linux
  • Git and GitHub
  • Basic Data Science

Course Evaluation

  • In-Semester Evaluation: 60 marks
  • End-Semester Evaluation: 40 marks

Chapters

  • Introduction to Data Science, Data Engineering and Data Management
  • DIKW Pyramid and its issues
  • Big Data and Big Data Ecosystem
  • Data Lifecycle
  • Data Management Principles and Challenges
  • Data Management Strategy and Frameworks
  • Data Engineering in Data Science (or ML) Lifecycle

Pre-requisites review

  • Important Python libraries for performing EDA: NumPy, Pandas, Matplotlib, Seaborn
  • REST APIs, Requests, and web scraping
  • Text processing, extraction and classification
  • Data Science
  • Git and GitHub
  • Linux and Shell Scripting
  • Docker
  • Distributed Systems

Chapter 2: Data Handling [12 Hr]

  • Importance of Data Architecture
  • Lambda and Kappa Architectures
  • Data Architecture concepts and practices
  • Data Engineering Architectures and Pipelines
  • Orchestration
  • Streaming Pipelines
  • Model Deployment
  • Model Monitoring
  • Data Governance
  • Data Security
  • Data Integration and Interoperability
  • Context Management
  • Meta-data Management
  • Data Management Maturity
  • Organizational Change Management
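
As an illustration of the Model Monitoring topic in this chapter, the sketch below computes the Population Stability Index (PSI), one commonly used data drift score that compares a feature's distribution at training time with its distribution in production. The choice of PSI, the function name `psi`, the bin count, and the synthetic samples are assumptions made for this example and are not prescribed by the course.

```python
import math


def psi(expected, actual, bins=5):
    """Population Stability Index between a baseline and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the baseline range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic feature values (hypothetical): a baseline, an unchanged sample,
# and a sample whose values have moved upward by 20.
baseline = [float(x % 50) for x in range(500)]
same = [float(x % 50) for x in range(250)]
shifted = [float(x % 50) + 20.0 for x in range(250)]

psi_same = psi(baseline, same)
psi_shifted = psi(baseline, shifted)
print(round(psi_same, 4), round(psi_shifted, 4))
```

A score near zero means the two samples fill the bins in the same proportions. A frequently quoted rule of thumb treats values above roughly 0.2 as a shift worth investigating, though the right threshold depends on the feature and the application.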

Optional: System Design

  • System Design System Components
  • Scaling Data Systems
  • Distributed System Design
  • System Design Patterns for distributed systems
  • Case Studies

Reference

  • DAMA-DMBOK: Data Management Body of Knowledge, 2nd Edition
  • Fundamentals of Data Engineering by Joe Reis and Matt Housley
  • Data Pipelines Pocket Reference by James Densmore
  • Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing by Tyler Akidau, Slava Chernyak, and Reuven Lax
  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann
