Big Data Computing

Big Data phenomenon

  • Technological progress

    • storage capacity
    • communication bandwidth
    • computing power
    • Reduction of ICT costs
  • Digital Universe

    • Integration of digital technologies in every human activity
    • Scientific research (produces a lot of data)
    • Exponential growth of data
  • Data can be either structured (database records) or unstructured (textual data)

Application Domains

  • The analysis of large datasets arises in:
    • Retailing: product improvement, recommendation systems
    • Banking/Finance: fraud detection...
    • Telecommunications: user profiling
    • Science: validation methods
    • Medicine: diagnosis/therapy
    • Social studies: IoT

The Four V's of DATA

  1. Volume
    • size of data poses several computational challenges and requires a data-centric perspective
  2. Velocity
    • the data arrive at such a high rate that they cannot be stored and processed offline, and must instead be processed as a stream
  3. Variety
    • large datasets often come unstructured and may relate to very different scenarios
  4. Veracity
    • large datasets coming from real-world applications are likely to contain noisy, uncertain data
  • All points above require a paradigm shift with respect to traditional computing
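
To make the Velocity point concrete, the following is a minimal sketch of one-pass (streaming) processing: each item is seen once and then discarded, and only a constant amount of state is kept. It is an illustration and not part of the course material; the function name `running_mean` and the sample readings are hypothetical.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all items seen so far, using O(1) memory."""
    count = 0
    mean = 0.0
    for x in stream:
        count += 1
        # Incremental update: no past item is stored.
        mean += (x - mean) / count
        yield mean


if __name__ == "__main__":
    # Hypothetical sensor readings arriving one at a time.
    readings = iter([2.0, 4.0, 6.0, 8.0])
    for m in running_mean(readings):
        print(m)
```

An offline approach would first store the whole dataset and then average it. The streaming version never holds more than two numbers, whatever the length of the input.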

Course presentation

Main objectives

  • Novel computing/programming frameworks for big data processing: theory and practice
    • Spark
  • A sample of key primitives for data analysis
    • Rigorous setting (be able to analytically predict what's going to happen)
    • Algorithmic solutions with focus on large inputs

Specific Content

  • Computational Frameworks: MapReduce, Apache Spark
  • Clustering primitives (Professor's focus)
  • Graph analysis primitives
  • Association analysis primitives (Data mining)
  • Data stream processing
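
As a rough illustration of the MapReduce style listed above, here is a minimal single-process sketch of word counting split into map, shuffle, and reduce phases. It only simulates the programming model in plain Python, without Spark or any distributed runtime; all function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(doc: str) -> List[Tuple[str, int]]:
    """Map: emit one (word, 1) pair per word occurrence in a document."""
    return [(word.lower(), 1) for word in doc.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(docs: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on a collection of documents."""
    intermediate = [pair for doc in docs for pair in map_phase(doc)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big computing"]))
```

In a real framework the map and reduce functions would run in parallel on different machines, and the shuffle would move data across the network; the sketch keeps everything in one process to show only the structure of the computation.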

Evaluation

  • Written exam (26 points)
  • Homeworks (6+1 points)
    • groups of at most 3/4 students
    • 4 assignments, one every 2/3 weeks
    • Use of Apache Spark on individual PCs (assignments 1-3) and CloudVeneto (assignment 4)

Online tools
