
data_course's Introduction

Preface

The purpose of this repository is to create a draft of a complete course dedicated to everything data-related:

  1. Data Analysis
  2. Data Engineering
  3. Data Science

I want to create a project-oriented course that will introduce users to a large scope of data tasks and tools. We will look at both small, quick ad hoc analyses and larger projects that require multiple different tools and a broader skill set to solve.

Who am I?

Hi, my name is Marcin and I have been working with data for 15 years in various positions, from analyst to data scientist to team leader. In those years I have worked with a large set of different tools and environments, trained many employees and worked on a multitude of data-driven projects.

I have a master's degree (Magister Inżynier) in Computer Science from WWSI, specialising in Databases.

I want to solidify my skills in sharing knowledge, hence this course's existence.

Content of this course

This course will cover a large scope of tools that are useful in the day-to-day data tasks you can encounter in a data-related job.

I will focus on the following tools and languages:

  • Python - my language of choice for data tasks, focusing on:
    • Pandas - for data manipulation and analysis
    • Matplotlib / Seaborn - for quick data visualization
    • Beautiful Soup - for web scraping
    • Scikit-Learn - for machine learning
    • PySpark - for Big Data processing
  • SQL - the language used to access and store data in databases
  • Bash - the Linux shell that we will be using for file management and for writing scripts that automate OS tasks
  • Git - for code versioning and for team collaboration on projects
  • Power BI - for "proper" data visualisation and dashboards for management

Why Python and not X?

At the end of the day I chose Python because I have been working with it daily for many years and it is the language I feel most comfortable teaching. But it's more than just that.

Other options

There are other popular choices, mostly R and Julia.

I love R. It's the language I started with when I decided to move on from using just Excel. To this day I believe that it is a better language for beginners, as it focuses on one thing - data and computations on said data. The Tidyverse ecosystem - a set of libraries developed by Hadley Wickham to tackle a wide range of data tasks - is IMHO much more user-friendly than the Python data stack.

If this is your first introduction to anything programming-related, it is a good choice. Even the recently published Google Data Analytics Professional Certificate uses R as its language of choice.

Personally, I haven't used Julia (yet). It is considered to be faster than Python, so it can be a better choice in situations where performance is key, while still being considered easy to use (though not as easy as Python or R).

So why Python?

Python is still king when it comes to overall popularity, and therefore it has a huge community behind it. If we look at Google Trends for the past 5 years, there is no competition.

[Figure: Google Trends for Python, R and Julia]

And it is easy. The syntax is easy to learn and you don't have to worry about variable types and memory management - it's all handled for you behind the scenes.
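As a small illustration of that point, here is a minimal sketch using only the standard library. The variable names and values are made up for this example and are not part of the course material. It shows the same name being rebound to values of different types, and a function that sums numbers without any type declarations or manual memory handling.

```python
# Hypothetical example values, not taken from the course material.
value = 42                          # starts out as an int
kinds = [type(value).__name__]

value = "forty-two"                 # the same name can be rebound to a str
kinds.append(type(value).__name__)

value = [4, 2]                      # ... or to a list
kinds.append(type(value).__name__)


def total(numbers):
    """Sum numbers without declaring types or managing memory by hand."""
    result = 0
    for n in numbers:
        result += n                 # ints and floats mix freely
    return result


print(kinds)
print(total([1, 2.5, 3]))
```

The flip side of this convenience is that type mistakes only show up when the code runs, which is one reason to test small pieces as you go.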

But "easy" does not mean "simple". It's a fully fledged multi-purpose programming language that is used worldwide for everything from small automation scripts to huge web platforms like YouTube. It has a wide array of free and open source libraries that cover a wide scope of tasks you may need to solve when working with data: from loading data to performing simple analysis to building neural networks. It has a huge community, allowing you to quickly troubleshoot most problems you will encounter with a single Google search. It's the third most popular programming language in the world according to the 2021 Stack Overflow Survey.

At the end of the day, the language you use is a choice. It's just a tool. If the tool you choose is working for you, that's fine. You can try to follow this course using a different language, as I will be focusing more on the task we are trying to accomplish than on teaching you a list of commands to type, but it will be up to you to research how to complete those tasks in that language.
