hmcuesta / pda_book

157.0 19.0 148.0 1.71 MB

Code Examples Data Science using Python

Python 70.06% JavaScript 29.94%

pda_book's Introduction

PDA_Book

Practical Data Analysis Book - Code Examples

Transform, model, and visualize your data through hands-on projects, developed in open source tools

Overview

Explore how to analyze your data in various innovative ways and turn them into insight
Learn to use the D3.js visualization tool for exploratory data analysis
Understand how to work with graphs and social data analysis
Discover how to perform advanced query techniques and run MapReduce on MongoDB

In Detail

Plenty of small businesses face big amounts of data but lack the internal skills to support quantitative analysis. Understanding how to harness the power of data analysis using the latest open source technology can lead them to providing better customer service, the visualization of customer needs, or even the ability to obtain fresh insights about the performance of previous products. Practical Data Analysis is a book ideal for home and small business users who want to slice and dice the data they have on hand with minimum hassle.

Practical Data Analysis is a hands-on guide to understanding the nature of your data and turning it into insight. It will introduce you to machine learning techniques, social network analytics, and econometrics to help your clients get insights from the pool of data they have at hand. Data preparation and processing for several kinds of data, such as text, images, graphs, documents, and time series, will also be covered.

Practical Data Analysis presents a detailed exploration of the current work in data analysis through self-contained projects. First, you will explore the basics of data preparation and transformation through OpenRefine. Then you will get started with exploratory data analysis using the D3.js visualization framework. You will also be introduced to machine learning techniques such as classification, regression, and clustering through practical projects such as spam classification, predicting gold prices, and finding clusters in your Facebook friends' network. You will learn how to solve problems in text classification, simulation, time series forecasting, social media, and MapReduce through detailed projects. Finally, you will work with large amounts of Twitter data using MapReduce to perform a sentiment analysis implemented in Python and MongoDB.

Practical Data Analysis contains a combination of carefully selected algorithms and data scrubbing that enables you to turn your data into insight.

What you will learn from this book

Work with data to get meaningful results from your data analysis projects
Visualize your data to find trends and correlations
Build your own image similarity search engine
Learn how to forecast numerical values from time series data
Create an interactive visualization for your social media graph
Explore the MapReduce framework in MongoDB
Create interactive simulations with D3js

Approach

Practical Data Analysis is a practical, step-by-step guide that empowers small businesses to manage and analyze their data and extract valuable information from it.

Who this book is written for

This book is for developers, small business users, and analysts who want to implement data analysis and visualization for their company in a practical way. You need no prior experience with data analysis or data processing; however, basic knowledge of programming, statistics, and linear algebra is assumed.

pda_book's Issues

Pie Chart is not shown in the browser

Hello sir, I am reading your newly published book and tried some of the code in Chapter 3 on data visualization using D3. I followed the instructions for constructing the D3 pie chart, but when I run it on my local server, nothing is shown except the header I made. I am new to JavaScript, so I wasn't able to fix this. By the way, I was not able to plot the bar chart either: I tried the code in this repository and renamed sumPokemon.csv to pokemonByType.csv, but still nothing happens. I was hoping you could help me with this so that I can proceed with the other charts. Thanks; the code is below.

<!DOCTYPE html>
<html>
<head>My first D3 Pie Chart</head>
<style>

body {
  font: 16px ARIAL;
}
</style>
<body>
<script src="d3.v3.min.js"></script>
<script>
var w = 1160,
    h = 700,
    radius = Math.min(w, h) / 2;

var color = d3.scale.ordinal()
    .range(["#04B486", "#F2F2F2", "#F5F6CE", "#00BFFF"]);

var arc = d3.svg.arc()
    .outerRadius(radius - 10)
    .innerRadius(0);

var pie = d3.layout.pie()
    .sort(null)
    .value(function(d){return d.amount;});

var svg = d3.select("body").append("svg")
    .attr("width", w)
    .attr("height", h)
    .append("g")
    .attr("transform", "translate("+w/2+","+h/2+")");

d3.csv("sumPokemon.csv", function(error, data){
data.forEach(function(d){
d.amount = +d.amount;
});

var g = svg.selectAll(".arc")
    .data(pie(data))
    .enter().append("g")
    .attr("class", "arc");

g.append("path")
    .attr("d", arc)
    .style("fill", function(d){return color(d.data.type);});
g.append("text")
    .attr("transform", function(d){return "translate("+ arc.centroid(d)+")";})
    .attr("dy", ".60em")
    .style("text-anchor", "middle")
    .text(function(d){return d.data.type;});
});
</script>
</body>
</html>

RegEx date check (Chapter2)

RegEx.py reports the date 13/01/2013 as valid for the mm/dd/yy format, which shouldn't be the case, because 13 is not a valid month.

The regex logic is wrong. (I modified the variable to take input at run time.)
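A stricter check could look like the sketch below. It is not the code from RegEx.py: the pattern, the function name `is_valid_date`, and the assumption of a four-digit year (as in 13/01/2013) are all illustrative. The regex constrains the month to 01-12 and the day to 01-31, and `datetime.strptime` then rejects dates that do not exist on the calendar.

```python
import re
from datetime import datetime

# Assumed format: mm/dd/yyyy with zero-padded month and day.
DATE_RE = re.compile(r"(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}")

def is_valid_date(text):
    """Return True only for a real calendar date written as mm/dd/yyyy."""
    # Shape check: month 01-12, day 01-31, four-digit year.
    if DATE_RE.fullmatch(text) is None:
        return False
    # Calendar check: rejects impossible dates such as 02/30/2013.
    try:
        datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        return False
    return True

if __name__ == "__main__":
    for candidate in ("13/01/2013", "01/13/2013", "02/30/2013"):
        print(candidate, is_valid_date(candidate))
```

With this check, 13/01/2013 is rejected because 13 is not a month, while 01/13/2013 is accepted.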

Where are the datasets for Chapter 5?

Chapter 2, WebScraping.py

The structure of the HTML on gold.org has changed. This illustrates the danger of web page scraping, and it also breaks the example given in WebScraping.py.
To make things more difficult, the HTML elements holding the prices no longer have 'id' attributes.

The result is an error: IndexError: list index out of range.

Changing the line where price is determined to:

price= scraping.findAll("dd",attrs={"class":"value"})[0].text

seems to work.

It might be useful to add that the output file is buffered, so it will take some time before something appears in it.
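If the delay is a problem, one option is to flush the file after each write. The sketch below is not taken from WebScraping.py; the function name `log_prices` and the sample values are hypothetical, and it only assumes the script keeps one file handle open while it appends prices.

```python
import os
import tempfile

def log_prices(out, prices):
    """Write one price per line and flush after each write."""
    for price in prices:
        out.write("{0}\n".format(price))
        # Push Python's internal buffer to the operating system so the
        # line is visible to other readers before the file is closed.
        out.flush()

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "prices.txt")
    out = open(path, "w")
    log_prices(out, ["1234.50", "1235.10"])  # made-up sample values
    # The file is still open for writing, yet the lines are already readable.
    with open(path) as check:
        print(check.read())
    out.close()
```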

Chapter5/DTW_Implementation.py. you only take the first pixel of every row

Chapter5/DTW_Implementation.py:

for fn in range(1,658):
    img = Image.open("ImgFolder\\{0}.jpg".format(fn))
    arr = array(img)
    list = []
    for n in arr: list.append(n[0][0])
    for n in arr: list.append(n[0][1])
    for n in arr: list.append(n[0][2])
    data[fn] = list

You are taking only the first pixel of every row and ignoring all the rest.
That does not make sense.

Also, in your book, Chapter 5, "Image similarity search", you say:

The trick is to turn the pixels of the image into a numerical sequence, as is shown in the following figure:

and you show a matrix with dij values (original image), and a vector with vi, but you don't explain how you compute those vi.
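For comparison, here is one possible way to build the sequence from every pixel instead of only the first one in each row. This is a sketch of my reading, not the book's method (which is not explained); the function name `image_to_sequence` is hypothetical. It keeps the channel order of the original loop (all red values, then green, then blue) and works on anything indexed as rows x columns x 3, such as the result of array(img) or a nested list.

```python
def image_to_sequence(pixels):
    """Flatten a rows x cols x 3 image into one list of channel values.

    The output holds every red value (row by row), then every green
    value, then every blue value.
    """
    sequence = []
    for channel in range(3):
        for row in pixels:
            for pixel in row:
                sequence.append(int(pixel[channel]))
    return sequence

if __name__ == "__main__":
    # A made-up 2 x 2 RGB image used only to show the ordering.
    tiny = [[(1, 2, 3), (4, 5, 6)],
            [(7, 8, 9), (10, 11, 12)]]
    print(image_to_sequence(tiny))
```

The sequence then has rows x cols x 3 values per image, so the DTW comparison gets slower as the images get larger.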

Question on Chapter 8

In your book, chapter 8, you mention:

However, to visualize all the possible features' combination we will need the binomial coefficient of the number of features. In this case, 13 features are equal to 78 different combinations. Due to this, it is mandatory to perform dimensionality reduction.

We would need to see in 13 dimensions to visualize 13 features. We can only see in 3 dimensions, so we need to perform dimensionality reduction.
So, where does the number 78 come from, and what does it mean?
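One plausible reading, which the quoted passage does not confirm, is that 78 is the number of distinct pairs of features, that is, the binomial coefficient "13 choose 2" = 13 x 12 / 2. That would be the number of two-dimensional scatter plots needed to show every feature against every other feature. A quick check (requires Python 3.8+ for math.comb):

```python
from itertools import combinations
from math import comb

n_features = 13

# Number of ways to choose 2 features out of 13.
print(comb(n_features, 2))

# The same count obtained by listing every unordered pair of feature indices.
pairs = list(combinations(range(n_features), 2))
print(len(pairs))
```

Both lines print 78.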

Chapter4, NaiveBayes.py, invalid formula

In Chapter4, NaiveBayes.py, you have:

    for c in c_categories:
        prob_c = float(c_categories[c])/float(c_texts)
        words = list_words(subject_line)
        prob_total_c = prob_c
        for p in words:
            if p in c_words:
                prob_p= float(c_words[p][c])/float(c_tot_words)
                prob_cond = prob_p/prob_c
                prob =(prob_cond * prob_p)/ prob_c
                prob_total_c = prob_total_c * prob

So this results in prob = (prob_p/prob_c)^2, which is incorrect.
I guess that you should update prob_total_c not with prob, but with prob_cond, or even prob_p.
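For reference, the usual naive Bayes score for a category c is P(c) multiplied by P(w | c) for each word w. The sketch below shows that form; it is not a patch for NaiveBayes.py. The function name, the argument names, and the sample counts are hypothetical, and the add-one (Laplace) smoothing is my addition so that an unseen word does not force the score to zero.

```python
def score_categories(words, category_counts, word_counts, words_per_category):
    """Return {category: P(c) * product of P(w | c)} for the given words.

    category_counts:    {category: number of training texts}
    word_counts:        {word: {category: occurrences of the word}}
    words_per_category: {category: total word occurrences in that category}
    """
    total_texts = sum(category_counts.values())
    vocabulary_size = len(word_counts)
    scores = {}
    for c in category_counts:
        score = category_counts[c] / total_texts  # prior P(c)
        for w in words:
            if w in word_counts:  # skip words never seen in training
                count = word_counts[w].get(c, 0)
                # Add-one smoothed estimate of P(w | c).
                score *= (count + 1) / (words_per_category[c] + vocabulary_size)
        scores[c] = score
    return scores

if __name__ == "__main__":
    # Made-up counts, for illustration only.
    category_counts = {"spam": 2, "ham": 2}
    word_counts = {"free": {"spam": 3, "ham": 0},
                   "meeting": {"spam": 0, "ham": 2}}
    words_per_category = {"spam": 3, "ham": 2}
    print(score_categories(["free"], category_counts,
                           word_counts, words_per_category))
```

For longer texts, summing logarithms of the probabilities instead of multiplying them avoids floating-point underflow.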
