tomaugspurger / effective-pandas
Source code for my collection of articles on using pandas.
Home Page: https://leanpub.com/effective-pandas
License: Creative Commons Attribution 4.0 International
@TomAugspurger, I found pandas-dev/pandas#22509 while chasing down an error in part 2. In "Chaining methods" you use the broken form, groupby().transform('rank'), to create one of the graphs (image not preserved).
If you use dep_time.rank() instead, you get the correct result.
GitHub just says there was a problem with the file; Jupyter provides the following error message:
Unreadable Notebook:
..\modern_2_method_chaining.ipynb NotJSONError('Notebook does not appear to be JSON: '\n\n\n\n\n<html lang="en...',)
To set up the same flight data as in part 1, the notebook uses the following code to define the data variable; however, I can't find the text file ('modern-1-url.txt') that it reads the data set location from. Where is this text file, or what is in it?
with open('modern-1-url.txt', encoding='utf-8') as f:
    data = f.read().strip()
These tutorials are all superb, and the author has clearly put a lot of effort into them. The code may have run fine in 2016, but now (February 2019) some of the code in chapter_4_tidy_data fails.
example:
g = sns.FacetGrid(tidy, col='team', col_wrap=6, hue='team', size=2)
g.map(sns.barplot, 'variable', 'rest');
Gives:
(Imgur link)
(rest.unstack()
     .query('away_team < 7')
     .rolling(7)
     .mean())
This gives all NaNs, and the plot fails.
The merge (many-to-one) at the end of the third notebook results in an empty data frame, because the weather data is for 2014 and the flights data for 2017. Your results show flight data for 2014, so I imagine you may be using a different dataset.
This may also have to do with the source data being changed; I also noticed that the underscores are removed from the flight data set, e.g. fl_date has become flightdate and unique_carrier has become uniquecarrier.
P.S. Thanks for sharing your well-written code and insights into pandas; they are a very welcome and useful read!
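One way to confirm a year mismatch like the one described above is to compare the date ranges of the two tables and rerun the merge with indicator=True. The following is a minimal sketch, not the notebook's code: the tiny frames, their values, and the column names (fl_date, origin, dep_delay, date, station, tmpf) are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's flights and weather tables.
flights = pd.DataFrame({
    "fl_date": pd.to_datetime(["2017-01-01", "2017-01-02"]),
    "origin": ["ORD", "DSM"],
    "dep_delay": [5.0, 12.0],
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-02"]),
    "station": ["ORD", "DSM"],
    "tmpf": [20.1, 15.3],
})

# An inner merge on keys that never overlap returns no rows.
inner = flights.merge(
    weather,
    left_on=["fl_date", "origin"],
    right_on=["date", "station"],
    how="inner",
)

# A left merge with indicator=True records, per row, whether a match was found.
checked = flights.merge(
    weather,
    left_on=["fl_date", "origin"],
    right_on=["date", "station"],
    how="left",
    indicator=True,
)
match_counts = checked["_merge"].value_counts()

print(len(inner))
print(flights["fl_date"].min(), weather["date"].max())
print(match_counts)
```

If every row comes back as left_only and the two date ranges do not overlap, the empty result is explained by the data rather than by the merge call.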
I couldn't find a resolution to this issue elsewhere, so I decided to post (I'm relatively new to GitHub, so apologies in advance if I'm doing something wrong).
Operating system: Windows 10
IDE: VS Code (1.37.0)
Python: 3.7.3
Distribution: Anaconda
ModuleNotFoundError Traceback (most recent call last)
in
----> 1 import pandas_datareader
ModuleNotFoundError: No module named 'pandas_datareader'
I think another option is to install a previous version of pandas, but at this point I'm not entirely sure what to do. I wanted to check before taking any further steps.
Thanks
Arguably it would have been nicer to see
from pandas_datareader import fred
at the top with other imports, so I could see dependencies more easily.
Feel free to close with no change of course.
The last cell of notebook 1 is throwing an exception, along with the message: UnsortedIndexError: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [4], lexsort depth 3'. I don't understand this, since cell 13 seems like it should be sorting the index.
Python 3.7; pandas 0.23.4 and 0.24.1.
Incidentally, upgrading to 0.24.1 also broke cell 6, which crashes with KeyError: "['tail_num'] not in index" (which I don't understand at all).
All of that said, I've now learned about IndexSlice, which is truly awesome! Thanks.
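For readers hitting the same UnsortedIndexError, a general check is whether the index is actually sorted at the point where the slice runs, since sort_index() returns a new frame unless the result is assigned. This is a small sketch under assumed data (the level names carrier and day and the delay values are hypothetical), not a diagnosis of the notebook itself.

```python
import pandas as pd

# Hypothetical two-level index, deliberately out of order.
idx = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 1), ("b", 1), ("a", 2)],
    names=["carrier", "day"],
)
df = pd.DataFrame({"delay": [10, 20, 30, 40]}, index=idx)

# The unsorted index is not monotonic, so range slicing on it is unsafe.
was_sorted = df.index.is_monotonic_increasing

# sort_index() returns a sorted copy; keep the result.
sorted_df = df.sort_index()
is_sorted = sorted_df.index.is_monotonic_increasing

# With a sorted index, IndexSlice can slice on both levels.
result = sorted_df.loc[pd.IndexSlice["a":"b", 1:1], :]

print(was_sorted, is_sorted)
print(result)
```

If an intermediate step (a reindex, concat, or set_index, for example) runs after the sort, the check can be repeated just before the slice to see whether the order was lost.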
I am running Windows 10 64-bit, Python 3.6, and Spyder. I have both pandas_datareader and pandas_datareader-0.5.0.dist-info installed in the same site-packages as pandas. When I attempt to import the module, I receive the following error message:
Python 3.6.2 |Anaconda, Inc.| (default, Sep 19 2017, 08:03:39) [MSC v.1900 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.
IPython 6.1.0 -- An enhanced Interactive Python.
import pandas as pd
import pandas_datareader as pdr
Traceback (most recent call last):
File "", line 1, in
import pandas_datareader as pdr
ModuleNotFoundError: No module named 'pandas_datareader'
How do I import pandas_datareader via another method, or get Python to recognize the module?
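A common cause of this kind of ModuleNotFoundError is that the package is installed for a different interpreter or environment than the one the IDE is running. The sketch below is only a diagnostic aid using the standard library; the helper name is_importable is made up for this example.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the running interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# Which interpreter is actually executing this code?
print(sys.executable)

# Can that interpreter find the package?
print(is_importable("pandas_datareader"))
```

If the second line prints False, installing the package with that same interpreter (for example, running the printed executable with `-m pip install pandas-datareader`) is one way to make sure it lands in the environment being used.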
When executing the cell with file download from the Transtats website, we do not download a zip file but an HTML page containing:
<head>
<script type="text/javascript" src="js/dot_ostr_analytics.js"></script>
</head>
<body>
start time ==> 5:59:28 PM<br>complete time==> 5:59:28 PM
</body></html>
Can you specify which table we should download from the government website?
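A download like this can be checked before it is opened as an archive, so that an HTML page fails with a clear message instead of a confusing unzip error. The following is a minimal sketch with in-memory sample data; the helper name looks_like_zip and the sample bytes are assumptions, and in the notebook the input would be whatever bytes the download returned.

```python
import io
import zipfile


def looks_like_zip(content: bytes) -> bool:
    """Return True if the bytes form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(content))


# Build a small zip archive in memory to stand in for a good download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("flights.csv", "fl_date,dep_delay\n2017-01-01,5\n")
zip_bytes = buffer.getvalue()

# An HTML body like the one reported above stands in for a bad download.
html_bytes = b"<html><body>start time ==> 5:59:28 PM</body></html>"

print(looks_like_zip(zip_bytes))   # True
print(looks_like_zip(html_bytes))  # False
```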
Hi,
I've just tried to use the first notebook in the series and it turns out the data is downloaded to a file called "flights.csv" which is then opened as "flights.csv.zip".
Suggestion: correct save filename to "flights.csv.zip". Might be especially useful for beginners...
Best regards,
Florian
This material appears to have outlived the data sources used. For example https://www.transtats.bts.gov/DownLoad_Table.asp doesn't work, and the weather data isn't working for me either. Perhaps you (or anyone else) might upload copies of the data to this repo if you still have them?
When running cell 2 of the first notebook I get:
SSLError: HTTPSConnectionPool(host='www.transtats.bts.gov', port=443): Max retries exceeded with url: /DownLoad_Table.asp?Table_ID=236&Has_Group=3&Is_Zipped=0 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
It looks like a problem related to the connection to the API.
Would it be possible to include the data in the repo?
Thanks
See conda-forge/feather-format-feedstock#1 for a hint on this. Installation is at best problematic - and I found it impossible.
I worked as follows:
Comment out all of the following:
import feather
%load_ext rpy2.ipython
%%R
suppressPackageStartupMessages(library(ggplot2))
library(feather)
write_feather(diamonds, 'diamonds.fthr')
And then replace
import feather
df = feather.read_dataframe('diamonds.fthr')
df.head()
with:
from ggplot import diamonds
# type(diamonds)  # dataframe...
df = diamonds # primitive!
df.head()
There is one much more mundane issue, which I'll raise separately.