Comments (12)
2 things to fix in the notebook:
- new version for url_template:
  url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
- in
  weather_mar2012 = pd.read_csv("data/eng-hourly-03012012-03312012.csv", skiprows=15, index_col='Date/Time', parse_dates=True, encoding='latin1')
  remove header=True
Sent a PR.
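As a quick offline check of the new template, the placeholders can be filled with `str.format`. This is only a sketch; the year and month below are the March 2012 example used in the notebook, and nothing is downloaded.

```python
# The template from the comment above; {year} and {month} are str.format placeholders.
url_template = (
    "http://climate.weather.gc.ca/climate_data/bulk_data_e.html"
    "?stationID=5415&Year={year}&Month={month}"
    "&format=csv&timeframe=1&submit=%20Download+Data"
)

# Fill in March 2012 and inspect the resulting URL before passing it to pd.read_csv.
url = url_template.format(year=2012, month=3)
print(url)
```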
from pandas-cookbook.
Also, the encoding='latin1' should go (at least on Python 3).
Hello. I tried the updated code, but I got the following error:
File b'data/eng-hourly-03012012-03312012.csv' does not exist
Please help me, thanks!
At this point (July 2018), the following works in Python3:
In[]: url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
and:
In[]: url = url_template.format(month=3, year=2012)
weather_mar2012 = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, encoding='utf-8', header=0)
An important change, apart from the URL itself, is that header
accepts an integer (row number) instead of a boolean.
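A minimal sketch of that difference, using a tiny in-memory CSV instead of the real download (the sample rows are made up). It assumes a reasonably recent pandas, where a boolean `header` raises a `TypeError`:

```python
import io

import pandas as pd

# Made-up two-row sample standing in for the downloaded file.
csv_text = "Date/Time,Temp (C)\n2012-03-01 00:00,-5.5\n2012-03-01 01:00,-5.7\n"

# header=0 means "row 0 holds the column names".
df = pd.read_csv(io.StringIO(csv_text), index_col="Date/Time", parse_dates=True, header=0)
print(df.columns.tolist())

# A boolean is rejected by recent pandas versions.
try:
    pd.read_csv(io.StringIO(csv_text), header=True)
    bool_rejected = False
except TypeError as exc:
    bool_rejected = True
    print("TypeError:", exc)
```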
Because of the encoding change, we need to change this, as well:
In[]: weather_mar2012[u"Temp (°C)"].plot(figsize=(15, 5))
Also, the “Data Quality” column disappeared. This requires tweaks while working with columns.
In[]: weather_mar2012.columns = [ u'Year', u'Month', u'Day', u'Time', u'Temp (C)', u'Temp Flag', u'Dew Point Temp (C)', u'Dew Point Temp Flag', u'Rel Hum (%)', u'Rel Hum Flag', u'Wind Dir (10s deg)', u'Wind Dir Flag', u'Wind Spd (km/h)', u'Wind Spd Flag', u'Visibility (km)', u'Visibility Flag', u'Stn Press (kPa)', u'Stn Press Flag', u'Hmdx', u'Hmdx Flag', u'Wind Chill', u'Wind Chill Flag', u'Weather']
In[]: weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time'], axis=1)
In[]:
def download_weather_month(year, month):
    if month == 1:
        year += 1
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, header=0)
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time'], axis=1)
    return weather_data
When I use the URL template and the weather data to compare temperatures with the bikes data, the code does not seem to work. I modified the URL template and made the required changes in the later parts, and everything runs without errors. But when I try to output the first three rows of the data, it shows nothing.
Here's the code:
# getting weather data to look at temps
def get_weather_data(year):
    url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
    # airport station is 5415, hence that was used
    data_by_month = []
    for month in range(1, 13):
        url = url_template.format(year=year, month=month)
        weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, encoding='utf-8', header=0)
        weather_data.columns = list(map(lambda x: x.replace('\xb0', ''), weather_data.columns))
        # \xb0 is the degree symbol
        weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time'], axis=1)
        data_by_month.append(weather_data.dropna())
    return pd.concat(data_by_month).dropna(axis=1, how='all').dropna()

weather_data = get_weather_data(2012)
weather_data[:5]
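One possible cause of the empty output, offered as a guess rather than a diagnosis: `dropna()` with no arguments drops every row that has at least one missing value. If any column is NaN in every row, all rows disappear. A minimal sketch on made-up data (the all-empty `Hmdx` column here is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Made-up sample: 'Hmdx' is empty in every row.
df = pd.DataFrame({
    "Temp (C)": [-5.5, -5.7, -5.4],
    "Hmdx": [np.nan, np.nan, np.nan],
})

rows_dropped = df.dropna()        # default axis=0, how='any': removes rows with any NaN
cols_dropped = df.dropna(axis=1)  # removes columns that contain any NaN instead

print(len(rows_dropped))              # 0
print(cols_dropped.columns.tolist())  # ['Temp (C)']
```

If this is what happens, dropping the sparse columns first with `dropna(axis=1)`, as `download_weather_month` does, keeps the rows.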
url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
# url_template = 'https://raw.githubusercontent.com/kbridge/weather-data/main/weather_data_{year}_{month}.csv'
url = url_template.format(month=3, year=2012)
weather_mar2012 = pd.read_csv(url, index_col='Date/Time (LST)', parse_dates=True, encoding='utf-8-sig')
Summary:
- url_template is the same as @GillesMoyse posted. That url is too slow to load. You can change it to my mirror at gist.
- header=True is removed.
- skiprows=15 is removed because there is no metadata before the CSV data anymore.
- index_col is changed from 'Date/Time' to 'Date/Time (LST)'.
- encoding is changed from 'latin1' to 'utf-8-sig'. We need to use the -sig variant to skip the UTF-8 BOM; otherwise, the first column will contain weird characters.
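The effect of the `-sig` variant can be seen without pandas. The sketch below only shows how Python's two codecs decode bytes that start with a BOM; the sample bytes are made up to resemble the start of the CSV header:

```python
# Bytes as they might appear at the start of a CSV file saved with a UTF-8 BOM.
raw = b'\xef\xbb\xbf"Date/Time (LST)","Year"\n'

as_utf8 = raw.decode("utf-8")          # the BOM survives as the character U+FEFF
as_utf8_sig = raw.decode("utf-8-sig")  # the BOM is stripped

print(repr(as_utf8[:6]))
print(repr(as_utf8_sig[:6]))
```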
Before renaming the columns to eliminate ° characters, drop some unexpected new columns first:
weather_mar2012 = weather_mar2012.drop(['Longitude (x)', 'Latitude (y)', 'Station Name', 'Climate ID', 'Precip. Amount (mm)', 'Precip. Amount Flag'], axis=1)
And the renaming code becomes
weather_mar2012.columns = [
u'Year', u'Month', u'Day', u'Time', u'Temp (C)',
u'Temp Flag', u'Dew Point Temp (C)', u'Dew Point Temp Flag',
u'Rel Hum (%)', u'Rel Hum Flag', u'Wind Dir (10s deg)', u'Wind Dir Flag',
u'Wind Spd (km/h)', u'Wind Spd Flag', u'Visibility (km)', u'Visibility Flag',
u'Stn Press (kPa)', u'Stn Press Flag', u'Hmdx', u'Hmdx Flag', u'Wind Chill',
u'Wind Chill Flag', u'Weather']
Column Data Quality is removed because the new data doesn't contain the column anymore.
This also renames the column Time (LST) to Time.
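Since the set of columns in the download has changed more than once, a less brittle variant is sketched below. It is an alternative to the hard-coded list above, not what the notebook does: unwanted columns are dropped with `errors="ignore"` so that an absent column does not raise, and the remaining names are cleaned with string replacement. The sample frame, its values, and the `unwanted` list are made up.

```python
import pandas as pd

# Made-up one-row frame with a few of the column names mentioned in this thread.
df = pd.DataFrame({
    "Longitude (x)": [0.0],
    "Station Name": ["example"],
    "Temp (\xb0C)": [-5.5],
    "Time (LST)": ["00:00"],
})

# Columns we do not want; 'Data Quality' is absent here and is silently skipped.
unwanted = ["Longitude (x)", "Latitude (y)", "Station Name", "Climate ID", "Data Quality"]
df = df.drop(columns=unwanted, errors="ignore")

# Strip the degree sign and the ' (LST)' suffix from the remaining names.
df.columns = [c.replace("\xb0", "").replace(" (LST)", "") for c in df.columns]
print(df.columns.tolist())
```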
No need to drop the column Data Quality anymore:
-weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time', 'Data Quality'], axis=1)
+weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time'], axis=1)
temperatures.head is a method now, so you should call it with parentheses:
-print(temperatures.head)
+print(temperatures.head())
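A small illustration of the difference, using a made-up Series as a stand-in for the notebook's `temperatures` object:

```python
import pandas as pd

# Made-up values; the real 'temperatures' comes from the downloaded weather data.
temperatures = pd.Series([-5.5, -5.7, -5.4, -4.7], name="Temp (C)")

bound_method = temperatures.head   # no call: just a reference to the method
first_rows = temperatures.head(2)  # call: returns the first two rows

print(callable(bound_method))
print(first_rows.tolist())
```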
Change download_weather_month to this:
# mirror
# url_template = 'https://raw.githubusercontent.com/kbridge/weather-data/main/weather_data_{year}_{month}.csv'
def download_weather_month(year, month):
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, index_col='Date/Time (LST)', parse_dates=True, encoding='utf-8-sig')
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop([
        'Year',
        'Day',
        'Month',
        'Time (LST)',
        'Longitude (x)',
        'Latitude (y)',
        'Station Name',
        'Climate ID',
    ], axis=1)
    return weather_data
which was
def download_weather_month(year, month):
    if month == 1:
        year += 1
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, header=True)
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time', 'Data Quality'], axis=1)
    return weather_data
Sorry I have used this issue as if it is my own memo. But I will be glad if my comments help you.