-
numpy recap: reshape needed? -
pandas_03_indexing -> first exercise (density): not using bootstrap format
-
resample: remove note about older versions
-
biodiversity: occurrenceId columns -> mention that 'reset_index(drop=True)' also works, but here we use
df.index = np.arange(1,..)
because we want the index to start at one -
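A minimal sketch of the two options on a small made-up DataFrame. The `occurrenceId` column and its values are hypothetical, and the stop value `len(df) + 1` for `np.arange` is our assumption, since the note above elides it:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the biodiversity data after some filtering,
# leaving a non-contiguous index behind
df = pd.DataFrame({"occurrenceId": ["a", "b", "c"]}, index=[10, 42, 7])

# Option 1: fresh 0-based RangeIndex, the old index is discarded
df_zero = df.reset_index(drop=True)

# Option 2: explicit 1-based index (stop value len(df) + 1 is an assumption)
df_one = df.copy()
df_one.index = np.arange(1, len(df_one) + 1)

print(df_one)
```

`reset_index(drop=True)` is the shorter option when a 0-based index is acceptable; assigning an `np.arange` directly is needed when the numbering should start at 1.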
concat (air quality): The scenario of combining individual data sets with concat from a list is very useful. We should extend this with more options/showcases/alternatives:
- example where information is collected as key/value data in dictionary + pd.DataFrame({dict})
- example with the usage of {'': [], '': [],...} (a dictionary of lists)
- example with the usage of a dictionary of DataFrames
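A minimal sketch of the list, dictionary and dictionary-of-DataFrames variants, using hypothetical station names and made-up values rather than the actual air quality files:

```python
import pandas as pd

# Hypothetical per-station measurements (values are made up)
idx = pd.date_range("2020-01-01", periods=3, freq="D")
s_a = pd.Series([10.0, 12.0, 11.0], index=idx)
s_b = pd.Series([20.0, 21.0, 19.0], index=idx)

# 1) concat from a list: one column per Series, column names given via keys
from_list = pd.concat([s_a, s_b], axis=1, keys=["station_a", "station_b"])

# 2) key/value data collected in a dictionary -> pd.DataFrame(dict)
collected = {}
collected["station_a"] = s_a
collected["station_b"] = s_b
from_dict_of_series = pd.DataFrame(collected)

# 3) a dictionary of lists (all lists must have the same length)
from_dict_of_lists = pd.DataFrame(
    {"station": ["a", "a", "b"], "value": [10.0, 12.0, 20.0]}
)

# 4) a dictionary of DataFrames: the keys become the outer index level
frames = {
    "station_a": s_a.to_frame("value"),
    "station_b": s_b.to_frame("value"),
}
from_dict_of_frames = pd.concat(frames, names=["station", "date"])

print(from_dict_of_frames)
```

Variants 1 and 2 give the same wide table (one column per station); variant 4 gives a long table where the dictionary keys end up in the `station` index level.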
-
end of the notebook -> link to file / main (is this the workflow notebook?)
-
exercise: sorted air quality?
miscellaneous:
- ipython greedy completion?
- overview cheatsheet figure: hline/vline + fill_between
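A minimal matplotlib sketch for such a cheatsheet figure, with made-up data. In matplotlib, `axhline`/`axvline` draw lines spanning the whole axes, while `hlines`/`vlines` draw segments between given limits; which of the two the cheatsheet should show is still an open choice:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.axhline(0.5, color="grey", linestyle="--")  # horizontal line across the axes
ax.axvline(5, color="grey", linestyle=":")     # vertical line across the axes
ax.hlines(-0.5, xmin=2, xmax=4, color="red")   # horizontal segment from x=2 to x=4
# shade the area between the curve and 0 wherever the curve is positive
ax.fill_between(x, y, 0, where=y > 0, alpha=0.3)
```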
from ds-python-data-analysis.
-
%whos -> lists the variables in the current namespace
-
With respect to the drop_duplicates action in the bike count data: this should refer to an alternative to
df3 = df3.reset_index().drop_duplicates().set_index("index")
Should the comparison actually take the entire row into account, or should we just check the datetime index?
The latter would indeed allow a shorter solution, e.g. df3[~df3.index.duplicated()]. However, the result contains 1 record fewer than the entire-row comparison (data point: 2013-11-21 03:40:23 | 1 | 9 | OFF). Hence, the current solution contains 2 records for that datetime.
@jorisvandenbossche how do we handle this inconsistency between both initial data sets?
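A minimal sketch comparing the two approaches on a tiny made-up stand-in for `df3` (the timestamps and values are illustrative, not the real bike count data):

```python
import pandas as pd

# Made-up stand-in for df3: the first timestamp appears twice with a different
# 'active' value, the second timestamp appears twice with identical rows
ts = pd.to_datetime([
    "2013-11-21 03:40:23", "2013-11-21 03:40:23",
    "2013-11-21 03:50:00", "2013-11-21 03:50:00",
])
df3 = pd.DataFrame(
    {"north": [1, 1, 7, 7], "south": [9, 9, 15, 15],
     "active": [None, "OFF", None, None]},
    index=ts,
)

# Whole-row comparison: the unnamed index becomes an 'index' column so that it
# takes part in the comparison, then it is restored as the index
full_row = df3.reset_index().drop_duplicates().set_index("index")

# Index-only comparison, written in two ways that both keep the first row per timestamp
index_only = df3[~df3.index.duplicated()]
index_subset = df3.reset_index().drop_duplicates(subset=["index"]).set_index("index")

print(len(full_row), len(index_only), len(index_subset))  # 3 2 2
```

The whole-row version keeps both rows of the first timestamp because they differ in `active`, which reproduces the one-record difference described above.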
-
The difference apparently comes from dropping duplicates based on the index only:
>>> df3.reset_index().drop_duplicates().set_index("index").shape
(91831, 3)
>>> df3.reset_index().drop_duplicates(subset=['index']).set_index("index").shape
(91830, 3)
and that is due to a difference in the active column:
>>> df3[df3.index.duplicated(keep=False)].head()
| north | south | active |
| -- | -- | -- |
| 1 | 9 | NaN |
| 1 | 9 | OFF |
| 10 | 10 | NaN |
| 7 | 15 | NaN |
| 14 | 15 | NaN |
So dropping based on the index only is more correct anyway.
-
But on the other hand, when only using the index, you will not be aware of the difference in active for the 2013-11-21 03:40:23 timestamp...
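One way to keep the short index-based solution while still noticing such conflicts is to count, per timestamp, the distinct values in each column. A minimal sketch on a made-up stand-in for `df3` (values are illustrative only):

```python
import pandas as pd

# Made-up stand-in for df3 with one conflicting and one clean duplicate timestamp
ts = pd.to_datetime([
    "2013-11-21 03:40:23", "2013-11-21 03:40:23",
    "2013-11-21 03:50:00", "2013-11-21 03:50:00",
])
df3 = pd.DataFrame(
    {"north": [1, 1, 7, 7], "south": [9, 9, 15, 15],
     "active": [None, "OFF", None, None]},
    index=ts,
)

# Number of distinct values per column for each timestamp (missing values counted too)
n_distinct = df3.groupby(level=0).nunique(dropna=False)

# Timestamps whose duplicate rows disagree in at least one column
conflicting = n_distinct[(n_distinct > 1).any(axis=1)].index
print(list(conflicting))
```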
-
With respect to numpy recap: reshape needed? -> reshape is present in the notebook.
-
Do we use reshape somewhere in the case studies / pandas notebooks?
-
We should check this when reviewing the notebooks again...
-
Due to the new data provided by the city of Ghent, we can no longer use the drop_duplicates exercise, so this point is removed as well.