
Collection of errors / suggestions about ds-python-data-analysis (closed, 9 comments)

jorisvandenbossche commented on June 28, 2024

Collection of errors / suggestions

from ds-python-data-analysis.

Comments (9)

jorisvandenbossche commented on June 28, 2024

numpy recap: is reshape needed?
pandas_03_indexing -> first exercise (density): not using bootstrap format
resample: remove note about older versions
biodiversity: occurrenceId column -> note that 'reset_index(drop=True)' also works, but here we use df.index = np.arange(1,..) because we want the numbering to start at one
concat (air quality): The scenario of combining individual data sets with concat from a list is very useful. We should extend this with more options/showcases/alternatives:
- example where information is collected as key/value data in a dictionary + pd.DataFrame({dict})
- example using {'': [], '': [], ...} (a dictionary of lists)
- example using a dictionary of DataFrames
end of the notebook -> link to file / main (is this the workflow notebook?)
exercise on the sorted air quality data?
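For the biodiversity item, a minimal sketch of the two options, assuming a small hypothetical DataFrame in place of the real survey data (the column name and values are made up): reset_index(drop=True) yields a fresh 0-based index, while assigning np.arange(1, len(df) + 1) makes the numbering start at one.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the biodiversity data; column and values are illustrative only
df = pd.DataFrame({"species": ["a", "b", "c"]}, index=[10, 42, 7])

# Option 1: discard the old index and get a default 0-based RangeIndex
df_zero = df.reset_index(drop=True)

# Option 2: assign the index explicitly so the numbering starts at one
df_one = df.copy()
df_one.index = np.arange(1, len(df_one) + 1)
```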
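The concat alternatives listed above could be showcased along these lines. This is a sketch with hypothetical station names and made-up values, not the air quality notebook's actual data or code; it contrasts a list of DataFrames, key/value data in a dictionary, a dictionary of lists and a dictionary of DataFrames.

```python
import pandas as pd

# Hypothetical measurements per station; names and values are illustrative only
raw = {
    "BETR801": [20.0, 25.0, 30.0],
    "FR04037": [10.0, 15.0, 20.0],
}

# 1. List of DataFrames combined with pd.concat
frames = []
for station, values in raw.items():
    frames.append(pd.DataFrame({"station": station, "no2": values}))
from_list = pd.concat(frames, ignore_index=True)

# 2. Key/value data collected in a dictionary, then pd.DataFrame({...})
means = {}
for station, values in raw.items():
    means[station] = sum(values) / len(values)
from_key_value = pd.DataFrame({"mean_no2": means})  # inner dict keys become the index

# 3. Dictionary of lists ({'col': [], 'col': [], ...}), filled row by row
records = {"station": [], "mean_no2": []}
for station, values in raw.items():
    records["station"].append(station)
    records["mean_no2"].append(sum(values) / len(values))
from_dict_of_lists = pd.DataFrame(records)

# 4. Dictionary of DataFrames: pd.concat uses the keys as the outer index level
per_station = {station: pd.DataFrame({"no2": values}) for station, values in raw.items()}
from_dict_of_frames = pd.concat(per_station, names=["station"])
```

The dictionary-of-lists variant avoids building many small DataFrames, while the dictionary-of-DataFrames variant keeps the origin of each row in the index without an extra column.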

Miscellaneous:

IPython greedy completion?
overview cheatsheet figure: hline/vline + fill_between
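For the cheatsheet figure, a possible matplotlib sketch of horizontal/vertical lines and fill_between; the data and styling are illustrative assumptions. axhline/axvline span the whole axis, whereas hlines/vlines draw segments between explicit limits.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.axhline(0.5, color="grey", linestyle="--")       # horizontal line across the full x-range
ax.axvline(np.pi, color="grey", linestyle=":")      # vertical line across the full y-range
ax.hlines(-0.5, xmin=2, xmax=6, colors="red")       # horizontal segment between x=2 and x=6
ax.vlines([1, 9], ymin=-1, ymax=1, colors="green")  # vertical segments at x=1 and x=9
ax.fill_between(x, y, 0.5, where=y > 0.5, alpha=0.3)  # shade where the curve exceeds 0.5
ax.legend()
```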

stijnvanhoey commented on June 28, 2024

%whos -> list of variables in the current namespace

stijnvanhoey commented on June 28, 2024

With respect to the drop_duplicates action in the bike count data, this concerns the alternative to
df3 = df3.reset_index().drop_duplicates().set_index("index"): should the comparison actually take the entire row into account, or should we only check the datetime index?

Checking only the index would indeed allow a shorter version, e.g. df3[~df3.index.duplicated()]. However, the result contains 1 record fewer than the entire-row comparison (data point: 2013-11-21 03:40:23 | 1 | 9 | OFF). Hence, the current solution contains 2 records for that timestamp.

@jorisvandenbossche how do we handle this inconsistency between both initial data sets?
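To make the question concrete, here is a small reproduction on a toy stand-in for df3. The timestamps and counts are made up, except that one timestamp has two rows differing only in active, mirroring the 2013-11-21 03:40:23 case. It compares the full-row drop, the index-only drop and the shorter index.duplicated variant.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the bike count data; values are made up for illustration
index = pd.to_datetime([
    "2013-11-21 03:30:00",
    "2013-11-21 03:30:00",  # exact duplicate of the previous row
    "2013-11-21 03:40:23",
    "2013-11-21 03:40:23",  # same timestamp, but "active" differs
    "2013-11-21 03:50:00",
])
df3 = pd.DataFrame(
    {
        "north": [5, 5, 1, 1, 3],
        "south": [7, 7, 9, 9, 4],
        "active": [np.nan, np.nan, np.nan, "OFF", np.nan],
    },
    index=index,
)

# (a) entire-row comparison: the unnamed index becomes an "index" column first
full_row = df3.reset_index().drop_duplicates().set_index("index")

# (b) comparison on the datetime index only
index_only = df3.reset_index().drop_duplicates(subset=["index"]).set_index("index")

# (c) shorter equivalent of (b): keep the first row for each timestamp
index_only_short = df3[~df3.index.duplicated()]
```

Note that drop_duplicates treats NaN values as equal, so the two all-matching rows collapse in (a) while the NaN/OFF pair survives.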

jorisvandenbossche commented on June 28, 2024

The difference comes from dropping based only on the index:

>>> df3.reset_index().drop_duplicates().set_index("index").shape
(91831, 3)

>>> df3.reset_index().drop_duplicates(subset=['index']).set_index("index").shape
(91830, 3)

and that is due to a difference in the active column:

>>> df3[df3.index.duplicated(keep=False)].head()
| north | south | active |
| -- | -- | -- |
| 1 | 9 | NaN |
| 1 | 9 | OFF |
| 10 | 10 | NaN |
| 7 | 15 | NaN |
| 14 | 15 | NaN |

So dropping based only on the index is more correct anyway.

stijnvanhoey commented on June 28, 2024

But on the other hand, when using only the index, you will not notice the difference in active for the 2013-11-21 03:40:23 timestamp...
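One way to keep that information visible, sketched on made-up toy data: first drop exact duplicate rows, then list the timestamps that still occur more than once, since those are exactly the rows whose values disagree (here in active). Only after inspecting them would you keep one row per timestamp.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the bike count data; all values are made up for illustration
index = pd.to_datetime([
    "2013-11-21 03:30:00",
    "2013-11-21 03:30:00",  # exact duplicate of the previous row
    "2013-11-21 03:40:23",
    "2013-11-21 03:40:23",  # same timestamp, but "active" differs
    "2013-11-21 03:50:00",
])
df3 = pd.DataFrame(
    {
        "north": [5, 5, 1, 1, 3],
        "south": [7, 7, 9, 9, 4],
        "active": [np.nan, np.nan, np.nan, "OFF", np.nan],
    },
    index=index,
)

# Step 1: drop rows that are identical in every column, including the timestamp
deduplicated = df3.reset_index().drop_duplicates().set_index("index")

# Step 2: timestamps that still occur more than once carry conflicting values
conflicts = deduplicated[deduplicated.index.duplicated(keep=False)]
print(conflicts)

# Step 3: once inspected, keep a single row per timestamp
cleaned = deduplicated[~deduplicated.index.duplicated(keep="first")]
```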

stijnvanhoey commented on June 28, 2024

With respect to numpy recap: is reshape needed? -> reshape is present in the notebook.

jorisvandenbossche commented on June 28, 2024

With respect to numpy recap: is reshape needed? -> reshape is present in the notebook.

Do we use reshape somewhere in the case studies / pandas notebooks?

stijnvanhoey commented on June 28, 2024

We should check this when going through the notebooks again...

stijnvanhoey commented on June 28, 2024

Due to the new data provided by the city of Ghent, we can no longer use the drop_duplicates exercise, so this point is removed as well.
