Comments (9)

jorisvandenbossche commented on June 28, 2024
  • numpy recap: is reshape needed?

  • pandas_03_indexing -> first exercise (density): not using the bootstrap format

  • resample: remove note about older versions

  • biodiversity: occurrenceId columns -> note that 'reset_index(drop=True)' also works, but here we use df.index = np.arange(1,..) because we want the index to start at one

  • concat (air quality): The scenario of combining individual data sets with concat from a list is very useful. We should extend this with more options/showcases/alternatives:

    • example where information is collected as key/value data in dictionary + pd.DataFrame({dict})
    • example with the usage of {'': [], '': [],...} (a dictionary of lists)
    • example with the usage of a dictionary of DataFrames
  • end of the notebook -> link to file / main (is this the workflow notebook?)

  • exercise sorted air quality ?
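Two of the points above (concat from a dictionary, and an index that starts at one) can be sketched together. This is only an illustration under assumptions: the station names, column names and values are made up and are not the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical per-station measurements (made-up values, not the course data).
records = {
    "station_a": {"no2": [21.0, 25.0], "hour": [0, 1]},
    "station_b": {"no2": [30.0, 28.0], "hour": [0, 1]},
}

# A dictionary of lists becomes one DataFrame: the keys are the column names.
frames = {name: pd.DataFrame(columns) for name, columns in records.items()}

# A dictionary of DataFrames can be passed to concat directly; the keys
# end up as the outer level of the resulting index.
combined = pd.concat(frames, names=["station", "row"])

# Move the station key into a regular column and drop the old row numbers.
flat = combined.reset_index(level="station").reset_index(drop=True)

# reset_index(drop=True) numbers from 0; assign the index explicitly
# when the identifiers should start at one.
flat_from_one = flat.copy()
flat_from_one.index = np.arange(1, len(flat_from_one) + 1)
```

Passing the dictionary itself to concat keeps track of which rows came from which data set, which a plain list of DataFrames does not.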

miscellaneous:

  • ipython greedy completion?
  • overview cheatsheet figure: hline/vline + fill_between

from ds-python-data-analysis.

stijnvanhoey commented on June 28, 2024
  • %whos -> list of variables in the current namespace

stijnvanhoey commented on June 28, 2024

With respect to the drop_duplicates action in the bike count data: this should refer to an alternative to
df3 = df3.reset_index().drop_duplicates().set_index("index"): should the comparison actually take the entire row into account, or should we just check the datetime index?

The latter would indeed make a shorter solution possible, e.g. df3[~df3.index.duplicated()]. However, the result contains 1 record fewer than the entire-row comparison (data point: 2013-11-21 03:40:23 | 1 | 9 | OFF). Hence, the current solution contains 2 records for that datetime:
[image]

@jorisvandenbossche how do we handle this inconsistency between both initial data sets?
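A minimal sketch of the two approaches on a toy frame may help to see where the extra record comes from. The values are made up; only the repeated 2013-11-21 03:40:23 timestamp mimics the case described above, and the frame is named df rather than df3 to make clear it is not the real data.

```python
import numpy as np
import pandas as pd

# Made-up counts; the first timestamp repeats with a different "active" value,
# the second timestamp repeats with fully identical rows.
index = pd.to_datetime([
    "2013-11-21 03:40:23",
    "2013-11-21 03:40:23",
    "2013-11-21 03:45:00",
    "2013-11-21 03:45:00",
])
df = pd.DataFrame(
    {
        "north": [1, 1, 10, 10],
        "south": [9, 9, 10, 10],
        "active": [np.nan, "OFF", np.nan, np.nan],
    },
    index=index,
)

# Whole-row comparison: the index becomes a column so it takes part as well.
by_row = df.reset_index().drop_duplicates().set_index("index")

# Index-only comparison: keep the first row seen for each timestamp.
by_index = df[~df.index.duplicated()]

print(len(by_row), len(by_index))
```

On this toy frame the whole-row version keeps both rows of the first timestamp, because they differ in active, so it ends up one row longer than the index-only version.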

jorisvandenbossche commented on June 28, 2024

The difference apparently comes from dropping based only on the index:

>>> df3.reset_index().drop_duplicates().set_index("index").shape
(91831, 3)

>>> df3.reset_index().drop_duplicates(subset=['index']).set_index("index").shape
(91830, 3)

and that is due to a difference in the active column:

>>> df3[df3.index.duplicated(keep=False)].head()
  | north | south | active
-- | -- | -- | --
  | 1 | 9 | NaN
  | 1 | 9 | OFF
  | 10 | 10 | NaN
  | 7 | 15 | NaN
  | 14 | 15 | NaN

So dropping based only on the index is more correct anyway.
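As a small check of the reasoning above, the sketch below uses made-up data (not the bike counts) to show how keep=False exposes the repeated timestamps, and that the subset-based call and the boolean mask select the same rows.

```python
import numpy as np
import pandas as pd

# Made-up data with one repeated timestamp whose rows differ in "active".
df = pd.DataFrame(
    {"north": [1, 1, 7], "south": [9, 9, 15], "active": [np.nan, "OFF", np.nan]},
    index=pd.to_datetime(
        ["2013-11-21 03:40:23", "2013-11-21 03:40:23", "2013-11-21 03:50:00"]
    ),
)

# keep=False marks every member of a duplicated group, not only the repeats.
repeated = df[df.index.duplicated(keep=False)]

# Two spellings of "drop on the index only"; both keep the first occurrence.
via_subset = df.reset_index().drop_duplicates(subset=["index"]).set_index("index")
via_mask = df[~df.index.duplicated()]
```

The mask version avoids the round trip through reset_index and set_index, and it leaves the name of the index untouched.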

stijnvanhoey commented on June 28, 2024

But on the other hand, when using only the index, you will not be aware of the difference in active for the 2013-11-21 03:40:23 timestamp...
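One possible way to stay aware of such cases, sketched here on made-up data, is to list the timestamps whose repeated rows disagree before dropping anything. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up data: the first timestamp repeats with conflicting "active" values,
# the second repeats with identical rows, the third is unique.
df = pd.DataFrame(
    {
        "north": [1, 1, 10, 10, 7],
        "south": [9, 9, 10, 10, 15],
        "active": [np.nan, "OFF", np.nan, np.nan, np.nan],
    },
    index=pd.to_datetime([
        "2013-11-21 03:40:23",
        "2013-11-21 03:40:23",
        "2013-11-21 03:45:00",
        "2013-11-21 03:45:00",
        "2013-11-21 03:50:00",
    ]),
)

# Rows that share a timestamp with at least one other row.
repeated = df[df.index.duplicated(keep=False)]

# After removing exact copies, a timestamp that still appears more than once
# has rows that disagree in at least one column.
distinct = repeated.reset_index().drop_duplicates()
conflicts = distinct[distinct.duplicated(subset=["index"], keep=False)]

print(conflicts)
```

If conflicts is empty, dropping on the index alone loses no information; otherwise the listed rows can be reviewed first.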

stijnvanhoey commented on June 28, 2024

With respect to "numpy recap: is reshape needed?" -> reshape is present in the notebook.

jorisvandenbossche commented on June 28, 2024

With respect to "numpy recap: is reshape needed?" -> reshape is present in the notebook.

Do we use reshape somewhere in the case studies / pandas notebooks?

stijnvanhoey commented on June 28, 2024

We should check while rechecking the notebooks...

stijnvanhoey commented on June 28, 2024

Due to the new data provided by the city of Ghent, we cannot use the drop_duplicates exercise anymore, which removes the issue as well.
