minimally-sufficient-pandas's Introduction

Become an Expert!

If you are looking to become an expert, check out my books:

They are all extremely comprehensive and offer lots of exercises with detailed solutions.

Minimally Sufficient Pandas Guidelines

This notebook contains a summary of all the guidelines in this tutorial along with a list of attributes and methods that provide nearly all of the functionality of Pandas.

Minimally sufficient Pandas

  • is simple, explicit, straightforward, and boring

  • has one obvious way to accomplish a task

  • uses this obvious way every single time

  • is easier to retain in memory

  • is easier to read and debug by yourself and others

  • uses less of the library by eliminating methods that provide no additional functionality

  • runs into fewer Pandas bugs because there is less code

  • doesn't rely on being tricky to impress friends

  • is easier to use in production

  • Selecting Subsets of Data

    • Select a single column of data with the brackets
    • Do not use dot notation
    • Be explicit and use loc and iloc
    • Never use ix
    • No need to use at or iat
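
The selection guidelines above can be illustrated with a minimal sketch. The DataFrame and its column names are made up for this example; the `count` column is chosen because it collides with the `DataFrame.count` method, which is one reason dot notation is fragile.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 45, 27], "count": [3, 1, 2]},
    index=["a", "b", "c"],
)

# Brackets select a single column as a Series.
ages = df["age"]

# df.count is the DataFrame method, not the "count" column,
# so only the brackets reach the column.
counts = df["count"]

# loc selects by label; iloc selects by integer position.
by_label = df.loc["b", "age"]
by_position = df.iloc[1, 1]

# The same two indexers handle multi-row, multi-column selections.
rows_by_label = df.loc[["a", "c"], ["name", "age"]]
rows_by_position = df.iloc[[0, 2], [0, 1]]
```

Scalar lookups such as `df.loc["b", "age"]` cover what `at` and `iat` would otherwise be used for.
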
  • Handling the SettingWithCopyWarning

    • Know the three cases when it appears
      • Correct assignment with side effects
      • No assignment
      • Correct assignment without side effects
    • To handle the warning, you will be in one of two scenarios
      • You want to work with a new independent DataFrame - use the copy method
      • You want to work with the original DataFrame - assign data with a single indexer, loc, and avoid chained indexing
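
A minimal sketch of the two scenarios, using a made-up DataFrame. The chained form appears only in a comment because it is the pattern to avoid.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"city": ["Oslo", "Rome", "Lima"], "temp": [4, 18, 22]})

# Scenario 1: you want a new, independent DataFrame. Call copy explicitly.
warm = df[df["temp"] > 10].copy()
warm["temp"] = warm["temp"] + 1  # changes only `warm`, never `df`

# Scenario 2: you want to change the original. Use a single loc call that
# names both the rows and the column, instead of chained indexing such as
#     df[df["temp"] > 10]["temp"] = 0
df.loc[df["temp"] > 10, "temp"] = 0
```
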
  • Method Duplication

    • Many methods are aliases or provide no extra functionality. Only use one
    • All operators have methods. Only use methods when necessary
    • Always use Pandas methods and not builtin Python functions
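
The three points above can be sketched as follows. The Series values are invented for the example, and `isna`/`isnull` and `+`/`add` are just one pair of duplicates each; the same reasoning applies to the others.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, None, 4.0])

# isna and isnull are aliases that return the same mask; pick one and keep it.
aliases_match = s.isna().equals(s.isnull())

# The + operator and the add method give the same result here, so prefer +.
plus_operator = s + 1
plus_method = s.add(1)

# A method is necessary only when you need an option the operator lacks,
# such as fill_value for labels missing from one side.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10], index=["x"])
with_operator = a + b                  # "y" has no partner and becomes NaN
with_method = a.add(b, fill_value=0)   # "y" is computed as 2 + 0

# The pandas sum method skips missing values by default; builtin sum does not.
pandas_total = s.sum()
builtin_total = sum(s)
```
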
  • Say No to apply

    • apply is an automated for loop that passes each column or row to a user-defined function
    • Use apply as a method of last resort
    • Using apply with axis='columns' is one of the slowest operations in all of Pandas
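
As a rough illustration of why apply is a last resort, the sketch below computes the same result twice on made-up data: once row by row with apply, and once with ordinary column arithmetic. No timings are shown; the point is only that the vectorized form exists and is shorter.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"price": [2.0, 3.5, 4.0], "qty": [10, 4, 5]})

# Last resort: apply with axis="columns" calls the function once per row.
slow = df.apply(lambda row: row["price"] * row["qty"], axis="columns")

# Preferred: arithmetic on whole columns, with no Python-level loop.
fast = df["price"] * df["qty"]
```
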
  • Standardizing groupby

    • Know the three components
      • Grouping columns
      • Aggregating columns
      • Aggregating functions
    • Use the syntax `df.groupby('grouping columns').agg({'aggregating column': 'aggregating function'})`
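
A minimal sketch of this syntax with the three components labeled. The DataFrame and its column names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "dept": ["ops", "ops", "dev", "dev"],
        "salary": [50, 70, 90, 110],
        "age": [30, 40, 25, 35],
    }
)

# Grouping column: "dept"
# Aggregating columns: "salary" and "age"
# Aggregating functions: "mean" and "max"
result = df.groupby("dept").agg({"salary": "mean", "age": "max"})
```

The dictionary keeps each aggregating column next to its function, so the call reads the same way regardless of how many columns are aggregated.
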
  • Handling a MultiIndex

    • A MultiIndex makes selections and further processing difficult
    • I suggest having a single level index
    • Rename the columns and reset the index after a groupby
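
One way to follow this advice is sketched below on made-up data. Grouping by two columns and applying two functions produces a MultiIndex on both the rows and the columns; assigning flat column names and calling `reset_index` returns a single-level result. The new column names are arbitrary choices for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "dept": ["ops", "ops", "dev", "dev"],
        "level": ["jr", "sr", "jr", "sr"],
        "salary": [50, 70, 90, 110],
    }
)

multi = df.groupby(["dept", "level"]).agg({"salary": ["min", "max"]})

# Both axes carry a MultiIndex at this point.
rows_were_multi = isinstance(multi.index, pd.MultiIndex)
cols_were_multi = isinstance(multi.columns, pd.MultiIndex)

# Rename the columns to flat names, then move the group labels
# out of the index and back into regular columns.
multi.columns = ["min_salary", "max_salary"]
flat = multi.reset_index()
```
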
  • Say no to apply with groupby

    • Using apply with groupby can be extremely slow
    • Call any method that does not depend on the group outside of the custom function
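
A weighted mean per group is one case where this applies. In the sketch below (made-up data and column names), the multiplication does not depend on the group, so it can be done once on whole columns before grouping, which leaves only built-in aggregations for groupby.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b"],
        "score": [1.0, 3.0, 2.0, 6.0],
        "weight": [1, 3, 1, 1],
    }
)


def weighted_mean(group):
    # Everything happens inside the custom function, once per group.
    return (group["score"] * group["weight"]).sum() / group["weight"].sum()


slow = df.groupby("team")[["score", "weight"]].apply(weighted_mean)

# The multiplication is independent of the group, so do it up front
# and let groupby perform only the sums.
totals = (
    df.assign(weighted=df["score"] * df["weight"])
    .groupby("team")
    .agg({"weighted": "sum", "weight": "sum"})
)
fast = totals["weighted"] / totals["weight"]
```
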
  • Similarity between groupby, pivot_table, crosstab
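
The similarity can be seen by computing the same totals three ways. This is only a sketch on invented data: `groupby` returns the result in long form, while `pivot_table` and `crosstab` return the same numbers with one grouping column spread across the columns.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west", "west"],
        "product": ["x", "y", "x", "x", "y"],
        "sales": [10, 20, 30, 40, 50],
    }
)

# Long form: one row per (region, product) pair.
grouped = df.groupby(["region", "product"]).agg({"sales": "sum"})

# Wide form: regions as rows, products as columns.
pivoted = df.pivot_table(
    index="region", columns="product", values="sales", aggfunc="sum"
)

# crosstab takes the Series themselves rather than column names.
crossed = pd.crosstab(
    index=df["region"], columns=df["product"], values=df["sales"], aggfunc="sum"
)
```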

  • Similarity between melt, pivot, stack, unstack
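
A short sketch of how these four methods pair up, using a made-up table of temperatures. `melt` and `stack` both turn wide data into long data, while `pivot` and `unstack` go the other way; the main difference is whether the identifying values live in columns or in the index.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
wide = pd.DataFrame({"city": ["Oslo", "Rome"], "jan": [1, 8], "jul": [17, 25]})

# Wide to long: melt keeps identifiers as columns, stack keeps them in the index.
melted = wide.melt(id_vars="city", var_name="month", value_name="temp")
stacked = wide.set_index("city").stack()

# Long to wide: pivot reads from columns, unstack reads from the index.
back_from_melt = melted.pivot(index="city", columns="month", values="temp")
back_from_stack = stacked.unstack()
```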

Minimal set of DataFrame attributes and methods

Below is a short list of DataFrame attributes and methods that gives you maximum coverage of the library.

  • T
  • abs
  • all
  • any
  • append (deprecated in pandas 1.4 and removed in 2.0; use pd.concat instead)
  • asfreq
  • astype
  • clip
  • columns
  • copy
  • corr
  • count
  • cov
  • cummax
  • cummin
  • cumprod
  • cumsum
  • describe
  • diff
  • drop
  • drop_duplicates
  • dropna
  • dtypes
  • equals
  • expanding
  • fillna
  • groupby
  • head
  • idxmax
  • idxmin
  • iloc
  • index
  • interpolate
  • isin
  • isna
  • loc
  • max
  • mean
  • median
  • melt
  • merge
  • min
  • mode
  • nlargest
  • notna
  • nsmallest
  • nunique
  • pct_change
  • pivot_table
  • plot
  • prod
  • quantile
  • rank
  • rename
  • replace
  • resample
  • reset_index
  • rolling
  • round
  • sample
  • select_dtypes
  • shape
  • shift
  • sort_index
  • sort_values
  • std
  • sum
  • tail
  • to_csv
  • to_sql
  • values
  • var
