Comments (4)

psimm commented on June 30, 2024

Hi Nick, thanks for reaching out!

Yes, I agree that the blog post is outdated now and would be happy to make the edit you proposed.

Would you like to open a PR or would you like me to make the edit?

NickCrews commented on June 30, 2024

I think it would be easier if I left the actual PR to you, because I'm not sure of your formatting/linting/release process. But here is my suggestion for the content.

I found what I thought was the CSV source data online, but I am getting different results, so I think I must have started with slightly different data. If you want to re-run this with the real CSVs, or send me the real CSVs, that would be great!

UPDATE October 2023:

  • DuckDB is now a supported backend (along with many others), so performance should be very similar to DuckDB's.
  • Data can be loaded from and saved to files directly, e.g. with ibis.read_csv().
  • join(), clip(), and case() are well supported.
  • Ibis is much more popular and is now very actively maintained. There are more examples, better documentation, and a larger community. The ecosystem is still smaller than pandas', but perhaps comparable to polars'.
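```python
import ibis
from ibis import _  # deferred-expression helper: `_` refers to the table being operated on

# Execute and display expressions eagerly (handy in a REPL or notebook)
ibis.options.interactive = True

# read_csv loads each file through the default backend (DuckDB)
flights_ib = ibis.read_csv("flights.csv")
airlines_ib = ibis.read_csv("airlines.csv")
flights_ib
```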
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┓
┃ year  ┃ month ┃ day   ┃ dep_time ┃ dep_delay ┃ arr_time ┃ arr_delay ┃ carrier ┃ tailnum ┃ flight ┃ origin ┃ dest   ┃ air_time ┃ distance ┃ hour  ┃ minute ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ int64 │ int64    │ int64     │ int64    │ int64     │ string  │ string  │ int64  │ string │ string │ int64    │ int64    │ int64 │ int64  │
├───────┼───────┼───────┼──────────┼───────────┼──────────┼───────────┼─────────┼─────────┼────────┼────────┼────────┼──────────┼──────────┼───────┼────────┤
│  2013 │     6 │    30 │      940 │        15 │     1216 │        -4 │ VX      │ N626VA  │    407 │ JFK    │ LAX    │      313 │     2475 │     9 │     40 │
│  2013 │     5 │     7 │     1657 │        -3 │     2104 │        10 │ DL      │ N3760C  │    329 │ JFK    │ SJU    │      216 │     1598 │    16 │     57 │
│  2013 │    12 │     8 │      859 │        -1 │     1238 │        11 │ DL      │ N712TW  │    422 │ JFK    │ LAX    │      376 │     2475 │     8 │     59 │
│  2013 │     5 │    14 │     1841 │        -4 │     2122 │       -34 │ DL      │ N914DL  │   2391 │ JFK    │ TPA    │      135 │     1005 │    18 │     41 │
│  2013 │     7 │    21 │     1102 │        -3 │     1230 │        -8 │ 9E      │ N823AY  │   3652 │ LGA    │ ORF    │       50 │      296 │    11 │      2 │
│  2013 │     1 │     1 │     1817 │        -3 │     2008 │         3 │ AA      │ N3AXAA  │    353 │ LGA    │ ORD    │      138 │      733 │    18 │     17 │
│  2013 │    12 │     9 │     1259 │        14 │     1617 │        22 │ WN      │ N218WN  │   1428 │ EWR    │ HOU    │      240 │     1411 │    12 │     59 │
│  2013 │     8 │    13 │     1920 │        85 │     2032 │        71 │ B6      │ N284JB  │   1407 │ JFK    │ IAD    │       48 │      228 │    19 │     20 │
│  2013 │     9 │    26 │      725 │       -10 │     1027 │        -8 │ AA      │ N3FSAA  │   2279 │ LGA    │ MIA    │      148 │     1096 │     7 │     25 │
│  2013 │     4 │    30 │     1323 │        62 │     1549 │        60 │ EV      │ N12163  │   4162 │ EWR    │ JAX    │      110 │      820 │    13 │     23 │
│     … │     … │     … │        … │         … │        … │         … │ …       │ …       │      … │ …      │ …      │        … │        … │     … │      … │
└───────┴───────┴───────┴──────────┴───────────┴──────────┴───────────┴─────────┴─────────┴────────┴────────┴────────┴──────────┴──────────┴───────┴────────┘
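```python
(
    flights_ib.filter(
        [
            _.year == 2013,
            _.month == 1,
            _.arr_delay.notnull(),  # drop flights with no recorded arrival delay
        ]
    )
    # left join keeps flights whose carrier is missing from airlines.csv
    .join(airlines_ib, "carrier", how="left")
    # clip(lower=0) counts early arrivals as zero delay
    .select(arr_delay=_.arr_delay.clip(lower=0), airline=_.name)
    .group_by("airline")
    .agg(
        flights=_.count(),
        mean_delay=_.arr_delay.mean(),
    )
    .order_by(_.mean_delay.desc())
)
```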
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
┃ airline                  ┃ flights ┃ mean_delay ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
│ string                   │ int64   │ float64    │
├──────────────────────────┼─────────┼────────────┤
│ Hawaiian Airlines Inc.   │       3 │ 433.333333 │
│ ExpressJet Airlines Inc. │     401 │  28.319202 │
│ Frontier Airlines Inc.   │       8 │  25.125000 │
│ Alaska Airlines Inc.     │       2 │  20.500000 │
│ Endeavor Air Inc.        │     157 │  19.082803 │
│ Mesa Airlines Inc.       │       6 │  17.500000 │
│ Southwest Airlines Co.   │     108 │  15.833333 │
│ Envoy Air                │     224 │  13.254464 │
│ JetBlue Airways          │     407 │  13.147420 │
│ United Air Lines Inc.    │     459 │  12.111111 │
│ …                        │       … │          … │
└──────────────────────────┴─────────┴────────────┘
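To make the semantics of the query above concrete, here is a plain-Python sketch of the same steps on a handful of made-up rows. It is not the real flights.csv/airlines.csv data and not how ibis executes the query (ibis hands the work to its backend); the function name mean_delay_by_airline is hypothetical.

```python
from collections import defaultdict

# Made-up sample rows; NOT the real flights.csv / airlines.csv data.
flights = [
    {"year": 2013, "month": 1, "carrier": "AA", "arr_delay": 30},
    {"year": 2013, "month": 1, "carrier": "AA", "arr_delay": -10},
    {"year": 2013, "month": 1, "carrier": "DL", "arr_delay": 5},
    {"year": 2013, "month": 1, "carrier": "DL", "arr_delay": None},  # removed by notnull()
    {"year": 2013, "month": 2, "carrier": "DL", "arr_delay": 100},   # removed: wrong month
    {"year": 2013, "month": 1, "carrier": "ZZ", "arr_delay": 12},    # no matching airline
]
airlines = {"AA": "American Airlines Inc.", "DL": "Delta Air Lines Inc."}


def mean_delay_by_airline(flights, airlines):
    # filter(): keep January 2013 flights that have an arrival delay
    kept = [
        f for f in flights
        if f["year"] == 2013 and f["month"] == 1 and f["arr_delay"] is not None
    ]
    # left join + select(): an unmatched carrier gets a missing (None) airline name;
    # clip(lower=0): early arrivals count as zero delay
    groups = defaultdict(list)
    for f in kept:
        groups[airlines.get(f["carrier"])].append(max(f["arr_delay"], 0))
    # group_by().agg(): row count and mean clipped delay per airline
    rows = [
        {"airline": name, "flights": len(delays), "mean_delay": sum(delays) / len(delays)}
        for name, delays in groups.items()
    ]
    # order_by(desc()): largest average delay first
    rows.sort(key=lambda r: r["mean_delay"], reverse=True)
    return rows


for row in mean_delay_by_airline(flights, airlines):
    print(row)
```

The unmatched carrier "ZZ" ends up in a group whose airline is None, mirroring how a left join followed by group_by keeps flights with no airline match as a missing-value group.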

psimm commented on June 30, 2024

Thank you! That looks good. I'll update the article this weekend.

psimm commented on June 30, 2024

I've updated the article with your suggested changes and added a thank-you for the update. The resulting data frame with ibis is the same as with dplyr, pandas, etc., so it's all correct.

Thanks again!
