Comments (4)
Hi Nick, thanks for reaching out!
Yes, I agree that the blog post is outdated now and would be happy to make the edit you proposed.
Would you like to open a PR or would you like me to make the edit?
from website.
I think it would be easier if I left the actual PR to you because I'm not sure of your formatting/linting/release process. But here is my suggestion for the content.
I found what I thought was the CSV source data online, but I am getting different results so I think I must have started with slightly different data. If you want to re-run this with the real CSVs, or send me the real CSVs, that would be great!
UPDATE October 2023:
- DuckDB is now a supported backend (along with many more), so performance should be very similar to DuckDB's.
- Data can now be loaded and saved directly (for example, with ibis.read_csv()).
- join(), clip(), and case() are well supported.
- Ibis is much more popular and now very actively maintained. There are more examples, better documentation, and a larger community. Still definitely less than pandas, but perhaps comparable to polars.
import ibis
from ibis import _
ibis.options.interactive = True
flights_ib = ibis.read_csv("flights.csv")
airlines_ib = ibis.read_csv("airlines.csv")
flights_ib
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┓
┃ year  ┃ month ┃ day   ┃ dep_time ┃ dep_delay ┃ arr_time ┃ arr_delay ┃ carrier ┃ tailnum ┃ flight ┃ origin ┃ dest   ┃ air_time ┃ distance ┃ hour  ┃ minute ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ int64 │ int64    │ int64     │ int64    │ int64     │ string  │ string  │ int64  │ string │ string │ int64    │ int64    │ int64 │ int64  │
├───────┼───────┼───────┼──────────┼───────────┼──────────┼───────────┼─────────┼─────────┼────────┼────────┼────────┼──────────┼──────────┼───────┼────────┤
│  2013 │     6 │    30 │      940 │        15 │     1216 │        -4 │ VX      │ N626VA  │    407 │ JFK    │ LAX    │      313 │     2475 │     9 │     40 │
│  2013 │     5 │     7 │     1657 │        -3 │     2104 │        10 │ DL      │ N3760C  │    329 │ JFK    │ SJU    │      216 │     1598 │    16 │     57 │
│  2013 │    12 │     8 │      859 │        -1 │     1238 │        11 │ DL      │ N712TW  │    422 │ JFK    │ LAX    │      376 │     2475 │     8 │     59 │
│  2013 │     5 │    14 │     1841 │        -4 │     2122 │       -34 │ DL      │ N914DL  │   2391 │ JFK    │ TPA    │      135 │     1005 │    18 │     41 │
│  2013 │     7 │    21 │     1102 │        -3 │     1230 │        -8 │ 9E      │ N823AY  │   3652 │ LGA    │ ORF    │       50 │      296 │    11 │      2 │
│  2013 │     1 │     1 │     1817 │        -3 │     2008 │         3 │ AA      │ N3AXAA  │    353 │ LGA    │ ORD    │      138 │      733 │    18 │     17 │
│  2013 │    12 │     9 │     1259 │        14 │     1617 │        22 │ WN      │ N218WN  │   1428 │ EWR    │ HOU    │      240 │     1411 │    12 │     59 │
│  2013 │     8 │    13 │     1920 │        85 │     2032 │        71 │ B6      │ N284JB  │   1407 │ JFK    │ IAD    │       48 │      228 │    19 │     20 │
│  2013 │     9 │    26 │      725 │       -10 │     1027 │        -8 │ AA      │ N3FSAA  │   2279 │ LGA    │ MIA    │      148 │     1096 │     7 │     25 │
│  2013 │     4 │    30 │     1323 │        62 │     1549 │        60 │ EV      │ N12163  │   4162 │ EWR    │ JAX    │      110 │      820 │    13 │     23 │
│     … │     … │     … │        … │         … │        … │         … │ …       │ …       │      … │ …      │ …      │        … │        … │     … │      … │
└───────┴───────┴───────┴──────────┴───────────┴──────────┴───────────┴─────────┴─────────┴────────┴────────┴────────┴──────────┴──────────┴───────┴────────┘
(
    flights_ib.filter(
        [
            _.year == 2013,
            _.month == 1,
            _.arr_delay.notnull(),
        ]
    )
    .join(airlines_ib, "carrier", how="left")
    .select(arr_delay=_.arr_delay.clip(lower=0), airline=_.name)
    .group_by("airline")
    .agg(
        flights=_.count(),
        mean_delay=_.arr_delay.mean(),
    )
    .order_by(_.mean_delay.desc())
)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
┃ airline                  ┃ flights ┃ mean_delay ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
│ string                   │ int64   │ float64    │
├──────────────────────────┼─────────┼────────────┤
│ Hawaiian Airlines Inc.   │       3 │ 433.333333 │
│ ExpressJet Airlines Inc. │     401 │  28.319202 │
│ Frontier Airlines Inc.   │       8 │  25.125000 │
│ Alaska Airlines Inc.     │       2 │  20.500000 │
│ Endeavor Air Inc.        │     157 │  19.082803 │
│ Mesa Airlines Inc.       │       6 │  17.500000 │
│ Southwest Airlines Co.   │     108 │  15.833333 │
│ Envoy Air                │     224 │  13.254464 │
│ JetBlue Airways          │     407 │  13.147420 │
│ United Air Lines Inc.    │     459 │  12.111111 │
│ …                        │       … │          … │
└──────────────────────────┴─────────┴────────────┘
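In case it helps when comparing results across the different libraries, here is a rough plain-Python sketch of what I understand the query above to compute: keep January 2013 rows with a non-null arr_delay, left-join the airline name, clip negative delays to 0, then count and average per airline. The rows, the airline names, and the mean_delay_by_airline helper are all made up for illustration; this is not the real flights data and not how Ibis executes the query.

```python
from collections import defaultdict

# Hypothetical rows, invented for illustration (not the real flights.csv).
flights = [
    {"year": 2013, "month": 1, "carrier": "AA", "arr_delay": 30},
    {"year": 2013, "month": 1, "carrier": "AA", "arr_delay": -10},   # early arrival
    {"year": 2013, "month": 1, "carrier": "DL", "arr_delay": 5},
    {"year": 2013, "month": 1, "carrier": "DL", "arr_delay": None},  # missing delay
    {"year": 2013, "month": 2, "carrier": "DL", "arr_delay": 90},    # wrong month
]
# Hypothetical carrier -> name lookup standing in for airlines.csv.
airlines = {"AA": "American Airlines Inc.", "DL": "Delta Air Lines Inc."}


def mean_delay_by_airline(flights, airlines):
    """Filter, left-join, clip at zero, then aggregate per airline."""
    delays = defaultdict(list)
    for row in flights:
        # Same three filter conditions as the query above.
        if row["year"] != 2013 or row["month"] != 1 or row["arr_delay"] is None:
            continue
        # Left join: an unmatched carrier keeps its row with a None name.
        name = airlines.get(row["carrier"])
        # clip(lower=0): early arrivals count as zero delay.
        delays[name].append(max(row["arr_delay"], 0))
    summary = [
        {"airline": name, "flights": len(vals), "mean_delay": sum(vals) / len(vals)}
        for name, vals in delays.items()
    ]
    # Highest mean delay first, like order_by(_.mean_delay.desc()).
    return sorted(summary, key=lambda r: r["mean_delay"], reverse=True)


result = mean_delay_by_airline(flights, airlines)
for row in result:
    print(row)
```

Note that because of the clipping, an early arrival pulls the mean down only to 0 rather than below it, which is one place where results can diverge if another implementation averages the raw delays.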
Thank you! That looks good. I'll update the article this weekend.
I've updated the article with your suggested changes and added a thanks for the update. The resulting data frame with Ibis is the same as with dplyr, pandas, etc., so it's all correct.
Thanks again!