pola-rs / polars-book
Book documentation of the Polars DataFrame library
License: MIT License
Creating new columns in Polars is not very intuitive when coming from pandas.
Hence I would recommend adding the content of this Stack Overflow answer to the Coming from Pandas guide.
https://stackoverflow.com/questions/72245243/polars-how-to-add-a-column-with-numerical/72245435#72245435
Especially the part on how to add a list:
my_list = [10, 20, 30, 40, 50]
df.with_column(pl.Series(name="col_list", values=my_list))
This would fit well into the Column assignment section.
In the Coming from Pandas chapter, would you consider including an example that compares manipulation of custom data types? The point here is to highlight that Polars is as handy as pandas when dealing with different data types.
For instance
data = {"x": [1, 2, 3], "d": [{"k": True}, {"k": False}, {"k": True, "l": [1, "2"]}]}
Pandas
(
    pd.DataFrame(data)
    .assign(y=lambda df: df.apply(lambda row: [row["x"], len(row["d"])], axis=1))
    .loc[lambda df: df["d"].apply(lambda d: d["k"])]
)
Polars
(
    pl.DataFrame(data)
    .with_column(pl.map(["x", "d"], lambda ls: pl.Series([[x, len(d)] for x, d in zip(*ls)])).alias("y"))
    .filter(pl.col("d").map(lambda s: pl.Series([d["k"] for d in list(s)])))
)
In a few of the contexts subsections, like this one (and the ones below), the code snippet is not showing for me.
I think the line number needs to be changed (to 4) here and elsewhere, but can't run to check right now.
At least the following sections seem to have broken/missing images (either a tiny square image or a plain old 404):
I assume the link should point here
Will create a PR to fix
There is no mention of concat or join in the current book. Propose adding a section to howcani with a first version of this.
This issue covers some of the feedback I have as a new (and currently limited) user of Polars. There's a lot to unpack for these proposals, so I'll start with some context for a foundation.
I'm a long time Pandas user who's worked on projects ranging from discovery work to implementing packaged compute pipelines in Python. I love what Pandas can do, but not how it's designed and often used. This is a major factor behind why I've grown interested in Polars and hope to get more involved to help polish and promote it as the viable DataFrame library it is.
Before listing my feedback I want to establish some core assumptions behind my perspective.
First I believe that (1) as an onboarding user the onus is on you to learn the library. (2) Sufficient documentation isn't absolutely necessary to learn Polars, but it will scale Polars' ability to onboard new users (in turn hopefully surrounding the project with more support), and thus should be prioritized within the context of documentation. (3) The API design is a primary feature of a DataFrame library and is a great selling point for a library like Polars. (4) Some people are thorough and will read through the documentation entirely, while others will look for quicker ways to get hands on. IME some people are just hands on and once you get them started it's easier to guide them through more details. It's really difficult to incentivize these users to read through the docs in their entirety without being more creative with the presentation/flow of the docs. FWIW I started out by reading through the entire user guide.
I'm putting this issue together to gauge interest. Please feel free to tear this apart.
The walkthrough could be composed of the following:
This gives the Cookbook a chance to provide surface-level explanations for certain decisions Polars makes through a relatable medium that includes pointers, recommendations, or just links to more relevant content. To me this is closer to actually guiding users through their onboarding rather than just giving them a topic-by-topic guide.
Edit: I'm getting this out to get this out. Reading this over I want to start with clearly defining directions for User Guide content, Cookbook content, and Reference content. I think that's the common denominator here.
https://pola-rs.github.io/polars-book/user-guide/howcani/io/postgres.html
https://github.com/pola-rs/polars-book/blob/master/user_guide/src/howcani/io/postgres.md
Some typos:
cur = conn.cursort() should be cur = conn.cursor()
INSERT INTO ({}) VALUES({}); should be INSERT INTO {}({}) VALUES({});
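To make the second fix concrete, here is a small sketch of how the corrected template fills in. The helper name build_insert, the table name, and the column names are hypothetical, and the %s placeholders assume a DB-API driver that uses that parameter style (psycopg2 does). No database connection is involved.

```python
def build_insert(table, columns):
    """Return an INSERT statement with one %s placeholder per column."""
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    # Corrected template: the first {} is the table name, which the
    # original "INSERT INTO ({}) VALUES({});" left out.
    return "INSERT INTO {}({}) VALUES({});".format(table, column_list, placeholders)


# Hypothetical table and column names, for illustration only.
query = build_insert("my_table", ["a", "b"])
print(query)
```

Because the table and column names are interpolated with str.format, this pattern is only reasonable when they come from trusted code; the row values themselves should still be passed to the driver as parameters.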
There are a number of different topics for time series. I propose an intro page with a table of contents and pages for:
Any thoughts?
Hi everyone. I have developed alternative documentation using mkdocs, Google Colab, and GitHub Actions. The repository lives at fralfaro/polars-book (documentation: link). Each chapter has a Jupyter notebook connected to Google Colab. The project covers only some chapters right now.
This documentation is similar to the mdBook version, since mkdocs also uses .md files. The main difference is that it can compile .ipynb files. This is a benefit, because you can use an .ipynb file to replicate an example, and it is not necessary to use this kind of syntax:
{{#include ../examples/expressions/window_1.py:0:}}
{{#include ../outputs/expressions/window_1.txt}}
For this reason, it is not necessary to have a folder for the python output.
I hope this is a good alternative. If you have any questions, happy to answer them!
Copied from: pola-rs/polars#2540
Hi,
I was going through the user guide (https://pola-rs.github.io/polars-book/user-guide/index.html) to get a feel for the library and ran into some discrepancies I thought you might want to know about. I wasn't sure how to provide this information otherwise, so here is just an overview of the things I noticed:
Cheers,
Pieter
Repurposing to yank Docker to prevent confusion.
As a new contributor my number one goal on this repo is to get set up. I like the idea of using a Docker image, but I'm not sure if the README provides complete instructions. I'd like to use Docker more, but my experience with it is limited to using it for a couple projects in the past.
Note the mdBook executable is downloaded and not compiled to fasten the building of the image.
In order to compile mdBook wouldn't Rust need to be installed (along with cargo) on the image?
As mentioned on Discord
Run black --check .
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.8.12/x64/bin/black", line 8, in <module>
sys.exit(patched_main())
File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/black/__init__.py", line 6606, in patched_main
patch_click()
File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/black/__init__.py", line 6595, in patch_click
from click import _unicodefun # type: ignore
ImportError: cannot import name '_unicodefun' from 'click' (/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/click/__init__.py)
Documenting this to either pick up myself or for someone else to grab it before me.
TBD
A section on how to create a DataFrame in code (Rust) would be great.
There is some material in the rustdocs on Series etc., but I believe a section in the book makes sense.
Also, neither the rustdocs nor the book shows an example of how to create a series/column of type List.
In Coming From Pandas documentation, the missing_data link is dead.
repost from pola-rs/polars#4170
When clicking on the "Next Chapter" button (the big arrow on the right) from this page: https://pola-rs.github.io/polars-book/user-guide/dsl/intro.html, it goes back to that same page, while it should be going to https://pola-rs.github.io/polars-book/user-guide/dsl/contexts.html
This happens when the chapter page and first sub chapter page are the same (in SUMMARY.md):
- [Polars expressions](dsl/intro.md)
    - [Expressions](dsl/intro.md)
It might be a bug (or feature) in mdBook, but couldn't find anything about it.
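Whether or not mdBook treats this as a bug, the situation can be detected ahead of time. The sketch below scans SUMMARY.md text for link targets that occur more than once. The function name and the extra Contexts entry in the sample are illustrative assumptions, and the regular expression only handles simple markdown links.

```python
import re
from collections import Counter

# Matches the target of a markdown link such as [Title](dsl/intro.md).
LINK_TARGET = re.compile(r"\[[^\]]*\]\(([^)]+)\)")


def duplicate_targets(summary_text):
    """Return link targets that appear more than once, sorted."""
    counts = Counter(LINK_TARGET.findall(summary_text))
    return sorted(target for target, n in counts.items() if n > 1)


# Sample mirroring the two entries reported above, plus one
# illustrative extra entry that is not duplicated.
sample = """\
- [Polars expressions](dsl/intro.md)
    - [Expressions](dsl/intro.md)
    - [Contexts](dsl/contexts.md)
"""

print(duplicate_targets(sample))
```

A check like this could run in CI against the real SUMMARY.md so that duplicated chapter targets are caught before the navigation arrows break.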
I was reading through this and noticed some incorrect results, see https://pola-rs.github.io/polars-book/user-guide/dsl/groupby.html#filtering.
avg M birthday: all > 2000
avg F birthday: all null
Also, given the compute_age calculation being used, the dataset should probably be using legislators-current.csv rather than legislators-historical.csv.
At the end of window expressions there is the following link:
https://pola-rs.github.io/polars/py-polars/html/reference/expression.html#aggregation
That page has a Various aggregations section rather than an aggregation section. I think the link should point to that section.
In the python cookbook the "Coming from Pandas" section is key for new users. However, it could be structured better. I propose to re-work it so it starts with the key conceptual differences (rust, arrow, no index, lazy evaluation, query optimization...) before going into differences in writing queries.
@ritchie46 - I can start a PR on this basis if there are no objections.
The user_guide/src/_examples/ does not need a _ prefix.
The names of the subfolders contained in this directory should not contain -, but _ instead (to allow python -m user_guide.src.examples.<SUBFOLDER>.snippetX.py calls).
Those changes involve fixing paths over the whole repo, several files would be impacted.
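As a rough sketch of the renaming step, the helper below maps folder names containing - to importable names using _. The function name and the sample folder names are hypothetical, and the sketch only plans the renames; it does not touch the file system or update any paths elsewhere in the repo.

```python
def plan_renames(folder_names):
    """Map each folder name containing '-' to a name using '_' instead."""
    return {
        name: name.replace("-", "_")
        for name in folder_names
        if "-" in name
    }


# Hypothetical subfolder names, for illustration only.
names = ["groupby-dsl", "expressions", "time-series"]
renames = plan_renames(names)

for old, new in renames.items():
    # str.isidentifier() confirms the new name is usable in a dotted module path.
    print(old, "->", new, "(importable)" if new.isidentifier() else "(still invalid)")
```

Keeping the plan separate from the actual move makes it easier to review the affected paths first, which matters here because several files reference these folders.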
This rust example does not work as is.
Per the compiler, I had to add use std::io::Cursor; in order for the example to work.
I did some work on the documentation and made a pull request which for reasons unknown has failed a check. Sorry, don't know how to fix that.
These are:
https://pola-rs.github.io/polars/py-polars/html/reference/
https://pola-rs.github.io/polars-book/api-python/
I suppose the 2nd one is outdated but pops up frequently when searching google. Confusing.
Under the "Rolling Groupby" section, there's a typo in the year on the later bound of the values -- I believe it should be '2021', not '20210' in this sentence:
So imagine having a time column with the values {2021-01-01, 20210-01-05} and a period="5d" this would create the following windows
See: https://pola-rs.github.io/polars-book/user-guide/timeseries/intro.html#rolling-groupby
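For reference, the window arithmetic with the corrected year can be sketched with the standard library alone. This assumes the default behaviour in which each window ends at the row's own timestamp and is left-open and right-closed; it mirrors only the date arithmetic, not how Polars computes the groups. The function name is hypothetical.

```python
from datetime import date, timedelta


def rolling_windows(timestamps, period):
    """Return (start, end) pairs; each window is read as (start, end]."""
    return [(t - period, t) for t in timestamps]


# The two time values from the sentence above, with the year corrected.
times = [date(2021, 1, 1), date(2021, 1, 5)]
windows = rolling_windows(times, timedelta(days=5))

for start, end in windows:
    print(f"({start}, {end}]")
```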
Unless I am mistaken, a few of the examples have output which instead looks like what the input should look like, e.g. this one is missing the new "e" column.
I've written a quickstart notebook for Polars here:
We need a quickstart for the user guide as well - do you think this could be the basis of a quick start for the user guide? Obviously needs some tweaks for pep styling and removing references to the course
I am getting a 404 error when trying to access the resampling page.
It might be caused by this commit, but I don't know what the fix would look like.
user_guide/src/quickstart/intro.md
polars-book/user_guide/src/quickstart/intro.md
Lines 100 to 107 in 7490a99
I think it should be toml rather than text.
```toml
# Rust Cargo.toml dependencies
[dependencies]
polars = { version = "0.24.3", features = ["lazy"] }
reqwest = { version = "0.11.12", features = ["blocking"] }
color-eyre = "0.6"
```
When enabling "is_in" I get a compiler error unless I also add "dtype-categorical" and that's not documented anywhere I can see.
Rust & Python code taken care of; .md files deserve some attention/formatting too.
.github/workflows/docs.yaml still installs py-polars.
In the quickstart section there is this code:
import polars as pl
df = pl.read_csv("https://j.mp/iriscsv")
print(df.filter(pl.col("sepal_length") > 5).groupby("species").agg(pl.all().sum()))
)
The last line has a closing ')', which should not be there.
See: https://pola-rs.github.io/polars-book/user-guide/quickstart/intro.html
The documentation nowhere mentions the feature flags needed to actually get started using polars. It jumps straight into using "col" which is not available.
One seems to at least need the "lazy" flag but I'm not sure.
The docs also switch between polars::prelude and polars_core::prelude which is also something that's never mentioned anywhere.
The Getting Started guide for example: https://pola-rs.github.io/polars-book/user-guide/quickstart/intro.html
It mentions cargo add polars and then jumps right into something called lazy polars without ever explaining what that is and it uses the col function as well. None of that works without enabling at least one feature.
Hi,
just wanted to report a minor issue.
https://pola-rs.github.io/polars-book/user-guide/coming_from_pandas.html
is missing a comma after col 'type'
df = pd.DataFrame({
"type": ["m", "n", "o", "m", "m", "n", "n"]
"c": [1, 1, 1, 2, 2, 2, 2],
})
https://pola-rs.github.io/polars-book/user-guide/datatypes.html seems to be missing Float64 and Float32
All images (.png logo and .svg diagrams) should be found in https://github.com/ritchie46/static.
All images created during the make run step remain in the user_guide/src/examples_outputs/ folder.
The user_guide/src/_images/ folder could be entirely removed.
This hard-coding makes using another Python version awkward. I'm also wondering if it's worth just bumping the Python version. I've been doing fine in the User Guide so far with 3.10.3.
Dynamic path resolution/Python version flexibility. Upgrades in general.
Given the Pokemon dataset example, the code snippet
out = filtered.with_columns(
[
pl.col(["Name", "Speed"]).sort(reverse=True).over("Type 1"),
]
)
print(out)
apparently sorts Name and Speed individually. While not obviously meaningful (rows now combine names of Pokemon with speed values of different Pokemon, see for example the suggested "Speed of Slowbro" now being 15 instead of 30), this is probably still correct.
Furthermore, the last example should probably either sort in descending order or describe the first 3 by "speed" as the slowest 3 instead of the fastest 3.
I would like to (understand and) see the connection to usual 'window functions', and how to write things such as lead or lag known from SQL, as well as the explicit connection to SQL's OVER (PARTITION BY x ORDER BY y), where the documentation currently only illustrates the PARTITION BY x part (and the API is calling it over), while the ordering remains unclear given the seemingly meaningless results mentioned above.
See e.g. here:
In the first example, the assigned variable is wrong, as are the input arguments to df.select:
single_select_df = df.select(["id"]) # not list_select_df
print(single_select_df)
shape: (3, 1)
┌─────┐
│ id │
│ --- │
│ i64 │
╞═════╡
│ 1 │
├╌╌╌╌╌┤
│ 2 │
├╌╌╌╌╌┤
│ 3 │
└─────┘
In the second example, the assigned variable should be list_select_df.
In the section Selecting Rows and Columns, the example only selects rows, not rows and columns.
Successor to #153
tl;dr There's a lot of great content in Cookbook 1.0 which, if reorganized and updated, could provide a powerful revised flow for new and existing users reading through the book.
graph LR
A[Intro] --> B[Getting Started] --> C[Fundamentals] --> D[Walkthrough] --> E[Recipes] --> F[References and Contributing]
If you're aware of what you're about to begin learning it's easier to spot where the fundamental information to learn is located. And from there you can branch out. For broader success we can revise the Cookbook to emphasize what should be emphasized (like Expressions) and refine the surrounding flow for the user to follow.
| Step | Content | Description |
|---|---|---|
| 1 | Intro | An overview of the Polars DataFrame library. |
| 2 | Getting Started | Create a DataFrame and get started. |
| 3 | Fundamentals | Foundations and fundamental concepts for new and existing Polars users. |
| 4 | Walk-through | A 5 minute walk-through of basic Polars API usage. |
| 5 | Recipes, References, and Contributing | More material to refer to while learning and using Polars. |
To do this we can break up the work into a few PRs.
Also the other topic I'd love to solve here would be the User Guide vs Cookbook confusion. IMO the book should be one or the other. If a third option The Polars Book could be considered that'd be sweet 😎.
The challenge we'll have with doing this is coordinating the cohesive revision of the book. If you're interested in coordinating with me on this please reach out 👋.
Polars book could benefit from a JavaScript section with a few basic examples.
Some good ones that come to mind are:
- fetch call
- csv -> json
That way we ensure that the code is up to date.
In https://github.com/pola-rs/polars-book/blob/master/user_guide/src/howcani/intro.md
The link points to:
https://github.com/pola-rs/polars/blob/master/examples/python/10_minutes_to_polars.ipynb
which gives a 404.