Giter Club home page Giter Club logo

Comments (14)

josevalim avatar josevalim commented on June 29, 2024 1

I think we should be stricter, unless we have a valid use case. Returning nil or wrapping are both undesired IMO.

from explorer.

billylanchantin avatar billylanchantin commented on June 29, 2024

This issue is a blocker for the property test PR. Should I fix it there or do it separately?

from explorer.

lkarthee avatar lkarthee commented on June 29, 2024

Lists in polars are tricky:

  • If it is empty it does not matter how many levels of nested list. It takes the dtype of non empty list element

For [[] , [[]] , [[[[]]]], [[2]] ] => dtype will be list[list[s64]]

  • if all elements are empty in list it results in a different dtype - dtype of nested list having highest levels.

[[[[]]]] => dtype will be list[list[list[list[null]]]]

What is the rationale behind this, any clue ?

from explorer.

billylanchantin avatar billylanchantin commented on June 29, 2024

@lkarthee Interesting I didn't realize. Is this the behavior of Rust Polars or Python Polars?

from explorer.

lkarthee avatar lkarthee commented on June 29, 2024

I have tested it in py polars - from your rust example it looks like rust is no different from py.

from explorer.

billylanchantin avatar billylanchantin commented on June 29, 2024

I ask because I'm not sure you could make it in Rust because the compiler might prevent you.

Also, what's resulting series from [[] , [[]] , [[[[]]]], [[2]]]? I'd assume this:

[[] , [[]] , [[]], [[2]]] 
             ^^^^
             Inner nesting removed

Otherwise the dtypes wouldn't match, right?

I'll try and test these in Rust later today.

from explorer.

lkarthee avatar lkarthee commented on June 29, 2024

Yeah true.

s = pl.Series([[[[[[[]]]]]], [], [[]]])
shape: (3,)
Series: '' [list[list[list[list[list[list[null]]]]]]]
[
	[[[[[[]]]]]]
	[]
	[[]]
]
s = pl.Series([[[[[[[]]]]]], [], [[]], [[]], [[0], [], []]])
shape: (5,)
Series: '' [list[list[i64]]]
[
	[null]
	[]
	[[]]
	[[]]
	[[0], [], []]
]

Interesting to note that first element reduced to [null] not [[]].

from explorer.

billylanchantin avatar billylanchantin commented on June 29, 2024

Interesting to note that first element reduced to [null] not [[]].

Hm yeah I would've expected [[]] or [[null]], not that.

from explorer.

josevalim avatar josevalim commented on June 29, 2024

Hm yeah I would've expected [[]] or [[null]], not that.

That can potentially be a polars bug?

from explorer.

billylanchantin avatar billylanchantin commented on June 29, 2024

Yeah it might be a bug. Although it's technically legal since every element can potentially be null. Who knows, maybe it's actually the correct way to do it for some non-obvious reason 🤷

(But I think bug is most likely)

from explorer.

lkarthee avatar lkarthee commented on June 29, 2024
s = pl.Series([[], [[[3]]], [[2], [2]]])
shape: (3,)
Series: '' [list[list[list[i64]]]]
[
	[]
	[[[3]]]
	[[[2]], [[2]]]
]

This makes me believe there is a bug - a list 2 level nesting [[2]] is converted to 3 level nesting [[[2]]]. Better to log an issue with polars before attempting a fix ? It only considers dtype of first non empty element.

s = pl.Series([[], [3], [[2], [2]]])
shape: (3,)
Series: '' [list[i64]]
[
	[]
	[3]
	[null, null]
]

from explorer.

billylanchantin avatar billylanchantin commented on June 29, 2024

I started to make an issue, but now I think Polars is more correct than we initially thought. Consider this example:

pl.Series([["a"], [1]])
# shape: (2,)
# Series: '' [list[str]]
# [
#         ["a"]
#         ["1"]
# ]

pl.Series([[1], ["a"]])
# shape: (2,)
# Series: '' [list[i64]]
# [
#         [1]
#         [null]
# ]

The dtypes of these series are either list[str] or list[i64]. Polars needs a tie-breaker, so letting the first element decide seems like a sensible choice. I think that's what's happening with some of our examples.

However, I still can't account for some other stuff we're seeing. These cases still seem wrong even if we use the first-non-empty-element-wins rule:

# Wrapping elements in additional layers of nesting to make the dtypes compatible
pl.Series([[[2]], [1, 1]])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
#         [[2]]
#         [[1], [1]]
# ]

# An empty list as the errant element pushes the null up one layer of nesting
pl.Series([ [[2, 2]], [["b"]] ])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
#         [[2, 2]]
#         [[null]]
# ]

pl.Series([ [[2, 2]], [[[]]] ])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
#         [[2, 2]]
#         [null]
# ]

Then there's this:

pl.Series([ [[2, 2]], [[2, 2]] ])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
#         [[2, 2]]
#         [[2, 2]]
# ]

pl.Series([ [[2, 2]], [[2, []]] ])
# shape: (2,)
# Series: '' [o][object]
# [
#         [[2, 2]]
#         [[2, []]]
# ]

Not sure what to make of that. Does anyone know if the intended behavior is documented anywhere? I didn't see it in the Python docs, but I might've just missed it.

from explorer.

josevalim avatar josevalim commented on June 29, 2024

The dtypes of these series are either list[str] or list[i64]. Polars needs a tie-breaker, so letting the first element decide seems like a sensible choice. I think that's what's happening with some of our examples.

You are right, that makes sense. This means we should raise on these cases, including in several of the examples below:

pl.Series([[[2]], [1, 1]])

Should raise.

pl.Series([ [[2, 2]], [["b"]] ])

Should raise.

pl.Series([ [[2, 2]], [[[]]] ])

Should raise. You can think that [[[]]] means it is at least a three element deep list, which violates the first value.

pl.Series([ [[2, 2]], [[2, []]] ])

Should raise.


The example you originally reported is still wrong and has to be fixed though. [[], [[]], [[1]]] means that the 1+ level list, then 2+ level list, and officially confirmed with a {:list, {:list, :s64}} list. I can try to work on a PR, since I wrote the current version of the code.

from explorer.

billylanchantin avatar billylanchantin commented on June 29, 2024

@josevalim I think Polars is making non-matching elements null instead of raising. So for:

pl.Series([[[2]], [1, 1]])

Should raise.

I think it should yield [[[2]], [null, null]] instead since each 1 it encounters ought to be a list. I have no idea why they decided to wrap each 1 to yield [[[2]], [[1], [1]]].

That said, we can also chose to differ from how Python Polars is resolving these situations by raising.

from explorer.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.