Comments (5)

awusan125 commented on June 2, 2024

I have also encountered the same situation. Can the developer reply?

from dask-sql.

charlesbluca commented on June 2, 2024

It looks like the underlying issue here is that dask-cuDF is failing because we're trying to do a binop between a float column and a string scalar:

import cudf
import dask_cudf

s = cudf.Series([1.0])  # float column, so the comparison below is a float/string binop

ds = dask_cudf.from_cudf(s, npartitions=1)

ds != "A"

On dask-sql's end, it seems that the specific combination of cross joins planned when selecting from all 3 tables in your reproducer causes t1_gpu.c0 to be registered as a float column rather than the string/object column I would have expected. I'll dig deeper to get a sense of what's happening.
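
For contrast, here is a minimal CPU-only sketch using plain pandas (an illustration, not part of the original reproducer and not dask-cuDF code). It assumes only that pandas is installed, and shows that on CPU the same float column vs. string scalar comparison is evaluated elementwise instead of raising:

```python
import pandas as pd

# Float column compared against a string scalar, mirroring the GPU binop above.
s = pd.Series([1.0, 2.5])

# For == and !=, pandas compares each element to the scalar as a Python object,
# so values of mismatched types simply compare as "not equal".
not_equal = s != "A"
equal = s == "A"

print(not_equal.tolist())
print(equal.tolist())
```

Every element compares as not equal to "A", which is why a mistyped column can go unnoticed on CPU while the GPU path errors out.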

charlesbluca commented on June 2, 2024

Okay, it seems the issue here is the handling of duplicate column names on GPU. The cross join itself fails with 3 tables:

import pandas as pd
from dask_sql import Context

c = Context()
# Three single-column tables that all share the column name "a"
c.create_table('df1', pd.DataFrame({"a": [1]}), gpu=True)
c.create_table('df2', pd.DataFrame({"a": [2]}), gpu=True)
c.create_table('df3', pd.DataFrame({"a": [3]}), gpu=True)

# Implicit cross join across all three tables
query = "SELECT * FROM df1, df2, df3"
explain = c.explain(query)
res = c.sql(query)  # AssertionError

There's a good chance this is related to #1133 and could be resolved by #1134. I can look into reviving that PR and trying it here.

EDIT:

Tried #1134 without much luck. It looks like the cross join issue is independent of CPU/GPU, as I get errors with the above block even when gpu=False. I think it only ends up failing on GPU for your particular reproducer because dask-cuDF doesn't support float column / string scalar binops, whereas Dask on CPU does.
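
To make the duplicate-column problem concrete, here is a small pandas-only sketch (an illustration under stated assumptions, not how dask-sql plans or executes the join). It cross joins three tables that all have a column named "a" and qualifies each column with its table name first, so every column stays distinct. The "table.column" naming scheme is an assumption chosen for this example:

```python
import pandas as pd

# Same shape as the reproducer: three tables, each with a single column "a".
tables = {
    "df1": pd.DataFrame({"a": [1]}),
    "df2": pd.DataFrame({"a": [2]}),
    "df3": pd.DataFrame({"a": [3]}),
}

# Qualify every column with its table name ("df1.a", "df2.a", ...) so that
# no two inputs to the join share a column name.
qualified = [df.add_prefix(f"{name}.") for name, df in tables.items()]

# Chain the cross joins pairwise (how="cross" requires pandas >= 1.2).
result = qualified[0]
for right in qualified[1:]:
    result = result.merge(right, how="cross")

print(list(result.columns))
print(result.iloc[0].tolist())
```

With unique names, each column keeps the dtype and values of the table it came from, which is the property that gets lost when duplicate names are mishandled.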

charlesbluca commented on June 2, 2024

Trying out your reproducer with #1250, I'm now able to get things passing:

CPU Result:
   Utf8("A") != t1.c0
0                True
GPU Result:
   Utf8("A") != t1_gpu.c0
0                    True

Would you mind giving this branch a look on your end?

qwebug commented on June 2, 2024

The bug appeared in dask-sql version 2023.6.0. I have verified that it is fixed by #1250.
Thanks for your work.
