Comments (5)
I have also encountered the same situation. Can the developer reply?
from dask-sql.
It looks like the underlying issue here is that dask-cuDF is failing because we're trying to do a binop between a float column and a string scalar:
import cudf
import dask_cudf
s = cudf.Series([1.0])  # float column, matching the description above
ds = dask_cudf.from_cudf(s, npartitions=1)
ds != "A"  # float column compared with a string scalar
On dask-sql's end, the specific combination of cross joins planned when selecting from all 3 tables in your reproducer seems to cause t1_gpu.c0 to be registered as a float column rather than the string/object column I would've expected. I'll dig into this deeper to get a sense of what's happening.
Okay, it seems the issue here is the handling of duplicate column names on GPU; the cross join itself fails with 3 tables:
import pandas as pd
import dask.dataframe as dd
from dask_sql import Context
c = Context()
c.create_table('df1', pd.DataFrame({"a": [1]}), gpu=True)
c.create_table('df2', pd.DataFrame({"a": [2]}), gpu=True)
c.create_table('df3', pd.DataFrame({"a": [3]}), gpu=True)
query = "SELECT * FROM df1, df2, df3"
explain = c.explain(query)
res = c.sql(query) # AssertionError
There's a good chance that this is related to #1133 and potentially resolved by #1134. I can look into reviving that PR and giving it a try here.
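For context, here is a minimal pandas-only sketch of why duplicate column names need care in a chained cross join. It is not dask-sql's join code and makes no claim about where the AssertionError is raised; it only shows, under the assumption that three frames share the column name "a" as in the block above, how pandas itself disambiguates the names:

```python
import pandas as pd

# Three single-row frames that all use the column name "a",
# mirroring df1, df2 and df3 in the reproducer above.
df1 = pd.DataFrame({"a": [1]})
df2 = pd.DataFrame({"a": [2]})
df3 = pd.DataFrame({"a": [3]})

# First cross join: both sides have "a", so pandas applies its
# default suffixes to tell the two columns apart.
step1 = df1.merge(df2, how="cross")

# Second cross join: the left side no longer has a bare "a",
# so the third frame's column keeps its original name.
step2 = step1.merge(df3, how="cross")

print(list(step2.columns))
```

Any planner that tracks columns by name alone has to apply some renaming scheme like this one, which is why three identically named columns are a natural place for a mismatch to appear.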
EDIT:
Tried #1134 without much luck. It looks like the cross join issue is independent of CPU/GPU, as I get errors with the above block even when gpu=False. I think it only ends up failing on GPU for your particular reproducer because dask-cuDF doesn't support float column / string scalar binops, whereas Dask on CPU does.
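The CPU half of that difference can be checked with plain pandas, which is what backs Dask partitions on CPU. This is a minimal sketch using a one-element float series as a stand-in for the mis-typed column; it does not exercise dask-sql or dask-cuDF:

```python
import pandas as pd

# A float column standing in for the column that was registered
# as float instead of string.
s = pd.Series([1.0])

# pandas does not raise on the mismatched types; for "!=" it
# treats every element as not equal to the string scalar.
result = s != "A"

print(result.tolist())
```

That the comparison completes instead of raising is consistent with the CPU run masking the incorrect column type while the GPU run surfaces it as an error.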
Trying out your reproducer with #1250, I'm now able to get things passing:
CPU Result:
Utf8("A") != t1.c0
0 True
GPU Result:
Utf8("A") != t1_gpu.c0
0 True
Would you mind giving this branch a look on your end?
The bug appeared in dask-sql version 2023.6.0. I have verified that it is fixed by #1250.
Thanks for your work.