Giter Club home page Giter Club logo

Comments (5)

vraghavan78 avatar vraghavan78 commented on July 18, 2024

Hi,

You are correct. Currently nested loop join does not request hash distribution on the inner side.
Thanks for bringing this up.

In a typical case (except in the case of Indexed Nested Loop join) picking hash join it preferred. We will look into it soon.

V

from gporca.

guangzhouzhang avatar guangzhouzhang commented on July 18, 2024

Thanks for looking into it!! This case was hit in a perf testing. It's quite a common case where we join two big tables with indexed nested loop join with a LIMIT clause.

from gporca.

vraghavan78 avatar vraghavan78 commented on July 18, 2024

@guangzhouzhang I miss spoke earlier:

Orca does generate the correct distribution request for Index Nested Loop Join (https://github.com/greenplum-db/gporca/blob/master/libgpopt/src/operators/CPhysicalInnerIndexNLJoin.cpp#L95).

create table foo(a int, b int) distributed by (a);
create table bar(c int, d int) distributed by (c);
create INDEX bar_index on bar using btree (c);
vraghavan=# explain select * from foo, bar where a = c;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..433.00 rows=1 width=16)
   ->  Nested Loop  (cost=0.00..433.00 rows=1 width=16)
         Join Filter: true
         ->  Table Scan on foo  (cost=0.00..431.00 rows=1 width=8)
         ->  Index Scan using bar_index on bar  (cost=0.00..2.00 rows=1 width=8)
               Index Cond: bar.c = foo.a
 Settings:  optimizer=on
 Optimizer status: PQO version 1.633
(8 rows)

As you can see we do not broadcast.

However when we have a text column we introduce a broadcast as shown below:


create table jcj_jjxx_r (
  jjbh varchar(20) not null,
  jjdbh varchar(30) default null,
  primary key (jjbh)
);

create table jcj_cjxx_r (
  cjbh varchar(20) not null,
  jjbh varchar(20) not null,
  primary key (jjbh)
);

explain select * from jcj_jjxx_r join jcj_cjxx_r on jcj_jjxx_r.jjbh= jcj_cjxx_r.jjbh;

                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..433.00 rows=1 width=32)
   ->  Nested Loop  (cost=0.00..433.00 rows=1 width=32)
         Join Filter: true
         ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=16)
               ->  Table Scan on jcj_jjxx_r  (cost=0.00..431.00 rows=1 width=16)
         ->  Index Scan using jcj_cjxx_r_pkey on jcj_cjxx_r  (cost=0.00..2.00 rows=1 width=16)
               Index Cond: jcj_cjxx_r.jjbh::text = jcj_jjxx_r.jjbh::text
 Settings:  optimizer=on
 Optimizer status: PQO version 1.633
(9 rows)

Will update the story to work on this soon. Thanks!

from gporca.

hsyuan avatar hsyuan commented on July 18, 2024

This issue has been open for nearly 1 year! I am planning to jump into the rabbit hole.

create table R1(a varchar, primary key(a));
create table S1(b varchar, primary key(b));
explain select * from R1, S1 where R1.a = S1.b;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..433.00 rows=1 width=16)
   ->  Nested Loop  (cost=0.00..433.00 rows=1 width=16)
         Join Filter: true
         ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
               ->  Table Scan on r1  (cost=0.00..431.00 rows=1 width=8)
         ->  Index Scan using s1_pkey on s1  (cost=0.00..2.00 rows=1 width=8)
               Index Cond: s1.b::text = r1.a::text
 Settings:  optimizer=on
 Optimizer status: PQO version 2.12.0
(9 rows)

create table R(a text, primary key(a));
create table S(b text, primary key(b));
explain select * from R, S where R.a = S.b;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..433.00 rows=1 width=16)
   ->  Nested Loop  (cost=0.00..433.00 rows=1 width=16)
         Join Filter: true
         ->  Table Scan on r  (cost=0.00..431.00 rows=1 width=8)
         ->  Index Scan using s_pkey on s  (cost=0.00..2.00 rows=1 width=8)
               Index Cond: s.b = r.a
 Settings:  optimizer=on
 Optimizer status: PQO version 2.12.0
(8 rows)

When the join column is text type, we can generate plan without broadcasting the outer side, but for varchar type, plan with broadcast motion is generated.

I took a look at the memo generated by these 2 plans, they almost have the same plan space/plan alternatives, same optimization request.

By setting the following GUC on,

set optimizer=on;
set client_min_messages='log';
set optimizer_enumerate_plans=on;
set optimizer_print_optimization_context = on;
set optimizer_print_memo_after_optimization = on;
explain select * from R1, S1 where R1.a = S1.b;

Check the memo structure for outer child R1:

2017-03-25 10:47:28:813289 PDT,THD000,TRACE,"
Group 0 (#GExprs: 4):
  0: CLogicalGet "r1" ("r1"), Columns: ["a" (0), "ctid" (1), "xmin" (2), "cmin" (3), "xmax" (4), "cmax" (5), "tableoid" (6), "gp_segment_id" (7)] Key sets: {[0], [1,7]} [ ]
  1: CPhysicalTableScan "r1" ("r1") [ ]
    Cost Ctxts:
      main ctxt (stage 0)6.0, child ctxts:[], rows:1.000000 (group), cost: 431.000021
      main ctxt (stage 0)1.0, child ctxts:[], rows:1.000000 (group), cost: 431.000021
      main ctxt (stage 0)3.0, child ctxts:[], rows:1.000000 (group), cost: 431.000021
      main ctxt (stage 0)2.0, child ctxts:[], rows:1.000000 (group), cost: 431.000021
  2: CPhysicalMotionGather(master) [ 0 ]
    Cost Ctxts:
      main ctxt (stage 0)4.0, child ctxts:[1], rows:1.000000 (group), cost: 431.000125
      main ctxt (stage 0)0.0, child ctxts:[1], rows:1.000000 (group), cost: 431.000125
  3: CPhysicalMotionBroadcast  [ 0 ]
    Cost Ctxts:
      main ctxt (stage 0)7.0, child ctxts:[1], rows:1.000000 (group), cost: 431.000465
      main ctxt (stage 0)5.0, child ctxts:[1], rows:1.000000 (group), cost: 431.000465
  Grp OptCtxts:
    6 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [HASHED: [ +--CScalarCast   origin: [Grp:3, GrpExpr:0]
   +--CScalarIdent "a" (0)   origin: [Grp:2, GrpExpr:0]
 , nulls colocated ] match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:1
    4 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [SINGLETON (master) match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:2
    7 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [REPLICATED  match: exact], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:3
    1 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [ANY  EOperatorId: 121  match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:1
    5 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [REPLICATED  match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:3
    3 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [HASHED: [ +--CScalarIdent "a" (0)   origin: [Grp:2, GrpExpr:0]
 , nulls colocated ] match: exact], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:1
    2 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [NON-SINGLETON  match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:1
    0 (stage 0): (req cols: ["a" (0)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [SINGLETON (master) match: exact], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:2
",

The CPhysicalInnerIndexNLJoin decided to request replication for R1 (Group 0), with the child group optimization context is 7, why not 3 or 6?

I suspect that this might related with that GPDB internally cast varchar to text, Orca can't correctly deal with ScalarIdent wrapped by a ScalarCast. But looking at the code of CDistributionSpecHashed.cpp, it does extract the column by calling CUtils::PexprWithoutBinaryCoercibleCasts.

@vraghavan78 Any thoughts?

from gporca.

hsyuan avatar hsyuan commented on July 18, 2024

Fixed by commit: 34fc6a7

from gporca.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.