======================================================================
               __________  ____  ____  _________
              / ____/ __ \/ __ \/ __ \/ ____/   |
             / / __/ /_/ / / / / / /_/ / /   / /| |
            / /_/ / ____/ /_/ / _, _/ /___/ ___ |
            \____/_/    \____/_/ |_|\____/_/  |_|
                  The Greenplum Query Optimizer
              Copyright (c) 2015, Pivotal Software, Inc.
            Licensed under the Apache License, Version 2.0
======================================================================

Welcome to GPORCA, the Greenplum Next Generation Query Optimizer!

To understand the objectives and architecture of GPORCA please refer to the following articles:

Want to Contribute?

GPORCA supports various build types: debug, release with debug info, release. On x86 systems, GPORCA can also be built as a 32-bit or 64-bit library. You'll need CMake 3.1 or higher to build GPORCA. Get it from cmake.org, or your operating system's package manager.

First Time Setup

Clone GPORCA

git clone https://github.com/greenplum-db/gporca.git
cd gporca

Pre-Requisites

GPORCA uses the following library:

  • GP-Xerces - Greenplum's patched version of Xerces-C 3.1.X

Installing GP-Xerces

GP-XERCES is available here. The GP-XERCES README gives instructions for building and installing.

Build and install GPORCA

ORCA is built with CMake, so any build system that CMake supports can be used. The team uses Ninja because it is fast and convenient.

From the gporca directory, run:

cmake -GNinja -H. -Bbuild
ninja install -C build

Test GPORCA

To run all GPORCA tests, simply use the ctest command from the build directory after build finishes.

ctest

Much like make, ctest has a -j option that allows running multiple tests in parallel to save time. Using it is recommended for faster testing.

ctest -j7

By default, ctest does not print the output of failed tests. To print it, which is useful when debugging a failure, use the --output-on-failure flag:

ctest -j7 --output-on-failure
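Rather than hard-coding -j7, the job count can follow the number of CPUs on the machine. The following is a minimal sketch, not part of GPORCA: the helper names cpu_count and ctest_command are made up for illustration, and the script only prints the ctest command line so it can be inspected before being run from the build directory.

```shell
#!/bin/sh
# Print the number of online CPUs, falling back to 1 if it cannot be determined.
cpu_count() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric answer: fall back to 1
    esac
    echo "$n"
}

# Print (rather than run) the ctest command line for this machine.
ctest_command() {
    echo "ctest -j$(cpu_count) --output-on-failure"
}

ctest_command
```

Matching the job count to the CPU count is only a starting point; a lower value may be preferable on a machine that is short on memory.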

To run a specific individual test, use the gporca_test executable directly.

./server/gporca_test -U CAggTest

To run a specific minidump, for example ../data/dxl/minidump/TVFRandom.mdp:

./server/gporca_test -d ../data/dxl/minidump/TVFRandom.mdp
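When several minidumps need to be checked, the -d invocation can be wrapped in a loop. This is a sketch under one assumption: that gporca_test exits with a non-zero status when a minidump fails, which this page does not state. The function name run_minidumps is hypothetical.

```shell
#!/bin/sh
# run_minidumps TEST_BIN FILE...
# Runs "TEST_BIN -d FILE" for each file, prints one PASS/FAIL line per file,
# and returns non-zero if any run failed.
# Assumption: TEST_BIN signals failure through a non-zero exit status.
run_minidumps() {
    test_bin=$1
    shift
    failed=0
    for mdp in "$@"; do
        if "$test_bin" -d "$mdp" >/dev/null 2>&1; then
            echo "PASS $mdp"
        else
            echo "FAIL $mdp"
            failed=1
        fi
    done
    return "$failed"
}

# Example, from the build directory:
#   run_minidumps ./server/gporca_test \
#       ../data/dxl/minidump/TVFRandom.mdp \
#       ../data/dxl/minidump/CoerceToDomain.mdp
```

The output of each run is discarded here to keep the summary short; rerun a failing file directly with gporca_test -d to see the details.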

Note that some tests use assertions that are only enabled for DEBUG builds, so DEBUG-mode tests tend to be more rigorous.

Adding tests

Most of the regression tests come in the form of a "minidump" file. A minidump is an XML file that contains all the input needed to plan a query, including information about all tables, datatypes, and functions used, as well as statistics. It also contains the resulting plan.

A new minidump can be created by running a query on a live GPDB server:

  1. Run these in a psql session:
set client_min_messages='log';
set optimizer=on;
set optimizer_enumerate_plans=on;
set optimizer_minidump=always;
set optimizer_enable_constant_expression_evaluation=off;
  2. Run the query in the same psql session. It will create a minidump file under the "minidumps" directory, in the master's data directory:
$ ls -l ~/data-master/minidumps/
total 12
-rw------- 1 heikki heikki 10818 Jun 10 22:02 Minidump_20160610_220222_4_14.mdp
  3. Run xmllint on the minidump to format it better, and copy it under the data/dxl/minidump directory:
xmllint --format ~/data-master/minidumps/Minidump_20160610_220222_4_14.mdp > data/dxl/minidump/MyTest.mdp
  4. Add it to the test suite, in server/src/unittest/gpopt/minidump/CICGTest.cpp:
--- a/server/src/unittest/gpopt/minidump/CICGTest.cpp
+++ b/server/src/unittest/gpopt/minidump/CICGTest.cpp
@@ -217,6 +217,7 @@ const CHAR *rgszFileNames[] =
                "../data/dxl/minidump/EffectsOfJoinFilter.mdp",
                "../data/dxl/minidump/Join-IDF.mdp",
                "../data/dxl/minidump/CoerceToDomain.mdp",
+               "../data/dxl/minidump/MyTest.mdp",
                "../data/dxl/minidump/LeftOuter2InnerUnionAllAntiSemiJoin.mdp",
 #ifndef GPOS_DEBUG
                // TODO:  - Jul 14 2015; disabling it for debug build to reduce testing time
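Because the file on disk and the entry in CICGTest.cpp must match exactly, including case and extension, a quick consistency check can catch a mismatch before the test suite is built. The following is a minimal sketch; check_new_minidump is a hypothetical helper, not a script shipped with GPORCA.

```shell
#!/bin/sh
# check_new_minidump NAME [REPO_ROOT]
# Verifies that data/dxl/minidump/NAME exists and that CICGTest.cpp lists it
# under exactly the same name. REPO_ROOT defaults to the current directory.
check_new_minidump() {
    name=$1
    root=${2:-.}
    mdp="$root/data/dxl/minidump/$name"
    suite="$root/server/src/unittest/gpopt/minidump/CICGTest.cpp"
    if [ ! -f "$mdp" ]; then
        echo "missing file: $mdp"
        return 1
    fi
    # -F treats the pattern as a fixed string, so dots in the name are literal.
    if ! grep -qF "\"../data/dxl/minidump/$name\"" "$suite"; then
        echo "not registered in: $suite"
        return 1
    fi
    echo "ok: $name"
}

# Example, from the top of the gporca checkout:
#   check_new_minidump MyTest.mdp
```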

[Experimental] Concourse

GPORCA contains a series of pipeline and task files to run various sets of tests on Concourse. You can learn more about deploying Concourse with BOSH at bosh.io.

Our Concourse pipeline currently runs the following sets of tests:

  • build and ctest on centos5
  • build and ctest on debian8

We are currently working on adding support for the following sets of tests:

  • build and ctest on centos6
  • build GPDB with GPORCA and run make installcheck-good on centos6

All configuration files for our Concourse pipelines can be found in the concourse/ directory.

Note: Concourse jobs and pipelines for GPORCA are currently experimental and should not be considered ready for use in production-level CI environments.

Advanced Setup

How to generate build files with different options

Here are a few build flavors (commands run from the ORCA checkout directory):

# debug build
cmake -GNinja -D CMAKE_BUILD_TYPE=DEBUG -H. -Bbuild.debug
# release build with debug info
cmake -GNinja -D CMAKE_BUILD_TYPE=RelWithDebInfo -H. -Bbuild.release

Explicitly Specifying GP-Xerces For Build

GP-XERCES

It is recommended to use the --prefix option to the Xerces-C configure script to install GP-Xerces in a location other than the default under /usr/local/, because you may have other software that depends on Xerces-C, and the changes introduced in the GP-Xerces patch make it incompatible with the upstream version. Installing in a non-default prefix allows you to have GP-Xerces installed side-by-side with unpatched Xerces without incompatibilities.

You can point cmake at your patched GP-Xerces installation using the XERCES_INCLUDE_DIR and XERCES_LIBRARY options like so:

cmake -GNinja -D XERCES_INCLUDE_DIR=/opt/gp_xerces/include -D XERCES_LIBRARY=/opt/gp_xerces/lib/libxerces-c.so ..

However, to use the current build scripts in GPDB, Xerces with the gp_xerces patch needs to be installed under the /usr path.

On Mac OS X, the library name given to XERCES_LIBRARY ends with .dylib instead of .so.
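The platform-dependent library suffix can be picked automatically. The sketch below prints the two -D options for a given GP-Xerces prefix; the function name xerces_cmake_args is hypothetical, and the optional second argument exists only so the OS name can be overridden when trying the function out.

```shell
#!/bin/sh
# xerces_cmake_args PREFIX [OS_NAME]
# Prints the XERCES_INCLUDE_DIR and XERCES_LIBRARY options for cmake.
# OS_NAME defaults to the output of "uname -s"; "Darwin" selects .dylib,
# anything else selects .so.
xerces_cmake_args() {
    prefix=$1
    os=${2:-$(uname -s)}
    case "$os" in
        Darwin) ext=dylib ;;
        *)      ext=so ;;
    esac
    echo "-D XERCES_INCLUDE_DIR=$prefix/include -D XERCES_LIBRARY=$prefix/lib/libxerces-c.$ext"
}

# Print the options for the prefix used in the example above.
xerces_cmake_args /opt/gp_xerces
```

The printed options can then be pasted into the cmake command line shown above.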

Cross-Compiling 32-bit or 64-bit libraries

GP-XERCES

Unless you intend to cross-compile a 32-bit or 64-bit version of GPORCA, you can ignore these instructions. To compile GP-Xerces explicitly for the 32-bit or 64-bit version of your architecture, set the CFLAGS and CXXFLAGS environment variables for the configure script like so (use -m32 for 32-bit, -m64 for 64-bit):

CFLAGS="-m32" CXXFLAGS="-m32" ../configure --prefix=/opt/gp_xerces_32

GPORCA

For the most part you should not need to explicitly compile a 32-bit or 64-bit version of the optimizer libraries. By default, a "native" version for your host platform will be compiled. However, if you are on x86 and want to, for example, build a 32-bit version of Optimizer libraries on a 64-bit machine, you can do so as described below. Note that you will need a "multilib" C++ compiler that supports the -m32/-m64 switches, and you may also need to install 32-bit ("i386") versions of the C and C++ standard libraries for your OS. Finally, you will need to build 32-bit or 64-bit versions of GP-Xerces as appropriate.

Toolchain files for building 32 or 64-bit x86 libraries are located in the cmake directory. Here is an example of building for 32-bit x86:

cmake -GNinja -D CMAKE_TOOLCHAIN_FILE=../cmake/i386.toolchain.cmake ../

And for 64-bit x86:

cmake -GNinja -D CMAKE_TOOLCHAIN_FILE=../cmake/x86_64.toolchain.cmake ../

How to debug the build

To show every command line as it is executed during the build (useful for debugging):

ninja -v -C build

Extended Tests

Debug builds of GPORCA include a couple of "extended" tests for features like fault-simulation and time-slicing that work by running the entire test suite in combination with the feature being tested. These tests can take a long time to run and are not enabled by default. To turn extended tests on, add the cmake argument -D ENABLE_EXTENDED_TESTS=1.
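The build-type and extended-test options can be combined in a single configure step. The following sketch only prints the resulting cmake command line; configure_command is a hypothetical helper, and rejecting the extended-test flag for non-debug flavors is a choice made here because the extended tests are described as part of debug builds.

```shell
#!/bin/sh
# configure_command FLAVOR [EXTENDED]
# Prints the cmake command line for a build flavor; nothing is executed.
#   FLAVOR   - "debug" or "release" (release with debug info)
#   EXTENDED - 1 to request the extended tests (debug only), default 0
configure_command() {
    flavor=$1
    extended=${2:-0}
    case "$flavor" in
        debug)   build_type=DEBUG;          build_dir=build.debug ;;
        release) build_type=RelWithDebInfo; build_dir=build.release ;;
        *)
            echo "unknown flavor: $flavor" >&2
            return 1
            ;;
    esac
    cmd="cmake -GNinja -D CMAKE_BUILD_TYPE=$build_type"
    if [ "$extended" = 1 ]; then
        if [ "$flavor" != debug ]; then
            echo "extended tests are part of debug builds only" >&2
            return 1
        fi
        cmd="$cmd -D ENABLE_EXTENDED_TESTS=1"
    fi
    echo "$cmd -H. -B$build_dir"
}

# Print the configure command for a debug build with extended tests.
configure_command debug 1
```

Run the printed command from the ORCA checkout directory, then build with ninja -C build.debug.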

Installation Details

GPORCA has four libraries:

  1. libnaucrates --- has all DXL related classes, and statistics related classes
  2. libgpopt --- has all the code related to the optimization engine, meta-data accessor, logical / physical operators, transformation rules, and translators (DXL to expression and vice versa).
  3. libgpdbcost --- cost model for GPDB.
  4. libgpos --- abstraction of memory allocation, scheduling, error handling, and testing framework.

By default, GPORCA will be installed under /usr/local. You can change this by setting CMAKE_INSTALL_PREFIX when running cmake, for example:

cmake -GNinja -D CMAKE_INSTALL_PREFIX=/home/user/gporca -H. -Bbuild

By default, the header files are located in:

/usr/local/include/naucrates
/usr/local/include/gpdbcost
/usr/local/include/gpopt
/usr/local/include/gpos

and the libraries are located at:

/usr/local/lib/libnaucrates.so*
/usr/local/lib/libgpdbcost.so*
/usr/local/lib/libgpopt.so*
/usr/local/lib/libgpos.so*

Build and install:

ninja install -C build

Common Issues

Red Hat-based systems do not normally look for shared libraries in /usr/local/lib. If you develop on one of these Linux distributions, add /usr/local/lib to /etc/ld.so.conf and run ldconfig to rebuild the shared library cache.
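Adding the directory by hand is easy to repeat by accident. The sketch below appends the line only when it is not already present; add_lib_dir is a hypothetical helper, and on a real system it needs to run as root against /etc/ld.so.conf, followed by ldconfig.

```shell
#!/bin/sh
# add_lib_dir CONF_FILE DIR
# Appends DIR to CONF_FILE unless a line with exactly that content exists.
add_lib_dir() {
    conf=$1
    dir=$2
    # -x matches whole lines only, -F treats DIR as a fixed string.
    if grep -qxF "$dir" "$conf" 2>/dev/null; then
        echo "already present"
    else
        echo "$dir" >> "$conf"
        echo "added"
    fi
}

# Example (as root):
#   add_lib_dir /etc/ld.so.conf /usr/local/lib && ldconfig
```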

Cleanup

Remove the files that cmake generated under the build folder of the gporca repo:

rm -fr build/*

Remove the gporca header files and libraries (assuming the default install prefix /usr/local):

rm -rf /usr/local/include/naucrates
rm -rf /usr/local/include/gpdbcost
rm -rf /usr/local/include/gpopt
rm -rf /usr/local/include/gpos
rm -rf /usr/local/lib/libnaucrates.so*
rm -rf /usr/local/lib/libgpdbcost.so*
rm -rf /usr/local/lib/libgpopt.so*
rm -rf /usr/local/lib/libgpos.so*
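If GPORCA was installed with a different CMAKE_INSTALL_PREFIX, the same eight paths apply under that prefix. The following sketch lists them without deleting anything, so the list can be reviewed first; gporca_installed_paths is a hypothetical helper name, and the .so suffix follows the Linux paths shown above.

```shell
#!/bin/sh
# gporca_installed_paths [PREFIX]
# Prints the header directories and library patterns installed under PREFIX
# (default: /usr/local). The trailing "*" is printed literally, not expanded.
gporca_installed_paths() {
    prefix=${1:-/usr/local}
    for lib in naucrates gpdbcost gpopt gpos; do
        echo "$prefix/include/$lib"
    done
    for lib in naucrates gpdbcost gpopt gpos; do
        echo "$prefix/lib/lib$lib.so*"
    done
}

# List what would need to be removed for a custom prefix.
gporca_installed_paths /home/user/gporca
```

Each printed path can then be passed to rm -rf as in the commands above.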

How to Contribute

We accept contributions via GitHub pull requests only.

Follow the steps below to open a PR:

  1. Fork the project’s repository
  2. Create your own feature branch (e.g. git checkout -b better_orca) and make changes on this branch.
    • Follow the previous sections on this page to set up and build in your environment.
  3. Follow the naming and formatting style guide described here.
  4. Run through all the tests in your feature branch and ensure they are successful.
    • Follow the Adding tests section to add new tests.
  5. Push your local branch to your fork (e.g. git push origin better_orca) and submit a pull request

Your contribution will be analyzed for product fit and engineering quality prior to merging.
Note: All contributions must be sent using GitHub Pull Requests.

Your pull request is much more likely to be accepted if it is small and focused with a clear message that conveys the intent of your change.

Overall we follow GPDB's comprehensive contribution policy. Please refer to it here for details.

Bumping ORCA version

Bump GPORCA_VERSION_MINOR in CMakeLists.txt whenever your changes affect ORCA functionality. Bump GPORCA_VERSION_PATCH only when the changes do not affect ORCA functionality, e.g. updating README.md, adding a test case, or fixing comments.
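Before opening a pull request it can help to look at the current version fields. This sketch greps for the two variable names mentioned above; show_version_fields is a hypothetical helper, and since the exact form of those lines in CMakeLists.txt is not shown on this page, the pattern matches on the names alone.

```shell
#!/bin/sh
# show_version_fields [FILE]
# Prints every line of FILE (default: CMakeLists.txt) that mentions
# GPORCA_VERSION_MINOR or GPORCA_VERSION_PATCH.
show_version_fields() {
    grep -E 'GPORCA_VERSION_(MINOR|PATCH)' "${1:-CMakeLists.txt}"
}

# Example, from the top of the gporca checkout:
#   show_version_fields
```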
