Giter Club home page Giter Club logo

hive-lookup-udf's Introduction

Lookup Prefix Hive UDF example

  1. You might want to use local Hive installation to test that UDF

  2. Build the package and copy it to the server

mvn package
docker cp target/hive-lookup-udf-1.0-SNAPSHOT.jar docker-hive_hive-server_1:/root/
docker-compose exec hive-server bash
  1. Upload UDF jar to HDFS, generate example data and example prefix table, login to Hive
hdfs dfs -put /root/hive-lookup-udf-1.0-SNAPSHOT.jar /tmp
cat > /root/numbers.txt <<EOF
123456789
987654321
124456789
988654321
EOF
cat > /root/prefixes.txt <<EOF
1234,1
9876,2
12,3
98,4
EOF
hdfs dfs -put /root/prefixes.txt /tmp
/opt/hive/bin/beeline -u jdbc:hive2://localhost:10000
  1. Create function, create example table, call the function
CREATE TABLE numbers (number STRING);
LOAD DATA LOCAL INPATH '/root/numbers.txt' OVERWRITE INTO TABLE numbers;
DROP FUNCTION IF EXISTS longest_prefix_int;
CREATE FUNCTION longest_prefix_int AS 'org.dartov.hive.LookupPrefixInt' USING JAR 'hdfs://namenode:8020/tmp/hive-lookup-udf-1.0-SNAPSHOT.jar';
CREATE TABLE number_prefix_int AS SELECT number,longest_prefix_int(number,'/tmp/prefixes.txt') AS prefix_id FROM numbers;
SELECT * FROM number_prefix_int;

The results should be:

+-----------------------+--------------------------+
| number_prefix.number  | number_prefix.prefix_id  |
+-----------------------+--------------------------+
| 123456789             | 1                        |
| 987654321             | 2                        |
| 124456789             | 3                        |
| 988654321             | 4                        |
+-----------------------+--------------------------+

To use the function that returns String code instead of Integer ID:

DROP FUNCTION IF EXISTS longest_prefix_str;
CREATE FUNCTION longest_prefix_str AS 'org.dartov.hive.LookupPrefixString' USING JAR 'hdfs://namenode:8020/tmp/hive-lookup-udf-1.0-SNAPSHOT.jar';
CREATE TABLE number_prefix_str AS SELECT number,longest_prefix_str(number,'/tmp/prefixes.txt') AS prefix_cd FROM numbers;
SELECT * FROM number_prefix_str;

The results should be:

+---------------------------+------------------------------+
| number_prefix_str.number  | number_prefix_str.prefix_cd  |
+---------------------------+------------------------------+
| 123456789                 | 1                            |
| 987654321                 | 2                            |
| 124456789                 | 3                            |
| 988654321                 | 4                            |
+---------------------------+------------------------------+

To use the function that returns the prefix itself instead of prefix ID or code:

DROP FUNCTION IF EXISTS longest_prefix;
CREATE FUNCTION longest_prefix AS 'org.dartov.hive.LookupPrefix' USING JAR 'hdfs://namenode:8020/tmp/hive-lookup-udf-1.0-SNAPSHOT.jar';
CREATE TABLE number_prefix AS SELECT number,longest_prefix(number,'/tmp/prefixes.txt') AS prefix FROM numbers;
SELECT * FROM number_prefix;

The results should be:

+-----------------------+-----------------------+
| number_prefix.number  | number_prefix.prefix  |
+-----------------------+-----------------------+
| 123456789             | 1234                  |
| 987654321             | 9876                  |
| 124456789             | 12                    |
| 988654321             | 98                    |
+-----------------------+-----------------------+

hive-lookup-udf's People

Contributors

dartov avatar

Stargazers

 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.