
hive-third-functions's Issues

What is the syntax for json path?

create table temp.test_explode (userid string, log string) partitioned by (day string) stored as orc;
insert into table temp.test_explode partition (day = '2017-11-01') values ('u1', '[{  "action": "a" },  {  
"action": "b"} ]');
insert into table temp.test_explode partition (day = '2017-11-02') values ('u2', '[{  "action": "a" , "arg1": "1"},  {  "action": "b", "arg2": "2"} ]');


create temporary function udf_json_array_extract as 'cc.shanruifeng.functions.json.UDFJsonArrayExtract';

create temporary function udf_json_array_extract_scalar as 'cc.shanruifeng.functions.json.UDFJsonExtractScalar';


select userid, udf_json_array_extract ( log, '$.action' ) from  temp.test_explode  where day = '2017-11-01';

-- result
u1	["\"a\"","\"b\""]

select userid, udf_json_array_extract_scalar ( log, '$.action' ) from  temp.test_explode  where day = '2017-11-01';

-- result
u1	NULL

I want to get an array like ["a","b"], but in the second query above I got NULL.

Am I using the wrong syntax for the JSON path? What is the correct path syntax?
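For reference, this sketch (plain Python with the standard json module, not this library) shows what extracting "action" from each element of the stored JSON array should yield, using the same data inserted for day 2017-11-01:

```python
import json

# The same JSON array stored in the `log` column for day 2017-11-01
log = '[{ "action": "a" }, { "action": "b" }]'

# Mimic applying a path like "$.action" to every element of the array:
# parse the array, then read the "action" key of each object.
actions = [element["action"] for element in json.loads(log)]
print(actions)  # ['a', 'b']
```

This is only an illustration of the expected result, not of how the UDF resolves its path internally.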

Some functions do not work

0: jdbc:hive2://XX> create temporary function wgs_distance as 'com.github.aaronshan.functions.geo.UDFGeoWgsDistance';
ERROR : FAILED: Class com.github.aaronshan.functions.geo.UDFGeoWgsDistance does not implement UDF, GenericUDF, or UDAF
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://XX> create temporary function gcj_to_bd as 'com.github.aaronshan.functions.geo.UDFGeoGcjToBd';
ERROR : FAILED: Class com.github.aaronshan.functions.geo.UDFGeoGcjToBd does not implement UDF, GenericUDF, or UDAF
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://XX> create temporary function bd_to_gcj as 'com.github.aaronshan.functions.geo.UDFGeoBdToGcj';
No rows affected (0.014 seconds)
0: jdbc:hive2://XX> create temporary function wgs_to_gcj as 'com.github.aaronshan.functions.geo.UDFGeoWgsToGcj';
ERROR : FAILED: Class com.github.aaronshan.functions.geo.UDFGeoWgsToGcj does not implement UDF, GenericUDF, or UDAF
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://XX> create temporary function gcj_to_wgs as 'com.github.aaronshan.functions.geo.UDFGeoGcjToWgs';
ERROR : FAILED: Class com.github.aaronshan.functions.geo.UDFGeoGcjToWgs does not implement UDF, GenericUDF, or UDAF
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://XX> create temporary function gcj_extract_wgs as 'com.github.aaronshan.functions.geo.UDFGeoGcjExtractWgs';
ERROR : FAILED: Class com.github.aaronshan.functions.geo.UDFGeoGcjExtractWgs does not implement UDF, GenericUDF, or UDAF
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)

The jar comes from your 2.2.0 release. Is there a problem with it?

UDFArrayIntersect has a logic bug: it can throw an array-out-of-bounds exception or return a wrong result.

https://github.com/aaronshan/hive-third-functions/blame/f98fef86d328882c85ea40b69b14375d90d44201/src/main/java/com/github/aaronshan/functions/array/UDFArrayIntersect.java#L160
select default.array_intersect(array("39236600","38943350","39007633"),array("39236600","38943350","39007633","38593565","39165420","39119191","39223090","39273131","39113697","39264583","38643724","39243639","39273301","39153039","39152750","38422867","39194210"));
The return value should be {"39236600","38943350","39007633"}, but in fact only one element is returned.

Looking at the code, the logic of the compare method is wrong; it should be fixed as follows:

private int compare(ListObjectInspector arrayOI, Object array, int[] positions, int position1, int position2) {
    ObjectInspector arrayElementOI = arrayOI.getListElementObjectInspector();
    // Dereference through the positions index array instead of using the raw sort indices
    Object arrayElementTmp1 = arrayOI.getListElement(array, positions[position1]);
    Object arrayElementTmp2 = arrayOI.getListElement(array, positions[position2]);
    return ObjectInspectorUtils.compare(arrayElementTmp1, arrayElementOI, arrayElementTmp2, arrayElementOI);
}
That is, add an int[] positions parameter (passing in leftPositions or rightPositions), and update the corresponding call sites as well.
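The underlying problem is one of index indirection: the UDF sorts an array of positions by the values they point at, so every comparison made during the sort must go through positions[...] rather than use the sort indices directly. A small Python sketch of the same pattern (hypothetical names, not the library's code):

```python
values = ["39236600", "38943350", "39007633"]

# positions is an index array that gets sorted by the values it points at,
# without moving the values themselves.
positions = list(range(len(values)))

# Correct comparison: dereference values[p] for each position p.
# Comparing the raw indices p themselves would be the analogue of the bug.
positions.sort(key=lambda p: values[p])
sorted_values = [values[p] for p in positions]
print(sorted_values)  # ['38943350', '39007633', '39236600']
```

Sorting positions instead of values is why the fixed compare method needs the int[] positions argument.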

array_intersect bug

array_intersect has an array-out-of-bounds bug, and the code is overly complex. Simplified version:
@Override
public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object leftArray = arguments[0].get();
    Object rightArray = arguments[1].get();

    // Check for null arrays first, before asking for their lengths
    if (leftArray == null || rightArray == null) {
        return null;
    }

    int leftArrayLength = leftArrayOI.getListLength(leftArray);
    int rightArrayLength = rightArrayOI.getListLength(rightArray);
    if (leftArrayLength < 0 || rightArrayLength < 0) {
        return null;
    }

    if (leftArrayLength == 0) {
        return leftArray;
    }
    if (rightArrayLength == 0) {
        return rightArray;
    }

    List<?> leftList = leftArrayOI.getList(leftArray);
    List<?> rightList = rightArrayOI.getList(rightArray);
    HashSet<?> resultSet = Sets.newHashSet(leftList);
    resultSet.retainAll(rightList);

    return new ArrayList<>(resultSet);
}

[BUG] Applying the UDF throws a null pointer error

Hello,

I tried to use these UDFs in Spark with Hive support. Here is the code:

// register UDF
spark.sql("create temporary function id_card_province as 'cc.shanruifeng.functions.card.UDFChinaIdCardProvince'");

// get file
Dataset<Row> rawdata = spark.read().csv("./src/main/resources/starM.csv");

// use UDF
rawdata.createOrReplaceTempView("starM");
Dataset<Row> udfModified = spark.sql("SELECT *, id_card_province(_c13) FROM starM");
udfModified.show();

and I got this error:

org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text cc.shanruifeng.functions.card.UDFChinaIdCardProvince.evaluate(org.apache.hadoop.io.Text)  on object cc.shanruifeng.functions.card.UDFChinaIdCardProvince@5c622859 of class cc.shanruifeng.functions.card.UDFChinaIdCardProvince with arguments {652423184510291234:org.apache.hadoop.io.Text} of size 1
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:981)
	at org.apache.spark.sql.hive.HiveSimpleUDF.eval(hiveUDFs.scala:91)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply_6$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:235)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:957)
	... 18 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.io.Text.encode(Text.java:450)
	at org.apache.hadoop.io.Text.set(Text.java:198)
	at cc.shanruifeng.functions.card.UDFChinaIdCardProvince.evaluate(UDFChinaIdCardProvince.java:23)
	... 23 more

Could you please give me some advice on this problem? The column is not null anyway.
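Reading the stack trace, Text.set is called with a null String at UDFChinaIdCardProvince.java:23, which suggests the province lookup returned null for this input and the result was written without a guard. This is an interpretation, not a confirmed diagnosis. A sketch of the guard pattern in Python (the prefix table here is hypothetical, not the library's data):

```python
# Hypothetical two-digit prefix table; the real UDF's lookup data may differ.
PROVINCE_BY_PREFIX = {"65": "Xinjiang"}

def id_card_province(id_card):
    # Guard nulls and unknown prefixes instead of passing None onward,
    # which is the analogue of Text.set(null) throwing a NullPointerException.
    if id_card is None or len(id_card) < 2:
        return None
    return PROVINCE_BY_PREFIX.get(id_card[:2])

print(id_card_province(None))                   # None
print(id_card_province("652423184510291234"))   # Xinjiang (per the hypothetical table)
```

If the real lookup can miss even for non-null input, returning NULL for a miss (rather than letting null reach Text.set) would avoid the crash.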
