
Comments (5)

yzhou2001 commented on July 19, 2024

Yes, we should support this. Please send your test case and the exact error message.

xuzhiliang commented on July 19, 2024

astro> describe kd_trade;
+-------------+---------+-------+
| col_name|data_type|comment|
+-------------+---------+-------+
| rowkey| string| |
| sysTradeId| string| |
| name| string| |
| mobile| string| |
|buyerAlipayNo| string| |
| province| string| |
| city| string| |
| district| string| |
| address| string| |
| platFromId| string| |
| outNick| string| |
+-------------+---------+-------+

astro> describe kd_order;
+----------+---------+-------+
| col_name|data_type|comment|
+----------+---------+-------+
| rowkey| string| |
|sysTradeId| string| |
| title| string| |
+----------+---------+-------+

  1. The != condition fails
    astro> select outNick,platFromId,buyerAlipayNo from kd_trade where platFromId!='1';
    15/09/23 09:03:37 INFO hbase.HBaseSQLCliDriver: Processing select outNick,platFromId,buyerAlipayNo from kd_trade where platFromId!='1'
    15/09/23 09:03:37 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=slave4:2181,slave2:2181,slave1:2181,master:2181,slave3:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x7849896, quorum=slave4:2181,slave2:2181,slave1:2181,master:2181,slave3:2181, baseZNode=/hbase
    15/09/23 09:03:37 INFO zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x7849896 connecting to ZooKeeper ensemble=slave4:2181,slave2:2181,slave1:2181,master:2181,slave3:2181
    15/09/23 09:03:37 INFO zookeeper.ClientCnxn: Opening socket connection to server slave2/192.168.1.222:2181. Will not attempt to authenticate using SASL (unknown error)
    15/09/23 09:03:37 INFO zookeeper.ClientCnxn: Socket connection established to slave2/192.168.1.222:2181, initiating session
    15/09/23 09:03:37 INFO zookeeper.ClientCnxn: Session establishment complete on server slave2/192.168.1.222:2181, sessionid = 0x34fdf88af361f6a, negotiated timeout = 60000
    15/09/23 09:03:37 INFO zookeeper.ZooKeeper: Session: 0x34fdf88af361f6a closed
    15/09/23 09:03:37 INFO zookeeper.ClientCnxn: EventThread shut down
    15/09/23 09:03:38 INFO hbase.HBaseRelation: Number of HBase regions for table kd_trade: 2
    15/09/23 09:03:38 INFO spark.SparkContext: Starting job: main at NativeMethodAccessorImpl.java:-2
    15/09/23 09:03:38 INFO scheduler.DAGScheduler: Got job 12 (main at NativeMethodAccessorImpl.java:-2) with 1 output partitions (allowLocal=false)
    15/09/23 09:03:38 INFO scheduler.DAGScheduler: Final stage: ResultStage 22(main at NativeMethodAccessorImpl.java:-2)
    15/09/23 09:03:38 INFO scheduler.DAGScheduler: Parents of final stage: List()
    15/09/23 09:03:38 INFO scheduler.DAGScheduler: Missing parents: List()
    15/09/23 09:03:38 INFO scheduler.DAGScheduler: Submitting ResultStage 22 (MapPartitionsRDD[44] at main at NativeMethodAccessorImpl.java:-2), which has no missing parents
    15/09/23 09:03:38 INFO storage.MemoryStore: ensureFreeSpace(16048) called with curMem=270692, maxMem=278302556
    15/09/23 09:03:38 INFO storage.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 15.7 KB, free 265.1 MB)
    15/09/23 09:03:38 INFO storage.MemoryStore: ensureFreeSpace(14363) called with curMem=286740, maxMem=278302556
    15/09/23 09:03:38 INFO storage.MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 14.0 KB, free 265.1 MB)
    15/09/23 09:03:38 INFO storage.BlockManagerInfo: Added broadcast_17_piece0 in memory on 192.168.1.220:47802 (size: 14.0 KB, free: 265.3 MB)
    15/09/23 09:03:38 INFO spark.SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:874
    15/09/23 09:03:38 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 22 (MapPartitionsRDD[44] at main at NativeMethodAccessorImpl.java:-2)
    15/09/23 09:03:38 INFO scheduler.TaskSchedulerImpl: Adding task set 22.0 with 1 tasks
    15/09/23 09:03:38 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 22.0 (TID 813, 192.168.1.221, ANY, 2932 bytes)
    15/09/23 09:03:38 INFO storage.BlockManagerInfo: Added broadcast_17_piece0 in memory on 192.168.1.221:47997 (size: 14.0 KB, free: 1060.2 MB)
    15/09/23 09:03:38 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 22.0 (TID 813, 192.168.1.221): org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.hasNext(HBaseSQLReaderRDD.scala:174)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 601 number_of_rows: 5000 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2144)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:285)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:355)
    ... 25 more
    Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 601 number_of_rows: 5000 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2144)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29990)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
    ... 29 more

15/09/23 09:03:38 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 22.0 (TID 814, 192.168.1.221, ANY, 2932 bytes)
15/09/23 09:03:39 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 22.0 (TID 814) on executor 192.168.1.221: org.apache.hadoop.hbase.DoNotRetryIOException (Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?) [duplicate 1]
15/09/23 09:03:39 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 22.0 (TID 815, 192.168.1.221, ANY, 2932 bytes)
15/09/23 09:03:40 INFO scheduler.TaskSetManager: Lost task 0.2 in stage 22.0 (TID 815) on executor 192.168.1.221: org.apache.hadoop.hbase.DoNotRetryIOException (Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?) [duplicate 2]
15/09/23 09:03:40 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 22.0 (TID 816, 192.168.1.222, ANY, 2932 bytes)
15/09/23 09:03:40 INFO storage.BlockManagerInfo: Added broadcast_17_piece0 in memory on 192.168.1.222:37081 (size: 14.0 KB, free: 1060.2 MB)
15/09/23 09:03:41 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 22.0 (TID 816) on executor 192.168.1.222: org.apache.hadoop.hbase.DoNotRetryIOException (Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?) [duplicate 3]
15/09/23 09:03:41 ERROR scheduler.TaskSetManager: Task 0 in stage 22.0 failed 4 times; aborting job
15/09/23 09:03:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 22.0, whose tasks have all completed, from pool
15/09/23 09:03:41 INFO scheduler.TaskSchedulerImpl: Cancelling stage 22
15/09/23 09:03:41 INFO scheduler.DAGScheduler: ResultStage 22 (main at NativeMethodAccessorImpl.java:-2) failed in 3.199 s
15/09/23 09:03:41 INFO scheduler.DAGScheduler: Job 12 failed: main at NativeMethodAccessorImpl.java:-2, took 3.212873 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 22.0 failed 4 times, most recent failure: Lost task 0.3 in stage 22.0 (TID 816, 192.168.1.222): org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.hasNext(HBaseSQLReaderRDD.scala:174)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 607 number_of_rows: 5000 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2144)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:285)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:355)
... 25 more

Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 607 number_of_rows: 5000 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2144)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29990)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
... 29 more

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
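
A side note on the failure mode above, inferred from the log rather than confirmed: OutOfOrderScannerNextException together with the "was there a rpc timeout?" hint is the classic sign of a scan batch taking longer than the region server's scanner lease, and the request shows number_of_rows: 5000 fetched per next() call. Lowering hbase.client.scanner.caching or raising hbase.client.scanner.timeout.period in hbase-site.xml is the usual remedy worth ruling out before blaming the predicate handling itself.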

  2. The IS NOT NULL condition fails
    astro> select outNick,platFromId,buyerAlipayNo from kd_trade where buyerAlipayNo is not null;
    15/09/23 09:05:25 INFO hbase.HBaseSQLCliDriver: Processing select outNick,platFromId,buyerAlipayNo from kd_trade where buyerAlipayNo is not null
    15/09/23 09:05:25 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=slave4:2181,slave2:2181,slave1:2181,master:2181,slave3:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x7849896, quorum=slave4:2181,slave2:2181,slave1:2181,master:2181,slave3:2181, baseZNode=/hbase
    15/09/23 09:05:25 INFO zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x7849896 connecting to ZooKeeper ensemble=slave4:2181,slave2:2181,slave1:2181,master:2181,slave3:2181
    15/09/23 09:05:25 INFO zookeeper.ClientCnxn: Opening socket connection to server slave2/192.168.1.222:2181. Will not attempt to authenticate using SASL (unknown error)
    15/09/23 09:05:25 INFO zookeeper.ClientCnxn: Socket connection established to slave2/192.168.1.222:2181, initiating session
    15/09/23 09:05:25 INFO zookeeper.ClientCnxn: Session establishment complete on server slave2/192.168.1.222:2181, sessionid = 0x34fdf88af361f6e, negotiated timeout = 60000
    15/09/23 09:05:25 INFO zookeeper.ZooKeeper: Session: 0x34fdf88af361f6e closed
    15/09/23 09:05:25 INFO zookeeper.ClientCnxn: EventThread shut down
    15/09/23 09:05:25 INFO hbase.HBaseRelation: Number of HBase regions for table kd_trade: 2
    15/09/23 09:05:25 INFO spark.SparkContext: Starting job: main at NativeMethodAccessorImpl.java:-2
    15/09/23 09:05:25 INFO scheduler.DAGScheduler: Got job 15 (main at NativeMethodAccessorImpl.java:-2) with 1 output partitions (allowLocal=false)
    15/09/23 09:05:25 INFO scheduler.DAGScheduler: Final stage: ResultStage 25(main at NativeMethodAccessorImpl.java:-2)
    15/09/23 09:05:25 INFO scheduler.DAGScheduler: Parents of final stage: List()
    15/09/23 09:05:25 INFO scheduler.DAGScheduler: Missing parents: List()
    15/09/23 09:05:25 INFO scheduler.DAGScheduler: Submitting ResultStage 25 (MapPartitionsRDD[50] at main at NativeMethodAccessorImpl.java:-2), which has no missing parents
    15/09/23 09:05:25 INFO storage.MemoryStore: ensureFreeSpace(16048) called with curMem=167840, maxMem=278302556
    15/09/23 09:05:25 INFO storage.MemoryStore: Block broadcast_20 stored as values in memory (estimated size 15.7 KB, free 265.2 MB)
    15/09/23 09:05:25 INFO storage.MemoryStore: ensureFreeSpace(14362) called with curMem=183888, maxMem=278302556
    15/09/23 09:05:25 INFO storage.MemoryStore: Block broadcast_20_piece0 stored as bytes in memory (estimated size 14.0 KB, free 265.2 MB)
    15/09/23 09:05:25 INFO storage.BlockManagerInfo: Added broadcast_20_piece0 in memory on 192.168.1.220:47802 (size: 14.0 KB, free: 265.3 MB)
    15/09/23 09:05:25 INFO spark.SparkContext: Created broadcast 20 from broadcast at DAGScheduler.scala:874
    15/09/23 09:05:25 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 25 (MapPartitionsRDD[50] at main at NativeMethodAccessorImpl.java:-2)
    15/09/23 09:05:25 INFO scheduler.TaskSchedulerImpl: Adding task set 25.0 with 1 tasks
    15/09/23 09:05:25 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 25.0 (TID 825, 192.168.1.222, ANY, 2672 bytes)
    15/09/23 09:05:25 INFO storage.BlockManagerInfo: Added broadcast_20_piece0 in memory on 192.168.1.222:37081 (size: 14.0 KB, free: 1060.2 MB)
    15/09/23 09:05:26 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 25.0 (TID 825, 192.168.1.222): org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.hasNext(HBaseSQLReaderRDD.scala:174)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 625 number_of_rows: 5000 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2144)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:285)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:355)
    ... 25 more
    Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 625 number_of_rows: 5000 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2144)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29990)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
    ... 29 more

15/09/23 09:05:26 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 25.0 (TID 826, 192.168.1.221, ANY, 2672 bytes)
15/09/23 09:05:26 INFO storage.BlockManagerInfo: Added broadcast_20_piece0 in memory on 192.168.1.221:47997 (size: 14.0 KB, free: 1060.2 MB)
15/09/23 09:05:26 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 25.0 (TID 826) on executor 192.168.1.221: org.apache.hadoop.hbase.DoNotRetryIOException (Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?) [duplicate 1]
15/09/23 09:05:26 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 25.0 (TID 827, 192.168.1.222, ANY, 2672 bytes)
15/09/23 09:05:27 INFO scheduler.TaskSetManager: Lost task 0.2 in stage 25.0 (TID 827) on executor 192.168.1.222: org.apache.hadoop.hbase.DoNotRetryIOException (Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?) [duplicate 2]
15/09/23 09:05:27 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 25.0 (TID 828, 192.168.1.223, ANY, 2672 bytes)
15/09/23 09:05:27 INFO storage.BlockManagerInfo: Added broadcast_20_piece0 in memory on 192.168.1.223:51252 (size: 14.0 KB, free: 1060.2 MB)
15/09/23 09:05:28 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 25.0 (TID 828) on executor 192.168.1.223: org.apache.hadoop.hbase.DoNotRetryIOException (Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?) [duplicate 3]
15/09/23 09:05:28 ERROR scheduler.TaskSetManager: Task 0 in stage 25.0 failed 4 times; aborting job
15/09/23 09:05:28 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 25.0, whose tasks have all completed, from pool
15/09/23 09:05:28 INFO scheduler.TaskSchedulerImpl: Cancelling stage 25
15/09/23 09:05:28 INFO scheduler.DAGScheduler: ResultStage 25 (main at NativeMethodAccessorImpl.java:-2) failed in 2.947 s
15/09/23 09:05:28 INFO scheduler.DAGScheduler: Job 15 failed: main at NativeMethodAccessorImpl.java:-2, took 2.958849 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 4 times, most recent failure: Lost task 0.3 in stage 25.0 (TID 828, 192.168.1.223): org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.hasNext(HBaseSQLReaderRDD.scala:174)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 631 number_of_rows: 5000 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2144)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:285)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:355)
... 25 more

Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 631 number_of_rows: 5000 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2144)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29990)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
... 29 more

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

  3. A table join returns no rows, even though matching data actually exists.

astro> SELECT kd.outNick,kd.platFromId,ko.title FROM kd_trade kd,kd_order ko WHERE kd.sysTradeId=ko.sysTradeId;

+-------+----------+-----+
|outNick|platFromId|title|
+-------+----------+-----+
+-------+----------+-----+

astro> select sysTradeId,title from kd_order where sysTradeId="826126064117768";
+---------------+--------------------+
| sysTradeId| title|
+---------------+--------------------+
|826126064117768|美国直发包邮 普丽普莱终极深海鱼油...|
+---------------+--------------------+

astro> select sysTradeId,name from kd_trade where sysTradeId="826126064117768";
+---------------+----+
| sysTradeId|name|
+---------------+----+
|826126064117768| 杨燕|
+---------------+----+

PS: if support for this is added, please let me know.

yzhou2001 commented on July 19, 2024

Could you try turning off the custom filter (and/or the coprocessor) to see if the problems go away?
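
A minimal sketch of what that might look like from the astro shell. The property names here are assumptions, not verified flags; check HBaseSQLConf in the source tree for the exact keys:

    -- hypothetical keys: verify the exact names in HBaseSQLConf before relying on them
    SET spark.sql.hbase.coprocessor=false;
    SET spark.sql.hbase.customfilter=false;
    -- then re-run one of the failing queries
    select outNick,platFromId,buyerAlipayNo from kd_trade where platFromId!='1';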

yzhou2001 commented on July 19, 2024

I ran a simulation with the custom filter and coprocessor enabled (the default), using "!=" and "is not null" predicates on a non-key string column, and both work well. I did not try the join case. I suspect something in your environment is misconfigured and recommend double-checking it. For one, the column mapping between the HBase table and the SQL table, the primary key setup, and so on all need to be correct for the system to work properly.
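
For reference, a minimal sketch of the DDL shape the project expects, following the MAPPED BY syntax from the README. The HBase table name, column family, and qualifiers below are placeholders, not the reporter's actual mapping:

    -- placeholder HBase table (hbase_kd_trade) and family (cf); adjust to the real layout
    CREATE TABLE kd_trade(
        rowkey string, sysTradeId string, name string, mobile string,
        PRIMARY KEY (rowkey))
    MAPPED BY (hbase_kd_trade, COLS=[
        sysTradeId=cf.sysTradeId,
        name=cf.name,
        mobile=cf.mobile]);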

Zhangshunyu commented on July 19, 2024

Just as @yzhou2001 said, @xuzhiliang, I think your environment is misconfigured. I simulated the tests as well, and both work fine. Maybe it is caused by the version of HBase or Spark.
