camsas / musketeer
The Musketeer workflow manager.
Home Page: http://camsas.org/musketeer
License: Apache License 2.0
This issue occurs only in @n1v0lg's fork of Musketeer, when using the non-mergeable Viff framework's operators. The equivalent job in stock Musketeer does not exhibit this issue, possibly because the GroupBy operator that replaces GroupBySEC is mergeable.
The input is this Mindi program.
Trace:
$ build/musketeer --dry_run --run_daemon=0 --beer_query=tests/foo.rap --root_dir=/tmp/ --output_ir_dag_gv --use_frameworks="hadoop-viff" --use_heuristic=false
I0706 18:22:53.907582 1994 musketeer.cc:184] Adding Hadoop Framework
I0706 18:22:53.907663 1994 musketeer.cc:208] Adding Viff (MPC) Framework
I0706 18:22:53.907677 1994 musketeer.cc:267] Looking for new Job to schedule
digraph OpDAG {
node [shape=box]; edges_sel [label="15"];
edges_sel->sum_group_by [label="edges_sel"];
sum_group_by->sec_sum_group_by [label="sum_group_by"];
}
I0706 18:22:53.908088 1994 musketeer.cc:315] Scheduling entire DAG
I0706 18:22:53.908100 1994 scheduler_dynamic.cc:170] Determine inputs size for DAG
IsGeneratedByOp 0
I0706 18:22:53.908126 1994 utils.cc:373] edges is an input
IsGeneratedByOp 1
I0706 18:22:53.908154 1994 utils.cc:378] edges_sel is not an input
IsGeneratedByOp 1
I0706 18:22:53.908176 1994 utils.cc:378] sum_group_by is not an input
I0706 18:22:53.908200 1994 scheduler_dynamic.cc:195] Size of: edges is: 0
I0706 18:22:53.908217 1994 scheduler_dynamic.cc:240] DynamicSchedule DAG
I0706 18:22:53.908227 1994 scheduler_dynamic.cc:439] Topological order: edges_sel
I0706 18:22:53.908238 1994 scheduler_dynamic.cc:439] Topological order: sum_group_by
I0706 18:22:53.908252 1994 scheduler_dynamic.cc:439] Topological order: sec_sum_group_by
I0706 18:22:53.908264 1994 utils.cc:260] Node order after optimisation: edges_sel
I0706 18:22:53.908273 1994 utils.cc:260] Node order after optimisation: sum_group_by
I0706 18:22:53.908280 1994 utils.cc:260] Node order after optimisation: sec_sum_group_by
I0706 18:22:53.908291 1994 scheduler_dynamic.cc:342] Refresh rel size of edges_sel is 0
I0706 18:22:53.908303 1994 scheduler_dynamic.cc:342] Refresh rel size of sum_group_by is 0
I0706 18:22:53.908315 1994 scheduler_dynamic.cc:342] Refresh rel size of sec_sum_group_by is 0
SELECT
IsGeneratedByOp 0
I0706 18:22:53.930142 1994 utils.cc:373] edges is an input
IsGeneratedByOp 1
I0706 18:22:53.930179 1994 utils.cc:378] edges_sel is not an input
IsGeneratedByOp 1
I0706 18:22:53.930198 1994 utils.cc:378] sum_group_by is not an input
SELECT
size of node set to schedule: 1
###DAG###
I0706 18:22:53.930243 1994 utils.cc:100] DAG input node: edges_sel
I0706 18:22:53.930253 1994 utils.cc:114] DAG edge: edges_sel sum_group_by
I0706 18:22:53.930263 1994 utils.cc:114] DAG edge: sum_group_by sec_sum_group_by
###DAG###
AGG
IsGeneratedByOp 1
I0706 18:22:53.930299 1994 utils.cc:378] edges_sel is not an input
IsGeneratedByOp 1
I0706 18:22:53.930320 1994 utils.cc:378] sum_group_by is not an input
AGG
size of node set to schedule: 1
###DAG###
I0706 18:22:53.930359 1994 utils.cc:100] DAG input node: sum_group_by
I0706 18:22:53.930371 1994 utils.cc:114] DAG edge: sum_group_by sec_sum_group_by
###DAG###
SELECT
AGG
IsGeneratedByOp 0
I0706 18:22:53.930418 1994 utils.cc:373] edges is an input
IsGeneratedByOp 1
I0706 18:22:53.930444 1994 utils.cc:378] edges_sel is not an input
IsGeneratedByOp 1
I0706 18:22:53.930461 1994 utils.cc:378] sum_group_by is not an input
SELECT
AGG
size of node set to schedule: 2
###DAG###
I0706 18:22:53.930505 1994 utils.cc:100] DAG input node: edges_sel
I0706 18:22:53.930516 1994 utils.cc:114] DAG edge: edges_sel sum_group_by
I0706 18:22:53.930526 1994 utils.cc:114] DAG edge: sum_group_by sec_sum_group_by
###DAG###
AGG_SEC
IsGeneratedByOp 1
I0706 18:22:53.930559 1994 utils.cc:378] sum_group_by is not an input
AGG_SEC
size of node set to schedule: 1
###DAG###
I0706 18:22:53.930595 1994 utils.cc:100] DAG input node: sec_sum_group_by
###DAG###
IsGeneratedByOp 1
I0706 18:22:53.930624 1994 utils.cc:378] sum_group_by is not an input
Secure operator detected.
SELECT
AGG_SEC
SELECT
AGG_SEC
size of node set to schedule: 2
###DAG###
I0706 18:22:53.930680 1994 utils.cc:100] DAG input node: edges_sel
I0706 18:22:53.930691 1994 utils.cc:114] DAG edge: edges_sel sum_group_by
I0706 18:22:53.930702 1994 utils.cc:114] DAG edge: sum_group_by sec_sum_group_by
###DAG###
AGG
AGG_SEC
IsGeneratedByOp 1
I0706 18:22:53.930747 1994 utils.cc:378] edges_sel is not an input
IsGeneratedByOp 1
I0706 18:22:53.930768 1994 utils.cc:378] sum_group_by is not an input
AGG
AGG_SEC
size of node set to schedule: 2
###DAG###
I0706 18:22:53.930814 1994 utils.cc:100] DAG input node: sum_group_by
I0706 18:22:53.930824 1994 utils.cc:114] DAG edge: sum_group_by sec_sum_group_by
###DAG###
SELECT
AGG
AGG_SEC
IsGeneratedByOp 0
I0706 18:22:53.930876 1994 utils.cc:373] edges is an input
IsGeneratedByOp 1
I0706 18:22:53.930897 1994 utils.cc:378] edges_sel is not an input
IsGeneratedByOp 1
I0706 18:22:53.930918 1994 utils.cc:378] sum_group_by is not an input
SELECT
AGG
AGG_SEC
size of node set to schedule: 3
###DAG###
I0706 18:22:53.930970 1994 utils.cc:100] DAG input node: edges_sel
I0706 18:22:53.930981 1994 utils.cc:114] DAG edge: edges_sel sum_group_by
I0706 18:22:53.930992 1994 utils.cc:114] DAG edge: sum_group_by sec_sum_group_by
###DAG###
I0706 18:22:53.931013 1994 scheduler_dynamic.cc:558] The minimum cost of running the DAG: 21
I0706 18:22:53.931025 1994 scheduler_dynamic.cc:562] Cur cost: 21
I0706 18:22:53.931032 1994 scheduler_dynamic.cc:565] ---------- Job boundary ----------
I0706 18:22:53.931041 1994 scheduler_dynamic.cc:569] edges_sel
I0706 18:22:53.931049 1994 scheduler_dynamic.cc:569] sum_group_by
I0706 18:22:53.931061 1994 scheduler_dynamic.cc:562] Cur cost: 1
I0706 18:22:53.931068 1994 scheduler_dynamic.cc:565] ---------- Job boundary ----------
I0706 18:22:53.931077 1994 scheduler_dynamic.cc:569] sec_sum_group_by
OUTPUT OUTPUT
viff
I0706 18:22:53.941830 1994 utils.cc:100] DAG input node: sec_sum_group_by
hadoop
I0706 18:22:53.941892 1994 utils.cc:100] DAG input node: edges_sel
I0706 18:22:53.941901 1994 utils.cc:100] DAG input node: sum_group_by
I0706 18:22:53.941912 1994 utils.cc:114] DAG edge: edges_sel sum_group_by
I0706 18:22:53.941922 1994 utils.cc:114] DAG edge: sum_group_by sec_sum_group_by
SCHEDULER TIME: 0.033615
I0706 18:22:53.942006 1994 scheduler_dynamic.cc:300] Dispatching relation sec_sum_group_by in framework viff
I0706 18:22:53.942023 1994 translator_viff.cc:116] Viff generate code
I0706 18:22:53.942040 1994 translator_viff.cc:102] Job input: /tmp/sum_group_by/
FileInputFormat.addInputPath(job, new Path("/tmp/sum_group_by/"));
String[] sum_group_by = value.toString().trim().split(" ");
I0706 18:22:53.945426 1994 scheduler_dynamic.cc:170] Determine inputs size for DAG
IsGeneratedByOp 1
I0706 18:22:53.945461 1994 utils.cc:378] sum_group_by is not an input
I0706 18:22:53.945487 1994 scheduler_dynamic.cc:761] Size of output: sec_sum_group_by is: 0
I0706 18:22:53.945516 1994 scheduler_dynamic.cc:283] Running operators 1 1 on viff
I0706 18:22:53.945529 1994 scheduler_dynamic.cc:289] Number of operators scheduled: 1
I0706 18:22:53.945544 1994 scheduler_dynamic.cc:342] Refresh rel size of sum_group_by is 0
I0706 18:22:53.945554 1994 scheduler_dynamic.cc:342] Refresh rel size of sec_sum_group_by is 0
AGG
IsGeneratedByOp 1
I0706 18:22:53.961684 1994 utils.cc:378] edges_sel is not an input
IsGeneratedByOp 1
I0706 18:22:53.961711 1994 utils.cc:378] sum_group_by is not an input
AGG
size of node set to schedule: 1
###DAG###
I0706 18:22:53.961752 1994 utils.cc:100] DAG input node: sum_group_by
I0706 18:22:53.961765 1994 utils.cc:114] DAG edge: sum_group_by sec_sum_group_by
###DAG###
AGG_SEC
IsGeneratedByOp 1
I0706 18:22:53.961807 1994 utils.cc:378] sum_group_by is not an input
AGG_SEC
size of node set to schedule: 1
###DAG###
I0706 18:22:53.961844 1994 utils.cc:100] DAG input node: sec_sum_group_by
###DAG###
IsGeneratedByOp 1
I0706 18:22:53.961874 1994 utils.cc:378] sum_group_by is not an input
Secure operator detected.
AGG
AGG_SEC
IsGeneratedByOp 1
I0706 18:22:53.961918 1994 utils.cc:378] edges_sel is not an input
IsGeneratedByOp 1
I0706 18:22:53.961941 1994 utils.cc:378] sum_group_by is not an input
AGG
AGG_SEC
size of node set to schedule: 2
###DAG###
I0706 18:22:53.961987 1994 utils.cc:100] DAG input node: sum_group_by
I0706 18:22:53.961997 1994 utils.cc:114] DAG edge: sum_group_by sec_sum_group_by
###DAG###
I0706 18:22:53.962016 1994 scheduler_dynamic.cc:558] The minimum cost of running the DAG: 21
I0706 18:22:53.962025 1994 scheduler_dynamic.cc:562] Cur cost: 21
I0706 18:22:53.962033 1994 scheduler_dynamic.cc:565] ---------- Job boundary ----------
I0706 18:22:53.962043 1994 scheduler_dynamic.cc:569] sum_group_by
I0706 18:22:53.962052 1994 scheduler_dynamic.cc:562] Cur cost: 1
I0706 18:22:53.962059 1994 scheduler_dynamic.cc:565] ---------- Job boundary ----------
I0706 18:22:53.962067 1994 scheduler_dynamic.cc:569] sec_sum_group_by
OUTPUT OUTPUT
viff
I0706 18:22:53.970327 1994 utils.cc:100] DAG input node: sec_sum_group_by
hadoop
I0706 18:22:53.970353 1994 utils.cc:100] DAG input node: sum_group_by
I0706 18:22:53.970367 1994 utils.cc:114] DAG edge: sum_group_by sec_sum_group_by
SCHEDULER TIME: 0.024822
I0706 18:22:53.970415 1994 scheduler_dynamic.cc:300] Dispatching relation sec_sum_group_by in framework viff
I0706 18:22:53.970428 1994 translator_viff.cc:116] Viff generate code
I0706 18:22:53.970444 1994 translator_viff.cc:102] Job input: /tmp/sum_group_by/
FileInputFormat.addInputPath(job, new Path("/tmp/sum_group_by/"));
String[] sum_group_by = value.toString().trim().split(" ");
I0706 18:22:53.973784 1994 scheduler_dynamic.cc:170] Determine inputs size for DAG
IsGeneratedByOp 1
I0706 18:22:53.973822 1994 utils.cc:378] sum_group_by is not an input
I0706 18:22:53.973845 1994 scheduler_dynamic.cc:761] Size of output: sec_sum_group_by is: 0
I0706 18:22:53.973867 1994 scheduler_dynamic.cc:283] Running operators 2 2 on viff
I0706 18:22:53.973878 1994 scheduler_dynamic.cc:289] Number of operators scheduled: 1
I0706 18:22:53.973892 1994 scheduler_dynamic.cc:342] Refresh rel size of sec_sum_group_by is 0
AGG_SEC
IsGeneratedByOp 1
I0706 18:22:53.988021 1994 utils.cc:378] sum_group_by is not an input
AGG_SEC
size of node set to schedule: 1
###DAG###
I0706 18:22:53.988067 1994 utils.cc:100] DAG input node: sec_sum_group_by
###DAG###
IsGeneratedByOp 1
I0706 18:22:53.988098 1994 utils.cc:378] sum_group_by is not an input
Secure operator detected.
I0706 18:22:53.988116 1994 scheduler_dynamic.cc:558] The minimum cost of running the DAG: 1
I0706 18:22:53.988126 1994 scheduler_dynamic.cc:562] Cur cost: 1
I0706 18:22:53.988137 1994 scheduler_dynamic.cc:565] ---------- Job boundary ----------
I0706 18:22:53.988147 1994 scheduler_dynamic.cc:569] sec_sum_group_by
OUTPUT OUTPUT
viff
I0706 18:22:53.996474 1994 utils.cc:100] DAG input node: sec_sum_group_by
SCHEDULER TIME: 0.022593
I0706 18:22:53.996526 1994 scheduler_dynamic.cc:300] Dispatching relation sec_sum_group_by in framework viff
I0706 18:22:53.996539 1994 translator_viff.cc:116] Viff generate code
I0706 18:22:53.996556 1994 translator_viff.cc:102] Job input: /tmp/sum_group_by/
FileInputFormat.addInputPath(job, new Path("/tmp/sum_group_by/"));
String[] sum_group_by = value.toString().trim().split(" ");
I0706 18:22:53.999891 1994 scheduler_dynamic.cc:170] Determine inputs size for DAG
IsGeneratedByOp 1
I0706 18:22:53.999929 1994 utils.cc:378] sum_group_by is not an input
I0706 18:22:53.999953 1994 scheduler_dynamic.cc:761] Size of output: sec_sum_group_by is: 0
I0706 18:22:53.999974 1994 scheduler_dynamic.cc:283] Running operators 3 3 on viff
I0706 18:22:53.999984 1994 scheduler_dynamic.cc:289] Number of operators scheduled: 1
I0706 18:22:54.000041 1994 musketeer.cc:339] Finished scheduling job
Note that sec_sum_group_by gets dispatched before sum_group_by, which is the wrong way around since sec_sum_group_by depends on the output of sum_group_by.
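A check along the following lines could catch this ordering problem in a test. It is a minimal sketch rather than Musketeer code: the function name DispatchOrderRespectsEdges and its inputs are hypothetical. The edge pairs mirror the "DAG edge" log lines, and the dispatch order mirrors the "Dispatching relation" lines in the trace.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Musketeer: returns true if every
// dispatched relation was dispatched strictly after the relation it reads.
// `edges` holds (producer, consumer) pairs, as in the "DAG edge" log lines.
bool DispatchOrderRespectsEdges(
    const std::vector<std::string>& dispatch_order,
    const std::vector<std::pair<std::string, std::string>>& edges) {
  // Record the position of the first dispatch of each relation.
  std::map<std::string, std::size_t> position;
  for (std::size_t i = 0; i < dispatch_order.size(); ++i) {
    position.emplace(dispatch_order[i], i);  // later duplicates are ignored
  }
  for (const auto& edge : edges) {
    const auto producer = position.find(edge.first);
    const auto consumer = position.find(edge.second);
    if (consumer == position.end()) continue;  // consumer never dispatched
    // A dispatched consumer needs its producer dispatched earlier.
    if (producer == position.end() || producer->second >= consumer->second) {
      return false;
    }
  }
  return true;
}
```

With the edges from the trace, an order that starts with sec_sum_group_by is rejected, while edges_sel, sum_group_by, sec_sum_group_by is accepted.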
When I run "make dependencies", I get this message:
include/Makefile.common:39: SUFFIX not set, will default to root of source tree!
/home/hadoop/projects/musketeer/musketeer/scripts/setup.sh
+---------------------------------------------------------------------
| FETCHING & INSTALLING EXTERNAL DEPENDENCIES
+---------------------------------------------------------------------
Detected Ubuntu...
--> OS COMPATIBILITY CHECK (Ubuntu 15.04)
Ubuntu 15.04 is compatible. [ OK ]
Checking if necessary packages are installed...
--> Ubuntu PACKAGE CHECK
The following packages are required to run Musketeer, but are not currently installed:
libprotobuf-c0-dev
Please install them using the following commmand:
$ sudo apt-get install libprotobuf-c0-dev
Makefile:31: recipe for target 'ext/.ext-ok' failed
make: *** [ext/.ext-ok] Error 1
Then, as root, I installed protobuf-c from
https://github.com/protobuf-c/protobuf-c.
I ran "make dependencies" again, to no avail. Finally, when I try
apt-get install libprotobuf-c0-dev
I get this message,
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'libprotobuf-c-dev' instead of 'libprotobuf-c0-dev'
libprotobuf-c-dev is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
What should I do?
I have Ubuntu 15.04 as OS.
Thanks.
We currently support (at least) two flags that control how workflows are scheduled:
- --use_dynamic_scheduler, which is a bool indicating whether the scheduler invokes DynamicScheduleDAG or ScheduleDAG.
- --use_heuristic, which is a bool indicating whether to use exhaustive search or the dynamic programming algorithm to find the system assignments (this manifests itself by calling SchedulerDynamic::ComputeOptimal or SchedulerDynamic::ComputeHeuristic).
IIRC, the difference between a dynamic and a non-dynamic scheduling call is whether Musketeer re-assesses its decisions after each job completion (@ICGog -- correct?).
The naming of things is a bit misleading here, for two reasons:
- The scheduler class is called SchedulerDynamic, but the --use_dynamic_scheduler flag controls which method in this class gets called, rather than whether a SchedulerDynamic or something else is used.
- The name suggests that --use_dynamic_scheduler refers to the heuristic, which it does not.
Proposal: let's rename the flags, so that they're more intuitive.
- --use_dynamic_scheduler becomes --continuously_reschedule
- --use_heuristic becomes a --scheduling_algorithm flag that can take three values: "automatic" (use heuristic if >18 operators, optimal otherwise), "heuristic" and "optimal".
This came up while investigating bugs reported by @n1v0lg.
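To make the proposed --scheduling_algorithm semantics concrete, here is a minimal sketch of how the three values could be resolved. All names (SchedulingAlgorithm, ResolveSchedulingAlgorithm, kAutomaticHeuristicThreshold) are hypothetical and do not exist in Musketeer. The threshold of 18 operators is taken from the proposal, and rejecting unknown values with an exception is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical names throughout; this only illustrates the proposed flag.
enum class SchedulingAlgorithm { kHeuristic, kOptimal };

// From the proposal: "automatic" uses the heuristic for DAGs with more
// than 18 operators and the optimal search otherwise.
constexpr std::size_t kAutomaticHeuristicThreshold = 18;

SchedulingAlgorithm ResolveSchedulingAlgorithm(const std::string& flag_value,
                                               std::size_t num_operators) {
  if (flag_value == "heuristic") return SchedulingAlgorithm::kHeuristic;
  if (flag_value == "optimal") return SchedulingAlgorithm::kOptimal;
  if (flag_value == "automatic") {
    return num_operators > kAutomaticHeuristicThreshold
               ? SchedulingAlgorithm::kHeuristic
               : SchedulingAlgorithm::kOptimal;
  }
  // Assumption: an unrecognised value is an error, not a silent default.
  throw std::invalid_argument("unknown --scheduling_algorithm value: " +
                              flag_value);
}
```

Under this sketch, a three-operator DAG with "automatic" resolves to the optimal search, and the explicit values override the operator count.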
Could you provide some examples and documentation on how to use your system? Can Musketeer be used in an environment where, for example, Spark or Hadoop is already installed? How can this be done? If I am not wrong, the Makefile installs the various engines automatically.
I'm running the following BEER query, with only Hadoop enabled (--use_frameworks='hadoop'):
CREATE RELATION edges WITH COLUMNS (INTEGER, INTEGER),
SELECT [edges_0, edges_1] FROM (edges) WHERE [(edges_1 < 5)] AS edges_sel,
AGG [edges_sel_1, +] FROM (edges_sel) GROUP BY [edges_sel_0] AS sum_group_by,
AGG [sum_group_by_1, +] FROM (sum_group_by) GROUP BY [sum_group_by_0] AS sec_sum_group_by
... and get the following output from the heuristic scheduler (--use_heuristic=true):
$ build/musketeer --dry_run --run_daemon=0 --beer_query=tests/foo.rap --root_dir=/tmp/ --output_ir_dag_gv --use_frameworks="hadoop"
I0706 16:21:06.385500 28560 musketeer.cc:182] Adding Hadoop Framework
I0706 16:21:06.385642 28560 musketeer.cc:261] Looking for new Job to schedule
[...]
digraph OpDAG {
node [shape=box]; edges_sel [label="14"];
edges_sel->sum_group_by [label="edges_sel"];
sum_group_by->sec_sum_group_by [label="sum_group_by"];
}
I0706 16:21:06.386896 28560 musketeer.cc:305] Scheduling entire DAG
I0706 16:21:06.386917 28560 scheduler_dynamic.cc:170] Determine inputs size for DAG
I0706 16:21:06.386996 28560 scheduler_dynamic.cc:195] Size of: edges is: 0
I0706 16:21:06.387027 28560 scheduler_dynamic.cc:240] DynamicSchedule DAG
I0706 16:21:06.387049 28560 scheduler_dynamic.cc:439] Topological order: edges_sel
I0706 16:21:06.387071 28560 scheduler_dynamic.cc:439] Topological order: sum_group_by
I0706 16:21:06.387092 28560 scheduler_dynamic.cc:439] Topological order: sec_sum_group_by
I0706 16:21:06.387111 28560 utils.cc:256] Node order after optimisation: edges_sel
I0706 16:21:06.387125 28560 utils.cc:256] Node order after optimisation: sum_group_by
I0706 16:21:06.387138 28560 utils.cc:256] Node order after optimisation: sec_sum_group_by
I0706 16:21:06.387158 28560 scheduler_dynamic.cc:342] Refresh rel size of edges_sel is 0
I0706 16:21:06.387179 28560 scheduler_dynamic.cc:342] Refresh rel size of sum_group_by is 0
I0706 16:21:06.387199 28560 scheduler_dynamic.cc:342] Refresh rel size of sec_sum_group_by is 0
I0706 16:21:06.387222 28560 scheduler_dynamic.cc:602] ComputeHeuristic
I0706 16:21:06.387487 28560 scheduler_dynamic.cc:648] 1 1 20 0
I0706 16:21:06.387503 28560 scheduler_dynamic.cc:648] 2 1 20 0
I0706 16:21:06.387511 28560 scheduler_dynamic.cc:648] 2 2 40 1
I0706 16:21:06.387517 28560 scheduler_dynamic.cc:648] 3 1 4294967295 0
I0706 16:21:06.387524 28560 scheduler_dynamic.cc:648] 3 2 40 2
I0706 16:21:06.387531 28560 scheduler_dynamic.cc:648] 3 3 60 2
I0706 16:21:06.387537 28560 scheduler_dynamic.cc:671] Schedulable operators: [1, 3]
SCHEDULER TIME: 0.000333
I0706 16:21:06.387599 28560 scheduler_dynamic.cc:300] Dispatching relation sum_group_by in framework hadoop
[...]
I0706 16:21:07.162153 28560 scheduler_dynamic.cc:170] Determine inputs size for DAG
I0706 16:21:07.162215 28560 scheduler_dynamic.cc:195] Size of: edges is: 0
I0706 16:21:07.162271 28560 scheduler_dynamic.cc:759] Size of output: sum_group_by is: 0
I0706 16:21:07.162322 28560 scheduler_dynamic.cc:283] Running operators 1 2 on hadoop
I0706 16:21:07.162343 28560 scheduler_dynamic.cc:289] Number of operators scheduled: 2
I0706 16:21:07.162372 28560 scheduler_dynamic.cc:342] Refresh rel size of sec_sum_group_by is 0
I0706 16:21:07.162390 28560 scheduler_dynamic.cc:602] ComputeHeuristic
I0706 16:21:07.162441 28560 scheduler_dynamic.cc:648] 1 1 20 0
I0706 16:21:07.162463 28560 scheduler_dynamic.cc:671] Schedulable operators: [1, 1]
SCHEDULER TIME: 9.2e-05
I0706 16:21:07.162528 28560 scheduler_dynamic.cc:300] Dispatching relation sec_sum_group_by in framework spark
Segmentation fault
... which is strange, since Spark should not even be an option.
Backtrace:
#0 0x0000000000879de0 in ?? ()
#1 0x0000000000594057 in musketeer::scheduling::SchedulerDynamic::DispatchWithHistory (this=0x837570, bind=...,
nodes=std::vector of length 1, capacity 1 = {...}, relation="sec_sum_group_by")
at musketeer/src/scheduling/scheduler_dynamic.cc:305
#2 0x00000000005948f7 in musketeer::scheduling::SchedulerDynamic::DynamicScheduleDAG (this=0x837570,
dag=std::vector of length 1, capacity 1 = {...})
at musketeer/src/scheduling/scheduler_dynamic.cc:281
#3 0x00000000004174c3 in main (argc=7, argv=0x7fffffffe3f8) at musketeer/src/musketeer.cc:307
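One way to turn this crash into an explicit error would be to validate the chosen framework against --use_frameworks before dispatching. The sketch below is not the actual fix and uses hypothetical function names. It assumes that '-' separates framework names in the flag value, as in "hadoop-viff".

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits a --use_frameworks value such as "hadoop-viff" into names.
// Assumption: '-' is the separator between framework names.
std::vector<std::string> SplitFrameworks(const std::string& flag_value) {
  std::vector<std::string> names;
  std::istringstream stream(flag_value);
  std::string name;
  while (std::getline(stream, name, '-')) {
    if (!name.empty()) names.push_back(name);
  }
  return names;
}

// Hypothetical guard: true only if `chosen` is one of the enabled frameworks.
bool IsFrameworkEnabled(const std::string& chosen,
                        const std::string& flag_value) {
  const std::vector<std::string> enabled = SplitFrameworks(flag_value);
  return std::find(enabled.begin(), enabled.end(), chosen) != enabled.end();
}
```

For the run above, IsFrameworkEnabled("spark", "hadoop") is false, so a caller could log an error and stop instead of dispatching to a framework that was never added.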
The exhaustive scheduler gets it right:
$ build/musketeer --dry_run --run_daemon=0 --beer_query=tests/foo.rap --root_dir=/tmp/ --output_ir_dag_gv --use_frameworks="hadoop" --use_heuristic=false
I0706 16:27:10.152679 28792 musketeer.cc:182] Adding Hadoop Framework
I0706 16:27:10.152823 28792 musketeer.cc:261] Looking for new Job to schedule
[...]
digraph OpDAG {
node [shape=box]; edges_sel [label="14"];
edges_sel->sum_group_by [label="edges_sel"];
sum_group_by->sec_sum_group_by [label="sum_group_by"];
}
I0706 16:27:10.154114 28792 musketeer.cc:305] Scheduling entire DAG
I0706 16:27:10.154135 28792 scheduler_dynamic.cc:170] Determine inputs size for DAG
I0706 16:27:10.154217 28792 scheduler_dynamic.cc:195] Size of: edges is: 0
I0706 16:27:10.154248 28792 scheduler_dynamic.cc:240] DynamicSchedule DAG
I0706 16:27:10.154271 28792 scheduler_dynamic.cc:439] Topological order: edges_sel
I0706 16:27:10.154294 28792 scheduler_dynamic.cc:439] Topological order: sum_group_by
I0706 16:27:10.154316 28792 scheduler_dynamic.cc:439] Topological order: sec_sum_group_by
I0706 16:27:10.154336 28792 utils.cc:256] Node order after optimisation: edges_sel
I0706 16:27:10.154351 28792 utils.cc:256] Node order after optimisation: sum_group_by
I0706 16:27:10.154366 28792 utils.cc:256] Node order after optimisation: sec_sum_group_by
I0706 16:27:10.154387 28792 scheduler_dynamic.cc:342] Refresh rel size of edges_sel is 0
I0706 16:27:10.154408 28792 scheduler_dynamic.cc:342] Refresh rel size of sum_group_by is 0
I0706 16:27:10.154429 28792 scheduler_dynamic.cc:342] Refresh rel size of sec_sum_group_by is 0
I0706 16:27:10.177091 28792 scheduler_dynamic.cc:558] The minimum cost of running the DAG: 40
I0706 16:27:10.177129 28792 scheduler_dynamic.cc:562] Cur cost: 40
I0706 16:27:10.177139 28792 scheduler_dynamic.cc:570] ---------- Job boundary ----------
I0706 16:27:10.177146 28792 scheduler_dynamic.cc:574] sec_sum_group_by
I0706 16:27:10.177157 28792 scheduler_dynamic.cc:562] Cur cost: 20
I0706 16:27:10.177165 28792 scheduler_dynamic.cc:570] ---------- Job boundary ----------
I0706 16:27:10.177172 28792 scheduler_dynamic.cc:574] edges_sel
I0706 16:27:10.177181 28792 scheduler_dynamic.cc:574] sum_group_by
SCHEDULER TIME: 0.029416
I0706 16:27:10.188161 28792 scheduler_dynamic.cc:300] Dispatching relation sum_group_by in framework hadoop
[...]
I0706 16:27:10.936656 28792 scheduler_dynamic.cc:170] Determine inputs size for DAG
I0706 16:27:10.936723 28792 scheduler_dynamic.cc:195] Size of: edges is: 0
I0706 16:27:10.936769 28792 scheduler_dynamic.cc:759] Size of output: sum_group_by is: 0
I0706 16:27:10.936813 28792 scheduler_dynamic.cc:283] Running operators 1 2 on hadoop
I0706 16:27:10.936827 28792 scheduler_dynamic.cc:289] Number of operators scheduled: 2
I0706 16:27:10.936861 28792 scheduler_dynamic.cc:342] Refresh rel size of sec_sum_group_by is 0
I0706 16:27:10.957470 28792 scheduler_dynamic.cc:558] The minimum cost of running the DAG: 20
I0706 16:27:10.957501 28792 scheduler_dynamic.cc:562] Cur cost: 20
I0706 16:27:10.957509 28792 scheduler_dynamic.cc:570] ---------- Job boundary ----------
I0706 16:27:10.957517 28792 scheduler_dynamic.cc:574] sec_sum_group_by
SCHEDULER TIME: 0.029033
I0706 16:27:10.965978 28792 scheduler_dynamic.cc:300] Dispatching relation sec_sum_group_by in framework hadoop
[...]
I0706 16:27:11.746538 28792 scheduler_dynamic.cc:170] Determine inputs size for DAG
I0706 16:27:11.746582 28792 scheduler_dynamic.cc:759] Size of output: sec_sum_group_by is: 0
I0706 16:27:11.746605 28792 scheduler_dynamic.cc:283] Running operators 3 3 on hadoop
I0706 16:27:11.746618 28792 scheduler_dynamic.cc:289] Number of operators scheduled: 1
I0706 16:27:11.746676 28792 musketeer.cc:329] Finished scheduling job
Thanks @n1v0lg for reporting this issue.
When using the heuristic scheduler, we segfault in the scheduler code when there is no suitable back-end for an operator:
I0615 23:11:03.220222 21852 scheduler_dynamic.cc:558] The minimum cost of running the DAG: 100001
I0615 23:11:03.220257 21852 scheduler_dynamic.cc:562] Cur cost: 100001
Program received signal SIGSEGV, Segmentation fault.
0x000000000059c494 in musketeer::scheduling::SchedulerDynamic::ComputeOptimal (this=0x83c800, serial_dag=std::vector of length 3, capacity 4 = {...})
at /home/malte/Projects/musketeer/src/scheduling/scheduler_dynamic.cc:563
563 uint32_t prev_jobs_exec = parent[cur_cost][cur_jobs_exec];
To reproduce, craft a minimal DAG that contains an operator that cannot be expressed in any of the available execution engines (i.e., for which the framework's scoring method returns FLAGS_max_scheduler_cost), and try to schedule it using the heuristic scheduler.
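As an illustration of the failing condition, a pre-check could verify that every operator has at least one usable back-end before the scheduler runs. This is a hedged sketch, not the fix adopted in Musketeer: the function name and the score-table layout are assumptions. The maximum cost is passed in as a parameter because the value of FLAGS_max_scheduler_cost is not stated here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pre-check: returns true only if every operator has at least
// one framework whose score is below `max_scheduler_cost`.
// `scores[op][fw]` is the cost of running operator `op` on framework `fw`.
bool EveryOperatorHasBackend(
    const std::vector<std::vector<std::uint32_t>>& scores,
    std::uint32_t max_scheduler_cost) {
  for (const auto& operator_scores : scores) {
    bool has_backend = false;
    for (const std::uint32_t score : operator_scores) {
      if (score < max_scheduler_cost) {
        has_backend = true;
        break;
      }
    }
    // An operator with no scores, or only maximum-cost scores, cannot run.
    if (!has_backend) return false;
  }
  return true;
}
```

A caller could report the unschedulable operator and stop when this returns false, rather than continuing into the cost tables.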
The solution is two-fold: