Comments (25)
@son2408 : according to the query profile you supplied in this ticket, the main driver of the query's execution time is the EnumerateViewNode and not the COLLECT. The EnumerateViewNode takes 12 out of the 18 seconds total runtime.
So this should be the key part to optimize.
I tried running the query locally with the provided dataset, and I get a similar execution plan, with higher execution times (potentially due to my inferior hardware):
Execution plan:
Id NodeType Calls Items Filtered Runtime [s] Comment
1 SingletonNode 1 1 0 0.00001 * ROOT
2 EnumerateViewNode 5000 5000000 0 34.32340 - FOR i IN search_hosotiepnhan_2 SEARCH (i.`tinhId` > 0) /* view query */
3 LimitNode 5000 5000000 0 0.56718 - LIMIT 0, 5000000
4 CalculationNode 5000 5000000 0 2.18918 - LET #9 = i.`tinhId` /* attribute expression */
6 CalculationNode 5000 5000000 0 4.63421 - LET #13 = { "tinhId" : i.`tinhId`, "daTra" : i.`daTra` } /* simple expression */
7 CollectNode 1 37 0 1.83643 - COLLECT tinhId = #9 AGGREGATE count = LENGTH() INTO group = #13 /* hash */
17 SortNode 1 37 0 0.04750 - SORT tinhId ASC /* sorting strategy: standard */
18 SubqueryStartNode 37 74 0 0.05996 - LET countDaTra = ( /* subquery begin */
9 EnumerateListNode 5019 5000037 0 1.03633 - FOR i IN group /* list iteration */
10 CalculationNode 5019 5000037 0 1.45216 - LET #15 = (i.`daTra` == 1) /* simple expression */
11 FilterNode 37 4228828 771209 0.39480 - FILTER #15
12 CollectNode 1 74 0 0.00047 - COLLECT AGGREGATE countDaTra = LENGTH() /* count */
19 SubqueryEndNode 1 37 0 0.00015 - RETURN countDaTra ) /* subquery end */
15 CalculationNode 1 37 0 0.00004 - LET #17 = { "val" : tinhId, "count" : count, "countDaTra" : countDaTra } /* simple expression */
16 ReturnNode 1 37 0 0.00002 - RETURN #17
Indexes used:
none
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 move-calculations-up-2
3 handle-arangosearch-views
4 remove-unnecessary-calculations-2
5 move-calculations-down
6 splice-subqueries
Query Statistics:
Writes Exec Writes Ign Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 5000000 0 / 0 771209 329908224 46.54266
It seems that the problem is that the query produces 5M matches in the EnumerateViewNode, but in order to build the resulting documents for them, it will need to do 5M additional document lookups.
This could be avoided by adding the required attributes (tinhId
and daTra
for this query) to the view definition as "stored values".
I just tried this, and created another view (search_hosotiepnhan_3
) with that definition:
db._createView("search_hosotiepnhan_3", "arangosearch", {"storedValues":[["tinhId"],["daTra"]], links: {vertex_hosotiepnhan_2:{analyzers:["identity"], includeAllFields: true, trackListPositions: false}}});
Using that adjusted view in the query, while keeping everything else the same, dramatically speeds up the query:
Execution plan:
Id NodeType Calls Items Filtered Runtime [s] Comment
1 SingletonNode 1 1 0 0.00001 * ROOT
2 EnumerateViewNode 5000 5000000 0 1.19117 - FOR i IN search_hosotiepnhan_3 SEARCH (i.`tinhId` > 0) LET #21 = i.`daTra` LET #9 = i.`tinhId` /* view query without materialization */
3 LimitNode 5000 5000000 0 0.43068 - LIMIT 0, 5000000
6 CalculationNode 5000 5000000 0 1.95566 - LET #13 = { "tinhId" : #9, "daTra" : #21 } /* simple expression */
7 CollectNode 1 37 0 1.23645 - COLLECT tinhId = #9 AGGREGATE count = LENGTH() INTO group = #13 /* hash */
17 SortNode 1 37 0 0.04437 - SORT tinhId ASC /* sorting strategy: standard */
18 SubqueryStartNode 37 74 0 0.05579 - LET countDaTra = ( /* subquery begin */
9 EnumerateListNode 5018 5000037 0 0.97822 - FOR i IN group /* list iteration */
10 CalculationNode 5018 5000037 0 1.37251 - LET #15 = (i.`daTra` == 1) /* simple expression */
11 FilterNode 37 4213311 786726 0.37546 - FILTER #15
12 CollectNode 1 74 0 0.00039 - COLLECT AGGREGATE countDaTra = LENGTH() /* count */
19 SubqueryEndNode 1 37 0 0.00014 - RETURN countDaTra ) /* subquery end */
15 CalculationNode 1 37 0 0.00003 - LET #17 = { "val" : tinhId, "count" : count, "countDaTra" : countDaTra } /* simple expression */
16 ReturnNode 1 37 0 0.00002 - RETURN #17
Indexes used:
none
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 move-calculations-up-2
3 handle-arangosearch-views
4 remove-unnecessary-calculations-2
5 move-calculations-down
6 splice-subqueries
Query Statistics:
Writes Exec Writes Ign Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 5000000 0 / 0 786726 329973760 7.64167
I also tried running the query (with the adjusted view) on our current devel, which will become ArangoDB 3.12 at some point in Q1 next year.
The query performance there will be even better:
Execution plan:
Id NodeType Calls Par Items Filtered Runtime [s] Comment
1 SingletonNode 1 - 1 0 0.00001 * ROOT
2 EnumerateViewNode 5000 - 5000000 0 1.04602 - FOR i IN search_hosotiepnhan_3 SEARCH (i.`tinhId` > 0) LET #15 = i.`daTra` LET #9 = i.`tinhId` /* view query without materialization */
3 LimitNode 5000 - 5000000 0 0.07833 - LIMIT 0, 5000000
6 CalculationNode 5000 4999 5000000 0 1.80908 - LET #11 = { "tinhId" : #9, "daTra" : #15 } /* simple expression */
7 CollectNode 1 0 37 0 1.16470 - COLLECT tinhId = #9 AGGREGATE count = LENGTH() INTO group = #11 /* hash */
17 SortNode 1 0 37 0 0.02060 - SORT tinhId ASC /* sorting strategy: standard */
18 SubqueryStartNode 37 - 74 0 0.02330 - LET countDaTra = ( /* subquery begin */
9 EnumerateListNode 5018 - 5000037 0 0.77800 - FOR i IN group /* list iteration */
10 CalculationNode 5018 - 5000037 0 1.07768 - LET #12 = (i.`daTra` == 1) /* simple expression */
11 FilterNode 37 - 4213311 786726 0.31608 - FILTER #12
12 CollectNode 1 - 74 0 0.00023 - COLLECT AGGREGATE countDaTra = LENGTH() /* count */
19 SubqueryEndNode 1 - 37 0 0.00012 - RETURN countDaTra ) /* subquery end */
15 CalculationNode 1 0 37 0 0.00003 - LET #13 = { "val" : tinhId, "count" : count, "countDaTra" : countDaTra } /* simple expression */
16 ReturnNode 1 - 37 0 0.00001 - RETURN #13
Indexes used:
none
Optimization rules applied:
Id Rule Name Id Rule Name Id Rule Name
1 move-calculations-up 4 remove-unnecessary-calculations-2 7 splice-subqueries
2 move-calculations-up-2 5 move-calculations-down
3 handle-arangosearch-views 6 async-prefetch
Query Statistics:
Writes Exec Writes Ign Doc. Lookups Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 0 5000000 0 / 0 786726 330006528 4.10029
Query Profile:
Query Stage Duration [s] Query Stage Duration [s] Query Stage Duration [s]
initializing 0.00001 loading collections 0.00002 instantiating executors 0.00007
parsing 0.00009 instantiating plan 0.00004 executing 4.09961
optimizing ast 0.00001 optimizing plan 0.00042 finalizing 0.00004
from arangodb.
One idea is to loop over the results only once instead of creating an arbitrary number of subqueries:
FOR i IN search_hosotiepnhan_3
SEARCH i.tinhId > 0
limit 0,5000000
COLLECT tinhId = i.tinhId
AGGREGATE count = length(1)
INTO group = {
daTra: i.daTra,
ngayTra: i.ngayTra_ts,
choBoSung: i.choBoSung,
thoiDiemChoBoSung: i.thoiDiemChoBoSung_ts,
ngayXuLy: i.ngayXuLy_ts,
ngayTiepNhan:i.ngayTiepNhan_s
}
LET counters = (
FOR i IN group
COLLECT AGGREGATE
countDaTra = SUM(i.daTra == 1 ? 1 : 0),
countNgayTra = SUM(i.ngayTra > 0 ? 1 : 0),
countChoBoSung = SUM(i.choBoSung == 1 ? 1 : 0),
countThoiDiemChoBoSung = SUM(i.thoiDiemChoBoSung > 0 ? 1 : 0),
countNgayXuLy = SUM(i.ngayXuLy > 0 ? 1 : 0),
countNgayTiepNhan = SUM(i.ngayTiepNhan > 0 ? 1 : 0)
RETURN {countDaTra, countNgayTra, countChoBoSung, countThoiDiemChoBoSung, countNgayXuLy, countNgayTiepNhan}
)
RETURN {
val: tinhId,
count: count,
countDaTra: counters[0].countDaTra,
countNgayTra: counters[0].countNgayTra,
countChoBoSung: counters[0].countChoBoSung,
countThoiDiemChoBoSung: counters[0].countThoiDiemChoBoSung,
countNgayXuLy: counters[0].countNgayXuLy,
countNgayTiepNhan: counters[0].countNgayTiepNhan
}
I can't estimate how this will impact performance (positively or negatively), but at least it is something you could try if you are worried about the amount of subqueries.
from arangodb.
Ok, I think there isn't anything left that I can do now.
The only suggestion I have is to try the upcoming version of ArangoDB (3.12), which is currently in development. I can't say if that will help or not, but maybe it is worth a try.
Linux nightly builds can be found here: https://download.arangodb.com/nightly/devel/Linux/x86_64/index.html
Please do not run a nightly build on the original data, but on a copy of your data!
from arangodb.
Hi @son2408 , Unfortunately, I am not aware of any good way to speed up COLLECT ... INTO further.
from arangodb.
@jsteemann yes, with your view definition the query speeds up dramatically. If my query have subqueries count then runtime query too slowly. It takes 16.40239s for 5M documents. Let help me optimize it.
Query
FOR i IN search_hosotiepnhan_3
SEARCH i.tinhId > 0
limit 0,5000000
COLLECT tinhId = i.tinhId
AGGREGATE count = length(1)
INTO group = {daTra: i.daTra, ngayTra: i.ngayTra_ts, choBoSung:i.choBoSung, thoiDiemChoBoSung:i.thoiDiemChoBoSung_ts, ngayXuLy: i.ngayXuLy_ts, ngayTiepNhan:i.ngayTiepNhan_ts}
LET countDaTra = (FOR i IN group
FILTER i.daTra == 1
COLLECT WITH COUNT INTO countDaTra
RETURN countDaTra)
LET countNgayTra = (FOR i IN group
FILTER i.ngayTra > 0
COLLECT WITH COUNT INTO countNgayTra
RETURN countNgayTra)
LET countChoBoSung = (FOR i IN group
FILTER i.choBoSung == 1
COLLECT WITH COUNT INTO countChoBoSung
RETURN countChoBoSung)
LET countThoiDiemChoBoSung = (FOR i IN group
FILTER i.thoiDiemChoBoSung > 0
COLLECT WITH COUNT INTO countThoiDiemChoBoSung
RETURN countThoiDiemChoBoSung)
LET countNgayXuLy = (FOR i IN group
FILTER i.ngayXuLy > 0
COLLECT WITH COUNT INTO countNgayXuLy
RETURN countNgayXuLy)
LET countNgayTiepNhan = (FOR i IN group
FILTER i.ngayTiepNhan > 0
COLLECT WITH COUNT INTO countNgayTiepNhan
RETURN countNgayTiepNhan)
RETURN {val: tinhId, count : count, countDaTra: countDaTra, countNgayTra: countNgayTra, countChoBoSung: countChoBoSung,countThoiDiemChoBoSung: countThoiDiemChoBoSung,countNgayXuLy:countNgayXuLy,countNgayTiepNhan:countNgayTiepNhan}
Query profile
Query String (1837 chars, cacheable: false):
FOR i IN search_hosotiepnhan_3
SEARCH i.tinhId > 0
limit 0,5000000
COLLECT tinhId = i.tinhId
AGGREGATE count = length(1)
INTO group = {daTra: i.daTra, ngayTra: i.ngayTra_ts, choBoSung:i.choBoSung,
thoiDiemChoBoSung:i.thoiDiemChoBoSung_ts, ngayXuLy: i.ngayXuLy_ts, ngayTiepNhan:i.ngayTiepNhan_s}
LET countDaTra = (FOR i IN group
FILTER i.daTra == 1
COLLECT WITH COUNT INTO countDaTra
RETURN countDaTra)
LET countNgayTra = (FOR i IN group
FILTER i.ngayTra > 0
COLLECT WITH COUNT INTO countNgayTra
RETURN countNgayTra)
LET countChoBoSung = (FOR i IN group
FILTER i.choBoSung == 1
COLLECT WITH COUNT INTO countChoBoSung
RETURN countChoBoSung)
LET countThoiDiemChoBoSung = (FOR i IN group
...
Execution plan:
Id NodeType Calls Items Filtered Runtime [s] Comment
1 SingletonNode 1 1 0 0.00000 * ROOT
2 EnumerateViewNode 5000 5000000 0 0.99799 - FOR i IN search_hosotiepnhan_3 SEARCH (i.`tinhId` > 0) LET #68 = i.`ngayTra_ts` LET #70 = i.`choBoSung` LET #34 = i.`tinhId` LET #74 = i.`ngayXuLy_ts` LET #66 = i.`daTra` LET #72 = i.`thoiDiemChoBoSung_ts` LET #76 = i.`ngayTiepNhan_s` /* view query without materialization */
3 LimitNode 5000 5000000 0 0.07333 - LIMIT 0, 5000000
6 CalculationNode 5000 5000000 0 2.80703 - LET #38 = { "daTra" : #66, "ngayTra" : #68, "choBoSung" : #70, "thoiDiemChoBoSung" : #72, "ngayXuLy" : #74, "ngayTiepNhan" : #76 } /* simple expression */
7 CollectNode 1 37 0 0.96351 - COLLECT tinhId = #34 AGGREGATE count = LENGTH() INTO group = #38 /* hash */
52 SortNode 1 37 0 0.07302 - SORT tinhId ASC /* sorting strategy: standard */
63 SubqueryStartNode 37 74 0 0.16044 - LET countNgayTiepNhan = ( /* subquery begin */
44 EnumerateListNode 5020 5000037 0 0.75889 - FOR i IN group /* list iteration */
45 CalculationNode 5020 5000037 0 0.76964 - LET #50 = (i.`ngayTiepNhan` > 0) /* simple expression */
46 FilterNode 37 37 5000000 0.14499 - FILTER #50
47 CollectNode 1 74 0 0.00011 - COLLECT AGGREGATE countNgayTiepNhan = LENGTH() /* count */
64 SubqueryEndNode 1 37 0 0.00002 - RETURN countNgayTiepNhan ) /* subquery end */
61 SubqueryStartNode 37 74 0 0.16176 - LET countNgayXuLy = ( /* subquery begin */
37 EnumerateListNode 5020 5000037 0 0.75396 - FOR i IN group /* list iteration */
38 CalculationNode 5020 5000037 0 0.87656 - LET #48 = (i.`ngayXuLy` > 0) /* simple expression */
39 FilterNode 37 5000037 0 0.14957 - FILTER #48
40 CollectNode 1 74 0 0.00012 - COLLECT AGGREGATE countNgayXuLy = LENGTH() /* count */
62 SubqueryEndNode 1 37 0 0.00002 - RETURN countNgayXuLy ) /* subquery end */
59 SubqueryStartNode 37 74 0 0.15743 - LET countThoiDiemChoBoSung = ( /* subquery begin */
30 EnumerateListNode 5020 5000037 0 0.75541 - FOR i IN group /* list iteration */
31 CalculationNode 5020 5000037 0 0.86033 - LET #46 = (i.`thoiDiemChoBoSung` > 0) /* simple expression */
32 FilterNode 37 1123639 3876398 0.16305 - FILTER #46
33 CollectNode 1 74 0 0.00012 - COLLECT AGGREGATE countThoiDiemChoBoSung = LENGTH() /* count */
60 SubqueryEndNode 1 37 0 0.00002 - RETURN countThoiDiemChoBoSung ) /* subquery end */
57 SubqueryStartNode 37 74 0 0.16859 - LET countChoBoSung = ( /* subquery begin */
23 EnumerateListNode 5020 5000037 0 0.75705 - FOR i IN group /* list iteration */
24 CalculationNode 5020 5000037 0 0.84098 - LET #44 = (i.`choBoSung` == 1) /* simple expression */
25 FilterNode 37 57277 4942760 0.15686 - FILTER #44
26 CollectNode 1 74 0 0.00013 - COLLECT AGGREGATE countChoBoSung = LENGTH() /* count */
58 SubqueryEndNode 1 37 0 0.00002 - RETURN countChoBoSung ) /* subquery end */
55 SubqueryStartNode 37 74 0 0.15836 - LET countNgayTra = ( /* subquery begin */
16 EnumerateListNode 5020 5000037 0 0.75944 - FOR i IN group /* list iteration */
17 CalculationNode 5020 5000037 0 0.90365 - LET #42 = (i.`ngayTra` > 0) /* simple expression */
18 FilterNode 37 4219307 780730 0.16860 - FILTER #42
19 CollectNode 1 74 0 0.00012 - COLLECT AGGREGATE countNgayTra = LENGTH() /* count */
56 SubqueryEndNode 1 37 0 0.00002 - RETURN countNgayTra ) /* subquery end */
53 SubqueryStartNode 37 74 0 0.08596 - LET countDaTra = ( /* subquery begin */
9 EnumerateListNode 5020 5000037 0 0.76448 - FOR i IN group /* list iteration */
10 CalculationNode 5020 5000037 0 0.84268 - LET #40 = (i.`daTra` == 1) /* simple expression */
11 FilterNode 37 4211854 788183 0.16787 - FILTER #40
12 CollectNode 1 74 0 0.00011 - COLLECT AGGREGATE countDaTra = LENGTH() /* count */
54 SubqueryEndNode 1 37 0 0.00002 - RETURN countDaTra ) /* subquery end */
50 CalculationNode 1 37 0 0.00004 - LET #52 = { "val" : tinhId, "count" : count, "countDaTra" : countDaTra, "countNgayTra" : countNgayTra, "countChoBoSung" : countChoBoSung, "countThoiDiemChoBoSung" : countThoiDiemChoBoSung, "countNgayXuLy" : countNgayXuLy, "countNgayTiepNhan" : countNgayTiepNhan } /* simple expression */
51 ReturnNode 1 37 0 0.00001 - RETURN #52
Indexes used:
none
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 move-calculations-up-2
3 handle-arangosearch-views
4 remove-unnecessary-calculations-2
5 move-calculations-down
6 splice-subqueries
Query Statistics:
Writes Exec Writes Ign Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 5000000 0 / 0 15388071 1829371904 16.40413
Query Profile:
Query Stage Duration [s]
initializing 0.00000
parsing 0.00013
optimizing ast 0.00002
loading collections 0.00001
instantiating plan 0.00012
optimizing plan 0.00122
instantiating executors 0.00028
executing 16.40239
finalizing 0.00007
I have multi subqueries. I see it spends alot of time for EnumerateListNode and CalculationNode for each subquery. Do you have the best way to statistic data with multi count by query in a FOR .. IN ? Because I see subqueries similar FOR i IN group
from arangodb.
@jsteemann thanks you, I tried query as your suggestion, but time query not as expected, it takes 26.65674 seconds. It's slower old query :(
Profile
Query String (1307 chars, cacheable: false):
FOR i IN search_hosotiepnhan_3
SEARCH i.tinhId > 0
limit 0,5000000
COLLECT tinhId = i.tinhId
AGGREGATE count = length(1)
INTO group = {
daTra: i.daTra,
ngayTra: i.ngayTra_ts,
choBoSung: i.choBoSung,
thoiDiemChoBoSung: i.thoiDiemChoBoSung_ts,
ngayXuLy: i.ngayXuLy_ts,
ngayTiepNhan:i.ngayTiepNhan_s
}
LET counters = (
FOR i IN group
COLLECT AGGREGATE
countDaTra = SUM(i.daTra == 1 ? 1 : 0),
countNgayTra = SUM(i.ngayTra > 0 ? 1 : 0),
countChoBoSung = SUM(i.choBoSung == 1 ? 1 : 0),
countThoiDiemChoBoSung = SUM(i.thoiDiemChoBoSung > 0 ? 1 : 0),
countNgayXuLy = SUM(i.ngayXuLy > 0 ? 1 : 0),
countNgayTiepNhan = SUM(i.ngayTiepNhan > 0 ? 1 : 0)
RETURN {countDaTra, countNgayTra, countChoBoSung, countThoiDiemChoBoSung, countNgayXuLy,
countNgayTiepNhan}
)
RETURN {
val: tinhId,
count: count,
co...
Execution plan:
Id NodeType Calls Items Filtered Runtime [s] Comment
1 SingletonNode 1 1 0 0.00000 * ROOT
2 EnumerateViewNode 5000 5000000 0 13.14875 - FOR i IN search_hosotiepnhan_3 SEARCH (i.`tinhId` > 0) /* view query */
3 LimitNode 5000 5000000 0 0.07553 - LIMIT 0, 5000000
4 CalculationNode 5000 5000000 0 0.62708 - LET #14 = i.`tinhId` /* attribute expression */
6 CalculationNode 5000 5000000 0 5.60139 - LET #18 = { "daTra" : i.`daTra`, "ngayTra" : i.`ngayTra_ts`, "choBoSung" : i.`choBoSung`, "thoiDiemChoBoSung" : i.`thoiDiemChoBoSung_ts`, "ngayXuLy" : i.`ngayXuLy_ts`, "ngayTiepNhan" : i.`ngayTiepNhan_s` } /* simple expression */
7 CollectNode 1 36 0 0.98457 - COLLECT tinhId = #14 AGGREGATE count = LENGTH() INTO group = #18 /* hash */
22 SortNode 1 36 0 0.07236 - SORT tinhId ASC /* sorting strategy: standard */
24 SubqueryStartNode 1 72 0 0.08490 - LET counters = ( /* subquery begin */
9 EnumerateListNode 5001 5000036 0 0.74345 - FOR i IN group /* list iteration */
10 CalculationNode 5001 5000036 0 0.75794 - LET #20 = ((i.`daTra` == 1) ? 1 : 0) /* simple expression */
11 CalculationNode 5001 5000036 0 0.80083 - LET #22 = ((i.`ngayTra` > 0) ? 1 : 0) /* simple expression */
12 CalculationNode 5001 5000036 0 0.74203 - LET #24 = ((i.`choBoSung` == 1) ? 1 : 0) /* simple expression */
13 CalculationNode 5001 5000036 0 0.75944 - LET #26 = ((i.`thoiDiemChoBoSung` > 0) ? 1 : 0) /* simple expression */
14 CalculationNode 5001 5000036 0 0.77572 - LET #28 = ((i.`ngayXuLy` > 0) ? 1 : 0) /* simple expression */
15 CalculationNode 5001 5000036 0 0.86129 - LET #30 = ((i.`ngayTiepNhan` > 0) ? 1 : 0) /* simple expression */
16 CollectNode 1 72 0 0.62120 - COLLECT AGGREGATE countDaTra = SUM(#20), countNgayTra = SUM(#22), countChoBoSung = SUM(#24), countThoiDiemChoBoSung = SUM(#26), countNgayXuLy = SUM(#28), countNgayTiepNhan = SUM(#30) /* sorted */
23 LimitNode 1 72 0 0.00001 - LIMIT 0, 1
17 CalculationNode 1 72 0 0.00005 - LET #32 = { "countDaTra" : countDaTra, "countNgayTra" : countNgayTra, "countChoBoSung" : countChoBoSung, "countThoiDiemChoBoSung" : countThoiDiemChoBoSung, "countNgayXuLy" : countNgayXuLy, "countNgayTiepNhan" : countNgayTiepNhan } /* simple expression */
25 SubqueryEndNode 1 36 0 0.00002 - RETURN #32 ) /* subquery end */
20 CalculationNode 1 36 0 0.00009 - LET #34 = { "val" : tinhId, "count" : count, "countDaTra" : counters[0].`countDaTra`, "countNgayTra" : counters[0].`countNgayTra`, "countChoBoSung" : counters[0].`countChoBoSung`, "countThoiDiemChoBoSung" : counters[0].`countThoiDiemChoBoSung`, "countNgayXuLy" : counters[0].`countNgayXuLy`, "countNgayTiepNhan" : counters[0].`countNgayTiepNhan` } /* simple expression */
21 ReturnNode 1 36 0 0.00001 - RETURN #34
Indexes used:
none
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 optimize-subqueries
3 move-calculations-up-2
4 handle-arangosearch-views
5 remove-unnecessary-calculations-2
6 move-calculations-down
7 splice-subqueries
Query Statistics:
Writes Exec Writes Ign Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 5000000 0 / 0 0 1805746176 26.65769
Query Profile:
Query Stage Duration [s]
initializing 0.00000
parsing 0.00010
optimizing ast 0.00001
loading collections 0.00001
instantiating plan 0.00006
optimizing plan 0.00067
instantiating executors 0.00015
executing 26.65674
finalizing 0.00006
from arangodb.
Please note that the extraction from the view is again the slowest part of your query.
You will need to adjust the view definition again so it includes all used attributes as "stored values".
from arangodb.
@jsteemann yes, I adjusted the view defintion.
{
"globallyUniqueId": "h64D5FDE00828/162169",
"cleanupIntervalStep": 2,
"writebufferIdle": 64,
"writebufferActive": 0,
"id": "162169",
"consolidationIntervalMsec": 1000,
"name": "search_hosotiepnhan_3",
"commitIntervalMsec": 1000,
"type": "arangosearch",
"consolidationPolicy": {
"type": "tier",
"segmentsBytesFloor": 2097152,
"segmentsBytesMax": 5368709120,
"segmentsMax": 10,
"segmentsMin": 1,
"minScore": 0
},
"primarySort": [
{
"field": "tinhId",
"asc": true
}
],
"primarySortCompression": "lz4",
"storedValues": [
{
"fields": [
"tinhId"
],
"compression": "lz4"
},
{
"fields": [
"ngayTra_ts"
],
"compression": "lz4"
},
{
"fields": [
"choBoSung"
],
"compression": "lz4"
},
{
"fields": [
"thoiDiemChoBoSung_ts"
],
"compression": "lz4"
},
{
"fields": [
"ngayXuLy_ts"
],
"compression": "lz4"
},
{
"fields": [
"ngayTiepNhan_s"
],
"compression": "lz4"
},
{
"fields": [
"dayTiepNhan_ts"
],
"compression": "lz4"
},
{
"fields": [
"daTra"
],
"compression": "lz4"
}
],
"writebufferSizeMax": 33554432,
"links": {
"vertex_hosotiepnhan_2": {
"analyzers": [
"identity"
],
"fields": {},
"includeAllFields": true,
"storeValues": "none",
"trackListPositions": false
}
}
}
from arangodb.
@jsteemann I see if used attributes the more the slow, pls check help me. Thanks you.
from arangodb.
@son2408 it's expected, using additional columns slow down query
You can try to unify in single column few fields (just write [["a", "b"]]
instead of [["a"], ["b"]]
, if they're always used together it commonly will be faster
Ofc there will be some limit, starting from what read whole document from rocksdb will be faster
from arangodb.
@MBkkt I tried to unify in single column few fields, but it's still too slow. Total runtime is 26.95196 seconds.
Profile
Query String (1306 chars, cacheable: false):
FOR i IN search_hosotiepnhan
SEARCH i.tinhId > 0
limit 0,5000000
COLLECT tinhId = i.tinhId
AGGREGATE count = length(1)
INTO group = {
daTra: i.daTra,
ngayTra: i.ngayTra_ts,
choBoSung: i.choBoSung,
thoiDiemChoBoSung: i.thoiDiemChoBoSung_ts,
ngayXuLy: i.ngayXuLy_ts,
ngayTiepNhan:i.ngayTiepNhan_ts
}
LET counters = (
FOR i IN group
COLLECT AGGREGATE
countDaTra = SUM(i.daTra == 1 ? 1 : 0),
countNgayTra = SUM(i.ngayTra > 0 ? 1 : 0),
countChoBoSung = SUM(i.choBoSung == 1 ? 1 : 0),
countThoiDiemChoBoSung = SUM(i.thoiDiemChoBoSung > 0 ? 1 : 0),
countNgayXuLy = SUM(i.ngayXuLy > 0 ? 1 : 0),
countNgayTiepNhan = SUM(i.ngayTiepNhan > 0 ? 1 : 0)
RETURN {countDaTra, countNgayTra, countChoBoSung, countThoiDiemChoBoSung, countNgayXuLy,
countNgayTiepNhan}
)
RETURN {
val: tinhId,
count: count,
cou...
Execution plan:
Id NodeType Calls Items Filtered Runtime [s] Comment
1 SingletonNode 1 1 0 0.00001 * ROOT
2 EnumerateViewNode 5000 5000000 0 12.71218 - FOR i IN search_hosotiepnhan SEARCH (i.`tinhId` > 0) /* view query */
3 LimitNode 5000 5000000 0 0.07812 - LIMIT 0, 5000000
4 CalculationNode 5000 5000000 0 0.63890 - LET #14 = i.`tinhId` /* attribute expression */
6 CalculationNode 5000 5000000 0 6.13356 - LET #18 = { "daTra" : i.`daTra`, "ngayTra" : i.`ngayTra_ts`, "choBoSung" : i.`choBoSung`, "thoiDiemChoBoSung" : i.`thoiDiemChoBoSung_ts`, "ngayXuLy" : i.`ngayXuLy_ts`, "ngayTiepNhan" : i.`ngayTiepNhan_ts` } /* simple expression */
7 CollectNode 1 37 0 1.08334 - COLLECT tinhId = #14 AGGREGATE count = LENGTH() INTO group = #18 /* hash */
22 SortNode 1 37 0 0.07627 - SORT tinhId ASC /* sorting strategy: standard */
24 SubqueryStartNode 1 74 0 0.08903 - LET counters = ( /* subquery begin */
9 EnumerateListNode 5001 5000037 0 0.74561 - FOR i IN group /* list iteration */
10 CalculationNode 5001 5000037 0 0.75636 - LET #20 = ((i.`daTra` == 1) ? 1 : 0) /* simple expression */
11 CalculationNode 5001 5000037 0 0.79988 - LET #22 = ((i.`ngayTra` > 0) ? 1 : 0) /* simple expression */
12 CalculationNode 5001 5000037 0 0.74970 - LET #24 = ((i.`choBoSung` == 1) ? 1 : 0) /* simple expression */
13 CalculationNode 5001 5000037 0 0.75335 - LET #26 = ((i.`thoiDiemChoBoSung` > 0) ? 1 : 0) /* simple expression */
14 CalculationNode 5001 5000037 0 0.77716 - LET #28 = ((i.`ngayXuLy` > 0) ? 1 : 0) /* simple expression */
15 CalculationNode 5001 5000037 0 0.92626 - LET #30 = ((i.`ngayTiepNhan` > 0) ? 1 : 0) /* simple expression */
16 CollectNode 1 74 0 0.63191 - COLLECT AGGREGATE countDaTra = SUM(#20), countNgayTra = SUM(#22), countChoBoSung = SUM(#24), countThoiDiemChoBoSung = SUM(#26), countNgayXuLy = SUM(#28), countNgayTiepNhan = SUM(#30) /* sorted */
23 LimitNode 1 74 0 0.00001 - LIMIT 0, 1
17 CalculationNode 1 74 0 0.00006 - LET #32 = { "countDaTra" : countDaTra, "countNgayTra" : countNgayTra, "countChoBoSung" : countChoBoSung, "countThoiDiemChoBoSung" : countThoiDiemChoBoSung, "countNgayXuLy" : countNgayXuLy, "countNgayTiepNhan" : countNgayTiepNhan } /* simple expression */
25 SubqueryEndNode 1 37 0 0.00002 - RETURN #32 ) /* subquery end */
20 CalculationNode 1 37 0 0.00008 - LET #34 = { "val" : tinhId, "count" : count, "countDaTra" : counters[0].`countDaTra`, "countNgayTra" : counters[0].`countNgayTra`, "countChoBoSung" : counters[0].`countChoBoSung`, "countThoiDiemChoBoSung" : counters[0].`countThoiDiemChoBoSung`, "countNgayXuLy" : counters[0].`countNgayXuLy`, "countNgayTiepNhan" : counters[0].`countNgayTiepNhan` } /* simple expression */
21 ReturnNode 1 37 0 0.00001 - RETURN #34
Indexes used:
none
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 optimize-subqueries
3 move-calculations-up-2
4 handle-arangosearch-views
5 remove-unnecessary-calculations-2
6 move-calculations-down
7 splice-subqueries
Query Statistics:
Writes Exec Writes Ign Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 5000000 0 / 0 0 1886453760 26.95313
Query Profile:
Query Stage Duration [s]
initializing 0.00000
parsing 0.00012
optimizing ast 0.00001
loading collections 0.00001
instantiating plan 0.00007
optimizing plan 0.00069
instantiating executors 0.00031
executing 26.95196
finalizing 0.00005
View configuration
{
"globallyUniqueId": "h64D5FDE00828/196510",
"cleanupIntervalStep": 2,
"writebufferIdle": 64,
"writebufferActive": 0,
"id": "196510",
"consolidationIntervalMsec": 1000,
"name": "search_hosotiepnhan",
"commitIntervalMsec": 1000,
"type": "arangosearch",
"consolidationPolicy": {
"type": "tier",
"segmentsBytesFloor": 2097152,
"segmentsBytesMax": 5368709120,
"segmentsMax": 10,
"segmentsMin": 1,
"minScore": 0
},
"primarySort": [],
"primarySortCompression": "lz4",
"storedValues": [
{
"fields": [
"tinhId",
"daTra",
"ngayTra_ts",
"choBoSung",
"thoiDiemChoBoSung_ts",
"ngayXuLy_ts",
"ngayTiepNhan_ts"
],
"compression": "lz4"
}
],
"writebufferSizeMax": 33554432,
"links": {
"vertex_hosotiepnhan_2": {
"analyzers": [
"identity"
],
"fields": {},
"includeAllFields": true,
"storeValues": "none",
"trackListPositions": false
}
}
}
from arangodb.
@jsteemann I included used attributes as "stored values", but it can't improve. I see the more atributes, the slower speed. Can you help me try it with data is shared in above, thanks you.
Query
FOR i IN search_hosotiepnhan_3
SEARCH i.tinhId > 0
limit 0,5000000
COLLECT tinhId = i.tinhId
AGGREGATE count = length(1)
INTO group = {
daTra: i.daTra,
ngayTra: i.ngayTra_ts,
choBoSung: i.choBoSung,
thoiDiemChoBoSung: i.thoiDiemChoBoSung_ts,
ngayXuLy: i.ngayXuLy_ts,
ngayTiepNhan:i.ngayTiepNhan_ts
}
LET counters = (
FOR i IN group
COLLECT AGGREGATE
countDaTra = SUM(i.daTra == 1 ? 1 : 0),
countNgayTra = SUM(i.ngayTra > 0 ? 1 : 0),
countChoBoSung = SUM(i.choBoSung == 1 ? 1 : 0),
countThoiDiemChoBoSung = SUM(i.thoiDiemChoBoSung > 0 ? 1 : 0),
countNgayXuLy = SUM(i.ngayXuLy > 0 ? 1 : 0),
countNgayTiepNhan = SUM(i.ngayTiepNhan > 0 ? 1 : 0)
RETURN {countDaTra, countNgayTra, countChoBoSung, countThoiDiemChoBoSung, countNgayXuLy, countNgayTiepNhan}
)
RETURN {
val: tinhId,
count: count,
countDaTra: counters[0].countDaTra,
countNgayTra: counters[0].countNgayTra,
countChoBoSung: counters[0].countChoBoSung,
countThoiDiemChoBoSung: counters[0].countThoiDiemChoBoSung,
countNgayXuLy: counters[0].countNgayXuLy,
countNgayTiepNhan: counters[0].countNgayTiepNhan
}
from arangodb.
Hard to say without any profile what the limiting factor is now.
You could also try removing the one left subqueries for the counting and fuse its COLLECT statement with the initial COLLECT. Then only one pass over the data is needed.
from arangodb.
@jsteemann yes, i tried to remove some attributes, runtime query speed up. With initial COLLECT is 6 attributes then runtime query is 32s, now attributes numbers is 2 then runtime query is 20s
profile now
Query String (598 chars, cacheable: false):
FOR i IN search_hosotiepnhan
SEARCH i.tinhId > 0
limit 0,5000000
COLLECT tinhId = i.tinhId
AGGREGATE count = length(1)
INTO group = {
daTra: i.daTra,
ngayTra: i.ngayTra_ts
}
LET counters = (
FOR i IN group
COLLECT AGGREGATE
countDaTra = SUM(i.daTra == 1 ? 1 : 0),
countNgayTra = SUM(i.ngayTra > 0 ? 1 : 0)
RETURN {countDaTra, countNgayTra}
)
RETURN {
val: tinhId,
count: count,
countDaTra: counters[0].countDaTra,
countNgayTra: counters[0].countNgayTra
}
Execution plan:
Id NodeType Calls Items Filtered Runtime [s] Comment
1 SingletonNode 1 1 0 0.00000 * ROOT
2 EnumerateViewNode 5000 5000000 0 12.83193 - FOR i IN search_hosotiepnhan SEARCH (i.`tinhId` > 0) /* view query */
3 LimitNode 5000 5000000 0 0.07608 - LIMIT 0, 5000000
4 CalculationNode 5000 5000000 0 0.64449 - LET #10 = i.`tinhId` /* attribute expression */
6 CalculationNode 5000 5000000 0 3.21296 - LET #14 = { "daTra" : i.`daTra`, "ngayTra" : i.`ngayTra_ts` } /* simple expression */
7 CollectNode 1 37 0 0.81389 - COLLECT tinhId = #10 AGGREGATE count = LENGTH() INTO group = #14 /* hash */
18 SortNode 1 37 0 0.02310 - SORT tinhId ASC /* sorting strategy: standard */
20 SubqueryStartNode 1 74 0 0.02810 - LET counters = ( /* subquery begin */
9 EnumerateListNode 5001 5000037 0 0.70404 - FOR i IN group /* list iteration */
10 CalculationNode 5001 5000037 0 0.70010 - LET #16 = ((i.`daTra` == 1) ? 1 : 0) /* simple expression */
11 CalculationNode 5001 5000037 0 0.90486 - LET #18 = ((i.`ngayTra` > 0) ? 1 : 0) /* simple expression */
12 CollectNode 1 74 0 0.30604 - COLLECT AGGREGATE countDaTra = SUM(#16), countNgayTra = SUM(#18) /* sorted */
19 LimitNode 1 74 0 0.00001 - LIMIT 0, 1
13 CalculationNode 1 74 0 0.00004 - LET #20 = { "countDaTra" : countDaTra, "countNgayTra" : countNgayTra } /* simple expression */
21 SubqueryEndNode 1 37 0 0.00002 - RETURN #20 ) /* subquery end */
16 CalculationNode 1 37 0 0.00004 - LET #22 = { "val" : tinhId, "count" : count, "countDaTra" : counters[0].`countDaTra`, "countNgayTra" : counters[0].`countNgayTra` } /* simple expression */
17 ReturnNode 1 37 0 0.00000 - RETURN #22
Indexes used:
none
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 optimize-subqueries
3 move-calculations-up-2
4 handle-arangosearch-views
5 remove-unnecessary-calculations-2
6 move-calculations-down
7 splice-subqueries
Query Statistics:
Writes Exec Writes Ign Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 5000000 0 / 0 0 547389440 20.24671
Query Profile:
Query Stage Duration [s]
initializing 0.00000
parsing 0.00010
optimizing ast 0.00001
loading collections 0.00001
instantiating plan 0.00007
optimizing plan 0.00046
instantiating executors 0.00027
executing 20.24583
finalizing 0.00003
from arangodb.
@jsteemann i think "stored values" is no efficient with COLLECT INTO
from arangodb.
"stored values" and COLLECT INTO are unrelated.
from arangodb.
Can you try fusing the two COLLECT statements into a single one, so you will need only one iteration over the dataset?
from arangodb.
@jsteemann two COLLECT on two other collections. COLLECT 1 for initial search statement, COLLECT 2 for group of COLLECT 1. How to fusing them ?
from arangodb.
Have you tried something along the lines of
FOR i IN search_hosotiepnhan_3
SEARCH i.tinhId > 0
limit 0,5000000
COLLECT tinhId = i.tinhId
AGGREGATE
count = length(1),
countDaTra = SUM(i.daTra == 1 ? 1 : 0),
countNgayTra = SUM(i.ngayTra > 0 ? 1 : 0),
countChoBoSung = SUM(i.choBoSung == 1 ? 1 : 0),
countThoiDiemChoBoSung = SUM(i.thoiDiemChoBoSung > 0 ? 1 : 0),
countNgayXuLy = SUM(i.ngayXuLy > 0 ? 1 : 0),
countNgayTiepNhan = SUM(i.ngayTiepNhan > 0 ? 1 : 0)
...
from arangodb.
@jsteemann i tried it, runtime query is 4 seconds. thanks you.
With more complex usecase is I need subquery collect by key huyenId field with query daTra == 1 so I must use INTO. How to rebuild this query without INTO as this above your query? becuase using INTO it run query too slowly
FOR i IN search_hosotiepnhan_3
SEARCH i.tinhId > 0
limit 0,5000000
COLLECT tinhId = i.tinhId
AGGREGATE
count = length(1),
countDaTra = SUM(i.daTra == 1 ? 1 : 0)
INTO group = {daTra: i.daTra, huyenId: i.huyenId}
LET statHuyen = (FOR g IN group
FILTER g.daTra == 1
COLLECT huyenId = g.huyenId
AGGREGATE
countHuyen = length(1)
RETURN {val: huyenId, count: countHuyen }
)
....
```
from arangodb.
/* view query */
view used without stored values
/* view query without materialization */
view used stored values
from arangodb.
Ok, I think there isn't anything left that I can do now. The only suggestion I have is to try the upcoming version of ArangoDB (3.12), which is currently in development. I can't say if that will help or not, but maybe it is worth a try. Linux nightly builds can be found here: https://download.arangodb.com/nightly/devel/Linux/x86_64/index.html Please do not run a nightly build on the original data, but on a copy of your data!
Ok. Let me to try it. Thanks @jsteemann
from arangodb.
hi @jsteemann I upgraded to version 3.12.0-NIGHTLY.20231219. I see runtime query with COLLECT..INTO improved more than old version. But It is not as expect. For detail usecase.
My query with three times INTO takes 8.5.
View Configuration
{
"globallyUniqueId": "h64D5FDE00828/16843",
"cleanupIntervalStep": 2,
"writebufferIdle": 64,
"writebufferActive": 0,
"id": "16843",
"consolidationIntervalMsec": 1000,
"name": "search_hosotiepnhan",
"commitIntervalMsec": 1000,
"type": "arangosearch",
"consolidationPolicy": {
"type": "tier",
"segmentsBytesFloor": 2097152,
"segmentsBytesMax": 5368709120,
"segmentsMax": 10,
"segmentsMin": 1,
"minScore": 0
},
"primarySort": [
{
"field": "hoSoTiepNhanId",
"asc": true
}
],
"primarySortCompression": "lz4",
"storedValues": [
{
"fields": [
"tinhId",
"huyenId",
"daTra",
"ngayTra_ts",
"choBoSung",
"thoiDiemChoBoSung_ts",
"ngayXuLy_ts",
"ngayTiepNhan_ts",
"dayTiepNhan",
"monthTiepNhan",
"quarterTiepNhan"
],
"compression": "lz4"
}
],
"writebufferSizeMax": 33554432,
"links": {
"vertex_hosotiepnhan": {
"analyzers": [
"identity"
],
"fields": {},
"includeAllFields": true,
"storeValues": "none",
"trackListPositions": false
}
}
}
Query
FOR i IN search_hosotiepnhan
limit 0, 5000000
collect tinhId = i.tinhId
aggregate
countTinh = length(1)
into group = {daTra: i.daTra, dayTiepNhan: i.dayTiepNhan, monthTiepNhan: i.monthTiepNhan, quarterTiepNhan: i.quarterTiepNhan}
//dayTiepNhan
let statDayTiepNhan = (
FOR g in group
collect collDayTiepNhan = g.dayTiepNhan
aggregate
countDayTiepNhan = length(1)
into groupDayTiepNhan = {daTra:g.daTra}
let statDayTiepNhanDaTra = (
FOR gDayTiepNhan in groupDayTiepNhan
collect daTra = gDayTiepNhan.daTra
aggregate
countDaTra = length(1)
return {val: daTra, val: countDaTra}
)
return {val: collDayTiepNhan, count: countDayTiepNhan, buckets: statDayTiepNhanDaTra}
)
//monthTiepNhan
let statMonthTiepNhan = (
FOR g in group
collect monthTiepNhan = g.monthTiepNhan
aggregate
countMonthTiepNhan = length(1)
into groupMonthTiepNhan = {daTra:g.daTra}
let statDaTra = (
FOR gMonthTiepNhan in groupMonthTiepNhan
collect daTra = gMonthTiepNhan.daTra
aggregate
countDaTra = length(1)
return {val: daTra, val: countDaTra}
)
return {val: monthTiepNhan, count: countMonthTiepNhan}
)
return {val: tinhId, count: countTinh, statDayTiepNhan: statDayTiepNhan, statMonthTiepNhan: statMonthTiepNhan}
Profile
Query String (1520 chars, cacheable: false):
FOR i IN search_hosotiepnhan
limit 0, 5000000
collect tinhId = i.tinhId
aggregate
countTinh = length(1)
into group = {daTra: i.daTra, dayTiepNhan: i.dayTiepNhan, monthTiepNhan: i.monthTiepNhan,
quarterTiepNhan: i.quarterTiepNhan}
//dayTiepNhan
let statDayTiepNhan = (
FOR g in group
collect collDayTiepNhan = g.dayTiepNhan
aggregate
countDayTiepNhan = length(1)
into groupDayTiepNhan = {daTra:g.daTra}
let statDayTiepNhanDaTra = (
FOR gDayTiepNhan in groupDayTiepNhan
collect daTra = gDayTiepNhan.daTra
aggregate
countDaTra = length(1)
return {val: daTra, val: countDaTra}
)
return {val: collDayTiepNhan, count: countDayTiepNhan, buckets: statDayTiepNhanDaTra}
)
//monthTiepNhan
let statMonthTiepNhan = (
FOR g in group
collect monthTiepNhan = g.monthTiepNhan
aggregate
...
Execution plan:
Id NodeType Calls Par Items Filtered Runtime [s] Comment
1 SingletonNode 1 - 1 0 0.00000 * ROOT
2 EnumerateViewNode 5000 - 5000000 0 0.51076 - FOR i IN search_hosotiepnhan LET #30 = i.`tinhId` LET #110 = i.`monthTiepNhan` LET #109 = i.`dayTiepNhan` LET #108 = i.`daTra` LET #111 = i.`quarterTiepNhan` /* view query without materialization */
3 LimitNode 5000 - 5000000 0 0.01663 - LIMIT 0, 5000000
6 CalculationNode 5000 4999 5000000 0 2.16173 - LET #32 = { "daTra" : #108, "dayTiepNhan" : #109, "monthTiepNhan" : #110, "quarterTiepNhan" : #111 } /* simple expression */
7 CollectNode 1 0 37 0 1.00809 - COLLECT tinhId = #30 AGGREGATE countTinh = LENGTH() INTO group = #32 /* hash */
44 SortNode 1 0 37 0 0.04790 - SORT tinhId ASC /* sorting strategy: standard */
52 SubqueryStartNode 1 - 74 0 0.11422 - LET statMonthTiepNhan = ( /* subquery begin */
26 EnumerateListNode 5001 - 5000037 0 0.74329 - FOR g IN group /* list iteration */
27 CalculationNode 5001 - 5000037 0 0.37468 - LET #40 = g.`monthTiepNhan` /* attribute expression */
29 CalculationNode 5001 - 5000037 0 1.03349 - LET #42 = { "daTra" : g.`daTra` } /* simple expression */
30 CollectNode 1 - 457 0 0.29846 - COLLECT monthTiepNhan = #40 AGGREGATE countMonthTiepNhan = LENGTH() /* hash */
47 SortNode 1 - 457 0 0.00010 - SORT monthTiepNhan ASC /* sorting strategy: standard */
39 CalculationNode 1 - 457 0 0.00017 - LET #46 = { "val" : monthTiepNhan, "count" : countMonthTiepNhan } /* simple expression */
53 SubqueryEndNode 1 - 37 0 0.00005 - RETURN #46 ) /* subquery end */
48 SubqueryStartNode 1 - 74 0 0.06846 - LET statDayTiepNhan = ( /* subquery begin */
9 EnumerateListNode 5001 - 5000037 0 0.73775 - FOR g IN group /* list iteration */
10 CalculationNode 5001 - 5000037 0 0.35109 - LET #33 = g.`dayTiepNhan` /* attribute expression */
12 CalculationNode 5001 - 5000037 0 1.03256 - LET #35 = { "daTra" : g.`daTra` } /* simple expression */
13 CollectNode 1 - 292 0 0.44655 - COLLECT collDayTiepNhan = #33 AGGREGATE countDayTiepNhan = LENGTH() INTO groupDayTiepNhan = #35 /* hash */
45 SortNode 1 - 292 0 0.00862 - SORT collDayTiepNhan ASC /* sorting strategy: standard */
50 SubqueryStartNode 1 - 547 0 0.00820 - LET statDayTiepNhanDaTra = ( /* subquery begin */
15 EnumerateListNode 5001 - 5000292 0 0.31771 - FOR gDayTiepNhan IN groupDayTiepNhan /* list iteration */
16 CalculationNode 5001 - 5000292 0 0.33699 - LET #36 = gDayTiepNhan.`daTra` /* attribute expression */
18 CollectNode 1 - 793 0 0.27310 - COLLECT daTra = #36 AGGREGATE countDaTra = LENGTH() /* hash */
46 SortNode 1 - 793 0 0.00023 - SORT daTra ASC /* sorting strategy: standard */
19 CalculationNode 1 - 793 0 0.00039 - LET #38 = { "val" : daTra, "val" : countDaTra } /* simple expression */
51 SubqueryEndNode 1 - 292 0 0.00009 - RETURN #38 ) /* subquery end */
22 CalculationNode 1 - 292 0 0.00016 - LET #39 = { "val" : collDayTiepNhan, "count" : countDayTiepNhan, "buckets" : statDayTiepNhanDaTra } /* simple expression */
49 SubqueryEndNode 1 - 37 0 0.00005 - RETURN #39 ) /* subquery end */
42 CalculationNode 1 0 37 0 0.00004 - LET #47 = { "val" : tinhId, "count" : countTinh, "statDayTiepNhan" : statDayTiepNhan, "statMonthTiepNhan" : statMonthTiepNhan } /* simple expression */
43 ReturnNode 1 - 37 0 0.00000 - RETURN #47
Indexes used:
none
Optimization rules applied:
Id Rule Name Id Rule Name Id Rule Name
1 move-calculations-up 5 remove-collect-variables 9 async-prefetch
2 remove-redundant-calculations 6 handle-arangosearch-views 10 splice-subqueries
3 remove-unnecessary-calculations 7 remove-unnecessary-calculations-2
4 move-calculations-up-2 8 move-calculations-down
Query Statistics:
Writes Exec Writes Ign Doc. Lookups Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 0 5000000 0 / 0 0 1563394048 8.63782
Query Profile:
Query Stage Duration [s] Query Stage Duration [s] Query Stage Duration [s]
initializing 0.00000 loading collections 0.00001 instantiating executors 0.00018
parsing 0.00014 instantiating plan 0.00022 executing 8.63007
optimizing ast 0.00001 optimizing plan 0.00719 finalizing 0.00004
if this query has two times COLLECT..INTO. It takes 6s.
Query
FOR i IN search_hosotiepnhan
limit 0, 5000000
collect tinhId = i.tinhId
aggregate
countTinh = length(1)
into group = {daTra: i.daTra, dayTiepNhan: i.dayTiepNhan, monthTiepNhan: i.monthTiepNhan, quarterTiepNhan: i.quarterTiepNhan}
//dayTiepNhan
let statDayTiepNhan = (
FOR g in group
collect collDayTiepNhan = g.dayTiepNhan
aggregate
countDayTiepNhan = length(1)
into groupDayTiepNhan = {daTra:g.daTra}
let statDayTiepNhanDaTra = (
FOR gDayTiepNhan in groupDayTiepNhan
collect daTra = gDayTiepNhan.daTra
aggregate
countDaTra = length(1)
return {val: daTra, val: countDaTra}
)
return {val: collDayTiepNhan, count: countDayTiepNhan, buckets: statDayTiepNhanDaTra}
)
return {val: tinhId, count: countTinh, statDayTiepNhan: statDayTiepNhan}
Profile
Query String (925 chars, cacheable: false):
FOR i IN search_hosotiepnhan
limit 0, 5000000
collect tinhId = i.tinhId
aggregate
countTinh = length(1)
into group = {daTra: i.daTra, dayTiepNhan: i.dayTiepNhan, monthTiepNhan: i.monthTiepNhan,
quarterTiepNhan: i.quarterTiepNhan}
//dayTiepNhan
let statDayTiepNhan = (
FOR g in group
collect collDayTiepNhan = g.dayTiepNhan
aggregate
countDayTiepNhan = length(1)
into groupDayTiepNhan = {daTra:g.daTra}
let statDayTiepNhanDaTra = (
FOR gDayTiepNhan in groupDayTiepNhan
collect daTra = gDayTiepNhan.daTra
aggregate
countDaTra = length(1)
return {val: daTra, val: countDaTra}
)
return {val: collDayTiepNhan, count: countDayTiepNhan, buckets: statDayTiepNhanDaTra}
)
return {val: tinhId, count: countTinh, statDayTiepNhan: statDayTiepNhan}
Execution plan:
Id NodeType Calls Par Items Filtered Runtime [s] Comment
1 SingletonNode 1 - 1 0 0.00000 * ROOT
2 EnumerateViewNode 5000 - 5000000 0 0.51039 - FOR i IN search_hosotiepnhan LET #17 = i.`tinhId` LET #58 = i.`monthTiepNhan` LET #56 = i.`daTra` LET #57 = i.`dayTiepNhan` LET #59 = i.`quarterTiepNhan` /* view query without materialization */
3 LimitNode 5000 - 5000000 0 0.01718 - LIMIT 0, 5000000
6 CalculationNode 5000 4999 5000000 0 2.15355 - LET #19 = { "daTra" : #56, "dayTiepNhan" : #57, "monthTiepNhan" : #58, "quarterTiepNhan" : #59 } /* simple expression */
7 CollectNode 1 0 37 0 1.00086 - COLLECT tinhId = #17 AGGREGATE countTinh = LENGTH() INTO group = #19 /* hash */
27 SortNode 1 0 37 0 0.05242 - SORT tinhId ASC /* sorting strategy: standard */
30 SubqueryStartNode 1 - 74 0 0.06866 - LET statDayTiepNhan = ( /* subquery begin */
9 EnumerateListNode 5001 - 5000037 0 0.72884 - FOR g IN group /* list iteration */
10 CalculationNode 5001 - 5000037 0 0.35558 - LET #20 = g.`dayTiepNhan` /* attribute expression */
12 CalculationNode 5001 - 5000037 0 1.02633 - LET #22 = { "daTra" : g.`daTra` } /* simple expression */
13 CollectNode 1 - 292 0 0.43884 - COLLECT collDayTiepNhan = #20 AGGREGATE countDayTiepNhan = LENGTH() INTO groupDayTiepNhan = #22 /* hash */
28 SortNode 1 - 292 0 0.00886 - SORT collDayTiepNhan ASC /* sorting strategy: standard */
32 SubqueryStartNode 1 - 547 0 0.00816 - LET statDayTiepNhanDaTra = ( /* subquery begin */
15 EnumerateListNode 5001 - 5000292 0 0.31943 - FOR gDayTiepNhan IN groupDayTiepNhan /* list iteration */
16 CalculationNode 5001 - 5000292 0 0.34112 - LET #23 = gDayTiepNhan.`daTra` /* attribute expression */
18 CollectNode 1 - 793 0 0.26691 - COLLECT daTra = #23 AGGREGATE countDaTra = LENGTH() /* hash */
29 SortNode 1 - 793 0 0.00022 - SORT daTra ASC /* sorting strategy: standard */
19 CalculationNode 1 - 793 0 0.00038 - LET #25 = { "val" : daTra, "val" : countDaTra } /* simple expression */
33 SubqueryEndNode 1 - 292 0 0.00009 - RETURN #25 ) /* subquery end */
22 CalculationNode 1 - 292 0 0.00014 - LET #26 = { "val" : collDayTiepNhan, "count" : countDayTiepNhan, "buckets" : statDayTiepNhanDaTra } /* simple expression */
31 SubqueryEndNode 1 - 37 0 0.00004 - RETURN #26 ) /* subquery end */
25 CalculationNode 1 0 37 0 0.00002 - LET #27 = { "val" : tinhId, "count" : countTinh, "statDayTiepNhan" : statDayTiepNhan } /* simple expression */
26 ReturnNode 1 - 37 0 0.00000 - RETURN #27
Indexes used:
none
Optimization rules applied:
Id Rule Name Id Rule Name Id Rule Name
1 move-calculations-up 4 move-calculations-up-2 7 move-calculations-down
2 remove-redundant-calculations 5 handle-arangosearch-views 8 async-prefetch
3 remove-unnecessary-calculations 6 remove-unnecessary-calculations-2 9 splice-subqueries
Query Statistics:
Writes Exec Writes Ign Doc. Lookups Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 0 5000000 0 / 0 0 1246724096 6.03617
Query Profile:
Query Stage Duration [s] Query Stage Duration [s] Query Stage Duration [s]
initializing 0.00000 loading collections 0.00001 instantiating executors 0.00014
parsing 0.00013 instantiating plan 0.00009 executing 6.03324
optimizing ast 0.00001 optimizing plan 0.00255 finalizing 0.00004
from arangodb.
from arangodb.
@jsteemann Can you let me the best way to speedup query with multi COLLECT .. INTO ? . I see the more COLLECT .. INTO the slower runtime query. Thanks you
from arangodb.
Related Issues (20)
- Graph traversal 1million edges too slowly HOT 5
- REGEX_REPLACE character classes not matching correctly HOT 3
- Algorithmics Graph travsersal ArangoDB with FILTER too slowly HOT 8
- PRUNE GRAPH NOT WORK WHEN USE CONDITION USES NEGATION HOT 2
- How to determine if the query was served by a specific shard using AQL Explain? HOT 1
- Performance graph data with filter too slowly HOT 12
- Arangodb not getting deployed on AWS EKS cluster though worked on local K8s cluster HOT 8
- CentOS / RHEL installs of ArangoDB failing HOT 1
- Does arangodb support IPV6 if the dual-stack arangodb deployed in a cluster cannot be started in a dual-stack K8s environment and no obvious error is reported in the debug logs?
- GCC 13.2.0 build devel error HOT 2
- Vulnerability runc CVE-2024-21626 in ArangoDB HOT 1
- Cluster RemoteNode slowdowns query by 50 times comparing to single instance HOT 3
- An integer field cannot be incremented HOT 2
- FATAL [c81f6] {general} failed to bind to endpoint 'http+tcp://[::]:8529'. Please check whether another instance is already running using this endpoint and review your endpoints configuration. HOT 1
- Guix package HOT 1
- After updating to version 3.11.7 of Arango BD, the service is constantly restarting HOT 1
- Graph traversal with cluster from list nodes very slow HOT 1
- GEO_CONTAINS produces false results (inconsistent right-hand rule) HOT 2
- Execution time inconsistent with the use of an index with sorting
- What next after Pregel removal?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from arangodb.