gregrahn / tpcds-kit
TPC-DS benchmark kit with some modifications/fixes
While creating a query from the template query1.tpl I am getting the following error:
substitution '_END' is used before being initialized at line 61.
Hi gregrahn,
Thank you for this repository 🥇
I have an error after this:
dsqgen \
  -DIRECTORY ../query_templates \
  -INPUT ../query_templates/templates.lst \
  -VERBOSE Y \
  -QUALIFY Y \
  -SCALE 10000 \
  -DIALECT netezza \
  -OUTPUT_DIR /tmp
I have in output:
qgen2 Query Generator (Version 2.6.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2017
Parsed 99 templates
Malloc Failed at 116 in permute.c
Have you ever had this error?
Thank you!
The queries are categorized into four classes, but only the per-class counts are given:
Reporting Class 41
Ad Hoc Class 59
Iterative Class 4
Data Mining Class 23
I did not find the class of each query documented anywhere. One hint is that store_sales queries are ad hoc and catalog_sales queries are reporting, but there is no other data point to classify the queries.
Dear all,
In "TPC BENCHMARK ™ DS Standard Specification Version 2.3.0", Page 49, there's Data
Maintenance Method 1 description --
Method 1: Fact Table Load
for every row v in view V corresponding to fact table F
get row v into local variable lv
for every type 1 business key column bkc in v
get row d from dimension table D corresponding to bkc
where the business keys of v and d are equal
update bkc of lv with surrogate key of d
end for
for every type 2 business key column bkc in v
get row d from dimension table D corresponding to bkc
where the business keys of v and d are equal and rec_end_date is NULL
update bkc of lv with surrogate key of d
end for
insert lv into F
end for
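The Method 1 loop above can be sketched in Python (a toy illustration with made-up table names and rows, not the kit's actual loader):

```python
# Toy sketch of Method 1 (fact table load): replace each business key in a
# view row with the surrogate key of the matching dimension row.
# All names and rows here are illustrative, not from the kit.

# Dimension table D: business key -> [(surrogate key, rec_end_date), ...]
dimension = {
    "ITEM001": [(1, "2000-01-01"),   # historical (type 2) version
                (2, None)],          # current version (rec_end_date is NULL)
}

def surrogate_key(bk, type2=False):
    rows = dimension[bk]
    if type2:
        # type 2: match only the current row, i.e. rec_end_date is NULL
        return next(sk for sk, end in rows if end is None)
    # type 1: only a single version exists, so any match will do
    return rows[-1][0]

fact = []
view_rows = [{"item_bk": "ITEM001", "qty": 5}]
for v in view_rows:
    lv = dict(v)                                        # get row v into lv
    lv["item_sk"] = surrogate_key(lv.pop("item_bk"), type2=True)
    fact.append(lv)                                     # insert lv into F

print(fact)  # [{'qty': 5, 'item_sk': 2}]
```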
But what is the bkc for the fact table or the view? As far as we know, only dimension tables have business keys. Checking the schema, every business key column is named with the suffix "_id", yet in the DM views I can't find any column named like "..._id". Even stranger, there are two kinds of business keys -- "type 1 business keys" and "type 2 business keys". I'm really confused by this. Would anyone kindly help me out?
Thanks in advance for your help.
Regards
Eisen
Thanks!
Sorry, this could be a stupid question. I would like to do some tests under different scale factors, but I just can't find a way to generate the corresponding query results like those in the answer_sets folder.
[root@master tools]# dsdgen --help
-bash: dsdgen: command not found
and
[root@master tools]# dsdgen --DISTRIBUTIONS tpcds.idx
-bash: dsdgen: command not found
In the item table, for all current records (where i_rec_end_date is NULL), each unique value of the i_brand_id column does not point to only one brand name. In other words, the brand name cannot be determined from the brand id alone. The same is the case with the class_id and class columns. Even if this is the way it was meant to be, isn't it semantically incorrect?
Another issue is that the hierarchy item -> brand -> class -> category does not exist in the generated data.
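The ambiguity described above can be surfaced with a query like the following (a sketch using SQLite with two illustrative rows, not actual generated data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table item (i_brand_id int, i_brand text, i_rec_end_date text)")
# Two current records sharing a brand id but with different brand names --
# invented rows that mimic the reported behavior.
con.executemany("insert into item values (?, ?, ?)",
                [(5004001, "edu packscholar #1", None),
                 (5004001, "amalgedu pack #2", None)])

# Brand ids whose current records map to more than one brand name.
ambiguous = con.execute("""
    select i_brand_id, count(distinct i_brand) as n_names
    from item
    where i_rec_end_date is null
    group by i_brand_id
    having count(distinct i_brand) > 1
""").fetchall()
print(ambiguous)  # [(5004001, 2)]
```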
Hi,
I tried to run queries 24 and 25 in SQL Server and I got no results. What could the problem be?
Thanks!
Gal
Line 90 has an erroneous line break causing the column alias to be on two lines. Seems like an obvious mistake.
select v1.item_sk, v1.d_date, v1.web_sales, v1.store_sales, max(v2.web_sales) web_cumulative, max(v2.store_sales) s
tore_cumulative
Thank you for helping so many people with this repository. It would have been impossible to get this to work using the source from the official site.
I have the data ready in the database. During query generation I did not find any option for MySQL as a dialect. I have tried ansi and sqlserver, but the queries generated with them are not valid MySQL queries.
Any help in this regard will be highly appreciated. Thank you.
The business question for query91 states:
Display total returns of catalog sales by call center and manager in a particular month for male customers of unknown education or female customers with advanced degrees with a specified buy potential and from a particular time zone.
However the SQL contains
and ( (cd_marital_status = 'M' and cd_education_status = 'Unknown')
or(cd_marital_status = 'W' and cd_education_status = 'Advanced Degree'))
For male/female customers the query should use cd_gender and not cd_marital_status like such:
and ( (cd_gender = 'M' and cd_education_status = 'Unknown')
or(cd_gender = 'F' and cd_education_status = 'Advanced Degree'))
There appears to be a bug in the data generation that only generates data for the 2nd of March when the year is a leap year, in this case year 2000. As a result, 1998, 1999, 2001, and 2002 have no data for the 2nd of March. This means 1823 of 1827 days have data for the given date range, 1998-01-02 to 2003-01-02.
Not a critical bug, but documenting it.
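The day counts reported above are easy to verify (the range and the 1823/1827 figures come from the report, not from running dsdgen):

```python
from datetime import date

# Default TPC-DS data range, per the report above.
start, end = date(1998, 1, 2), date(2003, 1, 2)
n_days = (end - start).days + 1   # inclusive of both endpoints
print(n_days)                     # 1827

# March 2nd falls in the range five times (1998-2002); per the report,
# only the leap year 2000 actually gets data, leaving 4 empty days.
print(n_days - 4)                 # 1823
```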
Hi, I am using TPC-DS and I have already loaded the data generated with dsdgen into the database. However, when I try to run tpcds_ri.sql to add the foreign keys, I get several violations of these constraints. Is there any recommendation to avoid this problem? Should I follow a different process to implement these foreign key constraints?
I am generating a 1 GB database.
Thanks
The SELECTCONE variable references a column customer_preferred_cust_flag, but there is no column alias matching that. The alias for c_preferred_cust_flag from customer needs to be added.
I have the following Nix expression to set up the TPC-DS toolkit. All is well -- the executables (dsdgen, dsqgen, distcomp, mkheader, checksum) are copied -- but when I run dsdgen I get this error: "Error: open of distributions failed: tpcds.idx: no such file or directory"
{ pkgs, ... }:
pkgs.stdenv.mkDerivation rec {
pname = "tpcds-kit";
version = "unstable-2019-06-14";
src = pkgs.fetchFromGitHub {
owner = "gregrahn";
repo = pname;
rev = "9d01e73403c32d8e3d89a987d096b98cbfae3c62";
sha256 = "removed";
};
preBuild = "cd tools";
buildInputs = with pkgs; [ bison byacc (pkgs.writeScriptBin "lex" "exec ${flex}/bin/flex $@") ];
hardeningDisable = [ "all" ];
installPhase = ''
mkdir -p $out/bin
cp dsdgen dsqgen distcomp mkheader checksum tpcds $out/bin;
'';
}
It seems that tpcds.idx (since it's not an executable, maybe?) is not copied into that folder, but the 5 executables are. If I cd into the folder I can't see it; dsdgen can't see it either and returns the error above. I can confirm that in my local install of tpcds-kit the tpcds.idx and tpcds.idx.h files are there and dsdgen runs just fine. I am trying to set it up on a Google Cloud instance to work the same way. I tried adding tpcds to the cp command, but it seems to do nothing.
Is there something obvious I am missing? Thank you in advance.
v2.5.0 introduced query10a but, as-is, it does not run. Each branch of the union all contains two where clauses -- the second one needs to be an and.
exists (select *
from (
select ws_bill_customer_sk as customer_sk, d_year,d_moy
from web_sales, date_dim where ws_sold_date_sk = d_date_sk
where d_year = [YEAR] and
d_moy between [MONTH] and [MONTH]+3
union all
select cs_ship_customer_sk as customer_sk, d_year, d_moy
from catalog_sales, date_dim where cs_sold_date_sk = d_date_sk
where d_year = [YEAR] and
d_moy between [MONTH] and [MONTH]+3
) x
where c.c_customer_sk = customer_sk)
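A mechanical fix is to turn the second where of each branch into and, for example with a small rewrite over the generated SQL (illustrative snippet with substitution parameters filled in by hand):

```python
import re

# One broken branch of query10a, with [YEAR]/[MONTH] substituted by hand.
broken = (
    "from web_sales, date_dim where ws_sold_date_sk = d_date_sk\n"
    "where d_year = 2000 and\n"
    "      d_moy between 1 and 1+3\n"
)

# Replace a 'where' that immediately follows another where-clause line
# with 'and' -- the change each union branch of query10a needs.
fixed = re.sub(r"(where [^\n]*\n\s*)where\b", r"\1and", broken)
print(fixed)
```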
query72.tpl contains two repeated predicates:
where d1.d_week_seq = d2.d_week_seq
and inv_quantity_on_hand < cs_quantity
and d3.d_date > d1.d_date + 5
and hd_buy_potential = '[BP]'
and d1.d_year = [YEAR]
and hd_buy_potential = '[BP]' -- repeated predicate
and cd_marital_status = '[MS]'
and d1.d_year = [YEAR] -- repeated predicate
A change introduced in v2.9 modified query77a.tpl to fix the incorrect inline view cr (lines 75-84), but it is missing a comma after the first column (line 76).
The section below has week1/week2 ratios, but the Tuesday column is tue_sales1/tue_sales1 and should likely be tue_sales1/tue_sales2:
[_LIMITA] select [_LIMITB] s_store_name1,s_store_id1,d_week_seq1
,sun_sales1/sun_sales2,mon_sales1/mon_sales2
,tue_sales1/tue_sales1,wed_sales1/wed_sales2,thu_sales1/thu_sales2
,fri_sales1/fri_sales2,sat_sales1/sat_sales2
The definition of customer in tpcds.sql lists the last column as
c_last_review_date char(10)
however the TPC-DS spec v2.4 lists the last column as
c_last_review_date_sk identifier
with an FK to d_date_sk. The DDL looks incorrect. This column change also impacts query30.tpl.
When I attempt to generate TPC-DS data at scale factor < 1, I get the following:
[arunm2@vm-arunm2 tools]$ ./dsdgen -DIR ~/tpcds -SCALE 0.01
dsdgen Population Generator (Version 2.10.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
Segmentation fault (core dumped)
Small scale factors are extremely useful to test functionality.
I used dsdgen without any options. I don't know if the data was created, or in which database. My OS is macOS.
Hi,
i need help to run dsqgen
./dsqgen -input /TPC_DS/tpc-ds_gen/query_templates/templates.lst -directory /host/Users/TPC_DS/tpc-ds_gen/query_templates -dialect oracle -scale 1
the output is :
ERROR: Substitution'STATE' is used before being initialized at line 57 in /TPC_DS/tpc-ds_gen/query_templates/query1.tpl
and is almost the same when running with -template query1.tpl or any other query
best,
rim
At some point the dialect definitions lost the following two lines, and dsqgen will error if they are not present.
define _BEGIN = "-- start query " + [_QUERY] + " in stream " + [_STREAM] + " using template " + [_TEMPLATE];
define _END = "-- end query " + [_QUERY] + " in stream " + [_STREAM] + " using template " + [_TEMPLATE];
It seems that with the recent changes the FILTER flag for dsdgen no longer outputs to STDOUT. __FILTER is still listed, but data is output to the local dir.
There is a bunch of date arithmetic in the templates that uses invalid Oracle syntax, e.g.:
query5.tpl: and (cast('[SALES_DATE]' as date) + 14 days)
query5.tpl: and (cast('[SALES_DATE]' as date) + 14 days)
query5.tpl: and (cast('[SALES_DATE]' as date) + 14 days)
Queries fail with:
query 7 in stream 0 using template query32.tpl
(cast('1999-02-22' as date) + 90 days)
*
ERROR at line 10:
ORA-00907: missing right parenthesis
The correct syntax is (date) + 90 or (date) + interval '90' day. Does it make sense to adjust the templates with something like + 90 &DAYSLITERAL?
> grep -HP '\+\s+\d+\s+days' query*.tpl
query12.tpl: and (cast('[SDATE]' as date) + 30 days)
query16.tpl: (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
query20.tpl: and (cast('[SDATE]' as date) + 30 days)
query21.tpl: and (cast ('[SALES_DATE]' as date) + 30 days)
query32.tpl: (cast('[CSDATE]' as date) + 90 days)
query32.tpl: (cast('[CSDATE]' as date) + 90 days)
query37.tpl: and d_date between cast('[INVDATE]' as date) and (cast('[INVDATE]' as date) + 60 days)
query40.tpl: and (cast ('[SALES_DATE]' as date) + 30 days)
query5.tpl: and (cast('[SALES_DATE]' as date) + 14 days)
query5.tpl: and (cast('[SALES_DATE]' as date) + 14 days)
query5.tpl: and (cast('[SALES_DATE]' as date) + 14 days)
query77.tpl: and (cast('[SALES_DATE]' as date) + 30 days)
query77.tpl: and (cast('[SALES_DATE]' as date) + 30 days)
query77.tpl: and (cast('[SALES_DATE]' as date) + 30 days)
query77.tpl: and (cast('[SALES_DATE]' as date) + 30 days)
query77.tpl: and (cast('[SALES_DATE]' as date) + 30 days)
query77.tpl: and (cast('[SALES_DATE]' as date) + 30 days)
query80.tpl: and (cast('[SALES_DATE]' as date) + 30 days)
query80.tpl: and (cast('[SALES_DATE]' as date) + 30 days)
query80.tpl: and (cast('[SALES_DATE]' as date) + 30 days)
query82.tpl: and d_date between cast('[INVDATE]' as date) and (cast('[INVDATE]' as date) + 60 days)
query92.tpl: (cast('[WSDATE]' as date) + 90 days)
query92.tpl: (cast('[WSDATE]' as date) + 90 days)
query94.tpl: (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
query95.tpl: (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
query98.tpl: and (cast('[SDATE]' as date) + 30 days)
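Until the templates carry a dialect-specific literal, one workaround is to post-process dsqgen's output, rewriting the `+ N days` form into Oracle's interval syntax (a sketch; run it over the generated queries, not the templates):

```python
import re

def to_oracle(sql: str) -> str:
    # "(cast('1999-02-22' as date) + 90 days)" ->
    # "(cast('1999-02-22' as date) + interval '90' day)"
    return re.sub(r"\+\s*(\d+)\s+days\b", r"+ interval '\1' day", sql)

q = "and (cast('1999-02-22' as date) + 90 days)"
print(to_oracle(q))  # and (cast('1999-02-22' as date) + interval '90' day)
```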
Both iterations (queries) in query24.tpl seem to be missing the PK/FK join condition between customer and customer_address, which correlates to "customers who currently live in their birth countries and in the neighborhood of the store" (snippet of the business question from the specification).
as of v2.3.0
from store_sales
,store_returns
,store
,item
,customer
,customer_address
where ss_ticket_number = sr_ticket_number
and ss_item_sk = sr_item_sk
and ss_customer_sk = c_customer_sk
and ss_item_sk = i_item_sk
and ss_store_sk = s_store_sk
and c_current_addr_sk = ca_address_sk -- this join is missing in query24.tpl
and c_birth_country = upper(ca_country)
and s_zip = ca_zip
and s_market_id=[MARKET]
When aliasing a column in a SELECT query, I believe the keyword AS is standard SQL (e.g., select sum(x) AS s). Though some systems allow omitting it, others don't, such as MonetDB. The current template files don't use AS when aliasing a column. I think it would be desirable to add it.
Edit: I just found that AS is missing only in some queries, e.g., query3.
Looks like a recent change broke this:
[ec2@ip-111 tools]$ make
yacc -d -v qgen.y
gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DYYDEBUG -DLINUX -g -Wall -c -o expr.o expr.c
expr.c:49:19: fatal error: y.tab.h: No such file or directory
#include "y.tab.h"
^
compilation terminated.
make: *** [expr.o] Error 1
[ec2@ip-111 tools]$
(I have the instance around, let me know if you want me to test something out)
query77 and its variant have a slight error.
https://github.com/gregrahn/tpcds-kit/blob/master/query_templates/query77.tpl#L135
https://github.com/gregrahn/tpcds-kit/blob/master/query_variants/query77a.tpl#L136
To fix, replace
, coalesce(returns, 0) returns
with
, coalesce(returns, 0) as returns
PS: I am unauthorized to push the fix.
./dsqgen -input ../query_templates/templates.lst -scale 8 -dialect oracle -directory ../query_templates
ERROR: Substitution'_END' is used before being initialized at line 64 in ../query_templates/query1.tpl
Do I have to add define _END = ""; to every tpl? There are so many problems in generating the data and queries, and the docs are not even updated.
Some systems default to \N being the NULL character in flat files, versus a missing field. Adding an option for this to dsdgen would be a nice solution.
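A small formatting shim illustrates the idea (hypothetical behavior, not an existing dsdgen flag):

```python
# Emit \N for SQL NULL instead of an empty field, the default NULL
# representation expected by loaders such as PostgreSQL's COPY.
def to_flat(row, sep="|"):
    return sep.join(r"\N" if v is None else str(v) for v in row)

print(to_flat([1, None, "AAAAAAAABAAAAAAA"]))  # 1|\N|AAAAAAAABAAAAAAA
```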
As of v2.4, query77a is missing cr_call_center_sk in the select list and group by, making it semantically different.
query77a
cr as
(select
sum(cr_return_amount) as returns,
sum(cr_net_loss) as profit_loss
from catalog_returns,
date_dim
where cr_returned_date_sk = d_date_sk
and d_date between cast('[SALES_DATE]' as date)
and (cast('[SALES_DATE]' as date) + 30 )
),
query77
cr as
(select cr_call_center_sk,
sum(cr_return_amount) as returns,
sum(cr_net_loss) as profit_loss
from catalog_returns,
date_dim
where cr_returned_date_sk = d_date_sk
and d_date between cast('[SALES_DATE]' as date)
and (cast('[SALES_DATE]' as date) + 30 days)
group by cr_call_center_sk
),
When using dsdgen with -ABREVIATION 1k (which, admittedly, I don't know is the intended usage), it seems to seg fault.
Hello,
Is there any way to generate queries to use on MySQL? I've been trying to use the netezza template, but ERROR 1064 pops up on almost every single query. I'm just starting to use it, so I'm a rookie. Any help? I can't use another DBMS since it's a school project.
Best regards
When I run make, I get an error.
[root@master tools]# make
gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DYYDEBUG -DLINUX -g -Wall -o dsqgen address.o date.o decimal.o dist.o error_msg.o expr.o eval.o genrand.o grammar_support.o keywords.o list.o nulls.o permute.o print.o QgenMain.o query_handler.o r_params.o scaling.o StringBuffer.o substitution.o tdefs.o text.o tokenizer.o w_inventory.o y.tab.o release.o scd.o build_support.o parallel.o -lm
grammar_support.o: In function `include_file':
/opt/tpcds-kit-master/tools/grammar_support.c:128: undefined reference to `yy_create_buffer'
/opt/tpcds-kit-master/tools/grammar_support.c:129: undefined reference to `yy_switch_to_buffer'
y.tab.o: In function `yyparse':
/opt/tpcds-kit-master/tools/y.tab.c:551: undefined reference to `yylex'
/opt/tpcds-kit-master/tools/y.tab.c:1129: undefined reference to `yylex'
collect2: ld returned 1 exit status
make: *** [dsqgen] Error 1
I need help to solve this issue. Thanks.
Hello, I am on Windows trying to generate data with dsdgen.exe.
Command I've tried: dsdgen SCALE 1 DIR c:\tmp
But it doesn't create the data in the specified directory; instead it just writes to the local tools directory. My question is: what did I do wrong? I checked the help section.
[error] at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215)
[error] at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:197)
[error] at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
[error] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[error] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[error] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
[error] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[error] at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
[error] at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
[error] at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
[error] at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
[error] at org.apache.spark.scheduler.Task.run(Task.scala:127)
[error] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
[error] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[error] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
[error] Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
[error] at org.apache.spark.sql.catalyst.expressions.GenericRow.get(rows.scala:174)
[error] at org.apache.spark.sql.Row.isNullAt(Row.scala:211)
[error] at org.apache.spark.sql.Row.isNullAt$(Row.scala:211)
[error] at org.apache.spark.sql.catalyst.expressions.GenericRow.isNullAt(rows.scala:166)
[error] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source)
[error] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
[error] at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:211)
[error] ... 17 more
[error] 21/08/19 16:16:28 ERROR Executor: Exception in task 5.0 in stage 0.0 (TID 5)
[error] java.lang.RuntimeException: Error while encoding: java.lang.ArrayIndexOutOfBoundsException: 0
...
Dear All,
I just downloaded the most recent tpcds-kit-master. I first tested with --
/asiq160/rs/tpc-ds/tpcds-kit-master/tools/dsqgen -template query99.tpl -dir /asiq160/rs/tpc-ds/tpcds-kit-master/query_templates -dialect sybase
It works fine and generates a valid SQL file. But when I ran --
/asiq160/rs/tpc-ds/tpcds-kit-master/tools/dsqgen -template query98.tpl -dir /asiq160/rs/tpc-ds/tpcds-kit-master/query_templates -dialect sybase
It raised this error --
qgen2 Query Generator (Version 2.3.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2016
Error: open of distributions failed: tpcds.idx: No such file or directory
I don't know if it's because query98.tpl is not in the list... And the batch statement --
/asiq160/rs/tpc-ds/tpcds-kit-master/tools/dsqgen -input /asiq160/rs/tpc-ds/tpcds-kit-master/query_templates/templates.lst -scale 8 -dialect sybase -directory /asiq160/rs/tpc-ds/tpcds-kit-master/query_templates
Also raised the same error --
asiq160@bigDataIQ:/iq_data/dump/query> /asiq160/rs/tpc-ds/tpcds-kit-master/tools/dsqgen -input /asiq160/rs/tpc-ds/tpcds-kit-master/query_templates/templates.lst -scale 8 -dialect sybase -directory /asiq160/rs/tpc-ds/tpcds-kit-master/query_templates
qgen2 Query Generator (Version 2.3.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2016
Error: open of distributions failed: tpcds.idx: No such file or directory
So maybe the tpcds.idx does not contain all 99 templates? Please kindly help. Thanks in advance for your help.
Regards
Eisen
spelling error in dsdgen
Hi gregrahn,
Thank you for this repository
I have an error trying to run the tests on a MySQL server on my machine. I suppose it has to do with the fact that the WITH operator isn't supported in MySQL. Can you help me out in trying to run these tests on a MySQL server?
Any help is appreciated.
Thanks,
Bhargav
As of v2.4, the results CTE has a group by that appears to be a mistake. There is no aggregate, and when the group by is removed, query22a returns the same result as query22.
ERROR: column "inventory.inv_quantity_on_hand" must appear in the GROUP BY clause or be used in an aggregate function
LINE 6: ,inv_quantity_on_hand qoh
^
Additionally, query22a has a reference to the warehouse table and a corresponding join -- which is not present in query22.
from inventory
,date_dim
,item
,warehouse -- <<< Not in query22
where inv_date_sk=d_date_sk
and inv_item_sk=i_item_sk
and inv_warehouse_sk = w_warehouse_sk -- <<< Not in query22
and d_month_seq between [DMS] and [DMS] + 11
There is an answer_sets folder, but it does not match queries generated normally at scale factor 1. We need to change the substitution parameters (section 11 / appendix B of the TPCDS.1.3.1.pdf doc), after which the results match. I could not find any automated way to do this.
Is there an option in dsqgen to generate validation queries using the correct substitution parameters for scale factor 1, so that validation can be done?
With the default setting, data is generated from 1998-01-02 to 2003-01-02.
Is it possible to generate data for a different range of dates ?
for instance 2015-01-01 to 2020-01-01..
Minor, but the syntax is 'dsdgen -help', not '--help'.
Errors related to two references to columns that do not exist and three duplicate name collisions.
psql:tpcds-kit/tools/tpcds_ri.sql:39: ERROR: column "cp_promo_id" referenced in foreign key constraint does not exist
psql:tpcds-kit/tools/tpcds_ri.sql:50: ERROR: constraint "cr_i" for relation "catalog_returns" already exists
psql:tpcds-kit/tools/tpcds_ri.sql:55: ERROR: column "cr_ship_date_sk" referenced in foreign key constraint does not exist
psql:tpcds-kit/tools/tpcds_ri.sql:119: ERROR: constraint "wr_ret_cd" for relation "web_returns" already exists
psql:tpcds-kit/tools/tpcds_ri.sql:124: ERROR: constraint "ws_b_cd" for relation "web_sales" already exists
Dear all,
While preparing the refresh data, I created the source tables with tpcds_source.sql. In this SQL file there is a table named s_web_returns. The statement is --
create table s_web_returns
(
wret_web_site_id char(16) ,
wret_order_id integer not null,
wret_line_number integer not null,
wret_item_id char(16) not null,
wret_return_customer_id char(16) ,
wret_refund_customer_id char(16) ,
wret_return_date char(10) ,
wret_return_time char(10) ,
wret_return_qty integer ,
wret_return_amt numeric(7,2) ,
wret_return_tax numeric(7,2) ,
wret_return_fee numeric(7,2) ,
wret_return_ship_cost numeric(7,2) ,
wret_refunded_cash numeric(7,2) ,
wret_reversed_charge numeric(7,2) ,
wret_account_credit numeric(7,2) ,
wret_reason_id char(16)
);
But while creating the fact view wrv on it, we get an error -- Column 'wret_web_page_id' not found. Checking wrv's statement --
CREATE VIEW wrv AS
SELECT
d_date_sk wr_return_date_sk
,t_time_sk wr_return_time_sk
,i_item_sk wr_item_sk
,c1.c_customer_sk wr_refunded_customer_sk
,c1.c_current_cdemo_sk wr_refunded_cdemo_sk
,c1.c_current_hdemo_sk wr_refunded_hdemo_sk
,c1.c_current_addr_sk wr_refunded_addr_sk
,c2.c_customer_sk wr_returning_customer_sk
,c2.c_current_cdemo_sk wr_returning_cdemo_sk
,c2.c_current_hdemo_sk wr_returning_hdemo_sk
,c2.c_current_addr_sk wr_returing_addr_sk
,wp_web_page_sk wr_web_page_sk
,r_reason_sk wr_reason_sk
,wret_order_id wr_order_number
,wret_return_qty wr_return_quantity
,wret_return_amt wr_return_amt
,wret_return_tax wr_return_tax
,wret_return_amt + wret_return_tax AS wr_return_amt_inc_tax
,wret_return_fee wr_fee
,wret_return_ship_cost wr_return_ship_cost
,wret_refunded_cash wr_refunded_cash
,wret_reversed_charge wr_reversed_charge
,wret_account_credit wr_account_credit
,wret_return_amt+wret_return_tax+wret_return_fee
-wret_refunded_cash-wret_reversed_charge-wret_account_credit wr_net_loss
FROM s_web_returns LEFT OUTER JOIN date_dim ON (cast(wret_return_date as date) = d_date)
LEFT OUTER JOIN time_dim ON ((CAST(SUBSTR(wret_return_time,1,2) AS integer)*3600
+CAST(SUBSTR(wret_return_time,4,2) AS integer)*60+CAST(SUBSTR(wret_return_time,7,2) AS integer))=t_time)
LEFT OUTER JOIN item ON (wret_item_id = i_item_id)
LEFT OUTER JOIN customer c1 ON (wret_return_customer_id = c1.c_customer_id)
LEFT OUTER JOIN customer c2 ON (wret_refund_customer_id = c2.c_customer_id)
LEFT OUTER JOIN reason ON (wret_reason_id = r_reason_id)
LEFT OUTER JOIN web_page ON (wret_web_page_id = WP_WEB_PAGE_id)
WHERE i_rec_end_date IS NULL AND wp_rec_end_date IS NULL;
But there is no wret_web_page_id column defined in s_web_returns. So would you please have a look, and also modify the refresh data generator to add this column?
Thanks in advance for your help.
Regards
Eisen
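Until the generator is fixed, a workaround is to add the missing column so the wrv view compiles (sketch in SQLite with the other s_web_returns columns omitted; the column will simply be NULL for every row):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Minimal stand-in for s_web_returns; the real table has many more columns.
con.execute("create table s_web_returns (wret_order_id integer)")
# Workaround: add the column the wrv view expects but dsdgen never emits.
con.execute("alter table s_web_returns add column wret_web_page_id char(16)")
cols = [row[1] for row in con.execute("pragma table_info(s_web_returns)")]
print(cols)  # ['wret_order_id', 'wret_web_page_id']
```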
Hi gregrahn,
I am getting this error. Any idea how can I fix it?
dsdgen Population Generator (Version 2.10.0)
dsdgen Population Generator (Version 2.10.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
dsdgen Population Generator (Version 2.10.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
Warning: Selected scale factor is NOT valid for result publication
Warning: Selected scale factor is NOT valid for result publication
dsdgen Population Generator (Version 2.10.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
Warning: Selected scale factor is NOT valid for result publication
dsdgen Population Generator (Version 2.10.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
Warning: Selected scale factor is NOT valid for result publication
dsdgen Population Generator (Version 2.10.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
Warning: Selected scale factor is NOT valid for result publication
[ec2-user@ip-10-0-0-131 hands-on-with-redshift]$ Warning: Selected scale factor is NOT valid for result publication
dsdgen Population Generator (Version 2.10.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
dsdgen Population Generator (Version 2.10.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
Warning: Selected scale factor is NOT valid for result publication
Warning: Selected scale factor is NOT valid for result publication
ERROR: Failed to open output file!
File: print.c
Line: 490
(the same three-line error is repeated eight times in the output)