sqllogictest-rs's Issues

Abort the entire test file

Right now, we put all of the tpch test cases (q1.slt.part, q3.slt.part, ...) in one test file (tpch.slt) because we want them to share the common tpch tables.

However, a failure in q1.slt.part aborts the whole test file. But q3 has nothing to do with q1, and checking its results (even if they are wrong) would save an extra run.

Should we abort at a finer granularity, i.e. at the slt.part level? Or should we introduce some primitives for specifying dependencies between test cases?

show how long each statement takes

Sometimes we want to know how long a specific statement takes. The sqllogictest CLI could support ways to visualize this.

One proposal is to add +timing to statement ok. e.g.,

statement ok +timing +label="flush at the first time"

Then we will see:

flush at the first time ... 1.0s

We can also simply print time for every statement.
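As a rough illustration of the reporting side of this proposal, the sketch below times a closure with `std::time::Instant` and formats a line in the `label ... 1.0s` shape shown above. The names `run_timed` and `format_timing` are hypothetical and not part of sqllogictest-rs; how `+timing` would be parsed is left open.

```rust
use std::time::{Duration, Instant};

/// Runs `execute` once and returns its result together with the elapsed wall-clock time.
/// Hypothetical helper, not an existing sqllogictest-rs API.
fn run_timed<T>(execute: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = execute();
    (result, start.elapsed())
}

/// Formats one report line in the `label ... 1.0s` shape from the proposal.
fn format_timing(label: &str, elapsed: Duration) -> String {
    format!("{} ... {:.1}s", label, elapsed.as_secs_f64())
}
```

A runner could wrap each statement execution in `run_timed` and print `format_timing` either for every statement or only for those carrying `+timing`.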

Proposal: support syntax "SET DEFAULT [GLOBAL] $SortMode"

In a distributed environment, the order of a query's result set is nondeterministic, and SQL does not guarantee any order without "ORDER BY" either.

To simplify test scripts, we can support syntax like SET DEFAULT [GLOBAL] $SortMode to set the default sort mode for query statements in the current file or in all files, instead of specifying the sort mode after each query statement.
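One way to picture the fallback order is the sketch below: a mode written on the query wins, then the file default, then the global default. The three modes are the ones from the original sqllogictest format; the function names and the exact precedence are assumptions made for illustration.

```rust
/// Sort modes from the original sqllogictest format.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SortMode {
    NoSort,
    RowSort,
    ValueSort,
}

/// Picks the mode written on the query itself, else the file default, else the global one.
/// The precedence is an assumption, not something the proposal spells out.
fn effective_sort_mode(
    per_query: Option<SortMode>,
    file_default: Option<SortMode>,
    global_default: SortMode,
) -> SortMode {
    per_query.or(file_default).unwrap_or(global_default)
}

/// Applies a sort mode to result rows and returns the flattened values to compare.
fn normalize(mut rows: Vec<Vec<String>>, mode: SortMode) -> Vec<String> {
    match mode {
        SortMode::NoSort => rows.into_iter().flatten().collect(),
        SortMode::RowSort => {
            // Sort whole rows, keeping the values of each row together.
            rows.sort();
            rows.into_iter().flatten().collect()
        }
        SortMode::ValueSort => {
            // Sort individual values, ignoring row boundaries.
            let mut values: Vec<String> = rows.into_iter().flatten().collect();
            values.sort();
            values
        }
    }
}
```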

support `defer` statement

We may support the defer statement to do some clean-ups. There are several cases:

  • We've created too many materialized views, and the dependency becomes too complex to resolve and write the drop statements correctly. With defer, that'll be natural.
  • We may create tables with the same name in multiple tests. If one fails, others may also fail because the table was not dropped: we may also allow defer to clean up on failures.
  • We may not want control mode or session variable settings to have side effects on other parts of the session.
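A minimal sketch of the bookkeeping behind such a statement, assuming deferred statements run in reverse registration order (as in Go's `defer`), so that dependent views are dropped before the tables they read from. `DeferStack` and its methods are hypothetical names.

```rust
/// Collects clean-up statements and hands them back in reverse registration order.
/// Hypothetical type, not part of sqllogictest-rs.
#[derive(Default)]
struct DeferStack {
    statements: Vec<String>,
}

impl DeferStack {
    /// Registers a clean-up statement, as a hypothetical `defer` record would.
    fn defer(&mut self, sql: &str) {
        self.statements.push(sql.to_string());
    }

    /// Empties the stack and returns the statements, last registered first.
    fn drain_in_order(&mut self) -> Vec<String> {
        let mut drained = std::mem::take(&mut self.statements);
        drained.reverse();
        drained
    }
}
```

A runner would call `drain_in_order` at the end of a file, and possibly also when a record fails, and execute the returned statements one by one.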

feature: expose execution state to user

Currently, if a tester fails, we cannot tell from the console which test failed; users need to manually export RUST_LOG=info to see the result.

We should have some interface to let the user know which statement is being executed. I propose an interface like:

for next_statement in tester.run_script(&script) {
   println!("running {:?}", next_statement);
}

When next is called on the iterator, next_statement is run inside sqllogictest.

We can also go a step further to expose all our internal states:

for (statement, verifier) in tester.parse_script(&script) {
   let result = db.run(statement).await;
   verifier.verify(result.to_string()).unwrap();
}

CLI: support multiple files as input

so that it can be used in a pipeline fashion, or with multiple globs (expanded by the shell).

Currently it accepts a single glob string argument, so the glob also needs to be quoted.

e.g.,

โฏ sqllogictest ./**/*.slt  
error: Found argument './examples/condition/condition.slt' which wasn't expected, or isn't valid in this context

Add test for sqllogictest

As a testing framework, we need to ensure our own correctness. Currently we can be relatively confident that previously passing tests still pass after upgrading sqllogictest, by testing against real usage in e.g., risingwave. But it would definitely be better to add a test suite in our repo. #123

A more dangerous problem, which is also harder to detect, is that previously failing tests might start to pass. Imagine that after some changes, we let all test cases pass. 😇 We need to test errors.

Here's an example of an error case that won't be run:

# statement is expected to fail with error: "Hey we", but got error: "Hey you got FakeDBError!"
# statement error Hey we
# give me an error

proposal: sqllogictest custom extension

I'm thinking of using +label after each statement ok and query to do some custom extensions over the original sqllogictest syntax. An example here:

https://github.com/cmu-db/bustub/blob/85477ace4eb3ff6531ccfb075dbc283ff99dbdf1/test/sql/p3.14-topn.slt#L354-L359

What we can do:

  • +explain print the query plan
  • +repeat:10 repeat for 10 times
  • +ensure:plan_node ensure there's some plan node in the plan
  • +session:name run this query in a given session, e.g., to test the behavior of txn

Support checking error message

Currently, if a query is expected to return an error, we have to use the statement error command, which does not support checking the error message or error number.

Here I propose adding support for statement error <regex> and query error <regex> syntax, like cockroach did.

For example:

statement error syntax error
CREAT TABLE t(a INT);
# Error: Parser Error: syntax error at or near "CREAT"

query error Overflow
SELECT 2147483647 + 1;
# Error: Out of Range Error: Overflow in addition of INT32 (2147483647 + 1)!
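The check itself can be sketched as below. To stay within the standard library, this sketch uses plain substring matching as a stand-in for the regex match in the proposal; a real implementation would presumably compile the pattern with a regex library instead. `ErrorCheck` and `check_expected_error` are hypothetical names.

```rust
/// Outcome of checking a statement that is expected to fail.
#[derive(Debug, PartialEq)]
enum ErrorCheck {
    Matched,
    UnexpectedSuccess,
    WrongMessage { expected: String, actual: String },
}

/// Checks a statement result against the expected error text.
/// Assumption: substring matching stands in for the regex match in the proposal.
fn check_expected_error(expected: &str, result: Result<(), String>) -> ErrorCheck {
    match result {
        // The statement succeeded although an error was expected.
        Ok(()) => ErrorCheck::UnexpectedSuccess,
        // The statement failed and the message contains the expected text.
        Err(actual) if actual.contains(expected) => ErrorCheck::Matched,
        // The statement failed, but with a different message.
        Err(actual) => ErrorCheck::WrongMessage {
            expected: expected.to_string(),
            actual,
        },
    }
}
```

Keeping `WrongMessage` separate from `UnexpectedSuccess` lets the runner report both the expected and the actual message on a mismatch.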

Document how to handle separators

Just checked how sqllogictest variants handle separators in the results:

  • sqlite: each row is one value instead of one row, so spaces are compared, but result cannot contain newline
  • duckdb: tab separated, so spaces are compared, but result cannot contain newline and tab
  • cockroach: compare "normalized rows", so whitespace characters are squashed. e.g., the following test will pass
query ITI
SELECT 1, E'a \t\n b', 2
----
1 a      b 2
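The "normalized rows" comparison described for cockroach can be approximated with the sketch below, which squashes every run of whitespace into a single space before comparing. This is an illustration of the idea, not cockroach's actual implementation; the function names are made up.

```rust
/// Squashes every run of whitespace (spaces, tabs, newlines) into a single space
/// and drops leading and trailing whitespace.
fn normalize_row(row: &str) -> String {
    row.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Compares an actual row to an expected row after normalization.
fn rows_match(actual: &str, expected: &str) -> bool {
    normalize_row(actual) == normalize_row(expected)
}
```

The trade-off is visible here: values that differ only in whitespace become indistinguishable, which is what lets the test above pass.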

Documentation needs to be improved

The README.md file just provides a one-line description of the repository, which is pretty unclear to the audience. I would suggest improving the documentation. It would be great if we could include the following information:

  • better description of this repository
  • how to install the program, e.g., any dependencies, etc
  • how to use the program
  • others

Support optional error string

statement error string::interval: context-dependent operators are not allowed in computed column\nHINT: STRING to INTERVAL casts depend on session IntervalStyle; use parse_interval\(string\) instead
CREATE TABLE invalid_table (
  invalid_col interval AS ('1 hour'::string::interval) STORED
)

Statement errors are also part of testing. In some circumstances, we expect the system to produce specific errors.

Support extended query mode

For now, sqllogictest only supports simple query mode. To introduce e2e tests of extended query mode in risingwave, we should add extended query support to sqllogictest, with an option '--extend' to choose between 'simple query' and 'extended query'.

Add a hook `on_failure`

@BugenZhao raised the idea of printing all session variables after a failure. Maybe we can support it by adding an on_failure hook to the runner (and a corresponding option for the bin).
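A toy version of such a hook might look like the sketch below. The `Runner` type here is a stand-in written for this example and is unrelated to the real runner in sqllogictest-rs; the hook signature (SQL text plus error message) is an assumption.

```rust
/// A toy runner with an optional failure hook. All names are hypothetical.
struct Runner {
    on_failure: Option<Box<dyn FnMut(&str, &str)>>,
}

impl Runner {
    fn new() -> Self {
        Runner { on_failure: None }
    }

    /// Registers a hook that receives the failing SQL and the error message.
    fn with_on_failure(mut self, hook: impl FnMut(&str, &str) + 'static) -> Self {
        self.on_failure = Some(Box::new(hook));
        self
    }

    /// Executes one statement through `execute` and calls the hook if it fails.
    fn run(
        &mut self,
        sql: &str,
        execute: impl FnOnce(&str) -> Result<(), String>,
    ) -> Result<(), String> {
        let outcome = execute(sql);
        if let Err(message) = &outcome {
            if let Some(hook) = self.on_failure.as_mut() {
                hook(sql, message);
            }
        }
        outcome
    }
}
```

In the use case from this issue, the hook body would query and print the session variables.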

Fail to get output from DML with `RETURNING`

As titled, the following code

query I
insert into t values (2+2) returning *;
----
4

will produce the following error:

query result mismatch:
[SQL] insert into t values (2+2) returning *;
[Diff] (-excepted|+actual)
-   4

A finally-statement-like code block to clean up things

Sometimes test files are maintained by different people, who may use the same name for a table or view. This is reasonable, as we may view these test files as independent units.

When a test case in one test file fails, sqllogictest proceeds to execute the next test file. If the tables and views are not dropped, this can cause "table exists" failures in the following tests.

Therefore, this issue proposes a finally-statement-like mechanism to clean things up. Statements in the finally code block would always be executed last.

Is this mechanism worth implementing? It does not necessarily have to be designed and implemented this way, though.

support hash-threshold

    }else if( strcmp(sScript.azToken[0],"hash-threshold")==0 ){
      /* Set the maximum number of result values that will be accepted
      ** for a query.  If the number of result values exceeds this number,
      ** then an MD5 hash is computed of all values, and the resulting hash
      ** is the only result.
      **
      ** If the threshold is 0, then hashing is never used.
      **
      ** If a threshold was specified on the command line, ignore 
      ** any specifed in the script.
      */
query IIII
SELECT i, j, k, l FROM integers FULL OUTER JOIN integers2 ON integers.j+1<>integers2.l ORDER BY 1, 2, 3, 4
----
16000 values hashing to 8b9eab043624ff470b00a981c1d588d9
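The threshold rule in the quoted comment can be sketched as follows. The digest function is passed in by the caller because MD5 is not in Rust's standard library; feeding each value to the hash followed by a newline is an assumption about the original tool, and `render_result` is a hypothetical name.

```rust
/// Renders query values as the quoted comment describes: verbatim up to the
/// threshold, or as a single "N values hashing to <digest>" line above it.
/// `digest` is supplied by the caller; the original tool uses MD5.
fn render_result(
    values: &[String],
    threshold: usize,
    digest: impl Fn(&str) -> String,
) -> Vec<String> {
    // A threshold of 0 means hashing is never used.
    if threshold > 0 && values.len() > threshold {
        // Assumption: each value is hashed followed by a newline.
        let mut joined = String::new();
        for value in values {
            joined.push_str(value);
            joined.push('\n');
        }
        vec![format!("{} values hashing to {}", values.len(), digest(&joined))]
    } else {
        values.to_vec()
    }
}
```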

Use per file session for serial mode

If we set a session variable in a.slt, it will still be set in b.slt. I think this is very error-prone, although we can clean up session variables as we do for tables.

The include problem is related but different: currently include's semantics is just copy-paste. Isolation for include subtests is another problem, as in #55, #81. This issue says different slt files should have isolated sessions.

If so, would serial mode be the same as -j1? No, -j1 will have isolated DBs. Serial mode isolates sessions but does not isolate DBs, so cleaning up tables is still needed.

RFC: Supporting `include` statement.

Motivation

When writing tests for complex queries, we need to prepare data before executing queries. The most general approach is to execute insert into statements. However, mixing a lot of insert statements with queries in one file may lead to a large file which is difficult to maintain and understand.

Proposal

A good practice is to split these statements into small files and use include statements to merge them together. For example, tpch e2e tests may consist of several files: one file for creating tables, one file for each table's data, and one file for all queries. Eventually we use the include command to concatenate them.

include "create.slt"
include "insert_nation.slt"
include "insert_lineitem.slt"

query R
select * from lineitem;
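A copy-paste style expansion of include lines could be sketched like this. Files are looked up in an in-memory map so the example needs no file system; the quoting rules, the lack of glob support, and the name `expand_includes` are all simplifying assumptions.

```rust
use std::collections::HashMap;

/// Expands `include "<name>"` lines by splicing in the named file's contents.
/// Files come from an in-memory map; a real runner would read them from disk.
fn expand_includes(script: &str, files: &HashMap<String, String>) -> Result<String, String> {
    let mut out = String::new();
    for line in script.lines() {
        match line.trim().strip_prefix("include ") {
            Some(rest) => {
                let name = rest.trim().trim_matches('"');
                let body = files
                    .get(name)
                    .ok_or_else(|| format!("missing include: {}", name))?;
                // Recurse so included files may include others (no cycle detection here).
                out.push_str(&expand_includes(body, files)?);
            }
            None => {
                out.push_str(line);
                out.push('\n');
            }
        }
    }
    Ok(out)
}
```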

refactor: moving binary to sqllogictest-bin

It's becoming more and more complex, so it would be good to have a separate crate.

cc @xxchan do you want to work on this? If so feel free to assign yourself. Otherwise I'll do the refactor maybe this month.

idea: add syntax for session

Use case: a user wants to set up a dataset (e.g., tpch), then test different sets of queries. What's more, they want to test them under different session configurations. e.g., risingwavelabs/risingwave#3629 (comment)

Previously, they would write:

# A.slt
include prepare.slt.part
set CONFIG=A;  
include test1.slt.part
include test2.slt.part
# B.slt
include prepare.slt.part
set CONFIG=B;  
include test1.slt.part
include test2.slt.part

The biggest problem is that the tests cannot be parallelized.

I came up with another syntax:

include prepare.slt.part
session {
  statement ok
  set CONFIG=A;  
  include test1.slt.part
}
session {
  statement ok
  set CONFIG=A;  
  include test2.slt.part
}
session {
  statement ok
  set CONFIG=B;  
  include test1.slt.part
}
session {
  statement ok
  set CONFIG=B;  
  include test2.slt.part
}

In this way, we can have:

  • parallelism
  • isolated session config. This also solves another problem: config side-effects in a subtest .slt.part #55
  • isolated failure
  • reuse test cases

session could alternatively be called e.g., subtest or run.

Make Result an associated type for DB

By the way, can we go a step further? e.g., add a Result associated type for DB, so that we can pass any type apart from String as query result. (Not needed in this PR)

Also, I would recommend adding an example for validator in examples folder.

Originally posted by @skyzh in #15 (review)

compare result based on semantics

Background

I'm trying to introduce the INTERVAL type in the Postgres-extend engine and found a problem caused by the way we test results.

In sqllogictest::AsyncDB, the run interface is required to return a string to compare with the expected result:
async fn run(&mut self, sql: &str) -> Result<String, Self::Error>

But there are some cases in which the strings differ while the semantics are the same.
For example, interval '30 days' and interval ' 720:00:00' are different strings with the same semantics.

For some types (such as interval), there are different string formats that express the same thing. So do we need a way to compare the 'semantics' rather than the 'string'? I tried to think of a way to fix it, but I don't know whether it's worthwhile or necessary, because this case only exists for 'Interval', 'timestamptz' and other time-related types. So we could also declare that the format must equal the result from psql.

Maybe we can fix it by...

Make the result multi-format, such as:

async fn run(&mut self, sql: &str) -> Result<Ans, Self::Error>
enum Ans {
  Str(String),
  Interval(...)
}
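To make the idea concrete, here is a small sketch of a typed result where the two interval spellings from this issue compare equal. It normalizes intervals to whole seconds and treats a day as 86,400 seconds, which is a simplification of how Postgres stores intervals; `Answer` and `parse_interval` are hypothetical names, and only the two formats mentioned above are handled.

```rust
/// A typed query result, so that equal values compare equal regardless of text format.
#[derive(Debug, PartialEq)]
enum Answer {
    Text(String),
    /// An interval normalized to whole seconds (a simplification: no months, no fractions).
    IntervalSecs(i64),
}

/// Parses the two interval spellings used in the issue: "<n> days" and "<h>:<m>:<s>".
fn parse_interval(text: &str) -> Option<Answer> {
    let text = text.trim();
    if let Some(days) = text.strip_suffix(" days") {
        let days: i64 = days.trim().parse().ok()?;
        // Assumption: one day is exactly 86,400 seconds.
        return Some(Answer::IntervalSecs(days * 86_400));
    }
    let parts: Vec<&str> = text.split(':').collect();
    if parts.len() == 3 {
        let hours: i64 = parts[0].parse().ok()?;
        let minutes: i64 = parts[1].parse().ok()?;
        let seconds: i64 = parts[2].parse().ok()?;
        return Some(Answer::IntervalSecs(hours * 3_600 + minutes * 60 + seconds));
    }
    None
}
```

With this shape, the comparison happens on `Answer` values instead of raw strings, while plain text results can still go through the `Text` variant.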
