Giter Club home page Giter Club logo

Comments (14)

melgenek avatar melgenek commented on August 15, 2024 3

Hi there!
I wanted to share one more use case for the problem that you're solving here.
Datafusion uses sqllogictest-rs to run some .slt tests to check compatibility between Datafusion and Postgres.

For now, we went the "canonical form" way. The string representation is what is compared, and both Datafusion and Postgres have to produce the same string representations.
In order to achieve the same text representation, we defined a set of typed value to string conversions for:

  • booleans
  • strings ((empty) and values)
  • decimals. Postgres and Datafusion sometimes use different underlying number types, which causes precision errors. So the conversions produce "canonical" rounded values.

The drawback of this approach is that I had to rewrite the postgres-extended to use these "canonical" conversions.
This approach also wouldn't work if the Postgres implementation is not based on types.

The current cross-engine cross-type logic is basically

typed result 1 -> canonical string
                                      -> compare strings
typed result 2 -> canonical string

Regarding the @ZENOTME's idea of custom comparators: the problem with them is that each engine implementation would need to know how to parse an opaque "expected result" string in order to compare it. It's a quite fragile approach because an engine cannot predict what users write in the outputs.

Additionally, the errors are now quite clear, because there are +/- showing the diffs between the expected and actual results. One would need to define both a comparator and a string representation in order to show errors.

The comparator logic also arguably seems more complicated to implement and maintain:

expected string -> parse to typed value -> compare to typed result 1
expected string -> parse to typed value -> compare to typed result 2

Defining "canonical" type representations seems like a big task. But, in general, the "canonical value" approach seems to be somewhat easier both to understand and implement for multiple engines, and multiple representations per type.

from sqllogictest-rs.

ZENOTME avatar ZENOTME commented on August 15, 2024 1

Yes. All should use days / hours, probably, if we only want to work on strings instead of interpreting the actual values.

If we can unify the '*.slt' file, this way seems to be a most simple way to solve this.

I will:

  1. unify the '*.slt' file, it can let us introduce new type(interval,timestamptz) quickly.
  2. try to implement custom result type later, it seems a good way for us to support new type in the future.

from sqllogictest-rs.

skyzh avatar skyzh commented on August 15, 2024

Why not format the timestamp types into strings?

from sqllogictest-rs.

skyzh avatar skyzh commented on August 15, 2024

I'd suggest implementing a custom result type and a custom comparator in sqllogictest. It's impossible to cover all data types in this framework.

from sqllogictest-rs.

ZENOTME avatar ZENOTME commented on August 15, 2024

Why not format the timestamp types into strings?

I'm not sure what you means. A timestamp type can be format into different string with the same semantics.

I'd suggest implementing a custom result type and a custom comparator in sqllogictest. It's impossible to cover all data types in this framework.

Sounds like a good way to solve this. I can try it.

from sqllogictest-rs.

skyzh avatar skyzh commented on August 15, 2024

I'm not sure what you means. A timestamp type can be format into different string with the same semantics.

You can format it to a fixed format (e.g., represented by hours, or represented by days, not sure how we can do this) before comparing.

from sqllogictest-rs.

ZENOTME avatar ZENOTME commented on August 15, 2024

I'm not sure what you means. A timestamp type can be format into different string with the same semantics.

You can format it to a fixed format (e.g., represented by hours, or represented by days, not sure how we can do this) before comparing.

So does it also mean '*.slt' file should use the fixed format?

from sqllogictest-rs.

skyzh avatar skyzh commented on August 15, 2024

Yes. All should use days / hours, probably, if we only want to work on strings instead of interpreting the actual values.

from sqllogictest-rs.

xxchan avatar xxchan commented on August 15, 2024

To summarize, when a type has more than one representations for the same value, an engine can convert it to a canonial form. The limitation is that the test cases should also be written in canonical form, which will also make some engines imcompatible, e.g., postgres and postgres-extended. A better way is to let the engine provide a custom comparator, but cmp(&str,&str) is hard to do. It can be better to let the engine provide a result type and the comparator can be cmp(&ResT, &str). It's slightly better but still need to do some parsing. 🤔

The behavior can be tweaked by using different engines. e.g., if some users do want to test the results AS-IS, they can just use a non-canonical engine.

from sqllogictest-rs.

ZENOTME avatar ZENOTME commented on August 15, 2024

It's slightly better but still need to do some parsing. The behavior can be tweaked by using different engines. e.g., if some users do want to test the results AS-IS, they can just use a non-canonical engine.

After introducing the custom comparator, we don't need the canonical form. We can directly make postgres-extended to be a non-canonical engine.

from sqllogictest-rs.

xxchan avatar xxchan commented on August 15, 2024

After introducing the custom comparator, pg-extended is still kind of canonical.

Assume A1 has canonical form A.

// before
expected actual pg    pg-extended
A1       A1     pass  fail
A        A1     fail  pass

// after
expected actual pg    pg-extended
A1       A1     pass  pass (change here)
A        A1     fail  pass

from sqllogictest-rs.

ZENOTME avatar ZENOTME commented on August 15, 2024

Why not change all
A A1
to
A1 A1
for the type have introduced the custom comparator.

from sqllogictest-rs.

ZENOTME avatar ZENOTME commented on August 15, 2024
pub trait CustomResult:ToString {
    fn cmp(&self,str:&str)->bool;
}

impl CustomResult for String {
    fn cmp(&self,str:&str)->bool{
        self == str
    }
}
pub type BoxResult = Box<dyn CustomResult>;

I try to create a CustomResult, but I find a this will caused we can't use unstable_sort() in runner. The reason is we can't compare the trait object.
So I thinking about two solution:

  1. reimplement a runner for CustomResult that can't support raw_sort, user should responsible to use 'order by'.
  2. I think the function of raw sort is to compare the result in ignoring order. We can just compare one row of object to all row of string. Such as:
output:
  [
     1: [object1 , object2 , object3]
  ]
expect:
  [
     1: [string1, string2, string3]
     2: [string4,string5, string6]
  ]
for output_row : output{
  for expect_row : expect{
     if compare(output_row,expect_row).is_success {
       ...
       break;
     }
    ...
  }
}

from sqllogictest-rs.

ZENOTME avatar ZENOTME commented on August 15, 2024

For sqllogictest, I think the binary format isn't appropriate for us. So I try to modify the rust-postgres to request text format result in extended query protocol and use it in extended-engine locally. It can run all the e2e test in risingwave. I try to push this modification to rust-postgres but there are no reaction now. So I think whether we can maintain a downstream version of rust-postgres first.

from sqllogictest-rs.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.