
conformance's People

Contributors

cclauss, echeran, gnrunge, mosuem, mradbourne, robertbastian, sffc, srl295, sven-oly

conformance's Issues

end-to-end not exiting on fatal Rust executor errors

The Rust executor hits an error when the test driver tries to execute sendOneLine, and it does so for every batch of 10,000 tests that the driver sends.

Ex:

Testing ../executors/rust/target/release/executor / coll_shift_short. 190,000 of 192,707
Testing ../executors/rust/target/release/executor / coll_shift_short. 191,000 of 192,707
Testing ../executors/rust/target/release/executor / coll_shift_short. 192,000 of 192,707
!!! sendOneLine fails: input => {"label": "0190000", "string1": "\u2eb6!", "string2": "\u2eb6?", "test_type": "coll_shift_short"}
{"label": "0190001", "string1": "\u2eb6?", "string2": "\u2eb7!", "test_type": "coll_shift_short"}
...
#EXIT<. Err = [Errno 2] No such file or directory: '../executors/rust/target/release/executor'
!!!!!! processBatchOfTests: "platform error": "None"

Issues:

  • The Python script running everything logs the entire batch of test cases upon this error. We shouldn't print those 10,000 lines
  • In cases where the Python script can't get the executors to do basic things properly, the Python script should exit with a non-zero exit code

Bonus points: in the future, we can use a logging library so that we can more easily make the behavior differ between our local machines and CI.
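A minimal sketch of both fixes, using hypothetical names (`send_batch`, `run_all_batches`) rather than the real testdriver functions:

```python
import logging

def send_batch(executor_path, batch):
    """Hypothetical stand-in for the real per-batch executor call."""
    raise FileNotFoundError(executor_path)

def run_all_batches(executor_path, batches):
    for batch in batches:
        try:
            send_batch(executor_path, batch)
        except OSError as err:
            # Summarize instead of dumping every test line in the batch.
            logging.error("batch of %d tests failed: %s", len(batch), err)
            return 1  # caller passes this to sys.exit() so CI sees a failure
    return 0
```

The key points are the one-line summary in place of the full batch dump, and a non-zero return code propagated to `sys.exit()`.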

Integrate schema validation into executables

For the executables that we run (test data generator, test executor), we should validate the inputs to the executable against the schema within the executable, right before we use them.

So if step A generates output a that goes into step B that generates b, ..., then we want step B validating values in a right before it processes them.

That protects us against data inconsistency caused by stale data.
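As a sketch of the idea, step B might validate each incoming line against the schema right before processing it. The schema fields and the `process_line` helper below are hypothetical; the `jsonschema` library is one plausible choice:

```python
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

# Hypothetical minimal schema for one collation test case.
TEST_CASE_SCHEMA = {
    "type": "object",
    "required": ["label", "test_type"],
    "properties": {
        "label": {"type": "string"},
        "test_type": {"type": "string"},
    },
}

def process_line(line):
    case = json.loads(line)
    # Validate right before use, so stale or malformed data from the
    # previous step fails fast with a clear error instead of later.
    try:
        validate(instance=case, schema=TEST_CASE_SCHEMA)
    except ValidationError as err:
        raise ValueError(f"schema violation in {case.get('label')}: {err.message}")
    return case
```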

test input issues for NumberFormatter / ICU4J

Some of these issues are a part of the test framework (ex: schema definition), some might be related to the ICU4J executor, some might be for the ICU4J NumberFormatter APIs.

Set locale field for collation tests

Also, all of the existing collation tests implicitly default to the root locale, which is und. Updating these tests to have a specified locale means setting the locale to und explicitly.

Rename 'rust' to 'icu4x' in testdriver, executor code

The code has been using "Rust" instead of "ICU4X". We should rename accordingly.

Since the thing under test is an i18n library, we should rename our code according to the library name under test. The version number of the language runtime needed for the library version is a separate thing, and may not correspond 1:1 anyway (ex: ICU4X 1.0 and ICU4X 1.1 were developed against Rust 1.61, while ICU4X 1.2 was developed against Rust 1.68.2).

Configure logging

Configure logging to have a single global settings file/config.

Also, make the logging level in CI be high enough to not show test execution progress.
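One way to sketch this, assuming a single shared config module that every entry point imports first, and a hypothetical `LOG_LEVEL` environment variable that CI would set to `WARNING` so per-test progress lines (logged at INFO) are suppressed:

```python
import logging.config
import os

# Single global logging configuration; LOG_LEVEL is a hypothetical
# env var that CI sets to WARNING to hide test execution progress.
LOGGING_CONFIG = {
    "version": 1,
    "formatters": {
        "plain": {"format": "%(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "root": {
        "handlers": ["console"],
        "level": os.environ.get("LOG_LEVEL", "INFO"),
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
```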

Fix version labeling to use ICU4X version, not Rust

In the summary page and also in the detail page, the platform version is shown but not the ICU4X version, e.g.,
"platform: {'cldrVersion': '43.1.0', 'icuVersion': 'icu4x/2023-05-02/73.x', 'platform': 'rust', 'platformVersion': '1.73.0'}"

This should show the ICU4X version, e.g., 1.3 or 1.4, not "1.73".

Fix handling of non-matching surrogates in collation data.

The current test generator doesn't create tests for collation data when either of the test strings contains an incomplete surrogate. These are recorded in the logging files but they are not stored in any data or mentioned in any dashboards.

verifier crashes

From a fresh checkout of main, when running sh generateDataAndRun.sh, I get the following:

#EXIT<. Err = [Errno 2] No such file or directory: '../executors/rust/target/release/executor'
!!!!!! processBatchOfTests: "platform error": "None"

Traceback (most recent call last):
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testdriver.py", line 111, in <module>
    main(sys.argv)
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testdriver.py", line 101, in main
    driver.runPlans()
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testdriver.py", line 91, in runPlans
    plan.runPlan()
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testplan.py", line 86, in runPlan
    self.runOneTestMode()
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testplan.py", line 219, in runOneTestMode
    numErrors = self.runAllSingleTests(per_execution)
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testplan.py", line 279, in runAllSingleTests
    allTestResults.extend(self.processBatchOfTests(testLines))
TypeError: 'NoneType' object is not iterable
1
Verifier starting on 9 verify cases
  Verifying test coll_shift_short on rust executor
Cannot load ../TEMP_DATA/testResults/rust/coll_test_shift.json result data: Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 500, in <module>
    main(sys.argv)
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 491, in main
    verifier.verifyDataResults()
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 189, in verifyDataResults
    self.compareTestToExpected()
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 267, in compareTestToExpected
    self.report.platform_info = self.resultData['platform']
AttributeError: 'Verifier' object has no attribute 'resultData'. Did you mean: 'result_path'?
1

Create simple clustering of test failure/error results

When there are many test failures or errors, there are too many instances to report each one individually. Many of the test cases might look alike, yet there is no subgrouping to show that.

It might be helpful to implement some simple unsupervised clustering of the input values (say, taking the top 10 most frequent values per input struct key) and report the top 10 counts.
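A minimal sketch of such clustering, counting the most frequent values per input struct key (the function name and data shape are assumptions):

```python
from collections import Counter

def cluster_failures(failures, top_n=10):
    """Group failures by the most frequent values of each input key.

    `failures` is a list of dicts (the per-test input structs); the
    result maps each key to its top_n (value, count) pairs.
    """
    counters = {}
    for case in failures:
        for key, value in case.items():
            counters.setdefault(key, Counter())[str(value)] += 1
    return {key: counter.most_common(top_n)
            for key, counter in counters.items()}
```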

Must deal with missing or incorrect icu testdata version

The testdriver code assumes that the --icu_version parameter for the test driver is defined and that it refers to existing data. However, the value may be missing or may not be one of the defined test sets.

Proposed solution: check all defined testdata directories. If icu_version is not defined or a bad value is given, use the highest-numbered ICU version; e.g., a value of "xyz" will look at subdirectory names and pick the one that sorts highest.

For example, if the directories are [icu73, icu72, and icu71], a missing or incorrect value for icu_version will select icu73 data for testing.
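A sketch of the proposed fallback, assuming the data directories follow an `icuNN` naming pattern (sorting numerically so that e.g. a future `icu100` would beat `icu73`):

```python
import re
from pathlib import Path

def select_icu_dir(data_root, icu_version=None):
    """Fall back to the highest-numbered icuNN subdirectory when
    icu_version is missing or doesn't match an existing directory."""
    dirs = [p.name for p in Path(data_root).iterdir()
            if p.is_dir() and re.fullmatch(r"icu\d+", p.name)]
    if icu_version in dirs:
        return icu_version
    # Numeric sort, not lexicographic, so icu100 > icu73.
    return max(dirs, key=lambda name: int(name[3:]))
```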

More flexible source data download in testdata generator

testdata_gen.py hardcodes the source of data using a Github URL for a file from a specific version of ICU: https://github.com/unicode-org/conformance/blob/main/testgen/testdata_gen.py#L334

Instead, we should:

  • Add options to de-flake the download process:
    • Separate the download step from the data generation step
    • Enable an option to download a file vs. using a local copy
    • Show the user download progress
  • Handle versioning of data (allow different versions of input)
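
A rough sketch of the first two bullets using only the standard library (`fetch_source` and its behavior are assumptions, not the real testdata_gen.py API):

```python
import os
import urllib.request

def fetch_source(url, dest, use_local=True):
    """Separate download step: reuse a local copy when present,
    otherwise download with a simple progress display."""
    if use_local and os.path.exists(dest):
        print(f"using local copy {dest}")
        return dest

    def progress(blocks, block_size, total):
        # urlretrieve calls this after each block is received.
        if total > 0:
            print(f"\rdownloaded {blocks * block_size}/{total} bytes", end="")

    urllib.request.urlretrieve(url, dest, reporthook=progress)
    print()
    return dest
```

Versioning could then be handled by parameterizing `url` and `dest` on the ICU version rather than hardcoding them.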

Use HTML files to do HTML templating

Created from comment at #67 (comment)

+1 from me on this. Doing so should be win-win for everyone. It will probably feel like using jQuery.

It seems like the best way to do this in Python is using the Beautiful Soup library (docs). I've used JSoup in Java before, and that was really nice (powerful and easy). Beautiful Soup and JSoup seem to be comparable.

Using a regular HTML file as the input for HTML templating, rather than some special syntax that requires a special engine to interpret, is a simpler way to go. (Examples of special-syntax HTML templating that are still all too common: ex1, ex2.) The simplicity is that you keep code in Python along with the caller to the library, you keep markup in HTML, and you don't mix the two. Not having to deal with yet another syntax is a follow-on benefit.
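As a sketch of what this could look like with Beautiful Soup, filling a plain HTML template with result rows (the template and the `render_report` helper are hypothetical):

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# The template is plain HTML; Python only fills in values by id.
TEMPLATE = """
<html><body>
  <h1 id="title"></h1>
  <table id="results"><tr><th>test</th><th>status</th></tr></table>
</body></html>
"""

def render_report(title, rows):
    soup = BeautifulSoup(TEMPLATE, "html.parser")
    soup.find(id="title").string = title
    table = soup.find(id="results")
    for name, status in rows:
        tr = soup.new_tag("tr")
        for text in (name, status):
            td = soup.new_tag("td")
            td.string = text
            tr.append(td)
        table.append(tr)
    return str(soup)
```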

Using logging instead of print

For the test driver and test data generator in Python, we should use logging instead of just printing to the console.

At the least, it's equivalent. But the potential benefits are:

  • logging methods (ex: logging.debug(), logging.error()) allow us to indicate what severity a statement is
  • we can control what level we view logs at for testing mode, debugging mode, and production mode
  • we can configure the format of the messages if needed (add timestamps, etc., or not)
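
A minimal sketch of the migration (the messages are illustrative; `force=True` just ensures this configuration wins even if logging was already set up):

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s",
                    force=True)

# Instead of print(), choose the severity that matches the message:
logging.debug("sent test case %s", "0190000")       # hidden unless debugging
logging.info("finished batch %d of %d", 19, 20)     # normal progress
logging.error("executor exited with code %d", 255)  # always shown
```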

Leave input line untransformed in the error handling

Revisit #145 (comment), where an executor encounters an error in processing a test case. Instead of returning the test case input line as is in the error response, the error handling code transforms the input line before including it in the error response. This transformation seems unintended, unless there is a good reason for it.

@sven-oly

Add flexible pagination in test reports

For test reports, add pagination to speed review of test failures / errors / unimplemented options. This could use JSON data loaded directly rather than creating tables in the Python code.

Validate test case input and output at runtime

Now that we have schemas for test input and output, we should enable runtime validation of those test inputs & outputs across the board.

Doing so will realize a large chunk of the value proposition of having the schemas: it would ensure that all test cases passed to executors, and all data received from executors, adhere to the contracts defined by the schemas.

Define schema of test case data JSON

Some options for defining a schema:

  • JSON Schema
  • Protobuf

JSON Schema is a natural first choice. Also, it would take more effort to deal with Protobuf (perhaps too prohibitive in statically typed languages, even if possible in dynamic ones).

We only need a single tool that uses JSON Schema, since the purpose is to validate the JSON test case data once it is generated by the test generation tool.

Remove `DDT_DATA` dir and scripts referencing it

The DDT_DATA directory is obsolete at this point, and it seems to be just a copy of a portion of the TEMP_DATA directory that gets created locally to store intermediate files.

We should remove the DDT_DATA directory. At this point, all scripts referencing that directory are obsolete, too.

Do not remove any Python code references to ddt_data. The Python identifier is the alias used for datasets.py when importing that Python file/module.

Executor for dart_native needs environment setup to execute

Running the test driver with dart_native gives the following in a Linux environment. This needs to be fixed before dart_native tests can run.

----> STDOUT= ><

!!!!!! !!!! ERROR IN EXECUTION: 255. STDERR = Unhandled exception:
UnimplementedError: Insert diplomat bindings here
#0 Collation4X.compareImpl (package:intl4x/src/collation/collation_4x.dart:16)
#1 Collation.compare (package:intl4x/src/collation/collation.dart:28)
#2 testCollator (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:74)
#3 main. (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:49)
#4 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#5 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#6 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#7 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#8 _StreamController._add (dart:async/stream_controller.dart:650)
#9 _StreamController.add (dart:async/stream_controller.dart:598)
#10 _Socket._onData (dart:io-patch/socket_patch.dart:2381)
#11 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#12 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#13 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#14 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#15 _StreamController._add (dart:async/stream_controller.dart:650)
#16 _StreamController.add (dart:async/stream_controller.dart:598)
#17 new _RawSocket. (dart:io-patch/socket_patch.dart:1899)
#18 _NativeSocket.issueReadEvent.issue (dart:io-patch/socket_patch.dart:1356)
#19 _microtaskLoop (dart:async/schedule_microtask.dart:40)
#20 _startMicrotaskLoop (dart:async/schedule_microtask.dart:49)
#21 _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:123)
#22 _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:190)
WARNING:root:!!!!!! process_batch_of_tests: "platform error": "!!!! ERROR IN EXECUTION: 255. STDERR = Unhandled exception:
UnimplementedError: Insert diplomat bindings here
#0 Collation4X.compareImpl (package:intl4x/src/collation/collation_4x.dart:16)
#1 Collation.compare (package:intl4x/src/collation/collation.dart:28)
#2 testCollator (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:74)
#3 main. (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:49)
#4 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#5 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#6 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#7 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#8 _StreamController._add (dart:async/stream_controller.dart:650)
#9 _StreamController.add (dart:async/stream_controller.dart:598)
#10 _Socket._onData (dart:io-patch/socket_patch.dart:2381)
#11 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#12 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#13 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#14 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#15 _StreamController._add (dart:async/stream_controller.dart:650)
#16 _StreamController.add (dart:async/stream_controller.dart:598)
#17 new _RawSocket. (dart:io-patch/socket_patch.dart:1899)
#18 _NativeSocket.issueReadEvent.issue (dart:io-patch/socket_patch.dart:1356)
#19 _microtaskLoop (dart:async/schedule_microtask.dart:40)
#20 _startMicrotaskLoop (dart:async/schedule_microtask.dart:49)
#21 _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:123)
#22 _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:190)
"

Number format tests include incorrect units

In many of the test failures for number format, the reason is that "furlong" is not a recognized unit. However, I think the test data itself is incorrect; perhaps the unit is not set correctly for many of the test cases.

Speed up end-to-end CI

We can speed up our end-to-end CI in different ways:

  • Cache Rust Cargo build artifacts
  • Split up executor work per-platform (or per-{platform, version})

ICU4X Collation failures

ICU4X in conformance testing shows more than 20% of the tests failing, seen here:
ICU4X/icu73

The actual collator options are seen in the test failure detail, with a few examples here. The inputs are s1 and s2, and the actual options used are given:

  • {"label":"0010001","s1":"๐‘œฟ!","s2":"๐‘œฟ?","line":8661,"ignorePunctuation":true} CollatorOptions { strength: Some(Tertiary), alternate_handling: Some(Shifted), case_first: None, max_variable: None, case_level: None, numeric: None, backward_second_level: None }
  • {"label":"0243300","s1":"๐‘›b","s2":"๐‘œฑb","line":47434} CollatorOptions { strength: Some(Tertiary), alternate_handling: None, case_first: None, max_variable: None, case_level: None, numeric: None, backward_second_level: None }
  • {"label":"0373766","s1":"๏ค‡a","s2":"๏คˆa","line":177900} CollatorOptions { strength: Some(Tertiary), alternate_handling: None, case_first: None, max_variable: None, case_level: None, numeric: None, backward_second_level: None }

We need some debugging help with this!
