
conformance's People

Contributors

cclauss, echeran, gnrunge, mosuem, mradbourne, robertbastian, sffc, srl295, sven-oly

conformance's Issues

end-to-end not exiting on fatal Rust executor errors

The Rust executor hits an error when the test driver tries to execute sendOneLine, and it does so for every batch of 10,000 tests that the driver sends.

Ex:

Testing ../executors/rust/target/release/executor / coll_shift_short. 190,000 of 192,707
Testing ../executors/rust/target/release/executor / coll_shift_short. 191,000 of 192,707
Testing ../executors/rust/target/release/executor / coll_shift_short. 192,000 of 192,707
!!! sendOneLine fails: input => {"label": "0190000", "string1": "\u2eb6!", "string2": "\u2eb6?", "test_type": "coll_shift_short"}
{"label": "0190001", "string1": "\u2eb6?", "string2": "\u2eb7!", "test_type": "coll_shift_short"}
...
#EXIT<. Err = [Errno 2] No such file or directory: '../executors/rust/target/release/executor'
!!!!!! processBatchOfTests: "platform error": "None"

Issues:

  • The Python script running everything logs the entire batch of test cases upon this error. We shouldn't print those 10,000 lines
  • In cases where the Python script can't get the executors to do basic things properly, the Python script should exit with a non-zero exit code

Bonus points: in the future, we can use a logging library so that we can more easily make the behavior differ between our local machines and CI.
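A minimal sketch of both fixes, using hypothetical names (`send_batch`, `run_all_batches`) rather than the real testdriver functions:

```python
import logging

def send_batch(executor_path, batch):
    """Hypothetical stand-in for the real per-batch executor call."""
    raise FileNotFoundError(executor_path)

def run_all_batches(executor_path, batches):
    for batch in batches:
        try:
            send_batch(executor_path, batch)
        except OSError as err:
            # Summarize instead of dumping every test line in the batch.
            logging.error("batch of %d tests failed: %s", len(batch), err)
            return 1  # caller passes this to sys.exit() so CI sees a failure
    return 0
```

The key points are the one-line summary in place of the full batch dump, and a non-zero return code propagated to `sys.exit()`.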

Integrate schema validation into executables

For the executables that we run (test data generator, test executor), we should validate the inputs to the executable against the schema within the executable, right before we use them.

So if step A generates output a that goes into step B that generates b, ..., then we want step B validating values in a right before it processes them.

That protects us against data inconsistency caused by stale data.
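As a sketch of the idea, step B might validate each incoming line against the schema right before processing it. The schema fields and the `process_line` helper below are hypothetical; the `jsonschema` library is one plausible choice:

```python
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

# Hypothetical minimal schema for one collation test case.
TEST_CASE_SCHEMA = {
    "type": "object",
    "required": ["label", "test_type"],
    "properties": {
        "label": {"type": "string"},
        "test_type": {"type": "string"},
    },
}

def process_line(line):
    case = json.loads(line)
    # Validate right before use, so stale or malformed data from the
    # previous step fails fast with a clear error instead of later.
    try:
        validate(instance=case, schema=TEST_CASE_SCHEMA)
    except ValidationError as err:
        raise ValueError(f"schema violation in {case.get('label')}: {err.message}")
    return case
```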

test input issues for NumberFormatter / ICU4J

Some of these issues are a part of the test framework (ex: schema definition), some might be related to the ICU4J executor, some might be for the ICU4J NumberFormatter APIs.

Set locale field for collation tests

Also, all of the existing collation tests implicitly default to the root locale, which is und. Updating these tests to have a specified locale means setting the locale to und explicitly.

Rename 'rust' to 'icu4x' in testdriver, executor code

The code has been using "Rust" instead of "ICU4X". We should rename accordingly.

Since the thing under test is an i18n library, we should rename our code according to the library name under test. The version number of the language runtime needed for the library version is a separate thing, and may not correspond 1:1 anyway (ex: ICU4X 1.0 and ICU4X 1.1 were developed against Rust 1.61, while ICU4X 1.2 was developed against Rust 1.68.2).

Configure logging

Configure logging to have a single global settings file/config.

Also, make the logging level in CI be high enough to not show test execution progress.
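One way to sketch this, assuming a single shared config module that every entry point imports first, and a hypothetical `LOG_LEVEL` environment variable that CI would set to `WARNING` so per-test progress lines (logged at INFO) are suppressed:

```python
import logging.config
import os

# Single global logging configuration; LOG_LEVEL is a hypothetical
# env var that CI sets to WARNING to hide test execution progress.
LOGGING_CONFIG = {
    "version": 1,
    "formatters": {
        "plain": {"format": "%(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "root": {
        "handlers": ["console"],
        "level": os.environ.get("LOG_LEVEL", "INFO"),
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
```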

Fix version labeling to use ICU4X version, not Rust

In the summary page and also in the detail page, the platform version is shown but not the ICU4X version, e.g.,
"platform: {'cldrVersion': '43.1.0', 'icuVersion': 'icu4x/2023-05-02/73.x', 'platform': 'rust', 'platformVersion': '1.73.0'}"

This should show the ICU4X version, e.g., 1.3 or 1.4, not "1.73".

Fix handling of non-matching surrogates in collation data.

The current test generator doesn't create tests for collation data when either of the test strings contains an incomplete surrogate. These are recorded in the logging files but they are not stored in any data or mentioned in any dashboards.

verifier crashes

From a fresh checkout of main, when running sh generateDataAndRun.sh, I get the following:

#EXIT<. Err = [Errno 2] No such file or directory: '../executors/rust/target/release/executor'
!!!!!! processBatchOfTests: "platform error": "None"

Traceback (most recent call last):
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testdriver.py", line 111, in <module>
    main(sys.argv)
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testdriver.py", line 101, in main
    driver.runPlans()
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testdriver.py", line 91, in runPlans
    plan.runPlan()
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testplan.py", line 86, in runPlan
    self.runOneTestMode()
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testplan.py", line 219, in runOneTestMode
    numErrors = self.runAllSingleTests(per_execution)
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testplan.py", line 279, in runAllSingleTests
    allTestResults.extend(self.processBatchOfTests(testLines))
TypeError: 'NoneType' object is not iterable
1
Verifier starting on 9 verify cases
  Verifying test coll_shift_short on rust executor
Cannot load ../TEMP_DATA/testResults/rust/coll_test_shift.json result data: Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 500, in <module>
    main(sys.argv)
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 491, in main
    verifier.verifyDataResults()
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 189, in verifyDataResults
    self.compareTestToExpected()
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 267, in compareTestToExpected
    self.report.platform_info = self.resultData['platform']
AttributeError: 'Verifier' object has no attribute 'resultData'. Did you mean: 'result_path'?
1

Create simple clustering of test failure/error results

When there are many test failures or errors, there are too many instances to report each one individually. Many of the test cases might look alike, yet there is no subgrouping to show that.

It might be helpful to implement some simple unsupervised clustering of the input values (say, taking the top 10 most frequent values per input struct key) and report the top 10 counts.
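A minimal sketch of such clustering, counting the most frequent values per input struct key (the function name and data shape are assumptions):

```python
from collections import Counter

def cluster_failures(failures, top_n=10):
    """Group failures by the most frequent values of each input key.

    `failures` is a list of dicts (the per-test input structs); the
    result maps each key to its top_n (value, count) pairs.
    """
    counters = {}
    for case in failures:
        for key, value in case.items():
            counters.setdefault(key, Counter())[str(value)] += 1
    return {key: counter.most_common(top_n)
            for key, counter in counters.items()}
```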

Must deal with missing or incorrect icu testdata version

The testdriver code assumes that the --icu_version parameter for the test driver is defined and that it refers to existing data. However, the value may be missing or may not be one of the defined test sets.

Proposed solution: check all defined testdata directories. If icu_version is not defined or a bad value is given, use the highest-numbered ICU version; e.g., a value of "xyz" will look at subdirectory names and pick the one that sorts highest.

For example, if the directories are [icu73, icu72, and icu71], a missing or incorrect value for icu_version will select icu73 data for testing.
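A sketch of the proposed fallback, assuming the data directories follow an `icuNN` naming pattern (sorting numerically so that e.g. a future `icu100` would beat `icu73`):

```python
import re
from pathlib import Path

def select_icu_dir(data_root, icu_version=None):
    """Fall back to the highest-numbered icuNN subdirectory when
    icu_version is missing or doesn't match an existing directory."""
    dirs = [p.name for p in Path(data_root).iterdir()
            if p.is_dir() and re.fullmatch(r"icu\d+", p.name)]
    if icu_version in dirs:
        return icu_version
    # Numeric sort, not lexicographic, so icu100 > icu73.
    return max(dirs, key=lambda name: int(name[3:]))
```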

More flexible source data download in testdata generator

testdata_gen.py hardcodes the source of data using a Github URL for a file from a specific version of ICU: https://github.com/unicode-org/conformance/blob/main/testgen/testdata_gen.py#L334

Instead, we should:

  • Add options to de-flake the download process:
    • Separate the download step from the data generation step
    • Enable an option to download a file vs. using a local copy
    • Show the user download progress
  • Handle versioning of data (allow different versions of input)
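
A rough sketch of the first two bullets using only the standard library (`fetch_source` and its behavior are assumptions, not the real testdata_gen.py API):

```python
import os
import urllib.request

def fetch_source(url, dest, use_local=True):
    """Separate download step: reuse a local copy when present,
    otherwise download with a simple progress display."""
    if use_local and os.path.exists(dest):
        print(f"using local copy {dest}")
        return dest

    def progress(blocks, block_size, total):
        # urlretrieve calls this after each block is received.
        if total > 0:
            print(f"\rdownloaded {blocks * block_size}/{total} bytes", end="")

    urllib.request.urlretrieve(url, dest, reporthook=progress)
    print()
    return dest
```

Versioning could then be handled by parameterizing `url` and `dest` on the ICU version rather than hardcoding them.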

Use HTML files to do HTML templating

Created from comment at #67 (comment)

+1 from me on this. Doing so should be win-win for everyone. It will probably feel like using jQuery.

It seems like the best way to do this in Python is using the Beautiful Soup library (docs). I've used JSoup in Java before, and that was really nice (powerful and easy). Beautiful Soup and JSoup seem to be comparable.

Using a regular HTML file as the input for HTML templating, rather than some special syntax that requires a special engine to interpret, is a simpler way to go. (Examples of special-syntax HTML templating that are still all too common: ex1, ex2.) The simplicity is that you keep code in Python along with the caller to the library, you keep markup in HTML, and you don't mix the two. Not having to deal with yet another syntax is a follow-on benefit.
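As a sketch of what this could look like with Beautiful Soup, filling a plain HTML template with result rows (the template and the `render_report` helper are hypothetical):

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# The template is plain HTML; Python only fills in values by id.
TEMPLATE = """
<html><body>
  <h1 id="title"></h1>
  <table id="results"><tr><th>test</th><th>status</th></tr></table>
</body></html>
"""

def render_report(title, rows):
    soup = BeautifulSoup(TEMPLATE, "html.parser")
    soup.find(id="title").string = title
    table = soup.find(id="results")
    for name, status in rows:
        tr = soup.new_tag("tr")
        for text in (name, status):
            td = soup.new_tag("td")
            td.string = text
            tr.append(td)
        table.append(tr)
    return str(soup)
```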

Using logging instead of print

For the test driver and test data generator in Python, we should use logging instead of just printing to the console.

At the least, it's equivalent. But the potential benefits are:

  • logging methods (ex: logging.debug(), logging.error()) allow us to indicate what severity a statement is
  • we can control what level we view logs at for testing mode, debugging mode, and production mode
  • we can configure the format of the messages if needed (add timestamps, etc., or not)
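
A minimal sketch of the migration (the messages are illustrative; `force=True` just ensures this configuration wins even if logging was already set up):

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s",
                    force=True)

# Instead of print(), choose the severity that matches the message:
logging.debug("sent test case %s", "0190000")       # hidden unless debugging
logging.info("finished batch %d of %d", 19, 20)     # normal progress
logging.error("executor exited with code %d", 255)  # always shown
```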

Leave input line untransformed in the error handling

Revisit #145 (comment), where an executor encounters an error in processing a test case. Instead of returning the test case input line as is in the error response, the error handling code transforms the input line before including it in the error response. This transformation seems unintended, unless there is a good reason for it.

@sven-oly

Add flexible pagination in test reports

For test reports, add pagination to speed review of test failures / errors / unimplemented options. This could use JSON data loaded directly rather than creating tables in the Python code.

Validate test case input and output at runtime

Now that we have schemas for test input and output, we should enable runtime validation of those test inputs & outputs across the board.

Doing so will realize a large chunk of the value proposition of having the schemas: it would ensure that all test cases passed to executors, and all data received from executors, adhere to the contracts defined by the schemas.

Define schema of test case data JSON

Some options for defining a schema:

  • JSON Schema
  • Protobuf

JSON Schema is a natural first choice. Also, it would take more effort to deal with Protobuf (perhaps too prohibitive in statically typed languages, even if possible in dynamic ones).

We only need a single tool that uses JSON Schema, since the purpose is to validate the JSON test case data once it is generated by the test generation tool.

Remove `DDT_DATA` dir and scripts referencing it

The DDT_DATA directory is obsolete at this point, and it seems to be just a copy of a portion of the TEMP_DATA directory that gets created locally to store intermediate files.

We should remove the DDT_DATA directory. At this point, all scripts referencing that directory are obsolete, too.

Do not remove any Python code references to ddt_data. The Python identifier is the alias used for datasets.py when importing that Python file/module.

Executor for dart_native needs environment setup to execute

Running the test driver with dart_native gives the following in a Linux environment. This needs to be fixed before dart_native tests can run.

----> STDOUT= ><

!!!!!! !!!! ERROR IN EXECUTION: 255. STDERR = Unhandled exception:
UnimplementedError: Insert diplomat bindings here
#0 Collation4X.compareImpl (package:intl4x/src/collation/collation_4x.dart:16)
#1 Collation.compare (package:intl4x/src/collation/collation.dart:28)
#2 testCollator (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:74)
#3 main. (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:49)
#4 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#5 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#6 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#7 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#8 _StreamController._add (dart:async/stream_controller.dart:650)
#9 _StreamController.add (dart:async/stream_controller.dart:598)
#10 _Socket._onData (dart:io-patch/socket_patch.dart:2381)
#11 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#12 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#13 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#14 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#15 _StreamController._add (dart:async/stream_controller.dart:650)
#16 _StreamController.add (dart:async/stream_controller.dart:598)
#17 new _RawSocket. (dart:io-patch/socket_patch.dart:1899)
#18 _NativeSocket.issueReadEvent.issue (dart:io-patch/socket_patch.dart:1356)
#19 _microtaskLoop (dart:async/schedule_microtask.dart:40)
#20 _startMicrotaskLoop (dart:async/schedule_microtask.dart:49)
#21 _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:123)
#22 _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:190)
WARNING:root:!!!!!! process_batch_of_tests: "platform error": "!!!! ERROR IN EXECUTION: 255. STDERR = Unhandled exception:
UnimplementedError: Insert diplomat bindings here
#0 Collation4X.compareImpl (package:intl4x/src/collation/collation_4x.dart:16)
#1 Collation.compare (package:intl4x/src/collation/collation.dart:28)
#2 testCollator (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:74)
#3 main. (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:49)
#4 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#5 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#6 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#7 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#8 _StreamController._add (dart:async/stream_controller.dart:650)
#9 _StreamController.add (dart:async/stream_controller.dart:598)
#10 _Socket._onData (dart:io-patch/socket_patch.dart:2381)
#11 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#12 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#13 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#14 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#15 _StreamController._add (dart:async/stream_controller.dart:650)
#16 _StreamController.add (dart:async/stream_controller.dart:598)
#17 new _RawSocket. (dart:io-patch/socket_patch.dart:1899)
#18 _NativeSocket.issueReadEvent.issue (dart:io-patch/socket_patch.dart:1356)
#19 _microtaskLoop (dart:async/schedule_microtask.dart:40)
#20 _startMicrotaskLoop (dart:async/schedule_microtask.dart:49)
#21 _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:123)
#22 _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:190)
"

Number format tests include incorrect units

In many of the test failures for number format, the reason is that "furlong" is not a recognized unit. However, I think the test data itself is incorrect; perhaps the unit is not set correctly for many of the test cases.

Speed up end-to-end CI

We can speed up our end-to-end CI in different ways:

  • Cache Rust Cargo build artifacts
  • Split up executor work per-platform (or per-{platform, version})

ICU4X Collation failures

ICU4X in conformance testing shows more than 20% of the tests failing, seen here:
ICU4X/icu73

The actual collator options are seen in the test failure detail, with a few examples here. The inputs are s1 and s2, and the actual options used are given:

  • {"label":"0010001","s1":"๐‘œฟ!","s2":"๐‘œฟ?","line":8661,"ignorePunctuation":true} CollatorOptions { strength: Some(Tertiary), alternate_handling: Some(Shifted), case_first: None, max_variable: None, case_level: None, numeric: None, backward_second_level: None }
  • {"label":"0243300","s1":"๐‘›b","s2":"๐‘œฑb","line":47434} CollatorOptions { strength: Some(Tertiary), alternate_handling: None, case_first: None, max_variable: None, case_level: None, numeric: None, backward_second_level: None }
  • {"label":"0373766","s1":"๏ค‡a","s2":"๏คˆa","line":177900} CollatorOptions { strength: Some(Tertiary), alternate_handling: None, case_first: None, max_variable: None, case_level: None, numeric: None, backward_second_level: None }

We need some debugging help with this!
