Comments (6)

Slamdunk commented on July 29, 2024

Thank you for bringing this up.

So the IntlDateFormatter::LONG pattern for zh_Hans_HK seems to be y年M月d日 z ah:mm:ss.

Then the FormDateTimeSelect tries to split that pattern here:

$pregResult = preg_split(
    "/([ \-,.:\/]*'.*?'[ \-,.:\/]*)|([ \-,.:\/]+)/",
    $pattern,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

The last alternative, ([ \-,.:\/]+), is what we need to focus on: it treats only a few basic characters as split sequences, and the Chinese characters inside y年M月d日 are not among them.

We could try to fix this by replacing any non-ASCII character with a space, so that preg_split behaves as expected again. What do you think?
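
To make the failure concrete, here is a minimal sketch that runs the split regex quoted above on the zh_Hans_HK pattern and then tries the proposed workaround. The pattern string is hard-coded from this thread rather than read from IntlDateFormatter, and the variable names are illustrative only, not laminas-form code.

```php
<?php

// Split regex as quoted above.
$splitRegex = "/([ \-,.:\/]*'.*?'[ \-,.:\/]*)|([ \-,.:\/]+)/";
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// Pattern reported for zh_Hans_HK (hard-coded, assumption: copied from this thread).
$pattern = 'y年M月d日 z ah:mm:ss';

// The CJK characters are not in the delimiter class, so y, M and d stay glued together.
$parts = preg_split($splitRegex, $pattern, -1, $flags);
echo json_encode($parts, JSON_UNESCAPED_UNICODE), PHP_EOL;
// ["y年M月d日"," ","z"," ","ah",":","mm",":","ss"]

// Proposed workaround: replace every run of non-ASCII characters with a space.
$asciiOnly = preg_replace('/[^\x00-\x7F]+/u', ' ', $pattern);
$fixedParts = preg_split($splitRegex, $asciiOnly, -1, $flags);
echo json_encode($fixedParts, JSON_UNESCAPED_UNICODE), PHP_EOL;
// ["y"," ","M"," ","d","  ","z"," ","ah",":","mm",":","ss"]
```

With the workaround the year, month and day fields are split apart again, at the cost of discarding the original characters from the delimiters.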

from laminas-form.

pine3ree commented on July 29, 2024

Hello @Slamdunk,

I believe that those non-ASCII characters (mostly present in Asian languages; as far as I remember, Japanese uses kanji the same way) are meaningful delimiters. They should be captured as such (like the 'at' delimiter for en_US) so that they are displayed later on before/after the corresponding "select" element.
They usually mean "day", "month", "year", and so on. I am not sure that simply surrounding them with single quotes before parsing will work.
Different locales also use them differently:

  • zh_* locales use pictograms down to IntlDateFormatter::MEDIUM
  • ja_* locales use pictograms down to IntlDateFormatter::FULL

Anyway, we should either (1) add tests and make the helpers work for all supported locales, or (2) limit the supported locales and add a generic simple alternative for those we do not (won't or can't) support.

kind regards

PS
I guess that since JavaScript selectors appeared many years ago, very few developers nowadays use groups of "select" elements for "datetime" related inputs.

Slamdunk commented on July 29, 2024

> (2) limit the supported locales and add a generic simple alternative for those we do not (won't or can't) support.

That sounds fair enough to me: would you like to propose such a change?

pine3ree commented on July 29, 2024

PS
As a quick fix, this is what I added in my Plates functions for laminas-form:

        if (!isset($result['month'])) {
            $result['month'] = 'M';
        }

and similarly for the other missed captures.

(edit) This is not related to your answer; I only saw it after posting.

pine3ree commented on July 29, 2024

BTW, this string, with the pictograms wrapped in single quotes, is parsed correctly: y'年'M'月'd'日' z ah:mm:ss
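
A minimal sketch of that observation, using the split regex quoted earlier in this thread. The quoting step is a hypothetical pre-processing helper, not something laminas-form does, and it assumes the pattern does not already contain quoted non-ASCII literals (those would end up double-quoted).

```php
<?php

// Split regex quoted earlier in this thread.
$splitRegex = "/([ \-,.:\/]*'.*?'[ \-,.:\/]*)|([ \-,.:\/]+)/";
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// Hypothetical pre-processing: wrap each run of non-ASCII characters in single quotes.
$pattern = 'y年M月d日 z ah:mm:ss';
$quoted = preg_replace('/[^\x00-\x7F]+/u', "'\$0'", $pattern);
echo $quoted, PHP_EOL;
// y'年'M'月'd'日' z ah:mm:ss

// The quoted literals now match the first alternative and are captured as delimiters.
$parts = preg_split($splitRegex, $quoted, -1, $flags);
echo json_encode($parts, JSON_UNESCAPED_UNICODE), PHP_EOL;
// ["y","'年'","M","'月'","d","'日' ","z"," ","ah",":","mm",":","ss"]
```

Unlike replacing the characters with spaces, this keeps them available as delimiters, although still wrapped in their quotes.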

pine3ree commented on July 29, 2024

Premise: I deleted all my previous comments, since I believe I have found a simpler, generic regular expression for splitting the intl date-time pattern. Here it is in expanded format:

const SPLIT_REGEX = <<<EOR
    /
        (
            [^a-z']*
            (?:
                \('[^']+'\)
                |
                '[^']+'
                |
                [^a-z']+
            )+
            [^a-z']*
        )+
    /xiu
    EOR;

together with the modified method:

function getPattern(string $locale, ?IntlDateFormatter $intl = null): string
{
    // Use the injected formatter if one is given, otherwise build one for the requested locale
    $intl ??= new IntlDateFormatter($locale, $this->dateType, $this->timeType);

    $pattern = $intl->getPattern();
    // Remove the time zone format character, present in various forms
    $pattern = str_replace(['(z)', '[z]', 'z ', ' z ', ' z'], ' ', $pattern);
    // Remove the meridiem (am/pm) format character, present in various forms
    $pattern = str_replace(['(a)', '[a]', ' a ', 'a ', ' a'], ' ', $pattern);
    // Collapse the extra inner spaces left by the replacements
    $pattern = preg_replace('/\s+/', ' ', $pattern);
    // Trim commas and whitespace left at either end by the previous operations
    $pattern = trim($pattern, ", \t\n\r\0\x0B");

    return $pattern;
}

The regex works like this:

ref: https://www.unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table
ref: https://unicode-org.github.io/icu/userguide/format_parse/datetime/#date-field-symbol-table

The splitting (delimiter) string may have:

  • OPTIONAL PREFIX: characters other than ASCII letters (unquoted ASCII letters are reserved for date-time fields, see the tables)
  • EITHER
    • a quoted string wrapped in parentheses, e.g. ('a'), ('e')
    • a quoted string without parentheses, e.g. 'at'
    • one or more characters other than ASCII letters
  • OPTIONAL SUFFIX: characters other than ASCII letters (unquoted ASCII letters are reserved for date-time fields, see the tables)

Characters other than ASCII letters include the standard date-time separators such as /, -, :, etc., and all the Unicode symbols for year, month, day, etc.

result: https://onlinephp.io/c/97ff2
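
For readers who cannot open the link, here is a self-contained sketch of the approach under a few assumptions: cleanPattern() and splitPattern() are hypothetical standalone helpers that repeat the cleanup steps from getPattern() on a hard-coded pattern (so the intl extension is not needed), and the second input is an en_US-style pattern written by hand for illustration.

```php
<?php

// Split regex copied from this comment; a nowdoc keeps the backslashes literal.
const SPLIT_REGEX = <<<'EOR'
    /
        (
            [^a-z']*
            (?:
                \('[^']+'\)
                |
                '[^']+'
                |
                [^a-z']+
            )+
            [^a-z']*
        )+
    /xiu
    EOR;

// Hypothetical helper: the cleanup steps of getPattern(), applied to a raw pattern string.
function cleanPattern(string $pattern): string
{
    // Drop the time zone and meridiem symbols, then tidy up spaces and commas.
    $pattern = str_replace(['(z)', '[z]', 'z ', ' z ', ' z'], ' ', $pattern);
    $pattern = str_replace(['(a)', '[a]', ' a ', 'a ', ' a'], ' ', $pattern);
    $pattern = preg_replace('/\s+/', ' ', $pattern);

    return trim($pattern, ", \t\n\r\0\x0B");
}

// Hypothetical helper: split a cleaned pattern into fields and captured delimiters.
function splitPattern(string $pattern): array
{
    $flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

    return preg_split(SPLIT_REGEX, cleanPattern($pattern), -1, $flags);
}

echo json_encode(splitPattern('y年M月d日 z ah:mm:ss'), JSON_UNESCAPED_UNICODE), PHP_EOL;
// ["y","年","M","月","d","日 ","h",":","mm",":","ss"]

echo json_encode(splitPattern("MMMM d, y 'at' h:mm:ss a z"), JSON_UNESCAPED_UNICODE), PHP_EOL;
// ["MMMM"," ","d",", ","y"," 'at' ","h",":","mm",":","ss"]
```

In both cases the date-time fields come out as separate elements, and the characters between them (including 年, 月, 日 and the quoted 'at') are kept as delimiters.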
