Giter Club home page Giter Club logo

Comments (11)

SorX14 avatar SorX14 commented on August 23, 2024 2

Presumably the reason for using regex to get the length of the string was to prevent a dep on mb_string yet allow unicode characters. You could do something like this instead:

if ($this->maximumKeyLength !== Capabilities::UNLIMITED_KEY_LENGTH
    && count(preg_split('/./u', $key, null)) > $this->maximumKeyLength
) {

This is much slower than preg_match but at least works. For a key 10000 😄's long and 10000 iterations:

  • preg_split is ~3s
  • preg_match is ~0.001s

Doing a length comparison:

  • mb_strlen is ~0.3s
  • strlen is ~0.0001s

Fixing this problem for the .1% of users who need massive key sizes will negatively impact the 99.9% majority speed-wise.

Perhaps something like this (not unicode accurate but gives a conservative limit). I'm not aware of a PCRE constant that provides a concrete limit to quantifier limits.

        protected const UNICODE_KEY_LENGTH_CHECK_LIMIT = 3000;
        
        if ($this->maximumKeyLength !== Capabilities::UNLIMITED_KEY_LENGTH) {
            if (
                $this->maximumKeyLength > self::UNICODE_KEY_LENGTH_CHECK_LIMIT 
                && strlen($key) > $this->maximumKeyLength
            ) {
                // Exception
            } else if (preg_match('/^.{'.($this->maximumKeyLength + 1).',}/u', $key)) {
                // Exception
            }
        }

Another approach would be to remove the check entirely for key lengths over a certain size. Rational being that you require modifying the default memory_limit and have a niche use-case so are likely technical enough to enforce key length yourself. Also, the underlying adapter would throw an exception should you exceed its limits.

        protected const UNICODE_KEY_LENGTH_CHECK_LIMIT = 3000;
        
        if (
            $this->maximumKeyLength !== Capabilities::UNLIMITED_KEY_LENGTH
            && strlen($key) < self::UNICODE_KEY_LENGTH_CHECK_LIMIT
            && preg_match('/^.{'.($this->maximumKeyLength + 1).',}/u', $key)
        ) {
                // Exception
        }

from laminas-cache.

SorX14 avatar SorX14 commented on August 23, 2024

Running into the same issue:

PHP Warning:  preg_match(): Compilation failed: number too big in {} quantifier at offset 9 in /vendor/laminas/laminas-cache/src/Psr/SimpleCache/SimpleCacheDecorator.php on line 338
  • laminas/laminas-cache-storage-adapter-redis 1.2.0
  • laminas/laminas-cache 2.12.1

from laminas-cache.

10n avatar 10n commented on August 23, 2024

Reading the PSR-16 specifications it seems the key should not be bigger then 64, This was previously imposed in validateKey method

// explicit verification
https://github.com/laminas/laminas-cache/blob/2.11.x/src/Psr/SimpleCache/SimpleCacheDecorator.php

// new capabilities verification
https://github.com/laminas/laminas-cache/blob/2.12.x/src/Psr/SimpleCache/SimpleCacheDecorator.php#L337

The fix that I would suggest is to not allow more than 64 characters even if the storage capabilities allows it.

from laminas-cache.

10n avatar 10n commented on August 23, 2024

Another issue that I see is that now Storages that may have capability less then 64 characters as maximum length will trigger an exception "The maximum key length capability must allow at least 64 characters".

If the storage must support at least 64 bytes in key and the PSR-16 indicates that the key must not be more than 64 bytes ... then it does not make any sense to have a property that stores the maximumKeyLength . If the storage is validated, that means the hard limit is by default 64.

I believe that if there is a maximum key validation that means storages with less then 64 bytes for the maximum key length should be allowed.

from laminas-cache.

10n avatar 10n commented on August 23, 2024

I have created a Pull request. #155

from laminas-cache.

boesing avatar boesing commented on August 23, 2024

I think we need an optimization of the check regarding the key length.
I dont think that regex is the best way to verify a maximum length of a string.


PSR-16 states that at least 64 characters have to be supported. Thats the minimum, not the maximum.
You can read about this here: https://www.php-fig.org/psr/psr-16/

Key - A string of at least one character that uniquely identifies a cached item. Implementing libraries MUST support keys consisting of the characters A-Z, a-z, 0-9, _, and . in any order in UTF-8 encoding and a length of up to 64 characters. Implementing libraries MAY support additional characters and encodings or longer lengths, but MUST support at least that minimum. Libraries are responsible for their own escaping of key strings as appropriate, but MUST be able to return the original unmodified key string. The following characters are reserved for future extensions and MUST NOT be supported by implementing libraries: {}()/\@:

There are also integration tests which checks that:
https://github.com/php-cache/integration-tests/blob/824633c9547dc3c7ac5faeca03dd82939cdeb73c/src/SimpleCacheTest.php#L422
https://github.com/php-cache/integration-tests/blob/824633c9547dc3c7ac5faeca03dd82939cdeb73c/src/CachePoolTest.php#L122

from laminas-cache.

boesing avatar boesing commented on August 23, 2024

Okay, I've had some investigations and came to a conclusion:
For the 2.x versions, the redis adapter v1 is in charge.

To fix this problem, we wont be able to totally avoid mb_strlen but I think I got something which at least will decrease the need for it.
However, due to the fact that this component does not yet require mbstring as an extension, we cannot rely on it. Using the symfony polyfill was an idea, but it uses iconv which could be missing as well.

So instead of changing the regex handling here, I will prepare a new minor version of the redis adapter which decreases the maximum key length to 65534 which will ensure that the regex of the decorator won't blow up.

In v2 of the redis adapter, the 512 MB will be re-added.
In v3 of the cache component, mbstring will be required (see #157) and the following code might work good enough:

https://3v4l.org/GURWn

// This already exists tho
if ($this->maximumKeyLength === Capabilities::UNLIMITED_KEY_LENGTH) {
    return;
}

// a single UTF-8 character may use up to 4 characters, so we do a naive calculation here to verify if that limit is exceeded
// keep in mind that for the redis adapter, a cache key must exceed 128000000 characters (128MB) to not pass this check!!!
$maximumNonUnicodeKeyLength = $this->maximumKeyLength / 4;
if (!isset($key[$maximumNonUnicodeKeyLength + 1])) {
    return;
}

// current functionality
$validator = function (string $key): bool {
    return ! (bool) preg_match('/^.{'. $this->maximumKeyLength . ',}$/u',$key);
};

// if the key length exceeds the maximum quantifier length, use `mb_strlen`
if ($this->maximumKeyLength > 65535) {
    $validator = function (string $key): bool {
        return mb_strlen($key, 'UTF-8') <= $this->maximumKeyLength;
    };
}

if ($validator($key)) {
    return;
}

throw new SimpleCacheInvalidArgumentException(sprintf(
    'Invalid key "%s" provided; key is too long. Must be no more than %d characters',
    $key,
    $this->maximumKeyLength
));

What do you think?

from laminas-cache.

10n avatar 10n commented on August 23, 2024

Maybe just use if else to avoid declaring an anonymous function that is immediately overridden ?

from laminas-cache.

SorX14 avatar SorX14 commented on August 23, 2024

From PSR-16, the enforcement of the minimum key length is already achieved with the current code and doesn't exceed regex limits.

Redis allows for 512 MB for key length, which is a size limit, not a character length. I believe the current implementation (https://github.com/laminas/laminas-cache-storage-adapter-redis/pull/14/files) is incorrect as a unicode 512000000 character string exceeds this.

If the adapters returned a max byte size for the key, you can derive the max character length with multiplication and only need strlen. This removes any headaches with character encoding and puts the onus on the adapter to report its capabilities. Perhaps a flag to say which calculation its using would be helpful for BC.

The goal of caching is performance and mb_strlen not only requires slower introspection of the source, but it also carries the baggage of a new extension. By changing to byte limit instead of character length you'll be able to achieve validation while retaining speed.

The check for max limit is maybe out of the remit of the PSR-16 code considering as enforcement hasn't been recommended, although the simple error message is convenient.

from laminas-cache.

boesing avatar boesing commented on August 23, 2024

@SorX14 good point. Using mb_strlen on a 4-byte character would lead to a result of 1.

So maybe not depending on mbstring at all would be a good thing. Maybe using utf8_decode along with strlen as suggested by @ghostwriter would be a better approach. But utf8 characters which will consume more than a Single Byte wont be detected either as these are converted to a single ? as of how I understand the documentation of utf8_decode.

Maybe we keep the maximum key length of redis at 65534 as it should be long enough anyway. If some one needs longer keys, they should redis directly rather than a library. 🤷🏼‍♂️

from laminas-cache.

boesing avatar boesing commented on August 23, 2024

Closed with #158

from laminas-cache.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.