
Comments (4)

WanWizard commented on August 23, 2024

Interesting observation.

I see that the PHP docs now contain a warning about this on the filter page, and the same warning as a note on the runtime configuration page: https://www.php.net/manual/en/filter.configuration.php. Those ini settings cannot be set at runtime, so there is no point in trying to fix this issue from within the code.

I assume you've also seen this comment: https://www.php.net/manual/en/filter.filters.sanitize.php#129098?

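```php
<?php
error_reporting(E_ALL);

// Polyfill from the php.net user comment linked above: removes NUL bytes and
// anything that looks like a tag (including an unterminated "<..." at the end),
// then encodes single and double quotes as numeric entities.
function filter_string_polyfill(string $string): string
{
    $str = preg_replace('/\x00|<[^>]*>?/', '', $string);
    return str_replace(["'", '"'], ['&#39;', '&#34;'], $str);
}

$string = "óÓ";

// FILTER_SANITIZE_STRING is deprecated since PHP 8.1, so with E_ALL this line
// also prints a deprecation notice.
echo filter_var($string, FILTER_SANITIZE_STRING) . PHP_EOL;

echo htmlspecialchars($string) . PHP_EOL;

echo strip_tags($string) . PHP_EOL;

// ENT_QUOTES is a flag for htmlspecialchars(), not for strip_tags().
echo htmlspecialchars(strip_tags($string), ENT_QUOTES) . PHP_EOL;

echo filter_string_polyfill($string) . PHP_EOL;
```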


bartlomiejb commented on August 23, 2024

Thanks for the quick reply!

Yes, I saw the comment with this proposed polyfill. I think it is a good one, and we can use it instead of the current solution in Security::strip_tags(). Should I prepare a patch, or could you do it, please? It would be nice to have this fixed on "1.9/develop".
One note, however: it is NOT an exact replacement! But I guess it is close enough.

Here is another piece of example code that shows the differences (thankfully, no more problems with ó/Ó :)):

```php
<?php

$ss = [
        "óÓ | T<A>G | Abc < tag > Def | Ghi < Jkl | t<a>g",
        "only open <xxx",
        "AAA<BBB>>CCC",
];
foreach ($ss as $s)
{
        echo "input:\t\t\t$s\n";
        echo str_repeat('-', 8*3 + strlen($s)) . "\n";
        echo "FILTER_SANITIZE_STRING:\t" . filter_var($s, FILTER_SANITIZE_STRING) . "\n";
        echo "strip_tags:\t\t" . strip_tags($s) . "\n";
        echo "preg_replace:\t\t" . preg_replace('/\x00|<[^>]*>?/', '', $s) . "\n";
        echo "\n";
}
```

The output is:

```
input:                  óÓ | T<A>G | Abc < tag > Def | Ghi < Jkl | t<a>g
--------------------------------------------------------------------------
FILTER_SANITIZE_STRING: óÓ | TG | Abc  Def | Ghi
strip_tags:             óÓ | TG | Abc < tag > Def | Ghi < Jkl | tg
preg_replace:           óÓ | TG | Abc  Def | Ghi g

input:                  only open <xxx
--------------------------------------
FILTER_SANITIZE_STRING: only open
strip_tags:             only open
preg_replace:           only open

input:                  AAA<BBB>>CCC
------------------------------------
FILTER_SANITIZE_STRING: AAA>CCC
strip_tags:             AAA>CCC
preg_replace:           AAA>CCC
```

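Note the difference in the first example between FILTER_SANITIZE_STRING and preg_replace: the regex treats "< Jkl | t<a>" as a single tag and keeps the trailing "g", whereas FILTER_SANITIZE_STRING drops everything after "Ghi".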
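To see why the trailing "g" survives, it helps to list what the polyfill's pattern actually matches. Because [^>]* also accepts "<", the last match starts at "< Jkl" and runs through the nested "t<a" up to the first ">". A small sketch; the helper name polyfill_matches() is just for illustration:

```php
<?php
// List every substring the polyfill's pattern would remove.
function polyfill_matches(string $s): array
{
    // Same pattern as filter_string_polyfill(): a NUL byte, or '<' followed by
    // any run of non-'>' bytes and an optional closing '>'.
    preg_match_all('/\x00|<[^>]*>?/', $s, $m);
    return $m[0];
}

$s = "óÓ | T<A>G | Abc < tag > Def | Ghi < Jkl | t<a>g";
foreach (polyfill_matches($s) as $match) {
    echo 'removed: [' . $match . "]\n";
}
```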

It is all really interesting indeed... I mean, what a mess this is ;) I did a little digging in the PHP source code; here are some pointers, if you are interested.
First I will quote some snippets from the implementations of the various filters/functions, with links to the PHP 8.3.9 source code, and then I will share some conclusions:

FILTER_SANITIZE_SPECIAL_CHARS, ext/filter/filter.c:

```c
     { "special_chars",   FILTER_SANITIZE_SPECIAL_CHARS, php_filter_special_chars   },
```

https://github.com/php/php-src/blob/PHP-8.3.9/ext/filter/sanitizing_filters.c#L222

```c
void php_filter_special_chars(PHP_INPUT_FILTER_PARAM_DECL)
	php_filter_strip(value, flags);

	/* encodes ' " < > & \0 to numerical entities */
	enc['\''] = enc['"'] = enc['<'] = enc['>'] = enc['&'] = enc[0] = 1;

	/* if strip low is not set, then we encode them as &#xx; */
	memset(enc, 1, 32);

	if (flags & FILTER_FLAG_ENCODE_HIGH) {
		memset(enc + 127, 1, sizeof(enc) - 127);
	}

	php_filter_encode_html(value, enc);
```
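A quick way to see the effect of the enc[] table above is to run FILTER_SANITIZE_SPECIAL_CHARS and htmlspecialchars() on the same input: the filter writes decimal numeric entities (and also encodes bytes below 32, such as a tab), while htmlspecialchars() uses named entities. This sketch assumes PHP 8.1+, where htmlspecialchars() defaults to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401; the helper name is hypothetical:

```php
<?php
function compare_special_chars(string $s): array
{
    return [
        // php_filter_special_chars(): decimal entities for ' " < > & and bytes 0-31
        'FILTER_SANITIZE_SPECIAL_CHARS' => filter_var($s, FILTER_SANITIZE_SPECIAL_CHARS),
        // htmlspecialchars(): named entities, &#039; for the single quote
        'htmlspecialchars' => htmlspecialchars($s),
    ];
}

foreach (compare_special_chars("<a href='x'>&\tóÓ") as $name => $out) {
    echo str_pad($name . ':', 32) . $out . PHP_EOL;
}
```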
FILTER_SANITIZE_FULL_SPECIAL_CHARS, ext/filter/filter.c:

```c
     { "full_special_chars",   FILTER_SANITIZE_FULL_SPECIAL_CHARS, php_filter_full_special_chars   },
```

https://github.com/php/php-src/blob/PHP-8.3.9/ext/filter/sanitizing_filters.c#L243

```c
void php_filter_full_special_chars(PHP_INPUT_FILTER_PARAM_DECL)
	if (!(flags & FILTER_FLAG_NO_ENCODE_QUOTES)) {
		quotes = ENT_QUOTES;
	} else {
		quotes = ENT_NOQUOTES;
	}
	buf = php_escape_html_entities_ex(
		(unsigned char *) Z_STRVAL_P(value), Z_STRLEN_P(value), /* all */ 1, quotes,
		/* charset_hint */ NULL, /* double_encode */ 0, /* quiet */ 0);
```

https://github.com/php/php-src/blob/PHP-8.3.9/ext/standard/html.c#L1103

```c
PHPAPI zend_string *php_escape_html_entities_ex(const unsigned char *old, size_t oldlen, int all, int flags, const char *hint_charset, bool double_encode, bool quiet)
	/* ... a lot of code, including substitutions based on entity tables ... */
```

htmlspecialchars():
https://github.com/php/php-src/blob/PHP-8.3.9/ext/standard/html.c#L1345

```c
PHP_FUNCTION(htmlspecialchars)
{
	php_html_entities(INTERNAL_FUNCTION_PARAM_PASSTHRU, 0);
}
```

https://github.com/php/php-src/blob/PHP-8.3.9/ext/standard/html.c#L1316

```c
static void php_html_entities(INTERNAL_FUNCTION_PARAMETERS, int all)
	...
	bool double_encode = 1;
	...
	replaced = php_escape_html_entities_ex(
		(unsigned char*)ZSTR_VAL(str), ZSTR_LEN(str), all, (int) flags,
		hint_charset ? ZSTR_VAL(hint_charset) : NULL, double_encode, /* quiet */ 0);
```
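The all argument is exactly what separates htmlspecialchars() from htmlentities(): htmlentities() calls the same php_html_entities() helper with all = 1. Since php_filter_full_special_chars() also passes all = 1 (plus double_encode = 0), FILTER_SANITIZE_FULL_SPECIAL_CHARS should behave more like htmlentities() than like htmlspecialchars(). A sketch to check this, assuming default_charset is UTF-8 (the PHP default); the helper name is made up:

```php
<?php
function compare_entity_functions(string $s): array
{
    return [
        // php_html_entities(..., all = 0), double_encode defaults to true
        'htmlspecialchars'   => htmlspecialchars($s),
        // php_html_entities(..., all = 1), double_encode defaults to true
        'htmlentities'       => htmlentities($s),
        // php_escape_html_entities_ex(..., all = 1, ..., double_encode = 0)
        'FULL_SPECIAL_CHARS' => filter_var($s, FILTER_SANITIZE_FULL_SPECIAL_CHARS),
    ];
}

foreach (['óÓ', '&amp;'] as $input) {
    echo "input: $input\n";
    foreach (compare_entity_functions($input) as $name => $out) {
        echo '  ' . str_pad($name . ':', 20) . $out . PHP_EOL;
    }
}
```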
FILTER_SANITIZE_STRING (also registered under the name "stripped"), ext/filter/filter.c:

```c
     { "string",          FILTER_SANITIZE_STRING,        php_filter_string          },
     { "stripped",        FILTER_SANITIZE_STRING,        php_filter_string          },
```

https://github.com/php/php-src/blob/PHP-8.3.9/ext/filter/sanitizing_filters.c#L168

```c
void php_filter_string(PHP_INPUT_FILTER_PARAM_DECL)
...
	/* strip high/strip low ( see flags )*/
	php_filter_strip(value, flags);

	if (!(flags & FILTER_FLAG_NO_ENCODE_QUOTES)) {
		enc['\''] = enc['"'] = 1;
	}
	if (flags & FILTER_FLAG_ENCODE_AMP) {
		enc['&'] = 1;
	}
	if (flags & FILTER_FLAG_ENCODE_LOW) {
		memset(enc, 1, 32);
	}
	if (flags & FILTER_FLAG_ENCODE_HIGH) {
		memset(enc + 127, 1, sizeof(enc) - 127);
	}

	php_filter_encode_html(value, enc);

	new_len = php_strip_tags_ex(Z_STRVAL_P(value), Z_STRLEN_P(value), NULL, 0, 1);
```

https://github.com/php/php-src/blob/PHP-8.3.9/ext/standard/string.c#L4799

```c
/* [[state machine...]] */
PHPAPI size_t php_strip_tags_ex(char *rbuf, size_t len, const char *allow, size_t allow_len, bool allow_tag_spaces)
	/* ... a lot of code removing <...> etc. ... */
```

strip_tags():
https://github.com/php/php-src/blob/PHP-8.3.9/ext/standard/string.c#L4522

```c
PHP_FUNCTION(strip_tags)
...
	ZSTR_LEN(buf) = php_strip_tags_ex(ZSTR_VAL(buf), ZSTR_LEN(str), allowed_tags, allowed_tags_len, 0);
...
```

A few conclusions:

  • the implementations of FILTER_SANITIZE_FULL_SPECIAL_CHARS and FILTER_SANITIZE_SPECIAL_CHARS are quite different despite the similar names: the FULL version ends up in php_escape_html_entities_ex(), which is much more complicated than the php_filter_encode_html() used by the version without FULL
  • FILTER_SANITIZE_FULL_SPECIAL_CHARS does indeed run more or less the same code as htmlspecialchars(); note, however, that FILTER_SANITIZE_FULL_SPECIAL_CHARS calls php_escape_html_entities_ex() with the (PHP-internal) parameter all = 1 and double_encode = 0, while htmlspecialchars() passes all = 0 and double_encode = 1 by default (all = 1 is what htmlentities() uses). I guess that is why the former replaces ó with &oacute; and the latter does not
  • the story is similar for FILTER_SANITIZE_STRING and strip_tags(): both end up calling php_strip_tags_ex(), but with a different value for the last parameter, allow_tag_spaces (1 for the filter, 0 for strip_tags()). I guess that is why they behave differently: with allow_tag_spaces = 0, a "<" followed by whitespace does not start a tag, which matches the "Abc < tag > Def" output above
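To illustrate the allow_tag_spaces difference, here is a heavily simplified PHP model of php_strip_tags_ex(). It is a sketch based on my reading of the source, not a port: it keeps only the rules that matter for the examples in this thread (with allow_tag_spaces off, a "<" followed by whitespace is plain text; a nested "<" inside a tag increases the depth; ">" closes the tag; an unterminated tag swallows the rest of the string) and ignores quotes inside tags, comments, PHP tags, NUL bytes and the allowed-tags list. The name strip_tags_model() is hypothetical:

```php
<?php
// Hypothetical, heavily simplified model of php_strip_tags_ex(); NOT a port.
// It ignores quotes inside tags, comments, PHP tags, NUL bytes and allowed tags.
function strip_tags_model(string $s, bool $allowTagSpaces): string
{
    $out = '';
    $inTag = false;
    $depth = 0; // number of extra '<' seen inside the current tag
    $len = strlen($s);

    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];
        $next = ($i + 1 < $len) ? $s[$i + 1] : '';
        // Same set as C isspace() in the "C" locale.
        $nextIsSpace = $next !== '' && str_contains(" \t\n\v\f\r", $next);
        // With allow_tag_spaces off, "< " is plain text rather than a tag start.
        $opensTag = $c === '<' && ($allowTagSpaces || !$nextIsSpace);

        if (!$inTag) {
            if ($opensTag) {
                $inTag = true;  // start dropping the tag from here
            } else {
                $out .= $c;     // regular text, including a stray '>'
            }
            continue;
        }

        // Inside a tag every byte is dropped; only nesting is tracked.
        if ($opensTag) {
            $depth++;
        } elseif ($c === '>') {
            if ($depth > 0) {
                $depth--;
            } else {
                $inTag = false; // tag closed, back to text
            }
        }
    }

    return $out; // an unterminated tag swallows the rest of the input
}

$s = "óÓ | T<A>G | Abc < tag > Def | Ghi < Jkl | t<a>g";
echo 'allow_tag_spaces = 0: ' . strip_tags_model($s, false) . PHP_EOL;
echo 'allow_tag_spaces = 1: ' . strip_tags_model($s, true) . PHP_EOL;
```

For the inputs used in this thread, the model reproduces the tag removal of strip_tags() (allow_tag_spaces = 0) and of FILTER_SANITIZE_STRING (allow_tag_spaces = 1); the filter additionally encodes quotes before stripping.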


bartlomiejb commented on August 23, 2024

Or, better yet, let's use strip_tags() + preg_replace() to replace FILTER_SANITIZE_STRING: it gives the best results in my tests. Here is the updated example code and its output (I omitted the quote replacement only for brevity; it should of course be in the final version in Fuel):

```php
<?php

$ss = [
        "óÓ | T<A>G | Abc < tag > Def | Ghi < Jkl | t<a>g",
        "only open <xxx",
        "AAA<BBB>>CCC",
        "Some \"' <bizzare> string & to Sanitize < !$@%",
];
foreach ($ss as $s)
{
        echo "input:\t\t\t\t$s\n";
        echo str_repeat('-', 8*4 + strlen($s)) . "\n";
        echo "FILTER_SANITIZE_STRING:\t\t" . filter_var($s, FILTER_SANITIZE_STRING) . "\n";
        echo "strip_tags:\t\t\t" . strip_tags($s) . "\n";
        echo "preg_replace:\t\t\t" . preg_replace('/\x00|<[^>]*>?/', '', $s) . "\n";
        echo "strip_tags + preg_replace:\t" . preg_replace('/\x00|<[^>]*>?/', '', strip_tags($s)) . "\n";
        echo "\n";
}
```

Output:

```
input:                          óÓ | T<A>G | Abc < tag > Def | Ghi < Jkl | t<a>g
----------------------------------------------------------------------------------
FILTER_SANITIZE_STRING:         óÓ | TG | Abc  Def | Ghi
strip_tags:                     óÓ | TG | Abc < tag > Def | Ghi < Jkl | tg
preg_replace:                   óÓ | TG | Abc  Def | Ghi g
strip_tags + preg_replace:      óÓ | TG | Abc  Def | Ghi

input:                          only open <xxx
----------------------------------------------
FILTER_SANITIZE_STRING:         only open
strip_tags:                     only open
preg_replace:                   only open
strip_tags + preg_replace:      only open

input:                          AAA<BBB>>CCC
--------------------------------------------
FILTER_SANITIZE_STRING:         AAA>CCC
strip_tags:                     AAA>CCC
preg_replace:                   AAA>CCC
strip_tags + preg_replace:      AAA>CCC

input:                          Some "' <bizzare> string & to Sanitize < !$@%
-----------------------------------------------------------------------------
FILTER_SANITIZE_STRING:         Some &#34;&#39;  string & to Sanitize
strip_tags:                     Some "'  string & to Sanitize < !$@%
preg_replace:                   Some "'  string & to Sanitize
strip_tags + preg_replace:      Some "'  string & to Sanitize
```
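Combined with the quote encoding that was left out above for brevity, the replacement could look roughly like this. It is only a sketch: sanitize_string() is a hypothetical name, not the code that ended up in Fuel's Security::strip_tags(), and the &#39; / &#34; entities are chosen to match what FILTER_SANITIZE_STRING produced in the output above:

```php
<?php
// Hypothetical stand-in for filter_var($s, FILTER_SANITIZE_STRING).
function sanitize_string(string $s): string
{
    // 1. strip_tags() removes well-formed tags, but keeps "< tag >" and "< Jkl".
    $s = strip_tags($s);
    // 2. Remove NUL bytes and anything still starting with '<', including an
    //    unterminated "<..." at the end of the string.
    $s = preg_replace('/\x00|<[^>]*>?/', '', $s);
    // 3. Encode quotes as decimal entities, like FILTER_SANITIZE_STRING did.
    return str_replace(["'", '"'], ['&#39;', '&#34;'], $s);
}

echo sanitize_string("Some \"' <bizzare> string & to Sanitize < !\$@%") . PHP_EOL;
```

Unlike FILTER_SANITIZE_STRING, this encodes the quotes after stripping tags rather than before, which should only matter for quotes that appear inside tags.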


WanWizard commented on August 23, 2024

Could you create a pull request for it? You deserve the credit for this 😁.
