php-utf-8's People

Contributors

corpsee, franzliedke, fsx, petsagouris

Forkers

featherbb

php-utf-8's Issues

strcspn fails

Hello, I traced a bug to the strcspn code in the utf8 library bundled with FluxBB, which appears to be the same code as in your project and in FSX's. Your current tests pass but are insufficient: the function fails several tests I wrote for English and Japanese input, which compare its results against PHP's stock strcspn.

I found a potential replacement function from the Kohana team. It passes all of the tests except one, where the mask consists of Japanese characters. The following script lets you choose which version (the corpsee version or the Kohana version) is used.

The corpsee/FluxBB version fails tests 1a, 1c, 1d, and 3f. The Kohana version fails only 3f.

<?php
# Note: I have posted this as an issue at https://github.com/corpsee/php-utf-8/issues/2

include_once('/home/japtest/public_html/japantest/mb_utils.php');
include_once('/home/japtest/public_html/japantest/mb_fluxlib/utils/ascii.php');

/**
 * UTF8::strcspn
 *
 * @package    Kohana
 * @author     Kohana Team
 * @copyright  (c) 2007-2011 Kohana Team
 * @copyright  (c) 2005 Harry Fuecks
 * @license    http://www.gnu.org/licenses/old-licenses/lgpl-2.1.txt
 * From https://github.com/ushahidi/Sweeper/blob/master/system/utf8/strcspn.php
 */
function kohana_strcspn($str, $mask, $offset = NULL, $length = NULL) {
    if ($str == '' OR $mask == '') return 0;

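    // Pure ASCII input: defer to the native function, passing only the arguments that were supplied
    if (utf8_is_ascii($str) AND utf8_is_ascii($mask)) {
        if ($offset === NULL) return strcspn($str, $mask);
        if ($length === NULL) return strcspn($str, $mask, $offset);
        return strcspn($str, $mask, $offset, $length);
    }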

    if ($offset !== NULL OR $length !== NULL) {
        $str = mb_substr($str, $offset, $length);
    }

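    // Escape these characters:  - [ ] . : \ ^ /
    // The . and : are escaped to prevent possible warnings about POSIX regex elements
    // NOTE: the escaping step below is commented out, so the mask goes into the
    // character class verbatim and the caller must pre-escape it (see $chars in test 3).
    #$mask = preg_replace('#[-[\].:\\\\^/]#u', '\\\\$0', $mask);
    preg_match('/^[^'.$mask.']+/u', $str, $matches);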

    return isset($matches[0]) ? mb_strlen($matches[0]) : 0;
}

function sel_strcspn($str, $mask, $offset = NULL, $length = NULL) {
    # Leave exactly one of the following two lines uncommented to choose the version under test.
    # return utf8_strcspn($str, $mask, $offset, $length); # fluxbb/corpsee
    return kohana_strcspn($str, $mask, $offset, $length); # kohana team
}


print "\n1. English\n";
$s = "abcdefghijklmnop";
print "S: " . $s . "\n";
print "1a. U:" . sel_strcspn($s,'h',3) . " PHP:" . strcspn($s, 'h', 3) . " Should be 4 ('defg')\n";
print "1b. U:" . sel_strcspn($s,'d',3) . " PHP:" . strcspn($s, 'd', 3) . " Should be 0 ('') \n";


print "\n2. English HTML\n";
$s = "<b>A brief test.</b><div>"; # </b starts at 16. 21 is d.
# [] [25] Should be 0
$chars = '/>';
$pos = 21;
print "1c. [". sel_strcspn($s, $chars, $pos) . "] [" . strcspn($s, $chars, $pos) . "] Should be 3 \n";
$pos = 1;
print "1d. [". sel_strcspn($s, $chars, $pos) . "] [" . strcspn($s, $chars, $pos) . "] Should be 1 \n";


print "\n3. English HTML with Japanese\n";
$s = "<b>A brief test.</b><div>現在、企業ユーザー向けに多岐にわたるサービスを提供しています。</div>";
#WAS $chars = '/>';
$chars = '\/\>';
$pos = 21;

print "3a. U:" . sel_strcspn($s,">") . " PHP:" . strcspn($s,">") . " Should be 2 \n"; # 2 2 ^^2^^
print "3b. Char at pos {$pos} is: U:" . mb_substr($s,$pos,1) . " PHP:" . substr($s,$pos,1) . " Should be d \n"; # d d ^^d^^

print "S: " . $s . "\n";
print "3c. U:" . sel_strcspn($s, $chars, $pos) . " PHP:" . strcspn($s, $chars, $pos) . " Should be 0 \n"; # 0 3 ^^0^^

$chars = '向けに';
$pos = 26;
print "3d. U:" . sel_strcspn($s,">") . " PHP:" . strcspn($s,">") . " Should be 2 \n"; # 2 2 ^^2^^
print "3e. Char at pos {$pos} is: U:" . mb_substr($s,$pos,1) . " PHP:" . substr($s,$pos,1) . " Should be 在 \n"; 
print "3f. U:" . sel_strcspn($s, $chars, $pos) . " PHP:" . strcspn($s, $chars, $pos) . " Should be 8 \n"; # 8 utf8 chars: 在、企業ユーザー






print "\n\n--- The tests from this point on work. ---\n\n";
$str = 'iñtërnâtiônàlizætiøn';
print sel_strcspn($str,'t') . " ^^2^^ \n";
print sel_strcspn($str,'â') . " ^^6^^ \n";

$str = 'aeioustr';
print sel_strcspn($str,'tr') . "  " . strcspn($str,'tr') . "\n";

$str = 'internationalization';
print sel_strcspn($str,'a')  . "  " . strcspn($str,'a')  . "\n";

$str = "i\nñtërnâtiônàlizætiøn";
print sel_strcspn($str,'t')  . " ^^3^^ \n";

$str = "i\nñtërnâtiônàlizætiøn";
print sel_strcspn($str,"\n") . " ^^1^^ \n";


?>
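
For comparison, here is a minimal sketch of one way to avoid pre-escaping the mask by hand (as test 3 does with '\/\>'). It is only an illustration under stated assumptions, not the php-utf-8, FluxBB, or Kohana implementation: the name sketch_utf8_strcspn is hypothetical, the input is assumed to be valid UTF-8, and the mbstring extension is assumed to be loaded. It uses preg_quote() with '/' as the delimiter to escape the mask, and it passes an explicit length to mb_substr() rather than NULL, since older PHP versions treat a NULL length as 0.

```php
<?php
// Hypothetical helper for illustration only; it is not part of php-utf-8,
// FluxBB, or Kohana. Assumes the mbstring extension and valid UTF-8 input.
function sketch_utf8_strcspn($str, $mask, $offset = null, $length = null)
{
    if ($offset !== null || $length !== null) {
        $start = ($offset === null) ? 0 : $offset;
        // Use an explicit length instead of NULL, because older PHP
        // versions treat a NULL length in mb_substr() as 0.
        $len = ($length === null) ? mb_strlen($str, 'UTF-8') : $length;
        $str = mb_substr($str, $start, $len, 'UTF-8');
    }

    if ($str === '') {
        return 0;
    }
    if ($mask === '') {
        // An empty mask excludes nothing, so the whole slice counts.
        return mb_strlen($str, 'UTF-8');
    }

    // preg_quote() escapes regex metacharacters plus the '/' delimiter,
    // so the mask can be placed inside a character class as-is.
    $class = preg_quote($mask, '/');

    // Match the leading run of characters that are NOT in the mask.
    if (preg_match('/^[^' . $class . ']+/u', $str, $matches) !== 1) {
        return 0;
    }

    // Count characters (not bytes) in the matched run.
    return mb_strlen($matches[0], 'UTF-8');
}

$s = "<b>A brief test.</b><div>現在、企業ユーザー向けに多岐にわたるサービスを提供しています。</div>";
print sketch_utf8_strcspn($s, '/>', 21) . "\n";     // 3 ("div")
print sketch_utf8_strcspn($s, '向けに', 26) . "\n"; // 8 ("在、企業ユーザー")
```

With this approach the same unescaped mask can be given to both the sketch and the native strcspn, which makes side-by-side comparisons like the ones in the script above easier to read.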

Consider patchwork/utf8

Hello! Today I discovered your fork of php-utf8.

Just to let you know, I maintain a library with exactly the same scope:
https://github.com/nicolas-grekas/Patchwork-UTF8

I have known about php-utf8 for years, and I'm pretty sure Patchwork-UTF8 is worth considering as a replacement now that the php-utf8 developers have stopped working on it.

Best regards,
Nicolas
