Giter Club home page Giter Club logo

Comments (34)

nyamsprod avatar nyamsprod commented on July 22, 2024

Basically what you suggesting is adding the following methods :

League\Csv\AbstractCsv::setOutputBOM($use_bom_on_output); //$use_bom_on_output is a boolean
League\Csv\AbstractCsv::hasOutputBOM(); // returns true or false

Because we don't want any BC break by default:

League\Csv\AbstractCsv::hasOutputBOM(); returns false

To be clear this will only have an effect on the League\Csv\AbstractCsv::output method.
If you have a better names they are welcomed

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

Ok, you can make it as you wish :)

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

Since it will work on output.. it will be available on both classes Reader and Writer

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

These names are ok for me.

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

The dev-master branch now supports BOM on output you can test it and if everything is good for you it will be release hopefully next week as version 6.3

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

Checked, it works. Thanks a lot!

from csv.

Heart1010 avatar Heart1010 commented on July 22, 2024

@RomeroMsk Do

$bom = chr(239).chr(187).chr(191);

work for you (which os? It should work on MS Excel Win but not an Mac?)? I've read this as "best solution" for Excel http://stackoverflow.com/questions/155097/microsoft-excel-mangles-diacritics-in-csv-files/1648671#1648671 which uses

/**
   * Export an array as downladable Excel CSV
   * @param array   $header
   * @param array   $data
   * @param string  $filename
   */
  function toCSV($header, $data, $filename) {
    $sep  = "\t";
    $eol  = "\n";
    $csv  =  count($header) ? '"'. implode('"'.$sep.'"', $header).'"'.$eol : '';
    foreach($data as $line) {
      $csv .= '"'. implode('"'.$sep.'"', $line).'"'.$eol;
    }
    $encoded_csv = mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');
    header('Content-Description: File Transfer');
    header('Content-Type: application/vnd.ms-excel');
    header('Content-Disposition: attachment; filename="'.$filename.'.csv"');
    header('Content-Transfer-Encoding: binary');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Pragma: public');
    header('Content-Length: '. strlen($encoded_csv));
    echo chr(255) . chr(254) . $encoded_csv;
    exit;
  }
PHP can input and output Unicode, but a little different from what Microsoft means: 
when Microsoft says "Unicode", it unexplicitly means little-endian UTF-16 with 
BOM(FF FE = chr(255).chr(254)), whereas PHP's "UTF-16" means big-endian with 
BOM. For this reason, PHP does not seem to be able to output Unicode CSV file for 
Microsoft Excel. Solving this problem is quite simple: just put BOM infront of 
UTF-16LE string.

Example:

$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding( $utf8_str, 'UTF-16LE', 'UTF-8');

Another problem is that excel opens the csv in one line...

UTF-16LE solution for CSV for Excel by Eugene Murai works well:
$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding( $utf8_str, 'UTF-16LE', 'UTF-8');

However, then Excel on Mac OS X doesn't identify columns properly and its puts each 
whole row in its own cell. In order to fix that, use TAB "t" character as CSV delimiter 
rather than comma or colon.

Which encoding opens CSV files correctly with Excel on both Mac and Windows? see http://stackoverflow.com/questions/6588068/which-encoding-opens-csv-files-correctly-with-excel-on-both-mac-and-windows

=> So using UTF-16LE with BOM and tab delimiter should "the best way" to store the csv file (for MS Excel).

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

It works for me on Windows and doesn't work on Mac. But we decide to leave this problem as is.
I think, you can try to convert encoding of content to UTF-16LE, add BOM and change delimeter - now Writer allows all of this, I belive.

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

After more researchs on the BOM issue I realize that to really resolve this issue you need to know the BOM sequence specific to the CSV encoding charset. I think we should change the signature of the setBOMOnOutput method. This method should accept a BOM sequence or a Class constant which refers to the available BOM sequence.
The developer who wants to handle BOM would require to set the proper encoding AND the proper BOM sequence attached to it. This is more verbose but less error prone (Of note I'm using the stream capability as League/Csv default encoding charset is UTF8.)

Here's a example code:

use League\Csv\Reader;

$csv = new Reader::createFromPath('/path/to/my/file.csv');
$csv->setBOM(Reader::BOM_UTF16LE);
$csv->appendStreamFilter('convert.utf16encode');
$csv->output('file.csv');

Thoughts ?
You can look here for more information

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

Looks good for me. Did you try to open UTF-16 encoded file with right BOM sequence in MS Excel on Mac and Win?

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

No I did not, first I don't have a Mac ... and I don't use MS Excel at all 🎱 That being said, implementing this way should do it. The only tradeoffs are:

  • You need to know which client is going to open you CSV prior to downloading
  • There's the extra work of providing the stream to convert on the fly your UTF-8 encoded CSV into UTF-16 (if your document is do be opened in MS Excel Mac if I follow correctly)

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

I can ask my collegue to check the file on Mac if you generate it with UTF-16LE + BOM.

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

The changes are done. Now the new methods are :

use League\Csv\Reader;

$csv = new Reader::createFromPath('/path/to/my/file.csv');
$bom_sequence = $csv->getBOMOnOutput(); // returns '';
$csv->setBOMOnOutput(Reader::BOM_UTF16LE);
$bom_sequence = $csv->getBOMOnOutput(); // returns "\xFF\xFE"
$csv->appendStreamFilter('convert.utf16encode');
$csv->output('file.csv');

don't forget to implement the stream filter converter otherwise the output will be in UTF-8 which defeat the BOM sequence addition. you can use the FilterTranscode class from the example for instance to do so.

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

Sorry, out of my working place now. Can check changes only after weekend.

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

no problem I can wait 👍

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

Hello. I can't make Writer to use filter. Can you help me?
I've got FilterTranscode example class and changed only namespace. Then I tried to use it like this:

    stream_filter_register(FilterTranscode::FILTER_NAME . '*', '\common\components\FilterTranscode');
    $writer = Writer::createFromFileObject(new \SplTempFileObject);
    $writer->appendStreamFilter('convert.transcode.UTF-8:UTF-16LE');
    $writer->setBOMOnOutput(Writer::BOM_UTF16_LE);
    $writer->insertOne($content);
    $writer->output($fileName);

and got The stream filter API can not be used LogicException. What am I doing wrong?

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

Streams do not work with createFromFileObject because SplFileObject poorly supports PHP stream
you can make it work using:

$writer = Writer::createFromPath('/my/path/to/my/file.csv');
//or
$writer = Writer::createFromPath('php://output'); 

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

Now I'm getting Cannot rewind file php://filter/write=convert.transcode.UTF-8:UTF-16LE/resource=php://output

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

Here's what I did to test on my computer the file is located in the examples directory.

error_reporting(-1);
ini_set('display_errors', '1');

use League\Csv\Reader;
use League\Csv\Writer;
use lib\FilterTranscode;

require '../vendor/autoload.php';

stream_filter_register(FilterTranscode::FILTER_NAME."*", "\lib\FilterTranscode");

$csv = Reader::createFromPath(__DIR__.'/data/prenoms.csv');
$csv->setBOMOnOutput(Reader::BOM_UTF16_LE);
$csv->appendStreamFilter(FilterTranscode::FILTER_NAME."UTF-8:UTF-16LE");
$csv->output('test.csv');

Hope it helps

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

Ok, I've got a result with $writer = Writer::createFromPath('/my/path/to/my/file.csv').
MS Excel on Windows didn't recognize a separators in CSV (just loaded entire lines), but with right encoded symbols.
Waiting for collegue to test it on Mac...

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

The delimiter can be adjust using setDelimiter method

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

Yes, I know. But after manipulations with encoding MS Excel is not recognizing delimiters correctly.
With UTF-8 everything is ok, but with UTF-16LE - not.
The same situation is on Mac.

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

it's because the tabulation is still in UTF-8 👎 you should use the UTF-16 tabulation character instead

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

But I'm using ; as delimiter, beacuse MS Excel is awaiting it in CSV files.

from csv.

Heart1010 avatar Heart1010 commented on July 22, 2024

...see my comment above. MS Excel "wants" tab as delimiter.

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

Ok, with setDelimiter("\t") it works.

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

Great!!

I've added a method to detect the BOM in the Input CSV so we will end up having:

  • getBOMOnInput (detect and return the BOM sequence used on the input CSV if any)
  • setBOMOnOutput (set the BOM sequence to prepend on output methods (ie: output and __toString)
  • getBOMOnOutput (get the BOM sequence that will be prepended on output methods)

Removing the Input BOM is already possible with the existing extract methods.

When this is stable and bug free I think we will have a complete new and nice feature.

Thoughts ?

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

For me sounds good, thanks for your help!

from csv.

Heart1010 avatar Heart1010 commented on July 22, 2024

Yes 👍

...and make a section for that infos -also regarding MS Excel- in your documentation page would be helpful imho :-)!

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

I think @RomeroMsk could write that section, since like I said I'm no MS Excel expert :) . Let me add everything in the source and we will update the documentation when all is done and stable

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

I'm not an expert too :) Also my English is not very good.

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

http://csv.thephpleague.com/bom/ <- the documentation you can improve it directly via pull request if needed.

from csv.

nyamsprod avatar nyamsprod commented on July 22, 2024

the stable release version 6.3 was released with the stable feature. Of note the methods names have been changed.

from csv.

RomeroMsk avatar RomeroMsk commented on July 22, 2024

Checked 6.3, everything is fine, thanks! 👍

from csv.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.