Comments (34)
Basically what you suggesting is adding the following methods :
League\Csv\AbstractCsv::setOutputBOM($use_bom_on_output); //$use_bom_on_output is a boolean
League\Csv\AbstractCsv::hasOutputBOM(); // returns true or false
Because we don't want any BC break by default:
League\Csv\AbstractCsv::hasOutputBOM(); returns false
To be clear this will only have an effect on the League\Csv\AbstractCsv::output
method.
If you have a better names they are welcomed
from csv.
Ok, you can make it as you wish :)
from csv.
Since it will work on output.. it will be available on both classes Reader
and Writer
from csv.
These names are ok for me.
from csv.
The dev-master
branch now supports BOM on output you can test it and if everything is good for you it will be release hopefully next week as version 6.3
from csv.
Checked, it works. Thanks a lot!
from csv.
@RomeroMsk Do
$bom = chr(239).chr(187).chr(191);
work for you (which os? It should work on MS Excel Win but not an Mac?)? I've read this as "best solution" for Excel http://stackoverflow.com/questions/155097/microsoft-excel-mangles-diacritics-in-csv-files/1648671#1648671 which uses
/**
* Export an array as downladable Excel CSV
* @param array $header
* @param array $data
* @param string $filename
*/
function toCSV($header, $data, $filename) {
$sep = "\t";
$eol = "\n";
$csv = count($header) ? '"'. implode('"'.$sep.'"', $header).'"'.$eol : '';
foreach($data as $line) {
$csv .= '"'. implode('"'.$sep.'"', $line).'"'.$eol;
}
$encoded_csv = mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');
header('Content-Description: File Transfer');
header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment; filename="'.$filename.'.csv"');
header('Content-Transfer-Encoding: binary');
header('Expires: 0');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Pragma: public');
header('Content-Length: '. strlen($encoded_csv));
echo chr(255) . chr(254) . $encoded_csv;
exit;
}
PHP can input and output Unicode, but a little different from what Microsoft means:
when Microsoft says "Unicode", it unexplicitly means little-endian UTF-16 with
BOM(FF FE = chr(255).chr(254)), whereas PHP's "UTF-16" means big-endian with
BOM. For this reason, PHP does not seem to be able to output Unicode CSV file for
Microsoft Excel. Solving this problem is quite simple: just put BOM infront of
UTF-16LE string.
Example:
$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding( $utf8_str, 'UTF-16LE', 'UTF-8');
Another problem is that excel opens the csv in one line...
UTF-16LE solution for CSV for Excel by Eugene Murai works well:
$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding( $utf8_str, 'UTF-16LE', 'UTF-8');
However, then Excel on Mac OS X doesn't identify columns properly and its puts each
whole row in its own cell. In order to fix that, use TAB "t" character as CSV delimiter
rather than comma or colon.
Which encoding opens CSV files correctly with Excel on both Mac and Windows? see http://stackoverflow.com/questions/6588068/which-encoding-opens-csv-files-correctly-with-excel-on-both-mac-and-windows
=> So using UTF-16LE with BOM and tab delimiter should "the best way" to store the csv file (for MS Excel).
from csv.
It works for me on Windows and doesn't work on Mac. But we decide to leave this problem as is.
I think, you can try to convert encoding of content to UTF-16LE, add BOM and change delimeter - now Writer allows all of this, I belive.
from csv.
After more researchs on the BOM issue I realize that to really resolve this issue you need to know the BOM sequence specific to the CSV encoding charset. I think we should change the signature of the setBOMOnOutput method. This method should accept a BOM sequence or a Class constant which refers to the available BOM sequence.
The developer who wants to handle BOM would require to set the proper encoding AND the proper BOM sequence attached to it. This is more verbose but less error prone (Of note I'm using the stream capability as League/Csv default encoding charset is UTF8.)
Here's a example code:
use League\Csv\Reader;
$csv = new Reader::createFromPath('/path/to/my/file.csv');
$csv->setBOM(Reader::BOM_UTF16LE);
$csv->appendStreamFilter('convert.utf16encode');
$csv->output('file.csv');
Thoughts ?
You can look here for more information
from csv.
Looks good for me. Did you try to open UTF-16 encoded file with right BOM sequence in MS Excel on Mac and Win?
from csv.
No I did not, first I don't have a Mac ... and I don't use MS Excel at all 🎱 That being said, implementing this way should do it. The only tradeoffs are:
- You need to know which client is going to open you CSV prior to downloading
- There's the extra work of providing the stream to convert on the fly your UTF-8 encoded CSV into UTF-16 (if your document is do be opened in MS Excel Mac if I follow correctly)
from csv.
I can ask my collegue to check the file on Mac if you generate it with UTF-16LE + BOM.
from csv.
The changes are done. Now the new methods are :
use League\Csv\Reader;
$csv = new Reader::createFromPath('/path/to/my/file.csv');
$bom_sequence = $csv->getBOMOnOutput(); // returns '';
$csv->setBOMOnOutput(Reader::BOM_UTF16LE);
$bom_sequence = $csv->getBOMOnOutput(); // returns "\xFF\xFE"
$csv->appendStreamFilter('convert.utf16encode');
$csv->output('file.csv');
don't forget to implement the stream filter converter otherwise the output will be in UTF-8 which defeat the BOM sequence addition. you can use the FilterTranscode class from the example for instance to do so.
from csv.
Sorry, out of my working place now. Can check changes only after weekend.
from csv.
no problem I can wait 👍
from csv.
Hello. I can't make Writer to use filter. Can you help me?
I've got FilterTranscode example class and changed only namespace. Then I tried to use it like this:
stream_filter_register(FilterTranscode::FILTER_NAME . '*', '\common\components\FilterTranscode');
$writer = Writer::createFromFileObject(new \SplTempFileObject);
$writer->appendStreamFilter('convert.transcode.UTF-8:UTF-16LE');
$writer->setBOMOnOutput(Writer::BOM_UTF16_LE);
$writer->insertOne($content);
$writer->output($fileName);
and got The stream filter API can not be used
LogicException. What am I doing wrong?
from csv.
Streams do not work with createFromFileObject because SplFileObject poorly supports PHP stream
you can make it work using:
$writer = Writer::createFromPath('/my/path/to/my/file.csv');
//or
$writer = Writer::createFromPath('php://output');
from csv.
Now I'm getting Cannot rewind file php://filter/write=convert.transcode.UTF-8:UTF-16LE/resource=php://output
from csv.
Here's what I did to test on my computer the file is located in the examples directory.
error_reporting(-1);
ini_set('display_errors', '1');
use League\Csv\Reader;
use League\Csv\Writer;
use lib\FilterTranscode;
require '../vendor/autoload.php';
stream_filter_register(FilterTranscode::FILTER_NAME."*", "\lib\FilterTranscode");
$csv = Reader::createFromPath(__DIR__.'/data/prenoms.csv');
$csv->setBOMOnOutput(Reader::BOM_UTF16_LE);
$csv->appendStreamFilter(FilterTranscode::FILTER_NAME."UTF-8:UTF-16LE");
$csv->output('test.csv');
Hope it helps
from csv.
Ok, I've got a result with $writer = Writer::createFromPath('/my/path/to/my/file.csv')
.
MS Excel on Windows didn't recognize a separators in CSV (just loaded entire lines), but with right encoded symbols.
Waiting for collegue to test it on Mac...
from csv.
The delimiter can be adjust using setDelimiter method
from csv.
Yes, I know. But after manipulations with encoding MS Excel is not recognizing delimiters correctly.
With UTF-8 everything is ok, but with UTF-16LE - not.
The same situation is on Mac.
from csv.
it's because the tabulation is still in UTF-8 👎 you should use the UTF-16 tabulation character instead
from csv.
But I'm using ;
as delimiter, beacuse MS Excel is awaiting it in CSV files.
from csv.
...see my comment above. MS Excel "wants" tab as delimiter.
from csv.
Ok, with setDelimiter("\t")
it works.
from csv.
Great!!
I've added a method to detect the BOM in the Input CSV so we will end up having:
getBOMOnInput
(detect and return the BOM sequence used on the input CSV if any)setBOMOnOutput
(set the BOM sequence to prepend on output methods (ie:output
and__toString
)getBOMOnOutput
(get the BOM sequence that will be prepended on output methods)
Removing the Input BOM is already possible with the existing extract methods.
When this is stable and bug free I think we will have a complete new and nice feature.
Thoughts ?
from csv.
For me sounds good, thanks for your help!
from csv.
Yes 👍
...and make a section for that infos -also regarding MS Excel- in your documentation page would be helpful imho :-)!
from csv.
I think @RomeroMsk could write that section, since like I said I'm no MS Excel expert :) . Let me add everything in the source and we will update the documentation when all is done and stable
from csv.
I'm not an expert too :) Also my English is not very good.
from csv.
http://csv.thephpleague.com/bom/ <- the documentation you can improve it directly via pull request if needed.
from csv.
the stable release version 6.3 was released with the stable feature. Of note the methods names have been changed.
from csv.
Checked 6.3, everything is fine, thanks! 👍
from csv.
Related Issues (20)
- Calls to rewind or fseek that fail are ignored silently HOT 5
- How to read from STDIN instead of a file HOT 2
- Is Writer::insertAll() faster than Writer::insertOne() or not? HOT 2
- how to read csv without BOM? HOT 4
- getRecords from PART of columns and random sort HOT 2
- getRecords with defined header produce wrong results HOT 1
- Add object mapping HOT 12
- Enable protection against long lines HOT 6
- Preformatting data before conversion HOT 3
- TabularDataReader::getRecords()->current() can return NULL HOT 3
- php version is locked to 8.1, it causes issues with php upgrade to 8.2 or greater HOT 1
- Parsing CSV with escape character before enclosure HOT 2
- Feature request: Trim parsed csv strings HOT 2
- Since 9.12.0 TabularDataReader::getRecords is stricter - exception is thrown with string keys for header HOT 10
- Enclosure not works properly HOT 2
- Is there a simple way to format xls into csv before write/read on it
- Why does calling getContent() seem to empty the iterator returned by the Reader HOT 2
- Detect delimiter of file automatically HOT 1
- The method chunkBy is not yielding chunks correctly HOT 2
- getHeader implementation should match the documentation
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from csv.