Giter Club home page Giter Club logo

spreadsheet-reader's Introduction

spreadsheet-reader is a PHP spreadsheet reader that differs from others in that the main goal for it was efficient data extraction that could handle large (as in really large) files. So far it may not definitely be CPU, time or I/O-efficient but at least it won't run out of memory (except maybe for XLS files).

So far XLSX, ODS and text/CSV file parsing should be memory-efficient. XLS file parsing is done with php-excel-reader from http://code.google.com/p/php-excel-reader/ which, sadly, has memory issues with bigger spreadsheets, as it reads the data all at once and keeps it all in memory.

Requirements:

Usage:

All data is read from the file sequentially, with each row being returned as a numeric array. This is about the easiest way to read a file:

<?php
	// If you need to parse XLS files, include php-excel-reader
	require('php-excel-reader/excel_reader2.php');

	require('SpreadsheetReader.php');

	$Reader = new SpreadsheetReader('example.xlsx');
	foreach ($Reader as $Row)
	{
		print_r($Row);
	}
?>

However, now also multiple sheet reading is supported for file formats where it is possible. (In case of CSV, it is handled as if it only has one sheet.)

You can retrieve information about sheets contained in the file by calling the Sheets() method which returns an array with sheet indexes as keys and sheet names as values. Then you can change the sheet that's currently being read by passing that index to the ChangeSheet($Index) method.

Example:

<?php
	$Reader = new SpreadsheetReader('example.xlsx');
	$Sheets = $Reader -> Sheets();

	foreach ($Sheets as $Index => $Name)
	{
		echo 'Sheet #'.$Index.': '.$Name;

		$Reader -> ChangeSheet($Index);

		foreach ($Reader as $Row)
		{
			print_r($Row);
		}
	}
?>

If a sheet is changed to the same that is currently open, the position in the file still reverts to the beginning, so as to conform to the same behavior as when changed to a different sheet.

Testing

From the command line:

php test.php path-to-spreadsheet.xls

In the browser:

http://path-to-library/test.php?File=/path/to/spreadsheet.xls

Notes about library performance

  • CSV and text files are read strictly sequentially so performance should be O(n);
  • When parsing XLS files, all of the file content is read into memory so large XLS files can lead to "out of memory" errors;
  • XLSX files use so called "shared strings" internally to optimize for cases where the same string is repeated multiple times. Internally XLSX is an XML text that is parsed sequentially to extract data from it, however, in some cases these shared strings are a problem - sometimes Excel may put all, or nearly all of the strings from the spreadsheet in the shared string file (which is a separate XML text), and not necessarily in the same order. Worst case scenario is when it is in reverse order - for each string we need to parse the shared string XML from the beginning, if we want to avoid keeping the data in memory. To that end, the XLSX parser has a cache for shared strings that is used if the total shared string count is not too high. In case you get out of memory errors, you can try adjusting the SHARED_STRING_CACHE_LIMIT constant in SpreadsheetReader_XLSX to a lower one.

TODOs:

  • ODS date formats;

Licensing

All of the code in this library is licensed under the MIT license as included in the LICENSE file, however, for now the library relies on php-excel-reader library for XLS file parsing which is licensed under the PHP license.

spreadsheet-reader's People

Contributors

matej116 avatar pilsetnieks avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

spreadsheet-reader's Issues

Digit string interpreted as integer in XLSX

I store phone numbers in a spreadsheet, some of them starts with '0'. When I process data in PHP, it need to be parsed as string, but spreadsheet-reader is treating it as integer. The leading '0' characters are dropped.

I have prepared test data 'abc.xlsx' (http://bit.ly/HG3ovz). 6th and 9th columns are the phone numbers. The following source code shows that value in these cells are interpreted as integer, instead of string.

When this code is run, you will encounter a "Undefined index issue". I have proposed a quick hack to solve this problem at #57.

require('SpreadsheetReader.php');

$reader = new SpreadsheetReader('abc.xlsx');
$index = -1;
foreach ($reader as $row) {
    $index++;

    if ($index == 0)
        continue;
    elseif ($index > 11)
        break;

    echo $row[5]."\n";
}

ChangeSheet doesn't work

I have the following code from the example:

foreach ($Sheets as $Index => $Name)
{
echo 'Sheet #'.$Index.': '.$Name;
$Reader->ChangeSheet($Index);
echo "
change sheet";
foreach ($Reader as $rowData) {
print_r($rowData);
}
}

output:
Sheet #0: cash
Sheet #1 : others

put the printout of (print_r) gives twice the rows of Sheet #1:others.... ?? Weird??

Reading a temp filr

This plugin is unable to read data from a temp xlsx file,
Steps to reproduce

  1. Select a file.
  2. Do not use move_upload_file and try to read from $_FILE['ny_var']['tmp_name']

In test.php, XLS & XLSX Files shifting cells when blank cells exist in row

EDIT: Issue only present in test.php file that is given with download

When an XLS or XLSX file has a row with a blank cell the given row array skips the blank cell and shifts all remaining values for the row into one cell over.

For example, if a file had this:

+---------+------------+--------+-------+
| Fruit   | Favorite   | Bowl   | Count |
+---------+------------+--------+-------+
| Apple   |    Y       | Wood   |     2 |
| Pear    |            | Glass  |     6 |
| Peach   |    Y       |        |     2 |
+---------+------------+--------+-------+

The resulting array would be:

Array(
    ('Fruit','Favorite','Bowl','Count'),
    ('Apple','Y','Wood','2'),
    ('Pear','Glass','6'),
    ('Peach','Y','2')
);

Instead of:

Array(
    ('Fruit','Favorite','Bowl','Count'),
    ('Apple','Y','Wood','2'),
    ('Pear','','Glass','6'),
    ('Peach','Y','','2')
);

It works properly if the file is a CSV file.

XLSX 'current' method advances in the file

There's a bit of an unexpected behavior in the iterator's interface behavior on the XLSX side. When I use the $file->current() method, I would expect it to return always the same line when I call it over and over again (which is what it does with an XLS file). However, I did a test by doing

$file->current();
$file->current();
$file->current();

And every time, I end up one line further in the file. I don't think that's standard behavior for an iterator. Anyway, thanks for looking into this little bug.

edit: I looked at the source code and I discovered this in the "current()" method:

if ($this -> Index == 0)
{
$this -> next();
$this -> Index--;

}

If I remove the line

$this->Index--

things start behaving as they should. I hope this helps.
Osu

unable to read xls

Hi! How to fix read error xls created on Excel version 5 or less, Thanks.

Weird text appears instead of readable texts.

Hi!

I've recently came across this library due to my frustration in PHPExcel not being able to read large (wide) XLSX files. Thank you.

I've made it ran after fixing ZipArchive::extractTo(): Permission denied error on my Mac (just pass SpreadsheetReader() a 2nd parameter containing an array that includes "TempDir")..

But print_r($Row) shows this:

<br />
<b>Warning</b>:  pathinfo() expects parameter 1 to be string, array given in <b>/Applications/XAMPP/xamppfiles/htdocs/spreadsheet-reader-master/SpreadsheetReader.php</b> on line <b>60</b><br />
Array
(
    [0] => PK������ÆK1E�_rels/.rels­’ÏJ�1�‡ï}Š�{w¶�Dd³½ˆÐ›H}€˜Ìþa7™0�u}{ƒ�Z©¥��I~óÍ7CšÝ�fõŠœGŠFoªZ+ŒŽü�{£Ÿ�÷ë�½kWÍ#ÎVJ$�cʪÔÄlô ’n�²�0Ø\QÂX^:â`¥�¹‡dÝd{„m]_�ÿdèöˆ©öÞhÞû�V‡÷„—°©ëF‡wä^�F9ÑâW¢�-÷(F/3¼�OÏDSU �N»l/wù{N�(Ö[±àˆq�¸T³Œ˜¿u<¹‡r�?�ç„®þs9¸�F�þ¼’MéËhÕÀÑ'h?PK��fª‚·à
    [1] => �PK������ÆK1E�docProps/app.xml��QKÃ0�…ßý�%øº&­iÌFÚ!ˆOÊ�«ì­Ääv‹´IHâØþ½QÁíÙÇÃ9|÷œ+ÖÇy*��¢q¶EUIP�V9mì®E¯ý£"&iµœœ…�� ¢uw%žƒó�’�Xd‚�-Ú§äW�Gµ‡YÆ2Û6
    [2] => £�³LY†�vãh�Ü
    [3] => õ9ƒM¸&„a8&°�ôÂÿ�Ñ/quHÿ…j§¾ûÅ·þä3¯�½KrêÍ]E–�Ÿ¥¸ó~2J¦<¿{4ï�6?<LË›’”ôúIªÍËvØr60Z\$†\ø�T¬–ºá�çœÊzT54·œhM�Ê�g\ÓŠ�%("ðå-�Ï�ì¾PK���7tNù†�PK������ÆK1E�docProps/core.xml�’Qo‚0�Çß÷)HßK)¨¸�ða‹O[b"fËÞšrb3(¤­¢ß~�•9ãÃÞzýÿïwwí%‹c]y�ÐF6*EÔ���J4…TeŠ6ù�Ï‘g,W�¯��):�A‹ì)�-��†•nZÐV‚ñ�H�&Ú�í¬m�!Fì æÆw�åÄm£kn]¨KÒrñÍK a�ÌH
)

here is the code that i've used:

<?php
    require('SpreadsheetReader.php');
    $Reader = new SpreadsheetReader('gale.xlsx', array('TempDir' => 'xl'));
    $Sheets = $Reader -> Sheets();
        $Reader -> ChangeSheet(1);
        foreach ($Reader as $Row)
        {
            print_r($Row);
        }
?>

I am only interested in the 2nd sheet.
The XLSX used to have a password.
Other sheets are password protected from being edited.
Some cells on the 2nd sheet points to a reference column on another Excel file or points to a column on a protected sheet. (Sorry, I don't know how it's formally called. When I click on the cell, a drop down appears).
The sheet has a lot of styling because it is a report to the managements.

I tried issue #38 but it only displayed question marks (?????) on some parts.

Thank you!

On wrong file type reader die instead of throwing an exception

Hi,
I have a problem with this line of code:

die('The filename ' . $sFileName . ' is not readable');

I can't catch this error with try, catch and page just stops working with this message. Why there is not used simple "throw new Exception" like everywhere else?

First sheet missing file reading xlsx file

Hi,
When Print all sheets it says 3 sheets are present. but whne I print Temp Directory only two are there. please help me to resolve this issue.

Following is the object;

SpreadsheetReader_XLSX Object
(
[Options:private] => Array
(
[TempDir] =>
[ReturnDateTimeObjects] =>
)

[Valid:private] => 1
[Handle:private] =>
[WorksheetPath:private] => /tmp/5261b72f2c247/xl/worksheets/sheet2.xml
[Worksheet:private] => XMLReader Object
    (
    )

[SharedStringsPath:private] =>
[SharedStrings:private] =>
[SharedStringCache:private] => Array
    (
    )

[WorkbookXML:private] => SimpleXMLElement Object
    (
        [fileVersion] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [appName] => Calc
                    )

            )

        [workbookPr] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [backupFile] => false
                        [showObjects] => all
                        [date1904] => false
                    )

            )

        [workbookProtection] => SimpleXMLElement Object
            (
            )

        [bookViews] => SimpleXMLElement Object
            (
                [workbookView] => SimpleXMLElement Object
                    (
                        [@attributes] => Array
                            (
                                [activeTab] => 2
                                [firstSheet] => 0
                                [showHorizontalScroll] => true
                                [showSheetTabs] => true
                                [showVerticalScroll] => true
                                [tabRatio] => 208
                                [windowHeight] => 8192
                                [windowWidth] => 16384
                                [xWindow] => 0
                                [yWindow] => 0
                            )

                    )

            )

        [sheets] => SimpleXMLElement Object
            (
                [sheet] => Array
                    (
                        [0] => SimpleXMLElement Object
                            (
                                [@attributes] => Array
                                    (
                                        [name] => Sheet1
                                        [sheetId] => 1
                                        [state] => visible
                                    )

                            )

                        [1] => SimpleXMLElement Object
                            (
                                [@attributes] => Array
                                    (
                                        [name] => Sheet2
                                        [sheetId] => 2
                                        [state] => visible
                                    )

                            )

                        [2] => SimpleXMLElement Object
                            (
                                [@attributes] => Array
                                    (
                                        [name] => Sheet3
                                        [sheetId] => 3
                                        [state] => visible
                                    )

                            )

                    )

            )

        [calcPr] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [iterateCount] => 100
                        [refMode] => A1
                        [iterate] => false
                        [iterateDelta] => 0.001
                    )

            )

    )

[Styles:private] => Array
    (
        [0] =>
    )

[TempDir:private] => /tmp/5261b72f2c247/
[TempFiles:private] => Array
    (
        [0] => /tmp/5261b72f2c247/xl/worksheets/sheet2.xml
        [1] => /tmp/5261b72f2c247/xl/worksheets/sheet3.xml
    )

[totalRowCount:private] => 20
[CurrentRow:private] =>
[Index:private] => 0
[Sheets:private] => Array
    (
        [2] => Sheet1
        [3] => Sheet2
        [4] => Sheet3
    )

[SharedStringCount:private] => 0
[SharedStringIndex:private] => 0
[LastSharedStringValue:private] =>
[RowOpen:private] =>
[SSOpen:private] =>
[SSForwarded:private] =>
[Formats:private] => Array
    (
        [164] => GENERAL
    )

[ParsedFormatCache:private] => Array
    (
    )

)

Semantic Versioning via Git Tags

I noticed you guys are keeping version numbers in the composer.json file, but I'm not sure if you're aware that this isn't translating to packagist. Have a look at your packagist page:

https://packagist.org/packages/nuovo/spreadsheet-reader

It only shows the two branches as being available and no versions. This makes it difficult to track dependencies to old versions of this library. Really easy fix for this is to tag releases with the semantic version whenever you make a version bump. For example, this commit:

493ccd4

You would run:

git tag -a v0.5.10 -m "Version 0.5.10"

git push --tags

It would then show up in your tags and you could (optionally) create a release for the new version. So long as they follow the semantic versioning syntax, packagist will understand it.

Take a look at the tags of one of my libraries: https://github.com/FrozenNode/Laravel-Administrator/tags

And then notice how it looks on Packagist: https://packagist.org/packages/frozennode/administrator

This makes it possible to track an older version by doing this in your composer.json:

"frozennode/administrator": "4.*"

set active sheet

Hello friends,

if i want to set set active sheet then where i need to change as i have file with multiple sheets and i just want to fetch data from only one file. i was checking but unable to find the solution for xlsx,xls and for csv parse.

Currency issues

I found that there are some issues related to currencies in the XLSX parser. when there's a currency, the preg_replace is sending an empty string in the first parameter.

I made a quick replacement that works:

$Value = preg_replace('^\[.+\]^', $Format['Currency'], $Value);

Either return extended info from excel_reader, or disable reading it.

Spreadsheet_Excel_Reader by default loads extended information (like style, visibility, links, and so on).
It seems that you are discarding that data completely.
There should either be a way to use that data, or to disable loading it.
You can disable loading it, for example, by adding $this -> Handle -> store_extended_info = false; just after $this -> Handle = new Spreadsheet_Excel_Reader;.

Changing of sheets is not working for XLSX file, but working fine for ods file.

Hello All,

I've used spreadsheet-reader api to load the data from excel sheet. Initially i tried with ODS file with 3 sheets in it. That time, it worked fine and it is loading sheet wise data properly. Then, i created xlsx file with 3 sheets. But, now, it is not changing the sheet. Every time it is showing sheet 1 data.

I've used the same code that was given in API - test.php. Still same problem.

Please help me where i'm missing.

Thanks in advance.

format renderer on html

Seems very wonderful package to read all type of excel files.
I have one doubt, does this solution render the format also on html as like in php-excle-reader.

Thanks
Gagan

Output encoding

Hi,
How set output encoding for .XLSX and .ODS ? For .XLS there is a function setOutputEncoding() .

Extensions in capital letters are not recognized

I deal with a lot of automatically-generated files and, for some reason, the extension is always in caps. However, spreadsheetreader won't treat a .XLS file as an Excel file since the comparison is case-sensitive. Once I manually changed the extension to .xls, things started working as expected.

Hope this helps.

getting junk in cells and rows for xls file

i created an xls file with two words: "hello" in one column and "me" in another and saved.
i tried to use the component to analyase it and got junk in many cells.
i attached a picture of the log file i write the cell content into.

also weird is that i had only two words in two columns, and i get junk on about 30 or 40 cells.

my code is:

require('modules/InboundEmail/spreadsheet-reader-master/php-excel-reader/excel_reader2.php');

require('modules/InboundEmail/spreadsheet-reader-master/SpreadsheetReader.php');

$Reader = new SpreadsheetReader($filePathInUpload);

foreach ($Reader as $Row)
{
foreach($Row as $cell)
{
$GLOBALS['log']->fatal("cell = " . $cell);
}
}

anyone have an idea?

junk

Dates Issue

I'm so glad I spent ages integrating this into a data import system only to find out it renders date fields so they're unusable.

Might be something worth fixing for other people. I'm just getting 5 digit integers.

feature request: getValue() for XLSX

First of all, I'm not certain if this is the right place for this since it's not an issue... but a feature request, but here goes.

With XLS files, the excel_reader2.php library has a getValue() method that allows one to access any cell within the file without the need to iterate through the entire file to reach it. Of course, this is only possible because the file are completely loaded into memory, as opposed to XLSX which are read from disk to keep memory usage as low as possible.

So my request is simple: Would it be possible to have a getValue($row, $col) working for the XLSX part? Because sometimes, you need to access a few specific cells before looping through everything.

Thanks in advance,

Osu

Can't read XLS or XLSX files.

Within my web site, this library can only read CSV files. XLS and XLSX files result in gibberish.

The file has 85 rows and 7 columns. Not a lot of data at all.

I added the library with Composer, and don't need to require() it in my code. Does the excel_reader2.php file need to be loaded before SpreadsheetReader.php?

Running the test.php script on the console works for both CSV and XLS files. XLSX results in this problem (trimmed a bit to avoid Markdown issues):

php ./vendor/nuovo/spreadsheet-reader/test.php ~/test.xlsx
Starting memory: 1524144
Spreadsheets:
Array
(
[0] => test
)
Sheet test

PHP Warning: XMLReader::open(): Empty string supplied as input in /var/www/portal/vendor/nuovo/spreadsheet-reader/SpreadsheetReader_XLSX.php on line 959
PHP Warning: XMLReader::read(): Load Data before trying to read in /var/www/portal/vendor/nuovo/spreadsheet-reader/SpreadsheetReader_XLSX.php on line 995
0: array(0) {}
Memory: 2872 current, 1929504 base

PHP Warning: XMLReader::read(): Load Data before trying to read in /var/www/portal/vendor/nuovo/spreadsheet-reader/SpreadsheetReader_XLSX.php on line 995

Time: 0.00015616416931152
End of sheet test

Any suggestions on how to get XLS and XLSX working?

Default options overwritten for TempDir

The construct checks for $Options['TempDir'] supplied by the user, otherwise it defaults to the system temp dir. It ignores the default value supplied when declaring the $Options property on line 10:

$this -> TempDir = isset($Options['TempDir']) && is_writable($Options['TempDir']) ?
            $Options['TempDir'] :
            sys_get_temp_dir();

Possible fix:

if(isset($Options['TempDir']) && is_writable($Options['TempDir']))
{
    $this -> TempDir = $Options['TempDir'];
}
elseif($this->Options['TempDir'] && is_writable($this->Options['TempDir']))
{
    $this -> TempDir = $this->Options['TempDir'];
}
else
{
    $this -> TempDir = sys_get_temp_dir();
}

XLSX empty rows skipped

When reading in XLSX files, the first row as well as all rows that contain no data are skipped. The following rows are then placed in the skipped rows' index. For example, if the spreadsheet has 10 rows, of which 2 rows have no data, parsing all the cells as follows:

            $excel_obj = new SpreadsheetReader($file);
            // Sheet is changed to the correct one, not shown here

            foreach ($excel_obj as $key => $row)
            {
                error_log("key:".$key); // This only prints for each non empty row
                if ($row)
                {
                    $form_data[] = $row;
                }
                else
                {
                    error_log("example allowed for this case, not sure why - never happens though");
                }
            }

results in $form_data having 7 rows, indexed 0 to 6 (instead of 10 rows indexed 0 to 9). This would not be as big of an issue if the row indexes were not compromised ($form_data still has 10 row indexes, but 3 does not contain any row data). Unfortunately as it is, it is impossible to reconstruct the original data matrix. Is there another way of reading the data that I am missing, or maybe a proposed fix?

Otherwise fast and efficient library, thanks!

Read inline cell value

I am using Apache POI (Java) to generate xlsx files. POI library writes data inline, so in my case reader always return empty rows.
Cell generated with POI:

<c r="A1" t="inlineStr">
    <is>
        <t>SOME TEXT</t>
    </is>
</c>

I solved this by adding one more case to next function:

public function next()
{
    ...
    // Cell value
    case 'is':
    case 'v':
        if ($this -> Worksheet -> nodeType == XMLReader::END_ELEMENT)
...

class loader fix

This line in SpreadsheetReader.php:

            if (!class_exists('SpreadsheetReader_'.$Type))

should be changed to:

            if (!class_exists('SpreadsheetReader_'.$Type,false))

to prevent autoloading of the class.

XLSX rewind does not actually rewind

Hi there,

I've discovered that when you load a XLSX sheet and do this:

$file->next()
$file->rewind()
print_r($file->current())

The row that's printed is actually the second row, not the first as I would expect. That's because the rewind() function never sets the CurrentRow property back to false. I added the slight correction and everything started working as it should.

I hope this helps,

Osu

All rows seem empty except for first row (XLSX File)

Hey there.

First, thank you for making this platform.

I have a big XLSX (~400k rows) file, and Im trying to parse it.

I followed the instructions and my code looks like this:

require('SpreadsheetReader.php');
$Reader = new SpreadsheetReader('test.xlsx');
$Reader -> ChangeSheet(0);
foreach ($Reader as $Row)
{
var_dump($Row);
}

it only dumps the first row (the header) correctly and all the next rows, are shown as arrays with empty strings.

what do I need to do here?

Thanks;
Haim

Request feature: row skipping

Hi,

I would like to have a way to skip row if needed.

Ex: My first row are my sheet title.
My second row are my column title.
I would like to begin my data collection at the third row only.

Thanks

_GetInt4d() in excel_reader2.php file returns zero strings

i manged to get the spreadsheet-reader working and it was working fine untill i received one xls file that did not read the cell content well.

all cell content have unencoded text:
cell = ��������

this happened when other xls files that look the same, worked fine.
i dig into the code and found out that when the spreasheet reader reads this file, the function _GetInt4d() in file excel_reader2.php returns 0 strings when in other correct files, it returns numbers like 11948 and because it returns 0, the whole chunk of code of encoding the cells is skupped by the for-loop that comes afterwards. this is the code in excel_reader2.php:

case SPREADSHEET_EXCEL_READER_TYPE_SST:
$spos = $pos + 4;
$limitpos = $spos + $length;
$uniqueStrings = $this->_GetInt4d($data, $spos+4);
$spos += 8;
for ($i = 0; $i < $uniqueStrings; $i++) {
// Read in the number of characters
if ($spos == $limitpos) {
...
...
...

because the _GetInt4d retuns zero, the whole code of encoding and other functions are all skipped.
does anyone know what _GetInt4d() function does and why in one xls file it returns strings and in the other 0?

by the way, when i open the problematic xls file and save it again using the Excel application manually, the file becomes fixed and the _GetInt4d() returns strings again.
but i can't do it manually for every file so it is not a solution for me.

please help!
any tip will do.

DateTime columns problem

Hello,
I ran into a problem reading Excel sheet, where I have a cell with Time format of Minutes:Seconds. Value of this cell is 1:00 and when reading the value it gave me this error:

DateInterval::__construct(): Unknown or bad format (P4DT-342000S)

It's because the $Value read from excel is "4.1666666666666664E-2" (21). So I added:

$Value = (float)$Value;

It's now being converted to 0.041666666666667 and it works.

            // Dates and times
            elseif ($Format['Type'] == 'DateTime')
            {
                $Value = (float)$Value; // THIS IS MY FIX
                $Days = (int)$Value;
                // Correcting for Feb 29, 1900
                if ($Days > 60)
                {
                    $Days--;
                }

Access single cell

Hallo

With excel_reader2.php i can access cell like: $data->sheets[0]['cells'][1][1]

How can i access it with the spreadsheet-reader. If i load spreadsheet-reader the ansewer is emty.
If i got it as array the answer is: Fatal error: Cannot use object of type SpreadsheetReader as array

Thanks

issue with PHP spreadsheet reader

I am facing one issue with PHP spreadsheet reader https://github.com/nuovo/spreadsheet-reader , it is reading all formats file except .xlsx , in case of .xlsx script was not running

require($docroot.'/inc/filereader/excel_reader2.php');
require($docroot.'/inc/SpreadsheetReader.inc.php');             
$Spreadsheet = new SpreadsheetReader("filepath");

When i am executing php script at my localhost, SpreadsheetReader reading .xlsx file and returning desired result, but when i am trying to run same code on some server then it is only executing for(.csv,.xls,.txt) but not for .xlsx.

I have already checked folder and file permissions for above issue at server.

Fail to find sheets in XLSX made using LibreOffice + fix

This is my test script:

<?php
require_once('spreadsheet-reader/SpreadsheetReader.php');
$reader = new SpreadsheetReader('test2.xlsx');
foreach ($reader as $row) {
    print_r($row);
    break;
}

These are the warnings I get, and no row is printed:

php t.php 
PHP Warning:  XMLReader::open(): Empty string supplied as input in spreadsheet-reader/SpreadsheetReader_XLSX.php on line 928
PHP Warning:  XMLReader::read(): Load Data before trying to read in spreadsheet-reader/SpreadsheetReader_XLSX.php on line 965

My XLSX file contains 1 sheet with only the first 2 fields filled in.
Perhaps it's related to LibreOffice, since I created the XLSX file with LibreOffice.

I debugged the problem using a lot of error_logging. It would make life easier if the code was written in a more paranoid style using exceptions. The problem was that SpreadsheetReader_XLSX was trying to open the non-existant sheet2.xml instead of sheet1.xml in the temporary directory.

I don't know much about the XLSX structure and why r:id was being parsed to get the sheet number, but this is my quick fix to SpreadsheetReader_XLSX.php that works for me (the 'fix' block is my fix, the next block is the original):

    public function Sheets()
    {
        if ($this -> Sheets === false)
        {
            $this -> Sheets = array();
            foreach ($this -> WorkbookXML -> sheets -> sheet as $Index => $Sheet)
            {
                if ('fix') {
                    //Parse this: <sheet name="Sheet1" sheetId="1" state="visible" r:id="rId2"/>
                    $Attributes = $Sheet -> attributes();
                    foreach ($Attributes as $Name => $Value)
                    {
                        if ($Name == 'sheetId') {
                            $SheetID = (int)$Value;
                            break;
                        }
                    }
                    $this -> Sheets[$SheetID] = (string)$Sheet['name'];
                }
                else {
                    $Attributes = $Sheet -> attributes('r', true);
                    foreach ($Attributes as $Name => $Value)
                    {
                        if ($Name == 'id')
                        {
                            $SheetID = (int)str_replace('rId', '', (string)$Value);
                            break;
                        }
                    }
                    $this -> Sheets[$SheetID] = (string)$Sheet['name'];
                }
            }
            ksort($this -> Sheets);
        }
        return array_values($this -> Sheets);

Request Feature : merged cells

Here's the issue.

Your library is great, however, you cannot know if a particular cell is merged or not. So far, the first cell in a merged group gets the correct value, however, the following cells are empty. Exemple : cells B2 to B5 are merged. B2 got content : "foobar", but B3-B5 are empty.

What would be awesome is to keep it dead simple and simply fill the "empty" merged cells with the correct value.

What do you think ?

ChangeSheet not doing anything for XLSX

Hey there,

  When you open an XLSX file and call ChangeSheet as your first operation, ChangeSheet will not change the sheet. This is because of the following condition in the rewind method:

if ($this -> Index > 0 || !($this -> Worksheet instanceof XMLReader))

This is because, if you haven't done anything,

$this->Index is 0 and $this->Worksheet is indeed an instance of XMLReader since the first sheet was loaded. So to fix this, either unset $this->Worksheet or set $this->Index to 1 in the ChangeSheet method, this way, rewind will successfully load the correct worksheet when it's called.

XLSX files with formulas merge formula & value as output

If you have a formula in an XLSX file, the formula and the value merge together as one string of data.

For example:

If you had a spreadsheet with the formula: =A2*0.2

That formula computed to the value: 40

The output for that cell given by the XLSX parser is: A2*0.240

The output should be just 40, just like it would give if the file was an XLS file instead.

Repeated cells in OO Calc spreadsheets not working

I had an OpenOffice spreadsheet containing triplets of either "Y" (for yes) or blank (for no) entries in columns. I found that this code reads only the first Y, ignoring the others. So for example Y,Y,- was read as Y,-,- and -,Y,Y was read as -,Y,-. I tracked this down to a line of code dealing with repeat counts in SpreadsheetReader_ODS.php, but I lack the knowledge of pull requests and suchlike so I'll give you the fix here.

Line 279 in the version I downloaded this evening had this code:

$this -> CurrentRow = array_pad($this -> CurrentRow, count($this -> CurrentRow) + $RepeatedColumnCount - 1, '');

What that does is it pads the array of cells out with blanks. In fact, it should be this:

$this -> CurrentRow = array_pad($this -> CurrentRow, count($this -> CurrentRow) + $RepeatedColumnCount - 1, $LastCellContent);

I have tested it with my own code, but not extensively with every possible input. Still, it seems to work, so I offer it to you to apply as you see fit.

addslahes

Hi there,

first of all: Thank you for this simple but fast reader.
My problem:
I want store the values in database. For that in need to mask the strings (e.g with php addslashes)
I do not want to loop over each value again.
Can you give me hint, where I have to add the addslashes command in your SpreadsheetReader_XLSX class.

Thanks a lot
jens

License?

Under which license do you distribute your code? I want like to include it into our project Aixada...

Strings always empty

I discovered that when the number of shared strings in the file is greater than the constant SHARED_STRING_CACHE_LIMIT, all string values are empty. I did manage to make things work by changing two things, both of them in the GetSharedString() method.

First off, the following condition:

if ((self::SHARED_STRING_CACHE_LIMIT === null || self::SHARED_STRING_CACHE_LIMIT > 0) && ($this -> SharedStringCache !== null))

is problematic since $this->SharedStringCache is initialized as an array, meaning the condition specified here is always true (array is always different from null). I corrected things by checking whether the array was empty instead:

if ((self::SHARED_STRING_CACHE_LIMIT === null || self::SHARED_STRING_CACHE_LIMIT > 0) && (!empty($this -> SharedStringCache)))

Next, there was a problem with this loop:

while ($this -> SharedStrings -> read());
{
if ($this -> SharedStrings -> name == 'sst')
{
$this -> SharedStringCount = $this -> SharedStrings -> getAttribute('uniqueCount');
break;
}
}

The problem was that upon entering the loop, $this -> SharedStrings was already positioned correctly. So by executing a read before checking the current node, this loop would go through the entire file before finding the closing 'sst' tag, thus making it impossible to return any value without resetting the file. In order to avoid this, I had to change the "while" to a "do-->while" like this:

do
{
if ($this -> SharedStrings -> name == 'sst')
{
$this -> SharedStringCount = $this -> SharedStrings -> getAttribute('uniqueCount');
break;
}
}
while ($this -> SharedStrings -> read());

Anyway, I hope this helps.

Osu

Thanks for this module

When I upload 46 mb xlsx then It is not showing text word or Alphabets showing only digits..
is am i not installed this module correctly ?
Suggest

Thanks ....

Single Quote in cell terminated read of that cell

When importing some cells contain Brand Names with single quote ie Tom's Amy's the reading of the cell stops with the single quote which is printed. Ie Tom' Amy'
I am using the
$Reader = new SpreadsheetReader('myfile.xls');
foreach ($Reader as $Row)
{
print_r($Row);
echo "
";
}

This is in the Google code .. Sorry

unable to fetch data xlsx

Hi I tried using your class, I am getting the rows and columns but. I am unable to get the inside the cell. instead null string is coming..
I tried saving the test xls file as xlsx and the output i got is something like

Array ( [0] => [1] => [2] => [3] => ) Array ( [0] => [1] => [2] => [3] => ) Array ( [0] => [1] => [2] => [3] => ) Array ( [0] => [1] => [2] => [3] => )

Please let me know what could be the problem.

Problem with Macro modules

Hi there,

I noticed that when a XLS spreadsheet contains a Macro module, the said module will be treated as a data sheet by the reader. I did a quick print_r of the Sheets() method and ended up with:

0 => Macro1
1 => Data

The problem is that Macro1 contains no data whatsoever, which generates an error in the ChangeSheet() method on this line:

$this -> ColumnCount = $this -> Handle -> sheets[$this -> CurrentSheet]['numCols'];

The reason is that, because this sheet is actually a Macro module, it is not actually part of the

$this->Handle->sheets

So to prevent the error, I had to change the condition of the if statement in the ChangeSheet() method to:

if (isset($this -> Sheets[$Index]) && isset( $this -> Handle -> sheets[$Index] ) )

And now things stopped crashing. Though I had to add a little extra logic in my own programs to make sure I was actually on the first datasheet before starting treatment.

I hope this helps.

Osu

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.