excel-streaming-reader's Issues

Licence incompatibility

I noticed that you chose the GPLv2 for your library. While I consider this licence fair in general, it could pose issues here. According to the FSF, GPLv2 and APLv2 are incompatible:

http://www.gnu.org/licenses/license-list.en.html#apache2

As this is built on top of Apache POI, I would suggest moving to APLv2, the same licence as the underlying library.

Please note: I'm not a lawyer. I just stumbled over the project because I saw my heap explode when reading a small Excel file.

Can't delete file used by StreamingReader

Hi,

When I use the StreamingReader with a manual file, the file cannot be deleted afterwards. On Windows it complains that the file is used by another process:

final File source = new File("\\some\\file.xlsx");
final File tmp = Files.createTempFile("tmp-", ".xlsx").toFile();
FileUtils.copyFile(source, tmp); // this is Apache Commons IO, but it doesn't really matter how we produce the temp file; the problem is the same
System.out.println(tmp.getCanonicalPath());
try(StreamingReader reader = StreamingReader.builder().sheetIndex(0).read(tmp)){
    for (Row row : reader) {
        //do something
    }
}
Files.delete(tmp.toPath()); //throws an exception

Looking at the StreamingReader code, I see that you are creating an OPCPackage. Shouldn't it be closed when the StreamingReader is closed? Or is it something else? Are you able to reproduce this?

Thanks,

Cesar,

PS: Great library btw!

StreamingRow.getFirstCellNum is not supported

Hi,

I think it would be correct to also implement row.getFirstCellNum(), in much the same way as you implemented row.getLastCellNum() in ticket #46. Here is a possible implementation (if I'm not wrong):
return (short) (cellMap.size() == 0 ? -1 : 1);

Thx in advance.
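
For comparison, POI documents Row.getFirstCellNum() as returning the zero-based index of the first cell in the row, or -1 if the row has no cells, while getLastCellNum() returns the last index plus one. A minimal sketch of that behaviour follows, assuming the row keeps its cells in a map sorted by column index. SketchRow, putCell and the String cell values are hypothetical stand-ins, not the library's actual code.

```java
import java.util.TreeMap;

// Hypothetical stand-in for a streaming row; not the library's actual class.
final class SketchRow {
    // Cells keyed by zero-based column index, kept in ascending order.
    private final TreeMap<Integer, String> cellMap = new TreeMap<>();

    void putCell(int columnIndex, String value) {
        cellMap.put(columnIndex, value);
    }

    // Index of the first populated cell, or -1 when the row has no cells.
    short getFirstCellNum() {
        return (short) (cellMap.isEmpty() ? -1 : cellMap.firstKey());
    }

    // POI convention: index of the last populated cell PLUS ONE, or -1 when empty.
    short getLastCellNum() {
        return (short) (cellMap.isEmpty() ? -1 : cellMap.lastKey() + 1);
    }
}
```

With cells at columns 3 and 5, this sketch reports a first cell number of 3 and a last cell number of 6.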

getting Hyperlink of Cell

I am trying to get the hyperlink of a cell, and also its label, this way:

Row row=rowIterator.next();
Iterator<Cell> cellIterator = row.cellIterator();
currentCell = cellIterator.next();
parthyperlink=currentCell.getHyperlink().getAddress();
String label=currentCell.getHyperlink().getLabel();

and I am getting this exception:

com.monitorjbl.xlsx.exceptions.NotSupportedException
at com.monitorjbl.xlsx.impl.StreamingCell.getHyperlink(StreamingCell.java:353)
at com.z2data.mapping.ExeclHorizental.getCurrentRow(ExeclHorizental.java:88)

Any help, please :)

Thanks in advance.

Implement read-only mode to enable app engine compatibility

First of all thanks for this great library. It helps to solve a real problem with Apache POI.

I need to read a file in app engine, which loves to throw java.security.AccessControlExceptions because you can't write to the file system.

Using vanilla POI I was reading files using OPCPackage.open(new File("input.xlsx"), PackageAccess.READ) which worked well. However, with your library things are not so simple. I initially thought it might only be the read(InputStream) method that didn't work (as it writes a temporary file) but the read(File) doesn't work either.

I've forked the project and added the PackageAccess.READ and tried a couple of other things but I'm still getting exceptions. I'm guessing it might be because it uses XSSFReader and not XSSFWorkbook but I really don't know.

If you could help me in any way, or offer some advice I'd really appreciate it. This is the stack trace I'm getting atm:

[ERROR]     at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:305)
[ERROR]     at com.monitorjbl.xlsx.StreamingReader.<clinit>(StreamingReader.java:59)
[ERROR]     at com.monitorjbl.xlsx.StreamingReader$Builder.findSheet(StreamingReader.java:457)
[ERROR]     at com.monitorjbl.xlsx.StreamingReader$Builder.read(StreamingReader.java:418)
[ERROR]     at com.utilitiessavings.usavappv7.server.reader.ParseBgPrices.parseFromFile(ParseBgPrices.java:63)
[ERROR]     at com.utilitiessavings.usavappv7.server.handler.parse.ParseBgPriceBookHandler.execute(ParseBgPriceBookHandler.java:75)
[ERROR]     at com.utilitiessavings.usavappv7.server.handler.parse.ParseBgPriceBookHandler.execute(ParseBgPriceBookHandler.java:33)
[ERROR]     at com.gwtplatform.dispatch.rpc.server.AbstractDispatchImpl.doExecute(AbstractDispatchImpl.java:154)
[ERROR]     at com.gwtplatform.dispatch.rpc.server.AbstractDispatchImpl.execute(AbstractDispatchImpl.java:110)
[ERROR]     at com.gwtplatform.dispatch.rpc.server.AbstractDispatchServiceImpl.execute(AbstractDispatchServiceImpl.java:87)
[ERROR]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR]     at java.lang.reflect.Method.invoke(Method.java:497)
[ERROR]     at com.google.gwt.user.server.rpc.RPC.invokeAndEncodeResponse(RPC.java:587)
[ERROR]     ... 54 more
[ERROR] Caused by: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "createSecurityManager")
[ERROR]     at java.security.AccessControlContext.checkPermission(AccessControlContext.java:457)

ParseBgPrices line 63 is calling this code:

 StreamingReader reader = new StreamingReader.Builder()
                .rowCacheSize(100)
                .bufferSize(4096)
                .sheetIndex(0)
                .read(file);

Thanks

NPE when iterating through rows

Trying to upgrade to 0.2.10 from 0.2.8, now that #15 was fixed...
java.lang.NullPointerException
at com.monitorjbl.xlsx.StreamingReader.unformattedContents(StreamingReader.java:216)
at com.monitorjbl.xlsx.StreamingReader.handleEvent(StreamingReader.java:136)
at com.monitorjbl.xlsx.StreamingReader.getRow(StreamingReader.java:89)
at com.monitorjbl.xlsx.StreamingReader.access$400(StreamingReader.java:55)
at com.monitorjbl.xlsx.StreamingReader$StreamingIterator.hasNext(StreamingReader.java:440)
at com.monitorjbl.xlsx.StreamingReader$StreamingIterator.<init>(StreamingReader.java:434)
at com.monitorjbl.xlsx.StreamingReader.iterator(StreamingReader.java:236)

AbstractMethodError is thrown when closing an OPCPackage

Demonstrated in the commit: https://github.com/stjohnb/excel-streaming-reader/commit/9022b0ff97eb1ff0699065314d3b060ce379f221

The AbstractMethodError can be resolved by bumping the version of xercesImpl from 2.4.0 to 2.11.0 but that introduces xml-apis:xml-apis:jar:1.4.01 as a new transitive dependency which clashes with stax:stax-api:jar:1.0.1

java.lang.AbstractMethodError: org.apache.xerces.dom.DocumentImpl.getXmlStandalone()Z
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.setDocumentInfo(DOM2TO.java:377)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:131)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:98)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:699)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:743)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:357)
at org.apache.poi.openxml4j.opc.StreamHelper.saveXmlInStream(StreamHelper.java:80)
at org.apache.poi.openxml4j.opc.internal.marshallers.ZipPartMarshaller.marshallRelationshipPart(ZipPartMarshaller.java:174)
at org.apache.poi.openxml4j.opc.ZipPackage.saveImpl(ZipPackage.java:464)
at org.apache.poi.openxml4j.opc.OPCPackage.save(OPCPackage.java:1441)
at org.apache.poi.openxml4j.opc.OPCPackage.save(OPCPackage.java:1426)
at org.apache.poi.openxml4j.opc.ZipPackage.closeImpl(ZipPackage.java:351)
at org.apache.poi.openxml4j.opc.OPCPackage.close(OPCPackage.java:426)
at com.monitorjbl.xlsx.StreamingReaderTest.testClosingFiles(StreamingReaderTest.java:459)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

getCellType returns CELL_TYPE_BLANK for numeric cells

In the attached Excel file, cell A1 has the numeric value 123. The XML contains 123. Since the attribute t is missing, getCellType returns CELL_TYPE_BLANK. I think CELL_TYPE_NUMERIC should be the default when content != null && type == null in StreamingCell.getCellType.

Simple.xlsx
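
The suggested default matches the SpreadsheetML schema, where a cell's t attribute is optional and defaults to "n" (number). A minimal sketch of such a resolution rule follows. The class, enum and method names are hypothetical and simplified, and this is not the library's actual getCellType.

```java
// Hypothetical, simplified cell-type resolution; not the library's actual code.
final class CellTypeSketch {
    enum Type { BLANK, NUMERIC, STRING, BOOLEAN, ERROR }

    // rawType is the cell's "t" attribute (null when absent);
    // contents is the text of the cell's value element (null when absent).
    static Type resolve(String rawType, String contents) {
        if (contents == null || contents.isEmpty()) {
            return Type.BLANK;
        }
        if (rawType == null || rawType.equals("n")) {
            return Type.NUMERIC; // a missing "t" means the schema default, "n"
        }
        switch (rawType) {
            case "s":          // shared string
            case "inlineStr":  // inline string
            case "str":        // formula string result
                return Type.STRING;
            case "b":
                return Type.BOOLEAN;
            case "e":
                return Type.ERROR;
            default:
                throw new IllegalArgumentException("Unhandled cell type: " + rawType);
        }
    }
}
```

Under this rule, a cell with contents 123 and no t attribute resolves to NUMERIC rather than BLANK.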

Support 1904-based date values

Date cells are currently evaluated assuming 1900-based date values. Older sheets created by Excel for Mac, and newer sheets created by exporting from Numbers, use 1904-based dates. Evidently there's a flag at the workbook level that distinguishes the two, but it is currently ignored. Compare the calculations in getDateCellValue in com.monitorjbl.xlsx.impl.StreamingCell and org.apache.poi.xssf.usermodel.XSSFCell.

If fixing this is too much effort, I'd be happy to contribute a fix myself, but would love any insights you might have to offer on how to start such work.

Reference: https://support.microsoft.com/en-us/help/214330/differences-between-the-1900-and-the-1904-date-system-in-excel
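
To make the difference concrete, here is a minimal sketch of converting a date serial under either system, using only java.time. It follows the Microsoft article referenced above: in the 1904 system serial 0 is 1 January 1904, and the two systems differ by 1,462 days. ExcelDateSketch and its date1904 parameter are hypothetical names for illustration; how the library would obtain the workbook-level flag is not shown.

```java
import java.time.LocalDate;

// Hypothetical helper; not part of the library or of POI.
final class ExcelDateSketch {
    static LocalDate toLocalDate(double serial, boolean date1904) {
        long days = (long) Math.floor(serial); // the fractional part is the time of day
        if (date1904) {
            return LocalDate.of(1904, 1, 1).plusDays(days); // serial 0 = 1 Jan 1904
        }
        if (days == 60) {
            // The 1900 system counts a 29 Feb 1900 that never existed.
            throw new IllegalArgumentException("serial 60 is the nonexistent 29 Feb 1900");
        }
        // Serial 1 = 1 Jan 1900; from serial 61 on, shift by one day to skip the phantom leap day.
        LocalDate base = days < 60 ? LocalDate.of(1899, 12, 31) : LocalDate.of(1899, 12, 30);
        return base.plusDays(days);
    }
}
```

For example, 5 July 1998 is serial 35981 in the 1900 system and 34519 in the 1904 system, so reading a 1904-based sheet with the 1900 rules shifts every date by about four years.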

getDateCellValue not working as desired.

I am reading an Excel column whose format is Date.
row.getCell(54).getCellType() --> (showing 3)
row.getCell(54).getDateCellValue();
This gives me the following runtime exception:
Exception in thread "main" java.lang.NullPointerException
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at com.monitorjbl.xlsx.impl.StreamingCell.getNumericCellValue(StreamingCell.java:130)
at com.monitorjbl.xlsx.impl.StreamingCell.getDateCellValue(StreamingCell.java:143)

Can't use foreach or try-with-resources.

I followed the README to code this:
try (
    InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
    Workbook workbook = StreamingReader.builder()
        .rowCacheSize(100)
        .bufferSize(4096)
        .sheetIndex(0)
        .open(is);
) {
    for (Sheet sheet : workbook) {
        System.out.println(sheet.getSheetName());
        for (Row r : sheet) {
            for (Cell c : r) {
                System.out.println(c.getStringCellValue());
            }
        }
    }
}
I got two errors. The first is that Workbook is not AutoCloseable, which try-with-resources requires, so I cannot use it this way. The second is that Workbook can't be used in a for-each loop.
Then I excluded org.apache.poi version 3.9 and used 3.14 instead, and now it works. It seems that the old version can't be used with try-with-resources.

Can't read file in multiple threads

Hi, I have a problem with a very basic line. I'm using JavaFX with multiple threads.

Workbook workbook = StreamingReader.builder().rowCacheSize(100).bufferSize(4096).open(is);

It crashes on this line every time. I tried to catch an exception, but none was displayed. I tried printing something as well, but that didn't work either.
Do you have any idea what's going on? I really can't find anything wrong. Thank you so much.

Not able to read large Excel files with 1 million rows

I am not able to read large Excel files with 1 million rows; I get java.lang.OutOfMemoryError: Java heap space.

I am not able to attach my Excel file, which is around 34 MB (I get the error "Yowza that's a big file. Try again with a file smaller than 10MB").

Below is sample data from the Excel file, where the first row is the header, followed by 1 million employee records:
Code FirstName LastName City Address Country DepartmentRef PersonTypeRef PropertyRef
A1000001 A1000001 A1000001 Hyderabad 1stAvenue India 99 Internal coordinator 26
A1000002 A1000002 A1000002 Hyderabad 1stAvenue India 99 Internal coordinator 26
A1000003 A1000003 A1000003 Hyderabad 1stAvenue India 99 Internal coordinator 26

Request for release 1.0.2

Hi Taylor,

There have been a few fixes merged since 1.0.1 (#50 #56 #61 #65) and I would quite selfishly like to get rid of my git submodule and go back to pulling a release from Maven Central. Would you be willing to push a release? Let me know if there's anything I can do to help.

Thanks,
Mike

PS: Thanks for publishing this library. It really saved my weekend when I found it back in October. More likely, many weekends.

Reader returning additional blank lines

As I am iterating through an excel file, the module returns a blank row every 1 + cacheSize rows. When I set the cache size to 10 rows, I get an extra blank row every 11 rows, and when I set the cache size to 100 rows I get a blank row every 101 rows. Do you have any idea what might be causing this?

The supplied data appears to be in the OLE2 Format

org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

I got this exception when I loaded a file of type XLS.

This is my code:

File inputWorkbook = new File(fileName);
StreamingReader reader = StreamingReader.builder()
.rowCacheSize(100)
.bufferSize(4096)
.sheetIndex(0)
.read(inputWorkbook);
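
The exception message itself explains the cause: a legacy .xls file is an OLE2 container, while this reader goes through the OOXML (.xlsx) part of POI. One way to fail early with a clearer message is to look at the file's first bytes before opening it. The sketch below is a hypothetical helper (SpreadsheetFormatSniffer is not part of the library) that relies only on the well-known OLE2 and ZIP file signatures. A ZIP signature does not prove the file is a valid .xlsx, only that it is not OLE2.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper; not part of the library.
final class SpreadsheetFormatSniffer {
    enum Format { OOXML_ZIP, OLE2, UNKNOWN }

    // Signature at the start of every OLE2 compound document (.xls and similar).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Local-file-header signature at the start of a ZIP archive (.xlsx is a ZIP).
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    static Format sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Format.OOXML_ZIP;
        return Format.UNKNOWN;
    }

    // Reads at most the first 8 bytes of the file and classifies them.
    static Format sniff(Path file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (filled < header.length
                    && (n = in.read(header, filled, header.length - filled)) != -1) {
                filled += n;
            }
        }
        return sniff(Arrays.copyOf(header, filled));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller could check sniff(inputWorkbook.toPath()) and, on OLE2, route the file to HSSF-based code or report that only .xlsx is supported.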

UnsupportedOperationException?

I've been using the lib since today and it's really nice. I can open large files very quickly.
My plan was to extract the first 10 rows so that I can obtain the (named) headers of the Excel files.

When I use:
Workbook workbook = StreamingReader.builder().open(inputStream); // fails (@getSheet)
Sheet sheet = workbook.getSheet (sheetName);

where sheetName is "FirstQuarter" (which exists in the Excel file), I get the following exception:

java.lang.UnsupportedOperationException
at com.monitorjbl.xlsx.impl.StreamingSheet.getRow(StreamingSheet.java:98)

Am I missing something?

Unable to load sheet

I am working with this API, and when testing I get the error below: Unable to find sheet at index [0]

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
com.monitorjbl.xlsx.exceptions.MissingSheetException: Unable to find sheet at index [0]
at com.monitorjbl.xlsx.StreamingReader$Builder.read(StreamingReader.java:434)
at com.monitorjbl.xlsx.StreamingReader$Builder.read(StreamingReader.java:393)
at test.XLSXToCSVConverterStreamer.xlsx(XLSXToCSVConverterStreamer.java:53)
at test.XLSXToCSVConverterStreamer.main(XLSXToCSVConverterStreamer.java:160)

StreamingCell.getBooleanCellValue() is dangerous

This library is very nice in that it provides an easy drop-in replacement for simple uses of the vanilla POI API. One important part of this ease-of-use is that unimplemented features are reported very obviously at runtime via e.g. NotSupportedExceptions. If you try this library with your existing code and you don't get any exceptions at runtime, you're probably OK.

Unfortunately, StreamingCell.getBooleanCellValue() breaks this convention and assumption by not being implemented, but rather than throwing an exception it simply returns false. If users are not very, very careful, this could encourage them to start using this library when in fact it doesn't implement functionality that they need.

I don't know if StreamingCell.getBooleanCellValue() can be implemented or not; if it can, then it would be great to see it implemented; if it can't be implemented, then throwing a NotSupportedException would be more appropriate than just returning false.

https://github.com/monitorjbl/excel-streaming-reader/blob/master/src/main/java/com/monitorjbl/xlsx/impl/StreamingCell.java#L305
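
A minimal sketch of the "implement it or fail loudly" behaviour asked for above. It assumes the streaming cell has access to the raw t attribute and value text; in SpreadsheetML a boolean cell has t="b" and a value of 1 or 0. The class and method names are hypothetical, and IllegalStateException stands in for whatever exception type the library would choose.

```java
// Hypothetical helper; not the library's actual StreamingCell.
final class BooleanCellSketch {
    // rawType is the cell's "t" attribute; contents is the text of its value element.
    static boolean booleanValue(String rawType, String contents) {
        if (!"b".equals(rawType)) {
            // Fail loudly instead of silently returning false for non-boolean cells.
            throw new IllegalStateException("Cell is not a boolean cell (t=" + rawType + ")");
        }
        if ("1".equals(contents) || "true".equalsIgnoreCase(contents)) {
            return true;
        }
        if ("0".equals(contents) || "false".equalsIgnoreCase(contents)) {
            return false;
        }
        throw new IllegalStateException("Unexpected boolean cell contents: " + contents);
    }
}
```

The point of the design is that a caller can never mistake "not a boolean" or "not implemented" for a genuine false.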

Error when loading xlsx file in Spring

I use the latest version.

My code:

private void testExcel(String fileName) {
        ClassLoader classLoader = getClass().getClassLoader();
        File file = new File(classLoader.getResource(fileName).getFile());
        Workbook workbook = StreamingReader.builder()
                .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
                .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
                .open(file);
    }

Error Logs:

java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader
	at org.openxmlformats.schemas.spreadsheetml.x2006.main.SstDocument$Factory.parse(Unknown Source)
	at org.apache.poi.xssf.model.SharedStringsTable.readFrom(SharedStringsTable.java:119)
	at org.apache.poi.xssf.model.SharedStringsTable.<init>(SharedStringsTable.java:106)
	at org.apache.poi.xssf.eventusermodel.XSSFReader.getSharedStringsTable(XSSFReader.java:82)
	at com.monitorjbl.xlsx.impl.StreamingWorkbookReader.init(StreamingWorkbookReader.java:117)
	at com.monitorjbl.xlsx.StreamingReader$Builder.open(StreamingReader.java:278)

0.3 release

Make the 0.3 branch the mainline release going forward. Make a 0.2 branch for any backporting efforts in the future.

Java 6 Syntax Compatibility

Hello Taylor,

Thank you for writing this library; I have been using it in one of my projects. However, we are still on Java 6 for some older features, and I had to change some of the Java 7 syntax (e.g. try-with-resources) to use it there.

I have created a branch in my fork of your repo and refactored the code and unit tests to Java 6 syntax. Let me know if you would like me to contribute that branch back; I can create a pull request for it.

Parser removing leading zeros

Right now, if you try to parse 0007 as a String instead of a Number, you can't. Since 0007 matches the number regex, it will be treated as a number and returned as 7.
That will be a problem if you are working with ID numbers.
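
One possible way around this, assuming the parser really does decide between number and text by pattern matching as described, is to treat only canonically written numbers as numeric, so that values with redundant leading zeros stay strings. The pattern and the LeadingZeroSketch class below are illustrative assumptions, not the library's actual regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is an assumption, not the library's regex.
final class LeadingZeroSketch {
    // Plain decimal number without a redundant leading zero: 0, 7, -12, 3.5, 0.25
    private static final Pattern CANONICAL_NUMBER =
            Pattern.compile("-?(0|[1-9]\\d*)(\\.\\d+)?");

    // Returns a Double for canonical numbers and the untouched String otherwise.
    static Object parse(String raw) {
        if (CANONICAL_NUMBER.matcher(raw).matches()) {
            return Double.valueOf(raw);
        }
        return raw; // "0007" is kept intact
    }
}
```

With this rule "7" and "0.25" become numbers, while "0007" comes back unchanged as a String.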

Tmp files are not deleted if sheet is not found.

If I call read with an InputStream, a tmp file is created.

That is fine, but if the sheet is not found, the result is a RuntimeException: the tmp file is not deleted, and no StreamingReader is returned on which I could call close.

This happens not only when the sheet is null; if any exception is thrown while reading the file, the tmp file is left behind in the filesystem's temp folder.
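
The cleanup described above can be sketched with a try/finally that deletes the temp file unless opening succeeded. This is a minimal illustration using only java.nio. The Opener callback is a hypothetical stand-in for "open the workbook and locate the sheet", and the sketch assumes that on success the returned reader takes over responsibility for deleting the file when it is closed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; not the library's actual read(InputStream) implementation.
final class TempFileCleanupSketch {
    // Stand-in for "open the workbook from this file and find the requested sheet".
    interface Opener<T> {
        T open(Path file) throws IOException;
    }

    static <T> T readViaTempFile(InputStream in, Opener<T> opener) throws IOException {
        Path tmp = Files.createTempFile("tmp-", ".xlsx");
        boolean handedOff = false;
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            T result = opener.open(tmp); // may throw, e.g. when the sheet is missing
            handedOff = true;            // from here on the result owns the temp file
            return result;
        } finally {
            if (!handedOff) {
                Files.deleteIfExists(tmp); // runs for checked and unchecked failures alike
            }
        }
    }
}
```

Because the deletion sits in finally, it covers a missing sheet, an I/O error during the copy, and any other exception thrown before the reader is handed back.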

NullPointerException when cell type is null

Source code line 123, when type is null:

Exception in thread "main" java.lang.NullPointerException
at com.monitorjbl.xlsx.StreamingReader.unformattedContents(StreamingReader.java:216)
at com.monitorjbl.xlsx.StreamingReader.handleEvent(StreamingReader.java:136)
at com.monitorjbl.xlsx.StreamingReader.getRow(StreamingReader.java:89)

datetest.xlsx.zip

Unable to read input stream

I got java.io.IOException: Stream Closed
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:233)
at com.monitorjbl.xlsx.impl.StreamingWorkbookReader.writeInputStreamToFile(StreamingWorkbookReader.java:170)
at com.monitorjbl.xlsx.impl.StreamingWorkbookReader.init(StreamingWorkbookReader.java:74)

`public static void readExcelFileAsStreamer(File excelFile, List header, Queue rows, Status status) {

    boolean foundSheet = false;
    int hdrRowPos = -1;
    int sheetPos = -1;
    int rowLimit = 10;
    int count = 0;
    int totalRowCounter = 0;
    StreamingReader reader = null;
    Iterator<Row> rowIterator = null;
    Row headerRow = null;
    try {
        status.init();
        InputStream is = new FileInputStream(excelFile);

        // ########################################### Find sheet and header row ###############################
        for (int i = 0; i < rowLimit && !foundSheet; i++) {
            Workbook workbook = StreamingReader.builder()
                    .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
                    .bufferSize(16384)     // buffer size to use when reading InputStream to file (defaults to 1024)
                    .sheetIndex(0)        // index of sheet to use (defaults to 0)
                    .open(is);   
            Sheet  sheet= workbook.getSheetAt(0);
            rowIterator = sheet.iterator();

            while (rowIterator.hasNext() && count < rowLimit && !foundSheet) {
                Row r = rowIterator.next();
                count = r.getRowNum();
                if (checkIfHeaderKeyExists(r)) {
                    foundSheet = true;
                    sheetPos = i;
                    headerRow = r;
                    hdrRowPos = count;
                    log.warn("For file: " + excelFile.getName() + " header key is found on Header position: " + hdrRowPos);
                }

            }
        }
        // ############################################################################

        if (hdrRowPos == -1 || sheetPos == -1) {
            throw new IllegalArgumentException("Given File: " + excelFile + " is not a Valid file");
        }

        getHeaders(excelFile, header, headerRow);

        status.setLoadedHeader();
        Thread.sleep(5000);

        int emptyRowCount = 0;

        status.setReading();
        while (rowIterator.hasNext()) {
            Row r = rowIterator.next();

            if (r == null || isEmptyRowStrm(r, header)) {
                log.warn("For file: " + excelFile + ", row: " + r.getRowNum() + " is empty");
                emptyRowCount++;
                if (emptyRowCount > MAX_EMPTY_ROW) {
                    log.warn("finished to end of excel file");
                    break;
                }
                continue;
            }
            if (r.getRowNum() % 1000 == 0) {
                // log.info("Added Row: " + r.getRowNum());
                //Thread.sleep(100);
            }
            if (r != null) {
                totalRowCounter++;
                rows.add(r);
                emptyRowCount = 0;
            }

        }
        Thread.sleep(1000);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        log.info("For file: " + excelFile + " Total Number of Rows read: " + totalRowCounter);
        status.setDone();
    }

}

`

Here is the file:
TA02_Event_basedata_Solid Tumors_v1.3.xlsx
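
Judging only from the code as posted, one possible cause (an unverified guess) is that the same InputStream is passed to open(...) on every pass of the outer for loop. Once the first pass has consumed or closed the stream, a later pass would fail with "Stream Closed". The sketch below shows the general pattern of opening a fresh stream per attempt, using only java.nio. FreshStreamSketch and its Attempt callback are hypothetical stand-ins for the workbook-opening and header-search logic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating "one fresh stream per attempt".
final class FreshStreamSketch {
    // Stand-in for "open a workbook from the stream and look for the header row".
    interface Attempt {
        boolean run(InputStream in, int attemptIndex) throws IOException;
    }

    // Returns the index of the first successful attempt, or -1 if none succeeded.
    static int retryWithFreshStreams(Path file, int maxAttempts, Attempt attempt)
            throws IOException {
        for (int i = 0; i < maxAttempts; i++) {
            // A new stream for every pass, closed again when the pass ends.
            try (InputStream in = Files.newInputStream(file)) {
                if (attempt.run(in, i)) {
                    return i;
                }
            }
        }
        return -1;
    }
}
```

Each pass then starts reading the file from the beginning, regardless of what the previous pass did to its stream.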

Way to iterate over all sheets

As it stands (and as I understand it) this tool is not useful for reading in a multi sheet workbook with an unknown number of sheets.
Is it possible to include a method to change that? Ideally getAllSheetNames(), but also getNumberOfSheets() or hasSheet(int index) would enable iterating over all sheets.

Problem with setup

Hi there! Just want to thank you for this library, it's exactly what I needed.

But to get started, can I clarify that on top of including the .jar file in the libs folder, I also need to include poi-3.13.jar from Apache POI? Otherwise, classes like Workbook can't be resolved.

While waiting for an answer, my libs folder contains the following 2 jar files.

xlsx-streamer-1.0.0 and poi-3.13

Upon running my project, I encountered the following error.

java.lang.ClassNotFoundException: Didn't find class "org.slf4j.LoggerFactory"

After adding the respective slf4j jar file according to the dependency stated in the Logging section of your README, the next error regarding log4j appears.

May I know where I went wrong? I would appreciate it if you could guide me through the proper setup and the files needed to get this library working.

Much appreciated and many thanks in advance!

Not returning empty cells

In my xlsx file I have some empty cells that are not returned; I simply get the next cell. Because of that, I cannot be sure which column is actually being returned to me.

I do not know the internal logic, but it would be better if I could specify how many columns I want returned rather than having empty cells skipped.
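
Until something like that exists, a caller can work around skipped cells by keying on each cell's column index instead of its position in the iteration. The sketch below shows the idea on plain collections: cells that are present are stored by zero-based column index, and a fixed-width row is produced with empty strings for the gaps. DenseRowSketch and toDense are hypothetical names; with the real API the index would come from Cell.getColumnIndex().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper operating on plain collections rather than POI types.
final class DenseRowSketch {
    // sparseCells maps zero-based column index -> value, for cells present in the file.
    static List<String> toDense(Map<Integer, String> sparseCells, int columnCount) {
        List<String> dense = new ArrayList<>(columnCount);
        for (int col = 0; col < columnCount; col++) {
            // Columns without a stored cell become empty strings.
            dense.add(sparseCells.getOrDefault(col, ""));
        }
        return dense;
    }
}
```

A row with cells only at columns 0 and 2 and a requested width of 4 then yields four entries, with empty strings at positions 1 and 3.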

Can't reach the cell values

I'm having a problem while trying to get cell values from a sheet. The string values of the cells are blank. I'm using the monitorjbl library as specified.

// This method updates the information of the nota object that was already persisted in the database
public void parserPlanilhasContrib(File f, Session s) throws IOException, InvalidFormatException
    {
        int n = 0;
        DAO d = new DAO();

        while(n<2)
        {
            InputStream inputStream = new FileInputStream(f); // Instantiates the InputStream that will receive the input file
            //Prepare the reader of the sheet
            StreamingReader reader = StreamingReader.builder().rowCacheSize(50).bufferSize(2048).sheetIndex(n).read(inputStream);

            if(n==0) //if it is the first sheet...
            {
                for(Row r : reader)
                {
                    int numLinha = r.getRowNum();

                    if(numLinha >= 1) //Ignores the first line, that contains the header
                    {
                        Nota nota = new Nota();

                        for(Cell c : r)
                        {
                            if(c.getColumnIndex()==0)
                            {
                                String id = c.getStringCellValue().replace(".",""); // here I get the value normally

                                int idNota = Integer.parseInt(id);

                                nota = d.pegaNota(idNota); //takes the nota that have the same id value of the cell value
                            }
                            else if(c.getColumnIndex()==5)
                            {
                                nota.setcUF(c.getStringCellValue()); // but from here, and ahead, in the next cells, the value that I get is blank

                                System.out.println("cuf: " + c.getStringCellValue()); 
                            }
                            else if(c.getColumnIndex()==6)
                            {
                                nota.setNatOp(c.getStringCellValue());

                                System.out.println("natop: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==7)
                            {
                                nota.setIndPag(c.getStringCellValue());

                                System.out.println("indpag: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==8)
                            {
                                nota.setModulo(c.getStringCellValue());

                                System.out.println("modulo: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==9)
                            {
                                nota.setdEmi(c.getStringCellValue());

                                System.out.println("demi: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==10)
                            {
                                nota.setdSaiEnt(c.getStringCellValue());

                                System.out.println("dsaient: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==11)
                            {
                                nota.setTpNF(c.getStringCellValue());

                                System.out.println("tpnf: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==12)
                            {
                                nota.setTpEmis(c.getStringCellValue());

                                System.out.println("tpemis: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==13)
                            {
                                nota.setTpAmb(c.getStringCellValue());

                                System.out.println("tpamb: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==14)
                            {
                                nota.setFinNFe(c.getStringCellValue());

                                System.out.println("finnfe: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==17)
                            {
                                nota.setJustificativa(c.getStringCellValue());

                                System.out.println("justificativa: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==18)
                            {
                                nota.setvBC(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==19)
                            {
                                nota.setvICMS(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==20)
                            {
                                nota.setvBCST(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==21)
                            {
                                nota.setvST(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==22)
                            {
                                nota.setvProd(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==23)
                            {
                                nota.setvFrete(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==24)
                            {
                                nota.setvSeg(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==25)
                            {
                                nota.setvDesc(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==26)
                            {
                                nota.setvII(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==27)
                            {
                                nota.setvIPI(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==28)
                            {
                                nota.setvPIS(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==29)
                            {
                                nota.setvCOFINS(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==30)
                            {
                                nota.setvOutro(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==31)
                            {
                                nota.setvNF(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==32)
                            {
                                nota.setvTotTrib(c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==33)
                            {
                                nota.setIndFrete(c.getStringCellValue());

                                System.out.println("indfrete: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==34)
                            {
                                nota.setxMotivo(c.getStringCellValue());

                                System.out.println("xmotivo: " + c.getStringCellValue());
                            }
                            else if(c.getColumnIndex()==35)
                            {
                                nota.setInfoAd(c.getStringCellValue());

                                System.out.println("infoad: " + c.getStringCellValue());
                            }
                        }

                        s.update(nota);
                        s.flush();
                        s.clear();
                    }
                }
            }
            if(n==1) // If this is sheet 2, which corresponds to the items
            {
               // Almost the same as the previous sheet, but for another type of object...
            }

            n++;
        }
    }

Here is an example of a line of the first sheet of my workbook. Pipes are just to separate the cells:

14.086 | 01492857000139 | 54772017000439 | NFe31150101492857000139550010004802991111007013 | 480299 | 31 | VENDA DE COMBUSTIVEL OU LUBRIFICANTE ADQUIRIDO OU RECEBIDO D | 1 | 55 | 2015-01-07 | 2015-01-07 | 1 | 1 | 1 | 1 | 0 | 2.0 | 0.00 | 0.00 | 0.00 | 0.00 | 2076.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 34.25 | 157.78 | 6.87 | 2082.87 | 0 | 0 | Autorizado o uso da NF-e | ST JA RETIDO ART. 37 INC. PART. DO ANEXO XV - ; FAVOR CONFERIR MERCADORIA NO ATO DA ENTREGA. NAO ACEITAREMOS RECLAMACOES POSTERIORES. PRAZO DE VALIDADE 8 (OITO) DIAS. REGIME ESPECIAL/PTA NR. 16.000423503-41-SEF/MG; COD.VENDEDOR: 1305 VENDEDOR: DANIEL NASCIMENTO SANTOS-1305 ROTA: 13; FORMA PAGAMENTO: BANCO DO BRASIL - COB.BANCARIA - 03/21-28-35; TRANSACAO: 1216795; BASE ST: 2.333,10 ICMS ST: 419,96; NUM.PEDIDO: 130500496 NUM. CARGA: 20246 NUM. BOX: 9

Blank cells throwing null pointer exception while reading

I am using this API, and rows that contain blank cells cause a NullPointerException. Below is the stack trace.
com.monitorjbl.xlsx.exceptions.NotSupportedException
java.lang.NullPointerException
at test.XLSXToCSVConverterStreamer.xlsx(XLSXToCSVConverterStreamer.java:67)
at test.XLSXToCSVConverterStreamer.main(XLSXToCSVConverterStreamer.java:164)

If I use Cell cell = row.getCell(i, Row.CREATE_NULL_AS_BLANK) to read empty cells as blank, I get the exception below instead. Please help me resolve this.

com.monitorjbl.xlsx.exceptions.NotSupportedException
at com.monitorjbl.xlsx.impl.StreamingRow.getCell(StreamingRow.java:108)
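
The first trace is raised in the caller's own code (XLSXToCSVConverterStreamer.java:67), which suggests, as an assumption, that a method was called on a cell reference that was null. Below is a minimal sketch of a null guard when writing a row as CSV. It uses only the JDK: the String[] stands in for the cells of one row, with null marking a cell that was never stored, and the class and method names are hypothetical. Escaping of commas and quotes inside values is left out.

```java
final class NullSafeCsvSketch {

    // Joins one row into a CSV line; a null entry stands for a cell that
    // was never stored in the sheet and is written as an empty field.
    static String toCsvLine(String[] cells) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            String value = cells[i];
            if (value != null) { // guard: skip the value, keep the field
                line.append(value);
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(new String[] {"a", null, "c"}));
    }
}
```

With POI types, the same guard would be a null check on the Cell before calling any getter on it.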

Quoted value of a formula cell

Hi! I've noticed that for a formula cell the value is returned in quotes. Is this by design? What is the reason for it? In particular, if the value is an empty string, two quote marks are returned.
SampleSum.xlsx
I get the following values with the attached file:
-4 "minus"
4 ""

StreamingRow.getLastCellNum() off-by-one and blank cells

Hi! I think getLastCellNum() may be returning a value 1 more than intended.

With two columns (A and B), it returns a value of 3. (see implementation below - it's returning size() + 1).

With a blank column - let's say we have A populated, B blank, C blank, and D populated - it's not looking for the highest valued column index, just the number of populated columns, so the result there is not correct either.

  /**
   * Gets the index of the last cell contained in this row <b>PLUS ONE</b>.
   * 
   * @return short representing the last logical cell in the row <b>PLUS ONE</b>,
   *   or -1 if the row does not contain any cells.
   */
  @Override
  public short getLastCellNum() {
    return (short) (cellMap.size() == 0 ? -1 : cellMap.size() + 1);
  }

Workaround, to get the number of cells in a row with either XLSX streaming or regular POI:

    private int numCellsInRow(Row row) {
        if (row instanceof StreamingRow) {
            List<Integer> colIndexes = new ArrayList<>(((StreamingRow) row).getCellMap().keySet());
            if (colIndexes.size() < 1) {
                return 0;
            }
            colIndexes.sort(null);
            return colIndexes.get(colIndexes.size()-1) + 1;
        }
        else {
            // HSSFRow.getLastCellNum() returns the 1-based index of the last cell, or -1
            return (row.getLastCellNum() < 1) ? 0 : row.getLastCellNum();
        }
    }
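
For comparison, here is a sketch of how the documented contract (highest populated column index plus one, or -1 for an empty row) could be computed. It is not the library's code: a TreeMap from 0-based column index to value stands in for cellMap, whose real type is an assumption here, and the class name is hypothetical.

```java
import java.util.TreeMap;

// Stand-in for StreamingRow: a sparse map from 0-based column index to cell value.
final class LastCellNumSketch {

    // Returns the highest populated column index plus one, or -1 for an empty row.
    static short lastCellNum(TreeMap<Integer, String> cellMap) {
        return (short) (cellMap.isEmpty() ? -1 : cellMap.lastKey() + 1);
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> row = new TreeMap<>();
        row.put(0, "A"); // column A populated
        row.put(3, "D"); // column D populated, B and C blank
        System.out.println(lastCellNum(row));
    }
}
```

Using the largest key rather than size() covers both cases from the report: two adjacent columns give 2, and A plus D with blanks in between give 4.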

License

Hi,

I am currently an intern at a company, and I really like your library. I would like to use it in a commercial web app, but under the license you are currently using I would be required to share the app's source code. That is not an option for us, because publishing the source would be a security issue for the site. Could you release the library under the Lesser General Public License instead? I would really appreciate it.

Thanks!

Show blank cell

When a cell is blank in the Excel file, the code ignores it and moves on to the next cell, but I need the blank cells too. How can I show blank cells?

for (Row r : reader) {
    for (Cell c : r) {
        //... show blank cell here.
    }
}

Version: 0.2.13
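
One possible approach is to walk the column indexes yourself instead of iterating over the stored cells, and to substitute a blank for every index that has no cell. The sketch below shows the idea with JDK types only: a TreeMap from 0-based column index to value stands in for a row, and the class and method names are hypothetical. With POI types the loop would call row.getCell(col) and treat a null result as blank; whether StreamingRow.getCell(int) returns null for a missing cell in 0.2.13 is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class BlankCellSketch {

    // Expands a sparse row (column index -> value) into a dense list,
    // inserting "" for every column up to the last populated one
    // that has no stored cell.
    static List<String> withBlanks(TreeMap<Integer, String> sparseRow) {
        List<String> dense = new ArrayList<>();
        if (sparseRow.isEmpty()) {
            return dense;
        }
        for (int col = 0; col <= sparseRow.lastKey(); col++) {
            dense.add(sparseRow.getOrDefault(col, ""));
        }
        return dense;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> row = new TreeMap<>();
        row.put(0, "A");
        row.put(3, "D");
        System.out.println(withBlanks(row));
    }
}
```

Trailing blank cells after the last populated column are not recoverable this way; a fixed column count taken from the header row would be needed for that.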

Support newer POI version

The latest version of POI (3.12) requires that the StreamingCell class implement a few extra methods. The required version for POI should be bumped up.

Way to get a row count before iterating over rows

Is it possible? The only way I know to get an iterator's count is to iterate over all items. Maybe it could be done by getting the last row and reading its "r" attribute, but I don't know how to do that. Any suggestions are welcome.

Thx
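
The "r" attribute idea can be sketched with the JDK's StAX parser, independent of this library. The method below scans a worksheet's XML and returns the largest "r" value seen on a <row> element. The class and method names are hypothetical, and obtaining the stream is left to the caller (an .xlsx file is a zip archive, and a sheet typically lives at a path such as xl/worksheets/sheet1.xml). It assumes every row carries an "r" attribute, and it still reads the whole sheet once, although without building any cell objects.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class RowCountSketch {

    // Scans worksheet XML and returns the largest 1-based "r" attribute
    // found on a <row> element, or 0 if the sheet has no rows.
    static int lastRowNumber(InputStream sheetXml) {
        try {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(sheetXml);
            int last = 0;
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && "row".equals(xml.getLocalName())) {
                    String r = xml.getAttributeValue(null, "r");
                    if (r != null) {
                        last = Math.max(last, Integer.parseInt(r));
                    }
                }
            }
            xml.close();
            return last;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed worksheet XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<worksheet><sheetData><row r=\"1\"/><row r=\"7\"/></sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(lastRowNumber(in));
    }
}
```

Note that the result is the number of the last row, not the count of populated rows; the two differ when the sheet has empty rows in between.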
