Giter Club home page Giter Club logo

Comments (13)

nightscape avatar nightscape commented on July 19, 2024 1

Ah, in the code you're actually doing that.

from spark-excel.

nightscape avatar nightscape commented on July 19, 2024

Can you provide more detail what is happening? Do you get the corresponding column as a String or is an exception thrown or something completely different?
You might run into this line because of a not handled Excel data type: https://github.com/crealytics/spark-excel/blob/master/src/main/scala/com/crealytics/spark/excel/ExcelRelation.scala#L67
The conversion of Date should actually be handled here: https://github.com/crealytics/spark-excel/blob/master/src/main/scala/com/crealytics/spark/excel/ExcelRelation.scala#L110

from spark-excel.

raviuiit avatar raviuiit commented on July 19, 2024

while reading a date field from excel we normally get a numeric value from POI API. So for fetching the same we need to use
if ((HSSFDateUtil.isCellDateFormatted(cell)))
and then format the cell using date formatter.

SO when i am using Your API i am getting the date field as Numeric as expeccted. So now when i tried looking into the code i found that if i pass an schema object then i would be able to get the field as date.
Schema:
StructType schema = DataTypes.createStructType(new StructField[] {DataTypes.createStructField("Created", DataTypes.DateType, true) });

But in your API i am not able to pass schema as shown in the example posted in readme.md on github.

So please provide an option to read date field.

from spark-excel.

nightscape avatar nightscape commented on July 19, 2024

Hi @raviuiit,

could you locally build #13 and see if that works for you?

from spark-excel.

zpi1223 avatar zpi1223 commented on July 19, 2024

I had the same problem, when i read a date cell to spark, What kind of StructType do I use,i get a double value,Is there any way to deal with this?

from spark-excel.

nightscape avatar nightscape commented on July 19, 2024

I just released 0.8.5 which should fix this. I had to drop Scala 2.10 support because a dependency does not provide Scala 2.10 builds.
I hope everybody managed to switch to 2.11 already 😉

from spark-excel.

raviuiit avatar raviuiit commented on July 19, 2024

I am trying to do this for a Date field 'Created' in Excel
StructType schema = DataTypes.createStructType(new StructField[] {DataTypes.createStructField("Created", DataTypes.DateType, true) });

Dataset<Row> df = spark.read().format("com.crealytics.spark.excel").option("location", "/home/impadmin/Downloads/TestXLS.xls") .option("sheetName", "Sheet1").option("useHeader", "true").option("treatEmptyValuesAsNulls", "true").option("inferSchema", "true") .option("addColorColumns", "false").schema(schema).load();

Dataset<Row> created = spark.sql("select Created FROM test"); teenagers.printSchema(); System.out.println(created.count()); System.out.println(created.toJSON().toJavaRDD().collect());
Now when i am trying to browse the data in dataframe, the field created is coming empty, though count is coming correct but data is empty.

Please provide an example to read date field from xls using your api.

from spark-excel.

nightscape avatar nightscape commented on July 19, 2024

In the code you pasted, you haven't registered df as table test and the teenagers variable doesn't seem to exist.
Also, you should probably set .option("inferSchema", "false") and try with an xlsx file instead of an xls one.
If that still doesn't work, could you create a code snippet that can be run from scratch and also attach the file that you're using?

from spark-excel.

raviuiit avatar raviuiit commented on July 19, 2024

I am trying to do this for a Date field 'Created' in Excel
StructType schema = DataTypes.createStructType(new StructField[] {DataTypes.createStructField("Created", DataTypes.DateType, true) });

Dataset<Row> df = spark.read().format("com.crealytics.spark.excel").option("location", "/home/impadmin/Downloads/TestXLS.xls") .option("sheetName", "Sheet1").option("useHeader", "true").option("treatEmptyValuesAsNulls", "true").option("inferSchema", "true") .option("addColorColumns", "false").schema(schema).load();

Dataset<Row> created = spark.sql("select Created FROM test"); created.printSchema(); System.out.println(created.count()); System.out.println(created.toJSON().toJavaRDD().collect());

Now when i am trying to browse the data in dataset, i am getting following exception:

Exception in thread "main" java.lang.IllegalStateException: Cannot get a NUMERIC value from a STRING cell at org.apache.poi.xssf.usermodel.XSSFCell.typeMismatch(XSSFCell.java:1062) at org.apache.poi.xssf.usermodel.XSSFCell.getNumericCellValue(XSSFCell.java:307) at com.crealytics.spark.excel.ExcelRelation.numericValue$lzycompute$1(ExcelRelation.scala:91)

AND if i am using the API without passing any schema then the field 'Created' is coming as String.

Please provide an example to read date field from xls using your api.

from spark-excel.

raviuiit avatar raviuiit commented on July 19, 2024

Tried as per your comment. Still getting the same issue.
Code and the sample .xls file
upload.zip

from spark-excel.

nightscape avatar nightscape commented on July 19, 2024

The file you attached has 4 columns (Name, Geo, IP, Created), the schema you provide has only 1.
Can you try specifying all columns in StructType schema = DataTypes.createStructType(...)?

from spark-excel.

nightscape avatar nightscape commented on July 19, 2024

Hmm, I don't have any idea what's going wrong here and why in that file the dates are encoded as Strings.
I'm not a POI expert and also don't have time to look into this.
I've attached a working xlsx file generated by the tests, maybe you can find out what the difference is between that file and yours.
spark_excel_integration_test974366423318328802.xlsx

from spark-excel.

raviuiit avatar raviuiit commented on July 19, 2024

@nightscape
Thanks for the prompt response. I will look into it and will let you know if found anything.

from spark-excel.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.