Comments (13)
Ah, in the code you're actually doing that.
from spark-excel.
Can you provide more detail what is happening? Do you get the corresponding column as a String or is an exception thrown or something completely different?
You might run into this line because of a not handled Excel data type: https://github.com/crealytics/spark-excel/blob/master/src/main/scala/com/crealytics/spark/excel/ExcelRelation.scala#L67
The conversion of Date
should actually be handled here: https://github.com/crealytics/spark-excel/blob/master/src/main/scala/com/crealytics/spark/excel/ExcelRelation.scala#L110
from spark-excel.
while reading a date field from excel we normally get a numeric value from POI API. So for fetching the same we need to use
if ((HSSFDateUtil.isCellDateFormatted(cell)))
and then format the cell using date formatter.
SO when i am using Your API i am getting the date field as Numeric as expeccted. So now when i tried looking into the code i found that if i pass an schema object then i would be able to get the field as date.
Schema:
StructType schema = DataTypes.createStructType(new StructField[] {DataTypes.createStructField("Created", DataTypes.DateType, true) });
But in your API i am not able to pass schema as shown in the example posted in readme.md on github.
So please provide an option to read date field.
from spark-excel.
Hi @raviuiit,
could you locally build #13 and see if that works for you?
from spark-excel.
I had the same problem, when i read a date cell to spark, What kind of StructType do I use,i get a double value,Is there any way to deal with this?
from spark-excel.
I just released 0.8.5 which should fix this. I had to drop Scala 2.10 support because a dependency does not provide Scala 2.10 builds.
I hope everybody managed to switch to 2.11 already 😉
from spark-excel.
I am trying to do this for a Date field 'Created' in Excel
StructType schema = DataTypes.createStructType(new StructField[] {DataTypes.createStructField("Created", DataTypes.DateType, true) });
Dataset<Row> df = spark.read().format("com.crealytics.spark.excel").option("location", "/home/impadmin/Downloads/TestXLS.xls") .option("sheetName", "Sheet1").option("useHeader", "true").option("treatEmptyValuesAsNulls", "true").option("inferSchema", "true") .option("addColorColumns", "false").schema(schema).load();
Dataset<Row> created = spark.sql("select Created FROM test"); teenagers.printSchema(); System.out.println(created.count()); System.out.println(created.toJSON().toJavaRDD().collect());
Now when i am trying to browse the data in dataframe, the field created is coming empty, though count is coming correct but data is empty.
Please provide an example to read date field from xls using your api.
from spark-excel.
In the code you pasted, you haven't registered df
as table test
and the teenagers
variable doesn't seem to exist.
Also, you should probably set .option("inferSchema", "false")
and try with an xlsx
file instead of an xls
one.
If that still doesn't work, could you create a code snippet that can be run from scratch and also attach the file that you're using?
from spark-excel.
I am trying to do this for a Date field 'Created' in Excel
StructType schema = DataTypes.createStructType(new StructField[] {DataTypes.createStructField("Created", DataTypes.DateType, true) });
Dataset<Row> df = spark.read().format("com.crealytics.spark.excel").option("location", "/home/impadmin/Downloads/TestXLS.xls") .option("sheetName", "Sheet1").option("useHeader", "true").option("treatEmptyValuesAsNulls", "true").option("inferSchema", "true") .option("addColorColumns", "false").schema(schema).load();
Dataset<Row> created = spark.sql("select Created FROM test"); created.printSchema(); System.out.println(created.count()); System.out.println(created.toJSON().toJavaRDD().collect());
Now when i am trying to browse the data in dataset, i am getting following exception:
Exception in thread "main" java.lang.IllegalStateException: Cannot get a NUMERIC value from a STRING cell at org.apache.poi.xssf.usermodel.XSSFCell.typeMismatch(XSSFCell.java:1062) at org.apache.poi.xssf.usermodel.XSSFCell.getNumericCellValue(XSSFCell.java:307) at com.crealytics.spark.excel.ExcelRelation.numericValue$lzycompute$1(ExcelRelation.scala:91)
AND if i am using the API without passing any schema then the field 'Created' is coming as String.
Please provide an example to read date field from xls using your api.
from spark-excel.
Tried as per your comment. Still getting the same issue.
Code and the sample .xls file
upload.zip
from spark-excel.
The file you attached has 4 columns (Name, Geo, IP, Created), the schema
you provide has only 1.
Can you try specifying all columns in StructType schema = DataTypes.createStructType(...)
?
from spark-excel.
Hmm, I don't have any idea what's going wrong here and why in that file the dates are encoded as Strings.
I'm not a POI expert and also don't have time to look into this.
I've attached a working xlsx file generated by the tests, maybe you can find out what the difference is between that file and yours.
spark_excel_integration_test974366423318328802.xlsx
from spark-excel.
@nightscape
Thanks for the prompt response. I will look into it and will let you know if found anything.
from spark-excel.
Related Issues (20)
- [BUG] spark-excel library not working as a workspace library HOT 2
- New Case on Large Number Being Captured As Scientific Notation
- [BUG] last Columns with first line value empty not being read from .xlsx HOT 3
- support spark 3.5 HOT 3
- Incorrect Data Frame creation HOT 1
- [BUG] ClassNotFoundException for 'excel.DefaultSource' while using API V2 HOT 13
- Mentioned jar for scala 2.12 does not exist HOT 2
- [BUG] <infer schema should not include the auto generated columns>
- [BUG] Spark Excel is Incompatible with AWS EMR v6.13 and higher HOT 2
- [BUG] ClassCastException: scala.Some cannot be cast to [Lorg.apache.spark.sql.catalyst.InternalRow HOT 6
- [BUG] Incorrect date formatting if I indicate sheet Spark Read Excel HOT 1
- [BUG] Excel File with Macros Detected as "Potentially" Malicious. Unable to read Excel as a result. HOT 1
- [BUG] When Read Excel Files, Several Errors Using Java HOT 2
- Error Handling for Corrupt Files in Chunk Processing HOT 1
- [BUG] No thrown exception if schema is provieded, but there is no workbook/sheet (PDF with XLSX Extension)
- [FEATURE] Optimize JAR size HOT 2
- [BUG] Cannot read files into dataframe in Databricks 13.3 LTS Runtime 3.3.0 Spark HOT 3
- Extract sheet names using pyspark HOT 3
- [BUG] Wrong place to put maxRowsInMemory
- Loading Excel with PERMISSIVE on EMR fails while it works locally (on Windows) HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from spark-excel.