Comments (17)
I ran into the same problem.
I just want to write lots of Datasets into one file.
The experience of writing Excel from Spark is so terrible.
I love Python.
from spark-excel.
Are there any plans for this? I need this specific functionality for my use case. I have a handful of DataFrames and want to save each one as a sheet in a single Excel workbook. Perhaps "append" mode support for the DataFrameWriter is how this would be done?
We currently don't have any plans to do this ourselves, but we're open to PRs.
I guess the easiest thing would be to add an artificial sheet column to the DataFrame which, if set, determines into which sheet the data should be written.
With just "overwrite" available, how is the above solution viable? I created a column called sheet and tried it, but it just overwrites and doesn't write the output to individual sheets based on the value of sheet. Any suggestions greatly appreciated.
@sbvarre My answer was a hint at how this could be implemented, not how it currently works 🙂
I'm just publishing 0.11.0-beta1. Once it has landed on Maven Central you can give it a try using the new .mode("append") option.
This allows you to write Datasets into existing files, so you should be able to write different Datasets into different sheets by calling .write multiple times with different dataAddresses (see the corresponding section in the README).
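A minimal sketch of that pattern, assuming the beta's .mode("append") and dataAddress options behave as described above (the sheet names, DataFrames, and output path here are illustrative placeholders, not part of spark-excel):

```scala
import org.apache.spark.sql.DataFrame

// Illustrative only: df1, df2 and path are placeholders supplied by the caller.
def writeTwoSheets(df1: DataFrame, df2: DataFrame, path: String): Unit = {
  // The first write creates the workbook and the first sheet.
  df1.write
    .format("com.crealytics.spark.excel")
    .option("dataAddress", "'Sheet1'!A1")
    .option("useHeader", "true")
    .mode("overwrite")
    .save(path)

  // The second write opens the existing workbook and adds another sheet.
  df2.write
    .format("com.crealytics.spark.excel")
    .option("dataAddress", "'Sheet2'!A1")
    .option("useHeader", "true")
    .mode("append")
    .save(path)
}
```

Running this requires a Spark session and the spark-excel package on the classpath.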
Hi,
When I use .mode("append"), I get this error: java.lang.RuntimeException: Append mode is not supported by com.crealytics.spark.excel.DefaultSource
In the Scala code of com.crealytics.spark.excel.DefaultSource (here https://github.com/crealytics/spark-excel/pull/25/files ) I found that the append mode is not available :(
Do you have another solution?
Thanks
@hbenzineb did you actually use 0.11.0-beta1?
And what does #25 have to do with this?
@nightscape: thank you for your answer.
I was using 0.10.0.
I changed the Maven dependency to 0.11.0-beta1, but now I get a new error:
18/11/09 13:58:06 ERROR ApplicationMaster: User class threw exception: scala.MatchError: Map(path -> /project/cdn/dwh/tomfir/trt/Worktime2.xlsx, useheader -> true, timestampformat -> mm-dd-yyyy hh:mm:ss, dateformat -> yy-mmm-d, sheetname -> 01-Code partenaire) (of class org.apache.spark.sql.catalyst.util.CaseInsensitiveMap)
scala.MatchError: Map(path -> /project/cdn/dwh/tomfir/trt/Worktime2.xlsx, useheader -> true, timestampformat -> mm-dd-yyyy hh:mm:ss, dateformat -> yy-mmm-d, sheetname -> 01-Code partenaire) (of class org.apache.spark.sql.catalyst.util.CaseInsensitiveMap)
at com.crealytics.spark.excel.DataLocator$.apply(DataLocator.scala:52)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:59)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:471)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
at com.socgen.tomfir.spark.collect.ControlBsre$.controlAgregateBsre(ControlBsre.scala:35)
at com.socgen.tomfir.spark.collect.CollectTomfir$.main(CollectTomfir.scala:138)
at com.socgen.tomfir.spark.collect.CollectTomfir.main(CollectTomfir.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:646)
0.11.0-beta1 has a different way of addressing cells. Please read the links in my message above your first message, or the corresponding CHANGELOG.
Thanks, I'm starting to understand better :)
Which option can I use to add a new sheet to my file.xlsx and write into it starting from cell A1, please?
https://github.com/crealytics/spark-excel/blob/append_to_existing_file/README.md#write-a-dataframe-to-an-excel-file
If the sheet given in dataAddress does not exist yet, it will be created.
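Since sheets are created on demand, one way to fan a single DataFrame out into one sheet per value of a grouping column is to loop over the distinct values. This is a sketch, not an official spark-excel API; the helper name, the sheetCol parameter, and the overwrite-then-append convention are all assumptions:

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: writes one sheet per distinct value of `sheetCol`.
// The first group overwrites (creating the file), later groups append.
def writePerSheet(df: DataFrame, sheetCol: String, path: String): Unit = {
  val sheets = df.select(sheetCol).distinct().collect().map(_.getString(0))
  sheets.zipWithIndex.foreach { case (sheet, i) =>
    df.filter(df(sheetCol) === sheet)
      .drop(sheetCol)
      .write
      .format("com.crealytics.spark.excel")
      .option("dataAddress", s"'$sheet'!A1") // sheet is created if missing
      .option("useHeader", "true")
      .mode(if (i == 0) "overwrite" else "append")
      .save(path)
  }
}
```

Note that collecting distinct values and writing sequentially is slow for many sheets; it trades performance for a single-workbook output.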
Ok, I tried with this syntax:

```scala
val dfCountCdPtn = dfData.groupBy("AT138_CODE_PTN").count().alias("NB_LINES")
dfCountCdPtn.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'01-Code partenaire'!A1:B1000")
  //.option("preHeader", "Pre-header\tin\tcells\nand\trows") // Optional, default None. If set, adds rows (split by "\n") and cells (split by "\t") before the column headers.
  .option("useHeader", "true")
  //.option("dateFormat", "yy-mmm-d") // Optional, default: yy-m-d h:mm
  //.option("timestampFormat", "mm-dd-yyyy hh:mm:ss") // Optional, default: yyyy-mm-dd hh:mm:ss.000
  .mode("overwrite")
  .save("/project/cdn/dwh/tomfir/trt/MAD BSRE Quotidienne.xlsx")

// Aggregate on code Banque
val dfCountByBq = dfData.groupBy("BQ").count().alias("Nombre d'enregistrements")
dfCountByBq.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'02-Code banque'!A1:B1000")
  //.option("preHeader", "Pre-header\tin\tcells\nand\trows") // Optional, default None. If set, adds rows (split by "\n") and cells (split by "\t") before the column headers.
  .option("useHeader", "true")
  //.option("dateFormat", "yy-mmm-d") // Optional, default: yy-m-d h:mm
  //.option("timestampFormat", "mm-dd-yyyy hh:mm:ss") // Optional, default: yyyy-mm-dd hh:mm:ss.000
  .mode("Append")
  .save("/project/cdn/dwh/tomfir/trt/MAD BSRE Quotidienne.xlsx")
```
The first write is OK, but the second one throws this error: diagnostics: User class threw exception: java.lang.IllegalArgumentException: InputStream of class class org.apache.commons.compress.archivers.zip.ZipArchiveInputStream is not implementing InputStreamStatistics.
I'm using this dependency:

```xml
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-compress</artifactId>
  <version>1.18</version>
</dependency>
```
@hbenzineb this is issue #93.
Let's continue the discussion regarding this specific problem there.
I just released 0.11.0. Please have a look at the corresponding CHANGELOG and the README about the new .mode("append") feature, which allows you to write into existing files.
I am also facing the issue below:
Append mode is not supported by com.crealytics.spark.excel.DefaultSource
Using version 0.9.15
@navg223 please read the comment directly above yours 🙂