Giter Club home page Giter Club logo

docjure's Introduction

Docjure

Docjure makes reading and writing Office Excel spreadsheet documents in Clojure easy.

Who is this for?

Docjure is aimed at making the basic use case of reading and writing spreadsheets easy.

If you need advanced charting, pivot tables etc., the easiest way is to build template spreadsheets with Excel and populate them with Docjure.

If you want to manipulate advanced features programatically, you are probably better off using the underlying Apache POI spreadsheet library directly or looking for another tool.

Docjure has low churn. It is very stable library with a history going back to 2009 (open-sourced in 2010).

Usage

Example: Read a Price List spreadsheet

(use 'dk.ative.docjure.spreadsheet)

;; Load a spreadsheet and read the first two columns from the
;; price list sheet:
(->> (load-workbook "spreadsheet.xlsx")
     (select-sheet "Price List")
     (select-columns {:A :name, :B :price}))

;=> [{:name "Foo Widget", :price 100}, {:name "Bar Widget", :price 200}]

Example: Read a single cell

If you want to read a single cell value, you can use the select-cell function which takes an Excel-style cell reference (A2) and returns the cell. In order to get the actual value, use read-cell

(use 'dk.ative.docjure.spreadsheet)
(read-cell
 (->> (load-workbook "spreadsheet.xlsx")
      (select-sheet "Price List")
      (select-cell "A1")))

Example: Load a Workbook from a Resource

This example loads a workbook from a named file. In the case of running in the application server, the file typically resides in the resources directory, and it's not on the caller's path. To cover this scenario, we provide the function 'load-workbook-from-resource' that takes a named resource as the parameter. After a minor modification, the same example will look like:

(->> (load-workbook-from-resource "spreadsheet.xlsx")
     (select-sheet "Price List")
     (select-columns {:A :name, :B :price}))

Example: Load a Workbook from a Stream

The function 'load-workbook' is a multimethod, and the first example takes a file name as a parameter. The overloaded version of 'load-workbook' takes an InputStream. This may be useful when uploading a workbook to the server over HTTP connection as multipart form data. In this case, the web framework passes a byte buffer, and the example should be modified as (note that you have to use 'with-open' to ensure that the stream will be closed):

(with-open [stream (clojure.java.io/input-stream bytes)]
  (->> (load-workbook stream)
       (select-sheet "Price List")
       (select-columns {:A :name, :B :price})))

Example: Create a spreadsheet

This example creates a spreadsheet with a single sheet named "Price List". It has three rows. We apply a style of yellow background colour and bold font to the top header row, then save the spreadsheet.

(use 'dk.ative.docjure.spreadsheet)

;; Create a spreadsheet and save it
(let [wb (create-workbook "Price List"
                          [["Name" "Price"]
                           ["Foo Widget" 100]
                           ["Bar Widget" 200]])
      sheet (select-sheet "Price List" wb)
      header-row (first (row-seq sheet))]
  (set-row-style! header-row (create-cell-style! wb {:background :yellow,
                                                     :font {:bold true}}))
  (save-workbook! "spreadsheet.xlsx" wb))

Example: Create a workbook with multiple sheets

This example creates a spreadsheet with multiple sheets. Simply add more sheet-name and data pairs. To create a sheet with no data, pass nil as the data argument.

(use 'dk.ative.docjure.spreadsheet)

;; Create a spreadsheet and save it
(let [wb (create-workbook "Squares"
                          [["N" "N^2"]
                           [1 1]
                           [2 4]
                           [3 9]]
                          "Cubes"
                          [["N" "N^3"]
                           [1 1]
                           [2 8]
                           [3 27]])]
   (save-workbook! "exponents.xlsx" wb))

Example: Use Excel Formulas in Clojure

Docjure allows you not only to evaluate a formula cell in a speadsheet, it also provides a way of exposing a formula in a cell as a Clojure function using the cell-fn function.

(use 'dk.ative.docjure.spreadsheet)
;; Load a speadsheet and take the first sheet, construct a function from cell A2, taking
;; A1 as input.
(def formula-from-a2 (cell-fn "A2"
                                  (first (sheet-seq (load-workbook "spreadsheet.xlsx")))
                                  "A1"))

;; Returns value of cell A2, as if value in cell A1 were 1.0
(formula-from-a2 1.0)

Example: Handling Error Cells

If the spreadsheet being read contains cells with errors the default behaviour of the library is to return a keyword representing the error as the cell value.

For example, given a spreadsheet with errors:

(use 'dk.ative.docjure.spreadsheet)

(def sample-cells (->> (load-workbook "spreadsheet.xlsx")
                       (sheet-seq)
                       (mapcat cell-seq)))

sample-cells

;=> (#<XSSFCell 15.0> #<XSSFCell NA()> #<XSSFCell 35.0> #<XSSFCell 13/0> #<XSSFCell 33.0> #<XSSFCell 96.0>)

Reading error cells, or cells that evaluate to an error (e.g. divide by zero) returns a keyword representing the type of error from read-cell.

(->> sample-cells
     (map read-cell))

;=> (15.0 :NA 35.0 :DIV0 33.0 96.0)

How you handle errors will depend on your application. You may want to replace specific errors with a default value and remove others for example:

(->> sample-cells
     (map read-cell)
     (replace {:DIV0 0.0})
     (remove keyword?))

;=> (15.0 35.0 0.0 33.0 96.0)

The following is a list of all possible error values:

#{:VALUE :DIV0 :CIRCULAR_REF :REF :NUM :NULL :FUNCTION_NOT_IMPLEMENTED :NAME :NA}

Example: Iterating over spreadsheet data

A note on sparse data

It's worth understanding a bit about the underlying structure of a spreadsheet before you start iterating over the contents.

Spreadsheets are designed to be sparse - not all rows in the spreadsheet must physically exist, and not all cells in a row must physically exist. This is how you can create data at ZZ:65535 without using huge amounts of storage.

Thus each cell can be in 3 states - with data, blank, or nonexistent (null). There's a special type CellType.BLANK for blank cells, but missing cells are just returned as nil.

Similarly rows can exist with cells, or exist but be empty, or they can not exist at all.

Prior to Docjure 1.11 the iteration functions wrapped the underlying Apache POI iterators, which skipped over missing data - this could cause surprising behaviour, especially when there were missing cells inside tabular data.

Since Docjure 1.11 iteration now returns nil values for missing rows and cells - this is a breaking change - any code that calls row-seq or cell-seq now needs to deal with possible nil values.

Iterating over rows

You can iterate over all the rows in a worksheet with row-seq:

(->> (load-workbook "test.xls")
     (select-sheet "first")
     row-seq)

This will return a sequence of org.apache.poi.usermodel.Row objects, or nil for any missing rows. You can use (remove nil? (row-seq ...) ) if you are happy to ignore missing rows, but then be aware the nth result in the sequence might not match the nth row in the spreadsheet.

Iterating over cells

You can iterate over all the cells in a row with cell-seq - this returns a sequence of org.apache.poi.usermodel.Cell objects, or nil for missing cells. Note that (read-cell nil) returns nil so it's safe to apply read-cell to the results of cell-seq

(->> (load-workbook "test.xls")
     (select-sheet "first")
     row-seq
     (remove nil?)
     (map cell-seq)
     (map #(map read-cell %)))

For example, if you run the above snippet on a sparse spreadsheet like:

First Name Middle Name Last Name
Edger Allen Poe
(missing row)
John (missing) Smith

Then it will return:

(("First Name" "Middle Name" "Last Name")
 ("Edger" "Allen" "Poe")
 ("John" nil "Smith"))

Formatting: Adjust Column Width to Contents

You can adjust the column widths to they fit the contents. This makes generated workbooks look nicer.

For example, to auto-size all the columns in all the sheets in a workbook, use this:

;; wb is a workbook
(dorun (for [sheet (sheet-seq wb)]
             (auto-size-all-columns! sheet)))

To apply auto-width to individual columns in a sheet, use the auto-size-column! function.

Automatically get the Docjure jar from Clojars

The Docjure jar is distributed on Clojars. Here you can find both release builds and snapshot builds of pre-release versions.

If you are using the Leiningen build tool just add this line to the :dependencies list in project.clj to use it:

[dk.ative/docjure "1.14.0"]

Remember to issue the 'lein deps' command to download it.

Example project.clj for using Docjure 1.17.0

(defproject some.cool/project "1.0.0-SNAPSHOT"
      :description "Spreadsheet magic using Docjure"
      :dependencies [[org.clojure/clojure "1.10.0"]
                     [dk.ative/docjure "1.17.0"]])

Installation for Contributors

You need to install the Leiningen build tool to build the library. You can get it here: Leiningen

The library uses the Apache POI library which will be downloaded by the "lein deps" command.

Then build the library:

 lein deps
 lein compile
 lein test

To run the tests on all supported Clojure versions use:

lein all test

To check for security issues use:

lein nvd check

To check for new versions of dependencies:

lein ancient

Releasing to Clojars

When releasing a version to Clojars you must provide your user-name. The password is a deployment token, not your normal password. You can generate this by logging into Clojars. These tokens have an expiration date so if it does not work, log in a check if you need a new token.

    lein deploy clojars

You also need a GPG key to sign the releases.

These also expire, and must be periodically renewed. You can check your keys and their status like this:

    gpg --list-keys

See also lein help deploy and lein help gpg.

Remember to tag releases in git. You can list the tags with git tag -n and tag with git tag -a TAGNAME -m 'Tag comment'.

Build Status

Build Status

License

Copyright (c) 2009-2022 Martin Jul

Docjure is licensed under the MIT License. See the LICENSE file for the license terms.

Docjure uses the Apache POI library, which is licensed under the Apache License v2.0.

For more information on Apache POI refer to the Apache POI web site.

Contact information

Martin Jul

Contributors

This library includes great contributions from

Thank you very much!

Honorary Mention

A special thank you also goes out to people that did not contribute code but shared their ideas, reported security issues or bugs and otherwise inspired the continuing work on the project.

docjure's People

Contributors

anttivi-solita avatar ative avatar axrs avatar bendisposto avatar cbaatz avatar cmiles74 avatar dpetranek avatar ekaitz-zarraga avatar fvides avatar harold avatar jarzka avatar jonneale avatar kornysietsma avatar lee-wonjun avatar macroz avatar madstap avatar manuelherzog avatar mauricioszabo avatar micsokoli avatar mjul avatar mva avatar naipmoro avatar nidu avatar oliverholworthy avatar orestis avatar ragnard avatar rossgibb avatar stuarth avatar trieloff avatar vijaykiran avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

docjure's Issues

Automatically use header row to read tabular sheet

I want to parse all the data from a tabular worksheet (one that has a header row) without having to type the column map for select-columns--like this:

(->> (load-workbook "file.xls")
     (select-sheet "Sheet1")
     (read-tabular-sheet))

...and get a list of maps like this:

({:arbitrary-col-name "a1" :another-header "b1"}
 {:arbitrary-col-name "a2" :another-header "b2"})

Please consider including the following read-tabular-sheet convenience function (or something like it) in docjure. I'll be happy to make a pull request if the idea suits you.

(ns read-tabular-sheet-proposal
  (:require [dk.ative.docjure.spreadsheet :refer :all]
            [camel-snake-kebab.core :as csk]))

(defn excel-col
  "Calculates Excel name for the given column index: 1 -> A, 26 -> Z, 27 -> AA"
  [i]
  (loop [d i
         c ""]
    (let [m (rem (- d 1) 26)]
      (if (> d 0)
        (recur (int(/ (- d m) 26))
               (str (char (+ 65 m)) c))
        c))))

(defn ->keys
  "Idiomatic keys from text in header row mapped to Excel col names."
  [names]
  (map #(hash-map (keyword (excel-col %1))
                  (keyword %2) )
       (take (count names) (rest (range)))
       (map csk/->kebab-case names)))

(defn col-headers
  "List of strings from the header row"
  [^org.apache.poi.hssf.usermodel.HSSFSheet ws]
  (->> (row-seq ws)
       first
       (.iterator)
       (iterator-seq)
       (map #(.toString %))))

(defn col-keys
  "Generate the key map for 'select-columns' based on the header row."
  [^org.apache.poi.hssf.usermodel.HSSFSheet ws]
  (->> (col-headers ws)
       (->keys)
       (apply merge)))

(defn read-tabular-sheet
  "Returns a Clojure map with  keys based on first row."
  [^org.apache.poi.hssf.usermodel.HSSFSheet ws]
  (rest (select-columns (col-keys ws) ws)))

Date formatting not correctly applied

Hi

There is an issue with date formatting, such that after a number of date values, the date formatting is not applied. This appears to be as a result of the workbook supporting a finite number of styles and after this is exceeded new styles not being applied. See issue described on Stack Overflow:

https://stackoverflow.com/questions/48319626/apache-poi-setting-date-format-not-always-properly-formatting-in-excel

As we can see in the function below styles are created dynamically, for each cell where the value is detected as being of a date type:

(defn apply-date-format! [^
Cell cell ^String format]
  (let [workbook (.. cell getSheet getWorkbook)
        date-style (.createCellStyle workbook)
        format-helper (.getCreationHelper workbook)]
    (.setDataFormat date-style
                    (.. format-helper createDataFormat (getFormat format)))
    (.setCellStyle cell date-style)))

The solution is to create the workbook date style once and then pass it into this function. I noticed reading some of the PRs that you're looking to use a more declarative approach, by using maps to pass around values. This is probably a good use case.

As a workaround have done as follows, not ideal, but fixes the issue:

(def workbook-style (atom nil))

(defn get-wb-style
  [workbook]
  (let [date-style (.createCellStyle workbook)
        format-helper (.getCreationHelper workbook)]
    (.setDataFormat date-style
                    (.. format-helper createDataFormat (getFormat "dd/MMM/yy;@")))
    {:style date-style :workbook workbook}))

(defn apply-date-format! [^Cell cell ^String format]
  (let [workbook (.. cell getSheet getWorkbook)]
    (when (or (nil? @workbook-style) (not= (:workbook @workbook-style) workbook))
      (reset! workbook-style (get-wb-style workbook)))
    (.setCellStyle cell (:style @workbook-style))))

(intern 'dk.ative.docjure.spreadsheet 'apply-date-format! apply-date-format!)

The alternative would have been to intern half the library to pass the style through when creating the workbook.

Hope this helpful :-)

Upgrade to Clojure 1.9

Upgrade and test with Clojure 1.9, and remember to update project.clj to run tests on all previous versions.

Reading Cell Values

Hi Martin,

Before I submit a pull request I'd like to get your thoughts on some changes so that the pull request can be merged easily.

There are two changes I'd like to propose both to fix behaviour around reading cells. From error cells and formula cells. The changes should allow you close Pull Requests #7 #8 #11 #31 and issues #21 #27.

Reading Cell Values

Error Cells

Pull requests #7, #8, and issues #21, #27 indicate that reading error cells has been a problem for some people (myself now included) in the form of an unhandled dispatch value in the multimethod read-cell.

Unhandled java.lang.IllegalArgumentException
No method in multimethod 'read-cell' for dispatch value: 5

Dispatch value 5 corresponds to the cell value CELL_TYPE_ERROR, the only value currently unhandled by the multimethod of the six cell types:

CELL_TYPE_BLANK
CELL_TYPE_BOOLEAN
CELL_TYPE_ERROR
CELL_TYPE_FORMULA
CELL_TYPE_NUMERIC
CELL_TYPE_STRING

I think it would be desirable to be able to call read-cell on any valid cell type without throwing an exception.

I would like to propose that the keyword value of the FormulaError Enumeration is returned:

i.e. one of the following:

:VALUE :DIV0 :CIRCULAR_REF :REF :NUM :NULL :FUNCTION_NOT_IMPLEMENTED :NAME :NA

Reading the value of an error cell returns a byte which can be used to find the corresponding formula error.

(defmethod read-cell Cell/CELL_TYPE_ERROR [^Cell cell]
  (keyword (.name (FormulaError/forInt (.getErrorCellValue cell)))))

(keyword (.name FormulaError/NA))
=> :NA

Formula Cells

Currently calling read-cell on a formula cell that evaluates to anything other than a numeric value will raise an exception.

Pull request #11 attempts to solve this by wrapping the body in a try-catch. Which helps with the exception, but that doesn't address the actual problem of not being able to read non-numeric formula cells. (e.g. text or error formula cells) without resulting in an IllegalStateException:

IllegalStateException Cannot get a numeric value from a error formula cell
IllegalStateException Cannot get a numeric value from a text formula cell

The DateUtil/isCellDateFormatted function is only valid for numeric type cells. So need to check this after evaluation of the formula when we know the type of cell value.

i.e. the following:

(defmethod read-cell Cell/CELL_TYPE_FORMULA   [^Cell cell]
  (let [evaluator (.. cell getSheet getWorkbook
                      getCreationHelper createFormulaEvaluator)
        cv (.evaluate evaluator cell)]
    (if (and (= Cell/CELL_TYPE_NUMERIC (.getCellType cv))
             (DateUtil/isCellDateFormatted cell))
      (.getDateCellValue cell)
      (read-cell-value cv false))))

This turns out to be the exact change someone has proposed with #31. However, that commit is littered with trailing whitespace changes in addition to the code change.

Whitespace

Pull request: #15 #16 #31 all include trailing whitespace changes that clutter the commits.

I'd like to propose a single commit to remove all existing trailing whitespace to improve clarity of future commits from contributors that forget to turn off their save/commit hooks that remove the whitespace.

— Oliver

New release needed to support auto-size-all-columns! documented in README.md

Hi, thanks for the great work.

Just wanted to make you aware that the documentation indicates that auto-size-all-columns! and other new functions are available, however there is currently no release where these functions are available. It seems that the latest release 1.18.0 is from 19. of August, but these functions were added and documented on the 27. of September. Any chance you could create a new release?

thanks

Writing on an empty cell

Hi!

I am struggling to write in an empty cell. I could not find in the documention. Here is my attempt:

(defn write [xl]
  (let [wb (docjure/load-workbook xl)
        sheet (docjure/select-sheet "Sheet1" wb)
        cell (docjure/select-cell "B6" sheet)
        wrte (docjure/set-cell! cell "Teste2")]
    (docjure/save-workbook! xlsx wb)))

In this example, the spreadsheet has a Sheet1 with no value in the cell B6.

Here was my output

       spreadsheet.clj:  240  dk.ative.docjure.spreadsheet/eval419/fn
          MultiFn.java:  234  clojure.lang.MultiFn/invoke
              AFn.java:  156  clojure.lang.AFn/applyToHelper
              AFn.java:  144  clojure.lang.AFn/applyTo
              Var.java:  705  clojure.lang.Var/applyTo
              core.clj:  665  clojure.core/apply
              core.clj:  660  clojure.core/apply
             debug.clj:  524  cider.nrepl.middleware.debug/apply-instrumented-maybe
             debug.clj:  520  cider.nrepl.middleware.debug/apply-instrumented-maybe
                  REPL:   30  ptc.core/eval7003/write
                  REPL:   35  ptc.core/eval7007
                  REPL:   35  ptc.core/eval7007
         Compiler.java: 7177  clojure.lang.Compiler/eval
         Compiler.java: 7132  clojure.lang.Compiler/eval
              core.clj: 3214  clojure.core/eval
              core.clj: 3210  clojure.core/eval
interruptible_eval.clj:   91  nrepl.middleware.interruptible-eval/evaluate/fn
              main.clj:  437  clojure.main/repl/read-eval-print/fn
              main.clj:  437  clojure.main/repl/read-eval-print
              main.clj:  458  clojure.main/repl/fn
              main.clj:  458  clojure.main/repl
              main.clj:  368  clojure.main/repl
           RestFn.java: 1523  clojure.lang.RestFn/invoke
interruptible_eval.clj:   84  nrepl.middleware.interruptible-eval/evaluate
interruptible_eval.clj:   56  nrepl.middleware.interruptible-eval/evaluate
interruptible_eval.clj:  155  nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
              AFn.java:   22  clojure.lang.AFn/run
           session.clj:  190  nrepl.middleware.session/session-exec/main-loop/fn
           session.clj:  189  nrepl.middleware.session/session-exec/main-loop
              AFn.java:   22  clojure.lang.AFn/run
           Thread.java:  834  java.lang.Thread/run

Analysing with the debugger, cell returned nil and broke on set-cell! call.

If anything else is needed from me, let me know.

Regards!

How can I change sheet to stream and save it into workbook

I have a big data to save.

So I have to loop to call add-rows! to sheet and then save the sheet to workbook

I guess the data is too big and the clojure throw OutOfMemoryError GC overhead limit exceeded Exception.

So I have to change the sheet into stream and save the workbook with outputstream .

What should I do?
Thanks.

create a workbook with multiple sheets

I'd like to be able to create a workbook with multiple sheets. I was able to get this working by re-writing the create-workbook function, would you be open to a pull request for this?

Generate documentation (on push)

It would be nice to generate a documentation with the doc comments for the code.

Ideally in a GitHub action to keep it always up to date.

A pull request would be appreciated.

read-cell date formulas as numeric

Hello. When you issue a (read-cell cell) for cell which contains formula that returns date - read-cell returns a number but not a date. Probably a check should be added in read-cell for CELL_TYPE_FORMULA.

Data format now being applied?

I'm having an issue where no matter if I specify :data-format it doesn't seem to get applied to the Excel file.

(let [workbook (excel/create-workbook "test" [[91596]])
      sheet (excel/select-sheet "test" workbook)
      cells (excel/cell-seq sheet)
      cell (nth cells 0)
      style (excel/create-cell-style! workbook {:data-format "#,##0"})]
    (excel/set-cell-style! cell style)
    (println cell) ;; #object[org.apache.poi.xssf.usermodel.XSSFCell 0x7594019e 91596.0]
    (println (-> style (.getDataFormatString))) ;; General
    (println (-> style (.getDataFormat))) ;; 0
    (excel/save-workbook! "testing.xlsx" workbook))

As you can see from the logs, it doesn't seem to be being passed down at all.

Expected Actual
91,596 91596

What am I doing wrong?

Generate workbook as a stream?

Hello,

Any tips on how to generate the workbook as a stream? From the code I see that we can deliver the complete workbook to a stream, but I'd like to have a stream of input rows and a stream of output workbook.

Suggestion - extra indirection level to support stream IO instead of just filenames

This will be useful for webapps, at least. I currently have the following defined to support this:

(defn load-workbook-from-stream
  "Load an Excel .xls or .xlsx workbook from a stream."
  [in-stream]
  (with-open [stream in-stream]
    (WorkbookFactory/create stream)))

(defn stream-workbook! 
  "Write the workbook into a stream."
  [#^Workbook workbook]
  (assert-type workbook Workbook)
  (with-open [out-stream (ByteArrayOutputStream.)]
    (.write workbook out-stream)
    (ByteArrayInputStream. (.toByteArray out-stream))))

XSSF specific features

I needed to support hex code colors instead of the existing colors in IndexedColors, but in order to get that to work in clojure I had to change the type hints of the create-cell-style! function to be XSSFWorkbook because for some reason clojure wasn't finding the method of XSSFCellStyle that accepted XSSFColor as an argument even though I had verified that the classes (and poi version) were correct.

Is it important to you that you support HSSF? or should I submit a PR with my changes?

Suggestion - read the entire workbook contents, instead of just the selected cell-names

In order to provide a complete reader, I needed this function. Maybe you'll want to use it in docjure, or write a similar one (note that I made keywords, but that's not exactly required):

(defn process-workbook [#^Workbook wb]
  "Returns a vector with maps for each sheet (a name and a vector with maps for each row, in the form [{:a1 valA1, :c2 valC2, ...}, ...]."
  (vec
   (for [sheet (sheet-seq wb)]
     (hash-map :name (sheet-name sheet)
           :rows (let [rows (vec
                 (for [row (into-seq sheet)]
                   (map read-cell row)))
               headers (map keyword (first rows))
               data (rest rows)]
               (map #(zipmap headers %) data))))))

Can we select columns by custom column name ?

Thanks for great library. I tried select-column with custom name in excel but it doesn't seem to work. Is there anyway to make it work ? I'm happy to contribute, I need some direction since I'm not familiar with the java lib.

Can't seem to modify .xls file

Hello!

I'm new to Clojure so I'm sure this is my fault but I'm just kind of too clueless to figure out what might be causing it.

Right now I have two simple functions for testing reading/writing for an excel file.

(defn a []
  (let [workbook (load-workbook "/home/ubikation/src/clojure/excel-test/products.xls")
        a1 (-> workbook (.getSheetAt 0) (.getRow 0) (.getCell 0))]
    (set-cell! a1 "foo")
    (println (.getStringCellValue a1))))

(defn z []
  (let [workbook (load-workbook "/home/ubikation/src/clojure/excel-test/products.xls")
        a1 (-> workbook (.getSheetAt 0) (.getRow 0) (.getCell 0))]
    (println (.getStringCellValue a1))))

The first one works, and prints "foo" but the second one will always return the initial value of the xls file.

Perhaps there is a non-local file bind? I'm not really sure what's going on or how to figure out what I'm doing wrong.

Sorry if this is the wrong location for this.

Read one xls, append to another - IllegalArgumentException No matching method found: setCellValue

Hi,

I'm not sure if this is a bug in my code, or not:

user=> (use 'dk.ative.docjure.spreadsheet)
nil
user=> (def destinationDoc "/Users/nathan/Desktop/Compiled.xls")

'user/destinationDoc

user=> (def dwb (load-workbook destinationDoc))

'user/dwb

user=> (def swb (load-workbook "/Users/nathan/Desktop/test.xls"))

'user/swb

user=> (def newdata (rest (row-seq (select-sheet "Data" swb))))

'user/newdata

user=> (add-rows! (select-sheet "Data" dwb) newdata)
IllegalArgumentException No matching method found: setCellValue for class org.apache.poi.hssf.usermodel.HSSFCell clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:79)

The source spreadsheet has two lines of text, the destination only has a header row. Where should I look? Or is there a more appropriate place to ask this question?

Using org.clojure/clojure "1.3.0" and dk.ative/docjure "1.5.0"

Thanks,
Nathan

Numeric text string being read as float

Hi,

Probably a newbie question:

I have a cell set as Text, say 9A5655897 then when I read with select-columns I get "9A5655897"

But if I have 91371672 then I get 9.13712672E8

Is there a way of forcing a read as a string?

I am on Mac OS if that makes a difference.

Thanks

Get index with select-columns

Hi there! Very nice lib!

Quick question: I would like to know if it is possible to get the index value with other column values. For example:


(->> (load-workbook "spreadsheet.xlsx")
     (select-sheet "Price List")
     (select-columns {:Index? :A :name, :B :price}))

;=> [{:Index? 0 :name "Foo Widget", :price 100}, {:Index? 1 :name "Bar Widget", :price 200}]

Where :Index? would be a key to reference it.

I tried with :Index with no luck.

Thx!

Error upgrading to 0.13 using java 1.8

Not sure if occurring also with Java 1.9, in case documentation should state necessity to upgrade.
The error is:

java.lang.RuntimeException: Unable to find static field: CELL_TYPE_BOOLEAN in interface org.apache.poi.ss.usermodel.Cell

Pardon the approximation, I haven't investigated this further (happy 0.12 user).

SKEW formula in Excel results in exception instead of :FUNCTION_NOT_IMPLEMENTED

I have a simple Excel worksheet that has a formula cell with the function SKEW. Screenshot below:
image

Trying to read in this Excel file will result in an exception using the following code, which is mostly straight from the README.

(defn load-xls
  "Load Excel spreadsheet and return a list of lists"
  [fname]
  (->> (xls/load-workbook fname)
       (xls/sheet-seq)
       (first)
       xls/row-seq
       (remove nil?)
       (map xls/cell-seq)
       (map #(map xls/read-cell %))))

The result is:

((1.0)Error printing return value (NotImplementedFunctionException) at org.apache.poi.ss.formula.functions.NotImplementedFunction/evaluate (NotImplementedFunction.java:40).
SKEW

I believe this should result in the cell to have the keyword value :FUNCTION_NOT_IMPLEMENTED.

Error loading workbook

clj-spelling-list-parser.core> (spreadsheet/load-workbook "wachat_20150410.xlsx")
                                                clojure.core/eval                           core.clj: 3081
                                                              ...                                         
                          clj-spelling-list-parser.core/eval26754   form-init5197475672653279465.clj:  112
                                                              ...                                         
                         dk.ative.docjure.spreadsheet/eval4923/fn                    spreadsheet.clj:   79
             dk.ative.docjure.spreadsheet/load-workbook-from-file                    spreadsheet.clj:   64
           dk.ative.docjure.spreadsheet/load-workbook-from-stream                    spreadsheet.clj:   58
               org.apache.poi.ss.usermodel.WorkbookFactory.create               WorkbookFactory.java:   87
                     org.apache.poi.openxml4j.opc.OPCPackage.open                    OPCPackage.java:  272
                   org.apache.poi.openxml4j.opc.ZipPackage.<init>                    ZipPackage.java:   88
org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>  ZipInputStreamZipEntrySource.java:   51
                        java.util.zip.ZipInputStream.getNextEntry                ZipInputStream.java:  122
                             java.util.zip.ZipInputStream.readLOC                ZipInputStream.java:  310
java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor

Error writing back time with 1/0/1900 date

Hello. Found an interesting issue. If you enter the into Excel cell time only (e.g. "3:10") - it will be converted to "1/0/1900 3:10:00 AM" (Long date format). If you read it with read-cell - you'll get "1899-12-31T21:10:00.000-00:00" (due to my timezone) date in Clojure (ha, silly Clojure doesn't know about Jan 0). So when you write it back - it writes just -1 into the cell (probably because Excel can't handle dates before 1900).

So when just reading a value and writing it back you lose some data - looks bad to me.

You could easily reproduce it by creating Excel workbook with top-left cell value "3:10" (it will be converted to date-time), then execute following:

(let [fname "path_to_file.xlsx"
      wb (load-workbook fname)
      sh (.getSheetAt wb 0)
      c-in (.. sh (getRow 0) (getCell 0))]
  (add-row! sh [(read-cell c-in)])
  (save-workbook! fname wb))

What do you think about it?

P.S.: Jan 0 is event mentioned here.

Exception on reading a value from cell with error

Hello. When i'm reading a value from a cell with an error using read-cell i get the following exception:

IllegalArgumentException No method in multimethod 'read-cell-value' for dispatch value: 5  
clojure.lang.MultiFn.getFn (MultiFn.java:160)

Apparently that's because read-cell-value doesn't handle Cell/CELL_TYPE_ERROR. Is it a bug or should i treat erroneous cells differently?

Expose quotePrefixed style attribute in `create-cell-style!`

I noticed that when outputting Excel, you shouldn't really prefix the content with a single quote e.g. '=1+2 (this was a valid strategy when outputting CSV).

Since POI/docjure will actually set the value as a string, and not as a formula, Excel will display it correctly. But, if you try to edit some text that begins with e.g. - then Excel will complain and ask you to prefix manually.

You can avoid that, by setting the quotePrefixed attribute in the cell style. It would be nice to expose that as a keyword in the create-cell-style! function.

Web Development

What you're doing here is fantastic! Are there any issues with creating | editing spreadsheets on the web? I'm converting my History of the World website (history.clmitchell.net) to a clojure/hoplon-based web app and would LOVE to use an Excel spreadsheet as my database from which to display data on the web page!

Applying styles to multiple sheets?

Does anybody have an example of how to apply styles to multiple sheets?

Basically - I need the first row of every sheet (I don't necessarily know the names of the sheets-so need to iterate) to be bolded.

I was able to modify the existing example to do this - but only in the REPL - couldn't work out how to this within a function.

I am probably going about this the wrong way - so just wanted a example to put me on the right path.

I'm creating a spreadsheet using:

(apply create-spreadsheet ...) where "..." is structure that matches what is needed by the function. (That bit works). But then I need to iterate the created sheets in the workbook - and then apply styles.

Thanks

  • Great library by the way.

Make `dk.ative.docjure.spreadsheet/create-date-format` public

Hello,

Im trying to write the code to stream a jdbc-next result, of typeIReduceInit, row by row during the reduction into a Sheet.

Got this error The maximum number of Cell Styles was exceeded. You can define up to 64000 style in a .xlsx Workbook.
Looking into this repo, I've found this PR that fixes the problem by memoizing the date creation, but only if Im using add-rows!.

(binding [xls/create-date-format (memoize xls/create-date-format)]
  (reduce (fn [_ row]
            (xls/add-row! sheet row))
          nil
          result))

I tried to do the same wrapping in my code but create-date-format is a private var and I can reference it from the outside.
I achieved what I wanted copying the code to my namespace and overriding dk.ative.docjure.spreadsheet/set-cell with (defmethod dk.ative.docjure.spreadsheet/set-cell! Date ... but I was wondering if this one could be made public

thanks a lot!

way to merge cells ?

Hi,

I'm new to the docjure lib and I need to apply a specific format over cells in a spreadsheet that I write.

In particular, I need to merge some cells like here (StackOverflow)
or at least let the contents on the first cell overflow into the next ones...

Is there a feature for this that I missed, or an example of interop with POI that I could
use ?

Thanks,

Gudesh

Very large files - Laziness ?

This is more of a question, rather than issue. I tried to read around the documents, but couldn't find much, and I am not that familiar with Apache POI .. For very large files, do we get a lazy sequence, or is the entire result-set stored in memory ?

I have used the csv libraries of clojure, and they all return lazy sequence, so I had to be careful about holding on to the file-reader until the sequence was consumed. Not sure what is the situation with this library.

CVE-2019-12415

Hi there,

using [dk.ative/docjure "1.12.0"] will bring in CVE-2019-12415, as lein-nvd would indicate.

That is fixed with [org.apache.poi/poi "4.1.1"], but one cannot pull that change without incurring into #82, so #81 would be a great start.

cc/ @manuelherzog

Possible to style a spreadsheet?

Is it possible to style a spreadsheet using docjure?

Or, would it be better to just inject into an already styled workbook / spreadsheet?

Reuse similar-looking CellStyles

A common issue when using Excel is that there is a limit of 64000 styles in an Excel document.

This means that at some point you need to reuse styles, you can't just be creating the same cell style and applying it. It gets worse if you're mutating already existing cell styles, e.g. to add a number format etc.

Apache POI has a CellUtil class that supposedly handles this, but during my testing it seemed buggy. What it does essentially is to just create a data structure out of a CellStyle, and iterate through the styles on the workbook to see if a class already exists.

Are you interested in adding a new function like (add-cell-style! cell style-map) that would do the same? Since create-cell-style! already accepts a data structure, the new function would support also generating the same data structure from an existing CellStyle.

unable to read cells with formula referring to string value

The code for read-cell first checks the date format, but during that check, POI throws because it's assuming already a numeric cell type.
It's probably rather a POI bug, but I was wondering why the date check is needed in the first place.

V2 with a more composition-friendly and configurable API

Rewrite the API to work on a Clojure map containing the POI instance rather than the POI instance itself.

This will allow us to add configuration and other state to the map as well (e.g. a cursor, the current sheet, names of the workflow (text) styles and other information).

Please use the v2 branch for this as it will not be fully backwards compatible if client code is depending on the leaky abstraction of the POI instances being passed around for state.

There is some discussion of this in Pull Request #57 .
cc: @tombooth

Write to a particular, specified cell

Looking for ways to write to an already specified cell
But didn't see anything like this
Combining add-row! and select-cell worked.
Is there a better way?

(defn add-sel-row! [n ^Sheet sheet values]
(assert-type sheet Sheet)
(let [cellref (CellReference. n)
r (.getRow cellref)
col (.getCol cellref)
row (.createRow sheet r)]
(doseq [[column-index value] (map-indexed #(list %1 %2) values)]
(set-cell! (.createCell row col) value))
row))

Output a formula

If I try to output a worksheet, how do I make the formulas work? for example:

(defn gen-excel-smple []
  (let [wb (create-workbook "Price List"
                              [["Name" "Price"]
                               ["Foo Widget" 100]
                               ["Bar Widget" 200]
                               ["Total" "=sum(B2,B3)"]])   ; <<<<<<<<<<<<< FORMULA
          sheet (select-sheet "Price List" wb)
          header-row (first (row-seq sheet))]
      (set-row-style! header-row (create-cell-style! wb {:background :yellow,
                                                         :font {:bold true}}))
      (save-workbook! "ss.xlsx" wb)))

The line with the Total how can I make =sum(B2,B3) work?

Incorrect value retrieving cell value when COUNTIFS is used

Hi Martin

We have a cell containing a formula which returns a weird result whenread-cell is called on it. I suspect this is a known POI issue, although I couldn't find any mention of it. We can work around it by using the underlying getNumericCellValue which I appreciate is probably not a great catch-all solution particularly for non-numeric formulae, but wonder whether you happen to know what's going on and whether you could point towards a fix.

We have seen this issue specifically in cases when the COUNTIFS function is being called with multiple predicates.

Given the sheet below:
workbook.xlsx

Which for reference, looks like this:

screen shot 2017-01-05 at 16 34 29

I would expect the below code snippet to return the numerical value in cell B2, which is 0:

(ns docjure-test.core
  (:require [dk.ative.docjure.spreadsheet :as docjure]))

(defn read-file [path]
  (let [sheet (->> path 
                   (docjure/load-workbook)
                   (docjure/sheet-seq)
                   first)
        cell  (docjure/select-cell "B2" sheet)]
    (docjure/read-cell cell)))

Instead, when I evaluate this code. I get 2.0

If we replace the call to read-cell with the raw getNumericCellValue, as below:

(defn read-file [path]
  (let [sheet (->> path 
                   (docjure/load-workbook)
                   (docjure/sheet-seq)
                   first)
        cell  (docjure/select-cell "B2" sheet)]
    (. cell getNumericCellValue)))

The value 0.0 is returned as expected.

If there's no simple way to fix this issue, is there any way to throw an exception in this case? It would be better for us to retrieve no value instead of the wrong value.

Thanks

Exceptions while reading XLSX

I have a confidential Excel document that I want to read data from.
It is a set of 3 document, the other two work fine.

If I just read the document as I got it, it tells me it has trouble with CELL_TYPE_ERROR. There appears to be a partial patch for this, from a few years ago.

java.lang.IllegalArgumentException: No method in multimethod 'read-cell' for dispatch value: 5
              MultiFn.java:160 clojure.lang.MultiFn.getFn
              MultiFn.java:227 clojure.lang.MultiFn.invoke
             spreadglob.clj:11 excelgraph.spreadglob/row-values[fn]
             spreadglob.clj:10 excelgraph.spreadglob/row-values[fn]
               LazySeq.java:42 clojure.lang.LazySeq.sval
               LazySeq.java:60 clojure.lang.LazySeq.seq
                   RT.java:484 clojure.lang.RT.seq
                  core.clj:133 clojure.core/seq
                 core.clj:2490 clojure.core/map[fn]
               LazySeq.java:42 clojure.lang.LazySeq.sval
               LazySeq.java:60 clojure.lang.LazySeq.seq
                   RT.java:484 clojure.lang.RT.seq
                  core.clj:133 clojure.core/seq
              protocols.clj:30 clojure.core.protocols/seq-reduce
              protocols.clj:54 clojure.core.protocols/fn
              protocols.clj:13 clojure.core.protocols/fn[fn]
                 core.clj:6177 clojure.core/reduce
                 core.clj:6229 clojure.core/into
             spreadglob.clj:18 excelgraph.spreadglob/parse-sheet[fn]
               LazySeq.java:42 clojure.lang.LazySeq.sval
               LazySeq.java:67 clojure.lang.LazySeq.seq
                   RT.java:484 clojure.lang.RT.seq
                  core.clj:133 clojure.core/seq
                 core.clj:6574 clojure.core/map-indexed[fn]
               LazySeq.java:42 clojure.lang.LazySeq.sval
               LazySeq.java:60 clojure.lang.LazySeq.seq
                   RT.java:484 clojure.lang.RT.seq
                  core.clj:133 clojure.core/seq
              protocols.clj:30 clojure.core.protocols/seq-reduce
              protocols.clj:54 clojure.core.protocols/fn
              protocols.clj:13 clojure.core.protocols/fn[fn]
                 core.clj:6177 clojure.core/reduce
           relationalize.clj:7 excelgraph.relationalize/build-index
                   core.clj:23 neoviz.core/fn
                   core.clj:94 compojure.core/make-route[fn]
                   core.clj:40 compojure.core/if-route[fn]
                   core.clj:25 compojure.core/if-method[fn]
                  core.clj:107 compojure.core/routing[fn]
                 core.clj:2443 clojure.core/some
                  core.clj:107 compojure.core/routing
               RestFn.java:139 clojure.lang.RestFn.applyTo
                  core.clj:619 clojure.core/apply
                  core.clj:112 compojure.core/routes[fn]
         keyword_params.clj:32 ring.middleware.keyword-params/wrap-keyword-params[fn]
          nested_params.clj:70 ring.middleware.nested-params/wrap-nested-params[fn]
                 params.clj:58 ring.middleware.params/wrap-params[fn]
      multipart_params.clj:107 ring.middleware.multipart-params/wrap-multipart-params[fn]
                  flash.clj:31 ring.middleware.flash/wrap-flash[fn]
                session.clj:85 ring.middleware.session/wrap-session[fn]
                   json.clj:42 ring.middleware.json/wrap-json-response[fn]
                  Var.java:415 clojure.lang.Var.invoke
                 reload.clj:18 ring.middleware.reload/wrap-reload[fn]
             stacktrace.clj:17 ring.middleware.stacktrace/wrap-stacktrace-log[fn]
             stacktrace.clj:80 ring.middleware.stacktrace/wrap-stacktrace-web[fn]
                  jetty.clj:18 ring.adapter.jetty/proxy-handler[fn]
              (Unknown Source) ring.adapter.jetty.proxy$org.eclipse.jetty.server.handler.AbstractHandler$0.handle
       HandlerWrapper.java:116 org.eclipse.jetty.server.handler.HandlerWrapper.handle
               Server.java:363 org.eclipse.jetty.server.Server.handle
AbstractHttpConnection.java:483 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest
AbstractHttpConnection.java:931 org.eclipse.jetty.server.AbstractHttpConnection.content
AbstractHttpConnection.java:992 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content
           HttpParser.java:856 org.eclipse.jetty.http.HttpParser.parseNext
           HttpParser.java:240 org.eclipse.jetty.http.HttpParser.parseAvailable
   AsyncHttpConnection.java:82 org.eclipse.jetty.server.AsyncHttpConnection.handle
SelectChannelEndPoint.java:628 org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle
 SelectChannelEndPoint.java:52 org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run
     QueuedThreadPool.java:608 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob
     QueuedThreadPool.java:543 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run
               Thread.java:724 java.lang.Thread.run

I don't run Windows, so I opened the document in LibreOffice. To make sure that LibreOffice in itself does not change things, I just hit save.

It did change. It appears that this error is due to some unsupported type of formula in Apache POI.

java.lang.RuntimeException: Unexpected ptg class (org.apache.poi.ss.formula.ptg.ArrayPtg)
          WorkbookEvaluator.java:683 org.apache.poi.ss.formula.WorkbookEvaluator.getEvalForPtg
          WorkbookEvaluator.java:527 org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula
          WorkbookEvaluator.java:288 org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny
          WorkbookEvaluator.java:230 org.apache.poi.ss.formula.WorkbookEvaluator.evaluate
       XSSFFormulaEvaluator.java:264 org.apache.poi.xssf.usermodel.XSSFFormulaEvaluator.evaluateFormulaCellValue
       XSSFFormulaEvaluator.java:117 org.apache.poi.xssf.usermodel.XSSFFormulaEvaluator.evaluate
                    (Unknown Source) sun.reflect.NativeMethodAccessorImpl.invoke0
    NativeMethodAccessorImpl.java:57 sun.reflect.NativeMethodAccessorImpl.invoke
DelegatingMethodAccessorImpl.java:43 sun.reflect.DelegatingMethodAccessorImpl.invoke
                     Method.java:606 java.lang.reflect.Method.invoke
                   Reflector.java:93 clojure.lang.Reflector.invokeMatchingMethod
                   Reflector.java:28 clojure.lang.Reflector.invokeInstanceMethod
                  spreadsheet.clj:31 dk.ative.docjure.spreadsheet/eval5881[fn]
                    MultiFn.java:227 clojure.lang.MultiFn.invoke
                   spreadglob.clj:11 excelgraph.spreadglob/row-values[fn]
                   spreadglob.clj:10 excelgraph.spreadglob/row-values[fn]
                     LazySeq.java:42 clojure.lang.LazySeq.sval
                     LazySeq.java:60 clojure.lang.LazySeq.seq
                         RT.java:484 clojure.lang.RT.seq
                        core.clj:133 clojure.core/seq
                       core.clj:2490 clojure.core/map[fn]
                     LazySeq.java:42 clojure.lang.LazySeq.sval
                     LazySeq.java:60 clojure.lang.LazySeq.seq
                         RT.java:484 clojure.lang.RT.seq
                        core.clj:133 clojure.core/seq
                    protocols.clj:30 clojure.core.protocols/seq-reduce
                    protocols.clj:54 clojure.core.protocols/fn
                    protocols.clj:13 clojure.core.protocols/fn[fn]
                       core.clj:6177 clojure.core/reduce
                       core.clj:6229 clojure.core/into
                   spreadglob.clj:18 excelgraph.spreadglob/parse-sheet[fn]
                     LazySeq.java:42 clojure.lang.LazySeq.sval
                     LazySeq.java:67 clojure.lang.LazySeq.seq
                         RT.java:484 clojure.lang.RT.seq
                        core.clj:133 clojure.core/seq
                       core.clj:6574 clojure.core/map-indexed[fn]
                     LazySeq.java:42 clojure.lang.LazySeq.sval
                     LazySeq.java:60 clojure.lang.LazySeq.seq
                         RT.java:484 clojure.lang.RT.seq
                        core.clj:133 clojure.core/seq
                    protocols.clj:30 clojure.core.protocols/seq-reduce
                    protocols.clj:54 clojure.core.protocols/fn
                    protocols.clj:13 clojure.core.protocols/fn[fn]
                       core.clj:6177 clojure.core/reduce
                 relationalize.clj:7 excelgraph.relationalize/build-index
                         core.clj:23 neoviz.core/fn
                         core.clj:94 compojure.core/make-route[fn]
                         core.clj:40 compojure.core/if-route[fn]
                         core.clj:25 compojure.core/if-method[fn]
                        core.clj:107 compojure.core/routing[fn]
                       core.clj:2443 clojure.core/some
                        core.clj:107 compojure.core/routing
                     RestFn.java:139 clojure.lang.RestFn.applyTo
                        core.clj:619 clojure.core/apply
                        core.clj:112 compojure.core/routes[fn]
               keyword_params.clj:32 ring.middleware.keyword-params/wrap-keyword-params[fn]
                nested_params.clj:70 ring.middleware.nested-params/wrap-nested-params[fn]
                       params.clj:58 ring.middleware.params/wrap-params[fn]
            multipart_params.clj:107 ring.middleware.multipart-params/wrap-multipart-params[fn]
                        flash.clj:31 ring.middleware.flash/wrap-flash[fn]
                      session.clj:85 ring.middleware.session/wrap-session[fn]
                         json.clj:42 ring.middleware.json/wrap-json-response[fn]
                        Var.java:415 clojure.lang.Var.invoke
                       reload.clj:18 ring.middleware.reload/wrap-reload[fn]
                   stacktrace.clj:17 ring.middleware.stacktrace/wrap-stacktrace-log[fn]
                   stacktrace.clj:80 ring.middleware.stacktrace/wrap-stacktrace-web[fn]
                        jetty.clj:18 ring.adapter.jetty/proxy-handler[fn]
                    (Unknown Source) ring.adapter.jetty.proxy$org.eclipse.jetty.server.handler.AbstractHandler$0.handle
             HandlerWrapper.java:116 org.eclipse.jetty.server.handler.HandlerWrapper.handle
                     Server.java:363 org.eclipse.jetty.server.Server.handle
     AbstractHttpConnection.java:483 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest
     AbstractHttpConnection.java:931 org.eclipse.jetty.server.AbstractHttpConnection.content
     AbstractHttpConnection.java:992 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content
                 HttpParser.java:856 org.eclipse.jetty.http.HttpParser.parseNext
                 HttpParser.java:240 org.eclipse.jetty.http.HttpParser.parseAvailable
         AsyncHttpConnection.java:82 org.eclipse.jetty.server.AsyncHttpConnection.handle
      SelectChannelEndPoint.java:628 org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle
       SelectChannelEndPoint.java:52 org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run
           QueuedThreadPool.java:608 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob
           QueuedThreadPool.java:543 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run
                     Thread.java:724 java.lang.Thread.run

I don't think there are any tricky formula in this document. They are just rows of data, from the looks of it. Just to be sure, I copied the whole thing, and did paste special, and pasted only text and numbers. This resulted in

org.apache.poi.ss.formula.FormulaParseException: Parse error near char 0 '' in specified formula ''. Expected cell ref or constant literal
                      FormulaParser.java:219 org.apache.poi.ss.formula.FormulaParser.expected
                     FormulaParser.java:1124 org.apache.poi.ss.formula.FormulaParser.parseSimpleFactor
                     FormulaParser.java:1079 org.apache.poi.ss.formula.FormulaParser.percentFactor
                     FormulaParser.java:1066 org.apache.poi.ss.formula.FormulaParser.powerFactor
                     FormulaParser.java:1426 org.apache.poi.ss.formula.FormulaParser.Term
                     FormulaParser.java:1526 org.apache.poi.ss.formula.FormulaParser.additiveExpression
                     FormulaParser.java:1510 org.apache.poi.ss.formula.FormulaParser.concatExpression
                     FormulaParser.java:1467 org.apache.poi.ss.formula.FormulaParser.comparisonExpression
                     FormulaParser.java:1447 org.apache.poi.ss.formula.FormulaParser.unionExpression
                     FormulaParser.java:1568 org.apache.poi.ss.formula.FormulaParser.parse
                      FormulaParser.java:176 org.apache.poi.ss.formula.FormulaParser.parse
             XSSFEvaluationWorkbook.java:148 org.apache.poi.xssf.usermodel.XSSFEvaluationWorkbook.getFormulaTokens
                  WorkbookEvaluator.java:286 org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny
                  WorkbookEvaluator.java:230 org.apache.poi.ss.formula.WorkbookEvaluator.evaluate
               XSSFFormulaEvaluator.java:264 org.apache.poi.xssf.usermodel.XSSFFormulaEvaluator.evaluateFormulaCellValue
               XSSFFormulaEvaluator.java:117 org.apache.poi.xssf.usermodel.XSSFFormulaEvaluator.evaluate
                            (Unknown Source) sun.reflect.NativeMethodAccessorImpl.invoke0
            NativeMethodAccessorImpl.java:57 sun.reflect.NativeMethodAccessorImpl.invoke
        DelegatingMethodAccessorImpl.java:43 sun.reflect.DelegatingMethodAccessorImpl.invoke
                             Method.java:606 java.lang.reflect.Method.invoke
                           Reflector.java:93 clojure.lang.Reflector.invokeMatchingMethod
                           Reflector.java:28 clojure.lang.Reflector.invokeInstanceMethod
                          spreadsheet.clj:31 dk.ative.docjure.spreadsheet/eval5881[fn]
                            MultiFn.java:227 clojure.lang.MultiFn.invoke
                           spreadglob.clj:11 excelgraph.spreadglob/row-values[fn]
                           spreadglob.clj:10 excelgraph.spreadglob/row-values[fn]
                             LazySeq.java:42 clojure.lang.LazySeq.sval
                             LazySeq.java:60 clojure.lang.LazySeq.seq
                                 RT.java:484 clojure.lang.RT.seq
                                core.clj:133 clojure.core/seq
                               core.clj:2490 clojure.core/map[fn]
                             LazySeq.java:42 clojure.lang.LazySeq.sval
                             LazySeq.java:60 clojure.lang.LazySeq.seq
                                 RT.java:484 clojure.lang.RT.seq
                                core.clj:133 clojure.core/seq
                            protocols.clj:30 clojure.core.protocols/seq-reduce
                            protocols.clj:54 clojure.core.protocols/fn
                            protocols.clj:13 clojure.core.protocols/fn[fn]
                               core.clj:6177 clojure.core/reduce
                               core.clj:6229 clojure.core/into
                           spreadglob.clj:18 excelgraph.spreadglob/parse-sheet[fn]
                             LazySeq.java:42 clojure.lang.LazySeq.sval
                             LazySeq.java:67 clojure.lang.LazySeq.seq
                                 RT.java:484 clojure.lang.RT.seq
                                core.clj:133 clojure.core/seq
                               core.clj:6574 clojure.core/map-indexed[fn]
                             LazySeq.java:42 clojure.lang.LazySeq.sval
                             LazySeq.java:60 clojure.lang.LazySeq.seq
                                 RT.java:484 clojure.lang.RT.seq
                                core.clj:133 clojure.core/seq
                            protocols.clj:30 clojure.core.protocols/seq-reduce
                            protocols.clj:54 clojure.core.protocols/fn
                            protocols.clj:13 clojure.core.protocols/fn[fn]
                               core.clj:6177 clojure.core/reduce
                         relationalize.clj:7 excelgraph.relationalize/build-index
                                 core.clj:23 neoviz.core/fn
                                 core.clj:94 compojure.core/make-route[fn]
                                 core.clj:40 compojure.core/if-route[fn]
                                 core.clj:25 compojure.core/if-method[fn]
                                core.clj:107 compojure.core/routing[fn]
                               core.clj:2443 clojure.core/some
                                core.clj:107 compojure.core/routing
                             RestFn.java:139 clojure.lang.RestFn.applyTo
                                core.clj:619 clojure.core/apply
                                core.clj:112 compojure.core/routes[fn]
                       keyword_params.clj:32 ring.middleware.keyword-params/wrap-keyword-params[fn]
                        nested_params.clj:70 ring.middleware.nested-params/wrap-nested-params[fn]
                               params.clj:58 ring.middleware.params/wrap-params[fn]
                    multipart_params.clj:107 ring.middleware.multipart-params/wrap-multipart-params[fn]
                                flash.clj:31 ring.middleware.flash/wrap-flash[fn]
                              session.clj:85 ring.middleware.session/wrap-session[fn]
                                 json.clj:42 ring.middleware.json/wrap-json-response[fn]
                                Var.java:415 clojure.lang.Var.invoke
                               reload.clj:18 ring.middleware.reload/wrap-reload[fn]
                           stacktrace.clj:17 ring.middleware.stacktrace/wrap-stacktrace-log[fn]
                           stacktrace.clj:80 ring.middleware.stacktrace/wrap-stacktrace-web[fn]
                                jetty.clj:18 ring.adapter.jetty/proxy-handler[fn]
                            (Unknown Source) ring.adapter.jetty.proxy$org.eclipse.jetty.server.handler.AbstractHandler$0.handle
                     HandlerWrapper.java:116 org.eclipse.jetty.server.handler.HandlerWrapper.handle
                             Server.java:363 org.eclipse.jetty.server.Server.handle
             AbstractHttpConnection.java:483 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest
             AbstractHttpConnection.java:931 org.eclipse.jetty.server.AbstractHttpConnection.content
             AbstractHttpConnection.java:992 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content
                         HttpParser.java:856 org.eclipse.jetty.http.HttpParser.parseNext
                         HttpParser.java:240 org.eclipse.jetty.http.HttpParser.parseAvailable
                 AsyncHttpConnection.java:82 org.eclipse.jetty.server.AsyncHttpConnection.handle
              SelectChannelEndPoint.java:628 org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle
               SelectChannelEndPoint.java:52 org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run
                   QueuedThreadPool.java:608 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob
                   QueuedThreadPool.java:543 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run
                             Thread.java:724 java.lang.Thread.run

Excel Escape Sequences

For posterity:

There is a known issue with Excel escape sequences of the form _x[0-9a-fA-F]{4}_

https://bz.apache.org/bugzilla/show_bug.cgi?id=57008

Demonstrated with this:

user> (use 'dk.ative.docjure.spreadsheet)

user> (let [wb (create-workbook "Test"
                                [["foo_x1070_bar"]])]
        (save-workbook! "/tmp/spreadsheet.xlsx" wb))

The _x1070_ gets replaced by something unprintable:

image

To prevent this the escape sequence _x005F must be prepended:

user> (let [wb (create-workbook "Test"
                                [["foo_x005F_x1070_bar"]])]
        (save-workbook! "/tmp/spreadsheet.xlsx" wb))

image

The function does the escaping, should be passed a sequence of sequences.

user> (defn spreadsheet-escape [data]
        (for [row data]
          (map #(if (string? %)
                  (clojure.string/replace
                   %
                   #"(?i)(_x[0-9a-f]{4}_)"
                   "_x005F$1")
                  %)
               row)))
#'user/spreadsheet-escape
user> (spreadsheet-escape [["foo_x1070_bar"]])
(("foo_x005F_x1070_bar"))

Blank rows/cells are omitted from sequences and from select-columns

This is basically the same as the issue my colleague nurous raised against clj-excel:
mebaran/clj-excel#5

The root problem is that iterators in POI skips blank rows and cells. As a result, a blank row won't be returned in a row-seq call, and a blank cell won't be returned in a cell-seq call. Worse, when running functions like select-columns the results can be quite wrong due to these problems.

(note that this is for truly blank cells - a spreadsheet seems to be implemented somewhere as a sparse array. Cells where you've entered data previously, and then deleted the data, may still be returned by these functions)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.