semantic-csv's Issues

Project status?

Is this library still under development? Last commit was over a year ago. Last release was an alpha release. No judgment or criticism intended--just trying to figure out whether it's wise to use semantic-csv in a new project or not.

Dependency bump of clojurescript to remove vulnerabilities

semantic-csv has a few vulnerable sub-dependencies flagged by Snyk. Most seem to come from an old version of ClojureScript, and bumping it will probably fix them. The following were identified:

gson 2.7 needs bumping to >2.8.9
guava 20.0 needs bumping to >24.1.1
protobuf-java 3.0.2 needs bumping to >3.16.3

The following is the dependency graph snippet for semantic-csv, obtained from clojure -Stree:

semantic-csv/semantic-csv 0.2.0
  . org.clojure/clojurescript 1.9.493
    . com.google.javascript/closure-compiler-unshaded v20170218
      . com.google.javascript/closure-compiler-externs v20170218
      . args4j/args4j 2.33
      . com.google.guava/guava 20.0 // << --- VULNERABLE
      . com.google.protobuf/protobuf-java 3.0.2 // << --- VULNERABLE
      . com.google.code.gson/gson 2.7 // << --- VULNERABLE
      . com.google.code.findbugs/jsr305 3.0.1
      . com.google.jsinterop/jsinterop-annotations 1.0.0
    . org.clojure/google-closure-library 0.0-20160609-f42b4a24
      . org.clojure/google-closure-library-third-party 0.0-20160609-f42b4a24
    X org.clojure/data.json 0.2.6 :older-version
    . org.mozilla/rhino 1.7R5
    X org.clojure/tools.reader 1.0.0-beta3 :use-top
  . clojure-csv/clojure-csv 2.0.1
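
One possible workaround until this is bumped upstream (a sketch of my own, not something from this thread): with tools.deps, top-level coordinates take precedence over transitive ones, so the flagged artifacts can be pinned directly in deps.edn. The versions below are illustrative and should be checked for compatibility.

{:deps {semantic-csv/semantic-csv         {:mvn/version "0.2.0"}
        ;; illustrative pins for the flagged transitive deps -- verify compatibility
        com.google.guava/guava            {:mvn/version "24.1.1-jre"}
        com.google.protobuf/protobuf-java {:mvn/version "3.16.3"}
        com.google.code.gson/gson         {:mvn/version "2.8.9"}}}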

Could not locate semantic_csv__init.class or semantic_csv.clj on classpath.

Hello,

I added [semantic-csv "0.1.0"] to project.clj's :dependencies and then ran lein deps. Unfortunately, lein repl complains about 'FileNotFoundException: Could not locate semantic_csv__init.class or semantic_csv.clj on classpath.' I've tried downgrading my Clojure version from 1.8.0 to 1.5.0 to no avail.

Any suggestions?

lein deps :tree shows:

 [clojure-complete "0.2.4" :exclusions [[org.clojure/clojure]]]
 [clojure-csv "2.0.1"]
 [mysql/mysql-connector-java "5.1.18"]
 [org.clojure/clojure "1.8.0"]
 [org.clojure/data.csv "0.1.3"]
 [org.clojure/data.json "0.2.6"]
 [org.clojure/java.jdbc "0.6.2-alpha3"]
 [org.clojure/tools.nrepl "0.2.12" :exclusions [[org.clojure/clojure]]]
 [semantic-csv "0.1.0"]

My project.clj file:

(defproject ganges "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.8.0"]
    [semantic-csv "0.1.0"]
    [clojure-csv/clojure-csv "2.0.1"]
    [org.clojure/data.csv "0.1.3"]
    [org.clojure/java.jdbc "0.6.2-alpha3"]
    [mysql/mysql-connector-java "5.1.18"]
    [org.clojure/data.json "0.2.6"]]
  :main ^:skip-aot ganges.core
  :target-path "target/%s"
  :profiles {:uberjar {:aot :all}})

Here is the full stacktrace:

#error {
 :cause Could not locate semantic_csv__init.class or semantic_csv.clj on classpath. Please check that namespaces with dashes use underscores in the Clojure file name.
 :via
 [{:type clojure.lang.Compiler$CompilerException
   :message java.io.FileNotFoundException: Could not locate semantic_csv__init.class or semantic_csv.clj on classpath. Please check that namespaces with dashes use underscores in the Clojure file name., compiling:(ganges/core.clj:8:1)
   :at [clojure.lang.Compiler load Compiler.java 7391]}
  {:type java.io.FileNotFoundException
   :message Could not locate semantic_csv__init.class or semantic_csv.clj on classpath. Please check that namespaces with dashes use underscores in the Clojure file name.
   :at [clojure.lang.RT load RT.java 456]}]
 :trace
 [[clojure.lang.RT load RT.java 456]
  [clojure.lang.RT load RT.java 419]
  [clojure.core$load$fn__5677 invoke core.clj 5893]
  [clojure.core$load invokeStatic core.clj 5892]
  [clojure.core$load doInvoke core.clj 5876]
  [clojure.lang.RestFn invoke RestFn.java 408]
  [clojure.core$load_one invokeStatic core.clj 5697]
  [clojure.core$load_one invoke core.clj 5692]
  [clojure.core$load_lib$fn__5626 invoke core.clj 5737]
  [clojure.core$load_lib invokeStatic core.clj 5736]
  [clojure.core$load_lib doInvoke core.clj 5717]
  [clojure.lang.RestFn applyTo RestFn.java 142]
  [clojure.core$apply invokeStatic core.clj 648]
  [clojure.core$load_libs invokeStatic core.clj 5774]
  [clojure.core$load_libs doInvoke core.clj 5758]
  [clojure.lang.RestFn applyTo RestFn.java 137]
  [clojure.core$apply invokeStatic core.clj 648]
  [clojure.core$require invokeStatic core.clj 5796]
  [clojure.core$require doInvoke core.clj 5796]
  [clojure.lang.RestFn invoke RestFn.java 408]
  [ganges.core$eval751 invokeStatic core.clj 8]
  [ganges.core$eval751 invoke core.clj 8]
  [clojure.lang.Compiler eval Compiler.java 6927]
  [clojure.lang.Compiler load Compiler.java 7379]
  [clojure.lang.RT loadResourceScript RT.java 372]
  [clojure.lang.RT loadResourceScript RT.java 363]
  [clojure.lang.RT load RT.java 453]
  [clojure.lang.RT load RT.java 419]
  [clojure.core$load$fn__5677 invoke core.clj 5893]
  [clojure.core$load invokeStatic core.clj 5892]
  [clojure.core$load doInvoke core.clj 5876]
  [clojure.lang.RestFn invoke RestFn.java 408]
  [clojure.core$load_one invokeStatic core.clj 5697]
  [clojure.core$load_one invoke core.clj 5692]
  [clojure.core$load_lib$fn__5626 invoke core.clj 5737]
  [clojure.core$load_lib invokeStatic core.clj 5736]
  [clojure.core$load_lib doInvoke core.clj 5717]
  [clojure.lang.RestFn applyTo RestFn.java 142]
  [clojure.core$apply invokeStatic core.clj 648]
  [clojure.core$load_libs invokeStatic core.clj 5774]
  [clojure.core$load_libs doInvoke core.clj 5758]
  [clojure.lang.RestFn applyTo RestFn.java 137]
  [clojure.core$apply invokeStatic core.clj 648]
  [clojure.core$require invokeStatic core.clj 5796]
  [clojure.core$require doInvoke core.clj 5796]
  [clojure.lang.RestFn invoke RestFn.java 408]
  [user$eval5 invokeStatic form-init7900099664318909146.clj 1]
  [user$eval5 invoke form-init7900099664318909146.clj 1]
  [clojure.lang.Compiler eval Compiler.java 6927]
  [clojure.lang.Compiler eval Compiler.java 6916]
  [clojure.lang.Compiler eval Compiler.java 6916]
  [clojure.lang.Compiler load Compiler.java 7379]
  [clojure.lang.Compiler loadFile Compiler.java 7317]
  [clojure.main$load_script invokeStatic main.clj 275]
  [clojure.main$init_opt invokeStatic main.clj 277]
  [clojure.main$init_opt invoke main.clj 277]
  [clojure.main$initialize invokeStatic main.clj 308]
  [clojure.main$null_opt invokeStatic main.clj 342]
  [clojure.main$null_opt invoke main.clj 339]
  [clojure.main$main invokeStatic main.clj 421]
  [clojure.main$main doInvoke main.clj 384]
  [clojure.lang.RestFn invoke RestFn.java 421]
  [clojure.lang.Var invoke Var.java 383]
  [clojure.lang.AFn applyToHelper AFn.java 156]
  [clojure.lang.Var applyTo Var.java 700]
  [clojure.main main main.java 37]]}
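
A likely cause, though it isn't confirmed in this thread: the ns form in ganges/core.clj requires semantic-csv itself rather than the library's entry namespace semantic-csv.core (the namespace used elsewhere in this tracker). A minimal sketch of a require that should resolve:

(ns ganges.core
  (:require [semantic-csv.core :as sc]))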

Option :keyify is ignored

When you specify both the :keyify and :header options to mappify, :keyify seems to be ignored: the output maps are always produced with keywordized column names.

By the way, the source code for mappify shown at http://metasoarous.github.io/semantic-csv/ works fine; it's the actual (transducer-based) implementation that has this issue.
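
A minimal reproduction sketch, assuming mappify's two-arity form that takes an options map and that all rows are treated as data when :header is supplied (both assumptions; the option names come from the report):

(require '[semantic-csv.core :as sc])

(sc/mappify {:keyify false :header ["name" "age"]}
            [["Alice" "30"]])
;; expected keys: "name" and "age" (strings, since :keyify is false)
;; observed, per the report: :name and :age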

mappify should set missing values (for rows shorter than header) to nil

I found it weird and undocumented, but with the following malformed CSV (the first two data rows have fewer commas than the header):

sepal_length,sepal_width,petal_length,petal_width,label,id,test_train
5.8,4,1.2,0.2,15
4.8,3,1.4,
4.3,3,1.1,0.1,Iris-setosa,14,train

Using the mappify function produces the following:

{:sepal_length "5.8", :sepal_width "4", :petal_length "1.2", :petal_width "0.2", :label "15"}
{:sepal_length "4.8", :sepal_width "3", :petal_length "1.4", :petal_width ""}
{:sepal_length "4.3", :sepal_width "3", :petal_length "1.1", :petal_width "0.1", :label "Iris-setosa", :id "14", :test_train "train"}

As you can see, some rows end up with fewer keys than others; the columns missing from a short row are absent entirely from the mappified results. I was expecting all rows to have the same keys, with nil values when something is missing from the CSV.

Bear in mind that using {:structs true} will produce the expected results:

{:sepal_length "5.8", :sepal_width "4", :petal_length "1.2", :petal_width "0.2", :label "15", :id nil, :test_train nil}
{:sepal_length "4.8", :sepal_width "3", :petal_length "1.4", :petal_width "", :label nil, :id nil, :test_train nil}
{:sepal_length "4.3", :sepal_width "3", :petal_length "1.1", :petal_width "0.1", :label "Iris-setosa", :id "14", :test_train "train"}

I have some other issues using structs but I will probably open another issue when I can get a reproducible environment.

I'll open a pull request with the fix I've made for this.
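
In the meantime, a user-side workaround sketch (plain Clojure, not a library feature): merge each mappified row onto a map of the header keys with nil values, so every row ends up with the same keys.

(defn pad-missing
  [header-keys rows]
  (map #(merge (zipmap header-keys (repeat nil)) %) rows))

;; usage with the header above:
;; (pad-missing [:sepal_length :sepal_width :petal_length :petal_width :label :id :test_train]
;;              mappified-rows)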

:cast-fns stack traces hide which cast-fn caused error

Set up

A CSV file titled "example.csv" that contains:

first-column,second-column
X,1

A Clojure file called example/csv.clj that contains:

(ns example.csv
  (:require [semantic-csv.core :as sc]))

(defn manually-cast-columns
  [col]
  (assoc col
         :first-column (sc/->int (:first-column col))
         :second-column (sc/->int (:second-column col))))

(defn process-with-cast-fns []
  (sc/slurp-csv "example.csv"
                :cast-fns {:first-column sc/->int
                           :second-column sc/->int}))

(defn process-manually []
  (->> (sc/slurp-csv "example.csv")
       (mapv manually-cast-columns)))

(defn -main
  [& args]
  (case (first args)
    "cast" (process-with-cast-fns)
    "man" (process-manually)))

Issue

If I have the malformed CSV file above (an "X" in a column that is supposed to be a number) and call process-with-cast-fns, it fails. The stack trace shows a NullPointerException, and the line indicating where the failure happened is just the slurp-csv call.

Caused by: java.lang.NullPointerException
	at clojure.core$partial$fn__5561.invoke(core.clj:2616)
	at clojure.lang.AFn.applyToHelper(AFn.java:154)
	at clojure.lang.RestFn.applyTo(RestFn.java:132)
	at clojure.core$apply.invokeStatic(core.clj:659)
	at clojure.core$update_in$up__6562.invoke(core.clj:6105)
	at clojure.core$update_in.invokeStatic(core.clj:6106)
	at clojure.core$update_in.doInvoke(core.clj:6092)
	at clojure.lang.RestFn.invoke(RestFn.java:445)
	at semantic_csv.impl.core$row_val_caster$fn__250.invoke(core.cljc:43)
	at clojure.core.protocols$iter_reduce.invokeStatic(protocols.clj:49)
	at clojure.core.protocols$fn__7841.invokeStatic(protocols.clj:75)
	at clojure.core.protocols$fn__7841.invoke(protocols.clj:75)
	at clojure.core.protocols$fn__7781$G__7776__7794.invoke(protocols.clj:13)
	at clojure.core$reduce.invokeStatic(core.clj:6748)
	at clojure.core$reduce.invoke(core.clj:6730)
	at semantic_csv.impl.core$cast_row.invokeStatic(core.cljc:62)
	at semantic_csv.impl.core$cast_row.doInvoke(core.cljc:46)
	at clojure.lang.RestFn.invoke(RestFn.java:521)
	at semantic_csv.transducers$cast_with$fn__333$fn__334.invoke(transducers.cljc:167)
	at semantic_csv.transducers$mappify$fn__313$fn__314.invoke(transducers.cljc:42)
	at clojure.core$filter$fn__5610$fn__5611.invoke(core.clj:2798)
	at clojure.lang.TransformerIterator.step(TransformerIterator.java:79)
	at clojure.lang.TransformerIterator.hasNext(TransformerIterator.java:97)
	at clojure.lang.RT.chunkIteratorSeq(RT.java:510)
	at clojure.core$sequence.invokeStatic(core.clj:2654)
	at clojure.core$sequence.invoke(core.clj:2639)
	at semantic_csv.core$parse_and_process.invokeStatic(core.cljc:285)
	at semantic_csv.core$parse_and_process.doInvoke(core.cljc:277)
	at clojure.lang.RestFn.invoke(RestFn.java:439)
	at clojure.core$partial$fn__5561.invoke(core.clj:2617)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.RestFn.applyTo(RestFn.java:132)
	at clojure.core$apply.invokeStatic(core.clj:657)
	at clojure.core$apply.invoke(core.clj:652)
	at semantic_csv.impl.core$apply_kwargs.invokeStatic(core.cljc:17)
	at semantic_csv.impl.core$apply_kwargs.doInvoke(core.cljc:13)
	at clojure.lang.RestFn.invoke(RestFn.java:439)
	at semantic_csv.core$slurp_csv.invokeStatic(core.cljc:305)
	at semantic_csv.core$slurp_csv.doInvoke(core.cljc:298)
	at clojure.lang.RestFn.invoke(RestFn.java:439)
	at example.csv$process_with_cast_fns.invokeStatic(core.clj:11)
	at example.csv$process_with_cast_fns.invoke(core.clj:10)
	at example.csv$_main.invokeStatic(core.clj:22)
	at example.csv$_main.doInvoke(core.clj:19)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.lang.Var.invoke(Var.java:381)
	at user$eval149.invokeStatic(form-init6465213592991168308.clj:1)
	at user$eval149.invoke(form-init6465213592991168308.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:7062)
	at clojure.lang.Compiler.eval(Compiler.java:7052)
	at clojure.lang.Compiler.load(Compiler.java:7514)
	... 12 more

If I call process-manually, it will fail as well. The error that is thrown is instead a java.lang.NumberFormatException, and the line indicating where the failure happened points directly to the correct sc/->int call.

Caused by: java.lang.NumberFormatException: For input string: "X"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.lang.Double.parseDouble(Double.java:538)
	at semantic_csv.casters$__GT_int.invokeStatic(casters.cljc:41)
	at semantic_csv.casters$__GT_int.invoke(casters.cljc:31)
	at semantic_csv.casters$__GT_int.invokeStatic(casters.cljc:38)
	at semantic_csv.casters$__GT_int.invoke(casters.cljc:31)
	at example.csv$manually_cast_columns.invokeStatic(core.clj:7)
	at example.csv$manually_cast_columns.invoke(core.clj:4)
	at clojure.core$mapv$fn__8088.invoke(core.clj:6832)
	at clojure.lang.ArrayChunk.reduce(ArrayChunk.java:58)
	at clojure.core.protocols$fn__7847.invokeStatic(protocols.clj:136)
	at clojure.core.protocols$fn__7847.invoke(protocols.clj:124)
	at clojure.core.protocols$fn__7807$G__7802__7816.invoke(protocols.clj:19)
	at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:31)
	at clojure.core.protocols$fn__7835.invokeStatic(protocols.clj:75)
	at clojure.core.protocols$fn__7835.invoke(protocols.clj:75)
	at clojure.core.protocols$fn__7781$G__7776__7794.invoke(protocols.clj:13)
	at clojure.core$reduce.invokeStatic(core.clj:6748)
	at clojure.core$mapv.invokeStatic(core.clj:6823)
	at clojure.core$mapv.invoke(core.clj:6823)
	at example.csv$process_manually.invokeStatic(core.clj:17)
	at example.csv$process_manually.invoke(core.clj:15)
	at example.csv$_main.invokeStatic(core.clj:23)
	at example.csv$_main.doInvoke(core.clj:19)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.lang.Var.invoke(Var.java:381)
	at user$eval149.invokeStatic(form-init6579145103885863466.clj:1)
	at user$eval149.invoke(form-init6579145103885863466.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:7062)
	at clojure.lang.Compiler.eval(Compiler.java:7052)
	at clojure.lang.Compiler.load(Compiler.java:7514)
	... 12 more

This is a contrived example to show the difference, but in my actual app I have 20+ columns and hundreds of rows, so figuring out exactly which column is failing is much more difficult. I suspect it's much faster to use :cast-fns, but the errors returned by the stack traces from manual casting are far more helpful, as they point directly to the function call where I attempt to cast bad input and show the offending value.

I don't know what's possible in this library, but it would be very helpful if the :cast-fns processing could surface the raw exceptions instead of swallowing them and throwing only an NPE with no indication of which cast-fn or which value it failed on.

Thanks so much.
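
In the meantime, one user-side mitigation sketch (a hypothetical helper, not part of semantic-csv, and it assumes the failure originates inside the cast-fn itself): wrap each cast-fn so a failure is re-thrown with the column name and raw value attached.

(defn reporting-cast [colname cast-fn]
  (fn [v]
    (try (cast-fn v)
         (catch Exception e
           (throw (ex-info "cast failed" {:column colname :value v} e))))))

;; usage with the setup above:
;; (sc/slurp-csv "example.csv"
;;               :cast-fns {:first-column  (reporting-cast :first-column sc/->int)
;;                          :second-column (reporting-cast :second-column sc/->int)})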

reads pdf files without errors

Just wanted to clarify if it is a wontfix. Tested with https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=2ahUKEwiS68eHqtXoAhUCJKwKHVsdBEQQFjAAegQIBRAB&url=https%3A%2F%2Fwww.uscis.gov%2Fsystem%2Ffiles_force%2Ffiles%2Fform%2Fi-864-pc.pdf%3Fdownload%3D1&usg=AOvVaw0e6nfkkBM2ozKOqWJSg_aO

(semantic-csv.core/slurp-csv "i-864-pc.pdf")
=> 
({:%PDF-1.7
%���� "2 0 obj"}
 {:%PDF-1.7
%���� "<</Filter/FlateDecode/Length 3312>>stream"}
 {:%PDF-1.7
%����
  "�[N]����\b�E�/�'@��4<'�;�8��������r�Xte�fo$PR��0ӥ?kh��h��/��M���._q$ne����]�^�H\\s-C�6տ.[�٥���lB���\"��`I!�׵.����|����ӗ��z�[˒��~E{8��>�\\阣��$�bp��\bR�l��G�E\b����_��~EpQ��b�U���"}
 {:%PDF-1.7
%����
  "㭥�a_�~ߕ��60����A�j�����M�$�X��*�3˒����WI����B�x�������l�c�d���unRd��k���z��&x�w��Y���[���E*F�C����#�PvPS 8�������sZ\\�j����LW�Qѯ��<�!���G�u�����)��E�\b���n"}

When cell is a vector, (string/join ", " cell)

Is that possible?

If I have this data:
(def data [{:id 1 :tags ["tag1" "tag2"]} {:id 2 :tags ["tag3" "tag4"]}])

I want this csv output

-------------------
| id | tags       |
-------------------
|  1 | tag1, tag2 |
-------------------
|  2 | tag3, tag4 |
-------------------
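
A sketch of how this could be done before writing, in plain Clojure rather than as a library feature (the helper name is mine):

(require '[clojure.string :as string])

(defn join-vector-cells [row]
  (into {} (map (fn [[k v]] [k (if (vector? v) (string/join ", " v) v)]) row)))

(map join-vector-cells data)
;; => ({:id 1, :tags "tag1, tag2"} {:id 2, :tags "tag3, tag4"})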

Do you accept other separators than , ?

Could not find it in the docs.

I am a complete clj-newb, otherwise I would have made a PR.

I guess I could use the regular csv-reader and mappify, but would love the slurpers and spitters to accept different (input/output) separators.

(The divine (Python) pandas library even accepts arbitrary regexes as separators, like \s+ to treat both spaces and tabs as the separator, but that might be hard to implement here, dunno...)
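
A sketch of the workaround mentioned above, assuming clojure-csv's :delimiter option (clojure-csv is already a dependency of semantic-csv) and a hypothetical tab-separated file data.tsv:

(require '[clojure.java.io :as io]
         '[clojure-csv.core :as csv]
         '[semantic-csv.core :as sc])

(with-open [rdr (io/reader "data.tsv")]
  (->> (csv/parse-csv rdr :delimiter \tab)
       sc/mappify
       doall))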

Consider adding a `csvcat` function of some sort

If you have multiple files with header/column-name data in different orders in the various files, tools like csvstack from python's csvkit fail to handle things gracefully. I already have a running sketch working in another project but there are still some questions about what this should look like in here. Also, as implemented in my project, it's not lazy, and there is possibly a little bit of thinking there about the right way to do this (some testing is required...) due to the way Clojure deals with caching lazy sequences.
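
A rough sketch of the idea only (not the running implementation mentioned above, and ignoring the laziness/caching concerns): mappified rows concatenate naturally, since map keys absorb differing column orders.

(require '[semantic-csv.core :as sc])

(defn csvcat
  "Concatenate rows from several CSV files as maps, so differing column
  orders are absorbed by the keys."
  [& filenames]
  (mapcat sc/slurp-csv filenames))

;; (csvcat "a.csv" "b.csv")   ; a.csv and b.csv are hypothetical file names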

Transducers!

It would be awesome to have the various processing functions available as transducers. Not sure what the best way of doing this is without abandoning support for older Clojure versions, but it's worth looking into. There's also some question of whether it should just use its own namespace, or whether it should do some sort of argument dispatch.
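
A usage sketch of what this could look like, assuming transducer-returning arities like those that show up in the semantic-csv.transducers stack traces elsewhere in this tracker; parsed-rows is a placeholder for already-parsed CSV row vectors:

(require '[semantic-csv.transducers :as sct])

(into []
      (comp (sct/mappify)
            (sct/cast-with {:id #(Long/parseLong %)}))
      parsed-rows)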

Keyify header names

As pointed out in the README, it would be nice to write (:more row) as opposed to (get row "more").

Create column type "sniffer"

Would look through and make a guess about what type each column should be. Should be triggered via a :sniff true flag, and be overrideable by casters specified in :cast-fns.
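
A naive sketch of what such a sniffer might guess from a column's string values (purely illustrative; nothing like this exists in the library):

(defn sniff-column
  "Guess a type keyword from a seq of string values. Purely illustrative."
  [values]
  (cond
    (every? #(re-matches #"-?\d+" %) values)         :long
    (every? #(re-matches #"-?\d+(\.\d+)?" %) values) :double
    :else                                            :string))

;; (sniff-column ["1" "2" "3"])    ;=> :long
;; (sniff-column ["1.5" "2" "3"])  ;=> :double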

Headers to lower case

I recently found myself needing and writing a function to make headers a consistent case for key lookups in a CSV I don't have control over. I'm not sure if it's something that fits into the library itself but I'm happy to submit a patch if it sounds useful.

spit-csv: write header, even if there are no rows

Hi,
I don't know if the current behaviour is the expected behaviour and my expectation is wrong, or if this is a bug :), but when I use spit-csv with a custom header (via :header) and :prepend-header true, I would expect the written file to always get a header, even if there are no rows.
The current behaviour is that if there are no rows, no header is written, even when a :header is given and :prepend-header is true.

I can write a workaround in my code, but I wonder if this should be fixed in spit-csv instead?

Add a csv writer

This should be along the lines of Python's csv.DictWriter, but also have an option for positional writing.

Should be able to do something like

(let [data [{:this "a" :that "b"}
            {:this "c" :that "d"}]
      writer (csv/writer "filename.csv" :columns [:this :that])]
  (doseq [x data]
    (csv/write! writer x)))

Positional writing could actually be triggered by not specifying :columns. Missing entries should be handled gracefully, perhaps with an option to raise an error when that happens.

Add slurp-async so cljs has a slurp analog

Will need to load a CSV parser. Papa Parse seems popular in JS land, and it's in the cljsjs repos. I tried to get it to work on another project once, but couldn't quite figure out the whole cljsjs thing (could have tried harder, I'm sure).

This should probably be defined for both clj and cljs, since why not.

(extracted from #45)

Apply process' :cast-fn to headers

We have a CSV table which contains lines (incl. the header) like:

name                                ,key

It appears that semantic-csv currently turns 'name ' into ':name ', i.e. a keyword containing trailing spaces. It would be nice if there were a way to apply some :cast-fn to the header, too, before it gets keywordized.
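
A workaround sketch in the meantime (plain Clojure, not a library option): trim the header row before mappify so the keywordized keys contain no whitespace.

(require '[clojure.string :as string]
         '[semantic-csv.core :as sc])

(defn trim-header [[header & rows]]
  (cons (map string/trim header) rows))

;; parsed-rows stands for the raw parsed rows, header row first:
;; (->> parsed-rows trim-header sc/mappify)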

Experiment with adding header metadata output of `slurp-csv` and others

Some questions about how this would work:

  • Would it just be the top level collection containers, or would it be included on individual records?
    • What are the performance implications here?
    • Would different levels be opt-in?
  • How would this compose between calls? (Metadata doesn't always come out the other end of a transform the way you expect it to)
  • Does this hint at there being a need for a higher level structure for passing information around?
  • Does the order information captured by structs make this unnecessary?

The value is that this would allow you to write out data in the columnar order it came in with.

Simpler comment filtering

Using regular expressions is perhaps a bit unnecessary most of the time. Would be nice to just specify a single character.

Row protocols?

With vectors, maps and possibly structs (see #8) as possible row types, it might be nice to have some protocols around which the various row operations can be defined, for clarity. This could even allow lists/sequences (though these would have poor performance characteristics for some operations, that could still be better than erroring out on a get somewhere).
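
A sketch of the shape such protocols might take (illustrative only; nothing like this exists in the library):

(defprotocol PRow
  (row-get [row k] "Look up a cell by column key (maps/structs) or index (vectors).")
  (row-assoc [row k v] "Set a cell by column key or index."))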

tranduce to file abstraction?

@mahinshaw I just found this note I left myself... Not sure if it still makes sense or not but here it is:

There was this chunk of code in the transducer work:

(transduce (comp (batch batch-size)
                 (map #(impl/apply-kwargs csv/write-csv % writer-opts)))
           (fn [w rowstr]
             (.write w rowstr)
             w)
           file
           vect-rows)

Should this be abstracted into something more general? Does this make any sense?

New transducers/spit-csv is broken

@mahinshaw The tests are failing for transducers/spit-csv now... having a bit of trouble figuring out why. Would you mind taking a look?

BTW, I pushed some changes to the implementation there that try to compose all of the processing into a single transducer: 7068082. The tests were failing even prior to this, though.

Add `:transform-header` option to mappify

This would just be a function applied to each of the header's column names to get a new column name. Returning nil should leave the name unaltered, so that a map could be used easily for transforming a subset of the header names. There is some question here, though, about how this would play with the :keywordify option.
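
An illustrative sketch of the proposed behaviour (not an implementation): a map works directly as the transform function, and nil results leave names untouched.

(defn transform-header [f header]
  (map #(or (f %) %) header))

(transform-header {"First Name" "first-name"} ["First Name" "Age"])
;; => ("first-name" "Age")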

Test cljs code

Would be good to have tests for the relevant cljs code using doo, cljs.test and whatever else folks use for this sort of thing. I got stuck trying to do this a couple of months ago, so help from someone who's done it before would be super appreciated.

(extracted from #45)

Compile for ClojureScript via cljc

We're assuming Clojure 1.7 now anyway to get transducers, so reader conditionals may as well come along for the ride!

For now, we'll probably leave the slurp and spit functions out, since we won't have access to clojure-csv from cljs. And async will make things a little weird. But would be nice to eventually offer something in this realm.

Add the ability to mappify without consuming first row

If you have CSV data without a header in it, you might want to specify a header manually, and not consume the first row of the actual data (since it's not a header). Right now this could be done with

(->> ...
     (cons ["the" "header" "row"])
     (mappify))

So maybe it's not something worth worrying about. Could alias cons to add-header though, so folks don't have to think about it.

Optional core.async helpers

It would be nice to have something like slurp-async (see #54) that puts each row one by one on a go channel, perhaps using the transducers and pipeline. I don't want to make a hard requirement on core.async though, so we should follow the route of mpdairy/posh's Reagent plugin, where the relevant function(s)/ns(s) are only defined if core.async is available. I don't know if this means a separate ns or not; probably best not to define one unless necessary (or unless we realize there are enough such helpers that it actually makes sense to, which I doubt will be the case, since a lot of the logic gets generalized nicely with transducers).

(extracted from #45)

More robust cast functions

Currently ->int & co. die on empty strings and nils, making them impractical for working with sparse data. Is this by design, or do you accept patches to make them more robust?
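
A workaround sketch until then (a hypothetical wrapper, not part of the library): wrap a caster so nils and blank strings pass through as nil instead of throwing.

(require '[clojure.string :as string])

(defn nil-blank
  "Wrap a caster so nils and blank strings pass through as nil."
  [cast-fn]
  (fn [v]
    (when-not (or (nil? v) (and (string? v) (string/blank? v)))
      (cast-fn v))))

;; e.g. :cast-fns {:age (nil-blank sc/->int)}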

Better cloning?

We have a really cool semantic-csv.impl.core/clone-var macro that expands to a def which refers to the original var, but copies over the docstring, arglists and everything. I don't have the patience right now to get something that does "the best we can" in making this cross-compatible with cljs, because portable cljs macros are nutty af. I did a little work in this direction though, and here are some of the things I've come up with:

This would maybe be the most conservative thing, and seems to work for the clj side at least. I don't yet know, however, whether the cljs part works (either ClojureScript JVM or ClojureScript JS). There's a good chance it may not, for the reasons mentioned in the portable cljs blog post (the reader conditional is expanding at macro compile time, so things may not work quite right there for JVM targeting JS, though JS targeting JS might).

(defmacro clone-var
  "Clone the var pointed to by fsym into current ns such that arglists, name and doc metadata are preserned."
  [fsym]
  #?(:clj
     (let [v (resolve fsym)
           m (subset-map (meta v) [:arglists :name :doc])
           m (update m :arglists (fn [arglists] (list 'quote arglists)))]
       `(def ~(vary-meta (:name m) (constantly m)) ~fsym))
     :cljs
     `(def ~(symbol (name fsym)) ~fsym)))

There's also the following, which tries to use the tricks from the above blog post, but it isn't even working for clj as is. And the blog post is kind of focused on a particular part of the general problem which is getting the vars to resolve properly.

(defmacro clone-var
  "Clone the var pointed to by fsym into current ns such that arglists, name and doc metadata are preserned."
  [fsym]
  `(if-cljs
     ~(let [v (resolve fsym)
            m (subset-map (meta v) [:arglists :name :doc])
            m (update m :arglists (fn [arglists] (list 'quote arglists)))]
        `(def ~(vary-meta (:name m) (constantly m)) ~fsym))
     `(def ~(symbol (name fsym)) ~fsym)))

I'm not really interested in optimizing this any more until we get some cljs testing set up with doo or whatever (see #56). For now I'm just going to do a top-level reader conditional that decides whether to manually call (def the-var path.to.actual/the-var) a bunch of times, or use the macro. The only downside is losing docstrings for cljs, which isn't the end of the world.

(Relates to #45)

Improved error handling within casting functions

Right now, the casting functions fail if anything goes wrong. It would be nice to handle things more gracefully. Here's what I'm thinking:

  1. By default, let exceptions raise
  2. Allow specification of an :exception-handler (fn [colname value] ...) that can do whatever it wants and return a value to be used in the output for that row/col
  3. A separate option for simply specifying a default value (like nil) to return when parsing fails
  4. Another option for leaving out columns that don't parse

Note that 3) could be accomplished with 2), and 4) could be implemented with either 2) or 3) together with a secondary filter step. As such, these are somewhat lower in priority and may have to wait.
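
A sketch of the handler shape proposed in point 2 (nothing here is implemented; the names are mine):

(defn cast-with-handler
  "Wrap cast-fn so failures are delegated to (exception-handler colname value),
  whose return value is used in the output for that row/col."
  [cast-fn exception-handler colname]
  (fn [value]
    (try (cast-fn value)
         (catch Exception _ (exception-handler colname value)))))

;; point 3 (a default value such as nil) falls out of point 2:
;; (cast-with-handler sc/->int (constantly nil) :age)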

Add function for slurping stream into core.matrix dataset

Won't be lazy. Should probably behave somewhat like mappify, assuming the first row to be the header while accepting a :header opt to be more explicit about it. Should probably also accept a seq of maps or arrays. Maybe call it datasetify?

Append mode for spit-csv function

Only makes sense for spit, since otherwise users should just use file mode to make sense of things. But for spit, would be nice if in addition to the obvious thing we could read in the existing file's header and do the right thing accordingly.

Column subsetting/removing

Removing columns is pretty easy with maps (->> ... (map #(dissoc % :this :that)) ...), but subsetting isn't a one-liner, and even removing positionally on vector rows isn't immediately obvious. So it would be nice to have some abstractions to take care of this.
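
Sketches of the operations being asked for, in plain Clojure (none of these are library functions):

(defn subset-columns [ks rows]
  (map #(select-keys % ks) rows))

(defn remove-columns [ks rows]
  (map #(apply dissoc % ks) rows))

;; positional removal for vector rows:
(defn remove-positions [idxs row]
  (let [drop? (set idxs)]
    (vec (for [[i v] (map-indexed vector row)
               :when (not (drop? i))]
           v))))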
