metasoarous / semantic-csv
Higher level tools for working with CSV data and files
Home Page: http://metasoarous.github.io/semantic-csv
License: Eclipse Public License 1.0
Is this library still under development? Last commit was over a year ago. Last release was an alpha release. No judgment or criticism intended--just trying to figure out whether it's wise to use semantic-csv in a new project or not.
semantic-csv has a few vulnerable sub-dependencies flagged by Snyk. Most seem to be in an old version of ClojureScript, and bumping will probably fix them. The following were identified:
gson 2.7 needs bumping to >2.8.9
guava 20.0 needs bumping to >24.1.1
protobuf-java 3.0.2 needs bumping to >3.16.3
The following is the dependency graph snippet for semantic-csv, obtained from clojure -Stree:
semantic-csv/semantic-csv 0.2.0
. org.clojure/clojurescript 1.9.493
. com.google.javascript/closure-compiler-unshaded v20170218
. com.google.javascript/closure-compiler-externs v20170218
. args4j/args4j 2.33
. com.google.guava/guava 20.0 // << --- VULNERABLE
. com.google.protobuf/protobuf-java 3.0.2 // << --- VULNERABLE
. com.google.code.gson/gson 2.7 // << --- VULNERABLE
. com.google.code.findbugs/jsr305 3.0.1
. com.google.jsinterop/jsinterop-annotations 1.0.0
. org.clojure/google-closure-library 0.0-20160609-f42b4a24
. org.clojure/google-closure-library-third-party 0.0-20160609-f42b4a24
X org.clojure/data.json 0.2.6 :older-version
. org.mozilla/rhino 1.7R5
X org.clojure/tools.reader 1.0.0-beta3 :use-top
. clojure-csv/clojure-csv 2.0.1
Hello,
I added [semantic-csv "0.1.0"] to project.clj's :dependencies and then ran lein deps. Unfortunately, lein repl complains with 'FileNotFoundException: Could not locate semantic_csv__init.class or semantic_csv.clj on classpath.' I've tried downgrading my Clojure version from 1.8.0 to 1.5.0 to no avail.
Any suggestions?
lein deps :tree shows:
[clojure-complete "0.2.4" :exclusions [[org.clojure/clojure]]]
[clojure-csv "2.0.1"]
[mysql/mysql-connector-java "5.1.18"]
[org.clojure/clojure "1.8.0"]
[org.clojure/data.csv "0.1.3"]
[org.clojure/data.json "0.2.6"]
[org.clojure/java.jdbc "0.6.2-alpha3"]
[org.clojure/tools.nrepl "0.2.12" :exclusions [[org.clojure/clojure]]]
[semantic-csv "0.1.0"]
My project.clj file:
(defproject ganges "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [semantic-csv "0.1.0"]
                 [clojure-csv/clojure-csv "2.0.1"]
                 [org.clojure/data.csv "0.1.3"]
                 [org.clojure/java.jdbc "0.6.2-alpha3"]
                 [mysql/mysql-connector-java "5.1.18"]
                 [org.clojure/data.json "0.2.6"]]
  :main ^:skip-aot ganges.core
  :target-path "target/%s"
  :profiles {:uberjar {:aot :all}})
Here is the full stacktrace:
#error {
:cause Could not locate semantic_csv__init.class or semantic_csv.clj on classpath. Please check that namespaces with dashes use underscores in the Clojure file name.
:via
[{:type clojure.lang.Compiler$CompilerException
:message java.io.FileNotFoundException: Could not locate semantic_csv__init.class or semantic_csv.clj on classpath. Please check that namespaces with dashes use underscores in the Clojure file name., compiling:(ganges/core.clj:8:1)
:at [clojure.lang.Compiler load Compiler.java 7391]}
{:type java.io.FileNotFoundException
:message Could not locate semantic_csv__init.class or semantic_csv.clj on classpath. Please check that namespaces with dashes use underscores in the Clojure file name.
:at [clojure.lang.RT load RT.java 456]}]
:trace
[[clojure.lang.RT load RT.java 456]
[clojure.lang.RT load RT.java 419]
[clojure.core$load$fn__5677 invoke core.clj 5893]
[clojure.core$load invokeStatic core.clj 5892]
[clojure.core$load doInvoke core.clj 5876]
[clojure.lang.RestFn invoke RestFn.java 408]
[clojure.core$load_one invokeStatic core.clj 5697]
[clojure.core$load_one invoke core.clj 5692]
[clojure.core$load_lib$fn__5626 invoke core.clj 5737]
[clojure.core$load_lib invokeStatic core.clj 5736]
[clojure.core$load_lib doInvoke core.clj 5717]
[clojure.lang.RestFn applyTo RestFn.java 142]
[clojure.core$apply invokeStatic core.clj 648]
[clojure.core$load_libs invokeStatic core.clj 5774]
[clojure.core$load_libs doInvoke core.clj 5758]
[clojure.lang.RestFn applyTo RestFn.java 137]
[clojure.core$apply invokeStatic core.clj 648]
[clojure.core$require invokeStatic core.clj 5796]
[clojure.core$require doInvoke core.clj 5796]
[clojure.lang.RestFn invoke RestFn.java 408]
[ganges.core$eval751 invokeStatic core.clj 8]
[ganges.core$eval751 invoke core.clj 8]
[clojure.lang.Compiler eval Compiler.java 6927]
[clojure.lang.Compiler load Compiler.java 7379]
[clojure.lang.RT loadResourceScript RT.java 372]
[clojure.lang.RT loadResourceScript RT.java 363]
[clojure.lang.RT load RT.java 453]
[clojure.lang.RT load RT.java 419]
[clojure.core$load$fn__5677 invoke core.clj 5893]
[clojure.core$load invokeStatic core.clj 5892]
[clojure.core$load doInvoke core.clj 5876]
[clojure.lang.RestFn invoke RestFn.java 408]
[clojure.core$load_one invokeStatic core.clj 5697]
[clojure.core$load_one invoke core.clj 5692]
[clojure.core$load_lib$fn__5626 invoke core.clj 5737]
[clojure.core$load_lib invokeStatic core.clj 5736]
[clojure.core$load_lib doInvoke core.clj 5717]
[clojure.lang.RestFn applyTo RestFn.java 142]
[clojure.core$apply invokeStatic core.clj 648]
[clojure.core$load_libs invokeStatic core.clj 5774]
[clojure.core$load_libs doInvoke core.clj 5758]
[clojure.lang.RestFn applyTo RestFn.java 137]
[clojure.core$apply invokeStatic core.clj 648]
[clojure.core$require invokeStatic core.clj 5796]
[clojure.core$require doInvoke core.clj 5796]
[clojure.lang.RestFn invoke RestFn.java 408]
[user$eval5 invokeStatic form-init7900099664318909146.clj 1]
[user$eval5 invoke form-init7900099664318909146.clj 1]
[clojure.lang.Compiler eval Compiler.java 6927]
[clojure.lang.Compiler eval Compiler.java 6916]
[clojure.lang.Compiler eval Compiler.java 6916]
[clojure.lang.Compiler load Compiler.java 7379]
[clojure.lang.Compiler loadFile Compiler.java 7317]
[clojure.main$load_script invokeStatic main.clj 275]
[clojure.main$init_opt invokeStatic main.clj 277]
[clojure.main$init_opt invoke main.clj 277]
[clojure.main$initialize invokeStatic main.clj 308]
[clojure.main$null_opt invokeStatic main.clj 342]
[clojure.main$null_opt invoke main.clj 339]
[clojure.main$main invokeStatic main.clj 421]
[clojure.main$main doInvoke main.clj 384]
[clojure.lang.RestFn invoke RestFn.java 421]
[clojure.lang.Var invoke Var.java 383]
[clojure.lang.AFn applyToHelper AFn.java 156]
[clojure.lang.Var applyTo Var.java 700]
[clojure.main main main.java 37]]}
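For reference, the stacktrace points at a require form at ganges/core.clj line 8, so a likely culprit is requiring semantic-csv itself rather than the library's actual namespace. A hedged sketch of the fix, assuming the public namespace is semantic-csv.core (the namespace that appears in the stack traces later in this thread):

```clojure
;; Before (fails: there is no plain `semantic-csv` namespace on the classpath):
;; (ns ganges.core
;;   (:require [semantic-csv]))

;; After (the library's functions live under semantic-csv.core):
(ns ganges.core
  (:require [semantic-csv.core :as sc]))
```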
When you specify both the :keyify and :header options to mappify, :keyify seems to be ignored: the output maps are always produced with "symbolized" column names.
Btw, the source code for mappify shown at http://metasoarous.github.io/semantic-csv/ actually works fine; it's the actual (transducer-based) implementation that has the above issue.
I found this weird and undocumented, but with the following malformed CSV (the first two lines have fewer commas than the header):
sepal_length,sepal_width,petal_length,petal_width,label,id,test_train
5.8,4,1.2,0.2,15
4.8,3,1.4,
4.3,3,1.1,0.1,Iris-setosa,14,train
Using the mappify function will produce the following:
{:sepal_length "5.8", :sepal_width "4", :petal_length "1.2", :petal_width "0.2", :label "15"}
{:sepal_length "4.8", :sepal_width "3", :petal_length "1.4", :petal_width ""}
{:sepal_length "4.3", :sepal_width "3", :petal_length "1.1", :petal_width "0.1", :label "Iris-setosa", :id "14", :test_train "train"}
As you can see, some rows are smaller than others, with the trailing keys entirely missing from the mappified results. I was expecting all rows to have the same size, with nil values when something is missing from the CSV.
Bear in mind that using {:structs true} will produce the expected results:
{:sepal_length "5.8", :sepal_width "4", :petal_length "1.2", :petal_width "0.2", :label "15", :id nil, :test_train nil}
{:sepal_length "4.8", :sepal_width "3", :petal_length "1.4", :petal_width "", :label nil, :id nil, :test_train nil}
{:sepal_length "4.3", :sepal_width "3", :petal_length "1.1", :petal_width "0.1", :label "Iris-setosa", :id "14", :test_train "train"}
I have some other issues using structs but I will probably open another issue when I can get a reproducible environment.
I'll open a pull request with the fix I've made for this.
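In the meantime, a minimal workaround sketch (an assumed helper, not part of semantic-csv): pad every data row out to the header's length before handing the rows to mappify, so short rows still produce an entry for every column.

```clojure
(defn pad-rows
  "Given [header & rows], pad each row with nils up to (count header),
  so that mappify emits a key for every column."
  [[header & rows]]
  (cons header
        (map (fn [row]
               (take (count header)
                     (concat row (repeat nil))))
             rows)))
```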
A csv file titled "example.csv" that contains:
first-column,second-column
X,1
A Clojure file called example/csv.clj that contains:
(ns example.csv
  (:require [semantic-csv.core :as sc]))

(defn manually-cast-columns
  [col]
  (assoc col
         :first-column (sc/->int (:first-column col))
         :second-column (sc/->int (:second-column col))))

(defn process-with-cast-fns []
  (sc/slurp-csv "example.csv"
                :cast-fns {:first-column sc/->int
                           :second-column sc/->int}))

(defn process-manually []
  (->> (sc/slurp-csv "example.csv")
       (mapv manually-cast-columns)))

(defn -main
  [& args]
  (case (first args)
    "cast" (process-with-cast-fns)
    "man" (process-manually)))
If I have the malformed csv file above (an "X" in a column that is supposed to be a number) and call process-with-cast-fns, it will fail. The stack trace shows a NullPointerException, and the line indicating where the failure happened is just the slurp-csv call.
Caused by: java.lang.NullPointerException
at clojure.core$partial$fn__5561.invoke(core.clj:2616)
at clojure.lang.AFn.applyToHelper(AFn.java:154)
at clojure.lang.RestFn.applyTo(RestFn.java:132)
at clojure.core$apply.invokeStatic(core.clj:659)
at clojure.core$update_in$up__6562.invoke(core.clj:6105)
at clojure.core$update_in.invokeStatic(core.clj:6106)
at clojure.core$update_in.doInvoke(core.clj:6092)
at clojure.lang.RestFn.invoke(RestFn.java:445)
at semantic_csv.impl.core$row_val_caster$fn__250.invoke(core.cljc:43)
at clojure.core.protocols$iter_reduce.invokeStatic(protocols.clj:49)
at clojure.core.protocols$fn__7841.invokeStatic(protocols.clj:75)
at clojure.core.protocols$fn__7841.invoke(protocols.clj:75)
at clojure.core.protocols$fn__7781$G__7776__7794.invoke(protocols.clj:13)
at clojure.core$reduce.invokeStatic(core.clj:6748)
at clojure.core$reduce.invoke(core.clj:6730)
at semantic_csv.impl.core$cast_row.invokeStatic(core.cljc:62)
at semantic_csv.impl.core$cast_row.doInvoke(core.cljc:46)
at clojure.lang.RestFn.invoke(RestFn.java:521)
at semantic_csv.transducers$cast_with$fn__333$fn__334.invoke(transducers.cljc:167)
at semantic_csv.transducers$mappify$fn__313$fn__314.invoke(transducers.cljc:42)
at clojure.core$filter$fn__5610$fn__5611.invoke(core.clj:2798)
at clojure.lang.TransformerIterator.step(TransformerIterator.java:79)
at clojure.lang.TransformerIterator.hasNext(TransformerIterator.java:97)
at clojure.lang.RT.chunkIteratorSeq(RT.java:510)
at clojure.core$sequence.invokeStatic(core.clj:2654)
at clojure.core$sequence.invoke(core.clj:2639)
at semantic_csv.core$parse_and_process.invokeStatic(core.cljc:285)
at semantic_csv.core$parse_and_process.doInvoke(core.cljc:277)
at clojure.lang.RestFn.invoke(RestFn.java:439)
at clojure.core$partial$fn__5561.invoke(core.clj:2617)
at clojure.lang.AFn.applyToHelper(AFn.java:156)
at clojure.lang.RestFn.applyTo(RestFn.java:132)
at clojure.core$apply.invokeStatic(core.clj:657)
at clojure.core$apply.invoke(core.clj:652)
at semantic_csv.impl.core$apply_kwargs.invokeStatic(core.cljc:17)
at semantic_csv.impl.core$apply_kwargs.doInvoke(core.cljc:13)
at clojure.lang.RestFn.invoke(RestFn.java:439)
at semantic_csv.core$slurp_csv.invokeStatic(core.cljc:305)
at semantic_csv.core$slurp_csv.doInvoke(core.cljc:298)
at clojure.lang.RestFn.invoke(RestFn.java:439)
at example.csv$process_with_cast_fns.invokeStatic(core.clj:11)
at example.csv$process_with_cast_fns.invoke(core.clj:10)
at example.csv$_main.invokeStatic(core.clj:22)
at example.csv$_main.doInvoke(core.clj:19)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.lang.Var.invoke(Var.java:381)
at user$eval149.invokeStatic(form-init6465213592991168308.clj:1)
at user$eval149.invoke(form-init6465213592991168308.clj:1)
at clojure.lang.Compiler.eval(Compiler.java:7062)
at clojure.lang.Compiler.eval(Compiler.java:7052)
at clojure.lang.Compiler.load(Compiler.java:7514)
... 12 more
If I call process-manually, it fails as well, but the error thrown is instead a java.lang.NumberFormatException, and the line indicating where the failure happened points directly to the correct sc/->int call.
Caused by: java.lang.NumberFormatException: For input string: "X"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at semantic_csv.casters$__GT_int.invokeStatic(casters.cljc:41)
at semantic_csv.casters$__GT_int.invoke(casters.cljc:31)
at semantic_csv.casters$__GT_int.invokeStatic(casters.cljc:38)
at semantic_csv.casters$__GT_int.invoke(casters.cljc:31)
at example.csv$manually_cast_columns.invokeStatic(core.clj:7)
at example.csv$manually_cast_columns.invoke(core.clj:4)
at clojure.core$mapv$fn__8088.invoke(core.clj:6832)
at clojure.lang.ArrayChunk.reduce(ArrayChunk.java:58)
at clojure.core.protocols$fn__7847.invokeStatic(protocols.clj:136)
at clojure.core.protocols$fn__7847.invoke(protocols.clj:124)
at clojure.core.protocols$fn__7807$G__7802__7816.invoke(protocols.clj:19)
at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:31)
at clojure.core.protocols$fn__7835.invokeStatic(protocols.clj:75)
at clojure.core.protocols$fn__7835.invoke(protocols.clj:75)
at clojure.core.protocols$fn__7781$G__7776__7794.invoke(protocols.clj:13)
at clojure.core$reduce.invokeStatic(core.clj:6748)
at clojure.core$mapv.invokeStatic(core.clj:6823)
at clojure.core$mapv.invoke(core.clj:6823)
at example.csv$process_manually.invokeStatic(core.clj:17)
at example.csv$process_manually.invoke(core.clj:15)
at example.csv$_main.invokeStatic(core.clj:23)
at example.csv$_main.doInvoke(core.clj:19)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.lang.Var.invoke(Var.java:381)
at user$eval149.invokeStatic(form-init6579145103885863466.clj:1)
at user$eval149.invoke(form-init6579145103885863466.clj:1)
at clojure.lang.Compiler.eval(Compiler.java:7062)
at clojure.lang.Compiler.eval(Compiler.java:7052)
at clojure.lang.Compiler.load(Compiler.java:7514)
... 12 more
This is a contrived example to show the difference, but in my actual app I have 20+ columns and hundreds of rows, so figuring out exactly which column is failing is much more difficult. I suspect it's much faster to use :cast-fns, but the errors returned by the stack traces from manual casting are far more helpful, as they point directly to the function call where I attempt to cast bad input and display the offending value.
I don't know what's possible in this library, but it would be very helpful if the :cast-fns processing could somehow surface the raw exceptions instead of swallowing them and outputting only an NPE with no indication of which cast-fn or which value it failed on.
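Until something lands in the library, a hedged workaround sketch (safe-cast is a hypothetical helper, not a semantic-csv function): wrap each cast-fn so failures report the column and offending value instead of surfacing as a bare NPE.

```clojure
(defn safe-cast
  "Wrap cast-fn f so that a failure throws an ex-info carrying the
  column name and the value that failed to cast."
  [colname f]
  (fn [v]
    (try
      (f v)
      (catch Exception e
        (throw (ex-info (str "Cast failed for column " colname
                             " on value " (pr-str v))
                        {:column colname :value v}
                        e))))))

;; Usage with the example above:
;; (sc/slurp-csv "example.csv"
;;               :cast-fns {:first-column (safe-cast :first-column sc/->int)})
```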
Thanks so much.
Just wanted to clarify if it is a wontfix. Tested with https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=2ahUKEwiS68eHqtXoAhUCJKwKHVsdBEQQFjAAegQIBRAB&url=https%3A%2F%2Fwww.uscis.gov%2Fsystem%2Ffiles_force%2Ffiles%2Fform%2Fi-864-pc.pdf%3Fdownload%3D1&usg=AOvVaw0e6nfkkBM2ozKOqWJSg_aO
(semantic-csv.core/slurp-csv "i-864-pc.pdf")
=>
({:%PDF-1.7
%���� "2 0 obj"}
{:%PDF-1.7
%���� "<</Filter/FlateDecode/Length 3312>>stream"}
{:%PDF-1.7
%����
"�[N]����\b�E�/�'@��4<'�;�8��������r�Xte�fo$PR��0ӥ?kh��h��/��M���._q$ne����]�^�H\\s-C�6տ.[�٥���lB���\"��`I!�.����|����ӗ��z�[˒��~E{8��>�\\阣��$�bp��\bR�l��G�E\b����_��~EpQ��b�U���"}
{:%PDF-1.7
%����
"㭥�a_�~ߕ��60����A�j�����M�$�X��*�3˒����WI����B�x�������l�c�d���unRd��k���z��&x�w��Y���[���E*F�C����#�PvPS 8�������sZ\\�j����LW�Qѯ��<�!���G�u�����)��E�\b���n"}
Is that possible?
If I have this data:
(def data [{:id 1 :tags ["tag1" "tag2"]} {:id 2 :tags ["tag3" "tag4"]}])
I want this csv output
-------------------
| id | tags |
-------------------
| 1 | tag1, tag2 |
-------------------
| 2 | tag3, tag4 |
-------------------
Could not find it in the docs.
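One way to get that output today, sketched under the assumption that CSV cells must be plain strings: join any collection-valued fields before writing (join-colls is a hypothetical helper, not part of semantic-csv).

```clojure
(require '[clojure.string :as str])

(defn join-colls
  "Turn collection values in a row map into comma-joined strings."
  [row]
  (into {}
        (map (fn [[k v]]
               [k (if (coll? v) (str/join ", " v) v)]))
        row))

(map join-colls [{:id 1 :tags ["tag1" "tag2"]}
                 {:id 2 :tags ["tag3" "tag4"]}])
;; => ({:id 1, :tags "tag1, tag2"} {:id 2, :tags "tag3, tag4"})
```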
I am a complete clj-newb, otherwise I would have made a PR.
I guess I could use the regular csv-reader and mappify, but would love the slurpers and spitters to accept different (input/output) separators.
(The divine (python) pandas library even accepts all regexes as separators - like \s+
to read both spaces and tabs as a separator, but that might be hard to implement here, dunno...)
If you have multiple files with header/column-name data in different orders, tools like csvstack from Python's csvkit fail to handle things gracefully. I already have a running sketch in another project, but there are still some questions about what this should look like here. Also, as implemented in my project, it's not lazy, and there is possibly a little bit of thinking to do about the right way to handle that (some testing is required...) due to the way Clojure deals with caching lazy sequences.
It would be awesome to have the various processing functions available as transducers. Not sure what the best way of doing this is without abandoning support for older Clojure versions, but it's something worth looking into. There's also the question of whether this should live in its own namespace, or do some sort of argument dispatch.
As pointed out in the README, it would be nice to write (:more row) as opposed to (get row "more"). The main advantage is that structs preserve order information, so you don't have to worry so much about the header order changing on input/output. A case of only adding information? Thus growth?
I finally found the :transform-header option but was going nuts trying to get it to work. It appears the version on Clojars is really old despite having the same version number as master. Any chance of bumping and deploying an updated version? Thanks.
Would look through and make a guess about what type each column should be. Should be triggered via a :sniff true flag, and be overrideable by casters specified in :cast-fns.
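A hedged sketch of what the sniffing could look like (sniff-caster is hypothetical, not a proposed API): inspect the string values of a column and pick the narrowest parser that matches all of them, falling back to identity.

```clojure
(defn sniff-caster
  "Guess a caster for a column from a sample of its string values."
  [values]
  (cond
    (every? #(re-matches #"-?\d+" %) values)          #(Long/parseLong %)
    (every? #(re-matches #"-?\d+(\.\d+)?" %) values)  #(Double/parseDouble %)
    :else                                             identity))

;; ((sniff-caster ["1" "2" "3"]) "42") => 42
;; ((sniff-caster ["1.5" "2"]) "1.5")  => 1.5
```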
I recently found myself needing and writing a function to make headers a consistent case for key lookups in a CSV I don't have control over. I'm not sure if it's something that fits into the library itself but I'm happy to submit a patch if it sounds useful.
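For illustration, the kind of helper described above might look like this (normalize-header-name is a hypothetical name): normalize header strings to a consistent shape before they become lookup keys.

```clojure
(require '[clojure.string :as str])

(defn normalize-header-name
  "Trim, lower-case, and hyphenate a header cell."
  [h]
  (-> h str/trim str/lower-case (str/replace #"\s+" "-")))

(normalize-header-name "  First Name ")
;; => "first-name"
```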
This actually isn't on us but rather on marginalia. I've created an issue for this here: clj-commons/marginalia#164. If this doesn't get fixed we'll need to look for alternatives.
Hi,
I don't know whether the current behaviour is the expected behaviour and my expectation is wrong, or whether this is a bug :) , but when I use spit-csv with a custom header (via :header) and :prepend-header true, I would expect the written file to always get a header, even if there are no rows.
The current behaviour is that if there are no rows, no header is written, even when a :header is given and :prepend-header is true.
I can write a workaround in my code, but I wonder if this should be fixed in spit-csv instead?
This should be along the lines of Python's csv.DictWriter, but also have an option for positional writing.
Should be able to do something like:
(let [data [{:this "a" :that "b"}
            {:this "c" :that "d"}]
      writer (csv/writer "filename.csv" :columns [:this :that])]
  (doseq [x data]
    (csv/write! writer x)))
Positional writing could actually be triggered by not specifying :columns. Missing entries should be handled gracefully, but perhaps with an option to raise when that happens.
Will need to load a csv parser. Papa parse seems popular in js land, and it's in cljsjs repos. I tried to get it to work on another project once, but couldn't quite figure out the whole cljsjs thing (could have tried harder I'm sure).
This should probably be defined for both clj and cljs, since why not.
(extracted from #45)
We have a CSV table which contains lines (incl. the header) like:
name ,key
It appears that semantic-csv currently turns 'name ' into ':name ', i.e. a keyword containing spaces. It would be nice if there were a way to apply some :cast-fn to the header, too, before it gets keywordized.
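A workaround until such an option exists, sketched with a hypothetical trim-header helper: clean the header row before mappify ever sees it (assuming mappify's default keywordization).

```clojure
(require '[clojure.string :as str]
         '[semantic-csv.core :as sc])

(defn trim-header
  "Trim whitespace from every cell of the header row."
  [[header & rows]]
  (cons (map str/trim header) rows))

(->> [["name " "key"] ["a" "b"]]
     trim-header
     sc/mappify)
;; => ({:name "a", :key "b"})
```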
Some questions about how this would work:
Value is that this would allow you to write out data in the columnar order it came in with.
Using regular expressions is perhaps a bit unnecessary most of the time. Would be nice to just specify a single character.
With vectors, maps and possibly structs (see #8) as possible row types, it might be nice to have some protocols around which the various row operations can be defined for clarity. With this we could even allow lists/sequences (though these would have poor performance characteristics for some things, it could be better than erroring out on a get somewhere).
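The protocol idea could be sketched roughly like this (RowOps and row-get are hypothetical names, just to make the shape concrete): one lookup operation, dispatched on row type.

```clojure
(defprotocol RowOps
  (row-get [row k] "Look up a column value by key (maps) or index (vectors)."))

(extend-protocol RowOps
  clojure.lang.IPersistentMap
  (row-get [row k] (get row k))
  clojure.lang.IPersistentVector
  (row-get [row k] (nth row k nil)))

;; (row-get {:a 1} :a)   => 1
;; (row-get ["x" "y"] 1) => "y"
```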
@mahinshaw I just found this note I left myself... Not sure if it still makes sense or not but here it is:
There was this chunk of code in the transducer work:
(transduce (comp (batch batch-size)
                 (map #(impl/apply-kwargs csv/write-csv % writer-opts)))
           (fn [w rowstr]
             (.write w rowstr)
             w)
           file
           vect-rows)))))
Should this be abstracted into something more general? Does this make any sense?
@mahinshaw The tests are failing for transducer/spit-csv
now... having a bit of trouble figuring out why. Would you mind taking a look?
BTW, I pushed some changes to the implementation there that try to compose all of the processing into a single transducer: 7068082. The tests were failing even prior to this, though.
This would just be a function applied to each of the header's column names to get a new column name. Returning nil should leave the name unaltered, so that a map could easily be used for transforming a subset of the header names. There is some question, though, about how this would play with the :keywordify option.
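The semantics described above could be sketched like so (transform-header here is an illustration of the proposal, not the shipped implementation): apply f to each header name, falling back to the original when f returns nil, which makes a plain map usable as f.

```clojure
(defn transform-header
  "Apply f to each header name; nil results keep the original name."
  [f header]
  (map (fn [h] (or (f h) h)) header))

(transform-header {"first name" "first-name"} ["first name" "age"])
;; => ("first-name" "age")
```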
Would be good to have tests for the relevant cljs code using doo, cljs.test and whatever else folks use for this sort of thing. I got stuck trying to do this a couple of months ago, so help from someone who's done it before would be super appreciated.
(extracted from #45)
We're assuming Clojure 1.7 now anyway to get transducers, so reader conditionals may as well come along for the ride!
For now, we'll probably leave the slurp and spit functions out, since we won't have access to clojure-csv
from cljs. And async will make things a little weird. But would be nice to eventually offer something in this realm.
Maybe markdown formatting; should just return a string that can be printed. Maybe also have a print-table function for convenience.
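A minimal sketch of the markdown idea (md-table is a hypothetical name): build the header, the separator row, then one line per row map, and return it all as a single string.

```clojure
(require '[clojure.string :as str])

(defn md-table
  "Render rows (maps) as a markdown table over the given column keys."
  [cols rows]
  (let [line (fn [cells] (str "| " (str/join " | " cells) " |"))]
    (str/join "\n"
              (concat [(line (map name cols))
                       (line (repeat (count cols) "---"))]
                      (map (fn [r] (line (map #(str (get r %)) cols)))
                           rows)))))

(md-table [:id :tags] [{:id 1 :tags "a"}])
;; => "| id | tags |\n| --- | --- |\n| 1 | a |"
```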
If you have CSV data without a header in it, you might want to specify a header manually, and not consume the first row of the actual data (since it's not a header). Right now this could be done with
(->> ...
     (cons ["the" "header" "row"])
     (mappify))
So maybe it's not something worth worrying about. Could alias cons to add-header though, so folks don't have to think about it.
It would be nice to have something like slurp-async (see #54) that puts each row one by one on a go channel, perhaps using the transducers and pipeline. I don't want to make a hard requirement on core.async though, so we should follow the route of mpdairy/posh's Reagent plugin, so that the relevant function(s)/ns(s) are only defined if core.async is available. I don't know if this means a separate ns or not; probably best not to define one unless necessary (or unless we realize there are enough such helpers that it actually makes sense to, which I doubt will be the case, since a lot of the logic gets generalized nicely with transducers).
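A rough sketch of the slurp-async shape, assuming core.async is on the classpath and that semantic-csv.transducers/mappify yields a transducer (as that namespace suggests); rows-chan is a hypothetical name:

```clojure
(require '[clojure.core.async :as a]
         '[clojure-csv.core :as csv]
         '[semantic-csv.transducers :as sct])

(defn rows-chan
  "Parse filename and put each mappified row on a channel, one by one."
  [filename]
  (let [ch (a/chan 32 (sct/mappify))]
    ;; onto-chan! in core.async >= 1.2; onto-chan in older releases.
    (a/onto-chan! ch (csv/parse-csv (slurp filename)))
    ch))
```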
(extracted from #45)
Currently ->int & co. die on empty strings & nils, making them impractical for working with sparse data. Is this by design, or do you accept patches to make them more robust?
Currently they use Integer/parseInt and Float/parseFloat, which only work for strings. These should work on numerics as well.
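A hedged sketch of a more forgiving caster in the spirit of this request (->int* is a hypothetical name, not the library's function): pass nils and blanks through as nil, accept values that are already numeric, and only parse actual strings.

```clojure
(require '[clojure.string :as str])

(defn ->int*
  "Like ->int, but nil/blank-safe and tolerant of numeric inputs."
  [v]
  (cond
    (nil? v)    nil
    (number? v) (long v)
    (string? v) (let [s (str/trim v)]
                  (when (seq s)
                    (long (Double/parseDouble s))))
    :else (throw (ex-info "Cannot cast to int" {:value v}))))

;; (->int* nil)   => nil
;; (->int* "")    => nil
;; (->int* "1.5") => 1
;; (->int* 3.7)   => 3
```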
We have a really cool semantic-csv.impl.core/clone-var
macro that expands to a def which refers to the original var, but copied over the docstring, argslist and everything. I don't have the patience right now to get something that does "the best we can" in making this cross compatible with cljs, because portable cljs macros are nutty af. I did a little work in this direction though, and here are some of the things I've come up with:
This would maybe be the most conservative thing, and seems to work for the clj side at least. I don't yet know however if the cljs part works (either ClojureScript JVM or ClojureScript JS). There's a good chance it may not for the reasons mentioned in the portable cljs blog post (the reader cond is expanding at macro compile time, so thing may not work quite right there for JVM targeting JS, but JS targeting JS might).
(defmacro clone-var
  "Clone the var pointed to by fsym into current ns such that arglists,
  name and doc metadata are preserved."
  [fsym]
  #?(:clj
     (let [v (resolve fsym)
           m (subset-map (meta v) [:arglists :name :doc])
           m (update m :arglists (fn [arglists] (list 'quote arglists)))]
       `(def ~(vary-meta (:name m) (constantly m)) ~fsym))
     :cljs
     `(def ~(symbol (name fsym)) ~fsym)))
There's also the following, which tries to use the tricks from the above blog post, but it isn't even working for clj as is. And the blog post is kind of focused on a particular part of the general problem which is getting the vars to resolve properly.
(defmacro clone-var
  "Clone the var pointed to by fsym into current ns such that arglists,
  name and doc metadata are preserved."
  [fsym]
  `(if-cljs
     ~(let [v (resolve fsym)
            m (subset-map (meta v) [:arglists :name :doc])
            m (update m :arglists (fn [arglists] (list 'quote arglists)))]
        `(def ~(vary-meta (:name m) (constantly m)) ~fsym))
     `(def ~(symbol (name fsym)) ~fsym)))
I'm not really interested in optimizing this any more until we get some cljs testing set up with doo or whatever (see #56). For now I'm just going to do a top-level reader conditional that decides whether to manually call (def the-var path.to.actual/the-var) a bunch of times, or use the macro. The only downside is losing docstrings for cljs, which isn't the end of the world.
(Relates to #45)
Right now, the casting functions fail if anything goes wrong. It would be nice to handle things more gracefully. Here's what I'm thinking:
- an :exception-handler (fn [colname value] ...) option that can do whatever it wants and return a value to be used in the output for that row/col
- a default value (e.g. nil) to return when parsing fails

Note that 3) could be accomplished with 2), and 4) could be implemented with either 2) or 3) together with a secondary filter step. As such these are somewhat lower in priority and may have to wait.
Won't be lazy. Should probably behave somewhat like mappify, in assuming the first row to be the header, while accepting a :header opt to be more manual about it. Should probably also accept a seq of maps or arrays. Maybe call it datasetify?
Only makes sense for spit, since otherwise users should just use file mode to make sense of things. But for spit, would be nice if in addition to the obvious thing we could read in the existing file's header and do the right thing accordingly.
It's what I keep thinking; but I feel funny about having only that one argument map differently (all the others map directly to the callout function arguments).
Removing columns is pretty easy with maps ((->> ... (map #(dissoc % :this :that)) ...)), but subsetting isn't a one-liner. And even removing positionally on vector rows isn't immediately obvious. So it would be nice to have some abstractions to take care of this.
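For map rows, the subsetting abstraction could be as small as this (select-columns is a hypothetical helper built on clojure.core/select-keys):

```clojure
(defn select-columns
  "Keep only the named columns in each map row."
  [cols rows]
  (map #(select-keys % cols) rows))

(select-columns [:id] [{:id 1 :tags "a"} {:id 2 :tags "b"}])
;; => ({:id 1} {:id 2})
```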
May not make sense if a file handle is passed in