jsontables.jl's People

Contributors

attdona, bluesmoon, juliatagbot, quinnj

jsontables.jl's Issues

Problem with missing/nothing and copying

It is best shown with an example:

julia> using DataFrames, JSONTables

julia> x = DataFrame(A=[true, false, true], B=[1, 2, missing],
                     C=[missing, "b", "c"], D=['a', missing, 'c'])
3×4 DataFrame
│ Row │ A     │ B       │ C       │ D       │
│     │ Bool  │ Int64⍰  │ String⍰ │ Char⍰   │
├─────┼───────┼─────────┼─────────┼─────────┤
│ 1   │ true  │ 1       │ missing │ 'a'     │
│ 2   │ false │ 2       │ b       │ missing │
│ 3   │ true  │ missing │ c       │ 'c'     │

julia> s1 = arraytable(x)
"[{\"A\":true,\"B\":1,\"C\":null,\"D\":\"a\"},{\"A\":false,\"B\":2,\"C\":\"b\",\"D\":null},{\"A\":true,\"B\":null,\"C\":\"c\",\"D\":\"c\"}]"

julia> s2 = objecttable(x)
"{\"A\":[true,false,true],\"B\":[1,2,null],\"C\":[null,\"b\",\"c\"],\"D\":[\"a\",null,\"c\"]}"

julia> j1 = jsontable(s1)
JSONTables.Table{false,JSON3.Array{JSON3.Object,Base.CodeUnits{UInt8,String},Array{UInt64,1}}}(JSON3.Object[{
   "A": true,
   "B": 1,
   "C": nothing,
   "D": "a"
}, {
   "A": false,
   "B": 2,
   "C": "b",
   "D": nothing
}, {
   "A": true,
   "B": nothing,
   "C": "c",
   "D": "c"
}])

julia> j2 = jsontable(s2)
JSONTables.Table{true,JSON3.Object{Base.CodeUnits{UInt8,String},Array{UInt64,1}}}({
   "A": [
          true,
          false,
          true
        ],
   "B": [
          1,
          2,
          nothing
        ],
   "C": [
          nothing,
          "b",
          "c"
        ],
   "D": [
          "a",
          nothing,
          "c"
        ]
})

julia> DataFrame(j1)
3×4 DataFrame
│ Row │ A     │ B      │ C      │ D      │
│     │ Bool  │ Union… │ Union… │ Union… │
├─────┼───────┼────────┼────────┼────────┤
│ 1   │ true  │ 1      │        │ a      │
│ 2   │ false │ 2      │ b      │        │
│ 3   │ true  │        │ c      │ c      │

julia> DataFrame(j2)
ERROR: MethodError: no method matching copy(::Nothing)

There are two issues:

  1. missing is converted to nothing in the write/read round trip.
  2. The result cannot then be materialized with the DataFrame constructor: internally a copy of the read-in vector is attempted, which fails because the vector contains nothing. This happens only in the objecttable case.
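As a possible workaround for the first point, the parsed columns can be rewritten so that nothing becomes missing before they reach a table constructor. This is only a minimal sketch: the plain vectors below are an assumption standing in for the parsed JSON arrays, and nothing_to_missing is a hypothetical helper, not part of JSONTables.

```julia
# Stand-in for columns parsed from JSON, where JSON null arrives as `nothing`.
# (Plain vectors are an assumption; the real parsed columns are JSON3 arrays.)
parsed = (A = [true, false, true], B = [1, 2, nothing], C = [nothing, "b", "c"])

# Hypothetical helper: copy a column, replacing every `nothing` with `missing`.
nothing_to_missing(col) = [x === nothing ? missing : x for x in col]

# `map` over a NamedTuple applies the helper to each column and keeps the names.
cleaned = map(nothing_to_missing, parsed)
```

The cleaned columns no longer contain nothing, so code that copies or converts their elements has no Nothing value to trip over.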

CC @quinnj

Escape sequences

\/ escape is not handled correctly I think. E.g.

julia> JSON.parse("{\"a\":\"b\\/c\"}")
Dict{String,Any} with 1 entry:
  "a" => "b/c"

but

julia> JSONTables.jsontable("{\"a\":\"b\\/c\"}")
JSONTables.Table{true,JSON3.Object{Base.CodeUnits{UInt8,String},Array{UInt64,1}}}(Error showing value of type JSONTables.Table{true,JSON3.Object{Base.CodeUnits{UInt8,String},Array{UInt64,1}}}:
ERROR: ArgumentError: encountered invalid escape character in json string: "b\/c"

(It would be great to get this fixed and tagged: in mid-August I am running a workshop in which I will use JSON files containing such escapes.)

Thank you!
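For reference, the JSON grammar lists \/ among the valid two-character escapes, alongside \" \\ \b \f \n \r and \t. The sketch below shows the expected handling in plain Julia. It is not the JSON3 implementation; unescape_simple and SIMPLE_ESCAPES are hypothetical names, and \uXXXX escapes are deliberately left out.

```julia
# Two-character escapes allowed by the JSON grammar (without \uXXXX).
const SIMPLE_ESCAPES = Dict(
    '"' => '"', '\\' => '\\', '/' => '/',
    'b' => '\b', 'f' => '\f', 'n' => '\n', 'r' => '\r', 't' => '\t',
)

# Hypothetical helper: resolve the simple escapes in the body of a JSON string.
function unescape_simple(s::AbstractString)
    out = IOBuffer()
    chars = collect(s)
    i = 1
    while i <= length(chars)
        c = chars[i]
        if c == '\\'
            i == length(chars) && throw(ArgumentError("dangling backslash"))
            e = chars[i + 1]
            haskey(SIMPLE_ESCAPES, e) || throw(ArgumentError("unsupported escape: \\$e"))
            print(out, SIMPLE_ESCAPES[e])
            i += 2  # skip the backslash and the escaped character
        else
            print(out, c)
            i += 1
        end
    end
    return String(take!(out))
end

unescaped = unescape_simple("b\\/c")
```

With this mapping the string from the report decodes to b/c, matching what JSON.parse returns.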

Error converting from JSONTables to DataFrame

Hi,

Thanks for writing this package. Right now I am encountering this error while trying the example in README.md:

julia> jtable=jsontable(text);                                                                       
julia> df=DataFrame(jtable)                                                                          
ERROR: MethodError: no method matching size(::String)
Closest candidates are:
  size(::BitArray{1}) at bitarray.jl:77
  size(::BitArray{1}, ::Any) at bitarray.jl:81
  size(::Core.Compiler.StmtRange) at show.jl:1561
  ...

Here text is a String containing the JSON source. I am not sure whether this has something to do with DataFrames. My package versions are:

(v1.1) pkg> status JSONTables                                                                        
    Status `~/.julia/environments/v1.1/Project.toml`
  [0f8b85d8] JSON3 v0.1.8
  [b9914132] JSONTables v0.1.2
  [bd369af6] Tables v0.2.11

(v1.1) pkg> status DataFrames                                                                        
    Status `~/.julia/environments/v1.1/Project.toml`
  [34da2185] Compat v2.1.0
  [a93c6f00] DataFrames v0.19.3
  [bd369af6] Tables v0.2.11

Thanks.

Unexpected result on heterogeneous data

I think it would be good to have a better story for heterogeneous data. Both of the following results (which are generated from the same data but where entries are ordered differently) are surprising and can cause problems.

julia> using JSONTables, DataFrames

julia> json_a = """[
       {"timea": 1585154193000, "troublemaker": 97},
       {"timea": 1310044361000}
       ]""";

julia> json_b = """[
       {"timea": 1310044361000},
       {"timea": 1585154193000,"troublemaker": 97}
       ]""";

julia> DataFrame(jsontable(json_a)) # throws error
ERROR: KeyError: key :troublemaker not found
Stacktrace:
 [1] get(::JSON3.Object{Base.CodeUnits{UInt8,String},SubArray{UInt64,1,Array{UInt64,1},Tuple{UnitRange{Int64}},true}}, ::Symbol) at /home/gerhard/.julia/packages/JSON3/YGLA7/src/JSON3.jl:53
...

julia> DataFrame(jsontable(json_b)) # loses troublemaker silently
2×1 DataFrame
│ Row │ timea         │
│     │ Int64         │
├─────┼───────────────┤
│ 1   │ 1310044361000 │
│ 2   │ 1585154193000 │

What I would have expected jsontable to produce:

julia> using JSON3

julia> reduce((x, y) -> append!(x, y;cols=:union), JSON3.read(json_a);init=DataFrame())
2×2 DataFrame
│ Row │ timea         │ troublemaker │
│     │ Int64         │ Int64?       │
├─────┼───────────────┼──────────────┤
│ 1   │ 1585154193000 │ 97           │
│ 2   │ 1310044361000 │ missing      │

julia> reduce((x, y) -> append!(x, y;cols=:union), JSON3.read(json_b);init=DataFrame())
2×2 DataFrame
│ Row │ timea         │ troublemaker │
│     │ Int64         │ Int64?       │
├─────┼───────────────┼──────────────┤
│ 1   │ 1310044361000 │ missing      │
│ 2   │ 1585154193000 │ 97           │

If this is not possible or not desired, the documentation should at least include a clear warning about what to expect.

Thx!
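The expected behaviour can also be sketched without DataFrames: take the union of the keys over all rows, then fill in missing wherever a row lacks a key. The Dict rows and the helper names union_names and union_columns are assumptions for illustration only; they are not JSONTables API.

```julia
# Rows as they might look after parsing; Dict rows are an assumption.
rows = [
    Dict(:timea => 1585154193000, :troublemaker => 97),
    Dict(:timea => 1310044361000),
]

# Collect column names in order of first appearance across all rows.
# Keys are sorted within a row because Dict iteration order is unspecified.
function union_names(rows)
    names = Symbol[]
    for row in rows, key in sort!(collect(keys(row)))
        key in names || push!(names, key)
    end
    return names
end

# Build one vector per column, using `missing` where a row lacks the key.
function union_columns(rows)
    names = union_names(rows)
    cols = [[get(row, name, missing) for row in rows] for name in names]
    return NamedTuple{Tuple(names)}(Tuple(cols))
end

table = union_columns(rows)
```

The result has the same columns regardless of which row mentions troublemaker first, which is the property the two examples above lack.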

Recovering the input table from a JSONified table

Hi there,

I have been using some code for transforming a DataFrame to JSON and back again, with the requirement that the de-JSONified DataFrame is an exact copy of the input DataFrame, eltypes and all. I'd like to make this code public, and see that this package has the same purpose but doesn't preserve types. Can we combine our efforts?

My code is below (...credit where it's due, this was written by Josh Bode).

Cheers,
Jock

#=
Given `data::DataFrame`:
- Convert it to JSON:      `x = JSON.json(data)`
- Parse it back out again: `data2 = convert(DataFrame, JSON.parse(x))`
- data2 is element-wise equal to data
=#

################################################################################
# Convert a DataFrame to JSON

JSON.lower(x::Enum) = string(x)
JSON.lower(::Missing) = Vector{Union{Missing,Any}}()
JSON.lower(x::Complex) = [real(x), imag(x)]
JSON.lower(x::Set) = collect(x)

JSON.lower(x::DataFrames.DataFrame) = Dict{String, Vector{Any}}(
    "names" => DataFrames.names(x),
    "types" => DataFrames.eltypes(x),
    "columns" => DataFrames.columns(x)
)
JSON.lower(x::DataFrames.SubDataFrame) = JSON.lower(x[:])

################################################################################
# Convert data to a DataFrame, where data is parsed from JSON.
# Some data types need an explicit converter
function Base.convert(::Type{T}, x::AbstractString) where {T <: Union{Date, DateTime}}
    T(x)
end

Base.convert(::Type{Char}, x::AbstractString) = x[1]

function Base.convert(::Type{Set{T}}, x::AbstractVector) where T
    Set{T}(x)
end

function Base.convert(::Type{DataFrame}, x::Dict{String, Any})
    names, types, columns = try
        x["names"], x["types"], x["columns"]
    catch e
        error("Missing data: $(e.key)")
    end
    result = DataFrame()
    for (name, typename, coldata) in zip(names, types, columns)
        T1 = eval(Meta.parse(typename))  # E.g., Union{Missing, Int64}.
        T2 = Missings.T(T1)              # E.g., Int64
        @assert isconcretetype(T2) || T2 === Any "Not a concrete type"
        n = length(coldata)
        colname = Symbol(name)
        result[colname] = Vector{T1}(undef, n)
        for i = 1:n
            val = coldata[i]
            result[i, colname] = val === nothing ? missing : convert(T2, val)
        end
    end
    result
end

Parsers for custom types can be added. For example, here's one for ZonedDateTime.

using TimeZones

function Base.convert(::Type{TimeZones.ZonedDateTime}, x::AbstractString)
    x, tz = x[1:end-6], x[end-5:end]
    ZonedDateTime(DateTime(x), TimeZones.FixedTimeZone(tz))
end

Support for categorical values

It would be nice to have support for CategoricalValue out of the box since they're commonly used in DataFrames now.
It's only a one liner to add but could throw someone not that familiar with the julia data ecosystem.

StructTypes.StructType(::Type{CategoricalValue{String, T}}) where T = StructTypes.StringType()

Define `filter`

I was searching through some data for a particular row and noticed that [row for row in json_table if pred(row)] is ridiculously fast (much faster than creating a DataFrame from it and then filtering).

It would be nice to be able to apply a row filter directly via filter(pred, json_table), as you can with a DataFrame or many other table types. Clearly it already supports Iterators.filter, so why not Base.filter? :)
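A rough sketch of what such a method could look like, built on top of Iterators.filter. RowTable is a hypothetical stand-in for a row-oriented table and is not the JSONTables.Table type; how the real type would store its rows is an assumption.

```julia
# Hypothetical stand-in for a row-oriented JSON table; not the JSONTables type.
struct RowTable{T}
    rows::Vector{T}
end

# Iteration simply forwards to the wrapped vector of rows.
Base.iterate(t::RowTable, state...) = iterate(t.rows, state...)
Base.length(t::RowTable) = length(t.rows)
Base.eltype(::Type{RowTable{T}}) where {T} = T

# Eager filter: collect the matching rows and return the same table type.
Base.filter(pred, t::RowTable) =
    RowTable(collect(eltype(t), Iterators.filter(pred, t)))

t = RowTable([(name = "bob", salary = 13000), (name = "marge", salary = 10000)])
high = filter(row -> row.salary > 12000, t)
```

Returning the same table type keeps the result usable anywhere the original table was, which mirrors how filter behaves for a DataFrame.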

Wrong column types are sometimes detected for heterogeneous data

There are combinations of heterogeneous data for which the wrong column types are inferred.

For example:

using JSONTables
using Tables

nonhomogenous = """
[
    {"a": 1, "b": 2, "c": 3},
    {"b": 4, "c": 8, "d": 5}
]
"""

jt = JSONTables.jsontable(nonhomogenous)

You get:

JSONTables.Table{false, JSON3.Array{JSON3.Object, Base.CodeUnits{UInt8, String}, Vector{UInt64}}}([:a, :b, :c, :d], Dict{Symbol, Type}(:a => Union{Missing, Int64}, :b => Int64, :d => Int64, :c => Int64), JSON3.Object[{
   "a": 1,
   "b": 2,
   "c": 3
}, {
   "b": 4,
   "c": 8,
   "d": 5
}])

The type of d is detected as Int64, although d is absent from the first row, and this throws an error when building a table:

ct = Tables.columntable(jt)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Int64
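A sketch of the inference rule that would avoid this error: a column allows missing if its key is absent from any row, whether that row comes before or after the first row that has the key. The Dict rows and the function name infer_types are assumptions for illustration, not the JSONTables internals.

```julia
# Rows are assumed to be Dicts here; a parsed JSON object would play the same role.
rows = [
    Dict(:a => 1, :b => 2, :c => 3),
    Dict(:b => 4, :c => 8, :d => 5),
]

# Hypothetical helper: infer one element type per column name.
function infer_types(rows)
    types = Dict{Symbol,Type}()
    for (i, row) in enumerate(rows)
        for (key, val) in row
            if haskey(types, key)
                types[key] = Union{types[key], typeof(val)}
            elseif i == 1
                types[key] = typeof(val)
            else
                # Key first seen after row 1: earlier rows lacked it.
                types[key] = Union{Missing, typeof(val)}
            end
        end
        # Keys seen so far but absent from this row must also allow missing.
        for key in collect(keys(types))
            haskey(row, key) || (types[key] = Union{Missing, types[key]})
        end
    end
    return types
end

types = infer_types(rows)
```

Under this rule both a and d come out as Union{Missing, Int64}, while b and c stay Int64.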

arraytable slows down significantly when passed a table with heterogeneous columns

Here is the test code:

using JSONTables, DataFrames

results = DataFrame(cols=Int[], rows=Int[], arraytable=Float64[], objecttable=Float64[])
for cols in (100, 200, 300, 400, 500), rows in (10^3, 10^4, 10^5)
    @show (cols, rows)
    df = DataFrame(ones(rows, cols))
    df[!, 1] .= "a"
    df[!, 2] .= 1
    df[!, 3] .= true
    open(io -> objecttable(io, df), "test.json", "w")
    x1 = @elapsed open(io -> arraytable(io, df), "test.json", "w")
    open(io -> objecttable(io, df), "test.json", "w")
    x2 = @elapsed open(io -> objecttable(io, df), "test.json", "w")
    push!(results, [cols, rows, x1, x2])
end

and here is the benchmark result:

julia> results
15×4 DataFrame
│ Row │ cols  │ rows   │ arraytable │ objecttable │
│     │ Int64 │ Int64  │ Float64    │ Float64     │
├─────┼───────┼────────┼────────────┼─────────────┤
│ 1   │ 100   │ 1000   │ 0.178669   │ 0.0328257   │
│ 2   │ 100   │ 10000  │ 1.65927    │ 0.26272     │
│ 3   │ 100   │ 100000 │ 16.2332    │ 2.36529     │
│ 4   │ 200   │ 1000   │ 0.349468   │ 0.0498271   │
│ 5   │ 200   │ 10000  │ 3.49195    │ 0.595482    │
│ 6   │ 200   │ 100000 │ 34.8853    │ 4.9485      │
│ 7   │ 300   │ 1000   │ 0.547324   │ 0.0803132   │
│ 8   │ 300   │ 10000  │ 5.16746    │ 0.759614    │
│ 9   │ 300   │ 100000 │ 52.3498    │ 7.48296     │
│ 10  │ 400   │ 1000   │ 0.714898   │ 0.104794    │
│ 11  │ 400   │ 10000  │ 6.91257    │ 1.00389     │
│ 12  │ 400   │ 100000 │ 73.8235    │ 11.4878     │
│ 13  │ 500   │ 1000   │ 0.947894   │ 0.146453    │
│ 14  │ 500   │ 10000  │ 10.5129    │ 1.42812     │
│ 15  │ 500   │ 100000 │ 94.0811    │ 13.1131     │

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

arraytable (objecttable) fails when the argument is a nested table

I wanted to see the json file corresponding to a dataframe that has dataframes as elements in one of the columns.

3×3 DataFrame
 Row │ f       oc     sdf
     │ String  Int64  DataFrame
─────┼──────────────────────────────
   1 │ f1          1  3×2 DataFrame
   2 │ f2          2  2×2 DataFrame
   3 │ f3          3  3×2 DataFrame

I have previously verified that

julia> Tables.istable(dfn)
true

but I get this result

julia> objecttable(dfn)
ERROR: ArgumentError: DataFrame doesn't have a defined `StructTypes.StructType`
Stacktrace:
...

I wonder whether it is possible to handle this type of structure.

If I use jsontable on this string (from http://bl.ocks.org/nautat/4085017)

jdata = "[
    {
        \"name\":\"bob\",
        \"salary\":13000,
        \"friends\":[
            {
                \"name\": \"sarah\",
                \"salary\":10000
            },
            {
                \"name\": \"bill\",
                \"salary\":5000
            }
        ]
    },
    {
        \"name\":\"marge\",
        \"salary\":10000,
        \"friends\":[
            {
                \"name\": \"rhonda\",
                \"salary\":10000
            },
            {
                \"name\": \"mike\",
                \"salary\":5000,
                \"hobbies\":[
                    {
                        \"name\":\"surfing\",
                        \"frequency\":10
                    },
                    {
                        \"name\":\"surfing\",
                        \"frequency\":15
                    }
                ]
            }
        ]
    },
    {
        \"name\":\"joe\",
        \"salary\":10000,
        \"friends\":[
            {
                \"name\": \"harry\",
                \"salary\":10000
            },
            {
                \"name\": \"sally\",
                \"salary\":5000
            }
        ]
    }
]"

and then try to get the DataFrame, I get:

julia> jsontable(jdata)
JSONTables.Table{false, JSON3.Array{JSON3.Object, Base.CodeUnits{UInt8, String}, Vector{UInt64}}}([:name, :salary, :friends], Dict{Symbol, Type}(:name => String, :salary => Int64, :friends => JSON3.Array{JSON3.Object, Base.CodeUnits{UInt8, String}, SubArray{UInt64, 1, Vector{UInt64}, Tuple{UnitRange{Int64}}, true}}), JSON3.Object[{
...



julia> DataFrame(jsontable(jdata))
3×3 DataFrame
 Row │ name    salary  friends
     │ String  Int64   Array…
─────┼───────────────────────────────────────────────────
   1 │ bob      13000  JSON3.Object[{\n     "name": "sa…
   2 │ marge    10000  JSON3.Object[{\n     "name": "rh…
   3 │ joe      10000  JSON3.Object[{\n     "name": "ha…

Export policy

objecttable and arraytable are currently not exported.

JSON writing API

objecttable and arraytable should allow passing a stream (and a filename as a shorthand) to write into. Currently they always produce a string, which is not optimal.
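One common way to lay out such an API is to implement the writer once against an IO and derive the filename and string forms from it. The sketch below is a toy under stated assumptions: write_columns is a hypothetical name, and it only handles numeric columns so that no string escaping is needed.

```julia
# Hypothetical writer: emits a column table of numbers as a JSON object of arrays.
function write_columns(io::IO, cols::NamedTuple)
    print(io, '{')
    for (i, name) in enumerate(keys(cols))
        i > 1 && print(io, ',')
        print(io, '"', name, "\":[")
        join(io, cols[name], ',')  # write the values separated by commas
        print(io, ']')
    end
    print(io, '}')
    return nothing
end

# Filename shorthand: open the file and delegate to the IO method.
write_columns(path::AbstractString, cols::NamedTuple) =
    open(io -> write_columns(io, cols), path, "w")

# String form: `sprint` supplies an in-memory buffer as the IO argument.
write_columns(cols::NamedTuple) = sprint(write_columns, cols)

json = write_columns((a = [1, 2], b = [3, 4]))
```

With this layout the string-returning form stays available, but large tables can be streamed to a file or socket without building the whole string in memory first.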
