JuliaData / JSONTables.jl
JSON3.jl + Tables.jl
License: MIT License
It is best shown with an example:
julia> using DataFrames, JSONTables
julia> x = DataFrame(A=[true, false, true], B=[1, 2, missing],
C=[missing, "b", "c"], D=['a', missing, 'c'])
3×4 DataFrame
│ Row │ A │ B │ C │ D │
│ │ Bool │ Int64⍰ │ String⍰ │ Char⍰ │
├─────┼───────┼─────────┼─────────┼─────────┤
│ 1 │ true │ 1 │ missing │ 'a' │
│ 2 │ false │ 2 │ b │ missing │
│ 3 │ true │ missing │ c │ 'c' │
julia> s1 = arraytable(x)
"[{\"A\":true,\"B\":1,\"C\":null,\"D\":\"a\"},{\"A\":false,\"B\":2,\"C\":\"b\",\"D\":null},{\"A\":true,\"B\":null,\"C\":\"c\",\"D\":\"c\"}]"
julia> s2 = objecttable(x)
"{\"A\":[true,false,true],\"B\":[1,2,null],\"C\":[null,\"b\",\"c\"],\"D\":[\"a\",null,\"c\"]}"
julia> j1 = jsontable(s1)
JSONTables.Table{false,JSON3.Array{JSON3.Object,Base.CodeUnits{UInt8,String},Array{UInt64,1}}}(JSON3.Object[{
"A": true,
"B": 1,
"C": nothing,
"D": "a"
}, {
"A": false,
"B": 2,
"C": "b",
"D": nothing
}, {
"A": true,
"B": nothing,
"C": "c",
"D": "c"
}])
julia> j2 = jsontable(s2)
JSONTables.Table{true,JSON3.Object{Base.CodeUnits{UInt8,String},Array{UInt64,1}}}({
"A": [
true,
false,
true
],
"B": [
1,
2,
nothing
],
"C": [
nothing,
"b",
"c"
],
"D": [
"a",
nothing,
"c"
]
})
julia> DataFrame(j1)
3×4 DataFrame
│ Row │ A │ B │ C │ D │
│ │ Bool │ Union… │ Union… │ Union… │
├─────┼───────┼────────┼────────┼────────┤
│ 1 │ true │ 1 │ │ a │
│ 2 │ false │ 2 │ b │ │
│ 3 │ true │ │ c │ c │
julia> DataFrame(j2)
ERROR: MethodError: no method matching copy(::Nothing)
And there are two issues:
- missing gets converted to nothing in the write-read back process
- the DataFrame constructor fails, as internally a copy of the read-in vector is attempted and it fails as it contains nothing - but only in the objecttable case
CC @quinnj
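As a possible workaround for the first point, here is a minimal sketch that maps nothing back to missing after reading. It uses only Base Julia; the vector `col` is a made-up stand-in for a column read back from JSON and is not produced by JSONTables itself.

```julia
# Hypothetical stand-in for a column read back from JSON,
# where JSON null arrived as `nothing`.
col = Any[1, 2, nothing]

# Map `nothing` to `missing`; the comprehension narrows the element type
# from Any to Union{Missing, Int64}.
fixed = [x === nothing ? missing : x for x in col]

# Unlike `nothing`, `missing` can be copied, so this succeeds.
fixed_copy = copy(fixed)
```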
The \/ escape is not handled correctly, I think. E.g.
julia> JSON.parse("{\"a\":\"b\\/c\"}")
Dict{String,Any} with 1 entry:
"a" => "b/c"
but
julia> JSONTables.jsontable("{\"a\":\"b\\/c\"}")
JSONTables.Table{true,JSON3.Object{Base.CodeUnits{UInt8,String},Array{UInt64,1}}}(Error showing value of type JSONTables.Table{true,JSON3.Object{Base.CodeUnits{UInt8,String},Array{UInt64,1}}}:
ERROR: ArgumentError: encountered invalid escape character in json string: "b\/c"
(It would be great to get this fixed and tagged: in mid-August I am running a workshop in which I will use JSON files containing such escapes.)
Thank you!
Hi,
Thanks for writing this package. Right now I am encountering this error while trying the example in README.md:
julia> jtable=jsontable(text);
julia> df=DataFrame(jtable)
ERROR: MethodError: no method matching size(::String)
Closest candidates are:
size(::BitArray{1}) at bitarray.jl:77
size(::BitArray{1}, ::Any) at bitarray.jl:81
size(::Core.Compiler.StmtRange) at show.jl:1561
...
text here is a String of the JSON source. Not sure if it has something to do with DataFrames. Also, my package versions are:
(v1.1) pkg> status JSONTables
Status `~/.julia/environments/v1.1/Project.toml`
[0f8b85d8] JSON3 v0.1.8
[b9914132] JSONTables v0.1.2
[bd369af6] Tables v0.2.11
(v1.1) pkg> status DataFrames
Status `~/.julia/environments/v1.1/Project.toml`
[34da2185] Compat v2.1.0
[a93c6f00] DataFrames v0.19.3
[bd369af6] Tables v0.2.11
Thanks.
I think it would be good to have a better story for heterogeneous data. Both of the following results (which are generated from the same data but where entries are ordered differently) are surprising and can cause problems.
julia> using JSONTables, DataFrames
julia> json_a = """[
{"timea": 1585154193000, "troublemaker": 97},
{"timea": 1310044361000}
]""";
julia> json_b = """[
{"timea": 1310044361000},
{"timea": 1585154193000,"troublemaker": 97}
]""";
julia> DataFrame(jsontable(json_a)) # throws error
ERROR: KeyError: key :troublemaker not found
Stacktrace:
[1] get(::JSON3.Object{Base.CodeUnits{UInt8,String},SubArray{UInt64,1,Array{UInt64,1},Tuple{UnitRange{Int64}},true}}, ::Symbol) at /home/gerhard/.julia/packages/JSON3/YGLA7/src/JSON3.jl:53
...
julia> DataFrame(jsontable(json_b)) # loses troublemaker silently
2×1 DataFrame
│ Row │ timea │
│ │ Int64 │
├─────┼───────────────┤
│ 1 │ 1310044361000 │
│ 2 │ 1585154193000 │
What I would have expected jsontable to produce:
julia> using JSON3
julia> reduce((x, y) -> append!(x, y;cols=:union), JSON3.read(json_a);init=DataFrame())
2×2 DataFrame
│ Row │ timea │ troublemaker │
│ │ Int64 │ Int64? │
├─────┼───────────────┼──────────────┤
│ 1 │ 1585154193000 │ 97 │
│ 2 │ 1310044361000 │ missing │
julia> reduce((x, y) -> append!(x, y;cols=:union), JSON3.read(json_b);init=DataFrame())
2×2 DataFrame
│ Row │ timea │ troublemaker │
│ │ Int64 │ Int64? │
├─────┼───────────────┼──────────────┤
│ 1 │ 1310044361000 │ missing │
│ 2 │ 1585154193000 │ 97 │
If this is not possible or desired, the documentation should at least include a clear warning about what to expect.
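The expected behaviour can be sketched without any packages. The example below assumes each row is a plain Dict (a stand-in for a parsed JSON object); the helper name `union_columns` is hypothetical and is not part of JSONTables.

```julia
# Hypothetical rows mirroring json_a: the second row lacks :troublemaker.
rows = [Dict(:timea => 1585154193000, :troublemaker => 97),
        Dict(:timea => 1310044361000)]

# Build one vector per column over the union of all keys.
# A key that is absent from a row yields `missing` for that row.
function union_columns(rows)
    colnames = Symbol[]
    for row in rows, key in keys(row)
        key in colnames || push!(colnames, key)
    end
    sort!(colnames)  # Dict key order is unspecified, so fix an order
    return Dict(name => [get(row, name, missing) for row in rows]
                for name in colnames)
end

cols = union_columns(rows)
```

Because the keys are collected over all rows before any column is built, the result does not depend on which row happens to come first.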
Thx!
Hi there,
I have been using some code for transforming a DataFrame to JSON and back again, with the requirement that the de-JSONified DataFrame is an exact copy of the input DataFrame, eltypes and all. I'd like to make this code public, and see that this package has the same purpose but doesn't preserve types. Can we combine our efforts?
My code is below (...credit where it's due, this was written by Josh Bode).
Cheers,
Jock
#=
Given `data::DataFrame`:
- Convert it to JSON: `x = JSON.json(data)`
- Parse it back out again: `data2 = convert(DataFrame, JSON.parse(x))`
- data2 is element-wise equal to data
=#
################################################################################
# Convert a DataFrame to JSON
JSON.lower(x::Enum) = string(x)
JSON.lower(::Missing) = Vector{Union{Missing,Any}}()
JSON.lower(x::Complex) = [real(x), imag(x)]
JSON.lower(x::Set) = collect(x)
JSON.lower(x::DataFrames.DataFrame) = Dict{String, Vector{Any}}(
"names" => DataFrames.names(x),
"types" => DataFrames.eltypes(x),
"columns" => DataFrames.columns(x)
)
JSON.lower(x::DataFrames.SubDataFrame) = JSON.lower(x[:])
################################################################################
# Convert data to a DataFrame, where data is parsed from JSON.
# Some data types need an explicit converter
function Base.convert(::Type{T}, x::AbstractString) where {T <: Union{Date, DateTime}}
T(x)
end
Base.convert(::Type{Char}, x::AbstractString) = x[1]
function Base.convert(::Type{Set{T}}, x::AbstractVector) where T
Set{T}(x)
end
function Base.convert(::Type{DataFrame}, x::Dict{String, Any})
names, types, columns = try
x["names"], x["types"], x["columns"]
catch e
error("Missing data: $(e.key)")
end
result = DataFrame()
for (name, typename, coldata) in zip(names, types, columns)
T1 = eval(Meta.parse(typename)) # E.g., Union{Missing, Int64}.
T2 = Missings.T(T1) # E.g., Int64
@assert isconcretetype(T2) || T2 === Any "Not a concrete type"
n = length(coldata)
colname = Symbol(name)
result[colname] = Vector{T1}(undef, n)
for i = 1:n
val = coldata[i]
result[i, colname] = val === nothing ? missing : convert(T2, val)
end
end
result
end
Parsers for custom types can be added. For example, here's one for ZonedDateTime.
using TimeZones
function Base.convert(::Type{TimeZones.ZonedDateTime}, x::AbstractString)
x, tz = x[1:end-6], x[end-5:end]
ZonedDateTime(DateTime(x), TimeZones.FixedTimeZone(tz))
end
It would be nice to have support for CategoricalValue out of the box, since they're commonly used in DataFrames now.
It's only a one-liner to add, but it could throw someone not that familiar with the Julia data ecosystem.
StructType(::Type{CategoricalValue{String, T}}) where T = StructTypes.StringType()
I was wondering if there could be additional methods defined for JSONTables.Table, say append! and size.
I was searching through some data for a particular row and noticed that [row for row in json_table if pred(row)] is ridiculously fast (much faster than creating a DataFrame from it and then filtering).
It would be nice to be able to apply a row filter directly via filter(pred, json_table), like you can with a DataFrame or many other table types. Clearly it already supports Iterators.filter, so why not Base.filter? :)
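For comparison, here is a small sketch of the two forms on an ordinary vector of rows. The `rows` vector of NamedTuples and the predicate `pred` are made up for illustration and merely stand in for a JSON table; only Base functions are used.

```julia
# Hypothetical rows standing in for the rows of a JSON table.
rows = [(id = 1, score = 0.4), (id = 2, score = 0.9), (id = 3, score = 0.7)]
pred(row) = row.score > 0.5

# Lazy form: works for any iterable and allocates nothing until consumed.
lazy = Iterators.filter(pred, rows)

# Eager forms: the comprehension from the issue, and collecting the lazy filter.
by_comprehension = [row for row in rows if pred(row)]
by_collect = collect(lazy)
```

Both eager forms produce the same rows, so an eager filter method for a table type could plausibly be a thin wrapper around the lazy one.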
There are combinations of heterogeneous data for which the wrong column types are detected.
For example:
using JSONTables
using Tables
nonhomogenous = """
[
{"a": 1, "b": 2, "c": 3},
{"b": 4, "c": 8, "d": 5}
]
"""
jt = JSONTables.jsontable(nonhomogenous)
This gives:
JSONTables.Table{false, JSON3.Array{JSON3.Object, Base.CodeUnits{UInt8, String}, Vector{UInt64}}}([:a, :b, :c, :d], Dict{Symbol, Type}(:a => Union{Missing, Int64}, :b => Int64, :d => Int64, :c => Int64), JSON3.Object[{
"a": 1,
"b": 2,
"c": 3
}, {
"b": 4,
"c": 8,
"d": 5
}])
The type of d is detected as Int64, and this throws an error when building a table:
ct = Tables.columntable(jt)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Int64
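One way to avoid this is to mark a column as allowing missing whenever its key is absent from any row, including rows seen before the key first appears. The sketch below illustrates that idea on plain Dict rows; the helper `column_types` is hypothetical and is not how JSONTables is implemented.

```julia
# Hypothetical rows mirroring the example: :a is absent from row 2, :d from row 1.
rows = [Dict(:a => 1, :b => 2, :c => 3),
        Dict(:b => 4, :c => 8, :d => 5)]

function column_types(rows)
    types = Dict{Symbol, Type}()
    for (i, row) in enumerate(rows)
        for (key, value) in row
            T = typeof(value)
            if haskey(types, key)
                types[key] = Union{types[key], T}
            else
                # A key first seen after row 1 was absent from earlier rows.
                types[key] = i == 1 ? T : Union{Missing, T}
            end
        end
        # Any known key that this row lacks must also allow `missing`.
        for key in collect(keys(types))
            haskey(row, key) || (types[key] = Union{Missing, types[key]})
        end
    end
    return types
end

types = column_types(rows)
```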
Here is the test code:
using JSONTables, DataFrames
results = DataFrame(cols=Int[], rows=Int[], arraytable=Float64[], objecttable=Float64[])
for cols in (100, 200, 300, 400, 500), rows in (10^3, 10^4, 10^5)
@show (cols, rows)
df = DataFrame(ones(rows, cols))
df[!, 1] .= "a"
df[!, 2] .= 1
df[!, 3] .= true
open(io -> objecttable(io, df), "test.json", "w")
x1 = @elapsed open(io -> arraytable(io, df), "test.json", "w")
open(io -> objecttable(io, df), "test.json", "w")
x2 = @elapsed open(io -> objecttable(io, df), "test.json", "w")
push!(results, [cols, rows, x1, x2])
end
and here is the benchmark result:
julia> results
15×4 DataFrame
│ Row │ cols │ rows │ arraytable │ objecttable │
│ │ Int64 │ Int64 │ Float64 │ Float64 │
├─────┼───────┼────────┼────────────┼─────────────┤
│ 1 │ 100 │ 1000 │ 0.178669 │ 0.0328257 │
│ 2 │ 100 │ 10000 │ 1.65927 │ 0.26272 │
│ 3 │ 100 │ 100000 │ 16.2332 │ 2.36529 │
│ 4 │ 200 │ 1000 │ 0.349468 │ 0.0498271 │
│ 5 │ 200 │ 10000 │ 3.49195 │ 0.595482 │
│ 6 │ 200 │ 100000 │ 34.8853 │ 4.9485 │
│ 7 │ 300 │ 1000 │ 0.547324 │ 0.0803132 │
│ 8 │ 300 │ 10000 │ 5.16746 │ 0.759614 │
│ 9 │ 300 │ 100000 │ 52.3498 │ 7.48296 │
│ 10 │ 400 │ 1000 │ 0.714898 │ 0.104794 │
│ 11 │ 400 │ 10000 │ 6.91257 │ 1.00389 │
│ 12 │ 400 │ 100000 │ 73.8235 │ 11.4878 │
│ 13 │ 500 │ 1000 │ 0.947894 │ 0.146453 │
│ 14 │ 500 │ 10000 │ 10.5129 │ 1.42812 │
│ 15 │ 500 │ 100000 │ 94.0811 │ 13.1131 │
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
I wanted to see the json file corresponding to a dataframe that has dataframes as elements in one of the columns.
3×3 DataFrame
Row │ f oc sdf
│ String Int64 DataFrame
─────┼──────────────────────────────
1 │ f1 1 3×2 DataFrame
2 │ f2 2 2×2 DataFrame
3 │ f3 3 3×2 DataFrame
I have previously verified that
julia> Tables.istable(dfn)
true
but I get this result
julia> objecttable(dfn)
ERROR: ArgumentError: DataFrame doesn't have a defined `StructTypes.StructType`
Stacktrace:
...
I wonder if it is possible to handle this type of structure.
If I use jsontable on this string (http://bl.ocks.org/nautat/4085017)
jdata = "[
{
\"name\":\"bob\",
\"salary\":13000,
\"friends\":[
{
\"name\": \"sarah\",
\"salary\":10000
},
{
\"name\": \"bill\",
\"salary\":5000
}
]
},
{
\"name\":\"marge\",
\"salary\":10000,
\"friends\":[
{
\"name\": \"rhonda\",
\"salary\":10000
},
{
\"name\": \"mike\",
\"salary\":5000,
\"hobbies\":[
{
\"name\":\"surfing\",
\"frequency\":10
},
{
\"name\":\"surfing\",
\"frequency\":15
}
]
}
]
},
{
\"name\":\"joe\",
\"salary\":10000,
\"friends\":[
{
\"name\": \"harry\",
\"salary\":10000
},
{
\"name\": \"sally\",
\"salary\":5000
}
]
}
]"
and then I try to get the DataFrame:
julia> jsontable(jdata)
JSONTables.Table{false, JSON3.Array{JSON3.Object, Base.CodeUnits{UInt8, String}, Vector{UInt64}}}([:name, :salary, :friends], Dict{Symbol, Type}(:name => String, :salary => Int64, :friends => JSON3.Array{JSON3.Object, Base.CodeUnits{UInt8, String}, SubArray{UInt64, 1, Vector{UInt64}, Tuple{UnitRange{Int64}}, true}}), JSON3.Object[{
...
julia> DataFrame(jsontable(jdata))
3×3 DataFrame
Row │ name salary friends
│ String Int64 Array…
─────┼───────────────────────────────────────────────────
1 │ bob 13000 JSON3.Object[{\n "name": "sa…
2 │ marge 10000 JSON3.Object[{\n "name": "rh…
3 │ joe 10000 JSON3.Object[{\n "name": "ha…
objecttable and arraytable are currently not exported.
I encounter http://jsonlines.org/ more and more often recently. Maybe we can also add a function to read in the data in this format?
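Until such a function exists, one possible workaround is to rewrite the JSON Lines text as a single JSON array, which is the row-oriented form shown elsewhere on this page. This is only a sketch using Base Julia; the helper name `jsonlines_to_array` is hypothetical, and it assumes every non-blank line holds exactly one complete JSON value.

```julia
# Join the non-blank lines of a JSON Lines document into one JSON array string.
# Assumption: each non-blank line is a complete JSON value on its own.
function jsonlines_to_array(text::AbstractString)
    records = [strip(line) for line in split(text, '\n') if !isempty(strip(line))]
    return "[" * join(records, ",") * "]"
end

jsonl = "{\"a\":1,\"b\":\"x\"}\n{\"a\":2,\"b\":\"y\"}\n"
array_text = jsonlines_to_array(jsonl)
```

The resulting string could then be passed to jsontable like any other array of objects. This copies the whole input, so it suits small files better than large streams.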
objecttable and arraytable should allow passing a stream (and a filename as a shorthand) to write into. Currently they always produce a string, which is not optimal.