Draco from uwdata

draco's Issues

Don't use bar with just d
Don't use rule with only x or y
rule and tick need q
do not use negative values for size
size should have high preference for zero, even with bin
don't allow row without y, column without x
stack can only have one continuous and it has to be on x or y
only allow dates to be used for temporal (no string or number)
we overlap bars when we don't have an ordinal and only q
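A few of the rules above can be checked directly on a Vega-Lite spec dictionary. The following is a minimal sketch covering only "rule and tick need q" and the row/column rules. The function name `hard_violations` and the violation labels are hypothetical and are not part of Draco's API. In Draco itself these rules would be written as ASP hard constraints.

```python
from typing import Any, Dict, List


def hard_violations(spec: Dict[str, Any]) -> List[str]:
    """Return hypothetical labels for the listed hard rules that `spec` breaks."""
    mark = spec.get("mark")
    enc = spec.get("encoding", {})
    violations = []

    # "rule and tick need q": require at least one quantitative encoding.
    if mark in ("rule", "tick") and not any(
        e.get("type") == "quantitative" for e in enc.values()
    ):
        violations.append("rule_tick_need_q")

    # "don't allow row without y, column without x"
    if "row" in enc and "y" not in enc:
        violations.append("row_without_y")
    if "column" in enc and "x" not in enc:
        violations.append("column_without_x")

    return violations


spec = {
    "mark": "tick",
    "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "row": {"field": "b", "type": "nominal"},
    },
}
print(hard_violations(spec))  # ['rule_tick_need_q', 'row_without_y']
```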

Refactor code

We should aim not to rely too much on files in our APIs, because that makes it really hard to write in-memory algorithms. Instead, we should move around in-memory objects and handle IO separately. I added a few TODOs in the code to show where I see problems.
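One way to split parsing from IO is to make the core conversion a pure function over an already-parsed object and keep file access in a thin wrapper. This is a sketch under assumptions: the `Query` dataclass and both function names are hypothetical, and representing the encodings as an array is only one possible design.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Query:
    """Hypothetical in-memory query: a mark plus an array of encodings."""
    mark: Optional[str] = None
    encodings: List[Dict[str, Any]] = field(default_factory=list)


def query_from_dict(obj: Dict[str, Any]) -> Query:
    """Pure conversion from a parsed JSON object; performs no file IO."""
    encodings = [
        {"channel": channel, **props}
        for channel, props in obj.get("encoding", {}).items()
    ]
    return Query(mark=obj.get("mark"), encodings=encodings)


def load_query_from_json(path: str) -> Query:
    """Thin IO wrapper: the only function here that touches the filesystem."""
    with open(path) as f:
        return query_from_dict(json.load(f))
```

In-memory algorithms (and tests) can call `query_from_dict` directly without ever writing a file.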

Encoding is an array in queries

Write tests for python code

Generate feature vectors in parallel
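A minimal sketch of parallel feature generation with the standard library's `concurrent.futures`. It assumes each feature computation spends most of its time waiting on an external solver process, in which case a thread pool is sufficient; for pure-Python, CPU-bound featurizers, `ProcessPoolExecutor` would be the better fit. `toy_featurize` is a hypothetical stand-in for the real featurizer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence

Spec = Dict[str, Any]


def feature_vectors_parallel(
    specs: Sequence[Spec],
    featurize: Callable[[Spec], List[int]],
    max_workers: int = 4,
) -> List[List[int]]:
    """Apply `featurize` to every spec concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as its inputs.
        return list(pool.map(featurize, specs))


def toy_featurize(spec: Spec) -> List[int]:
    # Hypothetical: a real featurizer would run the solver and count
    # soft-constraint violations for the spec.
    return [len(spec.get("encoding", {})), int(spec.get("mark") == "bar")]
```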

Speed up development

Rename methods

For example, load_from_vl_json should be load_query_from_json

Active learning

Figure out which pairs of visualizations we should ask an expert to label.
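One common active-learning heuristic is uncertainty sampling: ask about the pairs whose predicted preference is closest to a tie. The sketch below assumes a linear model that scores a visualization as the dot product of a weight vector with its feature vector; that model, and the function name, are assumptions here, not a description of Draco's learner.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def most_uncertain_pairs(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    k: int,
) -> List[Tuple[int, int]]:
    """Return the k index pairs whose predicted preference margin is smallest."""

    def margin(i: int, j: int) -> float:
        # |w . (f_i - f_j)|: how far the model is from a tie on this pair.
        diff = [a - b for a, b in zip(features[i], features[j])]
        return abs(sum(w * d for w, d in zip(weights, diff)))

    pairs = combinations(range(len(features)), 2)
    return sorted(pairs, key=lambda p: margin(*p))[:k]


# Scores are 0, 1 and 3, so the closest pairs are (0, 1) and then (1, 2).
print(most_uncertain_pairs([[0, 0], [1, 0], [0, 3]], [1.0, 1.0], k=2))
```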

Write more soft constraints

View visualizations by feature

A simple viewer where we can look at visualizations and see what soft constraints they violate.

remove scale

API to check specs

Check only against hard constraints.

Investigate how to sample solutions from a partial spec

Given a partial spec and a set of hard constraints, sample from the set of all possible full specs that satisfy these hard constraints.
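For small, finite domains, a uniform sample can be drawn by enumerating every completion, filtering by the hard constraints, and picking one at random. This is a brute-force sketch for illustration only: the flat spec layout, the `domains` argument and `sample_completion` are hypothetical, and a real implementation would more likely enumerate models with the solver instead of Python loops.

```python
import itertools
import random
from typing import Any, Callable, Dict, List, Optional

Spec = Dict[str, Any]


def sample_completion(
    partial: Spec,
    domains: Dict[str, List[Any]],
    is_valid: Callable[[Spec], bool],
    rng: random.Random,
) -> Optional[Spec]:
    """Uniformly sample one full spec that extends `partial` and passes `is_valid`.

    `None` in `partial` marks an open slot; `domains` lists candidate values
    for each open slot (hypothetical, small and finite).
    """
    open_keys = [k for k, v in partial.items() if v is None]
    candidates = []
    for values in itertools.product(*(domains[k] for k in open_keys)):
        full = dict(partial)
        full.update(zip(open_keys, values))
        if is_valid(full):  # keep only completions that satisfy the hard constraints
            candidates.append(full)
    # Enumerating all valid completions first makes the draw uniform over them.
    return rng.choice(candidates) if candidates else None
```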

Support sort

Sort scales as feature

Fix outputs

For example, bin: 3 is not valid.
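In Vega-Lite, `bin` is either a boolean or a bin-parameters object such as `{"maxbins": 10}`, so a bare integer has to be rewritten before output. The sketch below assumes that an integer was meant as a maximum bin count; that interpretation, and the name `normalize_bin`, are assumptions.

```python
from typing import Any, Dict


def normalize_bin(encoding: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `encoding` whose `bin` value is valid Vega-Lite."""
    out = dict(encoding)
    b = out.get("bin")
    # bool is a subclass of int, so exclude it explicitly: `true` is valid as-is.
    if isinstance(b, int) and not isinstance(b, bool):
        # Assumption: an integer means "at most this many bins".
        out["bin"] = {"maxbins": b}
    return out
```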

Fix scale.zero and scale.type: log

We should be parsing this spec correctly. Right now, we expect zero and log not to be nested under scale.

{
    "encoding": {
        "x": {
            "bin": 10,
            "field": "horsepower",
            "type": "quantitative"
        },
        "y": {
            "aggregate": "count",
            "scale": {
                "zero": True
            },
            "type": "quantitative"
        }
    },
    "mark": "bar"
}
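A sketch of reading `zero` and the log scale type from the nested `scale` object and turning them into facts with the `zero(...)` and `log(...)` predicates that appear in the generated ASP program for `examples/ab.json`. Only the predicate names come from that program; the function `scale_facts` is hypothetical, and it deliberately ignores a top-level `zero` key.

```python
from typing import Any, Dict, List


def scale_facts(enc_id: str, encoding: Dict[str, Any]) -> List[str]:
    """Emit ASP facts for scale settings nested under the `scale` property."""
    scale = encoding.get("scale", {})
    facts = []
    if scale.get("zero") is True:
        facts.append(f"zero({enc_id}).")
    if scale.get("type") == "log":
        facts.append(f"log({enc_id}).")
    return facts


y = {"aggregate": "count", "scale": {"zero": True}, "type": "quantitative"}
print(scale_facts("e1", y))  # ['zero(e1).']
```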

Create good input/output examples

Use CompassQL.

@domoritz starts with 3-5
@Mestway adds more afterwards.

Get bugs

Get a list of wrong predictions. Quickly view them.

Get clingo in homebrew

Evaluation

Can we learn all good visualizations? We may miss important soft constraints so we cannot recommend certain charts.
Can we recommend the top 10 visualizations from http://viziometrics.org/?
Do we beat CompassQL in human ratings?

Running 'run_pipeline.sh examples/ab.json' produces wrong result

Running run_pipeline.sh examples/ab.json produces a bad result:

The field of encoding e0 is "e1" in the generated file, but "e1" is an encoding id, not a field.

The result is supposed to be either a or b for the field.

FYI, the generated constraints:

% ====== Data definitions ======
fieldtype(a,string).
cardinality(a,3).

fieldtype(b,number).
cardinality(b,6).

% ====== Query constraints ======
mark(bar).

encoding(e0).
channel(e0,x).
:- not field(e0,_).
:- not type(e0,_).
%0 { log(e0) } 1.
%0 { zero(e0) } 1.

encoding(e1).
channel(e1,y).
:- not field(e1,_).
type(e1,quantitative).
aggregate(e1,max).
%0 { log(e1) } 1.
%0 { zero(e1) } 1.

Generated full spec:

{
    "$schema": "https://vega.github.io/schema/vega-lite/v2.0.json",
    "data": {
        "url": "examples/data/ab.csv"
    },
    "encoding": {
        "x": {
            "field": "e1",
            "type": "ordinal"
        },
        "y": {
            "aggregate": "max",
            "field": "b",
            "scale": {
                "zero": true
            },
            "type": "quantitative"
        }
    },
    "mark": "bar"
}
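A cheap sanity check would have caught this bug: every `field` in the output should name a column of the data. The sketch below uses a hypothetical `unknown_fields` helper and takes the data columns (`a` and `b` here) as a plain set.

```python
from typing import Any, Dict, Iterable, List, Tuple


def unknown_fields(spec: Dict[str, Any], data_fields: Iterable[str]) -> List[Tuple[str, str]]:
    """List (channel, field) pairs whose field is not a column of the data."""
    known = set(data_fields)
    bad = []
    for channel, enc in spec.get("encoding", {}).items():
        f = enc.get("field")
        # Encodings without a field (e.g. a bare count) are fine.
        if f is not None and f not in known:
            bad.append((channel, f))
    return bad


generated = {
    "encoding": {
        "x": {"field": "e1", "type": "ordinal"},
        "y": {"aggregate": "max", "field": "b", "type": "quantitative"},
    },
    "mark": "bar",
}
print(unknown_fields(generated, {"a", "b"}))  # [('x', 'e1')]
```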

Add data, task and spec to UI

Use clyngor

Test with schema

Data gen issues

(I will keep adding to this list)

We don't support square mark types (I just added a constraint for that)
Prefer to generate aggregation together with lines and area unless it's q q (I'm adding soft constraints to learn this)

Fix setup

Bad in labeler

left, right, both are bad. Red color.

Use proper temp dir

https://docs.python.org/3.6/library/tempfile.html
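`tempfile.TemporaryDirectory` creates a private directory and deletes it, with its contents, when the `with` block exits. This sketch writes an ASP program there and hands the path to a callback; the helper name, the file name `query.lp` and the idea of running the solver inside `run` are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_program(program: str, run: Callable[[Path], T]) -> T:
    """Write `program` into a fresh temp dir, call `run(path)`, then clean up."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "query.lp"
        path.write_text(program)
        # `run` would typically invoke the solver on `path` (hypothetical).
        # Leaving the `with` block removes the directory and everything in it.
        return run(path)
```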

Task

task - mark
task - channel

Implement a way to get the number of violated soft constraints
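If the solver reports each violation as an atom in the answer set, the count is a simple tally. The sketch assumes a hypothetical atom format `soft(<name>,<args...>)`; the actual predicate names in the ASP encoding may differ.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical atom format: soft(<constraint_name>,<args...>)
SOFT_ATOM = re.compile(r"^soft\((\w+)")


def count_soft_violations(atoms: Iterable[str]) -> Counter:
    """Count violations per soft constraint from answer-set atom strings."""
    counts: Counter = Counter()
    for atom in atoms:
        m = SOFT_ATOM.match(atom)
        if m:
            counts[m.group(1)] += 1
    return counts


atoms = ["mark(bar)", "soft(aggregate,e1)", "soft(zero,e0)", "soft(aggregate,e2)"]
violations = count_soft_violations(atoms)
print(sum(violations.values()))  # 3
```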

UI to ask the user for labels

Skew

Add a soft constraint not to add zero when the difference between min and max is less than the distance to 0.
Discourage line or area without aggregation when the size of the data is larger than the size of the ordinal axis
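Both heuristics reduce to simple tests on data statistics, which could serve as conditions for the soft constraints. In this sketch, "distance to 0" is taken as 0 when the range spans zero, and otherwise as the absolute value of the endpoint closest to zero. That reading, and both function names, are assumptions.

```python
def zero_discouraged(lo: float, hi: float) -> bool:
    """True when the data range is smaller than its distance to zero."""
    distance_to_zero = 0.0 if lo <= 0 <= hi else min(abs(lo), abs(hi))
    return (hi - lo) < distance_to_zero


def unaggregated_line_discouraged(
    mark: str, aggregated: bool, n_rows: int, axis_cardinality: int
) -> bool:
    """True for an unaggregated line/area with more rows than ordinal-axis values."""
    return mark in ("line", "area") and not aggregated and n_rows > axis_cardinality


# A range of 10 sitting 1000 units away from zero: including zero would flatten it.
print(zero_discouraged(1000, 1010))  # True
```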