draco's People

Contributors

domoritz, haldenl, harshil93r, light-and-salt, mestway, ngehlenborg, peter-gy

draco's Issues

Hard constraints

  • Don't use bar with just d
  • Don't use rule with only x or y
  • rule and tick need q
  • do not use negative values for size
  • size should have high preference for zero, even with bin
  • don't allow row without y, column without x
  • stack can only have one continuous and it has to be on x or y
  • only allow dates to be used for temporal (no string or number)
  • we overlap bars when we don't have an ordinal and only q
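Two of the rules above can be sketched as plain checks on a spec. This is only an illustration, not how the constraints are stored in the project: the function name `violations` is hypothetical, the spec is assumed to be a Vega-Lite-style dict, and "q" is read as "a quantitative encoding".

```python
from typing import Any, Dict, List


def violations(spec: Dict[str, Any]) -> List[str]:
    """Check two of the listed hard constraints (hypothetical helper)."""
    enc = spec.get("encoding", {})
    found = []
    # "don't allow row without y, column without x"
    if "row" in enc and "y" not in enc:
        found.append("row without y")
    if "column" in enc and "x" not in enc:
        found.append("column without x")
    # "rule and tick need q" (assumption: q means a quantitative encoding)
    has_q = any(e.get("type") == "quantitative" for e in enc.values())
    if spec.get("mark") in ("rule", "tick") and not has_q:
        found.append("rule/tick without a quantitative field")
    return found
```

For example, a tick chart that only encodes a nominal row would trigger both checks.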

Refactor code

We should avoid relying so heavily on files in our APIs. It makes it really hard to write in-memory algorithms. Instead, we should move in-memory objects around and handle IO separately. I added a few TODOs in the code to show where I see problems.
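A minimal sketch of that separation, assuming hypothetical names (`parse_query` and `read_query_file` are not from the codebase): the parsing step works only on an in-memory dict, and a thin wrapper is the single place that touches the file system.

```python
import json
from typing import Any, Dict


def parse_query(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Pure in-memory step; a stand-in for the real parsing logic."""
    return {
        "mark": spec.get("mark"),
        "channels": sorted(spec.get("encoding", {})),
    }


def read_query_file(path: str) -> Dict[str, Any]:
    """IO wrapper: loads JSON from disk, then delegates to parse_query."""
    with open(path) as f:
        return parse_query(json.load(f))
```

With this split, tests and in-memory algorithms can call `parse_query` directly on dicts and never need temporary files.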

Rename methods

For example, load_from_vl_json should be load_query_from_json

Active learning

Figure out which pairs of visualizations we should ask an expert to label.

Fix scale.zero and scale.type: log

We should be parsing this spec correctly. Right now, we expect zero and log not to be nested under scale.

{
    "encoding": {
        "x": {
            "bin": 10,
            "field": "horsepower",
            "type": "quantitative"
        },
        "y": {
            "aggregate": "count",
            "scale": {
                "zero": true
            },
            "type": "quantitative"
        }
    },
    "mark": "bar"
}
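A rough sketch of reading both properties from the nested `scale` object, assuming the spec has already been loaded into a Python dict. The helper name `scale_flags` is hypothetical, and `None` is used here to mean "the spec does not say".

```python
from typing import Any, Dict, Optional


def scale_flags(channel_def: Dict[str, Any]) -> Dict[str, Optional[bool]]:
    """Read zero and log from the nested "scale" object of one channel."""
    scale = channel_def.get("scale") or {}
    zero = scale.get("zero")  # None when the spec does not mention zero
    # In Vega-Lite a log scale is written as "scale": {"type": "log"}
    log = (scale["type"] == "log") if "type" in scale else None
    return {"zero": zero, "log": log}
```

Applied to the `y` channel of the spec above, this would yield `zero` set and `log` unspecified.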

Get bugs

Get a list of wrong predictions. Quickly view them.

Evaluation

  • Can we learn all good visualizations? We may miss important soft constraints so we cannot recommend certain charts.
  • Can we recommend the top 10 visualizations from http://viziometrics.org/?
  • Do we beat CompassQL in human ratings?

Running 'run_pipeline.sh examples/ab.json' produces wrong result

Running run_pipeline.sh examples/ab.json produces an incorrect spec:

In the generated file, the field of encoding e0 is "e1", but "e1" is an encoding id, not a field.

The field should be either a or b.

FYI, soft constraints generated:

% ====== Data definitions ======
fieldtype(a,string).
cardinality(a,3).

fieldtype(b,number).
cardinality(b,6).

% ====== Query constraints ======
mark(bar).

encoding(e0).
channel(e0,x).
:- not field(e0,_).
:- not type(e0,_).
%0 { log(e0) } 1.
%0 { zero(e0) } 1.

encoding(e1).
channel(e1,y).
:- not field(e1,_).
type(e1,quantitative).
aggregate(e1,max).
%0 { log(e1) } 1.
%0 { zero(e1) } 1.

Generated full spec:

{
    "$schema": "https://vega.github.io/schema/vega-lite/v2.0.json",
    "data": {
        "url": "examples/data/ab.csv"
    },
    "encoding": {
        "x": {
            "field": "e1",
            "type": "ordinal"
        },
        "y": {
            "aggregate": "max",
            "field": "b",
            "scale": {
                "zero": true
            },
            "type": "quantitative"
        }
    },
    "mark": "bar"
}
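A small sanity check could catch this class of bug before a spec is written out. The sketch below assumes the generated spec is available as a dict and the data's column names as a set; `unknown_fields` is a hypothetical helper, not part of the pipeline.

```python
from typing import Any, Dict, List, Set


def unknown_fields(spec: Dict[str, Any], data_fields: Set[str]) -> List[str]:
    """List "channel: field" entries whose field is not a data column."""
    problems = []
    for channel, enc in sorted(spec.get("encoding", {}).items()):
        field = enc.get("field")
        if field is not None and field not in data_fields:
            problems.append(f"{channel}: {field}")
    return problems
```

For the spec above and the columns a and b, the check would flag the x channel, whose field is the encoding id "e1".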

Data gen issues

(I will keep adding to this list)

  • We don't support square mark types (I just added a constraint for that)
  • Prefer to generate aggregation together with lines and area unless it's q x q (I'm adding soft constraints to learn this)

Task

task - mark
task - channel

Skew

  • Add a soft constraint not to add zero when the difference between min and max is less than the distance to 0.
  • Discourage line or area without aggregation when the size of the data is larger than the size of the ordinal axis
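The first rule can be written as a simple predicate. This is one possible reading, stated as an assumption: "distance to 0" is taken to be the distance from zero to the nearer end of the data range, and the function name `discourage_zero` is hypothetical.

```python
def discourage_zero(lo: float, hi: float) -> bool:
    """True when the data range is smaller than its distance to zero."""
    if lo > hi:
        lo, hi = hi, lo
    if lo <= 0 <= hi:
        return False  # zero already lies inside the data range
    # Assumption: distance to zero is measured from the nearer end
    distance_to_zero = min(abs(lo), abs(hi))
    return (hi - lo) < distance_to_zero
```

Under this reading, values between 100 and 110 would discourage a zero baseline, while values between 1 and 10 would not.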

Add importance to data fields

Users should be able to express whether a field is considered important or not.

Options for adding this are

  • A flag to indicate whether a field is important
  • A score
  • A total order
  • Preferences (foo > bar)

Data Gen Issues

  • Better way to generate 'stack' (maybe hard constraints will help)
  • D(x) x D(y) x Q(other) should have examples with agg on other
  • special case for type when generating
  • too many channel-channel
  • support count

Create pairs

We need pairs of visualizations that we can ask the user to label. These pairs should cover interesting correlations.
