gradual-typing-performance's People

Contributors: bennn, dfeltey, janvitek, maxsnew, mfelleisen, sabauma, samth, takikawa

gradual-typing-performance's Issues

Artifact: missing README in the VM

There should be a copy of the README in the VM too, but it looks like only a link is installed and the document is not actually included.

Gregor Doesn't Work for Me

find-relative-path: contract violation
  expected: (and/c path-for-some-system? simple-form?)
  given: #<path:../usr/share/zoneinfo/America/New_York>
  context...:
   /home/max/racket/racket/collects/racket/path.rkt:114:0: do-explode-path
   /home/max/racket/racket/collects/racket/path.rkt:124:0: find-relative-path
   /home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/base/tzinfo/private/os/unix.rkt:11:0: detect-tzid/unix
   /home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/base/tzinfo/private/zoneinfo.rkt:54:3: detect-system-tzid
   /home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/base/tzinfo/main.rkt:70:0: system-tzid
   /home/max/racket/racket/collects/racket/contract/private/arrow-val-first.rkt:363:18
   /home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/configuration0000000000000/moment.rkt: [running body]
   /home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/configuration0000000000000/main.rkt: [traversing imports]

@wilbowma says it's a known bug in Gregor.

MBTA pathology explanation

I don't think we explain the weird pathologies in the MBTA benchmark anywhere yet, right? We should probably explain them, but I'm not sure where the best place to put that is.

Feedback on README

"We recommend giving the VM at least 4GB of RAM if you want to run analysis on the largest datasets."

Should probably be "analyses" or "run the analysis".

"The .rktd files contained therein are racket data files in a simple S-expression format."

Capitalize "racket".

The analysis section should explain how to run the tools that are contained in the artifact. Something the reviewer can just copy+paste and run.

Same in the benchmark section. Also the commands should use absolute paths unless you also specify which directory they should be in.

4.2 - it sounds like this describes how to make a benchmark, but it should describe how the benchmarks are organized now (it says "should contain" instead of "a benchmark directory contains").

The README should list all the benchmark programs we have and their directories in the artifact. Maybe a note for each one.

We may want to explain some of the flags for running benchmarks. It's also probably a good idea to suggest running the VM with at least two cores allocated in VirtualBox.

The "Artifact Overview" section should say a bit more about what a reviewer can expect when running our artifact. Like list the benchmarks that are available, and maybe say roughly how long these take so the reviewer can allocate their time properly (if they accidentally select quad or gregor to try, that's not great).

Can we link to the Racket docs and say "here's where you go if you are confused by anything in here"? We may want to link to the TR docs as well.

Module graphs as Racket values

I propose we encode the module graphs we have as Racket values with the graph library and then output TikZ from that, rather than the other way around. This might also help automate the boundary-based predictions, since the module-graph information is needed there.

(Even better if we can build picts instead of TikZ.)
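A minimal sketch of the idea, assuming the graph package's directed-graph, in-vertices, and in-edges; the module names, edge list, and TikZ layout below are purely illustrative:

#lang racket
(require graph)

;; Toy module graph: an edge (A B) means module A requires module B.
(define modgraph
  (directed-graph '((main label) (main data) (label data))))

;; Emit a TikZ picture. Nodes are just stacked vertically here;
;; a real layout pass would assign proper coordinates.
(define (graph->tikz g)
  (string-append
   "\\begin{tikzpicture}\n"
   (apply string-append
          (for/list ([v (in-vertices g)] [i (in-naturals)])
            (format "  \\node (~a) at (0,~a) {~a};\n" v i v)))
   (apply string-append
          (for/list ([e (in-edges g)])
            (format "  \\draw[->] (~a) -- (~a);\n" (first e) (second e))))
   "\\end{tikzpicture}\n"))

(display (graph->tikz modgraph))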

Comments: Related Work

  • should 2nd sentence say "JavaScript" rather than "TypeScript"?
  • Reticulated should be a new paragraph; also note that their focus was expressiveness (whereas TR is already expressive)
  • Allende has a section titled "macrobenchmarks" describing 2 projects, so I don't think we should use the term "microbenchmarks" to describe them. How about "small Gradualtalk benchmarks"?
  • Allende notes a 36% speedup on one program, though they say it's "generally not so" that adding types speeds things up (but I wonder if there's a tipping point beyond which more types = no worse)
  • maybe note that STS "lattice" performance should fall between 22x and 6% slowdown; their "untyped" is very different from ours
  • can we cite the STOP paper for mypy? I feel bad about using a footnote

(If these should be in a different format -- inline in a pull request? -- I'd be happy to make that change)

Synth does not type check with Racket HEAD

Newer versions of Typed Racket seem to have changed the type of expt. Synth produces the following error message when run using Racket HEAD.

sequencer.rkt:26:14: Type Checker: type mismatch
  expected: Nonnegative-Real
  given: Number
  in: (* 440 (expt (expt 2 1/12) (- note 57)))
  context...:
   /home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/typecheck/tc-toplevel.rkt:318:0: type-check
   /home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/typecheck/tc-toplevel.rkt:562:0: tc-module
   /home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/tc-setup.rkt:82:0: tc-module/full
   /home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/typed-racket.rkt:24:4
   standard-module-name-resolver
   temp8
   /home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/tc-setup.rkt:82:0: tc-module/full
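One possible workaround, as a hedged sketch rather than the fix we should necessarily adopt: in the typed/racket module, cast the result back to Nonnegative-Real so the surrounding code keeps its old type, at the cost of a runtime check. The function name note->freq is made up here to stand in for the expression at sequencer.rkt:26.

(: note->freq (-> Integer Nonnegative-Real))
(define (note->freq note)
  ;; cast inserts a runtime check; the value is in fact a positive real
  (cast (* 440 (expt (expt 2 1/12) (- note 57))) Nonnegative-Real))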

website, to view datasets

After JFP, add a gh-pages branch for viewing the datasets.

Not sure how to keep the site synced with master... maybe it's best to keep the website to a few commits, then always pull --rebase

wishlist: require/typed/check should desugar to an "only-in"

Currently, a call to require/typed/check in a typed module desugars to a plain require.

(require/typed/check "foo.rkt" [ ... ]) ===> (require "foo.rkt")

Instead, it would be nice to desugar to an only-in, picking just the named imports.

(require/typed/check "foo.rkt" [a (-> Any Any)]) ===> (require (only-in "foo.rkt" a))

Otherwise there is a risk of name collisions from unused imports, which are hard to detect without running the particular typed/untyped configuration that triggers them.

Examples:

  • I had to rename Gregor's time function to avoid colliding with the one from racket/base. (A prefix-in could fix this, with extra work elsewhere.)
  • After running half the 2**13 configs successfully, I saw an error on config 100000001000 (or something) because a module had an unused export that got picked up by require/typed/check.
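A rough sketch of the proposed desugaring, covering only the case where the required module is typed; the real require/typed/check in benchmark-util also has to detect whether "foo.rkt" is typed (falling back to require/typed when it is not) and handle clause forms beyond [name type].

(define-syntax (require/typed/check stx)
  (syntax-case stx ()
    [(_ mod [name ty] ...)
     ;; typed module requiring a typed module: skip the contracts,
     ;; but import only the named identifiers
     #'(require (only-in mod name ...))]))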

Re-run on different inputs

Let's just verify that our results are consistent. I'll start queueing these on galicia.

  • tetris on the larger history
  • kcfa on the small test (Racket v6.2, Btop, 30 iters)
  • morsecode on med. test (Racket v6.2, Btop, 30 iters)
  • morsecode on large test (Racket v6.2, Btop, 30 iters)
  • snake on the larger history
  • suffixtree on small test (finished on Racket v6.2, galicia, 30 iters)
  • funkytown on med. test (Racket v6.2, galicia, 30 iters)
  • kcfa on the large example, for k=1 (see #48)
  • gregor on larger test (finished on cluster)
  • zo on larger test
  • suffixtree on kcfa (large) test
  • maybe quad on the large test (it's over a minute fully-typed)

L-M-N figures

Create some N graphs, some MN graphs, and some LMN graphs. ASAP

GC time increasing?

We currently throw away the GC time results in the benchmarks. Consider re-running some to correlate GC changes with CPU-time changes.
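For reference, a minimal sketch of keeping the GC number that time-apply already reports instead of dropping it; run-configuration is a hypothetical thunk that runs one benchmark configuration.

(define-values (results cpu real gc) (time-apply run-configuration '()))
(printf "cpu: ~ams  real: ~ams  gc: ~ams~n" cpu real gc)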

Colors

I think the yellow in sec. 4.2.1 is a bit too hard to read. Can we pick a different shade? (maybe also a darker green?)

postmortem

Add a comprehensive analysis of slowdowns. Do this before pursuing solutions.

  • how many contract checks does each boundary trigger, in the worst case?
  • what would the runtime be, supposing core data structures were never contract-protected?
  • what does the contract profiler say about the worst variations in each benchmark? (see the sketch after this list)
  • MORE
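For the contract-profiler item, a hedged sketch using the contract-profile package; run-benchmark is a hypothetical entry point for one configuration.

(require contract-profile)

;; Runs the body and prints a report attributing time to individual contracts,
;; which should point at the worst boundaries.
(contract-profile (run-benchmark))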

Figure 1: LOC questions

  1. What did we use to generate line counts? Can I re-run using the sloccount utility? (sloccount ignores comments and whitespace.)
  2. What counts as "other" code? Should this number include library code like MBTA's graph and the zo-traversal's compiler/zo-lib?

Lighten `benchmark-util` dependency

benchmark-util should be removed from the benchmarks or made easier to install than raco pkg install <repo>/tools/benchmark-util

Ideas:

  • require benchmark-util via path instead of collection, to make bundling easier (see the sketch after this list)
  • put the useful essentials on pkgs.racket-lang
  • just add a proper makefile to the top of the gtp repo
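A sketch of the path-based option, assuming the package's entry module is main.rkt; the number of ".." components is illustrative and depends on where the configuration directory sits in the repo.

;; instead of (require benchmark-util)
(require "../../../tools/benchmark-util/main.rkt")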

Replace echo/sieve with l-nm

I've converted the L-N/M plotting scripts to Typed Racket, and they perform well when gradually typed. Can we replace echo (alternatively, sieve) with these results?

(I'd rather replace echo, because I still think sieve is an interesting, contrived example. I'm not sure what to say about echo besides that it's great.)

Results

  • untyped runtime: 203ms
  • typed runtime: 141ms (0.69x overhead)
  • avg. gradually typed: 173ms (0.85x overhead)
  • max. gradually typed: 234ms (1.15x overhead)

I think these results will be even better on a larger input. Processing gregor took the untyped variation about 90,000ms, while the typed version took about 45,000ms, iirc.

The l-nm picture for L=0 is very good.
(attached figure: lnm0)

clarify: Benchmarks vs. Programs

As reviewer C pointed out, we should be clear that our programs are real artifacts, but our benchmarks are sometimes synthetic.

Also be careful to call the programs "programs" and the tests we ran using them "benchmarks".

Re-run kcfa

The current kcfa data file (kcfa-06-01.rktd) has one row with 28 entries, while all the others have 50. We should re-run the benchmark or use another data file.
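A hedged sketch of the check, assuming each .rktd file holds a single datum (a list or vector of rows, each row a list of run times):

(define (uneven-rows path)
  (define data (with-input-from-file path read))
  (define rows (if (vector? data) (vector->list data) data))
  (define expected (length (car rows)))
  ;; report the index and length of every row that deviates
  (for/list ([row (in-list rows)]
             [i (in-naturals)]
             #:unless (= (length row) expected))
    (cons i (length row))))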

Add note: recovering imports/exports

I'd like to include a note somewhere (not sure where) saying that recovering the names of values imported/exported from (all-defined-out) or (require "file.rkt") was challenging. Even when typing just one module, you need to recover the API types for all its imports.

Check overhead in suffixtree section

As Jan pointed out, we should check the 35x and 12x claims about subsets of the performance lattice.

These numbers should be produced by calls to functions in scripts/summary.rkt.
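Something like the following sketch, assuming each .rktd row i holds the run times for the configuration whose bitstring is i written in binary (so row 0 is the untyped configuration); keep-config? selects the subset of the lattice the claim is about.

(define (max-overhead path keep-config?)
  (define data (with-input-from-file path read))
  (define rows (if (vector? data) (vector->list data) data))
  (define (mean xs) (/ (apply + xs) (length xs)))
  (define untyped (mean (car rows)))
  ;; worst slowdown relative to the untyped row, over the selected configs
  (for/fold ([worst 0]) ([row (in-list rows)]
                         [i (in-naturals)]
                         #:when (keep-config? i))
    (max worst (/ (mean row) untyped))))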

HTDP mixin needs a type

The render mixin for the htdp benchmark is broken; we could not assign a type. The current hack is unacceptable.

Original code:

(define renderer (compose render-multi-mixin render-mixin))

Using compose doesn't work, and even the otherwise-acceptable change to:

(define (renderer x) (render-multi-mixin (render-mixin x)))

raises error messages like:

Type Checker: type mismatch
  expected: Class with row variable `r1201'
  given: Class with row variable `(Unknown Type: #(struct:Row 2069 #(struct:combined-frees #hasheq() ()) #(struct:combined-frees #hasheq() ()) #f #f () () () () #f))'

The current hack is:

(define (renderer x) x)

Don't duplicate READMEs in benchmarks

Could we set up the benchmarks so that they don't copy over README files? Maybe put the README files somewhere else?

(they take up megabytes of storage for no reason)

Morse code overhead?

Question from Wednesday: why does the typed version of morsecode show 1.2x overhead?

"Configuration" abstraction

Right now, a configuration is just a bitstring like "0100". There's currently a small script for working with these, but it should be more abstract.

In particular, I'm annoyed that I've been doing:

untyped_config = "0" * num_modules
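A hedged sketch, in Racket, of what the abstraction could look like; all of the names are illustrative.

;; bits : a vector of booleans, one per module (#t = typed)
(struct configuration (bits) #:transparent)

(define (untyped-configuration num-modules)
  (configuration (make-vector num-modules #f)))

(define (typed-at? cfg i)
  (vector-ref (configuration-bits cfg) i))

(define (configuration->string cfg)
  (apply string-append
         (for/list ([b (in-vector (configuration-bits cfg))])
           (if b "1" "0"))))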
