gradual-typing-performance's Issues
Threats to validity section
We should write one. Maybe in sec. 4 or 5.
Artifact: missing README in the VM
There should be a copy of the README in the VM too, but it looks like only a link is installed and the document is not actually included.
Gregor Doesn't Work for Me
find-relative-path: contract violation
expected: (and/c path-for-some-system? simple-form?)
given: #<path:../usr/share/zoneinfo/America/New_York>
context...:
/home/max/racket/racket/collects/racket/path.rkt:114:0: do-explode-path
/home/max/racket/racket/collects/racket/path.rkt:124:0: find-relative-path
/home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/base/tzinfo/private/os/unix.rkt:11:0: detect-tzid/unix
/home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/base/tzinfo/private/zoneinfo.rkt:54:3: detect-system-tzid
/home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/base/tzinfo/main.rkt:70:0: system-tzid
/home/max/racket/racket/collects/racket/contract/private/arrow-val-first.rkt:363:18
/home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/configuration0000000000000/moment.rkt: [running body]
/home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/configuration0000000000000/main.rkt: [traversing imports]
@wilbowma says it's a known bug in Gregor.
MBTA pathology explanation
I don't think we explain the weird pathologies in the MBTA benchmark yet right? We should probably explain that, but not sure where the best place to put that is.
Artifact: running morsecode fails
The example run.sh invocation for morsecode fails for me with an open-input-file error for the frequency file.
typed / typed / typed / typed
Feedback on README
"We recommend giving the VM at least 4GB of RAM if you want to run analysis on the largest datasets."
Should probably be "analyses" or "run the analysis".
"The .rktd files contained therein are racket data files in a simple S-expression format."
Capitalize "racket".
The analysis section should explain how to run the tools that are contained in the artifact. Something the reviewer can just copy+paste and run.
Same in the benchmark section. Also the commands should use absolute paths unless you also specify which directory they should be in.
4.2 - it sounds like this describes how to make a benchmark, but it should describe how the benchmarks are organized now (it says "should contain" instead of "a benchmark directory contains").
The README should list all the benchmark programs we have and their directories in the artifact. Maybe a note for each one.
We may want to explain some of the flags for running benchmarks. Also it's probably a good idea to suggest that the VM should be run with at least two cores allocated by Virtualbox.
The "Artifact Overview" section should say a bit more about what a reviewer can expect when running our artifact. Like list the benchmarks that are available, and maybe say roughly how long these take so the reviewer can allocate their time properly (if they accidentally select quad or gregor to try, that's not great).
Can we link to the Racket docs and say "here's where you go if you are confused by anything in here"? May also want to link to TR docs too.
Module graphs as Racket values
I propose we encode the module graphs we have as Racket values with the graph library and then output TikZ from that, rather than the other way around. This might be useful for the boundary-based prediction automation, since the module graph information is needed there.
(also even better if we can build picts instead of tikz)
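A hedged sketch of the encode-then-emit idea (assuming the graph package's unweighted-graph/directed and get-edges; the module names are made up):

```racket
#lang racket
(require graph) ; raco pkg install graph

;; Hypothetical module graph: an edge (A B) means "A requires B".
(define modgraph
  (unweighted-graph/directed
   '((main moment) (moment clock) (clock tzinfo))))

;; Emit one TikZ \draw line per require edge.
(for ([e (get-edges modgraph)])
  (printf "\\draw (~a) -- (~a);\n" (first e) (second e)))
```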
Comments: Related Work
- should 2nd sentence say "JavaScript" rather than "TypeScript"?
- reticulated should be a new paragraph, and also note that their focus was expressiveness (whereas TR is already expressive)
- Allende has a section titled "macrobenchmarks" describing 2 projects, so I don't think we should use the term "microbenchmarks" to describe them. How about "small Gradualtalk benchmarks"?
- Allende notes a 36% speedup on one program, though they say it's "generally not so" that adding types speeds things up (but I wonder if there's a tipping point beyond which more types = no worse)
- maybe note that STS "lattice" performance should fall between 22x and 6% slowdown; their "untyped" is very different from ours
- can we cite the STOP paper for mypy? I feel bad about using a footnote
(If these should be in a different format -- inline in a pull request? -- I'd be happy to make that change)
Synth does not type check with Racket HEAD
Newer versions of Typed Racket seem to have changed the type of expt. Synth produces the following error message when run using Racket HEAD.
sequencer.rkt:26:14: Type Checker: type mismatch
expected: Nonnegative-Real
given: Number
in: (* 440 (expt (expt 2 1/12) (- note 57)))
context...:
/home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/typecheck/tc-toplevel.rkt:318:0: type-check
/home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/typecheck/tc-toplevel.rkt:562:0: tc-module
/home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/tc-setup.rkt:82:0: tc-module/full
/home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/typed-racket.rkt:24:4
standard-module-name-resolver
temp8
/home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/tc-setup.rkt:82:0: tc-module/full
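Not sure what the right fix is, but one possible workaround (sketch; note->frequency is a made-up name for the failing expression in sequencer.rkt) is to cast the result back, at the cost of a runtime check:

```racket
#lang typed/racket

;; Newer TR appears to give this expt chain the type Number, so we
;; cast the result; cast compiles to a runtime check, not a no-op.
(: note->frequency (-> Integer Nonnegative-Real))
(define (note->frequency note)
  (cast (* 440 (expt (expt 2 1/12) (- note 57))) Nonnegative-Real))
```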
website, to view datasets
After jfp, add a gh-pages branch for viewing the datasets.
Not sure how to keep the site synced with master... maybe it's best to keep the website to a few commits, then always pull --rebase
wishlist: require/typed/check should desugar to an "only-in"
Currently, a call to require/typed/check in a typed module desugars to a plain require.
(require/typed/check "foo.rkt" [ ... ]) ===> (require "foo.rkt")
Instead, it would be nice to desugar to an only-in, picking out just the named imports.
(require/typed/check "foo.rkt" [a (-> Any Any)]) ===> (require (only-in "foo.rkt" a))
Otherwise there's a risk of name collisions from unused imports. These are hard to detect without running the particular typed/untyped configuration that triggers them.
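A minimal sketch of the desired expansion (sketch only: it ignores #:struct clauses and the untyped-importer case, both of which the real macro in benchmark-util also has to handle):

```racket
#lang racket

;; Sketch: pull the names out of the [name type] clauses and
;; require only those names, instead of the whole module.
(define-syntax require/typed/check
  (syntax-rules ()
    [(_ mod [name ty] ...)
     (require (only-in mod name ...))]))
```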
Examples:
- I had to rename a function time from Gregor to avoid colliding with the same function from racket/base. (A prefix-in could fix this, with extra work elsewhere.)
- After running half the 2**13 configs successfully, I saw an error on config 100000001000 (or something) because a module had an unused export that got picked up by require/typed/check.
Benchmark authorship
When the review is finished, clarify this in the paper.
Re-run on different inputs
Let's just verify that our results are consistent. I'll start queueing these on galicia.
- tetris on the larger history
- kcfa on the small test (Racket v6.2, Btop, 30 iters)
- morsecode on med. test (Racket v6.2, Btop, 30 iters)
- morsecode on large test (Racket v6.2, Btop, 30 iters)
- snake on the larger history
- suffixtree on small test (finished on Racket v6.2, galicia, 30 iters)
- funkytown on med. test (Racket v6.2, galicia, 30 iters)
- kcfa on the large example, for k=1 (see #48)
- gregor on larger test (finished on cluster)
- zo on larger test
- suffixtree on kcfa (large) test
- maybe quad on the large test (it's over a minute fully-typed)
L-M-N figures
Create some N graphs, some MN graphs, and some LMN graphs. ASAP
GC time increasing?
We currently throw away GC time results in the benchmarks. Consider re-running some to correlate GC changes to CPU time changes.
Colors
I think the yellow in sec. 4.2.1 is a bit too hard to read. Can we pick a different shade? (maybe also a darker green?)
Artifact: higher quality wallpaper
The current one is pretty low-res. I recommend getting an SVG from http://www.eecs.northwestern.edu/~robby/logos/ and either using that directly or rendering it to some high resolution.
postmortem
Add a comprehensive analysis of slowdowns. Do this before pursuing solutions.
- how many contract checks does each boundary trigger, in the worst case?
- what would the runtime be, supposing core data structures were never contract-protected?
- what does the contract profiler say about the worst variations in each benchmark?
- MORE
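For the contract-profiler question, a sketch of how one worst-case variation could be profiled (assumes the contract-profile package is installed; "main.rkt" here just stands for a configuration's entry module):

```racket
#lang racket
(require contract-profile) ; raco pkg install contract-profile

;; Run the configuration's main module under the contract profiler;
;; it prints a breakdown of time spent checking each contract.
(contract-profile (dynamic-require "main.rkt" #f))
```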
Data + Code freely available
Need to provide a URL for sharing data & code. Make a new branch for this?
Add N-deliv. and N/M-usa. to Fig 3/4
For N=3 and M=10.
Possibly replacing avg & max overhead, but I think there's space to fit.
Figure 1: LOC questions
- What did we use to generate line counts? Can I re-run using the sloccount utility? (sloccount ignores comments and whitespace.)
- What counts as "other" code? Should this number include library code like MBTA's graph and the zo-traversal's compiler/zo-lib?
Lighten `benchmark-util` dependency
benchmark-util should be removed from the benchmarks, or made easier to install than raco pkg install <repo>/tools/benchmark-util
Ideas:
- require benchmark-util via path instead of collection, to make bundling easier
- put the useful essentials on pkgs.racket-lang
- just add a proper makefile to the top of the gtp repo
Replace echo/sieve with l-nm
I converted the L-N/M plotting scripts to Typed Racket, and they perform well when gradually typed. Can we replace echo (or alternatively, sieve) with these results?
(I'd rather replace echo, because I still think sieve is an interesting contrived example. I'm not sure what to say about echo besides that it's great.)
Results
- untyped runtime: 203ms
- typed runtime: 141ms (0.69x overhead)
- avg. gradually typed: 173ms (0.85x overhead)
- max. gradually typed: 234ms (1.15x overhead)
I think these results will be even better on a larger input. Processing gregor takes the untyped variation 90,000ms, while the typed version takes about 45,000ms iirc.
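The numbers above check out against the 203ms untyped baseline (quick sanity check; "overhead" here means runtime relative to the untyped configuration):

```python
UNTYPED_MS = 203  # untyped runtime from the results above

def overhead(ms):
    """Runtime relative to the untyped configuration, 2 decimals."""
    return round(ms / UNTYPED_MS, 2)

print(overhead(141))  # typed -> 0.69
print(overhead(173))  # avg. gradually typed -> 0.85
print(overhead(234))  # max. gradually typed -> 1.15
```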
clarify: Benchmarks vs. Programs
Like reviewer C pointed out, we should be clear that our programs are real artifacts, but our benchmarks are sometimes synthetic.
Also be careful to call the programs "programs" and the tests we ran using them "benchmarks".
Re-run kcfa
The current kcfa file (kcfa-06-01.rktd) has one row with 28 entries, but all other rows have 50 entries. We should re-run the benchmark or use another data file.
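A quick way to spot the short row (sketch; this assumes the .rktd file is a plain S-expression of rows of numeric atoms, which may not exactly match the on-disk format):

```python
def row_lengths(sexp_text):
    """Count the atoms in each depth-2 parenthesized row of ((...) (...) ...)."""
    depth = 0
    lengths = []
    in_atom = False
    for ch in sexp_text:
        if ch == "(":
            depth += 1
            in_atom = False
            if depth == 2:          # start of a new row
                lengths.append(0)
        elif ch == ")":
            depth -= 1
            in_atom = False
        elif ch.isspace():
            in_atom = False
        else:
            if depth == 2 and not in_atom:  # first char of a new atom
                lengths[-1] += 1
            in_atom = True
    return lengths
```

For example, `row_lengths("((1 2 3) (4 5))")` returns `[3, 2]`, so the offending row's index falls out of a single scan over the file's contents.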
Add note: recovering imports/exports
I'd like to include a note somewhere (not sure where) saying that recovering the names of values imported/exported from (all-defined-out) or (require "file.rkt") was challenging. Even when typing just one module, you need to recover the API types for all its imports.
Clarify: many possible typings
Say that there are multiple possible typings for TR programs, and discuss the implications.
Check overhead in suffixtree section
As Jan pointed out, we should check the 35x and 12x claims about subsets of the performance lattice.
These numbers should be produced by function calls to scripts/summary.rkt
HTDP mixin needs a type
The render mixin for the htdp benchmark is broken; we could not assign a type. The current hack is unacceptable.
Original code:
(define renderer (compose render-multi-mixin render-mixin))
Using compose doesn't work, but even the acceptable change to:
(define (renderer x) (render-multi-mixin (render-mixin x)))
raises error messages like:
Type Checker: type mismatch
expected: Class with row variable `r1201'
given: Class with row variable `(Unknown Type: #(struct:Row 2069 #(struct:combined-frees #hasheq() ()) #(struct:combined-frees #hasheq() ()) #f #f () () () () #f))'
The current hack is:
(define (renderer x) x)
Don't duplicate READMEs in benchmarks
Could we set up the benchmarks so that they don't copy over README files? Maybe put the README files somewhere else?
(they take up megabytes of storage for no reason)
Morse code overhead?
Question from Wednesday: why does the typed version of morse code show 1.2x overhead?
Artifact: running mbta fails to produce a plot
The plot rendering step seems to fail with a with-output-to-file error. Something about lnm-cache-sample.rktd. Just try run.sh on mbta to see this.
"Configuration" abstraction
Right now, a configuration is a string like 0100. There's currently a small script for working with these, but it should be more abstract.
In particular, I'm annoyed that I've been doing:
untyped_config = "0" * num_modules
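A sketch of what the abstraction might look like (all names here are my own invention, not the existing script's API):

```python
class Configuration:
    """A typed/untyped configuration: one bit per module, '1' = typed."""

    def __init__(self, bits):
        assert set(bits) <= {"0", "1"}
        self.bits = bits

    @classmethod
    def untyped(cls, num_modules):
        return cls("0" * num_modules)

    @classmethod
    def typed(cls, num_modules):
        return cls("1" * num_modules)

    def typed_modules(self):
        """Indices of the modules that are typed in this configuration."""
        return [i for i, b in enumerate(self.bits) if b == "1"]

    def __str__(self):
        return self.bits
```

Then `Configuration.untyped(num_modules)` replaces the ad-hoc `"0" * num_modules` at call sites.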