gradual-typing-performance's Issues
Threats to validity section
We should write one. Maybe in sec. 4 or 5.
Artifact: missing README in the VM
There should be a copy of the README in the VM too, but it looks like only a link is installed and the document is not actually included.
Gregor Doesn't Work for Me
find-relative-path: contract violation
expected: (and/c path-for-some-system? simple-form?)
given: #<path:../usr/share/zoneinfo/America/New_York>
context...:
/home/max/racket/racket/collects/racket/path.rkt:114:0: do-explode-path
/home/max/racket/racket/collects/racket/path.rkt:124:0: find-relative-path
/home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/base/tzinfo/private/os/unix.rkt:11:0: detect-tzid/unix
/home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/base/tzinfo/private/zoneinfo.rkt:54:3: detect-system-tzid
/home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/base/tzinfo/main.rkt:70:0: system-tzid
/home/max/racket/racket/collects/racket/contract/private/arrow-val-first.rkt:363:18
/home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/configuration0000000000000/moment.rkt: [running body]
/home/max/research/gradual-typing-performance/benchmarks/gregor/benchmark/configuration0000000000000/main.rkt: [traversing imports]
@wilbowma says it's a known bug in Gregor.
MBTA pathology explanation
I don't think we explain the weird pathologies in the MBTA benchmark yet right? We should probably explain that, but not sure where the best place to put that is.
Artifact: running morsecode fails
The example run.sh invocation for morsecode fails for me with an open-input-file error for the frequency file.
typed / typed / typed / typed
Feedback on README
"We recommend giving the VM at least 4GB of RAM if you want to run analysis on the largest datasets."
Should probably be "analyses" or "run the analysis".
"The .rktd files contained therein are racket data files in a simple S-expression format."
Capitalize "racket".
The analysis section should explain how to run the tools that are contained in the artifact. Something the reviewer can just copy+paste and run.
Same in the benchmark section. Also the commands should use absolute paths unless you also specify which directory they should be in.
4.2 - it sounds like this describes how to make a benchmark, but it should describe how the benchmarks are organized now (it says "should contain" instead of "a benchmark directory contains").
The README should list all the benchmark programs we have and their directories in the artifact. Maybe a note for each one.
We may want to explain some of the flags for running benchmarks. Also it's probably a good idea to suggest that the VM should be run with at least two cores allocated by Virtualbox.
The "Artifact Overview" section should say a bit more about what a reviewer can expect when running our artifact. Like list the benchmarks that are available, and maybe say roughly how long these take so the reviewer can allocate their time properly (if they accidentally select quad or gregor to try, that's not great).
Can we link to the Racket docs and say "here's where you go if you are confused by anything in here"? May also want to link to TR docs too.
Module graphs as Racket values
I propose we encode the module graphs we have as Racket values with the graph library and then output TikZ from that, rather than the other way around. This might be useful for the boundary-based prediction automation, since the module graph information is needed there.
(also even better if we can build picts instead of tikz)
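A hedged sketch of the encode-then-emit idea (assuming the graph package's unweighted-graph/directed and get-edges; the module names are made up):

```racket
#lang racket
(require graph) ; raco pkg install graph

;; Hypothetical module graph: an edge (A B) means "A requires B".
(define modgraph
  (unweighted-graph/directed
   '((main moment) (moment clock) (clock tzinfo))))

;; Emit one TikZ \draw line per require edge.
(for ([e (get-edges modgraph)])
  (printf "\\draw (~a) -- (~a);\n" (first e) (second e)))
```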
Comments: Related Work
- should 2nd sentence say "JavaScript" rather than "TypeScript"?
- reticulated should be a new paragraph, and also note that their focus was expressiveness (whereas TR is already expressive)
- Allende has a section titled "macrobenchmarks" describing 2 projects, so I don't think we should use the term "microbenchmarks" to describe them. How about "small Gradualtalk benchmarks"?
- Allende notes a 36% speedup on one program, though they say it's "generally not so" that adding types speeds things up (but I wonder if there's a tipping point beyond which more types = no worse)
- maybe note that STS "lattice" performance should fall between 22x and 6% slowdown; their "untyped" is very different from ours
- can we cite the STOP paper for mypy? I feel bad about using a footnote
(If these should be in a different format -- inline in a pull request? -- I'd be happy to make that change)
Synth does not type check with Racket HEAD
Newer versions of Typed Racket seem to have changed the type of expt. Synth produces the following error message when run using Racket HEAD.
sequencer.rkt:26:14: Type Checker: type mismatch
expected: Nonnegative-Real
given: Number
in: (* 440 (expt (expt 2 1/12) (- note 57)))
context...:
/home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/typecheck/tc-toplevel.rkt:318:0: type-check
/home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/typecheck/tc-toplevel.rkt:562:0: tc-module
/home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/tc-setup.rkt:82:0: tc-module/full
/home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/typed-racket.rkt:24:4
standard-module-name-resolver
temp8
/home/spenser/local/racket-head/share/racket/pkgs/typed-racket-lib/typed-racket/tc-setup.rkt:82:0: tc-module/full
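Not sure what the right fix is, but one possible workaround (sketch; note->frequency is a made-up name for the failing expression in sequencer.rkt) is to cast the result back, at the cost of a runtime check:

```racket
#lang typed/racket

;; Newer TR appears to give this expt chain the type Number, so we
;; cast the result; cast compiles to a runtime check, not a no-op.
(: note->frequency (-> Integer Nonnegative-Real))
(define (note->frequency note)
  (cast (* 440 (expt (expt 2 1/12) (- note 57))) Nonnegative-Real))
```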
website, to view datasets
After jfp, add a gh-pages branch for viewing the datasets.
Not sure how to keep the site synced with master... maybe it's best to keep the website to a few commits, then always pull --rebase
wishlist: require/typed/check should desugar to an "only-in"
Currently, a call to require/typed/check in a typed module desugars to a plain require.
(require/typed/check "foo.rkt" [ ... ]) ===> (require "foo.rkt")
Instead, it would be nice to desugar to an only-in, picking out just the named imports.
(require/typed/check "foo.rkt" [a (-> Any Any)]) ===> (require (only-in "foo.rkt" a))
Otherwise there's a risk of name collisions from unused imports. These are hard to detect without running the particular typed/untyped configuration that triggers them.
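A minimal sketch of the desired expansion (sketch only: it ignores #:struct clauses and the untyped-importer case, both of which the real macro in benchmark-util also has to handle):

```racket
#lang racket

;; Sketch: pull the names out of the [name type] clauses and
;; require only those names, instead of the whole module.
(define-syntax require/typed/check
  (syntax-rules ()
    [(_ mod [name ty] ...)
     (require (only-in mod name ...))]))
```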
Examples:
- I had to rename a function time from Gregor to avoid colliding with the same function from racket/base. (A prefix-in could fix this, with extra work elsewhere.)
- After running half the 2**13 configs successfully, I saw an error on config 100000001000 (or something) because a module had an unused export that got picked up by require/typed/check.
Benchmark authorship
When the review is finished, clarify this in the paper.
Re-run on different inputs
Let's just verify that our results are consistent. I'll start queueing these on galicia.
- tetris on the larger history
- kcfa on the small test (Racket v6.2, Btop, 30 iters)
- morsecode on med. test (Racket v6.2, Btop, 30 iters)
- morsecode on large test (Racket v6.2, Btop, 30 iters)
- snake on the larger history
- suffixtree on small test (finished on Racket v6.2, galicia, 30 iters)
- funkytown on med. test (Racket v6.2, galicia, 30 iters)
- kcfa on the large example, for k=1 (see #48)
- gregor on larger test (finished on cluster)
- zo on larger test
- suffixtree on kcfa (large) test
- maybe quad on the large test (it's over a minute fully-typed)
L-M-N figures
Create some N graphs, some MN graphs, and some LMN graphs. ASAP
GC time increasing?
We currently throw away GC time results in the benchmarks. Consider re-running some to correlate GC changes to CPU time changes.
Colors
I think the yellow in sec. 4.2.1 is a bit too hard to read. Can we pick a different shade? (maybe also a darker green?)
Artifact: higher quality wallpaper
The current one is pretty low-res. I recommend getting an SVG from http://www.eecs.northwestern.edu/~robby/logos/ and either using that directly or rendering it to some high resolution.
postmortem
Add a comprehensive analysis of slowdowns. Do this before pursuing solutions.
- how many contract checks does each boundary trigger, in the worst case?
- what would the runtime be, supposing core data structures were never contract-protected?
- what does the contract profiler say about the worst variations in each benchmark?
- MORE
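For the contract-profiler question, a sketch of how one worst-case variation could be profiled (assumes the contract-profile package is installed; "main.rkt" here just stands for a configuration's entry module):

```racket
#lang racket
(require contract-profile) ; raco pkg install contract-profile

;; Run the configuration's main module under the contract profiler;
;; it prints a breakdown of time spent checking each contract.
(contract-profile (dynamic-require "main.rkt" #f))
```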
Data + Code freely available
Need to provide a URL for sharing data & code. Make a new branch for this?
Add N-deliv. and N/M-usa. to Fig 3/4
For N=3 and M=10.
Possibly replacing avg & max overhead, but I think there's space to fit.
Figure 1: LOC questions
- What did we use to generate line counts? Can I re-run using the sloccount utility? (sloccount ignores comments and whitespace.)
- What counts as "other" code? Should this number include library code like MBTA's graph and the zo-traversal's compiler/zo-lib?
Lighten `benchmark-util` dependency
benchmark-util should be removed from the benchmarks, or made easier to install than raco pkg install <repo>/tools/benchmark-util
Ideas:
- require benchmark-util via path instead of collection, to make bundling easier
- put the useful essentials on pkgs.racket-lang
- just add a proper makefile to the top of the gtp repo
Replace echo/sieve with l-nm
I converted the L-N/M plotting scripts to Typed Racket, and they perform well when gradually typed. Can we replace echo (or alternatively, sieve) with these results?
(I'd rather replace echo, because I still think sieve is an interesting contrived example. I'm not sure what to say about echo besides that it's great.)
Results
- untyped runtime: 203ms
- typed runtime: 141ms (0.69x overhead)
- avg. gradually typed: 173ms (0.85x overhead)
- max. gradually typed: 234ms (1.15x overhead)
I think these results will be even better on a larger input. Processing gregor takes the untyped variation 90,000ms, while the typed version takes about 45,000ms iirc.
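The numbers above check out against the 203ms untyped baseline (quick sanity check; "overhead" here means runtime relative to the untyped configuration):

```python
UNTYPED_MS = 203  # untyped runtime from the results above

def overhead(ms):
    """Runtime relative to the untyped configuration, 2 decimals."""
    return round(ms / UNTYPED_MS, 2)

print(overhead(141))  # typed -> 0.69
print(overhead(173))  # avg. gradually typed -> 0.85
print(overhead(234))  # max. gradually typed -> 1.15
```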
clarify: Benchmarks vs. Programs
Like reviewer C pointed out, we should be clear that our programs are real artifacts, but our benchmarks are sometimes synthetic.
Also be careful to call the programs "programs" and the tests we ran using them "benchmarks".
Re-run kcfa
The current kcfa file (kcfa-06-01.rktd) has one row with 28 entries, but all other rows have 50 entries. We should re-run the benchmark or use another data file.
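A quick way to spot the short row (sketch; this assumes the .rktd file is a plain S-expression of rows of numeric atoms, which may not exactly match the on-disk format):

```python
def row_lengths(sexp_text):
    """Count the atoms in each depth-2 parenthesized row of ((...) (...) ...)."""
    depth = 0
    lengths = []
    in_atom = False
    for ch in sexp_text:
        if ch == "(":
            depth += 1
            in_atom = False
            if depth == 2:          # start of a new row
                lengths.append(0)
        elif ch == ")":
            depth -= 1
            in_atom = False
        elif ch.isspace():
            in_atom = False
        else:
            if depth == 2 and not in_atom:  # first char of a new atom
                lengths[-1] += 1
            in_atom = True
    return lengths
```

For example, `row_lengths("((1 2 3) (4 5))")` returns `[3, 2]`, so the offending row's index falls out of a single scan over the file's contents.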
Add note: recovering imports/exports
I'd like to include a note somewhere (not sure where) saying that recovering the names of values imported/exported from (all-defined-out) or (require "file.rkt") was challenging. Even when typing just one module, you need to recover the API types for all its imports.
Clarify: many possible typings
Say that there are multiple possible typings for TR programs, and discuss the implications.
Check overhead in suffixtree section
As Jan pointed out, we should check the 35x and 12x claims about subsets of the performance lattice.
These numbers should be produced by function calls to scripts/summary.rkt
HTDP mixin needs a type
The render mixin for the htdp benchmark is broken; we could not assign a type. The current hack is unacceptable.
Original code:
(define renderer (compose render-multi-mixin render-mixin))
Using compose doesn't work, but even the acceptable change to:
(define (renderer x) (render-multi-mixin (render-mixin x)))
raises error messages like:
Type Checker: type mismatch
expected: Class with row variable `r1201'
given: Class with row variable `(Unknown Type: #(struct:Row 2069 #(struct:combined-frees #hasheq() ()) #(struct:combined-frees #hasheq() ()) #f #f () () () () #f))'
The current hack is:
(define (renderer x) x)
Don't duplicate READMEs in benchmarks
Could we set up the benchmarks so that they don't copy over README files? Maybe put the README files somewhere else?
(they take up megabytes of storage for no reason)
Morse code overhead?
Question from Wednesday: why does the typed version of morse code show 1.2x overhead?
Artifact: running mbta fails to produce a plot
The plot rendering step seems to fail with a with-output-to-file error. Something about lnm-cache-sample.rktd. Just try run.sh on mbta to see this.
"Configuration" abstraction
Right now, a configuration is a string like 0100. There's currently a small script for working with these, but it should be more abstract.
In particular, I'm annoyed that I've been doing:
untyped_config = "0" * num_modules
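A sketch of what the abstraction might look like (all names here are my own invention, not the existing script's API):

```python
class Configuration:
    """A typed/untyped configuration: one bit per module, '1' = typed."""

    def __init__(self, bits):
        assert set(bits) <= {"0", "1"}
        self.bits = bits

    @classmethod
    def untyped(cls, num_modules):
        return cls("0" * num_modules)

    @classmethod
    def typed(cls, num_modules):
        return cls("1" * num_modules)

    def typed_modules(self):
        """Indices of the modules that are typed in this configuration."""
        return [i for i, b in enumerate(self.bits) if b == "1"]

    def __str__(self):
        return self.bits
```

Then `Configuration.untyped(num_modules)` replaces the ad-hoc `"0" * num_modules` at call sites.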