play-with-go / preguide
preguide is a guide/manual specification and validation tool, principally used as part of https://play-with-go.dev
License: BSD 3-Clause "New" or "Revised" License
At the moment, `PREGUIDE_SKIP_CACHE` is simply compared against `true`. However, it would be more useful to be able to regenerate a specific guide whilst skipping the cache, whilst allowing other guides to skip the regeneration phase if there is a cache hit.
We can do this by making `PREGUIDE_SKIP_CACHE` take a regexp. A value of `.` skips the cache for all guides.
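A minimal sketch of how such a regexp could be applied per guide. The function name `skipCache` and the guide names are assumptions for illustration only; they are not taken from the preguide code base.

```go
package main

import (
	"fmt"
	"regexp"
)

// skipCache reports whether the cache should be skipped for the named guide.
// pattern is the value of PREGUIDE_SKIP_CACHE; an empty value skips nothing.
// (Hypothetical helper, not part of preguide.)
func skipCache(pattern, guide string) (bool, error) {
	if pattern == "" {
		return false, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, fmt.Errorf("invalid PREGUIDE_SKIP_CACHE %q: %v", pattern, err)
	}
	return re.MatchString(guide), nil
}

func main() {
	// Guide names here are made up for the example.
	for _, guide := range []string{"using-staticcheck", "go-generate"} {
		skip, err := skipCache("^using-", guide)
		fmt.Println(guide, skip, err)
	}
}
```

With this shape, a value of `.` matches every non-empty guide name, which preserves the "skip everything" behaviour.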
At present it is possible for:
We should therefore:
For now we have punted proper guide-related error handling. There are various categories of error that we need to handle and report accurately:
- `preguide`, e.g. a step fails
All should, where possible, report accurate line number information (in the case of CUE this is somewhat a function of CUE itself reporting accurate error information)
More precisely:
This will require:
`go115` and `go114`, we might only need/want a block of prose (with directives) for `go114`. Hence we would wrap that block with `<!-- if: go114 --> ... <!-- endif -->` (format/name of directive TBD)
`go generate`" guide. In the context of the `go115/en` instance of the former, we should link to the `go115/en` instance of the latter. That suggests some sort of coordination between the scenarios defined by certain guides, but then also the ability to template links based on the current scenario/language. This could be achieved by special variables `{{{ .PREGUIDE_LANGUAGE }}}` and `{{{ .PREGUIDE_SCENARIO }}}`
In some guides, certain files will be repeatedly "edited". The sequence looks something like this:
This requires an `#Upload(File)` step for each because we don't have a means of specifying edits on the remote session.
Nor would we necessarily want to support specification of edits because it might well leave files in a bad state if the user has also made changes.
However, it doesn't make sense to render the entire file contents to the user each time in the guide.
Instead we should support some means of specifying, via an `#Upload(File)` step, what part(s) of a file should be rendered.
Ideas include:
[[1,5], [10, 15]]
Ellipses would be used to indicate un-shown content before/after, e.g.
...
func main() {
	fmt.Println("main now looks like this")
}
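The line-range idea can be sketched as follows. This is only an illustration under assumptions: `renderRanges` is a hypothetical name, ranges are taken to be 1-based and inclusive, and they are assumed to be given in ascending order.

```go
package main

import (
	"fmt"
	"strings"
)

// renderRanges returns only the given 1-based, inclusive line ranges of src,
// inserting "..." wherever content is not shown. Ranges are assumed to be
// sorted in ascending order. (Hypothetical helper, not part of preguide.)
func renderRanges(src string, ranges [][2]int) string {
	lines := strings.Split(strings.TrimSuffix(src, "\n"), "\n")
	var out []string
	next := 1 // first line not yet shown or skipped over
	for _, r := range ranges {
		start, end := r[0], r[1]
		if start < next {
			start = next
		}
		if end > len(lines) {
			end = len(lines)
		}
		if start > end {
			continue
		}
		if start > next {
			out = append(out, "...")
		}
		out = append(out, lines[start-1:end]...)
		next = end + 1
	}
	if next <= len(lines) {
		out = append(out, "...")
	}
	return strings.Join(out, "\n") + "\n"
}

func main() {
	src := "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"hi\")\n}\n"
	fmt.Print(renderRanges(src, [][2]int{{5, 7}}))
}
```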
Until we have a general solution for play-with-go/play-with-go#45, enforce that target paths should be absolute.
We currently have a handful of command-oriented sanitisers hard-coded into the `preguide` project. For example, one sanitises the output from `go test` in order to normalise timings from:
$ go test ./cmd/preguide
ok github.com/play-with-go/preguide/cmd/preguide 14.870s
to:
$ go test ./cmd/preguide
ok github.com/play-with-go/preguide/cmd/preguide 0.042s
However:
Writing sanitisers in CUE feels wrong. Writing them in Go feels better. The question then is how to extend `preguide` in this "pluggable" way.
Ideally we would want to be able to specify, ultimately at the step level, what sanitisers to apply. Much like the existing approach, we would have a derive step that, for a given statement, determines which sanitisers to apply, from the superset specified for that step. If we assume for one second we will go the Go route, then that specification can take the form of a list of references to a type via its import path:
Steps: use_module: en: preguide.#Command & {
	Sanitisers: ["example.com/blah.GoGetFixer"]
	Source: """
		mkdir \(Defs.mod2)
		cd \(Defs.mod2)
		go mod init mod.com
		go get play-with-go.dev/userguides/{{.REPO1}}
		go run play-with-go.dev/userguides/{{.REPO1}}
		"""
}
Note that the list of sanitisers to apply will always be applied after the sanitising of variables. For example, if we had the following actual output:
$ go get play-with-go.dev/userguides/mod1abcde12345
go: downloading play-with-go.dev/userguides/mod1abcde12345 v0.0.0-20200901194510-cc2d21bd1e55
where `mod1abcde12345` is a prestep-generated repository name addressable via the variable `REPO1`, then the per-step sanitisers would be applied to:
$ go get play-with-go.dev/userguides/{{.REPO1}}
go: downloading play-with-go.dev/userguides/{{.REPO1}} v0.0.0-20200901194510-cc2d21bd1e55
In the example step, `example.com/blah.GoGetFixer` is a reference to a function with the following signature that is used to derive a possibly-nil sanitiser.
import "mvdan.cc/sh/v3/syntax"
func(stmt *syntax.Stmt) func(envVars []string, output string) string
`envVars` is the list of variable name templates that resulted from the sanitisation of variables. In the case of the example above that list would be `["{{.REPO1}}"]`, and we would want a function, say, `GoGetFixer` to return a sanitiser that normalises the versions to something like:
$ go get play-with-go.dev/userguides/{{.REPO1}}
go: downloading play-with-go.dev/userguides/{{.REPO1}} v0.0.0-20060102150405-abcde12345
(the date and time here being the `time` package format reference, the commit sha being a well-defined constant for commits).
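A rough sketch of what the returned sanitiser might look like, using only the standard library. The deriving function that takes a `*syntax.Stmt` is omitted here; `goGetSanitiser` and `pseudoVersion` are hypothetical names, and the regexp only covers the simplest `v0.0.0-<timestamp>-<commit>` pseudo-version form.

```go
package main

import (
	"fmt"
	"regexp"
)

// pseudoVersion matches only the simplest pseudo-version form:
// v0.0.0-<14 digit timestamp>-<12 hex digit commit>. Other forms are
// deliberately not handled in this sketch.
var pseudoVersion = regexp.MustCompile(`\bv0\.0\.0-\d{14}-[0-9a-f]{12}\b`)

// stableVersion is the normalised value used in the example above.
const stableVersion = "v0.0.0-20060102150405-abcde12345"

// goGetSanitiser has the shape func(envVars []string, output string) string.
// envVars is unused here because variable templates such as {{.REPO1}} have
// already been reinstated by the time per-step sanitisers run.
func goGetSanitiser(envVars []string, output string) string {
	return pseudoVersion.ReplaceAllString(output, stableVersion)
}

func main() {
	in := "go: downloading play-with-go.dev/userguides/{{.REPO1}} v0.0.0-20200901194510-cc2d21bd1e55\n"
	fmt.Print(goGetSanitiser([]string{"{{.REPO1}}"}, in))
}
```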
At this point we ask ourselves the question: do we just want a `func(envVars []string, output string) string` sanitiser, or do we optimise for the fact that many sanitisers will be line-based and also support `func(envVars []string, output []string) []string`?
A couple of options implementation-wise:
- `preguide` recompile and exec "itself" as it discovers it is missing sanitiser implementations
- `main` shim around a `preguide` implementation package, which includes a map from string sanitiser function references to the actual function
The second option is considerably simpler in implementation terms but slightly harder from a usage perspective (as the guide author writing a custom sanitiser); the first is simpler from a user perspective but considerably more complex from an implementation perspective.
For now we will continue to develop sanitisers in this repository, until such time as a pattern/winning solution clearly presents itself.
Related to play-with-go/play-with-go#6
`preguide` supports templating of prestep variables. That is to say, `{{.ENV}}`-style templates can appear in markdown and/or script. They get checked/replaced as part of executing `preguide`, specifically before the running of any script steps. The output of `preguide` is then normalised to reinstate these templates, producing a stable, deterministic output.
Jekyll/Hugo also support some form of templating. For Jekyll (via Liquid) this templating takes the form of objects (`{{...}}`) and tags (`{% ... %}`). For Hugo it's regular `text/template` and https://gohugo.io/content-management/shortcodes/ which variously take the form of `{{ ... }}`, `{{% ... %}}` and `{{< ... >}}`.
By default, a guide defines the `["{{", "}}"]` delimiter pair. Therefore, the normalised output of `preguide` escapes these normalised blocks in `{% raw %} ... {% endraw %}` blocks so that Jekyll does not interpret them.
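To make the normalise-then-escape step concrete, here is a small sketch. It assumes the default `{{`/`}}` delimiters and a simple value-to-name substitution; `normalise` is a hypothetical name and the real implementation may well work differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalise replaces each concrete prestep variable value in output with its
// template form, wrapped in a Liquid raw block so Jekyll leaves it alone.
// vars maps variable names to the values used for this run.
// (Hypothetical helper; a sketch, not preguide's actual logic.)
func normalise(output string, vars map[string]string) string {
	names := make([]string, 0, len(vars))
	for name := range vars {
		names = append(names, name)
	}
	sort.Strings(names) // fixed order keeps the result deterministic
	for _, name := range names {
		val := vars[name]
		if val == "" {
			continue // replacing the empty string would corrupt the output
		}
		tmpl := "{% raw %}{{." + name + "}}{% endraw %}"
		output = strings.ReplaceAll(output, val, tmpl)
	}
	return output
}

func main() {
	out := "$ go get play-with-go.dev/userguides/mod1abcde12345\n"
	fmt.Print(normalise(out, map[string]string{"REPO1": "mod1abcde12345"}))
}
```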
However, the story doesn't end there.
Arguably, the guide author should be somewhat aware of the fact that the output of `preguide` is consumed by Jekyll/Hugo. Not least because `preguide` itself needs to be aware of that in order to correctly escape the normalised templates.
Indeed, the guide author might want to leverage Jekyll/Hugo template expansion (the site URL for example). To do so they would need to set the guide delimiters to be something other than those used by Jekyll/Hugo, e.g. `["{{{", "}}}"]`. The use of the Jekyll/Hugo template would need to be limited to the markdown prose, unsurprisingly, because the script input would not be subject to template expansion pre-running because the `preguide` step happens earlier in the pipeline. Not only that, literals like `{{` might actually be a valid part of the script step and therefore cannot be escaped as input to `preguide`, else the step would not function as expected.
However this doesn't preclude values that look like Jekyll/Hugo templates being valid input/output from script steps. We therefore need to escape such values when generating the output from `preguide`, specifically the output from directives. However there are currently two problems (with using Jekyll):
- `{{ "{% raw %}" }}` because it can't distinguish that from the start of a raw block. This further means that the result of directives (input and output, post prestep variable normalisation) cannot contain `{%.*%}` blocks (it can't even contain `{%`) because we cannot escape them for display. Whilst likely rare, this is somewhat unfortunate (particularly in light of the second point)
- `text/template`-style parse+execute approach to consume the markdown prose in `preguide`. Hence we can't sanitise prose and directive block results separately; it all happens "in one go" once the directives have been replaced. As a consequence, our limitation on the `{%.*%}` blocks not appearing in directive output spills over to being a general limitation on the markdown prose too
This situation appears (although this issue exists precisely in order to verify this fact) to be better in the Hugo world because they follow a `text/template` approach to parsing input. If we shift to a `text/template` parse+execute approach, we can trivially escape the directive blocks separately from prose, and anything that looks like a Hugo template/shortcode can be escaped using a `{{< "{{<" >}}` approach. (This probably requires us to configure `preguide` in some way to avoid it being hardcoded with some description of what and how to escape the result of script blocks.)
Hence we can remove the restriction on `{%.*%}` not appearing in the result of a script block.
Shifting to the `text/template` parse+execute approach would allow us to normalise:
This really ensures that what the user will run is valid, because it ensures that `InformationOnly` steps are side-effect free.
The output from a full run vs the referenced-only run should be identical post sanitisation, not considering the non-referenced steps.
This would probably work by triggering two runs of the script simultaneously.
This code base really needs a serious tidy up. Creating this as an umbrella issue:
To catch any issue in CUE itself, but also any issues with how we are using CUE.
If a step contains something like:
false
but is then changed to:
! false
`cmd/preguide` does not detect this as a cache miss, because the same `bash` results.
Whilst in an ideal situation we might optimise by simply ensuring the exit code from the previous run is now as expected, it might be more practical to simply cause a cache miss and re-run for now.
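One way to force the cache miss is to fold the negation into whatever is hashed for the cache key. The sketch below assumes a simplified, hypothetical `stmt` type and a SHA-256 based key; how preguide actually computes its cache key is not shown in this issue.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// stmt is a hypothetical, simplified view of one statement in a step.
type stmt struct {
	Negated bool   // true for "! cmd"
	Cmd     string // the command text without any leading "!"
}

// cacheKey hashes both the command text and whether it is negated, so that
// changing "false" to "! false" yields a different key.
func cacheKey(stmts []stmt) string {
	h := sha256.New()
	for _, s := range stmts {
		fmt.Fprintf(h, "negated=%t\ncmd=%q\n", s.Negated, s.Cmd)
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	before := cacheKey([]stmt{{Negated: false, Cmd: "false"}})
	after := cacheKey([]stmt{{Negated: true, Cmd: "false"}})
	fmt.Println(before != after)
}
```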
Commands like `go get` do not have a deterministic order of output:
--- a/_posts/2020-11-09-using-staticcheck_go115_en.markdown
+++ b/_posts/2020-11-09-using-staticcheck_go115_en.markdown
@@ -36,7 +36,7 @@ You should already have completed:
This guide is running using:
<pre data-command-src="Z28gdmVyc2lvbgo="><code class="language-.term1">$ go version
-go version go1.15.8 linux/amd64
+go version go1.15.15 linux/amd64
</code></pre>
### Installing Staticcheck
@@ -50,8 +50,8 @@ Use `go get` to install Staticcheck:
go: downloading honnef.co/go/tools v0.0.1-2020.1.6
go: found honnef.co/go/tools/cmd/staticcheck in honnef.co/go/tools v0.0.1-2020.1.6
go: downloading golang.org/x/tools v0.0.0-20200410194907-79a7a3126eef
-go: downloading golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543
go: downloading github.com/BurntSushi/toml v0.3.1
+go: downloading golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543
</code></pre>
_Note: so that this guide remains reproducible we have spcified an explicit version, `v0.0.1-2020.1.6`.
We should add the ability for a step to declare that its output is semantically equivalent modulo line order.
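A minimal sketch of such a comparison: sort the lines of both outputs and compare the results. The name `equalModuloLineOrder` is an assumption, and how a step would opt in to this behaviour is left open.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortedLines splits s into lines and returns them in sorted order.
func sortedLines(s string) []string {
	lines := strings.Split(strings.TrimSuffix(s, "\n"), "\n")
	sort.Strings(lines)
	return lines
}

// equalModuloLineOrder reports whether a and b contain exactly the same
// lines (including duplicates), ignoring the order in which they appear.
func equalModuloLineOrder(a, b string) bool {
	x, y := sortedLines(a), sortedLines(b)
	if len(x) != len(y) {
		return false
	}
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

func main() {
	a := "go: downloading golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543\n" +
		"go: downloading github.com/BurntSushi/toml v0.3.1\n"
	b := "go: downloading github.com/BurntSushi/toml v0.3.1\n" +
		"go: downloading golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543\n"
	fmt.Println(equalModuloLineOrder(a, b))
}
```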
Having to run https://github.com/play-with-go/preguide/blob/3d53e5c0b0fabe19e6fc303cfdac5804d0eaed85/_scripts/dockerBuildSelf.sh as part of each `main` build feels fragile. And potentially unnecessary. From memory, this is required because we can't control how the user has built `preguide`, and hence can't know whether they have used `CGO_ENABLED=0` or not.
Perhaps there is a way in which we can avoid this, by failing `preguide` in a case where it was built with `CGO_ENABLED=1`? If so, we could potentially use a "simple" docker image like `busybox`?
`preguide` runs the actual steps for a guide within a container in order to isolate the host system from any potential nasties.
Our current setup for achieving this is a bit janky. So that the user of `preguide` does not have to have Go installed, we support running a prebuilt docker image that includes the equivalent version of `preguide` to that running:
preguide/cmd/preguide/gencmd.go
Lines 1610 to 1628 in 1120d15
But to handle the fact that sometimes we will find ourselves running a version of `preguide` where there is not a prebuilt docker image, we use the `PREGUIDE_DEVEL_IMAGE` environment variable to indicate that the running binary should be mounted and run in a busybox image as the entrypoint. This is pretty poor because the user might be running `preguide` on a mac, at which point this fails.
We can, however, do better.
- `preguide` (`preguide` instance can use `runtime/debug.BuildInfo` to determine this) that is being run is a non-pre-release semver version (in which case a prebuilt docker image will be available), we need to build self, i.e. Go is required
- `preguide` user cache directory, e.g. `filepath.Join(os.UserCacheDir(), "unity","selfbuilds")`, using `github.com/rogpeppe/go-internal/cache.Cache` for content addressing and trimming of the cache
- `preguide` is a directory replace, build into the target of the replace (we should `.gitignore` `/.bin` or similar)
This way we can always quickly build self for `GOOS=linux` `GOARCH=amd64` `CGO_ENABLED=0` and use the smallest busybox image.
Rather than write `0.042s` for every test time, we can instead tolerate some real values. But then use the `ComparisonOutput` sanitiser to replace all times with XXs. That way, test output is compared modulo the actual test times.
The same approach can be used for benchmarks, where it is even more critical that the model output contains some vaguely sensible timings.
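As a sketch of the comparison idea for `go test` output: real timings stay in the rendered output, and only the copy used for comparison has them masked. The regexp, the `XXs` placeholder format and the function names are assumptions for illustration, not the existing `ComparisonOutput` implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// testTime matches the trailing elapsed time on an "ok <package> <time>s"
// line of go test output.
var testTime = regexp.MustCompile(`(?m)^(ok\s+\S+\s+)\d+\.\d+s$`)

// comparisonOutput masks test timings so that outputs can be compared
// modulo the actual times.
func comparisonOutput(out string) string {
	return testTime.ReplaceAllString(out, "${1}XXs")
}

// equalModuloTimes reports whether two go test outputs differ only in timings.
func equalModuloTimes(a, b string) bool {
	return comparisonOutput(a) == comparisonOutput(b)
}

func main() {
	a := "ok github.com/play-with-go/preguide/cmd/preguide 14.870s\n"
	b := "ok github.com/play-with-go/preguide/cmd/preguide 0.042s\n"
	fmt.Print(comparisonOutput(a))
	fmt.Println(equalModuloTimes(a, b))
}
```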
Perhaps some bad error handling now that we've moved to concurrent guide processing?
Seen randomly locally immediately after upgrading to a later preguide version in PWG.
Because they influence the content of the generated out CUE.
When viewing a guide in the browser, the output is consumed via Jekyll to create an HTML page. That HTML interacts with the PWD SDK. Part of this interaction involves using the terminal name as a CSS selector. There will be, therefore, certain restrictions on what characters can/cannot appear in a terminal name. We should identify these and enforce them in the `github.com/play-with-go/preguide` CUE schema.
I have the following guide
{{{ step compile }}}
{{{ step change_something }}}
{{{ step compile }}}
Preguide complains that `compile` is superfluous because it has been referenced twice. With the way that we're defining steps, IIUC a step can only have a unique name and a specific order. Do we want to force writers to use a unique step name and make the reference in the `guide.cue` file?
If a code block to be uploaded contains something like `<nil>`, "surprising" things happen.
Include a test to verify this. As well as a test that verifies behaviour when using a diff renderer.
e.g. something like
Steps: painkiller_add_fever_advice: preguide.#Upload & {
	Target: "\(Defs.painkiller_dir)/\(Defs.painkiller_go)"
	Renderer: preguide.#RenderDiff & {Pre: -1}
	Source: #"""
		package main
		import "fmt"
		//go:generate \#(Defs.gorun) \#(Defs.stringer_pkg) -type=Pill
		type Pill int
		const (
			Placebo Pill = iota
			Ibuprofen
		)
		func main() {
			fmt.Printf("For headaches, take %v\n", Ibuprofen)
		}
		"""#
}
Which defines `Pre` in terms of the previous upload block for the same `Target`.
When a generated out package is written to disk, this includes a number of maps. Seemingly, going via https://beta.pkg.go.dev/cuelang.org/[email protected]/encoding/gocode/gocodec#Codec.Decode and https://beta.pkg.go.dev/cuelang.org/[email protected]/cue#Value.Syntax we do not end up with deterministic output.
This would seem to suggest that `gocodec` should take some sort of option to sort map values during decoding (because there is order in `cue.Value` values).
Until this is fixed we will see some extraneous diffs in generated out packages.
An umbrella issue for future work:
<!-- if: go115 -->...<!-- end -->
To ensure that it matches the schema of a prestep.
This should also be added to the PWG controller.
Need to explain to people what `preguide` actually does, particularly in the context of play-with-go.dev.
Might be worth initially simply linking to the play-with-go.dev contributing docs:
https://github.com/play-with-go/play-with-go/blob/main/CONTRIBUTING.md
Whether/how we do this remains to be seen.
However, one thing we should do, when skipping the cache, is update the prestep version information in the out guide. It appears this is also not getting updated.
When we create a file using a command (e.g. `go mod edit`) and subsequently make a change by hand, e.g. to add a comment, we have no way of using the Diff renderer.
It should be possible to render a diff in this scenario.
Sometimes, you want to showcase a step where some of the source statements could produce an error. The way to solve that today is through some hackery to assert the code in the step i.e:
Steps: gopher_update_fail: preguide.#Command & {
	Source: """
		cd \(Defs.gopher_dir)
		# We launch this in a subshell to assert the status code
		code=$(\(Defs.cmdgo.get) -u -v \(Defs.public_mod)@main; echo $?)
		# Assert that status code is correct or fail
		[ $code == 2 ] || false
		"""
}
Does it even make sense to provide a way to allow the user to specify that some statements are expected to output a specific code?
It should be possible to parameterise `preguide` with a set of style rules that can catch `vet`/`lint`-like errors in markdown or the CUE script.
e.g.
- `git` directories should be clean at the end of the guide
- `gorelease`, unless configured otherwise
Placeholder for decorator support.
We currently vary the output of a step in the declaration of a step itself:
preguide/cmd/preguide/testdata/renderers.txt
Lines 88 to 98 in 78bc677
This is wrong. Applying ellipsis, or rendering a specific range of lines, is a purely presentational thing.
We should therefore be doing this in "decorators" that modify the result of the `step` function. Something like:
{{ step "step0" | lineRange [[2,2]] }}
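A sketch of how such a decorator could be wired up with Go's `text/template`, where the piped value is passed as the final argument of the next function. Note the assumptions: `text/template` has no list literal syntax, so this version takes `lineRange 2 2` rather than `[[2,2]]`, it uses the default `{{`/`}}` delimiters, and `render`, `step` and `lineRange` are hypothetical stand-ins for whatever preguide would actually define.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// lineRange keeps the 1-based, inclusive lines start..end of s. The piped
// step output arrives as the final argument.
func lineRange(start, end int, s string) (string, error) {
	lines := strings.Split(strings.TrimSuffix(s, "\n"), "\n")
	if start < 1 || end > len(lines) || start > end {
		return "", fmt.Errorf("invalid line range [%d,%d] for %d lines", start, end, len(lines))
	}
	return strings.Join(lines[start-1:end], "\n") + "\n", nil
}

// render executes tmpl with a step function that looks up pre-computed step
// output by name, plus the lineRange decorator.
func render(tmpl string, steps map[string]string) (string, error) {
	funcs := template.FuncMap{
		"step": func(name string) (string, error) {
			out, ok := steps[name]
			if !ok {
				return "", fmt.Errorf("unknown step %q", name)
			}
			return out, nil
		},
		"lineRange": lineRange,
	}
	t, err := template.New("guide").Funcs(funcs).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	steps := map[string]string{"step0": "one\ntwo\nthree\n"}
	out, err := render(`{{ step "step0" | lineRange 2 2 }}`, steps)
	fmt.Print(out, err, "\n")
}
```

Keeping the decorator outside the step declaration means the step's recorded output is unchanged, and only its presentation varies per reference.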
When the new evaluator work stabilises, upgrade `preguide` and all reverse dependencies (`gitea` and `play-with-go`) to use the latest CUE version (currently using `v0.2.2`). We are currently relying on a number of workarounds, and in any case there will no longer be any active development on the old evaluator.
We need to be able to:
- `-run` regexp, `go test -run`-style flag, that can also take its value from an env var
- `-v` and environment variables that set `GOMODCACHE` and `GOCACHE` will massively speed up iterations of a guide's development
e.g. if a file to be uploaded includes backticks, we will currently try to expand those backticks when we come to run the upload step because we are not using a `'`-escaped heredoc.
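For illustration, an upload script that uses a quoted heredoc delimiter, so that the shell performs no expansion on the file contents. `uploadScript` and `shellQuote` are hypothetical helpers, and the use of `cat` to write the file is an assumption about how an upload might be performed.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes, escaping any embedded single quotes.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// uploadScript returns a shell snippet that writes contents to target using
// a heredoc with a quoted delimiter. Quoting the delimiter disables
// parameter expansion and command substitution, so backticks and $ in
// contents are written literally.
func uploadScript(target, contents string) string {
	delim := "EOD"
	// Conservatively pick a delimiter that does not occur in the contents.
	for strings.Contains(contents, delim) {
		delim += "X"
	}
	if !strings.HasSuffix(contents, "\n") {
		contents += "\n"
	}
	return fmt.Sprintf("cat <<'%s' > %s\n%s%s\n", delim, shellQuote(target), contents, delim)
}

func main() {
	fmt.Print(uploadScript("/home/gopher/README.md", "Run `go test` to check $HOME\n"))
}
```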
At a high level we want to:
1. `#TerminalName` to constrain `_#stepCommon.Terminal`
2. `Steps: [name=string]` that allows specialisation at language and/or scenario
For 1 we should be able to do:
...
#StepTypeUploadFile: #StepType & 4
#Guide: {
	#Step: (#Command | #CommandFile | #Upload | #UploadFile ) & {
		Name: string
		StepType: #StepType
		Terminal: string
	}
	#stepCommon: {
		Name: string
		StepType: #StepType
		Terminal: #TerminalName
	}
	....
	#TerminalNames: [ for k, _ in Terminals {k}]
	#TerminalName: *#TerminalNames[0] | or(#TerminalNames)
	// Terminals defines the required remote VMs for a given guide
	Terminals: [name=string]: #Terminal & {
		Name: name
	}
But given we chose to have package instances themselves "implement" `preguide.#Guide`, we need some way to be able to refer to the embedded definitions `#Command` etc (because they are effectively definitions on the instance of `preguide.#Guide`, given that they are defined in terms of regular fields of `preguide.#Guide`).
This approach by definition requires us to embed the `preguide.#Guide` definition in a script file. Which currently causes us to run into cuelang/cue#565.
For 2, we should be able to move to a definition of `Steps` that looks something like this:
#Guide: {
	#Step: {
		// ...
	}
	#Scenario: {
		Name: string
		Image: string
	}
	Scenarios: [not(_#Language)]: #Scenario & {
		Name: name
	}
	_#ScenarioName: or([ for name, _ in Scenarios {name}])
	Languages: [...#Language]
	_#Language: or(Languages)
	for scenario, _ in Scenarios for _, language in Languages {
		Steps: "\(scenario)": "\(language)": [string]: #Step
	}
	_#StepName: not(_#Language) & not(_#ScenarioName) & string
	// Now we know that _#Language, _#ScenarioName and _#StepName
	// are disjoint sets
	// Allow specification of steps at any level of granularity
	Steps: [name=_#StepName]: #Step & {
		Name: name
	}
	Steps: [_#Language]: [name=_#StepName]: #Step & {
		Name: name
	}
	Steps: [_#Language]: [_#ScenarioName]: [name=_#StepName]: #Step & {
		Name: name
	}
	Steps: [_#ScenarioName]: [name=_#StepName]: #Step & {
		Name: name
	}
	Steps: [_#ScenarioName]: [_#Language]: [name=_#StepName]: #Step & {
		Name: name
	}
	// Default finer granularities from coarser ones
	Steps: [_#Language]: [name=string]: *Steps[name] | #Step
	Steps: [_#ScenarioName]: [name=string]: *Steps[name] | #Step
	Steps: [lang=_#Language]: [_#ScenarioName]: [name=string]: *Steps[lang][name] | #Step
	Steps: [scenario=_#ScenarioName]: [_#Language]: [name=string]: *Steps[scenario][name] | #Step
	// Constrain that the finest granularity of step should be consistent
	// regardless of how it is specified
	Steps: [scenario=_#ScenarioName]: [lang=_#Language]: [name=string]: Steps[lang][scenario][name]
}
#Language: "en" | "fr" // etc
This change is contingent on change 1, however. Because we need to drop our `ok`-based check on terminal names:
#ok: true & and([ for s in Steps for l in s {list.Contains(#TerminalNames, l.Terminal)}])
And it also requires us to have some answer on how we constrain `_#ScenarioName` to not be a language code. Two options on that front:
// Option 1
Scenarios: [#Language]: _|_
// Option 2
#ScenarioName: not(or(#Language))
This is covered in cuelang/cue#571.
The second change is also contingent on:
With this second change we can effectively require that every step is well defined for every language in every scenario (via judicious use of defaults). That saves us ever having to implement that fallback logic in Go (although we could do if we don't get a solution to cuelang/cue#577). Under this proposal, languages are listed by the guide author; scenarios are declared along with the images they require. Therefore, we simply need to walk `Steps` to find all step names, at which point we have our universe to iterate.
One thing worth bearing in mind at this point is whether it is ever possible/required to only have a step defined for a specific scenario. If so, what does this mean?
In order to handle non-idempotent output (e.g. a pseudoversion for a published commit) we use a block like:
Steps: golist_greetings: preguide.#Command & {
	InformationOnly: true
	RandomReplace: "v0.0.0-\(_#StablePsuedoversionSuffix)"
	Source: """
		go list -m -f {{.Version}} \(Defs.greetings_mod)
		"""
}
This is less than ideal because each pseudoversion ends up with the same "random" value.
Instead, we could use the guide name as a seed for a pseudo random generator.
This would give reproducible random sequences, where each value would be different. Hence, the user would enjoy a slightly more realistic experience if they see a number of pseudoversions, for example.
(Those pseudoversions will still differ from the actual output, but that's more easily explained).
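A minimal sketch of the seeded approach, assuming the guide name is hashed with FNV-1a to seed `math/rand`. The type and function names, the fixed base date and the one-year window are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"time"
)

// pseudoversions yields a reproducible sequence of distinct-looking
// pseudoversions for a given guide. (Hypothetical type, not part of preguide.)
type pseudoversions struct{ r *rand.Rand }

// newPseudoversions seeds the generator from the guide name, so the same
// guide always produces the same sequence.
func newPseudoversions(guide string) *pseudoversions {
	h := fnv.New64a()
	h.Write([]byte(guide))
	return &pseudoversions{r: rand.New(rand.NewSource(int64(h.Sum64())))}
}

// next returns the next pseudoversion in the sequence.
func (p *pseudoversions) next() string {
	base := time.Date(2020, time.January, 1, 0, 0, 0, 0, time.UTC) // arbitrary fixed base
	ts := base.Add(time.Duration(p.r.Int63n(365*24*60*60)) * time.Second)
	commit := p.r.Int63() & 0xffffffffffff // 48 bits, i.e. 12 hex digits
	return fmt.Sprintf("v0.0.0-%s-%012x", ts.Format("20060102150405"), commit)
}

func main() {
	gen := newPseudoversions("using-staticcheck")
	fmt.Println(gen.next())
	fmt.Println(gen.next())
}
```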