modus-continens / modus Goto Github PK
View Code? Open in Web Editor NEWA language for building Docker/OCI container images
Home Page: https://modus-continens.com
License: GNU Affero General Public License v3.0
A language for building Docker/OCI container images
Home Page: https://modus-continens.com
License: GNU Affero General Public License v3.0
Currently, nullary predicates are denoted as identifiers without parentheses, e.g. a :- b, c
. @maowtm suggested to consider using the syntax with ()
, e.g. a() :- b(), c()
. The parentheses can also be optional. Here are my thoughts about this:
ubuntu
might be more intuitive than ubuntu()
, since it is just a reference to an image, and not a calla :- b(), c
is the same as a() :- b, c()
@barr , @thevirtuoso1973 , what are your thoughts about it?
Add a framework for executing integration/system tests.
Possible approaches:
!p
if we haven't proved p
.Relevant links:
As it turns out, it is actually possible to use arbitrary buildkit frontend in a Dockerfile, by adding a special comment at the beginning:
#syntax = image/of/frontend
This allows the user to simply DOCKER_BUILDKIT=1 docker build -f
the source file, even though it is not in Dockerfile syntax. And it requires no additional installation of any buildkit tools. If we create a buildkit frontend which takes in Modusfile (and, by some means, the query), user can simply docker build
our Modusfile without even needing to download the Modus program themselves...
Wonder if this is a thing worth exploring... Also, actually this is currently the only way I know of to tell Docker to use a custom buildkit frontend (instead of running an additional buildkit service).
Main thing(s) left to do:
;
for logical or.There are queries/programs one could construct that should work1, but would exceed some fixed maximum depth.
There may be a way to dynamically determine the maximum depth, given a program.
An ideal value for the maximum depth would be the number of possible facts that can be inferred from a program.
This is because paths in an SLD tree correspond to a proof, where every intermediate node proves a (sub)goal that's required to prove the desired goal. Therefore the depth of a path cannot exceed the number of possible facts that can be inferred from a program.
Of course in Modus we have ungrounded facts so there are infinite possible facts that can be inferred from programs, so we will consider queries instead.
So I think we should use an expression like predicateCount * numConstants^maxArity
to approximate the number of facts that can be inferred from a query.
Alternative approach that I was previously considering is below:
Some kind of bottom-up evaluation which levies the maximum chain of expansion may work, e.g for:
foo.
bar :- foo.
something like: (foo, 1)
then (bar, 2)
.
Although one issue with the bottom-up approach is infinite number of producible facts (due to string_concat
), e.g.:
f(X) :- f(Y), string_concat(Y, "1", X).
f("")
And also we would have to address that we have ungrounded facts.
Of course, there are also programs which are fully recursive foo :- foo.
and would always be infinite. ↩
This issue tracks the progress of porting OpenJDK images to Modus, s.t. they no longer use Dockerfiles.
https://github.com/docker-library/openjdk
This is an image provided by Docker as a "Docker Official Image" so it may be interesting to note which parts of their image creation pipeline is a general step vs. specific to the openjdk
image.
Some early ideas of features to Modus that might be added to assist this:
buildx
's multi platform building?We might find that #69 is related.
Also, see https://github.com/modus-continens/openjdk-images.
Things that would be directly applicable in our port:
use(url, image, path).
url(str, protocol, host, port, path).
image(str, name, tag).
path(str, directory, filename).
or having builtin be prefixed with something like $path
or @path
or having uppercase names for builtins etc
or having proper namespaces, such as std/run(...)
Currently, something like this will fail to parse:
a :- from("alpine").
b :- a, run("echo aaa > /tmp/a").
Tried to change the parser myself but I was just not familiar enough with how nom works... @thevirtuoso1973 please help
Not something urgent now, but it looks like we are cloning Literal etc a lot. Plus when we add source info into literal (?) it will be even more information needed to clone.
Also now that ClauseID need to store a literal for built-in rules it also has the same problem.
This fails:
python(python_version, "ubuntu", distr_version) :-
# from(f"ubuntu:${distr_version}"),
string_concat("ubuntu:", distr_version, X),
from(X),
# run(f"apt-get install -y python${python_version}").
string_concat("apt-get install -y python", python_version, Y),
run(Y).
Not sure what's the best solution. Maybe we should just filter out these things before parsing?
error: unknown predicate - _operator_merge_begin ┌─ merge-test.Modusfile:6:6 │ 6 │ )::merge. │ ^^^^^
Should ideally just say "unknown operator merge"
The current debian image is too big for good UX (120 MB fetched from scratch).
Looks like it can't handle spaces in certain positions...
@thevirtuoso1973
When we run all the test suites there's a chance the tests marked #[serial]
will still be run in parallel. This doesn't appear to happen if we run an individual file's tests.
Could be a bug with the library we use.
--progress=plain
etc.-f Modusfile
I discussed this with @mechtaev in chat before we decided to put everything on GitHub so it's more visible. Here is a recall of the basic idea...
Previously it was decided that operators would be implemented as "start/end" internal predicates, so
foo :- (a, b)::merge
will become
foo :- merge_begin(...), a, b, merge_end(...)
when translating to the IR, where merge_begin
and merge_end
would actually not be accessible to the user.
However when implementing it I thought that there is really no reason why we have to forbid any manual usage of these "intrinsics", since they do not break any datalog logic. We could have a defined way to name and use them, for example _predicate_merge_begin
(...end
) which both takes an "ID" argument to indicate paring, and any ::merge
would be translated into those. They will be "intrinsics" in the sense that "run" is an intrinsic - the SLD solver just treats them as always true (as long as it is fully grounded), and they are really only meant to be interpreted by any image generation code.
However, by allowing user access to this "special" namespace, we could allow them to define their own operators. Consider the following Modusfile:
build_project :-
(cc("a.cpp"), cc("b.cpp"))::with_std("c++14"),
(cc("c.cpp"), cc("d.cpp"))::with_std("c++17").
_operator_with_std_begin(id, std) :- _operator_env_begin(id, "CFLAGS", f"-std=${std}").
_operator_with_std_end(id, std) :- _operator_env_end(id, "CFLAGS", f"-std=${std}").
Note that the with_std
operator is "defined" in the last two line.
I do admit that this just feels too much like a hack, so maybe it's better to hide this _predicate
namespace after all and, if needed, have some other way to define operators.
We previously discussed the possibility for having complex data types like arrays, but I remember we decided to go with all strings for now. Since the entrypoint property is a string array, how should set_entrypoint
be implemented? Currently it just take one argument and make the entrypoint an array of length 1. (I think this covers most of the use cases, since you can just specify an entry shell script anyway.)
Maybe just tag the image with the crate version, and use the correct tag when generating Dockerfiles
After some thinking I think we can go with the following. The consequence is that as long as the instructions inside ::merge
only consists of run
, copy
and ::copy
everything can be squished into one layer. Given this, do we want to simply forbid having anything other than run
and (::
)copy
inside ::merge
?
exec
layer.copy
or ::copy
inside ::merge
, mount the source image to a temporary directory like /modus_merge_tmp.xnMpVW0Oo6
.run
s and copy
s together in order, and generate a script like the following:
sh -c 'run-command-1' && \
...
cp -R '/modus_merge_tmp.xnMpVW0Oo6/copy/src/path' '/copy/dst/path' && \
...
@mechtaev any thoughts?
There seems to be an issue with using escape chars in f-strings.
Currently modus transpile
doesn't work fully correctly because it turns out COPY --from=src_image . .
in Dockerfile does not respect the WORKDIR property of src_image
, and will instead resolve any relative path relative to /
.
Example:
FROM alpine AS s1
WORKDIR /tmp
RUN echo hello > hello
FROM alpine
COPY --from=s1 hello hello
RUN cat hello
Step 4/6 : FROM alpine
---> c059bfaa849c
Step 5/6 : COPY --from=s1 hello hello
COPY failed: stat hello: file does not exist
e.g. a :: b
This causes a panic, but it really shouldn't:
a :- string_concat("", A, A), b(A).
b("hello").
Adding a dummy rule will currently avoid this panic, and give valid proof:
a :- string_concat("", A, A), b(A).
b("hello").
string_concat(A, B, C) :- string_concat(A, B, C).
thread 'main' panicked at 'can parse outside format expansion: Error(VerboseError { errors: [(LocatedSpan { offset: 0, line: 1, fragment: "${target_folder}/buildkit-frontend", extra: () }, Nom(Escaped))] })', src/translate.rs:57:14
This is probably because escaped
returns an error if nothing can be consumed.
Let's discuss this in our next meeting...
Failed to resolve input a("aabbb")
a("aabbb") requires string_concat("a", X1, "aabbb") + string_concat(X1, "b", X2) + a(X2)
taking X1 = "abbb", X2 = "abb" // maybe don't report expansion of empty rule explicitly, to make error message shorter?
a("abb") requires string_concat("a", X3, "abb") + string_concat(X3, "b", X4) + a(X4)
taking X3 = "bb", X4 = "b"
a("b") requires string_concat("a", X5, "b")
which is not satisfiable. // this line may also be: which requires X5 to be known, in the case of invalid ungrounded variables (not the case here)
ResolutionError { history: GoalWithHistory, reason: ResolutionErrorReason (e.g. Unsat(Literal), RequireGrounded(Vec<Variable>)) }
thiserror
which makes wrapping of runtime errors easy.Tag
.nom
to deal with this upstream.Consider:
foo(X) :- bar(X) ; baz.
This should be allowed, just enforce that X cannot be ungrounded.
Currently it will throw an error since there are two different signatures for foo.
I had a discussion with @barr, and he suggested to use operators that define conditions of cache invalidation. We were thinking of something like that:
::no_cache
always invalidate the cache::invalidate_cache_if_before(CMD)
invalidate cache if CMD, executed before the layer is applied, returns non-exit code.::invalidate_cache_if_after(CMD)
invalidate cache if CMD, executed after the layer is applied, returns non-exit code.::invalidate_cache_if_changed(PATH)
invalidate cache if the file/directory changes after the layer is applied.This is just a rough sketch of the API.
Originally posted by @mechtaev in #57 (comment)
Or maybe something like
::invalidate_cache_if_different(CMD_BEFORE, CMD_AFTER)
Originally posted by @mechtaev in #57 (comment)
::no_cache
may not be possible)Currently there is a few implicit assumption in the code that we're building Linux containers with Modus itself also running on Linux. These includes
path.starts_with("/")
I don't know much about Windows containers - we might not even need to support them if they are quite rare and requires a lot of extra effort, but if we want Modus itself to be runnable on Windows then I will have to address the second point.
path(str, parent, filename)
)X = "foo"
to that predicate.Caching is an important part of Docker and any build system. Docker currently does not provide sufficient tools to control caching. Here are some things we need to think about:
When we execute something like run(f"git clone ${URL}")
, the layer will be reused as long as ${URL}
does not change, even if the git repository gets updated. In principle, we can add something like run(f"git clone ${URL}")::no_cache
, but then the cache will be invalidated every time, even if the repository is not updated. BuildKit provides an ad-hoc support for git, but this will not work for other VCS and any other scenarios. So, it would be nice to have a more fine-grained control over caching of layers.
In Docker, the registry of images and the build cache are separate things. Thus, it is unclear what our "minimal build tree" means w.r.t. caching, i.e. do we compute the minimal tree w.r.t the images in the database, or w.r.t. to the cached data?
Ideally, we should pick a principled and intuitive approach that is also not too far from what Docker and BuildKit do, so that it was realistic to implement and maintain.
@barr , @maowtm , @thevirtuoso1973 , any thoughts?
I've been playing around with this for a bit since we basically have all the core datalog logic stuff ready, and I made this thing:
output(S) :- run(S).
entry(S) :- anbncn(S), output("Accept!").
anbncn("") :- .
anbncn(S) :- abc_split(S, A, B, C), streq(A, B), streq(B, C).
streq(A, B) :- string_concat("", A, B).
charne("a", "b").
charne("a", "c").
charne("b", "c").
charne(A, B) :- charne(B, A).
abc_split(S, A, B, C) :-
count_chars(S, "a", A, BC),
count_chars(BC, "b", B, CC),
count_chars(CC, "c", C, "").
# normal route
count_chars(S, C, O, R) :-
string_concat(C, RR, S),
count_chars(RR, C, O2, R),
string_concat("1", O2, O).
# other char
count_chars(S, C, O, R) :-
charne(C, C1),
string_concat(C1, RR, S),
streq(O, ""),
streq(R, S).
# empty case
count_chars("", C, O, R) :-
streq(C, C),
streq(O, ""),
streq(R, "").
The same strategy can be used to turn any turing machine into a modus program that takes entry(S)
for some string S
, and output a docker image iff TM accept (just use the string as a tape, have something like TM(q, tape_left, tape_current, tape_right)
). This means that the claim that this language is not Turing complete in the doc repo is actually false.
It looks like there has to be a parenthesis i.e. (from(...))::set_...
. Was this deliberate/can this be relaxed?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.