grailbio / bigmachine
Bigmachine is a library for self-managing serverless computing in Go
License: Apache License 2.0
Please only consider this if you've received interest from others (and they support this). I'm mostly undertaking this as a pet project and don't wish to unduly burden you for my sake.
I've been away for 10 days and, returning to bigmachine, I'm having to rely on extensive debugging statements to try to grok the underlying mechanism of the solution. I still have only a loose grasp of it and am fumbling through.
The GCE implementation is able to create remote (containerized) nodes and, I believe, basic (HTTP non-TLS) RPC is working. I'm finding debugging challenging (particularly goroutines), largely because I don't have a good overall perspective.
Are there any state diagrams for bigmachine? I think there are 2-3 different diagrams that are of interest:
CoreOS Container Linux is at end-of-life. Flatcar is the obvious replacement.
TestMutualHTTPS appears to be flaky. It causes many builds to fail with one of the following two messages.
--- FAIL: TestMutualHTTPS (5.19s)
ec2machine_test.go:92: bad error Get https://localhost:45725/: dial tcp 127.0.0.1:45725: connect: connection refused
FAIL
FAIL github.com/grailbio/bigmachine/ec2system 5.871s
2019/11/24 04:23:17 http: TLS handshake error from 127.0.0.1:47642: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "bigmachine")
--- FAIL: TestMutualHTTPS (3.83s)
ec2machine_test.go:92: bad error Get https://localhost:45231/: write tcp 127.0.0.1:47642->127.0.0.1:45231: write: broken pipe
FAIL
FAIL github.com/grailbio/bigmachine/ec2system 4.615s
We should have a backend for GCP compute engine.
See package ec2system for inspiration.
The initial release v0.5 does not use SemVer and -- IIUC -- this is problematic for Go Modules:
go get github.com/grailbio/bigmachine@v0.5
go get github.com/grailbio/bigmachine@v0.5: no matching versions for query "v0.5"
Whereas getting it by its commit works:
go get github.com/grailbio/bigmachine@6ab40c5aa9bcc7030513289106a2fa20697ed731
go: finding github.com/grailbio/bigmachine 6ab40c5aa9bcc7030513289106a2fa20697ed731
and:
go list -f "{{ .Dir }}" -m github.com/grailbio/bigmachine
${GOPATH}/pkg/mod/github.com/grailbio/[email protected]
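For illustration, here is a minimal sketch of why a tag like v0.5 is not picked up as a release. Go Modules expects release tags of the form vMAJOR.MINOR.PATCH, so v0.5 is treated as a query for the newest v0.5.x tag rather than as a version in its own right. The helper name isReleaseTag is hypothetical, and the pattern deliberately ignores pre-release and build suffixes; it is not the go command's own parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// canonical matches only the plain vMAJOR.MINOR.PATCH form.
// Simplification (assumption): pre-release and build metadata
// suffixes such as "-rc.1" or "+meta" are not handled here.
var canonical = regexp.MustCompile(`^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$`)

// isReleaseTag reports whether tag has the three-part, v-prefixed
// form. Hypothetical helper for illustration only.
func isReleaseTag(tag string) bool {
	return canonical.MatchString(tag)
}

func main() {
	for _, tag := range []string{"v0.5", "v0.5.0", "0.5.0"} {
		fmt.Printf("%-7s release tag: %v\n", tag, isReleaseTag(tag))
	}
}
```

Under this check only v0.5.0 qualifies, which suggests that retagging the release as v0.5.0 would likely make it resolvable by version.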
We should have a backend for Azure VMs, see package ec2system for inspiration. Also: #1.
I'm attempting to use bigmachine with your circlePI example, but:
go run main.go
2019/10/04 10:07:44 waiting for machines to come online
2019/10/04 10:07:44 resetting http client https://localhost:46237/ while calling to Supervisor.Ping: temporary network error
2019/10/04 10:07:45 https://localhost:46237/ Supervisor.Ping: succeeded after 1 retries
2019/10/04 10:07:45 https://localhost:46237/: zip: not a valid zip file
2019/10/04 10:07:45 machine https://localhost:46237/ STOPPED
2019/10/04 10:07:45 zip: not a valid zip file
Will try digging into this myself but it's discouraging :-)
I was unable to find this example published in the repo. It would be useful as I could then determine more quickly whether this is my error.
Perhaps "Getting Started..."?
Currently Bigmachine machines are named by an opaque "Addr". These are https URLs, e.g., https://ec2-.../. Addrs, however, do not necessarily correspond uniquely to a logical machine. In systems like AWS EC2, DNS names may be recycled, and thus Addrs need not be stable. The solution to this can be fairly simple: just suffix the URL with some sort of GUID, e.g., https://ec2-.../unique-id. We should think about whether we want this unique id to have other properties as well.
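The suffixing idea could be sketched roughly as follows. This is only an illustration under stated assumptions: newID, withID, and splitID are hypothetical helpers, not part of the bigmachine API, the host name is a placeholder, and a random 128-bit hex string stands in for "some sort of GUID".

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/url"
	"strings"
)

// newID returns a random 128-bit identifier, hex-encoded.
// Assumption: any sufficiently unique token would do here.
func newID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return hex.EncodeToString(b[:]), nil
}

// withID appends id as the final path segment of addr.
// Hypothetical helper; not part of bigmachine.
func withID(addr, id string) (string, error) {
	u, err := url.Parse(addr)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimSuffix(u.Path, "/") + "/" + id
	return u.String(), nil
}

// splitID separates the trailing id from the base address.
func splitID(addr string) (base, id string, err error) {
	u, err := url.Parse(addr)
	if err != nil {
		return "", "", err
	}
	i := strings.LastIndex(u.Path, "/")
	if i < 0 || i == len(u.Path)-1 {
		return "", "", fmt.Errorf("address %q has no id suffix", addr)
	}
	id = u.Path[i+1:]
	u.Path = u.Path[:i+1]
	return u.String(), id, nil
}

func main() {
	id, err := newID()
	if err != nil {
		panic(err)
	}
	// Placeholder host; a recycled DNS name would receive a fresh id.
	addr, err := withID("https://ec2-host.example/", id)
	if err != nil {
		panic(err)
	}
	base, got, err := splitID(addr)
	if err != nil {
		panic(err)
	}
	fmt.Println(addr)
	fmt.Println(base, got == id)
}
```

With this shape, two machines that reuse the same DNS name still compare as different Addrs, while the base URL remains recoverable for dialing.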
This is a pointer to the original bigslice issue.
I think it would be interesting to have a Kubernetes backed implementation.
This would provide a more generic solution than per Cloud implementations and could facilitate cross-Cloud deployments too.
The signature of Start is:
(*System) Start(ctx context.Context, count int) ([]*bigmachine.Machine, error)
Whereas (its converse) Shutdown is:
(*System) Shutdown()
It feels as though it would be more consistent if Shutdown's signature included both context.Context and []*bigmachine.Machine, and also returned an error.
Even then, bigmachine.Machine's type does not include a unique ID for the machine (beyond an IP address, which is often not used as a key). Would it make sense to add one?
I'm not retaining the list of machines created by Start in the GCE implementation, so when asked to Shutdown I must first enumerate all the instances that (I think) have been created (I'm doing this by tag, and could potentially use IP) and then make a call to delete them.