
Introduction

Build Status GoDoc License: Apache-2.0 codecov FOSSA Status OCI Conformance OpenSSF Scorecard

The toolset to pack, ship, store, and deliver content.

This repository's main product is the open source registry implementation for storing and distributing container images and other content using the OCI Distribution Specification. The goal of this project is to provide a simple, secure, and scalable base for building a large-scale registry solution or running a simple private registry. It is a core library for many registry operators, including Docker Hub, GitHub Container Registry, GitLab Container Registry, and DigitalOcean Container Registry, as well as the CNCF Harbor Project and VMware Harbor Registry.

This repository contains the following components:

Component Description
registry An implementation of the OCI Distribution Specification.
libraries A rich set of libraries for interacting with distribution components. Please see godoc for details. Note: The interfaces for these libraries are unstable.
documentation Full documentation is available at https://distribution.github.io/distribution.

How does this integrate with Docker, containerd, and other OCI clients?

Clients implement against the OCI specification and communicate with the registry over HTTP. This project contains a client implementation that is currently in use by Docker; however, it is deprecated in favor of the implementation in containerd and will not gain new features.

What are the long term goals of the Distribution project?

The Distribution project has the further long term goal of providing a secure tool chain for distributing content. The specifications, APIs and tools should be as useful with Docker as they are without.

Our goal is to design a professional-grade and extensible content distribution system that allows users to:

  • Enjoy an efficient, secure, and reliable way to store, manage, package, and exchange content
  • Hack/roll their own on top of healthy open-source components
  • Implement their own homemade solutions on top of good specs and a solid extension mechanism

Contribution

Please see CONTRIBUTING.md for details on how to contribute issues, fixes, and patches to this project. If you are contributing code, see the instructions for building a development environment.

Communication

For async communication and long-running discussions, please use issues and pull requests on the GitHub repo. This is the best place to discuss design and implementation.

For sync communication we have a #distribution channel in the CNCF Slack that everyone is welcome to join and chat about development.

Licenses

The distribution codebase is released under the Apache 2.0 license. The README.md file, and files in the "docs" folder are licensed under the Creative Commons Attribution 4.0 International License. You may obtain a copy of the license, titled CC-BY-4.0, at http://creativecommons.org/licenses/by/4.0/.

People

Contributors

aaronlehmann ahmetb aibaars andreykostov brianbland caervs crazy-max crosbymichael deleteriouseffect denverdino dependabot[bot] dmcgowan jamstah jlhawn joaodrp lebauce mdlinville milosgajdos richardscothern runcom samalba scjane shin- stevvooe svendowideit thajeztah tiborvass tt vieux wy65701436


Issues

Allow for configurable path prefix

The registry should not assume it's running at the root of the server.

eg: both https://myregistry/v2/ and https://myregistry/something/v2/ should be possible.

The typical scenario when this is happening is a registry being run through a corporate portal that would "mount" the registry backend on a subpath.

This remark is valid as well for client code to be written.

See docker-archive/docker-registry#284 for historical reference. Supersedes docker-archive/docker-registry#818.

This can be accomplished by adding a field to the "http" configuration section:

HTTP struct {
    ...

    // Prefix specifies a prefix to mount the API urls at for the registry instance. For example,
    // instead of responding to paths at "/v2/...", the registry instance will respond to paths at 
    // "<prefix>/v2/...".
    Prefix string
}

Uploads should not spool to local disk

Right now, blob uploads are being spooled to local disk, requiring that uploads are always routed to the same host.

Considerations:

  1. A request that modifies an upload should be able to be served by any registry host in a cluster.
  2. Digest verification should avoid unnecessary round trips when reading data. This can be addressed by resumable digests or something similar.
    #10 should be fixed before fixing this problem.

Storage driver tests invalid offset boundary

The storage driver test suite's TestReadStreamWithOffset method makes a ReadStream call with offset=size and expects to get a reader of size=0 and EOF.

This is not accurate. If a file is 10 bytes, there is no offset=10 for that path. Practically, asking for offset=10 and offset=20 are the same thing for a 10-byte file: the requested data does not exist. Therefore ReadStream(path, 10) should return an InvalidOffsetError.

I can quickly fix this in the storagedriver implementation with an if clause, but that shouldn't be the fix, as there's a semantic mistake with the boundaries here. We've discussed and fixed a similar bug in the past: docker-archive/docker-registry#787

LMK if you agree and then I can send a PR. cc: @stevvooe @BrianBland
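The proposed boundary rule could be sketched like this in a driver (illustrative only; InvalidOffsetError here stands in for whatever error type the suite would define):

```go
package main

import "fmt"

// InvalidOffsetError mirrors the kind of error the storage driver test
// suite would expect for out-of-range offsets (the name is illustrative).
type InvalidOffsetError struct {
	Path   string
	Offset int64
}

func (e InvalidOffsetError) Error() string {
	return fmt.Sprintf("invalid offset %d for path %s", e.Offset, e.Path)
}

// checkReadOffset applies the boundary proposed above: for a file of
// `size` bytes, the last valid read offset is size-1; offset == size is
// already past the end and should be rejected, not answered with EOF.
func checkReadOffset(path string, offset, size int64) error {
	if offset < 0 || offset >= size {
		return InvalidOffsetError{Path: path, Offset: offset}
	}
	return nil
}

func main() {
	fmt.Println(checkReadOffset("/some/file", 10, 10)) // rejected: offset == size
	fmt.Println(checkReadOffset("/some/file", 9, 10))  // ok
}
```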

Define backend lifecycle behavior in registry webapp

Currently, the lifecycle of a storagedriver is very ad hoc. docker-archive/docker-registry#354 described some desired behavior, indicating an instance should be able to start even if the backend is down or inaccessible. We need to define what the behavior is and what failures are seen when the backend is down. This may require a health check endpoint.

This issue supersedes docker-archive/docker-registry#354 and should take inspiration from the discussion there.

Registry next generation

From @dmp42 on October 7, 2014 19:14

Dear community (on top of the head: @wking @bacongobbler @noxiouz @ncdc @proppy @vbatts and many others - also @shin- @samalba @jlhawn @dmcgowan ),

In a shell

Work is starting to design an entirely new "registry" - meaning new storage driver API, new image format, new http API, new architecture (eg: relation to other services), new docker engine code, and finally new technology for the service.

If you haven't seen it yet, there is a proposal for the new image format, allowing "signing", in moby/moby#8093, which triggered and fuels this desire for change.

Reading it will give you a hint on the envisioned new image format from an engine perspective.

Below, I'll try to cover all bases in a Q&A fashion. Please comment if you have more questions. If you have ideas and suggestions, you can open tickets with a title like "NG: Fantastic Idea".

Holy c! What will happen to the registry as we know it?

As a part of the docker infrastructure, the existing "V1" registry will continue to be used on production servers for the foreseeable future, delivering V1 images to V1-only docker engines (< 1.4) and "both-ways" engines (>=1.4,<2?). It might eventually be replaced by a V2 registry with a reimplementation of V1 endpoints, but that remains to be seen, since that would be a rather dull task.

As an open-source project, I'll continue to steward it and will merge interesting work and fixes from the community, and we will continue to provide security releases if need be, but it's unlikely major new features or changes will happen.

I feel that it has now reached its full "maturity" (for better or worse), and that the new extension mechanism we merged in 0.9 leaves enough room for everyone to keep doing interesting things with it while its core enters "maintenance" mode.

Thus registry 1.0 will be the last (and final, IMO) major release of "V1", that will likely be maintained (like I said) for at least a full year.

You said, "new technology"?

Yes. The new registry will be developed in Go instead of Python.

The reasons for that are:

  • reduce the "gap" inside the community and build on a common technology, using common libraries (libtrust and @dmcgowan, I'm looking at you)
  • thus easing integration tests with the rest of the platform, etc.
  • start with a clean slate
  • bet on a language that has a good concurrency model from the get-go - no pun - https://golang.org/doc/effective_go.html#concurrency
  • while Python is a robust, mature and well-established technology (stack), it really starts smelling funny in a number of places - some young blood / fresh air will do us all good :)

Starting from scratch sure has its downsides, and I can't say I'm happy ditching the accumulated experience with V1/Python (especially all the good work done on drivers), but in the end it's a reasoned choice, and I believe the benefits outweigh the downsides.

Why oh why change? ... the storage format

We want image signing capability. We believe we can't have it without an image format change (content addressable ids for a start).

Furthermore, the current storage format has terrible shortcomings:

  • it's hard to garbage collect
  • it has a long history of security issues
  • it's awkward to understand and use
  • it consumes space
  • it breaks too easily
  • it's not versioned, or extensible
  • it's impossible to map that format to a purely "static" delivery service
  • hence it's not possible to envision radically different distribution channels (bittorrent, filesystem, etc)

The new image format drastically simplifies the concepts:

  • an image is a json file, with a mandatory, namespaced name, a list of tarsums (eg: content-addressable layers ids), some opaque metadata, a signature
  • a layer is a binary blob, mapping to a tarsum

Exit "ancestry" (now implicit from the order of layers inside the image "manifest").
Exit "layers are images are layers".
Exit "layer json" etc.

Backward compatibility is a requirement, so, it's likely the V2 registry will be able to "generate" V1 content as well on the backend storage. Generating V2 content from V1 datastores should also be possible (might be provided by third-party scripts).

Why oh why change? ... the rest API

The current API ties to the format, and shares most of its defects (awkward, needlessly complex, not "static-able").

Also, I consider the authentication model and its relation to other bricks "broken" (given how difficult it is for most people to use and implement).

The new API will be much simpler, with only a couple endpoints.

GET/PUT image manifest

PUT link layer into image

PUT layer

GET layer from image

GET list tags

And the GET part will make it super-easy to deliver the payload through a simple "static" http server.

We hence expect cache mirroring (for example) to be much simpler.

As far as authentication is concerned, the plan should be standardizing on JWT/OAuth.

Why oh why change? ... the technology - I mean, man, that really sucks, python is so cool and I barely started understanding the codebase

Change is good, man.

New things, new adventures! Be a part of it!

Why oh why change? ... the drivers API

The drivers API was never really "designed".

There was an initial interface that grew organically and was then ripped out of the registry to provide a basis for third-party driver implementors.

It does bear the scars of its history (eg: it's butt-ugly for one thing).

The new interface will likely be way more concise and clean.

What I can think of for now is something like:

write_stream
read_stream
put
get
list
mv

Given Go's nature, we need to figure out the best way to make drivers standalone (eg: without the need to recompile the registry to use a new one).

Also, I definitely want push-resume support in there (S3 does support that, though we don't exploit it right now).

These are the two challenges that face us.

Any other cool ideas on the driver side of things, please jump in (thinking specifically about you @noxiouz and @bacongobbler).

New extensions model?

It took us a year to finally come up with a decent extension mechanism for the V1 registry (on top of signals).

I strongly believe that good extensibility is what will make the new registry cool, and I would love to have it, well thought, from the very first version of registry V2.

Again, given Go's nature, we can't have dynamically (runtime) loaded standalone extensions, so we need to figure something out there as well.

HTTP-based communication is fine by me (in a micro-services world), and it also elegantly solves scalability and delegation problems.

Here as well, ideas are welcome :).

Do you say the previous registry was just crap entirely?

No. It did serve its purpose well, parts of it are really cool, I enjoyed stewarding it a lot, and I really think the most awesome part of it is the nascent community around it.

Now, it's not ready for the future, which is why we need to move on.

Wait! You have it all figured out?

No, not yet.

The vision is there.

We know the shortcomings.

And we made all the mistakes.

But it remains to be designed and built, and I want this process to happen with the community, capitalizing on the good vibe we had these past months.

So, how does this work?

I'll start a V2 (or next-gen) branch soon, so that development happens in the open and PRs can be merged, and will bring in more manpower to contribute the "foundations" (research is going on for S3 and filesystem drivers).

The plan is to figure out ASAP:

  • the drivers interface and model, soon enough that driver authors can jump on it and dogfood it
  • the extension model
  • the HTTP API

so that we can move on to the actual implementation and let the community get crazy with extensions.

Also, if you have desires, wishes, or ideas, please submit a ticket here, starting with "NG: " in the title. I don't think we need this to be too formal to start with - so let's see how this goes.

If you want to be more involved than that, then you can definitely help with answering / triaging said tickets, or go ahead with fully-fleshed proposals and PRs (proposals can be PRs themselves I guess? do we need to be formal on that?).

Thanks again community, for it has been a very good journey so far, and I'm confident the next one will be even more awesome!

Copied from original issue: docker-archive/docker-registry#612

Registry metadata endpoint at root

In order to aid in client development supporting both v1 and v2 registries, specify a metadata endpoint which provides information related to the capabilities of the registry, which versions are supported, and what the mirror options are. The metadata endpoint would live at the root registry URL and return a JSON configuration value. For example an image with the name "registry.example.com/myimage" would have a metadata endpoint at https://registry.example.com/ and could have a v2 supported registry at https://registry.example.com/v2/.

The JSON value will contain a section for each version of the registry which is supported. The values for each version are defined by that version, since many of the keys do not apply to every version. If the metadata endpoint cannot be loaded, the client should assume a default configuration: v1 only, no mirrors, and the index at the endpoint + "/v1/".

JSON value

{
   "endpoints": {
      "v2": {
         "version": "2",                  // Version of registry
         "root": "https://endpoint/v2/",  // V2 root endpoint (default: /v2/)
         "push": true,                    // Whether push is supported or pull-only
         "fallback": "v1",                // Where to fall back on manifest 404 (default: no fallback)
         "mirrors": [                     // Pull mirrors
            {
               "root": "https://mirror/v2/", // Mirror endpoint
               "manifests": true             // Whether mirror contains manifests
            }
         ]
      },
      "v1": {
         "version": "1",                      // Version of registry
         "index": "http://indexendpoint/v1/", // V1 index endpoint (default: /v1/)
         "push": true,                        // Whether push is supported or pull-only
         "mirrors": [                         // Layer mirrors (index mirroring not supported)
            {
               "root": "https://mirror/v1/"
            }
         ]
      }
   }
}

edit 1: Updated JSON format to add top level key for endpoints, mirrors as array of objects, fallback a pointer instead of a boolean, and version number

Webhook notifications

We will quite soon need an asynchronous way to notify (remote) third-party code that "events" are happening (pull, push start, push end? pretty much).

One use case is to notify the docker hub and index of events from the registry. Other use cases are enumerated in docker-archive/docker-registry#689, which is superseded by this issue.

Proposal

To implement this feature, we will notify listeners via http messages when a given event happens. Reliable, at-least-once delivery will be attempted until a 2xx response is received.

Notification listeners will be configured in the registry configuration file. Authenticity for a listener can be controlled by TLS certificates if an https endpoint is defined. The configuration may support other sinks besides http/https.

A summary of events in the initial version are the following:

  • Manifest Pushed
  • Manifest Pulled
  • Manifest Deleted
  • Blob Pushed
  • Blob Pulled
  • Blob Deleted

The following sections cover some of the proposed events and an example of what that data might look like. The protocol is subject to change and we are unlikely to lock it down for the beta release.

Actor and Source

The event notifications should include a source and actor, created from the request and authorization context of the request. The actor should indicate which authorized user initiated the event action and any request context (request id, etc.). The source should identify the registry system and node from which the event originated.

Registry Notification Types

Manifests

Manifest Pushed

  • Name
  • Tag
  • Manifest Hash
  • Current Time
  • Account (If Known)
Example
{
    "name": "library/ubuntu",
    "tag": "14.04",
    "manifestHash": "sha256:2c39bdf1854ced6871b52452cbe8833edda34c7f299583567b400cecf2998837",
    "currentTime": "2015-01-13T00:15:28.469328",
    "url": "https://registry.docker.com/v2/library/ubuntu/manifests/14.04",
    "account": "tianon"
}

Manifest Pulled

  • Name
  • Tag
  • Manifest Hash
  • Current Time
  • Account (If Known)
Example
{
    "name": "jlhawn/crawler-webserver",
    "tag": "v3.1.4-a159",
    "manifestHash": "sha256:941e74dc593fc8976122c316647048399fa0b490b858769739924eea5ea14334",
    "currentTime": "2015-01-13T00:18:05.236706",
    "url": "https://registry.docker.com/v2/jlhawn/crawler-webserver/manifests/v3.1.4-a159",
    "account": "samalba"
}

Manifest Deleted

  • Name
  • Tag
  • Manifest Hash
  • Current Time
  • Account (If Known)
Example
{
    "name": "dmcgowan/fresh-baked-cookies",
    "tag": "chocolate-chip",
    "manifestHash": "sha256:4f98543ac6f792c1352e8c5a0555601ab3fd3497810581606c36c7416c5ffe37",
    "currentTime": "2015-01-13T00:18:54.781669",
    "url": "https://registry.docker.com/v2/dmcgowan/fresh-baked-cookies/manifests/chocolate-chip",
    "account": "dmcgowan"
}

Blobs

Blob Pushed (Upon conclusion)

  • Name
  • Blob Hash
  • Current Time
  • Account (If Known)
Example
{
    "name": "dmp42/portrait-landscape",
    "blobHash": "sha256:bb40a9ef6deed88cb1de6a50b6e2df6d301f52374805b98ca77b819b0f3ed3fa",
    "currentTime": "2015-01-13T00:21:58.110908",
    "url": "https://registry.docker.com/v2/dmp42/portrait-landscape/blobs/sha256%3Abb40a9ef6deed88cb1de6a50b6e2df6d301f52374805b98ca77b819b0f3ed3fa",
    "account": "goforgopher"
}

Blob Pulled (Upon conclusion)

  • Name
  • Blob Hash
  • Current Time
  • Account (If Known)
Example
{
    "name": "stevvooe/fresh-powder",
    "blobHash": "sha256:d712d5f15eda3abf569f3423714ad10d065d43421ad9efeeb9f6a1f6113ce0a8",
    "currentTime": "2015-01-13T00:23:15.083209",
    "url": "https://registry.docker.com/v2/stevvooe/fresh-powder/blobs/sha256%3Ad712d5f15eda3abf569f3423714ad10d065d43421ad9efeeb9f6a1f6113ce0a8",
    "account": "mikelipz"
}

Originally proposed by @jlhawn from https://gist.github.com/jlhawn/1c177252d343ae042e09.

V1 and V2 compose configuration

Create a fig configuration for running a v1 and v2 registry behind nginx.

  • Nginx dockerfile and configuration (or use volume)
  • V1 docker instance (registry:latest) and link
  • V2 docker instance (??) and link

Storage driver TestStatCall modtime granularity

Currently, the tests for the Stat() call require modtime to match the current time exactly. On cloud machines, that simply can't be the case, and clocks can be several seconds off.

    if start.After(fi.ModTime()) {
        c.Errorf("modtime %s before file created (%v)", fi.ModTime(), start)
    }

    if fi.ModTime().After(expectedModTime) {
        c.Errorf("modtime %s after file created (%v)", fi.ModTime(), expectedModTime)
    }

Even if the time spent on the wire gets longer, the test case will simply fail. I was testing this with the Azure driver and encountered a failure due to being one second off; here it is:

c.Errorf("modtime %s before file created (%v)", fi.ModTime(), start)
... Error: modtime 2015-01-16 20:10:02 +0000 UTC before file created (2015-01-16 12:10:03 -0800 PST)

This should probably allow some ε = ±5s around the modtime value, because the point is to verify that mtime works and is sane. In this case even ±60s would be OK. How about:

    // Widen the window in both directions: the lower bound moves earlier
    // by epsilon, the upper bound later by epsilon.
    if start.Add(-time.Second*epsilon).After(fi.ModTime()) {
        c.Errorf("modtime %s before file created (%v)", fi.ModTime(), start)
    }

    if fi.ModTime().After(expectedModTime.Add(time.Second*epsilon)) {
        c.Errorf("modtime %s after file created (%v)", fi.ModTime(), expectedModTime)
    }

lmk if you agree and I can send the patch. cc: @stevvooe

Update StorageDriver IPC subsystem for storage driver changes in docker/docker-registry#820

IPC subsystem builds have been disabled after PR docker-archive/docker-registry#820. There are three issues that led to this:

  1. The initial implementation of the new API methods caused the IPC system to hang on certain calls.
  2. docker-archive/docker-registry#814 needs to be fully worked out before the IPC implementation is updated.
  3. The Time type is not supported by libchan (docker/libchan#75 or docker/libchan#79).

Once these are resolved, we can reintegrate the IPC support.

This issue supersedes docker-archive/docker-registry#831

Document V1 and V2 setup

Add documentation for setting up a V1 and V2 registry behind a proxy such as nginx. In addition provide example configurations for nginx.

Update circle.yml to use godep for builds

The build system should use the vendored dependencies and commits should start including updates to dependencies. This will require some careful changes to circle.yml.

This needs to be done before we merge #91.

Support for layer federation

The registry should be able to serve some layers from a different server, if directed to do so. For example, an ISV builds on a Red Hat base image. The ISV layers are served from cdn.isv.com and the Red Hat layers are served from cdn.redhat.com.

Please see docker-archive/docker-registry#662 for some background information on this issue.

Pull

When pulling the layers associated with an image manifest, there should be enough information in the manifest to indicate the specific URL for each layer. For example:

$ docker pull isv/app
bef54b8f8a2f <- served from cdn.redhat.com
8da983e1fdd5 <- served from cdn.isv.com

Push

When pushing an image manifest, there should be some way to indicate different URLs for different layers. If you previously pulled a manifest with that information, it must be retained. It should also be possible to create a new manifest and alter the URL of certain layers. When pushing a layer whose URL differs from that of the registry, the registry should store the metadata about the layer, but the actual content of the layer would not be transmitted (it is assumed to already exist at the specified URL).

Use custom media types for request and responses

Custom media types will make it easier for us to iterate on request and response formats for API methods. It will also make migrating to new formats and maintaining backward compatibility much more straightforward.

Implementation plan:

  1. Experiment with media types in webhook implementation #42. For this, we'll try to use application/vnd.docker.distribution.events.v1+json for the event format.
  2. Set baseline use of application/json to refer to the current specification.
  3. Define custom media types for other formats (manifest, layers, etc.). Here are some possible examples:
    • Layer: application/vnd.docker.distribution.layer+tar
    • Manifest: application/vnd.docker.distribution.manifest.v1+json
    • Signed Manifest: application/vnd.docker.distribution.manifest.v1+prettyjws

Registry 500-out

Using samalba/docker-registry and pushing with 1.5RC, I can trigger a 500 consistently.

Unfortunately, there is very little information available in the registry logs save the access log itself:

192.168.100.68 - - [23/Jan/2015:18:03:42 +0000] "PUT /v2/dmp42/testthing/blobs/uploads/86925eee-5116-4101-9921-73df5e9b7de7?_state=OhwhFPtmp1PHSaoeklO8z2yiwS5-OiMYphRR7TG8DNh7Ik5hbWUiOiJkbXA0Mi90ZXN0dGhpbmciLCJVVUlEIjoiODY5MjVlZWUtNTExNi00MTAxLTk5MjEtNzNkZjVlOWI3ZGU3IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE1LTAxLTIzVDE4OjAzOjQxLjU3OTkzMjgzOFoifQ%3D%3D&digest=tarsum.v1%2Bsha256%3A974f176a8980209133a39750d7a3fd06f3addd572303e41819a38b089b89b154 HTTP/1.1" 500 84 "" "docker/1.5.0 go/go1.4.1 git-commit/7e803ba kernel/3.8.0-29-generic os/linux arch/amd64" 

Reorganize package layout of repo

The current repo layout is organized around being "the registry". This means that the main handlers and webapp functionality are in the root package. This organization does not seem compatible with the current plans. I propose we do the following:

  • #26 Carve out a distribution package in the repository root. Definitions for distribution related services and objects should make up this package as we build out this project.
  • #26 Move the web application components to a registry package.
  • #24 Break the common package down into constituent packages. These kinds of packages are never a good idea and end up being a ghetto. For example, this package includes regular expressions, tarsum parsing and a stringset, which are unrelated. No reason to group these just because they don't belong elsewhere.
  • #21 Create a package for dealing with manifests and their signatures. Currently, this is in storage. This needs to be a discrete package, allowing one to create, sign and verify manifests. I am of the opinion that this should be owned by distribution and should be part of the interface to docker core.

I'm sure this is incomplete, so suggestions are welcome.

Store manifest signatures separately from content

To allow for content-addressable manifests, we'll need to store signatures separately from the targeted content. If multiple signatures are available for a given manifest, they should be merged.

This issue is part of moving towards immutable manifest ids and there are other changes that need to take place to receive the benefit. Discussion is in docker-archive/docker-registry#804.

Immutable image manifest references

After discussion in moby/moby#9015 and docker-archive/docker-registry#804, it's clear that we need support for immutable v2 image references.

Here are the following conditions for support, from #804.

  1. For the initial version, the manifest id is controlled by the registry. The manifest id should be returned as part of the response to a manifest PUT, in addition to a Location header with the canonical URL for the manifest (ie /v2/<name>/manifests/<tag>/<digest>).
  2. The "digest" of the manifest is the sha256 of the "unsigned" portion of manifest, with sorted object keys. The id is only calculated by the registry. This is dependent on #25, allowing us to merge signatures from separate pushes of identical content.
  3. PUT operations on the manifest are no longer destructive. If the content is different, the "tag" is updated to point at the new content. All revisions remain addressable by digest. Conflicting signatures are stored separately.
  4. The DELETE method on /v2/<name>/manifests/<tag> should be clarified to delete all revisions of a given tag, whereas DELETE on /v2/<name>/manifests/<tag>/<digest> should only delete the revision with the requested digest.

The following are the tasks required to accomplish this:

  • API Specification must be updated with the following endpoints:
Method Path Entity Description
GET /v2/<name>/manifests/<tag>/<digest> Manifest Fetch the manifest identified by name, tag and digest.
DELETE /v2/<name>/manifests/<tag>/<digest> Manifest Delete the manifest identified by name, tag and digest
  • Add a header to /v2/<name>/manifests/<tag> that includes the manifest id.
  • Registry implementation must support new API methods that qualify reference with id and backend must store revisions.
  • Registry client PR (moby/moby#10740) needs to be updated with support for pulling qualified image references in the following format:
docker pull <image>:<tag>@<id>
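Condition 2 above (sha256 over the unsigned portion with sorted object keys) could be sketched like this; note that Go's encoding/json already emits map keys in sorted order, which stands in for the canonicalization step here:

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// manifestDigest computes a content-addressable id for the unsigned portion
// of a manifest. Marshaling a map with encoding/json emits keys in sorted
// order, serving as the "sorted object keys" canonical form in this sketch.
func manifestDigest(unsigned map[string]interface{}) (string, error) {
	canonical, err := json.Marshal(unsigned)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("sha256:%x", sha256.Sum256(canonical)), nil
}

func main() {
	m := map[string]interface{}{
		"name": "library/ubuntu",
		"tag":  "14.04",
	}
	d, _ := manifestDigest(m)
	fmt.Println(d)
}
```

The important property for condition 3 falls out of this: identical unsigned content always maps to the same digest, regardless of which push (or whose signature) produced it.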

Potential eventual consistency race condition on layer upload

When using an eventually-consistent storage backend (S3), it is possible to receive a 500 error when uploading a layer. The error that the registry returned to the http client was as follows, which happened after uploading the entire contents of the layer:

UNKNOWN: [Path not found: /docker/registry/v2/repositories/test/mysql/uploads/9114e61d-093e-4d3e-aedd-f64c8850db8b/data]

This indicates that this was a read-after-write consistency issue when trying to verify the spooled upload data before moving it to its final location on the backend storage. Even if we were to have pre-computed the checksum, this would fail to move for the same reason with this timing condition.

We should determine the proper semantics for handling this sort of case; it happens rarely on S3, but often enough for me to have reproduced it several times in the last few days.

Test that all regions work

All available AWS regions should be supported, and this should be tested against the test buckets that were made available earlier.

Proposal - Support content redirects for blob downloads

To reduce the amount of downstream bandwidth congestion for the registry, we should optionally support content-redirects to the underlying storage system for blob download requests. For instance, when using S3 for object storage, we can redirect the client to a temporary signed S3 download URL, removing the need to transfer data through the registry itself.

Because a download URL is strongly tied to where the file itself is stored, we need to add a method to the storagedriver.StorageDriver interface which can return a static URL path for a given blob path. I'm proposing we add this in the form of URLFor(path string) (string, error), which will return a static download URL expected to be valid for at least a short period of time. Because some storage systems (the local filesystem, as a basic example) do not support URL referencing, we must also add a new error type: UnsupportedMethodErr, which may also be used for future optional extensions.

This unfortunately gets a bit hairy when we consider alternative static URL providers (CDNs), such as CloudFront, which in particular generates signed URLs based on the path of the hosted content. For s3+cloudfront, the CloudFront URL provider would need to know the base path at which the files are stored in the S3 bucket, which is a configuration parameter of the S3 storagedriver itself.

One solution here is to add an optional LayerHandler subsystem, which can take a storage.Layer and serve its contents via a Resolve(layer storage.Layer) (http.Handler, error) method. In the case of s3+cloudfront, the CloudFront LayerHandler constructs the S3 object URL, translates it to a CloudFront URL, and then serves a 307 Temporary Redirect with a Location header referencing the new URL. An example translation would look like https://mybucket.s3.amazonaws.com/myobject?AWSAccessKeyId=ACCESSKEY&Expires=EXPIRATION&Signature=S3SIGNATURE -> https://mycloudfrontsubdomain.cloudfront.net/myobject?Expires=EXPIRATION&Signature=CFSIGNATURE&Key-Pair-Id=CFKEYPAIRID, in which the CloudFront URL generator utilizes only the path from the S3 URL (or any URL; this isn't specific to S3). The LayerHandler would be configured with its own optional name/parameters map, similar to the storagedriver and auth systems. When not provided, no alternate layer serving will be performed.

There is one addition to the configuration struct required for this change:

  • Add a field to the HTTP section for LayerHandler.
    • Valid options are delegate (use the underlying storagedriver with no translation) and cloudfront for now. If either of these is specified, the registry will use URL redirects for serving blobs.
      • For cloudfront, the following parameters must be supplied: base, keyid, and keysecret, just as in the python registry.
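Putting those options together, the configuration might look like the following (a sketch only; the key spelling and nesting are illustrative and subject to the final spec):

```yaml
# Hypothetical configuration fragment for the proposed LayerHandler.
http:
  layerhandler:
    name: cloudfront
    parameters:
      base: https://mycloudfrontsubdomain.cloudfront.net
      keyid: CFKEYPAIRID
      keysecret: /path/to/cloudfront-private-key.pem
```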

tl;dr Making the following additions:

// New storagedriver error type
// UnsupportedMethodErr may be returned in the case where a StorageDriver implementation does not support an optional method.
var UnsupportedMethodErr = errors.New("unsupported method")

type StorageDriver interface {
    // New method
    // URLFor returns a URL which may be used to retrieve the content stored at the given path.
    // May return an UnsupportedMethodErr in certain StorageDriver implementations.
    URLFor(path string) (string, error)
}

// New interface
type LayerHandler interface {
    // Resolve returns an http.Handler which can be used to serve the contents of the given storage.Layer.
    Resolve(layer storage.Layer) (http.Handler, error)
}

This should solve #37 and #38

Add standardized error handling for extraneous registry configuration options

Because many components of the registry have variable configuration options (auth, storage, layer serving, etc), we currently accept arbitrary parameters in the form of a map[string]interface{}. While this is useful for extensibility and future plugins, it's very easy to accidentally add extra meaningless configuration options or simply misspell optional parameters, which are then just ignored.

I'm formally proposing (from @stevvooe's suggestion in #67) that we have a standardized error behavior/message when unexpected options are provided for any of these components with optional/variable parameters, to reduce the chance of confusion that the current behavior may cause. This should probably cause the registry to fail to start.

Backend administration tools for registry

We should include a few administration tools for verifying registry backend data and overall health. How these will run and work is not important, but we need to do the following:

  1. Scan and read blob data, verifying that each blob actually contains the data its declared digest describes.
  2. Scan and traverse various data links, ensuring integrity amongst the links.
  3. Scan upload directories and delete uploads that are old or abandoned.
  4. An fs CLI tool that lets one list and explore the backend, as seen by the registry.

These might best run as cron jobs or as background registry jobs that run as part of the app. Some are data intensive (such as data verification), so those would need to be carefully coordinated.

Add contextual logging to registry app and storage

After standardizing on logrus, we haven't fully utilized its functionality to provide performance monitoring and debugging. I have a stashed commit that adds request information and performance metrics that can introduce the concept.

This issue will be complete when:

  • A UUID is generated for each individual request and injected into the logging context.
  • Logging is contextually integrated with the web application and storage backend.
  • Tools and libraries are present to make it easy to add logging to various parts of the application (i.e. integrate net/context).
  • Event-oriented runtime metrics are communicated via log messages.

This supersedes docker-archive/docker-registry#635.

Error on Check Registry Connection

I am checking the Registry V2 API.
cmd: curl -X GET -k http://myregistry:port/v2/
response: {}
container logs: "GET /v2/ HTTP/1.1" 200 2 "" "curl/7.35.0"
I'm not sure if this is correct.

However, an error response happens:
cmd: curl -X GET -k http://myregistry:port/v2/_ping
response: 404 page not found
container logs: "GET /v2/_ping HTTP/1.1" 404 19 "" "curl/7.35.0"

Am I using the wrong command, or am I misunderstanding the API spec?
Thanks a lot!

Support SIGV4 for S3

We need to support AWS authentication version 4 (Signature Version 4) in order to use the Frankfurt region (eu-central-1), and we should use it as the default everywhere either way.

It should be possible to simply disable SIGV4, but this should not be documented.

Support secure on S3

It should be possible to toggle on/off secure transport to the S3 backend driver (should be off by default if there is a huge perf impact).

Also needs documentation in the README about how to configure that.

Upload store must support multi-host registry instances

Currently, the implementation of uploads uses the local file system to store state. This was done to move the implementation along before setting too many details in stone and to validate the API.

This is terrible and must be fixed before releasing a serious implementation of the registry.

There are two options for implementing state storage:

  1. Store state in the Location header of upload-related requests. Something like <host>/v2/<name>/blobs/uploads/<uuid>?_state=<hmac + urlbase64 encoded:{UUID: <uuid>, Name: <name>, Offset: 0}> would be appropriate, given that these are opaque to the client. This is desirable in that we remove the need for shared state, allowing the registry to scale better.
  2. Store the upload state in redis. This will support more sophisticated coordination at the cost of scalability.

I'm in favor of 1, but we may implement both in the future. We should compare and contrast the benefits of each approach.
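Option 1 amounts to serializing the state, signing it, and encoding it into the Location URL so the token is opaque to the client but verifiable by any registry instance. A sketch under those assumptions (field and function names are illustrative):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// uploadState mirrors the fields sketched in option 1.
type uploadState struct {
	UUID   string `json:"uuid"`
	Name   string `json:"name"`
	Offset int64  `json:"offset"`
}

// packState serializes the state and prepends an HMAC, so the registry can
// hand the token to the client opaquely and later verify it was untouched.
func packState(secret []byte, s uploadState) (string, error) {
	payload, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return base64.URLEncoding.EncodeToString(append(mac.Sum(nil), payload...)), nil
}

// unpackState verifies the HMAC and recovers the state.
func unpackState(secret []byte, token string) (uploadState, error) {
	var s uploadState
	raw, err := base64.URLEncoding.DecodeString(token)
	if err != nil || len(raw) < sha256.Size {
		return s, fmt.Errorf("malformed state token")
	}
	sig, payload := raw[:sha256.Size], raw[sha256.Size:]
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	if !hmac.Equal(sig, mac.Sum(nil)) {
		return s, fmt.Errorf("state token signature mismatch")
	}
	return s, json.Unmarshal(payload, &s)
}

func main() {
	secret := []byte("registry-secret")
	tok, _ := packState(secret, uploadState{UUID: "9114e61d", Name: "test/mysql", Offset: 0})
	s, err := unpackState(secret, tok)
	fmt.Println(s.Name, s.UUID, err) // test/mysql 9114e61d <nil>
}
```

Because the token carries everything needed to resume the upload, any registry instance behind a load balancer can handle the next chunk without shared state.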

Radar

This is a catch-all issue to track other issues in different projects that are (possibly?) relevant:

Support for pull-through cached mirroring

The registry should be able to operate as a pull-through mirroring cache. This means that if a pull operation cannot proceed locally due to missing content, the registry should be able to defer the request to an upstream registry.

Please see docker-archive/docker-registry#658 for some background information on this issue.

Pull

When pulling a manifest, if the content is available locally, it will be served as is. Optionally, the local registry may check whether the remote content has been updated, using conditional HTTP requests, and update the local content. If it is not available locally, the request should be forwarded to the remote registry. If the remote request succeeds, the manifest should be stored locally and then served in response to the local request.

When pulling a blob, if the content is available locally, serve it as is. If not available locally, forward the request to the remote registry. If the data is directly available, the data should be forwarded to the client and stored locally, concurrently. If the remote issues a redirect, the local registry should download the data into the local cache and serve the data directly.
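The local-first fallthrough for both cases reduces to the same shape, sketched below with illustrative types (the real registry would stream and store concurrently rather than buffering whole blobs):

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found")

type store map[string][]byte

// fetchBlob serves from the local store when possible, otherwise pulls
// from the upstream, populating the local cache on the way through.
func fetchBlob(local store, remote func(string) ([]byte, error), digest string) ([]byte, error) {
	if data, ok := local[digest]; ok {
		return data, nil // local hit, no remote traffic
	}
	data, err := remote(digest)
	if err != nil {
		return nil, err
	}
	local[digest] = data // cache for the next pull
	return data, nil
}

func main() {
	local := store{}
	remote := func(d string) ([]byte, error) {
		if d == "sha256:abc" {
			return []byte("layer-bytes"), nil
		}
		return nil, errNotFound
	}
	data, _ := fetchBlob(local, remote, "sha256:abc")
	_, cached := local["sha256:abc"]
	fmt.Println(string(data), cached) // layer-bytes true
}
```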

Push

All push operations are only attempted on the "local" registry. If they fail, they will not be forwarded to the remote registry.

Open Questions
  1. How should authorization behave? Should credentials be forwarded, or should the proxy client obtain its own credentials?
  2. Should we allow one to configure an ACL for outgoing remote requests?

Metadata endpoint support in engine

Update the engine registry client to use the new metadata endpoint specification. This allows for simplifying the engine-side logic for configuring the registry endpoints.

See #80 for specification

Port Next-Generation Issues from docker/docker-registry

To ensure that we aren't missing any functionality, we need to port issues from docker/docker-registry. Unfortunately, this will be a manual process but we can track it with this ticket. For issues listed here, we need to do one of the following:

  1. Port the issue verbatim to this repo, linking back to the closed original.
  2. Write a proposal in docker/distribution, linking back to the original in docker/docker-registry.
  3. Close the original.

Here is the list of issues currently tagged as next-generation:
