
Rudolfs


A high-performance, caching Git LFS server with an AWS S3 back-end.

Features

  • Multiple backends:

    1. AWS S3 backend with an optional local disk cache.
    2. Local disk backend.
  • A configurable local disk cache to speed up downloads (and reduce your S3 bill).

  • Corruption-resilient local disk cache. Even if the disk is getting blasted by cosmic rays, it'll find corrupted LFS objects and purge them from the cache transparently. The client should never notice this happening.

  • Encryption of LFS objects in both the cache and in permanent storage.

  • Separation of GitHub organizations and projects. Just specify the org and project names in the URL and they are automatically created. If two projects share many LFS objects, have them use the same URL to save on storage space.

  • A tiny (<10MB) Docker image (jasonwhite0/rudolfs).

The back-end storage code is very modular and composable. PRs for implementing other storage back-ends are welcome. If you begin working on this, please let us know by submitting an issue.

Non-Features

  • There is no client authentication. This is meant to be run in an internal network with clients you trust, not on the internet with malicious actors.

Running It

Generate an encryption key (optional)

If configured, all LFS objects are encrypted with the xchacha20 symmetric stream cipher. You must generate a 32-byte encryption key before starting the server.

Generating a random key is easy:

openssl rand -hex 32

Keep this secret and save it in a password manager so you don't lose it. We will pass this to the server below via the --key option. If the --key option is not specified, then the LFS objects are not encrypted.

Note:

  • If the key ever changes (or if encryption is disabled), all existing LFS objects will become garbage. When the Git LFS client attempts to download them, the SHA256 verification step will fail.
  • Likewise, if encryption is later enabled after it has been disabled, all existing unencrypted LFS objects will be seen as garbage.
  • LFS objects in both the cache and in permanent storage are encrypted. However, objects are decrypted before being sent to the LFS client, so take any necessary precautions to keep your intellectual property safe.

Development

For testing during development, it is easiest to run it with Cargo. Create a file called test.sh (this path is already ignored by .gitignore):

# Your AWS credentials.
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export AWS_DEFAULT_REGION=us-west-1

# Change this to the output of `openssl rand -hex 32`.
KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

cargo run -- \
  --cache-dir cache \
  --host localhost:8080 \
  --max-cache-size 10GiB \
  --key $KEY \
  s3 \
  --bucket foobar

If you just need to use the local disk as the backend, use the following script instead:

# Change this to the output of `openssl rand -hex 32`.
export RUDOLFS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

cargo run -- --port 8080 local --path=/data

Note: Always use a different S3 bucket, cache directory, and encryption key than what you use in your production environment.

Warning: With --host localhost:8080, the server may not be accessible from other machines, because localhost usually resolves to 127.0.0.1 or [::1] and the server then binds only to that loopback interface. To make the server reachable from the outside world, specify --host 0.0.0.0:8080 or just --port 8080 (the default bind address is 0.0.0.0). Binding to 0.0.0.0 makes the server listen on all available network interfaces, both internal and external. See #38 (comment) for more information.

Production

To run in a production environment, it is easiest to use docker-compose:

  1. Create a .env file next to docker-compose.yml with the configuration variables:

    AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXX
    AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    AWS_DEFAULT_REGION=us-west-1
    LFS_ENCRYPTION_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    LFS_S3_BUCKET=my-bucket
    LFS_MAX_CACHE_SIZE=10GB
    
  2. Use the provided docker-compose.yml file to run a production environment:

    docker-compose up -d
    
    # use minio yml
    docker-compose -f ./docker-compose.minio.yml up -d
    
    # use local disk yml
    docker-compose -f ./docker-compose.local.yml up -d
  3. [Optional]: It is best to use nginx as a reverse proxy for this server. Use it to enable TLS. How to configure this is better covered by other tutorials on the internet.

Note:

  • A bigger cache is (almost) always better. Try to use ~85% of the available disk space.
  • The cache data is stored in a Docker volume named rudolfs_data. If you want to delete it, run docker volume rm rudolfs_data.

AWS Credentials

AWS credentials must be provided to the server so that it can make requests to the S3 bucket specified on the command line (with --bucket).

Your AWS credentials will be searched for in the following order:

  1. Environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  2. AWS credentials file. Usually located at ~/.aws/credentials.
  3. IAM instance profile. Will only work if running on an EC2 instance with an instance profile/role.

The AWS region is read from the AWS_DEFAULT_REGION or AWS_REGION environment variable. If it is malformed, it will fall back to us-east-1. If it is not present it will fall back on the value associated with the current profile in ~/.aws/config or the file specified by the AWS_CONFIG_FILE environment variable. If that is malformed or absent it will fall back to us-east-1.
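
For reference, a minimal ~/.aws/credentials file in the standard AWS CLI format (the default profile is assumed here) looks like this:

[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

and the region can live in ~/.aws/config:

[default]
region = us-west-1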

Client Configuration

Add a file named .lfsconfig to the root of your Git repository and commit it so everyone is using the same LFS server:

[lfs]
url = "http://gitlfs.example.com:8080/api/my-org/my-project"
              ─────────┬──────── ──┬─ ─┬─ ───┬── ─────┬────
                       │           │   │     │        └ Replace with your project's name
                       │           │   │     └ Replace with your organization name   
                       │           │   └ Required to be "api"
                       │           └ The port your server started with
                       └ The host name of your server

Optionally, I also recommend changing these global settings to speed things up:

# Increase the number of worker threads
git config --global lfs.concurrenttransfers 64

# Use a global LFS cache to make re-cloning faster
git config --global lfs.storage ~/.cache/lfs

License

MIT License

Thanks

This was developed at Environmental Systems Research Institute (Esri) who have graciously allowed me to retain the copyright and publish it as open source software.

Contributors

jamesyfc, jasonwhite, malaporte, neo-zhixing, notake, richardhongyu, sercand, southclaws


Issues

Docker image latest/0.3.1 - panicked at 'there is no timer running...'

While attempting to run the rudolfs server via the image uploaded to Docker, I'm getting the following error:

[+] Running 1/0
 ⠿ Container rodulfs-demo_app_1  Recreated                                                                                          0.0s
Attaching to app_1
app_1  |  2021-06-03T18:43:12.188Z INFO  rudolfs > Initializing storage...
app_1  |  2021-06-03T18:43:12.188Z INFO  rudolfs::storage::s3 > Connecting to S3 bucket '<redacted>' at region '<redacted>'
app_1  | thread 'main' panicked at 'there is no timer running, must be called from the context of a Tokio 0.2.x runtime', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.25/src/time/driver/handle.rs:24:32
app_1  | note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
app_1 exited with code 101

Here is what I did:

# Go to a working directory
mkdir rodulfs-demo
cd rodulfs-demo

# Get the default docker compose configuration
curl -O https://raw.githubusercontent.com/jasonwhite/rudolfs/master/docker-compose.yml

# Export environment variables required by rudolfs
export AWS_ACCESS_KEY_ID=<...>
export AWS_SECRET_ACCESS_KEY=<...>
export AWS_DEFAULT_REGION=<region of S3 bucket>
export LFS_ENCRYPTION_KEY=<Paste key saved to Secrets Manager>
export LFS_S3_BUCKET=<S3 bucket for LFS storage>
export LFS_MAX_CACHE_SIZE=10GiB

docker compose up

I also tried switching to the 0.3.0 docker image and got the same error. I will try the 0.2.x versions next, but I see I'll need to change my docker-compose.yml file first.

Use docker supplied tini

Noticed this while I was tidying up the code (I'm not trying to be nitpicky, just trying to get familiar with this code base because I plan to use it as a dependency for another project soon and I want to make sure I understand how it works).

rudolfs/Dockerfile

Lines 13 to 19 in 9f2eaad

# Use Tini as our PID 1. This will enable signals to be handled more correctly.
#
# Note that this can't be downloaded inside the scratch container as we have no
# chmod command.
#
# TODO: Use `--init` instead when it is more well-supported (this should be the
# case by Jan 1, 2020).

I was going to take care of this if it is still desirable in your view @jasonwhite

I have a PR ready but it is based on top of #46 so I'll just post the PR as draft till that is merged. No need for more merge resolution noise 😄

Add namespaces?

It might be useful (from an administration perspective) to store LFS objects from different repositories separately.

The API would then look like this:

GET  /api/org/project/object/XXXX
PUT  /api/org/project/object/XXXX
POST /api/org/project/batch
POST /api/org/project/verify

This way, we can more easily prune data and see how much space different repositories are using.

Rate limiting of S3 causes incomplete cached objects

When uploading many objects at once (e.g., >500), we may start hitting some rate limiting errors on the S3 API. This in turn causes the upload to S3 to be cut off mid-stream. This is problematic because the caching layer doesn't handle errors in the upload very well. As the stream of chunks is uploaded by the client, it is split in two; one goes to S3 and the other goes to disk. When the stream is cut off to S3, the disk storage back-end thinks the file upload is complete and saves it accordingly. The next time the user goes to download the same object, the incomplete object is detected and deleted.

Here is the code where the stream is split into two:

rudolfs/src/storage/mod.rs

Lines 145 to 180 in 1c54a8b

/// Duplicates the underlying byte stream such that we have two identical
/// LFS object streams that must be consumed in lock-step.
///
/// This is useful for caching LFS objects while simultaneously sending them
/// to a client.
pub fn split(self) -> (Self, Self) {
    let (len, stream) = self.into_parts();

    let (sender, receiver) = mpsc::channel(0);

    let stream = stream.and_then(move |chunk| {
        // TODO: Find a way to not clone the sender.
        sender
            .clone()
            .send(chunk.clone())
            .and_then(move |_| Ok(chunk))
            .map_err(|e| io::Error::new(io::ErrorKind::Other, e))
    });

    let a = LFSObject {
        len,
        stream: Box::new(stream),
    };

    let b = LFSObject {
        len,
        stream: Box::new(receiver.map_err(|()| {
            io::Error::new(
                io::ErrorKind::Other,
                "failed during body duplication",
            )
        })),
    };

    (a, b)
}

This should be fixed such that each item in the stream gets sent to S3 first and then sent to disk only if it is sent successfully to S3. Errors in either stream should cause the upload to fail for both. This might be tricky to implement correctly and probably requires a custom stream type.

If the upload fails, the error should be propagated back to the client which should then retry the upload from the beginning.

Lock api support?

Does this repo support lfs file locking?

When I try to run git lfs lock, I get errors (screenshot of the errors omitted).

error trying to connect: Connection refused (os error 111)

Hi, thanks for the LFS server!

I'm trying to get the minimal docker-compose with Minio up but do not succeed:

_______@_______ ~/Documents/rudolfs » docker-compose up
Recreating lfs_rudolfs_1 ... done
Starting lfs_minio_1     ... done
Attaching to lfs_minio_1, lfs_rudolfs_1
rudolfs_1  |  2020-02-11T11:08:52.884 INFO  rudolfs > Initializing storage...
rudolfs_1  |  2020-02-11T11:08:52.886 ERROR rudolfs > error trying to connect: Connection refused (os error 111)
lfs_rudolfs_1 exited with code 1
minio_1    | Endpoint:  http://172.22.0.2:9000  http://127.0.0.1:9000
minio_1    | 
minio_1    | Browser Access:
minio_1    |    http://172.22.0.2:9000  http://127.0.0.1:9000
minio_1    | 
minio_1    | Object API (Amazon S3 compatible):
minio_1    |    Go:         https://docs.min.io/docs/golang-client-quickstart-guide
minio_1    |    Java:       https://docs.min.io/docs/java-client-quickstart-guide
minio_1    |    Python:     https://docs.min.io/docs/python-client-quickstart-guide
minio_1    |    JavaScript: https://docs.min.io/docs/javascript-client-quickstart-guide
minio_1    |    .NET:       https://docs.min.io/docs/dotnet-client-quickstart-guide
lfs_rudolfs_1 exited with code 1
lfs_rudolfs_1 exited with code 1
lfs_rudolfs_1 exited with code 1
lfs_rudolfs_1 exited with code 1
lfs_rudolfs_1 exited with code 1
lfs_rudolfs_1 exited with code 1
lfs_rudolfs_1 exited with code 1
^CGracefully stopping... (press Ctrl+C again to force)
Stopping lfs_rudolfs_1   ... done
Stopping lfs_minio_1     ... done

I can successfully connect to Minio via http://localhost:9000, but the rudolfs container is in a restart loop. Also, as you can see in the docker-compose.yml below, I added a test container (entered via docker-compose exec test /bin/ash) from which I am able to ping and curl the Minio service.

.env (temporary secrets)

AWS_ACCESS_KEY_ID=kG5jGQs8jmPM6kQQeuEV
AWS_SECRET_ACCESS_KEY=j6DhdwWUEnNuxv22kBS79WV3dAej9LrVF7S3sRga
AWS_DEFAULT_REGION=us-west-1
LFS_ENCRYPTION_KEY=3f083affa058951a17e7ddecc994d9bc5bf07d7b05b6803d806284eee4d7ca0d
LFS_S3_BUCKET=test
LFS_MAX_CACHE_SIZE=10GB

docker-compose.yml

Changes: removed links because it is deprecated, and removed the other networks so that (hopefully) only the default network is used.

version: "3"
services:
  minio:
    image: minio/minio:latest
    ports:
    - "9000:9000"
    volumes:
    - miniodata:/data
    environment:
      # force using given key-secret instead of creating at start
    - MINIO_ACCESS_KEY=${AWS_ACCESS_KEY_ID}
    - MINIO_SECRET_KEY=${AWS_SECRET_ACCESS_KEY}
    command: ["server", "/data"]
    # networks:
    # - backend
  # test:
  #   image: alpine:latest
  #   command: ["ash", "-c", "while true; do sleep 3600; done"]
  #   # networks:
  #   # - backend
  rudolfs:
    image: jasonwhite0/rudolfs:latest
    #    build:
    #      context: .
    #      dockerfile: Dockerfile
    ports:
    - "8080:8080"
    volumes:
    - data:/data
    restart: always
    environment:
    - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
    - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
    - AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION}
    - LFS_ENCRYPTION_KEY=${LFS_ENCRYPTION_KEY}
    - LFS_S3_BUCKET=${LFS_S3_BUCKET}
    - LFS_MAX_CACHE_SIZE=${LFS_MAX_CACHE_SIZE}
    - AWS_S3_ENDPOINT=http://minio:9000
    entrypoint:
    - /tini
    - --
    - /rudolfs
    - --cache-dir
    - /data
    - --key
    - ${LFS_ENCRYPTION_KEY}
    - --s3-bucket
    - ${LFS_S3_BUCKET}
    - --max-cache-size
    - ${LFS_MAX_CACHE_SIZE}
    # networks:
    # - backend
  # A real production server should use nginx. How to configure this depends on
  # your needs. Use your Google-search skills to configure this correctly.
  #
  # nginx:
  #   image: nginx:stable
  #   ports:
  #   - 80:80
  #   - 443:443
  #   volumes:
  #   - ./nginx.conf:/etc/nginx/nginx.conf
  #   - ./nginx/errors.log:/etc/nginx/errors.log

volumes:
  data:
  miniodata:
# networks:
#   backend:

supports git-upload-pack?

When running git lfs push --all, the first request is something like this:
api/org/project/info/refs?service=git-upload-pack - 404 Not Found

2.9.0 Results in Bad Request

Git LFS 2.9.0 does not seem to work with Rudolfs; a "Bad Request" is returned. The only discrepancy I can find is the addition of a transfers object to the JSON: '"transfers":["lfs-standalone-file","basic"]'. Rolling back to 2.7.1 seems to work fine. The client is Git running on Windows; the server is on Ubuntu 18.04 LTS running in a docker-compose container.

X-Forwarded-Host header support

Please add support for the X-Forwarded-* headers, or some environment option to set the host manually. As far as I can see, the plain Host header is currently used:

api: req.base_uri().path_and_query("/api").build().unwrap()

This breaks almost everything when running behind a load balancer, and makes the server unusable in Kubernetes.
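
A minimal sketch of the requested behavior, assuming a hyper-based request handler (this is illustrative, not rudolfs' existing URI-building code): prefer X-Forwarded-Host over Host when reconstructing the externally visible URL.

use hyper::{Body, Request};

/// Returns the host the client actually used, preferring the value injected
/// by a reverse proxy (X-Forwarded-Host) over the plain Host header.
fn effective_host(req: &Request<Body>) -> Option<&str> {
    req.headers()
        .get("x-forwarded-host")
        .or_else(|| req.headers().get(hyper::header::HOST))
        .and_then(|value| value.to_str().ok())
}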

How to host local only with docker?

I've tried building a simple image using:

FROM jasonwhite0/rudolfs
ENTRYPOINT ["/tini", "--", "/rudolfs"]
CMD ["--host=localhost:8080","--key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "local", "--path=/lfsdata"]

It starts fine:

2021-11-17T16:15:28.856Z INFO  rudolfs > Initializing storage...
2021-11-17T16:15:28.858Z INFO  rudolfs > Local disk storage initialized.
2021-11-17T16:15:28.858Z INFO  rudolfs > Listening on 127.0.0.1:8080

I've added the following to my .lfsconfig:

[lfs]
url = "http://127.0.0.1:8080/api/ab/project"

However git push fails:

Uploading LFS objects:   0% (0/2), 0 B | 0 B/s, done.
batch response: Post "http://127.0.0.1:8080/api/ab/project/objects/batch": EOF

What did I miss?
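
This looks like the loopback-binding issue described in the README warning above: with --host=localhost:8080 the server listens on 127.0.0.1 inside the container (as the log shows), so the port published by Docker never reaches it. A likely fix, sketched here but not verified against this exact setup, is to bind to all interfaces:

FROM jasonwhite0/rudolfs
ENTRYPOINT ["/tini", "--", "/rudolfs"]
# Bind to 0.0.0.0 so the port published by Docker can reach the server.
CMD ["--host=0.0.0.0:8080", "--key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "local", "--path=/lfsdata"]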

Failed to query S3 bucket (''). Retrying... When used with non-AWS S3 Service

Hey, great project! This is perfect for my use-case.

I'm having some trouble getting it running against an S3 compatible store that's not run by Amazon. Specifically, Scaleway object storage.

Here's how I'm invoking it:

rudolfs --cache-dir=./.cache --key=... --s3-bucket=git-lfs-test

With this environment:

export AWS_S3_ENDPOINT=https://s3.fr-par.scw.cloud
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=fr-par
export LFS_ENCRYPTION_KEY=...
export LFS_S3_BUCKET=git-lfs-test
export LFS_MAX_CACHE_SIZE=10GB

Which outputs the following:

 2020-02-18T11:26:38.451 INFO  rudolfs > Initializing storage...
 2020-02-18T11:26:39.097 ERROR rudolfs::storage::s3 > Failed to query S3 bucket (''). Retrying...
 2020-02-18T11:26:40.273 ERROR rudolfs::storage::s3 > Failed to query S3 bucket (''). Retrying...
 2020-02-18T11:26:42.428 ERROR rudolfs::storage::s3 > Failed to query S3 bucket (''). Retrying...
 2020-02-18T11:26:46.604 ERROR rudolfs::storage::s3 > Failed to query S3 bucket (''). Retrying...
 2020-02-18T11:26:54.765 ERROR rudolfs::storage::s3 > Failed to query S3 bucket (''). Retrying...
 2020-02-18T11:27:11.330 ERROR rudolfs::storage::s3 > Failed to query S3 bucket (''). Retrying...
 2020-02-18T11:27:11.334 ERROR rudolfs              > S3 returned an unknown error

I'm not sure where to go from here - is there any way to get verbose/debug output for a more useful error?

Thanks!

local files

Can this also be used for serving (large) local files?
If not, what would be the estimated effort to implement this?
Depending on that, I would be willing to help implement it.

Verify encryption key on startup

There is a danger of using a different encryption key than the one that was originally used to encrypt the LFS objects. Right now, there is no error or warning if the wrong encryption key is used. One might only notice the problem when clients start failing to download LFS objects.

Instead, there should be some metadata associated with the object store that, when decrypted, contains a predefined string. This could then be used to check if the encryption key is valid or not.
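
A minimal sketch of that check, assuming the chacha20poly1305 crate (not necessarily the cipher setup rudolfs uses internally) and a made-up sentinel value: encrypt a fixed sentinel when the store is first initialized, store the ciphertext as metadata, and refuse to start if it no longer decrypts to the expected bytes.

use chacha20poly1305::aead::{Aead, KeyInit};
use chacha20poly1305::{Key, XChaCha20Poly1305, XNonce};

const SENTINEL: &[u8] = b"rudolfs-key-check";

/// Run once when the object store is created; the result is stored as
/// metadata alongside the LFS objects.
fn seal_sentinel(key: &[u8; 32], nonce: &[u8; 24]) -> Vec<u8> {
    let cipher = XChaCha20Poly1305::new(Key::from_slice(key));
    cipher
        .encrypt(XNonce::from_slice(nonce), SENTINEL)
        .expect("encrypting a short constant cannot fail")
}

/// Run on startup; returns false if the configured key does not match the
/// one originally used to initialize the store.
fn key_matches(key: &[u8; 32], nonce: &[u8; 24], stored: &[u8]) -> bool {
    let cipher = XChaCha20Poly1305::new(Key::from_slice(key));
    cipher
        .decrypt(XNonce::from_slice(nonce), stored)
        .map(|plaintext| plaintext == SENTINEL)
        .unwrap_or(false)
}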

Downloading blobs programmatically

Is there any client compatible with rudolfs? I'm using Kotlin and tried to use this library (specifically the client and pointer modules), but got a 404 for a missing endpoint (http://localhost:8081/api/some-org/some-repo/info/lfs/objects/batch).

Perhaps batch is not supported?

My use case:
S3 to contain files versioned by git using LFS (rudolfs), then at runtime access (download) different versions of these files (while performing relatively fast checkouts in the local FS).
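
For what it's worth, the server logs elsewhere in these issues suggest the batch endpoint lives at /api/<org>/<project>/objects/batch (no /info/lfs prefix), speaking the standard Git LFS batch API. A rough example of calling it directly (oid and size are placeholders):

curl -X POST \
  -H 'Content-Type: application/vnd.git-lfs+json' \
  -H 'Accept: application/vnd.git-lfs+json' \
  -d '{"operation": "download", "transfers": ["basic"], "objects": [{"oid": "<sha256>", "size": 123}]}' \
  http://localhost:8081/api/some-org/some-repo/objects/batch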

GCS backend

If GCS backend support is in the cards, I may be able to sponsor the work.

Connection reset by peer

I encounter an error when running docker-compose.minio.yml locally, after a git push:

batch response: Post "http://localhost:8081/api/nebnes/dirty-lfs-test/objects/batch": read tcp 127.0.0.1:38150->127.0.0.1:8081: read: connection reset by peer
error: failed to push some refs to 'github.com:nebnes/dirty-lfs-test.git

I have a test repo with this .lfsconfig:

[lfs]
url = "http://localhost:8081/api/nebnes/dirty-lfs-test"

Thanks in advance

Uploading LFS objects hangs

I'm testing rudolfs on my local machine, but I'm having an issue where uploading lots of LFS objects systematically hangs.

I built the binary from master and launched it with local storage: ./rudolfs --host=0.0.0.0:8080 local --path=/tmp/lfs-storage. From an existing repo with approximately 5k LFS objects, I'm force-pushing all LFS objects with git lfs push origin --all.

I see rudolfs logging POST (verify) and PUT (objects) for every object upload, and I see objects being added to /tmp/lfs-storage/objects. So far so good; however, at some point I no longer see the PUTs but instead just a bunch of

POST /api/Org/Repo/objects/batch - 200 OK

The client remains stuck at

Uploading LFS objects:  65% (4244/6521), 959 MB | 105 MB/s

If I cancel and relaunch git lfs push origin --all, I just see a number of

POST /api/Org/Repo/objects/batch - 200 OK

with the client remaining stuck as mentioned above.

What would be the right way to debug this?

Add home page explaining how to use it

Add an HTML page for GET / that explains how to use the server. This would serve as a good place to document the service.

A template engine would need to be used in order to construct the URL to the server. Askama looks like a good choice.

A simple white page with centered content will do the job.

The following information should be on the home page.


This is Rudolfs, a Git LFS server.

To use it in your project, add a file named .lfsconfig to the root of your repository with the following contents:

[lfs]
url = "https://gitlfs.example.com/api/my-org/my-repo"

...where my-org/my-repo is the name of the Git repository.

Please see https://git-lfs.github.com/ for more information about how to use the Git LFS client.
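
A rough sketch of the Askama side (the template path and field name here are illustrative, not existing rudolfs code):

use askama::Template;

// templates/index.html would contain the text above, with `{{ base_url }}`
// spliced into the example .lfsconfig URL.
#[derive(Template)]
#[template(path = "index.html")]
struct IndexTemplate<'a> {
    base_url: &'a str,
}

fn render_home_page(base_url: &str) -> askama::Result<String> {
    IndexTemplate { base_url }.render()
}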

Failed to query S3 bucket... couldn't find AWS credentials

Similar to #13, I'm getting the following:

 2020-03-09T16:17:16.005 ERROR rudolfs::storage::s3 > Failed to query S3 bucket ('Couldn't find AWS credentials in environment, credentials file, or IAM role.'). Retrying...

I have credentials in ~/.aws/credentials, and can connect to S3 using the aws cli. The shell script I'm using to run the rudolfs Docker container contains exports with my credentials as well. So this error message surprises me.

I don't have Rust experience, and not much with Docker, but if you could just point me in the right direction to debug, I'd be grateful!

kill process that runs on startup?

Via htop, I see a rudolfs process running (screenshot omitted), which I suspect is left over from tinkering with rudolfs a few months ago.

I can run ps -e | grep -i "rudo" and then sudo kill -9 that process, but it keeps coming back.

Is there somewhere I can remove this from a scheduler of some sort? I just don't need it to be running all the time!

What are the minimal permissions that are required for the server to run on an AWS bucket?

I created a user on AWS IAM management console exclusively for the server. However, after setting a minimal set of permissions (READ/WRITE/LIST/etc) which I thought were necessary, the server returned an error upon launch:
Invalid S3 credentials

After allowing all S3 operations on the LFS bucket, the server runs smoothly.

What is the minimal set of permissions required for the server to interact properly with the S3 service?

IPFS Storage

I'm currently trying to work out if this would be possible, and logging an issue here to see if there's any other interest or opinions while I do 👍

More robust S3 response code handling

First of all: thanks very much for your work. I have been using rudolfs and have managed to upload ~100GB over ~20K files.

While I was doing that, I started noticing errors which only went away when I did
git config --global lfs.concurrenttransfers 1

This made me realize that you probably have to add more logic to work around Rusoto's error handling. You already mention this in your comments:

                // Rusoto really sucks at correctly reporting errors.
                // Lets work around that here.

What I suspect is happening is that S3 is returning a 503 slow-down which is getting lumped into RusotoError::Unknown(_) and I think you are handling it like every other transient error.

In our own code, we have done something like this:

/// Roughly inferred what makes sense from
/// https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html#ErrorCodeList
pub fn s3_status_code_is_permanent(s: StatusCode) -> bool {
    s == StatusCode::BAD_REQUEST || // 400
    s == StatusCode::UNAUTHORIZED || // 401
    s == StatusCode::FORBIDDEN || // 403
    s == StatusCode::NOT_FOUND // 404
}

and then

.map_err(|e| match e {
    RusotoError::Unknown(ref r) => {
        if r.status == StatusCode::NOT_FOUND {
            Error::NotFound
        } else {
            error!("S3 Failure on key {}, Err {:?}", key, e);
            e.into()
        }
    }
    _ => {
        error!("S3 Failure on key {}, Err {:?}", key, e);
        e.into()
    }
})

Connect to Google Cloud Storage

Hi!
Is it possible to add another environment variable for S3_HOSTNAME or something like that?
It is required for connecting the server to a storage host other than AWS (Google, for example).
Thank you!

404 Not Found

Not sure if this is a bug or just my own failure in configuration.

I have rudolfs running in WSL (either in Docker or with cargo run; the issue is the same).
It is configured with an S3 bucket and a dedicated AWS user with permissions to read/write objects to that bucket.

On Windows, I have a small test repo. LFS is tracking .txt files and I have a single .txt file in the repo.
The remote is pointed at a repo on our Azure DevOps Server.

When I run:
git push -u origin --all

I get:

Uploading LFS objects:   0% (0/1), 0 B | 0 B/s, done.
batch response: Repository or object not found: http://localhost:8080/objects/batch
Check that it exists and that you have proper access to it
error: failed to push some refs to 'https://<tfs_host>/tfs/<org>/<project>/_git/test-repo'

rudolfs outputs the following:

 2024-01-24T08:12:23.745Z INFO  rudolfs > Initializing storage...
 2024-01-24T08:12:23.745Z INFO  rudolfs::storage::s3 > Connecting to S3 bucket 'bucket-gitlfs' at region 'eu-west-2'
 2024-01-24T08:12:23.799Z INFO  rudolfs::storage::s3 > Successfully authorized with AWS
 2024-01-24T08:12:23.800Z INFO  rudolfs::storage::cached > Prepopulated cache with 0 entries (0 B)
 2024-01-24T08:12:23.800Z INFO  rudolfs                  > Listening on 0.0.0.0:8080
 2024-01-24T08:24:59.197Z INFO  rudolfs::logger          > [127.0.0.1] POST /locks/verify - 404 Not Found (58us 500ns)
 2024-01-24T08:24:59.688Z INFO  rudolfs::logger          > [127.0.0.1] POST /objects/batch - 404 Not Found (36us 900ns)

I'm not really sure what the 404s are referring to. The /objects/batch one occurs every time the push is run.
Any help would be appreciated.

Collect and report metrics

It would be very useful to be able to answer the following questions:

  • How many cache hits vs misses are there?
  • How much data has been transferred to and from permanent storage?
  • How many requests have there been to and from permanent storage?
  • What is the size of the cache?
  • What is the average request latency?
  • How many errors have occurred?

All of this would be implemented by the caching storage adapter.

One possible solution would be to use Prometheus instrumentation: the server exposes a /metrics endpoint, Prometheus scrapes it, and a Grafana dashboard visualizes the results.
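
A minimal sketch of that approach using the prometheus crate (the metric names and helper functions here are made up for illustration):

use prometheus::{register_int_counter, Encoder, IntCounter, TextEncoder};

lazy_static::lazy_static! {
    static ref CACHE_HITS: IntCounter =
        register_int_counter!("rudolfs_cache_hits_total", "Number of cache hits").unwrap();
    static ref CACHE_MISSES: IntCounter =
        register_int_counter!("rudolfs_cache_misses_total", "Number of cache misses").unwrap();
}

/// The caching storage adapter would bump these from its download path...
fn record_cache_result(hit: bool) {
    if hit {
        CACHE_HITS.inc();
    } else {
        CACHE_MISSES.inc();
    }
}

/// ...and a /metrics endpoint would expose them for Prometheus to scrape.
fn render_metrics() -> Vec<u8> {
    let mut buf = Vec::new();
    TextEncoder::new()
        .encode(&prometheus::gather(), &mut buf)
        .expect("encoding metrics into a Vec cannot fail");
    buf
}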

Uploading lfs objects ... actively refused

I've been documenting steps for setting up a personal Git multi-repository server on a simple home LAN Linux box, complete with a rudolfs server. Although I've found a few important details and steps missing from the rudolfs documentation, I've got to the point where I believe everything should be set up and running correctly. However, when making use of LFS-tracked objects, the initial push from a test PC -

git push --set-upstream gitolite3@myserver:somespace/somenewrepo master

fails uploading the LFS data with -

Remote "gitolite3@myserver:somespace/somenewrepo" does not support the Git LFS locking API. Consider disabling it with:
  $ git config lfs.http://myserver:8015/somespace/somenewrepo.locksverify false
Uploading LFS objects:   0% (0/7), 0 B | 0 B/s, done.
batch response: Post "http://myserver:8015/somespace/somenewrepo/objects/batch": dial tcp 192.168.1.71:8015: connectex: No connection could be made because the target machine actively refused it.
error: failed to push some refs to 'myserver:somespace/somenewrepo'

Firstly, sorry in advance for the highly likely situation that the problem has nothing to do with rudolfs, but documenting this particular issue might at least leave a tip for anyone else running into the same issue. I wondered if anyone has suggestions about what I might have missed or done wrong.

Here are a few things I've checked -

I've no problems doing normal, non-LFS git operations with this 'myserver'; I can clone, edit, and push repo changes from my test PC to this git repo server.

With a new repo on my test PC that uses LFS-tracked files, it has a '.lfsconfig' file with -

[lfs]
	url = "http://myserver:8015/somespace/somenewrepo"

It's not clear to me whether this URL format is suitable for the rudolfs server. I could find no rudolfs documentation on whether or how the server handles those '/' path separators in the URL. Should this be fine, or does anything need configuring on the rudolfs server so that it knows how those separators should be handled?

On the linux Git + LFS server, I set up a systemd service to run '/etc/systemd/system/rudolfs.service' -

[Unit]
Description=Rudolfs (Git LFS) server
After=network.target

[Service]
Type=simple
User=rudolfs_user
Group=devservices_group
Environment="RUDOLFS_KEY=xxxxxxxxxx...."
ExecStart=/usr/local/bin/rudolfs --cache-dir /rudolfs/cache --host localhost:8015 --max-cache-size=2GiB local --path=/rudolfs/storage

[Install]
WantedBy=multi-user.target

and I can see this service is running and listening on :8015 -

systemctl status rudolfs.service

rudolfs.service - Rudolfs (Git LFS) server
	 Loaded: loaded (/etc/systemd/system/rudolfs.service; enabled; vendor preset: enabled)
	 Active: active (running) since Sun 2022-09-25 19:26:45 BST; 16h ago
   Main PID: 474 (rudolfs)
	  Tasks: 5 (limit: 4224)
		CPU: 111ms
	 CGroup: /system.slice/rudolfs.service
			 └─474 /usr/local/bin/rudolfs --cache-dir /rudolfs/cache --host localhost:8015 --max-cache-size=2GiB local --path=/rudolfs/storage

... systemd[1]: Started Rudolfs (Git LFS) server.
... INFO  rudolfs > Initializing storage...
... rudolfs > Local disk storage initialized.
... rudolfs > Listening on [::1]:8015

On my test PC (on the same 192.168.1.??? subnet), I use a PowerShell port check to see that the server is only partially connecting on port 8015 -

> Test-NetConnection -ComputerName myserver -Port 8015
	WARNING: TCP connect to (192.168.1.71 : 8015) failed
	ComputerName           : myserver
	RemoteAddress          : 192.168.1.71
	RemotePort             : 8015
	InterfaceAlias         : Ethernet
	SourceAddress          : 192.168.1.32
	PingSucceeded          : True
	PingReplyDetails (RTT) : 1 ms
	TcpTestSucceeded       : False

The port ping success at least tells me that I am initially reaching myserver:8015 but something thereafter is failing, which would seem to coincide with the original "...target machine actively refused it" error.

Finally, the 'myserver' machine has its 'iptables' configured as follows -

#Trivially allow localhost (lo) connections -
sudo iptables -A INPUT -i lo -j ACCEPT

#Allow 192.168.1.[2..255] IPs to connect to the rudoLFS server, listening at port 8015, otherwise block all other connections to 8015 -
sudo iptables -A INPUT -p tcp --dport 8015 -m iprange --src-range 192.168.1.2-192.168.1.255 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 8015 -j DROP

And these rules are preserved across reboots through the use of the 'iptables-persistent' utility/service.
Update: I've also tested removing all these iptables rules (sudo iptables -F INPUT), leaving no entries remaining, and I get the same behaviour.

Is there something I can try to better home in on the problem, or does anyone have suggestions as to its source?
Also, once I have this issue resolved, would it be useful to anyone if I were to document my full, detailed, multi-repo Git and LFS server configuration steps (perhaps in the 'discussions' section), in case it helps others who, like me, would appreciate more explicit and detailed steps for using rudolfs?
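
For the record: the systemd unit above passes --host localhost:8015, and the service log shows "Listening on [::1]:8015", i.e. loopback only, which matches the README's warning about --host localhost and would explain connections from other machines being refused. A possible adjustment to the ExecStart line (a sketch, untested against this setup):

# Bind to all interfaces so other machines on the LAN can reach the server.
ExecStart=/usr/local/bin/rudolfs --cache-dir /rudolfs/cache --host 0.0.0.0:8015 --max-cache-size=2GiB local --path=/rudolfs/storage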
