
zam's Introduction

ZFS Automatic Manager

ZAM is a Python-based command-line tool for maintaining a ZFS file system.

ZAM is very much a work in progress, and the name is certainly not final.

Currently, the only feature ZAM supports is periodically taking snapshots and replicating them to remote servers. ZAM cannot even delete old snapshots yet, although that feature is a top priority.

Development Process

Run these commands from the root of the ZAM repository to set up the development environment:

python3 -m venv ".venv" --prompt "ZAM"
source ".venv/bin/activate"
.venv/bin/python3 -m pip install --upgrade pip
pip install -e '.'
pip install -r requirements_dev.txt

Before each commit, run these commands and fix any issues:

black --line-length 80 src tests
mypy src/zam/task.py
flake8 src tests
pytest

Installation

To install ZAM itself, run these two commands:

python setup.py build
python setup.py install --optimize=1

To install Linux-specific files that may be useful (e.g. example config file, systemd service), run this command:

make install_utils

zam's People

Contributors

ivan-johnson

zam's Issues

Simplify permission management

  • Can we split ZAM into two users? One for creating snapshots, one for deleting them? Does that improve security at all? If not, we should at least have one for local and one for remote.
  • We should make it easy to give the ZAM user(s) the right permissions. We could do sudo zam --set-zfs-permissions, but I don't like giving ZAM root, even if only briefly. We could instead do zam --get-permission-commands, with the expectation that the user manually runs each command with an understanding of what it does.

I tentatively want this implemented before adding ZAM to the AUR. If my opinion changes, the priority will be reduced.
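The `--get-permission-commands` idea could be sketched with ZFS's delegated-administration mechanism (`zfs allow`). A minimal sketch follows; the user names (`zam-snapshot`, `zam-prune`) and the exact permission sets are my assumptions, not settled design:

```python
# Hypothetical sketch of `zam --get-permission-commands`: print the
# `zfs allow` delegations each ZAM user would need, so the admin can
# review and run each one manually instead of giving ZAM root.
# User names and permission sets below are assumptions for illustration.
SNAPSHOT_PERMS = ["snapshot", "send", "hold", "mount"]
PRUNE_PERMS = ["destroy", "mount"]

def permission_commands(dataset: str) -> list[str]:
    return [
        f"zfs allow -u zam-snapshot {','.join(SNAPSHOT_PERMS)} {dataset}",
        f"zfs allow -u zam-prune {','.join(PRUNE_PERMS)} {dataset}",
    ]

for cmd in permission_commands("s-root/data/home/i"):
    print(cmd)
```

Splitting snapshot-creation and snapshot-destruction rights across two users would at least ensure a bug in the pruning code cannot run with more destructive permissions than it needs.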

ZAM Design Spike

Thus far when creating ZAM, I've basically skipped the design phase. This was a mistake.

We need, for example:

  • abstraction layers (protocols)
  • a (UML?) document describing the relation between the different classes/protocols
  • ...

Remove old config code

Until the new config design is finalized, we should not waste time maintaining the current config code. As such, it should be deleted and instead we should just hard code values for testing.

The run function in the task protocol has too many responsibilities

The run function in the task protocol of scheduler.py is responsible for both running the task and computing the time at which it should next be run. These responsibilities should be split into separate getNextActionableDatetime (?) and run functions.

Besides just making for a cleaner interface, this would also allow the scheduler to fully initialize its prioritized heap without having to actually run each task.
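The proposed split could look roughly like this, assuming a typing.Protocol-based task interface and a heap keyed on next runtime (names are tentative, per the above):

```python
# Sketch of the split interface: the query method has no side effects, so the
# scheduler can order every task up front without running any of them.
import datetime
import heapq
from typing import Protocol

class Task(Protocol):
    def get_next_actionable_datetime(self) -> datetime.datetime:
        """Return when this task next needs to run; must not do any work."""
        ...

    def run(self) -> None:
        """Perform the task's work."""
        ...

def build_heap(tasks: list[Task]) -> list[tuple[datetime.datetime, int, Task]]:
    # The index breaks ties so heapq never compares Task objects directly.
    heap = [(t.get_next_actionable_datetime(), i, t) for i, t in enumerate(tasks)]
    heapq.heapify(heap)
    return heap
```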

Support recursing on datasets

Currently ZAM config must specify each individual dataset. We should add a way to recurse on a dataset or zpool.

Recursive entries should have an optional blocklist (prefix match string literal, or regexp?).

When operating on a dataset, we should use the best fit from the config. e.g. if the config defines the foo and foo/bar datasets, then foo/bar/baz should use the settings from foo/bar. Equivalently (?), a config entry for foo/bar/baz defines an implicit blocklist entry on foo/bar (which in turn is blocked from foo).

It's unlikely the user will notice if snapshot replication fails

Nobody reads logs. How is the user expected to notice if ZAM repeatedly fails to update a replica? ZAM should somehow notify the user if it hasn't updated a replica for N days. Similarly, we should notify the user any time that ZAM itself crashes.

My current preference for the method of notification is a /etc/profile.d/*.sh script that basically cats some sort of ZAM log file. The user could clear the messages by either deleting the log file, or updating some sort of $HOME/.local/share/zam/ignore_message_before_this_date file.
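The login-time check that a /etc/profile.d/zam.sh script would delegate to could be sketched like this. The log format (ISO-8601 timestamp, space, message) and the ignore-file semantics are assumptions for illustration:

```python
# Sketch of the message filter behind the profile.d idea: show only the
# messages the user hasn't already acknowledged via the ignore-date file.
import datetime

def pending_messages(log_text: str, cutoff: datetime.datetime) -> list[str]:
    """Return messages at or after `cutoff`; older ones count as cleared."""
    messages = []
    for line in log_text.splitlines():
        # Assume each log line starts with an ISO-8601 timestamp.
        stamp, _, text = line.partition(" ")
        if datetime.datetime.fromisoformat(stamp) >= cutoff:
            messages.append(text)
    return messages

log = (
    "2024-05-01T00:00:00 replica 'olympus' is 9 days stale\n"
    "2024-06-01T00:00:00 replica 'olympus' is 40 days stale"
)
pending_messages(log, datetime.datetime(2024, 5, 15))
```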

ZAM needs a real installer

We need to simplify the process of setting up ZAM on a machine for the first time. In addition to the items described by #1, this may include:

  • Creating a daemon user for local use (i.e. for actually executing the ZAM script)
  • Creating a daemon user for remote instances of ZAM to use
  • Ensuring that the daemon users have sufficient ZFS permissions
  • Setting up SSH keys to access the remote replicas
  • Getting ZAM in the AUR

Refactor `task.get_next_runtime` to `task.get_next_possible_runtime`

Currently the task scheduler requires that each task be able to deduce exactly when it will next need to run. How should we handle cases where a task doesn't know when to run next? For example, a replicator_task might want to run only when the device is not in use. Currently it can schedule itself for midnight, but if the device is still in use when midnight arrives, its only recourse is to duplicate scheduling logic in the runner function.

In order to cleanly support this sort of use case the task.get_next_runtime function should be refactored to task.get_next_possible_runtime. Before the task is run, this function should be called again to see if the time has moved back.

De-duplicate config files

With the current config framework you might, for example, have to specify the exact same destination replica multiple times for many managed datasets. There should be some syntax for avoiding this. e.g. enable setting a default value from a broader context:

{
    "DEFAULT#managed-datasets[*].destinations": [
        {
            "remote-host": "olympus",
            "pool": "o-nas",
            "dataset": "zam/s-root/data/home/i",
            "windows": [
                {"max-age": {"weeks": 6}, "period": {"hours": 1}},
                {"max-age": {"months": 6}, "period": {"days": 1}},
                {"max-age": {"years": 6}, "period": {"weeks": 1}},
                {"period": {"months": 1}}
            ]
        }
    ],
    "managed-datasets": [
        {
            "source": {
                "pool": "s-root",
                "dataset": "data/home/i",
                "windows": [
                    {"max-age": {"days": 2}, "period": {"minutes": 10}},
                    {"max-age": {"weeks": 1}, "period": {"hours": 1}}
                ]
            },
            "snapshot-period": {"minutes": "10"},
            "replication-period": {"hours": "1"},
            "prune-period": {"hours": "1"}
        },
        {
            "source": {
                "pool": "s-root",
                "dataset": "root/default",
                "windows": [
                    {"max-age": {"weeks": 1}, "period": {"hours": 3}}
                ]
            },
            "snapshot-period": {"minutes": "10"},
            "replication-period": {"hours": "1"},
            "prune-period": {"hours": "1"}
        }
    ]
}

Or better yet, use some sort of super-fancy variables:

{
    "DEFINE#dest-default-dataset": "zam",
    "DEFINE#dest-default": {
        "remote-host": "olympus",
        "pool": "o-nas",
        "windows": [
            {"max-age": {"weeks": 6}, "period": {"hours": 1}},
            {"max-age": {"months": 6}, "period": {"days": 1}},
            {"max-age": {"years": 6}, "period": {"weeks": 1}},
            {"period": {"months": 1}}
        ]
    },
    "managed-datasets": [
        {
            "source": {
                "pool": "s-root",
                "dataset": "data/home/i",
                "windows": [
                    {"max-age": {"days": 2}, "period": {"minutes": 10}},
                    {"max-age": {"weeks": 1}, "period": {"hours": 1}}
                ]
            },
            "destinations": [
                {
		    "REF": "dest-default",
                    "dataset": "${dest-default-dataset}/s-root/data/home/i",
                }
            ],
            "snapshot-period": {"minutes": "10"},
            "replication-period": {"hours": "1"},
            "prune-period": {"hours": "1"}
        },
        {
            "source": {
                "pool": "s-root",
                "dataset": "root/default",
                "windows": [
                    {"max-age": {"weeks": 1}, "period": {"hours": 3}}
                ]
            },
            "destinations": [
                {
		    "REF": "dest-default",
                    "dataset": "${dest-default-dataset}/s-root/root/default",
                }
            ],
            "snapshot-period": {"minutes": "10"},
            "replication-period": {"hours": "1"},
            "prune-period": {"hours": "1"}
        }
    ]
}

Bear in mind that one of the primary objectives of this project is to use as few lines of code as possible. Is this feature worth the extra LOC?
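To gauge the LOC cost, here is a rough resolver sketch for the DEFINE#/REF scheme above. The semantics (REF merges the named definition under the current object with local keys winning; ${name} expands against string definitions) are my assumptions, not a finalized design:

```python
# Rough DEFINE#/REF resolver sketch. Note: pops DEFINE#/REF keys, so it
# mutates the dicts it is given; acceptable for a one-shot config load.
import re

def resolve(node, defines):
    if isinstance(node, dict):
        # Peel DEFINE#name keys off into the symbol table.
        for key in [k for k in node if k.startswith("DEFINE#")]:
            defines[key[len("DEFINE#"):]] = node.pop(key)
        # A REF key merges the named definition into the current object,
        # with locally specified keys taking precedence.
        if "REF" in node:
            base = dict(defines[node.pop("REF")])
            base.update(node)
            node = base
        return {k: resolve(v, defines) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, defines) for v in node]
    if isinstance(node, str):
        # Expand ${name} substitutions against the definitions.
        return re.sub(r"\$\{([^}]+)\}", lambda m: str(defines[m.group(1)]), node)
    return node
```

At roughly twenty lines, the variables approach may be compatible with the minimal-LOC objective, though error handling (undefined references, cycles) would add more.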
