ucphhpc / docker-migrid

Containerized MiG

License: GNU General Public License v2.0

Makefile 31.67% Shell 68.33%

containerization grid-computing development-environment scalability storage-service

docker-migrid's People

patchofscotland bjornconnolly benibr bjarke42 sebastianprehn aputtu

docker-migrid's Issues

feature request: cron support for */7

Users have requested support for the 'interval' option when scheduling tasks.
Cron expressions such as */7 (step) and 3-8 (range) should be supported.
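
To make the request concrete, here is a minimal sketch of how a single cron field with steps and ranges could be expanded. It is not migrid's actual parser; the function name `expand_cron_field` and its `low`/`high` bounds are assumptions made for illustration only.

```python
def expand_cron_field(field, low, high):
    """Expand one cron field such as '*/7', '3-8' or '1,15' into a sorted list."""
    values = set()
    for part in field.split(","):
        # An optional '/step' suffix thins out the selected range
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError("step must be positive: %r" % part)
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # 'N/step' is commonly read as 'from N up to the maximum'
            end = high if step_text else start
        if not (low <= start <= end <= high):
            raise ValueError("out of range: %r" % part)
        values.update(range(start, end + 1, step))
    return sorted(values)


if __name__ == "__main__":
    print(expand_cron_field("*/7", 0, 59))  # minutes field
    print(expand_cron_field("3-8", 0, 23))  # hours field
```

With the minutes bounds 0-59, `*/7` selects every seventh minute starting at 0, and `3-8` in the hours field selects hours 3 through 8 inclusive.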

Future release workflow of docker-migrid

Good morning :-) I want to suggest a change to the commit/release workflow of the docker-migrid project.

Currently the repository contains multiple examples, some for development and some for real deployments etc.
At KU we deploy our custom docker-compose and env file. AU uses a custom env file plus an overlay file for docker-compose, which supersedes some settings from the erda config. In the end the env is always deployment-specific.

I suggest that we make the following changes for the future:

remove all deployment-specific configuration files (like dev.erda.dk etc.) from the repository
1. the docker-migrid repo should be as clean and generic as possible
2. all site-specific deployment config should be part of the deployment process (e.g. Ansible)
keep 3 example deployments:
1. one for development (migrid.test with local DNS etc.)
2. one for SIF development (like the one above but with GDP enabled)
3. one as advanced production example that serves as baseline for the AU overlay (migrid.example.com, without local DNS etc.)
remove the GIT_REV from the example env files and only use GIT_BRANCH
1. then no version bump is necessary when migrid-sync gets updated
2. meaning that the docker-migrid repo only changes if there are changes within the repo itself
3. a production setup has its own env file anyway and one can set the GIT_REV there for version pinning
we can then introduce semantic versioning to docker-migrid following https://semver.org/
1. this will make it much easier for admins to decide when to use a new version of docker-migrid
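
As an illustration of what semantic versioning would give admins, the sketch below classifies an upgrade by which part of MAJOR.MINOR.PATCH changed. The helper names `parse_version` and `upgrade_kind` are hypothetical, and pre-release or build suffixes from the semver spec are deliberately not handled.

```python
def parse_version(text):
    """Split 'vMAJOR.MINOR.PATCH' (leading 'v' optional) into an int tuple."""
    major, minor, patch = text.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def upgrade_kind(current, candidate):
    """Classify an upgrade as 'major', 'minor', 'patch' or 'none'."""
    old, new = parse_version(current), parse_version(candidate)
    if new <= old:
        return "none"  # same version or a downgrade
    if new[0] != old[0]:
        return "major"  # may contain incompatible changes
    if new[1] != old[1]:
        return "minor"  # new functionality, backwards compatible
    return "patch"  # backwards compatible bug fixes only


if __name__ == "__main__":
    print(upgrade_kind("v1.0.5", "v1.1.0"))
```

A deployment tool could use such a check to apply patch and minor upgrades automatically and require manual review for major ones.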

Feedback appreciated :-)

Freeze Archives - file link error

Links from freeze archives do not work in the web browser, nor do file downloads from freeze archives.
The downloaded file is corrupted and 10 KB in size regardless of its original size.

The link shown in the browser uses showfreezefile.py, for example: "https://int.erda.au.dk/wsgi-bin/showfreezefile.py?freeze_id=archive-XXXXX&path=Freeze-test/Sub1/DJI_0267.JPG"

Document firewall and fail2ban host integration

In native migrid deployments we rely on certain firewall rules for port forwarding, protecting against service overload and limiting e.g. password cracking attempts.
In the docker-migrid setup one needs to handle most such configuration on the host running the containers. Yet, there are some log files and configuration files generated in the actual build/deployment in play.
We need to at least document which components are in play and how a minimal such firewall and fail2ban setup can help fortify the site against abuse.

Added to milestone 1 because that is our own migration target, but in practice it really fits any production deployment.

Logging uploads and downloads of files via SFTP/FTPS

We are using MiG-sftp-subsys for ssh access. I cannot see what users are uploading and downloading. I have searched the logs, but it seems that it is not being logged.

I would like it to work as with WebDAV and the web pages, where I can see which user (by email), data size and file name are involved in each upload and download.

It also has to be logged outside the container so that we can retain the file in case of a docker container restart. We would also like to use log parsing so we can build statistics from the activity on ERDA.

When checking for empty sequences

https://github.com/rasmunk/docker-migrid/blob/9b8b092f1eb1d7117088a19fe8e0a48bd740fcd4/mig/shared/pattern.py#L70

As per https://www.python.org/dev/peps/pep-0008/

For sequences, (strings, lists, tuples), use the fact that empty sequences are false.

`
Yes: if not seq:
     if seq:

No: if len(seq):
    if not len(seq):`

You can improve both L70 and L57 with:

`if not self.trigger_paths:
    ....`

Which cases should be tested in CI?

docker-migrid offers a lot of different container versions which can be used: centos7, rocky8, rocky9 and also 3 different environments (prod, dev, dev_gp). We should decide which combinations of those should be tested in CI.
Testing production configs is probably not possible because it requires further DNS configs.

Please write down which aspects are deprecated or which combinations you use, then I can add them to the CI tests.
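
To see how large the full test matrix would be, the combinations can be enumerated as below. This is only a sketch: the names `DISTROS`, `ENVIRONMENTS` and `ci_matrix` are made up here, and excluding the prod environment is just one possible choice given the DNS concern above.

```python
from itertools import product

DISTROS = ["centos7", "rocky8", "rocky9"]
ENVIRONMENTS = ["prod", "dev", "dev_gp"]


def ci_matrix(excluded=()):
    """Return all (distro, environment) pairs except the excluded ones."""
    excluded = set(excluded)
    return [combo for combo in product(DISTROS, ENVIRONMENTS)
            if combo not in excluded]


if __name__ == "__main__":
    # Example: skip every production combination
    skip_prod = [(distro, "prod") for distro in DISTROS]
    for distro, environment in ci_matrix(excluded=skip_prod):
        print(distro, environment)
```

The full matrix has nine entries; dropping the three prod combinations leaves six.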

Match check against pattern attributes

https://github.com/rasmunk/docker-migrid/blob/9b8b092f1eb1d7117088a19fe8e0a48bd740fcd4/mig/shared/workflows.py#L1950

Would the match check not work if you included every attribute you want to check against,
i.e. (input_file, trigger_paths, outputs, recipes, variables)?

`# TODO apply this to pattern as well
# need to still check variables as they might not match exactly
clients_patterns = get_wp_with(
    configuration, client_id=client_id, first=False, owner=client_id,
    trigger_paths=pattern['trigger_paths'], output=pattern['output'],
    vgrids=pattern['vgrids'])

_logger.debug('clients_patterns: ' + str(clients_patterns))
_logger.debug('pattern: ' + str(pattern))

for client_pattern in clients_patterns:
    pattern_matches = True
    try:
        if client_pattern['input_file'] != pattern['input_file']:
            pattern_matches = False
        if client_pattern['trigger_paths'] != pattern['trigger_paths']:
            pattern_matches = False
        if client_pattern['outputs'] != pattern['outputs']:
            pattern_matches = False
        if client_pattern['recipes'] != pattern['recipes']:
            pattern_matches = False
        if client_pattern['variables'] != pattern['variables']:
            pattern_matches = False`
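
One way to express the same chain of comparisons more compactly is to loop over the attribute names. The sketch below assumes patterns are plain dicts as in the snippet above; `MATCH_KEYS` and the standalone `pattern_matches` function are hypothetical names, and treating a missing key as a mismatch is an assumption, since the `except` branch of the original is not shown.

```python
MATCH_KEYS = ("input_file", "trigger_paths", "outputs", "recipes", "variables")


def pattern_matches(client_pattern, pattern, keys=MATCH_KEYS):
    """Return True when both dicts hold equal values for every key in keys."""
    try:
        return all(client_pattern[key] == pattern[key] for key in keys)
    except KeyError:
        # Assumption: a pattern lacking one of the attributes never matches
        return False


if __name__ == "__main__":
    first = {"input_file": "in", "trigger_paths": ["a/*"], "outputs": {},
             "recipes": ["r1"], "variables": {"x": 1}}
    second = dict(first, variables={"x": 2})
    print(pattern_matches(first, dict(first)))
    print(pattern_matches(first, second))
```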

Apache web log should land outside the container

Currently the Apache log for migrid only exists inside the container, in the httpd directory, and is stored in the "old-fashioned" Apache log format.

We would like to have all of this stored in one log under state/log/apache.log (the name is just a suggestion).

The log format should be the same for all log files: an ISO 8601 timestamp, a log type (DEBUG, INFO, WARN, ERROR, CRITICAL, ...) and then the log entry itself. Today the log files do not actually use ISO 8601, so it would be nice if they all used this format.
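
For the Python-side logs, the requested line layout could look like the sketch below. It only illustrates the target format (timestamp, level, message) and does not configure Apache itself; the class name `ISO8601Formatter`, the `LOG_FORMAT` constant and the choice of UTC with millisecond precision are assumptions.

```python
import logging
import time

LOG_FORMAT = "%(asctime)s %(levelname)s %(message)s"


class ISO8601Formatter(logging.Formatter):
    """Render records as '<ISO 8601 UTC timestamp> <LEVEL> <message>'."""

    converter = time.gmtime  # use UTC so the trailing 'Z' is accurate

    def formatTime(self, record, datefmt=None):
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S",
                              self.converter(record.created))
        return "%s.%03dZ" % (stamp, record.msecs)


def build_handler(stream=None):
    """Return a stream handler that emits lines in the format above."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(ISO8601Formatter(LOG_FORMAT))
    return handler


if __name__ == "__main__":
    logger = logging.getLogger("example")
    logger.addHandler(build_handler())
    logger.warning("disk almost full")
```

Note that Python's logging module names the level WARNING rather than WARN, so a shared format would need to settle on one spelling.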

Internal system cache should use fast scratch space

Various internals rely on the mig_system_run folder for storing volatile information like caches, session tracking and status markers.
On production systems we use a fast scratch space in tmpfs for this purpose to speed up operations on those files. The contents are automatically re-generated on use and do not require persistence across restarts, so in-memory storage fits well. Fast flash-based storage could be another option depending on memory and storage availability.

For e.g. the status markers to work between services, however, the storage needs to be shared. Otherwise things like account suspension and expiry will not transparently take effect in all containers.

In docker-migrid a similar fast scratch space should be integrated to improve performance. It cannot be completely automated because it requires the host to provide a suitable location and point the containers to use it.

Troubleshooting section is missing in readthedocs

The troubleshooting section from the docs https://github.com/ucphhpc/docker-migrid/tree/master/doc/source/sections/troubleshooting is not visible in the readthedocs build: https://docker-migrid.readthedocs.io/en/latest/sections/getting-started/index.html
I'm not sure why; maybe because the section lacks an index.rst?
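
A quick way to check that guess is to list the section directories that have no index.rst. This is a small sketch, assuming it is run against a local checkout; the function name `sections_without_index` is made up, and the path in the usage example is taken from the repository URL above.

```python
import os


def sections_without_index(sections_dir):
    """Return the names of subdirectories that contain no index.rst file."""
    missing = []
    for name in sorted(os.listdir(sections_dir)):
        path = os.path.join(sections_dir, name)
        if os.path.isdir(path) and not os.path.isfile(
                os.path.join(path, "index.rst")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run from the root of a docker-migrid checkout
    print(sections_without_index(os.path.join("doc", "source", "sections")))
```

Even if a section does have an index.rst, it would still need to be referenced from a toctree to show up in the build, so an empty result does not rule out other causes.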

Define equality checks of objects via "rich comparison" methods

https://github.com/rasmunk/docker-migrid/blob/9b8b092f1eb1d7117088a19fe8e0a48bd740fcd4/mig/shared/pattern.py#L16

When you want to do an equality check between instances of the same object type, you should implement it via the so-called "rich comparison" methods as per https://docs.python.org/2/reference/datamodel.html#object.__eq__

An example can be seen here https://devinpractice.com/2016/11/29/python-objects-comparison/

This then allows for direct comparison checks, e.g.:

(self.__eq__)
pattern1 == pattern2

(self.__ne__)
pattern1 != pattern2
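
A minimal sketch of what this could look like follows. The `Pattern` class here is a simplified stand-in, not the real class in mig/shared/pattern.py, and its attributes are chosen for illustration only. `__ne__` is defined explicitly because Python 2 does not derive it from `__eq__`.

```python
class Pattern(object):
    """Simplified stand-in for a workflow pattern (attributes are illustrative)."""

    def __init__(self, name, input_file, trigger_paths):
        self.name = name
        self.input_file = input_file
        self.trigger_paths = trigger_paths

    def __eq__(self, other):
        if not isinstance(other, Pattern):
            return NotImplemented  # let Python try the other operand
        return self.__dict__ == other.__dict__

    def __ne__(self, other):
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    # Instances are mutable and compared by value, so make them unhashable
    __hash__ = None


if __name__ == "__main__":
    pattern1 = Pattern("p", "infile", ["data/*"])
    pattern2 = Pattern("p", "infile", ["data/*"])
    print(pattern1 == pattern2)
    print(pattern1 != pattern2)
```

Returning `NotImplemented` for foreign types keeps comparisons against unrelated objects well-defined instead of raising an AttributeError.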

Bug: Cannot load modules/mod_wsgi_python3.so

I experience a bug with the current version:

When I clone the repo and run make I get the error below. I had a look inside the container: the file is missing, only mod_wsgi.so exists.

I also had the problem the other way around, but I'm not sure how this is triggered. I guess this has something to do with PREFER_PYTHON3 and WITH_PY3, which are both set to False in the default example.

Any ideas how to fix this permanently?

migrid  | INFO: Enforcing timezone Europe/Copenhagen (/usr/share/zoneinfo/Europe/Copenhagen)
migrid  | Creating or renewing user: [email protected]
migrid  | Created or updated /C=DK/ST=NA/L=NA/O=Test Org/OU=NA/CN=Test User/[email protected] in user database and in file system
migrid  | Ensure correct permissions for [email protected]
migrid  | chown: changing ownership of ‘/home/mig/state/vgrid_files_readonly’: Read-only file system
migrid  | Add sftp password login for [email protected]
migrid  | Traceback (most recent call last):
migrid  |   File "/home/mig/mig/cgi-bin/fakecgi.py", line 87, in <module>
migrid  |     client_id, csrf_limit)
migrid  |   File "/home/mig/mig/shared/pwhash.py", line 532, in make_csrf_token
migrid  |     xor_id = "%s" % (int(salt, 16) ^ int(b16encode(force_utf8(merged)), 16))
migrid  | ValueError: invalid literal for int() with base 16: ''
migrid  | Failed to set sftp password login
migrid  | Add ftps password login for [email protected]
migrid  | Traceback (most recent call last):
migrid  |   File "/home/mig/mig/cgi-bin/fakecgi.py", line 87, in <module>
migrid  |     client_id, csrf_limit)
migrid  |   File "/home/mig/mig/shared/pwhash.py", line 532, in make_csrf_token
migrid  |     xor_id = "%s" % (int(salt, 16) ^ int(b16encode(force_utf8(merged)), 16))
migrid  | ValueError: invalid literal for int() with base 16: ''
migrid  | Failed to set ftps password login
migrid  | Add webdavs password login for [email protected]
migrid  | Traceback (most recent call last):
migrid  |   File "/home/mig/mig/cgi-bin/fakecgi.py", line 87, in <module>
migrid  |     client_id, csrf_limit)
migrid  |   File "/home/mig/mig/shared/pwhash.py", line 532, in make_csrf_token
migrid  |     xor_id = "%s" % (int(salt, 16) ^ int(b16encode(force_utf8(merged)), 16))
migrid  | ValueError: invalid literal for int() with base 16: ''
migrid  | Failed to set webdavs password login
migrid  | Run services: httpd script monitor sshmux events cron transfers imnotify vmproxy rsyslogd
migrid  | httpd: Syntax error on line 50 of /etc/httpd/conf/httpd.conf: Cannot load modules/mod_wsgi_python3.so into server: /etc/httpd/modules/mod_wsgi_python3.so: cannot open shared object file: No such file or directory
migrid  | Failed to start httpd: 1
migrid exited with code 1
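
The repeated ValueError in the log above is what Python raises when `int(..., 16)` is given an empty string. The sketch below only reproduces that message; which of the two operands in pwhash.py was empty cannot be told from the log, and the helper names `hex_to_int` and `checked_hex_to_int` are hypothetical.

```python
def hex_to_int(text):
    """Convert a hex string to an int, as the failing line does."""
    return int(text, 16)


def checked_hex_to_int(text, name="value"):
    """Same conversion, but fail with a more descriptive message when empty."""
    if not text:
        raise ValueError("%s is empty, expected a hex string" % name)
    return int(text, 16)


if __name__ == "__main__":
    try:
        hex_to_int("")
    except ValueError as err:
        print(err)
```

An explicit check like the second helper would at least point at the empty input, for example an unset salt, instead of at the conversion itself.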

Expose gdp.log in state/log/ on sensitive data sites

On sensitive data sites (enable_gdp) we have an extra gdp.log, which keeps track of all user operations.
We have generally configured it to use (r)syslog and write logged entries in /var/log/mig/gdp.log.
Since containers expose /var/log/ under log/CONTAINER/ on the host it is already available on the host in log/CONTAINER/mig/gdp.log for each container on such GDP-sites.
However, we received a request for moving the log into state/log/ along with the other migrid logs, and it should be possible to adjust the docker-migrid rsyslog conf to write there instead.

Rsyslog config error

With the current master branch the container complains that rsyslog.conf cannot be parsed.

~/W/k/docker-migrid (master)> docker compose logs migrid-webdavs
migrid-webdavs  | INFO: Enforcing timezone Europe/Copenhagen (/usr/share/zoneinfo/Europe/Copenhagen)
migrid-webdavs  | Setting MiG state cleanup
migrid-webdavs  | Enabling IO session cleanup for: davs
migrid-webdavs  | Run services: webdavs rsyslogd
migrid-webdavs  | /home/mig/mig/shared/pwcrypto.py:55: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
migrid-webdavs  |   import cryptography
[  OK  ]ebdavs  | Starting MiG webdavs daemon: grid_webdavs.py[  OK  ]
migrid-webdavs  |
migrid-webdavs  | rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 13: warnings occured in file '/etc/rsyslog.conf' around line 13 [v8.2102.0-15.el8 try https://www.rsyslog.com/e/2207 ]
migrid-webdavs  | rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 13: invalid character '=' - is there an invalid escape sequence somewhere? [v8.2102.0-15.el8 try https://www.rsyslog.com/e/2207 ]
migrid-webdavs  | rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 13: invalid character '"' - is there an invalid escape sequence somewhere? [v8.2102.0-15.el8 try https://www.rsyslog.com/e/2207 ]
migrid-webdavs  | rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 13: warnings occured in file '/etc/rsyslog.conf' around line 13 [v8.2102.0-15.el8 try https://www.rsyslog.com/e/2207 ]
migrid-webdavs  | rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 13: invalid character '"' - is there an invalid escape sequence somewhere? [v8.2102.0-15.el8 try https://www.rsyslog.com/e/2207 ]

Is this a known problem?

Std. jobs from generateconfs in cron

The standard jobs from generateconfs.py, i.e. migstateclean, should be implemented in the containers.

This is for general cleanup jobs in MiGrid.

Build error on v1.0.5 with rocky8

I get a build error with the latest version, v1.0.5, when I build with rocky8, development.env and docker-compose_development.yml:

 => [migrid-volume-init install_mig 9/9] RUN cp generated-confs/MiGserver.conf /home/mig/mig/server/     && cp generated-confs/static-skin.css /home/mig/mig/images/     && cp generated-confs/index.html /home  0.2s
 => [migrid-volume-init setup_mig_configs  1/25] RUN cd /home/mig/mig     && python shared/httpsclient.py | grep -A 80 "xml version"     > /home/mig/state/wwwpublic/oiddiscover.xml                             0.5s
 => [migrid-volume-init setup_mig_configs  2/25] RUN cp generated-confs/sshd_config-MiG-sftp-subsys /etc/ssh/     && chown 0:0 /etc/ssh/sshd_config-MiG-sftp-subsys                                              0.2s
 => ERROR [migrid-volume-init setup_mig_configs  3/25] RUN cd /home/mig/mig/src/libpam-mig     && make && make install                                                                                           1.3s
------
 > [migrid-volume-init setup_mig_configs  3/25] RUN cd /home/mig/mig/src/libpam-mig     && make && make install:
0.330 /home/mig/mig/shared/pwcrypto.py:55: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
0.330   import cryptography
0.523 /home/mig/mig/shared/pwcrypto.py:55: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
0.523   import cryptography
0.728 /home/mig/mig/shared/pwcrypto.py:55: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
0.728   import cryptography
0.832 which: no python-config in (/home/mig/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
0.843 which: no python3-config in (/home/mig/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
0.876 which: no python-config in (/home/mig/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
0.886 which: no python3-config in (/home/mig/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
0.901 /bin/sh: --: invalid option
0.901 Usage:    /bin/sh [GNU long option] [option] ...
0.901   /bin/sh [GNU long option] [option] script-file ...
0.901 GNU long options:
0.901   --debug
0.901   --debugger
0.901   --dump-po-strings
0.901   --dump-strings
0.901   --help
0.901   --init-file
0.901   --login
0.901   --noediting
0.901   --noprofile
0.901   --norc
0.901   --posix
0.901   --rcfile
0.901   --rpm-requires
0.901   --restricted
0.901   --verbose
0.901   --version
0.901 Shell options:
0.901   -ilrsD or -c command or -O shopt_option         (invocation only)
0.901   -abefhkmnptuvxBCHP or -o option
0.979 which: no python-config in (/home/mig/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
0.990 which: no python3-config in (/home/mig/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
0.992 make: --cflags: Command not found
1.025 which: no python-config in (/home/mig/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
1.036 which: no python3-config in (/home/mig/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
1.068 which: no python-config in (/home/mig/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
1.079 which: no python3-config in (/home/mig/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
1.093 /bin/sh: --: invalid option
1.093 Usage:    /bin/sh [GNU long option] [option] ...
1.093   /bin/sh [GNU long option] [option] script-file ...
1.093 GNU long options:
1.093   --debug
1.093   --debugger
1.093   --dump-po-strings
1.093   --dump-strings
1.093   --help
1.093   --init-file
1.093   --login
1.093   --noediting
1.093   --noprofile
1.093   --norc
1.093   --posix
1.093   --rcfile
1.093   --rpm-requires
1.093   --restricted
1.093   --verbose
1.093   --version
1.093 Shell options:
1.093   -ilrsD or -c command or -O shopt_option         (invocation only)
1.093   -abefhkmnptuvxBCHP or -o option
1.095 gcc -std=gnu99 -fPIC   -D'MIG_UID=1000' -D'MIG_GID=1000' -D'RATE_LIMIT_EXPIRE_DELAY=300' -D'JOBSIDMOUNT_HOME="/home/mig/state/webserver_home/"' -D'JOBSIDMOUNT_LENGTH=64' -D'JUPYTERSIDMOUNT_HOME="/home/mig/state/sessid_to_jupyter_mount_link_home/"' -D'JUPYTERSIDMOUNT_LENGTH=64' -D'PASSWORD_MIN_LENGTH=8' -D'PASSWORD_MIN_CLASSES=3' -D'SHARELINK_HOME="/home/mig/state/sharelink_home/"' -D'SHARELINK_LENGTH=10' -D'USERNAME_REGEX="^[a-zA-Z0-9][a-zA-Z0-9.@_-]{0,127}$"' -D'LIBPYTHON=""'  -Wall -Wpedantic  -o libpam_mig.so libpam_mig.c -shared
1.241 In file included from libpam_mig.c:96:
1.241 migauthhandler.c:39:10: fatal error: Python.h: No such file or directory
1.241  #include <Python.h>
1.241           ^~~~~~~~~~
1.241 compilation terminated.
1.245 make: *** [Makefile:92: libpam_mig.so] Error 1
------
failed to solve: process "/bin/sh -c cd $MIG_ROOT/mig/src/libpam-mig     && make && make install" did not complete successfully: exit code: 2
make: *** [dockerbuild] Error 17

I'm not sure where exactly to look for the root of this problem. I guess it's an error related to Python versions?

Also I guess the CI didn't find it because the default is centos7, I opened #51 to address this issue.

Take out volume mount for mig/ directory in production

As discussed earlier, I'd like to suggest that we remove the volume mount for the mig/ application folder from the production compose file.

The current situation is: during the first start, the migrid Docker container populates the mig/ directory with the version of the application that is built into the container.
If the container is rebuilt with a newer version and the whole setup is started, the mig/ folder inside the container gets overmounted by the already existing volume. Thus we end up with a running Docker container tagged as a newer version which effectively runs the old version.

This is acceptable behavior for a development environment where a user wants to change the application on-the-fly and see the results.

For a production environment a user should be able to see the running version by looking at the image tag of the containers.

Current solutions to avoid the version mismatch are running make clean, or even running make distclean and redeploying everything.
Both are meant to be used manually and cause downtime. The latter is also very slow.

Without the mig/ volume it would be possible to rebuild the container with a new version and then just tell docker-compose to recreate the container. This is fast and intuitive and does not need extra steps in the automation (e.g. Ansible).

So my suggestion is to remove the mig volume from all containers in the production template docker-compose file:

    volumes:
...
      - type: volume
        source: mig
        target: /home/mig/mig

Any thoughts and suggestions appreciated.

Github Actions don't run on all PRs

For some reason, there are PRs like #43 which did not trigger the CI test.

From https://github.com/ucphhpc/docker-migrid/blob/master/.github/workflows/ci.yml it looks like they should run on all opened PRs to me.

We should in general ensure that all merged commits have successfully completed those basic tests.

Identify and document internal and external container dependencies

The production setup mostly splits the stack into separate service containers. The main migrid container still runs multiple services due to service dependencies and in practice keeps the dependencies internal to the container.
The other containers are mostly independent as long as they share the user data file system(s) and a few additional state folders like mig_system_run.
Some services and facilities like the auth notifications additionally rely on "named pipes" in the file system for inter-process communication.
All such internal and external dependencies must be identified and at least documented. Providing suggestions on how they can be managed in a distributed setup is a next logical step and outlining or supporting a distributed production setup would be the final goal. This could involve docker swarm mode or manual distribution of containers.

The future of the single profile

As already discussed with Jonas, the single profile in the docker-compose example files is not used in any production setup right now (please correct me if I'm wrong).

I'd like to remove the existing examples and documentation for it to reduce complexity and make room for new ones :-)

Any objections?

Question about usage of named Docker volumes

The current usage of the Docker volumes is:

Data is stored in directories inside the repo (state, mig, certs, etc.) and not within Docker's own structure (/var/lib/docker/volumes)
These directories might get removed by make targets like clean or distclean
Named Docker volumes are created via docker compose, with the above mentioned directories as targets
Those named volumes are then assigned to containers
The named volumes are never cleaned up by any make targets

In some corner cases this leads to unwanted behaviour:

If the target path of a volume should change, the volume is not recreated automatically, thus after rebuilding everything the containers run with the old volume
Same happens if the path of the docker-migrid repository changes
In both cases the wrong path is not visible in the docker-migrid repo, only if one inspects the existing volumes
Deleting the named volumes does not delete the data, unlike one might expect

I would like to discuss the option of completely removing the named volumes and setting the bind mounts only in the volumes list of each container. This would result in

no more hidden leftovers after make distclean
no more manual cleanup necessary if directories are changed
possibility to see what is mounted where in the compose section of each container
removal of 400 lines in docker-migrid, resulting in better readability

logrotate inside docker

Logrotate should not be running inside the docker container and meddling with files that are outside the container itself.

We would like an option so that, if you want logrotate to be active inside the container, you can set it in the env file, for example:

ENABLE_LOGROTATE=true

Default should be that logrotate is not enabled.
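
A sketch of how a container startup helper could honour such a flag is shown below. It is not part of docker-migrid; the function names and the accepted spellings in `TRUE_VALUES` are assumptions, while ENABLE_LOGROTATE and the cron path come from this issue.

```python
import os

TRUE_VALUES = ("1", "true", "yes", "on")


def logrotate_enabled(environ=None):
    """Return True only when ENABLE_LOGROTATE is explicitly switched on."""
    environ = os.environ if environ is None else environ
    value = environ.get("ENABLE_LOGROTATE", "false")
    return value.strip().lower() in TRUE_VALUES


def disable_logrotate(cron_path="/etc/cron.daily/logrotate", environ=None):
    """Remove the daily logrotate cron entry unless it is enabled.

    Returns True when the file was removed, False otherwise.
    """
    if logrotate_enabled(environ):
        return False
    if os.path.exists(cron_path):
        os.remove(cron_path)
        return True
    return False


if __name__ == "__main__":
    print(logrotate_enabled({"ENABLE_LOGROTATE": "true"}))
    print(logrotate_enabled({}))
```

Because an unset variable counts as disabled, the default matches the behaviour requested above.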

A workaround for now is to execute the following every time you start migrid:

docker exec -it migrid /bin/bash -c "rm /etc/cron.daily/logrotate"

define mail smtp_sender

Mails sent from migrid have a fixed sender. We would like to be able to define the sender.
Please expose the smtp_sender config option in the env file.