Hi Guys
I have been trying to run the Docker image on Google Container Engine and I have had a bunch of problems.
- It seems impossible to run the stock image on GCE (and I presume any Kubernetes-powered environment) when you attach long-term storage to it. An example on GCE would be starting a container like this:
    - image: arangodb/arangodb:2.8.9
      name: arango
      env:
        - name: ARANGO_ROOT_PASSWORD
          value: "some-pass"
      ports:
        - containerPort: 8529
          name: arango
      volumeMounts:
        - name: arango-persistent-storage
          mountPath: /var/lib/arangodb
with the volumes defined as
    volumes:
      - name: arango-persistent-storage
        gcePersistentDisk:
          pdName: arango-dev-disk
          fsType: ext4
Trying to start a container like this fails with permission errors - it breaks on line 22 of docker-entrypoint.sh.
I believe this happens because the Dockerfile sets some permissions on /var/lib/arangodb at build time, which are then completely ignored when the Kubernetes runtime mounts a new directory on top of that path at runtime.
The problem is made worse by the forced user in the Dockerfile (arangodb) - as a result, docker-entrypoint.sh is no longer running as root, so it cannot change the ownership of the mounted directory and fix the issue at run-time.
I have previously cited the Postgres Dockerfile as a great example of database Dockerfiles - and I will do so again now.
In order to fix this and get it deployed to GCE, I had to create my own Dockerfile as follows (note, the only reason I install gosu is to switch back to the arangodb user at run-time; it is possible to run this without all that and just run the DB as root, which I am sure is bad):
Dockerfile.zip
and my own docker-entrypoint.sh as follows - ignore lines 66-94, that's there to implement initial data population as per this other issue here and point 3 below.
entry.sh.zip
This ends up with a container running something like this:
    root@backend-deployment-2274667476-83t1w:/# ps aux
    USER       PID %CPU %MEM     VSZ    RSS TTY  STAT START TIME COMMAND
    root         1  0.0  0.0    4336    728 ?    Ss   22:02 0:00 /bin/sh -c /opt
    arangodb     7  3.2  7.1 3531816 271028 ?    Sl   22:02 0:03 arangod --datab
    root        11  0.0  0.0       0      0 ?    Z    22:02 0:00 [entry.sh] <def
    root        44  0.0  0.0   20236   3232 ?    Ss   22:03 0:00 bash
    root        53  0.0  0.0   17500   2096 ?    R+   22:04 0:00 ps aux
Now - I am not saying this is the perfect entrypoint; I have just shoved the gosu command in front of all of the calls to arangod to make sure the user is correct, but it runs!
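For illustration, here is a minimal sketch of that approach (this is not the attached entry.sh itself). It assumes the image no longer sets a forced user, so the script starts as root, and that gosu is installed. The function names and the DATA_DIR / RUN_AS variables are made up for this example.

```shell
#!/bin/sh
# Sketch only: repair ownership of a mounted volume at start-up, then drop
# privileges. DATA_DIR and RUN_AS are hypothetical variables, not something
# the stock image reads.

# Make sure the data directory exists and belongs to the service user.
# chown to another user only works as root, so it is skipped otherwise.
prepare_dir() {
    dir="$1"; owner="$2"
    mkdir -p "$dir" || return 1
    if [ "$(id -u)" = "0" ]; then
        chown -R "$owner" "$dir"
    fi
}

# Print the prefix used to launch the server: "gosu <user>" when running
# as root, nothing when the container already runs unprivileged.
launch_prefix() {
    if [ "$(id -u)" = "0" ]; then
        echo "gosu $1"
    fi
}

# The end of a real entrypoint would then look something like:
#   prepare_dir "${DATA_DIR:-/var/lib/arangodb}" "${RUN_AS:-arangodb}"
#   exec $(launch_prefix "${RUN_AS:-arangodb}") arangod "$@"
```

The point of doing the chown at run-time rather than in the Dockerfile is that it runs after Kubernetes has mounted the volume, so it applies to the directory that is actually used.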
- Because we can't specify the core database storage location (or the Foxx app location) via environment variables, and the default locations for each are in /var/lib, we come across the following problems:
  - There is no way to mount a volume and not use its root: we always have to mount a volume as /var/lib/arangodb or /var/lib/arangodb-apps, so we always have to mount at least two volumes. Instead we could mount one, simplify backups etc.
  - If we want to share our storage volume (for instance, on my live-like dev environment in GCE I'm completely happy to have Postgres, Arango, and anything else dump stuff into a shared volume), Arango will always write to the root for both Foxx and data - unlike (sorry) Postgres, which lets you specify a data directory, so we can make sure this is not the root of the mounted volume.
- An absolute must for me is the ability to specify some database initialisation state / data - in the case of ArangoDB I think that is probably arangodump output - and for that to only be loaded if there is no existing database in the data store (specified by the variable discussed in point 2).
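To make points 2 and 3 concrete, here is a rough sketch of what the entrypoint could do. The variable names ARANGO_DATA_DIR, ARANGO_APPS_DIR and ARANGO_INIT_DUMP_DIR are invented for this example and are not read by the stock image. The sketch assumes arangod takes its locations from --database.directory and --javascript.app-path.

```shell
#!/bin/sh
# Sketch only. ARANGO_DATA_DIR, ARANGO_APPS_DIR and ARANGO_INIT_DUMP_DIR are
# hypothetical variable names; the stock image does not read them.

# Build the directory flags for arangod from the environment, falling back
# to the current default locations when a variable is unset.
arangod_dir_args() {
    data_dir="${ARANGO_DATA_DIR:-/var/lib/arangodb}"
    apps_dir="${ARANGO_APPS_DIR:-/var/lib/arangodb-apps}"
    echo "--database.directory $data_dir --javascript.app-path $apps_dir"
}

# Succeed when the directory is missing, or holds nothing except the
# lost+found entry that a freshly formatted ext4 disk has at its root.
needs_init() {
    dir="$1"
    [ -d "$dir" ] || return 0
    [ -z "$(ls -A "$dir" | grep -v '^lost+found$')" ]
}

# In an entrypoint this could be used along these lines:
#   if needs_init "${ARANGO_DATA_DIR:-/var/lib/arangodb}"; then
#       start the server, then load the dump, for example with
#       arangorestore --input-directory "$ARANGO_INIT_DUMP_DIR"
#   fi
```

With something like this, a single shared volume mounted at, say, /mnt/shared could hold /mnt/shared/arango/data and /mnt/shared/arango/apps, and the initial data would only be loaded the first time the container starts against an empty data directory.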
TL;DR - I have spent more time reworking the Dockerfile and startup for Arango than is reasonable given the promise of Docker. I am bogged down with configuration issues, which should not be the case if this Docker stuff was reworked slightly.
It appears to really struggle on Kubernetes, yet I have had no problems with the Postgres Dockerfile at all on any platform. I know I keep banging on about Postgres, but it just works, and I can see why someone new to Arango might throw it away as a result - which is super unhappy times!
P.S. I still love Arango :)