
docker-scripts's People

Contributors

andreschumacher, inthecloud247, lbustelo, mrphilroth, soulmachine, xuefengwu

docker-scripts's Issues

Unable to start spark-master:0.9.0

Hi all,

I'm trying to deploy a spark cluster 0.9.0 but the script is blocked at the stage waiting for master.

The logs from the spark-master are as follows:

SPARK_HOME=/opt/spark-0.9.0
HOSTNAME=master
SCALA_VERSION=2.10.3
PATH=/opt/spark-0.9.0:/opt/scala-2.10.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
SPARK_VERSION=0.9.0
PWD=/
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
SHLVL=1
HOME=/
SCALA_HOME=/opt/scala-2.10.3
container=lxc
_=/usr/bin/env
MASTER_IP=172.17.0.6
preparing Spark
starting Hadoop Namenode
starting sshd
starting Spark Master

And docker ps -a shows:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8c19bdb67f2b amplab/spark-master:0.9.0 /root/spark_master_f 4 minutes ago Exit 1 boring_wright

Is the Exit 1 correct?

Cannot get a simple test Java app running in OSX/Vagrant environment

I have a Vagrant setup running the docker scripts using docker 0.9. I also have a simple maven project that tries to replicate your Shell example. I keep getting failures on the submission.

Java Main is:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkMain {

    protected static String master = "spark://master:7077"; // change to your master URL
    protected static String sparkHome = "/opt/spark-0.9.0";

    public static void main(String[] args) {

        JavaSparkContext sc = new JavaSparkContext(master, "Test App",
                sparkHome, JavaSparkContext.jarOfClass(SparkMain.class));

        JavaRDD<String> file = sc.textFile("hdfs://master:9000/user/hdfs/test.txt");
        //JavaRDD<String> file = sc.textFile("README.md");
        System.out.println(file.count());
        sc.stop();
    }
}

When running the test with "README.md", I see an error that it cannot find "/vagrant/README.md". In that case I don't understand why Spark thinks the file is relative to the Vagrant VM and not the Docker containers.
When I use the hdfs url, then I just get a lot of these:

14/05/09 00:05:50 INFO scheduler.DAGScheduler: Missing parents: List()
14/05/09 00:05:50 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at textFile at SparkMain.java:19), which has no missing parents
14/05/09 00:05:50 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at textFile at SparkMain.java:19)
14/05/09 00:05:50 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/05/09 00:05:55 INFO client.AppClient$ClientActor: Executor updated: app-20140509000548-0012/0 is now FAILED (Command exited with code 1)
14/05/09 00:05:55 INFO cluster.SparkDeploySchedulerBackend: Executor app-20140509000548-0012/0 removed: Command exited with code 1
14/05/09 00:05:55 INFO client.AppClient$ClientActor: Executor added: app-20140509000548-0012/3 on worker-20140508215925-worker3-43556 (worker3:43556) with 1 cores
14/05/09 00:05:55 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140509000548-0012/3 on hostPort worker3:43556 with 1 cores, 512.0 MB RAM

I've tried several things:

  1. Have the nameserver address in the resolv.conf file
  2. Tried creating an hdfs user in the vagrant vm and running mvn exec under that user (did not work)
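
On the "README.md" failure above: a bare relative path is absolutized against the driver's working directory (hence "/vagrant/README.md") and then each executor tries to open that local path inside its own worker container, so the file would have to exist at the same path on every worker. A common workaround is to load the file into HDFS from the master container and reference it by its hdfs:// URL. A minimal sketch, assuming the images ship the standard Hadoop CLI and that the file has been copied to /opt/test.txt in the master container:

# inside the master container; the source path is an assumption for illustration
hadoop fs -mkdir -p /user/hdfs
hadoop fs -put /opt/test.txt hdfs://master:9000/user/hdfs/test.txt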

really simple doc issue

The 1.0.0 boot2docker vm doesn't have bash installed! So one must use tce-ab to install bash.
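
For reference, boot2docker is built on Tiny Core Linux, so bash can be installed either interactively through the tce-ab application browser or non-interactively; a minimal sketch, assuming the extension is simply named bash:

# download (-w) and install (-i) the bash extension on the boot2docker vm
tce-load -wi bash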

Docker worker install runs forever 'waiting for workers to register ............................'

This issue has been previously raised and closed by dbaba
#39

But the suggested solution is not working for me.

Following the example explained in https://amplab.cs.berkeley.edu/2013/10/23/got-a-minute-spin-up-a-spark-cluster-on-your-laptop-with-docker/

So far I've tried:

sudo ./docker-scripts/deploy/deploy.sh -i amplab/spark:1.0.0 -c
sudo ./docker-scripts/deploy/deploy.sh -i amplab/spark:0.8.0 -c
sudo ./docker-scripts/deploy/deploy.sh -i amplab/spark:1.0.0
sudo ./docker-scripts/deploy/deploy.sh -i amplab/spark:0.8.0
sudo ./docker-scripts/deploy/deploy.sh -i amplab/spark:1.0.0 -w 3

My output is

[vagrant@docker ~]$ sudo ./docker-scripts/deploy/deploy.sh -i amplab/spark:1.0.0 -w 3
*** Starting Spark 1.0.0 ***
starting nameserver container
started nameserver container: ee09901077c4c1a61e3cdb4d79b30060de777abae4c9fd0580b2176a2aa4f58a
DNS host->IP file mapped: /tmp/dnsdir_28254/0hosts
NAMESERVER_IP: 172.17.0.20
waiting for nameserver to come up
starting master container
started master container: 723fafc1b39de3fe6a28130968b4d65e8adf78ea131176621203f362a689ab8e
MASTER_IP: 172.17.0.21
waiting for master ................
waiting for nameserver to find master
starting worker container
started worker container: 3a061e3091fe435f18673c29c586e58e024011620f7864fd55191cbad23001d5
starting worker container
started worker container: 1c2be21cc224942f4d26efacc9b0d09ed900affaee1bdafff0bf9a2ddf54b4c3
starting worker container
started worker container: 65bc96a8777e1e68dcece7bf73dfcf09b4a2d64bb2372a9ce0916ed99fc52692
waiting for workers to register ................................................................ (the dots continue indefinitely)

Is there a way to check the installation log of these containers?
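
The stdout/stderr of each container can be inspected with docker logs, using the container IDs that deploy.sh printed; a minimal sketch:

# inspect a worker that never registered (ID prefix from the output above)
sudo docker logs 3a061e3091fe
# a container that died shows its exit status here
sudo docker ps -a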

Path does not resolve to spark

I noted that in the spark-base Dockerfile, PATH is set as follows:
ENV PATH $SPARK_HOME:$SCALA_HOME/bin:$PATH

I doubt it is harmful (the file has always been like this), but it seems the /bin is missing after $SPARK_HOME.
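
For reference, the corrected line would presumably read:

ENV PATH $SPARK_HOME/bin:$SCALA_HOME/bin:$PATH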

waiting for nameserver forever

Getting this message, and it runs forever...

sudo ./deploy/deploy.sh -i amplab/spark:0.9.0 
*** Starting Spark 0.9.0 ***
starting nameserver container
WARNING: WARNING: Local (127.0.0.1) DNS resolver found in resolv.conf and containers can't use it. Using default external servers : [8.8.8.8 8.8.4.4]
started nameserver container:  4a6ba6682fc59b1ea99fc82644c16fd8c6b5aeffa158b3143076e74422640564
DNS host->IP file mapped:      /tmp/dnsdir_17059/0hosts
NAMESERVER_IP:                 172.17.0.5
waiting for nameserver to come up ............
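
One way to narrow this down is to query the dnsmasq container directly from the host; if it is up, the query returns a response (possibly an empty one) instead of timing out. A minimal sketch, assuming dig is installed on the host:

dig @172.17.0.5 master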

Running application on the cluster

I'm a Spark and Docker noob, and this is actually a question, not an issue.

I followed your instructions and was able to setup the cluster and run the example. This is what I see as my cluster status -

vagrant@packer-virtualbox-iso:/vagrant/sparkling$ sudo docker ps
CONTAINER ID        IMAGE                           COMMAND                CREATED             STATUS              PORTS                NAMES
8f5d44eefa65        amplab/spark-worker:0.9.0       /root/spark_worker_f   About an hour ago   Up About an hour    8888/tcp             prickly_lumiere     
33c48ef9d17e        amplab/spark-worker:0.9.0       /root/spark_worker_f   About an hour ago   Up About an hour    8888/tcp             stoic_feynman       
d91e47ed0b90        amplab/spark-worker:0.9.0       /root/spark_worker_f   About an hour ago   Up About an hour    8888/tcp             ecstatic_babbage    
e173ecd4f4c0        amplab/spark-master:0.9.0       /root/spark_master_f   About an hour ago   Up About an hour    7077/tcp, 8080/tcp   berserk_nobel       
d67f979d70fe        amplab/dnsmasq-precise:latest   /root/dnsmasq_files/   About an hour ago   Up About an hour                      

I have written a Spark program for Linear Regression which runs perfectly in local mode. It is a very small program, available on GitHub here.

Now, I want to run this program on my spark cluster. The instructions in the Spark programming guide leave me scratching my head about what to do next. I'd like your help with the right way to run the application:

  1. I get the scala prompt when I do the docker attach. Should I run my application from this prompt?
  2. I have a Vagrant setup on which I am running docker. On my vagrant ubuntu box I have the application code, which I compile and assemble using sbt. Can I somehow deploy the application after assembly from sbt to the cluster?

If this has been explained elsewhere, please point me to it, as I could not find any example of how to run an application program on a spark cluster.

Thank you very much.
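
For Spark 0.9.x, which predates spark-submit (introduced in 1.0), a common pattern is to build an assembly jar and run the driver directly, with the master URL (spark://master:7077) set in the SparkContext constructor and the assembly jar passed to it so it is shipped to the executors. A minimal sketch, where the jar path and main class are assumptions about your project:

# on the vagrant box; it must be able to resolve the cluster hostnames
# (i.e. have the dnsmasq container's address in /etc/resolv.conf)
sbt assembly
java -cp target/scala-2.10/myapp-assembly-0.1.jar com.example.LinearRegression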

spark-worker default_cmd Script not Working as Expected

I am using the spark 1.0.0 docker images. It appears to me that the default_cmd script in spark-worker is not working as it should. This script calls prepare_spark $1 from /root/spark_files/configure_spark.sh. I have debugged it a lot; I have even called configure_spark.sh from the spark-base image using docker run -it.
The problem is that these scripts do not replace the __MASTER__ tag in core-site.xml in /root/hadoop_files/ with the argument provided. Instead, the worker expects the master's hostname to be literally master; that is, it is static.
Please, can someone help me out with this, as I need it to create clusters across different machines? If I cannot specify the master like this, the worker nodes will not know about the master. It works on a single machine, but only because I have installed the docker-dns service.
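
For what it's worth, the substitution prepare_spark is evidently meant to perform amounts to something like the following (a sketch of the intended behavior, not the script's exact code):

# replace the __MASTER__ placeholder with the hostname passed as $1
sed -i "s/__MASTER__/$1/g" /root/hadoop_files/core-site.xml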

Couldn't start spark 1.0.0

I repeatedly get the following error:

started nameserver container: 23fbb2b99f1a3de88ca310ab992f9ec93eb2fe201860509bcc98324e43532535
DNS host->IP file mapped: /tmp/dnsdir_16657/0hosts
NAMESERVER_IP:
waiting for nameserver to come up Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
dig: couldn't get address for '': not found
.Usage: grep [OPTION]... PATTERN [FILE]...
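
The grep and dig usage errors suggest NAMESERVER_IP came back empty, so the wait loop is invoking dig with an empty server address. The deploy script derives that IP from the container metadata, conceptually something like the sketch below (not the script's exact line); if it prints nothing, the nameserver container died at startup and its logs are the place to look:

NAMESERVER_IP=$(sudo docker inspect -f '{{.NetworkSettings.IPAddress}}' "$NAMESERVER_ID")
sudo docker logs "$NAMESERVER_ID"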

dnsmasq is no longer necessary?

It looks like this project was started prior to the release of docker's host networking feature.
The use of dnsmasq here is for reverse DNS (since the container and host don't share the same hostname), right? Now that one can do docker run --net host -P .... to make the container and host share the same network interface, it's no longer necessary to have a custom reverse DNS solution. This approach does work: I've gotten Spark and HDFS running in docker 1.1.2 without custom DNS. You might consider removing the dnsmasq dependency, or perhaps clarifying to users which docker versions require custom DNS.

Thanks for posting these scripts, though! They've had significant educational value 8)
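
For reference, the host-networking mode described above looks roughly like this (a minimal sketch; the image tag is an assumption):

# the container shares the host's network stack, so -P port publishing
# becomes unnecessary and hostnames resolve as they do on the host
docker run --net host -d amplab/spark-master:1.0.0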

Starting worker container never completed on boot2docker (Docker v1.0.0)

I was trying to run Spark 1.0.0 on docker with this project's scripts.
After installing python and bash with tce-ab, I ran the following command on the boot2docker vm:

sudo ./deploy/deploy.sh -i amplab/spark:1.0.0 -c

then the command never finished.
Did I miss something?

The terminal showed the message waiting for workers to register followed by dots.
The command didn't seem to be stuck, as the dots kept increasing, but it never finished over a couple of hours.

Here is the entire log until ctrl + c:

*** Starting Spark 1.0.0 ***
starting nameserver container
Unable to find image 'amplab/dnsmasq-precise' locally
Pulling repository amplab/dnsmasq-precise

started nameserver container:  8a6c93484ff992538fb7d706cf7d348477920a90f41a6ce7068120c2afb4d04f
DNS host->IP file mapped:      /tmp/dnsdir_25925/0hosts
NAMESERVER_IP:                 172.17.0.2
waiting for nameserver to come up 
starting master container
Unable to find image 'amplab/spark-master:1.0.0' locally
Pulling repository amplab/spark-master

started master container:      009ca3cb6a3bb034af5aaac7f7898d37d7f898d692a1e27cfc28b205982c6575
MASTER_IP:                     172.17.0.3
waiting for master ............
waiting for nameserver to find master 
starting worker container
Unable to find image 'amplab/spark-worker:1.0.0' locally
Pulling repository amplab/spark-worker

started worker container:  ed2ddaefd11e2934cd7ac0d5717f244a64e6997d55791a2b272400642918ffe9
starting worker container
started worker container:  1ffec6ce05937bf37cb96d9ecd3f0cdfd6c66a75c098f964608af144fa3d9f3b
waiting for workers to register ................................................
................................................................................
................................................................................
................................................................................

Any suggestion is appreciated.

Call to master/172.17.0.3:9000 failed on connection exception: java.net.ConnectException: Connection refused

I tried to follow the Spark example (spark:0.8.0 image) but I get errors because no service is running on port 9000:

$ sudo docker attach 27550fe348c3410c50ff7a7a395a7444f79945fbc980dc78b401a96b75a54a3d
sudo: unable to resolve host ip-10-244-4-249
14/02/08 16:34:56 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:34:57 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:34:58 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:34:59 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:35:00 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:35:01 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:35:02 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
put: Call to master/172.17.0.3:9000 failed on connection exception: java.net.ConnectException: Connection refused
starting Spark Shell

Of course the sample will fail:

scala> val textFile = sc.textFile("hdfs://master:9000/user/hdfs/test.txt")
14/02/08 16:43:21 INFO MemoryStore: ensureFreeSpace(36192) called with curMem=0, maxMem=530593873
14/02/08 16:43:21 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 35.3 KB, free 506.0 MB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> textFile.count()
14/02/08 16:43:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/02/08 16:43:29 WARN LoadSnappy: Snappy native library not loaded
14/02/08 16:43:30 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 0 time(s).
14/02/08 16:43:31 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 1 time(s).
14/02/08 16:43:32 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 2 time(s).
14/02/08 16:43:33 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 3 time(s).
14/02/08 16:43:34 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 4 time(s).
14/02/08 16:43:35 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 5 time(s).
14/02/08 16:43:36 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 6 time(s).
14/02/08 16:43:37 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 7 time(s).
14/02/08 16:43:38 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 8 time(s).
14/02/08 16:43:39 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 9 time(s).
java.net.ConnectException: Call to master/172.17.0.3:9000 failed on connection exception: java.net.ConnectException: Connection refused

Connecting to the master and checking the open ports, I get:

# lsof -n|grep LIST
sshd      131 root    3u  IPv4              36387      0t0      TCP *:ssh (LISTEN)
sshd      131 root    4u  IPv6              36389      0t0      TCP *:ssh (LISTEN)
java      172 hdfs   12u  IPv6              36486      0t0      TCP 172.17.0.3:7077 (LISTEN)
java      172 hdfs   17u  IPv6              36490      0t0      TCP *:http-alt (LISTEN)

Running Docker version 0.8.0, build cc3a8c8
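
The lsof output confirms the diagnosis: nothing is listening on port 9000, i.e. the HDFS namenode never came up inside the master container, even though sshd and the Spark master (7077) did. A quick check, assuming the images ship a full JDK and a standard Hadoop log layout (the log path is an assumption):

# inside the master container
jps                  # a healthy master should list a NameNode process
ls /var/log/hadoop/  # inspect the namenode log for the startup failure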

Not able to pull shark 0.8.0 image

$ docker -v
Docker version 0.7.0, build 0d078b6

$ sudo docker pull -t="0.8.0" amplab/shark-master
Pulling repository amplab/shark-master
2013/12/03 10:58:32 Server error: 404 trying to fetch remote history for amplab/shark-master

The same command works well for the amplab/spark-master image.

Error in Workers

While running a spark cluster with docker 0.7, I am getting this error:

13/12/03 19:04:38 ERROR StandaloneExecutorBackend: error while creating actor
java.net.UnknownHostException: 1a183a2affd5: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
at java.net.InetAddress.getAllByName0(InetAddress.java:1211)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at java.net.InetAddress.getAllByName(InetAddress.java:1063)
at java.net.InetAddress.getByName(InetAddress.java:1013)
at akka.remote.netty.ActiveRemoteClient$$anonfun$connect$1.apply$mcV$sp(Client.scala:170)
at akka.util.Switch.liftedTree1$1(LockUtil.scala:33)
at akka.util.Switch.transcend(LockUtil.scala:32)
at akka.util.Switch.switchOn(LockUtil.scala:55)
at akka.remote.netty.ActiveRemoteClient.connect(Client.scala:158)
at akka.remote.netty.NettyRemoteTransport.send(NettyRemoteSupport.scala:153)
at akka.remote.RemoteActorRef.$bang(RemoteActorRefProvider.scala:247)
at org.apache.spark.executor.StandaloneExecutorBackend.preStart(StandaloneExecutorBackend.scala:48)
at akka.actor.ActorCell.create$1(ActorCell.scala:508)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:600)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:209)
at akka.dispatch.Mailbox.run(Mailbox.scala:178)
at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Versions of Ubuntu and Docker:
Ubuntu 12.04.3 LTS , Release: 12.04, Codename: precise
Docker version 0.7.0, build 0d078b6

Can't run the deploy.sh

qianyuxiang@qianyuxiangdeMacBook-Pro:~$sudo docker-scripts/deploy/deploy.sh -i amplab/spark:1.0.0 -w 2 -c
*** Starting Spark 1.0.0 ***
starting nameserver container
time="2015-12-07T21:24:26+08:00" level=fatal msg="Post http:///var/run/docker.sock/v1.18/containers/create: dial unix /var/run/docker.sock: no such file or directory. Are you trying to connect to a TLS-enabled daemon without TLS?"
error: could not start nameserver container from image amplab/dnsmasq-precise

I just cloned the repository from GitHub and ran the script. It does work on my virtual CentOS machine, though.
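
That error means the Docker client found no local daemon socket. On OS X the daemon runs inside a VM, so the client has to be pointed at it first; a boot2docker-era sketch (docker-machine works analogously):

boot2docker up
eval "$(boot2docker shellinit)"   # exports DOCKER_HOST etc. for the client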

Can't run deploy/deploy.sh in boot2docker VM

When I follow the README instructions on OS X with boot2docker v1.0.0, I get the following:

docker@boot2docker:~$ git clone https://github.com/amplab/docker-scripts.git
Cloning into 'docker-scripts'...
remote: Reusing existing pack: 1011, done.
remote: Counting objects: 50, done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 1061 (delta 18), reused 31 (delta 10)
Receiving objects: 100% (1061/1061), 144.43 KiB, done.
Resolving deltas: 100% (429/429), done.
docker@boot2docker:~$ cd docker-scripts/
docker@boot2docker:~/docker-scripts$ sudo ./deploy/deploy.sh
sudo: unable to execute ./deploy/deploy.sh: No such file or directory

Building Spark and running unit tests within a Docker container

This is a general question about Spark on Docker. Let me know if there is a better place to ask this. I asked a similar question on the Spark dev list.

I am having trouble building Spark and running all the unit tests within a Docker container. The JVM complains that there isn't enough memory, though I believe I've set the appropriate JAVA_OPTS and granted the Docker container plenty of memory.

Do you folks have some instructions on how to build Spark from source and run all the unit tests within a Docker container? I took a look through the scripts here but couldn't find anything.

For the record, I'm trying to build and test Spark as follows:

# start the container like this
# docker run -m 4g -t -i centos bash

export JAVA_OPTS="-Xms512m -Xmx1024m -XX:PermSize=64m -XX:MaxPermSize=128m -Xss512k"

# build
sbt/sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl  -Phive -Phive-thriftserver package assembly/assembly

# Scala unit tests
sbt/sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive -Phive-thriftserver catalyst/test sql/test hive/test mllib/test
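
One thing worth checking: the sbt launcher does not necessarily pick up JAVA_OPTS; the build JVM's heap is usually controlled via SBT_OPTS. A minimal sketch, under the assumption that the OOM comes from the sbt JVM itself rather than from forked test JVMs:

export SBT_OPTS="-Xmx2g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=256m"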

Cannot start nameserver

Hi!

I'm trying to start a spark cluster, but I have a problem with the nameserver container. The deploy script is stuck at waiting for nameserver to come up. When I check the nameserver container, I see the following logs:

dnsmasq: cannot access directory /etc/dnsmasq.d: Permission denied

(the same line repeats continuously)

The command I used was: sudo ./deploy/deploy.sh -i amplab/spark:1.0.0 -w 3
Am I forgetting some param? I tried passing the -v flag and it did not work...

Why do the master and worker nodes expect the static hostname "master"

I am using the spark 1.0.0 docker images. When I start the master node with a hostname other than "master", it simply fails. Moreover, the worker nodes try to contact the master node using the name master instead of the IP provided as the command line argument to docker run. It changes /etc/hadoop/core-site.xml, but why does it contact the master node by the name "master"? Following are the logs of the master and worker, respectively:

1- Master log with hostname other than master:
core@coreos-2 ~ $ docker run -itP -h master spark-master:1.0.0

core@coreos-2 ~ $ docker run -itP spark-master:1.0.0
SPARK_HOME=/opt/spark-1.0.0
HOSTNAME=ad28c0356f17
TERM=xterm
SCALA_VERSION=2.10.3
PATH=/opt/spark-1.0.0:/opt/scala-2.10.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
SPARK_VERSION=1.0.0
PWD=/
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
SHLVL=1
HOME=/root
SCALA_HOME=/opt/scala-2.10.3
_=/usr/bin/env
MASTER_IP=172.17.0.2
preparing Spark
starting Hadoop Namenode
starting sshd
starting Spark Master
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-1.0.0-bin-hadoop1/sbin/../logs/spark-hdfs-org.apache.spark.deploy.master.Master-1-ad28c0356f17.out
Warning: SPARK_MEM is deprecated, please use a more specific config option
(e.g., spark.executor.memory or SPARK_DRIVER_MEMORY).
Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -cp ::/opt/spark-1.0.0-bin-hadoop1/conf:/opt/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms800m -Xmx800m org.apache.spark.deploy.master.Master --ip master --port 7077 --webui-port 8080
========================================

14/09/30 09:19:19 INFO SecurityManager: Changing view acls to: hdfs
14/09/30 09:19:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs)
14/09/30 09:19:20 INFO Slf4jLogger: Slf4jLogger started
14/09/30 09:19:20 INFO Remoting: Starting remoting
Exception in thread "main" java.net.UnknownHostException: master: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
    at java.net.InetAddress.getAllByName(InetAddress.java:1162)
    at java.net.InetAddress.getAllByName(InetAddress.java:1098)
    at java.net.InetAddress.getByName(InetAddress.java:1048)
    at akka.remote.transport.netty.NettyTransport$$anonfun$addressToSocketAddress$1.apply(NettyTransport.scala:382)
    at akka.remote.transport.netty.NettyTransport$$anonfun$addressToSocketAddress$1.apply(NettyTransport.scala:382)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

2- Worker log:
core@coreos-1 ~/docker-scripts/spark-1.0.0/spark-worker $ docker run -P -h worker spark-worker:1.0.0 10.132.232.22

WORKER_IP=172.17.0.54
preparing Spark
starting Hadoop Datanode
 * Starting Apache Hadoop Data Node server hadoop-datanode
starting datanode, logging to /var/log/hadoop//hadoop--datanode-worker.out
   ...done.
starting sshd
starting Spark Worker
Warning: SPARK_MEM is deprecated, please use a more specific config option
(e.g., spark.executor.memory or SPARK_DRIVER_MEMORY).
14/09/30 09:33:38 INFO SecurityManager: Changing view acls to: hdfs
14/09/30 09:33:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs)
14/09/30 09:33:39 INFO Slf4jLogger: Slf4jLogger started
14/09/30 09:33:40 INFO Remoting: Starting remoting
14/09/30 09:33:40 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@worker:48571]
14/09/30 09:33:40 INFO Worker: Starting Spark worker worker:48571 with 1 cores, 1500.0 MB RAM
14/09/30 09:33:40 INFO Worker: Spark home: /opt/spark-1.0.0
14/09/30 09:33:41 INFO WorkerWebUI: Started WorkerWebUI at http://worker:8081
14/09/30 09:33:41 INFO Worker: Connecting to master spark://master:7077...
14/09/30 09:33:41 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@master:7077]. Address is now gated for 60000 ms, all messages to this address will be delivered to dead letters.
14/09/30 09:33:41 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from Actor[akka://sparkWorker/user/Worker#-1054615506] to Actor[akka://sparkWorker/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/09/30 09:34:01 INFO Worker: Connecting to master spark://master:7077...
14/09/30 09:34:01 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from Actor[akka://sparkWorker/user/Worker#-1054615506] to Actor[akka://sparkWorker/deadLetters] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/09/30 09:34:21 INFO Worker: Connecting to master spark://master:7077...
14/09/30 09:34:21 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from Actor[akka://sparkWorker/user/Worker#-1054615506] to Actor[akka://sparkWorker/deadLetters] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/09/30 09:34:41 ERROR Worker: All masters are unresponsive! Giving up.

P.S.: The worker container is on a different machine (coreos-1). Therefore, it cannot connect to the master, as there is no global discovery service.
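
Until the images take the master address as a proper parameter, one workaround on a multi-host setup is to make the literal name master resolvable inside each worker container, for example with Docker's --add-host flag (available in newer Docker releases); a sketch using the reporter's master IP:

# injects "10.132.232.22 master" into the container's /etc/hosts
docker run -d -h worker1 --add-host master:10.132.232.22 spark-worker:1.0.0 10.132.232.22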
