Docker and Spark are two technologies that get a lot of attention these days. This repository contains a Dockerfile to build a Docker image with Apache Spark.
Log into your Ubuntu installation as a user with sudo privileges. Install wget, which is used to download the Docker install script.
sudo apt-get install wget
Get the latest Docker package.
wget -qO- https://get.docker.com/ | sh
The system prompts you for your sudo password. Then, it downloads and installs Docker and its dependencies.
Note: If your company is behind a filtering proxy, you may find that the apt-key command fails for the Docker repo during installation. To work around this, add the key directly using the following:
wget -qO- https://get.docker.com/gpg | sudo apt-key add -
Verify docker is installed correctly.
$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
535020c3e8ad: Pull complete
af340544ed62: Pull complete
Digest: sha256:a68868bfe696c00866942e8f5ca39e3e31b79c1e50feaee4ce5e28df2f051d5c
Status: Downloaded newer image for hello-world:latest

Hello from Docker.
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker Hub account:
 https://hub.docker.com

For more examples and ideas, visit:
 https://docs.docker.com/userguide/
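If you script the setup, it can help to confirm that the docker client is on your PATH before going further. The following is a minimal sketch for a POSIX shell; have_cmd is a hypothetical helper name used only for illustration and is not part of Docker.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd docker; then
    # `docker --version` only queries the client, so it works even if the daemon is down.
    docker --version
else
    echo "docker not found on PATH; rerun the install step above" >&2
fi
```

This only checks the client binary. The `docker run hello-world` step above remains the real end-to-end check, since it also exercises the daemon and the registry.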
Pull the image from Docker Hub:
docker pull duyetdev/docker-spark
Or build it yourself from the Dockerfile in this repository:
docker build --rm -t duyetdev/docker-spark .
If you are using boot2docker, make sure your VM has more than 2GB of memory.
In your /etc/hosts file, add $(boot2docker ip) as host 'sandbox' to make it easier to access your sandbox UI.
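One way to add the hosts entry without creating duplicates is sketched below. add_host_entry is a hypothetical helper written for this example, and the optional third argument (the hosts file path) exists only so the function can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: append "<ip> <name>" to a hosts file,
# unless a line already maps that name.
add_host_entry() {
    ip=$1
    name=$2
    hosts_file=${3:-/etc/hosts}
    # Match the name as a whole field: preceded by whitespace,
    # followed by whitespace or end of line.
    if grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Example (needs root to write /etc/hosts):
#   add_host_entry "$(boot2docker ip)" sandbox
```

Writing to /etc/hosts requires root, so either run the function from a root shell or fall back to a one-liner such as `echo "$(boot2docker ip) sandbox" | sudo tee -a /etc/hosts`.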
Open the YARN UI ports when running the container:
docker run -it -p 8088:8088 -p 8042:8042 -h sandbox duyetdev/docker-spark bash
Or run the container detached, in the background:
docker run -d -h sandbox duyetdev/docker-spark -
To check that everything works, you can run one of the stock examples that ship with Spark.
cd /usr/local/spark
# run the spark shell
./bin/spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1
# execute the following command, which should return 1000
scala> sc.parallelize(1 to 1000).count()
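The same check can be run non-interactively by piping the expression into the shell and looking for the result line. This is a sketch under two assumptions: that spark-shell accepts commands on standard input in your Spark version, and that the REPL reports the count as a line containing "Long = 1000". saw_expected_count is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read REPL output on stdin and succeed
# only if it reports the expected count of 1000.
saw_expected_count() {
    grep -q 'Long = 1000'
}

# Run only where spark-shell is available (for example, inside the container,
# from /usr/local/spark).
if [ -x ./bin/spark-shell ]; then
    if echo 'sc.parallelize(1 to 1000).count()' \
        | ./bin/spark-shell --master yarn-client --driver-memory 1g \
            --executor-memory 1g --executor-cores 1 2>/dev/null \
        | saw_expected_count; then
        echo "Spark on YARN smoke test passed"
    else
        echo "Spark on YARN smoke test failed" >&2
    fi
fi
```

If the output format differs in your setup, adjust the pattern in saw_expected_count rather than the Spark command itself.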