This extends the previous Docker tutorials. It is a stand-alone tutorial on running a Spark cluster with Docker Compose.

Step 1: The image Dockerfile “spark.dockerfile” in the folder “docker-test/docker/spark/”.
FROM maven:3.5-jdk-8-alpine

RUN apk update \
 && apk upgrade \
 && apk add --update bash \
 && apk add --update curl \
 && rm -rf /var/cache/apk/*

ARG SPARK_ARCHIVE=https://www.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
RUN curl -s $SPARK_ARCHIVE | tar -xz -C /usr/local/

ENV SPARK_HOME /usr/local/spark-2.3.0-bin-hadoop2.7
ENV PATH $PATH:$SPARK_HOME/bin

EXPOSE 4040 5005 6066 7077 8080 8081

WORKDIR $SPARK_HOME
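The image is based on maven:3.5-jdk-8-alpine, so it already ships JDK 8 and Maven. It then installs bash and curl, downloads the Spark 2.3.0 binary distribution into /usr/local, and sets SPARK_HOME and PATH accordingly. The exposed ports are the usual Spark ones: 4040 for the application UI, 6066 for the REST submission endpoint, 7077 for the master, and 8080/8081 for the master and worker web UIs; 5005 is conventionally used for remote JVM debugging. Note that older Spark releases are eventually moved from www.apache.org/dist to archive.apache.org/dist, so the SPARK_ARCHIVE URL may need adjusting. To try the image by hand (the tag name here is just an example), something like “docker build -f docker/spark/spark.dockerfile -t spark-base .” run from the “docker-test” folder should work.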
Step 2: The pom.xml file.
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.myproject</groupId>
  <artifactId>hellodocker</artifactId>
  <version>1.0</version>

  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.3.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin> <!-- Build an executable JAR -->
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>3.1.0</version>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <classpathPrefix>lib/</classpathPrefix>
              <mainClass>com.mypkg.HelloDocker</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
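Two details are worth noting. The Spark dependency uses the provided scope: the Spark distribution baked into the image supplies spark-core at runtime via spark-submit, so it does not need to be bundled into the application jar. The maven-jar-plugin configuration writes com.mypkg.HelloDocker as the Main-Class of the jar manifest, so this must match the package and class name used in Step 3. Running “mvn clean package” produces “target/hellodocker-1.0.jar” (from the artifactId and version above), which is the jar that will later be submitted to the cluster.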
Step 3: The Spark code “HelloDocker.java”…
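The original listing for HelloDocker.java is not reproduced here. As a rough sketch only, a minimal class consistent with the mainClass (com.mypkg.HelloDocker) and the Spark 2.3.0 / Java 8 setup declared in the pom.xml could look like the following; the actual tutorial code may differ.

package com.mypkg;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HelloDocker {
    public static void main(String[] args) {
        // The master URL is supplied by spark-submit / the cluster, not hard-coded here.
        SparkConf conf = new SparkConf().setAppName("HelloDocker");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Trivial job: distribute a few numbers and sum them on the cluster.
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> numbers = sc.parallelize(data);
        int sum = numbers.reduce((a, b) -> a + b);

        System.out.println("Hello Docker, sum = " + sum);
        sc.close();
    }
}

Once packaged, such a job would be submitted from inside a container with something like “spark-submit --class com.mypkg.HelloDocker --master spark://<master-host>:7077 target/hellodocker-1.0.jar”.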