This extends Create or append a file to HDFS – Hadoop API tutorial to write an AVRO file to HDFS. Step 1: Include the AVRO library files in the pom.xml file.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 | <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>com.mytutorial</groupId> <artifactId>simple-hadoop</artifactId> <packaging>jar</packaging> <version>1.0-SNAPSHOT</version> <name>simple-hadoop</name> <url>http://maven.apache.org</url> <properties> <hadoop.version>2.7.2</hadoop.version> <avro.version>1.8.1</avro.version> </properties> <dependencies> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>4.12</version> <scope>test</scope> </dependency> <!-- Hadoop --> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-hdfs</artifactId> <version>${hadoop.version}</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>${hadoop.version}</version> </dependency> <dependency> <groupId>org.apache.avro</groupId> <artifactId>avro</artifactId> <version>${avro.version}</version> </dependency> <!-- Required for the FsInput class --> <dependency> <groupId>org.apache.avro</groupId> <artifactId>avro-mapred</artifactId> <version>${avro.version}</version> <classifier>hadoop2</classifier> </dependency> </dependencies> <build> <plugins> <!-- build an uber jar with hadoop dependencies included --> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <executions> <execution> <phase>package</phase> <goals> <goal>shade</goal> </goals> </execution> </executions> </plugin> </plugins> </build> </project> |
Step 2: The AVRO files are schema based….