As a Java developer, reading the contents of a file into a String object is a very common task. It is also common in pre-interview written tests to be asked to read the contents of a file and apply a regex to split the string, etc.
4 things to watch out for in file processing
1) Files must be closed once read. The “try with resources” feature in Java 7 is used to auto close the file once read.
2) Favor reading from a classpath over loading from an absolute path.
3) Scanner is for text files, and a line-oriented scanner cannot be used for binary files. You have no guarantee that a binary file even has “lines” delimited by newline characters.
4) Reading large files directly into memory can cause memory issues. See “Processing large files efficiently in Java”.
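Points 1 and 4 can be combined in a small sketch: stream the file one line at a time inside a try-with-resources block instead of loading it all into a String. The class name LineCounter and the method countNonBlankLines are made up for this illustration, and the temporary file only exists to make the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the original examples.
final class LineCounter {

    // Streams the file one line at a time, so only the current line
    // is held in memory regardless of the file size.
    static long countNonBlankLines(Path file) throws IOException {
        long count = 0;
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A throwaway file so the sketch runs anywhere
        Path temp = Files.createTempFile("readme", ".txt");
        try {
            Files.write(temp,
                    Arrays.asList("A big brown fox", "", "jumped over the fence"),
                    StandardCharsets.UTF_8);
            System.out.println(countNonBlankLines(temp)); // prints 2
        } finally {
            Files.delete(temp);
        }
    }
}
```

The same pattern works for any per-line processing; only the body of the while loop changes.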
Java Scanner class in action
#1. Scanner class reading from an absolute file path
JDK 7 or later must be used to take advantage of the try with resources that auto closes the file.
    package com.read.file;

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class MyFileReader {

        public static void main(String[] args) {
            String path = "C:\\Users\\akumaras\\workspace\\test\\src\\com\\read\\file\\readme.txt";

            //try with resource auto close the file
            try (Scanner sc = new Scanner(new File(path))) {
                String readContent = sc.useDelimiter("\\Z").next();
                System.out.println(readContent);
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
        }
    }
The “java.util.regex.Pattern” class documentation states:
“\Z” The end of the input but for the final terminator, if any.
“\z” The end of the input.
The extra “\” is an escape, since the backslash is a special character in a Java String literal. For example, to print a \ or a ”, both of which are special in a string literal, you have to escape them with another \, which gives \\ and \”. Similarly, you need to escape the “\” in “\Z” with another “\”, which becomes “\\Z”.
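The difference between the two anchors shows up when the input ends with a newline. The following is a small sketch with an in-memory String instead of a file; the class name DelimiterDemo and the helper readAll are invented for the illustration.

```java
import java.util.Scanner;

// Hypothetical demo class; the names are assumptions made for this sketch.
final class DelimiterDemo {

    // Reads the whole input as a single token, using the given
    // end-of-input anchor as the Scanner delimiter.
    static String readAll(String input, String delimiter) {
        try (Scanner sc = new Scanner(input)) {
            return sc.useDelimiter(delimiter).next();
        }
    }

    public static void main(String[] args) {
        String input = "A big brown fox\n";

        // "\\Z" stops before the final line terminator
        String upperZ = readAll(input, "\\Z");
        // "\\z" stops at the very end, so the terminator is kept
        String lowerZ = readAll(input, "\\z");

        System.out.println(upperZ.endsWith("\n")); // false
        System.out.println(lowerZ.endsWith("\n")); // true
    }
}
```

So “\\Z” is convenient when you want the file contents without the trailing newline, and “\\z” when you want the contents exactly as stored.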
#2. Scanner class dynamically constructing an absolute path
    package com.read.file;

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class MyFileReader {

        public static void main(String[] args) {
            //gives the current working folder
            String currentWorkDir = System.getProperty("user.dir");

            //convert the package name to path
            String packagePath = MyFileReader.class.getPackage().getName().replace(".", "/");

            try (Scanner sc = new Scanner(new File(currentWorkDir + "/src/" + packagePath + "/" + "readme.txt"))) {
                String readContent = sc.useDelimiter("\\Z").next();
                System.out.println(readContent);
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
        }
    }
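As a possible variation, the same path can be assembled with java.nio.file.Paths (Java 7) rather than by concatenating strings with “/”, which leaves the choice of separator to the platform. This is only a sketch: the class SourcePathBuilder and the method resolveInSourceTree are hypothetical names, and the package name is passed in as a plain String.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of the original MyFileReader example.
final class SourcePathBuilder {

    // Builds <baseDir>/src/<package as folders>/<fileName>
    static Path resolveInSourceTree(String baseDir, String packageName, String fileName) {
        Path result = Paths.get(baseDir, "src");
        // each package segment becomes one folder
        for (String part : packageName.split("\\.")) {
            result = result.resolve(part);
        }
        return result.resolve(fileName);
    }

    public static void main(String[] args) {
        String currentWorkDir = System.getProperty("user.dir");
        Path readme = resolveInSourceTree(currentWorkDir, "com.read.file", "readme.txt");
        // The printed path depends on where the program is started
        System.out.println(readme);
    }
}
```

The resulting Path can be handed to Scanner via readme.toFile(). It is still tied to the working directory, so it has the same deployment drawback as approaches #1 and #2.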
#3. Scanner class reading from the classpath relatively
Approaches #1 and #2, which read a file via an absolute path, are not recommended, because if you move the deployed files to some other location you will get a “FileNotFoundException”. A better approach is to read from your classpath, using the “getResourceAsStream” method in the java.lang.Class API.
    package com.read.file;

    import java.util.Scanner;

    public class MyFileReader {

        public static void main(String[] args) {
            //getResourceAsStream to the rescue
            try (Scanner sc = new Scanner(
                    MyFileReader.class.getResourceAsStream("readme.txt"), "UTF-8")) {
                String readContent = sc.useDelimiter("\\Z").next();
                System.out.println(readContent);
            }
        }
    }
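One thing to keep in mind with this approach is that “getResourceAsStream” returns null when the resource is not found, and passing null to the Scanner constructor fails with a NullPointerException. A more defensive variant might look like the sketch below. The class ResourceReader and the method readResource are hypothetical names, and turning the null into a FileNotFoundException is just one possible choice.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical helper for illustration; the names are assumptions.
final class ResourceReader {

    // Reads a classpath resource, resolved relative to the given class,
    // into a String. Fails with a descriptive exception if it is missing.
    static String readResource(Class<?> anchor, String name) throws FileNotFoundException {
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException("Resource not found on classpath: " + name);
        }
        // closing the Scanner also closes the underlying stream
        try (Scanner sc = new Scanner(in, "UTF-8")) {
            sc.useDelimiter("\\Z");
            // an empty resource has no token, so return an empty String
            return sc.hasNext() ? sc.next() : "";
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(readResource(ResourceReader.class, "readme.txt"));
        } catch (FileNotFoundException e) {
            // reached when readme.txt is not next to this class on the classpath
            System.err.println(e.getMessage());
        }
    }
}
```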
Why is this a better approach?
If you package the above “MyFileReader” and “readme.txt” into a jar or war file, it can be deployed anywhere. For example, let’s build a jar file with Maven.
Step 1: Create a Maven Jar project
    mvn archetype:generate -DgroupId=com.read.file -DartifactId=readFileWithScanner
Step 2: Create the files MyFileReader.java & readme.txt
src/main/java: com/read/file/MyFileReader.java
src/main/resources: com/read/file/readme.txt
The “src/main/resources” folder can be created with right mouse click on “readFileWithScanner” and then “new –> Source Folder” and then typing “src/main/resources” as the source folder.
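After importing into Eclipse, the project shows MyFileReader.java under src/main/java and readme.txt under src/main/resources, both in the com.read.file package.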
Step 3: Ensure that the pom.xml uses Java 7 or later
to take advantage of the “Try with resources” feature.
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>com.read.file</groupId>
        <artifactId>readFileWithScanner</artifactId>
        <version>1.0-SNAPSHOT</version>
        <packaging>jar</packaging>

        <name>readFileWithScanner</name>
        <url>http://maven.apache.org</url>

        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        </properties>

        <dependencies>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>3.8.1</version>
                <scope>test</scope>
            </dependency>
        </dependencies>

        <build>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                </plugin>
            </plugins>

            <pluginManagement>
                <plugins>
                    <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-compiler-plugin</artifactId>
                        <version>3.2</version>
                        <configuration>
                            <source>1.7</source>
                            <target>1.7</target>
                        </configuration>
                    </plugin>
                </plugins>
            </pluginManagement>
        </build>
    </project>
Step 4: Build the jar file
    mvn clean package
Step 5: The built jar file contains both MyFileReader.class and readme.txt under com/read/file
Step 6: Copy this built jar file to, say, the c:\temp folder and run it
    C:\Users\akumaras\projects>java -classpath c:\Temp\readFileWithScanner-1.0-SNAPSHOT.jar com.read.file.MyFileReader
    A big brown fox jumped over the fence
So, you can run this jar file in any folder as the lookup of the file is relative to the classpath.
#4. 2 Scanners: one for reading the file line by line & the other for tokenizing the line on spaces
    package com.read.file;

    import java.util.Scanner;

    public class MyFileReader {

        public static void main(String[] args) {
            Scanner fileScanner = new Scanner(MyFileReader.class.getResourceAsStream("readme.txt"), "UTF-8");
            try {
                while (fileScanner.hasNextLine()) {
                    String line = fileScanner.nextLine();

                    Scanner lineScanner = new Scanner(line);
                    while (lineScanner.hasNext()) {
                        String token = lineScanner.next();
                        System.out.println(token); //you can do whatever you want with the tokens
                    }
                    lineScanner.close();
                }
            } finally {
                fileScanner.close();
            }
        }
    }
The output:
    A
    big
    brown
    fox
    jumped
    over
    the
    fence
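Since written tests often ask you to apply a regex to split the string, here is a sketch of the same tokenizing done with String.split on the whitespace pattern “\\s+” instead of a second Scanner. The class SplitTokenizer and the method tokenize are made-up names, and the text is an in-memory String rather than the contents of readme.txt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical alternative to the inner lineScanner; names are assumptions.
final class SplitTokenizer {

    // Splits the text on runs of whitespace (spaces, tabs, newlines).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return tokens; // avoid the single empty token split() would give
        }
        for (String token : trimmed.split("\\s+")) {
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String content = "A big brown fox\njumped over the fence";
        // prints the eight words, one per line
        for (String token : tokenize(content)) {
            System.out.println(token);
        }
    }
}
```

The trim() call matters because split() would otherwise return an empty first element for text that starts with whitespace.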