Q1. What is the maximum possible length of a Java String & how much heap space do you need to store the maximum possible String object?
A1. A Java String internally uses a char array (i.e. char[]), and arrays are indexed with an int. The maximum value of an int is Integer.MAX_VALUE, which is 2^31 – 1 (approximately 2 billion), so a String can hold at most around 2 billion characters. Since each char takes 2 bytes in Java, storing a ~2 GB file as a String needs at least ~4 GB of heap for the character data, plus roughly another ~4 GB of working memory (e.g. the intermediate buffers used while building the String), so in total around 8 GB of heap space. Hence it is not a good design to read a large file of ~2 GB into a String for further processing.
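A quick sketch of the arithmetic above. This assumes the classic char[]-backed String (2 bytes per char); it simply prints the theoretical numbers and is illustrative only.

import java.io.IOException;

public class StringSizeMath {

    public static void main(String[] args) throws IOException {
        long maxChars = Integer.MAX_VALUE;            // 2^31 - 1, ~2 billion chars
        long bytesPerChar = 2;                        // a char is a 2-byte UTF-16 code unit
        long charDataBytes = maxChars * bytesPerChar; // ~4 GB just for the character data

        System.out.println("Max String length (chars): " + maxChars);
        System.out.println("Char data alone (GB)     : " + charDataBytes / (1024.0 * 1024 * 1024));
        // Building the String typically needs a similar amount again for
        // intermediate buffers (e.g. a StringBuilder), i.e. ~8 GB in total.
    }
}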
Q2. What will be the output if you try to read a 2 GB file into a StringBuilder?
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadLargeFileUsingStringBuilder {

    public static void main(String[] args) throws IOException {
        File file = new File("c:/temp/large-2gb-file.xml"); // ~2 GB
        FileInputStream fis = new FileInputStream(file);
        InputStreamReader isr = new InputStreamReader(fis);
        BufferedReader br = new BufferedReader(isr);

        String line = null;
        StringBuilder sb = new StringBuilder();

        while ((line = br.readLine()) != null) {
            sb.append(line);
        }

        br.close();
    }
}
A2. If you run it as is, with the default heap size, you will get “java.lang.OutOfMemoryError: Java heap space”:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at com.util.ReadLargeFileUsingStringBuilder.main(ReadLargeFileUsingStringBuilder.java:21)
Now run it with 8 GB of heap memory: 4 GB to hold the 2 GB file as chars, and another 4 GB of working space for reading the file into a String.
-d64 -Xms8000m -Xmx8000m |
You will get a slightly different “OutOfMemoryError – Requested array size exceeds VM limit”
Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at com.util.ReadLargeFileUsingStringBuilder.main(ReadLargeFileUsingStringBuilder.java:21)
When you call “sb.append(line);”, the StringBuilder checks whether its underlying char[] has enough capacity for the append. If not, it allocates a new array roughly twice the size and copies the old contents into it (the Arrays.copyOf in the stack trace). This happens each time it runs out of room. The error above indicates that the requested capacity has grown past the maximum array size the VM allows, which is close to Integer.MAX_VALUE, i.e. 2^31 – 1.
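A simplified sketch of that growth behaviour. This is not the actual JDK source, just an illustration of why the capacity roughly doubles on each expansion and eventually hits the array size limit.

public class CapacityGrowthSketch {

    public static void main(String[] args) {
        long capacity = 16;                 // StringBuilder's default initial capacity
        long limit = Integer.MAX_VALUE;     // a Java array cannot be larger than this

        while (capacity < limit) {
            System.out.println("capacity grows to: " + capacity);
            capacity = capacity * 2 + 2;    // roughly doubles on each expansion
        }
        // Once the required capacity approaches Integer.MAX_VALUE, the VM cannot
        // allocate the requested char[] and throws
        // "OutOfMemoryError: Requested array size exceeds VM limit".
    }
}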
Q3. How will you fix the above error?
A3. Firstly, it is not a good design to read a 2 GB file into memory. You should read it as a stream and write it out as a stream, without holding the whole file in memory. You can 1) process the contents in chunks as you read (see the sketch after the byte-array example below), 2) split the file into smaller files, or 3) read it into a byte array as shown below.
The following code runs without throwing an “OutOfMemoryError” when given 3 GB or more of heap memory: the byte array consumes ~2 GB, leaving roughly another 1 GB for processing.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadLargeFileUsingByteArray {

    public static void main(String[] args) throws IOException {
        File file = new File("c:/temp/large-2gb-file.xml");
        FileInputStream fis = new FileInputStream(file);

        int total = 0;
        int size = (int) file.length();
        byte[] bytes = new byte[size]; // ~2 GB

        while (fis.available() > 0) {
            int result = fis.read(bytes, total, size - total);
            total = total + result;
        }

        fis.close();
    }
}
It runs if you execute it with a 3 GB heap size:
-d64 -Xms3000m -Xmx3000m |
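For completeness, here is a minimal sketch of option 1 above: processing the file in fixed-size chunks so that only a small buffer is ever held in memory. The 8 KB chunk size and the processChunk method are illustrative assumptions, not part of the original example.

import java.io.FileInputStream;
import java.io.IOException;

public class ProcessLargeFileInChunks {

    public static void main(String[] args) throws IOException {
        try (FileInputStream fis = new FileInputStream("c:/temp/large-2gb-file.xml")) {
            byte[] buffer = new byte[8 * 1024]; // only 8 KB is held in memory at a time
            int bytesRead;
            while ((bytesRead = fis.read(buffer)) != -1) {
                processChunk(buffer, bytesRead);  // hypothetical per-chunk processing
            }
        }
    }

    // Placeholder: real code would parse, transform, or write out the chunk here.
    private static void processChunk(byte[] buffer, int length) {
        // e.g. write to an output stream, feed a parser, compute a checksum, etc.
    }
}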
Q4. What happens when you run the following code?
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadLargeFileUsingByteArray {

    public static void main(String[] args) throws IOException {
        File file = new File("c:/temp/large-2gb-file.xml");
        FileInputStream fis = new FileInputStream(file);

        int total = 0;
        int size = (int) file.length();
        byte[] bytes = new byte[size]; // ~1.8 GB

        while (fis.available() > 0) {
            int result = fis.read(bytes, total, size - total);
            total = total + result;
            System.out.println("****" + result);
        }

        String str = new String(bytes, "UTF-8");
        System.out.println(str);

        fis.close();
    }
}
A4. The above code throws “Exception in thread "main" java.lang.OutOfMemoryError: Java heap space” with the default heap size, but it runs if you increase the heap memory to 8 GB:
-d64 -Xms8000m -Xmx8000m |
It runs because you need ~4 GB to hold the contents as a String (i.e. ~2 GB of text at 2 bytes per char) and roughly another 4 GB of working space while the bytes are decoded into the String.
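If the goal is simply to process the text, a sketch like the following avoids materialising the whole file as one huge String by decoding the bytes incrementally through an InputStreamReader. The per-line processing here is a placeholder assumption.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class DecodeLargeFileIncrementally {

    public static void main(String[] args) throws IOException {
        // Decode bytes to chars as they are read, instead of building one huge String.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("c:/temp/large-2gb-file.xml"), "UTF-8"))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Hypothetical per-line processing; only one line is held in memory
                // at a time, not the whole ~2 GB of text.
                System.out.println(line.length());
            }
        }
    }
}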
Q5. How will you prevent the following code from throwing “java.lang.OutOfMemoryError: Requested array size exceeds VM limit” when the bytes are closer to 2 GB?
org.apache.hadoop.io.Text input = new org.apache.hadoop.io.Text(new String(bytes));
A5. The above code can throw the following exception.
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:795)
    at org.apache.hadoop.io.Text.encode(Text.java:450)
    at org.apache.hadoop.io.Text.set(Text.java:198)
    at org.apache.hadoop.io.Text.<init>(Text.java:88)
It can be prevented by passing the “bytes” directly, as shown below. Text stores its contents as UTF-8 encoded bytes internally, so handing it the bytes avoids the byte[] → String → byte[] round trip and the huge intermediate buffers it requires.
org.apache.hadoop.io.Text input = new org.apache.hadoop.io.Text(bytes);
Relevant links
1) Java primitives & objects – memory consumption interview Q&As