How to Read and Write a PDF File in Java

In this post, we will learn how to read and write a PDF file using Java. PDF (Portable Document Format) is a file format for exchanging documents so that they look the same on any device. It is one of the most widely used file formats.

To work with PDF files in Java, we use the PDFBox library, which is designed and developed by the Apache Software Foundation. It can create, read, write, and append to PDF files.

You must download this library before reading a PDF file. You can download it here. The downloaded files are JAR files, so add them to your project's classpath and start working with PDFs.

Some important classes, such as PDDocument, PDPage, and PDPageContentStream, are required to load a document and to read or write its content.

If you are working with a Maven project, add the following dependency to your pom.xml file.

// pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.studytonight.pdfExample</groupId>
  <artifactId>pdfExample</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
    <dependency>
      <groupId>org.apache.pdfbox</groupId>
      <artifactId>pdfbox</artifactId>
      <version>2.0.12</version>
    </dependency>
  </dependencies>
</project>

Maven will automatically download all the required JARs into your project, so you don't need to download them manually. After adding this dependency, use the example code below in your Java project to read and write a PDF file.

Time for an Example:

Let's start with an example that reads a PDF file using the PDFBox library. Here, we have a PDF file, test.pdf, which we load with the PDDocument.load() method. We then extract its text with the getText() method of PDFTextStripper.

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class Main {
	public static void main(String[] args) {
		// Creating file instance
		File file = new File("test.pdf");
		// Loading the PDF file; try-with-resources closes the document even if an error occurs
		try (PDDocument document = PDDocument.load(file)) {
			PDFTextStripper pdfStripper = new PDFTextStripper();
			// Extracting the text of the whole document
			String text = pdfStripper.getText(document);
			System.out.println(text);
		} catch (IOException e) {
			System.out.println(e);
		}
	}
}
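
Before handing a file to PDFBox, it can help to confirm that the file exists and actually looks like a PDF. The following is a minimal sketch that uses only the Java standard library. The class PdfHeaderCheck and the method looksLikePdf are hypothetical helpers, not part of PDFBox, and the check assumes the usual %PDF- marker sits at the very start of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of PDFBox
class PdfHeaderCheck {

    // Returns true if the path is a regular file whose first five bytes are "%PDF-"
    public static boolean looksLikePdf(Path path) throws IOException {
        if (!Files.isRegularFile(path)) {
            return false;
        }
        byte[] expected = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        byte[] actual = new byte[expected.length];
        try (InputStream in = Files.newInputStream(path)) {
            int total = 0;
            // read() may return fewer bytes than requested, so loop until the buffer is full
            while (total < actual.length) {
                int n = in.read(actual, total, actual.length - total);
                if (n < 0) {
                    return false; // file is shorter than the marker
                }
                total += n;
            }
        }
        return Arrays.equals(expected, actual);
    }

    public static void main(String[] args) throws IOException {
        // Uses the first command-line argument if given, otherwise test.pdf
        Path path = Paths.get(args.length > 0 ? args[0] : "test.pdf");
        if (looksLikePdf(path)) {
            System.out.println(path + " looks like a PDF file");
        } else {
            System.out.println(path + " is missing or is not a PDF file");
        }
    }
}
```

A check like this only inspects the first bytes, so a damaged PDF can still pass it and fail later when PDFBox parses the file.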

Example:

PDFBox can also write data to a PDF file. Here, we write text to the first page of the existing test.pdf file in append mode. Passing the AppendMode.APPEND constant to the PDPageContentStream constructor keeps the existing page content instead of replacing it. We use the setFont() method to set the font and the save() method to save the changes to the file.

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class Main {
	public static void main(String[] args) throws IOException{  
	      PDDocument document = PDDocument.load(new File("test.pdf")); 
	      PDPage page = document.getPage(0);
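	      // APPEND keeps the existing page content; the two flags enable compression and reset the graphics context
	      PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true, true);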
	      contentStream.beginText(); 
	       
	      //Setting the font  
	      contentStream.setFont(PDType1Font.TIMES_ROMAN, 12);

	      //Setting the text position 
	      contentStream.newLineAtOffset(25, 500);

	      String text = "This message is written to the pdf file."; 
	      contentStream.showText(text);      
	      contentStream.endText();
	      contentStream.close();	      
	      // Saving file after writing
	      document.save(new File("test.pdf"));
	      document.close();
	}
}

After executing this program, the existing test.pdf file in the current directory is updated. The file must already exist, because the program loads it before writing. You can open the file and see that its first page now contains the text that we added with the code.
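
The write example loads test.pdf before changing it, so it needs a file that already exists. If you want to start from nothing, the sketch below shows one way to create a new one-page PDF and read its text back. It assumes the PDFBox 2.0.x API from the dependency shown earlier. The class name CreatePdfExample, the helper methods createPdf and readText, and the file name new-test.pdf are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.text.PDFTextStripper;

// Illustrative sketch for PDFBox 2.0.x; the loading and font APIs differ in PDFBox 3.x
class CreatePdfExample {

    // Creates a one-page PDF that contains a single line of text
    public static void createPdf(File target, String line) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(); // default page size
            document.addPage(page);
            // The content stream must be closed before the document is saved
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                contentStream.beginText();
                contentStream.setFont(PDType1Font.HELVETICA, 12);
                contentStream.newLineAtOffset(25, 700);
                contentStream.showText(line);
                contentStream.endText();
            }
            document.save(target);
        }
    }

    // Loads a PDF and returns all of its text
    public static String readText(File source) throws IOException {
        try (PDDocument document = PDDocument.load(source)) {
            return new PDFTextStripper().getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File("new-test.pdf"); // hypothetical output file name
        createPdf(file, "Hello from PDFBox");
        System.out.println(readText(file));
    }
}
```

Both helpers use try-with-resources, so the document and the content stream are closed even when an exception is thrown.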