Dynamically Creating PDFs in a Web Application
by Sean C. Sullivan
On a recent logistics project, a customer asked our team to build a web site that would allow users to query a legacy system for shipment information. The customer defined three main requirements:
- The shipping information had to be returned in the form of a PDF document.
- The PDF file had to be downloadable through a web browser.
- The PDF file had to be viewable using the Adobe Acrobat Reader.
Our team had plenty of experience with J2EE web applications, but we had little experience with PDF documents. We needed to find a pure Java class library that could produce sophisticated PDF documents in a server-side web application. We found a solution that completely met our needs: iText.
iText Class Library
iText is an open source pure Java class library for creating and manipulating PDF documents. Bruno Lowagie and Paulo Soares lead the project. The iText API enables a Java developer to programmatically create PDF documents. iText delivers a rich set of features:
- Support for both PDF and FDF documents
- Various page sizes
- Landscape or portrait layouts
- Page headers
- Page footers
- Page numbering
- Document encryption
- JPEG, GIF, PNG, and WMF images
- Ordered and unordered lists
- Document templates
At the time of this writing, the iText software is available under a dual license: the Mozilla Public License (MPL) and the LGPL. Consult the iText web site for details. In this article, you'll see the iText API in action. We will demonstrate how to use iText and servlets to dynamically generate PDF documents in a server-side application.
First, you will need to obtain the iText JAR file. Visit the iText web site and download the current release. At the time of this writing, the current iText release is version 0.99. The iText web site provides API documentation and a comprehensive tutorial.
In addition to iText, we'll be using servlets. If you aren't familiar with servlets, you can learn about them in Jason Hunter's book, Java Servlet Programming. You will need a J2EE application server or a standalone servlet engine. Some good open source options are Tomcat, Jetty, and JBoss. The rest of this article assumes that you are using Jakarta Tomcat 4.1.
The iText API
The iText API is intuitive and easy to use. Using iText, you will be able to programmatically create customized PDF documents. The iText library consists of the following packages:
com.lowagie.servlets
com.lowagie.text
com.lowagie.text.html
com.lowagie.text.markup
com.lowagie.text.pdf
com.lowagie.text.pdf.codec
com.lowagie.text.pdf.hyphenation
com.lowagie.text.pdf.wmf
com.lowagie.text.rtf
com.lowagie.text.xml
com.lowagie.tools
For generating PDF files, you'll need only com.lowagie.text and com.lowagie.text.pdf.
Our example application uses these iText classes:
com.lowagie.text.pdf.PdfWriter
com.lowagie.text.Document
com.lowagie.text.HeaderFooter
com.lowagie.text.Paragraph
com.lowagie.text.Phrase
com.lowagie.text.Table
com.lowagie.text.Cell
The key classes are Document and PdfWriter. You will always use both of these classes when creating PDF documents. Document is an object-oriented representation of a PDF document. You can add content to the document by invoking methods provided by the Document class. A PdfWriter object associates a Document with a java.io.OutputStream.
Coordinate System for iText Documents
When I wrote my first iText program, I stumbled over the coordinate system. I naively assumed that iText's coordinate system was identical to Swing's coordinate system. This is not the case.
In Swing, the origin (0, 0) is located in the upper left-hand corner of a component. In iText, the origin is located in the bottom left-hand corner of a page.
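To make the difference concrete, here is a minimal sketch that converts a y-coordinate measured from the top of the page (as in Swing) into one measured from the bottom (as in iText). The class and method names are hypothetical and not part of iText, and the page height of 792 points (a US Letter page) is used only for illustration.

```java
// Hypothetical helper, not part of iText. It converts a y-coordinate from a
// top-left origin (Swing) to a bottom-left origin (iText). Units are points.
final class PageCoordinates {

    private PageCoordinates() {
    }

    // The x-coordinate is the same in both systems; only the y axis is flipped.
    static float toBottomLeftY(float pageHeight, float topLeftY) {
        return pageHeight - topLeftY;
    }

    public static void main(String[] args) {
        float letterHeight = 792f; // US Letter is 612 x 792 points
        float swingY = 72f;        // one inch below the top edge
        float pdfY = toBottomLeftY(letterHeight, swingY);
        System.out.println("y measured from the bottom edge: " + pdfY);
    }
}
```

A point near the top of a Swing component has a small y value, while the same point on an iText page has a large one.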
Using iText in a Web Application
During your design phase, you must decide how you plan to use iText. I've built web applications using both of the following techniques.
A. Create the PDF file on the server's filesystem. The application uses java.io.FileOutputStream to write the file to the server's filesystem. The user then downloads the file via HTTP.
B. Create the PDF file in memory using java.io.ByteArrayOutputStream. The application sends the PDF bytes to the client via the servlet's output stream.
I prefer technique B to technique A because the application does not write to the server's filesystem, and the application is guaranteed to work in a clustered server environment. Technique A can fail if your application runs in a clustered environment, and the server cluster does not provide session affinity.
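The two techniques differ only in the OutputStream that receives the bytes. The sketch below makes that explicit: a single hypothetical method, writeDocument, writes to whatever stream it is given. The placeholder bytes stand in for real iText output, and the class name, method names, and temp-file prefix are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the same document-writing code feeding two targets.
final class OutputTargets {

    private OutputTargets() {
    }

    // Stand-in for the code that would drive iText. It writes placeholder
    // bytes, not a real PDF.
    static void writeDocument(OutputStream out) throws IOException {
        out.write("placeholder document bytes".getBytes(StandardCharsets.US_ASCII));
    }

    // Technique A: the bytes land in a file on the server's filesystem.
    static File writeToTempFile() {
        try {
            File file = File.createTempFile("report", ".pdf");
            file.deleteOnExit();
            try (OutputStream out = new FileOutputStream(file)) {
                writeDocument(out);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Technique B: the bytes stay in memory and nothing touches the disk.
    static byte[] writeToMemory() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            writeDocument(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With technique B, the request that generates the bytes is also the request that returns them, so no later request has to find a file on a particular server.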
Our example application consists of a single class: PDFServlet. This servlet uses technique B from the previous section. Because the OutputStream is a ByteArrayOutputStream, the PDF document bytes will be in memory. When PDFServlet receives an HTTP request, it will dynamically generate a PDF document and send the document to the client.
The PDFServlet class extends javax.servlet.http.HttpServlet and imports two of the iText packages.
Most servlets override either the doPost method or the doGet method. Our servlet is no different. The PDFServlet class overrides the doGet method. The servlet will generate a PDF file any time it receives an incoming HTTP GET request.
In a nutshell, the servlet's doGet method does the following:
- Creates a ByteArrayOutputStream object that contains the PDF document bytes.
- Sets the HTTP response headers on the response object.
- Gets the servlet output stream.
- Writes the document bytes to the servlet output stream.
- Flushes the servlet output stream.
Figure 1. Editing doGet in Eclipse
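Running the real doGet requires a servlet container, so the sketch below walks through the same five steps using the JDK's built-in com.sun.net.httpserver package as a stand-in. This is not the servlet code from the example application: the class name, the /pdf path, the placeholder bytes, and the header values are all assumptions chosen for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a servlet's doGet, using only JDK classes.
final class PdfResponseSketch {

    private PdfResponseSketch() {
    }

    // Placeholder for the bytes that iText would produce (not a real PDF).
    static ByteArrayOutputStream generateDocumentBytes() {
        byte[] placeholder =
            "placeholder, not a real PDF".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(placeholder, 0, placeholder.length);
        return baos;
    }

    static void handle(HttpExchange exchange) throws IOException {
        // 1. Create the in-memory document bytes.
        ByteArrayOutputStream baos = generateDocumentBytes();

        // 2. Set the response headers before any body bytes are written.
        //    The header values here are illustrative assumptions.
        exchange.getResponseHeaders().set("Content-Type", "application/pdf");
        exchange.getResponseHeaders().set("Cache-Control", "max-age=30");
        exchange.sendResponseHeaders(200, baos.size());

        // 3. Get the response output stream.
        try (OutputStream body = exchange.getResponseBody()) {
            // 4. Write the document bytes to it.
            baos.writeTo(body);
            // 5. Flush the stream.
            body.flush();
        }
    }

    // Starts a server on a free loopback port. Call stop(0) when done.
    static HttpServer start() {
        try {
            HttpServer server =
                HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/pdf", PdfResponseSketch::handle);
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a servlet, the corresponding calls are made on the HttpServletResponse object that the container passes to doGet.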
The generatePDFDocumentBytes method is responsible for creating the PDF document. The three most important objects in this method are the Document object, the ByteArrayOutputStream object, and the PdfWriter object. The PdfWriter associates the Document with the ByteArrayOutputStream.
    Document doc = new Document();
    ByteArrayOutputStream baosPDF = new ByteArrayOutputStream();
    PdfWriter docWriter = null;
    docWriter = PdfWriter.getInstance(doc, baosPDF);
    // ...
Adding content to a Document is done with the add method.
    doc.add(new Paragraph(
        "This document was created by a class named: "
        + this.getClass().getName()));
    doc.add(new Paragraph(
        "This document was created on "
        + new java.util.Date()));
When you are done adding content, close the Document. After closing the document, the ByteArrayOutputStream object is returned to the caller. The ByteArrayOutputStream contains all bytes for the PDF document.
HTTP Response Headers
In this application, we care only about four HTTP response headers, among them Cache-control. If you've never worked with HTTP headers before, consult the HTTP 1.1 specification. Look at the doGet method in the PDFServlet class. You'll notice that the HTTP response headers are set before any data is written to the servlet output stream. This is an important, yet subtle, point.
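The reason is visible in the layout of an HTTP/1.1 response on the wire: the status line and header lines come first, then a blank line, then the body. Once body bytes have been sent, the header block has already gone out. The sketch below assembles such a response by hand. The class name, method name, and header values are hypothetical, and in a real application the servlet container does this work for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the byte layout of an HTTP/1.1 response.
final class WireFormatSketch {

    private WireFormatSketch() {
    }

    static byte[] buildResponse(Map<String, String> headers, byte[] body) {
        // The status line and headers form the first part of the message.
        StringBuilder head = new StringBuilder("HTTP/1.1 200 OK\r\n");
        for (Map.Entry<String, String> header : headers.entrySet()) {
            head.append(header.getKey())
                .append(": ")
                .append(header.getValue())
                .append("\r\n");
        }
        // A blank line marks the end of the headers.
        head.append("\r\n");

        byte[] headBytes = head.toString().getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        // Everything after the blank line is body.
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = "body".getBytes(StandardCharsets.US_ASCII);
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "max-age=30"); // illustrative value
        headers.put("Content-Length", String.valueOf(body.length));
        byte[] response = buildResponse(headers, body);
        System.out.print(new String(response, StandardCharsets.US_ASCII));
    }
}
```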
Let's look at each response header in more detail.