Reading and Writing Excel Files with POI
Pages: 1, 2
Reading Document Properties with HPSF
Users of Microsoft Word, Excel, or PowerPoint can attach additional information to their documents using the File->Properties... menu. Properties include a document's title, a summary, a category, and keywords. The application itself adds further information: the last author, the date and time of the last modification, the date and time the document was printed, and so on.
Properties are stored separately from the document itself. As you know, an
OLE 2 CDF file is a container consisting of directories and files internally,
and POIFS gives you access to these files. The files are also called streams.
Document properties are stored in streams of their own within a POIFS
filesystem. Consider a Word document: while you see a simple file named
myfile.doc on your hard disk, internally, it consists of a file
called WordDocument and two files called
SummaryInformation and DocumentSummaryInformation.
There are usually other files, but we won't care about them here.
Can you guess what these streams contain? Yes, WordDocument
contains the text you edited with Word. The document properties are stored in
the SummaryInformation and DocumentSummaryInformation
streams. It would be too simple to have all of the properties in a single stream, so
Microsoft decided to use two of them. To make things even more complicated,
these streams' names start with the octal \005 character. This is
a non-printable character and is omitted above.
The nice thing with the standard Microsoft-defined properties is that they don't care whether the main document is a Word document, an Excel file, a PowerPoint presentation, or some other document. Once you know how to read the properties of an Excel file, you can retrieve the properties of all other documents, too.
Luckily, you don't have to figure out too much, because your Java program can extract these information using POI's HPSF package. HPSF stands for Horrible Property Set Format and is POI's implementation for reading properties. Writing is not yet supported.
HPSF has a simple API for reading Microsoft's standard properties and a more complicated but general API for reading any sort of property sets. We'll focus on the simple API here because it should suffice for most applications.
A Sample Application
Let's have a look at a sample Java program that reads the "title" property of an OLE 2 CDF document:
import java.io.*;
import org.apache.poi.hpsf.*;
import org.apache.poi.poifs.eventfilesystem.*;
/**
* <p>Sample application showing how to read a OLE 2 document's
* title. Call it with the document's file name as command line
* parameter.</p>
*
* @author Rainer Klute (klute@rainer-klute.de)
*/
public class ReadTitle
{
public static void main(String[] args) throws IOException
{
final String filename = args[0];
POIFSReader r = new POIFSReader();
r.registerListener(new MyPOIFSReaderListener(),
"\005SummaryInformation");
r.read(new FileInputStream(filename));
}
static class MyPOIFSReaderListener implements POIFSReaderListener
{
public void processPOIFSReaderEvent(POIFSReaderEvent event)
{
SummaryInformation si = null;
try
{
si = (SummaryInformation)
PropertySetFactory.create(event.getStream());
}
catch (Exception ex)
{
throw new RuntimeException
("Property set stream \"" + event.getPath() +
event.getName() + "\": " + ex);
}
final String title = si.getTitle();
if (title != null)
System.out.println("Title: \"" + title + "\"");
else
System.out.println("Document has no title.");
}
}
}
The main() method uses POIFS' eventing filesystem to read a
stream named \005SummaryInformation from the OLE 2 document named
on the command line. When the POIFSReader encounters this stream,
it transfers control to the processPOIFSReaderEvent() method of
MyPOIFSReaderListener. You've already learned the details of this mechanism from the first article of this series.
What does processPOIFSReaderEvent() do? As a parameter, it
essentially gets an input stream containing some of the document's properties,
including the title. To access the properties, the application must create a
PropertySet instance from the contents of the input stream, as in
the following statement:
si = (SummaryInformation) PropertySetFactory.create(event.getStream());
The statement consists of three steps:
event.getStream()retrieves the input stream from thePOIFSReaderEventpassed in by thePOIFSReader.- The class
PropertySetFactory's static methodcreate()is called with the input stream as parameter. As the name implies,PropertySetFactoryis a factory class with a "machine" to transform an input stream into aPropertySetinstance. This machine is thecreate()method. - We cast the
PropertySetreturned fromcreate()toSummaryInformation. ThePropertySetclass has all of the mechanisms to read property sets in general.SummaryInformationis a subclass ofPropertySetwith special knowledge about Microsoft's standard properties.
Since several things can go wrong, the statement is encapsulated in a try
block. The sample application simply catches all exceptions. A production
program will likely differentiate between several possible causes for
exceptions. Besides any I/O exceptions, an HPSF-specific exception could occur.
For example, the NoPropertySetStreamException is thrown if the
input stream does not contain a property set or if the property set is
faulty.
A quite unlikely but still possible error is that the
\005SummaryInformation stream contains a valid property set, but
not the summary information property set. If this should ever happen, the cast
to SummaryInformation is not possible and a
ClassCastException is thrown.
Once the SummaryInformation instance is established, the rest
is simple. The application just calls getTitle() and prints the
result.
Besides getTitle(), SummaryInformation contains
other convenience methods, including getApplicationName(),
getAuthor(), getCharCount(), and
getCreateDateTime(). The HPSF Javadoc documentation describes them in
more detail.
The Document Summary Information
Unfortunately, not all properties reside in the summary information property
set. Many, but not all, OLE 2 files have an additional property set. It is
called the document summary information and resides in a stream named
\005DocumentSummaryInformation. This property set holds properties
like the document's category, the number of multimedia clips in a PowerPoint
presentation, and more.
To access this property set, your application should proceed as shown above,
except that you should register for the
\005DocumentSummaryInformation stream. You probably want to
register for both summary information and document summary information. As
before, you should also hand over the input stream containing the document
summary information to PropertySetFactory.create(). However, the
factory method will return a DocumentSummaryInformation
object, not a SummaryInformation instance. If you registered for
both types of property sets, you should check the type of the returned object.
Either use Java's instanceof operator or the query methods
isSummaryInformation() and
isDocumentSummaryInformation(). Remember that
create() always returns a PropertySet, so you can
call these methods on it. The PropertySet class provides two
methods, because the property set could be custom-defined .
More Property Sets
If you want to deal with custom-defined property sets, or if you want to read user-defined properties from the standard property sets, you will have to use the general property set API. This is more complicated than the API outlined above and is not covered by this article. To learn about it, you should read the HPSF HOW-TO in the POI documentation and consult the Javadocs.
Conclusion
This article covered HSSF and how to output to Excel. We also covered HPSF and how to read property sets and document summary information. Next time, we'll show how to transform from XML to Excel format using the HSSF Serializer and Cocoon, as well as the current HDF and Word format.
References
Avik Sengupta is a committer on the Jakarta POI project. He is currently Director of Technology at Itellix Software Solutions.
Andrew Oliver is a professional cat herder who moonlights as a software developer, and runs a small consultancy with an ugly retro-looking web site that one day will be updated when he and his compadres aren't too busy working with clients.
Rainer Klute is a committer on the Jakarta POI project.
Appendix 1
Example 1. To create and return a workbook from a servlet
package org.apache.poi.hssf.usermodel.examples;
import java.io.*;
import java.net.*;
import javax.servlet.*;
import javax.servlet.http.*;
import org.apache.poi.hssf.usermodel.*;
public class HSSFCreate extends HttpServlet {
public void init(ServletConfig config) throws ServletException {
super.init(config);
}
public void destroy() {
}
/** Processes requests for both HTTP GET and POST methods.
* @param request servlet request
* @param response servlet response
*/
protected void processRequest(HttpServletRequest request,
HttpServletResponse response) throws ServletException, IOException {
response.setContentType("application/vnd.ms-excel");
HSSFWorkbook wb = new HSSFWorkbook();
HSSFSheet sheet = wb.createSheet("new sheet");
// Create a row and put some cells in it. Rows are 0 based.
HSSFRow row = sheet.createRow((short)0);
// Create a cell and put a value in it.
HSSFCell cell = row.createCell((short)0);
cell.setCellValue(1);
// Or do it on one line.
row.createCell((short)1).setCellValue(1.2);
row.createCell((short)2).setCellValue("This is a string");
row.createCell((short)3).setCellValue(true);
// Write the output
OutputStream out = response.getOutputStream();
wb.write(out);
out.close();
}
/** Handles the HTTP <code>GET</code> method.
* @param request servlet request
* @param response servlet response
*/
protected void doGet(HttpServletRequest request,
HttpServletResponse response) throws ServletException, IOException {
processRequest(request, response);
}
/** Handles the HTTP POST method.
* @param request servlet request
* @param response servlet response
*/
protected void doPost(HttpServletRequest request,
HttpServletResponse response) throws ServletException, IOException {
processRequest(request, response);
}
/** Returns a short description of the servlet.
*/
public String getServletInfo() {
return "Example to create a workbook in a servlet using HSSF";
}
}
Return to ONJava.com.