ONJava.com -- The Independent Source for Enterprise Java
oreilly.comSafari Books Online.Conferences.

advertisement

AddThis Social Bookmark Button

Reading and Writing Excel Files with POI

by Avik Sengupta, Andrew Oliver and Rainer Klute
04/16/2003

In our previous article, we discussed what the POI project is all about, showed how to read and write OLE 2 Compound Document files, and gave a brief history of the POI project. Probably half of the folks who read that article are scratching their heads now, thinking "How do I write out a spreadsheet?" Good news! In this article we'll show you how to read and write Excel files and how to read any Microsoft file's document properties.

Conventions

The POI project is nearing a 2.0 release and is in a stage of rapid development, with new features and changes integrating nightly. In order to keep this article relevant, we'll refer to the recent 1.9-development release. While compiling the sources with later releases should work, there may be minor changes between now and the 2.0 release.

Excel Basics

The Microsoft Excel 97 file format is also known as "BIFF8." Recent versions of Excel have changed very little about this file format, and writing out the new intricacies would serve no purpose other than to make everyone upgrade. So when we say Excel 97 format, we mean Excel 97-to-XP format.

Related Reading

Head First Java
Your Brain on Java - A Learner's Guide
By Bert Bates, Kathy Sierra

HSSF

Our implementation of the Excel 97 file format is called "HSSF," which stands for, you guessed it, Horrible SpreadSheet Format. (We admire their method of making simple things complicated and oversimplifying things that should have been done with more flexibility.) HSSF may have a comical name, but is a very serious API. HSSF lets you read, write, and modify Excel files using nothing but Java.

What does HSSF have to do with POIFS, which was covered in the previous article? Like all of the other POI APIs, it is built on top of POIFS. So there is code contained within HSSF that is very similar to the examples included with the previous article. Generally, however, you don't need to know about POIFS APIs when writing to HSSF APIs.

HSSF APIs

HSSF has two APIs for reading: usermodel and eventusermodel. The former is most familiar, and the latter is more cryptic but far more efficient. The usermodel consists primarily of the classes in the org.apache.poi.hssf.usermodel package, as well as org.apache.poi.hssf.eventusermodel. (In earlier versions of HSSF, this was in the eventmodel package.) The usermodel package maps the file into familiar structures like Workbook, Sheet, Row, and Cell. It stores the entire structure in memory as a set of objects. The eventusermodel package requires you to become more familiar with the actual low-level structures of the file format. It operates in a manner similar to XML's SAX APIs or the AWT event model (the origin of the name)--and can be trickier to use. It is also read-only, so you cannot modify files using the eventusermodel.

Reading Using usermodel

Reading files using the HSSF usermodel is simple. Create a new inputstream and construct an instance of HSSFWorkbook.

InputStream myxls = new FileInputStream("workbook.xls"));
HSSFWorkbook wb     = new HSSFWorkbook(myxls);

With the HSSFWorkbook instance, you can now retrieve a sheet, its rows, and its cells:

HSSFSheet sheet = wb.getSheetAt(0);       // first sheet
HSSFRow row     = sheet.getRow(2);        // third row
HSSFCell cell   = row.getCell((short)3);  // fourth cell

This fetches the fourth cell of the third row from the first sheet in the workbook. You can retrieve a value from the cell object. Be sure to note the cell type before retrieving its value.

if (cell.getCellType() == HSSFCell.CELL_TYPE_STRING) {
	("The Cell was a String with value " + cell.getStringCellValue());
} else if (cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC) {
	("The cell was a number " + cell.getCellValue());
} else () {
	("The cell was nothing we're interested in");
}

Note that you may get an exception if you try and ask for an unrelated datatype.

Handle dates carefully while using HSSF. Excel stores all dates as numbers, internally. The only way to distinguish a date is by the formatting of the cell. (If you have ever formatted a cell containing a date in Excel, you will know what I mean.)

Therefore, for a cell containing a date, cell.getCellType() will return HSSFCell.CELL_TYPE_NUMERIC. However, you can use a utility function, HSSFDateUtil.isCellDateFormatted(cell), to check if the cell can be a date. This function checks the format against a few internal formats to decide the issue, but by its very nature it is prone to false negatives.

Appendix 1 contains a complete example of using HSSF to create and return a workbook from a servlet.

Example 1 reads in an Excel spreadsheet and converts it to comma-separated values.

Writing Files Using usermodel

Writing XLS files is even simpler. Create a new instance of HSSFWorkbook. At some point, you'll need to create an outputstream to write out the file to disk; however, this can be done at the end.

HSSFWorkbook wb          = new HSSFWorkbook();
FileOutputStream fileOut = new FileOutputStream("workbook.xls");
wb.write(fileOut);
fileOut.close();

To create sheets and rows, you must do so from the parent object. For instance:

HSSFSheet sheet = wb.createSheet();
HSSFRow row     = sheet.createRow((short)0); 
HSSFCell cell   = row.createCell((short)0); 
cell.setCellValue(1); 
row.createCell((short)1).setCellValue(1.2); 
row.createCell((short)2).setCellValue("This is a string");
row.createCell((short)3).setCellValue(true);

To style a cell, create a style and assign it to the cell. Assign that style to as many cells as should be styled in that fashion. This confuses a number of new users of HSSF, as it is currently possible to create an invalidly large number of styles. Ideally, you want to create one style for the same rules that you assign it. If you have a summary row and want to make its cells bold and underlined, create a summaryRowStyle and assign it to any cell that is on a summary row.

It is important to realize that the CellFormat and CellStyle objects are members of the workbook that are referenced by the cell.

...

HSSFCellStyle style = workbook.createCellStyle();
style.setDataFormat(HSSFDataFormat.getBuiltinFormat("($#,##0_);[Red]($#,##0)"));
style.setFillBackgroundColor(HSSFColor.AQUA.index);
style.setFillPattern(HSSFCellStyle.BIG_SPOTS);

...

someCell.setCellStyle(style);
someOtherCell.setCellStyle(style);

Newer editions of HSSF allow you use a limited set of formulas. Note that this is a beta-quality feature. You should do appropriate testing before using it.

A formula assignment looks like this:

someCell.setCellFormula(SUM(A1:A2:);

At present, you can use any built-in function or operator in the formula, except logical operators and functions (such as the IF function). This feature is being worked on even as we write this.

Modifying Files Using usermodel

Files are modified by reading and rewriting. Example 3 takes in an Excel file and changes the values based on rules specified in a .property file.

Reading Files with the eventusermodel

Reading files via the eventusermodel is much harder but is much more memory-efficient, since it expects the application to process the data as it is read. Reading in a file in this way is analogous to reading an XML document using SAX, as opposed to DOM. You have to register your interest in the structures you want, and the library will call back when such structures are encountered. Before using this model, however, you must familiarize yourself with some basic structures in an Excel workbook.

In HSSF, the low-level binary structures are called records. Records come in different types, each of which is modelled as a Java class in the org.apache.poi.hssf.record package. For example, the BOFRecord denotes the start of the Workbook or Sheet section. The RowRecord denotes that a row exists and stores its style information. Any record exposing the CellValueRecordInterface is a cell value. These include NumericRecord, LabelSSTRecord, and FormulaRecord. (There are a few more that are obsolete or for optimization, but generally speaking, HSSF converts them.)

See The Structure of an HSSF Spreadsheet for more details.

private EventRecordFactory factory = new EventRecordFactory();
factory.registerListener(new ERFListener() {
    public boolean processRecord(Record rec) {
        (got BOF Record);              
        return true;              
    }
}, new short[] {BOFRecord.sid});
factory.processRecords(someInputStream);

The Structure of an HSSF Spreadsheet

HSSF, as we mentioned, is built on top of POIFS. More to the point, an Excel 97+ file is an OLE 2 Compound Document. The underlying OLE 2 Compound Document stores a stream, or File. This "stream" is always named Workbook (except in Excel 95, which HSSF does not handle). Currently, HSSF deals strictly with this stream and delegates all functionality to POIFS. In recent builds, HSSF is aware enough to preserve these other nodes. Macros and Images are stored in separate streams and sometimes even separate directories within the OLE 2 CDF file. Macros should be preserved; however, we have no API as of yet to handle them.

Within each stream is a set of records. A record is just an array of bytes, with a header and a body. The header contains the record type (AKA the "id") and the length of the following data. The body is broken up into fields. Fields contain numeric data (including references to other records), character data, or flags.

The following is the top-level structure of an Excel Workbook:

Bla.xls {
    OLE2CDF headers
    "Workbook" stream {
        Workbook {
            Static String Table Record..
            Sheet names... and pointers
        } 
        Sheet {
            ROW
            ROW
            ...
            NUMBER RECORD (cell)
            LABELSST Record (cell)
            ...
        }
        Sheet
    }
}
... images, macros, etc.
Document Summary
Summary

Pages: 1, 2

Next Pagearrow