ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)


Making Sense of Java's Dates

by Philipp K. Janert
06/04/2003

Introduction

Proper handling of calendar dates in computer programs is hard. Not only are there obvious internationalization requirements (English: January, French: Janvier, German: Januar, etc.), but also issues regarding different calendar systems (not every culture counts years starting with the birth of Jesus Christ). If very high precision or very long time scales have to be treated properly, additional concerns need to be addressed, such as the possibility of leap seconds or calendar system changes. (The Gregorian calendar commonly used in the West was adopted only in 1582, and not by all countries on the same day!)

Amid all of the issues concerning leap seconds, time zones, daylight savings time (DST), and lunar calendars, it is easy to forget that measuring time is a very simple concept: time progresses linearly. Once an origin of the time axis has been defined, any point in time is uniquely identified by the time elapsed since the origin. Note that this is independent of the geographical location or the local time zone: for a given point in time, the duration since the origin is the same for any location (ignoring relativistic corrections).

The difficulties arise when we try to interpret this point in time according to some calendar, i.e., representing it in terms of months, days, or years. Geographical information becomes relevant at this step: the same point in time corresponds to different times of day, depending on the location (i.e., time zone). Modifications based on interpreted dates are often required (which date corresponds to the day a month from today?) and pose additional difficulties: over- and underflows (a month from Dec. 15 is next year), as well as ambiguities (which day exactly corresponds to a month from Jan. 30?).

In the original JDK 1.0, the representation for a point in time was lumped together with the responsibility to interpret it into the class java.util.Date. While relatively easy to handle, it was not amenable to internationalization. This was recognized relatively early; since JDK 1.1, the various responsibilities for handling dates have been distributed among the following classes:

java.util.Date
    Represents a point in time.

abstract java.util.Calendar
java.util.GregorianCalendar extends java.util.Calendar
    Interpretation and manipulation of Dates.

abstract java.util.TimeZone
java.util.SimpleTimeZone extends java.util.TimeZone
    Representation of an arbitrary offset from Greenwich Mean Time (GMT), including information about applicable daylight savings rules.

abstract java.text.DateFormat extends java.text.Format
java.text.SimpleDateFormat extends java.text.DateFormat
    Transformation into well-formatted, printable String and vice versa.

java.text.DateFormatSymbols
    Translation of the names of months, weekdays, etc., as an alternative to using the information from Locale.

java.sql.Date extends java.util.Date
java.sql.Time extends java.util.Date
java.sql.Timestamp extends java.util.Date
    Represent points in time, and also include proper formatting for use in SQL statements.

Note that DateFormat and related classes are in the java.text.* package. All date-handling classes in the java.sql.* package extend java.util.Date. All other classes are in the java.util.* package.

The "new" classes form three separate inheritance hierarchies, with the top-level classes (Calendar, TimeZone, and DateFormat) being abstract. For each abstract class, the Java Standard Library provides one concrete implementation.

java.util.Date

The class java.util.Date represents a point in time. In many applications, such an abstraction would be called a "TimeStamp." In the standard Java library implementation, this point in time is represented by the number of milliseconds since the start of the Unix epoch on January 1, 1970, 00:00:00 GMT. Conceptually, this class is therefore a very thin wrapper around a long.

In concordance with this interpretation, observe that the only methods in this class that are not deprecated (besides those getting and setting the number of milliseconds) are those required to allow ordering.
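The "thin wrapper around a long" view can be made concrete with a short sketch. The class name DateAsTimestamp and the helper isEarlier() are made up for illustration; only the non-deprecated Date methods are used.

```java
import java.util.Date;

class DateAsTimestamp {
    // Hypothetical helper: true if 'a' denotes an earlier point in time than 'b'.
    static boolean isEarlier(Date a, Date b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);                          // the origin of the time axis
        Date oneDayLater = new Date(24L * 60 * 60 * 1000);  // one day after the origin

        System.out.println(epoch.getTime());                // 0
        System.out.println(oneDayLater.getTime());          // 86400000
        System.out.println(epoch.before(oneDayLater));      // true
        System.out.println(isEarlier(epoch, oneDayLater));  // true
        System.out.println(epoch.equals(new Date(0L)));     // true
    }
}
```

Nothing in this sketch refers to a time zone or a locale; those only enter once the point in time is interpreted.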

This class depends on System.currentTimeMillis() to obtain the current point in time. Its accuracy and precision are therefore determined by the implementation of System and the underlying layer (essentially the OS) that it calls.

The java.util.Date API

The names and conventions used in the API of the original Date class have caused no end of confusion. While the decision to count months from 0-11 and years from 1900 mimicked the C Standard Library's convention, the decision to call the function returning the number of milliseconds since the start of the Unix epoch getTime() and the one returning the day of the month getDate() apparently was the Java class designers' own.

java.util.Calendar

Semantics

The Calendar class represents a point in time (a "Date"), interpreted appropriately for some locale and time zone. Each Calendar instance wraps a long variable containing the number of milliseconds since the epoch for the represented point in time.

This means that Calendar is neither a (stateless) transformer or interpreter, nor a factory for modified dates. It does not support idioms such as:

Month Interpreter.getMonth( inputDate )

or

Date Factory.addMonth( inputDate )

Instead, a Calendar instance must be initialized to some Date. This Calendar instance can then be modified or queried for interpreted properties.

Bizarrely, instances of this class are always initialized to the current time. It is not possible to obtain a Calendar instance initialized to an arbitrary Date — the API forces the programmer to set the date explicitly by a subsequent method call such as setTime( date ) on an existing instance.
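The two-step initialization might look like the following sketch. The class name CalendarForInstant and the helper calendarFor() are invented for this example. It also shows that the same point in time yields different interpreted fields depending on the time zone of the Calendar.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

class CalendarForInstant {
    // Hypothetical helper: a Calendar for the given zone, set to the given point in time.
    static Calendar calendarFor(Date instant, String zoneId) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone(zoneId)); // starts out at "now"
        cal.setTime(instant);                                              // explicit second step
        return cal;
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        Calendar utc = calendarFor(epoch, "UTC");
        Calendar tokyo = calendarFor(epoch, "Asia/Tokyo");

        System.out.println(utc.get(Calendar.HOUR_OF_DAY));          // 0
        System.out.println(tokyo.get(Calendar.HOUR_OF_DAY));        // 9
        System.out.println(utc.getTime().equals(tokyo.getTime()));  // true: same point in time
    }
}
```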

Access to Interpreted Fields and Class Constants

The Calendar class follows an unusual idiom for allowing access to the individual fields of the interpreted date instance. Rather than offering a number of dedicated property getters and setters (such as getMonth()), it offers only one, which takes an identifier for the requested field as argument:

int get( Calendar.MONTH ) etc.

Notice that this function always returns an int!

The identifiers for the fields are defined in the Calendar class as public static final variables. (These identifiers are raw integers, not wrapped into an enumeration abstraction.)

Besides the identifiers (or keys) for the fields, the Calendar class defines a number of additional public static final variables holding the values for the fields. So, to test whether a certain date (represented by the Calendar instance calendar) falls into the first month of the year, one would write code like this:

if( calendar.get( Calendar.MONTH ) == Calendar.JANUARY ) {...}

Note that the months are called JANUARY, FEBRUARY, etc., irrespective of locale (as opposed to more neutral names such as MONTH_1, MONTH_2, and so on). There is also a constant UNDECIMBER, representing the 13th month of the year, which is required by some (non-Gregorian) calendars.

Unfortunately, keys and values are neither distinguished by name nor by grouping into separate nested interfaces.
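Because keys and values are all plain ints, the compiler cannot catch a value passed where a key is expected. The following sketch illustrates this. The class name CalendarFields and the helper humanMonth() are made up for the example.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class CalendarFields {
    // Hypothetical helper: converts the zero-based MONTH value to the usual 1-12 numbering.
    static int humanMonth(Calendar cal) {
        return cal.get(Calendar.MONTH) + 1;
    }

    public static void main(String[] args) {
        Calendar cal = new GregorianCalendar(2003, Calendar.JUNE, 4);

        // Intended use: a key goes in, and the result is compared against a value.
        System.out.println(cal.get(Calendar.MONTH) == Calendar.JUNE);             // true
        System.out.println(cal.get(Calendar.DAY_OF_WEEK) == Calendar.WEDNESDAY);  // true
        System.out.println(humanMonth(cal));                                      // 6

        // Pitfall: the value JUNE and the key DATE are the same int (5),
        // so this compiles and silently returns the day of the month.
        System.out.println(Calendar.JUNE == Calendar.DATE);                       // true
        System.out.println(cal.get(Calendar.JUNE));                               // 4
    }
}
```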

Manipulation

The Calendar offers three ways to modify the date represented by the current instance: set(), add(), and roll(). The set() method simply sets the specified field to the desired value. The difference between add() and roll() concerns the way they treat over- and underflows: while add() propagates changes to "smaller" or "larger" fields, roll() does not. For instance, when adding a month to a Calendar instance representing Dec. 15, the year will be incremented when using add(), but left untouched when using roll(). The decision to have two different functions for either case was motivated by their possible uses in GUI situations.
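A small sketch of the difference, using the Dec. 15 case from above and the Jan. 30 ambiguity mentioned in the introduction. The class name AddVersusRoll and its helper methods are invented for illustration.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class AddVersusRoll {
    // Returns {year, month, day} after adding one month with add().
    static int[] addOneMonth(int year, int month, int day) {
        Calendar cal = new GregorianCalendar(year, month, day);
        cal.add(Calendar.MONTH, 1);
        return fields(cal);
    }

    // Returns {year, month, day} after adding one month with roll().
    static int[] rollOneMonth(int year, int month, int day) {
        Calendar cal = new GregorianCalendar(year, month, day);
        cal.roll(Calendar.MONTH, 1);
        return fields(cal);
    }

    private static int[] fields(Calendar cal) {
        return new int[] {
            cal.get(Calendar.YEAR), cal.get(Calendar.MONTH), cal.get(Calendar.DAY_OF_MONTH)
        };
    }

    public static void main(String[] args) {
        int[] added = addOneMonth(2003, Calendar.DECEMBER, 15);
        int[] rolled = rollOneMonth(2003, Calendar.DECEMBER, 15);
        int[] clamped = addOneMonth(2003, Calendar.JANUARY, 30);

        System.out.println(added[0] + " " + added[1] + " " + added[2]);       // 2004 0 15
        System.out.println(rolled[0] + " " + rolled[1] + " " + rolled[2]);    // 2003 0 15
        System.out.println(clamped[0] + " " + clamped[1] + " " + clamped[2]); // 2003 1 28
    }
}
```

The last line shows how GregorianCalendar resolves the "month from Jan. 30" question: the day is pulled back to the last valid day of February.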

The way Calendar is implemented, it contains redundant data: all of the individual fields can be computed from the number of milliseconds since the epoch given a time zone, and vice versa. The class declares the abstract methods computeFields() and computeTime() for these operations, respectively, as well as the complete() method, which performs a complete round-trip. Because there are two sets of redundant data, the two sets can get out of sync. According to the class's documentation, dependent data is recomputed lazily when changes are made. Subclasses must maintain a set of dirty flags to signal when recomputation is required.

Implementation Leakage

It has to be said that implementation details have been oozing into the APIs to an uncommon degree for the "new" date-handling classes. Up to a point, this is a reflection of their intended use as base classes for customized development, but it also seems to occasionally be a consequence of insufficient clarity in the design of the public interfaces. Whether the Calendar abstraction maintains two redundant data sets or not is properly an implementation detail, and should therefore be hidden from clients of the class. This includes clients who intend to reuse the class through inheritance, as well.

Additional Functionality

The additional functions offered by the Calendar base class fall into three categories. There are several static factory methods to obtain instances initialized for arbitrary time zones and locales. As mentioned above, all instances obtained this way have already been initialized to the current time. No factory methods are provided to obtain a Calendar instance initialized to an arbitrary point in time.

The second group of methods consists of the methods before( Object ) and after( Object ). They take arguments of type Object, thus allowing these methods to be overridden in subclasses for arbitrary types of arguments.
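One consequence of the Object-typed arguments is sketched below, assuming the behavior of current JDKs, where the base-class comparison yields false for any argument that is not a Calendar. The class name CalendarOrdering and the helper at() are made up for this example.

```java
import java.util.Calendar;
import java.util.Date;

class CalendarOrdering {
    // Hypothetical helper: a Calendar set to the given number of milliseconds since the epoch.
    static Calendar at(long millis) {
        Calendar cal = Calendar.getInstance();
        cal.setTimeInMillis(millis);
        return cal;
    }

    public static void main(String[] args) {
        Calendar earlier = at(0L);
        Calendar later = at(1000L);

        System.out.println(earlier.before(later));             // true
        System.out.println(later.after(earlier));              // true

        // A Date is accepted by the compiler, but the comparison is simply false,
        // although the Date denotes a later point in time.
        System.out.println(earlier.before(new Date(1000L)));   // false
    }
}
```

To compare a Calendar with a Date, it is safer to go through getTime() on the Calendar and compare the two Date objects.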

Finally, there are a number of functions to get and set additional properties, such as the current time zone. Among them are several methods that query the possible and actual minimum and maximum values of certain fields for the current calendar implementation.

When Does the Week Begin?

The documentation on the Calendar classes devotes considerable text to the proper counting of weeks in a month or year. Which weekday is considered the beginning of the week differs from country to country. In the U.S., a week is commonly considered to start on Sunday. In parts of Europe, a week starts on Monday and ends on Sunday. This can affect which week is considered the first full week of the year (or month) and also the counting of weeks throughout the year.
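The locale dependence can be queried directly, as in this brief sketch. The class name WeekStart and the helper firstDayOfWeek() are invented; the values printed for the minimal-days setting depend on the locale data shipped with the JDK, so no expected output is given for them.

```java
import java.util.Calendar;
import java.util.Locale;

class WeekStart {
    // Hypothetical helper: the weekday constant that starts the week in the given locale.
    static int firstDayOfWeek(Locale locale) {
        return Calendar.getInstance(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        System.out.println(firstDayOfWeek(Locale.US) == Calendar.SUNDAY);       // true
        System.out.println(firstDayOfWeek(Locale.GERMANY) == Calendar.MONDAY);  // true

        // How many days of a week must fall into the new year for it to count as week 1.
        System.out.println(Calendar.getInstance(Locale.US).getMinimalDaysInFirstWeek());
        System.out.println(Calendar.getInstance(Locale.GERMANY).getMinimalDaysInFirstWeek());
    }
}
```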

java.util.GregorianCalendar

The class GregorianCalendar is the only commonly available subclass of Calendar. It provides an implementation of the basic Calendar abstraction suitable for the interpretation of dates according to the conventions used commonly in the West. It adds a number of public constructors, as well as some functions specific to Gregorian Calendars, such as isLeapYear().
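A short sketch of these additions follows; the class name GregorianBasics and the static wrapper isLeapYear() are made up for illustration. Note that the public constructors take interpreted field values (year, month, day), not a Date.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class GregorianBasics {
    // Hypothetical wrapper: isLeapYear() is an instance method, so an instance is needed.
    static boolean isLeapYear(int year) {
        return new GregorianCalendar().isLeapYear(year);
    }

    public static void main(String[] args) {
        GregorianCalendar cal = new GregorianCalendar(2003, Calendar.JUNE, 4);

        System.out.println(cal.get(Calendar.YEAR));         // 2003
        System.out.println(cal.get(Calendar.DAY_OF_YEAR));  // 155

        System.out.println(isLeapYear(2000));  // true  (divisible by 400)
        System.out.println(isLeapYear(1900));  // false (divisible by 100, but not by 400)
        System.out.println(isLeapYear(2003));  // false
    }
}
```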

java.util.TimeZone and java.util.SimpleTimeZone

The TimeZone class and its subclasses are auxiliary classes, required by Calendar to interpret dates according to the selected time zone. Semantically, a time zone specifies a certain offset to be added to GMT to reach the local time. Clearly, this offset changes when daylight savings time (DST) is in effect. The TimeZone abstraction therefore needs to keep track not only of the additional offset to be applied if DST is in effect, but also of the rules that determine when DST is in effect, in order to calculate the local time for any given date and time.

The abstract base class TimeZone provides basic methods to handle "raw" (without taking DST into account) and actual offsets (in milliseconds!), but implementation of any functionality related to DST rules is left to subclasses, such as SimpleTimeZone. The latter class provides several ways to specify rules controlling the beginning and ending of DST, such as giving an explicit day in a month or a certain weekday following a given date. Each TimeZone also has a human-readable, locale-dependent display name. Display names come in two styles: LONG and SHORT.
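The distinction between raw and actual offsets can be sketched as follows, assuming the zone "America/New_York" is installed (as it is in typical JDKs). The class name ZoneOffsets and the helpers midnightGmt() and offsetHours() are invented for the example.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class ZoneOffsets {
    static final long HOUR_MILLIS = 60L * 60 * 1000;

    // Hypothetical helper: milliseconds since the epoch for midnight GMT on the given day.
    static long midnightGmt(int year, int month, int day) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.clear();                 // reset all fields, including the time of day
        cal.set(year, month, day);
        return cal.getTimeInMillis();
    }

    // Actual offset from GMT, in whole hours, that applies at the given instant.
    static long offsetHours(TimeZone zone, long millis) {
        return zone.getOffset(millis) / HOUR_MILLIS;
    }

    public static void main(String[] args) {
        TimeZone newYork = TimeZone.getTimeZone("America/New_York");
        long winter = midnightGmt(2003, Calendar.JANUARY, 15);
        long summer = midnightGmt(2003, Calendar.JULY, 15);

        System.out.println(newYork.getRawOffset() / HOUR_MILLIS);      // -5
        System.out.println(offsetHours(newYork, winter));              // -5
        System.out.println(offsetHours(newYork, summer));              // -4
        System.out.println(newYork.inDaylightTime(new Date(summer)));  // true

        // Locale-dependent display name in the LONG style.
        System.out.println(newYork.getDisplayName(false, TimeZone.LONG));
    }
}
```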

Time zones are unambiguously determined by an identifier string. The base class provides the static method String[] getAvailableIDs() to obtain all installed "well-known" standard time zones. (There are 557 for my installation, using JDK 1.4.1.) The JavaDoc defines the proper syntax to build custom time zone identifiers, if the need arises. Also provided are static factory methods, to obtain TimeZone instances — either for a specific ID or the default for the current location. SimpleTimeZone also provides some public constructors and, surprisingly for an abstract class, so does TimeZone. (The JavaDoc states: "For invocation by subclass constructors." Apparently, it should have been declared protected.)
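A brief sketch of working with identifiers; the class name ZoneIds and the helper isInstalled() are made up. One detail worth knowing is that the factory method does not fail on an unknown identifier but quietly falls back to GMT.

```java
import java.util.TimeZone;

class ZoneIds {
    // Hypothetical helper: true if the ID is among the zones installed on this JVM.
    static boolean isInstalled(String id) {
        for (String known : TimeZone.getAvailableIDs()) {
            if (known.equals(id)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The count varies with the JDK version and its time zone data.
        System.out.println(TimeZone.getAvailableIDs().length);
        System.out.println(isInstalled("Europe/Berlin"));

        // A custom identifier, built according to the syntax given in the JavaDoc.
        TimeZone custom = TimeZone.getTimeZone("GMT+02:00");
        System.out.println(custom.getRawOffset());                       // 7200000

        // An identifier that cannot be understood silently yields GMT.
        System.out.println(TimeZone.getTimeZone("Not/AZone").getID());   // GMT

        // The default zone for the current location.
        System.out.println(TimeZone.getDefault().getID());
    }
}
```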

java.text.DateFormat

While Calendar and related classes handle the locale-specific interpretation of dates, the DateFormat classes assist with the transformation of dates to and from human-readable strings. When representing points in time, an additional localization issue arises: not only the language, but also the date format is locale-dependent (U.S.: Month/Day/Year, Germany: Day.Month.Year, etc.). The DateFormat utility tries to manage these differences for the application programmer.

The abstract base class DateFormat does not require (and does not permit) the definition of arbitrary, programmer-defined date formats. Instead, it defines four different format styles: SHORT, MEDIUM, LONG, and FULL (in increasing order of verbosity). Given a locale and a style, the programmer can rely on the class to use an appropriate date format.

The abstract base class DateFormat does not define static methods for formatting (date -> text) or parsing (text -> date). Instead, it defines several static factory methods to obtain instances (of concrete subclasses) initialized for a given locale and a chosen style. Since the standard formats always include both date and time, additional factory methods are available to obtain instances treating only the time or date part. The String format( Date ) and Date parse( String ) methods then perform the transformation. Note that concrete subclasses may choose to break this idiom.
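The factory idiom might be used as in the following sketch. The class name LocalizedDates and its two helpers are invented for illustration. The formatter's time zone is pinned to UTC only to make the example reproducible, and the exact strings produced depend on the locale data of the JDK, so they are printed without an expected value.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class LocalizedDates {
    // Hypothetical helper: a date-only formatter in the LONG style, fixed to UTC.
    private static DateFormat longDateFormat(Locale locale) {
        DateFormat format = DateFormat.getDateInstance(DateFormat.LONG, locale);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    static String longDate(Date date, Locale locale) {
        return longDateFormat(locale).format(date);
    }

    static Date parseLongDate(String text, Locale locale) {
        try {
            return longDateFormat(locale).parse(text);
        } catch (ParseException exc) {
            throw new IllegalArgumentException("Unparseable date: " + text, exc);
        }
    }

    public static void main(String[] args) {
        Date day = new Date(1054684800000L);  // midnight UTC on June 4, 2003

        String american = longDate(day, Locale.US);
        String german = longDate(day, Locale.GERMANY);
        System.out.println(american);
        System.out.println(german);

        // Parsing with the same locale and style recovers the original point in time.
        System.out.println(parseLongDate(american, Locale.US).equals(day));  // true
    }
}
```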

The Calendar object used internally to interpret dates is accessible and can be modified, as are the employed TimeZone and NumberFormat objects. However, the locale and style can no longer be changed once the DateFormat has been instantiated.

Also available are (abstract) methods for piece-wise parsing or formatting, taking an additional ParsePosition or FieldPosition argument, respectively. There are two versions for each of these methods. One takes or returns a Date instance and the other takes or returns a general Object, to allow handling of alternatives to Date in subclasses. The class defines several public static variables with names ending in _FIELD to identify the various possible fields for use with FieldPosition (cf. the JavaDoc for java.text.Format).

The only commonly available concrete subclass of DateFormat is SimpleDateFormat. It provides all of the aforementioned functionality, additionally allowing the definition of arbitrary date-formatting patterns. There is a rich syntax to specify formatting patterns; the JavaDoc gives the full details. The pattern can be specified as an argument to the constructors of this class or set explicitly.
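As a rough sketch of piece-wise parsing with a programmer-defined pattern, the following picks a date out of the middle of a longer string. The class name PiecewiseParsing, the helper parseDateAt(), and the sample log line are made up. Unlike parse( String ), this variant signals failure by returning null rather than by throwing an exception.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class PiecewiseParsing {
    // Hypothetical helper: parses a yyyy-MM-dd date starting at 'offset' within 'text'.
    // Returns the index just past the parsed date, or -1 if nothing could be parsed there.
    static int parseDateAt(String text, int offset) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        ParsePosition position = new ParsePosition(offset);
        Date parsed = format.parse(text, position);  // null on failure, no exception
        return parsed == null ? -1 : position.getIndex();
    }

    public static void main(String[] args) {
        String line = "Logged on 2003-06-04 by admin";

        System.out.println(parseDateAt(line, 10));  // 20: the date occupies indices 10-19
        System.out.println(parseDateAt(line, 0));   // -1: "Logged" is not a date
    }
}
```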

Printing a Timestamp: A Cut-and-Paste Example

Imagine you want to print the current time in a user-defined format; for instance, to a log file. Here is how to do this:

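import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Create a formatter with the following pattern: Hour(0-23):Minute:Second
SimpleDateFormat formatter = new SimpleDateFormat( "HH:mm:ss" );
Date now                   = new Date();
String logEntry            = formatter.format( now );

// To read the string back in
// (only the time of day is recovered; the date part defaults to January 1, 1970)
try {
    Date sometime = formatter.parse( logEntry );
} catch ( ParseException exc ) {
    exc.printStackTrace();
}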

Note the ParseException that needs to be caught. It is thrown when the beginning of the input string cannot be parsed.

The Classes in java.sql.*

The date-and-time-handling classes in the java.sql.* package all extend java.util.Date. The fact that there are three of them reflects the need to model the three standard SQL92 types DATE, TIME, and TIMESTAMP.

Like java.util.Date, all three classes in the SQL package are thin wrappers around a numeric value representing a point in time. The Date and Time classes ignore the information regarding the time of day or the calendar date, respectively.

The Timestamp class, however, not only includes the usual time and date information up to millisecond precision, but also allows storing additional data to accurately represent a point in time with nanosecond precision. (A nanosecond is a billionth of a second.)

Besides shadowing the corresponding SQL datatypes, these classes handle transformations to and from SQL-conforming String representations. To this end, each of the three classes overrides the toString() method. Furthermore, each class provides a static factory method, valueOf( String ), which returns an instance of the class that it has been invoked on, initialized to the time value represented by the String passed to it. The format of the String representation for all of these methods is fixed by the SQL standard and cannot be changed by the programmer.

The additional data required to store nanosecond information has not been very well integrated with the rest of the data representing the usual time and date information in the Timestamp class. For example, calling getTime() on a Timestamp instance will return the number of milliseconds since the start of the Unix epoch, ignoring the nanosecond data. Similarly, according to the JavaDoc, the hashCode() method has not been overridden in the subclass, and therefore also ignores the nanosecond data.
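The string conversions and the separate handling of nanoseconds can be seen in a small sketch. The class name TimestampNanos and the helper millisPart() are invented. The getTime() result shown assumes a current JDK, where the value carries millisecond precision and everything finer is available only through getNanos().

```java
import java.sql.Timestamp;

class TimestampNanos {
    // Hypothetical helper: the millisecond-of-second portion visible through getTime().
    static long millisPart(Timestamp ts) {
        return ts.getTime() % 1000;
    }

    public static void main(String[] args) {
        // valueOf() expects the fixed SQL format; the value is read in the default time zone.
        Timestamp ts = Timestamp.valueOf("2003-06-04 12:30:45.123456789");

        System.out.println(ts);              // 2003-06-04 12:30:45.123456789
        System.out.println(ts.getNanos());   // 123456789
        System.out.println(millisPart(ts));  // 123: anything below a millisecond is cut off

        // The other two classes follow the same valueOf()/toString() idiom.
        System.out.println(java.sql.Date.valueOf("2003-06-04"));  // 2003-06-04
        System.out.println(java.sql.Time.valueOf("12:30:45"));    // 12:30:45
    }
}
```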

The JavaDoc for java.sql.Timestamp states that the "inheritance relationship (...) really denotes implementation inheritance, and not type inheritance," but even this statement is incorrect, since Java has no notion of private (i.e. implementation) inheritance. Instead of inheriting from java.util.Date, all of the classes in the java.sql.* package should have been designed to encapsulate a java.util.Date object, exposing only the methods required — at the very least, methods such as hashCode() should have been properly overridden.

A final comment concerns the handling of time zones by the database engine. The classes in the java.sql.* package do not allow one to specify the intended time zone explicitly. Database servers (or drivers) are free to interpret this information as being valid in the server's local time zone, which may be subject to change (for instance, due to daylight savings time).

Summary

From the foregoing discussion it should be clear that Java's date-handling classes are not just complicated, but also poorly designed. Encapsulation is leaky, the APIs are baroque and not well-organized, and uncommon idioms are employed frequently for no good reason. The implementation holds additional surprises (I suggest a look at the actual type of the object returned from Calendar.getInstance( Locale ) for all available locales!) On the other hand, the classes manage to treat all of the difficulties inherent in internationalized date handling and, in any case, are here to stay. I hope that this article was a little contribution in helping to clarify their proper usage.

Call Me By My True Names

As a last example of the wonderful consistency and orthogonality of Java's APIs, I would like to list three (maybe there are more!) different methods to obtain the number of milliseconds since the start of the Unix epoch:

  • long java.util.Date.getTime()
  • long java.util.Calendar.getTimeInMillis() (Public only since JDK 1.4; it was protected before. Note that java.util.Calendar.getTime() returns a Date object!)
  • long java.lang.System.currentTimeMillis()
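Side by side, the three calls listed above look like this. The class name EpochMillis and the two small wrappers are made up for the sketch.

```java
import java.util.Calendar;
import java.util.Date;

class EpochMillis {
    static long viaDate(Date date) {
        return date.getTime();
    }

    static long viaCalendar(Calendar cal) {
        return cal.getTimeInMillis();
    }

    public static void main(String[] args) {
        // The current time, straight from the system.
        System.out.println(System.currentTimeMillis());

        Date date = new Date(1000L);
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);

        System.out.println(viaDate(date));                  // 1000
        System.out.println(viaCalendar(cal));               // 1000
        System.out.println(cal.getTime().equals(date));     // true: getTime() yields a Date here
    }
}
```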

Acknowledgements

The author would like to thank Wilhelm Fitzpatrick (Seattle) for a careful reading of the manuscript and valuable comments.

Philipp K. Janert is a software project consultant, server programmer, and architect.


Copyright © 2009 O'Reilly Media, Inc.