ONJava.com -- The Independent Source for Enterprise Java

Internationalization, Part 1

Character Encodings

Text representation has traditionally been one of the most difficult problems of internationalization. Java, however, solves this problem quite elegantly and hides the difficult issues. Java uses Unicode internally, so it can represent essentially any character in any commonly used written language. As I noted earlier, the remaining task is to convert Unicode to and from locale-specific encodings. Java includes quite a few internal byte-to-char and char-to-byte converters that handle converting locale-specific character encodings to Unicode and vice versa. Although the converters themselves are not public, they are accessible through the InputStreamReader and OutputStreamWriter classes, which are character streams included in the java.io package.



Any program can automatically handle locale-specific encodings simply by using these character stream classes for its textual input and output. Note that the FileReader and FileWriter classes use these streams to automatically read and write text files that use the platform's default encoding.
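As a rough illustration of the two conversion directions, the sketch below pushes a short string through an OutputStreamWriter and reads it back through an InputStreamReader, using in-memory byte arrays instead of files. The class name EncodingRoundTrip and its helper methods are hypothetical and are not part of the chapter's examples; the stream classes and their (stream, encoding-name) constructors are the standard java.io ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical demo class; not part of the original examples. */
class EncodingRoundTrip {
    /** Convert Unicode text to bytes in the named encoding. */
    static byte[] encode(String text, String encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, encoding)) {
            w.write(text);
        } catch (IOException e) {   // also covers UnsupportedEncodingException
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // the writer was flushed when it closed
    }

    /** Convert bytes in the named encoding back to Unicode text. */
    static String decode(byte[] data, String encoding) {
        StringBuilder text = new StringBuilder();
        try (Reader r = new InputStreamReader(
                 new ByteArrayInputStream(data), encoding)) {
            char[] buffer = new char[256];
            int len;
            while ((len = r.read(buffer)) != -1) text.append(buffer, 0, len);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";  // "cafe" with an acute accent on the e
        // The same four characters occupy a different number of bytes
        // depending on the encoding chosen for output.
        System.out.println(encode(text, "ISO-8859-1").length);  // 4
        System.out.println(encode(text, "UTF-8").length);       // 5
        // Decoding with the matching encoding restores the original text.
        System.out.println(decode(encode(text, "UTF-8"), "UTF-8").equals(text));
    }
}
```

Decoding with an encoding other than the one used for writing would typically yield garbled characters, which is why a conversion program needs to be told both encodings.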

Example 8-2 shows a simple program that works with character encodings. It converts a file from one specified encoding to another by converting from the first encoding to Unicode and then from Unicode to the second encoding. Note that most of the program is taken up with the mechanics of parsing argument lists, handling exceptions, and so on. Only a few lines are required to create the InputStreamReader and OutputStreamWriter objects that perform the two halves of the conversion. Also note that exceptions are handled by calling LocalizedError.display( ). This method is not part of the Java API; it is a custom method shown in Example 8-5 at the end of this chapter.

Example 8-2. ConvertEncoding.java

package je3.i18n;
import java.io.*;

/** A program to convert from one character encoding to another */
public class ConvertEncoding {
    public static void main(String[  ] args) {
        String from = null, to = null;
        String infile = null, outfile = null;
        for(int i = 0; i < args.length; i++) { // Parse command-line arguments.
            if (i == args.length-1) usage( );   // All args require another.
            if (args[i].equals("-from")) from = args[++i];
            else if (args[i].equals("-to")) to = args[++i];
            else if (args[i].equals("-in")) infile = args[++i];
            else if (args[i].equals("-out")) outfile = args[++i];
            else usage( );
        }

        try { convert(infile, outfile, from, to); }  // Attempt conversion.
        catch (Exception e) {                        // Handle exceptions.
            LocalizedError.display(e);  // Defined at the end of this chapter.
            System.exit(1);
        }
    }

    public static void usage( ) {
        System.err.println("Usage: java ConvertEncoding <options>\n" +
                           "Options:\n\t-from <encoding>\n\t" + 
                           "-to <encoding>\n\t" +
                           "-in <file>\n\t-out <file>");
        System.exit(1);
    }

    public static void convert(String infile, String outfile,
                               String from, String to)
              throws IOException, UnsupportedEncodingException
    {
        // Set up byte streams.
        InputStream in;
        if (infile != null) in = new FileInputStream(infile);
        else in = System.in;
        OutputStream out;
        if (outfile != null) out = new FileOutputStream(outfile);
        else out = System.out;
        
        // Use default encoding if no encoding is specified.
        if (from == null) from = System.getProperty("file.encoding");
        if (to == null) to = System.getProperty("file.encoding");
        
        // Set up character streams.
        Reader r = new BufferedReader(new InputStreamReader(in, from));
        Writer w = new BufferedWriter(new OutputStreamWriter(out, to));
        
        // Copy characters from input to output.  The InputStreamReader
        // converts from the input encoding to Unicode, and the
        // OutputStreamWriter converts from Unicode to the output encoding.
        // Characters that cannot be represented in the output encoding are
        // output as '?'
        char[  ] buffer = new char[4096];
        int len;
        while((len = r.read(buffer)) != -1)  // Read a block of input.
            w.write(buffer, 0, len);         // And write it out.
        r.close( );                           // Close the input.
        w.close( );                           // Flush and close output.
    }
}

Handling Local Customs

The second problem of internationalization is the task of following local customs and conventions in areas such as date and time formatting. The java.text package defines classes to help with this duty.

The NumberFormat class formats numbers, monetary amounts, and percentages in a locale-dependent way for display to the user. This is necessary because different locales have different conventions for number formatting. For example, in France a comma serves as the decimal separator, whereas many English-speaking countries use a period. A NumberFormat object can use the default locale or any locale you specify. NumberFormat has factory methods for obtaining instances that are suitable for different purposes, such as displaying monetary quantities or percentages. In Java 1.4 and later, the java.util.Currency class can be used with a NumberFormat object so that it prints the appropriate currency symbol.
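The factory methods described above can be tried in isolation with a small sketch like the following. The class NumberFormatDemo and its helper methods are hypothetical names chosen for illustration; the locales are passed explicitly so that the result does not depend on the machine's default locale.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

/** Hypothetical demo class; not part of the original examples. */
class NumberFormatDemo {
    /** Format a plain number using the conventions of the given locale. */
    static String plain(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    /** Format a fraction such as 0.25 as a percentage. */
    static String percent(double value, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    /** Format a monetary amount in the given locale and currency. */
    static String money(double value, Locale locale, String currencyCode) {
        NumberFormat f = NumberFormat.getCurrencyInstance(locale);
        f.setCurrency(Currency.getInstance(currencyCode)); // ISO 4217 code
        return f.format(value);
    }

    public static void main(String[] args) {
        System.out.println(plain(1234.56, Locale.US));       // 1,234.56
        // France uses a comma as the decimal separator; the grouping
        // separator it prints is a kind of space that varies by Java release.
        System.out.println(plain(1234.56, Locale.FRANCE));
        System.out.println(percent(0.25, Locale.US));        // 25%
        System.out.println(money(1234.5, Locale.US, "USD")); // $1,234.50
        // The symbol shown for a foreign currency depends on the locale data.
        System.out.println(money(1234.5, Locale.US, "EUR"));
    }
}
```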

The DateFormat class formats dates and times in a locale-dependent way for display to the user. Different countries have different conventions. Should the month or day be displayed first? Should periods or colons separate fields of the time? What are the names of the months in the language of the locale? A DateFormat object can simply use the default locale, or it can use any locale you specify. The DateFormat class is used in conjunction with the TimeZone and Calendar classes of java.util. The TimeZone object tells the DateFormat what time zone the date should be interpreted in, while the Calendar object specifies how the date itself should be broken down into days, weeks, months, and years. Almost all locales use the standard GregorianCalendar. SimpleDateFormat is a useful subclass of DateFormat: it allows dates to be formatted to or parsed from a date format specified with a simple template string.
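To make the roles of the locale, the time zone, and the template string more concrete, here is a minimal sketch. The class DateFormatDemo and its methods are hypothetical; the time zone IDs "UTC" and "Asia/Tokyo" are simply convenient choices for the demonstration.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical demo class; not part of the original examples. */
class DateFormatDemo {
    /** Format a date in a locale's LONG style, interpreted in a time zone. */
    static String longDate(Date when, Locale locale, String zoneId) {
        DateFormat f = DateFormat.getDateInstance(DateFormat.LONG, locale);
        f.setTimeZone(TimeZone.getTimeZone(zoneId));
        return f.format(when);
    }

    /** Format a date and time with an explicit SimpleDateFormat template. */
    static String stamp(Date when, String zoneId) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone(zoneId));
        return f.format(when);
    }

    public static void main(String[] args) {
        Date noon = new Date(1041595200000L);  // 2003-01-03 12:00 UTC
        // The same instant, with month names and field order chosen by locale.
        System.out.println(longDate(noon, Locale.US, "UTC"));
        System.out.println(longDate(noon, Locale.FRANCE, "UTC"));

        // The same instant reads differently depending on the time zone.
        Date epoch = new Date(0L);
        System.out.println(stamp(epoch, "UTC"));         // 1970-01-01 00:00
        System.out.println(stamp(epoch, "Asia/Tokyo"));  // 1970-01-01 09:00
    }
}
```

The exact text produced by the locale-specific styles can differ slightly between Java releases, so the sketch only states expected output for the fixed template.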

The Collator class compares strings in a locale-dependent way. This is necessary because different languages alphabetize strings in different ways (and some languages don't even use alphabets). In traditional Spanish, for example, the letters "ch" are treated as a single character that comes between "c" and "d" for the purposes of sorting. When you need to sort strings or search for a string within Unicode text, you should use a Collator object, either one created to work with the default locale or one created for a specified locale.
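The difference between raw string comparison and locale-aware collation can be seen in a short sketch such as the one below. CollatorDemo and its methods are hypothetical names. The example uses English rather than Spanish data, since the handling of "ch" depends on the collation rules shipped with a particular Java release.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical demo class; not part of the original examples. */
class CollatorDemo {
    /** Return the words sorted by the collation rules of the locale. */
    static List<String> sorted(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return Arrays.asList(copy);
    }

    /** True if the strings differ at most in case or accents. */
    static boolean samePrimary(Locale locale, String a, String b) {
        Collator c = Collator.getInstance(locale);
        c.setStrength(Collator.PRIMARY);  // consider base letters only
        return c.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String[] words = { "Zoo", "apple", "Mango", "banana" };

        // String.compareTo( ) compares char values, so every uppercase
        // letter sorts before every lowercase letter.
        String[] raw = words.clone();
        Arrays.sort(raw);
        System.out.println(Arrays.asList(raw)); // [Mango, Zoo, apple, banana]

        // A Collator orders the words the way a dictionary would.
        System.out.println(sorted(Locale.US, words)); // [apple, banana, Mango, Zoo]

        System.out.println(samePrimary(Locale.US, "abc", "ABC")); // true
    }
}
```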

The BreakIterator class allows you to locate character, word, line, and sentence boundaries in a locale-dependent way. This is useful when you need to recognize such boundaries in Unicode text, for example when you are implementing a word-wrapping algorithm.
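A BreakIterator reports boundary positions rather than the pieces of text themselves, so a typical loop walks from one boundary to the next and extracts the substring in between. The following sketch shows one way to do this; BreakIteratorDemo and its methods are hypothetical, and skipping pieces that start with punctuation or whitespace is just one simple way to keep only the words.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical demo class; not part of the original examples. */
class BreakIteratorDemo {
    /** Return the words of the text, skipping punctuation and whitespace. */
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<String>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE;
             start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Boundaries also surround spaces and punctuation; keep only
            // the pieces that begin with a letter or digit.
            if (Character.isLetterOrDigit(piece.charAt(0))) result.add(piece);
        }
        return result;
    }

    /** Return the sentences of the text with surrounding whitespace trimmed. */
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<String>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE;
             start = end, end = it.next()) {
            result.add(text.substring(start, end).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!", Locale.US)); // [Hello, world]
        System.out.println(sentences("Hi there. How are you?", Locale.US));
    }
}
```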

Example 8-3 shows a class that uses the NumberFormat and DateFormat classes to display a hypothetical stock portfolio to the user following local conventions. The program uses various NumberFormat and DateFormat objects to format (using the format( ) method) different types of numbers and dates. These Format objects all operate using the default locale but could have been created with an explicitly specified locale. Figure 8-2 shows example output in different locales. The output was produced by running the program three times: once in the default locale, once with the arguments "en GB", and once with the arguments "fr FR".

Figure 8-2. Stock portfolios formatted for U.S., British, and French locales

Example 8-3. Portfolio.java

package je3.i18n;
import java.text.*;
import java.util.*;
import java.io.*;

/**
 * A partial implementation of a hypothetical stock portfolio class.
 * We use it only to demonstrate number and date internationalization.
 **/
public class Portfolio {
    EquityPosition[  ] positions;        // The positions in the portfolio
    Date lastQuoteTime = new Date( );   // Time for current quotes

    // Create a Portfolio
    public Portfolio(EquityPosition[  ] positions, Date lastQuoteTime) {
        this.positions = positions;
        this.lastQuoteTime = lastQuoteTime;
    }
    
    // A helper class: represents a single stock purchase
    static class EquityPosition {
        String name;             // Name of the stock.
        int shares;              // Number of shares held.
        Date purchased;          // When purchased.
        Currency currency;       // What currency are the prices expressed in?
        double bought;           // Purchase price per share
        double current;          // Current price per share

        // Format objects like this one are useful for parsing strings as well
        // as formatting them.  This is for converting date strings to Dates.
        static DateFormat dateParser = new SimpleDateFormat("yyyy-MM-dd");

        EquityPosition(String n, int s, String date, Currency c,
                       double then, double now) throws ParseException
        {
            // Convert the purchased date string to a Date object.
            // The string must be in the format yyyy-MM-dd
            purchased = dateParser.parse(date);
            // And store the rest of the fields, too.
            name = n; shares = s; currency = c;
            bought = then; current = now;
        }
    }

    // Return a localized HTML-formatted string describing the portfolio
    public String toString( ) {
        StringBuffer b = new StringBuffer( );

        // Obtain NumberFormat and DateFormat objects to format our data.
        NumberFormat number = NumberFormat.getInstance( );
        NumberFormat price = NumberFormat.getCurrencyInstance( );
        NumberFormat percent = NumberFormat.getPercentInstance( );
        DateFormat shortdate = DateFormat.getDateInstance(DateFormat.MEDIUM);
        DateFormat fulldate = DateFormat.getDateTimeInstance(DateFormat.LONG,
                                                             DateFormat.LONG);


        // Print some introductory data.
        b.append("<html><body>");
        b.append("<i>Portfolio value at ").
            append(fulldate.format(lastQuoteTime)).append("</i>");
        b.append("<table border=1>");
        b.append("<tr><th>Symbol<th>Shares<th>Purchased<th>At<th>" +
                 "Quote<th>Change</tr>");
        
        // Display the table using the format( ) methods of the Format objects.
        for(int i = 0; i < positions.length; i++) {
            b.append("<tr><td>");
            b.append(positions[i].name).append("<td>");
            b.append(number.format(positions[i].shares)).append("<td>");
            b.append(shortdate.format(positions[i].purchased)).append("<td>");
            // Set the currency to use when printing the following prices
            price.setCurrency(positions[i].currency);
            b.append(price.format(positions[i].bought)).append("<td>");
            b.append(price.format(positions[i].current)).append("<td>");
            double change =
                (positions[i].current-positions[i].bought)/positions[i].bought;
            b.append(percent.format(change)).append("</tr>");
        }
        b.append("</table></body></html>");
        return b.toString( );
    }
    
    /**
     * This is a test program that demonstrates the class
     **/
    public static void main(String[  ] args) throws ParseException {
        Currency dollars = Currency.getInstance("USD");
        Currency pounds = Currency.getInstance("GBP");
        Currency euros = Currency.getInstance("EUR");
        Currency yen = Currency.getInstance("JPY");

        // This is the portfolio to display.
        EquityPosition[  ] positions = new EquityPosition[  ] {
            new EquityPosition("WWW", 400, "2003-01-03", dollars, 11.90,13.00),
            new EquityPosition("XXX", 1100, "2003-02-02", pounds, 71.09,27.25),
            new EquityPosition("YYY", 6000, "2003-04-17", euros, 23.37,89.12),
            new EquityPosition("ZZZ", 100, "2003-08-10", yen, 100000,121345)
        };

        // Create the portfolio from these positions
        Portfolio portfolio = new Portfolio(positions, new Date( ));

        // Set the default locale using the language code and country code
        // specified on the command line.
        if (args.length == 2) Locale.setDefault(new Locale(args[0], args[1]));

        // Now display the portfolio.
        // We use a Swing dialog box to display it because the console may
        // not be able to display non-ASCII characters like currency symbols
        // for Pounds, Euros, and Yen.
        javax.swing.JOptionPane.showMessageDialog(null, portfolio,
                               Locale.getDefault( ).getDisplayName( ),
                               javax.swing.JOptionPane.INFORMATION_MESSAGE);

        // The modal dialog starts another thread running, so we have to exit
        // explicitly when the user dismisses it.
        System.exit(0);
    }
}

Setting the Locale

Example 8-3 contains code that explicitly sets the locale using the language code and the country code specified on the command line. If these arguments are not specified, it uses the default locale for your system. When experimenting with internationalization, you may want to change the default locale for the entire platform so you can see what happens. How you do this is platform-dependent. On Unix platforms, you typically set the locale by setting the LANG environment variable. For example, to set the locale for Canadian French, using a Unix csh-style shell, use this command:

% setenv LANG fr_CA

Or, to set the locale to English as spoken in Great Britain when using a Unix sh-style shell, use this command:

$ export LANG=en_GB

To set the locale in Windows, use the Regional Settings control on the Windows Control Panel.
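After changing the platform setting, it can be helpful to check which locale the Java runtime actually picked up. The sketch below prints the current default and then overrides it for the running program with Locale.setDefault( ), much as Example 8-3 does. The class DefaultLocaleDemo and its describe( ) helper are hypothetical names used only for this illustration.

```java
import java.util.Locale;

/** Hypothetical demo class; not part of the original examples. */
class DefaultLocaleDemo {
    /** Describe a locale by its code and its English display name. */
    static String describe(Locale locale) {
        return locale + " = " + locale.getDisplayName(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // The default locale is normally derived from the platform settings.
        Locale original = Locale.getDefault();
        System.out.println("Platform default: " + describe(original));

        // Override the default for this run only, then restore it.
        Locale.setDefault(Locale.CANADA_FRENCH);
        try {
            System.out.println("Overridden: " + describe(Locale.getDefault()));
        } finally {
            Locale.setDefault(original);
        }
    }
}
```

An override made this way affects only the running Java program, not the platform setting itself.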

David Flanagan is the author of a number of O'Reilly books, including Java in a Nutshell, Java Examples in a Nutshell, Java Foundation Classes in a Nutshell, JavaScript: The Definitive Guide, and JavaScript Pocket Reference.

