4.2. Using regexes in Java: Test for a Pattern

. Problem

You're ready to start using regular expression processing in your Java code, beginning with a test of whether a given pattern matches a given string.

. Solution

Use the Java Regular Expressions Package, java.util.regex.

. Discussion

The good news is that the Java API for regexes is actually easy to use. If all you need is to find out whether a given regex matches a string, you can use the convenient boolean matches( ) method of the String class, which accepts a regex pattern in String form as its argument:

if (inputString.matches(stringRegexPattern)) {
    // it matched... do something with it...
}
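
One detail that is easy to miss: String's matches( ) succeeds only when the regex accounts for the entire string, not just part of it. The following is a minimal sketch of that behavior; the class name MatchesWholeString and the helper wholeMatch( ) are made up for illustration and are not part of any API.

```java
public class MatchesWholeString {
    /** Returns true only if regex matches all of input, as String.matches() requires. */
    public static boolean wholeMatch(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        String input = "QA777. is the next flight.";
        // The regex covers only the first six characters, so this prints false
        System.out.println(wholeMatch(input, "Q[^u]\\d+\\."));
        // A trailing .* lets the regex span the rest of the string, so this prints true
        System.out.println(wholeMatch(input, "Q[^u]\\d+\\..*"));
    }
}
```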

This is, however, a convenience routine, and convenience always comes at a price. If the regex is going to be used more than once or twice in a program, it is more efficient to construct and use a Pattern and its Matcher(s). A complete program constructing a Pattern and using it to match is shown here:

import java.util.regex.*;

/**
 * Simple example of using regex class.
 */
public class RESimple {
    public static void main(String[] argv) throws PatternSyntaxException {
        String pattern = "^Q[^u]\\d+\\.";
        String input = "QA777. is the next flight. It is on time.";

        Pattern p = Pattern.compile(pattern);

        boolean found = p.matcher(input).lookingAt( );

        System.out.println("'" + pattern + "'" +
            (found ? " matches '" : " doesn't match '") + input + "'");
    }
}

The two main classes of the java.util.regex package, Pattern and Matcher, provide the public API shown in Example 4-1.

Example 4-1. Regex public API

/** The main public API of the java.util.regex package.
 * Prepared by javap and Ian Darwin.
 */

package java.util.regex;

public final class Pattern {
    // Flags values ('or' together)
    public static final int 
        UNIX_LINES, CASE_INSENSITIVE, COMMENTS, MULTILINE,
        DOTALL, UNICODE_CASE, CANON_EQ;
    // Factory methods (no public constructors)
    public static Pattern compile(String patt);
    public static Pattern compile(String patt, int flags);
    // Method to get a Matcher for this Pattern
    public Matcher matcher(CharSequence input);
    // Information methods
    public String pattern( );
    public int flags( );
    // Convenience methods
    public static boolean matches(String pattern, CharSequence input);
    public String[] split(CharSequence input);
    public String[] split(CharSequence input, int max);
}

public final class Matcher {
    // Action: find or match methods
    public boolean matches( );
    public boolean find( );
    public boolean find(int start);
    public boolean lookingAt( );
    // "Information about the previous match" methods
    public int start( );
    public int start(int whichGroup);
    public int end( );
    public int end(int whichGroup);
    public int groupCount( );
    public String group( );
    public String group(int whichGroup);
    // Reset methods
    public Matcher reset( );
    public Matcher reset(CharSequence newInput);
    // Replacement methods
    public Matcher appendReplacement(StringBuffer where, String newText);
    public StringBuffer appendTail(StringBuffer where);
    public String replaceAll(String newText);
    public String replaceFirst(String newText);
    // information methods
    public Pattern pattern( );
}

/* String, showing only the regex-related methods */
public final class String {
    public boolean matches(String regex);
    public String replaceFirst(String regex, String newStr);
    public String replaceAll(String regex, String newStr);
    public String[] split(String regex);
    public String[] split(String regex, int max);
}

This API is large enough to require some explanation. The normal steps for regex matching in a production program are:

  1. Create a Pattern by calling the static method Pattern.compile( ).

  2. Request a Matcher from the pattern by calling pattern.matcher(CharSequence) for each String (or other CharSequence) you wish to look through.

  3. Call (once or more) one of the finder methods (discussed later in this section) in the resulting Matcher.
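
The three steps above can be sketched as follows. This is only an illustration: the class name ThreeSteps, the constant FLIGHT, and the method containsFlight( ) are invented for the example, and find( ) is just one of the finder methods you might choose in step 3.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreeSteps {
    // Step 1: compile the regex once; the resulting Pattern can be reused
    private static final Pattern FLIGHT = Pattern.compile("Q[^u]\\d+\\.");

    /** Returns true if the pattern occurs anywhere in line. */
    public static boolean containsFlight(CharSequence line) {
        // Step 2: request a Matcher for this particular input
        Matcher m = FLIGHT.matcher(line);
        // Step 3: call a finder method; find() looks anywhere in the input
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(containsFlight("Order QT300. Now!"));  // prints true
        System.out.println(containsFlight("Quit while ahead."));  // prints false
    }
}
```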

The CharSequence interface, added to java.lang with JDK 1.4, provides simple read-only access to objects containing a sequence of characters. The standard implementations are String and StringBuffer (described in Chapter 3), StringBuilder (added in Java 5), and the "new I/O" class java.nio.CharBuffer.
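
Because matcher( ) accepts any CharSequence, the same Pattern can be applied to several kinds of input without converting them to String first. Here is a small sketch of that idea; the class name CharSequenceInputs, the DIGITS pattern, and hasDigits( ) are hypothetical names chosen for the example.

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

public class CharSequenceInputs {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Accepts any CharSequence: String, StringBuffer, CharBuffer, and so on. */
    public static boolean hasDigits(CharSequence input) {
        return DIGITS.matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "gate 42";
        StringBuffer sb = new StringBuffer("gate ").append(42);
        CharBuffer cb = CharBuffer.wrap("gate 42");

        System.out.println(hasDigits(s));   // prints true
        System.out.println(hasDigits(sb));  // prints true
        System.out.println(hasDigits(cb));  // prints true
        System.out.println(hasDigits("no gate assigned"));  // prints false
    }
}
```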

Of course, you can perform regex matching in other ways, such as using the convenience methods in Pattern or even in java.lang.String. For example:

// StringConvenience.java  -- show String convenience routine for "match"
String pattern = ".*Q[^u]\\d+\\..*";
String line = "Order QT300. Now!";
if (line.matches(pattern)) {
    System.out.println(line + " matches \"" + pattern + "\"");
} else {
    System.out.println("NO MATCH");
}

But the three-step list just described is the "standard" pattern for matching. You'd likely use the String convenience routine in a program that uses the regex only once; if the regex is used more than once, it is worth taking the time to "compile" it, since the compiled version runs faster.
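
To make the "compile once, use many times" idea concrete, here is a hedged sketch that compiles a regex a single time and then reuses one Matcher across many lines by calling reset(CharSequence). The class name CompileOnce and the method countMatchingLines( ) are assumptions made for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompileOnce {
    /** Counts the lines in which regex is found, compiling the regex only once. */
    public static int countMatchingLines(String regex, String[] lines) {
        Pattern p = Pattern.compile(regex);  // compiled once, outside the loop
        Matcher m = p.matcher("");           // one Matcher, re-aimed at each line
        int count = 0;
        for (String line : lines) {
            // reset(CharSequence) points the Matcher at new input and returns it
            if (m.reset(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Order QT300. Now!",
            "Quit while ahead.",
            "QA777. is the next flight."
        };
        System.out.println(countMatchingLines("Q[^u]\\d+\\.", lines));  // prints 2
    }
}
```

Calling line.matches(regex) inside the loop would instead compile the regex again on every iteration.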

As well, the Matcher has several finder methods, which provide more flexibility than the String convenience routine matches( ). The Matcher methods are:

matches( )

Used to compare the entire string against the pattern; this is the same as the routine in java.lang.String. Since it must match the entire String, I had to put .* before and after the pattern in the StringConvenience example above.

lookingAt( )

Used to match the pattern only at the beginning of the string.

find( )

Used to match the pattern in the string (not necessarily at the first character of the string), starting at the beginning of the string or, if the method was previously called and succeeded, at the first character not matched by the previous match.
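
The difference among the three finder methods is easiest to see by running all of them against the same inputs. The sketch below does so; the class name FinderMethods and the helper tryAll( ) are invented for illustration. Note the reset( ) call before find( ): without it, find( ) would resume after any earlier successful match rather than at the start of the input.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FinderMethods {
    private static final Pattern P = Pattern.compile("Q[^u]\\d+\\.");

    /** Returns the results of { matches(), lookingAt(), find() } for input. */
    public static boolean[] tryAll(String input) {
        Matcher m = P.matcher(input);
        boolean whole = m.matches();     // entire input must match
        boolean prefix = m.lookingAt();  // match must begin at the first character
        m.reset();                       // start find() from the beginning again
        boolean anywhere = m.find();     // match may occur anywhere
        return new boolean[] { whole, prefix, anywhere };
    }

    public static void main(String[] args) {
        // prints [true, true, true]
        System.out.println(Arrays.toString(tryAll("QA777.")));
        // prints [false, true, true]
        System.out.println(Arrays.toString(tryAll("QA777. is on time")));
        // prints [false, false, true]
        System.out.println(Arrays.toString(tryAll("Flight QA777. is on time")));
    }
}
```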

Each of these methods returns boolean, with true meaning a match and false meaning no match. To check whether a given string matches a given pattern, you need only type something like the following:

Matcher m = Pattern.compile(patt).matcher(line);
if (m.find( )) {
    System.out.println(line + " matches " + patt);
}

But you may also want to extract the text that matched, which is the subject of the next recipe.

The following recipes cover uses of this API. Initially, the examples just use arguments of type String as the input source. Use of other CharSequence types is covered in Recipe 4.5.