WindowsDevCenter.com

Hacking Word


Hack #82: Perform Power Text Searches with Regular Expressions

When wildcards just aren’t enough, tap VBScript for powerful string searching in Word.



Although Word’s wildcard searching is much better than most users realize, if you’ve previously used a language like Perl, Python, or JavaScript, you might prefer sticking with the special characters you already know for your searches. Besides, sometimes wildcards just aren’t up to the job.

To borrow an example from O’Reilly’s Learning Python, suppose you need to replace any occurrence of “red pepper” or “green pepper” with “bell pepper” if and only if the phrase appears in the same paragraph as, and before, the word “salad,” but not if it is followed (with no space) by the string “corn.” That’s well beyond what Word’s wildcards can do. (The pattern is \b(red|green)(\s+pepper(?!corn)(?=.*salad)), for those of you too impatient to wait until the full example at the end of this hack.)

Though VBA doesn’t have built-in support for regular expressions, Microsoft does include a RegExp object with VBScript. With a slight change to your settings in the Visual Basic Editor, you can use the RegExp object in your macros.

First, select Tools -> Macro -> Visual Basic Editor, and then choose Tools -> References. In the next dialog, shown in Figure 9-3, check the “Microsoft VBScript Regular Expressions 5.5” box and click the OK button.

Figure 9-3. Setting a reference to VBScript regular expressions from the Visual Basic Editor

Now you can include instances of the RegExp object in your macros. The following section describes the RegExp object.
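If you would rather not set the reference (for example, when sharing a template with others), late binding is a common alternative. The following is a minimal sketch, assuming the VBScript library is registered on the machine, which is the case on a standard Windows installation; the macro name RegExpLateBound is made up for illustration:

```vba
Sub RegExpLateBound()
    ' Late binding: no Tools -> References setting is needed,
    ' but the variable must be declared As Object, so you lose
    ' IntelliSense and compile-time type checking.
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    re.Pattern = "\d+"
    ' Test returns True because the string contains at least one digit
    MsgBox re.Test("Hack 82")
End Sub
```

The rest of this hack uses early binding (Dim re As RegExp), which requires the reference described above.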

RegExp’s Properties and Methods

The RegExp object has four properties, described in the following list:

Pattern
The pattern string to search for.
Global
Whether search is for all occurrences that match Pattern, or just the first. This is a Boolean value, and the default is False.
IgnoreCase
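Whether the search ignores case. This is a Boolean value, and the default is False (that is, searches are case-sensitive).
MultiLine
Whether ^ and $ also match at line breaks within the string, rather than only at the very start and end of the string. This is a Boolean value, and the default is False.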

The RegExp object has three methods, described in the following list:

Execute
Returns a Matches collection containing the matched substrings and information about those substrings.
Replace
Replaces the substrings in a searched string that match Pattern with a replacement string. If Global is False, only the first match is replaced. The syntax for this method is:

    RegExpobject.Replace("string to search", "replacement pattern")
Test
Whether a search has successfully matched a pattern. Returns a Boolean value. Since this method always returns True if there were one or more successful matches, there’s no need to set the Global property when using this method.
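To see how the Global and IgnoreCase properties change the results of Replace and Test, here is a small sketch. It assumes the reference to “Microsoft VBScript Regular Expressions 5.5” has been set; the macro name and sample strings are made up for illustration. Output goes to the Immediate window of the Visual Basic Editor:

```vba
Sub RegExpGlobalDemo()
    Dim re As RegExp
    Set re = New RegExp
    re.Pattern = "pepper"

    ' Global defaults to False, so only the first match is replaced
    Debug.Print re.Replace("pepper, pepper", "corn")   ' corn, pepper

    re.Global = True
    Debug.Print re.Replace("pepper, pepper", "corn")   ' corn, corn

    ' IgnoreCase defaults to False, so the search is case-sensitive
    Debug.Print re.Test("PEPPER")                      ' False
    re.IgnoreCase = True
    Debug.Print re.Test("PEPPER")                      ' True
End Sub
```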

The Matches collection returned by the Execute method contains zero or more Match objects (its Count property is 0 when nothing matched), which have three properties, shown in the following list:

FirstIndex
The zero-based position of the Match’s first character within the search string
Length
The number of characters in the Match
Value
The matched string
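As a quick illustration of these properties, the following sketch prints them for a single match. It assumes the reference to the VBScript regular expression library is set; the macro name and sample string are hypothetical. Note that FirstIndex counts from 0, whereas VBA string functions such as Mid count from 1:

```vba
Sub ShowMatchProperties()
    Dim re As RegExp
    Dim m As Match
    Dim strSample As String

    strSample = "red pepper salad"

    Set re = New RegExp
    re.Pattern = "pepper"

    For Each m In re.Execute(strSample)
        ' Value is "pepper", FirstIndex is 4, and Length is 6
        Debug.Print m.Value, m.FirstIndex, m.Length
        ' FirstIndex is zero-based; Mid is one-based, hence the + 1
        Debug.Print Mid(strSample, m.FirstIndex + 1, m.Length)
    Next m
End Sub
```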

Using the RegExp Object in a Macro

The following macro interactively tests search patterns against the selected text.

Place this macro in the template of your choice [Hack #50] and either run it from the Tools -> Macro -> Macros dialog or put a button for it on a menu or toolbar [Hack #1].

    Sub RegExpTest()
        Dim re As RegExp
        Dim strToSearch As String
        Dim strPattern As String
        Dim strResults As String
        Dim oMatches As MatchCollection
        Dim oMatch As Match

        ' Search the text currently selected in the document
        strToSearch = Selection.Text

        Set re = New RegExp
        re.Global = True        ' find every match, not just the first
        re.IgnoreCase = True

        ' Keep prompting until the user cancels or enters an empty pattern
        Do While True
            strPattern = InputBox("Enter search pattern string:", _
                                  "RegExp Search", "")
            If Len(strPattern) = 0 Then Exit Do

            re.Pattern = strPattern

            Set oMatches = re.Execute(strToSearch)
            If oMatches.Count <> 0 Then
                strResults = Chr(34) & strPattern & Chr(34) & _
                             " matched " & oMatches.Count & " times:" _
                             & vbCr & vbCr
                For Each oMatch In oMatches
                    strResults = strResults & _
                                 oMatch.Value & _
                                 ": at position " & _
                                 oMatch.FirstIndex & vbCr
                Next oMatch
            Else
                strResults = Chr(34) & strPattern & Chr(34) & _
                             " didn't match anything. Try again."
            End If
            MsgBox strResults
        Loop
    End Sub

When you run this macro, you’ll be prompted with the dialog shown in Figure 9-4.

The dialog shown in Figure 9-5 displays the search results.

The RegExp object supports the same metacharacters you might have seen in Perl:

   \ | ( ) [ { ^ $ * + ? .

You also get all the classic Perl character-class shortcuts:

   \d \D \s \S \w \W
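As a small example of these shortcuts in use, the sketch below counts standalone four-digit numbers (such as years) in a string. It assumes the reference to the VBScript regular expression library is set; the macro name and sample sentence are made up for illustration:

```vba
Sub CountFourDigitNumbers()
    Dim re As RegExp
    Set re = New RegExp
    re.Global = True
    ' \d{4} is exactly four digits; \b on each side keeps the
    ' pattern from matching inside a longer run of digits
    re.Pattern = "\b\d{4}\b"

    ' Matches 1978 and 2004 but not 19999, so this prints 2
    Debug.Print re.Execute("Built in 1978, rebuilt in 19999 and 2004").Count
End Sub
```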

Figure 9-4. Enter your search pattern here, including any special characters

Figure 9-5. Fine-tune your search patterns interactively

For a full listing of special characters for using the RegExp object, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/vspropattern.asp.

Performing Replacements

When using the Replace method, you can group and reuse parts of the matched pattern. Known as backreferencing, this is a powerful technique. The following code snippet demonstrates how to change the format of some dates in a string:

  re.Pattern = "(September) (\d\d?), (\d{4})"
  MsgBox re.Replace("September 12, 1978", "$2 $1, $3")

This code will change a date like “September 12, 1978” into “12 September, 1978.” Modifying the code to replace September with a different month won’t require making any change to the replacement string, thanks to backreferencing.
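To handle every month at once, the first group can be widened to an alternation of month names while the replacement string stays the same. This is a sketch under the same assumptions as before (the library reference is set; the macro name and sample sentence are hypothetical):

```vba
Sub SwapDayAndMonth()
    Dim re As RegExp
    Set re = New RegExp
    re.Global = True    ' reformat every date in the string
    re.Pattern = "(January|February|March|April|May|June|July|August|" & _
                 "September|October|November|December) (\d\d?), (\d{4})"

    ' $1 is the month, $2 the day, and $3 the year
    Debug.Print re.Replace("Born September 12, 1978; moved May 3, 1999.", _
                           "$2 $1, $3")
    ' Prints: Born 12 September, 1978; moved 3 May, 1999.
End Sub
```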

Bringing all of this together, the following macro shows you how to use the “bell pepper” pattern discussed at the beginning of this hack to get the results shown in Figure 9-6.

Figure 9-6. Performing complex replacements with regular expressions

Place this macro in the template of your choice [Hack #50] and either run it from the Tools -> Macro -> Macros dialog or put a button for it on a menu or toolbar [Hack #1]:

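    Sub FixPeppers()
        Dim re As RegExp
        Dim para As Paragraph
        Dim rng As Range

        Set re = New RegExp
        re.Pattern = "\b(red|green)(\s+pepper(?!corn)(?=.*salad))"
        re.IgnoreCase = True
        re.Global = True

        ' Process one paragraph at a time so that "salad" must
        ' appear in the same paragraph as the pepper
        For Each para In ActiveDocument.Paragraphs
            Set rng = para.Range
            ' Exclude the paragraph mark from the range being replaced
            rng.MoveEnd unit:=wdCharacter, Count:=-1
            ' $2 re-inserts the matched whitespace and "pepper"
            rng.Text = re.Replace(rng.Text, "bell$2")
        Next para
    End Sub
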
TIP: For more on regular expressions, check out “Hack Word from Python” [Hack #85], “Hack Word from Perl” [Hack #86], and Mastering Regular Expressions (O’Reilly).
