The java.util.regex package contains three classes that support regular expressions:
Class | Usage |
---|---|
Pattern | Holds the compiled form of a regular expression. |
Matcher | Associates the string to be matched with a Pattern and performs the actual match. |
PatternSyntaxException | Represents a syntax error in a malformed regular expression. |
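
As a rough illustration of how the three classes cooperate, the sketch below compiles a regular expression, matches it against an input, and catches the exception raised by a malformed expression. The class name `RegexClassesDemo` and the helpers `matchesWhole` and `syntaxError` are hypothetical names chosen for this example, not part of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexClassesDemo {
    // Hypothetical helper: true if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        Pattern p = Pattern.compile(regex); // compiled form of the regex
        Matcher m = p.matcher(input);       // binds the input to the pattern
        return m.matches();                 // performs the actual match
    }

    // Hypothetical helper: returns null for a valid regex, otherwise the
    // description carried by the PatternSyntaxException.
    static String syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("[a-z]+", "hello"));  // true
        System.out.println(matchesWhole("[a-z]+", "hello1")); // false
        // "[a-z" has an unclosed character class, so compile() throws
        System.out.println(syntaxError("[a-z") != null);      // true
    }
}
```

PatternSyntaxException is an unchecked exception, so catching it is optional; it is mainly useful when the expression comes from user input.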
The Pattern class holds the compiled form of a regular expression. It is immutable.
It has no public constructor. Instead, the class contains a static compile() method, which returns a Pattern object.
The compile() method is overloaded:
    static Pattern compile(String regex)
    static Pattern compile(String regex, int flags)
The following snippet of code compiles a regular expression into a Pattern object:
    String regex = "[a-z]@.";
    // Compile the regular expression into a Pattern object
    Pattern p = Pattern.compile(regex);
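
Because a Pattern is immutable, one compiled instance can be reused across many inputs. The following is a minimal sketch of that idea using the same expression; the class name `CompileDemo`, the constant `P`, and the helper `findAll` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompileDemo {
    // Compiled once and reused: a lowercase letter, then '@', then any
    // character other than a line terminator.
    static final Pattern P = Pattern.compile("[a-z]@.");

    // Hypothetical helper: collects every substring of input that matches P.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = P.matcher(input); // a fresh Matcher for each input
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "L@s" is not matched because 'L' is uppercase and no flags are set
        System.out.println(findAll("bob@example.com, AL@site.org")); // [b@e]
        System.out.println(findAll("x@y z@w"));                      // [x@y, z@w]
        System.out.println(P.pattern());                             // [a-z]@.
    }
}
```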
The flags parameter is a bit mask that modifies the way the pattern is matched. Multiple flags can be combined with the bitwise OR operator (|).
The flags, defined as int constants in the Pattern class, are listed in the following table.
Flag | Description |
---|---|
Pattern.CANON_EQ | Enables canonical equivalence. Two characters match only if their full canonical decompositions match. The expression "a\u030A", for example, will match the string "\u00E5" when this flag is specified. By default, matching does not take canonical equivalence into account. |
Pattern.CASE_INSENSITIVE | Enables case-insensitive matching. By default, this flag enables case-insensitive matching only for characters in the US-ASCII charset. For Unicode-aware case-insensitive matching, specify the UNICODE_CASE flag together with this flag. |
Pattern.COMMENTS | Permits whitespace and comments in the pattern. In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line. |
Pattern.DOTALL | Enables dotall mode. In dotall mode, the expression . matches any character, including a line terminator. By default, this expression does not match line terminators. |
Pattern.LITERAL | Enables literal parsing of the pattern. When this flag is specified then the input string that specifies the pattern is treated as a sequence of literal characters. Metacharacters or escape sequences in the input sequence will be given no special meaning. |
Pattern.MULTILINE | Enables multiline mode. In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence. |
Pattern.UNICODE_CASE | Enables Unicode-aware case folding. When this flag is specified then case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. |
Pattern.UNICODE_CHARACTER_CLASS | Enables the Unicode version of predefined character classes and POSIX character classes. |
Pattern.UNIX_LINES | Enables Unix lines mode. When this flag is set, only the \n character is recognized as a line terminator in the behavior of ., ^, and $. |
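
To make a few of these flags concrete, the sketch below compares matching with and without MULTILINE, DOTALL, LITERAL, COMMENTS, and UNICODE_CASE. The class name `FlagDemo` and the helpers `matches` and `count` are hypothetical; passing 0 as the flags argument means that no flags are set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // Hypothetical helper: true if the entire input matches under the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Hypothetical helper: number of matches found while scanning the input.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // MULTILINE: ^ also matches just after each line terminator
        String lines = "one\ntwo\nthree";
        System.out.println(count("^\\w+", 0, lines));                 // 1
        System.out.println(count("^\\w+", Pattern.MULTILINE, lines)); // 3

        // DOTALL: . also matches a line terminator
        System.out.println(matches("a.b", 0, "a\nb"));              // false
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb")); // true

        // LITERAL: the dot loses its special meaning
        System.out.println(matches("a.b", Pattern.LITERAL, "axb")); // false
        System.out.println(matches("a.b", Pattern.LITERAL, "a.b")); // true

        // COMMENTS: whitespace and # comments inside the pattern are ignored
        String commented = "\\d{3}  # three digits\n"
                         + "-       # a literal hyphen\n"
                         + "\\d{4}  # four digits";
        System.out.println(matches(commented, Pattern.COMMENTS, "555-1234")); // true

        // UNICODE_CASE: case-insensitive matching beyond US-ASCII
        // (the escapes denote lowercase and uppercase e with acute accent)
        int ci = Pattern.CASE_INSENSITIVE;
        System.out.println(matches("\u00e9", ci, "\u00c9"));                        // false
        System.out.println(matches("\u00e9", ci | Pattern.UNICODE_CASE, "\u00c9")); // true
    }
}
```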
The following code compiles a regular expression with the CASE_INSENSITIVE and DOTALL flags set.
The matching will be case-insensitive for the US-ASCII charset, and the expression . will match a line terminator.
    // Prepare a regular expression
    String regex = "[a-z]@.";
    // Compile the regular expression into a Pattern object setting the
    // CASE_INSENSITIVE and DOTALL flags
    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
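
One way to see the combined effect of the two flags is to match an input that needs both of them. In the sketch below, the input "A@\n" matches the expression only when CASE_INSENSITIVE and DOTALL are both set. The class name `CombinedFlagsDemo` and the helpers `matches` and `hasFlag` are illustrative assumptions; `flags()` is the Pattern method that returns the pattern's match flags.

```java
import java.util.regex.Pattern;

class CombinedFlagsDemo {
    // Hypothetical helper: matches the whole input against "[a-z]@."
    // compiled with the given flags.
    static boolean matches(int flags, String input) {
        return Pattern.compile("[a-z]@.", flags).matcher(input).matches();
    }

    // Hypothetical helper: tests whether a given flag bit is set on a pattern.
    static boolean hasFlag(Pattern p, int flag) {
        return (p.flags() & flag) != 0;
    }

    public static void main(String[] args) {
        String input = "A@\n"; // uppercase letter, '@', line terminator
        int both = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

        System.out.println(matches(0, input));                        // false
        System.out.println(matches(Pattern.CASE_INSENSITIVE, input)); // false: . rejects \n
        System.out.println(matches(Pattern.DOTALL, input));           // false: 'A' is uppercase
        System.out.println(matches(both, input));                     // true

        Pattern p = Pattern.compile("[a-z]@.", both);
        System.out.println(hasFlag(p, Pattern.DOTALL));    // true
        System.out.println(hasFlag(p, Pattern.MULTILINE)); // false
    }
}
```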