Metacharacters are characters with special meanings in Java regular expressions.
The metacharacters supported by the regular expressions in Java are as follows:
( ) [ ] { } \ ^ $ | ? * + . < > - = !
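As a small illustration of what "special meaning" implies, the sketch below compares an unescaped dot with an escaped one. The class name MetaEscapeDemo and the helper matches are hypothetical names chosen for this example; Pattern.matches and Pattern.quote are standard java.util.regex methods.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class MetaEscapeDemo {
    // Returns true if the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped, the dot is a metacharacter that matches any character.
        System.out.println(matches("a.c", "abc"));   // true
        System.out.println(matches("a.c", "a.c"));   // true
        // Escaped with a backslash, the dot matches only a literal dot.
        System.out.println(matches("a\\.c", "abc")); // false
        System.out.println(matches("a\\.c", "a.c")); // true
        // Pattern.quote turns a whole string into a literal pattern.
        System.out.println(matches(Pattern.quote("1+1=2"), "1+1=2")); // true
    }
}
```

Escaping is only needed when a metacharacter should be matched literally; otherwise it keeps its special role.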
The metacharacters [ and ] specify a character class inside a regular expression.
A character class is a set of characters. The regular expression engine will attempt to match one character from the set.
The character class "[ABC]" matches a single character: A, B, or C. For example, both "woman" and "women" match the regular expression "wom[ae]n".
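The woman/women example can be checked with a short program. This is a minimal sketch; the class name CharClassDemo and the helper matches are hypothetical, and Pattern.matches tests the whole input against the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class CharClassDemo {
    // Returns true if the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("wom[ae]n", "woman")); // true
        System.out.println(matches("wom[ae]n", "women")); // true
        System.out.println(matches("wom[ae]n", "womin")); // false: i is not in the set
        System.out.println(matches("[ABC]", "B"));        // true
        // A character class consumes exactly one character.
        System.out.println(matches("[ABC]", "AB"));       // false
    }
}
```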
We can specify a range of characters in a character class using the hyphen (-) character. For example, [A-Z] represents any uppercase English letter, and [0-9] represents any digit from 0 to 9.
At the beginning of a character class, ^ means not. For example, [^ABC] means any character except A, B, and C. The character class [^A-Z] represents any character except uppercase letters.
If ^ appears anywhere in a character class other than at the beginning, it simply matches a literal ^ character. For example, "[ABC^]" will match A, B, C, or ^.
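The two roles of ^ inside a character class can be compared side by side. The following is a minimal sketch; NegationDemo and matches are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class NegationDemo {
    // Returns true if the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Leading ^ negates the class.
        System.out.println(matches("[^ABC]", "D")); // true
        System.out.println(matches("[^ABC]", "A")); // false
        System.out.println(matches("[^A-Z]", "a")); // true
        System.out.println(matches("[^A-Z]", "Z")); // false
        // A ^ that is not first is an ordinary member of the class.
        System.out.println(matches("[ABC^]", "^")); // true
        System.out.println(matches("[ABC^]", "D")); // false
    }
}
```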
You can also include two or more ranges in one character class. For example, [a-zA-Z] matches any character a through z or A through Z, and [a-zA-Z0-9] matches any letter a through z (uppercase or lowercase) or any digit 0 through 9.
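A short sketch of single and combined ranges follows. The class RangeDemo and the helper isAlphanumeric are hypothetical names for this example; the + quantifier, which means one or more, is added so that the class can be applied to a whole string.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class RangeDemo {
    // True if the input consists of one or more letters or digits and nothing else.
    static boolean isAlphanumeric(String input) {
        return Pattern.matches("[a-zA-Z0-9]+", input);
    }

    public static void main(String[] args) {
        // Single ranges match exactly one character.
        System.out.println(Pattern.matches("[A-Z]", "Q")); // true
        System.out.println(Pattern.matches("[A-Z]", "q")); // false
        System.out.println(Pattern.matches("[0-9]", "7")); // true
        // Several ranges combined in one class.
        System.out.println(isAlphanumeric("Java2s"));      // true
        System.out.println(isAlphanumeric("Java 2s"));     // false: contains a space
    }
}
```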
The following table shows examples of character classes.
Character Classes | Meaning |
---|---|
[abc] | Character a, b, or c |
[^xyz] | A character except x, y, and z |
[a-z] | Characters a through z |
[a-cx-z] | Characters a through c, or x through z, which would include a, b, c, x, y, or z. |
[0-9&&[4-8]] | Intersection of two ranges (4, 5, 6, 7, or 8) |
[a-z&&[^aeiou]] | All lowercase letters minus vowels |
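The last two rows of the table, intersection and subtraction, are the least obvious, so here is a minimal sketch of both. IntersectionDemo and removeConsonants are hypothetical names; String.replaceAll is the standard method that replaces every match of a regular expression.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class IntersectionDemo {
    // Removes every lowercase consonant, using [a-z] minus the vowels.
    static String removeConsonants(String input) {
        return input.replaceAll("[a-z&&[^aeiou]]", "");
    }

    public static void main(String[] args) {
        // Intersection: digits that are also in the range 4 through 8.
        System.out.println(Pattern.matches("[0-9&&[4-8]]", "5")); // true
        System.out.println(Pattern.matches("[0-9&&[4-8]]", "9")); // false
        // Subtraction: lowercase letters that are not vowels.
        System.out.println(Pattern.matches("[a-z&&[^aeiou]]", "b")); // true
        System.out.println(Pattern.matches("[a-z&&[^aeiou]]", "e")); // false
        System.out.println(removeConsonants("regular expression")); // eua eeio
    }
}
```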
The following table lists some frequently used predefined character classes.
Predefined Character Class | Meaning |
---|---|
. | Any character (may or may not match line terminators, depending on flags) |
\d | A digit. Same as [0-9] |
\D | A non-digit. Same as [^0-9] |
\s | A whitespace character. Same as [ \t\n\x0B\f\r] |
\S | A non-whitespace character. Same as [^\s] |
\w | A word character. Same as [a-zA-Z_0-9] |
\W | A non-word character. Same as [^\w] |
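The predefined classes can be exercised one by one. The sketch below is illustrative only: PredefinedDemo and matches are hypothetical names, and the patterns are compiled with no flags, so the default behavior applies.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class PredefinedDemo {
    // Returns true if the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d", "7"));             // true
        System.out.println(matches("\\D", "7"));             // false
        System.out.println(matches("\\s", "\t"));            // true: tab is whitespace
        System.out.println(matches("\\S", " "));             // false
        System.out.println(matches("\\w+", "snake_case9"));  // true
        System.out.println(matches("\\W", "_"));             // false: _ is a word character
        // With default flags, the dot does not match a line terminator.
        System.out.println(matches(".", "x"));               // true
        System.out.println(matches(".", "\n"));              // false
    }
}
```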
The following code uses \d to match a single digit. In a Java string literal the backslash must itself be escaped, so the pattern is written as \\d.
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
      public static void main(String[] args) {
        // "Java " followed by exactly one digit
        Pattern p = Pattern.compile("Java \\d");
        String candidate = "Java 4";
        Matcher m = p.matcher(candidate);
        // find() returns true if the pattern occurs anywhere in the input
        System.out.println(m.find());
      }
    }
The code above prints true, because the candidate string contains "Java " followed by a digit.
The following code uses \w+ to match a word, that is, one or more consecutive word characters. A double backslash is used in the string literal to escape the \.
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
      public static void main(String[] args) {
        // One or more consecutive word characters
        String regex = "\\w+";
        Pattern pattern = Pattern.compile(regex);
        String candidate = "asdf Java2s.com";
        Matcher matcher = pattern.matcher(candidate);
        // find() locates the first match; group(0) is the whole matched text
        if (matcher.find()) {
          System.out.println("GROUP 0:" + matcher.group(0));
        }
      }
    }
The code above prints GROUP 0:asdf, the first run of word characters in the candidate string.
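A single call to find() stops at the first match. To collect every word in the input, find() can be called in a loop, as in this minimal sketch. The class AllWords and the helper findAll are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class AllWords {
    // Collects every non-overlapping match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // Each successful find() advances to the next match.
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The space and the dot are not word characters, so they split the input.
        System.out.println(findAll("\\w+", "asdf Java2s.com")); // [asdf, Java2s, com]
    }
}
```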