REGEXP_SUBSTR function
REGEXP_SUBSTR returns string of data type VARCHAR2 or CLOB. REGEXP_SUBSTR uses regular expressions to specify the start and end of the returned string. The simplest format for this function is:
REGEXP_SUBSTR(source_string, pattern_to_find)The general format for the function with all the options is:
REGEXP_SUBSTR(source-string,
pattern-to-find [,
position,
occurrence,
match-parameter])
- source_string:the source string
- pattern-to-find:the pattern to search for.
- position:where to start.
- occurrence:occurrence of the pattern-to-find in the source-string you want to search for.
- match-parameter:for further customizing.
- "i" in match-parameter can be used for caseinsensitive matching
- "c" in match-parameter can be used for casesensitive matching
- "n" in match-parameter allows the period to match the new line character
- "m" in match-parameter allows for more than one line in source-string
The metacharacters list:
| Metacharacters | Meaning |
|---|---|
| \ | Specify the escape sequence |
| \d | Digit character |
| \D | Non-digit character |
| \w | Word character |
| \W | Non-word character |
| \s | Whitespace character |
| \S | Non-whitespace character |
| \A | Matches only at the beginning of a string or before a newline character at the end of a string |
| \Z | Matches only at the end of a string |
| ^ | Matches the position at the start of the string. |
| $ | Matches the position at the end of the string. |
| * | Matches the preceding character zero or more times. |
| + | Matches the preceding character one or more times. |
| ? | Matches the preceding character zero or one time. |
| *? | Matches the preceding pattern element 0 or more times |
| +? | Matches the preceding pattern element 1 or more times |
| ?? | Matches the preceding pattern element 0 or 1 time |
| {n} | Matches a character exactly n times, where n is an integer. |
| {n,} | Matches the preceding pattern element at least n times |
| {n,m} | Matches a character at least n times and at most m times, where n and m are both integers. |
| . | Matches any single character except null. |
| (pattern) | A subexpression that matches the specified pattern. |
| x|y | Matches x or y, where x and y are one or more characters. war|peace matches war or peace. |
| [abc] | Matches any of the enclosed characters. |
| [a-z] | Matches any character in the specified range. |
| [:alphanum:] | matches alphanumeric characters 0-9, A-Z, and a-z. |
| [:alpha:] | matches alphabetic characters A-Z and a-z. |
| [:blank:] | matches space or tab. |
| [:digit:] | matches digits 0-9. |
| [:graph:] | matches non-blank characters. |
| [:lower:] | matches lowercase alphabetic characters a-z. |
| [:print:] | is similar to [:graph:] except [:print:] includes the space character. |
| [:punct:] | matches punctuation characters .,"`, and so on. |
| [:space:] | matches all whitespace characters. |
| [:upper:] | matches all uppercase alphabetic characters A-Z. |
| [:xdigit:] | matches characters permissible in a hexadecimal number 0-9, A-F, and a-f. |
| [..] | Matches one collation element, like a multicharacter element. |
| [==] | Specifies equivalence classes. |
| \n | A backreference to an earlier capture, where n is a positive integer. |
Example:
| Example | Description |
|---|---|
| \n | matches the newline character |
| \\ | matches \ |
| \( | matches ( |
| ^A | matches if A is the first character in the string. |
| $B | matches if B is the last character in the string. |
| f*d | matches flood, food, and so on. |
| fo+d | matches fod, food, and so on. |
| fo?d | matches fd and fod only. |
| fo{2}d | matches food. |
| fo{2,3}d | matches food and foood only. |
| [ab]bc | matches abc and bbc. |
| [a-c]bc | matches abc, bbc, and cbc. |
SQL> SELECT REGEXP_SUBSTR('Mississippi', 'si', 1, 2, 'i') FROM dual;
RE
--
si
SQL> SELECT REGEXP_SUBSTR('asdf asdf lord ','l[[:alpha:]]{2}') AS result FROM dual;
RES
---
lor
SQL>