Regular Expression

What Is a Regular Expression?

A regular expression is a pattern that can match a piece of text.
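In Python, regular expressions are provided by the standard `re` module. The minimal sketch below uses made-up sample strings. `re.search` looks for the pattern anywhere in the text. `re.fullmatch` succeeds only when the entire string matches. Both return `None` when there is no match.

```python
import re

text = 'I like python'

# re.search scans the text for the first place where the pattern matches
m = re.search('python', text)
print(m.group(), m.span())   # the matched text and its (start, end) positions

# re.fullmatch succeeds only if the whole string matches the pattern
print(re.fullmatch('python', text))                  # None: the text has extra characters
print(re.fullmatch('python', 'python') is not None)  # True
```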

The Wildcard

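The period character (dot) matches any single character except a newline. '.ython' would match both the string 'python' and the string 'jython', as well as strings such as ' ython' or '+ython', but not 'ython', because the dot must match exactly one character.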
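A quick way to check which strings the wildcard pattern accepts is to test a few candidates with `re.fullmatch`. This is a small sketch, and the candidate strings are only illustrations. The last line shows the standard `re.DOTALL` flag, which lets the dot match a newline as well.

```python
import re

pattern = '.ython'   # '.' stands for exactly one arbitrary character

candidates = ['python', 'jython', ' ython', 'ython', 'pyython', '\nython']
# re.fullmatch requires the whole string to match the pattern
results = {text: re.fullmatch(pattern, text) is not None for text in candidates}
for text, ok in results.items():
    print(repr(text), ok)

# With re.DOTALL the dot also matches a newline character
print(re.fullmatch(pattern, '\nython', re.DOTALL) is not None)
```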

Escaping Special Characters

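To make a special character match itself, escape it by placing a backslash in front of it. The pattern 'java2s\\.com' matches 'java2s.com' and nothing else. The backslash is doubled because a normal Python string literal also treats \ as an escape character; the raw string r'java2s\.com' is equivalent.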
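The sketch below compares the escaped and unescaped patterns on two sample strings. It also uses the standard `re.escape` function, which inserts the backslashes for you. In Python 3.7 and later it escapes only characters that are special in regular expressions.

```python
import re

escaped = 'java2s\\.com'   # backslash doubled inside a normal string literal
raw = r'java2s\.com'       # the same pattern written as a raw string
unescaped = 'java2s.com'   # here '.' is still the wildcard

for text in ['java2s.com', 'java2sxcom']:
    print(text,
          re.fullmatch(escaped, text) is not None,    # literal dot only
          re.fullmatch(unescaped, text) is not None)  # any character in that position

# re.escape produces the escaped pattern automatically
print(re.escape('java2s.com'))
```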

Character Sets

A character set is created by enclosing a substring in square brackets.

Such a character set will match any of the characters it contains, so '[pj]ython' would match both 'python' and 'jython', but nothing else.

We can use ranges, such as '[a-z]' to match any character from a to z, and we can combine ranges by putting one after another, such as '[a-zA-Z0-9]' to match uppercase letters, lowercase letters, and digits. Note that a character set still matches only a single character.

To invert the character set, put the character ^ first, as in '[^abc]' to match any character except a, b, or c.

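In general, special characters such as dots, asterisks, and question marks have to be escaped with a backslash if you want them to match literally rather than act as regular expression operators. Inside a character set, escaping them is generally not necessary (although it is legal): '[.*?]' matches a literal dot, asterisk, or question mark.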
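The character set rules can be tried out directly. This sketch uses `re.fullmatch` to test whole strings and `re.findall` to list every single character a set matches. The sample strings are arbitrary.

```python
import re

# [pj] matches exactly one character: either 'p' or 'j'
for text in ['python', 'jython', 'cython']:
    print(text, re.fullmatch('[pj]ython', text) is not None)

# Ranges can be combined; each match is still one character
print(re.findall('[a-zA-Z0-9]', 'a-Z_9!'))
print(re.fullmatch('[a-z]', 'ab'))   # None: two characters, but the set matches one

# A leading ^ inverts the set
print(re.findall('[^abc]', 'abcxyz'))

# Inside brackets, '.', '*' and '?' are ordinary characters
print(re.findall('[.*?]', 'a.b*c?'))
```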

Alternatives and Subpatterns

We use the "pipe" character | for alternatives. To match either 'python' or 'perl', the pattern would be 'python|perl'.

To apply the choice operator to only part of a pattern, enclose that part in parentheses, creating a subpattern (group):

'p(ython|erl)'
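The following sketch compares the two alternation patterns on a few sample words. It also shows why the parentheses matter: without them, | splits the entire pattern. The group additionally records which alternative matched, available through `Match.group(1)`.

```python
import re

for text in ['python', 'perl', 'pearl']:
    print(text,
          re.fullmatch('python|perl', text) is not None,
          re.fullmatch('p(ython|erl)', text) is not None)

# Without parentheses, 'python|erl' means 'python' OR 'erl'
print(re.fullmatch('python|erl', 'perl'))           # None
print(re.fullmatch('python|erl', 'erl') is not None)

# The parenthesized group remembers which alternative matched
m = re.search('p(ython|erl)', 'I write perl scripts')
print(m.group(0), m.group(1))
```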

Optional and Repeated Subpatterns

A question mark after a subpattern makes it optional: it may or may not appear in the matched string. To make a multi-character subpattern optional, enclose it in parentheses so that the ? applies to the whole group. For example, the pattern

r'(http://)?(www\.)?java2s\.com'

would match all of the following strings (and nothing else):


'http://www.java2s.com' 
'http://java2s.com' 
'www.java2s.com' 
'java2s.com' 
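To see the optional groups at work, the sketch below runs the pattern against the four strings listed above plus two that should fail. The two extra strings are illustrative additions. `Match.groups()` reports `None` for any optional group that was skipped.

```python
import re

pattern = r'(http://)?(www\.)?java2s\.com'

candidates = ['http://www.java2s.com', 'http://java2s.com', 'www.java2s.com',
              'java2s.com', 'https://java2s.com', 'ftp://www.java2s.com']
for text in candidates:
    m = re.fullmatch(pattern, text)
    # groups() shows which optional parts were present (None = skipped)
    print(text, m.groups() if m else 'no match')
```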

The question mark means that the subpattern can appear once or not at all. There are a few other operators that allow you to repeat a subpattern more than once:

Pattern           Meaning
(pattern)*        pattern is repeated zero or more times
(pattern)+        pattern is repeated one or more times
(pattern){m,n}    pattern is repeated from m to n times

For example, r'w*\.java2s\.com' matches 'www.java2s.com', but also '.java2s.com', 'ww.java2s.com', and 'wwwwwww.java2s.com'.

Similarly, r'w+\.java2s\.com' matches 'w.java2s.com' but not '.java2s.com', and r'w{3,4}\.java2s\.com' matches only 'www.java2s.com' and 'wwww.java2s.com'.
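The repetition operators can be compared side by side. This sketch checks each pattern against the same list of host names with `re.fullmatch`, so each pattern has to cover the whole string.

```python
import re

hosts = ['www.java2s.com', '.java2s.com', 'ww.java2s.com',
         'wwwwwww.java2s.com', 'w.java2s.com', 'wwww.java2s.com']
patterns = [r'w*\.java2s\.com', r'w+\.java2s\.com', r'w{3,4}\.java2s\.com']

for p in patterns:
    # keep only the host names that the whole pattern matches
    matched = [h for h in hosts if re.fullmatch(p, h)]
    print(p, matched)

# re.search looks for a match anywhere, so it finds 'wwww.java2s.com'
# inside the longer name 'wwwwwww.java2s.com'
print(re.search(r'w{3,4}\.java2s\.com', 'wwwwwww.java2s.com').group())
```

The last line shows that "matches only" holds in the whole-string sense. When the entire string must match, use `re.fullmatch`, or anchor the pattern with ^ and $.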

Beginning and End of a String

We use a caret ^ (outside of a character set) to mark the beginning of a string: '^ht+p' would match 'http://java2s.com' and 'htttttp://java2s.com', but not 'www.http.com'.

The end of a string may be indicated by the dollar sign $. For example, r'\.com$' matches 'java2s.com' but not 'java2s.com.au'.
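Anchors matter most with `re.search`, which would otherwise accept a match anywhere in the string. This sketch uses the example strings above plus 'java2s.com.au', an illustrative addition, to show both ^ and $.

```python
import re

# '^' anchors the pattern at the start of the string
for text in ['http://java2s.com', 'htttttp://java2s.com', 'www.http.com']:
    print(text, re.search('^ht+p', text) is not None)

# Without '^', re.search finds 'http' in the middle of 'www.http.com'
print(re.search('ht+p', 'www.http.com') is not None)

# '$' anchors the pattern at the end of the string
for text in ['java2s.com', 'java2s.com.au']:
    print(text, re.search(r'\.com$', text) is not None)
```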