Regular Expression
1. Meta characters
. ^ $ * + ? { } [ ] \ | ( )
1.1) Character class []
- a character class [ ] matches any one of the characters written between [ and ]
- inside [ ], '-' means a range From-To ( [0-5] -> [012345] )
- '^' at the start of [ ] means Not ( [^0-9] -> any character except a digit ); outside [ ], '^' matches the start of the string
- '.' matches any single character except \n
- '*' repeats the preceding character 0~N times ( N has no fixed limit; in practice it is bounded by memory )
- '+' repeats the preceding character 1~N times
- {m,n} repeats the preceding character m~n times
- '?' repeats the preceding character 0~1 times (the character is optional)
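You can check each rule above quickly with re.match(), which is described in section 2. This is a minimal sketch. match_or_none is a hypothetical helper name used only here, and the sample strings are made up for illustration.

```python
import re

def match_or_none(pattern, text):
    """Hypothetical helper: return the text matched at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group() if m else None

# Character classes
print(match_or_none('[0-5]', '3'))        # 3     ([0-5] is the same as [012345])
print(match_or_none('[^0-5]', '3'))       # None  ('^' inside [] means "not")
# '.' matches any character except a newline
print(match_or_none('a.b', 'a-b'))        # a-b
print(match_or_none('a.b', 'a\nb'))       # None
# Quantifiers apply to the item just before them
print(match_or_none('ca*t', 'ct'))        # ct    ('a' 0 or more times)
print(match_or_none('ca+t', 'ct'))        # None  ('a' at least once)
print(match_or_none('ca{2,3}t', 'caat'))  # caat  ('a' 2 to 3 times)
print(match_or_none('ab?c', 'ac'))        # ac    ('b' 0 or 1 time)
```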
# regular expression
[abc]
# match ('a' is in the class)
a
# match (contains 'b')
before
# not match (contains none of a, b, c)
dededede
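The same three cases can be checked in Python. This sketch uses re.search() (section 2), so the class may match a character anywhere in the text and not only at the start.

```python
import re

# [abc] matches one character that is 'a', 'b' or 'c'
pattern = re.compile('[abc]')
for text in ['a', 'before', 'dededede']:
    m = pattern.search(text)
    print(text, '->', m.group() if m else 'not match')
# a -> a
# before -> b
# dededede -> not match
```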
# alphabet
[a-zA-Z]
# digit
[0-9] = \d
# not a digit
[^0-9] = \D
# whitespace characters
[ \t\n\r\f\v] = \s
# not whitespace characters
[^ \t\n\r\f\v] = \S
# alphanumeric characters plus underscore
[a-zA-Z0-9_] = \w
# not alphanumeric or underscore
[^a-zA-Z0-9_] = \W
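Here is a small sketch of the shorthand classes using re.findall() (section 2) on a made-up string. Raw strings (r'...') keep each backslash intact so the regex engine receives it unchanged. There is one caveat. With str patterns, Python 3 treats \d, \w and \s as Unicode-aware, so the ASCII sets above are exact only when the re.ASCII flag is passed.

```python
import re

text = 'id_42 = x7\tok!'
print(re.findall(r'\d', text))   # ['4', '2', '7']
print(re.findall(r'\s', text))   # [' ', ' ', '\t']
print(re.findall(r'\w+', text))  # ['id_42', 'x7', 'ok']
print(re.findall(r'\W', text))   # [' ', '=', ' ', '\t', '!']

# U+0663 is ARABIC-INDIC DIGIT THREE: a Unicode digit, but not in [0-9]
print(re.findall(r'\d', '\u0663'))            # matched by default (Unicode mode)
print(re.findall(r'\d', '\u0663', re.ASCII))  # []  (ASCII mode: only [0-9])
```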
# a + any character (except \n) + b
a.b
# match the literal string "a.b" ('.' inside [ ] is a literal dot)
a[.]b
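The difference between a wildcard dot and a literal dot shows up directly in Python. The test strings here are only illustrative. Escaping the dot as \. is an equivalent way to write [.].

```python
import re

print(re.match('a.b', 'aXb').group())    # aXb  ('.' is a wildcard)
print(re.match('a[.]b', 'aXb'))          # None ([.] only matches a literal dot)
print(re.match('a[.]b', 'a.b').group())  # a.b
print(re.match(r'a\.b', 'a.b').group())  # a.b  (escaping with \ also works)
print(re.findall('[a.b]', 'a.bc'))       # ['a', '.', 'b']  (one character each)
```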
2. Regular expressions in Python
- compile a pattern with re.compile(), then call methods on the resulting pattern object
import re
p = re.compile('[a-z]+')
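A compiled pattern and the module-level functions give the same result. The re module caches recently used patterns, so compiling explicitly is mostly about reuse and readability. A small sketch:

```python
import re

p = re.compile('[a-z]+')   # compile once, reuse many times
print(p.pattern)           # [a-z]+  (the source pattern string)
# The module-level function compiles the pattern on the fly (and caches it)
print(p.match('python').group() == re.match('[a-z]+', 'python').group())  # True
```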
1. match()
- checks whether the regular expression matches at the beginning of the string; returns a Match object, or None if there is no match
import re
p = re.compile('[a-z]+')
target = "python"
m = p.match(target)
# same as re.match('[a-z]+', "python")
if m:
    print('match {}'.format(m.group()))
> match python
target = "1234"
m = p.match(target)
# m is None because "1234" does not start with a lowercase letter
if m is None:
    print('not match')
> not match
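match() only looks at the start of the string. It succeeds as soon as the pattern matches there, even if more text follows. A sketch with illustrative inputs:

```python
import re

p = re.compile('[a-z]+')
print(p.match('python3').group())  # python  (matching stops where [a-z]+ stops)
print(p.match('3python'))          # None    (the string must start with a-z)
```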
2. search()
- scans the whole string and returns a Match object for the first position where the regular expression matches, or None
import re
p = re.compile('[a-z]+')
m = p.search('1234 Python')
# same as re.search('[a-z]+', "1234 Python")
if m:
    print(m.group())
> ython
# 'P' is uppercase, so the first run of [a-z] is "ython"
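Running match() and search() on the same input shows the difference between them. The input string is illustrative.

```python
import re

p = re.compile('[a-z]+')
text = '1234 python'
print(p.match(text))           # None    (match() only tries position 0)
print(p.search(text).group())  # python  (search() scans forward)
```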
3. findall()
- returns a list of all non-overlapping substrings that match the regular expression
import re
p = re.compile('[0-9]+')
result = p.findall("life0s.12tooshort")
# re.findall('[0-9]+',"life0s.12tooshort")
print(result)
> ['0', '12']
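findall() differs from search() in two ways. It returns every match rather than only the first. It also returns an empty list (not None) when nothing matches. A sketch reusing the string above:

```python
import re

p = re.compile('[0-9]+')
text = 'life0s.12tooshort'
print(p.search(text).group())  # 0            (search: first match only)
print(p.findall(text))         # ['0', '12']  (findall: every match)
print(p.findall('no digits'))  # []           (no match gives an empty list)
```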
4. Match object methods
- the Match object returned by match() or search() has group(), start(), end() and span() methods
- group() returns the matched string
- start() returns the index where the match starts
- end() returns the index just after the match ends
- span() returns the (start, end) indices as a tuple
>>> import re
>>> p = re.compile('[a-z]+')
>>> m = p.match("python")
>>> m.group()
'python'
>>> m.start()
0
>>> m.end()
6
>>> m.span()
(0, 6)
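With search(), the match does not have to begin at index 0, which makes start(), end() and span() more informative. Here is a sketch with an illustrative input. end() is exclusive, so slicing the original string with start() and end() recovers the matched text.

```python
import re

p = re.compile('[a-z]+')
text = '3 python'
m = p.search(text)
print(m.group())  # python
print(m.start())  # 2
print(m.end())    # 8
print(m.span())   # (2, 8)
print(text[m.start():m.end()])  # python
```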