krzna February 2016

re.IGNORECASE unexpected behaviour in Python 2.7

Adding re.IGNORECASE to my regex causes some matches to fail. This is what I tried:

print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', re.IGNORECASE)
>>> 'this ~is~ some tandom. text+ and [some] symbols {+/\\-}'

We can see that many symbols were not replaced with '~' in the output above. But when I try the same without re.IGNORECASE (listing both cases in the character class instead), all the special characters are replaced with '~':

print re.sub(r'[^a-zA-Z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}')
>>> 'this ~is~ some tandom~ text~ and ~some~ symbols ~~~~~~'

Is there something I am missing about re.IGNORECASE? Doesn't it just match both uppercase and lowercase letters while leaving the rest (digits, special characters, etc.) unchanged? (I am using Anaconda's Python 2.7, if that helps.)

Answers


Wiktor Stribiżew February 2016

You passed the flag in the wrong argument position. Use

print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', flags=re.IGNORECASE)
# or
print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', 0, re.IGNORECASE)
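Two other ways to sidestep the positional-argument trap, sketched here as an illustration rather than as part of the original answer: compile the pattern with the flag first, or put the inline (?i) flag inside the pattern. The variable names (text, pattern, compiled_result, inline_result) are made up for this example, and the input is written as a raw string so the backslash is kept literally.

```python
import re

text = r'this (is) some tandom. text+ and [some] symbols {+/\-}'

# Option 1: attach the flag when compiling. The compiled pattern's sub()
# has no flags parameter, so the flag cannot be mistaken for count.
pattern = re.compile(r'[^a-z0-9 ]', re.IGNORECASE)
compiled_result = pattern.sub('~', text)

# Option 2: use the inline (?i) flag at the start of the pattern.
inline_result = re.sub(r'(?i)[^a-z0-9 ]', '~', text)

print(compiled_result)  # this ~is~ some tandom~ text~ and ~some~ symbols ~~~~~~
print(inline_result)    # same output as above
```

Both variants keep the flag attached to the pattern itself, so there is no fourth positional argument to get wrong.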

See re.sub docs:

re.sub(pattern, repl, string, count=0, flags=0)

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer.

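You passed the flag where count is expected. re.IGNORECASE has the integer value 2, so count became 2 instead of the default 0 (which means "replace all occurrences"), and the flag itself was never applied. As a result only the first two matches, the parentheses around "is", were replaced, and the rest of the string was left untouched.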
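To see this concretely, here is a small sketch (variable names are illustrative, not from the question) that checks the integer value of the flag and compares the misplaced call against an explicit count=2. The misplaced call is reproduced on purpose; newer Python 3 releases may emit a deprecation warning for passing count positionally.

```python
import re

text = r'this (is) some tandom. text+ and [some] symbols {+/\-}'

# The flag is an integer-valued constant; int() works on Python 2 and 3.
flag_value = int(re.IGNORECASE)

# The fourth positional argument of re.sub is count, not flags,
# so this call replaces at most flag_value matches and ignores case normally.
misplaced = re.sub(r'[^a-z0-9 ]', '~', text, re.IGNORECASE)

# The same call written out with an explicit count.
explicit_count = re.sub(r'[^a-z0-9 ]', '~', text, count=2)

print(flag_value)                   # 2
print(misplaced == explicit_count)  # True
print(misplaced)                    # this ~is~ some tandom. text+ and [some] symbols {+/\-}
```

Passing flags=re.IGNORECASE by keyword, as in the corrected call above, avoids the ambiguity entirely.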

