regex - Python regular expressions: search and replace weirdness


I could use some help with a Python regular expression problem. You'd expect the result of

import re
re.sub("s (.*?) s", "no", "this is a string")

to be "this no string", right? In reality it's "thinotring". The sub function uses the entire pattern as the group to replace, instead of just the group I actually want to replace.

All the re.sub examples I can find deal with simple word replacement, but what if you want the replacement to depend on the rest of the string? Like in my example...

Any help would be appreciated.

Edit:

The look-behind and look-ahead tricks won't work in my case, as they need to be fixed width. Here is my actual expression:

re.sub(r"<a.*?href=['\"]((?!http).*?)['\"].*?>", 'test', string) 

I want to use it to find all links in a string that don't begin with http, so that I can put a prefix in front of those links (to make them absolute rather than relative).

Your regex matches everything from the first "s " to the last " s", so if you replace the match with "no", you get "thinotring".

The parentheses don't limit the match; they capture the text matched by whatever is inside them in a special variable called a backreference. In your example, backreference number 1 would contain "is a". You can refer to a backreference later in the same regex using a backslash and the number of the backreference: \1.
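To make the difference between the whole match and a captured group concrete, here is a small sketch. The second pattern and its input ("it was was fine") are made up purely to illustrate \1 inside a regex; they are not from the question.

```python
import re

# The question's pattern: the match covers everything, the group only the middle.
m = re.search(r"s (.*?) s", "this is a string")
whole = m.group(0)   # the entire match, which is what re.sub replaces
inner = m.group(1)   # only the text captured by the parentheses

# Illustrative only: \1 inside the pattern must repeat what group 1 matched,
# so this finds a word that is immediately doubled.
doubled = re.search(r"\b(\w+) \1\b", "it was was fine")

print(repr(whole))       # 's is a s'
print(repr(inner))       # 'is a'
print(doubled.group(0))  # was was
```

re.sub always replaces group(0), which is why the surrounding "s " and " s" disappear in the original attempt.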

What you want is lookaround:

re.sub(r"(?<=s ).*?(?= s)", "no", "this is a string")

(?<=s ) means: assert that it is possible to match "s " before the current position in the string, but don't make it part of the match.

The same goes for (?= s), which asserts that the string continues with " s" after the current position.
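A quick way to check that the lookarounds are not consumed is to look at the span of the match. This is only a sketch, using the example string from the question:

```python
import re

text = "this is a string"
pattern = r"(?<=s ).*?(?= s)"

m = re.search(pattern, text)
matched = m.group(0)   # only the text between the two assertions
span = m.span()        # start and end index of that text

# Because "s " and " s" are asserted but not matched, they survive the substitution.
result = re.sub(pattern, "no", text)

print(repr(matched))  # 'is a'
print(span)           # (5, 9)
print(result)         # this no string
```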

Be advised that lookbehind in Python is limited to strings of fixed length. If that's a problem, you can sort of work around it using... backreferences!

re.sub(r"(s ).*?( s)", r"\1no\2", "this is a string")
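The fixed-width restriction and the capture-group workaround can be sketched as follows. The variable-width variant "s +" and the input with two spaces are assumptions chosen for illustration, not part of the question.

```python
import re

# A variable-width lookbehind ("s" followed by one or more spaces) is rejected
# by the standard re module when the pattern is compiled.
try:
    re.compile(r"(?<=s +).*?(?= s)")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround: capture the variable-width context and put it back with \1 and \2.
result = re.sub(r"(s +).*?( s)", r"\1no\2", "this  is a string")

print(lookbehind_ok)  # False
print(repr(result))   # 'this  no string'
```

The captured groups are part of the match, so they have to be written back explicitly in the replacement string.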

OK, this is a contrived example, but it shows what you can do. From your edit, it's becoming apparent that you're trying to parse HTML with a regex. That's not such a good idea. Search for "regex html" and you'll see why.

If you still want to do it:

re.sub(r"(<a.*?href=['\"])((?!http).*?['\"].*?>)", r'\1http://\2', string)

This might work, but it is extremely brittle.
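As a rough illustration of both the approach and its brittleness, the sketch below wraps the same pattern in a helper. The name add_prefix and the base URL http://example.com/ are hypothetical, and a function is used as the replacement only so the prefix needs no escaping.

```python
import re

# Hypothetical base URL, used only for illustration.
PREFIX = "http://example.com/"

# Same pattern as above; the double quotes are escaped so the literal is valid Python.
LINK = re.compile(r"(<a.*?href=['\"])((?!http).*?['\"].*?>)")

def add_prefix(html, prefix=PREFIX):
    # Rebuild the tag as: opening part + prefix + rest of the tag.
    return LINK.sub(lambda m: m.group(1) + prefix + m.group(2), html)

relative = add_prefix('<a href="page.html">x</a>')
absolute = add_prefix('<a href="http://python.org">y</a>')
mailto = add_prefix('<a href="mailto:me@example.com">mail</a>')

print(relative)  # <a href="http://example.com/page.html">x</a>
print(absolute)  # <a href="http://python.org">y</a>
print(mailto)    # <a href="http://example.com/mailto:me@example.com">mail</a>
```

The last case shows one way this goes wrong: anything that doesn't start with http, such as a mailto: link, is treated as a relative path and gets the prefix too.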

