python - What exactly do "u" and "r" string flags do, and what are raw string literals?


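While asking this question, I realized I didn't know much about raw strings. For somebody claiming to be a Django trainer, this sucks.

I know what an encoding is, and I know what u'' alone does, since I get what Unicode is.

But what does r'' do exactly? What kind of string does it result in?

And above all, what the heck does ur'' do?

Finally, is there any reliable way to go back from a Unicode string to a simple raw string?

Ah, and by the way, if your system and your text editor charset are set to UTF-8, does u'' actually do anything?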

There's not really any "raw string"; there are raw string literals, which are exactly the string literals marked by an 'r' before the opening quote.

A "raw string literal" is a slightly different syntax for a string literal, in which a backslash, \, is taken as meaning "just a backslash" (except when it comes right before a quote that would otherwise terminate the literal), so there are no "escape sequences" to represent newlines, tabs, backspaces, form-feeds, and so on. In normal string literals, each backslash must be doubled up to avoid being taken as the start of an escape sequence.
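A quick way to see the difference is to compare lengths. This is only a minimal sketch (run here under Python 3, though these particular literals behave the same way in Python 2), and the variable names are made up for illustration:

```python
# Normal literal: the backslash starts an escape sequence.
normal = 'a\nb'      # three characters: 'a', newline, 'b'

# Raw literal: the backslash is just a backslash.
raw = r'a\nb'        # four characters: 'a', backslash, 'n', 'b'

# Doubling the backslash in a normal literal gives the same string.
doubled = 'a\\nb'

# The "except" clause: a backslash before a quote stops the quote from
# ending the literal, but the backslash itself stays in the string.
quoted = r'it\'s'    # five characters: 'i', 't', backslash, quote, 's'

print(len(normal), len(raw), raw == doubled, len(quoted))
# prints: 3 4 True 5
```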

This syntax variant exists mostly because the syntax of regular expression patterns is heavy with backslashes (but never at the end, so the "except" clause above doesn't matter), and it looks a bit better when you avoid doubling up each of them. That's all. It also gained some popularity for expressing native Windows file paths (with backslashes instead of regular slashes as on other platforms), but that's very rarely needed (since normal slashes mostly work fine on Windows too) and imperfect (due to the "except" clause above).
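Both uses can be sketched in a few lines. The pattern, the sample text and the path below are hypothetical, chosen only to show the doubling and the trailing-backslash limitation:

```python
import re

# A regex pattern written as a raw literal: no doubled backslashes.
pattern_raw = r'\d+\.\d+'
# The same pattern as a normal literal needs every backslash doubled.
pattern_normal = '\\d+\\.\\d+'

matches = re.findall(pattern_raw, 'pi is 3.14, e is 2.718')

# A Windows-style path works as a raw literal as long as it does not
# end in a backslash; a raw literal cannot end in a single backslash.
path = r'C:\temp\new'          # hypothetical path, for illustration only
path_with_sep = path + '\\'    # add the trailing backslash separately

print(pattern_raw == pattern_normal)   # True
print(matches)                         # ['3.14', '2.718']
print(path_with_sep)                   # C:\temp\new\
```

Note that without the r prefix, the \t and \n in that path would have turned into a tab and a newline.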

r'...' is a byte string (in Python 2.*), ur'...' is a Unicode string (again, in Python 2.*), and any of the other three kinds of quoting produces exactly the same types of strings (so, for example, r'...', r'''...''', r"...", r"""...""" are all byte strings, and so on).
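Here is a small check of the "same types" point. One assumption to flag loudly: this sketch is run under Python 3, where a plain or r'...' literal is a text string rather than a byte string and the ur prefix no longer exists, so the type names differ from the Python 2 ones above. The principle (the r prefix and the quoting style never change the type) is the same:

```python
# All four quoting styles build the same value with the same type.
forms = [r'a\tb', r'''a\tb''', r"a\tb", r"""a\tb"""]

same_value = all(s == forms[0] for s in forms)
same_type = all(type(s) is type('plain') for s in forms)

# The r prefix combines with other prefixes without changing the type
# they select: br'...' is still a bytes object in Python 3.
raw_bytes = br'a\tb'

print(same_value, same_type, type(raw_bytes).__name__)
# prints (Python 3): True True bytes
```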

Not sure what you mean by "going back": there are no intrinsic back and forward directions, because there's no raw string type; it's just an alternative syntax to express perfectly normal string objects, byte or Unicode as they may be.

And yes, in Python 2.*, u'...' is of course always distinct from just '...': the former is a Unicode string, the latter is a byte string. What encoding the literal might be expressed in is a completely orthogonal issue.

E.g., consider (Python 2.6):

>>> import sys
>>> sys.getsizeof('ciao')
28
>>> sys.getsizeof(u'ciao')
34

The Unicode object of course takes more memory space (a very small difference for a very short string, obviously ;-).
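If the "going back" in the question was really about getting from a Unicode string to a byte string, that conversion is an explicit encode, and it also shows why the encoding is a separate issue from the type. This is a sketch under that assumption about the question; the sample word is made up, and the code should run unchanged on Python 2.6+ and Python 3.3+:

```python
text = u'caf\u00e9'              # Unicode string: 4 code points
data = text.encode('utf-8')      # byte string: 5 bytes (U+00E9 takes two)

# Decoding with the same encoding gets the text back unchanged.
round_trip = data.decode('utf-8')

# A different encoding gives different bytes for the same text.
latin1 = text.encode('latin-1')  # 4 bytes

print(len(text), len(data), len(latin1), round_trip == text)
# prints: 4 5 4 True
```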

