The escape sequences recognised in string, ucs and cset literals are shown below. Each of these types interprets the sequences slightly differently, as described in the following sections.
Escape sequence | Character | ASCII |
---|---|---|
\b | Backspace | 8 |
\d | Delete | 127 |
\e | Escape | 27 |
\f | Formfeed | 12 |
\l | Linefeed | 10 |
\n | Newline | 10 |
\r | Return | 13 |
\t | Horizontal tab | 9 |
\v | Vertical tab | 11 |
\' | Single quote | 39 |
\" | Double quote | 34 |
\\ | Backslash | 92 |
\ddd | Octal code, 1-3 digits | |
\xdd | Hexadecimal code, 1 or 2 digits | |
\^c | Control code | |
\N | Platform line ending (not csets) | 10, or 13 followed by 10 |
\udddd | Unicode code point, 1-4 hexadecimal digits | |
\Udddddd | Unicode code point, 1-6 hexadecimal digits | |
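To make the table concrete, here is a Python model of how a single escape sequence could be decoded into character numbers. It is a sketch, not Object Icon's lexer: the names `decode_escape` and `take_digits` are hypothetical, the value of `\N` is passed in as a parameter, and two behaviours the table does not spell out are assumptions, namely that `\^c` keeps the low five bits of `c` and that an unlisted escaped character stands for itself.

```python
# A Python model of the escape table, for illustration only.
# It is not Object Icon's lexer; names and edge-case behaviour are assumptions.

SIMPLE = {
    "b": 8, "d": 127, "e": 27, "f": 12, "l": 10, "n": 10,
    "r": 13, "t": 9, "v": 11, "'": 39, '"': 34, "\\": 92,
}
OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"


def take_digits(text, start, base, limit):
    # Read at most `limit` digits in `base` (8 or 16), beginning at `start`.
    digits = OCTAL if base == 8 else HEX
    end = start
    while end < len(text) and end - start < limit and text[end] in digits:
        end += 1
    if end == start:
        raise ValueError("escape is missing its digits")
    return int(text[start:end], base), end


def decode_escape(text, i, line_ending=(10,)):
    # text[i] is a backslash; return (character numbers produced, index after escape).
    # line_ending stands in for \N: (10,) or (13, 10) depending on the platform.
    c = text[i + 1]
    if c in SIMPLE:
        return [SIMPLE[c]], i + 2
    if c in OCTAL:                      # \ddd: 1-3 octal digits
        value, end = take_digits(text, i + 1, 8, 3)
        return [value], end
    if c == "x":                        # \xdd: 1-2 hex digits
        value, end = take_digits(text, i + 2, 16, 2)
        return [value], end
    if c == "u" or c == "U":            # \udddd / \Udddddd: one code point
        value, end = take_digits(text, i + 2, 16, 4 if c == "u" else 6)
        return [value], end
    if c == "^":                        # \^c: assumed to keep the low 5 bits of c
        return [ord(text[i + 2]) & 0x1F], i + 3
    if c == "N":
        return list(line_ending), i + 2
    return [ord(c)], i + 2              # assumed: other characters stand for themselves


print(decode_escape(r"\x41z", 0))  # ([65], 4): only two hex digits are consumed
print(decode_escape(r"\^c", 0))    # ([3], 3): control-C
```

Limiting each numeric escape to a fixed number of digits is what lets a literal such as `"\x41z"` mean "A" followed by "z" without any terminator.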
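In a ucs literal, each escape sequence produces one character, except that \N produces two (13 followed by 10) on platforms whose line ending is CR LF. In particular, the two Unicode escape sequences each produce a single character corresponding to one Unicode code point. The remaining characters of the literal, including those produced by the other escapes, must together form valid UTF-8; so, for example,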
u"\xFF"
is not allowed, because \xFF produces the single byte 255, which on its own is not valid UTF-8. Instead, to get a ucs string containing the single character 255 (U+00FF), use:
u"\u00FF"
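The UTF-8 rule for ucs literals can be modelled in Python. This is a sketch under stated assumptions, not Object Icon's implementation: the literal is treated as a list of parts, where plain characters and `\x` or octal escapes contribute raw bytes and `\u`/`\U` escapes contribute whole code points, and the assembled bytes must then decode as UTF-8. The names `ucs_literal` and `is_valid_ucs` are hypothetical.

```python
# Hypothetical model of a ucs literal; not Object Icon's actual implementation.
# Each part is ("byte", n) for a plain character or a \x / octal escape,
# or ("cp", n) for a \u or \U escape naming one Unicode code point.

def ucs_literal(parts):
    raw = bytearray()
    for kind, value in parts:
        if kind == "cp":
            raw += chr(value).encode("utf-8")   # a whole code point
        else:
            raw.append(value)                   # a single raw byte
    # The assembled bytes must be valid UTF-8; decode() raises otherwise.
    return raw.decode("utf-8")


def is_valid_ucs(parts):
    try:
        ucs_literal(parts)
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_ucs([("byte", 0xFF)]))     # False: models u"\xFF"
print(len(ucs_literal([("cp", 0xFF)])))   # 1: models u"\u00FF"
```

In this model, the two raw bytes 0xC3 0xBF would also be accepted and yield the same single character, since together they are the UTF-8 encoding of U+00FF.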
Plain string literals are interpreted in the same way as ucs literals, with two exceptions. First, there is no requirement that the result be valid UTF-8. Second, the Unicode escapes expand to the UTF-8 byte sequence for the given code point, so a single escape may produce more than one character. Thus
"\u00FF"
is identical to
"\xc3\xbf"
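In Python terms, the expansion a plain string applies to a Unicode escape is simply the UTF-8 encoding of the code point. The helper name below is hypothetical; the point is that one Unicode escape can yield two or more characters, whereas a `\x` escape always yields exactly one.

```python
# Hypothetical helper: the characters (bytes) a plain string literal gets for \u or \U.
def string_unicode_escape(code_point):
    return chr(code_point).encode("utf-8")

print(string_unicode_escape(0x00FF))   # b'\xc3\xbf'      (two characters)
print(string_unicode_escape(0x20AC))   # b'\xe2\x82\xac'  (three characters)
print(bytes([0xFF]))                   # b'\xff'          (what "\xFF" gives: one character)
```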
In cset literals, as in ucs literals, each escape sequence represents a single character. The \N sequence is not recognised (and simply represents “N”). Since a cset is really just a set of character numbers (and knows nothing of UTF-8),
'\xFF'
is identical to
'\u00FF'
Both contain a single character number 255.
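The contrast between csets and plain strings can be sketched with Python sets: a cset keeps the number named by the escape, while the characters of a plain string are its UTF-8 bytes. Both helper names are hypothetical and only model the rules described above.

```python
# Hypothetical helpers contrasting cset and plain-string treatment of escapes.
def cset_members(code_points):
    # A cset is a set of character numbers: each escape adds its number as-is.
    return frozenset(code_points)

def string_characters(code_points):
    # A plain string expands each Unicode escape to UTF-8 bytes first.
    return frozenset(b"".join(chr(cp).encode("utf-8") for cp in code_points))

print(sorted(cset_members([0xFF])))        # [255]       models '\u00FF' and '\xFF'
print(sorted(string_characters([0xFF])))   # [191, 195]  characters of "\u00FF"
```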
The hyphen character has a special meaning in cset literals. It is used to specify ranges of characters. For example
'a-z'
is the same as the predefined value &lcase. Therefore a hyphen that is meant literally must be escaped with a backslash:
punctuation := '.,;!\-:'
Without the backslash, the above cset would also contain all 24 ASCII characters between “!” and “:”.
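The range rule can be checked with a small Python sketch. It is hypothetical and simplified: a backslash here merely makes the next character literal, and the other escapes in the table are not decoded. An unescaped hyphen between two characters is expanded to the inclusive range, which is why `'a-z'` covers all 26 lower-case letters.

```python
# Hypothetical sketch of how hyphen ranges in a cset literal could be expanded.
# Simplification: a backslash just makes the next character literal; the other
# escapes in the table (\n, \x41, ...) are not handled here.

def parse_cset(body):
    # body is the text between the single quotes.
    tokens = []                     # (character, was_escaped) pairs
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            tokens.append((body[i + 1], True))
            i += 2
        else:
            tokens.append((body[i], False))
            i += 1

    members = set()
    j = 0
    while j < len(tokens):
        first = tokens[j][0]
        # An unescaped hyphen between two characters denotes an inclusive range.
        if j + 2 < len(tokens) and tokens[j + 1] == ("-", False):
            last = tokens[j + 2][0]
            members.update(chr(k) for k in range(ord(first), ord(last) + 1))
            j += 3
        else:
            members.add(first)
            j += 1
    return frozenset(members)


print(len(parse_cset(r".,;!\-:")))   # 6
print(len(parse_cset(".,;!-:")))     # 27
```

The unescaped version has 27 members: the 24 characters strictly between "!" and ":", the two endpoints, and ";", which lies just outside the range.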