Literals and escape sequences

Introduction

The escape sequences recognised in string, ucs and cset literals are shown below. Each of these types interprets the sequences slightly differently, as described in the following sections.

Escape sequences

Escape sequence	Character	ASCII
\b	Backspace	8
\d	Delete	127
\e	Escape	27
\f	Formfeed	12
\l	Linefeed	10
\n	Newline	10
\r	Return	13
\t	Horizontal tab	9
\v	Vertical tab	11
\'	Single quote	39
\"	Double quote	34
\\	Backslash	92
\ddd	Octal code, 1-3 digits
\xdd	Hexadecimal code, 1 or 2 digits
\^c	Control code
\N	Platform line ending (not csets)	10 or 13,10
\udddd	Unicode code point, 1-4 hexadecimal digits
\Udddddd	Unicode code point, 1-6 hexadecimal digits

Ucs literals

Each escape sequence (except perhaps \N) produces one character. In particular, the two Unicode escape sequences produce a single character corresponding to one Unicode code point. The other characters in the string must form valid UTF-8; so for example

  u"\xFF"

is not allowed. Instead, to get a string containing the single character 255, use:

  u"\u00FF"
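
The reason for this restriction can be checked outside Object Icon. The following is a minimal sketch in Python, used here only as a neutral way to inspect UTF-8; the helper name is_valid_utf8 is illustrative and not part of any Object Icon library. It shows that a lone byte 255 is not well-formed UTF-8, whereas code point U+00FF is a single character whose encoding is valid.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"\xff"))      # False: the raw byte written as \xFF
print(is_valid_utf8(b"\xc3\xbf"))  # True: the UTF-8 encoding of U+00FF
print(len("\u00ff"))               # 1: U+00FF is one code point
```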

String literals

Plain string literals are treated just the same as ucs literals, with two exceptions. Firstly, there is no restriction regarding valid UTF-8. Secondly, the Unicode escapes expand to the UTF-8 byte sequence for the particular code point. Thus

  "\u00FF"

is identical to

  "\xc3\xbf"

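The two bytes above follow from the standard two-byte UTF-8 layout. As a hedged illustration, the Python sketch below (not Object Icon code; the function name utf8_two_byte is hypothetical) derives them by hand for code point 0xFF and compares the result with Python's own encoder.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point is outside the two-byte UTF-8 range")
    lead = 0xC0 | (cp >> 6)     # 110xxxxx carries the top five bits
    trail = 0x80 | (cp & 0x3F)  # 10xxxxxx carries the low six bits
    return bytes([lead, trail])


encoded = utf8_two_byte(0xFF)
print(encoded.hex())                        # c3bf
print(encoded == "\u00ff".encode("utf-8"))  # True
```
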
Cset literals

As in ucs literals, each escape sequence represents a single character. The \N sequence is not recognised (and just represents “N”). Since a cset is really just a set of character numbers (and knows nothing of UTF-8),

  '\xFF'

is identical to

  '\u00FF'

Both contain a single character number 255.

The hyphen character has a special meaning in cset literals. It is used to specify ranges of characters. For example

  'a-z'

is the same as the predefined value &lcase. Therefore the hyphen must be escaped when it is meant as a literal character rather than a range separator:

  punctuation := '.,;!\-:'

Without the backslash, the above cset would also contain all 24 ASCII characters between “!” and “:”.
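
The count can be checked with a small model of range expansion. This Python sketch is only an illustration and not Object Icon code; it assumes that a range includes both of its endpoints (as 'a-z' matching &lcase suggests), and the helper name expand_range is hypothetical.

```python
def expand_range(first: str, last: str) -> set:
    """Return every character from first to last, both endpoints included."""
    return {chr(code) for code in range(ord(first), ord(last) + 1)}


# Characters that the range "!-:" pulls in besides its two endpoints.
between = expand_range("!", ":") - {"!", ":"}

print(len(between))              # 24
print("".join(sorted(between)))  # the characters from '"' (34) to '9' (57)
```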
