All the examples here are in Python 3.

Unicode escaped characters

Use \u followed by four hex digits inside a string to write a Unicode character by its code point.

When Python echoes or prints the string, the character is shown in human-readable form.

>>> '\u003e'
'>'

>>> print('\u003e')
>
>>> '\u00cd'
'Í'
>>> '\uabcd'
'ꯍ'
>>> 'foo \u003e \u00cd \xf0\x9f\x98\x80 baz'
'foo > Í ð\x9f\x98\x80 baz'
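As a small sketch of the escape forms available in a regular string literal, the snippet below writes the same character three ways and checks them against chr() and ord(). The variable names are only illustrative.

```python
# Three escape forms for the same character, U+003E.
four_digit = '\u003e'               # \u takes exactly 4 hex digits
eight_digit = '\U0000003e'          # \U takes exactly 8 hex digits
by_name = '\N{GREATER-THAN SIGN}'   # \N{...} takes the Unicode character name

assert four_digit == eight_digit == by_name == '>'

# chr() builds a character from an integer code point; ord() goes back.
assert chr(0x3E) == '>'
assert ord('>') == 0x3E

# Code points above U+FFFF do not fit in 4 hex digits, so they need \U.
grinning = '\U0001f600'
assert len(grinning) == 1
assert ord(grinning) == 0x1F600
```

Note that \xNN in a str literal means the code point NN, not a raw byte, which is why the \xf0\x9f\x98\x80 sequence above does not turn into an emoji.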

Raw

If you make it a raw string, the backslash is kept as a literal character and the escape is not interpreted. The echoed repr shows that backslash doubled.

>>> r'\u003e'
'\\u003e'

The same string written as a regular literal, with the backslash escaped by doubling it.

>>> '\\u003e'
'\\u003e'
>>> print(r'\u003e')
\u003e
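To make the difference concrete, here is a short sketch comparing the raw, escaped and regular forms by value and by length. The variable names are made up for the example.

```python
raw = r'\u003e'       # raw literal: the backslash is kept as-is
escaped = '\\u003e'   # regular literal with an escaped backslash
cooked = '\u003e'     # regular literal: the escape is interpreted

# The raw and double-backslash forms are the same 6-character string.
assert raw == escaped
assert len(raw) == 6

# The regular literal is a single character.
assert cooked == '>'
assert len(cooked) == 1
assert raw != cooked
```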

Unicode characters

Note that in Python 3, all strings are Unicode strings. There is no separate unicode type anymore, only str.

An emoji:

>>> '😀'
'😀'

Convert from string to bytes

>>> 'Hello 😀'.encode('utf-8')
b'Hello \xf0\x9f\x98\x80'

Here you could actually leave out 'utf-8' and get the same result, since UTF-8 is the default encoding, but it is good to be explicit.
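A quick sketch of that point: encoding with and without the explicit argument gives the same bytes, and the emoji takes four bytes in UTF-8 even though it is one character. The variable names are illustrative.

```python
text = 'Hello 😀'

explicit = text.encode('utf-8')
implicit = text.encode()   # the encoding argument defaults to 'utf-8'

assert explicit == implicit == b'Hello \xf0\x9f\x98\x80'

# 7 characters in the str, but 10 bytes once encoded.
assert len(text) == 7
assert len(explicit) == 10
```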

Convert from bytes to string

>>> b'Hello \xf0\x9f\x98\x80'.decode('utf-8')
'Hello 😀'

Note that \u escapes are not interpreted in a bytes literal, so they are left as is, with a literal backslash.

>>> b'foo \u003e \u00cd \xf0\x9f\x98\x80 baz'.decode('utf-8')
'foo \\u003e \\u00cd 😀 baz'
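The sketch below repeats that check in script form. It writes the backslash doubled in the bytes literal, on the assumption that you want to avoid the warning recent Python versions give for an unrecognized escape such as \u inside b'...'.

```python
# \xNN is a real escape in a bytes literal (one raw byte each);
# \u is not, so the backslash is written doubled here to keep it literal.
data = b'foo \\u003e \xf0\x9f\x98\x80 baz'

text = data.decode('utf-8')

# The four \x bytes decode to the emoji; the \u003e text stays as 6 characters.
assert text == 'foo \\u003e \U0001f600 baz'
assert '>' not in text

# Encoding again gives back the original bytes.
assert text.encode('utf-8') == data
```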

Encode as ASCII

Convert from string to bytes. We specify the ASCII standard, which will give an error on Unicode characters that cannot be represented.

The default behavior implies errors='strict' and can raise an error.

>>> 'Hello 😀'.encode('ascii')
# UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f600' in position 6: ordinal not in range(128)
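If you would rather handle the failure than let it propagate, one option is to catch UnicodeEncodeError and inspect its attributes. This is only a sketch; the info tuple is an illustrative name.

```python
info = None

try:
    'Hello 😀'.encode('ascii')   # errors='strict' is the default
except UnicodeEncodeError as exc:
    # The exception records which codec failed and where in the string.
    info = (exc.encoding, exc.start, exc.end, exc.reason)

assert info == ('ascii', 6, 7, 'ordinal not in range(128)')
```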

We can choose to replace or ignore characters that cannot be encoded.

>>> 'Hello 😀 хелло world'.encode('ascii', errors='replace')
b'Hello ? ????? world'
>>> 'Hello 😀 хелло world'.encode('ascii', errors='ignore')
b'Hello   world'

Note you'll get bytes above, so you should convert back to string with .decode(), so you can work with it as a string.

>>> 'Hello 😀 хелло world'.encode('ascii', errors='replace').decode()
'Hello ? ????? world'

With errors='ignore' instead, this is useful for stripping out non-ASCII characters.
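The encode-then-decode round trip can be wrapped in a small helper. The function below, to_ascii, is a hypothetical name and not part of the standard library; it is a minimal sketch that passes the chosen error handler through to str.encode.

```python
def to_ascii(text, errors='ignore'):
    """Hypothetical helper: return text as an ASCII-only str.

    `errors` is handed to str.encode, e.g. 'ignore', 'replace'
    or 'backslashreplace'.
    """
    return text.encode('ascii', errors=errors).decode('ascii')


sample = 'Hello 😀 хелло world'

stripped = to_ascii(sample)                              # drop non-ASCII
replaced = to_ascii(sample, errors='replace')            # one ? per character
escaped = to_ascii(sample, errors='backslashreplace')    # keep as escapes

assert stripped == 'Hello   world'
assert replaced == 'Hello ? ????? world'
assert escaped == 'Hello \\U0001f600 \\u0445\\u0435\\u043b\\u043b\\u043e world'
```

Note that 'ignore' leaves the surrounding spaces in place, so the stripped result has three spaces in a row.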

Codecs

Using the built-in codecs module.

Example

>>> import codecs
>>> codecs.decode(r'\u003e', 'unicode-escape')
'>'

API

Some functions from the module:

codecs.encode(obj, encoding='utf-8', errors='strict')

codecs.decode(obj, encoding='utf-8', errors='strict')
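As a rough sketch of how the two functions pair up, the snippet below escapes a string with the unicode-escape codec and decodes it back, then checks that codecs.encode with 'utf-8' matches str.encode. The variable names are illustrative.

```python
import codecs

text = 'foo > \u00cd \U0001f600'

# str -> bytes: non-ASCII characters become backslash escapes.
escaped = codecs.encode(text, 'unicode-escape')
assert escaped == b'foo > \\xcd \\U0001f600'

# bytes -> str: the escapes are interpreted again.
restored = codecs.decode(escaped, 'unicode-escape')
assert restored == text

# With a regular text encoding, the functions match the str/bytes methods.
assert codecs.encode(text, 'utf-8') == text.encode('utf-8')
assert codecs.decode(b'Hello \xf0\x9f\x98\x80', 'utf-8') == 'Hello \U0001f600'
```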