Slugify

Cheatsheet for turning Unicode text into ASCII-safe slugs in Python.

Remove non-ASCII

Strip out non-ASCII characters, such as emoji and accented letters, by replacing each one with a question mark.

value.encode("ascii", errors="replace").decode()

This encodes the Unicode string to ASCII bytes, then decodes it back to a string. Each character that cannot be encoded as ASCII is replaced with a single ?.

e.g.

'accent _ é smiley 😀'.encode("ascii", errors="replace").decode()
# 'accent _ ? smiley ?'
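To drop the characters instead of substituting ?, the same round trip can be done with errors="ignore". A minimal sketch of both modes side by side; the helper name to_ascii is made up for illustration and is not part of the original snippet.

```python
def to_ascii(value, errors="replace"):
    """Round-trip through ASCII; `errors` decides what happens to non-ASCII characters."""
    return value.encode("ascii", errors=errors).decode()


text = "accent _ é smiley 😀"

# "replace" substitutes one ? per non-ASCII character.
print(repr(to_ascii(text)))                   # 'accent _ ? smiley ?'

# "ignore" removes the characters, leaving the surrounding spaces behind.
print(repr(to_ascii(text, errors="ignore")))  # 'accent _  smiley '
```

With "ignore", the spaces that surrounded a removed character remain, so the result can contain doubled or trailing spaces.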

Remove symbols

Replace every character that is not alphanumeric with a dash. A run of adjacent non-alphanumeric characters, underscores included, collapses into a single dash.

SLUG_PATTERN.sub(NEW_CHAR, TEXT)

e.g. Note how the emoji is replaced, but the accented character is kept, because \w matches Unicode letters by default.

import re

SLUG_PATTERN = re.compile(r"[\W_]+")

SLUG_PATTERN.sub('-', 'accent _ é smiley 😀')
# 'accent-é-smiley-'

e.g. Remove punctuation symbols.

SLUG_PATTERN.sub('-', '!@#$%^& é 😀')
# '-é-'
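The accented character survives because, for str patterns, \w and \W are Unicode-aware by default. If accented letters should be replaced in the same pass, one option is the standard re.ASCII flag, which restricts \w to [a-zA-Z0-9_]. A sketch; the name ASCII_SLUG_PATTERN is hypothetical.

```python
import re

# With re.ASCII, \W matches anything outside [a-zA-Z0-9_], so "é" now counts as a symbol.
ASCII_SLUG_PATTERN = re.compile(r"[\W_]+", re.ASCII)

print(ASCII_SLUG_PATTERN.sub("-", "accent _ é smiley 😀"))
# accent-smiley-
```

Here the run " _ é " collapses into one dash, so no ? placeholder ever appears.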

Remove non-ASCII and symbols

This combines the sections above.

It also strips out any dash characters on the ends of the result.

import re

SLUG_PATTERN = re.compile(r"[\W_]+")


def slugify(value):
    """
    Convert value to a slug - safe for URLs and filenames.

    Non-ASCII characters and symbols are replaced, so the result is only basic
    alphanumeric and hyphens.
    """
    value = value.encode("ascii", errors="replace").decode()
    value = SLUG_PATTERN.sub("-", value)

    return value.strip("-")
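A few sample calls, as a sketch. The function is repeated so the snippet runs on its own, with the compiled pattern bound to the name SLUG_PATTERN that the function body uses. The inputs are illustrative only.

```python
import re

SLUG_PATTERN = re.compile(r"[\W_]+")


def slugify(value):
    # Non-ASCII characters become "?", which the pattern then treats as a symbol.
    value = value.encode("ascii", errors="replace").decode()
    value = SLUG_PATTERN.sub("-", value)
    return value.strip("-")


print(slugify("accent _ é smiley 😀"))  # accent-smiley
print(slugify("Hello, World!"))          # Hello-World
print(repr(slugify("!@#$%^& é 😀")))     # ''
```

Two things to watch for: case is preserved, so call .lower() on the result if lowercase slugs are wanted, and input with no ASCII letters or digits yields an empty string.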

Use packages

See this old Stack Overflow post.
