emoji_data.definitions module

Regular expressions for Emoji Definitions

Note

initial_emoji_patterns() MUST be called before using any other function in the module.

class emoji_data.definitions.QualifiedType(*values)

Bases: Enum

RGI_Emoji_Qualification — the status of emoji sequences

This is an enumerated property of strings, defined by the emoji-test.txt file [emoji-data]. It assigns one of the three values defined in ED-18, ED-18a, and ED-19 to each emoji in the ED-27 RGI emoji set and to related sequences with missing variation selectors. The property value names and short aliases are:

  • Fully_Qualified, FQE

  • Minimally_Qualified, MQE

  • Unqualified, UQE

FULLY_QUALIFIED = 'FQE'
MINIMALLY_QUALIFIED = 'MQE'
UNQUALIFIED = 'UQE'
emoji_data.definitions.detect_qualified(s)

Detect the qualification type of an emoji string

  • qualified emoji character — An emoji character in a string that
      1. has default emoji presentation or

      2. is the first character in an emoji modifier sequence or

      3. is not a default emoji presentation character, but is the first character in an emoji presentation sequence.

  • fully-qualified emoji — A qualified emoji character, or an emoji sequence in which each emoji character is qualified.

  • minimally-qualified emoji — An emoji sequence in which the first character is qualified but the sequence is not fully qualified.

  • unqualified emoji — An emoji that is neither fully-qualified nor minimally qualified.
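
The definitions above can be sketched in a few lines of plain Python. This is only an illustration of the rules, not the library's implementation: the names `is_qualified_at` and `classify` are hypothetical, and the two small sets stand in for the full Unicode property tables that the real module relies on.

```python
VS16 = "\uFE0F"  # emoji presentation selector

# U+1F3FB..U+1F3FF are the five emoji modifiers (skin tones).
MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}

# Hypothetical, tiny stand-ins for the Unicode property tables.
EMOJI_PRESENTATION = {"\U0001F600", "\U0001F468"}  # GRINNING FACE, MAN
EMOJI = EMOJI_PRESENTATION | {"\u2764", "\u2695"}  # HEAVY BLACK HEART, STAFF OF AESCULAPIUS


def is_qualified_at(s: str, i: int) -> bool:
    """Apply the three 'qualified emoji character' rules to s[i]."""
    if s[i] in EMOJI_PRESENTATION:
        return True
    nxt = s[i + 1] if i + 1 < len(s) else ""
    # Simplification: any following modifier counts here; the real rule
    # also requires s[i] to be an emoji modifier base.
    return nxt in MODIFIERS or nxt == VS16


def classify(s: str) -> str:
    """Return 'FQE', 'MQE' or 'UQE' for a string built from the sample emoji."""
    flags = [is_qualified_at(s, i) for i, ch in enumerate(s) if ch in EMOJI]
    if not flags:
        raise ValueError("no known emoji character in input")
    if all(flags):
        return "FQE"
    return "MQE" if flags[0] else "UQE"
```

With these sample tables, a heart followed by U+FE0F classifies as fully-qualified, a bare heart as unqualified, and MAN + ZWJ + STAFF OF AESCULAPIUS without the trailing U+FE0F as minimally-qualified.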

Parameters:

s (str) – Emoji string to detect

Return type:

QualifiedType

emoji_data.definitions.get_emoji_patterns()
Return type:

Mapping[str, Pattern[str]]

emoji_data.definitions.initial_emoji_patterns()

Initialize the emoji patterns dictionary

MUST be called before using any other function in the module.

emoji_data.definitions.is_basic_emoji_character(c)

basic emoji — Emoji characters excluding Emoji Components

basic_emoji := emoji_character - emoji_component
Return type:

bool

Parameters:

c (str)

  • These characters are emoji characters but not emoji components.

emoji_data.definitions.is_default_emoji_presentation_character(c)

default emoji presentation character — A character that, by default, should appear with an emoji presentation, rather than a text presentation.

default_emoji_presentation_character := \p{Emoji_Presentation}
Return type:

bool

Parameters:

c (str)

emoji_data.definitions.is_default_text_presentation_character(c)

default text presentation character — A character that, by default, should appear with a text presentation, rather than an emoji presentation.

default_text_presentation_character := \P{Emoji_Presentation}
Return type:

bool

Parameters:

c (str)

emoji_data.definitions.is_emoji_character(c)

Detect whether a character is an emoji character

A character that has the Emoji property.

emoji_character := \p{Emoji}
Return type:

bool

Parameters:

c (str)

  • These characters are recommended for use as emoji.

emoji_data.definitions.is_emoji_combining_sequence(s)

Emoji combining sequence

An emoji combining sequence is a combination of:

  1. A base emoji character

  2. One or more combining characters (like skin tone modifiers)

This typically includes:

  • Emoji modifier sequences (emoji + skin tone modifier)

  • Emoji ZWJ sequences (emojis combined with Zero Width Joiner)

  • Emoji presentation sequences

  • Text presentation sequences

Return type:

bool

Parameters:

s (str)

emoji_data.definitions.is_emoji_component(c)

emoji component — A character that has the Emoji_Component property.

  • These characters are used in emoji sequences but normally do not appear on emoji keyboards as separate choices, such as keycap base characters or Regional_Indicator characters.

  • Some emoji components are emoji characters, and others (such as tag characters and ZWJ) are not.

Return type:

bool

Parameters:

c (str)

emoji_data.definitions.is_emoji_core_sequence(s)

emoji core sequence — A sequence of the following form:

emoji_core_sequence :=
    emoji_character
| emoji_presentation_sequence
| emoji_keycap_sequence
| emoji_modifier_sequence
| emoji_flag_sequence
Return type:

bool

Parameters:

s (str)

emoji_data.definitions.is_emoji_flag_sequence(s)

emoji flag sequence — A sequence of two Regional Indicator characters, where the corresponding ASCII characters are valid region sequences as specified by Unicode region subtags in [CLDR], with idStatus = “regular”, “deprecated”, or “macroregion”. See also Annex B: Valid Emoji Flag Sequences.

emoji_flag_sequence :=
    regional_indicator regional_indicator

regional_indicator := \p{Regional_Indicator}

A singleton Regional Indicator character is not a well-formed emoji flag sequence.
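
As a rough illustration of the grammar above, the sketch below builds and checks the two-regional-indicator shape with the standard library only. The helper names `region_to_flag` and `has_flag_shape` are hypothetical, and the check is purely structural: it does not consult CLDR, so it cannot tell whether the region code is actually valid.

```python
import re

# Regional Indicator Symbols run from U+1F1E6 (A) to U+1F1FF (Z).
REGIONAL_INDICATOR = "[\U0001F1E6-\U0001F1FF]"
FLAG_SHAPE = re.compile(REGIONAL_INDICATOR + "{2}")


def region_to_flag(code: str) -> str:
    """Map a two-letter ASCII region code such as 'JP' to regional indicators."""
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError("expected two ASCII letters")
    return "".join(chr(0x1F1E6 + ord(ch) - ord("A")) for ch in code.upper())


def has_flag_shape(s: str) -> bool:
    """True if s is exactly two regional indicators (region validity is NOT checked)."""
    return FLAG_SHAPE.fullmatch(s) is not None
```

A single regional indicator, or plain ASCII letters, fails `has_flag_shape`, which mirrors the remark that a singleton is not a well-formed flag sequence.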

Return type:

bool

Parameters:

s (str)

emoji_data.definitions.is_emoji_keycap_sequence(s)

emoji keycap sequence — A sequence of the following form:

emoji_keycap_sequence := [0-9#*] \x{FE0F 20E3}
Return type:

bool

Parameters:

s (str)

  • These sequences are in the emoji-sequences.txt file listed under the type_field Emoji_Keycap_Sequence
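
The keycap grammar translates almost directly into a standard-library regular expression. The sketch below is an assumption about how one might test the shape by hand; `looks_like_keycap_sequence` is a hypothetical name and is not part of this module.

```python
import re

# Base character, then U+FE0F VARIATION SELECTOR-16,
# then U+20E3 COMBINING ENCLOSING KEYCAP.
KEYCAP = re.compile("[0-9#*]\uFE0F\u20E3")


def looks_like_keycap_sequence(s: str) -> bool:
    """True if s is exactly one keycap base followed by FE0F 20E3."""
    return KEYCAP.fullmatch(s) is not None
```

Note that a base followed directly by U+20E3, without the variation selector, does not match this grammar.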

emoji_data.definitions.is_emoji_modifier(c)

emoji modifier — A character that can be used to modify the appearance of a preceding emoji in an emoji modifier sequence.

emoji_modifier := \p{Emoji_Modifier}
Return type:

bool

Parameters:

c (str)

emoji_data.definitions.is_emoji_modifier_base(c)

emoji modifier base — A character whose appearance can be modified by a subsequent emoji modifier in an emoji modifier sequence.

emoji_modifier_base := \p{Emoji_Modifier_Base}
Return type:

bool

Parameters:

c (str)

emoji_data.definitions.is_emoji_modifier_sequence(s)

emoji modifier sequence — A sequence of the following form:

emoji_modifier_sequence :=
    emoji_modifier_base emoji_modifier
Return type:

bool

Parameters:

s (str)
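
To make the two-character shape of a modifier sequence concrete, here is a minimal sketch. The modifier range U+1F3FB..U+1F3FF is fixed by Unicode, but `SAMPLE_MODIFIER_BASES` is a hypothetical two-element sample of `\p{Emoji_Modifier_Base}` and the function name is likewise an assumption for illustration.

```python
# The five emoji modifiers, U+1F3FB..U+1F3FF (skin tones).
EMOJI_MODIFIERS = frozenset(chr(cp) for cp in range(0x1F3FB, 0x1F400))

# Hypothetical sample of \p{Emoji_Modifier_Base}; the real set is much larger.
SAMPLE_MODIFIER_BASES = frozenset({"\U0001F44B", "\U0001F44D"})  # WAVING HAND, THUMBS UP


def looks_like_modifier_sequence(s: str, bases=SAMPLE_MODIFIER_BASES) -> bool:
    """True if s is exactly one modifier base followed by one modifier."""
    return len(s) == 2 and s[0] in bases and s[1] in EMOJI_MODIFIERS
```

Passing a fuller `bases` set, for example one derived from the Unicode emoji data files, would bring the check closer to the real definition.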

emoji_data.definitions.is_emoji_presentation_selector(c)

emoji presentation selector — The character U+FE0F VARIATION SELECTOR-16 (VS16), used to request an emoji presentation for an emoji character. (Also known as emoji variation selector in prior versions of this specification.)

emoji_presentation_selector := \x{FE0F}
Return type:

bool

Parameters:

c (str)

emoji_data.definitions.is_emoji_presentation_sequence(s)

emoji presentation sequence — A variation sequence consisting of an emoji character followed by an emoji presentation selector.

emoji_presentation_sequence := emoji_character emoji_presentation_selector
Return type:

bool

Parameters:

s (str)

  • The only valid emoji presentation sequences are those listed in emoji-variation-sequences.txt [emoji-data].
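
The effect of a trailing selector can be shown with a small helper. This is a sketch under the assumption that the input is one base character optionally followed by one selector; `requested_presentation` is a hypothetical name, and it does not check whether the base is listed in emoji-variation-sequences.txt.

```python
VS15 = "\uFE0E"  # text presentation selector
VS16 = "\uFE0F"  # emoji presentation selector


def requested_presentation(s: str) -> str:
    """Return 'emoji', 'text' or 'default' depending on the trailing selector."""
    if len(s) == 2 and s[1] == VS16:
        return "emoji"
    if len(s) == 2 and s[1] == VS15:
        return "text"
    return "default"
```

For U+2764 HEAVY BLACK HEART, appending U+FE0F requests the emoji glyph, appending U+FE0E requests the text glyph, and the bare character is left to the renderer's default.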

emoji_data.definitions.is_emoji_sequence(s)

emoji sequence — A core sequence, tag sequence, or ZWJ sequence, as follows:

emoji_sequence :=
    emoji_core_sequence
| emoji_zwj_sequence
| emoji_tag_sequence
Return type:

bool

Parameters:

s (str)

Note

All emoji sequences are single grapheme clusters: there is never a grapheme cluster boundary within an emoji sequence. This affects editing operations, such as cursor movement or deletion, as well as word break, line break, and so on. For more information, see [UAX29].

emoji_data.definitions.is_emoji_tag_sequence(s)

emoji tag sequence (ETS) — A sequence of the following form:

emoji_tag_sequence := tag_base tag_spec tag_end
tag_base           := emoji_character
                    | emoji_modifier_sequence
                    | emoji_presentation_sequence
tag_spec           := [\x{E0020}-\x{E007E}]+
tag_end            := \x{E007F}
Return type:

bool

Parameters:

s (str)

  • The tag_spec consists of all characters from U+E0020 TAG SPACE to U+E007E TAG TILDE. Each tag_spec defines a particular visual variant to be applied to the tag_base character(s). Though tag_spec includes the values U+E0041 TAG LATIN CAPITAL LETTER A .. U+E005A TAG LATIN CAPITAL LETTER Z, they are not used currently and are reserved for future extensions.

  • The tag_end consists of the character U+E007F CANCEL TAG, and must be used to terminate the sequence.

  • A sequence of tag characters that is not part of an emoji_tag_sequence is not a well-formed emoji tag sequence.
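
The tag_spec and tag_end rules above can be sketched with the standard library. The helpers `to_tag_chars` and `split_tag_sequence` are hypothetical names used only for illustration; the sketch validates the tag tail but deliberately leaves tag_base unchecked, since that would need the emoji property tables.

```python
import re

# tag_spec: one or more of U+E0020..U+E007E; tag_end: U+E007F CANCEL TAG.
TAG_TAIL = re.compile("(?P<spec>[\U000E0020-\U000E007E]+)\U000E007F")


def to_tag_chars(text: str) -> str:
    """Shift printable ASCII (U+0020..U+007E) into the tag block."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in text):
        raise ValueError("only printable ASCII has tag counterparts")
    return "".join(chr(0xE0000 + ord(ch)) for ch in text)


def split_tag_sequence(s: str):
    """Return (tag_base, decoded tag_spec), or None if the tail is malformed.

    tag_base is returned as-is and is NOT validated.
    """
    m = TAG_TAIL.search(s)
    # The tail must end the string, and something must precede it as the base.
    if m is None or m.end() != len(s) or m.start() == 0:
        return None
    spec = "".join(chr(ord(ch) - 0xE0000) for ch in m.group("spec"))
    return s[: m.start()], spec
```

For example, U+1F3F4 WAVING BLACK FLAG followed by the tag characters for `gbeng` and U+E007F splits into the flag base and the decoded spec `gbeng`; dropping the terminator makes the function return None.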

emoji_data.definitions.is_emoji_zwj_element(s)

emoji ZWJ element — An element that can be used in an emoji ZWJ sequence, as follows:

emoji_zwj_element :=
    emoji_core_sequence
| emoji_tag_sequence
Return type:

bool

Parameters:

s (str)

emoji_data.definitions.is_emoji_zwj_sequence(s)

emoji ZWJ sequence — An emoji sequence with at least one joiner character.

emoji_zwj_sequence :=
emoji_zwj_element ( ZWJ emoji_zwj_element )+

ZWJ := \x{200d}
Return type:

bool

Parameters:

s (str)
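
The `element ( ZWJ element )+` shape can be checked structurally by splitting on the joiner. In this sketch `split_zwj_elements` is a hypothetical helper; it only enforces that there are at least two non-empty elements and does not verify that each one is a valid emoji_zwj_element.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def split_zwj_elements(s: str):
    """Return the elements around each ZWJ, or None if the shape is wrong.

    The elements themselves are NOT validated as emoji.
    """
    parts = s.split(ZWJ)
    # At least one joiner is required, and no element may be empty.
    if len(parts) < 2 or any(p == "" for p in parts):
        return None
    return parts
```

A family sequence such as MAN + ZWJ + WOMAN + ZWJ + GIRL yields three elements, while a lone emoji or a string ending in a dangling joiner yields None.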

emoji_data.definitions.is_extended_pictographic_character(c)

extended pictographic character — a character that has the Extended_Pictographic property.

  • These characters are pictographic, or otherwise similar in kind to characters with the Emoji property.

  • The Extended_Pictographic property is used to customize segmentation (as described in [UAX29] and [UAX14]) so that possible future emoji ZWJ sequences will not break grapheme clusters, words, or lines. Unassigned codepoints with Line_Break=ID in some blocks are also assigned the Extended_Pictographic property. Those blocks are intended for future allocation of emoji characters.

Return type:

bool

Parameters:

c (str)

emoji_data.definitions.is_qualified_emoji_character(s, i)

An emoji character in a string that

    1. has default emoji presentation or

    2. is the first character in an emoji modifier sequence or

    3. is not a default emoji presentation character, but is the first character in an emoji presentation sequence.

Parameters:
  • s (str) – the string that contains the character

  • i (int) – index in s of the character to check

Return type:

bool

emoji_data.definitions.is_regional_indicator(s)

A singleton Regional Indicator character is not a well-formed emoji flag sequence.

Return type:

bool

Parameters:

s (str)

emoji_data.definitions.is_rgi_emoji_sequence(s)

RGI emoji sequence - Recommended for General Interchange emoji sequences

These are the only emoji sequences that are recommended for general interchange.

Return type:

bool

Parameters:

s (str)

emoji_data.definitions.is_tag_base(s)
Return type:

bool

Parameters:

s (str)

emoji_data.definitions.is_tag_spec(s)
Return type:

bool

Parameters:

s (str)

emoji_data.definitions.is_tag_term(c)
Return type:

bool

Parameters:

c (str)

emoji_data.definitions.is_text_presentation_selector(c)

text presentation selector — The character U+FE0E VARIATION SELECTOR-15 (VS15), used to request a text presentation for an emoji character. (Also known as text variation selector in prior versions of this specification.)

text_presentation_selector := \x{FE0E}
Return type:

bool

Parameters:

c (str)

emoji_data.definitions.is_text_presentation_sequence(s)

text presentation sequence — A variation sequence consisting of an emoji character followed by a text presentation selector.

text_presentation_sequence := emoji_character text_presentation_selector
Return type:

bool

Parameters:

s (str)

  • The only valid text presentation sequences are those listed in emoji-variation-sequences.txt [emoji-data].

emoji_data.definitions.release_emoji_patterns()

Release emoji patterns dictionary