feat: 9.5.9
This commit is contained in:
parent
cb1753732b
commit
35f43a7909
1084 changed files with 558985 additions and 0 deletions
221
unicode/UNIDATA/CompositionExclusions.txt
Normal file
221
unicode/UNIDATA/CompositionExclusions.txt
Normal file
|
|
@ -0,0 +1,221 @@
|
|||
# CompositionExclusions-14.0.0.txt
|
||||
# Date: 2021-03-30, 23:59:00 GMT [KW, LI]
|
||||
# © 2021 Unicode®, Inc.
|
||||
# For terms of use, see https://www.unicode.org/terms_of_use.html
|
||||
#
|
||||
# Unicode Character Database
|
||||
# For documentation, see https://www.unicode.org/reports/tr44/
|
||||
#
|
||||
# This file lists the characters for the Composition Exclusion Table
|
||||
# defined in UAX #15, Unicode Normalization Forms.
|
||||
#
|
||||
# This file is a normative contributory data file in the
|
||||
# Unicode Character Database.
|
||||
#
|
||||
# For more information, see
|
||||
# https://www.unicode.org/reports/tr15/#Primary_Exclusion_List_Table
|
||||
#
|
||||
# For a full derivation of composition exclusions, see the derived property
|
||||
# Full_Composition_Exclusion in DerivedNormalizationProps.txt
|
||||
#
|
||||
|
||||
# ================================================
|
||||
# (1) Script Specifics
|
||||
#
|
||||
# This list of characters cannot be derived from the UnicodeData.txt file.
|
||||
#
|
||||
# Included are the following subcategories:
|
||||
#
|
||||
# - Many precomposed characters using a nukta diacritic in the Devanagari,
|
||||
# Bangla/Bengali, Gurmukhi, or Odia/Oriya scripts.
|
||||
# - Tibetan letters and subjoined letters with decompositions including
|
||||
# U+0FB7 TIBETAN SUBJOINED LETTER HA or U+0FB5 TIBETAN SUBJOINED LETTER SSA.
|
||||
# - Two two-part Tibetan vowel signs involving top and bottom pieces.
|
||||
# - A large collection of compatibility precomposed characters for Hebrew
|
||||
# involving dagesh and/or other combining marks.
|
||||
#
|
||||
# This list is unlikely to grow.
|
||||
#
|
||||
# ================================================
|
||||
|
||||
0958 # DEVANAGARI LETTER QA
|
||||
0959 # DEVANAGARI LETTER KHHA
|
||||
095A # DEVANAGARI LETTER GHHA
|
||||
095B # DEVANAGARI LETTER ZA
|
||||
095C # DEVANAGARI LETTER DDDHA
|
||||
095D # DEVANAGARI LETTER RHA
|
||||
095E # DEVANAGARI LETTER FA
|
||||
095F # DEVANAGARI LETTER YYA
|
||||
09DC # BENGALI LETTER RRA
|
||||
09DD # BENGALI LETTER RHA
|
||||
09DF # BENGALI LETTER YYA
|
||||
0A33 # GURMUKHI LETTER LLA
|
||||
0A36 # GURMUKHI LETTER SHA
|
||||
0A59 # GURMUKHI LETTER KHHA
|
||||
0A5A # GURMUKHI LETTER GHHA
|
||||
0A5B # GURMUKHI LETTER ZA
|
||||
0A5E # GURMUKHI LETTER FA
|
||||
0B5C # ORIYA LETTER RRA
|
||||
0B5D # ORIYA LETTER RHA
|
||||
0F43 # TIBETAN LETTER GHA
|
||||
0F4D # TIBETAN LETTER DDHA
|
||||
0F52 # TIBETAN LETTER DHA
|
||||
0F57 # TIBETAN LETTER BHA
|
||||
0F5C # TIBETAN LETTER DZHA
|
||||
0F69 # TIBETAN LETTER KSSA
|
||||
0F76 # TIBETAN VOWEL SIGN VOCALIC R
|
||||
0F78 # TIBETAN VOWEL SIGN VOCALIC L
|
||||
0F93 # TIBETAN SUBJOINED LETTER GHA
|
||||
0F9D # TIBETAN SUBJOINED LETTER DDHA
|
||||
0FA2 # TIBETAN SUBJOINED LETTER DHA
|
||||
0FA7 # TIBETAN SUBJOINED LETTER BHA
|
||||
0FAC # TIBETAN SUBJOINED LETTER DZHA
|
||||
0FB9 # TIBETAN SUBJOINED LETTER KSSA
|
||||
FB1D # HEBREW LETTER YOD WITH HIRIQ
|
||||
FB1F # HEBREW LIGATURE YIDDISH YOD YOD PATAH
|
||||
FB2A # HEBREW LETTER SHIN WITH SHIN DOT
|
||||
FB2B # HEBREW LETTER SHIN WITH SIN DOT
|
||||
FB2C # HEBREW LETTER SHIN WITH DAGESH AND SHIN DOT
|
||||
FB2D # HEBREW LETTER SHIN WITH DAGESH AND SIN DOT
|
||||
FB2E # HEBREW LETTER ALEF WITH PATAH
|
||||
FB2F # HEBREW LETTER ALEF WITH QAMATS
|
||||
FB30 # HEBREW LETTER ALEF WITH MAPIQ
|
||||
FB31 # HEBREW LETTER BET WITH DAGESH
|
||||
FB32 # HEBREW LETTER GIMEL WITH DAGESH
|
||||
FB33 # HEBREW LETTER DALET WITH DAGESH
|
||||
FB34 # HEBREW LETTER HE WITH MAPIQ
|
||||
FB35 # HEBREW LETTER VAV WITH DAGESH
|
||||
FB36 # HEBREW LETTER ZAYIN WITH DAGESH
|
||||
FB38 # HEBREW LETTER TET WITH DAGESH
|
||||
FB39 # HEBREW LETTER YOD WITH DAGESH
|
||||
FB3A # HEBREW LETTER FINAL KAF WITH DAGESH
|
||||
FB3B # HEBREW LETTER KAF WITH DAGESH
|
||||
FB3C # HEBREW LETTER LAMED WITH DAGESH
|
||||
FB3E # HEBREW LETTER MEM WITH DAGESH
|
||||
FB40 # HEBREW LETTER NUN WITH DAGESH
|
||||
FB41 # HEBREW LETTER SAMEKH WITH DAGESH
|
||||
FB43 # HEBREW LETTER FINAL PE WITH DAGESH
|
||||
FB44 # HEBREW LETTER PE WITH DAGESH
|
||||
FB46 # HEBREW LETTER TSADI WITH DAGESH
|
||||
FB47 # HEBREW LETTER QOF WITH DAGESH
|
||||
FB48 # HEBREW LETTER RESH WITH DAGESH
|
||||
FB49 # HEBREW LETTER SHIN WITH DAGESH
|
||||
FB4A # HEBREW LETTER TAV WITH DAGESH
|
||||
FB4B # HEBREW LETTER VAV WITH HOLAM
|
||||
FB4C # HEBREW LETTER BET WITH RAFE
|
||||
FB4D # HEBREW LETTER KAF WITH RAFE
|
||||
FB4E # HEBREW LETTER PE WITH RAFE
|
||||
|
||||
# Total code points: 67
|
||||
|
||||
# ================================================
|
||||
# (2) Post Composition Version precomposed characters
|
||||
#
|
||||
# These characters cannot be derived solely from the UnicodeData.txt file
|
||||
# in this version of Unicode.
|
||||
#
|
||||
# Note that characters added to the standard after the
|
||||
# Composition Version and which have canonical decomposition mappings
|
||||
# are not automatically added to this list of Post Composition
|
||||
# Version precomposed characters.
|
||||
# ================================================
|
||||
|
||||
2ADC # FORKING
|
||||
1D15E # MUSICAL SYMBOL HALF NOTE
|
||||
1D15F # MUSICAL SYMBOL QUARTER NOTE
|
||||
1D160 # MUSICAL SYMBOL EIGHTH NOTE
|
||||
1D161 # MUSICAL SYMBOL SIXTEENTH NOTE
|
||||
1D162 # MUSICAL SYMBOL THIRTY-SECOND NOTE
|
||||
1D163 # MUSICAL SYMBOL SIXTY-FOURTH NOTE
|
||||
1D164 # MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
|
||||
1D1BB # MUSICAL SYMBOL MINIMA
|
||||
1D1BC # MUSICAL SYMBOL MINIMA BLACK
|
||||
1D1BD # MUSICAL SYMBOL SEMIMINIMA WHITE
|
||||
1D1BE # MUSICAL SYMBOL SEMIMINIMA BLACK
|
||||
1D1BF # MUSICAL SYMBOL FUSA WHITE
|
||||
1D1C0 # MUSICAL SYMBOL FUSA BLACK
|
||||
|
||||
# Total code points: 14
|
||||
|
||||
# ================================================
|
||||
# (3) Singleton Decompositions
|
||||
#
|
||||
# These characters can be derived from the UnicodeData.txt file
|
||||
# by including all canonically decomposable characters whose
|
||||
# canonical decomposition consists of a single character.
|
||||
#
|
||||
# These characters are simply quoted here for reference.
|
||||
# See also Full_Composition_Exclusion in DerivedNormalizationProps.txt
|
||||
# ================================================
|
||||
|
||||
# 0340..0341 [2] COMBINING GRAVE TONE MARK..COMBINING ACUTE TONE MARK
|
||||
# 0343 COMBINING GREEK KORONIS
|
||||
# 0374 GREEK NUMERAL SIGN
|
||||
# 037E GREEK QUESTION MARK
|
||||
# 0387 GREEK ANO TELEIA
|
||||
# 1F71 GREEK SMALL LETTER ALPHA WITH OXIA
|
||||
# 1F73 GREEK SMALL LETTER EPSILON WITH OXIA
|
||||
# 1F75 GREEK SMALL LETTER ETA WITH OXIA
|
||||
# 1F77 GREEK SMALL LETTER IOTA WITH OXIA
|
||||
# 1F79 GREEK SMALL LETTER OMICRON WITH OXIA
|
||||
# 1F7B GREEK SMALL LETTER UPSILON WITH OXIA
|
||||
# 1F7D GREEK SMALL LETTER OMEGA WITH OXIA
|
||||
# 1FBB GREEK CAPITAL LETTER ALPHA WITH OXIA
|
||||
# 1FBE GREEK PROSGEGRAMMENI
|
||||
# 1FC9 GREEK CAPITAL LETTER EPSILON WITH OXIA
|
||||
# 1FCB GREEK CAPITAL LETTER ETA WITH OXIA
|
||||
# 1FD3 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
|
||||
# 1FDB GREEK CAPITAL LETTER IOTA WITH OXIA
|
||||
# 1FE3 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
|
||||
# 1FEB GREEK CAPITAL LETTER UPSILON WITH OXIA
|
||||
# 1FEE..1FEF [2] GREEK DIALYTIKA AND OXIA..GREEK VARIA
|
||||
# 1FF9 GREEK CAPITAL LETTER OMICRON WITH OXIA
|
||||
# 1FFB GREEK CAPITAL LETTER OMEGA WITH OXIA
|
||||
# 1FFD GREEK OXIA
|
||||
# 2000..2001 [2] EN QUAD..EM QUAD
|
||||
# 2126 OHM SIGN
|
||||
# 212A..212B [2] KELVIN SIGN..ANGSTROM SIGN
|
||||
# 2329 LEFT-POINTING ANGLE BRACKET
|
||||
# 232A RIGHT-POINTING ANGLE BRACKET
|
||||
# F900..FA0D [270] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA0D
|
||||
# FA10 CJK COMPATIBILITY IDEOGRAPH-FA10
|
||||
# FA12 CJK COMPATIBILITY IDEOGRAPH-FA12
|
||||
# FA15..FA1E [10] CJK COMPATIBILITY IDEOGRAPH-FA15..CJK COMPATIBILITY IDEOGRAPH-FA1E
|
||||
# FA20 CJK COMPATIBILITY IDEOGRAPH-FA20
|
||||
# FA22 CJK COMPATIBILITY IDEOGRAPH-FA22
|
||||
# FA25..FA26 [2] CJK COMPATIBILITY IDEOGRAPH-FA25..CJK COMPATIBILITY IDEOGRAPH-FA26
|
||||
# FA2A..FA6D [68] CJK COMPATIBILITY IDEOGRAPH-FA2A..CJK COMPATIBILITY IDEOGRAPH-FA6D
|
||||
# FA70..FAD9 [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9
|
||||
# 2F800..2FA1D [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
|
||||
|
||||
# Total code points: 1035
|
||||
|
||||
# ================================================
|
||||
# (4) Non-Starter Decompositions
|
||||
#
|
||||
# These characters can be derived from the UnicodeData.txt file
|
||||
# by including each expanding canonical decomposition
|
||||
# (i.e., those which canonically decompose to a sequence
|
||||
# of characters instead of a single character), such that:
|
||||
#
|
||||
# A. The character is not a Starter.
|
||||
#
|
||||
# OR (inclusive)
|
||||
#
|
||||
# B. The character's canonical decomposition begins
|
||||
# with a character that is not a Starter.
|
||||
#
|
||||
# Note that a "Starter" is any character with a zero combining class.
|
||||
#
|
||||
# These characters are simply quoted here for reference.
|
||||
# See also Full_Composition_Exclusion in DerivedNormalizationProps.txt
|
||||
# ================================================
|
||||
|
||||
# 0344 COMBINING GREEK DIALYTIKA TONOS
|
||||
# 0F73 TIBETAN VOWEL SIGN II
|
||||
# 0F75 TIBETAN VOWEL SIGN UU
|
||||
# 0F81 TIBETAN VOWEL SIGN REVERSED II
|
||||
|
||||
# Total code points: 4
|
||||
|
||||
# EOF
|
||||
Reference in a new issue