Autogenerate the BCP 47 to OpenType mappings

System Internals / HarfBuzz - David Corbett [husky.neu.edu] - 11 October 2018 17:54 EDT

The new script, gen-tag-table.py, generates `ot_languages` automatically from the [OpenType language system tag registry][ot] and the [IANA Language Subtag Registry][bcp47] with some manual modifications. If an OpenType tag maps to a BCP 47 macrolanguage, all the macrolanguage's individual languages are mapped to the same OpenType tag, except for individual languages with their own OpenType mappings. Deprecated BCP 47 tags are canonicalized.

[ot]: https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags
[bcp47]: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
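The macrolanguage rule described above can be sketched roughly as follows. This is only an illustration, not the code in gen-tag-table.py: the function name `expand_macrolanguages` and both input dictionaries are hypothetical, and the sample data is a tiny hand-picked subset rather than the registry contents.

```python
def expand_macrolanguages(ot_to_bcp47, macrolanguages):
    """Give each macrolanguage's individual languages the macrolanguage's
    OpenType tag, unless the individual language has its own mapping.

    Both arguments are hypothetical structures used for illustration:
    ot_to_bcp47 maps an OpenType tag to a set of BCP 47 language subtags,
    macrolanguages maps a macrolanguage to its individual languages.
    """
    # Languages that already have an explicit OpenType mapping.
    explicit = {lang for langs in ot_to_bcp47.values() for lang in langs}
    expanded = {tag: set(langs) for tag, langs in ot_to_bcp47.items()}
    for tag, langs in ot_to_bcp47.items():
        for lang in langs:
            for member in macrolanguages.get(lang, ()):
                if member not in explicit:
                    expanded[tag].add(member)
    return expanded


# Illustrative sample data only.
sample_ot = {'ARA ': {'ar'}, 'MOR ': {'ary'}}
sample_macro = {'ar': {'arb', 'ary', 'arz'}}
result = expand_macrolanguages(sample_ot, sample_macro)
```

In this sample, `arb` and `arz` inherit 'ARA ' from the macrolanguage `ar`, while `ary` keeps only its own tag 'MOR '.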

Some OpenType tags correspond to multiple ISO 639 codes. The mapping from ISO 639 codes lists OpenType tags in priority order, such that more specific or more likely tags appear first.
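One way to picture the priority ordering is the sketch below. It assumes, purely for illustration, that a tag covering fewer ISO 639 codes counts as more specific; the heuristics actually used by gen-tag-table.py may differ. The function name, the placeholder tags 'AAA ' and 'BBB ', and the codes `xx` and `yy` are all hypothetical.

```python
from collections import defaultdict


def invert_with_priority(ot_to_iso):
    """Invert an OpenType-tag -> ISO 639 codes mapping, listing the tags
    for each code in priority order.

    Assumption for this sketch: a tag that covers fewer codes is treated
    as more specific and sorts first; ties are broken alphabetically.
    """
    iso_to_ot = defaultdict(list)
    for tag, codes in ot_to_iso.items():
        for code in codes:
            iso_to_ot[code].append(tag)
    for tags in iso_to_ot.values():
        tags.sort(key=lambda t: (len(ot_to_iso[t]), t))
    return dict(iso_to_ot)


# Placeholder tags and codes, not real registry entries.
priorities = invert_with_priority({'BBB ': ['xx', 'yy'], 'AAA ': ['xx']})
```

Here `xx` maps to 'AAA ' first because that placeholder tag covers only one code, and then to the broader 'BBB '.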

Some OpenType tags have no corresponding ISO 639 code in the registry, so their mappings use BCP 47 subtags other than the language subtag. For example, any BCP 47 tag with a fonipa variant subtag is mapped to 'IPPH', and 'IPPH' is mapped back to und-fonipa.
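The fonipa round trip can be sketched as a pair of small functions. The function names are hypothetical and the subtag parsing is deliberately simplified (for instance, it does not treat private-use subtags after `x` specially); it is not the generated HarfBuzz code.

```python
def tags_from_variants(bcp47_tag):
    """Return OpenType tags implied by non-language subtags.

    Simplified sketch: only the fonipa variant is handled, and every
    subtag after the first is searched without distinguishing scripts,
    regions, variants, or private-use subtags.
    """
    subtags = bcp47_tag.lower().split('-')
    if 'fonipa' in subtags[1:]:
        return ['IPPH']
    return []


def language_from_ot_tag(ot_tag):
    """Map an OpenType tag back to BCP 47, or None if this sketch has no rule."""
    if ot_tag == 'IPPH':
        return 'und-fonipa'
    return None
```

Note that the round trip is lossy: `en-fonipa` maps to 'IPPH', but 'IPPH' maps back to `und-fonipa`, since the OpenType tag carries no language information.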

Other OpenType tags have no corresponding ISO 639 code because it is not clear what they are for. HarfBuzz just ignores these tags.

One such ignored tag is 'ZHP ' (Chinese Phonetic). It probably means zh-Latn. However, it is used in Microsoft JhengHei and Microsoft YaHei with the script tag 'hani', implying that it is not a romanization scheme after all. It would be simple enough to add this mapping to gen-tag-table.py once a definitive mapping is determined.

The manual modifications are mainly either obvious mappings that the OpenType registry omits or mappings for compatibility with previous versions of HarfBuzz. Some of the old mappings were discarded, though, where they existed only because two unrelated languages share a name. For example, OpenType maps 'KUI ' to kxu; previous versions of HarfBuzz also mapped it to kvd, because kvd and kxu both happen to be called "Kui".

gen-tag-table.py also generates a function to convert multi-subtag tags like el-polyton and zh-HK to OpenType tags, replacing `ot_languages_zh` and the hard-coded list of special cases in `hb_ot_tags_from_language`. It also generates a function to convert OpenType tags to BCP 47, replacing the hard-coded list of special cases in `hb_ot_tag_to_language`.
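A two-stage lookup of this kind might look like the sketch below: multi-subtag rules are tried first, then a plain per-language table. The table names, the function name, and the matching logic are assumptions made for illustration, and the few entries shown are examples rather than the generated tables.

```python
# Illustrative entries only; the generated tables are far larger.
# Each complex rule pairs a language subtag with one further required subtag.
COMPLEX_RULES = [
    (('el', 'polyton'), 'PGR '),
    (('zh', 'hk'), 'ZHH '),
]
# Simple table keyed by the primary language subtag.
SIMPLE_TABLE = {'el': ['ELL '], 'zh': ['ZHS ']}


def tags_from_language(bcp47_tag):
    """Return OpenType tags for a BCP 47 tag (hypothetical sketch).

    Multi-subtag rules take precedence over the per-language table.
    """
    subtags = bcp47_tag.lower().split('-')
    for (language, extra), ot_tag in COMPLEX_RULES:
        if subtags[0] == language and extra in subtags[1:]:
            return [ot_tag]
    return list(SIMPLE_TABLE.get(subtags[0], []))
```

Checking the multi-subtag rules first means a tag such as `zh-Hant-HK` resolves to the more specific 'ZHH ' rather than to whatever the plain `zh` entry lists.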

2f1f961c Autogenerate the BCP 47 to OpenType mappings
src/Makefile.am | 9 +-
src/Makefile.sources | 1 +
src/gen-tag-table.py | 1013 ++++++++++++++++++++++++
src/hb-ot-tag-table.hh | 1997 ++++++++++++++++++++++++++++++++++++++++++++++++
src/hb-ot-tag.cc | 842 +-------------------
src/hb-ot-tag.h | 2 +-
test/api/test-ot-tag.c | 66 +-
7 files changed, 3092 insertions(+), 838 deletions(-)

Upstream: cgit.freedesktop.org

