Problem
How do I extract the list of supported Unicode characters from a TrueType or embedded OpenType font on Linux?
Is there a program or package I can use to analyze a .ttf or .eot file and generate a list of the code points provided by the font (such as U+0123, U+1234, and so on)?
Asked by Till Ulen
Solution #1
Here’s an example of how to use the FontTools module (which you can install with pip install fonttools):
#!/usr/bin/env python
from itertools import chain
import sys
from fontTools.ttLib import TTFont
from fontTools.unicode import Unicode
ttf = TTFont(sys.argv[1], 0, verbose=0, allowVID=0,
             ignoreDecompileErrors=True,
             fontNumber=-1)
chars = chain.from_iterable([y + (Unicode[y[0]],) for y in x.cmap.items()] for x in ttf["cmap"].tables)
print(list(chars))
# Use this for just checking if the font contains the codepoint given as
# second argument:
#char = int(sys.argv[2], 0)
#print(Unicode[char])
#print(char in (x[0] for x in chars))
ttf.close()
The font path is passed as an argument to the script:
python checkfont.py /path/to/font.ttf
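If only the code points are wanted, in the U+XXXX form the question asks for, a shorter variant is possible. The following is a minimal sketch, assuming a reasonably recent fontTools release that provides TTFont.getBestCmap(); the helper names font_codepoints and format_codepoints are made up for illustration and are not part of fontTools.

```python
import sys


def format_codepoints(codepoints):
    """Return sorted, de-duplicated code points as 'U+XXXX' strings."""
    return ["U+%04X" % cp for cp in sorted(set(codepoints))]


def font_codepoints(path):
    """Read the code points mapped by the font's preferred Unicode cmap subtable."""
    # Imported here so format_codepoints() stays usable without fontTools.
    from fontTools.ttLib import TTFont

    font = TTFont(path)
    try:
        best = font.getBestCmap()  # dict {code point: glyph name}, or None
        return list(best) if best else []
    finally:
        font.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # The font path is the first command-line argument.
    print(" ".join(format_codepoints(font_codepoints(sys.argv[1]))))
```

It is run the same way as the script above, with the font path as the only argument.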
Answered by Janus Troelsen
Solution #2
This can be done with xfd, an X utility. Run the following command to show all characters for the “DejaVu Sans Mono” font:
xfd -fa "DejaVu Sans Mono"
It is included in the x11-utils package on Debian/Ubuntu, the xorg-x11-apps package on Fedora/RHEL, and the xorg-xfd package on Arch Linux.
Answered by Spencer
Solution #3
fc-query my-font.ttf will return a map of supported glyphs as well as all the locales that fontconfig considers the font appropriate for.
This is considerably more useful than a raw Unicode list because almost all modern Linux apps are fontconfig-based.
http://lists.freedesktop.org/archives/fontconfig/2013-September/004915.html discusses the actual output format.
Answered by nim
Solution #4
The glyph list can be output as a compact list of ranges using the fontconfig commands, for example:
$ fc-match --format='%{charset}\n' OpenSans
20-7e a0-17f 192 1a0-1a1 1af-1b0 1f0 1fa-1ff 218-21b 237 2bc 2c6-2c7 2c9
2d8-2dd 2f3 300-301 303 309 30f 323 384-38a 38c 38e-3a1 3a3-3ce 3d1-3d2 3d6
400-486 488-513 1e00-1e01 1e3e-1e3f 1e80-1e85 1ea0-1ef9 1f4d 2000-200b
2013-2015 2017-201e 2020-2022 2026 2030 2032-2033 2039-203a 203c 2044 2070
2074-2079 207f 20a3-20a4 20a7 20ab-20ac 2105 2113 2116 2120 2122 2126 212e
215b-215e 2202 2206 220f 2211-2212 221a 221e 222b 2248 2260 2264-2265 25ca
fb00-fb04 feff fffc-fffd
For a .ttf file, use fc-query, and for an installed font name, use fc-match.
This is unlikely to require installing any additional packages, and it does not involve translating a bitmap.
To check whether the right font is being matched, run fc-match --format='%{file}\n'.
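The range list printed by fontconfig is easy to expand into individual code points. The following is a minimal sketch, assuming the charset output consists of whitespace-separated hexadecimal values or low-high ranges, as in the sample above; parse_charset and charset_of_installed_font are hypothetical helper names, and the subprocess call simply runs the fc-match command shown earlier.

```python
import subprocess
import sys


def parse_charset(text):
    """Expand a fontconfig charset string such as '20-7e a0 192' into ints."""
    codepoints = []
    for token in text.split():
        low, _, high = token.partition("-")
        start = int(low, 16)
        end = int(high, 16) if high else start  # single value: one-element range
        codepoints.extend(range(start, end + 1))
    return codepoints


def charset_of_installed_font(name):
    """Ask fc-match for the charset of the font that fontconfig matches to `name`."""
    out = subprocess.run(
        ["fc-match", "--format=%{charset}", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_charset(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The font name (for example OpenSans) is the first command-line argument.
    for cp in charset_of_installed_font(sys.argv[1]):
        print("U+%04X" % cp)
```

For a font file rather than an installed font name, the same parser could be applied to the output of fc-query with the same format string.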
Answered by Neil Mayhew
Solution #5
The character code points of a ttf/otf font are stored in its cmap table.
You can generate an XML representation of the cmap table with ttx.
Use the command ttx -t cmap MyFont.ttf (ttx.exe on Windows) to create a MyFont.ttx file. If you open that file in a text editor, it should show every character code found in the font.
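The generated MyFont.ttx file can also be read programmatically instead of by eye. The following is a minimal sketch, assuming the cmap dump contains map elements with a hexadecimal code attribute (the layout ttx normally writes); codepoints_from_ttx is a hypothetical helper name.

```python
import sys
import xml.etree.ElementTree as ET


def codepoints_from_ttx(xml_data):
    """Collect the distinct code values of all <map> elements in a ttx cmap dump.

    Codes from every cmap subtable are merged; values from non-Unicode
    subtables, if the font has any, are included as-is.
    """
    root = ET.fromstring(xml_data)
    codes = {int(m.get("code"), 16) for m in root.iter("map") if m.get("code")}
    return sorted(codes)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path to the .ttx file is the first command-line argument.
    with open(sys.argv[1], "rb") as fh:
        for cp in codepoints_from_ttx(fh.read()):
            print("U+%04X" % cp)
```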
Answered by wschang
Post is based on https://stackoverflow.com/questions/4458696/finding-out-what-characters-a-given-font-supports