Parsing Unicode property data files
When developing the word boundary recogniser for the Patterns framework, I needed to access data in Unicode property data files from Swift. These files look something like this:
# Total code points: 88
# ================================================
0780..07A5 ; Thaana # Lo [38] THAANA LETTER HAA..THAANA LETTER WAAVU
07B1 ; Thaana # Lo THAANA LETTER NAA
where the hexadecimal numbers or ranges at the beginning of each line and the property name (“Thaana”) are the interesting parts. So we need to find every hexadecimal number at the beginning of a line – optionally followed by “..” and another hexadecimal number – followed by one or more spaces, a semicolon, a single space, the property name and another space.
For all the Unicode property data files you could possibly want, see here and here.
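The search described above (a hexadecimal number at the start of a line, an optional “..” range, then spaces, a semicolon, a space, the property name and a space) can be sketched in Swift with Foundation's NSRegularExpression. This is a minimal illustration, not the code from the Patterns framework: PropertyEntry and parsePropertyFile are hypothetical names. The pattern assumes the property name contains no whitespace and that each data line carries a trailing # comment, so the name is always followed by a space. The sample text pads the ranges with several spaces before the semicolon, as the published files do.

```swift
import Foundation

// Hypothetical type for illustration: one data line's code point range and property value.
struct PropertyEntry: Equatable {
    let range: ClosedRange<UInt32>
    let property: String
}

/// Hypothetical helper: returns every range/property pair found in a property file's text.
/// Comment lines start with "#", so they never match the leading hex digits.
func parsePropertyFile(_ text: String) -> [PropertyEntry] {
    // Line start, a hex number, optionally ".." plus a second hex number, then one or
    // more spaces, a semicolon, a single space, the property name and another space.
    // Assumes the property name contains no whitespace (true for script names like "Thaana").
    let pattern = #"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))? +; (\S+) "#
    // The pattern is a fixed literal, so a failure here would be a programming error.
    // .anchorsMatchLines makes ^ match at the start of every line, not just the whole text.
    let regex = try! NSRegularExpression(pattern: pattern, options: [.anchorsMatchLines])
    let wholeText = NSRange(text.startIndex..., in: text)

    return regex.matches(in: text, options: [], range: wholeText).compactMap {
        (match: NSTextCheckingResult) -> PropertyEntry? in
        // Convert an NSRange capture back to a Substring; nil if the group did not take part.
        func group(_ index: Int) -> Substring? {
            Range(match.range(at: index), in: text).map { text[$0] }
        }
        guard let lowerHex = group(1), let property = group(3),
              let lower = UInt32(lowerHex, radix: 16) else { return nil }
        // A single code point such as "07B1" becomes a one-element range.
        let upper = group(2).flatMap { UInt32($0, radix: 16) } ?? lower
        guard lower <= upper else { return nil }
        return PropertyEntry(range: lower...upper, property: String(property))
    }
}

// Usage with the excerpt from Scripts.txt shown above.
let sample = """
# Total code points: 88
# ================================================
0780..07A5    ; Thaana # Lo  [38] THAANA LETTER HAA..THAANA LETTER WAAVU
07B1          ; Thaana # Lo       THAANA LETTER NAA
"""

for entry in parsePropertyFile(sample) {
    print(entry.range, entry.property)
}
```

Returning single code points as one-element ranges keeps the result uniform, so whatever lookup structure is built on top only has to deal with ranges.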