Strings: case sensitive custom implementations

From Unify Community Wiki

Author: Serge Billault

The following tips are not meant to be absolute rules but rather guidelines for Unity users who want to produce well-performing code.


Sometimes the need arises to implement custom case conversions for custom string classes. It is not for us to judge whether the initiative is well founded, but we can draw
attention to some common pitfalls encountered by those who attempt it for the first time.

In the 80s and 90s, when we were cracking video games, we would often notice recurring data structures in the static data sections of the executables that resembled
a 256-character table holding a variation of the alphabet. These are conversion tables. A conversion table works whenever each value to convert maps directly to exactly
one converted value: the conversion then reduces to indexing the table with the input. That's how we were so fast on systems running at 8 MHz.
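Here is a minimal C sketch of the principle, assuming single-byte characters. The mapping is evaluated once per possible byte value to fill a 256-entry table. After that, every conversion is a single indexed load with no branching. The names `xlat_table`, `xlat_build` and `xlat_apply`, and the example mapping `printable_or_dot` (a hex-dump style filter), are hypothetical and chosen only for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* A 256-entry table: one output byte for every possible input byte. */
typedef unsigned char xlat_table[256];

/* Precompute the table once from any per-byte mapping function.
   The mapping runs 256 times here and never again afterwards. */
void xlat_build(xlat_table table, int (*map)(int))
{
    for (int i = 0; i < 256; ++i)
        table[i] = (unsigned char)map(i);
}

/* Convert a buffer in place: each byte costs one indexed load. */
void xlat_apply(const xlat_table table, unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        buf[i] = table[buf[i]];
}

/* Hypothetical example mapping: non-printable bytes become '.',
   as in a hex dump viewer (uses the default "C" locale). */
int printable_or_dot(int c)
{
    return isprint(c) ? c : '.';
}
```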

In the particular case of string case conversions, the same principle can be applied to printable letters through a conversion table. Similar tables also exist for
converting integral values to hexadecimal digits.
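For the hexadecimal case, a 16-entry table indexed by a 4-bit nibble is enough. This C sketch assumes a 32-bit unsigned input and upper-case digits. `HEX_DIGITS` and `u32_to_hex` are hypothetical names.

```c
#include <stdint.h>

/* Lookup table: nibble value 0..15 -> ASCII hex digit. */
static const char HEX_DIGITS[] = "0123456789ABCDEF";

/* Writes the 8 hex digits of value, most significant first, then a NUL.
   out must have room for 9 chars. */
void u32_to_hex(uint32_t value, char out[9])
{
    for (int i = 7; i >= 0; --i) {
        out[i] = HEX_DIGITS[value & 0xFu]; /* one indexed load, no "< 10" test */
        value >>= 4;
    }
    out[8] = '\0';
}
```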


Because such a table lives in a static data section of the executable, that section had to be laid out carefully with compiler directives to avoid cache misses
on early processors (a poorly placed table forces the processor to evict cached data and fetch the table from main memory). John Carmack was an early proponent of telling
the compiler how to structure the sections of the executable so that there would be as few cache misses as possible. Today's compilers are so advanced that the need for
such optimizations has almost disappeared, and in any case most developers today no longer even think about it, given current processor speeds.
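Modern C exposes one piece of this layout control portably. C11 `alignas` can align a static table on a cache-line boundary, so a 256-byte table touches exactly four 64-byte lines instead of up to five. This is only a sketch of the general idea, not the directives mentioned above. The 64-byte line size (`CACHE_LINE`), the table `case_table` and the helper `lines_spanned` are all assumptions made for illustration.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, typical of current x86 and ARM cores. */
#define CACHE_LINE 64

/* Hypothetical 256-byte conversion table, aligned on a cache-line boundary
   so that it occupies exactly 256 / CACHE_LINE = 4 lines. */
static alignas(CACHE_LINE) unsigned char case_table[256];

/* Number of cache lines touched by the byte range [addr, addr + size), size > 0. */
size_t lines_spanned(const void *addr, size_t size)
{
    uintptr_t first = (uintptr_t)addr / CACHE_LINE;
    uintptr_t last  = ((uintptr_t)addr + size - 1) / CACHE_LINE;
    return (size_t)(last - first + 1);
}
```

Shifting the same table by a single byte makes it straddle a fifth line, which is the kind of waste careful section layout used to avoid.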


Table commonly used for case conversions and ignore-case comparisons:

1) For ignore-case comparisons, we simply compare TO_CASE[ left_char ] to TO_CASE[ right_char ], with TO_CASE being either TO_UPPER or TO_LOWER.
2) For string case conversion, each char is replaced by either TO_UPPER[ char ] or TO_LOWER[ char ].

[Figure: case conversion table]
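The two rules above can be sketched in C as follows. The sketch assumes ASCII single-byte strings, so only 'a' to 'z' and 'A' to 'Z' change case and every other byte maps to itself. The tables reuse the TO_UPPER and TO_LOWER names from the text, with TO_LOWER playing the role of TO_CASE in the comparison. `case_tables_init`, `compare_ignore_case` and `to_upper_in_place` are hypothetical helpers.

```c
#include <stddef.h>

/* TO_UPPER / TO_LOWER as named in the text (ASCII-only in this sketch). */
static unsigned char TO_UPPER[256];
static unsigned char TO_LOWER[256];

/* Fill both tables once, e.g. at program start-up. */
void case_tables_init(void)
{
    for (int c = 0; c < 256; ++c) {
        TO_UPPER[c] = (unsigned char)c;   /* identity by default */
        TO_LOWER[c] = (unsigned char)c;
    }
    for (int c = 'a'; c <= 'z'; ++c) {   /* assumes contiguous ASCII letters */
        TO_UPPER[c] = (unsigned char)(c - 'a' + 'A');
        TO_LOWER[c - 'a' + 'A'] = (unsigned char)c;
    }
}

/* Rule 1: ignore-case comparison, TO_LOWER[left] vs TO_LOWER[right].
   Returns <0, 0 or >0, like strcmp. */
int compare_ignore_case(const char *left, const char *right)
{
    const unsigned char *l = (const unsigned char *)left;
    const unsigned char *r = (const unsigned char *)right;
    while (*l && TO_LOWER[*l] == TO_LOWER[*r]) {
        ++l;
        ++r;
    }
    return (int)TO_LOWER[*l] - (int)TO_LOWER[*r];
}

/* Rule 2: case conversion, every char becomes TO_UPPER[char]. */
void to_upper_in_place(char *s)
{
    for (unsigned char *p = (unsigned char *)s; *p; ++p)
        *p = TO_UPPER[*p];
}
```

Indexing through `unsigned char` is deliberate. On platforms where plain `char` is signed, bytes above 127 would otherwise produce negative indices into the tables.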
