A couple of weeks ago, Jean-Francois Roy twote:
Format strings should be localizable. But that opens you to format string exploits. No win?
— Jean-Francois Roy (@jfroy) 6 januari 2013
Actually, there’s a conceptually simple win: don’t let your format strings determine the byte-level interpretation of data. Especially without bounds.
I mean, duh. Who’d do something so obviously stupid? Well, unfortunately, the C standard library does, and its example has been followed all too many times. I contend that printf()-style formatting is broken and its use should be considered a bug.
In order for this idea to have any chance of taking off, there needs to be a replacement which is at least as convenient as printf()-style formatting. That gives us three goals: safety, convenience, and easy internationalization. To get the ball rolling, I have prototyped such a thing for Objective-C, called JATemplate. It looks like this:
    NSUInteger quantity = 1057;
    NSString *item = @"apples";
    NSString *message = JATExpand(@"Number of {item}: {quantity}.", item, @(quantity));
Given an English locale, this expands to Number of apples: 1,057. In a Swedish locale, given an appropriate Localizable.strings file, it might produce Antal apples: 1 057. Oh, well. Implicit i18n can’t do everything for you, so let’s try this:
    NSUInteger quantity = 1057;
    NSString *item = JATExpand(@"apples");
    NSString *message = JATExpand(@"Number of {item}: {quantity}.", item, @(quantity));
It works, but always using the plural leads to unnecessarily stilted language. Let’s fix that:
    NSUInteger quantity = 1057;
    NSString *item = JATExpand(@"apple{quantity|plural:s}", @(quantity));
    // If we were dealing with birds, we might write @"{quantity|plural:goose;geese}".
    NSString *message = JATExpand(@"We have {quantity} {item}.", item, @(quantity));

    // sv.lproj/Localizable.strings
    "We have {quantity} {item}." = "Vi har {quantity} {item}.";
    "apple{quantity|plural:s}" = "äpple{quantity|plural:n}";

Vi har 1 057 äpplen.
For reference, here’s an equivalent using Foundation string formatting:
    NSUInteger quantity = 1057;
    NSString *item;
    if (quantity == 1) {
        item = NSLocalizedString(@"apple", @"Singular name for item \"apple\".");
    } else {
        item = NSLocalizedString(@"apples", @"Plural name for item \"apple\".");
    }
    NSString *formattedNumber = [NSNumberFormatter localizedStringFromNumber:@(quantity) numberStyle:NSNumberFormatterDecimalStyle];
    NSString *format = NSLocalizedString(@"We have %1$@ %2$@.", @"We have <quantity> <item>.");
    NSString *message = [NSString stringWithFormat:format, formattedNumber, item];
Well, near equivalent. For many languages, English-like pluralization is insufficient. JATemplate’s powerful plur: operator (modelled on Mozilla’s plural-handling system) supports many more languages with no extra code.
What we’ve seen so far
- Templates can refer to variables by name, but only variables specified at the call site. (There is also a positional syntax – {0}, {1} etc. – for referring to expressions.) Variable names inside boxing expressions, such as @(quantity), can also be used. When using names, the order in which variables are specified doesn’t matter (third example).
- By default, templates are localized through Localizable.strings. (There are variants that take a string file name and optionally a bundle, as with NSLocalizedString(), and JATExpandLiteral(), which doesn’t attempt to localize the template.)
- Variable names or indices can be followed by a pipe and a formatting operator, which may take arguments. (Formatting operators can be chained: {number|round|num:spellout|uppercase} turns 41.5 into FORTY-TWO.)
- By default, numbers are formatted in a locale-sensitive way using NSNumberFormatterDecimalStyle. (There are operators for various other formatting options, including num:noloc for “computer-style” numbers, and num:memorybytes and num:filebytes wrapping NSByteCountFormatter. Arbitrary NSNumberFormatter format strings can be used as well.)
Here is a more complex example, from a unit test:
    unsigned myKittenCount = 1;
    unsigned myGooseCount = 7;
    unsigned yourKittenCount = 3;
    unsigned yourGooseCount = 1;
    NSString *expansion = JATExpand(@"I have {myKittenCount|num:spellout} kitten{myKittenCount|plural:s}. You have {yourKittenCount|num:spellout} kitten{yourKittenCount|plural:s}. I have {myGooseCount|num:spellout} {myGooseCount|plural:goose;geese}. You have {yourGooseCount|num:spellout} {yourGooseCount|plural:goose;geese}.",
                                    @(myKittenCount), @(myGooseCount), @(yourKittenCount), @(yourGooseCount));
It expands to I have one kitten. You have three kittens. I have seven geese. You have one goose.
What we’ve not seen
There are several operators I haven’t mentioned. They’re listed in the header. There are also some glaring holes.
All template parameters must be objects, and conform to the JATCoercable protocol (which is implemented for all NSObjects). This is made palatable by explicit support for the newish boxing syntax for numbers and C strings.
There are a few convenience wrappers: JATLog() calls JATExpandLiteral() and then NSLog(), JATAssert() and JATCAssert() wrap NSAssert() and NSCAssert(), and JATAppend() appends an expanded template to a mutable string. There’s also an “unwrapper”, JATExpandWithParameters(), which takes a dictionary of parameter values instead of a direct list. These have FromTable and FromTableInBundle variants where appropriate.
Custom formatting operators are easily implemented using selectors matching - (id)jatemplatePerform_{operator}_withArgument:(NSString *)argument variables:(NSDictionary *)variables. The built-in operators are implemented in a category on NSObject and use JATemplate-specific coercion methods to convert values to strings, numbers or booleans as necessary. For example, the uppercase operator looks like this:
    - (id)jatemplatePerform_uppercase_withArgument:(NSString *)argument variables:(NSDictionary *)variables
    {
        NSString *value = [self jatemplateCoerceToString];
        if (value == nil) return nil;

        return [value uppercaseStringWithLocale:[NSLocale currentLocale]];
    }
Literal braces can be escaped by doubling them:

    JATLog(@"int main() {{ printf(\"Hello, {planet}\"); }}", planet);
The guts
So, is this a horrible hack? Well, by the standards of typical day-to-day code, probably. But in the correct frame of reference, which is “compared to printf()-style formatting”, no. Our very first example expands to:
    X(@"Number of {item}: {quantity}.", nil, nil, @"item, @(quantity)", (__strong id<JATCoercable> []){ item, @(quantity) }, 2);
    // Where X is really JAT_DoLocalizeAndExpandTemplateUsingMacroKeysAndValues, which is quite long.
Compared to “Oh, and there’s an unspecified amount of junk on the stack starting over here”, this is pretty clean.
The first argument is the template. The two nils are for different localization variants. The fourth argument is the “name string”, which contains the preprocessed parameter list; this is used to resolve named expansions to variable names or boxed variable names. The fifth argument is an array of object pointers initialized with the parameter values. The final argument is the number of parameters.
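To make that packaging concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not JATemplate’s actual macro: DEMO_EXPAND, demo_capture and demo_call are hypothetical names, the parameters are C strings rather than objects, and the count is derived with sizeof instead of the argument-counting macros described in this post.

```c
#include <stddef.h>

/* Hypothetical record of what the macro hands over, so that the
   packaging can be inspected. Not part of JATemplate. */
typedef struct {
    const char *template_text;    /* the template, unmodified */
    const char *name_string;      /* the stringified parameter list */
    const char *const *values;    /* parameter values, in call order */
    size_t count;                 /* number of parameters */
} demo_call;

static demo_call demo_capture(const char *template_text, const char *name_string,
                              const char *const *values, size_t count)
{
    demo_call call;
    call.template_text = template_text;
    call.name_string = name_string;
    call.values = values;
    call.count = count;
    return call;
}

/* The parameter list is used three times: stringified with # to give the
   name string, as the initializer of a compound-literal array, and inside
   sizeof to count the elements. At least one parameter is required. */
#define DEMO_EXPAND(template_text, ...) \
    demo_capture((template_text), #__VA_ARGS__, \
                 (const char *const[]){ __VA_ARGS__ }, \
                 sizeof((const char *const[]){ __VA_ARGS__ }) / sizeof(const char *))
```

With `const char *item` and `const char *quantity` in scope, `DEMO_EXPAND("Number of {item}: {quantity}.", item, quantity)` yields the name string `"item, quantity"`, a two-element array and a count of 2. The array lives as long as the enclosing block, so the result should be consumed there.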
Given this, the template expansion itself is simple enough. The template is localized using NSBundle. A parameter dictionary is built, with NSString keys for the plain and boxed identifiers found in the name string (other expressions are skipped) and NSNumber keys for each parameter’s position. In the example, the dictionary is equivalent to @{ @"item": item, @"quantity": @(quantity), @0: item, @1: @(quantity) }.
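The name-string step can be sketched as follows, again in plain C with hypothetical names (demo_parse_names, demo_is_identifier, DEMO_NAME_MAX). This is an illustration of the idea, not the prototype’s parser: it splits at top-level commas, unwraps @(name), and leaves an empty name for any other expression so that it can only be reached by position. It assumes the expressions contain no string literals with commas or brackets in them.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAME_MAX 32

/* Returns 1 if text[0..len) is a plain C identifier that fits in a name slot. */
static int demo_is_identifier(const char *text, size_t len)
{
    if (len == 0 || len >= DEMO_NAME_MAX) return 0;
    if (!(isalpha((unsigned char)text[0]) || text[0] == '_')) return 0;
    for (size_t i = 1; i < len; i++) {
        if (!(isalnum((unsigned char)text[i]) || text[i] == '_')) return 0;
    }
    return 1;
}

/* Splits a name string such as "item, @(quantity)" at top-level commas.
   names[i] receives the identifier for parameter i, or "" when the
   parameter is some other expression. Returns the number of parameters
   stored, which is at most max_names. */
static size_t demo_parse_names(const char *name_string,
                               char names[][DEMO_NAME_MAX], size_t max_names)
{
    size_t count = 0;
    const char *start = name_string;
    int depth = 0;

    for (const char *p = name_string; ; p++) {
        if (*p == '(' || *p == '[' || *p == '{') depth++;
        else if (*p == ')' || *p == ']' || *p == '}') depth--;

        if ((*p == ',' && depth == 0) || *p == '\0') {
            const char *begin = start;
            const char *end = p;

            /* Trim surrounding whitespace. */
            while (begin < end && isspace((unsigned char)*begin)) begin++;
            while (end > begin && isspace((unsigned char)end[-1])) end--;

            /* An entirely empty name string means no parameters. */
            if (begin == end && count == 0 && *p == '\0') return 0;

            /* Unwrap a boxing expression: @(name) becomes name. */
            if (end - begin >= 3 && begin[0] == '@' && begin[1] == '(' && end[-1] == ')') {
                begin += 2;
                end -= 1;
                while (begin < end && isspace((unsigned char)*begin)) begin++;
                while (end > begin && isspace((unsigned char)end[-1])) end--;
            }

            if (count < max_names) {
                size_t len = (size_t)(end - begin);
                if (demo_is_identifier(begin, len)) {
                    memcpy(names[count], begin, len);
                    names[count][len] = '\0';
                } else {
                    names[count][0] = '\0';   /* positional only */
                }
                count++;
            }

            if (*p == '\0') break;
            start = p + 1;
        }
    }
    return count;
}
```

For the name string `"item, @(quantity)"` this stores the names `item` and `quantity`, which together with their positions 0 and 1 give the same four keys as the dictionary above.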
The template and parameter dictionary are then handed off to JATExpandLiteralWithParameters, which implements a rather boring single-pass parser.
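For a feel of what such a single-pass expansion involves, here is a stripped-down sketch in plain C. It is an assumption-laden illustration, not the real parser: demo_expand, demo_lookup and demo_append are hypothetical names, values are C strings, formatting operators (the pipe syntax) are not supported, and treating a stray brace as an error is a choice made for this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Appends len bytes to out, always leaving room for the terminator. */
static int demo_append(char *out, size_t out_size, size_t *used,
                       const char *text, size_t len)
{
    if (len >= out_size - *used) return -1;   /* would not fit */
    memcpy(out + *used, text, len);
    *used += len;
    out[*used] = '\0';
    return 0;
}

/* Resolves a key to a value: all-digit keys are positions, bounds-checked
   against count; anything else must match one of the names exactly. */
static const char *demo_lookup(const char *key, size_t key_len,
                               const char *const names[],
                               const char *const values[], size_t count)
{
    size_t index = 0;
    int numeric = 1;

    if (key_len == 0) return NULL;
    for (size_t i = 0; i < key_len; i++) {
        if (!isdigit((unsigned char)key[i])) { numeric = 0; break; }
        index = index * 10 + (size_t)(key[i] - '0');
        if (index >= count) return NULL;      /* out-of-bounds position */
    }
    if (numeric) return values[index];

    for (size_t i = 0; i < count; i++) {
        if (names[i] != NULL && strlen(names[i]) == key_len &&
            memcmp(names[i], key, key_len) == 0) {
            return values[i];
        }
    }
    return NULL;                              /* unknown name */
}

/* Expands {name}, {0} and doubled braces in one left-to-right pass.
   Returns 0 on success, -1 if the template is invalid or out is too small. */
static int demo_expand(const char *tmpl, const char *const names[],
                       const char *const values[], size_t count,
                       char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0) return -1;
    out[0] = '\0';

    for (const char *p = tmpl; *p != '\0'; p++) {
        if ((*p == '{' || *p == '}') && p[1] == *p) {
            /* Doubled brace: emit a single literal brace. */
            if (demo_append(out, out_size, &used, p, 1) != 0) return -1;
            p++;
        } else if (*p == '{') {
            const char *close = strchr(p + 1, '}');
            const char *value;
            if (close == NULL) return -1;     /* unterminated expansion */
            value = demo_lookup(p + 1, (size_t)(close - p - 1), names, values, count);
            if (value == NULL) return -1;
            if (demo_append(out, out_size, &used, value, strlen(value)) != 0) return -1;
            p = close;
        } else if (*p == '}') {
            return -1;                        /* stray closing brace */
        } else {
            if (demo_append(out, out_size, &used, p, 1) != 0) return -1;
        }
    }
    return 0;
}
```

The point to notice is that the template only ever selects among values the caller supplied: a bad name or index makes the expansion fail, and nothing in the template can change how a value’s bytes are interpreted.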
Safety
Because the parameter list is passed as an array, it is type-safe, or at least as type-safe as object pointers in general. In ARC code, that’s pretty strong; you get a non-optional error if you pass anything other than an object pointer or a literal zero.
Because the preprocessor counts the parameters for us, the template can’t refer to an out-of-bounds positional argument. Variable names are parsed from the name string; if a template refers to an invalid name, the expansion simply fails. There is no equivalent of printf()’s dangerous %n operator – although, to be fair, Foundation doesn’t implement it either.
The prototype is not battle-hardened, and the ad-hoc parsers probably have exploitable bugs, but I feel that the goal of avoiding the type of architectural vulnerabilities printf()-style formatting exemplifies has been met.
Future directions
- Did I mention it’s a prototype? The core parser is a mess. If anyone’s actually interested in this it needs to be rewritten, and tested more adversarially.
- The set of built-in formatting operators is somewhat arbitrary. There should be a date: operator, for both locale-aware human-readable and ISO date formatting. Some thought should go into getting it right. There should be something for column formatting (mostly for logging). It should be possible to use operators to do most things you can do with printf() directives. At the same time, I don’t want to add much in the way of logic capabilities to the system; this is a replacement for +[NSString stringWithFormat:], not MGTemplateEngine.
- Although I’ve made a point of keeping the preprocessor use in JATemplate relatively simple – the scariest part is the macro soup that counts parameters, which is really just a sanity check – it could benefit significantly from nastier preprocessor stuff in the style of RXFold. Firstly, the preprocessor could give us tokenized names instead of a single string, which would move work from JATemplate at runtime to Clang at compile time. Secondly, we could use __attribute__((overloadable)) functions to auto-box values, including structs on a per-type basis. I may start a separate branch for this “nasty mode”.
- The approach using transparent boxing with overloadable functions could also be used in clang-C and “old” C++. For C++11, one would hope that variadic templates would do the job, but I haven’t played with those yet.
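As a rough illustration of the transparent-boxing idea in C, the sketch below dispatches on the argument’s static type. Note the substitution: it uses C11 _Generic as a portable stand-in rather than Clang’s __attribute__((overloadable)), and every name in it (DEMO_BOX, demo_value, demo_format and the demo_box_* functions) is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

typedef enum { DEMO_INT, DEMO_DOUBLE, DEMO_STRING } demo_kind;

/* A boxed value: a type tag plus the payload. */
typedef struct {
    demo_kind kind;
    union { long long i; double d; const char *s; } as;
} demo_value;

static demo_value demo_box_int(long long v)
{
    demo_value b;
    b.kind = DEMO_INT;
    b.as.i = v;
    return b;
}

static demo_value demo_box_double(double v)
{
    demo_value b;
    b.kind = DEMO_DOUBLE;
    b.as.d = v;
    return b;
}

static demo_value demo_box_string(const char *v)
{
    demo_value b;
    b.kind = DEMO_STRING;
    b.as.s = v;
    return b;
}

/* Picks the boxing function from the argument's static type. A type that
   is not listed is a compile-time error, not a runtime surprise. */
#define DEMO_BOX(x) _Generic((x), \
    int: demo_box_int, \
    unsigned: demo_box_int, \
    long: demo_box_int, \
    long long: demo_box_int, \
    float: demo_box_double, \
    double: demo_box_double, \
    char *: demo_box_string, \
    const char *: demo_box_string)(x)

/* Formats a boxed value according to its tag; the caller never supplies
   a format directive. Returns what snprintf returns. */
static int demo_format(demo_value v, char *out, size_t out_size)
{
    switch (v.kind) {
    case DEMO_INT:    return snprintf(out, out_size, "%lld", v.as.i);
    case DEMO_DOUBLE: return snprintf(out, out_size, "%g", v.as.d);
    case DEMO_STRING: return snprintf(out, out_size, "%s", v.as.s);
    }
    return -1;
}
```

The design point is the same as for the object-based version: the value carries its own type, so the template has no say in how the bytes are read.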
Code is on GitHub (MIT/X11 license).
Changes
2013-03-21: removed overly bold claims about plural: operator.
2013-03-22: added reference to new, shiny plur: operator.