Proposal: generics (and some other stuff) for Objective-C

Some time ago, Greg Parker asked the Twitternets what we’d like to see in a purely hypothetical Objective-C-without-the-C language. Someone — I believe it was Landon Fuller — pointed at an article about the Strongtalk type system for Smalltalk. I quite like the idea of Objective-C-without-the-C (i.e., a language that is native to the Objective-C object system and runtime without the baggage of C), but after reading that article I found myself asking why we couldn’t do something similar in Objective-C.

I don’t think my random musings have much influence on the design of the language, but if I don’t write it down nobody’s going to know how nuts I am, so here’s a semi-concrete proposal for contextual types and generics for Objective-C. Since anyone even mentioning generics in the vicinity of Objective-C will inevitably be flamed for trying to turn it into C++, this is followed by an aside entitled Why This is Not the Baby-eating Spawn of Bjarne Stroustrup. (Nothing personal, Bjarne.)

Types in Objective-C

A fundamental characteristic of Objective-C is that it has two separate type systems: the dynamic type system, which applies to objects, and the static type system, which applies to variables (which may or may not refer to objects).

As far as objects are concerned, the static type system is optional — you can refer to any object with the type id, except when calling a method whose type may be ambiguous. The static type system is also advisory — it suggests to programmers, the compiler and other tools such as the IDE and analyzer what the class of an object may be at runtime, but doesn’t constrain the object. A variable of type NSString * may actually refer to an NSArray at runtime, and method calls will dynamically go to NSArray’s implementation.
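To see the advisory part in action, here is a minimal sketch, assuming Foundation and ARC (the function name is made up for illustration). It smuggles an NSArray into an NSString * variable by way of id, then asks the object what it really is.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES if an object held in an NSString * variable
// is still an NSArray at runtime.
static BOOL JAStringVariableHoldsArray(void)
{
    id array = [NSArray arrayWithObject:@"not a string"];

    // Converting from id needs no cast and produces no diagnostic.
    NSString *s = array;

    // -description is valid for both static types; the message is dispatched on
    // the object's actual class, so the array's implementation runs and the
    // result lists the array's contents.
    NSString *text = [s description];

    return [s isKindOfClass:[NSArray class]]
        && ![s isKindOfClass:[NSString class]]
        && [text rangeOfString:@"not a string"].location != NSNotFound;
}
```

Calling JAStringVariableHoldsArray() should return YES: the declared type of s changed what the compiler believed, not what the object is.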

My proposal deals only with the static type system. The idea is to provide information that helps the compiler and analyzer check your logic, and the IDE to provide better suggestions. This is done by replacing id with more specific static types in most of the situations it’s used in. The generated code is not affected in any way. The proposal does not introduce bondage and discipline on the language; the new types can always be cast away.

Contextual Types

By far the most common use of id is as the return type for methods that may return an instance of “this” class, or of the subclass it’s called on. The obvious examples are +alloc and -init.

The IDE and, I believe, the analyzer already use heuristics to determine the return type of +alloc and -init, but I propose formalizing this in code. It would look something like this:

@interface NSObject <NSObject>
{
    Class   isa;
}

+ (void)load;

+ (void) initialize;
- ([:Self]) init;

+ ([:Self]) new;
+ ([:Self]) allocWithZone:(NSZone *)zone;
+ ([:Self]) alloc;

// ...

+ ([:Superclass]) superclass;
+ ([:Class]) class;
- ([:Superclass]) superclass;
- ([:Class]) class;

@end

When calling a class method, [:Class] resolves to the receiver (or, type-equivalently, a pointer to an instance of the receiver’s metaclass), and [:Self] resolves to a pointer to an instance of the receiver. [:Superclass] resolves to the superclass of the receiver. For instance methods, they resolve as they would for a class method sent to the instance’s class.

I’m sure some people will object to this design on grounds of conceptual purity, and possibly to the names and syntax. The colon is a bit odd; it’s there to avoid ambiguity with generics (see below). These are all minor quibbles, and the syntax would need to be reviewed before anyone actually implemented it.

So what’s the point? Consider the following code:

NSString *s = [[NSArray alloc] init];

As it stands, this is perfectly valid and doesn’t generate a compiler diagnostic. With the addition of contextual types it would, because:

  • The type of +[NSArray alloc] (inherited from NSObject) is [:Self], which resolves to NSArray *.
  • The receiver of the -init is thus known to be an NSArray.
  • The type of -[NSArray init] (inherited from NSObject) is [:Self], which again resolves to NSArray *.
  • Therefore, the right hand side of the assignment is of type NSArray *, and the assignment is invalid.

If, for some reason, you really wanted to do that, you could use an explicit cast to get rid of the diagnostic.
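For comparison, Clang’s instancetype return type covers the [:Self] case, although it is not the syntax proposed here. A minimal sketch, assuming ARC and a Clang that supports instancetype; JAShape, JACircle and JADefaultRadius are hypothetical names used only for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical classes, for illustration only.
@interface JAShape : NSObject
+ (instancetype)shape;   // statically typed as "an instance of the receiver"
@end

@interface JACircle : JAShape
- (double)radius;
@end

@implementation JAShape
+ (instancetype)shape
{
    // self is the class the message was sent to, so subclasses get their own kind.
    return [[self alloc] init];
}
@end

@implementation JACircle
- (double)radius
{
    return 1.0;
}
@end

static double JADefaultRadius(void)
{
    // The compiler resolves +shape to JACircle * here, so -radius is checked
    // against JACircle rather than against every selector it has ever seen.
    // NSString *oops = [JACircle shape];   // would draw an incompatible pointer types warning
    return [[JACircle shape] radius];
}
```

Had +shape been declared to return id, the commented-out assignment would compile silently, which is exactly the gap the contextual types are meant to close.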

Generics

The other major use of id is for polymorphic collections. True polymorphic collections are great! But some of the time, you only want to put one kind of object in your collection, and would appreciate the computer doing the drudge work of checking that you didn’t put the wrong stuff in the wrong place.

From the perspective of the previous section, generics are a simple extension to contextual types. Instead of restricting you to [:Self] and the highly specialized [:Class] and [:Superclass], you can provide one or more class names as a parameter to a type declaration. Example time again:

@interface MyThingHolder[ThingType = id] : NSObject
{
    [ThingType]     thing;
}

- ([:Self]) initWithThing:([ThingType])thing;

- (void) setThing:([ThingType])thing;
- ([ThingType]) thing;

// Or, for modernists:
@property (nonatomic, retain) [ThingType] thing;

@end

// ...
MyThingHolder[NSString] *holder = [[MyThingHolder[NSString] alloc] initWithThing:@"foo"];
holder.thing = [NSNumber numberWithBool:MAYBE];      // Warning: type mismatch
holder.thing = (id)[NSNumber numberWithBool:MAYBE];  // OK (a warning here would be inconsistent with general Objective-C behaviour)

// Or, equivalently:
typedef MyThingHolder[NSString] MyStringHolder;
MyStringHolder *holder = [[MyStringHolder alloc] initWithThing:@"foo"];

Some notes: the type parameter can only be a class, since specialized code is not generated for each type. (See Why This is Not the Baby-eating Spawn of Bjarne Stroustrup below.) Since the type parameter is always a class, I have made the * implicit. This is cleaner, but could well lead to confusion and is quite likely a bad idea. (If only I had a time machine…)
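Clang’s lightweight generics take a similar approach, with angle brackets, an explicit * and no default-value syntax. The following is a sketch in that real syntax rather than the proposed one, assuming ARC; JAHolderDemo is a made-up helper.

```objc
#import <Foundation/Foundation.h>

// The type parameter exists only in the interface; no per-type code is generated.
@interface MyThingHolder<ThingType> : NSObject
- (instancetype)initWithThing:(ThingType)thing;
@property (nonatomic, strong) ThingType thing;
@end

@implementation MyThingHolder
// The parameter is not visible in the implementation, so plain id is used.
- (instancetype)initWithThing:(id)thing
{
    if ((self = [super init])) {
        _thing = thing;
    }
    return self;
}
@end

static id JAHolderDemo(void)
{
    MyThingHolder<NSString *> *holder = [[MyThingHolder alloc] initWithThing:@"foo"];

    // holder.thing = @42;      // warning: incompatible pointer types
    holder.thing = (id)@42;     // accepted: id opts out, as in the rest of the language

    // An unparameterized pointer behaves like a traditional, untyped holder.
    MyThingHolder *plain = holder;
    return plain.thing;
}
```

JAHolderDemo() hands back the NSNumber that was slipped past the checker; nothing at runtime knows the holder was ever meant for strings.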

The type parameter has a default value, previously unheard of in Objective-C, so that you can ignore generics and create a “vanilla” MyThingHolder that works just like in traditional Objective-C. This provides an upgrade path for existing classes:

@interface NSArray[Item <NSObject> = id <NSObject>] : NSObject <NSCopying, NSMutableCopying, NSCoding, NSFastEnumeration>

- (NSUInteger) count;
- ([Item]) objectAtIndex:(NSUInteger)index;

@end

@interface NSArray[Item] (NSExtendedArray)

- (NSArray[Item] *) arrayByAddingObject:([Item])anObject;
- (NSArray[Item] *) arrayByAddingObjectsFromArray:(NSArray[Item] *)otherArray;
// ...
- (BOOL)containsObject:([Item])anObject;
// ...
+ ([:Self[Item]]) arrayWithObject:([Item])anObject;
// ...

@end

@interface NSDictionary[Key <NSCopying, NSObject> = id <NSCopying, NSObject>, Value <NSObject> = id <NSObject>] : NSObject <NSCopying, NSMutableCopying, NSCoding, NSFastEnumeration>

- (NSUInteger) count;
- ([Value]) objectForKey:([Key])aKey;
- (NSEnumerator[Key] *) keyEnumerator;

@end

@interface NSDictionary[Key, Value] (NSExtendedDictionary)

- (NSArray[Key] *) allKeys;
- (NSArray[Key] *) allKeysForObject:([Value])anObject;
- (NSArray[Value] *) allValues;
// ...
- (BOOL) isEqualToDictionary:(NSDictionary[Key, Value] *)otherDictionary;
- (NSEnumerator[Value] *) objectEnumerator;
- (NSArray[Value] *) objectsForKeys:(NSArray[Key] *)keys notFoundMarker:([Value])marker;
// ...
- (void) getObjects:([Value] *)objects andKeys:([Key] *)keys;
// ...

@end

These examples introduce an additional concept: restrictions on type parameters, in this case protocol requirements. The other obvious type of restriction would be a superclass requirement, such as [Type: NSString = NSString].

Using these genericized versions in the same manner as the existing versions would require no code changes – as long as you’re using them with objects that fulfil the protocol requirements, which already exist but aren’t explicit in code.

Type Conformance

A fundamental question for static type systems with inheritance is when implicit casts are allowed. There are subtleties here, some of which I probably haven’t considered, and I have a feeling I’ve considered some and then forgotten about them. The most obvious case is when the base type and each type parameter could be validly cast:

NSMutableArray[NSMutableString] *a = whatever;
NSMutableArray *b = a;            // OK, b is NSMutableArray[id <NSObject>]
NSMutableArray[NSString] *c = a;  // OK
a = c;                            // Not OK, implicit cast from NSMutableArray[NSString] to NSMutableArray[NSMutableString]
NSArray[NSMutableString] *d = a;  // OK
NSArray[NSString] *e = a;         // OK
id f = a;                         // OK

As indicated above, an id must be accepted wherever a parameter type is expected, for consistency with Objective-C in general:

NSMutableArray[NSString] *a = whatever;
[a addObject:[NSNumber numberWithInt:42]];      // Type mismatch, assuming +numberWithInt: is declared to return [:Self]
[a addObject:(id)[NSNumber numberWithInt:42]];  // OK

How about the case where a method returns an unadorned type?

@interface LegacyThing: NSObject
- (NSArray *) legacyListOfStrings;
@end

LegacyThing *l = whatever;
NSArray[NSString] *list = [l legacyListOfStrings];

In order to minimize the burden of adopting generic syntax, I’d suggest explicitly permitting this, with an optional warning. (The non-generic equivalent, assigning an id <NSObject> to an NSString *, generates the somewhat unexpected warning “type ‘id <NSObject>’ does not conform to the ‘NSCopying’ protocol” in GCC and nothing in Clang. Assigning an NSObject * to an NSString * generates warnings in both. My proposal is that assigning a T to a T[P] should work without warning by default, even if the default type parameter is a class rather than id with or without protocols.)

Why This is Not the Baby-eating Spawn of Bjarne Stroustrup

Many Objective-C programmers are refugees from the blasted wasteland of C++, and will reflexively cringe at the similarity with templates:

std::vector<Duck> ducks;
Chicken chicken;
ducks.push_back(chicken);             // Error
ducks.push_back(*(Duck *)&chicken);   // Horrible crash here or some time in the future, maybe.

NSMutableArray[Duck] *ducks = [NSMutableArray new];
Chicken *chicken = [Chicken new];
[ducks addObject:chicken];            // Warning: type mismatch
[ducks addObject:(Duck *)chicken];    // No problem, unless you call -quack on it.

While the difference should hopefully be clear by now, I’ll spell it out: in C++, std::vector<Duck> creates an entirely new class (with bits of Duck inlined into it). The Objective-C-with-generics version only provides hints to the compiler, so it can catch mistakes. It doesn’t stop you from putting chickens among your ducks, or make the duck array reject chickens at runtime, or generate a new array-of-chickens class.
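The Objective-C half of that comparison can be tried out using Clang’s lightweight generics as a stand-in for the proposed bracket syntax (with NSMutableArray, since -addObject: is needed). A sketch, assuming ARC; the helper function is hypothetical:

```objc
#import <Foundation/Foundation.h>

@interface Duck : NSObject
- (NSString *)quack;
@end

@interface Chicken : NSObject
@end

@implementation Duck
- (NSString *)quack { return @"Quack"; }
@end

@implementation Chicken
@end

// Hypothetical helper: adds a Chicken to a Duck array behind a cast and
// reports whether it is still a Chicken when it comes back out.
static BOOL JAChickenSurvivesAmongDucks(void)
{
    NSMutableArray<Duck *> *ducks = [NSMutableArray new];
    Chicken *chicken = [Chicken new];

    // [ducks addObject:chicken];       // warning: incompatible pointer types
    [ducks addObject:(Duck *)chicken];  // the cast silences the compiler, nothing more

    Duck *impostor = [ducks objectAtIndex:0];

    // Same array class as ever, same object in it, and it still cannot quack.
    return impostor == (id)chicken
        && [impostor isKindOfClass:[Chicken class]]
        && ![impostor respondsToSelector:@selector(quack)];
}
```

The function should return YES. The array never looked at the type argument, and the only way to get into trouble is to actually send -quack to the impostor.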

In earlier discussions of generics, it has been stated that this type of mistake is rare in practice and should be caught with unit tests. If you feel that way, you’re welcome to stick to your current approach, and the introduction of generics, as described, will not affect you.

To avoid ballooning side effects of generics, I’ve deliberately avoided suggesting generic functions and methods (i.e., ones whose return type is dependent on one or more argument types, independent of their class in the case of methods).

One Last Thing

An effect of the above is that I want the language to contain types like NSArray[NSMutableDictionary[NSString, NSSet[NSMutableArray[NSNumber]]]] *. That doesn’t mean I want to spend my time typing NSArray[NSMutableDictionary[NSString, NSSet[NSMutableArray[NSNumber]]]] *, even with autocomplete. Fortunately, there exists a well-known solution to this problem: type inference.

If I access a member of an array of the aforementioned sesquipedalian type, I get an NSMutableDictionary[NSString, NSSet[NSMutableArray[NSNumber]]] *. What’s more, the compiler knows this. The benefit of static typing lies primarily in checking what I do with my dictionary, rather than checking that the item I retrieved is what I thought it was, so I should be able to ask it to type a variable appropriately.

I quite like C++1x’s solution of recycling the auto keyword for this use, but that would conflict with Objective-C’s goal of being a strict superset of C. There are various other choices, such as var or any — or maybe ego. In any case, type inference would lead to code like:

typedef NSArray[NSMutableDictionary[NSString, NSSet[NSMutableArray[NSNumber]]]] MyHorribleNestedType;  // TODO: replace with sensible model classes.

MyHorribleNestedType *array = whatever;
var element = [array objectAtIndex:0];              // Type is inferred as NSMutableDictionary[NSString, NSSet[NSMutableArray[NSNumber]]] *
var subEntry = [NSArray arrayWithObject:@"bloop"];  // Type is inferred as NSArray[NSString] *
[element setObject:subEntry forKey:@"moop"];        // Type mismatch: expected NSSet[NSMutableArray[NSNumber]], got NSArray[NSString]

A subtlety here: the return type of [NSArray arrayWithObject:@"bloop"] (declared previously) is [:Self[Item]], which is inferred from the receiver (the NSArray class object) and the argument to resolve to NSArray[NSString]. Instead of typedefing MyHorribleNestedType, we could have constructed the type implicitly in the same way. Like everything else in the proposal, this is optional; if you used id or explicit, unparameterized types in this example, the type mismatch would not be detected, but the generated code would be identical.
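Clang has a close relative of this in __auto_type, a GNU C extension that also works in Objective-C. A sketch of the inference, assuming ARC and using Clang’s lightweight generics in place of the proposed bracket syntax; the function name is hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the element type is never written out, yet the
// -length message is still checked against NSString.
static NSUInteger JAInferredLength(void)
{
    NSArray<NSString *> *words = [NSArray arrayWithObjects:@"bloop", @"moop", nil];

    // Inferred as NSString *, taken from the array's type argument.
    __auto_type first = [words objectAtIndex:0];

    // [first count];   // would be flagged: NSString does not declare -count
    return [first length];
}
```

Unlike the [:Self[Item]] case above, this only infers the type of the variable from the expression; it does not infer a type argument from a method’s parameters.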

Summary

  • The id type is very powerful and flexible. However, most of the time you don’t need this flexibility, and type checking is helpful — this is why Objective-C has optional static type checking in the first place, and almost all Objective-C code opts in to it.
  • The proposal would not impose any new restrictions, or any new guarantees. It would only extend (optional) static type checking to cases which are currently not covered.
  • It would not involve any implicit code generation, and no runtime overhead.
  • Combined with type inference, it could catch errors without additional code.
  • There would be no new collections, and no implementation changes to existing ones other than updated type declarations. (Updated headers would work with the existing implementations.)
  • If you really don’t want static type checking, you can continue using id everywhere.
  • I mean it about the “optional”. The proposed change would have no effect unless adopted both by a class and its clients. A parameterized collection pointer can be cast to a plain one at any time with no cost.

10 Responses to Proposal: generics (and some other stuff) for Objective-C

  1. I really like it, actually! Then again, I’m starting to warm up to C++ again… Maybe I’m just mad. Really like that ‘auto’ keyword too, that’ll ease C++ coding.

  2. Uros Dimitrijevic says:

    Interesting. Sort of like Java generics but with the flexibility of Objective-C. Although one thing that I can’t seem to ignore is the complexity this would add to the language, however negligible its impact would be on backwards-compatibility.

    Oh, and on calling the inferred type ‘ego’: priceless. (Does that mean I wasn’t mistaken in not pronouncing the ‘id’ type ‘eye-dee’?)

  3. mikeash says:

    I like the contextual types proposal. This sort of thing has been annoying, and occasionally destructive, for quite a while. I hope Apple adopts something like this.

    I don’t really get the generics proposal, though. Mistakes due to forgetting what sort of objects are in a collection are, in my experience, rare and easily caught. Very, very occasionally you run into problems because -objectAtIndex: returns id and this results in type confusion, but with for/in loops or just sane use of temporary variables this doesn’t really happen. Is there a specific use case beyond this that I’m missing?

  4. Jesper says:

    mike: The types in generics are useful for exactly the same reason the types everywhere else are useful.

  5. Jens Ayton says:

    Well, I certainly can’t beat Jesper at succinctness.

    To quote myself: “The idea is to provide information that helps the compiler and analyzer check your logic, and the IDE to provide better suggestions.” Exactly like static typing in general, then.

    For example, if I have a heterogeneous dictionary, and type [[someDict objectForKey:key] st<esc>], I want relevant suggestions. If I use a method that the value type doesn’t have, I want to be warned about it. If you don’t see these as advantages, why don’t you use id everywhere?

    The “just sane use of temporary variables” thing sounds to me like you’re desensitized to a workaround.

  6. Jesper says:

    I had a list of things to put after that sentence, but I decided not to; everything there could be argued against with something that could be answered with that sentence. (For better or worse.) Now I’m going to spell it out anyway, though, since we really may not be on the same page.

    *Usually*, the temporary variable is not a problem. You’re right that it’s not the end of the world. Sometimes you end up with treacherous pairs. `length` exists on both NSString and on NSData, and `count` on a bunch of different collections. (They happen to have the same semantics, but that’s not why they’re treacherous, it’s because they block the compiler warning or error from happening.)

    In some cases you change a data type somewhere and the code you used for removing/enumerating objects from one collection and inserting into another either doesn’t do any logic on the object or happens to do the exact operation that’s sort of available on both, and you end up pushing an object of the entirely wrong kind of class somewhere else, and you have to go hunt for the bug.

    Like Ayton said, ask yourself again why you use the temporary variable. Casting (or even just plain assignment) certainly doesn’t convert the object or throw if it fails. It’s there for documentation for your eyes, and to enable Code Sense. If that can be done with fewer lines of code, help keep the variables down to the ones you actually want to have around and last but not least add this metadata to the interface in the first place (we’ve been talking about implementation), I don’t see why it’s not worthwhile.

  7. Jens Ayton says:

    “…add this metadata to the interface” is a good phrase. Every now and then, I read or write code like:

    - (NSArray *) frobs; // Array of JAFrob

    This is suboptimal for – once again – exactly the same reason - (id) frobs; // NSArray of JAFrob would be. The genericized form, - (NSArray[JAFrob] *) frobs; gives you the whole scoop in one place consistently with non-container methods.

  8. Ondra Hošek says:

    I might be a bit late to the fray, but calling the infer-me type “var” would break ObjC’s strict-superset-ness, since I’d not be able to declare an “int var;” anymore. This is very probably the reason why every ObjC keyword is prefixed with an @ sign.

  9. Jens Ayton says:

    It wouldn’t break supersetosity if var could be redefined, somewhat analogous to the way self can be used in non-method contexts. But that would cause new problems, so I agree – an @-keyword would be better, yet uglier. (It could perhaps be defined to a normal identifier in a standard header, just as id is typedefed, or bool is in C99.)

  10. Karyo says:

    Generics are here but we still need typedef StringArray NSArray* type definitions
