Square Abstractions

Managing complexity is at the heart of Software Engineering, and abstraction is the tool by which we accomplish this. But what do our abstractions look like, and how should we judge them?

Abstractions should be square.

Or cubic. Possibly n-dimensional hypercubes. But not rectangles. And lines are right out.

In 1956, G.A. Miller wrote a classic psychology paper entitled 'The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information'. The far-reaching conclusion of this paper is that for uni-dimensional data-sets, humans have a typical classification capacity of between 2 and 3 bits, which is between 4 and 8 items. How does this apply to software abstraction? It gives us a quantitative key to determining whether an abstraction (which implies a reduction in complexity) is of sufficient quality. It also gives us a clue to resolving the issue of abstractions that still retain too much complexity: add another dimension.
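The bits-to-items conversion is just powers of two: a capacity of b bits distinguishes 2^b items, and n items need log2(n) bits. A minimal Python sketch of that arithmetic (the function names are illustrative only, not from Miller's paper):

```python
import math

def items_for_bits(bits: float) -> float:
    """Number of distinguishable items for a capacity in bits: 2 ** bits."""
    return 2 ** bits

def bits_for_items(items: float) -> float:
    """Bits needed to tell a given number of items apart: log2(items)."""
    return math.log2(items)

# Miller's range: 2 to 3 bits corresponds to 4 to 8 items.
print(items_for_bits(2), items_for_bits(3))  # 4 8
# Seven items sits just under 3 bits.
print(round(bits_for_items(7), 2))  # 2.81
```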

By square abstractions, I mean that a good set of abstractions in the software domain, from an arbitrarily complex starting point to the most understandable abstraction of that idea, should have approximately equal complexity in each dimension. If the result is that each (and all, since we have decreed equality) dimension of abstraction is still too complex, we must re-dimension, refactor, and re-abstract.
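If a total complexity N is spread evenly over d orthogonal dimensions, each dimension holds roughly the d-th root of N. The sketch below assumes complexity can be measured as a simple count (lines, items, methods) and takes 9 (that is, 7 + 2) as the per-dimension limit; it finds the fewest equal dimensions that keep every side within that limit. The names `per_dimension` and `dimensions_needed` are hypothetical.

```python
def per_dimension(total: int, dims: int) -> float:
    """Size of each side when `total` is spread evenly over `dims` dimensions."""
    return total ** (1 / dims)

def dimensions_needed(total: int, limit: int = 9) -> int:
    """Smallest number of equal dimensions whose sides are all <= limit.

    Uses integer arithmetic (limit ** dims >= total) rather than
    floating-point roots, to avoid rounding surprises at the boundary.
    """
    if limit < 2:
        raise ValueError("limit must be at least 2")
    dims = 1
    while limit ** dims < total:
        dims += 1
    return dims

print(dimensions_needed(100_000))            # 6
print(round(per_dimension(100_000, 6), 1))   # 6.8
```

Raising the limit or accepting a slightly lopsided split changes the answer, which is exactly the judgement call the "re-dimension, refactor, and re-abstract" step involves.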

Soap bubbles form perfect spheres not just because they find it aesthetically pleasing, but because they are most comfortable like that: a sphere encloses its volume with the least surface area, so it takes the least energy. In software we should similarly strive to find the solution which satisfies the constraints with the least energy. Spheres might be nature's solution, but in software we tend to seek orthogonal abstractions, leading to squares, cubes, hypercubes, and so on.

Getting practical for a moment, remember that every program, library, and API is an abstraction. An application containing a single 100,000-line file (yes, really...) might be perfectly good internally, but is missing out on a key abstraction in terms of translation units, modules, or whatever else maps to files. So split it into one hundred 1,000-line files: we've added a dimension and reduced the maximum uni-dimensional complexity. But we should continue, since 100 is more than an order of magnitude greater than our magic 7 plus or minus 2. Directories, packages, folders: another level of abstraction. And because we are being square, we aim to have approximately 10 directories with 10 files in each. This stretches 7 +/- 2, but not so far that any more abstraction would necessarily be helpful; adding a dimension has a cost too.

Why 100 files of 1,000 lines, and not 316 files of 316 lines? Because not all abstractions have the same cost, and we can apply additional abstractions within those files. Like, um, classes, methods and functions. Back in the earlier days of computing, procedural programming was a major advance. Since then, we've invented many more abstractions, and although some might seem unhelpful (for example Java's rule of one public top-level class per file, which discourages an additional layer of abstraction with several related classes in one file), the general concept of using abstractions to reduce complexity is in full swing.
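To make the arithmetic of the file example concrete, here is a small Python sketch that treats each layout as a tuple of per-dimension sizes (directories, files, lines) and reports the total it covers and its widest single dimension. The layouts are the ones discussed in the text; the `describe` helper and the layout labels are hypothetical, and Python 3.9+ is assumed for the `tuple[int, ...]` annotation.

```python
import math

# Each layout lists the size of each dimension, outermost first.
layouts = {
    "one file": (100_000,),
    "files x lines": (100, 1_000),
    "square (2D)": (math.isqrt(100_000),) * 2,  # 316 x 316, just under 100,000
    "dirs x files x lines": (10, 10, 1_000),
}

def describe(dims: tuple[int, ...]) -> tuple[int, int]:
    """Return (total items covered, widest single dimension)."""
    return math.prod(dims), max(dims)

for name, dims in layouts.items():
    total, widest = describe(dims)
    print(f"{name:22} {dims}: total={total:,}, widest={widest:,}")
```

Among the two-dimensional options the square layout has the narrowest widest side (316), but, as argued above, the 10 x 10 x 1,000 layout deliberately leaves the 1,000-line dimension to be tamed by cheaper in-file abstractions such as classes and functions.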

So next time you (or I) think about adding that 100th method to our widget API, think about adding a new dimension instead. And if it isn't obvious what that new dimension might be, then get creative and invent something new.
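One way to read "adding a dimension" to an API is to group related operations under sub-objects, so a caller first picks a category and then a member within it, and no single level presents a hundred choices. A minimal Python sketch, with an entirely hypothetical `Widget` whose border and font settings would otherwise be flat methods such as `set_border_width` and `set_font_size`:

```python
from dataclasses import dataclass, field

@dataclass
class Border:
    """Border settings: one small dimension of the widget API."""
    width: int = 1
    color: str = "black"

@dataclass
class Font:
    """Font settings: another small, independent dimension."""
    family: str = "sans-serif"
    size: int = 12

@dataclass
class Widget:
    """A widget whose settings are grouped by category instead of flattened.

    Callers write widget.border.width rather than widget.set_border_width(),
    so each level exposes only a handful of members.
    """
    # default_factory gives every Widget its own Border and Font instances.
    border: Border = field(default_factory=Border)
    font: Font = field(default_factory=Font)

w = Widget()
w.border.width = 2
w.font.size = 14
print(w)
```

Each sub-object can grow to a handful of members before it, too, needs splitting, which is the same square-ish trade-off as directories and files.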

I guess my point here is that we shouldn't stop, thinking we've got there because we now have OOP / generic / whatever-paradigm programming. We should keep looking for new abstractions, using those we have, borrowing the best from other languages, and somehow, eventually, taming the monster of software complexity.

Well, I can dream.