Object-Oriented Brain Damage

Posted on 2022-04-05 :: Tags: OOP, Object-Oriented Programming, ORM, Principle of Locality, Python, Performance, Software Engineering

This is a post I’ve been thinking about for some time. I’m still surprised that writing it feels like swearing in church; it shouldn’t be this hard (or this rare) to criticize object-oriented programming.

What do I mean by OOP? Mainly the classical style, where you define classes with members and methods and use them to model the problem you’re working on. This style was supposed to be superior to the procedural style, where you write procedures that modify global state. I believe the rage against procedural programming came from this global state. Every domain has to maintain state somehow, and when that state is global, reusing procedures and functions becomes almost impossible.

I can understand how the reaction against global state led to something like Java’s “everything is a class” creed. In order to contain global state, you need classes that contain parts of it and interact through methods. The idea is simple: if we forbid global state and keep partial state within structs (or classes), and mutate it with methods, we get reusable software. Once upon a time, I believed in this, too.

In an ideal world where we could write software in Smalltalk instead of C++, I probably wouldn’t be writing this post. Actually, this post isn’t about C++ or Java, either. They have their places; they solve real problems and were probably necessary steps on the way to Go or Rust. We have better ideas now partly because we lived through the “everything should be a class” worldview.

The problem, in my experience, is applying these classical classes to interpreted languages. C++ may have grown into a Leviathan with arms for many different paradigms, but it still takes some care over the cost of abstraction. If you know the tool, you can get away with classes. Java also seems to try to keep abstractions close to free. So in these languages, using OOP is a matter of modeling, habit, or domain suitability. I still think that, over the long term, OOP increases the maintenance burden, but like most ideas in our profession, this has not been rigorously tested.

However, when it comes to multi-paradigm interpreted languages like Python, objects begin to hurt. The problem, in my opinion, is that object-oriented design is in conflict with the Principle of Locality.

The Principle of Locality (PoL) is probably the most important empirical idea in software design: a program tends to reuse the data it touched recently and the data stored next to it. It’s closely related to the Pareto Principle, Zipf’s Law, and other well-known notions. Our CPUs, disks, search engines, and content delivery networks depend on it to cache artifacts for reuse. We know it works because caches pay off: code whose working set fits in the L1 cache runs faster, and we keep adding caches to CPUs and disks to increase their speed.

The conflict between the PoL and OOP arises when classes include data that is not directly related to the problem at hand. The problem at hand, for example, might be finding the maximum of one million numbers or performing transformations on some variables. But if these are members of a class, they bring unneeded data into the locality. If the variable I’m transforming is a member of a class, then when I access it through the object, the object and all its other members, be they 3, 30, or 300, come along with it. In CPython, each instance is a separate heap allocation that holds references to its attribute values, so the numbers I care about end up scattered across memory instead of sitting next to each other. Thus, looping over objects ends up polluting the locality with all the other members of the class.
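A rough way to see this in CPython is to compare the same numbers stored as object attributes and stored in a flat array. This is only a sketch: the `Reading` class and its extra fields are made up for illustration, the count is scaled down from one million to keep it quick, and `sys.getsizeof` reports a shallow size that varies between Python versions, so treat the figures as indicative only.

```python
import sys
from array import array


class Reading:
    """Hypothetical record: only `value` matters for the computation."""

    def __init__(self, value):
        self.value = value
        self.label = "sensor"
        self.unit = "C"
        self.flags = 0


N = 100_000  # scaled down; the argument is the same for one million

readings = [Reading(float(i)) for i in range(N)]
values = array("d", (float(i) for i in range(N)))  # contiguous C doubles

# Both layouts give the same answer...
max_from_objects = max(r.value for r in readings)
max_from_array = max(values)

# ...but the flat array stores each number in `itemsize` bytes, while
# every object costs more than that before its attributes are counted.
bytes_per_number_flat = values.itemsize
bytes_per_object_shallow = sys.getsizeof(readings[0])

print(max_from_objects == max_from_array)
print(bytes_per_number_flat, bytes_per_object_shallow)
```

The point is not the exact byte counts but that the array keeps the values adjacent, while the object version reaches each value through a separate allocation.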

In compiled languages, one advantage of iterators over plain loops is that they can be written to walk only the data you need. However, in interpreted languages, almost no one writes specific iterators for their member variables. This means when I have:


for obj in my_objects:
    obj.do_something(a)

All the members of obj are now part of the loop’s working set, whether do_something touches them or not. This works against the PoL.
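One way to keep unrelated members out of such a loop is to store each field in its own sequence (a columnar, or struct-of-arrays, layout), so the loop walks only the column it needs. The sketch below assumes a hypothetical `Particle` record and two made-up functions, `scale_objects` and `scale_column`; it shows the shape of the idea and is not a benchmark.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    # Hypothetical record; only `x` is needed by the transformation below.
    x: float
    name: str
    color: str
    mass: float


def scale_objects(particles, factor):
    # Array-of-objects: every iteration goes through a whole Particle
    # just to reach its `x` attribute.
    for p in particles:
        p.x *= factor


def scale_column(xs, factor):
    # Columnar: the loop sees nothing but the numbers it transforms.
    for i in range(len(xs)):
        xs[i] *= factor


particles = [Particle(float(i), f"p{i}", "red", 1.0) for i in range(5)]
xs = array("d", (p.x for p in particles))  # the `x` column on its own

scale_objects(particles, 2.0)
scale_column(xs, 2.0)

print([p.x for p in particles])
print(list(xs))
```

Both calls produce the same numbers; the difference is in what the loop has to touch to get there.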

Another problem I see with OOP is its alienation from the basic elements of software. When you begin to work with objects that do not directly correspond to anything in computer/software architecture, it becomes more or less a castle in the clouds. You have to tie this castle—i.e., the class hierarchy—somehow to the ground. The ground is the CPU, memory, disk, cache, etc. Compilers may do a good job of this, or they may not, but this detachment from the basic elements of computing machinery is the cause of what I call Object-Oriented Brain Damage.

When you begin to believe that the objects in your program are real, you try to express your problem using more objects. Classes are like kipple. They proliferate all the time. You add classes, then you add classes to create classes, then you add some base class to derive classes, then you try to fix these problems with patterns, and so on.

But this whole enterprise doesn’t have anything to do with the real tools we have. Our tools are processors, memory, disks, screens, printers, etc. When we detach the problem from these basics, it doesn’t become more solvable. We only create an ideal version of our understanding that requires additional attention to teach others, to document, etc.

It’s true that we need abstractions over the tools. We cannot just read bytes from disk, process them in the CPU, and print them to the screen; all of these must be abstracted. But in my experience, OOP is not a good way to abstract these tools. Instead, it abstracts the problem at hand in an arbitrary way that is supposed to solve it. Yet this solution itself becomes a problem that must be fitted back onto the ground.

Databases and ORMs are good examples of this. Databases and Entity-Relationship theory are well-understood abstractions. We know how to use them across multiple CPUs, multiple disks, and multiple machines across multiple continents. ORMs are not like this; they are supposed to correspond to databases and provide that cozy object-oriented feeling. However, any system that depends on ORMs learns that these are not identical to databases. They don’t have the same capabilities and performance, and over the long run, depending on an ORM causes more problems than just writing SQL queries. Because an ORM does not consider the tools we have and their limitations, it tries to fit an idealistic world view onto databases. When it doesn’t scale, we think this performance degradation is natural.

No, it’s not natural. When your models load whole rows from billions of records just to access a single field for a calculation, you exhaust the cache quickly, and it doesn’t scale. OOP might have some value as an educational tool, but even in this regard, I believe it causes most of the brain damage we see in enterprise software.
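The whole-row problem can be sketched with the standard library’s `sqlite3` module. Everything here is an assumption for illustration: the `orders` table, its columns, and the `Order` class, which stands in for an ORM model and is not taken from any real ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
    "notes TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, notes, amount) VALUES (?, ?, ?)",
    [(f"c{i}", "x" * 100, float(i)) for i in range(1000)],
)


class Order:
    # Hypothetical stand-in for an ORM model: one object per full row.
    def __init__(self, id, customer, notes, amount):
        self.id, self.customer = id, customer
        self.notes, self.amount = notes, amount


# ORM-style: pull every column of every row, then use a single field.
orders = [Order(*row) for row in conn.execute("SELECT * FROM orders")]
total_via_objects = sum(o.amount for o in orders)

# Plain SQL: the database aggregates and hands back a single value.
(total_via_sql,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()

print(total_via_objects, total_via_sql)
conn.close()
```

With a thousand rows the difference is invisible; the first version’s cost grows with every column and every row it drags into the process, while the second returns one number regardless.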