Don’t optimize yet?

Premature optimization is the root of all evil

This quote from Donald E. Knuth is probably the best-known one about software optimization. And believe me, it’s generally very good advice. The last thing you want, when you are starting to write your big program, is to worry about detailed optimization.

Here are some examples of things you really do not care about at the moment.

When I read my configuration using my pretty reflection-based parser, should I keep a cache of the Class objects in order to have fewer calls to the Reflection API?

When I iterate over a C++ vector, maybe I should write my code like:
size_t size = vector.size();
for (size_t i = 0; i < size; i++) { doStuff(); }

rather than
for (size_t i = 0; i < vector.size(); i++) { doStuff(); }
(hint: we’ll see the answer one of these days)

Spending time on this kind of concern is called “micro-optimization”, and it’s almost always a bad idea. You should only care about it in the “hot spots” of your code, the bottlenecks where the vast majority of the time is spent. And you should almost never try to guess where those are by yourself, without being sure that there is indeed a problem. There are very good reasons for that:

  • Optimization takes time. Lots of time. Each hour you spend optimizing is an hour you are not spending writing new features, fixing bugs or, generally speaking, making your product better.
  • Optimization introduces bugs. Generally speaking, code that has gone through micro-optimization passes is more complex and less readable than “simple” code. Furthermore, it will often replace existing code, code that you knew was correct. Changing it for something new is a risk.

So, micro-optimization is a risk and a cost. You are only willing to take that risk and spend that time if there is really something to gain. And to be sure that there is, you need some proof, some numbers. You need to profile your application. That will be one of the very first topics we’ll cover.
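
To make that concrete, here is a minimal sketch of the kind of measurement you want before touching anything. The function doStuff() and the iteration count are made up for the example; a real profiler (perf, gprof, …) will give you this for the whole program, but the principle is the same: get numbers first.

#include <chrono>
#include <iostream>

// Hypothetical stand-in for the code you suspect is a hot spot.
void doStuff() { /* ... */ }

int main() {
    using clock = std::chrono::steady_clock;

    auto start = clock::now();
    for (int i = 0; i < 100000; i++) {
        doStuff();
    }
    auto end = clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    // If this is a tiny fraction of your total runtime, optimizing
    // doStuff() is not worth the risk.
    std::cout << "doStuff() x 100000: " << ms << " ms" << std::endl;
    return 0;
}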

But you still need to design and think

However, there is one area where you must think about optimization and performance from the very first steps, and that is when you design your project. The first requirement is that you must decide on the performance targets for your program. How many users? How many stuffs per second? How many foobars per file? What maximum response time? Think about the future, think about the lifetime of your program. What will happen when your traffic increases ten-fold? Micro-optimization will never give you that; you must have thought about it at the beginning.
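
As a purely made-up example (the numbers are illustrative, not taken from any real system): 50,000 users making about 20 requests each per day is 1,000,000 requests per day, which is roughly 12 requests per second on average; allow a peak factor of 10 and you are designing for something like 120 requests per second. The arithmetic is trivial, but now you know what order of magnitude you are building for.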

These “upper bound requirements” will guide you through the high-level design of your solution. Is one database going to be OK for the whole life of the product, or will you need several database shards? You really want to know the answer to this before writing the first line of code. Will you ever need to spread the load over several servers, or do you condemn yourself to sticking with one single machine forever?
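
If you do decide up front that you will shard, the routing logic itself can stay very simple. Here is a minimal sketch of hash-based shard selection; the names and the fixed shard count are hypothetical, chosen only for the example.

#include <cstddef>
#include <functional>
#include <iostream>
#include <string>

// Hypothetical: the number of shards is fixed by the initial design.
// Changing it later means re-distributing all existing data, which is
// exactly the kind of decision that is cheap now and very costly later.
constexpr std::size_t kNumShards = 8;

std::size_t shardFor(const std::string& userId) {
    return std::hash<std::string>{}(userId) % kNumShards;
}

int main() {
    std::cout << "user-42 lives on shard " << shardFor("user-42") << std::endl;
    return 0;
}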

Changing any of these a few months after the initial design phase would be extremely costly, because you basically need to redesign most of the software.

On the other hand, trying to achieve “infinite” scalability on every aspect of your software is almost guaranteed to fail. You would waste huge amounts of time and hardware on non-use cases. You do want your super new Web 2.0 website to scale with the number of users, but do you really need to build in scalability for the number of supported languages? There are only a few hundred in the world, and that’s not going to change soon.

Changing scales

Another dangerous mistake is to underestimate the dangers of changing scale. When you have a piece of software that works correctly at size N, there is actually very little chance that it will work at size 10N. Even if simply multiplying the figures seems to indicate that everything will be OK, you are almost sure to run into unforeseen trouble. Maybe you’ll hit some 32-bit limit. Maybe the structures that used to fit in RAM don’t fit anymore and your performance will degrade severely. Maybe the critical section that worked so nicely will suddenly become contended and slow everything down. Software is full of non-linearities.
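
As an illustration of such a non-linearity (the figures are invented for the example): a 32-bit counter of total data size that is perfectly fine at size N can silently wrap around at size 10N.

#include <cstdint>
#include <iostream>

int main() {
    // Today: 2 million records of 1 KB each, about 2 GB. An unsigned
    // 32-bit counter (max ~4.29 GB) holds this just fine.
    std::uint32_t totalBytes = 2'000'000u * 1024u;
    std::cout << "at size N:   " << totalBytes << " bytes" << std::endl;

    // At 10N: 20 million records, about 20 GB. That no longer fits in
    // 32 bits, so the counter silently wraps around modulo 2^32 and
    // reports a plausible-looking but completely wrong value.
    std::uint32_t scaledBytes = 20'000'000u * 1024u;
    std::cout << "at size 10N: " << scaledBytes << " bytes (wrong!)" << std::endl;
    return 0;
}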

This also implies that testing your software with 10% of the estimated total data size will not tell you whether it works at full scale. Maybe you will only consume 5% of the resources and conclude that everything will be OK and that you still have some margin, but chances are that it’s not true.

Summing it up
Don’t micro-optimize, especially at the beginning. Don’t dig into the internals of the code and hunt for performance with line-by-line optimizations. If you’re already at that level, then you’ve probably already lost, because you can’t get much back from doing this.

Think hard about the figures as soon as you start your project. Decide where you want scalability, and where you are sure that it doesn’t matter and never will.

And, as much as possible, test your software on “real” size data. Testing on 10% of the target size is not going to work.
