Friday, September 5, 2008

Is Manent complex?

Yes and no.

It currently has 12K lines of Python code, and Python is a relatively concise language. And it currently includes just the engine - the user interface is fetal, error handling is minimalistic and the documentation is, well, not for the faint of heart. So, there is a lot to be added yet.

However, code size is not the only metric of complexity. There is mental complexity that is hard to quantify but very real. It was felt deeply? when I tried to do some code changes?.

The current Manent code is approximately third generation of architecture. It has undergone significant redesigns, the major point of which was to reduce the mentall complexity.

The first redesign affected the way directories are stored. In the beginning, the directory tree was stored as one large serialized object. This was, of course, wasteful, since the directory trees change very little between backup increments. So, I had to extend the format to store diffs of directory trees. Soon, I realized that storing diffs is also not efficient enough, and switched to multi-level diffs. Now these have some serious complexity. It's not that it can't be done, but very hard to pull off in finely dispersed 1-hour coding sessions.

So, I had to wait until I have a serious chunk of time to work on it. Fortunately, a conference conveniently came along, and I gave up any city touring in favor of fleshing out the complete implementation. After several evenings with paper, pen and coffee it was finally done, but when I got near the keyboard to start writing it, I realized that there is no way I can finish that before the conference is over. Then I said to myself, it should not be so complex! So, I sat back and thought.

At around the same time, I have switched from SVN to Git, and have read some description of Git inner workings. I borrowed many ideas from Git's design, and I'm not ashamed to admit it - Linus is a very smart guy.

So, I have switched the directory encoding to a content-addressed scheme, same as the data within the files. There is no need for encoding tree diffs, no complex back-pointing rules, and so the new code was not only much simpler than the original one, but also shorter. Moreover, the simplified container file format allowed me to easily implement complete encryption and compression, which would be a pain to do previously.

The implementation took one day, squashing the bugs another two, but I had put it behind enough testing that I'm confident enough about it.

So, is Manent complex? Yes, and it has to be - it has many complexities to deal with. But it could be much more complex, unless I had several "Oh, it doesn't have to be this complex" moments. And I hope to have more of those.

Two other cases of mental simplification have been: sharing of container files between backup instances and header summary organization. I'll probably write on those later on.