Monday, August 17, 2009

New blog

I have started a new blog, this one more general in its nature. There are two posts already, see http://blog.alex.gontmakher.com.

Sunday, July 19, 2009

Uploading reviews to Rietveld hosted on a domain

Rietveld recommends using upload.py for sending code reviews. Unfortunately, it doesn't work when the code review engine is installed as a domain in Apps For Domains.

The fix is simple and is described here: http://code.google.com/p/rietveld/issues/detail?id=133.

Thursday, May 21, 2009

Programmer's heaven

I'm a big fan of code reviews, so when another person expressed willingness to join the development, I immediately decided I want the technical means for doing the code reviews.

First, I tried the review engine at github.com. That didn't go well. It was easy to set up, but the review engine is laughable (sorry guys. Github is a great service, but code reviews are not, at this point, its strong side).

Then I installed reviewboard. It was not hard, and it's a huge improvement over github. However, it's still somewhat kinky to use, wants a local repository for Git, and I grew to strongly prefer inline comments to small bubbles at the beginnings of the lines.

So, I gave a try to Rietveld, an open source incarnation of Google's Mondrian. It can be installed on AppEngine, and it was also ported to pure Django. Rietveld is immediately usable with Git, even without configuring any repositories, and it is as simple to use as it gets. And I got it working after maybe one hour of wrestling.

But now for the best part: it's actually available in Google Apps for Domains! To set it up for your domain, go to the administration dashboard, click the "Add more services" link and select the Code Reviewing engine. One CNAME record later, you're done. All the joy of Mondrian combined with zero administration and 3-minute setup.

Funny how the most powerful thing out there is also the easiest to use. Oh, the wonders of software as a service!

Monday, March 23, 2009

I #$%ing love windows!!! The curious case of bat files under chcp 65001

It's not enough that Windows's cmd.exe scripting is stone age. On top of that, if you set "chcp 65001", switching to the UTF-8 code page, batch files won't run.

Why oh why?

Update: from perusing various web forums, apparently everyone just says "oh, bat files just won't work if you change the code page". Hey guys, this is outrageous!!! How come we're putting up with bugs like these?

Saturday, March 21, 2009

Manent 0.90.0

We have decided we now need to clean up, pack up and work to release version 1.0. Functionally, Manent is quite powerful, but as far as the user interface goes, it leaves much to be desired.

For 1.0, we'll improve the command-line user interface, provide a built-in HTTP server for restoring, make error handling more user-friendly, and simplify the installation. We'll probably even add more backends.

And the good news: starting from version 0.9.0 (released today), we provide an installer for Windows!

Tuesday, March 10, 2009

Yet another take on: Is Python a platform?

Python is a nice scripting language, and a nice application language too. But is it a platform? In my book, a platform means that I can develop in it, and remain pretty confident that it will meet my requirements without having to bend it too much. There are more important problems to solve than battling with my own tools.

In that respect, C++ is a platform. It's not as nice and terse as Python, and my program would have to accrete all the libraries it needs as it goes along, but therein lies the flexibility: I can accrete exactly what I want, exactly how I want it. Unfortunately, while I love C++, it's not an option for Manent. It would take too much to implement the first prototypy prototype in it, given the time that I have.

Python is another story. It has batteries included, in the sense that almost everything comes built-in. Hashing, filesystem operations, encryption, compression, network protocols, GUI. Right?

Wrong. That works, but only up to a point. Yes, hashing works fine, but it's quite simple and self-contained. Encryption and compression also work pretty much out of the box. But then reality starts to hit.

Filesystem operations are pretty much portable when you want the basic stuff. But what to do about the ACLs? The hard links? The symbolic links? The hard links to symbolic links (which are possible under Linux but not, say, under Mac)?

Ok, the situation with filesystems is not that bad. I just decided that, for now, I'll target the lowest common denominator, with some exceptions. Obviously, hardlinks and symlinks are terribly important in Unix-based OSes, so they are going to stay, and if you restore your program in Windows, bad luck.
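
To make the hardlink and symlink point concrete, here is a minimal sketch of how a backup tool could tell these entries apart with the standard library. It is not Manent's actual code; the function name classify_entry and the category strings are made up for illustration, and it is written for current Python 3.

```python
import os
import stat


def classify_entry(path):
    """Return 'symlink', 'dir', 'hardlinked', 'file' or 'other' for path.

    Hypothetical helper for illustration, not taken from Manent.
    """
    info = os.lstat(path)  # lstat inspects the link itself, not its target
    if stat.S_ISLNK(info.st_mode):
        return "symlink"
    if stat.S_ISDIR(info.st_mode):
        return "dir"
    if stat.S_ISREG(info.st_mode):
        # A link count above one means other names share this file's data.
        return "hardlinked" if info.st_nlink > 1 else "file"
    return "other"
```

On a filesystem without symlinks or hard links the first and third branches simply never fire, which is roughly what targeting the lowest common denominator means in practice. ACLs are not covered here at all.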

Now the situation with network protocols is harder. As it goes, some are available out of the box, like FTP, some require external libraries, such as SFTP. SFTP is one of the most important here, so let's analyze it:

There are several ways to do SFTP in Python:
  • A pure-Python library called Paramiko. It works OK, and it's what I currently use, but it seems slow compared to what others do.
    Another small trouble is that it relies on the Python Crypto library. That library works OK, but it was not updated for Python 2.6 and now gives warnings on startup. The author of the library is working on a new release with no announced ETA, so I'll have to maintain a privately patched version to get rid of the warning. Oh.
  • Bringing along an SFTP executable and running it for all the transfers. This is not bad under Linux, and only a bit worse in Windows where I'd have to bring it along. But since it is not a library, it can have strange failure modes that I need to support: it can decide to stop and ask for a password. So I'd have to intercept that.
  • Rsync has support for almost all network protocols I need and would actually be easy to use. However, it's not available out-of-the-box on Windows and I'd have to bring it along again.
  • PyCurl is also a nice candidate. But it's also problematic: in all the systems I have checked, it was by default built with no SFTP support. So I'd have to build and bring along my own version of the library.

What does all of that have to do with Python? Simple: Python is an interpreter, and the Python system is supposed to be installed somewhere and shared between its different users. Kind of like Java. But I can't jump around, randomly putting custom-compiled libraries on top of an existing Python install, even if I'm the first one to put it on a given machine. Some other program dependent on Python might come along, and things will start to get screwy.
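
As a rough illustration of why the SFTP options listed above depend on what happens to be installed, the following sketch probes a machine for each of them. The function name available_sftp_options and the dictionary keys are invented for this example. Note that finding pycurl says nothing about whether it was built with SFTP support, which is exactly the problem described above.

```python
import importlib.util
import shutil


def available_sftp_options():
    """Report which candidate transports can be found on this machine.

    Hypothetical helper: it only checks presence, not whether the
    transport actually works or supports SFTP.
    """
    return {
        "paramiko": importlib.util.find_spec("paramiko") is not None,
        "pycurl": importlib.util.find_spec("pycurl") is not None,
        "sftp executable": shutil.which("sftp") is not None,
        "rsync executable": shutil.which("rsync") is not None,
    }
```

The answer will differ from machine to machine, which is the whole trouble with relying on a shared Python install.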

There are several ways to make Python a proper platform. The best and easiest would be if it just supported everything I need, batteries included and with very high quality. But that's not going to happen soon and I can't wait for it. Another possibility would be for me to use a centrally installed Python and install the necessary libraries in a private location. This would work, but some gut feeling says I shouldn't do that, and besides, it's more complexity to add to my already severely constrained dev time.
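
For reference, the private-location option could look roughly like the sketch below: the program puts its own library directory at the front of the import path before importing anything from it. The function name use_private_libs and the directory layout are assumptions for illustration only.

```python
import importlib
import sys


def use_private_libs(private_dir):
    """Make modules in private_dir win over the shared site-packages.

    Hypothetical helper: private_dir is wherever the application
    chose to unpack its own copies of third-party libraries.
    """
    if private_dir not in sys.path:
        sys.path.insert(0, private_dir)  # searched before the shared install
    importlib.invalidate_caches()  # make newly added files visible to import
```

This keeps the shared install untouched, but every entry point of the program has to remember to call it first, which is part of the extra complexity mentioned above.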

Enter py2exe. I recently tried it, and it works just fine. The point is, it packs the Python interpreter and all the libraries I need, with their custom versions as set up on my dev machine, into a small, nice, self-contained system. Well, not so small and a bit ugly inside, but what do I care? Self-contained is the word.

Thus, I hereby proclaim: starting with the next version, Manent on Windows will come with an installer and be a self-contained exe. It's still command-line only, but the install instructions will be: run the installer, done. Whoever feels curious enough to install it from source is welcome to, but it's not easy and will become increasingly harder as more customization is added.

That's the platform for now. And it's a Python platform for me. Until Python itself works out of the box.

Friday, September 5, 2008

Is Manent complex?

Yes and no.

It currently has 12K lines of Python code, and Python is a relatively concise language. And it currently includes just the engine - the user interface is fetal, error handling is minimalistic and the documentation is, well, not for the faint of heart. So, there is a lot to be added yet.

However, code size is not the only metric of complexity. There is mental complexity that is hard to quantify but very real. I felt it deeply when I tried to make some code changes.

The current Manent code is approximately the third generation of the architecture. It has undergone significant redesigns, the major point of which was to reduce the mental complexity.

The first redesign affected the way directories are stored. In the beginning, the directory tree was stored as one large serialized object. This was, of course, wasteful, since the directory trees change very little between backup increments. So, I had to extend the format to store diffs of directory trees. Soon, I realized that storing diffs is also not efficient enough, and switched to multi-level diffs. Now these have some serious complexity. It's not that it can't be done, but it is very hard to pull off in finely dispersed 1-hour coding sessions.

So, I had to wait until I had a serious chunk of time to work on it. Fortunately, a conference conveniently came along, and I gave up any city touring in favor of fleshing out the complete implementation. After several evenings with paper, pen and coffee the design was finally done, but when I got near the keyboard to start writing it, I realized that there was no way I could finish it before the conference was over. Then I said to myself, it should not be so complex! So, I sat back and thought.

At around the same time, I switched from SVN to Git and read a description of Git's inner workings. I borrowed many ideas from Git's design, and I'm not ashamed to admit it - Linus is a very smart guy.

So, I switched the directory encoding to a content-addressed scheme, the same as for the data within the files. There is no need for encoding tree diffs, no complex back-pointing rules, and so the new code was not only much simpler than the original one, but also shorter. Moreover, the simplified container file format allowed me to easily implement complete encryption and compression, which would have been a pain to do before.
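
To show why content addressing removes the need for tree diffs, here is a minimal sketch of the idea in the spirit of Git's object model. It is not Manent's real format: the SHA-1 key, the JSON directory encoding, the in-memory dict used as the store, and the names put and store_tree are all assumptions made for this example.

```python
import hashlib
import json


def put(store, payload):
    """Store bytes under the hex SHA-1 of their content and return the key."""
    key = hashlib.sha1(payload).hexdigest()
    store[key] = payload  # identical content always lands on the same key
    return key


def store_tree(store, tree):
    """Store a nested dict (name -> bytes or dict) and return the root key."""
    entries = []
    for name in sorted(tree):  # sorted so the same tree always hashes the same
        node = tree[name]
        if isinstance(node, dict):
            entries.append([name, "dir", store_tree(store, node)])
        else:
            entries.append([name, "file", put(store, node)])
    # A directory is just another blob: the list of its children's keys.
    return put(store, json.dumps(entries).encode("utf-8"))
```

If two backups differ in a single file, the second call to store_tree adds only the changed file, its parent directories and the new root to the store. Every unchanged subtree hashes to a key that is already present, so the saving that diffs were meant to provide falls out of the storage scheme itself.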

The implementation took one day and squashing the bugs another two, but I have put it through enough testing that I'm fairly confident about it.

So, is Manent complex? Yes, and it has to be - it has many complexities to deal with. But it would have been much more complex if I hadn't had several "Oh, it doesn't have to be this complex" moments. And I hope to have more of those.

Two other cases of mental simplification have been: sharing of container files between backup instances and header summary organization. I'll probably write on those later on.