Saturday, March 28, 2009

One DVCM to rule them all


Given the recent news that GNOME development will be moving to git soon, I decided to take a look at three DVCMs to see how they compare. These were not exhaustive tests: I just came up with a series of operations to perform on a small project and on a big one, and watched how each tool handled them.

I compared git, mercurial and bazaar (the one I use).

The small project was one of my own (a few thousand lines of code... not too big). I took a sequence of ten revisions from it and stuffed them into the three VCMs.

The big project was Linux (the kernel). Given how long some operations took (plus the room it took on my already mostly full box), I only tried two revisions: 2.6.27 (327 MB) and a second one (348 MB).

I measured performance both as the time taken by the operations and as the room taken by the repository. I'm using my dated box (4 years old?) running jaunty, with the packages from its repositories, namely: git, hg 1.1.2 and bzr 1.13.
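For anyone who wants to repeat this kind of measurement, here is a minimal sketch of the two measurements described above: the time an operation takes and the room the repository takes. It is not the procedure actually used for the numbers in this post. The function names `time_command` and `dir_size_kb` are made up for illustration, and the sketch assumes wall-clock time, which may differ from what a shell's `time` reports as user time.

```python
import os
import subprocess
import time


def time_command(argv, cwd=None):
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    # check=True raises if the command fails, so a broken run is not
    # silently recorded as a (very fast) timing.
    subprocess.run(argv, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def dir_size_kb(path):
    """Sum the sizes of all regular files under path, in KiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks, count real files only
                total += os.path.getsize(full)
    return total // 1024
```

Assuming a working tree in a hypothetical directory `linux-git`, `time_command(["git", "status"], cwd="linux-git")` would time one status call and `dir_size_kb("linux-git/.git")` would report the size of the repository data. Running each operation several times and keeping all the samples would make the comparison less sensitive to disk caching.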

Here are the results:

Time Performance
Small Project
On the small project the absolute winner was git. Second was mercurial and third bazaar. Git performed most operations 3 to 10 times faster than mercurial, and the latter did them mostly three times faster than bazaar. Of course, as it's a small project, what git could do in the blink of an eye, bazaar could do in a slightly longer blink of an eye. The slowest operation on all three VCMs was a revert after having deleted the whole project. It took bazaar 1.72 seconds on the biggest revision (the 10th), mercurial made it in 0.56 and git took 0.13 (on the 8th revision... I was already bored with seeing git kicking a55es).

Big Project
Here I expected git to mop the floor with the other VCMs. Given Linus' dislike for slowness (at least where VCMs are concerned) and that bazaar, at least, recommends not using it for big projects, I expected to see git go faster than Usain Bolt. Instead I saw mixed results, with bazaar (the slowest of them all on the small project) amazingly being the fastest sometimes. (Not that I did things people would normally do on a project... I don't think you will work with a 20 MB patch between one revision and another, anyway.)

On the first add (a 300+ MB add) the results were veeeeery strange. Here git dragged itself along and let the others pass it. Mercurial took ~6 seconds, bazaar made it in ~41 seconds, but git... well, it made it in ~106 seconds. That shattered my expectations... and that was just for starters.

Status after an add: on the first revision, mercurial and git were very close at around 3-4 seconds, while bazaar took over 10 seconds. After the add of the second revision, though, the results were very different: ~3 seconds for mercurial, ~14 seconds for bazaar and ~191 seconds for git.

For the first commit, git won easily. It took git ~55 seconds, mercurial did it in ~131 seconds and bazaar arrived when the party was already over at ~203 seconds (mercurial was already drunk and git was on the way to the hospital). On the second commit, however (remember, it's a difference of over 20 MB), they were all much closer. Mercurial arrived last with ~168 seconds, git second with ~144 seconds and bazaar (oh, my!) first with ~130 seconds. Now I didn't expect that.

After the commit, (again) I expected git to fly when asked for a status. But it didn't deliver. Both times mercurial was first, bazaar was second with about double the time, and git was third with about 5 times mercurial's time. By the way, the status after the first commit was about 3 times slower than the one after the second commit on all VCMs.

Now, when I removed all the content of the project (rm * -dfR) and asked for a status, we went back to normal: git first, mercurial second, bazaar third. For both revisions, bazaar stayed around 8-9 seconds. Mercurial made it in ~8 seconds for the first and ~4 seconds for the second. Git made it in ~2 seconds for the first and ~4 for the second.

Then I tried to revert (revert in bazaar and mercurial, reset --hard HEAD in git) after having removed everything. Here the results were strange again. Bazaar finished first both times, with 213 and ~212 seconds. Mercurial and git exchanged places between the two revisions. On the first revision, mercurial made it in ~356 seconds and git in ~435. On the second revision, git made it in ~455 and mercurial in ~459.

Then I tried to go back to 2.6.27. Mercurial took ~88 seconds and bazaar was waaaaaaaaaay behind at ~507. When I reverted back to the last revision, mercurial made it in ~186 and bazaar in ~393. When I tried it in git (with a checkout), reverting to the first revision destroyed (I think) the second revision, so I decided not to include git's times for going back and forth.
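To make the sequence of timed steps easier to reproduce, it can be written down as a small table of commands per tool. This is only a sketch: the post names `revert` and `reset --hard HEAD` explicitly, while the other command lines (including `hg revert --all`, `git add .` and the commit message `import`) are assumptions about a plausible invocation, not a record of what was typed. The names `COMMANDS`, `STEPS` and `plan` are hypothetical.

```python
# Assumed command line for each timed step and tool. Only the revert
# row is taken from the post; the rest is a plausible guess.
COMMANDS = {
    "add": {
        "bzr": ["bzr", "add"],
        "hg": ["hg", "add"],
        "git": ["git", "add", "."],
    },
    "status": {
        "bzr": ["bzr", "status"],
        "hg": ["hg", "status"],
        "git": ["git", "status"],
    },
    "commit": {
        "bzr": ["bzr", "commit", "-m", "import"],
        "hg": ["hg", "commit", "-m", "import"],
        "git": ["git", "commit", "-m", "import"],
    },
    "revert": {
        "bzr": ["bzr", "revert"],
        "hg": ["hg", "revert", "--all"],
        "git": ["git", "reset", "--hard", "HEAD"],
    },
}

# Order of the steps as described in the post. "wipe" stands for
# deleting the working tree by hand (rm), which is not a VCS command.
STEPS = ("add", "status", "commit", "status", "wipe", "status", "revert")


def plan(tool):
    """Return (step, argv) pairs for one tool; argv is None for 'wipe'."""
    if tool not in ("bzr", "hg", "git"):
        raise ValueError("unknown tool: %r" % (tool,))
    return [(step, COMMANDS.get(step, {}).get(tool)) for step in STEPS]
```

Calling `plan("git")` yields the seven steps in order, with `None` as the command for the manual wipe, so a driver script can loop over the list and time every entry that has a command.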

So, I thought that git would be the clear winner and the fact is that I got mixed results. Perhaps people can join in and give me some insights about why it was like that.

Repository Size Performance
Here there were no mixed results. Both for a small project and a big project, bazaar was the clear winner. Mercurial made it second and git arrived last.

Bazaar's repository for the small project was around 10-40% smaller than mercurial's. It was also around a third the size of git's.

On the big project, here are the sizes:
Bazaar: 87152k for the first revision and 99456k for the second.
Mercurial: 148844k and 166122k
Git: 158912k and 228544k
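The sizes above are easier to compare as ratios and as growth between the two revisions. Here is a short sketch of that arithmetic, using only the figures listed above; the names `SIZES_KB`, `growth_kb` and `ratio_to_bazaar` are made up for illustration.

```python
# Repository sizes in k, (first revision, second revision), from the list above.
SIZES_KB = {
    "bazaar": (87152, 99456),
    "mercurial": (148844, 166122),
    "git": (158912, 228544),
}


def growth_kb(tool):
    """How much the repository grew between the first and second revision."""
    first, second = SIZES_KB[tool]
    return second - first


def ratio_to_bazaar(tool, revision):
    """Size of a tool's repository relative to bazaar's (revision is 0 or 1)."""
    return SIZES_KB[tool][revision] / SIZES_KB["bazaar"][revision]


for tool in SIZES_KB:
    print(tool, growth_kb(tool),
          round(ratio_to_bazaar(tool, 0), 2),
          round(ratio_to_bazaar(tool, 1), 2))
```

Between the two revisions the bazaar repository grew by 12304k, mercurial's by 17278k and git's by 69632k, so the gap between git and the other two widens on the second commit.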

Well.... I would have loved to give you a clear winner, but the fact is that there wasn't one. For a small project you already know who the best is in terms of user time and repository size... but for a big project, it's a little more blurry. I urge you to jump into the comments area to give me your thoughts on it.

As usual, keep in mind that, other than performance, there are other differences between all of them in terms of features. Deciding which DVCM to use for a project is not just about performance.

I could have made a mistake when working with mercurial or git, as I didn't know how to use them before writing this article (and I still don't). So if you think I could try to do something a little different, then go ahead, mail me or post a comment. Perhaps we could create a test suite or something to compare their performance.

3 comments:

  1. Did you run `git gc` liberally for these tests? If you didn't then you inadvertently disadvantaged git.

  2. Also, bazaar scales really badly with increased repository depth. You shouldn't look at just two revisions, but try the same with 10000 revisions. Also compare, for example, log times.

  3. Sounds like I have to do more testing with a big project... but not from scratch. Perhaps if I try to move one already established project into a bazaar repository of mine? I'll try that to see how it goes.

    Also, I didn't run git gc. I'll try that and I'll see how it goes then, ok?