Comparison of Intel and CloudFlare zlib patches

Posted on 2014-08-04 in General

These days I care a bit about zlib deflate speed for work. Last week CloudFlare made some noise about having 'rewritten gzip' for massive performance improvements. Which was perhaps a bit exaggerated, since it turns out to mostly be vectorization changes similar to those included in the earlier zlib patches from Intel.

The changes from Intel are somewhat more extensive, since they include two completely new deflate strategies ('quick' for use at level 1, 'medium' for use at levels 4-6). There's a higher risk of bugs with such changes than with 'just' vectorizing existing scalar code. But those new strategies can be disabled at compile time, so even if that's a worry it would've been nice if CloudFlare's changes had been built on top of Intel's earlier ones. (The most likely reason they weren't is simply not knowing about Intel's work, but then this becomes a case of a single web search saving a few days of programming. Intel's changes were pretty well publicized last year and went through the zlib mailing list.)

Anyway, since there are now two separate zlib performance forks, I thought it'd be useful to compare them a bit.

The summary is that CloudFlare's fork is much faster with decompression and a little faster with high compression levels, while Intel's is faster with low and medium compression levels.

It seems likely that one could get the best of both worlds. At least a superficial analysis suggests that many of the changes don't conflict in principle (even if they often conflict in practice, due to touching the same areas of the code). For example the decompression speedups in CloudFlare's version appear to come mostly from an SSE-optimized CRC32 implementation. Intel's version also includes similar optimized CRC32 code, but as a separate entry point used only for compression (doing a combined memory copy and checksum).
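For illustration, the fused copy-and-checksum idea can be sketched with Python's zlib binding. This is only a toy sketch of the concept; the actual Intel patch does this in C with PCLMULQDQ intrinsics, not chunked Python loops:

```python
import zlib

def copy_with_crc(src: bytes, chunk_size: int = 4096):
    """Copy data chunk by chunk while updating a running CRC32 in the
    same pass, instead of doing a memcpy followed by a separate crc32."""
    out = bytearray()
    crc = 0
    for i in range(0, len(src), chunk_size):
        chunk = src[i:i + chunk_size]
        out += chunk                   # the "copy" half of the fused operation
        crc = zlib.crc32(chunk, crc)   # the "checksum" half, updated incrementally
    return bytes(out), crc

data = b"some input data" * 1000
copied, crc = copy_with_crc(data)
assert copied == data
assert crc == zlib.crc32(data)  # matches a one-shot CRC over the same bytes
```

The point of the fusion is that the data only needs to travel through the cache once for both operations.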

As for production use, the new compression strategies in Intel's version had some early bugs. They are fixed now, and at least in our internal testing we haven't seen any new problems. So we'll probably switch to it in our next release. CloudFlare's changes are more conservative in that sense. Unfortunately there's a license time-bomb in that version, since it now includes a GPLv2 source file. That matters to me, but of course won't be an issue for some.

For more details, continue reading.


The test script checks out and compiles the specified zlib versions, and then uses the generated minigzip binaries to compress and decompress a few test files, possibly at different compression levels. When testing decompression, all zlib versions are tested using the same compressed file, generated with the baseline version at the default compression level.

Each benchmark run (of 10 compressions or 200 decompressions) was repeated 5 times, with different zlib versions being interleaved to eliminate any systematic biases (e.g. system load or thermal throttling). The numbers reported in the following tables are the mean and the standard error of the execution times.
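The reported statistics can be computed as below (a minimal sketch; the timing values are hypothetical, not from the actual runs):

```python
import math
import statistics

def mean_and_stderr(times):
    """Mean and standard error of repeated benchmark timings."""
    mean = statistics.mean(times)
    # Standard error = sample standard deviation / sqrt(number of repeats)
    stderr = statistics.stdev(times) / math.sqrt(len(times))
    return mean, stderr

runs = [0.84, 0.85, 0.83, 0.84, 0.84]  # hypothetical timings of the 5 repeats
m, se = mean_and_stderr(runs)
print(f"{m:.2f}s±{se:.2f}")  # → 0.84s±0.00
```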

The test corpus included English text in HTML format (The Count of Monte Cristo), an x86-64 executable, a basically incompressible jpeg file, and the pixel data from an RGB image after png compression filtering had been applied.

The benchmarks were compiled using gcc 4.8.2 and run on an i7-4770, using the following command line.

CFLAGS='-msse4.2 -mpclmul -O3' perl --output-format=json --output-file=results.json --compress-levels=1,3,5,7,9 --compress-iters=10 --decompress-iters=200 --recompile

(Note the CFLAGS if you want to rerun the test. At least on my machine the configure script in CloudFlare's version doesn't autodetect SSE 4.2 / PCLMULQDQ support without those flags.)


Compression level 1

This is essentially a comparison between the original 'fast' deflate strategy and the new 'quick' one. Which is to say, these results aren't very comparable at all. Basically the Intel version adds a completely new point to the performance vs. compression ratio curve (trading a fairly big amount of compression for an even bigger speedup). It seems like a worthwhile option to have.

compress executable -1 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.37                0.37                0.46
Execution time       0.84s±0.00 (100%)   0.59s±0.00 (70%)    0.30s±0.01 (36%)

compress html -1 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.39                0.39                0.54
Execution time       0.44s±0.00 (100%)   0.33s±0.00 (73%)    0.22s±0.02 (49%)

compress jpeg -1 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    1.00                1.00                1.05
Execution time       0.68s±0.01 (100%)   0.58s±0.00 (85%)    0.25s±0.00 (36%)

compress pngpixels -1 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.17                0.17                0.23
Execution time       0.51s±0.00 (100%)   0.34s±0.00 (66%)    0.18s±0.00 (35%)
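The performance vs. ratio tradeoff between compression levels is visible even with the stock zlib bundled with Python. A toy stand-in corpus is used below (the hypothetical sample data says nothing about the ratios in the tables above):

```python
import zlib

# Hypothetical stand-in corpus: moderately repetitive ASCII text
data = b"The quick brown fox jumps over the lazy dog. " * 2000

for level in (1, 3, 5, 7, 9):
    compressed = zlib.compress(data, level)
    assert zlib.decompress(compressed) == data  # round-trip sanity check
    print(f"level {level}: ratio {len(compressed) / len(data):.3f}")
```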

Compression level 3

This is very close to a like-for-like comparison, since all library versions are using the same strategy ('fast') with the same parameters. The results aren't identical, but the achieved compression ratios are essentially the same. Intel's version is marginally faster.

compress executable -3 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.35                0.35                0.36
Execution time       1.21s±0.01 (100%)   0.79s±0.00 (64%)    0.77s±0.01 (63%)

compress html -3 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.36                0.36                0.35
Execution time       0.70s±0.00 (100%)   0.54s±0.00 (76%)    0.46s±0.00 (66%)

compress jpeg -3 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    1.00                1.00                1.00
Execution time       0.68s±0.00 (100%)   0.59s±0.00 (86%)    0.60s±0.00 (87%)

compress pngpixels -3 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.15                0.15                0.16
Execution time       0.95s±0.00 (100%)   0.61s±0.00 (63%)    0.48s±0.00 (50%)

Compression level 5

This is a comparison of the new 'medium' strategy to the old 'slow' one. The compression ratios are essentially the same. Intel's version is significantly faster on the compressible data, slower on the incompressible data.

compress executable -5 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.33                0.33                0.34
Execution time       1.76s±0.01 (100%)   1.20s±0.00 (68%)    0.99s±0.01 (56%)

compress html -5 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.34                0.34                0.33
Execution time       1.08s±0.00 (100%)   0.85s±0.00 (79%)    0.57s±0.01 (52%)

compress jpeg -5 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    1.00                1.00                1.00
Execution time       0.69s±0.00 (100%)   0.61s±0.00 (88%)    0.75s±0.00 (108%)

compress pngpixels -5 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.14                0.14                0.14
Execution time       1.35s±0.01 (100%)   0.85s±0.00 (62%)    0.64s±0.00 (47%)

Compression level 7

Another like-for-like comparison, this time with the 'slow' strategy. Both optimized versions are noticeably faster than the original, but CloudFlare's version is marginally faster.

compress executable -7 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.33                0.33                0.33
Execution time       3.90s±0.01 (100%)   2.62s±0.01 (67%)    2.73s±0.01 (69%)

compress html -7 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.33                0.33                0.33
Execution time       1.89s±0.01 (100%)   1.58s±0.01 (83%)    1.66s±0.01 (87%)

compress jpeg -7 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    1.00                1.00                1.00
Execution time       0.69s±0.00 (100%)   0.61s±0.00 (88%)    0.61s±0.01 (88%)

compress pngpixels -7 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.13                0.13                0.13
Execution time       4.09s±0.01 (100%)   2.74s±0.00 (67%)    3.17s±0.01 (77%)

Compression level 9

Basically the same as level 7.

compress executable -9 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.33                0.33                0.33
Execution time       9.88s±0.01 (100%)   7.49s±0.01 (75%)    7.72s±0.01 (78%)

compress html -9 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.33                0.33                0.33
Execution time       2.94s±0.01 (100%)   2.55s±0.00 (86%)    2.63s±0.01 (89%)

compress jpeg -9 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    1.00                1.00                1.00
Execution time       0.69s±0.00 (100%)   0.61s±0.01 (89%)    0.60s±0.00 (87%)

compress pngpixels -9 (10 iterations)
                     baseline            CloudFlare          Intel
Compression ratio    0.12                0.12                0.12
Execution time       27.42s±0.03 (100%)  20.19s±0.04 (73%)   22.06s±0.02 (80%)


Decompression

CloudFlare's version decompresses a lot faster than either of the other two.

decompress executable (200 iterations)
                     baseline            CloudFlare          Intel
Execution time       5.48s±0.01 (100%)   4.71s±0.02 (85%)    5.51s±0.02 (100%)

decompress html (200 iterations)
                     baseline            CloudFlare          Intel
Execution time       3.43s±0.04 (100%)   3.01s±0.07 (87%)    3.42s±0.04 (99%)

decompress jpeg (200 iterations)
                     baseline            CloudFlare          Intel
Execution time       2.43s±0.12 (100%)   1.39s±0.25 (57%)    2.48s±0.15 (102%)

decompress pngpixels (200 iterations)
                     baseline            CloudFlare          Intel
Execution time       3.68s±0.03 (100%)   2.97s±0.03 (80%)    3.70s±0.03 (100%)


Book Review: Worm, a web serial

Posted on 2014-05-24 in Books


An awesome, somewhat addictive, fairly dark and finished free novel about people with superpowers, published as a very long web serial. I recommend this to anyone who generally likes SF, and either has plenty of free time or sufficient self control to only read a few chapters a day.

Longer review

Worm by Wildbow (John McCrae) is a superhero web serial that Christophe pointed me at. He sold it to me roughly as "it's about superheroes who use their powers in a mostly intelligent and often surprising manner". Which was vague enough to spoil nothing and specific enough to make me have a look.

I came to work the following day extremely bleary-eyed, having read until the morning and only having had time for a couple of hours of sleep before I had to be back at work. And that was kind of the pattern for the next couple of weeks, because this thing is long, and sufficiently compelling that most of my free time was spent reading it. It's all the more impressive for being written by one guy, part time, updating 2-3 times a week without fail, over a couple of years.

There's a couple of separate bits to Worm that I find interesting. First, there's the question of the work itself. And then there's the publishing / business model. I'll discuss these in that order.

The Work

The world building is absolutely top notch. Incredibly imaginative, and mostly internally consistent. I like the characters (even when I don't like the persona, I like the characterization). A decent comparison would be the early Vorkosigan books by Bujold. I love the protagonist's angle of figuring out creative ways to abuse her superpower, and her continuing annoyance at other superheroes for not doing the same. It's got a bit of a Harry Potter and the Methods of Rationality feel to it (but doesn't suffer from the many weaknesses that HP:MoR has as a story rather than as a rationalist screed).

McCrae is very skilled at setting up cliffhangers. That's probably a requirement for the web serial format where the most important part is to make sure people come back regularly for their next fix. But it's really dangerous for a completed work. It never feels like there's a satisfying place to stop :-)

As a particular rarity, I love the ending. So many epic scifi or fantasy tomes either end with a whimper, or end with supposedly cosmic nonsense that I just can't make any sense of. Worm avoids this completely. It builds up to an awesome ending sequence, which is clearly foreshadowed right from the beginning but that I never would've guessed would actually happen. And then it even resolves everything in a very satisfying and fitting way. I kind of wish the epilogues hadn't been written, or at least that I hadn't read them. But I will say no more about that; I understand why the epilogues exist and why some (most?) people would like them. It's purely a matter of taste.

There are some bad parts as well.

The book has not been professionally edited. I have no major complaints about the language, though there are some annoying verbal tics, and I'm not a huge fan of the longer dialogues. As a more serious matter, many parts of the story could be tightened significantly. One friend commented that during particularly combat-heavy sequences he'd start just reading the first sentence of each paragraph, because the action felt like it was moving at one second per 5 paragraphs. I don't agree entirely, but some sections were a bit of a slog. It feels like a ruthless editor could do the book a big service by cutting away the right 20%.

It's also a very dark story. Can you imagine something bad happening to someone? There's a pretty good chance it'll happen to a major character, or happened as part of their backstory. Good things happen, but much less often. It's not casual cruelty, nor any kind of misery porn (unlike what e.g. A Song of Ice and Fire seemed to devolve into). But it is a bleak and potentially quite depressing world. I definitely wouldn't let kids read this.

But anyway, overall I really like it, and recommend it with the above caveats.

The Business Model

There's been a lot of talk about what's going to happen to the publishing industry in the future (see for example the related posts on the blog of Charlie Stross, both by him and guest bloggers). And there have been all kinds of experiments in that space. Worm is at the extreme end of those experiments.

This is the first web serial I've ever come across that felt like an actual business rather than hobbyist fanfiction. (I gather from some interviews that it wasn't set up as such, but just a way for the author to force themselves to stick to one project). It maybe should not be a surprise that the model exists. A bunch of people appear to be able to make a living from long running webcomics (geez, have I really been reading Sluggy Freelance for almost half my life?). Why wouldn't it be possible for a novelist to do the same?

One major difference in the business model is that Worm ran entirely on donations, not on ads. This is quite admirable. Another difference is that from the outside it feels like the amount of continuous work going into Worm must have been much higher than for the typical "3 panes, 6 times a week" webcomic. And finally, while I don't know the full financials of the operation, part of the donations are done over the recurring donation system Patreon, and those donations are public. Currently it's at a recurring $1200 per month. Maybe people are uncomfortable with that method (I certainly was, and used PayPal instead). But at least based on the public information, the work/donations combination doesn't really look sustainable, which is a real shame.

But then again, authors being published for the first time through traditional channels are unlikely to make a living just by that either. And while it seems likely that the ARPU of a web serial must be pretty low, it must be way easier to reach new readers than in traditional publishing. I buy very little fiction from new authors these days. There's just no point since I already have a big backlog of books from authors I'm a fan of. From my point of view a publisher's brand gives absolutely no marketing benefit, or even credibility, to a new author. The only way I'll find new authors is through positive word of mouth, and a web serial is clearly much more viral than a book, whether physical or e-book, as long as the author can hook the reader within the first 10 minutes.

This feels like the future. It might be a depressing future, but then again that's what every kind of business feels like these days.

McCrae has already started a new serial, Pact.


Android on a Netbook - an Eee Transformer Review

Posted on 2012-02-22 in General

When the Eee Transformer was announced a year ago, I thought it was a brilliant idea. There had been some earlier attempts at Android netbooks (e.g. the Toshiba AC100), but they were clearly not serious devices. They had no access to Google apps, were running Android 2.x for small values of x, and had no touch input. A fully supported Honeycomb tablet with an optional full laptop keyboard looks a lot more reasonable.

So I got one as soon as they became easily available around here. Here are my impressions of the device after over half a year of using one, including a couple of weeks of trying to use it as my main computing device.

You might wonder what the point of a review is for a device that was announced a year ago, especially since it has already been obsoleted by the Transformer Prime. Mostly it's that the review was written a while back and never posted, but the subject of Android on laptop-like devices seems to be in geek news a bit these days (see e.g. Ubuntu for Android or How to use the Galaxy Nexus as a desktop replacement). And I expect that many of these observations carry over to the Prime, or other future Android devices that use the same form factor.

The tablet

The Transformer is a perfectly fine Honeycomb tablet for its generation when it comes to hardware. It's maybe got a slightly larger bezel than normal due to needing to be as wide as the keyboard dock, which in turn needs to be of a certain size for the keyboard to be usable. My unit did have some bad RAM, which made especially the browser crash a lot, but diagnosing and fixing bad RAM is easy enough.

On the software side Asus seems to be much better than other manufacturers about getting updates out, and instead of skinning the system they try to differentiate by merely installing some random crapware that doesn't interfere at all with normal use. The upgrade to Android 4.0 has been a long time coming, but no other vendors seem capable of pushing it out either.

Overall, I use the Transformer as a tablet daily and am really happy with it. Any complaints I have would apply to all other Android tablets as well.

The dock

But then again, as a pure tablet the benefits of a Transformer over other tablets aren't that major either. Transformer's big differentiator is the keyboard dock, which converts it from a tablet to a netbook. The dock has three functions: it doubles the battery capacity, has a multitouch trackpad for using touch-based applications without touching the screen, and of course it can be used for typing.

The battery

The extra battery life is just awesome, especially when traveling. It's like a super-light laptop that gets 16 hours of active use and is instantly usable after opening the lid.

The trackpad

The trackpad is totally useless. First, it doesn't seem to have reliable detection for accidental touches, e.g. with the palm when typing. Unless you're very careful, random touch events will be registered all over the place. It's way worse than with a normal trackpad. Second, mapping touches on the trackpad to touches on the screen just feels unnatural and hard to use, even with on-screen indicators for where the machine considers your virtual fingers to be located.

Unfortunately this touch emulation might be the lesser of two evils. If you plug in a USB mouse it'll work as a normal mouse, moving the cursor around on the screen. But apps built for touch just won't behave the way you'd expect them to when used with the pointer. It's easy to see why Asus decided to default the trackpad to work like a touch screen rather than a pointing device.

In practice I've had to toggle the trackpad completely off when the Transformer is docked. But that means using the screen for touch input, which isn't much better when there's an intervening keyboard. It's not at all comfortable to use due to gorilla arm syndrome.

The keyboard

The hardware of the keyboard is pretty reasonable for typing. The keys are a bit too small, but I got used to it quickly. A bigger issue is that the keys also don't feel like they have enough travel, and the flat profiled chiclet keycaps are pretty horrid compared to e.g. the contoured keys on the Thinkpad x100 series. But all of this is easy to justify due to the design constraints. E.g. making the keyboard larger would also require making the tablet part larger. Likewise more keyboard travel would presumably require a thicker device.

From a software point of view the keyboard situation is less good. The keyboard support is undocumented, minimal, and occasionally flaky even in Google's apps. Third party apps tend to be even less keyboard-friendly. The browser app has some minimal shortcuts, but a lot of things you'd expect in a real browser are missing. For example you can't do some basic tasks like switching tabs using the keyboard. In the gmail app the normal shortcuts from the web UI might work as expected in one view, but then suddenly not in another view. The task switcher has no keyboard support at all. And so on - I don't think I've been happy with the keyboard support in any software.

This might seem like nitpicking. But since using the touch screen is so uncomfortable when the tablet is docked, this is actually a major issue.

The Transformer as a netbook / laptop

After having had the Transformer for a while I noticed that the only way I was using the dock was as a case. So as an experiment I resolved to use it as my main computer while going on vacation. It took a lot of preparation to get the system into a state where I thought it would be usable, and in the end it still didn't work very well.

Some of the issues were esoteric things that won't matter to most users (my Mom isn't going to care about Emacs keybindings). But I don't think it's just a matter of me looking at it from a hacker viewpoint. Even mundane tasks that technically are possible on Android - for example using gmail or Skype - were just incredibly annoying compared to doing the same task on a laptop. I was very happy when the experiment was finally over and I could use a real computer without feeling guilty.

Some basic functionality, like multitasking and copy-paste, is in an acceptable state for a phone or maybe even a tablet, but not for a computer. For example I wanted to read a friend's PhD thesis. I tried 3 pdf readers, and the usability of all of them sucked. But what was even worse was trying to read the thesis and write notes in a separate app. The task switching was just unbelievably clunky, and I never had any confidence that the pdf readers would restore their state correctly after I switched back.

It's hard to describe just how awful the multitasking experience is compared to a desktop OS, and how much it ends up mattering for usability. Not being able to have two applications visible at the same time is bad. Not being able to reliably and effortlessly switch between applications is crippling.

You might not realize how often you're switching contexts before using a system where it's not easy to do. Reading something in a chat or on a web page and need to punch in some numbers? In X my muscle memory opens a calculator automatically in a fraction of a second, with minimal interruption to the flow. In Android? It's a ridiculous process of much tapping, swiping and delays regardless of whether you need to start the calculator from the home screen or can use the recent apps menu to switch to it. Need to quickly check some chat windows and then resume whatever I was doing? Again a fraction of a second on a computer, an ordeal in Android. This stuff piles up very quickly.

The automatic process management that Android does feels totally absurd on a device like this. It's obvious what the point of it is on a possibly very resource-constrained phone. But once you stop thinking of the device as a tablet and treat it as a computer, your applications potentially being killed and needing to restore their state - possibly imperfectly - just becomes intolerable. "Oops. Hope you didn't actually need that ssh session".

While I don't feel software-deprived on Android when using it as a tablet or phone OS, the expectations are very different for a general purpose computer. Either the software isn't there at all, or it's very crippled since it's not really intended for serious use.

Even some software needs that I expected to be trivial turned out to be painful. The best example is trying to edit a Google Docs spreadsheet, which you might very well expect to be an easy operation. There's an Android app for Google Docs, but it turns out to be just a wrapper for the completely crippled mobile browser version, with (once again) not even minimal keyboard support, and with only the crudest editing capabilities. It was ok for viewing spreadsheets though. There's an option to switch the wrapper to use the normal desktop browser version, but it's so slow that even scrolling to the right part of the spreadsheet took a dozen tries of iteratively over- or undershooting the target.

On the more hackerly side of things, stuff I expected to be just simple matters of programming turned out to be impossible. For example have you ever wondered why there are no editors with Emacs keybindings on the Android market? Or terminal software that could pass all Emacs keybindings through properly? It turns out not to be because nobody wants one enough to do the work, but because it's just not possible. Sure you might get the proper keyboard event for something simple like C-a, but for C-_ you're going to just get an _ event while for M-_ you'll get nothing.

Likewise while my programming needs on vacation were pretty modest (just wanted to prototype some things), it is pretty depressing that the only reasonable way to do it was to run an Ubuntu installation in a chroot. It's almost offensive that you can't realistically use an Android computer for developing native Android apps.

Now, in the beginning Android was a lot more agnostic about the input methods used. That support has been steadily eroding as the environment has moved towards touch as the main input method, in search of slicker user interfaces. And this is totally fine. But it does mean that trying to move back towards multiple input methods is going to be a tough job. Even if you get everything right in the core OS, you'd still need to get the application writers on board.

Ubuntu for Android looks like it might solve this need to have the tablet work almost completely differently in normal vs. docked modes. But it's still vaporware, and running two parallel userlands is hardly an elegant solution.


I'm happy with the Transformer as a tablet. It's good for playing games, web browsing, watching videos on the plane, etc.

As a laptop or netbook the Transformer is a total failure, and I can't see this form factor ever working out for vanilla Android. It's too different from the main uses of the OS. Nobody seems to care about this use case, and the changes needed to make it a serious general purpose computer would probably hurt Android on the main target platforms. The main bright spot of the Transformer in netbook form is the incredible battery life, but it's hard to be happy about that when the experience as a whole is so lacking.

It's possible that this is finally the big chance for Linux on the desktop, but it's not the kind of Linux I like using, nor the kind I like programming for.


Game design diary: minimal 18xx train rush game #1

Posted on 2010-10-11 in Games

Time for a new board game design attempt. This time I'm going to try documenting my design process, mostly to motivate myself to actually finish the design this time. The inspiration was the recent spate of lighter variations of the 18xx system, none of which really does what I want. If you're not familiar with 18xx, this post probably won't make much sense, and you might as well stop reading. Sorry.

My goals:

  1. Strip away everything possible from an 18xx game except for the train rush, but try to retain the feel of the rush. Lokomotive Werks tried to do something along these lines, but I didn't really care for it for various reasons.

  2. Aim for a 90-minute game for the first play.

  3. Fully deterministic with no hidden information.

  4. It's a plus if the mechanisms make more sense for something that isn't train-themed. (Some people I play with have an allergy to trains).

The first decision that had to be made was whether players represent companies, or investors that might control several companies. The latter seemed much more likely to create interesting turn order and timing issues. Features like collusion between the same player's companies are pretty central to the way an 18xx train rush unfolds. And the possibility of controlling multiple companies also creates the delicious dilemma of companies being both assets and liabilities.

But having companies be separate entities has the problem that the game then also needs to keep at least a limited form of share dealing in the game. This is in conflict with both of the first goals.

After a couple of days of doodling around with the game I hit upon the idea of keeping the companies, but totally gutting the 18xx stock round. The only thing that you can do on a stock action would be to start a company: set the par price, buy 50-100% of the company, which will then receive 10*par price as initial capital. No selling or buying of shares. The latter would allow cutting out the stock market entirely.

For operating companies I absolutely did not want any kind of board play. Again the goal is to cut ruthlessly on the first pass. The simplest possible solution would be to have each train produce a constant income every turn until it rusts (e.g. Railroad Barons), but that felt a little too spartan for me. There should be something players can do to differentiate two identical companies.

My first stab at this is that a train won't necessarily produce income every turn. Instead when you first buy one, you put 1-3 tokens on top of it. When the company operates, flip one token on each train. Any train that has all of its tokens flipped produces income that grows non-linearly in the number of tokens: for one token $x, for two tokens $2.5x, for three tokens $4.5x. After a train has produced income, populate it with new tokens (possibly a different number than before).
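The tradeoff can be sketched numerically (the value of x and the turn count are arbitrary, and this assumes a train is always re-primed with the same token count):

```python
# Income per priming cycle: 1 token pays x, 2 tokens pay 2.5x, 3 tokens pay 4.5x
PAYOFF = {1: 1.0, 2: 2.5, 3: 4.5}

def income_over_turns(tokens: int, turns: int, x: float = 10.0) -> float:
    """Total income from one train over a fixed number of operating turns,
    re-primed with the same token count each cycle. A cycle pays out after
    `tokens` turns, since one token is flipped per turn."""
    cycles = turns // tokens
    return cycles * PAYOFF[tokens] * x

# Over 6 turns with x = $10: 1-token priming yields $60,
# 2-token priming $75, 3-token priming $90.
for n in (1, 2, 3):
    print(n, income_over_turns(n, 6))
```

So the slow 3-token schedule pays half again as much over the same span, assuming the train lives that long and the income isn't needed earlier.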

This seemed like a system with some potential: mechanically still very simple, but offering some interesting decision space when it came to the number of tokens you place on a train. You'd much rather prime a train with 3 tokens once than with 1 token thrice. But maybe you need the money sooner (or at least an option for having the money). And how long will the train live anyway? Somebody being too optimistic with their scheduling might even prompt other players to short-schedule their trains, withhold the money, and accelerate the train-buying.

As a bonus this also satisfied goal 4. This mechanism makes no thematic sense with trains, but the whole "small return safely and quickly" vs. "large risky payoff later" should have some resonance with other themes. The first couple of possibilities that popped into my mind were technology companies (where the "trains" would represent R&D projects) and trade fleets in the Age of Exploration. I'm sure there are more.

Other than that the structure of a single 18xx operation round seems ok. First run trains, then give a dividend or withhold the income (though no stock price effect), finally buy trains either from the bank at face value or from another company controlled by the same player. Every company must always own a train. If it can't afford one, then the controlling player will pay for one (this might not work in practice given shares can't be sold: it might be too easy to go bankrupt). There should probably also be a train limit.

I figure the game should have a constant alternation between an SR and sets of 2 ORs, and since it's explicitly a train rush game, using either the 1844 or 18FR-RCE rule for the foreigners buying a train from the bank at the end of each set of ORs.

That was enough for a first draft, so I plugged in some numbers. The train schedule and prices were taken from 1856, and I made up an arbitrary payoff table for the different kinds of trains. Finally I set a par price range of $40-$100 and started the players with $300 (so the first choice of the game would be deciding which of 50-70% to go for).
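Assuming standard 18xx 10% share certificates (the post doesn't spell this out), the $300 starting cash already pins down which par/percentage combinations are affordable at all:

```python
START_CASH = 300

# Assuming standard 18xx 10% certificates: buying p% costs (p / 10) * par
def affordable_holdings(par: int, cash: int = START_CASH) -> list:
    """All 50-100% holdings a player can afford at a given par price."""
    return [p for p in range(50, 101, 10) if (p // 10) * par <= cash]

for par in (40, 50, 60, 70, 80, 90, 100):
    print(f"par {par}: {affordable_holdings(par)}")
```

Under that assumption only par prices up to $60 are even reachable with a 50% buy, so the effective opening choice is between 70% at $40, 60% at up to $50, or 50% at up to $60.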

I soloed half a game with these rules, and the game seemed to have the right kind of feel. Both the decisions on how many trains to buy each turn and how to run them seemed non-obvious, and there was the same kind of wheels within wheels planning.

The numbers I'd chosen were of course badly off, but that was to be expected and can be tuned. There were a number of worrying issues though:

There probably isn't a runaway winner problem, but there might be a spiral of death problem. If someone ends up running a company with fewer trains than the opponents, they don't have the usual possibility of investing in the opponents' companies to boost their income. They are also unlikely to be able to found a new company very soon, and are thus much more likely to have dead cash after an SR. Possibly it should be possible to sell shares (down to 20%?) back at the original price as a stock action.

A totally fixed company operating order is too restrictive and predictable; there needs to be some way to manipulate it. I think this should be done in the SR, ideally making stock priority more important. Perhaps something along these lines: there's a linear turn order track. A company starts on a space based on its par price (there's probably a couple of extra spaces between each par price space). As a stock action a player can move one token up or down on the track. This costs money: the nth token movement by any player in an SR costs fib(n).
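To make the cost escalation concrete, here's a quick sketch (in Python, purely as an illustration; the convention fib(1) = fib(2) = 1 is my assumption, not something fixed by the rule above) of what the nth move would cost and how fast the total adds up within one SR:

```python
def fib(n):
    # Fibonacci with fib(1) = fib(2) = 1 (an assumed convention).
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

# Cost of each successive token movement in one SR, and the running total.
costs = [fib(n) for n in range(1, 7)]
print(costs)       # first six moves: 1, 1, 2, 3, 5, 8
print(sum(costs))  # six moves already cost 20 in total
```

The point of the Fibonacci schedule is that the first move or two is nearly free, but spamming turn-order manipulation gets expensive very quickly.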

I also suspect some company differentiation might need to be added later on. The current idea is that some companies get bonuses for running trains of a certain level.


Pretty SBCL backtraces

Posted on 2007-12-19 in Lisp

Every now and then I see complaints about the stacktraces in SBCL. They contain too little info, or too much info, or are formatted the wrong way, etc. But the backtrace printing isn't really any dark magic, it's just basic Lisp code. If you don't like the default format, just write a new backtrace function that prints something prettier/less cluttered/more informative/etc.

For inspiration, below is one implementation, based on a really quick hack I wrote in answer to a c.l.l post a few weeks ago. In addition to cosmetic changes, it adds a couple of extra features: printing filenames and line numbers for the frames when possible, and printing the values of local variables when possible. Just call backtrace-with-extra-info in any condition handler where you'd normally call sb-debug:backtrace, or call it from the debugger REPL instead of using the backtrace debugger command.

The code assumes that you've got Swank loaded. For best results, compile your code with (debug 2) or higher.

(defun backtrace-with-extra-info (&key (start 1) (end 20))
  (swank-backend::call-with-debugging-environment
   (lambda ()
     (loop for i from start to (length (swank-backend::compute-backtrace
                                        start end))
           do (ignore-errors (print-frame i))))))
(defun print-frame (i)
  (destructuring-bind (&key file position &allow-other-keys)
      (apply #'append
             (remove-if #'atom
                        (swank-backend:frame-source-location-for-emacs i)))
    (let* ((frame (swank-backend::nth-frame i))
           (line-number (find-line-position file position frame)))
      (format t "~2@a: ~s~%~
                   ~:[~*~;~:[~2:*    At ~a (unknown line)~*~%~;~
                             ~2:*    At ~a:~a~%~]~]~
                   ~:[~*~;    Local variables:~%~{      ~a = ~s~%~}~]"
              i
              (sb-debug::frame-call frame)
              file line-number
              (swank-backend::frame-locals i)
              (mapcan (lambda (x)
                        ;; Filter out local variables whose values we
                        ;; don't know.
                        (unless (eql (getf x :value) :<not-available>)
                          (list (getf x :name) (getf x :value))))
                      (swank-backend::frame-locals i))))))
(defun find-line-position (file char-offset frame)
  ;; It would be nice if SBCL stored line number information in
  ;; addition to form path information by default. Since it doesn't,
  ;; we need to use Swank to map the source path to a character
  ;; offset, and then map the character offset to a line number.
  (let* ((location (sb-di::frame-code-location frame))
         (debug-source (sb-di::code-location-debug-source location))
         (line (with-open-file (stream file)
                 (1+ (loop repeat char-offset
                           count (eql (read-char stream) #\Newline))))))
    (format nil "~:[~a (file modified)~;~a~]"
            (= (file-write-date file)
               (sb-di::debug-source-created debug-source))
            line)))

For example on the following code:

(declaim (optimize debug))
(defun foo (x)
  (let ((y (+ x 3)))
    (+ x y)))
(defmethod bar ((n fixnum) (y (eql 1)))
  (foo (+ y n)))

The old backtrace would look like:

1: (FOO 4)
2: ((SB-PCL::FAST-METHOD BAR (FIXNUM (EQL 1))) #<unused argument> #<unused argument> 3 1)
3: (SB-INT:SIMPLE-EVAL-IN-LEXENV (BAR 3 1) #<NULL-LEXENV>)

And the new backtrace like:

1: FOO
   At /tmp/test.lisp:5
   Local variables:
     X = 4
     Y = 7
2: (SB-PCL::FAST-METHOD BAR (FIXNUM (EQL 1)))
   At /tmp/test.lisp:8
   Local variables:
     N = 3
     Y = 1
3: SB-INT:SIMPLE-EVAL-IN-LEXENV
   At /scratch/src/sbcl/src/code/eval.lisp:93 (file modified)
   Local variables:
     ARG-0 = (BAR 3 1)
     ARG-1 = #<NULL-LEXENV>

An improvement? That's probably in the eye of the beholder, and depends on the codebase and the use cases. For example I can imagine that for large functions showing the values of local variables in the trace would make it way too spammy. But that's beside the point: if the default stacktrace format is making debugging difficult for you, it's not hard to customize it.


Faster SBCL hash-tables

Posted on 2007-10-01 in Lisp

Long time, no blog. I have an excuse though, since I moved to Switzerland for a new job a month ago, and haven't had a lot of time for things like blogging or hacking Lisp (the latter is usually a prerequisite for the former for me).

Anyway, I finally finished and committed the third rewrite of my patch for speeding up the embarrassingly slow hash-tables in SBCL. It turned out to be a really frustrating game of whack-a-mole, with every change uncovering either some new deficiency or another interaction between the GC and the hash-tables that the old implementation had handled by always inhibiting GC during a hash-table operation.

The main user-visible change is that SBCL no longer does its own locking for hash-tables (the fact that it locked the tables was always just an implementation detail, not a part of the public interface). This follows the usual SBCL policy of requiring applications to take care of locking when sharing data structures between threads.

The exact details are pretty boring, so I won't repeat them here (read the commit message if you really want to know). Instead I'm just going to post a pretty benchmark graph, since it's been way too long since I last did one of these:

Sadly those improvements don't mean that SBCL now has the fastest hash-tables in the West, it just means they don't completely suck. For some reason the issue of SBCL hash-table speed has come up more often during the last couple of months than during the previous three years combined, so it was probably time to get this sorted out.


ICFP 2007

Posted on 2007-07-25 in Lisp

For the last five years or so it's always been my firm intent to take part in the programming contest associated with the International Conference on Functional Programming (ICFP). And each year something has prevented it. But this year there was no emergency at work, no computer hardware broke, no sisters were getting married, etc. So instead of playing poker on the net, which had been consuming all of my free time for the last couple of weeks, I read the 22 page spec and fired up emacs. (Just kidding, emacs was already running).

The surface task was to write an interpreter for a weird string-rewriting language. The organizers supplied a large blob of data, which when run through the interpreter would produce as output some image drawing operations (for which you basically had to write some kind of a visualizer if you wanted to achieve anything). The goal was to come up with some prefix to the program which would make it instead produce output that would be as close as possible to a certain target image.

The intended way to achieve that goal was to notice that the drawing operations generated by the blob would first write a clue message, which would then be hidden in the final image by other image operations. This seems like a really bad decision. I luckily noticed the message since my first version of the image conversion tool didn't support the flood fill operation. But apparently a lot of teams never saw the message, and were left to stumble in the dark for the whole weekend. The image that could be drawn by using the clue would then lead to another obscure puzzle. Again, I was lucky to figure out the solution after a while, but judging by IRC and mailing list traffic a huge amount of teams never got it, and were basically stuck.

That clue could then finally be used to produce some concrete details on how the big blob of data was using the string-rewriting language to produce the image. There was even a catalog of the functions that the blob contained. But the really useful data seemed to be hidden behind yet more puzzles. So at this point I just did a minimal hack to make a token improvement to the produced image: the source image had a grove of apple trees, the target had pear trees. And according to the catalog the function apple_tree was exactly as large as pear_tree. So I wrote a prefix that overwrote the former with the latter. And then I submitted that prefix, and switched to doing something more interesting. (I think that token improvement was still in the top 20 something like 8 hours before the contest ended, which probably says something about how much progress people were making).

I did rather enjoy writing the interpreter and the visualization tool, and the specifications for both were mostly very good. Unfortunately the spec contained only a couple of trivial test cases with the expected results, so if your interpreter had a problem, figuring out what exactly was going wrong just from looking at execution traces was really hard. The organizers originally replied on the mailing list that such debugging "is exactly part of the task", but later released an example trace from a few iterations at the start. There was a documented prefix that would run some tests on the implementation, and generate an image from those results, but the coverage of those tests didn't seem to be very good. (I had several bugs that only showed up with the full image, not with the test one).

The part of the interpreter that many teams seemed to have big trouble with was that you couldn't really use a basic string or array to represent the program. If you did, performance would be orders of magnitude too slow (people were reporting getting 1 iteration / second, when drawing the basic image would require 2 million iterations) due to excessive copying of strings. Now, this was even pointed out in the specification! Paraphrasing: "these two operations need to run faster than in linear time". And still people tried to use strings, bawled when their stupid implementation wasn't fast enough, and decided that the only solution would be to rewrite their program in C++ instead of their favourite Haskell/Ocaml/CL. Good grief...

For what it's worth, I used just about the stupidest imaginable implementation strategy beyond a naive string: represent the program as a linked list of variable length chunks, which will share backing storage when possible. My first CL implementation of this ran at about 5.5k iterations / second. This was good enough at the stage of the competition that I got to, and would've been easy to optimize further if I'd decided to continue (afterwards I made a 15 line change that gave an 8x speedup, so the basic image now only takes 41 seconds to render on an Athlon x2 3800+). And this was with a stupid data structure and a couple of minor performance hacks. It seems obvious that practically any language could have been used to write a sufficiently fast interpreter. It never ceases to amaze me how programmers would rather blame their tools than think about the problem for a couple of minutes.
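The chunk-list idea can be sketched in a few lines (Python here for brevity; my actual interpreter was in CL, and the class and method names below are invented for this sketch). The program is a list of (backing-string, start, end) triples, so dropping a prefix or concatenating shares the existing backing storage instead of copying characters:

```python
# Hypothetical sketch of a "chunk list" string representation.
# drop() and concat() are O(number of chunks), not O(total length),
# because no backing string is ever copied until materialization.

class Chunks:
    def __init__(self, chunks):
        self.chunks = chunks  # list of (backing string, start, end)

    @classmethod
    def from_string(cls, s):
        return cls([(s, 0, len(s))])

    def __len__(self):
        return sum(end - start for _, start, end in self.chunks)

    def drop(self, n):
        """Drop the first n characters by adjusting chunk bounds."""
        out = []
        for s, start, end in self.chunks:
            size = end - start
            if n >= size:
                n -= size  # this whole chunk falls inside the dropped prefix
            else:
                out.append((s, start + n, end))
                n = 0
        return Chunks(out)

    def concat(self, other):
        """Concatenate by sharing both operands' backing storage."""
        return Chunks(self.chunks + other.chunks)

    def to_string(self):
        # Only materialize a flat string when actually needed.
        return "".join(s[start:end] for s, start, end in self.chunks)

p = Chunks.from_string("IIPIFFCPICICIIC")
p = p.drop(3).concat(Chunks.from_string("ICFP"))
print(p.to_string())  # IFFCPICICIICICFP
```

With a representation like this, the rewriting step's "drop the matched prefix, prepend the replacement" operations stay cheap no matter how long the program string grows; a flat string makes each of them a full copy.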

Anyway, the organizers obviously put in a huge effort for this contest, so thanks to them for that. It's just that the format really wasn't what I was looking for in a programming contest. But at least it was interesting enough to temporarily shake me out of playing poker into doing some hacking again :-) (Faster SBCL hash tables coming soon, I hope).

I've made the source code for the interpreter available since several people have asked for it. I'm not sure why they've asked for it, since it's not very good code, and probably contains no worthy insights. But if you want it, it's there.

Addenda: After writing the above, I read a few messages on the mailing list which claimed that there really wasn't much of a puzzle aspect, but that success was mainly determined by the quality of the tools (compilers, debuggers, disassemblers, etc.) you were able to write. While it's possible that after the initial two humps that I described above the puzzles were irrelevant, that wasn't my impression. At the point where I stopped, it didn't feel to me as if sufficient knowledge was available for writing the tools, but rather was hidden behind encrypted pages, steganography, etc. None of which I really wanted to deal with.

There was definitely enough information available to make a start at reverse-engineering, but I don't think there was enough time to reverse-engineer enough of the system to figure out how to write the tools, write them, and then use the tools to actually solve the real problem. I'm sure things were different for larger teams, but that doesn't really comfort me as a one person team :-) My impression is that in the earlier ICFP contests the tasks were such that it was possible for a single programmer to achieve a decent result, even if it's unlikely that it's good enough to win. In this case you don't get any points for the reverse-engineering or for the tools, but just for the end result.

(Having written the above, I'm now sure that the eventual winner will turn out to be a single programmer who only started working on the task 8 hours before the deadline).


Code coverage tool for SBCL

Posted on 2007-05-03 in Lisp

SBCL includes an experimental code coverage tool (sb-cover) as a new contrib module. Basically you just need to compile your code with a special optimize proclamation, load it, run some tests, and then run a reporting utility. The reporting utility will produce some html files. One will contain an aggregate coverage report of your whole system, the others will show your source code transformed into angry fruit salad:

For a more substantial example, here's the coverage output for the cl-ppcre test suite.

There are still some places where the coverage output won't be what most people would intuitively expect. Some, like the handling of inlined functions, would be simple to solve. It's just not yet clear to me what the right solution would be. For example in the case of inlined functions the right solution might be suppressing inlining when compiling with coverage instrumentation, or it might be to say "don't do that, then" to the users. Others are fundamentally unsolvable, due to the impossibility of reliably mapping the forms that the compiler sees back to the exact character position in the source file. Hopefully this'll still turn out to be useful in its current state.

If you have any suggestions for improvements, I'd love to hear them.


ILC 2007 Summary

Posted on 2007-04-11 in Lisp

I wrote several almost finished blog posts during ILC, but didn't get around to posting them "live" due to issues with wireless access and a general lack of time, since I was off having a jolly good time. Then I did some more traveling after the ILC, and didn't manage to get them posted right afterward either. And now that I'm finally back home, most of what I wrote then no longer seems worth posting, since it's lost the immediacy.

So here's a few things that come to mind now.

The good

The organization was stellar in almost all respects. A huge thanks to Nick Levine and anyone else who was involved. Cambridge was just incredibly pretty, and the weather ranged from great to "not bad". There were some very good talks, though disappointingly most of the best ones were from Schemers :-) The last day of talks was particularly good. I had incredible fun meeting old friends, most of whom I hadn't seen for a year, putting faces to names I knew from the net, and talking to completely new people. Special honorable mentions in the latter category go to Jeremy Jones and Richard Brooksby, with whom I had several very interesting and fruitful discussions.

I also got lots of very valuable SBCL feedback and new ideas, for all kinds of things from the GC to the user interface for my code coverage tool for SBCL (work in progress). It looks as if we need to beef up the SBCL marketing department, though. I had several discussions of the form "Q: What would it take to make SBCL do FOO? A: It's already done that for the latest X releases.". In the worst case with the same person asking for three different features in succession, all of which had been implemented :-) For example no-one seems to be aware that SBCL/Slime have stepper support. Not horribly good stepper support, but support nonetheless. Also got to talk shop with SBCL developers and Clozure/ITA people, which is always good. And maybe even managed to offload some ideas that I'd proof-of-concepted, but have no intention of ever properly implementing myself.

Got a surprisingly large number of congratulations on graduating. And the guys had even got me a present (a copy of the Lisp 1.5 manual that Nikodemus had found from a bookshop in Cambridge, MA). Thanks! Conveniently the title of the programming contest for the next ILC was pre-announced as "Lisp 1.5", so the manual might even be useful, not just cool :-)

I think the Ravenbrook guys are going to try integrating MPS with SBCL, since CMUCL didn't work out for them. While it's unlikely to replace the current SBCL GC for licensing reasons (it's currently under a GPLish license), it would be very interesting for two reasons: as a benchmark for the current GC and as a first step towards pluggable GCs. The first one would be good since we know that the SBCL memory management is suboptimal in many ways. It'd be valuable to find out what the real cost of fixing many of those suboptimalities is. As for pluggable GCs, Frode wrote a nice message to sbcl-devel about that. If MPS is better for someone's use case than SBCL's gencgc and they can live with the license, it'd certainly be nice for them to be able to just switch GCs. And of course at some point implement other alternative GCs.

Compared to the ECLMs, surprisingly many people that I talked to weren't yet using Lisp seriously, but were just interested in it. Some might think that this is bad, but I think it's really great that there are people still in that stage who are interested enough to travel to and attend a multi-day Lisp conference. And of course there were a lot more serious Lisp users than newbies.

Overall my ILC experience was very positive. I'll talk next about some bad stuff, but that's just because I believe that you can't just sweep that stuff under the rug.

The bad

I think that program-wise there was maybe a day of talks that could've been discarded with little loss. Or if not a whole day, then at least enough to make the rest of the schedule less tight. For example the History of Lisp presentation was total crap (not just somewhat bad, but "I'd rather listen to an hour of silence"-bad), and the information theory one had no business being presented at a Lisp conference. Given what little I heard of the review process in other cases, I don't understand how the latter ever got accepted.

I understand that people don't really go to a conference for the talks, but that doesn't mean that anything goes. My plea to the next ILC program committee is threefold:

  • Please invite only speakers with something to say that's relevant to Lisp now or in the future, not in the last millennium.

  • More specifically, I'm sure there's a temptation to "honor" the 50th birthday of Lisp by historical navel-gazing. Please don't give in to it.

  • If you don't get enough good submissions, don't accept the irrelevant ones as padding.

My attempts at industrial espionage were mostly a failure. Both Duane and Jans ran out of time before getting around to the stuff that would've been worth stealing. For example Duane didn't have time to demo their profiler, which I'd heard described as the gold standard of Lisp profilers, and of course I can't really try it out myself due to the license. I was surprised that the Allegro equivalent of SBCL's optimization notes didn't have any kind of UI for mapping the notes back to the original source, making it look mostly useless. Or at least Duane, who is probably an expert at reading them, did get confused by the results a couple of times despite it being a scripted demo :-)

[ Which isn't to say that Franz's presentations were bad. I just didn't get much out of them SBCL-wise. ]

The controversial

Some stuff has received a lot of airtime after the conference.

Before the conference I expressed some puzzlement about there being an invited talk about CL-HTTP, which I regarded as a choice that was completely out of touch with the current state of the Lisp world. Seeing the talk didn't change my opinion (oh, wow, still using the White House information system from the Clinton administration as the example?). E.g. when Mallery asked who had ever used CL-HTTP, practically no hands went up, unlike with every other similar question that was asked during the conference. But amazingly enough, on the last day two presenters appeared to be seriously using CL-HTTP. (IIRC they were the RacerPro and XMLisp ones.)

Most of the Allegro features that Duane and Jans had time to show were things that SBCL already does in some form. It's just that they're exporting their internals, and in some cases the interfaces don't seem very polished. I guess READ-LINE-INTO (?) wouldn't be a bad addition, but e.g. MEMCPY-UP and MEMCPY-DOWN were just completely wrong.

So I wasn't horribly impressed with what they talked about. But unlike Luke, who was stirring up the debate both at ILC and after, I think that it is a very worthwhile goal to give Lisp users access to low level facilities, and that we really should be supplying non-consing / resource-reusing versions of functions where possible. STRING-TO-OCTETS and OCTETS-TO-STRING are an obvious example where SBCL could be improved.

Yes, it'd be really great to just cons indiscriminately, but no matter what the GC scheme is, there will be programs where consing will be deadly. And yes, it'll mean that code written for performance might be a bit ugly, but it's still better than dropping to C from Python for performance, etc. Of course SBCL users can use many of those low level facilities right now, but most of them are undocumented and unexported, which sets the bar for using them pretty high.
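As a sketch of the difference (a Python stand-in, and the `*_into` name is hypothetical, not an actual SBCL or Allegro API): the plain version allocates a fresh result vector on every call, while a non-consing variant writes into a caller-supplied buffer that can be reused across calls, so a tight loop allocates nothing per iteration.

```python
# Hypothetical illustration of allocating vs. reusable-buffer encoding.
# The names are made up for this sketch.

def string_to_octets(s):
    # Allocates a fresh octet vector on every call.
    return s.encode("utf-8")

def string_to_octets_into(s, buf, start=0):
    # Writes into a caller-supplied, reusable buffer and returns the end
    # index. (A real non-consing version would encode directly into the
    # buffer; Python's encode still allocates a temporary here.)
    data = s.encode("utf-8")
    end = start + len(data)
    buf[start:end] = data
    return end

buf = bytearray(64)  # allocated once, reused for every call
end = string_to_octets_into("hello", buf)
print(bytes(buf[:end]))  # b'hello'
```

The per-call allocation is exactly the garbage that makes consing "deadly" in the hot loops mentioned above; the buffer-reusing variant trades a little API ugliness for a loop that produces none.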

The end

Anyway, it was lots of fun! I hope to see all of you again next year.


ILC 2007 MPS Tutorial

Posted on 2007-04-01 in Lisp

Oh, man. My excitement about the CMUCL/MPS integration seems to have been premature :-)

Paraphrases from the early part of the MPS tutorial:

"We didn't actually get too far with the actual implementation of MPS and CMUCL, since we were unable to bootstrap CMUCL if we made any (even tiny) modifications." (But apparently they have all the design issues solved.)

"Unfortunately Dave Jones who's been doing the work on this is ill and thus not at the conference."

"Used CMUCL rather than SBCL since Carl Shapiro had earlier expressed interest in integrating MPS and CMUCL. No particular reason besides that." (In answer to my question about why they didn't try SBCL if bootstrapping was a problem).
