SnapshotCM News for September, 2012

Contents:

Editor's Blog
- Product-line comment
- New release performs 10x faster check out
Recommended Releases
Links We Like

Editor's Blog

Product-Line Comment

"I figured most of it out on my own. It shows how intuitive the feature is to use." - a user commenting on SnapshotCM's product-line extensions.

New SnapshotCM Release is Dramatically Faster

The newest release of SnapshotCM is dramatically faster and more scalable than earlier releases. In short, check out is 10x faster and check in is 3x faster, while each check out uses just one eighth of the server CPU. Read on for details, and be sure to read the summary section, which contains recommendations to consider as you plan your server upgrade.

The Changes

The performance improvements come from two areas: changing the way file revisions are stored, and changing a network connection option.

As discussed in a newsletter article two years ago, the tradeoff between disk space usage and access performance is changing. When disk space was relatively expensive and in short supply, it made sense to trade CPU time for space. But disk space has become incredibly cheap and this tradeoff no longer makes sense. This is especially true for non-text files which are not efficiently stored by the typical delta versioning schemes of the past.

Because of delta compression issues with large files, the SnapshotCM repository server has long stored large files in a gzip'd file per revision format, and this is the only format the proxy server has ever used. Compressing individual revisions results in significant space savings compared to storing files whole. Furthermore, file expansion is off-loaded to the client, reducing both the server and network load on every check out. And since decompression requires little CPU or memory, this offloading has no negative client-side effects.
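As a rough illustration of the gzip'd file per revision idea, here is a minimal Python sketch. The directory layout, the file naming scheme and the function names are assumptions made for this example only; they are not SnapshotCM's actual on-disk format or API. The server side stores each revision as its own gzip file and returns the compressed bytes untouched, and expansion happens on the client.

```python
import gzip
from pathlib import Path


def store_revision(store_dir: Path, name: str, revision: int, content: bytes) -> Path:
    """Write one revision as its own gzip file (hypothetical naming scheme)."""
    store_dir.mkdir(parents=True, exist_ok=True)
    path = store_dir / f"{name}.r{revision}.gz"
    path.write_bytes(gzip.compress(content))
    return path


def read_compressed(store_dir: Path, name: str, revision: int) -> bytes:
    """Server side: return the stored bytes as-is, with no decompression."""
    return (store_dir / f"{name}.r{revision}.gz").read_bytes()


def expand_on_client(compressed: bytes) -> bytes:
    """Client side: expand the revision after it has crossed the network."""
    return gzip.decompress(compressed)
```

Because the server only reads and forwards bytes, the per-check-out work on the server is a plain file read, and the compressed form is also what travels over the network.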

When Faster is Slower

Based on the above analysis and previous testing, we decided to transition to exclusive use of the gzip'd file per revision format. After testing the change on our own data, we were shocked to discover that a check out of 3400 files had actually slowed, going from 270 to 420 seconds! Totally unexpected, this result caused lots of confusion until we understood another problem we had been experiencing for some time.

The key was noticing a 200 ms delay in the check out protocol. We'd assumed this was server processing (we have a 13-year-old server), but server measurements showed that the data was written without delay. It simply wasn't arriving at the client until 200 ms later. We quickly confirmed this was occurring with both repository and proxy servers and on every OS where we tested it, including customer systems.

Eventually we discovered the Nagle algorithm and TCP's delayed ACK and learned how they can interact with certain patterns of network writes to introduce a 200 ms delay in network communication. This also explained why check outs got slower with the "faster" storage method and, more importantly, what to do about it.

In short, file writes smaller than an Ethernet packet size (about 1500 bytes) were being delayed 200 ms by TCP/IP in order to aggregate the write with later writes or data ACKs, if possible. Since the gzip'd files were typically smaller than the actual files, more revisions than before fell into the small category and incurred the 200 ms delay on check out.
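To make the "small category" effect concrete, the following Python sketch checks whether a payload is smaller than one packet before and after gzip compression. The 1500 byte threshold is the approximate figure quoted above (the real payload limit is somewhat lower once headers are subtracted), and the sample data and function name are made up for this example.

```python
import gzip

# Approximate Ethernet packet size quoted in the article; the usable payload
# is smaller once TCP/IP headers are subtracted.
PACKET_SIZE = 1500


def is_small_write(payload: bytes, packet_size: int = PACKET_SIZE) -> bool:
    """True if the payload is smaller than one packet and so may be delayed."""
    return len(payload) < packet_size


source = b"int counter = 0;\n" * 200  # a few KB of repetitive text
compressed = gzip.compress(source)

print(len(source), is_small_write(source))
print(len(compressed), is_small_write(compressed))
```

A file that needed several packets when sent whole can fit in a single short write once compressed, which is exactly the pattern that triggered the delay.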

Once this became clear, we backed up to the previous release, disabled the Nagle algorithm and redid our performance testing. Check out times halved from 270s to 135s. Not bad for a relatively isolated change, but we wanted more. So we repeated the testing with gzip'd revisions and a disabled Nagle algorithm and throughput quadrupled again. Further investigation showed that the server CPU load is about 8x higher processing RCS files during check out than simply reading the gzip'd revision files. The final check out time dropped from 270 to just 24 seconds for over a 10x improvement in throughput!
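For readers who want to try the same change in their own network code, the Nagle algorithm is normally disabled per socket with the standard TCP_NODELAY option. The Python sketch below shows the call; the helper names are ours, and this is not SnapshotCM's server code.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Turn off the Nagle algorithm so small writes are sent immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nagle_disabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


# The option can be set on a socket before it is connected.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    disable_nagle(s)
    print(nagle_disabled(s))
```

With the option set, each write is handed to the network immediately, so it pays to assemble a complete message in one buffer before writing rather than issuing many tiny writes.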

Storage Space Implications

Because of the dramatic performance improvements, this was clearly a change we wanted to make. However, we were concerned about increased disk space usage. Certainly, converting a 200 revision delta compressed text file into 200 separate gzip'd revision files would result in an increase in space. But we discovered that many stored files have just one revision, and for every one of them the gzip'd format was smaller than the single revision format. It turns out that for the majority of delta files with just two or three revisions, the resulting gzip'd revisions were also smaller in aggregate than the delta file they replaced.

We also didn't want to compress only rarely accessed files, as that would not improve the performance users typically experience, so we decided to automatically convert revisions to the gzip'd revision format on check out. The first check out pays this conversion cost, while all successive check outs reap the benefit.
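The convert-on-check-out behavior can be pictured with a small Python sketch. The RevisionStore class, its in-memory dictionary and the expand_delta callback are all hypothetical stand-ins invented for this example; the real server works with files on disk.

```python
import gzip
from typing import Callable, Dict, Tuple

RevisionKey = Tuple[str, int]  # (file name, revision number)


class RevisionStore:
    """Toy store that converts a revision to gzip form on first check out."""

    def __init__(self, expand_delta: Callable[[str, int], bytes]) -> None:
        # expand_delta stands in for the expensive delta-file reconstruction.
        self._expand_delta = expand_delta
        self._gzipped: Dict[RevisionKey, bytes] = {}
        self.conversions = 0

    def check_out(self, name: str, revision: int) -> bytes:
        """Return the gzip'd bytes for a revision, converting it if needed."""
        key = (name, revision)
        if key not in self._gzipped:
            # The first check out pays the conversion cost.
            self._gzipped[key] = gzip.compress(self._expand_delta(name, revision))
            self.conversions += 1
        # Later check outs simply return the stored gzip'd revision.
        return self._gzipped[key]
```

Checking out the same revision twice calls expand_delta only once, which is the whole point: the revisions people actually use are the ones that get converted.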

What We Implemented

In the final product, we made two key changes:

We disabled the Nagle algorithm on server writes to the clients. This eliminated the 200 ms delay for small file check outs.
We changed the storage system to store all revisions in gzip'd revision format since it improves both check in and check out throughput, while reducing server CPU usage. In detail:
- All new revisions are stored in gzip'd revision format.
- All delta compressed revisions are automatically converted to gzip'd revision format on check out in order to improve performance for later check outs of actively used revisions.
- All existing 1, 2 and 3 revision delta files are auto-converted to the gzip'd revision format. This both saves space and eliminates the first-access file conversion delay. This automatic conversion takes place in a background server thread, and may take several days to complete, depending on the number of files to convert.
- For now, we are keeping delta files with more than 3 unconverted revisions.
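The conversion rules for existing delta files listed above can be summarized as a small decision function. This is only a sketch of the policy as described in this article; the names and the Action type are invented for illustration.

```python
from enum import Enum


class Action(Enum):
    CONVERT_IN_BACKGROUND = "convert in background"
    CONVERT_ON_CHECK_OUT = "convert on check out"


# Delta files with at most this many revisions are auto-converted.
AUTO_CONVERT_MAX_REVISIONS = 3


def conversion_action(revision_count: int) -> Action:
    """Pick how an existing delta file is moved to gzip'd revision format."""
    if revision_count < 1:
        raise ValueError("a stored file has at least one revision")
    if revision_count <= AUTO_CONVERT_MAX_REVISIONS:
        # Small delta files: converted by the background server thread.
        return Action.CONVERT_IN_BACKGROUND
    # Larger delta files are kept; revisions convert when checked out.
    return Action.CONVERT_ON_CHECK_OUT
```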

Summary

In light of these changes, we expect check out throughput to increase dramatically, server load to decrease, and overall disk space usage to stay about the same. However, disk space usage is data dependent, so we recommend that you have some free space on your revision storage disk (perhaps 20% free) before installing this release. We also recommend that you monitor free space especially closely for the first few weeks after upgrading your repository server to make sure you don't run out.
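If you would like to script the free space check, a few lines of Python are enough. This is a generic sketch using the standard library, not a SnapshotCM tool; point it at whatever path holds your revision storage, and treat the 20% default as the rough guideline mentioned above.

```python
import shutil


def free_fraction(path: str) -> float:
    """Fraction of the disk holding `path` that is currently free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_headroom(path: str, minimum_free: float = 0.20) -> bool:
    """True if at least `minimum_free` of the disk is free."""
    return free_fraction(path) >= minimum_free


# Replace "." with the directory that holds your revision storage.
print(f"{free_fraction('.'):.0%} free, enough headroom: {has_headroom('.')}")
```

Run from a scheduled job, a check like this makes it easy to watch the trend during the first few weeks after the upgrade.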

For a complete list of user-visible changes, see the Change List, or contact us.

Scott Kramer
President

Recommended Releases

The following releases are recommended:

1.91.0.25 - The product-line release, with the latest features and fixes.
1.85.5.12 - The last pre-product-line release.
1.84.2 / 1.84.2.1 - A known stable release.
1.82.06 / 1.82.07 / 1.82.08 - stable version with the old (single mount) workspace model.

If you are running any other release, we recommend that you update to the latest recommended version that your license allows.

For a complete list of user-visible changes, see the Change List.

Links We Like

Links we find interesting, fun, or occasionally useful.

Ambitious, complex and ultimately successful Mars landing:
http://www.bbc.com/future/story/20120719-how-to-land-on-mars/1
The "world's fastest everything"
http://www.flixxy.com/worlds-fastest-everything.htm
Space shuttle launch, for those who couldn't make it in person. High def video and audio.
http://www.flixxy.com/the-best-space-shuttle-launch-video.htm
A day in Venice, timelapse from daybreak to sunset. Rush-hour on the canals looks interesting...
http://www.flixxy.com/venice-in-a-day.htm
Perpetuum Jazzile, performing Africa - amazing and unique sound!
http://www.youtube.com/watch?v=yjbpwlqp5Qw&feature=endscreen

We are looking for interesting links to share. Send to sales@truebluesoftware.com.

Please forward this newsletter to interested colleagues, and if you are not a subscriber, keep up-to-date by subscribing to SnapshotCM News today!