Product InfoCustomer TestimonialsConcept and Reference DocumentsLicensing Terms and PricingCompany Location, Mission and VisionAnnouncements & NewslettersHome

SnapshotCM Newsletter for March, 2013

Contents:

  • Editor's Blog
    • New release posted.
  • Useful Numbers
  • Recommended Releases
  • Links We Like

Editor's Blog

New Release Posted

We've just posted a minor release that fixes UNC path support that was broken by a previous fix, and also fixes a performance issue with import filtering. See the Change List for a complete list of changes.

This month, we begin a series on useful numbers. We're all familiar with π and e because their values represent useful ratios in the real world and are used daily in many professions. In fact, they are so useful, it's highly likely that you know both their meaning and approximate values.

In computing, useful numbers take a different form, but are quickly forming the bedrock upon which a lot of computing depends. Read the full article below.

As always, we welcome your feedback and ideas.

Scott Kramer
President

Useful Numbers

We define useful numbers as those which have properties that make them useful. We're all familiar with π and e because their values represent things in the real world and are used daily in many professions. In fact, they are so useful, it's highly likely that you know both their meaning and approximate values. Similarly with the constant in the famous equation: E = mc2.

Recently, I've been considering the usefulness of certain calculated values. While not constants, these relatively small values are proving themselves extremely useful. Perhaps the most famous to computer science professionals is the ubiquitous check sum. The check sum's purpose is to succinctly and efficiently represent the state of a set of data such that it is unlikely in practice for the data or check sum to change without the change being detectable. A well designed check sum will detect data corruption with high reliability and low cost (size and computation). Note that the check sum cannot guarantee that it detects all possible errors, yet in practice, it proves useful.

Check sums over the years have ranged from the lowly parity bit, to the sophisticated error correcting codes of modern memory and disk drives that support recovering from many common errors. In fact, without error correcting codes, large memory systems (pretty much any current PC) would be unreliable as the probability of a cosmic ray affecting a memory cell is surprisingly large.

Another kind of check sum algorithm is making a big splash. It claims to create a unique, fixed sized value for any input you or anyone could or would generate in a lifetime. In short, it claims that different inputs always generate different outputs. When I was first introduced to this idea, I was skeptical. It seemed impossible. Indeed it is easy to see that it is impossible to guarantee absolutely. But the claim isn't that the algorithm does the impossible, but only that the unique mapping is true in practice. I can't explain the details, but such algorithms have been created and the best of them are very, very good at that guarantee. So good that the best mathematicians have been unable to create examples disproving the claim.

Algorithms such as this are known as cryptographic hashes. A primary use is to validate that a block of data has not been modified. Any change, whether intentional or accidental, results in a different hash. Many products are signed with a cryptographic hash, and download sites often publish md5 or sha1 hashes so the download integrity can be verified before unpacking. Another use is in hashing passwords. By storing the hash, the password is not stored and thus can't be compromised, yet can be validated simply by comparing the hash of the entered password with the stored hash. A third use is identifying a file or any arbitrary data. The idea is that if every file has a different hash, the hash can represent that file. Therefore, if a duplicate hash is seen, we know the contents must also be the same. In this application, the hash serves as an efficient means for identifying content and comparing for equality.

SnapshotCM makes use of a cryptographic hash to identify whether file content is modified locally. This allows validation more efficient than a background check out and compare. SnapshotCM product images on Windows are also signed, another form of cryptographic hashing that verifies product image integrity.

For further reading on cryptographic hashes, I recommend An Illustrated Guide to Cryptographic Hashes, which also contains further references for anyone who wants to dive deeper.

SnapshotCM uses two other useful number generating algorithms I plan to discuss in coming months.

Recommended Releases

If you haven't upgraded, now is the time. Release 1.92.4 contains the latest fixes and performance enhancements.

  • 1.92.4.33 - Newest build with the latest fixes and performance enhancements.

If you are running any other release, we recommend that you update to the latest recommended version that your license allows.

For a complete list of user-visible changes, see the Change List.

Links We Like

Links we find interesting, fun, or occasionally useful.

We are looking for interesting links to share. Send to sales@truebluesoftware.com.

Please forward this newsletter to interested colleagues, and if you are not a subscriber, keep up-to-date by subscribing to SnapshotCM News today!