Useful Numbers, Part 2
Useful numbers have properties that make them useful. We're
all familiar with π and e because
their values represent things in the real world and are used daily in many
professions. Last month we looked at the usefulness of cryptographic hashes and
their practical claims. This month we look at another useful number algorithm,
one that makes an equally audacious claim.
Many software systems make use of unique identifiers (UIDs)
to provide a small handle for referencing a larger object. Such references
save both space and time compared with copying the large object. Maintaining a
single instance of an object also eliminates the update problems that arise
when several copies must be kept consistent. Relational databases use UIDs to
join tables, modern file systems use them to identify file system objects, and
object databases use UIDs similarly. While users are typically shielded from
UIDs, they are essential to the internal efficiency and correctness of many
modern software systems, including SnapshotCM.
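To make the "small handle" idea concrete, here is a minimal Python sketch. The `Document` and `Registry` names are hypothetical, invented for illustration: the registry keeps exactly one instance of each object and hands out a small integer UID from a local counter, so every index stores only the handle and a single update is visible everywhere. Note that the counter is local, so these UIDs are unique only within this one registry.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for a large object we would rather not copy."""
    title: str
    body: str


class Registry:
    """Holds exactly one instance of each object, keyed by a small integer UID.

    The counter is local to this registry, so the UIDs are unique only
    within this one domain.
    """

    def __init__(self) -> None:
        self._next_uid = 1
        self._objects: dict[int, Document] = {}

    def add(self, obj: Document) -> int:
        uid = self._next_uid
        self._next_uid += 1
        self._objects[uid] = obj
        return uid

    def get(self, uid: int) -> Document:
        return self._objects[uid]


registry = Registry()
uid = registry.add(Document("Spec", "x" * 100_000))

# Two "indexes" hold only the small handle, not copies of the document.
by_author = {"alice": [uid]}
by_tag = {"draft": [uid]}

# One update is visible through every reference.
registry.get(uid).title = "Spec v2"
print(registry.get(by_tag["draft"][0]).title)  # Spec v2
```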
Identifiers in the systems described above are usually
unique only within their domain. For example, a file system assigns UIDs to
each file system object, but another file system might use those same ids. The
same is true of databases. However, in the distributed world we live in, the need for a
universally, or globally, unique identifier (UUID or GUID) becomes apparent.
UUIDs enable distributed systems to uniquely identify information without
central coordination (Wikipedia). A UUID standard and several GUID implementations
exist, all claiming to generate a unique identifier every time they run, no
matter who runs them, when, or on which machine. The basic idea is that if
every person on the planet ran the generator millions of times, every generated
UUID would still be unique. Unfortunately, depending on the algorithm used, the
claim may hold in practice without being guaranteed in theory, though some
algorithms can guarantee uniqueness both practically and theoretically. This is
why it is important to use a GUID generator appropriate to the task. Once we
have chosen the appropriate GUID generator, we can expect certain practical benefits.
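The article does not name specific algorithms, so as an assumption the two most common UUID versions serve to illustrate the practical-versus-theoretical distinction. The sketch below uses Python's standard `uuid` module: version 1 derives uniqueness from a timestamp plus a node id (normally the host's MAC address), while version 4 relies on 122 random bits, whose collision odds can be estimated with the birthday approximation p ≈ 1 − exp(−n(n−1) / (2·2^122)). The helper `v4_collision_probability` is hypothetical, written for this example.

```python
import math
import uuid

# Version 1: built from a 60-bit timestamp, a clock sequence and a 48-bit node
# id. Python's getnode() normally returns the MAC address, but falls back to a
# random 48-bit number if no hardware address can be found, which weakens the
# construction-based guarantee.
u1 = uuid.uuid1()

# Version 4: 122 random bits (the other 6 bits encode version and variant).
# Uniqueness is probabilistic: collisions are possible, just extremely unlikely.
u4 = uuid.uuid4()

print(u1, u1.version)
print(u4, u4.version)


def v4_collision_probability(n: int) -> float:
    """Birthday approximation of P(at least one collision) among n v4 UUIDs."""
    space = 2 ** 122
    # p ~= 1 - exp(-n(n-1) / (2 * space)); expm1 keeps precision for tiny values.
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (10**9, 10**12, 10**15):
    print(f"{n:.0e} UUIDs -> p(collision) ~ {v4_collision_probability(n):.3e}")
```

Even at a quadrillion random UUIDs the estimated collision probability stays below one in ten million, which is why version 4 is "practically" unique even though nothing in its construction rules a collision out.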
One benefit is that keys from multiple databases can be
unique without having to coordinate them, which eliminates the need to map UIDs
from one domain into another and the complications that entails. Microsoft's
Installer technology (MSI) uses GUIDs to track each item of each product
through the full install, update and removal cycle on a Windows system. Using
GUIDs, product developers can create MSI packages without fear of collision
with other products, and without centralized control of ids. Cloud
infrastructure uses UUIDs to track user data, machines, interfaces and other
resources. COM interfaces are identified by GUIDs, and so on. In short, GUIDs provide a
widely used and useful handle for anything that needs to be tracked and which
can change.
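The merging benefit can be shown with a small Python sketch. The two "tables" here are plain dictionaries standing in for databases, and `make_table_int` and `make_table_uuid` are hypothetical helpers: tables keyed by local auto-increment integers clash when combined, while tables keyed by independently generated version 4 UUIDs merge without any coordination or key remapping.

```python
import uuid


def make_table_int(names):
    """Hypothetical table keyed by a local auto-increment counter."""
    return {i: name for i, name in enumerate(names, start=1)}


def make_table_uuid(names):
    """Hypothetical table keyed by independently generated random UUIDs."""
    return {uuid.uuid4(): name for name in names}


site_a = ["widget", "gadget"]
site_b = ["sprocket", "gizmo"]

# Integer keys clash: both tables use 1 and 2, so merging silently drops rows
# unless one side's keys are remapped first.
int_merged = {**make_table_int(site_a), **make_table_int(site_b)}

# UUID keys need no coordination or remapping; all four rows survive.
uuid_merged = {**make_table_uuid(site_a), **make_table_uuid(site_b)}

print(len(int_merged), len(uuid_merged))  # 2 4
```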
It's reasonable to ask when one would use a cryptographic
hash, as discussed last month, and when to use a UUID. The answer depends on
how something is used. If the id represents data that should get a different id
whenever it changes, then a cryptographic hash is your choice. If the id
represents something (data, an interface, etc.) that will evolve over time yet
maintain its identity, then a GUID is the choice.
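The rule of thumb above can be sketched in a few lines of Python. SHA-256 is used here only as a representative cryptographic hash, and `content_id` and `TrackedItem` are hypothetical names for illustration: the content-derived id changes as soon as the data changes, while the UUID assigned once at creation stays with the item as its contents evolve.

```python
import hashlib
import uuid


def content_id(data: bytes) -> str:
    """Content-derived id: any change to the data yields a different id."""
    return hashlib.sha256(data).hexdigest()


class TrackedItem:
    """Hypothetical entity whose identity persists while its contents evolve."""

    def __init__(self, data: bytes) -> None:
        self.uid = uuid.uuid4()  # assigned once, never recomputed
        self.data = data


item = TrackedItem(b"version 1")
uid_before, hash_before = item.uid, content_id(item.data)

item.data = b"version 2"  # the item evolves
uid_after, hash_after = item.uid, content_id(item.data)

print(uid_before == uid_after)    # True: same identity
print(hash_before == hash_after)  # False: different content
```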
SnapshotCM uses GUIDs in its MSI package, as well as to
uniquely identify repositories.
For further reading on UUIDs, I recommend the
Wikipedia page on
UUIDs, which also contains further references for anyone who
wants to dive deeper.
In addition to cryptographic hashes and UUIDs, SnapshotCM
uses another useful, though less commonly known, number algorithm, which I plan
to discuss in the next newsletter.