Ordered Tuple

How to guarantee ordered uniqueness in horizontally-scalable applications in 41 lines of code (not counting comments :wink:)

Problem Statement
An operation occurs across multiple threads - (and possibly across multiple execution contexts) - generating some type of important data which requires FIFO-based post-processing.

I first encountered this need working on a project for Intel waaaayyyyy back in 1992.

Some of you might remember those bad old days, just about the time Token Ring was ceding control to Ethernet, and cross-site networking was becoming a reality.

The rest of you likely won't remember that time, probably 'cuz you were still in diapers!!! :joy:

We were creating a centralized work/task tracking system with a tie-in to the mainframe accounting system. Traceability was critical and since the data creation act occurred across multiple systems without real-time connectivity, we had to roll our own universal identifier mechanism.

In my most recent project, I needed this ... at the time I just know I needed this ... so I put in the effort to create a small bundle of cool functionality. Then I changed my design. Go figure! :wink:

The tricky part is the uniqueness ...
Uniqueness is frequently deferred to a "trusted external provider". My use of trust has nothing to do with trust, but instead is used to impart a guarantee of contractual satisfaction, where the contract here is for uniqueness.

For example, a GUID (UUID) can be generated by nearly every computer imaginable and represents a number that is so large that the probability of the same number being generated randomly twice is negligible. 1

Another example is the Microsoft SQL Server Identity property; by virtue of having a context bound to a single table, uniqueness within that limited scope is absolutely guaranteed by the underlying engine.

Sure ... fine ... whatever ... there ya go ... move along ... nothing to see here ...

The problem with a GUID is that while it may satisfy uniqueness, it is abysmal at satisfying ordered-ness.

An Identity fulfills both requirements ... but at a cost!!! [HINT: Operationally SSSSSS .... pensive!!! :smile:]

Take a step back ... let's replay the requirements:

  • Ordered
  • Unique

What is ordered-ness? At the risk of sounding d*ck'ish and pedantic ... it is something which is done according to specific principles or procedures. 2

The alphabet is "ordered" by virtue of the convention of everyone agreeing that A is followed by B which is followed by C, etc.

The same with integers.

And the same with time.


Ignoring Antiphon, McTaggart, and Barbour, pretty much all the rest of us agree that one particular moment in time - t1 - can be simplistically compared to another - t2 - solely on the basis that one is earlier than the other, i.e., t1 < t2. And presuming that there exists some mechanism to iterate a set of tn values using some specific principle which can consistently evaluate this condition ... it sounds like a decent candidate for ordered-ness!

But what about uniqueness?

The red-headed step-child of uniqueness is precision, that is to say, precision is an important characteristic and will turn into a sure-fire conversation starter if ignored.

For example, on a geological scale, this decade and last decade are pretty much equivalent.

We should probably strive for juuuusssstttt a bit more precision than that.

How 'bout a tick?

When dealing with the .NET DateTime structure there's a real handy property called Ticks.

Per MSDN, [a] single tick represents one hundred nanoseconds or one ten-millionth of a second ... The value of this property represents the number of 100-nanosecond intervals that have elapsed since 12:00:00 midnight, January 1, 0001.

Wow! Just a bit more precise than geological scale!!!

Devil ... meet Details. Details ... Devil!!!
The .NET DateTime has a few issues with precision. Take a brief detour and go read Eric Lippert's most excellent treatise on this subject and then come back so we can talk about the Stopwatch class.

Done already?!?! Evelyn Woodhead would be proud! :smirk:

I know what you're thinking: Okay Mister Unconditional, that's really cool tech for measuring elapsed time ... but just what the heck does that have to do with uniqueness and ordered-ness and the price of beans ...

It has to do with a static method on Stopwatch named GetTimestamp():

A long integer representing the tick counter value of the underlying timer mechanism.

IOW ... a point-in-time reference to some value t.

And not just some value ... but a pretty darn precise 3 value. For example, 635573390402345678 is right now, while 635573390449876543 is immediately right after taking a sip of coffee!!! Mmmm ... mmmm ... good. :wink:

You know - about an hour ago - we started talking about uniqueness and ordered-ness. Did we ever finish that conversation?

I got an idea ... let's pretend we did, and move on to something of actual value.

That something is the ability to trace an action to a specific point-in-time, i.e., GetTimestamp(). So far so good, but if you scroll up to the beginning, the requirement was to associate some data with that point-in-time.

Enter the OrderedTuple. (Better late than never ... )

public static Tuple<string, object> OrderedTuple(object aInput)  

Quick primer: A tuple is a data structure that has a specific number and sequence of elements. 4

The Item1 value of a two-part Tuple result represents the current Stopwatch.GetTimestamp() value, expressed as a string. The Item2 value is the input value, i.e., aInput.

At the heart of this functionality lies the following helper method:

public static long UniqueTimestamp()  
        long orig, newval;
                orig = _LastTimestamp;
                long now = Stopwatch.GetTimestamp();
                newval = Math.Max(now, orig + 1);
        } while (Interlocked.CompareExchange(ref _LastTimestamp, newval, orig) != orig);

        return newval;

This is a real belts-&-suspenders approach to avoiding duplication; in the rare event that the current value matches the last value, go ahead and keep adding 100 nanoseconds until you're absolutely sure you've got a unique value. Whew! :sweat:

But wait ... there's more! Call now ... operators are standing by!!!
Using generics, a couple more static methods are available:

public static Tuple<string, T> OrderedTuple<T>(this T aInput)

public static IEnumerable<Tuple<string, T>> OrderedTupleRange<T>(this IEnumerable<T> aInput)  

This makes possible all the following invocations:

var a = OrderedTupleHelpers.OrderedTuple("Howdy");  
var b = OrderedTupleHelpers.OrderedTuple(42);  
var c = OrderedTupleHelpers.OrderedTuple(new DateTime(1980, 1, 1));

var l = new List<int>();  
var d = OrderedTupleHelpers.OrderedTupleRange(l);

var e = l.OrderedTupleRange();  
var f = "Howdy".OrderedTuple();  
var g = 42.OrderedTuple();  

Pay particular attention to the e, f, and g examples; by exposing these helpers as extension methods, the simplest of call patterns is permitted! 5

The more you know ... cue splashy rainbow effect ...
Better than I could do, Eric's article does a great job of relating the concept of precision and accuracy. And while I suppose accuracy is important to some people ... that ain't us.

Or should I say ... that ain't us within the current context, i.e., ordered-ness and uniqueness. IOW, we are co-opting the timestamp as a proxy for satisfying those two requirements and completely ignoring accuracy.

Does that mean accuracy is not present? Not at all! If your post-processing is tolerant of a small degree of inaccuracy and can support the cost of converting from the string representation back into a long ... the OrderedTuple also fulfills an unstated when requirement. Doncha just love free stuff! :wink:

(and possibly across multiple execution contexts)

Remember that statement ... way back in the beginning ... when we were young?

By multiple execution contexts ... I mean running the same code across multiple separate disconnected instances, i.e., horizontal scaling, i.e., the cloud, all while still permitting a certain degree of inter-relationship.

A core assumption is that UniqueTimestamp() function is going to guarantee uniqueness within the current context. But if the same code is running on different servers - (and regardless of the synchronization state of their clocks) - the probability of collision is non-zero.

Tying the bow on the package ...
Hmmm ... I wonder what we could do to guarantee uniqueness across execution boundaries? If only there was some value we could access where the probability of the same number being generated randomly twice is negligible. Wait! I've seen that phrase somewhere before .. :smirk:

Following is the actual prototype for the singular OrderedTuple() method:

public static Tuple<string, T> OrderedTuple<T>(this T aInput, bool aMakeGlobal = false)  

Notice the aMakeGlobal optional parameter?

If instead of making this call ...

var local = "Local".OrderedTuple();  

we make this call ...

var global = "Global".OrderedTuple(true);  

a GUID will be suffixed to the Item1 value:
highlighted in red

Ordered-ness is preserved, and uniqueness now extends across execution contexts.

Easy-peazy, right?!?!? :sunglasses:

Source code and Linqpad examples are below.

Have fun stormin' the castle ...


  1. GUID, from Wikipedia

  2. Ordered, from Dictionary.com

  3. I'm going to blissfully ignore the warning about Stopwatch downshifting to DateTime if the high-resolution performance counter is not available. Ignorance is bliss!

  4. Tuple, from MSDN

  5. All while blissfully ignoring the boxing implications, 'cuz I'm a blissfully ignorant kinda guy!!! :wink:

Source Code & Examples

  • OrderedTupleHelpers Standalone snippet of static class

  • Example 1 LinqPad query of simple examples from above

  • Example 2 LinqPad Multi-threaded example query
    This example creates 4 threads and launches them simultaneously to create 400,000 samples with an A/B/C/D ThreadID as the tracked value. All 400K samples are then saved to a csv text file for uniqueness and distribution analysis. [BTW ... on a not-particularly fast duocore laptop (with hyperthreading!), it took 165ms to create the 400K samples. :grinning:]

Comments powered by Disqus