Hourglass requires two parameters: the minimum gap in thread execution
that is considered to be worth logging, and the overhead of executing
the loop that polls the timestamp counter.
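
As a rough illustration of how these two parameters interact, here is
a minimal sketch of a polling loop that logs gaps.  It is not
Hourglass's actual code: the function name record_gaps, its
parameters, and the use of Python's time.perf_counter_ns in place of
the processor's timestamp counter are all assumptions made for the
example.

```python
import time


def record_gaps(duration_ns, gap_threshold_ns, clock=time.perf_counter_ns):
    """Poll `clock` in a tight loop and log every gap >= gap_threshold_ns.

    Returns a list of (gap_start_ns, gap_length_ns) tuples.  `clock` is
    injectable so the loop can be exercised with a fake time source.
    """
    gaps = []
    start = prev = clock()
    while prev - start < duration_ns:
        now = clock()
        delta = now - prev  # time since the previous poll
        if delta >= gap_threshold_ns:
            # The thread was probably not running for part of this
            # interval (interrupt, preemption, and so on).
            gaps.append((prev, delta))
        prev = now
    return gaps
```

In the uninterrupted case, delta is just the cost of one trip around
the loop, which is why the threshold has to sit above that overhead.
A Python loop is far slower than a loop reading a hardware counter,
so the absolute numbers from this sketch are not comparable to
Hourglass's.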

On most modern processors, no calibration is necessary -- the default
values are fine.  Run Hourglass to see what they are.  On slower
processors some calibration needs to happen; here is how to do it.

Run a command line like this:

  ./src/hourglass -n 5 -d 2s -gh -x -g 200us

This has the effect of creating 5 threads, running for 2 seconds,
creating a "gap histogram", suppressing reporting of execution
intervals, and setting the gap threshold to a ridiculously large 200
microseconds.  The gap histogram is a histogram of gaps in thread
execution that were smaller than Hourglass's gap threshold.  If
Hourglass runs out of trace records then you may have to increase the
gap threshold even further.

Now look at the histogram.  It should give you a good idea of the
kinds of gaps threads running on your system are likely to see.  A
very large proportion of samples should be in some of the small
buckets.  These values reflect the expected case where a single thread
executes without interruption, completing the polling loop in a small
amount of time.  It is very important that Hourglass not record a gap
in the expected case, so some percentile values are interspersed with
the histogram, telling you for example that 99.999% of the samples
showed a gap smaller than some value.  Use this to pick a gap
threshold.
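
The percentile lookup described above can be sketched as follows.
This is a hypothetical helper, not part of Hourglass: it assumes you
have the raw gap samples as a list of nanosecond values, and the name
gap_at_percentile is made up for the example.

```python
import math
from fractions import Fraction


def gap_at_percentile(samples_ns, percentile):
    """Return the smallest sample v such that at least `percentile`
    percent of all samples are <= v (nearest-rank method)."""
    if not samples_ns:
        raise ValueError("no samples")
    ordered = sorted(samples_ns)
    # Fraction(str(...)) keeps values such as 99.999 exact, so the
    # rank is not thrown off by binary floating-point rounding.
    rank = math.ceil(Fraction(str(percentile)) * len(ordered) / 100)
    rank = max(rank, 1)
    return ordered[rank - 1]
```

For example, with 98 samples of 100 ns and two outliers of 5000 ns
and 90000 ns, the 98th percentile is still 100 ns, so a threshold
comfortably above 100 ns would leave the expected case unrecorded.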

This calibration is just a bit more of an art than a science: the
proper gap threshold depends on what you are trying to measure.  If
you set a fairly small gap threshold, Hourglass will see and record
the execution of interrupt handlers.  You might or might not be
interested in this, but to make it happen you should set the gap
threshold to be comfortably larger than the time to execute the
polling loop and comfortably smaller than the duration of the fastest
interrupt handler on your machine.  On the other hand, if you are not
interested in
interrupts and want to see only thread context switches, then the gap
should be set to be comfortably longer than the duration of the
longest interrupt handler but comfortably shorter than the shortest
time-slice that will ever be given to a process.  This can be a hard
balance to strike, since the shortest time-slice ever given to an app
can be as little as twice the context switch time for your OS plus
just enough time for your app to see that it should re-block.
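
The two cases above both amount to fitting the threshold between a
lower and an upper bound.  The sketch below works through that
arithmetic.  The helper names, the safety margin of 2 standing in for
"comfortably", and every timing value in the usage note are
illustrative assumptions, not measurements or Hourglass defaults.

```python
def shortest_timeslice_ns(context_switch_ns, reblock_ns):
    """Shortest slice per the rule of thumb above: two context
    switches plus the time the app needs to notice it should re-block."""
    return 2 * context_switch_ns + reblock_ns


def threshold_window(lower_ns, upper_ns, margin=2.0):
    """Return (lo, hi) bounds for a threshold that is at least `margin`
    times lower_ns and at most upper_ns / margin, or None if no such
    threshold exists."""
    lo = lower_ns * margin
    hi = upper_ns / margin
    return (lo, hi) if lo <= hi else None
```

To see interrupts, with a hypothetical 200 ns polling loop and a
4000 ns fastest interrupt handler, threshold_window(200, 4000) gives
(400.0, 2000.0).  To see only context switches, with a hypothetical
20000 ns longest handler, a 5000 ns context switch and 1000 ns to
re-block, the shortest slice is 11000 ns and threshold_window(20000,
11000) returns None: no threshold satisfies both bounds, which is the
hard balance described above.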

