Recently, I decided to take the MIT OCW Algorithms course, and I wanted to actually measure the performance of the various algorithms. So before diving in, I decided to come up with a setup for measuring the time taken, which calls for high-precision time measurement. I have used the Read Time Stamp Counter (RDTSC) instruction, introduced with the Pentium processors, before, and I had heard about the High Precision Event Timer (HPET) introduced by Intel circa 2005. In this post we have a shootout between the two mechanisms.

The metrics we want to compare are

  • Resolution
  • Accuracy
  • Cost (in terms of CPU time)
  • Reliability

Before we get into the actual testing, let us understand how to use HPET and RDTSC. Here is how we use HPET, which is exposed through the POSIX-standard clock_gettime() API.

#include <time.h>

struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
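
On older glibc versions you may need to link with -lrt for clock_gettime(). Here is a minimal sketch of interval measurement with this API; the busy loop is just a placeholder for the code being measured:

#include <stdio.h>
#include <time.h>

int main()
{
  struct timespec start, end;

  clock_gettime(CLOCK_MONOTONIC, &start);
  volatile int i;
  for (i = 0; i < 1000000; i++); /* placeholder for the code being measured */
  clock_gettime(CLOCK_MONOTONIC, &end);

  long long nsecs = (end.tv_sec - start.tv_sec) * 1000000000LL
                  + (end.tv_nsec - start.tv_nsec);
  printf("elapsed: %lld nsec\n", nsecs);
  return 0;
}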

And here is how we use the RDTSC instruction. With RDTSC, we read the number of elapsed CPU clock cycles from a counter, the Time Stamp Counter (TSC), which increments on every CPU clock. This does not directly translate to actual time: we have to calibrate the number of CPU cycles per nanosecond and divide the tick count by that calibrated value to get nanoseconds. Since the TSC is not guaranteed to be synchronized across CPUs, we bind our process to CPU1 (I have a dual-core Intel T7500 CPU) to eliminate TSC mismatch between the two cores.

#define _GNU_SOURCE  /* for sched_setaffinity() */
#include <sched.h>   /* for cpu_set_t, sched_setaffinity() */
#include <stdint.h>  /* for uint64_t */
#include <time.h>    /* for struct timespec, clock_gettime() */

/* assembly code to read the TSC */
static inline uint64_t RDTSC()
{
  unsigned int hi, lo;
  /* rdtsc returns the low 32 bits of the TSC in EAX and the high 32 bits in EDX */
  __asm__ volatile("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t)hi << 32) | lo;
}

const int NANO_SECONDS_IN_SEC = 1000000000;
/* returns a static buffer of struct timespec with the time difference of ts1 and ts2
   ts1 is assumed to be greater than ts2 */
struct timespec *TimeSpecDiff(struct timespec *ts1, struct timespec *ts2)
{
  static struct timespec ts;
  ts.tv_sec = ts1->tv_sec - ts2->tv_sec;
  ts.tv_nsec = ts1->tv_nsec - ts2->tv_nsec;
  if (ts.tv_nsec < 0) {
    ts.tv_sec--; /* borrow a second when the nanosecond part goes negative */
    ts.tv_nsec += NANO_SECONDS_IN_SEC;
  }
  return &ts;
}

double g_TicksPerNanoSec;
static void CalibrateTicks()
{
  struct timespec begints, endts;
  uint64_t begin = 0, end = 0;
  clock_gettime(CLOCK_MONOTONIC, &begints);
  begin = RDTSC();
  volatile uint64_t i; /* volatile so the busy loop is not optimized away */
  for (i = 0; i < 1000000; i++); /* must be CPU intensive */
  end = RDTSC();
  clock_gettime(CLOCK_MONOTONIC, &endts);
  struct timespec *tmpts = TimeSpecDiff(&endts, &begints);
  uint64_t nsecElapsed = tmpts->tv_sec * 1000000000LL + tmpts->tv_nsec;
  g_TicksPerNanoSec = (double)(end - begin) / (double)nsecElapsed;
}

/* Call once before using RDTSC; has the side effect of binding the process to CPU1 */
void InitRdtsc()
{
  cpu_set_t cpuMask;
  CPU_ZERO(&cpuMask);
  CPU_SET(1, &cpuMask); /* bind to CPU1 */
  sched_setaffinity(0, sizeof(cpuMask), &cpuMask);
  CalibrateTicks(); /* calibrate on the CPU we are now pinned to */
}

void GetTimeSpec(struct timespec *ts, uint64_t nsecs)
{
  ts->tv_sec = nsecs / NANO_SECONDS_IN_SEC;
  ts->tv_nsec = nsecs % NANO_SECONDS_IN_SEC;
}

/* ts will be filled with the time converted from the TSC reading */
void GetRdtscTime(struct timespec *ts)
{
  GetTimeSpec(ts, (uint64_t)(RDTSC() / g_TicksPerNanoSec));
}
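
Putting the pieces together, here is a sketch of how these helpers might be used to time a code region (this assumes the functions above plus <stdio.h>; the loop is just a placeholder):

int main()
{
  struct timespec start, end;

  InitRdtsc(); /* pin to CPU1 and calibrate the tick rate */
  GetRdtscTime(&start);
  volatile int i;
  for (i = 0; i < 1000000; i++); /* placeholder for the code being measured */
  GetRdtscTime(&end);

  struct timespec *diff = TimeSpecDiff(&end, &start);
  printf("elapsed: %ld sec %ld nsec\n", (long)diff->tv_sec, diff->tv_nsec);
  return 0;
}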

Now back to our metrics. This is how each mechanism fares.


Resolution

The HPET API clock_gettime() returns its result in a struct timespec, whose maximum granularity is nanoseconds. That is just what struct timespec can represent; the actual resolution varies by implementation and can be queried through clock_getres(). On my Dell XPS 1530 with an Intel Core 2 Duo T7500 CPU running Ubuntu 10.04, it reports a resolution of 1 nanosecond. The RDTSC instruction, on the other hand, can resolve down to a single CPU clock cycle; on my 2.2 GHz CPU that means a resolution of about 0.45 nanoseconds. Clearly RDTSC is the winner.
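
You can check what your own system reports with clock_getres(); a minimal sketch:

#include <stdio.h>
#include <time.h>

int main()
{
  struct timespec res;
  clock_getres(CLOCK_MONOTONIC, &res);
  /* on my machine this reports 0 sec 1 nsec */
  printf("resolution: %ld sec %ld nsec\n", (long)res.tv_sec, res.tv_nsec);
  return 0;
}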


Accuracy

In my tests, both consistently gave the same results, agreeing with each other to within 5 nanoseconds. Since I have no other reference, I assume both are equally accurate. So, no winner here.
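
One way to reproduce such a cross-check is to bracket the same busy loop with both time sources and compare the deltas. A sketch, assuming the RDTSC helpers above (note that TimeSpecDiff() returns a static buffer, so each result must be consumed before the next call):

void CompareClocks()
{
  struct timespec hpetStart, hpetEnd, tscStart, tscEnd;
  long long hpetNsecs, tscNsecs;
  struct timespec *diff;

  clock_gettime(CLOCK_MONOTONIC, &hpetStart);
  GetRdtscTime(&tscStart);
  volatile int i;
  for (i = 0; i < 1000000; i++); /* region measured by both sources */
  GetRdtscTime(&tscEnd);
  clock_gettime(CLOCK_MONOTONIC, &hpetEnd);

  diff = TimeSpecDiff(&hpetEnd, &hpetStart);
  hpetNsecs = diff->tv_sec * 1000000000LL + diff->tv_nsec; /* consume before next call */
  diff = TimeSpecDiff(&tscEnd, &tscStart);
  tscNsecs = diff->tv_sec * 1000000000LL + diff->tv_nsec;
  printf("hpet: %lld nsec, rdtsc: %lld nsec, delta: %lld nsec\n",
         hpetNsecs, tscNsecs, hpetNsecs - tscNsecs);
}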


Cost

I ran a simple test case in which I measured the time taken for 1 million calls to each of HPET and RDTSC. Here is the result.

HPET : 1 sec 482 msec 188 usec 38 nsec
RDTSC: 0 sec 103 msec 311 usec 752 nsec

RDTSC is the clear winner in this case, being roughly 14 times cheaper than HPET.
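
The measurement loop for this test looks roughly like the sketch below (a reconstruction, not the exact test code; it assumes the helpers above, with the outer timing done via clock_gettime()):

void BenchmarkCalls()
{
  struct timespec begin, end, scratch;
  struct timespec *diff;
  int i;

  clock_gettime(CLOCK_MONOTONIC, &begin);
  for (i = 0; i < 1000000; i++)
    clock_gettime(CLOCK_MONOTONIC, &scratch); /* HPET path */
  clock_gettime(CLOCK_MONOTONIC, &end);
  diff = TimeSpecDiff(&end, &begin);
  printf("1M clock_gettime: %ld sec %ld nsec\n", (long)diff->tv_sec, diff->tv_nsec);

  clock_gettime(CLOCK_MONOTONIC, &begin);
  for (i = 0; i < 1000000; i++)
    GetRdtscTime(&scratch); /* RDTSC path */
  clock_gettime(CLOCK_MONOTONIC, &end);
  diff = TimeSpecDiff(&end, &begin);
  printf("1M GetRdtscTime: %ld sec %ld nsec\n", (long)diff->tv_sec, diff->tv_nsec);
}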


Reliability

A quick look at the Wikipedia entry for RDTSC gives a good idea of how unreliable it is. Many factors affect it, such as:

  • Multiple cores having different TSC values (we eliminated this by binding our process to one core)
  • CPU frequency scaling for power saving (we eliminated this by keeping the CPU busy during calibration)
  • Hibernation of the system resetting the TSC value (we didn't let our system hibernate)
  • Reduced portability due to varying implementations across CPUs (we ran only on the same Intel CPU)

So for application programming, RDTSC seems to be quite unreliable. The clock_gettime() API, by contrast, is a POSIX standard, making HPET the clear winner here.


Conclusion

The final score is RDTSC 2, HPET 1. But there is more to it than the score. RDTSC definitely has reliability and portability issues and may not be very useful for regular application programming. I was affected by CPU frequency scaling myself during these tests: in CalibrateTicks(), I initially used sleep(1) to sleep for one second while calibrating the number of ticks per nanosecond, and I got values ranging from 0.23 to 0.55 instead of 2.2 (or something very close to it, since mine is a 2.2 GHz CPU). Once I switched from sleep(1) to wasting CPU in a for loop, I got consistent readings of 2.198 ticks per nanosecond.
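
For illustration, the flawed sleep-based calibration looked roughly like this (a sketch; <unistd.h> is needed for sleep()):

/* Flawed: the CPU idles during sleep() and scales its frequency down,
   so far fewer ticks elapse than the wall-clock time would suggest. */
static void CalibrateTicksWithSleep()
{
  struct timespec begints, endts;
  clock_gettime(CLOCK_MONOTONIC, &begints);
  uint64_t begin = RDTSC();
  sleep(1); /* CPU is mostly idle for this second */
  uint64_t end = RDTSC();
  clock_gettime(CLOCK_MONOTONIC, &endts);
  struct timespec *tmpts = TimeSpecDiff(&endts, &begints);
  uint64_t nsecElapsed = tmpts->tv_sec * 1000000000LL + tmpts->tv_nsec;
  g_TicksPerNanoSec = (double)(end - begin) / (double)nsecElapsed; /* came out 0.23-0.55 */
}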

But RDTSC is 14 times cheaper than HPET. That can be useful for certain benchmarking exercises, as long as one is aware of its pitfalls and treats it with caution.