Posts Tagged ‘Linux’

I do most of my coding at work on a Linux server. My laptop at work runs Windows 7 and I can’t do much about it. So enter PuTTY – the ubiquitous terminal emulator for Windows. But I miss the eye candy of GNOME Terminal. PuTTY intentionally doesn’t support transparency or background images. So enter KiTTY – a slightly unstable and bloated PuTTY derivative that supports transparency, background images, automatic passwords, and executing commands on the remote machine, among other things.

Get your copy of KiTTY here. Now let’s enable transparency and other eye candy. By default, KiTTY doesn’t display options for setting background images. We can enable them by setting a value in the kitty.ini file. That file is located on my Windows 7 PC at


Set this value to yes for KiTTY to enable setting background images.
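From my memory of the KiTTY documentation (verify the exact section and key name against your own kitty.ini, since these are recalled rather than taken from the original post), the switch looks like this:

```ini
[KiTTY]
; enables the background-image settings in the configuration box
backgroundimage=yes
```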


The option will be available under the category “Window” as shown. Select your favorite image as the background and set a comfortable transparency level. Different profiles can have different images and other settings.

Screenshot of setting background image and transparency in KiTTY

If you want to dynamically control transparency of KiTTY, set this variable in kitty.ini
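Again going from my memory of the KiTTY documentation (the key name is an assumption; check your kitty.ini), the variable is:

```ini
[KiTTY]
; enables dynamic transparency control with Ctrl + Up/Down
transparency=yes
```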


If you are setting a static background image and transparency, this option is not needed; it is only for changing transparency dynamically. Ctrl + Up/Down increases/decreases the transparency of the current frame.

Something that bothered me at this point was that KiTTY was capturing some of my key chords (Ctrl + arrow keys) instead of passing them on to the application on the remote server (screen in my case). Disabling shortcuts in kitty.ini solves this problem (you do lose all KiTTY shortcuts, which I don’t miss).
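As best I recall from the KiTTY documentation (treat the key name as an assumption and confirm it against your kitty.ini), the setting is:

```ini
[KiTTY]
; disables all KiTTY keyboard shortcuts so key chords reach the remote application
shortcuts=no
```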


You can set post-login commands that will be executed on the remote machine after login. This is useful for things that cannot be done in .bashrc, like becoming root. A command like this


would do the trick. The command is to be entered in the “Data” section of the config as shown. For details on what “\p” and the other escape sequences stand for, check here.
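As an illustration only (the exact escape-sequence semantics are in the linked KiTTY reference, and this shape is my guess, not the original post’s command), an auto-command to become root might look something like:

```ini
; hypothetical KiTTY auto-command: run su, pause for the prompt (\p per the
; linked reference), then send a return (\n)
su -\p\n
```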

Screenshot of KiTTY setting post login auto-commands

Even though KiTTY can save your login and password, I prefer using a PuTTY private key file. Use PuTTYgen to generate a private/public key pair. Put the public key on its own line in the ~/.ssh/authorized_keys file on the remote machine. Give the path of the private key to KiTTY under “SSH->Auth” as shown. More details can be found here. It is worth the effort of creating a key pair for two reasons: 1) it is more secure; 2) even if the server takes time to respond, authentication will still happen with the ppk file. KiTTY waits a certain amount of time and then just pushes the password; if the server is slow to respond, the password sent is lost and has to be entered manually. A private key (ppk file) doesn’t have this problem, since authentication is triggered by the server’s request, not a predetermined timer.

Screenshot of KiTTY setting private key for authentication

Also, if you plan to view X applications over the session, enable X forwarding (reverse SSH tunneling) under “SSH->X11” as shown.

Screenshot of KiTTY enabling X forwarding (reverse SSH tunneling)

Recently, I decided to take the MIT OCW Algorithms course. I wanted to actually measure the performance of various algorithms, so before I dove into it, I decided to come up with a setup for measuring the time taken. For this, we need high-precision time measurement. I have used the Read Time Stamp Counter (RDTSC) instruction, introduced with Pentium processors, before. I have also heard about the High Precision Event Timer (HPET), introduced by Intel circa 2005. In this post we have a shootout between the two mechanisms.

The metrics we want to compare are

  • Resolution
  • Accuracy
  • Cost (in terms of CPU time)
  • Reliability

Before we get into the actual testing, let us understand how to use HPET and RDTSC. Here is how we use HPET, which is exposed through the POSIX clock_gettime() API.

#include <time.h>

struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);

And here is how we use the RDTSC instruction. With RDTSC, we read the number of CPU clock cycles from a counter (the Time Stamp Counter), which increments on every CPU clock. This does not directly translate to actual time: we have to calibrate the number of CPU cycles per nanosecond and divide the tick count by this calibrated value to get nanoseconds. Since it is not guaranteed that the TSC is synchronized across CPUs, we bind our process to CPU1 (I have a dual-core Intel T7500 CPU) to eliminate TSC mismatch between the two cores.

#define _GNU_SOURCE /* for sched_setaffinity() */
#include <sched.h>  /* for sched_setaffinity(), cpu_set_t */
#include <stdint.h> /* for uint64_t */
#include <time.h>   /* for struct timespec, clock_gettime() */

/* assembly code to read the TSC */
static inline uint64_t RDTSC()
{
  unsigned int hi, lo;
  __asm__ volatile("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t)hi << 32) | lo;
}

const int NANO_SECONDS_IN_SEC = 1000000000;

/* returns a static buffer of struct timespec with the time difference of ts1 and ts2
   ts1 is assumed to be greater than ts2 */
struct timespec *TimeSpecDiff(struct timespec *ts1, struct timespec *ts2)
{
  static struct timespec ts;
  ts.tv_sec = ts1->tv_sec - ts2->tv_sec;
  ts.tv_nsec = ts1->tv_nsec - ts2->tv_nsec;
  if (ts.tv_nsec < 0) {
    ts.tv_sec--; /* borrow a second when the nanosecond difference underflows */
    ts.tv_nsec += NANO_SECONDS_IN_SEC;
  }
  return &ts;
}

double g_TicksPerNanoSec;

static void CalibrateTicks()
{
  struct timespec begints, endts;
  uint64_t begin = 0, end = 0;
  clock_gettime(CLOCK_MONOTONIC, &begints);
  begin = RDTSC();
  volatile uint64_t i;
  for (i = 0; i < 1000000; i++); /* must be CPU intensive */
  end = RDTSC();
  clock_gettime(CLOCK_MONOTONIC, &endts);
  struct timespec *tmpts = TimeSpecDiff(&endts, &begints);
  uint64_t nsecElapsed = tmpts->tv_sec * 1000000000LL + tmpts->tv_nsec;
  g_TicksPerNanoSec = (double)(end - begin) / (double)nsecElapsed;
}

/* Call once before using RDTSC; has the side effect of binding the process to CPU1 */
void InitRdtsc()
{
  cpu_set_t cpuMask;
  CPU_ZERO(&cpuMask);
  CPU_SET(1, &cpuMask); /* bind to CPU1 */
  sched_setaffinity(0, sizeof(cpuMask), &cpuMask);
  CalibrateTicks();
}

void GetTimeSpec(struct timespec *ts, uint64_t nsecs)
{
  ts->tv_sec = nsecs / NANO_SECONDS_IN_SEC;
  ts->tv_nsec = nsecs % NANO_SECONDS_IN_SEC;
}

/* ts will be filled with time converted from the TSC reading */
void GetRdtscTime(struct timespec *ts)
{
  GetTimeSpec(ts, (uint64_t)(RDTSC() / g_TicksPerNanoSec));
}

Now back to our metrics. This is how each mechanism fares.

Resolution
The HPET API clock_gettime() gives its result in a struct timespec, whose maximum granularity is nanoseconds. That is merely what struct timespec can represent; the actual resolution varies by implementation, and we can query it through the clock_getres() API. On my Dell XPS 1530 with an Intel Core 2 Duo T7500 CPU running Ubuntu 10.04, it reports a resolution of 1 nanosecond. The RDTSC instruction, on the other hand, can have a resolution of up to one CPU clock period; on my 2.2 GHz CPU that means 0.45 nanoseconds. Clearly RDTSC is the winner.

Accuracy
In my tests, both consistently gave the same results, agreeing with each other to within 5 nanoseconds. Since I have no other reference, I assume both are equally accurate. So no winner here.

Cost
I ran a simple test case where I measured the time taken for 1 million calls to both HPET and RDTSC. And here is the result.

HPET : 1 sec 482 msec 188 usec 38 nsec
RDTSC: 0 sec 103 msec 311 usec 752 nsec

RDTSC is the clear winner in this case by being 14 times cheaper than HPET.

Reliability
Well, a quick look at the Wikipedia entry for RDTSC will give us an idea of how unreliable it is. Many factors affect it, such as:

  • Multiple cores having different TSC values (we eliminated this by binding our process to 1 core)
  • CPU frequency scaling for power saving (we eliminated this by always being CPU intensive)
  • Hibernation of system will reset TSC value (we didn’t let our system hibernate)
  • Impact on portability due to varying implementation of CPUs (we ran only on the same Intel CPU)

So for application programming, RDTSC seems quite unreliable. HPET, accessed through the POSIX-standard clock_gettime() API, is the clear winner.

Conclusion
The final score is RDTSC 2, HPET 1. But there is more to it than that. RDTSC definitely has reliability and portability issues and may not be very useful for regular application programming. I was myself affected by CPU frequency scaling during my tests: in CalibrateTicks(), I initially used sleep(1) to sleep for one second while calibrating the number of ticks per nanosecond, and I got values ranging from 0.23 to 0.55 instead of 2.2 (or something very close to it, since my CPU runs at 2.2 GHz). Once I switched from sleep(1) to wasting CPU in a for loop, it gave me consistent readings of 2.198 ticks per nanosecond.

But RDTSC is 14 times cheaper than HPET. This can be useful for certain benchmarking exercises as long as one is aware of its pitfalls and is cautious.