I often find myself working on multiple source code clones or views simultaneously, and I like to have a separate screen session for each view/clone. In this post, I attempt to achieve two things:

  • Make it easier to correct the DISPLAY variable after reconnecting to an existing screen session.
  • Display current clone name on screen status line.

Screen shot of screen status line

First, we decide on a suitable environment variable to hold the view/clone name; CLONE_NAME is good enough. Next, let's define a bash function that sets the view/clone for us. Run this function whenever you want to change the view/clone.

setclone ()
{
  if [ "$1" ]; then
    export CLONE_NAME=$1;
    SERVER_CLONE=`hostname`.$CLONE_NAME;
    export DISPLAY_FILE="$HOME/displays/$SERVER_CLONE";
  fi;
  echo clone=$CLONE_NAME;
}

The setclone function exports two variables. CLONE_NAME is the name of the view/clone that we are going to work on. DISPLAY_FILE is the name of an ASCII file where we will store the current $DISPLAY value before starting screen; we will use this later, from within screen. Now we define another function, clview (short for clone view), which actually starts the screen session.

clview ()
{
  if [ "$1" ]; then
    setclone $1;
  fi;
  rm -f $DISPLAY_FILE;
  echo $DISPLAY > $DISPLAY_FILE;
  screen -xR -S $CLONE_NAME;
}

This function writes the current DISPLAY value to $DISPLAY_FILE (which was set by setclone). If we had started a screen session earlier and are reconnecting to it now, the DISPLAY value held by the shells inside that screen session may be stale, so it has to be set up again. Instead of detaching from screen, finding the right DISPLAY value, reconnecting to screen and updating DISPLAY manually, the clview function puts the correct value of DISPLAY in the file $DISPLAY_FILE. Now we write another function that corrects the DISPLAY variable inside the screen session.

display ()
{
  echo old DISPLAY=$DISPLAY;
  export DISPLAY=`cat $DISPLAY_FILE`;
  echo new DISPLAY=$DISPLAY;
}
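
Putting the three functions together, a typical session looks something like this (a sketch: the clone name, host and DISPLAY values are placeholders, the functions are assumed to be defined in your shell, e.g. sourced from ~/.bashrc, and the ~/displays directory must exist because clview writes into it):

$ mkdir -p ~/displays    # one-time setup for the DISPLAY files
$ clview mytree          # sets CLONE_NAME, saves $DISPLAY, creates/attaches the screen session

# later, reconnecting from a new ssh login whose $DISPLAY is different
$ clview mytree          # overwrites ~/displays/<host>.mytree with the new $DISPLAY
# then, inside any screen window that still holds the stale value:
$ display
old DISPLAY=localhost:10.0
new DISPLAY=localhost:12.0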

The screen options used are

       -x   Attach to a not detached screen session. (Multi display mode).
       -R   attempts  to resume the first detached screen session it finds.  If successful,
            all other command-line options are ignored.  If no detached session exists,
            starts a new session using the specified options, just as if -R had not been
            specified. The option is set by default if screen is run as a login-shell
            (actually screen uses "-xRR" in that case).
       -S sessionname
            When creating a new session, this option can be used to specify a meaningful
            name for the session. This name identifies  the  session  for  "screen -list"
            and "screen -r" actions. It substitutes the default [tty.host] suffix.

One thing that can be useful is for screen to display the current view/clone name on the status bar. Put the following in the ~/.screenrc file for a nice status line. The backtick command runs the given command and makes its output available to the status line; here the refresh interval is set to 3600 seconds.

backtick 1 3600 3600 /bin/echo $CLONE_NAME
hardstatus alwayslastline "%{ck}%H: %{gk}%1` %?%{wk}%-Lw%?%{Yk}[%n*%f %t]%?%{kk}(%u)%?%?%{wk}%+Lw%?%=%{gk}%C %A %{Bk}%D, %M %d, %Y"

One side effect of the above status line is that it displays the current time. So if we use the mouse to scroll back in history and the minute changes, the status line needs to be redisplayed, and as part of the redisplay we end up back at the prompt. You can avoid this by removing “%{gk}%C %A ” (set green foreground on black background, time in 12-hour format, AM or PM) from the status line.
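
For reference, the same status line with the clock portion removed would be:

hardstatus alwayslastline "%{ck}%H: %{gk}%1` %?%{wk}%-Lw%?%{Yk}[%n*%f %t]%?%{kk}(%u)%?%?%{wk}%+Lw%?%=%{Bk}%D, %M %d, %Y"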

I used to use Xming as an X server on my Windows 7 PC. I started looking for other X servers for Windows because of a particularly irritating Xming bug: sometimes Xming interferes with the clipboard, so that cut/copy/paste no longer works. The bug blocks the clipboard not only for X applications but also for native Windows applications. Quitting Xming doesn’t get the clipboard back either; a residual process keeps running, continues to block the clipboard, and has to be killed manually from the Task Manager.

Finally, I found an alternative that is as capable as Xming but doesn’t block the clipboard from time to time. Cygwin comes with an X server, among many other things. Grab your copy here. It is an installer stub and will download additional components from the internet based on what you select for installation. Cygwin/X, which provides the X server, is not selected for installation by default; you can select it manually by picking the xinit package from the X11 section as shown. Cygwin has so many packages that it is easier to search for the one you want.

Screenshot of Cygwin X Server Installation

Cygwin installs a whole bunch of standard Unixy stuff; it is, after all, meant to provide a Unix-style environment natively on Windows. Even if you don’t use anything else from Cygwin, it is worth it just for the X server. I have been using the Cygwin X server for over a year now and it has never failed me. I also put a shortcut to the Cygwin X server in my Startup folder so that it starts automatically whenever my PC boots.
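
The shortcut target on my setup looks something like the line below (a sketch: C:\cygwin is the default install path, and the options mirror the ones I pass to Xming later in this post; the ready-made “XWin Server” shortcut that the xinit package creates works just as well).

"C:\cygwin\bin\XWin.exe" :0 -multiwindow -clipboard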

I do most of my coding at work on a Linux server. My laptop at work runs Windows 7 and I can’t do much about it. So enter PuTTY – the ubiquitous terminal emulator for Windows. But I miss the eye candy of gnome-terminal, and PuTTY intentionally doesn’t support transparency or background images. So enter KiTTY – the slightly unstable and bloated PuTTY derivative that supports transparency, background images, automatic passwords, and executing commands on the remote machine, among other things.

Get your copy of KiTTY here. Now let’s enable transparency and other eye candy. By default KiTTY doesn’t display the options for setting background images; we can enable them in the kitty.ini file. On my Windows 7 PC that file is located at

 C:\Users\user_name\AppData\Roaming\KiTTY\kitty.ini.

Set this value to yes in kitty.ini to enable the background image settings in KiTTY.

backgroundimage=yes

The option will be available under the category “Window” as shown. Select your favorite image as the background and set a comfortable transparency level. Different profiles can have different images and other settings.

Screenshot of setting background image and transparency in KiTTY

If you want to control KiTTY's transparency dynamically, set these values in kitty.ini:

transparency=yes
transparencyvalue=0

If you are already setting a background image and a transparency level, these options are not needed; they are only for changing the transparency dynamically. Ctrl + Up/Down increases/decreases the transparency of the current frame.

Something that bothered me next was that KiTTY was capturing some of my key chords (Ctrl + arrow keys) instead of passing them on to the application on the remote server (screen, in my case). Disabling shortcuts in kitty.ini solves this problem (you do lose all KiTTY shortcuts, which I don’t miss).

shortcuts=no

You can set post-login commands that will be executed on the remote machine after login. This is useful for things that cannot be done in .bashrc, like becoming root. A command like this

"\psu\n\proot_pass\n"

would do the trick. The command is to be entered in the “Data” section of the configuration as shown. For details of what “\p” and the other escape sequences stand for, check here.

Screenshot of KiTTY setting post login auto-commands

Even though KiTTY can save your login and password, I prefer using username@host.com together with a PuTTY private key file. Use PuTTYgen to generate a private/public key pair. Put the public key on its own line in the ~/.ssh/authorized_keys file on the remote machine, and give the path of the private key to KiTTY under “SSH->Auth” as shown. More details can be found here. It is worth the effort of creating a key pair for two reasons: 1) it is more secure, and 2) authentication still works even when the server is slow to respond. When auto-sending a saved password, KiTTY simply waits a fixed amount of time and then pushes the password; if the server is slow to respond, the password is lost and has to be entered manually. A private key (ppk file) doesn’t have this problem, since authentication is triggered by the server’s request and not by a predetermined timer.
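
On the remote Linux machine, installing the public key boils down to something like this (the key string below is a placeholder; paste the single-line OpenSSH-format key that PuTTYgen displays in its window):

mkdir -p ~/.ssh && chmod 700 ~/.ssh
echo "ssh-rsa AAAA...rest-of-key... you@your-laptop" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys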

Screenshot of KiTTY setting private key for authentication

Also, if you plan to view X applications over the session, enable X forwarding (reverse ssh tunneling) under “SSH->X11” as shown.

Screenshot of KiTTY enabling X forwarding (reverse SSH tunneling)

Most of the time I use Linux on my desktop and use its default X server for remote X clients. On the rare occasion of having to use Windows to ssh to a Linux server, my preferred X server is Xming. By default, the fonts displayed by the X server on Linux are much more readable than the tiny sizes displayed by Xming. My initial solution was to increase the font sizes of the few applications (gvim, gnome-terminal) I typically used, but once I switched back to the Linux desktop, the fonts were too big.

Xming properties screenshot

There is a better solution. By default Xming uses a DPI of 96. We can increase this value to get bigger font sizes (not only fonts, everything scales proportionately). I found anything in the range of 108-112 DPI to be appropriate. Xming takes the DPI setting as a command line parameter.

Right click on the Xming shortcut icon and select Properties. You should see a dialog box similar to the screenshot. Edit the “Target” field (see screenshot) and add -dpi 108 to set the DPI to 108.

The final command should look something like

"C:\Program Files (x86)\Xming\Xming.exe" :0 -clipboard -multiwindow -dpi 108

Xming has a whole bunch of other options too. You can see the full list by running

"C:\Program Files (x86)\Xming\Xming.exe" -h

A Notepad window listing all the options should pop up.

EDIT: If you are having trouble with Xming hijacking the clipboard, you can give Cygwin X Server a try.

There is a wide choice of programming languages catering to diverse application domains. This post is an attempt at classifying them based on how much the language/implementation abstracts machine details from the programmer.

We can broadly classify languages as

  • Assembly language (x86 assembler AT&T syntax or Intel syntax)
  • Compiled (C)
  • Compiled to byte code of VM and compiled just in time (Java2)
  • Compiled to byte code of VM and interpreted (Python)
  • String interpreted (TCL)

Assembly Languages

Such languages need the least amount of work to make them executable. Assembly languages like x86 assembly are closely tied to the machine code: they are basically a one-to-one mapping from machine instructions to human-readable strings called mnemonics. Assemblers also make it easy to assign labels to address locations, which eases the programming effort, and modern assemblers support macros that make writing repetitive code easier. There is little scope for optimization or higher-order data structures in assembly languages, since they just mirror the target machine code. Such languages are the least portable, since they are always tied to a target machine architecture. A few examples are x86 assembly, Motorola 6800 assembly, etc.

Compiled Languages

Such languages are a step taken for portability. They abstract the underlying machine instructions and give higher-order constructs for arithmetic operations, branching, looping and basic data types. They are compiled directly into target machine code, and once compiled, the binary runs natively on the target machine. Arguably they are slower than assembly, since the machine code is generated by a compiler and will not be as optimized as hand-coded assembly; but in practice, for any modern processor with multiple cores and pipelining, the compiler tends to generate better optimized code. Such languages can support higher-order data structures like lists, maps, etc., either natively or via standard libraries. C, C++ and Pascal are some examples of compiled languages.

Compiled to Byte Code of a VM (JIT compiled)

These languages are another step towards portability. First they are compiled to the byte code of a virtual machine; the virtual machine then executes the byte code by compiling it into native machine code just in time (JIT). In practice these are slower than compiled languages, since there is another layer of abstraction, but they are more portable: the same compiled byte code can run on any platform that supports the VM, whereas compiled languages require different binaries for different platforms. Typically they support a full range of higher-order data structures. Examples include Java2 and Ruby core 1.9.

Compiled to Byte Code of a VM (Interpreted)

The first step for these languages is the same as above: they are compiled to the byte code of a virtual machine. The virtual machine then executes the byte code by interpreting it. They are generally slower than JIT implementations, but have the same portability. Performance also depends on the optimization effort that has gone into the VM implementation. For example, Python2 (byte code interpreted) is faster than Ruby core 1.9 (JIT), while Java2 (JIT) is way faster than compiled Lisp (SBCL). Examples include Python and Ruby.

String Interpreted

Such languages interpret the source code string directly, which usually makes them the slowest of the lot. Consider the statement a = 100 + 2. The 100 and 2 are strings, and instead of performing the addition 100 + 2 natively, the interpreter knows how to add integers represented as strings. The interpreter is easier to implement than byte code compilation, but the performance is the lowest. TCL and JavaScript are examples of string interpreted languages.

We can see a pattern emerging. From assembly to string interpretation, the language/implementation abstracts machine details more and more from the programmer. As a result performance keeps decreasing while portability keeps increasing. Beyond a point, performance decreases without any increase in portability, but implementation becomes easier. Also more abstracted languages usually provide higher order data structures and automatic memory management for free.

Also, the level of abstraction really depends on the implementation rather than the language itself. For example, Python2 is both byte code interpreted (the official CPython) and JIT compiled (PyPy). The variation is not limited to the VM either: Common Lisp, for example, has a natively compiled implementation (SBCL) as well as an implementation that compiles to C or interprets byte code (ECL).

Picture of Dedicated Mute Button on a BlackBerry

How many times have we pushed the mute button during a teleconference at work? To discuss strategy before committing? To hide uncontrollable laughter? To curse the other end of the line? All with the comfort of the red LED indicating that the phone is on mute and the other party cannot hear us.

Let me illustrate the importance of the mute button. What makes a good business phone? Great email synchronization including pushmail? A full QWERTY keypad (real or virtual)? Advanced encryption for secure communication? Lacking anything that is remotely fun? Well, one thing is for sure: BlackBerry, the specialist in business phones, got the dedicated mute button right.

Take other teleconferencing equipment. Most devices have a central console and multiple (typically 3) extensions with just a microphone, so that everyone in a boardroom-sized conference can speak. The only button on the extensions is the dedicated mute button. The only LED indicator on the extensions is the mute indicator.

What would happen if the mute button stops working? Even worse, what would happen if the red LED shone brightly but the phone isn’t muted? Given the right (or wrong) time, this would be enough to lose contracts? Strain relations? Get people fired? Split a company? Corporate espionage anyone?

PS: As I was searching for a suitable picture for this post, I came across this. It describes an incident with a broken mute button, and the comments report more such incidents.

PPS: I have used a total of 17 question marks ‘?’ including this one in this post. All sentences in the last paragraph end with a question mark.

Recently, I decided to take the MIT OCW Algorithms course, and I wanted to actually measure the performance of various algorithms. So before I dived into it, I decided to come up with a setup for measuring the time taken. For this, we need high precision time measurement. I have used the Read Time Stamp Counter (RDTSC) instruction, introduced with the Pentium processors, before. I have also heard about the High Precision Event Timer (HPET) introduced by Intel circa 2005. In this post we have a shootout between the two mechanisms.

The metrics we want to compare are

  • Resolution
  • Accuracy
  • Cost (in terms of CPU time)
  • Reliability

Before we get into the actual testing, let us understand how to use HPET and RDTSC. Here is how we use HPET, through the standard POSIX clock_gettime() interface.

#include <time.h> /* for clock_gettime() and struct timespec */

void TestHpet(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts); /* ts now holds seconds and nanoseconds */
}

And here is how we use the RDTSC instruction. With RDTSC, we read the number of elapsed CPU clock cycles from a counter (the Time Stamp Counter), which increments on every CPU clock. This does not directly translate to actual time: we first calibrate the number of CPU cycles per nanosecond and then divide the tick count by this calibrated value to get nanoseconds. Since the TSC is not guaranteed to be synchronized across CPUs, we bind our process to CPU 1 (I have a dual-core Intel T7500 CPU) to eliminate TSC mismatches between the two cores.

#define _GNU_SOURCE /* for sched_setaffinity() and the CPU_* macros */
#include <sched.h>  /* for sched_setaffinity() */
#include <stdint.h> /* for uint64_t */
#include <time.h>   /* for struct timespec and clock_gettime() */

/* assembly code to read the TSC */
static inline uint64_t RDTSC()
{
  unsigned int hi, lo;
  __asm__ volatile("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t)hi << 32) | lo;
}

const int NANO_SECONDS_IN_SEC = 1000000000;
/* returns a static buffer of struct timespec with the time difference of ts1 and ts2
   ts1 is assumed to be greater than ts2 */
struct timespec *TimeSpecDiff(struct timespec *ts1, struct timespec *ts2)
{
  static struct timespec ts;
  ts.tv_sec = ts1->tv_sec - ts2->tv_sec;
  ts.tv_nsec = ts1->tv_nsec - ts2->tv_nsec;
  if (ts.tv_nsec < 0) {
    ts.tv_sec--;
    ts.tv_nsec += NANO_SECONDS_IN_SEC;
  }
  return &ts;
}

double g_TicksPerNanoSec;
static void CalibrateTicks()
{
  struct timespec begints, endts;
  uint64_t begin = 0, end = 0;
  clock_gettime(CLOCK_MONOTONIC, &begints);
  begin = RDTSC();
  volatile uint64_t i; /* volatile so the compiler cannot optimize the loop away */
  for (i = 0; i < 1000000; i++); /* must be CPU intensive, not a sleep (see conclusion) */
  end = RDTSC();
  clock_gettime(CLOCK_MONOTONIC, &endts);
  struct timespec *tmpts = TimeSpecDiff(&endts, &begints);
  uint64_t nsecElapsed = tmpts->tv_sec * 1000000000LL + tmpts->tv_nsec;
  g_TicksPerNanoSec = (double)(end - begin)/(double)nsecElapsed;
}

/* Call once before using RDTSC, has side effect of binding process to CPU1 */
void InitRdtsc()
{
  cpu_set_t cpuMask;
  CPU_ZERO(&cpuMask);
  CPU_SET(1, &cpuMask); /* bind to CPU 1 */
  sched_setaffinity(0, sizeof(cpuMask), &cpuMask);
  CalibrateTicks();
}

void GetTimeSpec(struct timespec *ts, uint64_t nsecs)
{
  ts->tv_sec = nsecs / NANO_SECONDS_IN_SEC;
  ts->tv_nsec = nsecs % NANO_SECONDS_IN_SEC;
}

/* ts will be filled with time converted from TSC reading */
void GetRdtscTime(struct timespec *ts)
{
  GetTimeSpec(ts, RDTSC() / g_TicksPerNanoSec);
}
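
To tie the pieces together, here is a minimal driver of my own (a sketch: it assumes the functions above are in the same file, and the file name, the loop being timed and the output format are arbitrary). On a system of that vintage (glibc before 2.17), clock_gettime() lives in librt, so link with -lrt, e.g. gcc -O2 timers.c -lrt -o timers.

#include <stdio.h>

int main(void)
{
  struct timespec h1, h2, r1, r2, *diff;
  volatile uint64_t sum = 0;
  uint64_t i;

  InitRdtsc(); /* bind to CPU 1 and calibrate ticks per nanosecond */

  clock_gettime(CLOCK_MONOTONIC, &h1);
  GetRdtscTime(&r1);

  for (i = 0; i < 10000000; i++) /* some work to be timed */
    sum += i;

  clock_gettime(CLOCK_MONOTONIC, &h2);
  GetRdtscTime(&r2);

  diff = TimeSpecDiff(&h2, &h1);
  printf("HPET : %ld sec %ld nsec\n", (long)diff->tv_sec, diff->tv_nsec);
  diff = TimeSpecDiff(&r2, &r1);
  printf("RDTSC: %ld sec %ld nsec\n", (long)diff->tv_sec, diff->tv_nsec);
  return 0;
}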

Now back to our metrics. This is how each mechanism fares.

Resolution

The HPET API, clock_gettime(), returns the result in a struct timespec, whose maximum granularity is nanoseconds. That is only what struct timespec can represent; the actual resolution varies with the implementation and can be queried through clock_getres(). On my Dell XPS 1530 with an Intel Core 2 Duo T7500 CPU running Ubuntu 10.04, the reported resolution is 1 nanosecond. The RDTSC instruction, on the other hand, can resolve down to a single CPU clock cycle, which on my 2.2 GHz CPU works out to about 0.45 nanoseconds. Clearly RDTSC is the winner.
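
Querying the advertised resolution is a small snippet that can be dropped into the test program (the value in the comment is what my machine reports):

struct timespec res;
clock_getres(CLOCK_MONOTONIC, &res);
printf("resolution: %ld sec %ld nsec\n", (long)res.tv_sec, res.tv_nsec); /* 0 sec 1 nsec here */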

Accuracy

From my tests, both consistently gave the same results, agreeing with each other to within 5 nanoseconds. Since I have no other reference, I assume both are equally accurate. So, no winner here.

Cost

I ran a simple test case where I measured the time taken for 1 million calls to both HPET and RDTSC. And here is the result.

HPET : 1 sec 482 msec 188 usec 38 nsec
RDTSC: 0 sec 103 msec 311 usec 752 nsec

RDTSC is the clear winner in this case by being 14 times cheaper than HPET.
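
The measurement loop itself is nothing fancy; it looked roughly like the sketch below (it assumes the helper functions defined earlier in the post plus <stdio.h>, and that InitRdtsc() has already been called).

#define NUM_CALLS 1000000

void MeasureCost(void)
{
  struct timespec begin, end, ts, *diff;
  uint64_t i;

  clock_gettime(CLOCK_MONOTONIC, &begin);
  for (i = 0; i < NUM_CALLS; i++)
    clock_gettime(CLOCK_MONOTONIC, &ts); /* HPET path */
  clock_gettime(CLOCK_MONOTONIC, &end);
  diff = TimeSpecDiff(&end, &begin);
  printf("HPET : %ld sec %ld nsec\n", (long)diff->tv_sec, diff->tv_nsec);

  clock_gettime(CLOCK_MONOTONIC, &begin);
  for (i = 0; i < NUM_CALLS; i++)
    GetRdtscTime(&ts); /* RDTSC path */
  clock_gettime(CLOCK_MONOTONIC, &end);
  diff = TimeSpecDiff(&end, &begin);
  printf("RDTSC: %ld sec %ld nsec\n", (long)diff->tv_sec, diff->tv_nsec);
}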

Reliability

A quick look at the Wikipedia entry for RDTSC gives an idea of how unreliable it can be. Many factors affect it:

  • Multiple cores having different TSC values (we eliminated this by binding our process to 1 core)
  • CPU frequency scaling for power saving (we eliminated this by always being CPU intensive)
  • Hibernation of system will reset TSC value (we didn’t let our system hibernate)
  • Impact on portability due to varying implementation of CPUs (we ran only on the same Intel CPU)

So for application programming, RDTSC seems quite unreliable. HPET, accessed through a standard POSIX interface, is the clear winner.

Conclusion

The final score is RDTSC 2, HPET 1. But there is more to this. RDTSC definitely has reliability and portability issues and may not be very useful for regular application programming. I was myself affected by CPU frequency scaling during my tests: in CalibrateTicks(), I initially used sleep(1) to wait for one second while calibrating the number of ticks per nanosecond, and got values ranging from 0.23 to 0.55 instead of something close to 2.2 (my CPU runs at 2.2 GHz). Once I switched from sleep(1) to wasting CPU in a for loop, I got consistent readings of 2.198 ticks per nanosecond.

But RDTSC is 14 times cheaper than HPET. This can be useful for certain benchmarking exercises as long as one is aware of its pitfalls and is cautious.