Archive for September, 2010

Picture of Dedicated Mute Button on a BlackBerry

How many times have we pushed the mute button during a teleconference at work? To discuss strategy before committing? To hide uncontrollable laughter? To curse the other end of the line? All with the comfort of the red LED indicating that the phone is on mute and the other party cannot hear us.

Let me illustrate the importance of the mute button. What makes a good business phone? Great email synchronization including push mail? Full QWERTY keypad (real or virtual)? Advanced encryption for secure communication? Lack of anything that is remotely fun? Well, one thing is for sure: BlackBerry, the specialist in business phones, got the dedicated mute button right.

Take other equipment for teleconferencing. Most devices have a central console and multiple (typically 3) extensions having just a microphone so that all the people in a boardroom sized conference can speak. The only button on the extensions is the dedicated mute button. The only LED indicator on all extensions is the mute indicator.

What would happen if the mute button stops working? Even worse, what would happen if the red LED shone brightly but the phone isn’t muted? Given the right (or wrong) time, this would be enough to lose contracts? Strain relations? Get people fired? Split a company? Corporate espionage anyone?

PS: As I was searching for a suitable picture for this post, I came across this. It describes an incident of a broken mute button, and the comments report more such incidents.

PPS: I have used a total of 17 question marks ‘?’ including this one in this post. All sentences in the last paragraph end with a question mark.

Recently, I decided to take the MIT OCW Algorithms course. I wanted to actually measure the performance of various algorithms. So before I dived into it, I decided to come up with a setup for measuring the time taken. For this, we need high precision time measurement. I have used the Read Time Stamp Counter (RDTSC) instruction, introduced in Pentium processors, before. I have heard about High Precision Event Timers (HPET), introduced by Intel circa 2005. In this post we have a shootout between the two mechanisms.

The metrics we want to compare are

  • Resolution
  • Accuracy
  • Cost (in terms of CPU time)
  • Reliability

Before we get into the actual testing, let us understand how to use HPET and RDTSC. Here is how we use HPET, through clock_gettime(), which is a POSIX standard API.

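#include <time.h> /* for clock_gettime() and struct timespec */
void TestHpet(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts); /* ts now holds the current monotonic time */
}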

And here is how we use the RDTSC instruction. With RDTSC, we actually read the number of CPU clock cycles from a counter (the Time Stamp Counter). This counter keeps incrementing on each CPU clock. It does not directly translate to actual time. To get nanoseconds, we calibrate the number of CPU cycles per nanosecond and divide the clock ticks by this calibrated value. Since it is not guaranteed that the TSC value will be synchronized across CPUs, we bind our process to CPU1 (I have a dual core Intel T7500 CPU) to eliminate TSC mismatch between the two CPU cores.

#define _GNU_SOURCE /* for sched_setaffinity() and the CPU_* macros */
#include <sched.h>  /* for sched_setaffinity() */
#include <stdint.h> /* for uint64_t */
#include <time.h>   /* for struct timespec and clock_gettime() */

/* assembly code to read the TSC */
static inline uint64_t RDTSC(void)
{
  unsigned int hi, lo;
  __asm__ volatile("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t)hi << 32) | lo;
}

const int NANO_SECONDS_IN_SEC = 1000000000;
/* returns a static buffer of struct timespec with the time difference of ts1 and ts2
   ts1 is assumed to be greater than ts2 */
struct timespec *TimeSpecDiff(struct timespec *ts1, struct timespec *ts2)
{
  static struct timespec ts;
  ts.tv_sec = ts1->tv_sec - ts2->tv_sec;
  ts.tv_nsec = ts1->tv_nsec - ts2->tv_nsec;
  if (ts.tv_nsec < 0) {
    ts.tv_sec--;
    ts.tv_nsec += NANO_SECONDS_IN_SEC;
  }
  return &ts;
}

double g_TicksPerNanoSec;
static void CalibrateTicks(void)
{
  struct timespec begints, endts;
  uint64_t begin = 0, end = 0;
  volatile uint64_t i; /* volatile so the busy loop is not optimized away */
  clock_gettime(CLOCK_MONOTONIC, &begints);
  begin = RDTSC();
  for (i = 0; i < 1000000; i++); /* must be CPU intensive */
  end = RDTSC();
  clock_gettime(CLOCK_MONOTONIC, &endts);
  struct timespec *tmpts = TimeSpecDiff(&endts, &begints);
  uint64_t nsecElapsed = tmpts->tv_sec * 1000000000LL + tmpts->tv_nsec;
  g_TicksPerNanoSec = (double)(end - begin)/(double)nsecElapsed;
}

/* Call once before using RDTSC, has side effect of binding process to CPU1 */
void InitRdtsc(void)
{
  cpu_set_t cpuMask;
  CPU_ZERO(&cpuMask);
  CPU_SET(1, &cpuMask); /* bind to cpu 1 */
  sched_setaffinity(0, sizeof(cpuMask), &cpuMask);
  CalibrateTicks();
}

/* splits a nanosecond count into the seconds and nanoseconds of a timespec */
void GetTimeSpec(struct timespec *ts, uint64_t nsecs)
{
  ts->tv_sec = nsecs / NANO_SECONDS_IN_SEC;
  ts->tv_nsec = nsecs % NANO_SECONDS_IN_SEC;
}

/* ts will be filled with time converted from TSC reading */
void GetRdtscTime(struct timespec *ts)
{
  GetTimeSpec(ts, (uint64_t)(RDTSC() / g_TicksPerNanoSec));
}

Now back to our metrics. This is how each mechanism fares.

Resolution

The HPET API, clock_gettime(), gives the result in a struct timespec, whose finest granularity is nanoseconds. That is only what struct timespec can represent; the actual resolution depends on the implementation and can be queried through clock_getres(). On my Dell XPS 1530 with an Intel Core 2 Duo T7500 CPU running Ubuntu 10.04, it reports a resolution of 1 nanosecond. The RDTSC instruction, on the other hand, can have a resolution of up to one CPU clock period. On my 2.2 GHz CPU that means a resolution of 0.45 nanoseconds. Clearly RDTSC is the winner.
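As a minimal sketch of how the two resolution figures can be obtained, the snippet below queries clock_getres() for CLOCK_MONOTONIC and computes the period of one tick for a given clock frequency. The function names MonotonicResolutionNsec() and TickPeriodNsec() are my own, chosen for illustration, and are not part of the measurement code above.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_getres() */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: resolution of CLOCK_MONOTONIC in nanoseconds as
   reported by clock_getres(), or 0 if the call fails. */
uint64_t MonotonicResolutionNsec(void)
{
  struct timespec res;
  if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
    return 0;
  return (uint64_t)res.tv_sec * 1000000000ULL + (uint64_t)res.tv_nsec;
}

/* Hypothetical helper: period of one tick in nanoseconds for a counter
   running at the given frequency in GHz (1 GHz is one tick per ns). */
double TickPeriodNsec(double ghz)
{
  assert(ghz > 0.0);
  return 1.0 / ghz;
}
```

Calling TickPeriodNsec(2.2) gives roughly 0.45, which is the figure quoted for my CPU. What MonotonicResolutionNsec() returns depends on the kernel and the clock source in use.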

Accuracy

From my tests, both gave consistent results, agreeing with each other to within 5 nanoseconds. Since I have no other reference, I assume both are equally accurate. So no winner.

Cost

I ran a simple test case where I measured the time taken for 1 million calls to both HPET and RDTSC. And here is the result.

HPET : 1 sec 482 msec 188 usec 38 nsec
RDTSC: 0 sec 103 msec 311 usec 752 nsec

RDTSC is the clear winner in this case by being 14 times cheaper than HPET.
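The cost test can be sketched roughly as follows for the clock_gettime() side. This is not the exact harness I used; the names TimeSpecToNsec() and MeasureClockGettimeCost() are assumptions made for this illustration. The same loop with the RDTSC read in its body would give the other number.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: flattens a struct timespec into nanoseconds. */
uint64_t TimeSpecToNsec(const struct timespec *ts)
{
  return (uint64_t)ts->tv_sec * 1000000000ULL + (uint64_t)ts->tv_nsec;
}

/* Hypothetical helper: calls clock_gettime() `calls` times and returns the
   total elapsed time in nanoseconds, measured around the whole loop. */
uint64_t MeasureClockGettimeCost(uint64_t calls)
{
  struct timespec begin, end, scratch;
  uint64_t i;
  assert(calls > 0);
  clock_gettime(CLOCK_MONOTONIC, &begin);
  for (i = 0; i < calls; i++)
    clock_gettime(CLOCK_MONOTONIC, &scratch); /* the call being measured */
  clock_gettime(CLOCK_MONOTONIC, &end);
  return TimeSpecToNsec(&end) - TimeSpecToNsec(&begin);
}
```

Dividing the value returned by MeasureClockGettimeCost(1000000) by the number of calls gives the average cost per call. The absolute numbers will differ from mine depending on the CPU, the kernel and the clock source.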

Reliability

Well, a quick look at the Wikipedia entry for RDTSC will give us an idea of how unreliable it is. Many factors affect it, such as

  • Multiple cores having different TSC values (we eliminated this by binding our process to 1 core)
  • CPU frequency scaling for power saving (we eliminated this by always being CPU intensive)
  • Hibernation of system will reset TSC value (we didn’t let our system hibernate)
  • Impact on portability due to varying implementation of CPUs (we ran only on the same Intel CPU)

So for application programming, RDTSC seems to be quite unreliable. HPET, which we read through the POSIX standard clock_gettime(), is the clear winner.

Conclusion

Final score is RDTSC 2 and HPET 1. But there is more to this. RDTSC definitely has reliability and portability issues and may not be very useful for regular application programming. I was affected by CPU frequency scaling during my tests. In CalibrateTicks(), initially I used a sleep(1) to sleep for 1 second to calibrate the number of ticks in a nanosecond. I got values ranging from 0.23 to 0.55 instead of 2.2 (or very close to it since my CPU is 2.2 GHz). Once I switched the sleep(1) to wasting CPU in a for loop, it gave me consistent readings of 2.198 ticks per nanosecond.
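To make the calibration arithmetic concrete, here is a small sketch of the two conversions involved: ticks per nanosecond from a measured interval, and nanoseconds from a tick count. The function names TicksPerNsec() and TicksToNsec() are hypothetical and used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: calibration value, i.e. how many TSC ticks elapsed
   per nanosecond of wall clock time over the calibration interval. */
double TicksPerNsec(uint64_t ticks, uint64_t nsecElapsed)
{
  assert(nsecElapsed > 0);
  return (double)ticks / (double)nsecElapsed;
}

/* Hypothetical helper: converts a tick count to nanoseconds using a
   previously calibrated ticks-per-nanosecond value. */
uint64_t TicksToNsec(uint64_t ticks, double ticksPerNsec)
{
  assert(ticksPerNsec > 0.0);
  return (uint64_t)((double)ticks / ticksPerNsec);
}
```

For example, 2198000000 ticks counted over one second (1000000000 nanoseconds) calibrate to about 2.198 ticks per nanosecond, which is the value I got once the calibration loop kept the CPU busy. If the calibration value is too low, as it was with sleep(1), every later conversion overestimates the elapsed time.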

But RDTSC is 14 times cheaper than HPET. This can be useful for certain benchmarking exercises as long as one is aware of its pitfalls and is cautious.

For anybody who loves programming, there is a problem of plenty. Naturally we are curious to learn more, and there are simply too many options to pursue. When I started my career 4 years ago, the only language I knew fluently (well, kinda) was C. I wanted to learn C++ and a scripting language for rapid prototyping. It has been 4 years, and I have learnt C++ and Python.

Now comes the predicament. What to study next? There are plenty of options. Should I study GUI programming via wxWidgets or learn Internet technologies like HTML and JavaScript? Neither will help me in my line of work (carrier class embedded software development in the telecom domain). But it will be fun, and the turnaround time is much quicker. Applications always have this magical aura of being directly useful. Users interact with applications directly and the developer gets feedback instantly. With infrastructure like Linux or the iPhone, we have to wait for some application to utilize the new goodies so that end users can appreciate the infrastructure.

Or I can take the MIT OpenCourseWare course on Operating Systems or strengthen my knowledge of algorithms. Both will help me in my line of work and should be tons of fun. But people around me (friends and relatives) identify with Internet technologies much better. Imagine showing my dad a terminal shell on the basic OS I wrote as part of MIT OCW, or red black trees in action. Now imagine showing my dad an iGoogle widget written in HTML and JavaScript that he can interact with. But having a better understanding of the infrastructure, be it the OS or the algorithms, can make us better programmers in general. This will benefit application programming too.

Another fascinating thread to follow is to learn Lisp. It may not give us any practical benefits. But I trust people when they say that learning Lisp is a profoundly enlightening experience for a programmer. I want to experience it first hand. But should I take it up now? If not now when?

Apart from technology, processes and paradigms are also good candidates to delve into. Modern software engineering principles, design patterns, architectures, refactoring, software metrics, aspect oriented programming, functional programming, concurrent programming, literate programming etc. are also promising. All of them, like investing in infrastructure, will help in general.

One of the things to consider while choosing is, how long will it be relevant. Ideally, we do not want to invest in something that will become irrelevant in the future. Till now I have invested in time proven technologies/tools like C, C++, Python, Vim, Linux… which are here to stay for good. Another aspect of relevancy is that it should be universally useful. It should still be relevant outside of my current organization, without licensing issues. Copyrights and licenses can make a current technology irrelevant for our purposes. Sticking with open standards and open source technologies with strong communities and vision ensures that our investment will not be obsolete quickly. HTML, JavaScript, OS concepts, algorithms and Lisp all are here to stay for good. Processes and paradigms may or may not stand the test of time and can quickly become a fad. Plus they can be picked up while focus is on learning the others.

So we have 3 main streams to pursue. Internet technologies (HTML and JavaScript), infrastructure (OS and algorithms) and Lisp. Well investing in infrastructure seems to be the most logical choice for me. It will directly help me in my line of work. It will help me be a better programmer by providing me a better understanding of the underlying universal principles of computer science. Both will be around for a long time (forever is more like it). Both will help me in my pursuit of the other two streams. So this can be a foundation for the other two streams 🙂

I am not the only one faced with this predicament. I quote Frederick P Brooks Jr from The Mythical Man Month.

The computer-related intellectual discipline has exploded as has the technology. When I was a graduate student in the mid-1950s, I could read all the journals and conference proceedings; I could stay current in all the discipline. Today my intellectual life has seen me regretfully kissing sub discipline interests goodbye one by one, as my portfolio has continuously overflowed beyond mastery. Too many interests, too many exciting opportunities for learning, research, and thought. What a marvelous predicament! Not only is the end not in sight, the pace is not slackening. We have many future joys.

So it is not really a predicament. We are just spoilt for choice. As Brooks predicts, we have many future joys. Amen.

In this post, we will explore how to set different TAB settings while editing C/C++ and Python code. For C/C++, we want 2 spaces for indentation, while for Python, we will stick to PEP 8 recommendations (4 spaces for indentation, 8 spaces for TAB, 4 spaces for expandtab). Vim being such a powerful and versatile editor, there are many ways to achieve this. We will explore the most portable way, with changes just to the .vimrc.

" Python options
let python_highlight_all = 1
function! s:set_python_settings()
  set tabstop=8
  set softtabstop=4
  set shiftwidth=4
endfunction
function! s:unset_python_settings()
  set tabstop=2
  set softtabstop=2
  set shiftwidth=2
endfunction
autocmd BufNewFile,BufEnter *.py call s:set_python_settings()
autocmd BufLeave *.py call s:unset_python_settings()

What we are doing here is defining two functions: set_python_settings(), which changes the tab settings as per PEP 8 recommendations, and unset_python_settings(), which restores our default C/C++ settings. Through autocmd, whenever we enter a buffer (BufEnter) or open a new file (BufNewFile) of type *.py, we call set_python_settings() to set the tab settings for Python. When we leave a buffer (BufLeave) of type *.py, we get back to the original settings by calling unset_python_settings().

Exuberant ctags is a pretty nifty utility for source code browsing, especially since it integrates so well with vim. Exuberant ctags can also understand many languages (41 as per the official website). So this is relevant not only for C/C++ or Python but for all 41 supported languages (and future ones too). In this post, we explore yet another popular plugin using exuberant ctags: TagList.

TagList is capable of showing a list of functions/global variables/classes/structs on one side of the vim window. This is similar to some IDEs or even editors like Notepad++. This makes browsing and navigation pretty easy. Our aim is to navigate the current file using the TagList window and to always display the current function name in the status line.

Screenshot of TagList plugin showing current function name in status line

For this, we will need:

TagList will work only with exuberant ctags and not with any other ctags (specifically GNU ctags). Install exuberant ctags if required. Also ensure that the ctags command actually invokes exuberant ctags.

Install TagList plugin (see elaborate steps in the link). Typically vim plugins are installed by simply copying to $HOME/.vim/plugin directory. If any documentation is there, copy that to $HOME/.vim/doc and re-index vim help by giving “:helptags $HOME/.vim/doc” command in vim.

Now we need to customize the TagList plugin options to get what we want.

" TagList options
let Tlist_Close_On_Select = 1 "close taglist window once we selected something
let Tlist_Exit_OnlyWindow = 1 "if taglist window is the only window left, exit vim
let Tlist_Show_Menu = 1 "show Tags menu in gvim
let Tlist_Show_One_File = 1 "show tags of only one file
let Tlist_GainFocus_On_ToggleOpen = 1 "automatically switch to taglist window
let Tlist_Highlight_Tag_On_BufEnter = 1 "highlight current tag in taglist window
let Tlist_Process_File_Always = 1 "even without taglist window, create tags file, required for displaying tag in statusline
let Tlist_Use_Right_Window = 1 "display taglist window on the right
let Tlist_Display_Prototype = 1 "display full prototype instead of just function name
"let Tlist_Ctags_Cmd = /path/to/exuberant/ctags

nnoremap <F5> :TlistToggle<CR>
nnoremap <F6> :TlistShowPrototype<CR>

set statusline=[%n]\ %<%f\ %([%1*%M%*%R%Y]%)\ \ \ [%{Tlist_Get_Tagname_By_Line()}]\ %=%-19(\LINE\ [%l/%L]\ COL\ [%02c%03V]%)\ %P

Notice the call to Tlist_Get_Tagname_By_Line() within the statusline. This is what displays the current function name (actually the current tag, which can be a class name, a struct name or any other tag) in the status line. With the above settings, pressing F6 will show the prototype of the current function at the bottom.

Pressing F5 will open up the TagList window, which will show the list of functions/classes/structs/defines etc. The focus is automatically switched to the list window and we can quickly jump to the function/class/struct definition we want. Upon selecting an entry, the list window is automatically closed.

Screenshot of TagList window

Now with the full prototype display, it is not very easy to read the actual function name. TagList has a zoom feature to overcome this. Press “x” and the TagList window will enlarge, occupying almost the complete vim screen. Selection of an entry or pressing F5 again will close the window. Pressing x again will shrink the window back to default size.

Screenshot of TagList window zoomed using "x" key

TagList generates its own tags file and does not require the user to provide a tags file. This file is generated every time we switch to a buffer. The tags file is created somewhere in /tmp/ folder. TagList cannot work with a user generated tags file.