Archive for the ‘Programming’ Category

There are wide choice of programming languages catering to a diverse application domain. This post is an attempt at classifying them based on how much the language/implementation tries to abstract the machine details from the programmer.

We can broadly classify languages as

  • Assembly language (x86 assembler AT&T syntax or Intel syntax)
  • Compiled (C)
  • Compiled to byte code of VM and compiled just in time (Java2)
  • Compiled to byte code of VM and interpreted (Python)
  • String interpreted (TCL)
Assembly Languages

Such languages need the least amount of work to make them executable. Assemblers like the x86 assembly language is closely tied to the machine code. It is basically a one is to one mapping between machine instruction to a human readable string called mnemonics. Assemblers also make it easy to assign labels to address locations to ease programming effort. Modern assemblers also support macros which can make programming repetitive code easier. There is little scope for optimization or higher order data structures in assembly languages, since they just mirror the target machine code. Such languages are the least portable since they are always tied to a target machine architecture. Few examples of assembly languages would be the x86 assembly language, Motorola 6800 assembly etc.

Compiled Languages

Such languages are a step taken for portability. They abstract the underlying machine instructions and give higher order constructs for arithmetic operations, branching, looping and basic data types. They are compiled in to target machine code directly. Once compiled, the binary can be natively run in the target machine. Arguably they are slower than assembler, since the actual machine code is generated by a compiler and will not be as optimized as hand coded assembler. But in practice, for any modern processor with multiple cores and pipelining, the compiler tends to generate more optimized code. Such languages can have support for higher order data structures like lists, maps etc either natively or via standard libraries. C, C++, Pascal etc are some examples of compiled languages.

Compiled to Byte Code of a VM (JIT compiled)

These languages are another step towards portability. First they are compiled to byte code of a virtual machine. The virtual machine executes the byte code by compiling it in to native machine code Just In Time (JIT). Practically these are slower than compiled languages since there is another layer of abstraction. But they are more portable than compiled languages. The same compiled byte code can be run on any platform that supports the VM, while for compiled languages, different binaries are required for different platforms. Typically they support a full range of higher order data structures. Examples include Java2, Ruby core 1.9.

Compiled to Byte Code of a VM (Interpreted)

The first step of these languages is the same as the above. They are compiled to byte code of a virtual machine. The virtual machine itself executes the byte code by interpreting it. They are generally slower than JIT implementations. They have same portability as JIT implementations. Performance also depends on the optimization effort gone in to the VM implementation. For example Python2 (byte code interpreted) is faster than Ruby core 1.9 (JIT) while Java2 (JIT) is way faster than compiled Lisp (SBCL). Examples include Python, Ruby.

String Interpreted

Such languages interpret the source code string directly. Because of this, these are usually the slowest of the lot. Consider this statement – a = 100 + 2. The 100 and 2 are strings and instead of doing the addition 100 + 2 natively, the interpreter knows how to add integers as strings. The interpreter is easier to implement than byte code compilation but performance is the least. TCL, JavaScript are examples of string interpreted languages.

We can see a pattern emerging. From assembly to string interpretation, the language/implementation abstracts machine details more and more from the programmer. As a result performance keeps decreasing while portability keeps increasing. Beyond a point, performance decreases without any increase in portability, but implementation becomes easier. Also more abstracted languages usually provide higher order data structures and automatic memory management for free.

Also the level of abstraction really depends on the implementation rather than the language itself. For example, Python2 is both byte code interpreted (official CPython) and JIT (PyPy). The implementation can vary not only in the VM. For example Common Lisp has a compiled implementation (SBCL), compiled to C/C++ or byte code interpreted implementation (ECL).

Advertisements

Recently, I decided to take MIT OCW Algorithms course. I wanted to actually measure the performance of various algorithms. So before I dived in to it, I decided to come up with a setup for measuring time taken. For this, we need high precision time measurement. I have used the Read Time Stamp Counter (RDTSC) instruction introduced in Pentium processors before. I have heard about High Precision Event Timers (HPET) introduced by Intel circa 2005. In this post we have a shootout between the two mechanisms.

The metrics we want to compare are

  • Resolution
  • Accuracy
  • Cost (in terms of CPU time)
  • Reliability

Before we get in to the actual testing, let us understand how to use HPET and RDTSC. Here is how we use HPET which is a POSIX standard.

#include <time.h>
TestHpet()
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
}

And here is how we use the RDTSC instruction. With RDTSC, we actually read the number of CPU clock cycles from a counter (Time Stamp Counter). This keeps incrementing for each CPU clock. This does not directly translate to actual time. This needs to be done by calibrating the number of CPU cycles per nanosecond and dividing the clock ticks by this calibrated value for actual nanoseconds. Since it is not guaranteed that this TSC value will be synchronized across CPU, we bind our process to CPU1 (I have a dual core Inter T7500 CPU) to eliminate TSC mismatch between the two CPU cores.

#include <stdint.h> /* for uint64_t */
#include <time.h>  /* for struct timespec */

/* assembly code to read the TSC */
static inline uint64_t RDTSC()
{
  unsigned int hi, lo;
  __asm__ volatile("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t)hi << 32) | lo;
}

const int NANO_SECONDS_IN_SEC = 1000000000;
/* returns a static buffer of struct timespec with the time difference of ts1 and ts2
   ts1 is assumed to be greater than ts2 */
struct timespec *TimeSpecDiff(struct timespec *ts1, struct timespec *ts2)
{
  static struct timespec ts;
  ts.tv_sec = ts1->tv_sec - ts2->tv_sec;
  ts.tv_nsec = ts1->tv_nsec - ts2->tv_nsec;
  if (ts.tv_nsec < 0) {
    ts.tv_sec--;
    ts.tv_nsec += NANO_SECONDS_IN_SEC;
  }
  return &ts;
}

double g_TicksPerNanoSec;
static void CalibrateTicks()
{
  struct timespec begints, endts;
  uint64_t begin = 0, end = 0;
  clock_gettime(CLOCK_MONOTONIC, &begints);
  begin = RDTSC();
  uint64_t i;
  for (i = 0; i < 1000000; i++); /* must be CPU intensive */
  end = RDTSC();
  clock_gettime(CLOCK_MONOTONIC, &endts);
  struct timespec *tmpts = TimeSpecDiff(&endts, &begints);
  uint64_t nsecElapsed = tmpts->tv_sec * 1000000000LL + tmpts->tv_nsec;
  g_TicksPerNanoSec = (double)(end - begin)/(double)nsecElapsed;
}

/* Call once before using RDTSC, has side effect of binding process to CPU1 */
void InitRdtsc()
{
  unsigned long cpuMask;
  cpuMask = 2; // bind to cpu 1
  sched_setaffinity(0, sizeof(cpuMask), &cpuMask);
  CalibrateTicks();
}

void GetTimeSpec(struct timespec *ts, uint64_t nsecs)
{
  ts->tv_sec = nsecs / NANO_SECONDS_IN_SEC;
  ts->tv_nsec = nsecs % NANO_SECONDS_IN_SEC;
}

/* ts will be filled with time converted from TSC reading */
void GetRdtscTime(struct timespec *ts)
{
  GetTimeSpec(ts, RDTSC() / g_TicksPerNanoSec);
}

Now back to our metrics. This is how each mechanism fares.

Resolution

HPET API clock_gettime, gives the result in struct timespec. The maximum granularity of timespec is nanoseconds. This is what struct timespec can represent, actual resolution varies depending upon implementation. We can get the resolution through the API clock_getres(). On my Dell XPS 1530 with Intel core2duo T7500 CPU running Ubuntu 10.04, it has a resolution of 1 nanosecond. On the other hand, RDTSC instruction can have resolution of upto a CPU clock time. On my 2.2 GHz CPU that means resolution is 0.45 nanoseconds. Clearly RDTSC is the winner.

Accuracy

From my tests, both seemed to give consistently the same results agreeing with each other correct to 5 nanoseconds. Since I have no other reference, I assume both are equally accurate. So no winner.

Cost

I ran a simple test case where I measured the time taken for 1 million calls to both HPET and RDTSC. And here is the result.

HPET : 1 sec 482 msec 188 usec 38 nsec
RDTSC: 0 sec 103 msec 311 usec 752 nsec

RDTSC is the clear winner in this case by being 14 times cheaper than HPET.

Reliability

Well a quick look at the Wikipedia entry for RDTSC will give us an idea of how unreliable it is. So many factors affect it like

  • Multiple cores having different TSC values (we eliminated this by binding our process to 1 core)
  • CPU frequency scaling for power saving (we eliminated this by always being CPU intensive)
  • Hibernation of system will reset TSC value (we didn’t let our system hibernate)
  • Impact on portability due to varying implementation of CPUs (we ran only on the same Intel CPU)

So for application programming, RDTSC seems to be quite unreliable. HPET is a POSIX standard and is the clear winner.

Conclusion

Final score is RDTSC 2 and HPET 1. But there is more to this. RDTSC definitely has reliability and portability issues and may not be very useful for regular application programming. I was affected by CPU frequency scaling during my tests. In CalibrateTicks(), initially I used a sleep(1) to sleep for 1 second to calibrate the number of ticks in a nanosecond. I got values ranging from 0.23 to 0.55 instead of 2.2 (or very close to it since my CPU is 2.2 GHz). Once I switched the sleep(1) to wasting CPU in a for loop, it gave me consistent readings of 2.198 ticks per nanosecond.

But RDTSC is 14 times cheaper than HPET. This can be useful for certain benchmarking exercises as long as one is aware of its pitfalls and is cautious.

For anybody who loves programming, there is a problem of plenty. Naturally we are curious to learn more and there is simply too many options to pursue. When I started my career 4 years ago, the only language I knew fluently (well kinda) was C. I wanted to learn C++ and a scripting language for rapid prototyping. It has been 4 years, and I have learnt C++ and Python.

Now comes the predicament. What to study next. There are plenty of options. Should I study GUI programming via wxWidgets or learn Internet technologies like HTML and JavaScript. Both will not help me in my line of work (carrier class embedded software development in telecom domain). But it will be fun, and turn around time is much quicker. Applications always have this magical aura of being useful directly. Users interact with application directly and the developer gets feedback instantly. With infrastructure like Linux or the iPhone, we have to wait for some application to utilize the new goodies so that end users can appreciate the infrastructure.

Or I can take MIT Open Course Ware course on Operating Systems or strengthen my knowledge of algorithms. Both will help me in my line of work and should be tons of fun. But people around me (friends and relatives) identify with Internet technologies much better. Imagine showing my dad a terminal shell on the basic OS I wrote as part of MIT OCW or red black trees in action. Now imagine showing my dad an iGoogle widget written in HTML and JavaScript that he can interact with. But having a better understanding of the infrastructure OS or the algorithms can make us better programmers in general. This will benefit application programming too.

Another fascinating thread to follow is to learn Lisp. It may not give us any practical benefits. But I trust people when they say that learning Lisp is a profoundly enlightening experience for a programmer. I want to experience it first hand. But should I take it up now? If not now when?

Apart from technology, processes and paradigms are also a good candidate to delve into. Modern software engineering principles, design patterns, architectures, refactoring, software metrics, aspect oriented programming, functional programming, concurrent programming, literate programming etc are also promising. All of them, like investing in infrastructure, will help in general.

One of the things to consider while choosing is, how long will it be relevant. Ideally, we do not want to invest in something that will become irrelevant in the future. Till now I have invested in time proven technologies/tools like C, C++, Python, Vim, Linux… which are here to stay for good. Another aspect of relevancy is that it should be universally useful. It should still be relevant outside of my current organization, without licensing issues. Copyrights and licenses can make a current technology irrelevant for our purposes. Sticking with open standards and open source technologies with strong communities and vision ensures that our investment will not be obsolete quickly. HTML, JavaScript, OS concepts, algorithms and Lisp all are here to stay for good. Processes and paradigms may or may not stand the test of time and can quickly become a fad. Plus they can be picked up while focus is on learning the others.

So we have 3 main streams to pursue. Internet technologies (HTML and JavaScript), infrastructure (OS and algorithms) and Lisp. Well investing in infrastructure seems to be the most logical choice for me. It will directly help me in my line of work. It will help me be a better programmer by providing me a better understanding of the underlying universal principles of computer science. Both will be around for a long time (forever is more like it). Both will help me in my pursuit of the other two streams. So this can be a foundation for the other two streams 🙂

I am not the only on faced with this predicament. I quote Frederick P Brooks Jr from The Mythical Man Month.

The computer-related intellectual discipline has exploded as has the technology. When I was a graduate student in the mid-1950s, I could read all the journals and conference proceedings; I could stay current in all the discipline. Today my intellectual life has seen me regretfully kissing sub discipline interests goodbye one by one, as my portfolio has continuously overflowed beyond mastery. Too many interests, too many exciting opportunities for learning, research, and thought. What a marvelous predicament! Not only is the end not in sight, the pace is not slackening. We have many future joys.

So it is not really a predicament. We are just spoilt for choice. As Brooks predicts, we have many future joys. Amen.

Exuberant ctags is a pretty nifty utility for source code browsing. Especially since it integrates so well with vim. Also exuberant ctags can understand many languages (41 as per the official website). So this is relevant not only for C/C++ or Python but for all 41 languages supported (and future ones too). In this post, we explore yet another popular plugin using exuberant ctags – TagList.

TagList is capable of showing a list of functions/global variables/class/struct towards one side of the vim. This is similar to some IDEs or event editors like Notepad++. This makes browsing and navigation pretty easy. Our aim will be to be able to navigate current file using TagList window and to always display current function name in the status line.

Screenshot of TagList plugin showing current function name in status line

For this, we will need:

TagList will work only with exuberant ctags and not with any other ctags (specifically GNU ctags). Install exuberant ctags if required. Also ensure that ctags command should actually invoke exuberant ctags.

Install TagList plugin (see elaborate steps in the link). Typically vim plugins are installed by simply copying to $HOME/.vim/plugin directory. If any documentation is there, copy that to $HOME/.vim/doc and re-index vim help by giving “:helptags $HOME/.vim/doc” command in vim.

Now we need to customize the TagList plugin options to get what we want.

" TagList options
let Tlist_Close_On_Select = 1 "close taglist window once we selected something
let Tlist_Exit_OnlyWindow = 1 "if taglist window is the only window left, exit vim
let Tlist_Show_Menu = 1 "show Tags menu in gvim
let Tlist_Show_One_File = 1 "show tags of only one file
let Tlist_GainFocus_On_ToggleOpen = 1 "automatically switch to taglist window
let Tlist_Highlight_Tag_On_BufEnter = 1 "highlight current tag in taglist window
let Tlist_Process_File_Always = 1 "even without taglist window, create tags file, required for displaying tag in statusline
let Tlist_Use_Right_Window = 1 "display taglist window on the right
let Tlist_Display_Prototype = 1 "display full prototype instead of just function name
"let Tlist_Ctags_Cmd = /path/to/exuberant/ctags

nnoremap <F5> :TlistToggle
nnoremap <F6> :TlistShowPrototype

set statusline=[%n]\ %<%f\ %([%1*%M%*%R%Y]%)\ \ \ [%{Tlist_Get_Tagname_By_Line()}]\ %=%-19(\LINE\ [%l/%L]\ COL\ [%02c%03V]%)\ %P

Notice the call to Tlist_Get_Tagname_By_Line() within the statuline. This is what displays the current function name (actually tag which can be class name or struct name or any tag) in the status line. With the above settings, pressing F6 will show the prototype of current function at the bottom.

Pressing F5 will open up the TagList window which will show the list of functions/classes/structs/define etc. The focus is automatically switched to the list window and we can quickly jump to the function/class/struct definition we want to. Upon selecting an entry, the list window is automatically closed.shrink the window back to default size.

Screenshot of TagList window

Now with the full prototype display, it is not very easy to read the actual function name. TagList has a zoom feature to overcome this. Press “x” and the TagList window will enlarge, occupying almost the complete vim screen. Selection of an entry or pressing F5 again will close the window. Pressing x again will shrink the window back to default size.

Screenshot of TagList window zoomed using "x" key

TagList generates its own tags file and does not require the user to provide a tags file. This file is generated every time we switch to a buffer. The tags file is created somewhere in /tmp/ folder. TagList cannot work with a user generated tags file.

Omni Completion in Vim

Posted: August 26, 2010 in C/C++, Programming, Vim
Tags: , , ,

Vim 7 onwards, we have Omni Complete feature. This feature is similar to IntelliSense in Visual Studio or other IDEs. It can show a pop up menu displaying the structure, class or scope context. Here is a screenshot of Omni Complete displaying the class scope in CCCC code base .

OmniCppComplete screen shot in CCCC code.

As you can see, it can show all methods and variables including its original context like type, prototype (in case of method), documentation (if documented), private/protected/public, file defined in etc.

For getting Vim to work like this, you will need

Install all 4 by whatever means (apt-get, yum, build from source). OmniCppComplete and SuperTab are Vim scripts, detailed installation steps are given in the respective links. GNU has a version of ctags. But what we need is exuberant ctags which is different from GNU ctags. Most popular Linux distros bundle GNU ctags, but exuberant ctags should be available from the repository.

Vim is an excellent editor, but does not understand C/C++ natively. This is where ctags comes in to the picture. Ctags will generate a “tags” file which is like an index of the project source code. This tags file is used by Vim and OmniCppComplete script to provide contextual data in a pop-up menu.

To build a ctags tags file, give the command from the top directory of the project. This can be done from any subdirectory too (for even more focused tags file).

ctags -R --languages=C,C++ --c++-kinds=+p --fields=+iaS --extra=+q ./

Run vim from the same directory. By default, Vim will search for tags file beginning from current directory to all parent directories. First tags file found will be used. For better usability of OmniCppComplete, put these in your .vimrc file.

if v:version >= 600
  filetype plugin on
  filetype indent on
else
  filetype on
endif

if v:version >= 700
  set omnifunc=syntaxcomplete#Complete " override built-in C omnicomplete with C++ OmniCppComplete plugin
  let OmniCpp_GlobalScopeSearch   = 1
  let OmniCpp_DisplayMode         = 1
  let OmniCpp_ShowScopeInAbbr     = 0 "do not show namespace in pop-up
  let OmniCpp_ShowPrototypeInAbbr = 1 "show prototype in pop-up
  let OmniCpp_ShowAccess          = 1 "show access in pop-up
  let OmniCpp_SelectFirstItem     = 1 "select first item in pop-up
  set completeopt=menuone,menu,longest
endif

The default key mapping for invoking Omni Completion is C-X C-O. SuperTab plugin can help us invoke Omni Complete when we press TAB key. Also in the default color scheme, vim pop-up menu is in pink. Let’s change that to green.

if version >= 700
  let g:SuperTabDefaultCompletionType = "<C-X><C-O>"
  highlight   clear
  highlight   Pmenu         ctermfg=0 ctermbg=2
  highlight   PmenuSel      ctermfg=0 ctermbg=7
  highlight   PmenuSbar     ctermfg=7 ctermbg=0
  highlight   PmenuThumb    ctermfg=0 ctermbg=7
endif

As we edit code, newer types get defined, but vim continues to use the same tags file. So we need to regularly update our tags file. We can add a shortcut map (say pressing F4) in our .vimrc for the same.

function! UpdateTags()
  execute ":!ctags -R --languages=C++ --c++-kinds=+p --fields=+iaS --extra=+q ./"
  echohl StatusLine | echo "C/C++ tag updated" | echohl None
endfunction
nnoremap <F4> :call UpdateTags()

Vim is all about customizing it to your tastes. You can never get the perfect .vimrc file or plugin collection and settings from anywhere. The trick is to collect cool stuff for your .vimrc and plugins. Vim tips wiki in wikia is a good place for a lot of info. Play around with them. Find what suits best for you. And never forget to share. Happy coding 🙂

In my previous post, I came up with an enhanced pygenie, which could count total complexity of a Python project as a whole. As explained earlier, my motivation is purely for fun. I took a test run of the enhanced pygenie on some of the popular open-source Python projects. And the results are out 🙂

Django 1.2.1

time ./pygenie/pygenie.py Django-1.2.1/ -r
Total Cumulative Statistics
----------------------------------
Type         Count      Complexity
----------------------------------
X            1431             1629
C            2393             2408
M            6299            13000
F            1144             3600
T           11267            20637
----------------------------------

real    0m11.840s
user    0m11.489s
sys     0m0.056s

Zope 2.11.4

time ./pygenie/pygenie.py Zope-2.11.4-final/ -r
Total Cumulative Statistics
----------------------------------
Type         Count      Complexity
----------------------------------
X            3005             4549
C            6453             6500
M           23137            45392
F            4985            12960
T           37580            69401
----------------------------------

Totally 5 files failed
FAILED to process file: /home/aufather/Downloads/Zope-2.11.4-final/lib/python/ZEO/scripts/zeoserverlog.py
FAILED to process file: /home/aufather/Downloads/Zope-2.11.4-final/lib/python/zope/app/testing/ztapi.py
FAILED to process file: /home/aufather/Downloads/Zope-2.11.4-final/lib/python/zope/app/component/back35.py
FAILED to process file: /home/aufather/Downloads/Zope-2.11.4-final/lib/python/zope/app/component/site.py
FAILED to process file: /home/aufather/Downloads/Zope-2.11.4-final/lib/python/zope/rdb/gadfly/sqlbind.py

real    0m37.042s
user    0m36.302s
sys     0m0.140s

Twisted 10.1.0

time ./pygenie/pygenie.py Twisted-10.1.0/ -r
Total Cumulative Statistics
----------------------------------
Type         Count      Complexity
----------------------------------
X             935             1282
C            3736             3922
M           16782            26142
F            1323             3268
T           22776            34614
----------------------------------

Totally 6 files failed
FAILED to process file: /home/aufather/Downloads/Twisted-10.1.0/doc/core/howto/listings/udp/MulticastClient.py
FAILED to process file: /home/aufather/Downloads/Twisted-10.1.0/doc/core/howto/listings/udp/MulticastServer.py
FAILED to process file: /home/aufather/Downloads/Twisted-10.1.0/doc/historic/2003/pycon/deferex/deferex-listing0.py
FAILED to process file: /home/aufather/Downloads/Twisted-10.1.0/doc/historic/2003/pycon/deferex/deferex-listing2.py
FAILED to process file: /home/aufather/Downloads/Twisted-10.1.0/doc/historic/2003/pycon/deferex/deferex-bad-adding.py
FAILED to process file: /home/aufather/Downloads/Twisted-10.1.0/twisted/python/test/test_win32.py

real    0m20.794s
user    0m20.525s
sys     0m0.036s

Even though I started this activity for fun, I was surprised by the speed and utility of pygenie. Some of the uses I can think of

  • Identify syntax errors within a project. Since pygenie uses the same python compiler, if pygenie could not process a file, the file cannot be processed by Python byte code compiler. Zope has 5 files which will not run while Twisted has 6. Django is clean in this front. When we take a closer look, it is test cases, demo code, tutorials, historical code etc that are broken.
  • Object orientation used within a project. The ratio between methods to functions is a fair indicator. The scores for the three projects are Django(5.5), Zope (4.6) and Twisted (12.7). This is a good indicator of existing style of code.
  • Overall size of a project. Pygenie report indicates that Zope is 3.4 times and Twisted is 1.7 times as big as Django.

The original pygenie is available here. Enhanced pygenie is available from my dropbox here.

Recently I measured some code metrics of a C/C++ project at work. I used C and C++ Code Counter(CCCC) and it is pretty good. There was a similar project at work which was almost completely in Python. I wanted to compare the two and searched for a suitable tool to measure McCabe’s Cyclomatic Complexity(CC) for Python code and found pygenie.

Original pygenie, by default, prints Files(X), Classes(C), Methods(M), Functions(F) only if their CC exceeds 7. If –verbose or -v is given, it prints CC irrespective of whether it crosses the threshold of 7 or not. But I wanted overall CC. My main motivation is purely for fun or for interesting numbers. Kind of like the calories burnt counter on a treadmill or a cycle computer for a bicycle. But overall CC can be useful in estimating testing effort.

Here is a sample run of original pygenie on its own source code.

aufather@simstudio:~/Downloads/pygenie_orig$ ./pygenie.py complexity *.py
File: /home/aufather/Downloads/pygenie_orig/cc.py
This code looks all good!

File: /home/aufather/Downloads/pygenie_orig/pygenie.py
Type Name Complexity
--------------------
F    main 8          

File: /home/aufather/Downloads/pygenie_orig/test_cc.py
This code looks all good!

And here is a sample run of modified pygenie on its own source code.

aufather@simstudio:~/Downloads/pygenie$ ./pygenie.py * -c
File: /home/aufather/Downloads/pygenie/cc.py
This code looks all good!

File: /home/aufather/Downloads/pygenie/pygenie.py
This code looks all good!

File: /home/aufather/Downloads/pygenie/test_cc.py
This code looks all good!

Total Cumulative Statistics
----------------------------------
Type         Count      Complexity
----------------------------------
X               3                4
C               8                8
M              31               70
F              13               26
T              55              108
----------------------------------

I added the total cumulative statistics and a summary for each file. Total is represented by ‘T’. This gives a fair idea of how complex the overall project is. ‘F’ stands for functions, ‘M’ stands for methods, ‘C’ stands for classes and ‘X’ stands for files. Also while I was at it, added a whole bunch of command line options to pygenie.

Here is the help for original pygenie. Valid commands are “complexity” or “all”.

aufather@simstudio:~/Downloads/pygenie_orig$ ./pygenie.py all -h
Usage: ./cc.py command [options] *.py

Options:
 -h, --help     show this help message and exit
 -v, --verbose  print detailed statistics to stdout

And the new options added.

aufather@simstudio:~/Downloads/pygenie$ ./pygenie.py -h
Usage: pygenie.py [options] *.py

Options:
 --version             show program's version number and exit
 -h, --help            show this help message and exit
 -c, --complexity      print complexity details for each file/module
 -t THRESHOLD, --threshold=THRESHOLD
 threshold of complexity to be ignored (default=7)
 -a, --all             print all metrics
 -s, --summary         print cumulative summary for each file/module
 -r, --recurs          process files recursively in a folder
 -d, --debug           print debugging info like file being processed
 -o OUTFILE, --outfile=OUTFILE
 output to OUTFILE (default=stdout)

You can get the full zipped source code from my dropbox here

http://dl.dropbox.com/u/10402332/pygenie.zip.

PS: I based my modification on revision 44 of pygenie. I got the original from http://svn.traceback.org/python/cyclic_complexity/.

PPS: I could not find a better way of sharing the code in wordpress. I have updated the test_cc.py too so that unit test do not fail. If any one knows a better way of sharing code, please let me know. Dropbox works well for me.