March 1, Tuesday
12:00 – 13:30
In research conducted at the IBM T.J. Watson Research Center, we booted Linux on the compute nodes of BlueGene/L for the first time and compared its performance to that of CNK (the Compute Node Kernel). We identified two major obstacles to performance under Linux: the well-known daemon noise problem, which affects application scalability, and the high cost of TLB (Translation Lookaside Buffer) misses, which affects performance at the node level. We then leveraged unique hardware features of the system, with appropriate software support in the Linux kernel, and demonstrated performance comparable to CNK across a wide range of HPC benchmarks.
My talk has two parts. I will begin with an introduction to supercomputing, covering the BlueGene hardware and software architecture, the structure of HPC applications, and related performance issues. I will then focus on the Linux research: the experiments we performed, the issues we encountered, and the solutions we developed to address them.