March 1, Tuesday
12:00 – 13:30

Operating Systems for Supercomputers
Computer Science seminar
Lecturer : Edi Shmueli
Affiliation : IBM Systems & Technology Group
Location : 202/37
Host : Dr. Eitan Bachmat
The BlueGene supercomputers in production today run a small, limited-function kernel called CNK (Compute-Node Kernel), which is designed to expose bare-metal performance to the application. The drawback is that this restricts BlueGene to running only HPC applications, even though the platform could potentially run any type of workload. One possible solution is to replace CNK with standard Linux, but for this to be accepted, Linux must first demonstrate performance comparable to that of CNK.

In research conducted at the IBM T.J. Watson Research Center, we booted Linux for the first time on the compute nodes of BlueGene/L and compared its performance to that of CNK. We identified two major obstacles to performance under Linux: the well-known daemon noise problem, which affects application scalability, and the high cost of TLB (Translation Lookaside Buffer) misses, which affects performance at the node level. We then leveraged unique hardware features of the system, with appropriate software support in the Linux kernel, and demonstrated performance comparable to CNK across a wide set of HPC benchmarks.

My talk has two parts. I will begin with an introduction to supercomputing, describing the BlueGene software and hardware architecture, the structure of HPC applications, and related performance issues. I will then focus on the Linux research, presenting the experiments we performed, the issues we encountered, and the solutions we developed to address them.