Blue Gene/L System Software Organization
Abstract.
The Blue Gene/L supercomputer will use system-on-a-chip integration
and a highly scalable cellular architecture. With 65,536 compute nodes,
Blue Gene/L represents a new level of complexity for parallel system software,
with specific challenges in the areas of scalability, maintenance and usability. In
this paper we present our vision of a software architecture that faces up to these
challenges, and the simulation framework that we have used for our experiments.
1 Introduction
In November 2001 IBM announced a partnership with Lawrence Livermore National
Laboratory to build the Blue Gene/L (BG/L) supercomputer, a 65,536-node machine designed
around embedded PowerPC processors. Through the use of system-on-a-chip integration
[10], coupled with a highly scalable cellular architecture, Blue Gene/L will deliver
180 or 360 Teraflops of peak computing power, depending on the utilization mode.
Blue Gene/L represents a new level of scalability for parallel systems. Whereas existing
large-scale systems range in size from hundreds (ASCI White [2], Earth Simulator [4])
to a few thousand (Cplant [3], ASCI Red [1]) compute nodes, Blue Gene/L makes
a jump of almost two orders of magnitude.
The system software for Blue Gene/L is a combination of standard and custom
solutions. The software architecture for the machine is divided into three functional
entities (similar to [13]) arranged hierarchically: a computational core, a control infrastructure
and a service infrastructure. The I/O nodes (part of the control infrastructure)
execute a version of the Linux kernel and are the primary off-load engine for most system
services. No user code directly executes on the I/O nodes. Compute nodes in the
computational core execute a single-user, single-process, minimalist custom kernel and
are dedicated to running user applications efficiently. No system daemons or sophisticated
system services reside on compute nodes. These are treated as externally controllable
entities (i.e. devices) attached to I/O nodes. Complementing the Blue Gene/L machine
proper, the Blue Gene/L complex includes the service infrastructure composed of commercially
available systems that connect to the rest of the machine through an Ethernet
network. The end user's view of the system is that of a flat, toroidal, 64K-node machine, but the
system view of Blue Gene/L is hierarchical: the machine looks like a 1024-node Linux
cluster, with each node being a 64-way multiprocessor. We call one of these logical
groupings a processing set, or pset.
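To make this grouping concrete, the sketch below models a pset as a plain C structure: 65,536 compute nodes partitioned into 1024 psets, each consisting of one Linux I/O node and the 64 compute nodes it controls as devices. The type and field names are our own illustration of the description above, not actual Blue Gene/L data structures.

    #define TOTAL_COMPUTE_NODES    65536
    #define COMPUTE_NODES_PER_PSET    64
    #define NUM_PSETS (TOTAL_COMPUTE_NODES / COMPUTE_NODES_PER_PSET)  /* 1024 */

    struct compute_node {
        int torus_coord[3];   /* position in the 3D torus seen by the end user */
        int booted;           /* state managed entirely by the owning I/O node */
    };

    struct pset {
        int io_node_id;                                     /* Linux I/O node, the off-load engine */
        struct compute_node nodes[COMPUTE_NODES_PER_PSET];  /* compute nodes treated as devices    */
    };

    /* The service infrastructure sees NUM_PSETS of these groupings;
     * the end user sees a single flat 64K-node torus. */
    static struct pset machine[NUM_PSETS];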
The scope of this paper is to present the software architecture of the Blue Gene/L machine
and its implementation. Since the target time frame for completion and delivery
of Blue Gene/L is 2005, all our software development and experiments have been conducted
on architecturally accurate simulators of the machine. We describe this simulation
environment and comment on our experience.
The rest of this paper is organized as follows. Section 2 presents a brief description
of the Blue Gene/L supercomputer. Section 3 discusses the system software. Section 4
introduces our simulation environment, and we conclude in Section 5.
2 An Overview of the Blue Gene/L Supercomputer
Blue Gene/L is a new architecture for high performance parallel computers based on
low cost embedded PowerPC technology. A detailed description of Blue Gene/L is provided
in [8]. In this section we present a short overview of the hardware as background
for our discussion on its system software and its simulation environment.
2.1 Overall Organization
The basic building block of Blue Gene/L is a custom system-on-a-chip that integrates
processors, memory and communications logic in the same piece of silicon. The BG/L
chip contains two standard 32-bit embedded PowerPC 440 cores, each with a private
32KB L1 instruction cache and a 32KB L1 data cache. Each core also has a 2KB L2 cache, and the two cores
share a 4MB L3 EDRAM cache. While the L1 caches are not coherent, the L2 caches
are coherent and act as a prefetch buffer for the L3 cache.
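Purely as a summary of the figures above, the per-chip memory hierarchy can be tabulated as follows; the structure and field names are ours and do not appear in any BG/L software.

    /* Summary of the BG/L chip cache hierarchy described in the text.
     * Illustrative only. */
    struct bgl_cache_params {
        unsigned l1_icache_bytes;   /* 32 KB per core, not coherent             */
        unsigned l1_dcache_bytes;   /* 32 KB per core, not coherent             */
        unsigned l2_bytes;          /* 2 KB per core, coherent, prefetches L3   */
        unsigned l3_bytes;          /* 4 MB embedded DRAM, shared by both cores */
    };

    static const struct bgl_cache_params bgl_chip = {
        .l1_icache_bytes = 32u * 1024u,
        .l1_dcache_bytes = 32u * 1024u,
        .l2_bytes        =  2u * 1024u,
        .l3_bytes        =  4u * 1024u * 1024u,
    };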
Each core drives a custom 128-bit double FPU that can perform four double precision
floating-point operations per cycle. This custom FPU consists of two conventional
FPUs joined together, each having a 64-bit register file with 32 registers. One of the conventional
FPUs (the primary side) is compatible with the standard PowerPC floating-point
instruction set. We have extended the PPC instruction set to perform SIMD-style
floating-point operations on the two FPUs. In most scenarios, only one of the 440 cores
is dedicated to running user applications while the second processor drives the networks. At
a target speed of 700 MHz the peak performance of a node is 2.8 GFlop/s. When both
cores and FPUs in a chip are used, the peak performance per node is 5.6 GFlop/s.
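The peak figures quoted above follow directly from the clock rate and the width of the double FPU; the short calculation below reproduces them. The arithmetic is our own illustration, not code from the BG/L software stack.

    #include <stdio.h>

    int main(void) {
        double clock_hz        = 700e6;   /* target core frequency          */
        double flops_per_cycle = 4.0;     /* double FPU: 4 FP ops per cycle */
        int    nodes           = 65536;

        double per_core = clock_hz * flops_per_cycle;   /* 2.8 GFlop/s             */
        double per_node = 2.0 * per_core;               /* 5.6 GFlop/s, both cores */

        /* The two utilization modes: one core computing (the other drives the
         * networks) versus both cores and FPUs computing. */
        printf("one core per node : %.1f TFlop/s\n", nodes * per_core / 1e12);  /* ~183, i.e. "180 Teraflops" */
        printf("both cores per node: %.1f TFlop/s\n", nodes * per_node / 1e12); /* ~367, i.e. "360 Teraflops" */
        return 0;
    }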
The standard PowerPC 440 cores are not designed to support multiprocessor architectures:
the L1 caches are not coherent and the architecture lacks atomic memory
operations. To overcome these limitations, BG/L provides a variety of synchronization
devices on the chip: the lockbox, shared SRAM, the L3 scratchpad, and the blind device.
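Of these devices, the lockbox gives the two cores the test-and-set style semantics that the missing atomic memory instructions would otherwise provide. The fragment below sketches how such a hardware lock might be used to build a simple spin lock; the MMIO address, register layout, and acquire/release convention are assumptions made for illustration, not the actual BG/L programming interface.

    #include <stdint.h>

    /* Assumed lockbox behavior for this sketch: reading a lock register returns 0
     * when the caller acquires the lock and non-zero when it is already held;
     * any write releases it. Address and semantics are invented. */
    #define LOCKBOX_BASE 0xD0000000u   /* hypothetical MMIO base address */

    static volatile uint32_t *lockbox_reg(unsigned lock_id) {
        return (volatile uint32_t *)(uintptr_t)(LOCKBOX_BASE + 4u * lock_id);
    }

    static void lockbox_acquire(unsigned lock_id) {
        /* Spin until the hardware grants the lock. Because the lock state lives
         * in the device rather than in cached memory, the non-coherent L1 caches
         * never see a stale lock word. */
        while (*lockbox_reg(lock_id) != 0)
            ;
    }

    static void lockbox_release(unsigned lock_id) {
        *lockbox_reg(lock_id) = 0;   /* any store releases the lock in this sketch */
    }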

Download full report
https://asc.llnl.gov/computing_resources...ftware.pdf