22-05-2012, 12:48 PM
Blue Gene systems
Introduction
Blue Gene is an IBM project aimed at designing supercomputers that can reach operating speeds in the PFLOPS (petaFLOPS) range with low power consumption. The project created three generations of supercomputers: Blue Gene/L, Blue Gene/P, and Blue Gene/Q. Blue Gene systems led the TOP500 ranking of the most powerful supercomputers for several years and have been deployed in many supercomputing centers. The project was awarded the 2008 National Medal of Technology and Innovation; U.S. President Barack Obama bestowed the award on October 7, 2009.
History:
In December 1999, IBM announced a $100 million research initiative for a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding. The project had two main goals: to advance our understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. Major areas of investigation included how to use this novel platform to effectively meet its scientific goals, how to make such massively parallel machines more usable, and how to achieve performance targets at a reasonable cost through novel machine architectures. The initial design for Blue Gene was based on an early version of the Cyclops64 architecture, designed by Monty Denneau. The initial research and development work was pursued at the IBM T.J. Watson Research Center.
Major features:
The Blue Gene/L supercomputer was unique in the following aspects:[9]
• Trading the speed of processors for lower power consumption. Blue Gene/L used low-frequency, low-power embedded PowerPC cores with floating-point accelerators. While the performance of each chip was relatively low, the system could achieve a better ratio of computing performance to energy for applications that could use larger numbers of nodes.
• Dual processors per node with two working modes: co-processor mode where one processor handles computation and the other handles communication; and virtual-node mode, where both processors are available to run user code, but the processors share both the computation and the communication load.
• System-on-a-chip design. All node components were embedded on one chip, with the exception of 512 MB external DRAM.
• A large number of nodes (scalable in increments of 1024 up to at least 65,536)
• Three-dimensional torus interconnect with auxiliary networks for global communications (broadcast and reductions), I/O, and management
• Lightweight OS per node for minimum system overhead (system noise)
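The torus interconnect in the list above can be illustrated with a small sketch. In a three-dimensional torus every node has six nearest neighbors, and the links wrap around at the edges of each axis. The 64 × 32 × 32 shape used here is an assumption chosen only because it multiplies out to 65,536 nodes; the function names are hypothetical and not part of any Blue Gene software.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbors of `coord` in a 3D torus of shape `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            nxt = list(coord)
            # Modulo arithmetic supplies the wraparound links that turn a mesh into a torus.
            nxt[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(nxt))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum number of link hops between two nodes, taking the shorter way round each axis."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))


# Assumed shape (not stated in the text above): 64 * 32 * 32 = 65,536 nodes.
DIMS = (64, 32, 32)

print(torus_neighbors((0, 0, 0), DIMS))
print(torus_hops((0, 0, 0), (63, 16, 31), DIMS))
```

Because of the wraparound links, the node at (63, 16, 31) is only one hop away from the origin along the first and third axes, which is why a torus keeps worst-case distances roughly half those of a plain mesh of the same shape.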
Blue Gene/Q:
The third supercomputer design in the Blue Gene series, Blue Gene/Q, aims to reach 20 petaFLOPS in the 2012 time frame. It continues to expand and enhance the Blue Gene/L and /P architectures.
Design:
• The Blue Gene/Q compute chip is an 18-core chip. The 64-bit PowerPC A2 processor cores are 4-way simultaneously multithreaded and run at 1.6 GHz. Each processor core has a quad SIMD double-precision floating-point unit. The processor cores are linked by a crossbar switch to a 32 MB eDRAM L2 cache, operating at half core speed. The L2 cache is multi-versioned, supporting transactional memory and speculative execution, and has hardware support for atomic operations.[27] L2 cache misses are handled by two built-in DDR3 memory controllers running at 1.33 GHz. The chip also integrates logic for chip-to-chip communications in a 5D torus configuration, with 2 GB/s chip-to-chip links. Sixteen processor cores are used for computing, and a 17th core handles operating system assist functions such as interrupts, asynchronous I/O, MPI pacing and RAS. The 18th core is a spare, used in case one of the other cores is permanently damaged (for instance, during manufacturing), and is otherwise shut down. The Blue Gene/Q chip is manufactured on IBM's copper SOI process at 45 nm, and will deliver a peak of about 205 GFLOPS at 1.6 GHz while drawing 55 watts. It measures approximately 19×19 mm (359.5 mm²) and comprises 1.47 billion transistors. The chip is mounted on a compute card along with 16 GB DDR3 DRAM (i.e., 1 GB for each user processor core).[28]
• A Q32[29] compute drawer will have 32 compute cards, each water cooled and connected into a 5D network torus.[30]
• Racks will have 32 compute drawers for a total of 1024 compute nodes, 16,384 user cores and 16 TB RAM.[30]
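The figures in the design list can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes that each lane of the quad SIMD unit completes one fused multiply-add per cycle and that a fused multiply-add counts as two floating-point operations (neither is stated above), and all constant names are made up for illustration.

```python
import math

# Values taken from the design list above.
USER_CORES_PER_CHIP = 16     # the 17th (OS assist) and 18th (spare) cores are excluded
CLOCK_HZ = 1.6e9             # 1.6 GHz
SIMD_WIDTH = 4               # quad double-precision SIMD unit
CHIP_WATTS = 55
CARDS_PER_DRAWER = 32
DRAWERS_PER_RACK = 32
GB_PER_CARD = 16
TARGET_FLOPS = 20e15         # the 20 petaFLOPS goal

# Assumption (not in the text): one fused multiply-add per lane per cycle = 2 FLOPs.
FLOPS_PER_LANE_PER_CYCLE = 2

chip_flops = USER_CORES_PER_CHIP * CLOCK_HZ * SIMD_WIDTH * FLOPS_PER_LANE_PER_CYCLE
nodes_per_rack = CARDS_PER_DRAWER * DRAWERS_PER_RACK
cores_per_rack = nodes_per_rack * USER_CORES_PER_CHIP
ram_tb_per_rack = nodes_per_rack * GB_PER_CARD / 1024
rack_flops = nodes_per_rack * chip_flops
racks_for_target = math.ceil(TARGET_FLOPS / rack_flops)

print(f"Peak per chip:    {chip_flops / 1e9:.1f} GFLOPS")   # 204.8, i.e. about 205
print(f"Chip efficiency:  {chip_flops / 1e9 / CHIP_WATTS:.2f} GFLOPS/W (chip power only)")
print(f"Nodes per rack:   {nodes_per_rack}")                # 1024
print(f"Cores per rack:   {cores_per_rack}")                # 16384
print(f"RAM per rack:     {ram_tb_per_rack:.0f} TB")        # 16
print(f"Peak per rack:    {rack_flops / 1e12:.1f} TFLOPS")
print(f"Racks for target: {racks_for_target}")              # 96
```

Under these assumptions the per-chip peak, the node count, the core count and the memory per rack all agree with the numbers quoted in the list, and roughly 96 racks would be needed to reach 20 petaFLOPS of peak performance.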