01-08-2012, 12:43 PM
BlueGene/L System Software
Programming on BG/L
A single application program image
Running on tens of thousands of compute nodes
Communicating via message passing
Each image has its own copy of
Memory
File descriptors
A “job” is encapsulated in a single host-side process
A merge point for compute node stdout streams
A control point for
Signaling (Ctrl-C, kill, etc)
Debugging (attach, detach)
Termination (exit status collection and summary)
Cross compile the source code
Place executable onto BG/L machine’s shared filesystem
Run it
“blrun <job information> <program name> <args>”
Stdout of all program instances appears as stdout of blrun
Files go to user-specified directory on shared filesystem
blrun terminates when all program instances terminate
Killing blrun kills all program instances
Programming Models
“Coprocessor model”
64k instances of a single application program
each has 255M address space
each with two threads (main, coprocessor)
non-coherent shared memory
“Virtual node model”
128k instances
127M address space
one thread (main)
Programming Model
Does a job behave like
A group of processes?
Or a group of threads?
A little bit of each
A process group?
Yes
Each program instance has its own
Memory
File descriptors
No
Can’t communicate via mmap, shmat
Can’t communicate via pipes or sockets
Can’t communicate via signals (kill)
A thread group?
Yes
Job terminates when
All program instances terminate via exit(0)
Any program instance terminates
Voluntarily, via exit(!0)
Involuntarily, via uncaught signal (kill, abort, segv, etc)
No
Each program instance has own set of file descriptors
Each has own private memory space
Compilers and libraries
GNU C, Fortran, and C++ compilers can be used with BG/L, but they do not exploit the second FPU
IBM xlf/xlc compilers have been ported to BG/L, with code generation and optimization features for the dual FPU
Standard glibc library
MPI for communications
System calls
Traditional ANSI + “a little” POSIX
I/O
Open, close, read, write, etc
Time
Gettimeofday, etc
Signal catchers
Synchronous (sigsegv, sigbus, etc)
Asynchronous (timers and hardware events)
System calls
No “unix stuff”
fork, exec, pipe
mount, umount, setuid, setgid
No system calls needed to access most hardware
Tree and torus fifos
Global OR
Mutexes and barriers
Performance counters
Mantra
Keep the compute nodes simple
Kernel stays out of the way and lets the application program run
Software Stack in BG/L Compute Node
CNK (the Compute Node Kernel) controls all access to hardware, and enables a bypass for application use
User-space libraries and applications can directly access torus and tree through bypass
As a policy, user-space code should not directly touch hardware, but there is no enforcement of that policy
What happens under the covers?
The machine
The job allocation, launch, and control system
The machine monitoring and control system
The machine
Nodes
IO nodes
Compute nodes
Link nodes
Communications networks
Ethernet
Tree
Torus
Global OR
JTAG
The IO nodes
1024 nodes
talk to outside world via Ethernet
talk to inside world via tree network
not connected to torus
embedded Linux kernel
purpose is to run
network filesystem
job control daemons
The compute nodes
64k nodes, each with 2 CPUs and 4 FPUs
application programs execute here
custom kernel
non-preemptive
application program has full control of all timing issues
kernel and application share same address space
kernel is memory protected
kernel provides
program load / start / debug / termination
file access
all via message passing to IO nodes
The link nodes
Signal routing, no computation
Stitch together cards and racks of IO and compute nodes into “blocks” suitable for running independent jobs
Isolate each block’s tree, torus, and global OR network