17-01-2013, 02:39 PM
Vector Processing Principles:
• A vector is a set of scalar data items, all of the same type, stored in memory. Usually, the vector elements are ordered to have a fixed addressing increment between successive elements called the stride.
• A vector processor is an ensemble of hardware resources, including vector registers, functional pipelines, processing elements, and register counters, for performing vector operations. Vector processing occurs when arithmetic or logical operations are applied to vectors. The conversion from scalar processing to vector code is called vectorization.
• Vector processing can give a speedup of roughly 10 to 20 times compared with scalar processing. A compiler capable of vectorization is called a vectorizing compiler or vectorizer.
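As a rough illustration of vectorization, the sketch below contrasts an element-by-element scalar loop with one operation expressed over whole vector operands. It is plain Python, so it only models the programming view and says nothing about the hardware speedup; the names scalar_add and vector_add are made up for this example.

```python
def scalar_add(a, b):
    """Scalar processing: one add is issued per loop iteration."""
    result = []
    for i in range(len(a)):
        result.append(a[i] + b[i])
    return result


def vector_add(a, b):
    """Vector-style processing: one operation over whole operands.

    On a vector processor this would correspond to a single vector-vector
    instruction streaming both operands through a functional pipeline.
    """
    if len(a) != len(b):
        raise ValueError("vector operands must have the same length")
    return [x + y for x, y in zip(a, b)]


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
total = vector_add(a, b)
```

Both functions return the same values; a vectorizer's job is to recognize loops like the first one and emit the second form.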
Vector instruction types:
1. Vector-vector instructions: One or two vector operands are fetched from their respective vector registers, pass through a functional pipeline unit, and produce a result in another vector register.
2. Vector-scalar instructions: A scalar operand is combined with every element of a vector operand to produce a vector result.
3. Vector-memory instructions: Load a vector register from memory or store a vector register to memory.
4. Vector reduction instructions: Reduce a vector to a scalar, for example its maximum, minimum, sum, or mean value.
5. Gather and scatter instructions: Two vector registers, one holding indices and one holding data, are used to gather or scatter vector elements located at arbitrary positions in memory (operations on sparse vectors).
6. Masking instructions: A mask vector is used to compress or expand a vector to a shorter or longer index vector (one mask bit per index).
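The gather, scatter, masking, and reduction types above can be sketched in a few lines. This is only a software model under assumed conventions: memory is a flat Python list, indices are offsets from a base address, and the function names (gather, scatter, compress, reduce_sum) are invented for the example, not taken from any real instruction set.

```python
def gather(memory, base, index):
    """Gather: load memory[base + i] for each i in the index vector."""
    return [memory[base + i] for i in index]


def scatter(memory, base, index, values):
    """Scatter: store values[k] to memory[base + index[k]], in place."""
    for i, v in zip(index, values):
        memory[base + i] = v


def compress(mask):
    """Masking: turn a bit-per-element mask into a shorter index vector."""
    return [i for i, bit in enumerate(mask) if bit]


def reduce_sum(vector):
    """Reduction: collapse a vector to a single scalar."""
    total = 0
    for x in vector:
        total += x
    return total


# A sparse vector held in memory; most elements are zero.
memory = [0, 7, 0, 0, 5, 0, 9, 0]
mask = [1 if x != 0 else 0 for x in memory]   # one bit per element
index = compress(mask)                        # positions of nonzero elements
dense = gather(memory, 0, index)              # packed nonzero values
checksum = reduce_sum(dense)
scatter(memory, 0, index, [2 * x for x in dense])  # write doubled values back
```

Only the three nonzero elements are touched, which is the point of gather and scatter for sparse data.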
Vector-access memory schemes:
• Vector operands may have arbitrary length.
• Vector elements are not necessarily stored in contiguous memory locations.
• To access a vector in memory, one must specify its base, stride, and length.
• Since each vector register has fixed length, only a segment of the vector can be loaded into a vector register.
• Vector operands should be stored in memory to allow pipelined and parallel access. Access itself should be pipelined.
• C-access (concurrent access) memory organization: The m-way low-order interleaved memory structure allows m words to be accessed concurrently in an overlapped manner.
• S-access (simultaneous access) memory organization: All modules are accessed simultaneously, and consecutive words are latched into data buffers. The low-order address bits are then used to multiplex the m words out of the buffers.
• C/S-access memory organization: A combination of the C-access and S-access schemes.
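The points about base, stride, length, and fixed-length vector registers can be put together in a short sketch. It assumes a word-addressed memory modeled as a Python list and a hypothetical register length of 4 elements; splitting a long vector into register-sized segments like this is often called strip-mining.

```python
def load_segment(memory, base, stride, count):
    """Load `count` elements starting at `base`, `stride` words apart."""
    return [memory[base + k * stride] for k in range(count)]


def strip_mine(memory, base, stride, length, register_length):
    """Yield register-sized segments covering a vector of arbitrary length."""
    done = 0
    while done < length:
        count = min(register_length, length - done)
        yield load_segment(memory, base + done * stride, stride, count)
        done += count


memory = list(range(100))   # the word at address a holds the value a
segments = list(
    strip_mine(memory, base=3, stride=4, length=10, register_length=4)
)
```

With these numbers the 10-element vector is processed as two full segments and one short segment of 2 elements.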
C-access:
• Example: eight-way interleaved memory (m = 8 modules and w = 8 words per module). m is called the degree of interleaving.
• The major cycle is the total time required to complete the access of a single word from a memory module. The minor cycle is the actual time needed to produce one word, assuming overlapped access of successive memory modules separated by one minor cycle.
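A small model may help with the major and minor cycle. The sketch below assumes low-order interleaving (the low address bits select the module) and the usual relation minor cycle = major cycle / m, with accesses to successive modules started one minor cycle apart. The symbols theta and tau and the function names are labels chosen for this example.

```python
def module_and_offset(address, m):
    """Low-order interleaving: low bits pick the module, high bits the word."""
    return address % m, address // m


def ready_times(n_words, theta, m):
    """Completion time of each of n consecutive words under overlapped access.

    Assumed model: tau = theta / m, the access to word i starts at i * tau,
    and every access takes one full major cycle theta to complete.
    """
    tau = theta / m
    return [theta + i * tau for i in range(n_words)]


m = 8          # degree of interleaving
theta = 8.0    # major cycle, in arbitrary time units
placement = [module_and_offset(a, m) for a in range(10)]
times = ready_times(8, theta, m)
```

Under this model the first word arrives after one major cycle and each further word after one more minor cycle, so eight consecutive words finish in a little under two major cycles instead of eight.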
SIMD Computer architecture:
• SIMD models are differentiated by the memory distribution and the addressing scheme used. Most SIMD computers use a single control unit and distributed memories, except for a few that use associative memories.
• Distributed memory model: A distributed-memory SIMD computer consists of an array of processing elements (PEs), each with its own local memory, controlled by the array control unit. Program and data are loaded into the control memory through the host computer and distributed from there to the PEs' local memories.
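The distributed memory model can be mimicked in software as follows. This is a toy sketch only: the class names, the round-robin data layout, and the use of a Python function as the "instruction" are all assumptions made for illustration, and the PEs run one after another here rather than in parallel.

```python
class ProcessingElement:
    """A PE with its own local memory."""

    def __init__(self):
        self.local_memory = []

    def execute(self, operation):
        # Apply the broadcast operation to every word in local memory.
        self.local_memory = [operation(x) for x in self.local_memory]


class ArrayControlUnit:
    """Distributes data to the PEs, then drives them with one instruction."""

    def __init__(self, n_pes):
        self.pes = [ProcessingElement() for _ in range(n_pes)]

    def distribute(self, data):
        # Assumed layout: element i goes to PE number i mod n_pes.
        for i, value in enumerate(data):
            self.pes[i % len(self.pes)].local_memory.append(value)

    def broadcast(self, operation):
        # Single instruction, multiple data: every PE runs the same operation.
        for pe in self.pes:
            pe.execute(operation)

    def collect(self):
        return [pe.local_memory for pe in self.pes]


control = ArrayControlUnit(n_pes=4)
control.distribute(range(8))          # host data -> PE local memories
control.broadcast(lambda x: x * x)    # one instruction for all PEs
result = control.collect()
```

Each PE ends up holding the squares of the two elements it was given, without any PE touching another PE's memory.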