High Performance Computing

RISC Architecture, Optimization and Benchmarks

Kevin Dowd

Publisher: O'Reilly, 1993, 371 pages

ISBN: 1-56592-032-5

Keywords: Information Systems

Last modified: March 16, 2022, 12:07 a.m.

The latest generation of workstations, including IBM's RS/6000, DEC's Alpha AXP, Sun's SuperSPARC, HP's 700 series, and others, incorporates many advanced features: pipelines, RISC instruction sets, long instruction words, multiprocessing support, and more. These features aren't all new; they've been used on supercomputers for a while. What is new is that "supercomputer" features are now appearing on desktop computers.

What do these changes mean for us? Well, they've made workstations a lot more interesting for "armchair" architects. If you'd like to know how the hardware on your desk works, this book is a good place to start; your workstation is a lot more complicated than it was in 1980!

If you're a software developer, you probably know that getting the most out of a modern workstation can be tricky. Paying closer attention to memory reference patterns and loop structure can have a huge payoff. This book discusses how modern workstations get their performance, and how you can write code that makes optimal use of your hardware.
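
The payoff from loop and memory tuning is easy to see in a few lines of code. The sketch below is not from the book; it is a minimal C example of the loop-interchange idea the book develops. Both routines compute the same sum, but the interchanged version walks the array in the order C stores it, so its memory references stay sequential and cache-friendly.

    #include <stdio.h>

    #define N 1024

    static double a[N][N];

    /* Column-order traversal: each inner-loop step jumps N doubles ahead,
     * so almost every access lands on a different cache line. */
    static double sum_by_columns(void)
    {
        double sum = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        return sum;
    }

    /* Interchanged loops: row-order traversal matches C's row-major layout,
     * so consecutive iterations touch consecutive memory locations. */
    static double sum_by_rows(void)
    {
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

    int main(void)
    {
        /* Same answer either way; the difference shows up in run time. */
        printf("%f %f\n", sum_by_columns(), sum_by_rows());
        return 0;
    }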

If you're involved with purchasing or evaluating workstations, this book will help you make intelligent comparisons. You'll learn what the newest set of buzzwords really means, how caches and other architectural tricks affect performance, how to interpret the commonly quoted industry benchmarks, and how to run your own benchmarks.
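
On the do-it-yourself side, a benchmark is ultimately just a kernel wrapped in a timer. The harness below is a hypothetical sketch, not material from the book: it times a simple daxpy-style loop with gettimeofday and reports a MFLOPS figure, the kind of single-stream measurement the benchmarking chapters discuss. The kernel, problem size, and repeat count are placeholders you would swap for your own workload.

    #include <stdio.h>
    #include <sys/time.h>

    #define N    1000000
    #define REPS 100        /* repeat so the run is well above timer resolution */

    static double x[N], y[N];

    /* Placeholder kernel: y = a*x + y over the whole vector. */
    static void kernel(double a)
    {
        for (int i = 0; i < N; i++)
            y[i] = a * x[i] + y[i];
    }

    /* Wall-clock time in seconds. */
    static double wall_seconds(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec * 1.0e-6;
    }

    int main(void)
    {
        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        double start = wall_seconds();
        for (int r = 0; r < REPS; r++)
            kernel(3.0);
        double elapsed = wall_seconds() - start;

        /* Two floating-point operations per element per repetition. */
        printf("time = %g s, rate = %g MFLOPS\n",
               elapsed, 2.0 * N * REPS / elapsed / 1.0e6);
        return 0;
    }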

Whatever you do, you'll find that this book is an indispensable guide to the workstations of the 90s. Topics covered include:

  • CPU and Memory Architecture for RISC Workstations
  • Optimizing Compilers
  • Timing and Profiling Programs
  • Understanding Parallelism
  • Loop and Memory Reference Optimization
  • Benchmarking
  • Parallel Computing and Multiprocessing
  1. Modern Computing Architectures
    1. What is High Performance Computing?
      • Why Worry About Performance?
      • Measuring Performance
      • The Next Step
    2. RISC Computers
      • Why CISC?
        • Space and Time
        • Beliefs About Complex Instruction Sets
        • Memory Addressing Modes
        • Microcode
      • Making the Most of a Clock Tick
        • Pipelines
        • Instruction Pipelining
      • Why RISC?
        • Characterizing RISC
      • A Few More Words About Pipelining
        • Memory References
        • Floating-Point Pipelines
      • Classes of Processors
        • Superscalar Processors
        • Superpipelined Processors
        • Long Instruction Word (LIW)
      • Other Advanced Features
        • Register Bypass
        • Register Renaming
        • Reducing Branch Penalties
      • Closing Notes
    3. Memory
      • Memory Technology
        • Random Access Memory
        • Access Time
      • Caches
        • Direct Mapped Cache
        • Fully Associative Cache
        • Set Associative Cache
        • Uses of Cache
      • Virtual Memory
        • Page Tables
        • Translation Lookaside Buffer
        • Page Faults
      • Improving Bandwidth
        • Large Caches
        • Interleaved Memory Systems
        • Software Managed Caches
        • Memory Reference Reordering
        • Multiple References
      • Closing Notes
  2. Porting and Tuning Software
    1. What an Optimizing Compiler Does
      • Optimizing Compiler Tour
        • Intermediate Language Representation
        • Basic Blocks
        • Forming a DAG
        • Uses and Definitions
        • Loops
        • Object Code Generation
      • Classical Optimizations
        • Copy Propagation
        • Constant Folding
        • Dead Code Removal
        • Strength Reduction
        • Variable Renaming
        • Common Subexpression Elimination
        • Loop Invariant Code Motion
        • Induction Variable Simplification
        • Register Variable Detection
      • Closing Notes
    2. Clarity
      • Under Construction
      • Comments
      • Clues in the Landscape
      • Variable Names
      • Variable Types
      • Named Constants
      • INCLUDE Statements
      • Use of COMMON
      • The Shape of Data
      • Closing Notes
    3. Finding Porting Problems
      • Problems in Argument Lists
        • Aliasing
        • Argument Type Mismatch
      • Storage Issues
        • Equivalenced Storage
        • Memory Reference Alignment Restrictions
      • Closing Notes
    4. Timing and Profiling
      • Timing
        • Timing a Whole Program
        • Timing a Portion of the Program
        • Using Timing Information
      • Subroutine Profiling
        • prof
        • gprof
        • gprof's Flat Profile
        • Accumulating the Results of Several gprof Runs
        • A Few Words About Accuracy
      • Basic Block Profilers
        • tcov
        • lprof
        • pixie
      • Closing Notes
    5. Understanding Parallelism
      • A Few Important Constants
        • Constants
        • Scalars
      • Vectors and Vector Processing
      • Dependencies
        • Data Dependencies
        • Control Dependencies
      • Ambiguous References
      • Closing Notes
    6. Eliminating Clutter
      • Subroutine Calls
        • Macros
        • Procedure Inlining
      • Branches
        • Wordy Conditionals
        • Redundant Tests
      • Branches Within Loops
        • Loop Invariant Conditionals
        • Loop Index Dependent Conditionals
        • Independent Loop Conditionals
        • Dependent Loop Conditionals
        • Reductions
        • Conditionals That Transfer Control
        • A Few Words About Branch Probability
      • Other Clutter
        • Data Type Conversions
        • Doing Your Own Common Subexpression Elimination
        • Doing Your Own Code Motion
        • Handling Array Elements in Loops
      • Closing Notes
    7. Loop Optimizations
      • Basic Loop Unrolling
      • Qualifying Candidates for Loop Unrolling
        • Loops with Low Trip Counts
        • Fat Loops
        • Loops Containing Procedure Calls
        • Loops with Branches in Them
        • Recursive Loops
      • Negatives of Loop Unrolling
        • Unrolling by the Wrong Factor
        • Register Trashing
        • Instruction Cache Miss
        • Other Hardware Delays
      • Outer Loop Unrolling
        • Outer Loop Unrolling to Expose Computations
      • Associative Transformations
        • Reductions
        • Dot Products and daxpys
        • Matrix Multiplication
      • Loop Interchange
        • Loop Interchange to Move Computations to the Center
      • Operation Counting
      • Closing Notes
    8. Memory Reference Optimizations
      • Memory Access Patterns
        • Loop Interchange to Ease Memory Access Patterns
        • Blocking to Ease Memory Access Patterns
      • Ambiguity in Memory References
        • Ambiguity in Vector Operations
        • Pointer Ambiguity in Numerical C Applications
      • Programs That Require More Memory Than You Have
        • Software-Managed, Out-of-Core Solutions
        • Virtual Memory
      • Instruction Cache Ordering
      • Closing Notes
    9. Language Support for Performance
      • Subroutine Libraries
        • Vectorizing Preprocessors
      • Explicitly Parallel Languages
        • Fortran 90
        • High Performance Fortran (HPF)
        • Explicitly Parallel Programming Environments
      • Closing Notes
  3. Evaluating Performance
    1. Industry Benchmarks
      • What is a MIP?
        • VAX MIPS
        • Dhrystones
      • Floating Point Benchmarks
        • Linpack
        • Whetstone
      • The SPEC Benchmark
        • Individual SPEC Benchmarks
        • 030.matrix300 Was Deleted
      • Transaction Processing Benchmarks
        • TPC-A
        • TPC-B
        • TPC-C
      • Closing Notes
    2. Running Your Own Benchmarks
      • Choosing What to Benchmark
        • Benchmark Run Time
        • Benchmark Memory Size
        • Kernels and Sanitized Benchmarks
        • Benchmarking Third Party Codes
      • Types of Benchmarks
        • Single Stream Benchmarking
        • Throughput Benchmarks
        • Interactive Benchmarks
      • Preparing the Code
        • Portability
        • Making a Benchmark Kit
        • Benchmarking Checklist
      • Closing Notes
  4. Parallel Computing
    1. Large Scale Parallel Computing
      • Problem Decomposition
        • Data Decomposition
        • Control Decomposition
        • Distributing Work Fairly
      • Classes of Parallel Architectures
      • Single Instruction, Multiple Data
        • SIMD Architecture
        • Mechanics of Programming a SIMD Machine
      • Multiple Instruction, Multiple Data
        • Distributed Memory MIMD Architecture
        • Programming a Distributed Memory MIMD Machine
        • A Few Words About Data Layout Directives
        • Virtual Shared Memory
      • Closing Notes
    2. Shared-Memory Multiprocessors
      • Symmetric Multiprocessing
        • Operating System Support for Multiprocessing
        • Multiprocessor Architecture
      • Shared Memory
        • Conservation of Bandwidth
        • Coherency
        • Data Placement
      • Multiprocessor Software Concepts
        • Fork and Join
        • Synchronization with Locks
        • Synchronization with Barriers
      • Automatic Parallelization
        • Loop Splitting
        • Subroutine Calls in Loops
        • Nested Loops
        • Manual Parallelism
      • Closing Notes
  1. Processor Overview
  2. How to Tell When Loops Can Be Interchanged
  3. Obtaining Sample Programs and Problem Set Answers
    • FTP
    • FTPMAIL
    • BITFTP
    • UUCP

Reviews

High Performance Computing

Reviewed by Roland Buresund

Very Good ******** (8 out of 10)

Last modified: May 21, 2007, 3:06 a.m.

Still valid and interesting. Wish I had an update.
