Performance comes directly from faster devices and indirectly from using more devices in parallel. Parallelism can be helpfully divided into instruction-level parallelism, data-level parallelism, and thread-level parallelism. Instruction-level parallelism has been extensively mined, but there is now broad interest in data-level parallelism (for example, due to graphics processing units) and thread-level parallelism (for example, due to chip multiprocessors).

Computer-system performance requires attention beyond processors to memories (such as dynamic random-access memory), storage (for example, disks), and networking. Computers today are implemented with integrated circuits (chips) that incorporate numerous devices (transistors) whose population (measured as transistors per chip) has been doubling every 1.
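
The compounding effect of repeated doubling can be made concrete with a short calculation. The sketch below is illustrative only: the function name `growth_factor` and both of its parameters are assumptions introduced here, and the doubling period is left as an input rather than taken from the text. The 2.0-year value in the usage line is a placeholder, not a figure from this report.

```python
def growth_factor(elapsed_years, doubling_period_years):
    """Return how many times larger a quantity becomes if it doubles
    once every `doubling_period_years` over `elapsed_years`."""
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    if elapsed_years < 0:
        raise ValueError("elapsed_years must be non-negative")
    # Each doubling period multiplies the quantity by 2.
    return 2.0 ** (elapsed_years / doubling_period_years)


# Placeholder inputs: a 10-year span and an assumed 2-year doubling period.
factor = growth_factor(10.0, 2.0)
print(f"Growth over 10 years at an assumed 2-year doubling period: {factor:.0f}x")
```

Under those placeholder inputs the quantity doubles five times, for a 32-fold increase; a shorter doubling period would give a correspondingly larger factor.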

Large parts of the potential performance gain due to device innovations have been usefully applied to productivity gains (for example, via instruction-set compatibility and layers of software). Improvements in computer performance and cost have enabled creative product innovations that generated computer sales that, in turn, enabled a virtuous cycle of computer and product innovations.

Measuring how well machines perform their tasks is of vital importance for improving them, conceiving better machines, and deploying them for economic benefit.

Computer systems are machines designed to perform information processing and computation. Their performance is typically measured by how much information processing they can accomplish per unit time.

Suggested citation: The Future of Computing Performance: Game Over or Next Level? The National Academies Press.

The perspectives on how to measure performance reflect the broad array of uses and the diversity of end users of modern computer systems. In general, the systems are deployed and valued on the basis of their ability to improve productivity.

For some users, such as scientists and information technology specialists, the improvements can be measured in quantitative terms. For others, such as office workers and casual home users, the performance and resulting productivity gains are more qualitative.

Thus, no single measure of performance or productivity adequately characterizes computer systems for all their possible uses.

Although the raw computational capabilities of the central processing unit (CPU) core tend to get the most attention, the reality is that performance comes from a complex balance among many cooperating subsystems.

Similarly, the interactive responsiveness perceived by end users of personal computers and hand-held devices is typically defined more by the characteristics of the operating system, the graphical user interface (GUI), and the storage components than by the CPU core.

Nevertheless, to understand and reason about performance at a high level, it is important to understand the fundamental lower-level contributors to performance. Although detailed technical descriptions of them are beyond the intended scope of this report, the brief descriptions below will provide context for the discussions that follow.

Operating frequency defines the basic clock rate at which the CPU core runs.

Modern high-end processors run at several billion cycles per second. Operating frequency is a function of the low-level transistor characteristics in the chip, the length and physical characteristics of the internal chip wiring, the voltage that is applied to the chip, and the degree of pipelining used in the microarchitecture of the machine.

The last 15 years have seen dramatic increases in the operating frequency of CPU cores. As an unfortunate side effect of that growth, the maximum operating frequency has often been used as a proxy for performance by much of the popular press and industry marketing campaigns.

That can be misleading because there are many other important low-level and system-level measures to consider in reasoning about performance.

Instruction count is the number of native instructions (instructions written for that specific CPU) that must be executed by the CPU to achieve correct results with a given computer program.

Machine instructions are specific to the instruction set architecture ISA that a given computer architecture or architecture family implements.

For a given high-level program, the machine instruction count varies when it executes on different computer systems because of differences in the underlying ISA, in the microarchitecture that implements the ISA, and in the tools used to compile the program.

Although this section of the report focuses mostly on the low-level raw performance measures, the compiler and other modern software system technologies must also be considered to understand performance fully.

Instructions per cycle (IPC) refers to the average number of instructions that a particular CPU core can execute and complete in each cycle.

IPC is a strong function of the underlying microarchitecture, or machine organization, of the CPU core. (See Hennessy and Patterson, Computer Architecture.) Some performance assessments focus on the peak capabilities of the machines; for example, the peak performance of the IBM Power 7 is six instructions per cycle, and that of the Intel Pentium, four.

In reality, those and other sophisticated CPU cores actually sustain an average of slightly more than one instruction per cycle when executing many programs.
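
The three measures described above (operating frequency, instruction count, and instructions per cycle) combine in a standard relation: execution time equals instruction count divided by the product of instructions per cycle and operating frequency. The sketch below works through that arithmetic to suggest why frequency alone, or peak instructions per cycle alone, can mislead. The function name `execution_time_seconds`, the workload size, and both machines are hypothetical values chosen for illustration; none of them are measurements from this report.

```python
def execution_time_seconds(instruction_count, ipc, frequency_hz):
    """Estimate run time as instructions / (instructions per cycle * cycles per second)."""
    if instruction_count < 0:
        raise ValueError("instruction_count must be non-negative")
    if ipc <= 0 or frequency_hz <= 0:
        raise ValueError("ipc and frequency_hz must be positive")
    cycles = instruction_count / ipc   # total clock cycles needed
    return cycles / frequency_hz       # cycles divided by cycles per second


# Hypothetical workload: 12 billion native instructions (an assumed figure).
INSTRUCTIONS = 12_000_000_000

# Hypothetical machine A: higher clock rate, lower sustained IPC.
time_a = execution_time_seconds(INSTRUCTIONS, ipc=1.0, frequency_hz=3.0e9)

# Hypothetical machine B: lower clock rate, higher sustained IPC.
time_b = execution_time_seconds(INSTRUCTIONS, ipc=2.0, frequency_hz=2.0e9)

# Machine A again, but optimistically assuming a peak IPC of 6 is sustained.
time_a_peak = execution_time_seconds(INSTRUCTIONS, ipc=6.0, frequency_hz=3.0e9)

print(f"Machine A, sustained IPC 1.0 at 3 GHz: {time_a:.2f} s")
print(f"Machine B, sustained IPC 2.0 at 2 GHz: {time_b:.2f} s")
print(f"Machine A, assumed peak IPC 6.0 at 3 GHz: {time_a_peak:.2f} s")
```

With these assumed numbers, machine B finishes in 3 seconds against 4 seconds for machine A despite its lower clock rate, and an estimate based on peak rather than sustained instructions per cycle understates machine A's run time by a factor of six.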

The Storage Performance Council ranked the system's work rate with its SPC-1 benchmark, which measures the performance of a block-access array under a single workload.

