bw_mem(8)
NAME
bw_mem - time memory bandwidth
SYNOPSIS
bw_mem [ -P <parallelism> ] [ -W <warmups> ] [ -N <repetitions> ] size rd|wr|rdwr|cp|fwr|frd|fcp|bzero|bcopy [align]
DESCRIPTION
bw_mem allocates twice the specified amount of memory, zeros it, and
then times the copying of the first half to the second half. Results
are reported in megabytes moved per second.
The size specification may end with ``k'' or ``m'' to mean kilobytes (*
1024) or megabytes (* 1024 * 1024).
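The suffix rule above can be sketched in C as follows. This is a minimal illustration, not the parser bw_mem itself uses: the name parse_size is hypothetical, and accepting upper-case ``K'' and ``M'' is an assumption that goes beyond what this page states.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert a size argument such as "4096", "512k"
 * or "8m" into a byte count. Upper-case suffixes are accepted here as
 * an assumption; the page only documents the lower-case forms.
 */
size_t parse_size(const char *s)
{
    char *end;
    unsigned long long n = strtoull(s, &end, 10);

    switch (tolower((unsigned char)*end)) {
    case 'k':
        n *= 1024ULL;            /* kilobytes */
        break;
    case 'm':
        n *= 1024ULL * 1024ULL;  /* megabytes */
        break;
    default:
        break;                   /* no suffix: plain bytes */
    }
    return (size_t)n;
}
```

Under these assumptions, parse_size("8m") yields 8388608 and parse_size("512k") yields 524288.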
OUTPUT
Output format is "%0.2f %.2f\n", megabytes, megabytes_per_second, e.g.,
8.00 25.33
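As a rough sketch of how such a line could be produced, the C fragment below formats a byte count and an elapsed time with the same format string. The function name format_result is hypothetical, and dividing by 1024 * 1024 to obtain megabytes is an assumption taken from the size suffix definition above; the real tool may scale differently.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "megabytes megabytes_per_second\n" into buf.
 * Assumes one megabyte is 1024 * 1024 bytes and seconds > 0.
 * Returns the number of characters written, as snprintf does.
 */
int format_result(char *buf, size_t len, double bytes, double seconds)
{
    double mb = bytes / (1024.0 * 1024.0);

    return snprintf(buf, len, "%0.2f %.2f\n", mb, mb / seconds);
}
```

Both conversions round to two decimal places, so a run that moves 8 megabytes in half a second would be reported as 8.00 and 16.00.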
There are nine different memory benchmarks in bw_mem. They each measure slightly different methods for reading, writing or copying data.
- rd measures the time to read data into the processor. It computes the sum of an array of integer values. It accesses every fourth word.
- wr measures the time to write data to memory. It assigns a constant value to each element of an array of integer values. It accesses every fourth word.
- rdwr measures the time to read data and then write data to the same memory location. For each element in an array it adds the current value to a running sum before assigning a new (constant) value to the element. It accesses every fourth word.
- cp measures the time to copy data from one location to another. It does an array copy: dest[i] = source[i]. It accesses every fourth word.
- frd measures the time to read data into the processor. It computes the sum of an array of integer values.
- fwr measures the time to write data to memory. It assigns a constant value to each element of an array of integer values.
- fcp measures the time to copy data from one location to another. It does an array copy: dest[i] = source[i].
- bzero measures how fast the system can bzero memory.
- bcopy measures how fast the system can bcopy data.
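The access patterns listed above can be illustrated with simplified C loops. This is a sketch only: the function names and the STRIDE macro are hypothetical, the loops are not unrolled, and no timing is done, so it shows what each benchmark touches rather than how bw_mem is actually implemented.

```c
#include <stddef.h>

#define STRIDE 4   /* "every fourth word"; an illustrative constant */

/* rd: sum every fourth element of the array. */
long rd_sum(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i += STRIDE)
        sum += a[i];
    return sum;
}

/* wr: store a constant into every fourth element. */
void wr_fill(int *a, size_t n, int value)
{
    for (size_t i = 0; i < n; i += STRIDE)
        a[i] = value;
}

/* rdwr: add the current value to a running sum, then overwrite it. */
long rdwr_pass(int *a, size_t n, int value)
{
    long sum = 0;
    for (size_t i = 0; i < n; i += STRIDE) {
        sum += a[i];
        a[i] = value;
    }
    return sum;
}

/* cp: dest[i] = source[i] for every fourth element. */
void cp_copy(int *dest, const int *source, size_t n)
{
    for (size_t i = 0; i < n; i += STRIDE)
        dest[i] = source[i];
}

/* frd: like rd, but touches every element. */
long frd_sum(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

The fwr and fcp variants would follow the same pattern as frd, with a step of one instead of STRIDE.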
MEMORY UTILIZATION
This benchmark can move up to three times the requested memory. Bcopy
will use 2-3 times as much memory bandwidth: there is one read from the
source and a write to the destination. The write usually results in a
cache line read and then a write back of the cache line at some later
point. Memory utilization might be reduced by 1/3 if the processor
architecture implemented ``load cache line'' and ``store cache line''
instructions (as well as ``getcachelinesize'').
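The accounting behind the 2-3 times figure can be written out as a small C sketch. The function name bcopy_traffic and the write_allocate flag are hypothetical, and the model assumes every destination cache line is either fetched in full before being written or not fetched at all.

```c
#include <stddef.h>

/*
 * Hypothetical model of memory traffic for copying n bytes.
 * write_allocate != 0: the destination line is read before it is written.
 * write_allocate == 0: the line is written without being fetched first.
 */
size_t bcopy_traffic(size_t n, int write_allocate)
{
    size_t source_reads = n;                    /* read from the source */
    size_t dest_fills = write_allocate ? n : 0; /* cache line read on write */
    size_t dest_writebacks = n;                 /* later write back */

    return source_reads + dest_fills + dest_writebacks;
}
```

Under this model a 1024 byte copy moves 3072 bytes with the cache line read and 2048 bytes without it, which is the one-third reduction mentioned above.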
SEE ALSO
AUTHOR
Carl Staelin and Larry McVoy
Comments, suggestions, and bug reports are always welcome.