King Abdullah University of Science & Technology (KAUST), 2012 Summer

High Performance Computing, eXtreme Technical Computing

Professor Craig C. Douglas



Homework  Covers                                  Worth
hw1       Historical data and future predictions  20%
hw2       Memory bandwidth                        20%
hw3       OpenMP and multithreading               30%
hw4       Cache aware algorithms                  30%

Advice, Hints, whatever...

All homework should be emailed to me before class on the date due unless another specific time is listed. Only one person in your group needs to send me the solution (preferably as a .tgz or .zip file with everything in it). Always put MA5490 in the Subject line of your message. I will send you a reply (from GMail) when I get your mail. If you do not get a reply, I might not have received your email.

You are free to program in C, C++, Fortran, or something else if you confirm it with me. I find C convenient myself, however, and the examples and handouts will all be in C.

If I give you software, check the Notes page often to see if there is an update. I take suggestions for improved software. If you think you found a bug, please send me information about it. I am always happy to see bug fixes or better code. Just because I have been programming since 1968 does not mean I write the best code.

What you should turn in:

As stated in the first class, it is nearly impossible to cheat in this class as long as your group works on the assignments in the computer lab and turns in what you worked on. You are allowed to discuss concepts with other groups. Please do not, however, copy another group's code verbatim.


hw1 Historical data and future predictions

Part 1

For the following computer systems, research information about them (consult

Find out the following information for each system:

  1. Clock Speed
  2. Number of Nodes
  3. Number of Cores or Processors on a Node
  4. Memory Per Node
  5. Peak Speed (in floating point operations per second) per processing element, per node, and for the whole system
  6. Linpack Performance of the system
  7. Memory System Bandwidth: how many bytes per second can be transferred within a node of the system
  8. Network Architecture: how the nodes are interconnected
  9. Network Bandwidth: how many bytes per second can be sent off a node
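Peak speed (item 5) is usually derived rather than measured: per processing element it is the clock rate times the number of floating point operations a core can complete per cycle, per node it is that times the cores per node, and for the system it is that times the number of nodes. The TOP500 lists call this theoretical value Rpeak and the measured Linpack value (item 6) Rmax. A minimal sketch in C follows; the struct and function names are hypothetical, and the flops-per-cycle figure is an assumption you must look up for each processor (for example, a core that can issue one 4-wide double precision add and one 4-wide multiply every cycle does 8 flops per cycle).

```c
/* Hypothetical description of one system; fill in from vendor or TOP500 data. */
struct system_spec {
    double clock_ghz;        /* clock speed in GHz */
    double flops_per_cycle;  /* flops per cycle per core (assumed; architecture dependent) */
    long   cores_per_node;   /* cores (processing elements) on one node */
    long   nodes;            /* number of nodes in the system */
};

/* Peak speed of one processing element, in Gflop/s. */
double peak_per_core_gflops(const struct system_spec *s)
{
    return s->clock_ghz * s->flops_per_cycle;
}

/* Peak speed of one node, in Gflop/s. */
double peak_per_node_gflops(const struct system_spec *s)
{
    return peak_per_core_gflops(s) * (double)s->cores_per_node;
}

/* Peak speed of the whole system (Rpeak), in Gflop/s. */
double peak_system_gflops(const struct system_spec *s)
{
    return peak_per_node_gflops(s) * (double)s->nodes;
}

/* Linpack efficiency: measured Rmax as a fraction of theoretical Rpeak. */
double linpack_efficiency(double rmax_gflops, double rpeak_gflops)
{
    return rmax_gflops / rpeak_gflops;
}
```

For a made-up system with a 2.5 GHz clock, 8 flops per cycle, 16 cores per node, and 1000 nodes, this gives 20 Gflop/s per core, 320 Gflop/s per node, and 320,000 Gflop/s (320 Tflop/s) for the system.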

Part 2

Review the historical data on the top 500 systems in the world, which is available at or in the free Top500 app in the Apple App Store. For example, the top 500 lists are available in Microsoft Excel format for each year since 1993. Using the historical data, plot the performance of the #1, #100, and #500 systems for each year since June 2000. From these plots, project the performance the #1, #100, and #500 systems will have in June 2012, November 2015, and November 2018. Use Matlab to make the plots.
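Top500 performance has historically grown roughly exponentially, so one common way to project it is to fit a straight line to log10(performance) against time and extrapolate; whether that assumption holds for each rank is something your plots should confirm. In Matlab this is a polyfit of degree 1 on the log10 data. The same least-squares fit is sketched in C below as a cross-check; the function names are hypothetical, and the caller is assumed to pass y[i] = log10(Rmax[i]) with time in years (for example t = year + (month - 1)/12, so June 2012 is 2012 + 5/12).

```c
#include <stddef.h>

/* Least-squares line y = a + b*t through n points (t[i], y[i]).
   Pass y[i] = log10(performance[i]) so the fitted line models
   exponential growth in performance.
   Returns 0 on success, -1 if there are fewer than two distinct t values. */
int fit_line(const double *t, const double *y, size_t n, double *a, double *b)
{
    if (n < 2) return -1;

    double tm = 0.0, ym = 0.0;
    for (size_t i = 0; i < n; i++) { tm += t[i]; ym += y[i]; }
    tm /= (double)n;
    ym /= (double)n;

    /* Centered sums avoid cancellation when t is a calendar year like 2000.42 */
    double stt = 0.0, sty = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dt = t[i] - tm;
        stt += dt * dt;
        sty += dt * (y[i] - ym);
    }
    if (stt == 0.0) return -1;

    *b = sty / stt;          /* slope: decades of performance per year */
    *a = ym - *b * tm;       /* intercept at t = 0 */
    return 0;
}

/* Fitted line at time t, i.e. log10 of the projected performance. */
double line_at(double a, double b, double t)
{
    return a + b * t;
}
```

After fitting, the projected performance is pow(10.0, line_at(a, b, t)) from <math.h>; 1/b is the number of years it takes performance to grow tenfold, which is a useful number to report alongside the graphs.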

What to turn in

For Part 1, turn in a report using one of the writing systems in the Advice section. A table is sufficient. For Part 2, turn in your Matlab script and a report on the raw numbers with the graphs.


hw2 Memory bandwidth

Find the STREAM benchmark on the Internet and investigate its home web site. Download the code and then benchmark

  1. A computer in the classroom.
  2. Your personal computer(s), preferably using more than one operating system.
  3. Any other computer that you find interesting to benchmark including multicore or distributed memory computers. The more the merrier to a degree.
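Your reported scores should come from the official STREAM code, but it helps to know what it measures. STREAM times four simple vector kernels (Copy, Scale, Add, Triad) on arrays that should be much larger than the caches, and it converts the best time into bytes per second by counting only the words each kernel reads and writes. The sketch below reproduces just the Triad idea, a[i] = b[i] + q*c[i] with 3 * 8 bytes counted per element; the function names are hypothetical and it is not a substitute for STREAM's own validation and reporting.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* STREAM-style Triad on arrays of n doubles, repeated ntimes.
   Returns the best bandwidth in bytes/s, counting 3 * n * sizeof(double)
   bytes per trial (reads of b and c, write of a), or -1.0 on failure.
   *checksum receives the sum of a[] so the compiler must keep the work. */
double triad_bandwidth(size_t n, int ntimes, double *checksum)
{
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    const double q = 3.0;
    double best = -1.0;

    if (a != NULL && b != NULL && c != NULL && n > 0) {
        for (size_t i = 0; i < n; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.5; }

        for (int k = 0; k < ntimes; k++) {
            double t0 = now_seconds();
            for (size_t i = 0; i < n; i++)
                a[i] = b[i] + q * c[i];
            double t = now_seconds() - t0;
            if (best < 0.0 || t < best) best = t;   /* keep the fastest trial */
        }

        double s = 0.0;
        for (size_t i = 0; i < n; i++) s += a[i];
        *checksum = s;
    }

    free(a);
    free(b);
    free(c);
    if (best <= 0.0) return -1.0;
    return 3.0 * (double)n * sizeof(double) / best;
}
```

Compile with optimization (for example -O2) and choose n so that the three arrays together are many times the size of the largest cache; otherwise the result measures cache bandwidth rather than memory bandwidth.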

What to turn in

Turn in a report describing the computers benchmarked and their scores. During the semester, you may want to add to your list as we progress through OpenMP and MPI.


hw3 OpenMP and multithreading

We will explore overlapped I/O and computing using multiple threads on a single CPU:

1. Download the software (hw-mm.tgz or and familiarize yourself with it. Inside the packed files is a subdirectory hw-mm containing several files:

A simple way to see how this all works is to unpack the files. Then in a Terminal window in the hw-mm directory type the command make run.

2. Start by implementing MM-mult using a simple formula without OpenMP.

Use the simple formula for c_ij in (2) above. The tricky part is that you have to get the right blocks of A and B into memory before you can compute any element of C. Work this out on paper before programming, and include it in your homework documentation.
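One way to see which blocks are needed: with c_ij = sum over k of a_ik * b_kj, partitioning all three matrices into square blocks gives C_IJ = sum over K of A_IK * B_KJ, so computing block (I,J) of C needs block row I of A and block column J of B. The in-memory sketch below makes that dependence explicit; it assumes n x n row-major matrices and a block size bs, and the function name is hypothetical, not part of hw-mm. In the homework the blocks would be read from the disk files instead of indexed in one array.

```c
#include <stddef.h>

/* C = A * B for n x n row-major matrices, computed one block at a time.
   The update for block (I,J) of C at step K reads only blocks A(I,K) and
   B(K,J): these are the blocks that must be in memory at that point. */
void mm_blocked(size_t n, size_t bs, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;

    for (size_t I = 0; I < n; I += bs) {
        size_t imax = (I + bs < n) ? I + bs : n;
        for (size_t J = 0; J < n; J += bs) {
            size_t jmax = (J + bs < n) ? J + bs : n;
            for (size_t K = 0; K < n; K += bs) {
                size_t kmax = (K + bs < n) ? K + bs : n;
                /* C(I,J) += A(I,K) * B(K,J) */
                for (size_t i = I; i < imax; i++)
                    for (size_t k = K; k < kmax; k++) {
                        double aik = A[i * n + k];
                        for (size_t j = J; j < jmax; j++)
                            C[i * n + j] += aik * B[k * n + j];
                    }
            }
        }
    }
}
```

With nb = n/bs blocks per side, each block of C needs 2*nb input blocks, so the order in which you visit (I,J) determines how often a block of A or B has to be reread from disk.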

3. Add OpenMP last.

You will need to implement a way for threads to communicate through shared memory, telling one or more threads which disk block(s) to read or write. Your computing thread will need to know when data is available. It will also need to schedule blocks to be brought into memory ahead of time, so that it can compute on blocks already in memory while others are being read. Once you have read enough blocks into memory, you should be able to make MM-mult compute bound (able to compute without waiting for input from the disk files).
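A minimal sketch of the shared-memory signalling, assuming one I/O thread and one compute thread in an OpenMP sections construct. The names read_block, NBLOCKS, BLOCK_LEN, buffer, and ready are hypothetical, not part of hw-mm, and the disk read is simulated by filling the buffer. Each block has a flag that the I/O thread sets after the data is written; the compute thread spins on that flag. The flush and atomic read/write pattern is a common OpenMP producer/consumer idiom and needs OpenMP 3.1 or later. The sketch keeps every block in memory, so it does not show buffer reuse or writing blocks of C back to disk.

```c
#include <stddef.h>

#define NBLOCKS   8      /* hypothetical number of disk blocks */
#define BLOCK_LEN 1024   /* hypothetical number of doubles per block */

static double buffer[NBLOCKS][BLOCK_LEN];
static int ready[NBLOCKS];   /* ready[b] == 1 once buffer[b] holds block b */

/* Stand-in for reading block b from a disk file (hypothetical). */
static void read_block(int b, double *dst)
{
    for (int i = 0; i < BLOCK_LEN; i++)
        dst[i] = (double)b;
}

/* I/O thread reads blocks while the compute thread sums each one as soon
   as its flag is set. The producer section is listed first so that, if
   only one thread is available, it runs before the consumer. */
double overlapped_sum(void)
{
    double total = 0.0;
    for (int b = 0; b < NBLOCKS; b++)
        ready[b] = 0;

    #pragma omp parallel sections num_threads(2) shared(total)
    {
        #pragma omp section
        {   /* I/O (producer) thread */
            for (int b = 0; b < NBLOCKS; b++) {
                read_block(b, buffer[b]);
                #pragma omp flush
                #pragma omp atomic write
                ready[b] = 1;
                #pragma omp flush
            }
        }
        #pragma omp section
        {   /* compute (consumer) thread */
            for (int b = 0; b < NBLOCKS; b++) {
                int ok = 0;
                while (!ok) {          /* wait until block b has arrived */
                    #pragma omp flush
                    #pragma omp atomic read
                    ok = ready[b];
                }
                #pragma omp flush
                for (int i = 0; i < BLOCK_LEN; i++)
                    total += buffer[b][i];
            }
        }
    }
    return total;
}
```

Build with -fopenmp to get two threads. Without it the pragmas are ignored and the two sections run one after the other, which still gives the correct total but no overlap.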

What to turn in

Turn in a report describing the results, the files needed to make an executable code, and how to make and run the code. Give conditions when your code is compute bound based on the computer you used (and state what that was in the report).


hw4 Cache aware algorithms

We will explore several key areas for cache aware programming in this homework:

1. Download the software (hw-db.tgz or and familiarize yourself with it. The packed files contain several files:

A simple way to see how this all works is to create a directory called hw-db, change to it, and unpack the files. Then in a Terminal window in the hw-db directory type the command make run.

2. You should add functionality to make accessing the hash table cache aware once the table has been created (but possibly before it has been accessed a lot of times).
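One possible approach, sketched under the assumption that the table is a separately chained hash table whose nodes are allocated one at a time (the actual hw-db structures may differ, and all names here are hypothetical). Once the table is built, it is copied into a packed form where the entries of each bucket sit next to each other, with keys and values in separate arrays. A lookup then scans consecutive keys, touching a few cache lines, instead of following pointers to nodes scattered around the heap.

```c
#include <stdlib.h>

/* Hypothetical chained table: one malloc'd node per entry. */
struct node    { long key; long value; struct node *next; };
struct chained { size_t nbuckets; struct node **bucket; };

/* Packed copy: bucket b occupies keys/values[start[b] .. start[b+1]-1]. */
struct packed  { size_t nbuckets; size_t *start; long *keys; long *values; };

static size_t hash_key(long key, size_t nbuckets)
{
    return (size_t)((unsigned long)key % nbuckets);
}

/* Insert into the chained table (prepends to the bucket's list). */
int chained_insert(struct chained *t, long key, long value)
{
    struct node *x = malloc(sizeof *x);
    if (x == NULL) return -1;
    size_t b = hash_key(key, t->nbuckets);
    x->key = key;
    x->value = value;
    x->next = t->bucket[b];
    t->bucket[b] = x;
    return 0;
}

/* Build the packed, cache-friendly copy. Returns 0 on success, -1 on failure. */
int pack_table(const struct chained *t, struct packed *p)
{
    size_t count = 0;
    for (size_t b = 0; b < t->nbuckets; b++)
        for (const struct node *x = t->bucket[b]; x != NULL; x = x->next)
            count++;

    p->nbuckets = t->nbuckets;
    p->start  = malloc((t->nbuckets + 1) * sizeof *p->start);
    p->keys   = malloc((count ? count : 1) * sizeof *p->keys);
    p->values = malloc((count ? count : 1) * sizeof *p->values);
    if (p->start == NULL || p->keys == NULL || p->values == NULL) {
        free(p->start); free(p->keys); free(p->values);
        return -1;
    }

    size_t pos = 0;
    for (size_t b = 0; b < t->nbuckets; b++) {
        p->start[b] = pos;
        for (const struct node *x = t->bucket[b]; x != NULL; x = x->next) {
            p->keys[pos] = x->key;
            p->values[pos] = x->value;
            pos++;
        }
    }
    p->start[t->nbuckets] = pos;
    return 0;
}

/* Look up key in the packed table: 1 and *value set if found, else 0. */
int packed_lookup(const struct packed *p, long key, long *value)
{
    size_t b = hash_key(key, p->nbuckets);
    for (size_t i = p->start[b]; i < p->start[b + 1]; i++)
        if (p->keys[i] == key) { *value = p->values[i]; return 1; }
    return 0;
}
```

The packed copy is read-only: inserts made after packing must go into the chained table and be followed by a repack, which is one reason to pack only after the table has been created and before it is accessed many times.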

3. Do experiments to demonstrate that your code really is better at using the cache than the original code and clearly document what experiments you did and why your code is better at using the cache. It is easy to do this assignment all wrong. Here are some hints:

4. Useful things to think about are
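A cache experiment is most convincing when the two runs do the same arithmetic and differ only in the order they touch memory, each measurement is repeated, and the best time is reported. The sketch below illustrates that pattern on a plain array rather than on the hash table: it sums every element exactly once, visiting them with a chosen stride. With 8-byte doubles and 64-byte cache lines (an assumption about the machine), stride 1 uses every word of each line it loads, while stride 16 on an array much larger than the cache loads a new line for almost every access. The function names are hypothetical. Tools such as Valgrind's Cachegrind or Linux perf can also report cache misses directly to back up the timings.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Sum all n elements, visiting 0, stride, 2*stride, ..., then 1, 1+stride, ...
   Every element is read exactly once for any stride >= 1. */
double strided_sum(const double *x, size_t n, size_t stride)
{
    double s = 0.0;
    for (size_t start = 0; start < stride && start < n; start++)
        for (size_t i = start; i < n; i += stride)
            s += x[i];
    return s;
}

/* Best of reps (reps >= 1) wall-clock times for strided_sum.
   The sum is returned through *sum so the work cannot be optimized away. */
double time_strided(const double *x, size_t n, size_t stride, int reps, double *sum)
{
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        double s = strided_sum(x, n, stride);
        double t = now_seconds() - t0;
        *sum = s;
        if (best < 0.0 || t < best) best = t;
    }
    return best;
}
```

Comparing time_strided for stride 1 and stride 16 on arrays smaller and larger than the cache, and checking that the sums agree, shows whether a difference in time really comes from the memory access pattern.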

What to turn in


Craig C. Douglas
