MA 5490, Spring 2009
Professor Craig C. Douglas
Monday - Wednesday 1:20-2:35, Ross Hall 247
Notes   Homework   Syllabus

Homework, Projects, Quizzes, and Exams

All homework should be emailed to me. I require that you send me your files as a single zip or gzipped tar file using the naming convention, or lastname.tgz.

I will send you a reply when I get your email. If you do not get a reply, I did not get your email. You are responsible for having an email account that can communicate successfully with me. I am flexible on which of my email accounts you use up to a degree. I will send you email from GMail, however. Please send email to my GMail account since that is where mail to my university address will forward to eventually.

Description Homework Quiz Due Worth
hw1 X   1/12/2008 20
hw2 X   1/28/2008 100
quiz1 and solution   X 2/2/2009 100
hw3 X   2/27/2009 100
hw4 X   3/6/2009 100
hw5 X   4/3/2009 150
hw6 X   4/10/2009 150
hw7 (extra credit) X   4/30/2009 100


You are expected to research a topic in parallel computing and make a presentation using the computer projection system in the classroom by the end of April. The presentations should be about 20 minutes in length. A good rule of thumb is no more than 20 slides in 20 minutes. Do not read the slides to the class. Talk about the slide's contents and use the slide as a guide to the spoken material. Your presentation should be of the same quality of one given at an international conference (meaning of higher quality than s student seminar) or a thesis defense (though of not as much depth as a thesis defense, however). You should provide the following material:

You may find the examples in a course I taught at the University of Kentucky useful to see what students did in 75 minutes for a variety of topics. Obviously, I do not want to see those slides over again. If I fall asleep during your presentation, then I will not know what you did and will not grade it highly. I suggest coming to see me a week before your presentation to the class to get feedback from me.

Who Topic When
Cerwinsky/Wiblimo Random number generation 4/22/2009
Burgess/Flynt Computational fluid dynamics 4/22/2009
Rigelo/Price Galerkin methods and adaptive mesh generation? 4/27/2009
Grace/Tang General relativity 4/27/2009
Skari/Wibisono Subsurface flow modeling 4/27/2009
Raghu/Sivasankaran Graph Algorithms 4/29/2009

The project is worth the sum of the points for hw1 - hw6.


Fill out the course questionaire and email it back to me as an attachment. Note that it is a .doc Word document. Please return it as such (not pdf, docx, txt, rtf, xls, or anything else).


Use the Matlab code gen_sparse.m to generate a large sparse matrix. Write a C program that is modular. You should write several small functions including ones that do the following:

Be careful how you write your code since you will have to modify it at least twice more this semester in later homeworks. At some point you will want to time your code to see how fast it is. This will come later in the course. Right now concentrate on getting codes that work correctly.

Things you should or should not do include the following:

Turn in your code along with your test cases. In a single file, write a short description of each file that you turn in and call it Readme.{txt,doc, or whatever}.


Parallel codes are frequently written to use only one parallel computing communication system, e.g., MPI. For example, calls to specific MPI functions are embedded directly into a code. This is problematic if you ever have to change communications systems (e.g., to something other than MPI). A cleaner way to do communication is to write a simple library of routines that do what you want a parallel communications system to do independent of the specific system. Some simple functionality that you need for most or all parallel codes includes the following:

Test your library by revising an existing MPI based C program, such as matmul_mpi.c by rewriting the existing code to call your library routines instead of MPI functions directly. Compare the answers to the original code to the modified one using your library. The results should be the same.


Modify the brandt_mc77_openmp.c file to make it more efficient from a parallelism viewpoint. The routine relax does one Gauss-Seidel iteration when run on one processor. In its current form in the file, it is doing a block Jacobi iteration with one Gauss-Seidel iteration within the blocks when more than one processor is used. The routine multig takes the reported error and decides if another relaxation step, coarsening to the next coarser level, interpolating to the next finer level, or the approximate solution has been achieved on the finest level after the return from relax. It would be much efficient if some of the logic in multig would be in relax instead. Modify both routines so that relax returns when all of the relaxation iterations for a level are complete (so that multig only changes levels or returns as decisions after relax returns).

The routine relax should run in parallel for all relaxation sweeps without going back to serial. You will need to add a while loop (run in parallel by OpenMP) that runs until the variable err has been reduced enough to consider changing levels. You have to be careful updating err in parallel (it needs to be done atomically to be safe). You need a barrier before checking if err has been reduced sufficiently, too. The current for loop that has a #pragma in the line above it, no longer needs the OpenMP command since it will be inside the while loop that has its own #pragma above it.

Turn in your modified brandt_mc77_openmp.c file (and call it something else). Do your compiling, running, and testing on since that is what I will use to evaluate your homework.


Write a parallel sorting program using one of the three algorithms (LS3, 4 way merge sort, or rotate sort) using a 22 array and the C programming language. While I do not care if you use integers, floating point numbers, or complicated structures, I personally would use integers or doubles. Please do the following:

Hint: Do the homework first in Matlab. This is known as prototyping in the parallel computing world. You can write the Matlab code quite simply or in a complicated style that simulates how the parallel code would be written. It is up to you how much effort you want to put into this step.


Write a parallel program to transmit data in a two dimensional mesh to neighboring processors. You should look in your textbook for pictures of this topic and hints. You may use a Cartesian tensor product mesh (this is simply a matrix in how it should be stored on a computer) or an unstructured mesh (only do this if you really know what you are getting into; this is a request from Mr. Burgess). Please do the following:



This is strictly extra credit or for fun. Modify your homework 2 so that you do a sparse matrix-vector multiplication using CUDA on the nVidia GP-GPU on my workstation. Let me know if you intend to work on this after the semester ends.



Craig C. Douglas

Last modified: