Table of contents:
- What is special about MPI performance analysis?
- What are "profiling" and "tracing"?
- How do I sort out busy wait time from idle wait, user time from system time, and so on?
- What is PMPI?
- Should I use those switches --enable-mpi-profile and --enable-trace when I configure OMPI?
- What support does OMPI have for performance analysis?
- How do I view VampirTrace output?
- Are there MPI performance analysis tools for OMPI that I can download for free?
- Any other kinds of tools I should know about?
| 1. What is special about MPI performance analysis? |
The synchronization among the MPI processes can be a key performance
concern. For example, if a serial program spends a lot of time in function
foo(), you should optimize foo(). In contrast, if an MPI process spends a lot of time
in MPI_Recv(), the optimization target is probably not MPI_Recv() at all. The
process is most likely waiting for a message that some other process is slow to
send, so that other process is where you should be looking.
You should ask, "What is happening on other processes when this process has
the long wait?"
Another issue is that a parallel program (in the case of MPI, a multi-process program)
can generate much more performance data than a serial program, because every
process produces its own data. Managing that data volume can be a challenge.
| 2. What are "profiling" and "tracing"? |
These terms are sometimes used to refer to two different kinds of
performance analysis.
In profiling, one aggregates statistics at run time: e.g., the total
amount of time spent in MPI, the total number of messages or bytes sent, etc.
Data volumes are small.
In tracing, an event history is collected, and it is commonly shown on a
timeline display. Tracing data can provide a great deal of useful detail,
but data volumes are large.
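To make the size difference concrete, here is a plain C sketch of the two kinds of record a tool might keep for send operations. The type and function names are hypothetical and no MPI is needed. The caller supplies timestamps, which a real tool would typically take from MPI_Wtime(). A profile is one fixed-size summary updated in place, while a trace appends one record per event.

```c
#include <stddef.h>
#include <stdlib.h>

/* Profiling: a fixed-size summary, updated in place at run time. */
typedef struct {
    long   calls;   /* number of sends observed */
    long   bytes;   /* total payload bytes */
    double seconds; /* total time spent in the send calls */
} send_profile;

/* Tracing: one record per event, appended to a growing log. */
typedef struct {
    double start;   /* timestamp when the call was entered */
    double end;     /* timestamp when the call returned */
    int    dest;    /* destination rank */
    long   bytes;   /* payload size */
} send_event;

typedef struct {
    send_event *events;
    size_t      count;
    size_t      capacity;
} send_trace;

/* Fold one send into the aggregate statistics. */
static void profile_record(send_profile *p, long bytes, double start, double end)
{
    p->calls += 1;
    p->bytes += bytes;
    p->seconds += end - start;
}

/* Append one send to the event history, doubling the buffer as needed.
 * Returns 0 on success, -1 if the buffer could not grow. */
static int trace_record(send_trace *t, int dest, long bytes, double start, double end)
{
    if (t->count == t->capacity) {
        size_t cap = t->capacity ? 2 * t->capacity : 64;
        send_event *grown = realloc(t->events, cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        t->events = grown;
        t->capacity = cap;
    }
    t->events[t->count++] = (send_event){ start, end, dest, bytes };
    return 0;
}
```

However many sends occur, the profile stays sizeof(send_profile) bytes. The trace grows by sizeof(send_event) per send (typically 32 bytes on a 64-bit platform), multiplied again by the number of processes.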
| 3. How do I sort out busy wait time from idle wait, user time from system time, and so on? |
Don't.
MPI synchronization delays, which are key performance inhibitors you
will probably want to study, can show up as user or system time
depending on the MPI implementation, the type of wait, the run-time
settings you have chosen, etc. In many cases, it makes most sense
just to distinguish time spent inside MPI from time spent
outside MPI. Elapsed wallclock time will probably be your key metric.
Exactly how the MPI implementation spends its time while waiting is less important.
| 4. What is PMPI? |
PMPI refers to the MPI standard profiling interface.
Each standard MPI function can be called with either an MPI_ or a PMPI_ prefix.
For example, you can call either MPI_Send() or PMPI_Send(). This feature of the
MPI standard allows one to write a function with the MPI_ prefix that calls the
equivalent PMPI_ function. A function written this way has the behavior of
the standard function plus any other behavior one would like to add.
This is important for MPI performance analysis in at least two ways.
First, many performance analysis tools take advantage of PMPI. They
intercept the MPI calls made by your program, perform the actual
message passing by calling the corresponding PMPI_ functions, and capture
important performance data along the way.
Second, you can use such wrapper functions to customize MPI behavior.
E.g., you can add barrier operations to collective calls, write out
diagnostic information for certain MPI calls, etc.
OMPI generally layers the various function interfaces as follows:
- Fortran MPI_ interfaces are weak symbols for ...
- Fortran PMPI_ interfaces, which call ...
- C MPI_ interfaces, which are weak symbols for ...
- C PMPI_ interfaces, which provide the specified functionality.
Since OMPI generally implements MPI functionality for all languages in C,
you only need to provide profiling wrappers in C, even if your program
is in another programming language. Alternatively, you may write the wrappers in
your program's language, but if you provide wrappers in both languages
then both sets will be invoked.
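To make the weak-symbol layering listed above more concrete, here is a toy C illustration. It is not Open MPI code, and the exact mechanism OMPI uses may differ by platform. The names PLIB_add_one and LIB_add_one are hypothetical stand-ins for a PMPI_ and an MPI_ function. It assumes GCC or Clang on an ELF platform such as Linux, since the `alias` attribute is not supported on macOS.

```c
/* The real implementation, analogous to a PMPI_ function. */
int PLIB_add_one(int x)
{
    return x + 1;
}

/* The public entry point, analogous to an MPI_ function. It is a weak
 * alias: if another object file (such as a profiling wrapper) defines a
 * strong LIB_add_one, the linker uses that one instead; otherwise calls
 * resolve here, to PLIB_add_one. */
int LIB_add_one(int x) __attribute__((weak, alias("PLIB_add_one")));
```

A profiling wrapper would then define a strong `LIB_add_one` in its own file and call `PLIB_add_one` from it, which is exactly the MPI_/PMPI_ pattern.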
There are a handful of exceptions. For example, MPI_ERRHANDLER_CREATE()
in Fortran does not call MPI_Errhandler_create(). Instead, it calls some
other low-level function. Thus, to intercept this particular Fortran call, you need a Fortran wrapper.
Be sure to build your wrapper library as a dynamic (shared) library.
A static library can run into the linker problems described in
the Complications section of the Profiling Interface chapter of the MPI standard;
see that chapter for more details.
| 5. Should I use those switches --enable-mpi-profile and --enable-trace when I configure OMPI? |
Probably not.
The --enable-mpi-profile switch enables building of the PMPI interfaces.
While this is important for performance analysis, this setting is already
turned on by default.
The --enable-trace switch enables internal tracing of OMPI/ORTE/OPAL calls. It
is used only for developer debugging, not for tracing the performance of MPI applications.
| 6. What support does OMPI have for performance analysis? |
The OMPI source base has some instrumentation to capture performance data,
but that data must be analyzed by other non-OMPI tools.
PERUSE gives information about the low-level behavior of MPI
internals. Check the PERUSE web site for information about analysis tools.
When you configure OMPI, be sure to use --enable-peruse. Information
describing PERUSE's integration with OMPI is available.
VampirTrace (VT) traces entry to and exit from the MPI layer, along with important
performance data, and writes the data in the open Open Trace Format (OTF). VT is
freely available and can be used with any MPI. Information describing its
integration with OMPI is available.
| 7. How do I view VampirTrace output? |
While OMPI includes VampirTrace instrumentation, it does not provide a
tool for viewing OTF trace data. It provides only a primitive otfdump utility,
located in the same directory as the other OMPI commands (mpicc, mpirun, etc.).
Another simple utility, otfprofile, comes with OTF software and allows you to produce a short profile in LaTeX format from an OTF trace.
The main way to view OTF data is with the Vampir tool. Evaluation licenses are available.
| 8. Are there MPI performance analysis tools for OMPI that I can download for free? |
The OMPI distribution includes no such tools, but some general MPI tools can
be used with OMPI. You can search the Internet for such tools, and we
list a few candidates here.
For tracing, there are:
- MPE is a software package for MPI programmers. It is associated with MPICH,
but works with other MPIs. The Jumpshot trace viewer has functionality similar to Vampir's.
- Sun Studio Performance Analyzer has MPI tracing support (like Vampir
and Jumpshot), and it also supports whole-program performance analysis.
- TAU can be used to collect trace data, but its viewer displays aggregated data.
To view TAU traces, one must convert the data and use another viewer.
- Paraver can be used to view trace data, and has been used to view PERUSE events.
If you don't need traces and are concerned about large trace files,
profiling tools include:
There are also more sophisticated tools that attempt to incorporate analysis
heuristics, adapt data gathering based on performance characteristics, or
otherwise automate analysis based on expert knowledge. Examples include:
| 9. Any other kinds of tools I should know about? |
There are other tools you should consider, too. Performance analysis is not
only about analyzing performance per se, but also about understanding the
behavior of your program in general.
As such, debugging tools can help you step through or pry into the execution
of your MPI program. Popular tools include TotalView,
which can be downloaded for free trial use, and Allinea DDT,
which also provides evaluation copies.
The command-line job inspection tool padb has been ported to ORTE and OMPI.