From: Bachar Dedi [Netanya] (david.bachar_at_[hidden])
Date: 2010-11-22 02:35:37


Hello Everyone,

In our HPC application we use MPI.NET as our communication
platform.
For a reason we have not yet identified, the application crashes in one
of the following ways:
1. An MPI exception is thrown (MPI_other_error, related to the shared
memory allocation).
2. We receive an empty message.
3. The system halts within one of the MPI methods.
We suspect that the problem is related to the size of the messages we
send (up to 100 MB) combined with the high message rate: there are quite
a few large messages per second to and from different processes.
Because of this, we are considering replacing the communication layer
with a different communication method.

To clarify the setup: each of our ranks loops on Iprobe with any tag and
any source and, when a status is returned, calls the receive.
A rank can be sending to and receiving from several other ranks at the
same time, which means a rank can get more than one message
simultaneously.
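
To make the pattern concrete, the loop can be sketched roughly as below.
This is only an illustrative Python simulation, not the application's
actual MPI.NET code: Mailbox, Message, Status, drain and the
ANY_SOURCE/ANY_TAG constants are hypothetical stand-ins for a rank's
incoming queue and the MPI wildcards, and no MPI library is involved.

```python
from dataclasses import dataclass

# Hypothetical wildcards standing in for MPI's "any source" / "any tag".
ANY_SOURCE = -1
ANY_TAG = -1


@dataclass
class Message:
    source: int
    tag: int
    payload: bytes


@dataclass
class Status:
    source: int
    tag: int
    size: int


class Mailbox:
    """In-process stand-in for one rank's incoming messages (not MPI)."""

    def __init__(self):
        self._pending = []

    def post(self, msg):
        """Simulate another rank delivering a message to this rank."""
        self._pending.append(msg)

    def iprobe(self, source=ANY_SOURCE, tag=ANY_TAG):
        """Non-blocking probe: Status of the first match, or None."""
        for m in self._pending:
            if source in (ANY_SOURCE, m.source) and tag in (ANY_TAG, m.tag):
                return Status(m.source, m.tag, len(m.payload))
        return None

    def receive(self, source, tag):
        """Remove and return the first message from exactly (source, tag)."""
        for i, m in enumerate(self._pending):
            if m.source == source and m.tag == tag:
                return self._pending.pop(i)
        raise LookupError("no matching message pending")


def drain(mailbox, handle):
    """Probe with wildcards, then receive using the probed source and tag."""
    handled = 0
    while True:
        status = mailbox.iprobe(ANY_SOURCE, ANY_TAG)
        if status is None:
            return handled
        # Receive with the concrete source/tag from the status, so the
        # message taken is the one that the probe reported.
        handle(mailbox.receive(status.source, status.tag))
        handled += 1
```

In this sketch the receive uses the source and tag reported by the probe
rather than the wildcards again, and Status.size gives the length the
receiver would need to allocate before receiving.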

Are you familiar with these problems?
Is there something that can fix them?

If you need any clarification regarding our architecture, we
will be happy to provide it.

David Bachar

* Phone: +972-9-8864648
* Mobile: +972-54-9222625
* Fax: +972-9-8864766
* E-mail: david.bachar_at_[hidden]
LAND SYSTEMS & C4I - TADIRAN
