SCIENTIFIC-LINUX-USERS Archives

August 2008

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Jon Peatfield <[log in to unmask]>
Reply-To: Jon Peatfield <[log in to unmask]>
Date: Thu, 7 Aug 2008 23:43:45 +0100
Content-Type: multipart/mixed
Parts/Attachments: TEXT/PLAIN (3343 bytes)
On Thu, 7 Aug 2008, Panagiotis Kritikakos wrote:

> Hi,
>
> I think I have a bug on the Scientific Linux desktop relating to the
> OpenMPI installation. It appears there is a condition which
> prevents any MPI application from completing. Below you can see perhaps
> the simplest MPI program that can be written and still link to the MPI
> library:
>
> x.F90
>
> program main
>
>  use mpi
>
>  implicit none
>
>  integer : :  ierror
>
>   call mpi_init(ierror)
>   call mpi_finalize(ierror)
>
> end program main

On my sl51 (32-bit) boxes mpif90 objects to the 'integer : :  ierror' 
line.  Maybe my compiler is feeling odd...

Replacing it with 'integer ierror' lets it compile for me.
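
For reference, here is the whole thing with that declaration fixed (a
minimal sketch; 'integer :: ierror' is the standard free-form spelling,
though plain 'integer ierror' compiles too):

   program main

     use mpi

     implicit none

     ! the stray space in ': :' is what the compiler rejects; '::' is fine
     integer :: ierror

     call mpi_init(ierror)
     call mpi_finalize(ierror)

   end program main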

> I've compiled it with mpif90 -o x x.F90. Compilation goes fine, but when 
> I tried to execute the resulting x file, I received the following error:
>
> libibverbs: Fatal: couldn't read uverbs ABI version.
> --------------------------------------------------------------------------
> [0,0,0]: OpenIB on host localhost was unable to find any HCAs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> librdmacm:  couldn't read ABI version.
> librdmacm:  assuming: 4
> libibverbs: Fatal: couldn't read uverbs ABI version.
> CMA: unable to open /dev/infiniband/rdma_cm

If I run the result I get no hang, but then I don't get much useful 
output either :-)

$ mpirun ./x
libibverbs: Fatal: couldn't read uverbs ABI version.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host unfair.damtp.cam.ac.uk was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,0]: uDAPL on host unfair.damtp.cam.ac.uk was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
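
If the transport warnings themselves are a bother, OpenMPI can be told
at run time to stick to the tcp/sm/self transports so that openib and
udapl are not used (a sketch; --mca btl picks the byte-transfer layers,
though it may not silence every library-level message):

   $ mpirun --mca btl tcp,sm,self ./x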

In case it matters, that box has:

$ rpm -q sl-release gcc-gfortran openmpi openmpi-devel
sl-release-5.1-2.i386
gcc-gfortran-4.1.2-14.el5.i386
openmpi-1.2.3-4.el5.i386
openmpi-devel-1.2.3-4.el5.i386

> The warnings should not be a problem, as it is just complaining about not 
> finding the high-performance interconnect options that were compiled in; 
> the system will fall back to lower-performance defaults.
>
> strace gives a lot of output but shows the program is stalling with:
>
> futex(0x26049c, FUTEX_WAIT, 2, NULL
>
> It looks like some kind of deadlock is occurring, probably relating to
> the threads involved or to shared memory.

Debugging MPI problems is always a bit of a nightmare.  BTW, how many 
processors were you running it on?
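
FWIW, a couple of things I would try (sketches, untested against your
build): vary the process count with mpirun's -np flag, and when it
wedges attach gdb to the stuck process to see which lock it is sitting
in:

   $ mpirun -np 2 ./x
   $ gdb -p <pid-of-stuck-process>
   (gdb) thread apply all bt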

I'm still worried about updating our systems to the (newer) openmpi from 
sl52, as the package maintainers have switched from alternatives (which I 
sort of understand) to mpi-selector (which I don't)...
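
As far as I can tell, the basic mpi-selector usage goes something like
this (a sketch from its documentation; the stack name below is made up):

   $ mpi-selector --list                # list the installed MPI stacks
   $ mpi-selector --set openmpi-1.2.5   # make one the default
   $ mpi-selector --query               # show the current selection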

> I have compiled up a personal version of OpenMPI removing the OpenIB support:
>
> ./configure --prefix=/opt/local/openmpi --without-openib
>
> This works (once I worked out that it was linking against the old
> libraries at runtime) and the correct output (nothing) is produced.
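
That runtime-linking gotcha is usually the dynamic linker finding the
system libmpi first; something like this (a sketch, assuming a bash-like
shell and that prefix) points both the shell and the loader at the
private build, and ldd confirms which copy is actually picked up:

   $ export PATH=/opt/local/openmpi/bin:$PATH
   $ export LD_LIBRARY_PATH=/opt/local/openmpi/lib:$LD_LIBRARY_PATH
   $ ldd ./x | grep libmpi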

  -- Jon

