errors in running MPI

Submitted by Linping Hu on Mon, 04/17/2017 - 06:00

Hello everyone,

I'm trying to re-run the example in examples/DFT/silicon using the complex version of BerkeleyGW-1.2.0, compiled with the following flags:

=====================================
- Compiler: INTEL
- Para. flags: MPI, OMP
- Math flags: USESCALAPACK, USEFFTW3
- Debug flags:
=====================================

A fatal error occurs and the calculation stops while executing epsilon.cplx.x. The content of epsilon.out is given below.

====================================================================================================
Summary of the WFN files:
- Number of k-points in WFN: 19
- Number of k-points in WFNq: 110
- Number of k-points in the full BZ of WFN: 500

================================================================================
18:42:25 Dealing with q = 0.000000 0.000000 0.001000 1 / 10
================================================================================

This is the special q->0 point.
Rank of the polarizability matrix (nmtx): 137
Fatal error in PMPI_Comm_size: Invalid communicator, error stack:
PMPI_Comm_size(124): MPI_Comm_size(comm=0x5b, size=0x98f48c) failed
PMPI_Comm_size(78).: Invalid communicator
Fatal error in PMPI_Comm_size: Invalid communicator, error stack:
PMPI_Comm_size(124): MPI_Comm_size(comm=0x5b, size=0x98f48c) failed
PMPI_Comm_size(78).: Invalid communicator
Fatal error in PMPI_Comm_size: Invalid communicator, error stack:
PMPI_Comm_size(124): MPI_Comm_size(comm=0x5b, size=0x98f48c) failed
PMPI_Comm_size(78).: Invalid communicator
Fatal error in PMPI_Comm_size: Invalid communicator, error stack:
PMPI_Comm_size(124): MPI_Comm_size(comm=0x5b, size=0x98f48c) failed
......................
==================================================================================================

The arch.mk is also attached for your convenience.
==================================================================================================
COMPFLAG = -DINTEL

PARAFLAG = -DMPI -DOMP
MATHFLAG = -DUSEFFTW3 -DUSESCALAPACK
FCPP = cpp -C
F90free = mpiifort -free -qopenmp
LINK = mpiifort -qopenmp
FOPTS = -O3
FNOOPTS = $(FOPTS)
MOD_OPT = -module
INCFLAG = -I

C_PARAFLAG = -DPARA
CC_COMP = mpiicc -qopenmp
C_COMP = mpiicc -qopenmp
C_LINK = mpiicc -qopenmp

REMOVE = /bin/rm -f

FFTWPATH = /share/home/hulp/fftw-3.3.5
FFTWINCLUDE = $(FFTWPATH)/api

MKLPATH = /share/intel/mkl/lib/intel64
LAPACKLIB = -mkl=parallel

SCALAPACKLIB = /share/intel/composer_xe_2015/mkl/lib/intel64/libmkl_scalapack_lp64.a
SCALAPACKLIB += /share/intel/composer_xe_2015/mkl/lib/intel64/libmkl_blacs_lp64.a

TESTSCRIPT = make check-parallel
==================================================================================================

Looking forward to your reply. Thank you very much!

Best,
Linping

Submitted by Linping Hu on Tue, 04/18/2017 - 01:49

Hello Brad,

Thank you very much for your help.

The discussions on the two sites both suggest that the error may be caused by mpi.h not being included correctly. They also say that if mpiifort is used to compile the MPI-related sources, everything should be handled automatically and the correct mpi.h should be included.

mpiifort is exactly what I used. BerkeleyGW compiles successfully, but this error stops the calculation at run time.

Best,
Linping

Submitted by jdeslip on Wed, 04/19/2017 - 19:10

How many MPI tasks were you running with in the above? Does the message change with the number of tasks, or does it appear regardless?

I agree that this looks like a build issue. Your arch.mk looks OK in principle. Could you try building with another version of the Intel MPI library, or with a different MPI library on your system?
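
One thing that may be worth checking while rebuilding is whether the BLACS library in SCALAPACKLIB was built for the same MPI as the compiler wrapper. This is only a guess and is not confirmed anywhere in this thread. MKL ships MPI-specific BLACS variants such as libmkl_blacs_intelmpi_lp64, whereas the arch.mk above links libmkl_blacs_lp64 while compiling with mpiifort. A minimal Python sketch of such a check follows; the function name check_blacs_matches_mpi and the rule "an Intel MPI wrapper implies an intelmpi BLACS library" are assumptions made for illustration, not part of BerkeleyGW or MKL.

```python
import re


def check_blacs_matches_mpi(arch_mk_text):
    """Return warnings for MKL BLACS libraries that do not look Intel MPI specific.

    Hypothetical helper: the matching rule is an assumption, not an official check.
    """
    warnings = []
    # Intel MPI compiler wrappers are named mpiifort / mpiicc.
    uses_intel_mpi = re.search(r"\bmpii(fort|cc)\b", arch_mk_text) is not None
    # Collect every MKL BLACS library name mentioned, e.g. libmkl_blacs_lp64.
    blacs_libs = re.findall(r"libmkl_blacs\w*", arch_mk_text)
    for lib in blacs_libs:
        if uses_intel_mpi and "intelmpi" not in lib:
            warnings.append(
                f"{lib}: built with an Intel MPI wrapper, but this BLACS "
                "library is not the intelmpi variant"
            )
    return warnings


# Lines copied from the arch.mk posted above.
arch_mk = """\
F90free = mpiifort -free -qopenmp
SCALAPACKLIB = /share/intel/composer_xe_2015/mkl/lib/intel64/libmkl_scalapack_lp64.a
SCALAPACKLIB += /share/intel/composer_xe_2015/mkl/lib/intel64/libmkl_blacs_lp64.a
"""

for warning in check_blacs_matches_mpi(arch_mk):
    print(warning)
```

If the check prints a warning, relinking against the BLACS variant that matches the MPI in use and rerunning epsilon.cplx.x would show whether this is actually the cause.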