Real-version tests failed but complex all OK

Submitted by weixie on Mon, 05/29/2017 - 20:35

Dear BGW developers and users,

I recently built BGW 1.2 on several different computing clusters using Intel compilers (version 2016 or 2017). For MPI I used either Intel MPI or MVAPICH2 (v2.2), depending on what was available. Compilation succeeded in all cases, but the testsuite reported the same [FAIL] results everywhere. Is this a known problem? Is it safe to ignore these failures?
Thanks,
Wei

Attachment: error messages from the testsuite run of the real executables:

$ tail -13 testsuite/test.out
Passed: 4 / 12
Skipped: 6 / 12
Failed: 2 / 12

testfile # failed testcases
--------------------------------------------------------------------
Graphene/Graphene.test 4
Graphene/Graphene_3D.test 2

Total run-time of the testsuite: 00:01:57

make: *** [check-parallel] Error 2

$ grep -B 7 FAIL testsuite/test.out
Match dcmat_norm.dat (48) ik_co :

Calculated value : 12
Reference value : 9
Difference : 3
Tolerance : 1e-05

dcmat_norm.dat (48) ik_co : [ FAIL ]
--
Match absorp eps2 673 :

Calculated value : 0.219896012
Reference value : 0.219906935
Difference : 1.0922999999996e-05
Tolerance : 1e-05

absorp eps2 673 : [ FAIL ]
--
Match absorp eps1 673 :

Calculated value : 0.583210328
Reference value : 0.583339742
Difference : 0.000129414000000105
Tolerance : 1e-05

absorp eps1 673 : [ FAIL ]
--
Match Sum rule :

Calculated value : 0.270474977153960
Reference value : 0.27048226853231960
Difference : 7.29137835958626e-06
Tolerance : 4e-10

Sum rule : [ FAIL ]
--
Match dcmat_norm.dat (48) ik_co :

Calculated value : 12
Reference value : 9
Difference : 3
Tolerance : 1e-05

dcmat_norm.dat (48) ik_co : [ FAIL ]
--
Match Sum rule :

Calculated value : 0.391705935920563
Reference value : 0.39170610026155211
Difference : 1.64340989039502e-07
Tolerance : 4e-10

Sum rule : [ FAIL ]

Submitted by babarker on Wed, 05/31/2017 - 17:28

Hello,

The testsuite should pass on your machine if the appropriate libraries are used. Would you please share your arch.mk file for a compilation that leads to the testsuite failure?

Best,
Brad

Submitted by weixie on Thu, 06/01/2017 - 13:37

Hi Brad,

Thanks for your kind reply.

Here is my arch.mk. This build used the modules intel/2013_sp1.2.144 and mvapich2_ib/2.1 on Comet at XSEDE San Diego. I also tried Intel 2017.4 for the compiler, MKL, and MPI, but that led to the same test failures...

Thanks,
Wei

COMPFLAG = -DINTEL
PARAFLAG = -DMPI -DOMP
MATHFLAG = -DUSESCALAPACK -DUNPACKED -DUSEFFTW3 -DHDF5

FCPP = cpp -C
F90free = mpif90 -free -qopenmp -no-ipo -ip
LINK = mpif90 -qopenmp -no-ipo -ip

FOPTS = -O3 -xhost -fp-model source
FNOOPTS = -O2 -xhost -fp-model source -no-ip

MOD_OPT = -module
INCFLAG = -I

C_PARAFLAG = -DPARA -DMPICH_IGNORE_CXX_SEEK
CC_COMP = mpicxx -xhost
C_COMP = mpicc -xhost
C_LINK = mpicxx -xhost
C_OPTS = -O3 -xhost -no-ipo -ip -qopenmp
C_DEBUGFLAG =

REMOVE = /bin/rm -f

# Math Libraries

FFTWLIB = $(HOME)/programs/mklfftw/libfftw3xf_intel.a
FFTWINCLUDE = $(MKLROOT)/include/fftw

MKLPATH = $(MKLROOT)/lib/intel64
LAPACKLIB = -Wl,--start-group \
$(MKLPATH)/libmkl_intel_lp64.a \
$(MKLPATH)/libmkl_intel_thread.a \
$(MKLPATH)/libmkl_core.a \
$(MKLPATH)/libmkl_blacs_intelmpi_lp64.a \
-Wl,--end-group -lpthread -lm
SCALAPACKLIB = $(MKLPATH)/libmkl_scalapack_lp64.a

HDF5PATH = $(HDF5HOME)/lib
HDF5LIB = $(HDF5PATH)/libhdf5hl_fortran.a \
$(HDF5PATH)/libhdf5_hl.a \
$(HDF5PATH)/libhdf5_fortran.a \
$(HDF5PATH)/libhdf5.a \
-lz
HDF5INCLUDE = $(HDF5HOME)/include

TESTSCRIPT = sbatch comet.scr

Submitted by jdeslip on Sat, 06/17/2017 - 15:30

Could you try a lower optimization level and a different math library to see if that resolves the issue? MKL is generally optimal, but it would help to narrow down where the problem lies.
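
For example, one reduced-optimization variant of the Intel arch.mk posted above (an illustrative sketch only; the exact optimization levels are a judgment call) could look like:

# Reduced-optimization variant for isolating the test failures (illustrative)
FOPTS = -O1 -xhost -fp-model source
FNOOPTS = -O0 -xhost -fp-model source -no-ip

If the failures disappear at -O1/-O0, aggressive optimization is the likely culprit; if they persist, the math library is the next suspect.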

Submitted by weixie on Mon, 06/26/2017 - 18:20

Hi Jack, I tried the supplied arch.mk for Comet, which uses the "gnutools gnu openmpi_ib hdf5 fftw lapack atlas" modules, and all tests passed. I don't know whether this is due to the different compilers (Intel vs. GNU) or the different math libraries (MKL vs. ATLAS). Do you have any suggestions?

# To compile the code, you'll need to:
# module purge && module load gnutools gnu openmpi_ib hdf5 fftw lapack atlas
#
# To run the code, you'll need to include the following lines in your script:
# module purge && module load gnu openmpi_ib
#
# All tests pass @ r6954.
#
# Felipe H. da Jornada

COMPFLAG = -DGNU
PARAFLAG = -DMPI -DOMP
MATHFLAG = -DUSESCALAPACK -DUNPACKED -DUSEFFTW3 -DHDF5
# Only uncomment DEBUGFLAG if you need to develop/debug BerkeleyGW.
# The output will be much more verbose, and the code will slow down by ~20%.
#DEBUGFLAG = -DDEBUG

FCPP = cpp -ansi
F90free = mpif90 -ffree-form -ffree-line-length-none -fopenmp -fno-second-underscore
LINK = mpif90 -fopenmp -ldl
FOPTS = -O3 -funroll-loops -funsafe-math-optimizations
FNOOPTS = -O3 -funroll-loops -funsafe-math-optimizations
#FOPTS = -g -O0 -ffpe-trap=invalid -fbounds-check -fbacktrace -Wall
#FNOOPTS = $(FOPTS)
MOD_OPT = -J
INCFLAG = -I

C_PARAFLAG = -DPARA -DMPICH_IGNORE_CXX_SEEK
CC_COMP = mpicxx
C_COMP = mpicc
C_LINK = mpicxx
C_OPTS = -O3 -fopenmp
C_DEBUGFLAG =

REMOVE = /bin/rm -f

# Math Libraries
FFTWLIB = $(FFTWHOME)/lib/libfftw3_omp.a \
$(FFTWHOME)/lib/libfftw3.a
FFTWINCLUDE = $(FFTWHOME)/include

SCALAPACKHOME = /opt/scalapack/gnu/openmpi_ib
LAHOME = $(ATLASHOME)/lib
LAPACKLIB = $(LAHOME)/liblapack.a $(LAHOME)/libf77blas.a $(LAHOME)/libcblas.a $(LAHOME)/libatlas.a
SCALAPACKLIB = $(SCALAPACKHOME)/lib/libscalapack.a

HDF5PATH = $(HDF5HOME)/lib
HDF5LIB = $(HDF5PATH)/libhdf5hl_fortran.a \
$(HDF5PATH)/libhdf5_hl.a \
$(HDF5PATH)/libhdf5_fortran.a \
$(HDF5PATH)/libhdf5.a \
-lz
HDF5INCLUDE = $(HDF5HOME)/include

TESTSCRIPT = sbatch comet.scr

Submitted by weixie on Sun, 07/16/2017 - 06:59

Hi All,
I found that changing "-fp-model source" to "-fp-model strict -fp-model no-except" in FOPTS and FNOOPTS solves this particular problem I reported here with the more recent Intel compilers (2016 and 2017).
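Concretely, the two option lines in the Intel arch.mk posted above become (everything else unchanged):

FOPTS = -O3 -xhost -fp-model strict -fp-model no-except
FNOOPTS = -O2 -xhost -fp-model strict -fp-model no-except -no-ip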
Cheers,
Wei