Segmentation fault of kernel.real.x in testsuite

Submitted by H.Katow on Thu, 06/01/2017 - 07:06

Dear all,

I ran into trouble when running the testsuite and would appreciate any advice.
I compiled only the real version, and three tests failed, as follows:

=================================================================
Passed: 3 / 12
Skipped: 6 / 12
Failed: 3 / 12

testfile # failed testcases
--------------------------------------------------------------------
Graphene/Graphene.test 1
Graphene/Graphene_3D.test 1
Si-EPM/Si.test 1
=================================================================

In each case, the error message is:

=================================================================
Starting test run ...
Executing: cd /tmp/BGW.i_fHJ_; $HOME/BerkeleyGW-1.2.0/testsuite/../bin/kernel.real.x > kernel.out
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
kernel.real.x 000000000062565D Unknown Unknown Unknown
kernel.real.x 00000000006234F7 Unknown Unknown Unknown
kernel.real.x 00000000005D1C54 Unknown Unknown Unknown
kernel.real.x 00000000005D1A66 Unknown Unknown Unknown
kernel.real.x 00000000005772B6 Unknown Unknown Unknown
kernel.real.x 000000000057DC30 Unknown Unknown Unknown
Unknown 00002AAAB0A3C850 Unknown Unknown Unknown
kernel.real.x 0000000000408F3C bsewrite_m_mp_bse 197 bsewrite.f90
kernel.real.x 000000000044837C MAIN__ 447 kernel.f90
kernel.real.x 000000000040578E Unknown Unknown Unknown
libc.so.6 00002AAAB0EE2C36 Unknown Unknown Unknown
kernel.real.x 0000000000405629 Unknown Unknown Unknown
Elapsed time: 4.8 s

Test run failed with exit code 44544.

Execution : [ FAIL ]

Skipping subsequent steps due to nonzero exit code.
=================================================================

The OS is SUSE Linux Enterprise Server 11 (x86_64).
The contents of my arch.mk are as follows:

=================================================================
COMPFLAG = -DINTEL
PARAFLAG = -DMPI
MATHFLAG = -DUSESCALAPACK -DUNPACKED -DUSEFFTW3
FCPP = cpp -ansi
F90free = mpif90 -g -traceback -free
LINK = mpif90 -g -traceback -I$(MKLROOT)/include/fftw -mkl=parallel -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmpi
FOPTS = -O3 -xCORE-AVX2
FNOOPTS = $(FOPTS)
MOD_OPT = -module ./
INCFLAG = -I
C_PARAFLAG = -DPARA -DMPICH_IGNORE_CXX_SEEK
CC_COMP = mpicxx -xhost -g -traceback
C_COMP = mpicc -xhost -g -traceback
C_LINK = mpicxx -xhost -g -traceback
C_OPTS = -O3
REMOVE = /bin/rm -f
TESTSCRIPT = make check-parallel
=================================================================

Does anyone have an idea how to deal with this problem?

Sincerely yours,
Hiroki KATOW

Submitted by H.Katow on Sun, 06/04/2017 - 23:22

I compiled again with

DEBUGFLAG = -DDEBUG -DVERBOSE
C_DEBUGFLAG = -DDEBUG -DVERBOSE

and got the following compilation errors:

input_kernel.f90(343): error #6404: This name does not have a type, and must have an explicit type. [KPQ]
write(6,*) 'kp%ngkmax,kpq%ngkmax',kp%ngkmax,kpq%ngkmax
------------------------------------------------^
input_kernel.f90(343): error #6460: This is not a field name that is defined in the encompassing structure. [NGKMAX]
write(6,*) 'kp%ngkmax,kpq%ngkmax',kp%ngkmax,kpq%ngkmax
----------------------------------------------------^
compilation aborted for input_kernel.p.f (code 1)
../Common/common-rules.mk:277: recipe for target 'input_kernel.o' failed
make[2]: *** [input_kernel.o] Error 1

These errors did not appear without those debug flags.
I also found a similar question in this forum, http://berkeleygw.org/?q=node/343, but it did not provide a solution.

Best regards,
Hiroki Katow

Submitted by jdeslip on Mon, 06/05/2017 - 05:30

Hi Hiroki,

The compilation error is known and will be fixed in the next version. For now, you can just comment out that write statement.

I'm not really sure what is causing the segfault in the testsuite - we don't see it in our tests or on our systems. Line 197 in bsewrite in your version is pretty innocuous:

do is2=1,xct%nspin

So, I wonder if optimization is doing some reordering.

Can you try a newer version of the Intel compiler, or a lower optimization level for bsewrite.f90 (-O0, for example)?

Jack

Submitted by H.Katow on Mon, 06/05/2017 - 22:37

Hello Jack,

Thank you for your comments.
I commented out line 343 in input_kernel.f90 and set the optimization option to -O0.
Compilation then succeeded, and all tests in the testsuite passed for the real version.
For reference, my compiler version is ifort (IFORT) 17.0.1.

Best regards,
Hiroki Katow
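
For anyone hitting the same segfault, the optimization change described above might look roughly like this in arch.mk. This is only a sketch based on the arch.mk posted at the top of the thread. The thread does not say whether -xCORE-AVX2 was kept alongside -O0, so dropping it here is an assumption. The change shown applies to the whole build rather than to bsewrite.f90 alone.

```makefile
# arch.mk (excerpt), hypothetical edit: lower the Fortran optimization level.
# Line as originally posted:
#   FOPTS = -O3 -xCORE-AVX2
# Assumption: -xCORE-AVX2 is dropped together with -O3; the thread does not say.
FOPTS   = -O0
# Unchanged from the original arch.mk; it follows FOPTS.
FNOOPTS = $(FOPTS)
```

The code needs to be rebuilt after the change for the new flags to take effect. Since -O0 slows down every executable, it is probably worth also trying a newer Intel compiler, as suggested above, before settling on this as a permanent setting.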