BSE memory needed

Submitted by Fabiof on Mon, 07/24/2017 - 04:44

Hello

I am doing calculations for 2D h-BN.
When I study the convergence of the absorption spectrum with the fine (interpolated) grid, I run into the following problem:

---------------------------------------------------------------------
This is for an interpolated K-grid: 120 120 1

number_val_bands_fine 2
number_val_bands_coarse 4

number_cond_bands_fine 4
number_cond_bands_coarse 4

Memory available: 6682.7 MB per PE
Memory required for vcoul: 114.4 MB per PE
Memory needed to store the effective Ham. and intkernel arrays: 5331.7 MB per PE
----------------------------------------------------------------------

I did not reach convergence with this calculation, so I need to do further calculations. The problem is that I don't have enough memory for them.

Is there anything I can do to reduce the memory needed for this type of calculation (without changing the number of bands used)?

Reducing the screened or bare Coulomb cutoff in the kernel does not have any influence.

Sincerely,

Fabio

Submitted by dyq on Thu, 07/27/2017 - 15:46

Hi Fabio,

Are you using OpenMP threads? If not, you can increase the amount of memory available to each MPI task by increasing the number of threads. For instance, if you are doing a calculation with 1 thread and 10 MPI processes on 10 cores with 2 GB of memory each, then the available memory per MPI task is 2 GB. If you increase to 5 threads and 2 MPI processes on the same 10 cores, then you will have 10 GB available per MPI task.
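
For concreteness, here is a minimal sketch of the two launch configurations above (the executable name and the use of OMP_NUM_THREADS are assumptions; adapt them to your build and MPI launcher):

# 10 MPI tasks x 1 thread on 10 cores (2 GB per core -> 2 GB per MPI task)
export OMP_NUM_THREADS=1
mpirun -np 10 ./absorption.cplx.x

# 2 MPI tasks x 5 threads on the same 10 cores (-> 10 GB per MPI task)
export OMP_NUM_THREADS=5
mpirun -np 2 ./absorption.cplx.x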

Best,
Diana

Submitted by Fabiof on Thu, 08/03/2017 - 12:40

Hi Diana,

I compiled BerkeleyGW with the OpenMP flag as you suggested.
I can't increase the available memory by changing OMP_NUM_THREADS or the number of processes (-np) for mpirun.

When you say number of threads, are you talking about the number of threads per MPI task? If not, then I don't see how I can run on 10 cores with 2 MPI tasks and 5 threads.

Sincerely,

Fabio

Submitted by dyq on Mon, 08/07/2017 - 16:11

Hi Fabio,

Yes, I mean increase the number of threads per MPI task and reduce the number of MPI tasks, while keeping the total number of cores constant. The memory the code reports is per MPI task, so fewer tasks sharing the node should leave more memory available to each task. If that doesn't increase the available memory, can you check the start of your output file and see how many MPI tasks and OpenMP threads are being reported?

Best,
Diana

Submitted by Fabiof on Wed, 08/09/2017 - 02:33

Hi Diana,

I am running a job with 16 cores:

#PBS -l nodes=1:ppn=16

Changing the number of MPI tasks or threads does not change the available memory, as you can see in the following examples:

------------------------------------------------

2 MPI tasks and 8 threads:

Running with 2 MPI task(s)

Using OpenMP. Number of threads per MPI task: 8

Memory available: 31596.6 MB per PE
Memory required for vcoul: 114.4 MB per PE
Memory needed to store the effective Hamiltonian and intkernel arrays: 13124.0 MB per PE
Additional memory needed for eigenvectors and diagonalization: 52536.1 MB per PE

-----------------------------------------------------------

8 MPI tasks and 2 threads:

Running with 8 MPI task(s)

Using OpenMP. Number of threads per MPI task: 2

Memory available: 7891.6 MB per PE
Memory required for vcoul: 114.4 MB per PE
Memory needed to store the effective Hamiltonian and intkernel arrays: 3282.5 MB per PE
Additional memory needed for eigenvectors and diagonalization: 13154.9 MB per PE

Sincerely,

Fabio

Submitted by dyq on Mon, 08/14/2017 - 13:01

Hi Fabio,

Changing the number of threads does not change the total available memory, but it does change the memory available to each MPI task, which can be helpful depending on how arrays are distributed across MPI tasks.

Thanks for providing the example. It looks like most of the memory demand is coming from the diagonalization. Depending on what you're interested in, you can try using the "Haydock" iterative method instead of direct diagonalization. To use this option, change the flag "diagonalization" to "haydock" and add the flag "number_iterations 100". You will need to converge the absorption spectrum with respect to the number of iterations. Haydock should be faster and less memory-intensive, but you will not be able to get the eigenvectors out, only the absorption spectrum.
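
As a sketch, the change in absorption.inp would look something like this (the 100 iterations is just a starting point; converge with respect to it):

# in absorption.inp: replace the direct-diagonalization keyword with Haydock
#diagonalization
haydock
number_iterations 100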

Best,
Diana

Submitted by Fabiof on Tue, 08/15/2017 - 11:34

Hi Diana,

I have already converged the absorption spectrum using the Haydock method. I am now interested in the excitonic wave functions, and they seem to vary a lot with the number of bands used in the fine grid. So a converged optical gap or absorption spectrum does not mean a converged excitonic wave function, which is why I am struggling with the available memory.

Sincerely,

Fabio

Submitted by dyq on Tue, 08/15/2017 - 16:35

Hi Fabio,

I see. If you're only interested in a few eigenvectors, you could reduce the number of eigenvalues calculated using the flag "number_eigenvalues neig". Then the code will only calculate the lowest "neig" eigenvalues and eigenvectors. This should reduce the required memory somewhat. If that's still not enough, then I'm afraid there's no way to reduce the memory while keeping the same level of convergence.
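
For example, to keep only a handful of states (the value 10 here is a placeholder; choose however many excitons you need):

# in absorption.inp: compute only the lowest neig eigenvalues and eigenvectors
number_eigenvalues 10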

Best,
Diana

Submitted by Fabiof on Wed, 08/16/2017 - 07:26

Hi Diana,

Reducing the number of eigenvalues calculated with the number_eigenvalues flag doesn't help. It didn't change the additional memory reported for eigenvectors and diagonalization.

I will see what I can do to get the excitonic wave functions with my available memory.

Thank you for all your help.

Sincerely,

Fabio