Is it possible to run several DMRG-processes in parallel without segmentation fault?

Question


asked Mar 2, 2018 by bock (280 points)
reshown Mar 19, 2018 by miles

In my project I have to run a long DMRG calculation for different values of the particle interaction, and I would like to run the different calculations in parallel. To do this, I implemented naive OpenMP parallelism as follows:

Add the "-fopenmp" flag when compiling.

Wrap my entire program in a for loop preceded by "#pragma omp parallel for".
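For reference, here is a minimal sketch of the structure described above, assuming each loop iteration is fully independent. The names `run_calculation` and `scan_interactions` are hypothetical, and the placeholder body stands in for the real DMRG setup and `dmrg` call, which are not reproduced here.

```cpp
#include <vector>

// Hypothetical stand-in for one complete DMRG run at interaction strength U.
// In the real program this would build its own sites, Hamiltonian and
// initial MPS and call dmrg(); here it is a cheap placeholder computation.
double run_calculation(double U) {
    double energy = 0.0;
    for (int k = 1; k <= 1000; ++k) {
        energy -= 1.0 / (k * k + U * U);
    }
    return energy;
}

// One independent calculation per interaction value, distributed over
// OpenMP threads. Compile with -fopenmp; without it the pragma is ignored
// and the loop simply runs serially.
std::vector<double> scan_interactions(const std::vector<double>& Us) {
    std::vector<double> energies(Us.size());
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(Us.size()); ++i) {
        // Each iteration writes only to its own slot of `energies`.
        energies[i] = run_calculation(Us[i]);
    }
    return energies;
}
```

Note that this only keeps the user-level data disjoint. Whether anything inside the library is shared between iterations is exactly what the rest of the thread is about.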

The processes run for about 40-60 sweeps (of 128), after which I get "Segmentation fault (core dumped)". Is the DMRG code in ITensor not thread-safe?

commented Mar 19, 2018 by miles (70.2k points)

Hi, before I try to answer the question, may I ask for some more details about the code? A key question is: are you passing the same wavefunction into the dmrg function? (I.e. the same variable "psi" of type MPS or IQMPS?) The reason I ask is that psi is passed by reference, so the dmrg function overwrites it. If you passed the same one to each call to dmrg, then it could cause issues with different threads "racing" for the same memory. If you could post some or all of your code, that would be very helpful.
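To illustrate the pass-by-reference point, here is a minimal sketch that uses `std::thread` rather than OpenMP. `State`, `optimize` and `optimize_all` are hypothetical stand-ins for an MPS and for `dmrg`, not ITensor API. The only idea being shown is that each thread should receive its own copy of the state it will overwrite.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for an MPS: data that the "solver" updates in place.
struct State {
    std::vector<double> data = std::vector<double>(8, 1.0);
};

// Hypothetical solver that, like dmrg(psi, H, sweeps), takes its state by
// non-const reference and overwrites it.
void optimize(State& psi, double U) {
    for (double& x : psi.data) x = x / (1.0 + U);
}

// Safe pattern: every thread works on its own copy of the initial state.
// Handing the same State& to all threads would let them write the same
// memory concurrently, which is a data race (undefined behavior).
std::vector<State> optimize_all(const State& psi0, const std::vector<double>& Us) {
    std::vector<State> results(Us.size(), psi0);  // one independent copy per task
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < Us.size(); ++i) {
        workers.emplace_back([&results, &Us, i] { optimize(results[i], Us[i]); });
    }
    for (auto& t : workers) t.join();
    return results;
}
```

`results` is sized before any thread starts and never resized, so the references each thread uses stay valid.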

commented Mar 20, 2018 by bock (280 points)

You can find the code at https://www.dropbox.com/s/tyov039vo7et3ky/UaaPar.cc?dl=0 though it's a bit messy. What confuses me most is that I do basically nothing before starting the for loop, so (as I understand it) each parallel iteration has its own completely separate setup.

Another mystery is that I can run the non-parallel code on e.g. two CPU-cores by starting it through two different terminals without problems.

commented Mar 20, 2018 by miles (70.2k points)

Hi, I think this code and your comment answer my question. I was saying that you may need to create separate copies of the wavefunction (psi0) before calling dmrg from multiple threads. Not doing this could be the reason you got an error. On the other hand, it may not fix the problem, but having separate copies of the wavefunction is the first thing I'd try.

commented Mar 22, 2018 by bock (280 points)

I don't understand: when I construct the MPS and handle it inside a separate for-loop iteration (as done in the attached code), is that not the same as making separate copies of the wavefunction?

commented Mar 23, 2018 by miles (70.2k points)

It isn't the same; the wavefunction is passed by reference to the dmrg function. So each separate call to dmrg could then be attempting to modify the same memory. This may not be the cause of the crash, but it's a reasonable place to start. I'll make a note to try this out myself & see if I can get it to work by making small changes like using separate copies of the wavefunction.

commented Apr 17, 2018 by bock (280 points)
edited Apr 18, 2018 by bock

Hi Miles

Sorry for taking so long to answer. Do you mean that if I want to run 10 different DMRG processes in parallel, I should try initializing 10 states before starting the for loop and give each iteration of the parallel loop a different state, instead of initializing each state inside its own iteration?
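A sketch of the variant being asked about, assuming plain OpenMP: all states are built serially before the parallel loop, and iteration `i` touches only `states[i]`. `InitialState`, `run_on` and `run_with_preallocated` are hypothetical names. This guarantees that the states the user code touches are disjoint. It cannot rule out sharing inside the library itself.

```cpp
#include <vector>

// Hypothetical stand-in for an initial MPS (not the ITensor type).
struct InitialState {
    std::vector<double> amplitudes;
};

// Hypothetical stand-in for a DMRG run that modifies its state in place.
double run_on(InitialState& psi, double U) {
    double sum = 0.0;
    for (double& a : psi.amplitudes) { a *= U; sum += a; }
    return sum;
}

// Build all states serially, then let each parallel iteration use only its own.
std::vector<double> run_with_preallocated(const std::vector<double>& Us) {
    const long n = static_cast<long>(Us.size());
    std::vector<InitialState> states;
    states.reserve(Us.size());
    for (long i = 0; i < n; ++i) {
        states.push_back(InitialState{std::vector<double>(4, 1.0)});
    }
    std::vector<double> out(Us.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = run_on(states[i], Us[i]);
    }
    return out;
}
```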

EDIT: I have tried this, and it still doesn't solve the problem. I believe the way OpenMP handles parallelization conflicts with some design choice in ITensor. It seems such problems can arise from the way OpenMP handles memory, which conflicts with variables declared static, as discussed in this thread: http://forum.openmp.org/forum/viewtopic.php?f=3&t=1569
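A small sketch of the general C++ hazard raised here, independent of ITensor. Whether ITensor uses such statics in the code paths dmrg hits is not established in this thread. A function-local `static` is one object shared by every thread in the process. `thread_local` gives each thread its own instance. The helper names are hypothetical.

```cpp
#include <vector>

// A function-local static is a single object for the whole process. Since
// C++11 its initialization is thread-safe, but later concurrent writes to it
// from several threads are a data race.
std::vector<double>& shared_scratch() {
    static std::vector<double> buf;
    return buf;
}

// thread_local gives each calling thread its own instance of the buffer.
std::vector<double>& per_thread_scratch() {
    thread_local std::vector<double> buf;
    return buf;
}

// Uses the per-thread buffer as temporary workspace, so concurrent calls from
// different threads never touch the same buffer.
double sum_of_squares(const std::vector<double>& v) {
    auto& work = per_thread_scratch();
    work.assign(v.begin(), v.end());
    double s = 0.0;
    for (double& x : work) { x *= x; s += x; }
    return s;
}
```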

Could that be it?

EDIT 2: After compiling in debug mode, I got the following different error: https://pastebin.com/1XQi1wS4

It seems an assertion in a header file fails. Do you have an interpretation of this?

commented Apr 27, 2018 by miles (70.2k points)

Hi, thanks for continuing to look into this. If that cblock assertion is failing (and if it otherwise succeeds when there is only one process), then it probably means indeed that something non-thread safe is happening where two or more threads are 'racing' to access the same memory. ITensors and IQTensors use a shared_ptr to their storage and do not copy their storage until you make a change to the tensor elements. So this optimization could be conflicting with the type of parallelism you are doing.
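A minimal sketch of the copy-on-write pattern described above, written only to show why it can interact badly with threads. `CowTensor` is hypothetical and is not ITensor's actual implementation. Copies share storage through a `std::shared_ptr` until one of them is modified. In a multithreaded program `use_count()` is only a snapshot, so the check-then-copy in `set` is not atomic with respect to other threads holding the same storage.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write container (NOT ITensor's real storage class).
class CowTensor {
public:
    explicit CowTensor(std::size_t n)
        : store_(std::make_shared<std::vector<double>>(n, 0.0)) {}

    double get(std::size_t i) const { return (*store_)[i]; }

    // Copying a CowTensor copies only the pointer, so both share storage.
    bool shares_storage_with(const CowTensor& other) const { return store_ == other.store_; }

    // Detach from shared storage before writing. use_count() is only a
    // snapshot when other threads share the storage, so this check-then-copy
    // is not atomic with respect to them.
    void set(std::size_t i, double v) {
        if (store_.use_count() > 1) {
            store_ = std::make_shared<std::vector<double>>(*store_);
        }
        (*store_)[i] = v;
    }

    // Returns a tensor with its own, unshared storage. Handing each thread
    // such a deep copy up front avoids relying on copy-on-write across threads.
    CowTensor deep_copy() const {
        CowTensor t(*this);
        t.store_ = std::make_shared<std::vector<double>>(*store_);
        return t;
    }

private:
    std::shared_ptr<std::vector<double>> store_;
};
```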

On the other hand, I have successfully parallelized some other code I'm doing (not DMRG) by using the C++ threading library to pass lambda function tasks to std::async.
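A sketch of the `std::async` approach mentioned here, assuming each task is fully self-contained. `solve_for` and `solve_all` are hypothetical placeholders, since the actual code is not shown. This pattern alone does not address any sharing inside DMRG discussed above.

```cpp
#include <future>
#include <vector>

// Hypothetical placeholder for one self-contained calculation. Everything it
// needs is created inside the call.
double solve_for(double U) {
    double e = 0.0;
    for (int k = 1; k <= 100; ++k) e -= 1.0 / (k + U);
    return e;
}

// Launch one task per parameter value. std::launch::async makes each task run
// as if on a new thread; the lambda captures U by value, so tasks share no
// mutable state.
std::vector<double> solve_all(const std::vector<double>& Us) {
    std::vector<std::future<double>> futures;
    futures.reserve(Us.size());
    for (double U : Us) {
        futures.push_back(std::async(std::launch::async, [U] { return solve_for(U); }));
    }
    std::vector<double> results;
    results.reserve(futures.size());
    for (auto& f : futures) results.push_back(f.get());  // waits for each task
    return results;
}
```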
