asked by (280 points)
In my project I have to run a long DMRG calculation for different values of the particle interactions, and I would like to run these calculations in parallel. To do this, I implemented a naive OpenMP parallelization as follows:

Add the "-fopenmp" flag in compilation

Run my entire program within a for-loop initialized with "#pragma omp parallel for".

The processes run for about 40-60 sweeps (of 128), after which I get "Segmentation fault (core dumped)". Is the DMRG code in ITensor not thread-safe?
commented by (70.1k points)
Hi, before I try to answer the question, may I ask some more details about the code? A key question is: are you putting the same wavefunction into the dmrg function? (I.e. the same variable "psi" of type MPS or IQMPS?) The reason I ask that is that psi is passed by reference, so the dmrg function overwrites it. If you passed the same one to each call to dmrg then it could cause issues with different threads "racing" for the same memory. If you could post some or all of your code that would be very helpful.
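To make the by-reference concern concrete, here is a minimal sketch of a parallel loop in which every iteration constructs its own state before handing it to a solver that overwrites it. `State`, `optimize`, and `run_all` are hypothetical stand-ins invented for illustration; they are not the ITensor `MPS` type or the `dmrg` function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a wavefunction; NOT the ITensor MPS type.
struct State {
    std::vector<double> amplitudes;
};

// Hypothetical stand-in for a solver that, like dmrg, takes its state by
// reference and overwrites it in place.
double optimize(State& psi, double interaction) {
    double energy = 0.0;
    for (std::size_t i = 0; i < psi.amplitudes.size(); ++i) {
        psi.amplitudes[i] = interaction * static_cast<double>(i + 1);
        energy += psi.amplitudes[i];
    }
    return energy;
}

// Run one independent calculation per interaction value.
std::vector<double> run_all(const std::vector<double>& interactions) {
    std::vector<double> energies(interactions.size());
    // Compile with -fopenmp to enable the pragma; without the flag the
    // pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for
    for (int k = 0; k < static_cast<int>(interactions.size()); ++k) {
        // Declared inside the loop body, so it is private to this iteration.
        State psi{std::vector<double>(4, 0.0)};
        // Each iteration writes only to its own slot of the result vector.
        energies[k] = optimize(psi, interactions[k]);
    }
    return energies;
}
```

If `psi` were instead declared once outside the loop, every thread would pass the same object to `optimize`, and the in-place writes would race.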
commented by (280 points)
You can find the code at https://www.dropbox.com/s/tyov039vo7et3ky/UaaPar.cc?dl=0 though it's a bit messy. The major confusion for me is that I do basically nothing before starting the for-loop, meaning each parallel iteration has its own completely separate setup (as I understand it).

Another mystery is that I can run the non-parallel code on e.g. two CPU-cores by starting it through two different terminals without problems.
commented by (70.1k points)
Hi, so this code and your comment answer my question, I think. I was saying that you may need to create separate copies of the wavefunction (psi0) before calling dmrg with multiple threads. Not doing this could be the reason you got an error. On the other hand, it may not fix the problem, but having separate copies of the wavefunction is the first thing I'd try.
commented by (280 points)
I don't understand - when I construct the MPS and handle it inside a separate for-loop iteration (as done in the attached code), is that not the same as making separate copies of the wavefunction?
commented by (70.1k points)
It isn't the same; the wavefunction is passed by reference to the dmrg function. So each separate call to dmrg could then be attempting to modify the same memory. This may not be the cause of the crash, but it's a reasonable place to start. I'll make a note to try this out myself & see if I can get it to work by making small changes like using separate copies of the wavefunction.
commented by (280 points)
Hi Miles

Sorry about waiting so long to answer. Do you mean that if I want to run 10 different DMRG processes in parallel, I should try initializing 10 states before starting the for loop, giving each iteration of the parallel for loop a different state? Instead of initializing each state inside each iteration?

EDIT: I have tried this, and it still doesn't fix the problem. I believe the way OpenMP handles parallelization conflicts with some design choice in ITensor. It seems such problems can occur when the way OpenMP shares memory between threads conflicts with variables declared static, as seen in this thread: http://forum.openmp.org/forum/viewtopic.php?f=3&t=1569

Could that be it?
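For reference, the general issue with static variables can be sketched like this. A function-local `static` object is a single instance shared by all threads, whereas `thread_local` gives each thread its own. `sum_of_squares` and `sums_up_to` are hypothetical names made up for this example; whether ITensor contains this kind of hidden state is not established here, and the sketch assumes a compiler such as GCC or Clang where `thread_local` storage is per OpenMP thread.

```cpp
#include <vector>

// Hypothetical helper with hidden state. A plain `static` buffer here would
// be one object shared by all threads; `thread_local` gives each thread its
// own buffer, so concurrent calls do not overwrite each other's scratch data.
double sum_of_squares(int n) {
    thread_local std::vector<double> scratch;  // one buffer per thread
    scratch.assign(n, 0.0);
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        scratch[i] = static_cast<double>(i) * i;
        total += scratch[i];
    }
    return total;
}

// Call the helper from a parallel loop; each iteration writes its own slot.
std::vector<double> sums_up_to(int count) {
    std::vector<double> out(count);
    #pragma omp parallel for
    for (int n = 0; n < count; ++n) {
        out[n] = sum_of_squares(n);
    }
    return out;
}
```

Replacing `thread_local` with `static` in `sum_of_squares` would let two threads resize and write the same buffer at once, which is the kind of conflict the linked thread describes.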

EDIT 2: After compiling in debug mode, I got the following different error: https://pastebin.com/1XQi1wS4

It seems an assertion in a header file fails. Do you have an interpretation of this?
commented by (70.1k points)
Hi, thanks for continuing to look into this. If that cblock assertion is failing (and if it otherwise succeeds when there is only one process), then it probably does mean that something non-thread-safe is happening, where two or more threads are 'racing' to access the same memory. ITensors and IQTensors hold a shared_ptr to their storage and do not copy that storage until you make a change to the tensor elements. So this optimization could be conflicting with the type of parallelism you are doing.
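The copy-on-write idea can be illustrated with a minimal sketch. `CowTensor` is a hypothetical class written only for this example and is not ITensor's actual implementation; it just shows how copies can share storage through a `shared_ptr` until one of them is modified.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write handle; NOT ITensor's actual implementation.
class CowTensor {
public:
    explicit CowTensor(std::size_t n)
        : store_(std::make_shared<std::vector<double>>(n, 0.0)) {}

    double get(std::size_t i) const { return (*store_)[i]; }

    void set(std::size_t i, double v) {
        // Check-then-copy: if the storage is shared, clone it before writing.
        // use_count() is only approximate while other threads hold copies, so
        // this sequence is not safe across threads without external
        // synchronization.
        if (store_.use_count() > 1) {
            store_ = std::make_shared<std::vector<double>>(*store_);
        }
        (*store_)[i] = v;
    }

    // True if both handles currently point at the same storage.
    bool shares_storage_with(const CowTensor& other) const {
        return store_ == other.store_;
    }

private:
    std::shared_ptr<std::vector<double>> store_;
};
```

Copying a `CowTensor` is cheap because only the pointer is copied; the first `set` on a shared handle triggers the real copy. The price is that objects which look independent may still share memory, which is why this pattern needs care under multithreading.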

On the other hand, I have successfully parallelized some other code I'm working on (not DMRG) by using the C++ threading library to pass lambda function tasks to std::async.
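A rough sketch of that std::async pattern, under the assumption that each task is fully independent, could look like the following. `solve` and `solve_all` are hypothetical placeholders for one long calculation and its driver; they are not ITensor functions.

```cpp
#include <future>
#include <vector>

// Hypothetical stand-in for one long, independent calculation.
double solve(double interaction) {
    double energy = 0.0;
    for (int sweep = 1; sweep <= 4; ++sweep) {
        energy += interaction * sweep;
    }
    return energy;
}

// Launch one asynchronous task per interaction value and collect the results
// in the same order as the inputs.
std::vector<double> solve_all(const std::vector<double>& interactions) {
    std::vector<std::future<double>> tasks;
    tasks.reserve(interactions.size());
    for (double u : interactions) {
        // Capture u by value so each task owns its own input.
        tasks.push_back(
            std::async(std::launch::async, [u] { return solve(u); }));
    }
    std::vector<double> energies;
    energies.reserve(tasks.size());
    for (auto& t : tasks) {
        energies.push_back(t.get());  // blocks until that task has finished
    }
    return energies;
}
```

On some Linux toolchains this needs the `-pthread` flag when compiling and linking. Any objects created inside the lambda are local to that task, so the tasks share nothing unless you capture it explicitly.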
