Random IQIndex Mismatch for long runtimes

Question

asked Oct 24, 2016 by swolff (250 points)
retagged Oct 26, 2016 by swolff

[UPDATE: Problem also happens in unitary time evolution, see below]

Hello everybody,

I've come across a strange error that I've been struggling with for quite some time now. I am trying to implement a stochastic wave function approach (arXiv:quant-ph/9806026, see sec. II.A) for the time evolution of open quantum systems. This method combines non-unitary time evolution of wave functions under a non-Hermitian Hamiltonian, @@H = H_{\text{closed system}} - i \gamma C^{\dag} C@@, with the stochastic application of jump operators.
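
As a short worked step, here is why the norm of the wave function can serve as the jump criterion. It takes the Hamiltonian exactly as written above and assumes @@\hbar = 1@@, a Hermitian closed-system part, and a real rate @@\gamma \ge 0@@ (these conventions are assumptions, not taken from the post):

```latex
i\,\partial_t |\psi\rangle = H\,|\psi\rangle ,
\qquad
H = H_{\text{closed system}} - i\gamma\, C^{\dagger} C
\quad\Longrightarrow\quad
\frac{d}{dt}\,\langle\psi|\psi\rangle
  = i\,\langle\psi|\,\bigl(H^{\dagger} - H\bigr)\,|\psi\rangle
  = -2\gamma\,\langle\psi|\,C^{\dagger} C\,|\psi\rangle \;\le\; 0 .
```

Under these assumptions the squared norm decreases monotonically during the non-unitary evolution, which is what the comparison of `norm(psi) * norm(psi)` against a random number `eta` in the attached code relies on.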

For this I use IQTensors for the wave functions and gates for the bond operators of a second-order even-odd time evolution. The model is a Heisenberg model built on the SpinHalf site set. Most of the time this gives the right results (compared to ED), but after longer times I get an Index Mismatch error:

----------------------------------------
IQIndexSet 1 = 
IQIndex(ul,4,Link,724) <Out>
  (long,1,Link,853) QN(-2)
  (long,2,Link,565) QN(0)
  (long,1,Link,745) QN(2)

IQIndex(S=1/2 2,2,Site,416) <Out>
  (Up 2,1,Site,957) QN(1)
  (decltype(nullptr) 2,1,Site,622) QN(-1)

IQIndex(ul,2,Link,411) <Out>
  (long,1,Link,282) QN(1)
  (long,1,Link,761) QN(-1)

----------------------------------------
IQIndexSet 2 = 
IQIndex(uc,4,Site,416) <Out>
  (c0,1,Site,216) QN(2)
  (c1,2,Site,631) QN(0)
  (c2,1,Site,174) QN(-2)

IQIndex(S=1/2 2,2,Site,416) <In>
  (Up 2,1,Site,957) QN(1)
  (decltype(nullptr) 2,1,Site,622) QN(-1)

IQIndex(ul,2,Link,411) <In>
  (long,1,Link,282) QN(1)
  (long,1,Link,761) QN(-1)

----------------------------------------
Mismatched IQIndex IQIndex(S=1/2 2,2,Site,416) <Out>
  (Up 2,1,Site,957) QN(1)
  (decltype(nullptr) 2,1,Site,622) QN(-1)

A backtrace of the error shows that it occurs in the position() function, during the SVD in the normalization process (see below). I also verified this by placing couts right before and after the function call. Interestingly, the error is not deterministically reproducible. If I store the MPS with writeToMPS() just before entering the position() call where it crashes, reload it, and perform position(), everything works as expected. The point where it crashes also differs between runs with the same parameter set (including the same seed). I therefore suspected that the problem originates in some dynamic memory error. To circumvent this I tried to allocate the MPS explicitly on the heap

MPSt<IQTensor> *psi = new IQMPS(psi0);

where psi0 is an InitState, but that didn't solve the problem.

Backtrace of error:

[cmnode006:296505] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)      [0x2b9079611030]
[cmnode006:296505] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x2b9079850475]
[cmnode006:296505] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x2b90798536f0]
[cmnode006:296505] [ 3] condor_exec.exe() [0x9f1dcb]
[cmnode006:296505] [ 4] condor_exec.exe(itensor::detail::checkArrows(itensor::IndexSetT<itensor::IQIndex> const&, itensor::IndexSetT<itensor::IQIndex> const&, bool)+0x3cc) [0xa7400c]
[cmnode006:296505] [ 5] condor_exec.exe(itensor::ITensorT<itensor::IQIndex>::operator*=(itensor::ITensorT<itensor::IQIndex> const&)+0x85d) [0xa8175d]
[cmnode006:296505] [ 6] condor_exec.exe(itensor::Spectrum itensor::svd<itensor::ITensorT<itensor::IQIndex> >(itensor::ITensorT<itensor::IQIndex>, itensor::ITensorT<itensor::IQIndex>&, itensor::ITensorT<itensor::IQIndex>&, itensor::ITensorT<itensor::IQIndex>&, itensor::Args)+0x466) [0xa079d6]
[cmnode006:296505] [ 7] condor_exec.exe(itensor::Spectrum itensor::orthMPS<itensor::ITensorT<itensor::IQIndex> >(itensor::ITensorT<itensor::IQIndex>&, itensor::ITensorT<itensor::IQIndex>&, itensor::Direction, itensor::Args const&)+0x24a) [0xb8ecea]
[cmnode006:296505] [ 8] condor_exec.exe(itensor::MPSt<itensor::ITensorT<itensor::IQIndex> >::position(int, itensor::Args const&)+0xc2) [0xb97eb2]
[cmnode006:296505] [ 9] condor_exec.exe(local_magnetization(itensor::MPSt<itensor::ITensorT<itensor::IQIndex> >, itensor::BasicSiteSet<itensor::SpinHalfSite>, int)+0x4f) [0xa051ef]

Also I once got the error message

Index::write: Index is default initialized

This might be an effect of the commonIndex() function used in orthMPS(). Generally I see the error appearing in every region of parameter space, but it happens earlier (in terms of runtime) for smaller systems with lower cutoff, which suggests that it depends loosely on the number of gates applied. I also tried this for a purely Hamiltonian system, i.e. without jump operators, but the program still crashes.

Does anyone have an idea what could cause this? I would appreciate any kind of help.
Best,
Stefan

P.S.: for completeness, I attach part of the code:

MPSt<IQTensor> *psi = new IQMPS(psi0);
*psi *= 1./norm(*psi);
compute_quantum_trajectory(sites,*psi,tensor_gates,
                           ofile_name,N, jump_operators,
                           uni,args, nstart, nfinal,Measurements);

void compute_quantum_trajectory(SpinHalf &sites,
                                MPSt<IQTensor> &psi,
                                vector< pair< int , IQTensor > > &tensor_gates,
                                string ofile_name,
                                int N,
                                vector<JumpOperator> &jump_operators,
                                my_uni_RNG uni,
                                Args args,
                                int nstart,
                                int nfinal,
                                vector<TimeSeries> &Measurements
)
{
  int meas_counter =0;

  double eta = uni(); // draws a random number from boost RNG
  for(int step = nstart; step < nfinal; ++step)
  {
    measure(psi, sites, Measurements,step,ofile_name,args);

    do_non_unitary_time_evolution(tensor_gates,psi,args);

    if (norm(psi) * norm(psi) >= eta) continue;
    else
    {
      JumpOperator Li
          = choose_quantum_jump_operator(psi,sites,jump_operators,uni,args);
      psi.Anc(Li.site) *= sites.op(Li.operator_string,Li.site);
      psi.Anc(Li.site).noprime();

      psi *= 1./ norm(psi);
    }
    eta = uni();
  }
}

void
do_non_unitary_time_evolution(vector< pair< int , IQTensor > > &tensor_gates,
                              MPSt<IQTensor> &psi,
                              Args args)
{
  for(auto& G : tensor_gates)
  {
    auto b = G.first;
    psi.position(b,args);
    IQTensor AA = psi.A(b)*psi.A(b+1);

    AA *= G.second;
    AA.noprime();

    IQTensor D;
    svd(AA,psi.Anc(b),D,psi.Anc(b+1), args);
    psi.Anc(b+1) *= D;
  }
}

[UPDATE]
As mentioned by G. Misguich, the problem also appears for the Hamiltonian time evolution. I tested it with a code very similar to the one provided in itensor/tutorial/05_gates/gates.cc.

commented Oct 26, 2016 by greg (150 points)

Hi. Just a brief comment: I am facing the same kind of error (Index Mismatch) when performing some long real-time (unitary) evolutions on simple 1D spin-half models. As noted by swolff, the point where it crashes is different for different runs of the same parameter set.

commented Oct 26, 2016 by swolff (250 points)

I also see the problem in the unitary time evolution and updated the question.

commented Oct 28, 2016 by hermit0308 (1.2k points)

Hello, I also came across this kind of problem (Index Mismatch) a long time ago, when I was calculating the unitary time evolution for the Bose-Hubbard model using "gates.cc" from the tutorial. It happens occasionally. Usually, running the program again finishes the calculation successfully. I wanted to mention this but didn't know how to describe it. Thanks.

1 Answer

answered Oct 27, 2016 by miles (70.2k points)
selected Dec 15, 2016 by swolff

Best answer

Hi, so unfortunately this is the toughest kind of bug to debug (occurs at random times and only at long times). We had a bug like this in version 1 of ITensor and though I never explicitly found it, we believe it had to do with memory management in the matrix layer which has now been completely rewritten from scratch to be more memory safe.

Hopefully the bug you're seeing isn't tied to a memory management issue in ITensor, but it's hard to know a priori.

To reproduce it I would need more than just parts of your code; a complete working (minimal) code would be very helpful.

One other possibility is that there is a memory bug in your driver code that only shows up later in ITensor, but is not actually caused by ITensor. This happened once a few years back with a driver code I was using that silently overstepped an array bound.
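
To illustrate the kind of driver-side bug meant here, the following is a minimal sketch of how a silent out-of-bounds write can be turned into an immediate, visible error. The helper `record` and its arguments are hypothetical and not part of ITensor or of the code in the question; the only library feature used is `std::vector::at`, which throws `std::out_of_range` where `operator[]` would have undefined behavior.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical driver-side helper: store one measurement per time step.
// With results[step] an index past the end would silently corrupt memory
// and might only cause a crash much later, somewhere unrelated.
// at() checks the index and throws std::out_of_range at the faulty access.
inline void record(std::vector<double>& results, std::size_t step, double value)
{
    results.at(step) = value;
}
```

Independently of such checks, building the driver with AddressSanitizer (`-fsanitize=address` in GCC and Clang) is a common way to catch out-of-bounds accesses in code that uses raw arrays or `operator[]`.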

Let me know how you want to proceed. If you email me a working driver code I could take a look at it and run it myself to see if I get the same bug.

Miles

commented Dec 15, 2016 by swolff (250 points)

I used an ordinary signed 32-bit int and started from 0. I'm not completely sure, but shouldn't the RNG give (pseudo-)random integers from the interval [0, 2^32-1], and not just a randomly ordered sequence of all integers in the interval? In that case one could even draw the same ID for the first two created indices if one is "unlucky". So I think 64-bit ints would reduce the probability of getting the same IDs, but would not eliminate the problem entirely.

commented Dec 15, 2016 by miles (70.2k points)

Yes, you are right that even the first two numbers generated could in principle be the same. So 64 bits wouldn't guarantee that it can't happen, but for a well-designed RNG the odds should be quite small with 64 bits, since 2^64 ~ 1.8E19. At the very least I should research it more to see if there are theoretical estimates of how likely it is to happen for various RNGs besides the Mersenne twister.

But the random id's are not really needed for most purposes anyway. I put them in when I was doing parallel DMRG because otherwise I had issues if I didn't set the starting numbers on the separate computers at a different value. So it would be nice to have them but we can go back to sequential and then put in random as a compile-time option, or do more testing until we're comfortable that it will definitely work.
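
The trade-off described here can be sketched as a compile-time switch between the two numbering schemes. Everything in this block is hypothetical (the macro `USE_RANDOM_IDS`, the type `id_type`, and the functions `next_id`, `id_counter`, `set_id_start`); it is not ITensor's actual ID code, only an illustration of the idea under the assumption that IDs are plain integers.

```cpp
#include <atomic>
#include <cstdint>
#include <random>

// All names here are hypothetical; this is not ITensor's actual ID code.
using id_type = std::uint64_t;

#ifdef USE_RANDOM_IDS

// Random 64-bit IDs: separate processes need no coordination,
// but uniqueness is only probabilistic.
inline id_type next_id()
{
    static thread_local std::mt19937_64 rng{std::random_device{}()};
    return rng();
}

#else

// Sequential IDs: unique within one process by construction.
// Separate processes must start from different offsets (set_id_start).
inline std::atomic<id_type>& id_counter()
{
    static std::atomic<id_type> counter{1};
    return counter;
}

// Set the value that the next call to next_id() will return.
inline void set_id_start(id_type start) { id_counter().store(start); }

// Return the current counter value and advance it by one.
inline id_type next_id() { return id_counter().fetch_add(1); }

#endif
```

In the sequential variant, a parallel run would call something like `set_id_start` with a different, well-separated offset on each machine, which is the manual bookkeeping that random IDs were meant to avoid.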

commented Dec 15, 2016 by swolff (250 points)

Ah OK, I see. In that sense I agree: I think using a 64-bit RNG might be a practical solution for my application too, since the probability of drawing the same ID twice is scaled down by another factor of 2^-32 compared to the 32-bit case, so it is again vanishingly small.
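
A rough way to put numbers on this is the standard birthday-problem approximation. The sketch below assumes that every ID is drawn independently and uniformly from 2^bits values; the function name `collision_probability` and the index counts used with it are illustrative assumptions, not measurements from the runs discussed in this thread.

```cpp
#include <cassert>
#include <cmath>

// Birthday-problem estimate of the probability that at least two of n IDs
// coincide when each is drawn independently and uniformly from 2^bits values:
//   P ~ 1 - exp(-n (n - 1) / (2 * 2^bits))
inline double collision_probability(double n, int bits)
{
    assert(n >= 0.0 && bits > 0);
    const double space = std::ldexp(1.0, bits);            // 2^bits
    const double expected_pairs = n * (n - 1.0) / (2.0 * space);
    return -std::expm1(-expected_pairs);                   // 1 - exp(-x), accurate for small x
}
```

For a hypothetical 10^5 independently drawn IDs, this estimate gives roughly 0.69 for 32 bits and about 2.7e-10 for 64 bits. Note that it estimates the chance that any two IDs ever coincide; a crash additionally requires the two coinciding indices to meet in the same contraction, so the estimate is an upper-level guide rather than a prediction of the crash rate.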

commented Dec 18, 2016 by miles (70.2k points)

To follow up, I just tested your code running it about 6 times with the 64-bit rng and I didn't observe any crashes. So it seems like going to 64 bit does fix it.

I may go to sequential numbering, however, for the simple reason that we could keep using 32 bit numbers. Changing the id size would break the write-to-disk feature for people who have written tensors to file with the previous code.

Thanks again for your report and let me know if the problem does show up again even with sequential or 64 bit id's.

commented Dec 19, 2016 by swolff (250 points)

OK, that's good to hear. Thanks for the testing and for the support! Good to know that this issue is most likely solved.
