Running out of memory due to large m in dmrg and iqdmrg for a one-dimensional spin system

+1 vote
edited Jan 24

Hi, Miles.

To reach an accuracy (cutoff) of 1E-10 for my 1D spin-one system, I need to raise m to 4800 in iqdmrg. Here is my sweep schedule:
Sweeps:
1 Maxm=10, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-07
2 Maxm=20, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-08
3 Maxm=100, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-10
4 Maxm=200, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-15
5 Maxm=400, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-20
6 Maxm=800, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-30
7 Maxm=1200, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=0.0E+00
8 Maxm=2400, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=0.0E+00
9 Maxm=4800, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=0.0E+00
10 Maxm=9600, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=0.0E+00

vN Entropy at center bond b=31 = 1.238744298544
Eigs at center bond b=31: 0.5049 0.3166 0.1094 0.0282 0.0180 0.0094 0.0059 0.0026 0.0024 0.0017
Largest m during sweep 1 was 10
Largest truncation error: 0.0258392
Energy after sweep 1 is 5.602725754556
Sweep 1 CPU time = 2.204s (Wall time = 2.205s)

vN Entropy at center bond b=31 = 1.612233679937
Eigs at center bond b=31: 0.3923 0.2871 0.1657 0.0503 0.0338 0.0229 0.0115 0.0107 0.0076 0.0071
Largest m during sweep 2 was 20
Largest truncation error: 0.0028786
Energy after sweep 2 is 0.530196370527
Sweep 2 CPU time = 36.54s (Wall time = 7.408s)

vN Entropy at center bond b=31 = 1.925120785867
Eigs at center bond b=31: 0.3345 0.2587 0.1691 0.0615 0.0459 0.0318 0.0192 0.0188 0.0125 0.0120
Largest m during sweep 3 was 100
Largest truncation error: 4.6379e-05
Energy after sweep 3 is -1.624350530351
Sweep 3 CPU time = 5m, 44.1s (Wall time = 22.07s)

vN Entropy at center bond b=31 = 2.274203834242
Eigs at center bond b=31: 0.2817 0.2263 0.1570 0.0639 0.0589 0.0345 0.0280 0.0265 0.0227 0.0190
Largest m during sweep 4 was 200
Largest truncation error: 4.08592e-05
Energy after sweep 4 is -1.868673250230
Sweep 4 CPU time = 22m, 17s (Wall time = 1m, 25.4s)

vN Entropy at center bond b=31 = 2.575774377167
Eigs at center bond b=31: 0.2459 0.1961 0.1393 0.0676 0.0569 0.0356 0.0332 0.0332 0.0312 0.0268
Largest m during sweep 5 was 400
Largest truncation error: 1.87256e-05
Energy after sweep 5 is -1.952988038907
Sweep 5 CPU time = 1h, 23m, 11s (Wall time = 5m, 19.3s)

vN Entropy at center bond b=31 = 2.959973291749
Eigs at center bond b=31: 0.1998 0.1549 0.1120 0.0760 0.0503 0.0458 0.0420 0.0412 0.0307 0.0254
Largest m during sweep 6 was 800
Largest truncation error: 9.42967e-06
Energy after sweep 6 is -2.003774164313
Sweep 6 CPU time = 5h, 44m, 4s (Wall time = 22m, 6s)

vN Entropy at center bond b=31 = 3.273794426370
Eigs at center bond b=31: 0.1523 0.1090 0.0917 0.0892 0.0524 0.0513 0.0494 0.0318 0.0305 0.0293
Largest m during sweep 7 was 1200
Largest truncation error: 6.29943e-06
Energy after sweep 7 is -2.024859515379
Sweep 7 CPU time = 11h, 34m, 3s (Wall time = 44m, 28s)

vN Entropy at center bond b=31 = 3.326094274331
Eigs at center bond b=31: 0.1442 0.0967 0.0943 0.0924 0.0517 0.0515 0.0513 0.0307 0.0306 0.0305
Largest m during sweep 8 was 2400
Largest truncation error: 7.15778e-07
Energy after sweep 8 is -2.028678884827
Sweep 8 CPU time = 46h, 7m, 7s (Wall time = 2h, 59m, 20s)
terminate called after throwing an instance of 'std::bad_alloc'
Aborted                 ./iqdmrg input


As shown above, the program ran out of memory in sweep 9 with m=4800.
I monitored my memory usage here:

https://imgur.com/a/mKIUH

The memory usage in the final sweep increases linearly until it hits the limit of my node.
Do you have any suggestions for resolving this?
Thank you.

Victor

commented Jan 26 by (650 points)
Hi Victor, have you tried "WriteM"? More details can be found at http://itensor.org/docs.cgi?page=classes/dmrg
commented Jan 28 by (20,240 points)
Thanks for posting the comment, Chengshu! Feel free to post an answer to a question like this one if you are confident your answer could be helpful; the forum supports multiple answers, so I could still post one too.
commented Jan 29 by (400 points)
Thank you all so much. That really helps.

+1 vote
answered Jan 28 by (20,240 points)

Hi Victor,
Yes please try setting the "WriteM" parameter as Chengshu suggests. That should help quite a bit with memory, although you may find you can only increase maxm by a few more thousand at most, since the memory usage scales quadratically with m.
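To see why m is the bottleneck, here is a rough back-of-envelope sketch of the memory needed to hold the DMRG environment tensors. The site count, MPO bond dimension, and dense-storage assumption below are all illustrative, not taken from Victor's actual model; quantum-number block sparsity in iqdmrg reduces these numbers considerably. The point is only the quadratic growth in m:

```python
# Rough, dense worst-case estimate of DMRG environment-tensor memory vs
# bond dimension m. Hypothetical parameters for illustration: n_sites
# sites, MPO bond dimension k, real double-precision storage. Block
# sparsity from quantum-number conservation (iqdmrg) typically cuts
# these figures by a large factor, and WriteM keeps most of these
# tensors on disk rather than in RAM.

def env_gb(m, n_sites=64, k=5, bytes_per=8):
    """Memory (GB) to hold all left/right environment tensors,
    each roughly m x m x k, one per bond."""
    return n_sites * k * m * m * bytes_per / 1e9

for m in (1200, 2400, 4800, 9600):
    print(f"m = {m:5d}: ~{env_gb(m):7.1f} GB (dense worst case)")
```

Doubling m quadruples this estimate, which is why each factor-of-two increase in maxm is so much more expensive than the last.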

But if you have more questions about it or are trying something specialized, please feel free to ask more or comment below.

Best regards,
Miles

commented Jan 29 by (400 points)
Hi, Miles.

I looked into the memory usage. It indicated heavy use of cache memory. Is cache memory taken from disk? What's the difference between cache and swap memory?

Best,
Victor
commented Jan 29 by (20,240 points)
Hi Victor,
Unfortunately I'm not sure I know the answer to this question. Here is a link I found which may give the right explanation: https://unix.stackexchange.com/questions/263764/what-is-difference-between-cached-memory-and-used-memory

Probably the most authoritative place to look is in the documentation for the system usage program you are using, to see how it defines the quantities it reports.

On some unix systems, there is a simple command called "free" which you can use to see how much RAM is free. It gives a straightforward report.
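For what it's worth, here is a small sketch of the distinction, based on how Linux reports memory in /proc/meminfo (the same source `free` reads). "Cached" is RAM the kernel uses to hold recently read file data and can reclaim on demand, so it is effectively available; swap is data pushed out of RAM onto disk. The meminfo values below are made up for illustration:

```python
# Sketch: distinguishing "free" from "available" memory on Linux.
# The sample text mimics /proc/meminfo output (values in kB) and is
# hypothetical; on a real system you would read /proc/meminfo itself.

sample = """\
MemTotal:       16000000 kB
MemFree:          500000 kB
Buffers:          200000 kB
Cached:          6000000 kB
SwapTotal:       8000000 kB
SwapFree:        8000000 kB
"""

def parse_meminfo(text):
    info = {}
    for line in text.splitlines():
        key, rest = line.split(":")
        info[key] = int(rest.split()[0])  # value in kB
    return info

mem = parse_meminfo(sample)
free_kb = mem["MemFree"]                                   # truly unused RAM
available_kb = mem["MemFree"] + mem["Buffers"] + mem["Cached"]  # reclaimable too
print(f"free: {free_kb/1e6:.1f} GB, usable incl. cache: {available_kb/1e6:.1f} GB")
```

So a large "cached" number alone doesn't mean the job is out of memory; it's the truly used (non-reclaimable) portion that matters.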

Best,
Miles
commented Jan 31 by (400 points)
Yeah. You are probably right. I will try it.
Thank you.
+1 vote
answered Jan 30 by (610 points)

I am studying a 1D Hubbard model and I have run into the same problem.

My answer to the memory problem is to use a supercomputer...

Also, just a warning: sometimes when the memory runs out, your data might not get written to your data files. I have a number of datasets with missing values as a result, and I didn't catch the error until I tried analyzing the data. I believe ITensor usually outputs error messages, but sometimes it won't catch the error.

commented Feb 1 by (240 points)
I'm currently running a 1D chain on the supercomputer. At Maxm=3500 the program aborted without any warning or error message. I requested 15 GB of memory for this job, and according to the system report all 15 GB were used, so I'm fairly sure the program aborted because it ran out of memory. But no error message was shown.

I also have a question about memory. When I run the same program on my workstation, it gets about 3-4 threads (CPU usage ~350%) and uses about 6 GB of memory. On the supercomputer, I requested 1 node with 16 processors, and 15 GB of memory is just not enough. What determines the memory usage? One extra note: the program I run is idmrg; there is no parallel code in it apart from the built-in BLAS and LAPACK.
commented Feb 1 by (610 points)
The supercomputer I am using has an option to allocate additional memory to a job (from multiple cores) and also has the option to use large-memory nodes.