+2 votes
asked by (170 points)

Hello everyone!

I just discovered ITensor, and I think it's a fantastic library: thank you so much for making it available for Julia too!

I was trying to figure out whether I can use it to speed up the full contraction between a multidimensional array and 5 vectors (this is currently what my research project requires).

So far the best performance I've gotten is with Julia's LoopVectorization package (I've tried several others).

Here is the full contraction (in Julia). Of course this is a toy example, intended only to illustrate the situation:

using LoopVectorization

function contraction(D, A, a, b, c, f, e, indices)
    # Gather the five fibers of A selected by `indices`
    @turbo for i = 1:D
        a[i] = A[i, indices[1], indices[2], indices[3], indices[4]]
        b[i] = A[i, indices[5], indices[6], indices[7], indices[8]]
        c[i] = A[i, indices[9], indices[10], indices[11], indices[12]]
        f[i] = A[i, indices[13], indices[14], indices[15], indices[16]]
        e[i] = A[i, indices[17], indices[18], indices[19], indices[20]]
    end

    # Full scalar contraction of A with the five vectors
    s = 0.0
    @turbo for q in 1:D, l in 1:D, k in 1:D, j in 1:D, m in 1:D
        s += A[m, j, k, l, q] * a[q] * b[l] * c[k] * f[j] * e[m]
    end
    return s
end

These are the benchmark results (using BenchmarkTools' @btime):

using BenchmarkTools

indices = rand(1:5, 20);
A = rand(5, 5, 5, 5, 5);
a = zeros(Float64, 5);
b = zeros(Float64, 5);
c = zeros(Float64, 5);
f = zeros(Float64, 5);
e = zeros(Float64, 5);
D = 5;

@btime contraction($D,$A,$a,$b,$c,$f,$e,$indices) 
  736.952 ns (0 allocations: 0 bytes)

It is nothing more than a full scalar contraction between a multidimensional array and 5 vectors, written so that it performs no memory allocations.
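For comparison, the same scalar can be written as a single plain-Julia reduction. This is only a slow but transparent reference version (the name contraction_ref is mine, not from any package), useful for checking that a fast loop computes what you expect:

```julia
# Naive reference for the contraction above: one big sum over all five modes.
# Much slower than the @turbo version, but easy to verify by eye.
contraction_ref(D, A, a, b, c, f, e) =
    sum(A[m, j, k, l, q] * a[q] * b[l] * c[k] * f[j] * e[m]
        for q in 1:D, l in 1:D, k in 1:D, j in 1:D, m in 1:D)
```

Comparing its output with the optimized version on random inputs is a quick sanity check.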

I have read the documentation, but I still don't understand whether and how such a contraction can be implemented with ITensor. Ideally I would run this on the GPU, especially when the dimension is much larger than 5 (unlike in this toy example).

Can anyone give me some advice about it? Thanks a lot in advance, and have a nice day everyone!

1 Answer

+1 vote
answered by (14.1k points)
selected as Best answer

Given the arrays A, a, b, c, f, e, you can just do:

using ITensors

n = ndims(A)
is = Tuple(Index(size(A, j)) for j in 1:n)
Ait = itensor(A, is)
Ms = (a, b, c, f, e)
tn = ITensor[itensor(Ms[j], is[j]) for j in 1:n]
pushfirst!(tn, Ait)
res = contract(tn)[]  # contract the network down to a scalar

For number of dimensions n == 5, I get this result: 9.823 μs (73 allocations: 14.75 KiB), so it is much slower than LoopVectorization. This isn't entirely surprising for such small dimensions: ITensor uses a transpose-transpose-GEMM-transpose (TTGT) approach (https://arxiv.org/pdf/1607.00145.pdf), in which the tensor data is first permuted so that the contraction can be performed as a matrix multiplication. The contractions are therefore done pairwise, and each pairwise contraction can allocate a nontrivial amount of intermediate data.
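The pairwise strategy can be sketched in plain Julia by folding one mode at a time into a matrix-vector product. This is only an illustration of the idea, not ITensor's actual implementation, and contract_modes is a name I made up:

```julia
# Contract a 5th-order tensor A with one vector per mode, one mode at a time.
# vecs[k] pairs with mode k of A. Since Julia arrays are column-major,
# reshape(T, :, d) puts the trailing mode in the columns, so each mat-vec
# product folds the last remaining mode: mode 5 first, then 4, ..., then 1.
function contract_modes(A, vecs)
    T = A
    for v in reverse(vecs)
        T = reshape(T, :, length(v)) * v  # fold the last mode via mat-vec
    end
    return T[1]  # a 1-element array remains; extract the scalar
end
```

Each step allocates an intermediate tensor of lower order, which is exactly why a pairwise approach allocates memory while the fused @turbo loop does not.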

On the other hand, in this case LoopVectorization is able to perform the calculation with no intermediate tensor values, and it also uses a number of tricks to execute the loops very efficiently. I'm curious how the two compare for larger tensor orders and dimensions, but this very specific contraction, which reduces everything to a single number, may be better suited to the strategy LoopVectorization takes. (In principle we could detect this kind of contraction and dispatch to LoopVectorization for special cases like this, which is something we would like to investigate.)

As for performing the calculation on GPU, you should be able to just install the ITensorsGPU package (https://github.com/ITensor/ITensorsGPU.jl) and perform the contraction like this instead:

using ITensorsGPU

tn_gpu = cu.(tn)             # move every tensor in the network to the GPU
res_gpu = contract(tn_gpu)[]

Note that I've found that using single precision (Float32) gives the best performance on GPU, so you may want to try that.
