Hi Josh,
So currently the only part of ITensor that uses multithreading is its calls to BLAS dgemm (and in some cases zgemm) to perform tensor contractions. When the contracted tensors are sufficiently large, BLAS libraries like MKL will automatically use multithreading to speed up these calls.
So if you are using a good BLAS library such as MKL and you do calculations involving large tensors, you should see a speedup when you leave OMP_NUM_THREADS unset or set it to a larger value, compared with setting OMP_NUM_THREADS=1.
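As a quick sanity check, a minimal sketch like the one below can report which thread setting your program actually sees. The helper name ompThreadSetting is hypothetical and not part of ITensor; it only reads the environment variable and does not query the BLAS library itself.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of ITensor): returns the value of
// OMP_NUM_THREADS as seen by this process, or "unset" if it is not defined.
// When it is unset, a threaded BLAS library typically picks its own default.
inline std::string ompThreadSetting()
{
    const char* value = std::getenv("OMP_NUM_THREADS");
    return value ? std::string(value) : std::string("unset");
}
```

Printing this once at the start of a run makes it easier to confirm that a single-thread run and a multi-thread run really differed in their setting.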
If you aren't seeing a speedup, it could be because the tensors involved in your code are not quite large enough, or your code's running time isn't dominated by tensor contractions.
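One way to check the second possibility is to time the contraction-heavy part of your code separately from the rest. The following is only a rough sketch using the standard library; secondsTaken and fractionOfTotal are hypothetical helper names, not ITensor functions.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of ITensor): runs `work` once and returns
// the elapsed wall-clock time in seconds.
template <typename Work>
double secondsTaken(Work&& work)
{
    auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Fraction of the total running time spent in one part of the code.
// Returns 0 when the total is not positive, to avoid dividing by zero.
inline double fractionOfTotal(double partSeconds, double totalSeconds)
{
    return totalSeconds > 0.0 ? partSeconds / totalSeconds : 0.0;
}
```

Wrapping the contraction step in secondsTaken([&]{ ... }) and comparing it with the time for the whole run gives a rough idea of how much a threaded BLAS could help. Wall-clock time is used here because CPU time summed over threads would hide any multithreading speedup.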
In the future we plan to exploit multithreading more within ITensor itself.
Best regards,
Miles