Hi Chia-Min,
Good question. It really does permute all of the data, so the cost is proportional to the number of tensor elements.
The concept of permuting can be a little confusing in the case of ITensors, where the index order should not matter to the user. The way to think about it is that the permute function is a "lower-level" operation that changes how an ITensor's data is laid out in memory. It is useful when a different layout would make a later operation more efficient, or when you want to access tensor elements with a more traditional positional notation such as T(1,2,3).
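For concreteness, here is a minimal sketch using the Julia version (ITensors.jl); the index names and dimensions are made up for illustration, and the C++ interface differs slightly (for example in the element-access notation):

```julia
using ITensors

i = Index(2, "i")
j = Index(3, "j")
k = Index(4, "k")

T = randomITensor(i, j, k)   # stored in memory with index order (i, j, k)

# permute copies every element into the new layout,
# so its cost scales with dim(i)*dim(j)*dim(k)
Tp = permute(T, k, i, j)

# After permuting, positional element access follows the new storage order:
x = Tp[1, 2, 1]              # element at k=1, i=2, j=1
```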
Otherwise, for normal operations on ITensors, such as adding them or contracting them with other ITensors, there is never any need to call the permute function.
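To illustrate that point, a small sketch (again in the Julia version, with made-up indices): addition and contraction match indices automatically, so no explicit permute is needed even when the stored orders differ:

```julia
using ITensors

i = Index(2, "i")
j = Index(3, "j")
k = Index(4, "k")

A = randomITensor(i, j, k)
B = randomITensor(k, i, j)   # same indices, different storage order

S = A + B   # works: indices are matched by identity, any permutation is handled internally
C = A * B   # contraction also matches the shared indices automatically
```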
Hope that helps!
Miles