Hi Mingpu,
Sorry but no I don't have a reference on this because I'm not sure there is one. For how to do this with ITensor, the steps are not too hard:
1. use the AutoMPO feature to create different MPOs, one for each piece of the Hamiltonian you want to be in its own MPO
2. call the "generalized DMRG interface" described on this page:
http://itensor.org/docs.cgi?vers=cppv3&page=classes/dmrg, the one which takes std::vector<MPO> as its first argument
Then one just does timing to see whether it is faster to split the terms into separate MPOs or keep all as one MPO. We did this and found that it was faster to split them up.
If one works out the scaling of the iterative eigensolver step of DMRG one finds that it scales as m^3 k^2 (I believe) where k is the MPO bond dimension. So if one can change this to: D*m^3*p^2 where D is the number of different MPOs you split into, and p is the typical bond dimension of these new MPOs, then D*p^2 ought to be significantly smaller than k^2.
Best regards,
Miles