Hi Zhiyu,
This is a good question, and sorry for such a slow reply. The ITensor codes such as the parallel DMRG code are provided "as is", and I don't support them to quite the same extent as the main ITensor library.
But I am happy to suggest things you can try to make them do what you want.
Regarding your first question: not having all of the most up-to-date MPS tensors available on every node is part of the design of the parallel code. Keeping every tensor up to date on every node would incur far too much communication overhead. However, when you want to write all of the tensors to disk together, you can use the MPI tools in the file util/parallel.h to send each MPS tensor to one particular node (say, node 0), collect them there into a single MPS, and write that MPS to disk.
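To make the gather step a bit more concrete, here is a serial C++ sketch that only simulates it: each "node" is just an entry in a std::vector, the tensors are stand-in strings rather than ITensors, and no MPI calls are made. The names gatherToRoot and LocalTensors are hypothetical and are not part of ITensor or util/parallel.h. In a real run, each node other than 0 would send its (site, tensor) pairs to node 0 with the MPI tools, and node 0 would assemble and write the MPS.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for an MPS site tensor; in real code this would be an ITensor.
using Tensor = std::string;

// Tensors a node holds up-to-date copies of: site index (1-based) -> tensor.
using LocalTensors = std::map<int, Tensor>;

// Serial simulation of gathering MPS tensors onto node 0.
// perNode[k] holds what node k would send; no communication happens here.
// Returns a vector of length N+1 (entry 0 unused), one tensor per site.
std::vector<Tensor> gatherToRoot(const std::vector<LocalTensors>& perNode, int N) {
    assert(N >= 0 && "number of sites must be non-negative");
    std::vector<Tensor> mps(N + 1);
    std::vector<bool> filled(N + 1, false);
    for (const auto& node : perNode) {
        for (const auto& [site, tensor] : node) {
            if (site < 1 || site > N) throw std::out_of_range("site index out of range");
            // Each site should come from exactly one node, the one that owns the
            // up-to-date tensor; a second copy is likely stale.
            if (filled[site]) throw std::runtime_error("site sent by more than one node");
            mps[site] = tensor;
            filled[site] = true;
        }
    }
    // Refuse to assemble (and write) an MPS with a missing site.
    for (int j = 1; j <= N; ++j) {
        if (!filled[j]) throw std::runtime_error("no node sent a tensor for some site");
    }
    return mps;
}
```

Checking that every site arrives exactly once is a cheap guard against writing an MPS that is missing a site or that mixes a stale copy with the up-to-date one.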
As for your second question, I would have expected write-to-disk to work with parallel DMRG, although I'm not completely surprised that it doesn't, since it is such a complicated algorithm. As you observed, the write-to-disk feature uses randomized folder names, which should prevent bugs caused by name collisions, so the bug apparently has some other cause. Could you please file a bug report on the GitHub repo for parallel DMRG? I will likely be using this code for a project myself soon and may need the write-to-disk feature. In the meantime, I'd invite you to look at how write-to-disk is handled in the "LocalMPO" class (itensor/mps/localmpo.h) and see whether some part of it is not functioning properly within the parallel DMRG algorithm.
Best regards,
Miles