Thank you for the comment about trying things on known states! Somehow I have not thought of that. I tried on small clusters and by inspection of amplitudes I found the problem.
The inversion of my state is not just swapping of tensors, as one has to also take care of the fermionic minus sign.
Representing psi as a string of creation operators acting on the vacuum state, the inversion corresponds to changing their site index i --> N-i, AND commuting the string back into canonical order. The fermionic minus appears when commuting i past j, but only when both i and j are singly occupied.
Now if we define O_i as a diagonal operator which gives 1 for single occupation and 0 elsewhere, the minus sign structure of a permutation of i by j is given by a two level operator F(i, j) = 1 - 2 O_i O_j.
Parity inverting my MPS thus means first applying a string of F(1, 2) F(1, 3) ... F(1, N) (to commute level 1 to the end), then F(2, 3) ... F(2, N) and so on untill F(N-1, N). This will effectively multiply the correct amplitudes by -1.
The next step is to actually swap the tensors and the site indices, as discussed previously.
This approach works, and is not terribly slow or costly of memory (I think).
But as I implemented it currently, each F(i, j) consists of applying two single site operators (which is fine) and then summing up two MPS to get 1 - 2*O_i O_j. I don't think it is possible to write the expression as a product of two single-site operators.
So my question is whether you know of a way to improve this procedure? Do you think using autoMPO to obtain an MPO representation of the string of F(i, j) would be quicker? I haven't tried writing it as an MPO by hand yet, but that might be possible too.