Dear Yixuan,
It might indeed be possible, but it's not something we have experimented with very much yet. I believe Steve White tried it at one point a few years ago, but he found that, with that generation of GPU and the method he was trying, the time spent moving memory to and from the GPU was a significant fraction of the computation time, so there wasn't much gain. It is something we are planning to investigate in the future, though. In the end, the benefit might be very algorithm dependent.
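To give a sense of the issue Steve ran into, here is a rough standalone CUDA/cuBLAS timing sketch (my own illustration, not anything from our code; the matrix size is just a placeholder) that times the host-to-device copy against the matrix multiplication it feeds. For small enough matrices the copies can take as long as the compute itself:

// Minimal sketch: compare host<->device copy time to GEMM time.
// Error checking omitted for brevity; N is a hypothetical size.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int N = 512;                       // placeholder matrix dimension
    size_t bytes = (size_t)N * N * sizeof(double);
    double *hA = (double*)malloc(bytes), *hC = (double*)malloc(bytes);
    for (int i = 0; i < N * N; ++i) hA[i] = 1.0;

    double *dA, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dC, bytes);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const double one = 1.0, zero = 0.0;

    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventCreate(&t2); cudaEventCreate(&t3);

    cudaEventRecord(t0);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);  // host -> device
    cudaEventRecord(t1);
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                &one, dA, N, dA, N, &zero, dC, N);      // the actual compute
    cudaEventRecord(t2);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);  // device -> host
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);

    float tIn, tGemm, tOut;
    cudaEventElapsedTime(&tIn, t0, t1);
    cudaEventElapsedTime(&tGemm, t1, t2);
    cudaEventElapsedTime(&tOut, t2, t3);
    printf("copy in: %.3f ms, gemm: %.3f ms, copy out: %.3f ms\n",
           tIn, tGemm, tOut);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dC);
    free(hA); free(hC);
    return 0;
}

Of course, a real implementation would try to keep tensors resident on the GPU across many operations rather than copying back and forth each time, which is part of why the answer is so algorithm dependent.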
Of course you are welcome to open up the code yourself and try to add GPU support to various parts. If you try that and have questions about the internals of the code, I'd be happy to answer them.
Miles