As I said before, this would probably be fast if implemented as a shader, but I haven't seen any shader implementations of it anywhere. Do they exist? There's no point doing it in software, since transferring data between VRAM and system RAM for CPU drawing is really slow.
Googling for 2xSai, shader and dx9 returned this very thread.
Anyway, if you omit dx9, you'll find some shader attempts on Pete's board, ngemu and dosbox sites. All require PS3.0, I am afraid, but I haven't checked thoroughly. Here is one; some might be PS2.0: http://www.si-gamer.net/gulikoza/dosbox.html