USB Device DMA?

Has anyone been able to get USB DMA working?

I am writing a low-level USB driver for ChibiOS/RT. I can get basic USB working by loading the FIFO from the CPU (in software), and now I want to implement the DMA path.

The issue is this: I can see the DMA loading the FIFO, but the packet is not sent until I manually clear TXINI/FIFOCON, just as I do in the software path.
If that's the case, what are END_B_EN and the AUTOSW bit in EP CFG for?
I thought the DMA controller would simply send the whole buffer out on its own, splitting it into as many packets as needed. But I couldn't get it to behave that way.
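To make that expectation concrete, here is a minimal C sketch of the splitting I assumed the hardware would do. It only models the expected behaviour and is not the documented behaviour of the USB controller or its DMA. `split_into_packets` and `send_zlp` are hypothetical names, and appending a zero-length packet when the length is an exact multiple of the packet size is my assumption about how the transfer would be terminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: models how a transfer of `len` bytes would be split
 * into packets of at most `max_packet` bytes. Not a hardware API.
 *
 * Writes up to `max_out` packet sizes into `sizes` and returns the total
 * number of packets the transfer needs (which may exceed `max_out`).
 */
size_t split_into_packets(size_t len, size_t max_packet, bool send_zlp,
                          size_t *sizes, size_t max_out)
{
    assert(max_packet > 0);
    size_t n = 0;
    size_t remaining = len;

    /* do/while so that a zero-length transfer still produces one (empty) packet */
    do {
        size_t chunk = remaining < max_packet ? remaining : max_packet;
        if (n < max_out)
            sizes[n] = chunk;
        n++;
        remaining -= chunk;
    } while (remaining > 0);

    /* Assumption: a full-size last packet does not end the transfer by
     * itself, so optionally append a zero-length packet. */
    if (send_zlp && len > 0 && len % max_packet == 0) {
        if (n < max_out)
            sizes[n] = 0;
        n++;
    }
    return n;
}
```

For example, a 100-byte buffer with a 64-byte endpoint would go out as packets of 64 and 36 bytes, with no manual TXINI/FIFOCON handling in between.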

Is there something that I might have missed?