"UTFT_demo_480x320" is useful for people familiar with UTFT on a Mega2560.
The "graphicstest" style of example is better for time comparisons.
I don't possess a "Waveshare abortion" but can tell you what a native 320x480 SPI display does, e.g.
ST7796S with SPI_FREQUENCY = 40MHz : UTFT_demo_480x320 = 4.761 sec
ST7796S with SPI_FREQUENCY = 20MHz : UTFT_demo_480x320 = 5.617 sec
ST7796S with SPI_FREQUENCY = 8MHz : UTFT_demo_480x320 = 10.714 sec
The Waveshare is limited to 20MHz, so I would expect about 6 seconds.
ST7796S with SPI_FREQUENCY = 20MHz : TFT_graphicstest_one_lib
TFT_eSPI library test!
Benchmark Time (microseconds)
Screen fill 623418
Text 98927
Lines 1265262
Horiz/Vert Lines 65582
Rectangles (outline) 40847
Rectangles (filled) 1509637
Circles (filled) 501178
Circles (outline) 452745
Triangles (outline) 255603
Triangles (filled) 750434
Rounded rects (outline) 170257
Rounded rects (filled) 1645034
Done!
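As a rough sanity check on the "Screen fill" figure, the raw SPI arithmetic can be sketched as below. This is only a lower bound under stated assumptions: 16 bits per pixel on the wire and no command or CPU overhead. The name minFillTimeUs is made up for illustration and is not part of TFT_eSPI.

```cpp
#include <cstdint>

// Assumption: pixels are sent as 16-bit colour (RGB565) over SPI.
const uint32_t kBitsPerPixel = 16;

// Lower bound, in microseconds, on the time to clock one full frame
// out over SPI. Ignores command bytes, chip-select handling and CPU time.
uint64_t minFillTimeUs(uint32_t width, uint32_t height, uint32_t spiHz) {
    const uint64_t bits = static_cast<uint64_t>(width) * height * kBitsPerPixel;
    return (bits * 1000000ULL) / spiHz;
}
```

For 480x320 at 20MHz this gives 122880 microseconds per fill, so the measured 623418 is roughly five fills' worth. That would fit a test that fills the screen several times, but the fill count is an assumption about the sketch and has not been checked here.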
ST7796S with SPI_FREQUENCY = 20MHz : Viewport_graphicstest (240x320)
TFT_eSPI library test!
Benchmark Time (microseconds)
Screen fill 267910
Text 98632
Lines 548326
Horiz/Vert Lines 31082
Rectangles (outline) 22191
Rectangles (filled) 496043
Circles (filled) 215081
Circles (outline) 196965
Triangles (outline) 121334
Triangles (filled) 283955
Rounded rects (outline) 84376
Rounded rects (filled) 558808
Done!
As I said earlier, run all of the TFT_eSPI library examples. They will show you the capabilities (and limitations) of this library on your hardware.
Those examples don't use DMA. Appropriate use of DMA can make a dramatic difference to performance, especially when there is heavy computation like JPEG or GIF decoding.
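Why DMA helps most when there is heavy computation can be shown with a simplified timing model. This is a sketch, not a measurement: it assumes the image is decoded in equal blocks, that two buffers are available, and that a DMA transfer runs fully in parallel with the CPU. The function names and all numbers are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Blocking transfers: the CPU decodes a block, then waits while it is sent.
uint64_t blockingTimeUs(uint32_t blocks, uint32_t decodeUs, uint32_t sendUs) {
    return static_cast<uint64_t>(blocks) * (static_cast<uint64_t>(decodeUs) + sendUs);
}

// DMA with two buffers: one block is sent while the next one is decoded.
// Only the first decode and the last send are not overlapped, so each
// step in between costs whichever of the two is slower.
uint64_t overlappedTimeUs(uint32_t blocks, uint32_t decodeUs, uint32_t sendUs) {
    if (blocks == 0) return 0;
    const uint64_t slower = std::max(decodeUs, sendUs);
    return decodeUs + static_cast<uint64_t>(blocks - 1) * slower + sendUs;
}
```

With made-up figures of 600 blocks, 300 microseconds to decode and 200 microseconds to send each block, the blocking version takes 300000 microseconds and the overlapped version 180200. When there is almost no computation per block the two numbers converge, which is why simple fill and line benchmarks gain little from DMA.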
David.