
Topic: Vidor GFX rasterizer

VStrakh

Hi.

I've been playing with the VidorBitstream repository to learn the MKR Vidor 4000 board.

I've noticed that the Adafruit GFX library prefers vertical lines, which is generally considered bad for SDRAM.
But when I replaced the vertical lines with horizontal ones, the difference wasn't that significant: maybe a ~8% boost. Most of the time goes to the software that calculates new coordinates and does a lot of fairly generic work. And the '_lite' version was so embarrassingly slow that it wasn't worth measuring.
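To make the SDRAM argument concrete, here is a small sketch of where consecutive pixels land in a 640x480, 16-bit linear framebuffer. The 512-byte SDRAM row (SDRAM_ROW_BYTES) and the purely linear address-to-row mapping are assumptions for illustration; the real bank/row mapping is decided by the SDRAM controller. Under those assumptions a horizontal run stays inside one or two rows, while each step of a vertical run jumps a whole scanline (1280 bytes) and opens a new row.

```c
#include <stdint.h>

/* Assumed geometry, for illustration only. */
#define FB_WIDTH        640u
#define BYTES_PER_PIXEL 2u     /* 16-bit pixels, as in the core */
#define SDRAM_ROW_BYTES 512u   /* hypothetical row size; controller-dependent */

/* Byte address of pixel (x, y) in a linear framebuffer starting at base. */
static uint32_t pixel_addr(uint32_t base, uint32_t x, uint32_t y)
{
    return base + (y * FB_WIDTH + x) * BYTES_PER_PIXEL;
}

/* Count how often a run of n pixels, stepping (dx, dy) per pixel, moves to a
   different SDRAM row (each move costs a precharge + activate). */
static unsigned row_switches(uint32_t base, int x, int y,
                             int dx, int dy, unsigned n)
{
    unsigned switches = 0;
    uint32_t prev_row = pixel_addr(base, (uint32_t)x, (uint32_t)y) / SDRAM_ROW_BYTES;

    for (unsigned i = 1; i < n; i++) {
        x += dx;
        y += dy;
        uint32_t row = pixel_addr(base, (uint32_t)x, (uint32_t)y) / SDRAM_ROW_BYTES;
        if (row != prev_row)
            switches++;
        prev_row = row;
    }
    return switches;
}
```

With these assumptions, a 100-pixel horizontal line starting at (0, 0) never leaves row 0, while a 100-pixel vertical line changes row on every pixel (99 switches). As the post notes, in practice this mattered less than the software overhead per pixel.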

So I've spent a few evenings making a hardware line rasterizer that can draw filled rectangles and arbitrary lines using Bresenham's algorithm. Now I'd like to show some results.
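For readers unfamiliar with it, Bresenham's algorithm is what makes a one-pixel-per-clock line engine practical: each step needs only additions and comparisons on integers. Below is a generic C sketch of the textbook all-octant variant, not the Verilog of this core; the `point` type and the output buffer are illustrative choices.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct { int x, y; } point;

/* All-octant integer Bresenham from (x0, y0) to (x1, y1), endpoints included.
   Stores up to max points in out and returns the total pixel count, which is
   max(|dx|, |dy|) + 1. One loop iteration corresponds to one output pixel. */
static size_t bresenham_line(int x0, int y0, int x1, int y1,
                             point *out, size_t max)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;  /* combined error term */
    size_t n = 0;

    for (;;) {
        if (n < max) {
            out[n].x = x0;
            out[n].y = y0;
        }
        n++;
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }  /* step in y */
    }
    return n;
}
```

For example, the line from (0, 0) to (2, 1) produces (0, 0), (1, 1), (2, 1). A filled rectangle is the degenerate case: nested x/y counters with no error term at all.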

My IP core takes ~500 LEs, needs 4 clocks for argument setup, and then can output a new pixel every clock. It runs in parallel with the Nios, and stalls the CPU with the 'waitrequest' line when the CPU attempts to start the next primitive before the current one is done.
The core clips pixels against the screen borders, so it's fine to have primitives partially offscreen. But clipping happens only at the output, so the core still spends clocks on invisible pixels.
The starting address of the video RAM can be changed; the width/height are fixed as parameters and can't be changed at run time.
TimeQuest estimates the core can run at up to 189 MHz under the "Slow 85C" model when optimized for speed, or about 177 MHz with the balanced (normal) flow.
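The cost of clipping only at the output can be modelled in a few lines. This sketch assumes one generated pixel per clock, the 4 setup clocks mentioned above, and no SDRAM stalls; `fill_stats` and `fill_rect_clip_at_output` are hypothetical names, and the function is a software model, not the core's logic.

```c
#include <stdint.h>

/* Counters for one primitive (a model, not measured data). */
typedef struct {
    unsigned visible;  /* pixels actually written */
    unsigned clocks;   /* clocks spent, assuming no SDRAM stalls */
} fill_stats;

/* Fill a w x h rectangle at (x, y) into an fb_w x fb_h 16-bit framebuffer.
   Every generated pixel costs one clock; clipping only decides whether it
   is written, so offscreen pixels are dropped but still paid for. */
static fill_stats fill_rect_clip_at_output(uint16_t *fb, int fb_w, int fb_h,
                                           int x, int y, int w, int h,
                                           uint16_t color)
{
    fill_stats s = { 0, 4 };  /* 4 argument-setup clocks, per the post */

    for (int j = 0; j < h; j++) {
        for (int i = 0; i < w; i++) {
            int px = x + i, py = y + j;
            s.clocks++;  /* one pixel generated per clock */
            if (px >= 0 && px < fb_w && py >= 0 && py < fb_h) {
                fb[py * fb_w + px] = color;
                s.visible++;
            }
        }
    }
    return s;
}
```

A 4x4 rectangle at (-2, -2) on an 8x8 screen writes only 4 pixels but costs 4 + 16 = 20 clocks. Intersecting the rectangle with the screen before rasterizing would bring that to 4 + 4 = 8 clocks, at the price of extra comparators in the setup stage.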

Measuring performance, I see that on average it takes around 1 million clocks to fill the 640x480 screen, so I get only about 1 pixel per 3 clocks. I guess that's related to the way SDRAM throughput is shared with the HDMI output and the camera mixing logic.
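The gap between the ideal and measured numbers is easy to check. The sketch below uses only the figures from the post (4 setup clocks, one pixel per clock, ~1,000,000 measured clocks for 640x480) and treats the full-screen fill as a single rectangle primitive, which is an assumption about how the test was run.

```c
#include <stdint.h>

/* Ideal cost of one primitive: 4 setup clocks, then one pixel per clock. */
static uint32_t ideal_clocks(uint32_t pixels)
{
    return 4u + pixels;
}

/* Observed average clocks per pixel for a measured fill. */
static double clocks_per_pixel(uint32_t measured_clocks, uint32_t pixels)
{
    return (double)measured_clocks / (double)pixels;
}
```

For 640 x 480 = 307,200 pixels the ideal is 307,204 clocks, while 1,000,000 / 307,200 is about 3.26 clocks per pixel. In other words, on roughly 69% of clocks the core is not emitting a pixel, which is consistent with the core waiting on its share of SDRAM bandwidth.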

A few videos:
Original Nios /f firmware performance test
Same test on the free Nios /e, with the hardware rasterizer
Same test as before, but running from within the FPGA, without JTAG mailbox overhead

The precompiled FPGA image is here


Would anyone be interested in trying this lib? :)

Limba

Do you assert waitrequest on the Avalon bus when the Nios II reads the IP block's status to check whether it's ready?

I assume the IP also has an Avalon master interface to write to the arbiter.

Did you simulate the IP with the arbiter and the SDRAM controller?


VStrakh

Yes, the core's Avalon-MM master is connected to the arbiter's input.

I did simulate the core with the Nios (both /f and /e) and the SDRAM controller, but separately from the arbiter and the rest of the system. Simulating the whole Vidor system would take quite long, considering that the code runs directly from flash.

I figured it would be better not to stall the CPU immediately after drawing is initiated. This saves time by overlapping rasterization with JTAG mailbox exchanges, or with the CPU code preparing inputs for the next primitive. The core draws a one-pixel rectangle (the pix() function used by the circle drawing functions) faster than the Nios /e finishes the next instruction when running from on-chip RAM.

There's really no point in waiting for drawing to complete unless the next primitive must be started. The ready state is readable, but reading it won't wait for completion. If explicit synchronization is required, it can be done with any write to the core, like setting the color or the primitive type.
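The benefit of deferring the stall can be shown with a simple timing model. This is a sketch under stated assumptions, not a measurement: each primitive has some CPU preparation time (pure computation, no core writes) and a core drawing time of 4 setup clocks plus one clock per pixel, and all register writes for a primitive are collapsed into a single start write. The names `prim`, `total_blocking` and `total_deferred` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One primitive: CPU clocks spent preparing it, and core clocks to draw it
   (4 setup clocks + one per generated pixel). */
typedef struct {
    uint32_t cpu_prep;
    uint32_t core_clocks;
} prim;

/* Model A: the CPU stalls right after starting each primitive until it
   completes, so preparation and drawing never overlap. */
static uint64_t total_blocking(const prim *p, size_t n)
{
    uint64_t t = 0;
    for (size_t i = 0; i < n; i++)
        t += (uint64_t)p[i].cpu_prep + p[i].core_clocks;
    return t;
}

/* Model B (the design described above): the start write returns at once;
   the CPU only stalls (waitrequest) if it writes while the core is busy. */
static uint64_t total_deferred(const prim *p, size_t n)
{
    uint64_t cpu = 0;        /* time at which the CPU issues the next start */
    uint64_t core_idle = 0;  /* time at which the core finishes drawing */

    for (size_t i = 0; i < n; i++) {
        cpu += p[i].cpu_prep;   /* preparation overlaps the previous draw */
        if (cpu < core_idle)
            cpu = core_idle;    /* start write stalls until the core is idle */
        core_idle = cpu + p[i].core_clocks;
    }
    return core_idle;  /* time when all drawing is done */
}
```

With two primitives of 5 prep clocks and 10 draw clocks, blocking takes 30 clocks and deferred takes 25. For pix()-like primitives (5 draw clocks) behind 20 prep clocks, the deferred model never stalls, and only the last draw adds to the total.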

Limba

Maybe add a status register that halts the CPU: the CPU reads it before sending the next command, and if the IP is not ready, the read stalls the CPU.


VStrakh

Isn't the current way better?

You can read the busy flag from the status register without stalling, and if you didn't wait for that flag to clear, you can still safely stall with the next write to the core, be it the next primitive or something nonessential like the color.
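On the driver side, the pattern described above needs no wait loop at all. The thread does not give the core's register map, so every offset, bit and name below (`GFX_REG_*`, `GFX_STATUS_BUSY`, `pack_xy`, the 16-bit coordinate packing) is hypothetical; the sketch only illustrates a non-blocking status read plus writes that rely on waitrequest for back-pressure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register map (word offsets); not the core's real layout. */
enum {
    GFX_REG_STATUS = 0,  /* read: bit 0 = busy; reading never stalls */
    GFX_REG_COLOR  = 1,
    GFX_REG_XY0    = 2,  /* (y << 16) | x */
    GFX_REG_XY1    = 3,
    GFX_REG_CMD    = 4   /* write starts a primitive */
};
#define GFX_STATUS_BUSY 0x1u
#define GFX_CMD_LINE    0x1u

typedef struct {
    volatile uint32_t *regs;  /* base of the core's Avalon-MM slave */
} gfx_dev;

/* Non-blocking poll: the CPU can do other work while the core draws. */
static bool gfx_busy(const gfx_dev *d)
{
    return (d->regs[GFX_REG_STATUS] & GFX_STATUS_BUSY) != 0;
}

/* Pack signed coordinates as two 16-bit fields (offscreen values allowed). */
static uint32_t pack_xy(int x, int y)
{
    return ((uint32_t)(uint16_t)y << 16) | (uint16_t)x;
}

/* Each write may stall on waitrequest while the previous primitive is
   still drawing, so no explicit wait is needed before starting a new one. */
static void gfx_line(gfx_dev *d, int x0, int y0, int x1, int y1, uint16_t color)
{
    d->regs[GFX_REG_COLOR] = color;
    d->regs[GFX_REG_XY0] = pack_xy(x0, y0);
    d->regs[GFX_REG_XY1] = pack_xy(x1, y1);
    d->regs[GFX_REG_CMD] = GFX_CMD_LINE;
}

/* Explicit synchronization: any write stalls until the core is idle,
   so re-writing the current color is enough. */
static void gfx_sync(gfx_dev *d, uint16_t color)
{
    d->regs[GFX_REG_COLOR] = color;
}
```

On the real board the base pointer would come from the BSP's generated system.h (the macro name depends on the component instance name). On a host, a plain array can stand in for the registers, which exercises everything except the waitrequest stall itself.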

DarioPennisi

Hi Volodymyr,
Looks like you did a great job. Are you interested in submitting a pull request so that we can include your IP in the official repository?
We'll be happy to help you finalize integration and documentation, should you need any help.
Thanks,

Dario

VStrakh

Hi Dario.

I'd love to have the IP included in the repository :)

But I'll need some guidance.
The core now has Verilog and _hw.tcl files under its own ip/ subdirectory, but I had to change the GFX core's 'gfx.c' significantly, basically replacing the generic implementations of primitives (lines/pixels) with hardware calls. That may be OK, but maybe not. The core is fixed to 16-bit pixels, so patching 'gfx.c' for the hardware rasterizer likely breaks the functionality for 32-bit NeoPixels.

How should I proceed? Which materials should I prepare, and how should I share them properly?
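One possible way to keep the 32-bit NeoPixel path working is to keep the generic code as a fallback and pick the hardware path per target only when it applies. The actual structures in gfx.c are not shown in the thread, so `gfx_target`, `hw_fill_rect`, `sw_fill_rect` and `gfx_can_accelerate` are all hypothetical names; this is a sketch of the dispatch idea, not the library's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct gfx_target gfx_target;
typedef void (*fill_rect_fn)(gfx_target *t, int x, int y, int w, int h,
                             uint32_t color);

struct gfx_target {
    int bpp;                    /* 16 = HDMI framebuffer, 32 = NeoPixels */
    int width, height;
    uint32_t *pixels;           /* buffer used by the generic path */
    fill_rect_fn hw_fill_rect;  /* NULL when the rasterizer core is absent */
};

/* Generic per-pixel path, standing in for gfx.c's existing code. */
static void sw_fill_rect(gfx_target *t, int x, int y, int w, int h,
                         uint32_t color)
{
    for (int j = y; j < y + h; j++)
        for (int i = x; i < x + w; i++)
            if (i >= 0 && i < t->width && j >= 0 && j < t->height)
                t->pixels[j * t->width + i] = color;
}

/* The core is fixed to 16-bit pixels, so only accelerate those targets. */
static bool gfx_can_accelerate(const gfx_target *t)
{
    return t->hw_fill_rect != NULL && t->bpp == 16;
}

static void gfx_fill_rect(gfx_target *t, int x, int y, int w, int h,
                          uint32_t color)
{
    if (gfx_can_accelerate(t))
        t->hw_fill_rect(t, x, y, w, h, color);  /* hardware rasterizer */
    else
        sw_fill_rect(t, x, y, w, h, color);     /* unchanged generic path */
}
```

With this split, a 32-bit target always takes the generic path even if an accelerator pointer is set, and a 16-bit target without the core also falls back cleanly.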
