An incomplete RA8806 library for a 320*240 LCD

Hi, I've been using Arduino to drive several LCDs over the past few months:
text-mode 2*16 and 4*20 LCDs,
a 128*64 GLCD with the ks0108 controller chip, and
a 190*64 GLCD with the st7920 controller chip.
Thank God these kinds of LCM already have libraries in the Playground.
Now here comes the point:
I also bought a huge 320*240 GLCD that uses the RA8806 controller chip.
http://goods.ruten.com.tw/item/show?21008252760542

Unfortunately, there's no RA8806 library on the internet, including the Playground.
So I decided to write one myself.
There is, however, an RA8806 library for the basic 8051 microcontroller on a Chinese website.
download: http://www.megaupload.com/?d=QBX7RSUA

I then spent several days studying the RA8806 datasheet, using the 8051 RA8806 library above as a reference, and implemented a simple Arduino library to drive an LCD with the RA8806 controller chip.

RA8806 library and a sample sketch that draws lines on the screen
download: http://www.megaupload.com/?d=6GW2KMI3
RA8806 datasheet
download: http://www.megaupload.com/?d=5GJ032NR

But!!!
It works, but there are some problems.

1. Speed
The command/data read/write procedure I created seems too slow.
I wrote a draw-point function and call it repeatedly to draw 4 lines around the border of the 320*240 LCD.
This takes about 1 second (1085 milliseconds, measured with millis() and printed over Serial).
Any ideas for speeding up the library?
Too much time seems to be spent in digitalWrite and digitalRead; maybe there is a more optimized way to access the pins.
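
On AVR-based Arduinos, digitalWrite() looks up the pin on every call, so the usual way to speed this up is to write the port registers directly. Below is a minimal sketch of that idea. The wiring is not given above, so it assumes (hypothetically) that the 8 data lines sit on PORTD (pins 0-7) and the WR strobe on PB0 (pin 8). busInit and writeBus are illustrative names, not functions from the library. On non-AVR builds the registers are replaced by plain variables so the logic can be checked on a PC.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>   // real PORTx / DDRx registers
#else
// Host stand-ins so the logic can be compiled and checked off-target.
static volatile uint8_t PORTD = 0, DDRD = 0, PORTB = 0, DDRB = 0;
#endif

// ASSUMED wiring (not from the original post): D0..D7 -> PORTD, WR -> PB0.
const uint8_t WR_MASK = 0x01;

// Configure the bus pins as outputs and park WR high (inactive).
void busInit() {
  DDRD = 0xFF;
  DDRB = (uint8_t)(DDRB | WR_MASK);
  PORTB = (uint8_t)(PORTB | WR_MASK);
}

// Put one byte on the data bus and pulse WR low, then high again.
void writeBus(uint8_t value) {
  PORTD = value;                        // all 8 data pins in a single write
  PORTB = (uint8_t)(PORTB & ~WR_MASK);  // WR low: start of the write strobe
  PORTB = (uint8_t)(PORTB | WR_MASK);   // WR high: end of the write strobe
}
```

Two caveats: on an Uno, PORTD includes pins 0 and 1, which Serial also uses, so a real layout may have to split the bus across two ports. The strobe here is only a few CPU cycles wide, so check the write timing in the RA8806 datasheet and add a short delay if it needs one.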

2. Extending the draw functions
Since I can draw a point,
how can I build a draw-line function on top of the draw-point function to draw a sloped line?
a
  .
    .
      .
        .
          .
            .
              b
So far I can only draw straight horizontal and vertical lines, like the one below:
assume that a.y = b.y
a.......................b

pseudocode:

for (int i = a.x; i <= b.x; i++)   // <= so the end point b is drawn too
    draw_point(i, a.y);
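
For a sloped line, the usual answer is Bresenham's line algorithm, which walks from a to b using only integer additions and comparisons. Below is a minimal sketch built on nothing but a draw-point routine. Here draw_point is a stand-in that records pixels in a small buffer so the example can run without an LCD, and draw_line is an illustrative name. In a real sketch, replace the body of draw_point with the library's own point function.

```cpp
#include <stdlib.h>   // abs

// Stand-in for the library's point routine (hypothetical): records the
// plotted pixels in a buffer so the example can be checked without an LCD.
const int MAX_POINTS = 64;
int plotted_x[MAX_POINTS];
int plotted_y[MAX_POINTS];
int plotted_count = 0;

void draw_point(int x, int y) {
  if (plotted_count < MAX_POINTS) {
    plotted_x[plotted_count] = x;
    plotted_y[plotted_count] = y;
    plotted_count++;
  }
}

// Bresenham's line algorithm, integer form that handles every direction.
void draw_line(int x0, int y0, int x1, int y1) {
  int dx = abs(x1 - x0);
  int dy = -abs(y1 - y0);
  int sx = (x0 < x1) ? 1 : -1;   // step direction in x
  int sy = (y0 < y1) ? 1 : -1;   // step direction in y
  int err = dx + dy;             // running error term

  while (true) {
    draw_point(x0, y0);
    if (x0 == x1 && y0 == y1) break;          // reached the end point
    int e2 = 2 * err;
    if (e2 >= dy) { err += dy; x0 += sx; }    // step in x
    if (e2 <= dx) { err += dx; y0 += sy; }    // step in y
  }
}
```

Calling draw_line(a.x, a.y, b.x, b.y) covers horizontal, vertical and sloped lines alike, including both end points. Each pixel still costs one draw_point call, so line speed depends directly on how fast a single point can be written.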