The problem is that I need to control a complex device with at least 2 servo motors and a few sensors, so I can't just send 2 bytes. I need to send one byte as an identifier, followed by the 2 data bytes.
So do that. What I meant was that if you need to send a value like 589 to the Arduino, you can send it either as two bytes or as a string ('5', '8', '9'). Performing two reads, a left shift, an add and a store is going to be a lot faster than reading 3 bytes and then converting the 3 characters to a number.
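To make the comparison concrete, here is a minimal sketch in plain C++ (the language Arduino sketches are written in). It assumes the high byte is sent first; the function names are made up for illustration. 589 is 0x024D, so it travels as the two bytes 0x02 and 0x4D, and the receiver rebuilds it with one shift and one add. The text version has to loop over every digit instead.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical helper (receiver side): rebuild a 16-bit value from
// two received bytes, assuming the high byte arrives first.
uint16_t decodeBinary(uint8_t high, uint8_t low) {
    return static_cast<uint16_t>((high << 8) + low);  // one shift, one add
}

// Hypothetical helper (sender side): split a 16-bit value into two bytes.
void encodeBinary(uint16_t value, uint8_t& high, uint8_t& low) {
    high = static_cast<uint8_t>(value >> 8);
    low = static_cast<uint8_t>(value & 0xFF);
}

// The string alternative: every ASCII digit must be converted and
// accumulated, so the work grows with the number of characters.
int decodeAscii(const char* digits, size_t count) {
    int value = 0;
    for (size_t i = 0; i < count; ++i) {
        value = value * 10 + (digits[i] - '0');
    }
    return value;
}
```

Both decoders give back 589, but the binary one does a fixed amount of work and always needs exactly two bytes, while the string one needs a byte per digit and some way to know where the number ends.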
Of course you still need to send an ID, and probably start and end markers.
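One way to combine the ID with start and end markers is a fixed-length frame. The sketch below assumes a layout of START | ID | HIGH | LOW | END with 0xFE and 0xFF as the markers. The layout, the marker values and the class name are all illustrative assumptions, not a standard. Because the frame length is fixed, a data byte that happens to equal a marker value is still read correctly; the end-marker check only catches lost or shifted bytes.

```cpp
#include <cstdint>

// Assumed frame layout (hypothetical): START | ID | HIGH | LOW | END
const uint8_t FRAME_START = 0xFE;
const uint8_t FRAME_END = 0xFF;

struct Packet {
    uint8_t id;      // which servo or sensor the value is for
    uint16_t value;  // rebuilt from the HIGH and LOW bytes
};

// Hypothetical incremental parser: feed it one byte at a time as the
// bytes arrive. It never waits, so no delay() is needed anywhere.
class FrameParser {
public:
    // Returns true when a complete frame with a valid end marker has
    // been received, and stores the result in `out`.
    bool feed(uint8_t b, Packet& out) {
        if (pos_ == 0) {
            if (b == FRAME_START) pos_ = 1;  // ignore bytes until a start marker
            return false;
        }
        if (pos_ < 4) {                      // positions 1..3 hold ID, HIGH, LOW
            buf_[pos_ - 1] = b;
            ++pos_;
            return false;
        }
        // pos_ == 4: this byte must be the end marker.
        pos_ = 0;
        if (b != FRAME_END) {
            // Corrupt frame: drop it. If this byte is a start marker,
            // treat it as the beginning of the next frame.
            if (b == FRAME_START) pos_ = 1;
            return false;
        }
        out.id = buf_[0];
        out.value = static_cast<uint16_t>((buf_[1] << 8) + buf_[2]);
        return true;
    }

private:
    uint8_t buf_[3] = {0, 0, 0};
    int pos_ = 0;
};
```

On the Arduino side you would typically call feed() inside a `while (Serial.available() > 0)` loop, passing each `Serial.read()` result cast to a byte, and act on the packet whenever feed() returns true.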
Sending data when the Processing application and the Arduino are both ready should not require a delay() on either side. I don't know where the Processing or Arduino code in the SerialCallResponse example uses delay(), but any delay() calls there should not be necessary.
The code I have for Processing and Arduino exchanging data has no delay()s.