SPI Slave Arduino Pi Pico

Hello,
I am a bit stuck trying to implement an SPI slave on the Arduino Pi Pico.
I suspect the main problem is the SPI FIFO buffer.
I am trying to achieve a byte-by-byte ping-pong between master and slave.

I created a sketch with the logic in the receive and send callbacks:

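// Receive callback: keep a copy of the incoming bytes and their length
// so that sentCallback can echo them back.
void recvCallback(uint8_t *data, size_t len)
{
    memcpy((uint8_t *)recvBuff, data, len);
    gLen = len;
    sentBack = true;
    // memcpy((uint8_t *)sendBuff, recvBuff, len);
}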

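// Send callback: echo the last received bytes if the first one is non-zero,
// otherwise send DUMMY. The same call is repeated 8 times.
void sentCallback()
{
    for (size_t i = 0; i < 8; i++)
    {
        if (recvBuff[0] != 0)
        {
            SPISlave1.setData((uint8_t *)recvBuff, gLen);
        }
        else
        {
            SPISlave1.setData(DUMMY, sizeof(DUMMY));
        }
    }
}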

I expect that, from the very first transfer, whenever the slave receives a valid request byte (anything other than 0x00), it sends the proper response in the very next byte. Instead, for the first 8 transfers the slave sends back the wrong data.

What I see is that the first 8 bytes the master receives are DUMMY, and only after that does it receive the data it sent 8 bytes earlier. So there is a delay of 8 bytes between the data being sent and being echoed back.
Could someone advise what I am doing wrong or how to avoid this delay?

The main code of the master is below:

void printAndTransfer(const char* label, const uint8_t* sendBuff, uint8_t* recvBuff, size_t size) 
{
    Serial.print(label);
    for (size_t i = 0; i < size; i++)
    {
        Serial.printf("%02X", sendBuff[i]);
        Serial.print(" ");
    }

    Serial.println();
    SPI.transfer(sendBuff, recvBuff, size);
    
    Serial.print("M-RECV: ");
    for (size_t i = 0; i < size; i++) 
    {
        Serial.printf("%02X", recvBuff[i]);
        Serial.print(" ");
    }

    Serial.println();
}


void loop()
{
    SPI.beginTransaction(spisettings);

    printAndTransfer("AA_REQUEST: ", AA_REQUEST, recvBuff, sizeof(AA_REQUEST));
    printAndTransfer("DUMMY: ", DUMMY, recvBuff, sizeof(DUMMY));

    SPI.endTransaction();
    delay(1000);

    SPI.beginTransaction(spisettings);
    
    printAndTransfer("AB_REQUEST: ", AB_REQUEST, recvBuff, sizeof(AB_REQUEST));
    printAndTransfer("DUMMY: ", DUMMY, recvBuff, sizeof(DUMMY));

    SPI.endTransaction();

    delay(1000);

    SPI.beginTransaction(spisettings);
    
    printAndTransfer("AC_REQUEST: ", AC_REQUEST, recvBuff, sizeof(AC_REQUEST));
    printAndTransfer("DUMMY: ", DUMMY, recvBuff, sizeof(DUMMY));

    SPI.endTransaction();

    delay(1000);

    SPI.beginTransaction(spisettings);

    printAndTransfer("AD_REQUEST: ", AD_REQUEST, recvBuff, sizeof(AD_REQUEST));
    printAndTransfer("DUMMY: ", DUMMY, recvBuff, sizeof(DUMMY));
    
    SPI.endTransaction();

    delay(1000);

    SPI.beginTransaction(spisettings);

    printAndTransfer("AE_REQUEST: ", AE_REQUEST, recvBuff, sizeof(AE_REQUEST));
    printAndTransfer("DUMMY: ", DUMMY, recvBuff, sizeof(DUMMY));
    
    SPI.endTransaction();

    delay(1000);

    SPI.beginTransaction(spisettings);

    printAndTransfer("AF_REQUEST: ", AF_REQUEST, recvBuff, sizeof(AF_REQUEST));
    printAndTransfer("DUMMY: ", DUMMY, recvBuff, sizeof(DUMMY));
    
    SPI.endTransaction();

    transmits++;
    delay(8000);
}

And the output I got on the master side:

AA_REQUEST: AA
M-RECV: EE
DUMMY: DB
M-RECV: EE
AB_REQUEST: AB
M-RECV: EE
DUMMY: DB
M-RECV: EE
AC_REQUEST: AC
M-RECV: EE
DUMMY: DB
M-RECV: EE
AD_REQUEST: AD
M-RECV: EE
DUMMY: DB
M-RECV: EE
AE_REQUEST: AE
M-RECV: EE
DUMMY: DB
M-RECV: DB
AF_REQUEST: AF
M-RECV: AB
DUMMY: DB
M-RECV: DB
AA_REQUEST: AA
M-RECV: AC
DUMMY: DB
M-RECV: DB
AB_REQUEST: AB
M-RECV: AD
DUMMY: DB
M-RECV: DB
AC_REQUEST: AC
M-RECV: AE
DUMMY: DB
M-RECV: DB
AD_REQUEST: AD
M-RECV: AF
DUMMY: DB
M-RECV: DB
AE_REQUEST: AE
M-RECV: AA
DUMMY: DB
M-RECV: DB
AF_REQUEST: AF
M-RECV: AB
DUMMY: DB
M-RECV: DB

And the output from the slave side:

Len = 1
AA
Len = 1
DB
Len = 1
AB
Len = 1
DB
Len = 1
AC
Len = 1
DB
Len = 1
AD
Len = 1
DB
Len = 1
AE
Len = 1
DB
Len = 1
AF
Len = 1
DB

So the first byte received on the slave should trigger a non-dummy reply. But that reply does not arrive in the second byte. It arrives 8 bytes later, which I believe is related to the SPI FIFO buffer of the RP2040.
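
To illustrate the suspected effect, here is a minimal host-side model in plain C++. It is not the arduino-pico API and does not try to reproduce every line of the log above. It assumes the slave's TX FIFO is 8 entries deep (as on the RP2040 SPI block), starts out full of a dummy byte, and gets one echoed byte queued per transfer. The names simulateEcho and kFifoDepth are made up for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Assumed TX FIFO depth (the RP2040 SPI block has 8-entry FIFOs).
constexpr std::size_t kFifoDepth = 8;

// Hypothetical model: returns the bytes the master would read back when the
// slave's TX FIFO starts with `prefill` copies of `dummy` and every received
// byte is queued behind whatever is already waiting in the FIFO.
std::vector<uint8_t> simulateEcho(const std::vector<uint8_t>& masterTx,
                                  uint8_t dummy,
                                  std::size_t prefill = kFifoDepth)
{
    std::deque<uint8_t> txFifo(prefill, dummy);
    std::vector<uint8_t> masterRx;
    for (uint8_t sent : masterTx) {
        // The byte shifted out during this transfer is the oldest queued byte.
        // With an empty FIFO, model the slave as sending the dummy value.
        uint8_t out = dummy;
        if (!txFifo.empty()) {
            out = txFifo.front();
            txFifo.pop_front();
        }
        masterRx.push_back(out);
        // The echo of the byte just received goes to the back of the queue.
        txFifo.push_back(sent);
    }
    return masterRx;
}
```

In this model, with prefill = 8 the echo of the first byte only shows up in the ninth transfer, while with prefill = 0 it shows up in the second. If the real hardware behaves similarly, the delay would come from the bytes already queued ahead of the reply rather than from the callbacks themselves.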

What is the master SPI device?
Is there any particular reason to use SPI? UART serial would be simpler.

At the current debug stage I am using two Arduino Pi Pico boards.
In the final design, however, the slave board is supposed to work with an STM32 as the SPI master.
The device will have to emulate an SPI peripheral by sending the proper data from an SD card.
The slave will have to respond to a couple of requests from the master. The master may actually send more, but I only have to react to the matching requests.
The first transaction consists of a 1-byte request from the master, and the next dummy byte pushed by the master should return the expected value.
If the master gets the expected response, it starts another transaction with the first request byte, to which the slave should respond with 60 bytes. These 60 bytes must be sent 1 byte per transfer, because this is how the actual master clocks the data.
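
The exchange described above could be sketched as a small state machine, shown here as plain host-side C++ rather than Pico code. Everything named in it is an assumption for illustration: the Responder class, the request byte 0xAA (borrowed from the test log), the "expected value" 0x5A, and the idle byte 0xEE are placeholders, not values from the real protocol. It also assumes the reply can be presented in the very next transfer, which is exactly what the FIFO delay currently prevents.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Placeholder values, not taken from the real protocol.
constexpr uint8_t kRequest = 0xAA;        // request byte (borrowed from the log)
constexpr uint8_t kAck = 0x5A;            // hypothetical "expected value"
constexpr uint8_t kIdle = 0xEE;           // sent when nothing is pending
constexpr std::size_t kPayloadLen = 60;   // payload size from the description

class Responder {
public:
    explicit Responder(const std::array<uint8_t, kPayloadLen>& payload)
        : payload_(payload) {}

    // Takes the byte just received from the master and returns the byte the
    // slave should present in the NEXT transfer.
    uint8_t onByte(uint8_t rx)
    {
        if (remaining_ > 0) {
            // Payload in progress: hand out the next byte, one per transfer.
            --remaining_;
            return payload_[kPayloadLen - 1 - remaining_];
        }
        if (rx != kRequest) {
            return kIdle;                 // not a matching request: ignore it
        }
        if (!acked_) {
            acked_ = true;                // first matching request: acknowledge
            return kAck;
        }
        acked_ = false;                   // second matching request: start payload
        remaining_ = kPayloadLen - 1;
        return payload_[0];
    }

private:
    std::array<uint8_t, kPayloadLen> payload_;
    std::size_t remaining_ = 0;           // payload bytes still to be handed out
    bool acked_ = false;                  // true once the first request was answered
};
```

On the real slave, the return value of onByte would be what gets queued for the master's next clocked byte. Whether that is achievable depends on how many bytes are already waiting in the TX FIFO.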

So the problem is that I cannot respond to the master in the very next byte because of a delay which, from my perspective, is caused by the SPI FIFO buffer.