Bit banged SPI as a slave? Must bit bang as I can't make use of usual SPI pins

I'm in a situation with a custom PCB, already fabricated, which has traces in place to route signals between an Arduino (as a slave device) and a non-Arduino master. As the design already exists and I hadn't thought of using SPI earlier on, these traces are, by bad luck, all attached to pins other than digital pins 10, 11, 12 and 13* of the Arduino (an Uno, in the form of an ATmega328P). Also, I use 10, 11 and 13 for their PWM function, so I couldn't spare them for SPI anyway. (*I've quoted those pins in Arduino pin numbering; on the board, the traces do not go to the DIL chip pins which correspond to those Arduino pins.)

But I know that SPI can be bit banged. According to Gammon's site, it can reach about 52 µs per byte with an example library he wrote (the faster version, at Gammon Forum: Electronics: Microprocessors: "SPI - Serial Peripheral Interface - for Arduino"). This library of his can do SPI on any four pins. I am not worried by this loss of speed compared to what the hardware SPI interface can do, though I would rather not get much slower than this. But I think his library is only appropriate for Arduinos acting as SPI masters, not as slaves. Is there any way to use this kind of implementation to have an Arduino act as an SPI slave to another device, bit banging SPI on four pins of my choice rather than requiring pins 10, 11, 12 and 13 for the ATmega328P's hardware interface?

Thanks

There are several different implementations of software SPI, so if the library you have does not support the desired slave mode, try another.

Forget libraries if you need top speed. Even with highly optimized code you probably won't reach more than about 100 kHz SPI speed. Do you have full control over the other MCU, so that you can limit the SPI frequency to that value? And keep in mind that the ATmega won't do anything else while it communicates using a bit-banged SPI slave interface, as it will be busy handling all the interrupts and the bits going out and arriving.

Entering an interrupt handler needs about 3.5µs on a 16MHz AVR, the exit from that handler needs another 2µs, so you need 5.5µs without having handled any information yet. Getting the bit handling and the port reads/writes into another 6µs is challenging enough (so digitalWrite() is a no-go, for example), and even then you have reached only about 100kHz.
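To make the budget above concrete, here is a minimal sketch of the arithmetic. The three durations are the figures quoted in this post; the constant and function names are hypothetical, and the calculation assumes exactly one interrupt is serviced per SPI clock period.

```cpp
// Figures quoted in the post above (16 MHz AVR, compiler-generated ISR).
constexpr double kIsrEntryUs = 3.5;  // time to enter the handler
constexpr double kIsrExitUs  = 2.0;  // time to leave the handler
constexpr double kBitWorkUs  = 6.0;  // budget for bit handling and port access

// Assumption: one interrupt is serviced per SPI clock period, so the
// shortest usable clock period is the total time spent per interrupt.
constexpr double minClockPeriodUs() {
    return kIsrEntryUs + kIsrExitUs + kBitWorkUs;
}

// Highest SPI clock (in kHz) that still leaves room for every interrupt.
constexpr double maxClockKhz() {
    return 1000.0 / minClockPeriodUs();
}
```

With these numbers the period is 11.5µs, which is a little under 90kHz, so "about 100kHz" is if anything slightly optimistic for this approach.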

pylon:
Entering an interrupt handler needs about 3.5µs on a 16MHz AVR, the exit from that handler needs another 2µs, so you need 5.5µs without having handled any information yet.

I am particularly curious to know how this figure of 3.5µs comes about. By my estimate, the worst-case time to arrive at the ISR should be approximately:

Time to finish the current instruction: 2 cycles
Time to push the return address onto stack : 2 cycles
Time to jump to the vector address : 2 cycles
Time to get from the redirection at the vector to the actual ISR : 2 cycles

Total time = 8 cycles x 1/16 µs = 8 x 0.0625 µs = 0.5 µs?
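The conversion behind that total can be checked with a short sketch. The four 2-cycle steps are the estimates listed above, not datasheet values, and the helper name is hypothetical.

```cpp
#include <cstdint>

// Convert a CPU cycle count to microseconds for a given clock frequency.
constexpr double cyclesToMicroseconds(uint32_t cycles, double cpuHz) {
    return cycles * 1e6 / cpuHz;
}

// The four steps estimated above, 2 cycles each (the poster's estimate).
constexpr uint32_t kEstimatedCycles = 2 + 2 + 2 + 2;

// A 16 MHz clock, as on the Uno, gives 0.0625 us per cycle.
constexpr double kLatencyUs = cyclesToMicroseconds(kEstimatedCycles, 16e6);
```

This counts only the time to reach the first instruction of the ISR; it does not include any registers the handler saves once it is running.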

how this figure of 3.5µs comes about

It is nonsense. Interrupt latency is highly variable and depends on several factors, including the number of registers that are saved.

Here is an informative article (but the technique described is not the best one can do): The overhead of Arduino Interrupts | Bill Grundmann's Blog

For bit banged SPI, you would not use interrupts anyway.
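As a rough illustration of the polling approach, here is a minimal sketch of a slave-side byte exchange that busy-waits on the clock line instead of using interrupts. It is not taken from any existing library. It assumes SPI mode 0 (clock idles low, data sampled on the rising edge), MSB first, and that the caller has already seen slave select go low. The SpiSlavePins struct and its hooks are hypothetical; they only keep the example independent of the hardware.

```cpp
#include <cstdint>

// Hypothetical pin-access hooks. On an ATmega328P these would be direct
// port register reads and writes on whichever pins the traces go to,
// not digitalRead()/digitalWrite().
struct SpiSlavePins {
    bool (*readSck)();             // current level of the clock line
    bool (*readMosi)();            // current level of master-out/slave-in
    void (*writeMiso)(bool level); // drive master-in/slave-out
};

// Exchange one byte as an SPI slave by polling the clock line.
// Assumes mode 0, MSB first, and slave select already asserted.
inline uint8_t spiSlaveTransfer(const SpiSlavePins& pins, uint8_t txByte) {
    uint8_t rxByte = 0;
    for (uint8_t bit = 0; bit < 8; ++bit) {
        // Present the next outgoing bit while the clock is low.
        pins.writeMiso((txByte & 0x80) != 0);
        txByte = static_cast<uint8_t>(txByte << 1);

        while (!pins.readSck()) { }  // wait for the rising edge
        rxByte = static_cast<uint8_t>((rxByte << 1) |
                                      (pins.readMosi() ? 1 : 0));
        while (pins.readSck()) { }   // wait for the falling edge
    }
    return rxByte;
}
```

The function blocks until the master has clocked all eight bits, so a real version would probably also watch slave select, or use a timeout, to avoid hanging if the master stops mid-byte. Calling through function pointers costs cycles on an AVR; in practice the port accesses would be written inline.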

jremington:
Interrupt latency is highly variable and depends on several factors, including the number of registers that are saved.

Should we count the time required to save the user's registers onto the stack, given that these registers can only be saved after arriving at the ISR?

Do read the article linked. It is relevant to your question, short and simple.

I think most people would define latency as something like the time from "action requested" until "action initiated".

If you disable other interrupts, the worst-case interrupt latency will be low. If you write the ISR in ASM, and possibly use dedicated registers, you can make it very fast. Hundreds of kHz, surely. It all depends on how many resources you are willing to sacrifice, and on whether the communication may be blocking. If it must be non-blocking and other interrupts are active (e.g. the timer interrupt behind millis()), it will be slow, and it will be very difficult to determine a worst-case frequency that is sure to work.

OK, so getting away for a moment from the exact calculation of speed limits for this: can anyone suggest a software SPI library which can cope with being the slave on a bus? I pointed out Gammon's master example, and if what I end up with as a slave turns out to be somewhat slower then it's not too tragic. Can anyone suggest one which would work as a slave too?

It is nonsense.

Wrong, it's just the time you need if you use the standard C interface to interrupts (which saves many registers), as most people using the Arduino IDE do. I agree that you might get faster responses if you code your handler in assembler and optimize it to use only one or two registers, but that's not feasible for coding an SPI slave emulation library. I doubt that anyone would invest that much effort in such a solution instead of just switching hardware.