Just wondering because it means using inline asm...

Suppose I have 4 C functions, all void func() so that pointer to one is pointer to any.

Suppose that using asm() I push the address of the next instruction (usual call return address, right?) and then the 4th, 3rd, and 2nd function addresses and finally jump to the 1st.

Would that execute the 1st function and return to the 2nd to execute it then return to the 3rd, etc, 4th that returns to the instruction after the jump?

Please don't ask why.
No it's not the best way and yes it loads 8 bytes onto the stack.

I just want to know if this behavior would result. That seems to be what the docs say: the call pushes, the ret pops, and execution continues wherever the IP gets set.

Suppose I have 4 C functions, all void func() so that pointer to one is pointer to any.

I do not know what you mean by that...

Yes, you can effect a "jump" by pushing a destination address and then doing a "ret". This was standard practice on some of the old processors that didn't have a "jump to address in a register" instruction. I don't immediately see any reason that you couldn't push 4 separate addresses and have C functions innocently "return" to each one, successively, but I feel like I might be overlooking something. Why don't you fire up Atmel Studio and try it under debugging, in the simulator?

Atmel Studio won't run on an RPi 3+ last I saw. It requires Micro$oft.

For you, WF, with massive respect, I will try to explain and you can tell me what I'm blind to...

You see, the rets are already there in compiled C functions. The asm I have in mind is a small inline push regs (oh yeah, I do need to disable interrupts during that?) and not larger.. so far.

I'm seeing if I can shave some calls out of a virtual machine where a function returns only to call the next, why not have it skip that part and go straight to the meat of the next function and execute that?

I dunno if I'd have this problem with better written C/C++ or asm. You got a lot of functions that can run in order TBD later, usually you make them subroutines with returns, yes?

But I did say that example isn't the best way. The way the virtual code is stored it'd be better if the code in the functions ended by loading the next addy and jumping to it every time, no stack bloat that way. It's just that I'm not up to self-modifying C quite yet, LOL!

I think it should work. If you use inline ASM inside a function to do the pushes, you run the risk that the function prologue and epilogue have pushed/will pop values off of the stack that it thinks belong to it, but end up being the values you tried to push. The only way to tell for sure is to try it, perhaps carefully looking at the asm listing to see if the code looks correct. Without a debugger to support you, it might be pretty difficult to tell what is going wrong if it doesn’t work immediately…

Hmm. Something like:

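void pushfuncs(void*, void*, void*) __attribute__((naked));
void pushfuncs(void *a, void *b, void *c) {
  // Untested sketch. The operands are inputs ("r"), not outputs ("=r").
  // Each address is pushed low byte first, then high byte.
  // Push in reverse order: the last address pushed is the first one "ret" pops.
  asm volatile(" push %A0\n"
               " push %B0\n" : : "r"(c));
  asm volatile(" push %A0\n"
               " push %B0\n" : : "r"(b));
  asm volatile(" push %A0\n"
               " push %B0\n" : : "r"(a));
  // A naked function has no epilogue, so it needs its own ret; this one jumps to a.
  asm volatile(" ret\n");
}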

void loop() {
  pushfuncs(f1, f2, f3);
}

A function call and its subsequent return require some code to keep the registers and the memory in a predictable state.

Would that execute the 1st function and return to the 2nd to execute it then return to the 3rd, etc, 4th that returns to the instruction after the jump?

This could also be done in C, with inline functions called from the body of the 1st one. You can make them 'naked', without prologue/epilogue code, but in that case you have to supervise the process on your own.

Maybe, some example would really help.

westfw:
I think it should work. If you use inline ASM inside a function to do the pushes, you run the risk that the function prologue and epilogue have pushed/will pop values off of the stack that it thinks belong to it, but end up being the values you tried to push. The only way to tell for sure is to try it, perhaps carefully looking at the asm listing to see if the code looks correct. Without a debugger to support you, it might be pretty difficult to tell what is going wrong if it doesn't work immediately...

Hmm. Something like:

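void pushfuncs(void*, void*, void*) __attribute__((naked));

void pushfuncs(void *a, void *b, void *c) {
 // Untested sketch. The operands are inputs ("r"), not outputs ("=r").
 // Each address is pushed low byte first, then high byte.
 // Push in reverse order: the last address pushed is the first one "ret" pops.
 asm volatile(" push %A0\n"
              " push %B0\n" : : "r"(c));
 asm volatile(" push %A0\n"
              " push %B0\n" : : "r"(b));
 asm volatile(" push %A0\n"
              " push %B0\n" : : "r"(a));
 // A naked function has no epilogue, so it needs its own ret; this one jumps to a.
 asm volatile(" ret\n");
}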

void loop() {
 pushfuncs(f1, f2, f3);
}

If I start off with a call whose return address (placed on the stack by the call) takes me back to the engine, the called function loads the stack and JUMPS to the top address. I think each return should then clear the top address. All the functions are void func(void) and exchange data and addresses on a stack, along with variables/constants and parameters in the compiled virtual code.

This is for an AVR Virtual Forth work-and-act alike that I hope will run more standard forth than existing AVR forths. I have the help of the author of Pocket Forth, a copy of Starting Forth (got my 1st copy in late 83) and my recollections of writing forth.

I think to you it'd be an interesting toy/tool. Starting Forth is a free PDF now, it gives full details and that book is not very thick. The flexibility it allows is amazing, you can write forth that extends itself even to the compiler.

Our first major boundary is that AVRs do not execute code from RAM, unlike every other Forth platform either of us knew. We have a way past that, one that puts almost all of the load on the compiler and the pre-compiled C functions and data. The return-stack loading is just a way, if it even works, to keep the code from "returning to ground/engine" until a branch command or the end of a word definition. It's supposed to replace engine cranking (call after call) with machine-code threading.

And I'm not sure it's going to save much at all or even that the Arduino compiler will let me do it!

I'm still looking at ways to not have subverted return addys stacked up. Still planning and running mini-tests on ideas.

It won't replace C++ on Arduino but Forth can be a nice learning environment. It has an interpreter that makes it easy to test what the defined words will do, examine the stack and memory. There isn't that much to learn but then there's only 26 letters in our alphabet too.

Budvar10:
A function call and its subsequent return require some code to keep the registers and the memory in a predictable state. This could also be done in C, with inline functions called from the body of the 1st one. You can make them 'naked', without prologue/epilogue code, but in that case you have to supervise the process on your own.

Maybe, some example would really help.

My understanding of inline is mostly that the compiler will do what it wants with it. That may kill this idea dead.

If I could end the threaded functions with code that would load the next addy from the next virtual code token (addy or value) and jump there, there would be no need to pre-load the stack. IMO the result would be a bit leaner and a bit faster.

I've got the engine figured out but with this it'd be faster and I might be able to get rid of the virtual return stack.

Last thing, this is first a 1284P project and aims to use Mighty_MCU to put selected virtual code from RAM to flash, changing memory-access functions to fit and run as threaded tokens in flash, that great - big - flash!

Can Mighty_MCU be edited to work on your Pro 1284 boards? I have blank and bootloaded 1284P's if not. I bought them for other projects and I like them and now a good use, a good reason to use them.

Can Mighty_MCU be edited to work on your Pro 1284 boards?

I do not know the Mighty, just years ago I briefly looked at it. However, it should work. Right now, I cannot imagine why not.
The difference is in the pin numbering. In particular, the 'An' pins are numbered in the opposite order. I think you can live with it, or it can be changed in pins_arduino.h, but I'm not sure a simple replacement will solve it.

Only thing I could think of was the pin mapping and for the fast board the clock to time factor to get millis and micros right. But I don't have to load this project on that board.

How to ensure the 'naked' code?

PS -- you know more about the Mighty than I do. I chose it because writing to flash at runtime is possible with it, after the learning curve needed to make what I want possible at all, and it's newer than the first bootloader that could do that.

GoForSmoke:
How to ensure the 'naked' code?

__attribute__ ((naked))

https://gcc.gnu.org/onlinedocs/gcc/AVR-Function-Attributes.html
https://www.microchip.com/webdoc/AVRLibcReferenceManual/mem_sections_1c_sections.html