Optimizing additive synthesis / phase accumulators

You calculate phase[0...9] as 16-bit numbers but only use one byte of each in the inner loop.
You would make the code easier to execute (so faster) if you stored just the part you need, outside the loop.

The phase accumulators are incremented after each sample is calculated in advance of the next sample to be rendered. I'm pretty sure I can't do that calculation outside the main loop.
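To make the point concrete, here is a minimal sketch of a 16-bit phase accumulator where only the high byte is used as the table index. The names phase_inc and next_index are made up for illustration and are not from my actual code. The add has to stay in the sample loop because the phase changes every sample, but pulling out the high byte is just a shift (or a plain byte read on an 8-bit CPU).

```c
#include <stdint.h>

#define NUM_OSC 10  /* matches phase[0...9] in the discussion */

/* Hypothetical state; the names are illustrative only. */
static uint16_t phase[NUM_OSC];      /* 16-bit phase accumulators */
static uint16_t phase_inc[NUM_OSC];  /* per-sample increment (sets the pitch) */

/* Advance one oscillator by one sample and return its 8-bit table index. */
static inline uint8_t next_index(uint8_t osc)
{
    phase[osc] += phase_inc[osc];        /* wraps modulo 65536 on store */
    return (uint8_t)(phase[osc] >> 8);   /* only the high byte is needed */
}
```

The low byte still matters even though it is never used as an index: it carries the fractional part of the phase, which is what keeps the pitch accurate over many samples.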

I cannot see why sample should be a long.
It contains the sum of up to 6 byte values, divided and shifted to be 0-centered, then multiplied by a byte volume.
It should be a signed 16-bit int...

You may be right about that. It's hard to keep track of how large the values will get. I think I made it a long at some point to make sure that an overflow wasn't the problem. When I first tried this code after writing it, it didn't work at all. That's why there's a few floating point values used in there in the code I commented out. I was trying to simplify things until I found the source of the problem, then I went back and optimized it again. I'll double check that and the other variables to see what I can make smaller.
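One way to check this without guessing is to work out the worst case. The sketch below assumes 6 voices of 0..255 each, centering by subtracting the midpoint, a right shift standing in for the "divide", and a volume of 0..255. The actual scaling in my code may differ, so treat the numbers as an illustration of the method rather than a verdict.

```c
#include <stdint.h>

/* Assumed limits taken from the description above; the real code may differ. */
#define MAX_VOICES 6
#define BYTE_MAX   255

/* Worst-case magnitude of (centered sum >> shift) * volume. */
static int32_t worst_case(uint8_t shift)
{
    int32_t sum      = (int32_t)MAX_VOICES * BYTE_MAX;  /* 1530 */
    int32_t centered = sum / 2;                         /* 765 from the midpoint */
    return (centered >> shift) * BYTE_MAX;
}

/* Nonzero if v can be stored in a signed 16-bit int. */
static int fits_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}
```

Under these assumptions, with no scaling the worst case is 765 * 255 = 195075, which does not fit in 16 bits. Shifting right by 3 before the multiply gives 95 * 255 = 24225, which does. So whether a 16-bit sample is safe depends entirely on how much the sum is scaled down before the volume multiply.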

If you could pull this logic out of the sample loop and have three functions, one each for section02(), section1() and section2(), you might get a speedup.

Yeah, I was looking at that after I posted the code. Not really sure how to go about optimizing that in a clean manner. Right now it's like:

for {
    switch {
    case:
    case:
    case:
    }
}

But to avoid the case in the loop I'd need to flip it around something like this:

repeat {
    switch {
    case:
        repeat {
        } until we've rendered all samples in this section
    case:
        repeat {
        } until we've rendered all samples in this section
    case:
        repeat {
        } until we've rendered all samples in this section
    }
} until we've rendered all samples

And all those case statements would have a ton of repeated code in them, which isn't good if I want to be able to maintain the code and tweak it. I'd have to copy every tweak three times, and I've been doing tons of tweaking to get the sound right.
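One possible way around the duplication, sketched under the assumption that the sections differ mainly in their parameters: keep the per-sample work in a single inline function and pick the section once, outside the inner loop. Section, render_sample and render are hypothetical names, and the per-sample math here is only a placeholder for the real synthesis code.

```c
#include <stdint.h>

/* Hypothetical per-section parameters; the real sections will differ. */
typedef struct {
    uint16_t length;  /* number of samples in this section */
    uint8_t  volume;  /* stand-in for whatever varies per section */
} Section;

/* The shared per-sample work lives in exactly one place. */
static inline int16_t render_sample(uint8_t wave, uint8_t volume)
{
    /* Center the byte on zero, then scale; the result fits in 16 bits. */
    return (int16_t)(((int16_t)wave - 128) * volume);
}

/* The section is selected once per section, not once per sample. */
static void render(const Section *sections, uint8_t count,
                   const uint8_t *in, int16_t *out)
{
    for (uint8_t s = 0; s < count; s++) {
        const uint8_t volume = sections[s].volume;
        for (uint16_t i = 0; i < sections[s].length; i++) {
            *out++ = render_sample(*in++, volume);
        }
    }
}
```

If the sections really need different logic and not just different parameters, each one could still be its own short loop that calls the same inline helper, so a tweak to the sound only has to be made once.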

Is there a simulator environment in which you can profile the execution of the code?
Professionally, I refuse to optimise without measurement.
In this play environment it is fun, almost because the measurement tools seem to be missing, but this is a lot of code to get running fast and correctly, and profiling tools would help a lot.

I don't know of any such tool. But I guess I should look for that thing you mentioned which will let me look at the generated assembly code. I think it's gonna be hard to find the relevant section to look at in that though. I imagine it's not going to have any of the variable names and be a mess of stuff being pushed and popped off the stack and I won't be able to make heads or tails of it.