"One thing that I didn't see on the FlashForth web page, etc - was any mention as to how to access and control (or read) any of the hardware GPIO; this may, however, have more to do with my ignorance of Forth, plus not reading the manual, tutorial, etc in more depth."
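I/O is merely a matter of reading and writing the right bits at the right address (in a separate I/O space on some Intel-type machines). On the ATmega the I/O registers are mapped into the data address space, so in Forth the ordinary byte fetch and store words, c@ and c!, should do the job.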
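To make "the right bits at the right address" concrete, here is a host-side sketch in Python. It is a simulation, not code for the chip. A bytearray stands in for the ATmega328P data space, and the addresses 0x24 (DDRB) and 0x25 (PORTB) are the datasheet data-space addresses for port B. The helper names set_bits, clear_bits and toggle_bits are hypothetical. Each helper does the same read-modify-write that a Forth c@ / or / c! sequence would do on the real register.

```python
# Host-side model: a bytearray stands in for the AVR data address space.
# On an ATmega328P, DDRB is at data address 0x24 and PORTB at 0x25.
DDRB = 0x24
PORTB = 0x25

memory = bytearray(0x100)  # hypothetical stand-in for the low data space

def set_bits(addr: int, mask: int) -> None:
    """Read the byte at addr, OR in mask, and write it back."""
    memory[addr] = memory[addr] | mask

def clear_bits(addr: int, mask: int) -> None:
    """Read the byte at addr, AND with the inverted mask, and write it back."""
    memory[addr] = memory[addr] & ~mask & 0xFF

def toggle_bits(addr: int, mask: int) -> None:
    """Read the byte at addr, XOR with mask, and write it back."""
    memory[addr] = memory[addr] ^ mask

LED = 1 << 5  # PB5, the on-board LED pin on an Arduino Uno

set_bits(DDRB, LED)      # make PB5 an output
set_bits(PORTB, LED)     # drive it high
toggle_bits(PORTB, LED)  # and low again
print(f"DDRB={memory[DDRB]:#04x} PORTB={memory[PORTB]:#04x}")
```

Nothing here depends on a library. The "API" is just an address and a mask, which is the point the paragraph above makes.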
"I can see that FlashForth on the ATMega would likely be slower to use (ie - toggling data pins, etc), and you would also have less RAM available (because the Forth environment on the controller would take some). However, in exchange you would have an interactive system, it also looked like you could have multiple processes working at the same time..."
FlashForth is subroutine threaded, so it should be relatively fast. The inner interpreter isn't used at run time, many call-return sequences are optimized into jumps, and short words are inlined. A turn-key program needs neither the interpreter nor the dictionary, so neither has to occupy code space. An interactive environment is marvelous for debugging and great for trying out hare-brained schemes.
When I worked at Siemens Research, my Sun terminal was flaky one morning. Its boot ROM was written in Forth, and it was possible to abort to the interactive prompt before the boot finished. I keyed in a simple walking memory test and located a bad chip. I called the in-house maintenance people and gave them the address. They were skeptical, but when swapping the chip (they had a map) fixed the problem, they asked me how I knew without removing the cover. I explained, and within a week they had become adept at using Forth.
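For anyone who hasn't written one, here is a minimal Python sketch of the walking-ones idea. It is not the test typed into that Sun ROM. The RAM is simulated, and the FaultyRAM class with its single stuck-at-zero bit is a hypothetical fault injected so the test has something to find. The test writes each one-bit pattern to every address, reads it back, and reports mismatches. With a board map, the failing address (and bit) is what lets you name the chip.

```python
class FaultyRAM:
    """Simulated byte-wide RAM with one data bit stuck at 0 at one address.
    The fault is hypothetical, injected so the test has something to find."""

    def __init__(self, size: int, bad_addr: int, bad_bit: int):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.bad_mask = 1 << bad_bit

    def write(self, addr: int, value: int) -> None:
        if addr == self.bad_addr:
            value &= ~self.bad_mask & 0xFF  # this bit never stores a 1
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        return self.cells[addr]

    def __len__(self) -> int:
        return len(self.cells)


def walking_ones(ram) -> list:
    """Write 0x01, 0x02, ... 0x80 to every address and read each back.
    Return a list of (address, wrote, read) tuples for every mismatch."""
    failures = []
    for addr in range(len(ram)):
        for bit in range(8):
            pattern = 1 << bit
            ram.write(addr, pattern)
            got = ram.read(addr)
            if got != pattern:
                failures.append((addr, pattern, got))
    return failures


ram = FaultyRAM(size=1024, bad_addr=0x2A7, bad_bit=3)
for addr, wrote, got in walking_ones(ram):
    print(f"bad cell at {addr:#06x}: wrote {wrote:#04x}, read {got:#04x}")
```

Each address is checked before moving on, so this catches stuck data bits but not address-line faults such as two addresses aliasing the same cell. Those need a separate pass that writes all addresses first and then reads them all back.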
"you might look ... a Forth to AVR/ATMega HEX compiler ..."
I'm not fond of source code converters. Their output often needs to be cleaned up manually.
"Finally, note that virtually any and all tutorials out there are going to assume you are using C/C++ for the Arduino (or ATMega) - as such, in order to utilize that material, you'd have to do some conversion yourself (and thus, have a working knowledge of C/C++ to do so). Also note that "behind the scenes" that the Arduino library (which encompasses all of the "Arduino commands" - like digitalWrite() and such) does some pretty fancy stuff which you might have to come up with solutions for, in order to emulate the functionality - if needed (you might have to delve into the C/C++ source code of the library, for instance). The Arduino library also abstracts things out from the different microcontrollers that the Arduino system can target (which can and does lead to issues - mainly edge cases related to speed); as such, if you write your code in Forth for the ATMega328 - you will probably have to do more than a bit of refactoring of your code to get it to work properly on other ATMega processors."
I can read C++, and when I'm in doubt, there are friends I can consult. When speed is an issue, I can drop into assembler; Forth supports that, and many Forths include a built-in assembler. Assembler also lets me call code produced by other compilers, as long as I deal with their calling conventions.
I don't like being bound by type declarations. I want to be able to add 4 to 'A' and get 'E' or 0x45, whichever way I choose to interpret the result. My idea of straightforward programming may be odd, but it works for me. For a look at it, see users.rcn.com/jyavins/tantrum, which contains both 8080 and 6811 assembly code.
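The arithmetic behind the 'A' + 4 example is 0x41 + 4 = 0x45, which is both the number 69 and the character 'E'. Below is a short Python illustration. Python keeps strings and integers apart, so every reading has to be requested explicitly with ord, hex and chr. That is the kind of bookkeeping the paragraph above objects to. In standard Forth the byte is just a number on the stack, and `char A 4 + emit` should print E directly.

```python
# One 8-bit value, several readings. Python keeps str and int separate,
# so each interpretation is an explicit conversion.
code = ord("A") + 4   # 0x41 + 4 = 0x45
print(code)           # as a decimal number: 69
print(hex(code))      # as hex: 0x45
print(chr(code))      # as a character: E
```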
Thanks for the help and advice.
P.S. When I hit Preview, everything disappeared. I was prepared this time with a copy.