It depends on what you need to do with the board. It seems like most people (?) use a development board, like the Uno or one of the full-featured clones, as a prototype board. Then, if there's a project you've made that you want to keep, you can use a bare ATmega chip programmed with the same code (or just take the chip off the Uno) and off you go.
If you're looking to build your own development board, it's nice to have the USB interface on it, so it's self-contained -- self-powered, programmable, that sort of thing. But for projects, it makes sense to just use a standalone chip, and leave pins for connecting a programmer (which can be just another Uno board) or serial interface (FTDI cable or breakout board).
As far as chip selection goes, there are lots of Arduino and clone boards out there using a whole gamut of chips. The most common ones (the ATtinys, the ATmega328, and the larger ATmegas on the Mega boards) each cover a particular use case (small, middle of the road, lots of I/O). There are other chips, but these will handle just about anything you want to do. If you outgrow the Mega, you need to start thinking about getting into ARM or something like that.
I know you think the '328 looks limited. I did too, when I first started. But once you get your hands on one, you start to realize you really don't need a Mega for every little project. For example: there are I/O expanders to add inputs, outputs, or interrupts; you can put several SPI sensors on the same clock and data lines (each one just needs its own chip-select pin); and you don't need KBs and KBs of RAM as often as you'd think.
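To make the I/O expander point concrete, here's a rough sketch of the bookkeeping for one common kind of output expander: two daisy-chained 8-bit shift registers (74HC595-style). The struct name, the method names, and the pin-to-register mapping are all made up for illustration; they're not from any particular library.

```cpp
#include <cstdint>

// Hypothetical helper: tracks 16 "virtual" output pins that live on two
// daisy-chained 8-bit shift registers instead of on the MCU itself.
// Assumed mapping: pins 0..7 on the register nearest the MCU,
// pins 8..15 on the one after it.
struct ExpandedOutputs {
    uint16_t state = 0;  // bit n holds the level of virtual pin n

    // Set or clear one virtual pin (0..15); out-of-range pins are ignored.
    void write(uint8_t pin, bool high) {
        if (pin > 15) return;
        const uint16_t mask = static_cast<uint16_t>(1u << pin);
        if (high) state = static_cast<uint16_t>(state | mask);
        else      state = static_cast<uint16_t>(state & ~mask);
    }

    // Byte for the far register (pins 8..15). Shift this out first,
    // because the first byte sent gets pushed through to the end of the chain.
    uint8_t farByte() const { return static_cast<uint8_t>(state >> 8); }

    // Byte for the near register (pins 0..7). Shift this out second.
    uint8_t nearByte() const { return static_cast<uint8_t>(state & 0xFF); }
};
```

On an actual board you'd pull the latch pin low, send farByte() and then nearByte() with Arduino's shiftOut(dataPin, clockPin, MSBFIRST, value), and pull the latch high again. That's three MCU pins driving sixteen outputs, which is a big part of why the '328 stretches further than it looks.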
The biggest thing is thinking in terms of a microcontroller, and not a computer. If you have some grand scheme where you think even the Mega might not have enough I/O or memory, you might be asking too much of one chip. More often than not, it's easier to split those projects up into pieces, with separate chips each doing one part and communicating with each other, rather than trying to put touchscreen and Ethernet and sensors and whatnot on a single MCU. Without a real OS, with task schedulers and all that, it gets cumbersome to service all those concurrent needs long before you run out of resources.
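For a feel of what "no real OS" means in practice, here's a minimal sketch of the usual workaround: a loop that checks a list of periodic tasks against the clock. The Task struct and runDueTasks() are names I made up; on an Arduino you'd call it from loop() and pass in millis().

```cpp
#include <cstdint>

// Hypothetical periodic task: run fn every periodMs milliseconds.
struct Task {
    uint32_t periodMs;   // how often the task should run
    uint32_t lastRunMs;  // timestamp of the most recent run
    void (*fn)();        // the work itself; must return quickly
};

// Run every task whose period has elapsed and return how many ran.
// The unsigned subtraction stays correct when the millisecond counter
// wraps around to zero.
int runDueTasks(Task* tasks, int count, uint32_t nowMs) {
    int ran = 0;
    for (int i = 0; i < count; ++i) {
        if (nowMs - tasks[i].lastRunMs >= tasks[i].periodMs) {
            tasks[i].lastRunMs = nowMs;
            tasks[i].fn();
            ++ran;
        }
    }
    return ran;
}
```

Nothing here can interrupt a task once it starts, so one slow task (a screen redraw, a network request) holds up everything else. That's the cumbersome part, and it's why handing the slow job to a second chip is often the easier fix.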
As for languages -- if there's a compiler that turns your preferred language into AVR machine code, you can use it. The choices are C(++) and ASM for the most part, but something else may be available. IMO, .NET is a heavy platform for embedded applications, but I've heard of people using it. Not specifically with AVR chips, but for microcontrollers in general. Obviously, if library support is important, you need libraries written in your language. And you'll probably have to give up the Arduino environment, as it's fundamentally a set of C++ libraries wrapped around the AVR hardware to make it a little more accessible.
AFAIK, the bootloader only deals with the compiled binary: it receives it, writes it to flash, and jumps to it. It never sees your source code, so you should be able to use whatever bootloader you want regardless of how you write and compile your code. I think. I could be wrong.
Nothing answers your questions better than getting involved, though. So you might be best served by buying a board and giving it a shot. You'll get a handle on what's possible, and what's not, and be able to make more accurate sizing decisions.