There is a nice diagram showing the mapping of Arduino pins to the ATmega168, which makes it much easier to read the Atmel documentation and figure out where each piece of internal hardware reaches a pin.
As far as I can find, no one has done the same diagram for the Arduino Mega.
You could look at the Arduino Mega schematic (I realise that is a pretty horrible recommendation, sorry).
The pins which produce PWM signals come from 'Output Compare Units' in the timers, and have pin names which look like 'OCnx', where n is a digit giving the timer number, and x is A, B or sometimes C, identifying one of the outputs from timer n.
So OC3A, OC3B and OC3C are all outputs from Timer3.
Similarly OC1A and OC1B are outputs from Timer1.
The Arduino software sets up the timers to count from 0 to 255. The actual rate at which the timer counts is set by a 'prescaler', which divides the 16MHz clock. Timer0 uses a prescale value of 64, which gives it a frequency of around 1KHz (it generates the millis() clock).
The prescalers for the other timers are also set to 64, but those timers run at about 0.5KHz (they are set for 'Phase Correct' PWM, which counts up and then back down, so it runs at 1/2 the rate of the counter).
The calculation is:
- prescaled clock = clock/64 = 16MHz/64 = 250KHz
- counter = prescaled clock/256 = 976.6Hz
- phase-correct PWM = 2 counter cycles per period = roughly 488.3Hz
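If you want to check the arithmetic for other prescale values, here is a rough sketch in plain C++ (it runs on a PC, not on the board). The function names are mine, made up for illustration. One detail the rounded figures above gloss over: as far as I read the datasheet, an 8-bit phase-correct timer takes 510 ticks per period rather than 512, so the exact result comes out slightly higher than 488.3Hz.

```cpp
// Assumption: the timer counts 0..255 (8-bit TOP), as the Arduino core sets it up.

// Fast PWM (the mode Timer0 runs in): the counter wraps every 256 ticks.
double fastPwmHz(double clockHz, unsigned prescale) {
    return clockHz / (prescale * 256.0);
}

// Phase-correct PWM: the counter goes 0..255 and back down again,
// which takes 510 ticks per period.
double phaseCorrectPwmHz(double clockHz, unsigned prescale) {
    return clockHz / (prescale * 510.0);
}
```

With a 16MHz clock, fastPwmHz(16e6, 64) gives 976.5625Hz, phaseCorrectPwmHz(16e6, 64) gives about 490Hz, and phaseCorrectPwmHz(16e6, 1) gives about 31.4KHz.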
You want roughly 32KHz, so the prescale value needs to be about 65 times smaller. The nearest available step is 64 times smaller, which is a prescale value of 1 (no division), giving roughly 31KHz.
MarkT has shown you how to adjust that for Timer1, and a similar bit of code will work for any other timer (leaving Timer0 alone). You should check the Atmel documentation for the timer you choose as there are a couple of types of prescaler.
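To show what I mean by a couple of types of prescaler, here is a minimal sketch of the register arithmetic. This is not MarkT's code, and the helper names are hypothetical. The bit patterns are what I understand from the ATmega1280/2560 datasheet (Timer2 has extra /32 and /128 taps, the other timers share one table), so please check them against the documentation for your chip.

```cpp
#include <cstdint>

// Assumption: two prescaler layouts, as on the ATmega1280/2560.
enum class PrescalerKind { Standard, Timer2 };

// Returns the 3-bit clock-select (CSn2:0) pattern for a prescale value,
// or 0 when that timer has no such tap (0 would stop the timer, so treat
// it as "not available").
inline uint8_t clockSelectBits(PrescalerKind kind, unsigned prescale) {
    if (kind == PrescalerKind::Standard) {
        switch (prescale) {
            case 1: return 1;
            case 8: return 2;
            case 64: return 3;
            case 256: return 4;
            case 1024: return 5;
            default: return 0;
        }
    }
    switch (prescale) {  // Timer2 layout
        case 1: return 1;
        case 8: return 2;
        case 32: return 3;
        case 64: return 4;
        case 128: return 5;
        case 256: return 6;
        case 1024: return 7;
        default: return 0;
    }
}

// Replaces only the low three bits of a TCCRnB value, leaving the
// waveform-mode bits that the Arduino core has already set.
inline uint8_t withClockSelect(uint8_t tccrb, uint8_t cs) {
    return static_cast<uint8_t>((tccrb & 0xF8) | (cs & 0x07));
}

// On the board, in setup(), this would be used something like:
//   TCCR3B = withClockSelect(TCCR3B, clockSelectBits(PrescalerKind::Standard, 1));
```

Keeping the other bits of the register untouched is the important part, since they hold the PWM mode the Arduino software has chosen.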
As you are okay with a frequency near 32KHz, and don't need it exactly, then that is all there is to do. (It gets a bit more intricate if you want an exact frequency match.)
It is straightforward to get a 50:50 duty cycle from a timer set up to count to the right number. Use analogWrite(pin, value) where value is 1/2 the maximum count. Where the timer is set to count from 0 to 255 (as all are) the 50:50 value is 127. You can see the Arduino pin which matches a specific OCnx pin on the schematic.
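If you are curious how close 127 gets to a true 50:50, here is a tiny sketch, assuming the timer is in 8-bit phase-correct mode and that the value given to analogWrite ends up in the compare register. The function name is made up for illustration.

```cpp
// Assumption: 8-bit phase-correct PWM, non-inverting output.
// The output is high for 2*compare ticks out of the 510-tick period,
// which simplifies to compare/255.
double phaseCorrectDuty(unsigned compare) {
    return compare / 255.0;
}
```

phaseCorrectDuty(127) comes out just under 0.5 and phaseCorrectDuty(128) just over, so either value is within about a quarter of a percent of 50:50.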
This approach completely avoids the need for an external crystal or oscillator. If you are looking to make the whole device very small, this technique is practical on all of the AVRs with timers, which means it would work with Atmel AVRs as small as an 8-pin SOIC. If you use a timer on the Arduino Mega which looks like the timer on your target AVR device, you should be able to get the code very, very close to the production code.
HTH
GB