Looking at the specs for a pair of Apple in-ear headphones, I find this:
Impedance (at 100Hz): 23 ohms
Sensitivity (at 100Hz): 109 dB SPL/mW
109 dB is very loud - the CDC lists the permissible occupational exposure time at that level as under two minutes. So, unless you like it really loud, 1 mW into the headphones would be a reasonable upper limit on the drive level.
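As a sanity check on that exposure figure: it's consistent with the NIOSH recommended limit of 85 dBA for 8 hours, with the allowed time halving for every 3 dB above that. I'm assuming that's where the CDC number comes from; the function name below is just mine.

```python
def permissible_minutes(level_dba, limit_dba=85.0, exchange_db=3.0):
    """Allowed daily exposure in minutes, assuming the NIOSH 85 dBA / 3 dB rule."""
    # 480 minutes (8 hours) at the limit, halved for each exchange_db above it.
    return 480.0 / 2 ** ((level_dba - limit_dba) / exchange_db)

print(permissible_minutes(109))  # 1.875
```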
The calculation for voltage, using V^2/R = P, gives about 0.15V RMS for a milliwatt into 23 ohms; assuming a sinusoidal input, the peak voltage at that RMS level is about 0.21V. On a 10-bit ADC with 5V as Aref, that corresponds to a reading of about 43. If all this is correct, then I'm not surprised that you read a peak of something like 25 from the ADC. I think that the voltage at the headphone output of a personal audio device is pretty low.
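If you want to check that arithmetic or try other headphone specs, here's a rough sketch in Python. It assumes a 10-bit ADC (1024 steps) and a 5V reference; the function names are made up for illustration.

```python
import math

def rms_voltage(power_w, impedance_ohm):
    """RMS voltage that delivers power_w into impedance_ohm (P = V^2 / R)."""
    return math.sqrt(power_w * impedance_ohm)

def adc_counts(voltage, vref=5.0, steps=1024):
    """Ideal ADC reading for a voltage, assuming a 10-bit converter."""
    return int(voltage / vref * steps)

v_rms = rms_voltage(1e-3, 23.0)   # 1 mW into 23 ohms
v_peak = v_rms * math.sqrt(2)     # sine wave: peak = RMS * sqrt(2)
print(f"RMS {v_rms:.3f} V, peak {v_peak:.3f} V, ADC {adc_counts(v_peak)}")
# RMS 0.152 V, peak 0.214 V, ADC 43
```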
For this application, I'd recommend driving the analog input with an op-amp powered from 5V and GND, with the signal capacitively coupled to the inverting input. Bias the inverting input at 2.5V, using either a high-impedance resistive divider or a 2.5V reference IC and a high-value resistor (the latter will be less noisy); tie the non-inverting input to 2.5V; and pick the input and gain resistors to make it work. I think that last part is something of a trial-and-error effort. That'll give an ADC reading that swings back and forth around mid-scale, with a swing that's set by the op-amp circuit; you can use the difference between recent maxima and minima to decide whether the music is on. With the op-amp powered from the Arduino's own supply, you needn't worry that the analog input will swing outside the allowable range. You'll get wider swings with a rail-to-rail op-amp than with a general-purpose one. And, of course, your op-amp needs to operate from a 5V supply.
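The "difference between recent maxima and minima" idea might look something like this. It's a host-side Python sketch of the logic, not Arduino code; the class name, the 64-sample window, and the 20-count threshold are all guesses you'd tune against real readings.

```python
import math
from collections import deque

class SwingDetector:
    """Tracks max - min over the most recent ADC readings."""

    def __init__(self, window=64, threshold=20):
        self.samples = deque(maxlen=window)  # oldest readings drop off automatically
        self.threshold = threshold           # minimum swing in ADC counts (a guess)

    def update(self, reading):
        """Add one ADC reading; return True if the recent swing looks like signal."""
        self.samples.append(reading)
        swing = max(self.samples) - min(self.samples)
        return swing >= self.threshold

# Simulated readings: silence sits at mid-scale, music swings around it.
det = SwingDetector()
quiet = [det.update(512) for _ in range(64)]
loud = [det.update(512 + round(100 * math.sin(i / 3))) for i in range(64)]
print(any(quiet), loud[-1])  # False True
```

Because only the spread of the readings matters, the exact mid-scale value and the op-amp's exact gain don't need to be known.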
A series resistor between the headphones and the input capacitor will help protect the circuit from unexpected excursions on the input. A couple of signal diodes in parallel, one facing each way, would limit excursions to about 0.7V at the capacitor. I'd test that, though, to make sure they don't give you noticeable distortion at the listening levels you like.
Google "op amp" "single supply" to see a lot of representative circuits. If op-amps aren't your forte, well, you'll want to learn to use them if you hope to play with audio interfaces.
Some warnings:
- The dynamic range of music content is huge - a reasonable expectation of the range might be 60 dB, making the input voltage vary by a factor of 1,000. It's unlikely that you'll be able to detect soft passages with any reliability. You can alleviate this by setting the op-amp gain really high and letting it clip on louder passages, and by requiring a considerable delay before you decide that there's no music. You don't really care whether the analog reading is accurate, only whether a signal is present.
- The headphones seem to load the output considerably - you read nothing with them connected, and 25 or so without them. The circuit is more likely to detect music when the headphones are disconnected.
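To put numbers on the first warning and show what the "considerable delay" might look like, here's a small Python sketch of the logic. The 10-second hold-off is only a placeholder, and the class and function names are made up; it takes a yes/no "swing detected" flag from whatever detection you end up using.

```python
def db_to_voltage_ratio(db):
    """Voltage ratio for a level difference in dB (20 dB per factor of 10)."""
    return 10 ** (db / 20)

class MusicPresence:
    """Reports music as 'off' only after a long run of quiet."""

    def __init__(self, holdoff_s=10.0):
        self.holdoff_s = holdoff_s  # assumed value; tune it to the material
        self.last_active = None     # time of the most recent detected swing

    def update(self, now_s, swing_detected):
        """Return True while music is considered on at time now_s (seconds)."""
        if swing_detected:
            self.last_active = now_s
        if self.last_active is None:
            return False            # nothing heard yet
        return (now_s - self.last_active) < self.holdoff_s

print(round(db_to_voltage_ratio(60)))  # 1000
```

A soft passage shorter than the hold-off then doesn't register as the music stopping, at the cost of reacting slowly when it really does stop.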
Finally - I don't see that the capacitor and resistor in parallel with the input are helping anything. They'll just reduce the signal, though not by much if their values aren't too unreasonable.