Voice Scrambler

Hi everyone,

I'm considering making a voice scrambler, like the one used by Green Arrow in Smallville, for example. I was wondering how this could be done using an Arduino board and which boards would be best to use.

Also, on a side note, could this be paired with a phone using a Bluetooth module?

Thanks, Aaron

The PWM “analog” output isn’t fast enough for audio. You might be able to use an additional DAC chip.

Otherwise, I don’t know how scrambling is done and don’t know if the Arduino can process the audio data fast enough in real-time.

There are multiple ways to scramble an audio signal.

What is the intent? To encode it, or to make it sound strange?

What is the intent? To encode it, or to make it sound strange?

The Arduino will certainly make it sound strange, missing large amounts of the data.

Hmmm...

Any scrambled signal would sound strange, true, but you could use the Arduino to scramble a signal and un-scramble it without missing a beat. Think in terms of adding or subtracting and obfuscating signal rather than DSP. Kind of like the old way they used to scramble cable TV and video tapes.

Think in terms of adding or subtracting and obfuscating signal rather than DSP.

The Arduino recognizes digital signals on the digital pins, and analog signals on the analog pins. The analog signals go through the analog-to-digital converter, so how would you add, subtract, or obfuscate the digital values without DSP?

DSP is literally "digital signal processing" and is what you do with sounds when you process them in the Arduino (or any other digital chip).

Old-school voice scramblers work on a side-band signal principle. When you ring modulate a signal of frequency A with a signal of frequency B, you generate two signals shifted in frequency, A+B and A-B. If you make sure that they are separated enough, you can filter out one of them so that only the other remains, and send that through the wire. You then decode by doing the same thing again and filtering out the unwanted side-band. You can do this in many separate bands, similar to a music vocoder effect unit, if you want to make un-scrambling somewhat harder.
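To make the A+B / A-B point concrete, here is a minimal offline sketch in Python rather than Arduino code. The 8 kHz sample rate, the 500 Hz "voice" tone, the 2000 Hz carrier and the helper names are all assumptions picked for illustration, and the band-pass filter is idealized by simply generating the lower side-band directly.

```python
import math

FS = 8000   # sample rate in Hz (assumed)
N = 800     # 0.1 s of samples, so every multiple of 10 Hz is an exact DFT bin

def tone(freq, n=N, fs=FS):
    """Unit-amplitude sine wave at the given frequency."""
    return [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]

def ring_modulate(signal, carrier):
    """Ring modulation is sample-by-sample multiplication."""
    return [s * c for s, c in zip(signal, carrier)]

def level_at(signal, freq, fs=FS):
    """Amplitude of a single frequency component (one-bin DFT)."""
    re = sum(s * math.cos(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

voice = tone(500)        # stand-in for frequency A
carrier = tone(2000)     # frequency B
mixed = ring_modulate(voice, carrier)

# Energy should show up at B-A and B+A, not at A or B themselves.
for f in (500, 1500, 2000, 2500):
    print(f, "Hz:", round(level_at(mixed, f), 3))

# Decoding: pretend a filter kept only the lower side-band (1500 Hz),
# then ring modulate with the same carrier again.
lower_sideband = tone(1500)
decoded = ring_modulate(lower_sideband, carrier)
for f in (500, 3500):
    print(f, "Hz:", round(level_at(decoded, f), 3))
```

The decoded signal contains the original 500 Hz again plus an unwanted product at 3500 Hz, which is why a second filter is needed on the receiving side.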

I believe there's a separate related method where you can use aliasing to "mirror" the frequency spectrum of a signal, and transmit the mirrored/inverted signal, rather than the original signal. Low frequencies become high, and vice versa. However, I don't know how to implement this in practice, so I can't give more help there.
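One way I know of to get that mirrored spectrum on sampled audio is to flip the sign of every other sample, which is the same as multiplying by a carrier at half the sample rate. A frequency f then lands at fs/2 - f. This is only a sketch of that trick, not necessarily the exact method meant above; the 8 kHz rate, the test tone and the function names are assumptions.

```python
import math

FS = 8000   # sample rate in Hz (assumed)
N = 800     # 0.1 s of samples, so every multiple of 10 Hz is an exact DFT bin

def tone(freq, n=N, fs=FS):
    """Unit-amplitude sine wave at the given frequency."""
    return [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]

def invert_spectrum(samples):
    """Negate every odd-numbered sample: mirrors the spectrum around fs/4."""
    return [s if i % 2 == 0 else -s for i, s in enumerate(samples)]

def level_at(signal, freq, fs=FS):
    """Amplitude of a single frequency component (one-bin DFT)."""
    re = sum(s * math.cos(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

low = tone(500)                     # a low-frequency input
scrambled = invert_spectrum(low)    # should now sit at 4000 - 500 = 3500 Hz
restored = invert_spectrum(scrambled)

print("500 Hz in scrambled: ", round(level_at(scrambled, 500), 3))
print("3500 Hz in scrambled:", round(level_at(scrambled, 3500), 3))
print("round trip exact:", restored == low)
```

Applying the same operation twice restores the input exactly, so the scrambler and the descrambler are the same piece of code.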

If you want to make a signal actually secure against eavesdropping, then the "scrambler" approaches don't work; they are trivial to attack. But I imagine you're not looking for cryptographic-strength security, but rather that old-school spy movie sound :-)

So, how do you get the signal into the Arduino so you can run the necessary math (DSP) to scramble it? You probably buy an ADC/DAC chip that can talk SPI or I2C, and run it at some slow speed, such as 8-bit mono at an 8 kHz sampling frequency, which the Arduino can keep up with. Note that this will still generate 8 interrupts per millisecond, so you'll have to be pretty efficient at whatever processing you're doing. This is the cheapest I could find with a quick search -- note that all these things are surface-mount these days, so an easy hook-up to a breadboard is not in the cards: http://search.digikey.com/us/en/products/SGTL5000XNAA3/SGTL5000XNAA3-ND/2186897
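For a rough feel of the time budget at that rate, the arithmetic can be written out as below. The 16 MHz clock is an assumption (typical for ATmega-based Arduino boards); the 8 kHz, 8-bit mono figures are the ones mentioned above.

```python
CPU_HZ = 16_000_000      # assumed clock of an ATmega-based Arduino
SAMPLE_RATE_HZ = 8_000   # 8 kHz sampling, as suggested above
BITS_PER_SAMPLE = 8      # 8-bit mono

interrupts_per_ms = SAMPLE_RATE_HZ / 1000
microseconds_per_sample = 1_000_000 / SAMPLE_RATE_HZ
cycles_per_sample = CPU_HZ // SAMPLE_RATE_HZ
bytes_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE // 8

print("interrupts per millisecond:", interrupts_per_ms)
print("time per sample (us):", microseconds_per_sample)
print("CPU cycles per sample:", cycles_per_sample)
print("audio data rate (bytes/s):", bytes_per_second)
```

Whatever cycle count comes out has to cover reading the converter, the scrambling math, writing the output and the interrupt overhead itself.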

Sorry, that chip actually does I2C/SPI only for control, and data goes through I2S, which the Arduino isn't set up for. You'll need some higher-cost device to actually do audio in and out, and/or try to bend the SPI interface into an I2S interface (which I'm not sure is even possible on the ATmega/Arduino). This guy can do the input and output you need: http://datasheets.maxim-ic.com/en/ds/MAX1020-MAX1058.pdf but it's about $15 (and still surface mount): http://search.digikey.com/us/en/products/MAX1020BETX%2B/MAX1020BETX%2B-ND/1428009

It has already been done. Sure, it's 8-bit mono, but that is the same quality standard you use every day when you talk on the phone, and a $500 iPod makes no difference when it uses the same network. http://interface.khm.de/index.php/lab/experiments/arduino-realtime-audio-processing/ To make a voice untraceable, like the anonymous voices in TV shows, the modulation should use a sawtooth wave instead of a sine.

jwatte: Old-school voice scramblers work on a side-band signal principle. When you ring modulate a signal of frequency A with a signal of frequency B, you generate two signals shifted in frequency, A+B and A-B. If you make sure that they are separated enough, you can filter out one of them so that only the other remains, and send that through the wire. You then decode by doing the same thing again and filtering out the unwanted side-band. You can do this in many separate bands, similar to a music vocoder effect unit, if you want to make un-scrambling somewhat harder.

Right, but I was thinking of something even simpler. Produce three signals: one, a sync signal to synchronize the decoding; a second, with varying frequency, to mix with the audio; and a third to amplitude modulate the mixed signal, again with a varying but different frequency. Then mix the sync signal with the modulated signal. The sync signal would mark the beginning of a known procedure so that the decoding could be achieved. An Arduino with a minimal amount of analogue circuitry could control this process. You could even rotate the rates and sequences of the frequency changes. So, the Arduino would not actually read the signal and process it, but orchestrate the obfuscation of the signal.

Depending on how much you allow these signals to affect the audio, you could create all sorts of odd effects, or completely hide the signal such that it could no longer be heard by a human listener.

"Mixing" signals just means adding sounds together. You'll get the speech, plus some tone. "Amplitude modulating" the signal is the same thing as ring modulating. And I really don't think your process will sound the way you think it will. There would be no cryptographic strength in the encoding -- running a real cipher and spitting out bits like a modem would be much more secure. Also, encoding with the process you suggest requires pretty steep filtering to separate the lobes, and carefully selected ring modulation frequencies, because otherwise the process is not reversible. Steep filters are pretty hard to build from analog components.

OMG!

Okay. Look, I'm not talking about making it into some internationally acclaimed security solution, just scrambling an audio signal. And yes, I do know what it would sound like, and how it will work. I did the exact same thing 30 years ago with a 6502.

Why is it that everyone on the internet just assumes everyone else is an idiot?

Yeah, ur rite. Mez jus an id jot. meez i-kews onlee sevbenty. uz rok dewd!

Go with your god.

I did the exact same thing 30 years ago with a 6502.

If I recall, that chip is the CPU of the VIC-20? Right… So using the Arduino should be the same process you used 30 years ago. But with a “twist” … I think …

And I agree with you that it is possible to build a voice scrambler.

Anyway, we are just brainstorming here…

Yes. Most notably it was the CPU in the Apple II. Variants of it could be found in the Vic-20, C-64, Atari 400 and 800, OSI systems, UK101 and about 20 other systems back then. It was also the main CPU choice for Bally and Atari's commercial arcade and bar video games, as well as the Atari 2600.

It is still manufactured today in 20 MHz CMOS versions and as IP cores by Western Design Center, and it has been in constant use in new products since its release in the late 70s.

I have toyed with a concept for a stream compressor that might work well for a scrambler. My idea was to take blocks of bytes and rotate them on edge. Visualize each byte as a playing card lying flat on the table, with the cards stacked vertically. Now take that deck and stand it on end, so that all of the LSB values are at the bottom and the MSB values at the top. Read one bit from each card and assemble a byte representing a group of bits from many bytes. Then run-length encode these bytes and transmit them as relative change vectors.
Say, for instance, that out of 24 byte samples of bit 0 we had 3 repeats of the bit pattern 11010111; we would store that as a nibble and a byte: 111,11010111.
The nibble is the count of repeats of the pattern that follows. Many types of media have large areas of slowly changing mid-to-MSB values. While this is similar to RLE, it is quite different, as RLE counts the number of times a byte value repeats, not a bit value.
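A rough sketch of the "deck on edge" idea, as I read it: split a block into eight bit planes, pack each plane eight samples per byte, then run-length encode the packed bytes. The function names, the packing order (first sample in the top bit) and the cap of 15 repeats per nibble are my assumptions, and the "relative change vectors" step is not covered here.

```python
def bit_planes(block):
    """Stand the block 'on edge': plane k holds bit k of every sample, 8 per byte."""
    if len(block) % 8:
        raise ValueError("block length must be a multiple of 8")
    planes = []
    for bit in range(8):                          # bit 0 = LSB ... bit 7 = MSB
        plane = bytearray()
        for start in range(0, len(block), 8):
            packed = 0
            for sample in block[start:start + 8]:
                packed = (packed << 1) | ((sample >> bit) & 1)
            plane.append(packed)
        planes.append(bytes(plane))
    return planes

def run_length(plane, max_run=15):
    """Collapse repeated bytes into (count, pattern) pairs; count fits in a nibble."""
    runs = []
    for pattern in plane:
        if runs and runs[-1][1] == pattern and runs[-1][0] < max_run:
            runs[-1] = (runs[-1][0] + 1, pattern)
        else:
            runs.append((1, pattern))
    return runs

# 24 samples whose bit 0 follows 1,1,0,1,0,1,1,1 three times over,
# with the MSB held constant (a made-up block matching the example above).
lsb_pattern = [1, 1, 0, 1, 0, 1, 1, 1]
block = bytes(0x80 | b for b in lsb_pattern * 3)

planes = bit_planes(block)
for bit in (0, 7):
    encoded = [(count, format(pattern, "08b")) for count, pattern in run_length(planes[bit])]
    print("bit", bit, "->", encoded)
```

For this block, plane 0 collapses to a single pair: a count of 3 followed by the pattern 11010111.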

It is still manufactured today in 20 MHz CMOS versions

I am surprised it is still being manufactured.

Why is it that everyone on the internet just assumes everyone else is an idiot?

Usually they are just playing the odds. :D

Certainly you could compress these MSBits of an audio stream like that. But how about the next bit? The LSB? Wouldn't work.
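A quick way to check this is to count how many runs each bit plane breaks into for a slowly changing signal. The sketch below uses one slow sine cycle as a stand-in for audio; the signal, the block size and the helper names are assumptions, and real speech with noise in the low bits would likely look worse for the lower planes.

```python
import math

def plane_bytes(samples, bit):
    """Pack bit `bit` of each sample, eight samples per output byte."""
    out = []
    for start in range(0, len(samples), 8):
        packed = 0
        for s in samples[start:start + 8]:
            packed = (packed << 1) | ((s >> bit) & 1)
        out.append(packed)
    return out

def count_runs(values):
    """Number of runs of identical consecutive values."""
    return sum(1 for i, v in enumerate(values) if i == 0 or v != values[i - 1])

# Assumed test signal: one slow sine cycle, 8-bit unsigned, 512 samples.
samples = [round(128 + 100 * math.sin(2 * math.pi * i / 512)) for i in range(512)]

runs_per_plane = {bit: count_runs(plane_bytes(samples, bit)) for bit in range(8)}
for bit in range(7, -1, -1):
    print(f"bit {bit}: {runs_per_plane[bit]} runs in {len(samples) // 8} packed bytes")
```

Fewer runs means better run-length compression, so comparing the count for bit 7 against the count for bit 0 shows how quickly the benefit fades toward the LSB.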