Sunday, August 17, 2008

Acoustic Imaging and The Dark Knight

I'm going to do my best to keep this from being a spoiler, but if you don't trust me, stop reading now. I saw The Dark Knight in an IMAX theater last night, and I really enjoyed it (for its genre). Movies have a tendency to bestow embarrassingly unrealistic capabilities on common technological devices. Computers in general are notoriously abused (Tron, The Ghost In The Machine, any modern detective mystery, etc.). Another commonly abused electronic device is the surveillance camera. A stereotypical example is the scene where a car drives by and the forensics team uses advanced image "enhancement" to zoom in further and further until they resolve the license plate number. That violates so many laws (of physics and of math) that I don't even know where to start. However, I'd like to preemptively defend The Dark Knight on one seemingly ridiculous misuse of technology: the humble cell phone. Yes, what they attempted to do with cell phones isn't quite possible... but it's almost possible, so I was somewhat impressed.

In the movie, a system is developed for analyzing the sounds recorded by cell phones to create a three-dimensional representation of the room, or environment, surrounding each phone. This is actually sort-of possible via modern acoustic imaging techniques; radio astronomy, passive sonar, and bistatic radar all use a related set of techniques. Here's the basic idea: if you have a suite of microphones recording the ambient sound in an environment, you can cross-correlate the signals recorded by each pair of microphones to estimate the locations of reflective surfaces (and sound sources) in the environment. Now, I said it's almost possible, not actually possible. You could do it if you carefully controlled the conditions, but in the real world there are just too many deal-breakers to actually pull it off. Even if the NSA wanted to do this sort of thing, and was prepared to pay big bucks to try, here are the problems they'd have to contend with:
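To make the cross-correlation idea concrete before diving into the problems, here's a minimal Python sketch of time-delay estimation between two microphones. Everything in it (sample rate, delay, signals) is invented for illustration; a real imaging system would do this across many microphone pairs and convert the estimated delays into geometry:

    # Minimal sketch of time-delay estimation by cross-correlation: the lag
    # of the correlation peak between two microphones is the difference in
    # arrival time of the same sound. All parameters are assumed for the demo.
    import numpy as np
    from scipy.signal import correlate

    fs = 48000                              # sample rate, Hz (assumed)
    c = 343.0                               # speed of sound in air, m/s

    rng = np.random.default_rng(0)
    source = rng.standard_normal(fs)        # 1 s of wideband ambient sound

    true_delay = 23                         # inter-microphone delay, samples
    mic_a = source
    mic_b = np.roll(source, true_delay)     # same sound, arriving later at B

    # Cross-correlate; the lag of the peak estimates the time difference.
    xc = correlate(mic_b, mic_a, mode="full")
    lags = np.arange(-len(mic_a) + 1, len(mic_b))
    tdoa = lags[np.argmax(xc)] / fs

    print(f"estimated delay: {tdoa * 1e6:.0f} microseconds "
          f"(path difference ~{tdoa * c * 100:.1f} cm)")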

  1. The processing is vastly simpler with an array of microphones than with a single microphone (cell phone). I think it might be possible, on paper, to do it with a single microphone, assuming it is moving in roughly random directions over the duration of the recording, you've calibrated the living hell out of the thing, and you know its precise location at each instant in time. My gut feeling is that even under these conditions, you would need hours of data to construct even the crudest of 3D images.
  2. Let's give them the benefit of the doubt and assume they only attempt this method when they have multiple people in the same room, all with cell phones. This provides a spatially diverse array of microphones, perfect for acoustic imaging, right? Well, the phones are all likely oriented in different directions, and microphone frequency response is a function of AOA (angle of arrival). Even assuming we have somehow calibrated (equalized) these microphones using some magical equalization technique capable of equalizing over all of AOA space, we still have to contend with the fact that the person holding each cell phone forms part of its beam pattern, and that is too big an unknown (and a dynamic one, at that) to be calibrated away.
  3. Microphones are hideously noisy devices. Don't believe me? Make some recordings using a normal microphone on your computer and analyze the data in MATLAB (or Python, if you aren't an evil software thief). All that self-noise places severe limits on the rate of convergence to a solution (see the first sketch after this list).
  4. There's just no good way to know the exact 3D location of a cell phone versus time. Yes, there are cell phones with built-in GPS, and there are geo-location techniques that can be combined with GPS data, but the acoustic wavelengths of interest are on the order of centimeters. We therefore need to know the location of the cell phone to within a fraction of the smallest wavelength; 1/10th of the smallest wavelength would be nice. That means knowing the phone's location down to millimeter-scale accuracy for each sample of the recording (the arithmetic is in the second sketch after this list). This is not, and will not be, possible in my lifetime. Please, prove me wrong.
  5. And finally, the most significant deal-breaker of all: data compression. Cell phones use extremely high compression ratios, and the compression scheme is optimized for the human voice and the human ear, and nothing else. It is horribly destructive to all other acoustic information, and most importantly, it destroys the phase relationships between the spectral components of the data. If you are interested in acoustic imaging, the phase relationships are absolutely the worst part of the data to degrade, because the phase is where the time-delay information lives. If you throw it away, or significantly degrade it, there's just no hope of forming an image by correlating the data against data from other microphones (the third sketch after this list shows the effect). It would be a little like eliminating all of the consonants in a sentence. Example: o e i i ei a u i io ae. Good luck figuring that out.
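First, a toy sketch of the noise problem from item 3. The sample rate, delay, and SNR are my own assumptions for the demo; the point is just that with noisy microphones, the correlation peak only stands out after long integration times:

    # Toy illustration of item 3: noisy microphones force long integration
    # times before a cross-correlation peak emerges from the background.
    import numpy as np
    from scipy.signal import correlate

    rng = np.random.default_rng(2)
    fs = 48000                 # sample rate, Hz (assumed)
    true_delay = 23            # inter-microphone delay, samples (assumed)
    snr = 0.3                  # signal-to-noise amplitude ratio (very noisy)

    for seconds in (0.1, 1.0, 10.0):
        n = int(fs * seconds)
        s = rng.standard_normal(n)                    # common ambient sound
        mic_a = snr * s + rng.standard_normal(n)      # signal + self-noise
        mic_b = snr * np.roll(s, true_delay) + rng.standard_normal(n)
        xc = np.abs(correlate(mic_b, mic_a, mode="full"))
        print(f"{seconds:5.1f} s of data: peak/background = "
              f"{xc.max() / np.median(xc):6.1f}")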
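Next, the back-of-the-envelope arithmetic behind item 4. The candidate upper frequencies are assumptions on my part; the lambda/10 rule of thumb then gives the positioning accuracy you'd need:

    # Back-of-the-envelope numbers for item 4: how precisely must we know
    # the phone's position? Rule of thumb: about 1/10th of the smallest
    # acoustic wavelength of interest. Upper frequencies are assumed.
    c = 343.0                                  # speed of sound in air, m/s

    for f_max in (4000.0, 8000.0, 20000.0):    # candidate upper frequencies, Hz
        wavelength = c / f_max                 # smallest wavelength, m
        accuracy = wavelength / 10.0           # required position knowledge, m
        print(f"f_max = {f_max / 1000:4.0f} kHz: lambda = "
              f"{wavelength * 100:4.1f} cm, need ~{accuracy * 1000:.1f} mm")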
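Finally, a crude demonstration of the phase problem in item 5. Randomizing the phase spectrum is not what a real voice codec does, but it's a stand-in for the kind of damage involved: the magnitude spectrum survives intact while the correlation peak collapses.

    # Toy illustration of item 5: preserve the magnitude spectrum but
    # scramble the phase (a crude stand-in for codec damage), and the
    # cross-correlation peak collapses.
    import numpy as np
    from scipy.signal import correlate

    rng = np.random.default_rng(1)
    n = 48000
    a = rng.standard_normal(n)
    b = np.roll(a, 37)             # delayed copy: should correlate strongly

    def scramble_phase(x):
        """Return a signal with x's magnitude spectrum but random phase."""
        spectrum = np.fft.rfft(x)
        phase = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, len(spectrum)))
        return np.fft.irfft(np.abs(spectrum) * phase, n=len(x))

    def peak_to_background(x, y):
        """Correlation peak height relative to the typical off-peak level."""
        xc = np.abs(correlate(y, x, mode="full"))
        return xc.max() / np.median(xc)

    print(f"intact phase:    {peak_to_background(a, b):7.1f}")
    print(f"scrambled phase: {peak_to_background(a, scramble_phase(b)):7.1f}")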
So, it was neat to see modern DSP techniques get a little public exposure in a mainstream movie. Acoustic imaging is fun stuff, and it is out there in the real world today, but it requires a lot of careful setup, calibration, and fancy algorithms (not to mention massive computers) to make it work.
