Researchers have developed an eavesdropping attack for Android devices that can, to varying degrees, recognize a caller's gender and identity, and even discern private speech.
EarSpy is a side-channel attack that explores a new avenue for eavesdropping: it captures motion sensor readings caused by the reverberations of a mobile device's ear speaker.
EarSpy is an academic project by researchers from five American universities: Texas A&M University, New Jersey Institute of Technology, Temple University, University of Dayton, and Rutgers University.
This type of attack was previously explored using smartphone loudspeakers. Ear speakers, however, were considered too weak to generate enough vibration to pose an eavesdropping threat.
Modern smartphones have more powerful stereo speakers than models from a few years ago. They produce better sound quality and stronger vibrations.
Modern devices also use more sensitive motion sensors and gyroscopes, which can record even the smallest resonances coming from the speakers.
The researchers' spectrograms show the progress made. The ear speaker of a 2016 OnePlus 3T barely registers on the spectrogram, while the stereo ear speakers of a 2019 OnePlus 7T produce significantly more data.
Experiment and results
The researchers tested their hypothesis on a OnePlus 7T and a OnePlus 9, using varying sets of prerecorded audio. The audio was played only through the ear speakers of the two devices.
Additionally, the team used the third-party app ‘Physics Toolbox Sensor Suite’ to collect accelerometer data during a simulated phone call. The data was then fed into MATLAB to analyze it and extract features from the audio stream.
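To make the feature-extraction step more concrete, here is a minimal sketch in Python rather than MATLAB. It is not the researchers' pipeline: the function names, the synthetic 120 Hz test signal, and the 500 Hz sampling rate are all assumptions chosen for illustration. It computes a few time-domain statistics and the dominant frequency of a single accelerometer axis.

```python
import cmath
import math


def time_domain_features(samples):
    """Return simple time-domain statistics of one accelerometer axis."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    # Count sign changes of the mean-removed signal.
    centered = [x - mean for x in samples]
    zero_crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0)
    )
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "rms": rms,
        "zero_crossings": zero_crossings,
    }


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n


# Synthetic stand-in for a z-axis accelerometer trace: a gravity offset plus
# a small 120 Hz vibration, sampled at 500 Hz for one second (assumed values).
SAMPLE_RATE_HZ = 500
trace = [
    9.81 + 0.02 * math.sin(2 * math.pi * 120 * i / SAMPLE_RATE_HZ)
    for i in range(SAMPLE_RATE_HZ)
]

features = time_domain_features(trace)
features["dominant_hz"] = dominant_frequency(trace, SAMPLE_RATE_HZ)
print(features)
```

The naive DFT keeps the sketch free of dependencies. A real analysis would more likely use an FFT library and a much richer feature set.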
A machine learning (ML) algorithm was trained on readily available datasets to recognize speech content, caller identity, and gender.
Although the results varied by device and dataset, they were promising overall for eavesdropping via the ear speaker.
On the OnePlus 7T, caller gender identification ranged from 77.7% to 98.7% and caller ID classification ranged from 63.0% to 91.2%, while reported speech recognition accuracies were 51.8%, 56.4%, and 61.2%.
The researchers explain in their paper that they evaluated time- and frequency-domain features with classical ML algorithms, which reached a highest accuracy of 56.42%.
“As there are 10 classes, the accuracy still displays five times more accuracy than a random guess which suggests that vibrations due to the ear speakers induced a reasonable amount of distinguishable impact on accelerometer datasets,” the researchers write.
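The "five times" comparison can be checked with a short calculation. This sketch assumes the 10 classes are equally likely, so a random guess is right one time in ten. The variable names are illustrative, and only the 56.42% figure and the class count come from the text.

```python
num_classes = 10            # number of classes, as stated in the quote
reported_accuracy = 0.5642  # highest accuracy reported for classical ML

# Assumption: classes are balanced, so chance accuracy is 1 / num_classes.
chance_accuracy = 1 / num_classes
improvement = reported_accuracy / chance_accuracy

print(f"chance accuracy:   {chance_accuracy:.0%}")
print(f"reported accuracy: {reported_accuracy:.2%}")
print(f"improvement:       {improvement:.2f}x over a random guess")
```

Under that assumption the ratio comes out a little above five, which matches the researchers' wording.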
On the OnePlus 9, gender identification topped out at 88.7% and speaker identification dropped to an average of 73.6%, while speech recognition ranged from 33.3% to 41.6%.
For comparison, using the loudspeaker and the application the researchers developed while experimenting with a similar attack in 2020, caller gender and ID accuracy both reached 99%, while speech recognition reached 80%.
Solutions and limitations
The volume that users select for their ear speakers can reduce the effectiveness of the EarSpy attack. A lower volume could prevent eavesdropping via this side channel, and it is also more comfortable for the ear.
The arrangement of the device's hardware components and the tightness of the assembly also affect how speaker reverberation spreads through the phone.
Finally, user movement or vibrations introduced from the environment lower the accuracy of the derived speech data.
Android 13 introduces a restriction on collecting sensor data without permission at sampling rates above 200 Hz. This prevents speech recognition at the default sampling rate (400 Hz to 500 Hz), but it only lowers accuracy by about 10% if the attack is performed at 200 Hz.
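One way to see why the sampling-rate cap matters is the Nyquist limit: a sensor sampled at a given rate can only represent vibration frequencies up to half that rate. The sketch below is a general signal-processing illustration, not something taken from the paper, and the helper name is hypothetical. It applies the limit to the rates mentioned above.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable without aliasing at this sampling rate."""
    return sample_rate_hz / 2


# Rates taken from the text: the 200 Hz cap and the 400-500 Hz default range.
for rate in (200, 400, 500):
    print(f"{rate} Hz sampling captures content up to {nyquist_hz(rate):.0f} Hz")
```

Capping the rate at 200 Hz therefore narrows the band of speaker vibrations an app can observe without permission, but does not eliminate it.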
The researchers suggest that phone manufacturers keep sound pressure stable during calls and place motion sensors where internally originating vibrations do not affect them, or at least have the minimum possible impact.