
Apple is at it again. Its latest research significantly enhances speech recognition capabilities and reduces resource consumption while retrieving information from a large database.
A challenge in using ASR (automatic speech recognition) systems is identifying rare and user-specific terms. To tackle this challenge, ASR models can use NCB (neural contextual biasing) to retrieve information from external, user-provided databases, improving speech recognition.
However, processing large amounts of external data in NCB demands massive computational and memory resources.
Apple sets out to solve this problem in two stages, marking its foray into the world of quantisation. First, the research uses vector quantisation to shortlist only the biasing entries relevant to the input. “Quantisation-based technique allows the ASR model to leverage more biasing entries that would otherwise be discarded due to excessive compute and memory cost,” the authors said.
The model then uses a ‘cross attention’ mechanism to apply the selected biases to improve speech recognition.
“We proposed an efficient approximation to cross-attention that uses vector quantisation techniques, allowing us to ground large biasing catalogues quickly on audio data with limited memory footprint,” said the researchers.
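The two stages described above can be illustrated with a toy example. This is a minimal sketch of the general idea only, not Apple's implementation: the codebook, the entry vectors, the query and all function names are hypothetical, and the sketch uses the same vectors as keys and values for simplicity. Stage one scores every biasing entry through the code it was quantised to, so the query is compared against a handful of codes rather than every entry. Stage two runs ordinary softmax cross-attention over the shortlist alone.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nearest_code(vec, codebook):
    """Index of the codebook vector closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))

def shortlist(query, entry_codes, codebook, k):
    """Stage 1: score each entry via its code only and keep the top k entries."""
    # One dot product per code, not per entry: this is where the saving comes from.
    code_scores = [dot(query, code) for code in codebook]
    ranked = sorted(range(len(entry_codes)),
                    key=lambda i: code_scores[entry_codes[i]], reverse=True)
    return ranked[:k]

def cross_attention(query, keys, values):
    """Stage 2: exact softmax attention over the shortlisted entries."""
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    peak = max(scores)                      # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights

# Hypothetical toy data: three codes and five biasing entries in two dimensions.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
entries = [[0.9, 0.1], [0.1, 0.8], [-0.7, 0.2], [1.1, -0.1], [0.0, 1.2]]
entry_codes = [nearest_code(e, codebook) for e in entries]  # done once, offline

query = [1.0, 0.2]                          # stands in for an audio-derived query
top = shortlist(query, entry_codes, codebook, k=2)          # [0, 3]
selected = [entries[i] for i in top]
context, weights = cross_attention(query, selected, selected)
```

In a real system the catalogue would hold thousands or millions of entries, which is where scoring against a small codebook instead of every entry would matter.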
The results showed a 20% reduction in compute time and, more importantly, a 71% reduction in error rate. The authors also mentioned that the technique processes millions of entries in biasing databases without significant degradation in recognition quality.
No More Excuses?
Apple took its time to join the AI bandwagon, yet Apple Intelligence wasn’t received well. It was available only on iPhone 15 Pro and later models, which meant that last year’s iPhone 15 would miss out on it. The company cited the lack of RAM on older iPhones as a barrier to Apple Intelligence.
This was also because Apple insisted on running AI features on-device, using local hardware. Popular insider Mark Gurman reported that Apple admitted failing to achieve the expected performance levels on older iPhones with lower RAM capacity. The only other option was to offload all tasks to the cloud, which would go against its privacy promise.
“You could, in theory, run these models on a very old device, but it would be so slow that it would not be useful,” said Apple’s AI/machine learning head John Giannandrea.
Therefore, this latest research seems promising. Apart from reducing compute time and error rates, the authors revealed an 85-95% reduction in memory usage.
It would be premature to speculate whether this indicates Apple’s plans to bring Apple Intelligence to older iPhones, but it does strengthen the company’s commitment to enhancing AI and keeping it on-device in future devices as well.
Moreover, voice assistants, including Siri, are far from perfect. Earlier this year, a survey mentioned that Siri experienced difficulty in recognising specific accents from certain regions in the USA. Even with the release of Apple Intelligence, some users reported that Siri was struggling to recognise even the easiest of phrases.
“It’s funny to see this post now. Five minutes ago, I tried to use dictation to say, ‘upload speeds are slower but otherwise it’s working’, but what my phone heard was ‘happy birthday’,” said a user on Reddit.
A few users also mentioned that they haven’t noticed any significant changes in Siri, despite successive updates to Apple Intelligence. To be fair, most of the feedback on Apple Intelligence comes from beta versions, where discrepancies are to be expected. Still, there’s certainly room for improvement, given the capabilities shown by the research.
Not the First Time
This isn’t the first time Apple has explored techniques to improve speech recognition. Earlier this year, in May, Apple introduced an error correction model called Denoising LM (DLM). To train it, a TTS system synthesises audio from text, an ASR system transcribes that audio into noisy hypotheses, and each hypothesis is then paired with the original text to act as training data.
DLM achieved a favourable, low WER (word error rate) of under 3% on most benchmarks. DLM can also be applied to more diverse datasets, which further improves the accuracy and performance of ASR.
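For readers unfamiliar with the metric, WER is the number of word substitutions, deletions and insertions needed to turn the recognised text into the reference, divided by the number of words in the reference. The function below is a minimal sketch of that standard definition using a word-level edit distance; the function name is our own, and real evaluations usually also normalise casing and punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One wrong word out of six gives a WER of about 16.7%.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
```

A WER under 3% therefore means fewer than three word errors per hundred reference words.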
Moreover, Apple has also used quantisation, among many other optimisation techniques, on its on-device models. This helped reduce latency to 0.6 milliseconds per prompt token and increase the token generation rate to 30 tokens per second.
That said, there’s also a fair critique of ‘aggressive’ quantisation methods. A recent study explored the limitations of these methods. “Despite the popularity of LLM quantisation for inference acceleration, significant uncertainty remains regarding the accuracy-performance trade-offs,” it said.
After evaluating all techniques, the authors said that W8A8FP (8-bit floating-point weights and activations) is the most effective technique for achieving ‘near lossless accuracy’.
However, Apple’s approach to quantisation in improving ASR uses FSQ (Finite Scalar Quantisation). This method mainly offers raw efficiency gains, particularly in retrieval-heavy tasks, whereas W8A8FP is better suited to broader, general-purpose use cases.
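To give a sense of how FSQ differs from weight-precision schemes like W8A8FP, here is a minimal sketch of the general technique, not of Apple's system. FSQ squashes each coordinate of a small vector into a fixed range and rounds it to the nearest integer level, so the codebook is implicit in the grid of levels rather than learned. The level counts, the input vector and the function names below are hypothetical, and the sketch assumes an odd number of levels per dimension to keep the rounding symmetric.

```python
import math

def fsq_quantise(vec, levels):
    """Bound each coordinate with tanh, then round it to an integer level."""
    codes = []
    for x, num_levels in zip(vec, levels):
        half = (num_levels - 1) / 2         # e.g. 5 levels -> integers -2..2
        bounded = math.tanh(x) * half       # squash into [-half, half]
        codes.append(round(bounded))        # snap to the nearest level
    return codes

def code_index(codes, levels):
    """Flatten the per-dimension codes into a single codebook index."""
    index = 0
    for code, num_levels in zip(codes, levels):
        half = (num_levels - 1) // 2
        index = index * num_levels + (code + half)
    return index

levels = [5, 5, 3]                          # implicit codebook of 5 * 5 * 3 = 75 entries
codes = fsq_quantise([0.3, -2.0, 0.9], levels)   # [1, -2, 1]
index = code_index(codes, levels)                # 47
```

Each vector is stored as one small integer, which is the kind of compact representation that makes large retrieval catalogues cheaper to keep in memory.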
A Little Too Late?
Apple definitely isn’t alone in pioneering research and development on powerful ASR systems.
While OpenAI’s Whisper and Meta’s Massively Multilingual Speech (MMS) research models were released a few years ago, some of the latest improvements to ASR also come from Huawei, AssemblyAI and Moonshine.
Huawei’s Hard-Synth uses advanced text-to-speech and LLMs for data augmentation and creates ‘challenging’ audio samples to improve an ASR’s performance while reducing biases in gender and speaker recognition.
AssemblyAI’s latest Universal-2 improves proper noun recognition by 24% over its predecessor, Universal-1. It also enhances transcription features like precise timestamps and formatting accuracy.
On the other hand, Moonshine developed a low-latency ASR system targeted at short audio segments and for local, on-device deployments. Moonshine’s model outperformed OpenAI’s market-leading Whisper while maintaining strong accuracy in shorter sequences. It also demonstrated a five-fold reduction in computing demands compared to OpenAI’s lightweight Whisper Tiny.en model.
The post Hey Siri, You There? appeared first on Analytics India Magazine.


