Client vs. Server Architecture: Why Google Voice Search Is Also Much Faster Than Siri

In writing about the latest version of Google's voice search that is now available for iOS devices, I came across many references to the differences between how Apple's Siri and Google's product handle voice recognition. It seems clear that these architectural decisions are in large part responsible for the speed differential between the two applications.

Simply put, Google Voice Search, which is a feature of the Google Search app, performs the voice recognition for each query on the client side, while Apple's Siri processes these requests on the server side. This means that when you push the microphone icon on the Google app and start talking, the software process required to understand what you are saying is occurring on the device itself. When you perform the equivalent action with Siri, your device passes that information to a remote server, which processes the request and then returns an answer, in pieces, back to your device. As you add words to your query, Siri adjusts the output until there is enough of a lull that it is convinced you are done. The advantage of Apple's method is that it enables "server-side learning," so the system gets smarter overall the more it is used. The disadvantage is that, depending on how long the query is and how clearly you speak, there can be a lot of back and forth (HTTP requests) to get an answer. In actual use, this distinction causes a noticeable lag in Siri's response time compared to the almost instantaneous recognition from Google's app.
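To make the contrast concrete, here is a minimal TypeScript sketch of the two architectures as described in this paragraph. Every endpoint, object, and function name here is invented for illustration; none of it corresponds to an actual Apple or Google API.

```typescript
// Hypothetical on-device recognizer; a real one would wrap a local speech model.
interface LocalRecognizer {
  transcribe(audio: ArrayBuffer): Promise<string>;
}
declare const localRecognizer: LocalRecognizer;

// On-device recognition: transcribe locally, then make a single request
// to the search server.
async function clientSideVoiceSearch(audio: ArrayBuffer): Promise<string> {
  const transcript = await localRecognizer.transcribe(audio); // runs on the phone
  const res = await fetch(
    `https://search.example.com/query?q=${encodeURIComponent(transcript)}`
  );
  return res.text(); // one round trip total
}

// Server-side recognition: stream audio up as you speak, let the server
// refine its transcript, and only get an answer once the audio goes quiet.
async function serverSideVoiceSearch(
  audioChunks: AsyncIterable<ArrayBuffer>
): Promise<string> {
  let transcript = "";
  for await (const chunk of audioChunks) {
    const res = await fetch("https://assistant.example.com/recognize", {
      method: "POST",
      body: chunk,
    });
    transcript = await res.text(); // server sends back its latest guess
  }
  const answer = await fetch(
    `https://assistant.example.com/answer?q=${encodeURIComponent(transcript)}`
  );
  return answer.text(); // many round trips before this final one
}
```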

When you get under the hood of what makes for great app experiences, this is the kind of stuff you come across. The design, the UI, is what you see of an app, but the underlying architecture and how it affects performance is what you feel. Before discussing this further, I want to make a short detour into the world of actual architecture.

A friend of mine studied with the great Spanish architect Rafael Moneo at Harvard. The Pritzker Prize-winner gave a lecture in the late 80s where he described, in his thick Spanish accent, one of his own buildings as being "about sickness and depth." His students scratched their heads and spent hours trying to puzzle out this enigmatic utterance until they realized that he had actually said "thickness and depth!"

When my friend told me this, over Taiwanese food in Boston, I had a bit of an Ah-Ha moment. I was on my way to a geeky Node.js "Framework Smackdown" hosted at Brightcove. The discussions all hinged on what has become a more important distinction than the one between HTML5 web apps and native apps: the "thickness" of the client. See developer Eric Sink's recent discussion of this perennial debate, and also the concluding section of this chestnut from Facebook's James Pearce, for more detail. Sink, and ultimately Pearce, focus their arguments on the evolving nature of the client side of the equation, which enables developers to be more specific about architectural decisions than just saying, "it depends." Most of the progress in web and app development over the past 5-7 years has come from "thickening" the client side of the equation and from making the data flows between client and server asynchronous. AJAX (which stands for Asynchronous JavaScript and XML) was the first and most famous expression of this, and Node.js is perhaps the best-known recent example of the trend.
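For readers who don't live in this world, here is a minimal sketch of the asynchronous, "thicker client" pattern that AJAX popularized. The `/api/search` endpoint and the `results` element are made up for the example.

```typescript
// Sketch of the AJAX-style "thick client" pattern: the browser asks the
// server only for data and updates the page itself, rather than asking the
// server to render an entirely new page for every interaction.
async function updateResults(query: string): Promise<void> {
  // Asynchronous request: the UI stays responsive while this is in flight.
  const res = await fetch(`/api/search?q=${encodeURIComponent(query)}`);
  const items: string[] = await res.json();

  // The client does the rendering work, which is what "thickens" it.
  const list = document.getElementById("results");
  if (list) {
    list.innerHTML = items.map((item) => `<li>${item}</li>`).join("");
  }
}
```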

But coming back to Moneo's quote, it occurs to me that the two important parameters to look at in the design of an app's architecture (the relationship between what happens on the server and what happens on the client, and how to handle the back and forth between them) are the optimal "thickness" of the client in relation to the "depth" of the underlying data. In the case of the present face-off, both apps have a tremendous depth of data that they are accessing. Google's data is significantly deeper and wider, but in terms of design decisions, both are effectively bottomless. An obvious thought is that since search is an "offboard" activity anyway, why "onboard" any part of the process?

The answer comes down to the programming paradigm of the separation of concerns. Making the voice recognition a function of the device's own hardware makes that aspect of the application asynchronous with the actual database query. While Siri is going back and forth multiple times, Google is resolving the statement of the query on the client and then making a single request to the server. I will write about this more in a forthcoming post, but one of the things that characterizes mobile devices, in contrast to desktop computers, is the amount of interaction data (whether voice or touch) that flows between the user and the device. The way Google Voice Search optimizes for this makes it a much zippier choice.
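As a rough illustration of that separation of concerns, here is a sketch (all names invented) in which recognition and search sit behind separate interfaces, so whether transcription happens on the device or on a remote server is invisible to the query stage.

```typescript
// Concern #1: turning speech into text. Could run on the device or remotely.
interface Recognizer {
  transcribe(audio: ArrayBuffer): Promise<string>;
}

// Concern #2: answering a textual query against the backing data.
interface SearchBackend {
  query(text: string): Promise<string>;
}

// The pipeline only depends on the contracts, not on where each stage runs.
// With an on-device Recognizer, the network sees exactly one request here.
async function voiceSearch(
  audio: ArrayBuffer,
  recognizer: Recognizer,
  backend: SearchBackend
): Promise<string> {
  const text = await recognizer.transcribe(audio);
  return backend.query(text);
}
```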

Correction: Although the point I was trying to make here is true in a general sense, it appears that I made an unsupported assumption about Google's product that turns out not to be true. According to a representative from Google, in terms of the voice recognition functions of its new iOS search app, "We do very little on the device itself, and in fact the vast majority of processing goes on in the back-end. Just so happens those servers are really fast!"

This pushes the performance differences between Siri and Google's voice recognition software back into the server-side, big-data territory of Google's content advantage. It's still a faster product, just not for the reasons I thought. The only upside of making an incorrect assumption is that it elicited a reaction from Google, which clarifies a point that some other commentators have gotten wrong as well. Always happy to be the canary in the coal mine!

– – – – – – – – – – – – – – – – – – – –

To keep up with Quantum of Content, please subscribe to my updates on Facebook or follow me on Twitter.