voice control

It’s interesting to see the fuss around Samsung’s use of voice control in its Smart TVs, because we’re going to see this happening with increasing frequency and urgency as voice-powered devices are more deeply integrated into our personal spaces. As well as other Smart TV models, Microsoft Kinect is already in millions of homes, and Amazon Echo is beginning to roll out.

These devices work in similar ways: you activate voice search with an opt-in command (“Hi TV”; “Xbox On”; “Alexa”). Android (“OK Google”) and iOS (“Hey Siri”) devices also function this way, but usually require a button press to use voice search (except when on the home screen of an unlocked device) – although I imagine future iterations will more widely use activation commands, especially on home systems like Android TV and Apple TV (with HomeKit).

Whatever system is used, once it’s activated by voice, a brief audio clip of the user’s command or query is recorded and transmitted to a cloud server stack, which is required to run the deep learning algorithms that make sense of human speech.
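To make that flow concrete, here’s a rough sketch of the activate-then-record pattern described above. It’s an illustration only, not how any of these products are actually built: audio frames are stood in for by plain strings, and `WAKE_WORD`, `CLIP_FRAMES`, `FakeCloud` and `listen` are all names I’ve made up for the example.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

WAKE_WORD = "hi tv"   # hypothetical activation phrase
CLIP_FRAMES = 3       # hypothetical: number of frames captured after activation


@dataclass
class FakeCloud:
    """Stand-in for the remote speech service; it only records what it receives."""
    received: List[List[str]] = field(default_factory=list)

    def transcribe(self, clip: List[str]) -> None:
        self.received.append(clip)


def listen(frames: Iterable[str], cloud: FakeCloud) -> None:
    """Discard audio locally until the wake word is heard, then send a short clip."""
    clip: List[str] = []
    capturing = False
    for frame in frames:
        if capturing:
            clip.append(frame)
            if len(clip) == CLIP_FRAMES:
                cloud.transcribe(clip)       # only this short clip leaves the device
                clip, capturing = [], False
        elif frame.lower() == WAKE_WORD:
            capturing = True                 # the wake word is matched on the device
    if clip:                                 # stream ended mid-clip: send what we have
        cloud.transcribe(clip)


cloud = FakeCloud()
listen(["private chat", "Hi TV", "volume", "up", "please", "more chat"], cloud)
print(cloud.received)
```

In this sketch nothing said before the activation phrase is sent anywhere, but everything in the few frames after it is, whether or not the activation was intended.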

The fear is that with any of these devices you could accidentally activate the voice service, then reveal personal data in the following few seconds of audio, which would be transmitted to the cloud servers – and potentially made available to untrusted third parties.

Given that this risk is present on all devices with voice activation, the differences I can see in the case of Samsung’s Smart TV are:

the terms explicitly warn you that a data leak is a possibility;
the voice analysis uses third-party deep learning services instead of their own;
Samsung don’t say who those third parties are, or why they’re needed; and
it’s on your TV.

This leaves me with a lot of questions (and, I’m afraid, no good answers yet).

Could the first point really be at the root of the unease? Is it simply the fact that this potential privacy breach has been made clear and now we must confront it? Would ignorance be preferable to transparency?

If Microsoft’s Kinect is always listening for a voice activation keyword, and uses Azure cloud services for analysing your query, does the only difference lie in Samsung’s use of a third party? Or is it their vague language around that third party; would it make a difference if they made clear it would only be shared with Nuance (who also provide services for Huawei, LG, Motorola and more)? When the Xbox One launched there were concerns around the ‘always listening’ feature, which Microsoft alleviated with clear privacy guidelines. Is better communication all that’s needed?

If our options are to put trust in someone, or go without voice control altogether (something that’s going to be harder to resist in the future), then who do you trust with the potential to listen to you at home? Private corporations, as long as it’s them alone? No third parties at all, or third parties if they’re named and explained? Or what about if a government set up a central voice data clearing service; would you trust that? What safeguards and controls would be sufficient to make us trust our choice?

Aside: what would be the effect if the service we’ve trusted with our voice data began acting on it? Say, if Cortana recognised your bank details, should it let you know that you’ve leaked them accidentally? What are the limits of that? Google in Ireland displays the phone number of the Samaritans when you use text search to find information about suicide; would it be different if it learned that from accidental voice leaks? What if a child being abused by an adult confided in Siri; would you want an automated system on Apple’s servers to contact an appropriate authority?

Finally, could the difference be as simple as the fact that Samsung have put this in a TV? Is it unexpected behaviour from an appliance that’s had a place in our living rooms for sixty years? If it were a purpose-built appliance such as Amazon’s Echo, would that change the way we feel about it?

This is just a small selection of the types of questions with which we’re going to be confronted with increasing frequency. There’s already a tension between privacy and convenience, and it’s only going to become stronger as voice technology moves out of our pockets and into our homes.

As I said, I don’t have answers for these questions. I do, however, have some (hastily considered) suggestions for companies that want to record voice data in the home:

Privacy policies which clearly state all parties that will have access to data, and why, and give clear notice of any changes.
A plainly-written explanation of the purpose of voice control, with links to the privacy policy, as part of the device setup process.
The ability to opt out of using voice activation, with a hardware button to instigate actions instead.
Obvious audio and visual indicators that voice recording has started, and is taking place.
An easily-accessible way to play back, manage and delete past voice clips.
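
As a very rough sketch of how the last three suggestions might fit together on a device, assuming nothing about any real product: `VoiceSettings` and all of its fields and methods are hypothetical names, clips are plain strings, and the `events` list stands in for a light or chime.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VoiceSettings:
    """Hypothetical on-device voice settings; every name here is illustrative."""
    voice_activation: bool = True                        # user can opt out of wake-word listening
    clips: Dict[int, str] = field(default_factory=dict)  # past clips the user can review
    events: List[str] = field(default_factory=list)      # stands in for a light or chime
    _next_id: int = 1

    def start_recording(self, trigger: str) -> bool:
        """Always allow a button press; allow voice only if the user has opted in."""
        if trigger == "voice" and not self.voice_activation:
            return False
        self.events.append("indicator_on")               # make it obvious recording has begun
        return True

    def save_clip(self, audio: str) -> int:
        """Store a finished clip, switch the indicator off, and return the clip's id."""
        clip_id = self._next_id
        self.clips[clip_id] = audio
        self._next_id += 1
        self.events.append("indicator_off")
        return clip_id

    def delete_clip(self, clip_id: int) -> bool:
        """Remove a stored clip; returns False if there was nothing to delete."""
        return self.clips.pop(clip_id, None) is not None


settings = VoiceSettings(voice_activation=False)
print(settings.start_recording("voice"))    # opted out, so voice cannot start a recording
print(settings.start_recording("button"))   # the hardware button still works
```

The point of the sketch is only that opting out, indicating, and deleting are small, separable pieces of behaviour; a real device would also need the policy and setup-screen parts, which aren’t code.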

Many companies supply some or all of these already; I think we should be looking at this as a minimum for the next wave of devices.

Update: Here’s a look at how other companies communicate their privacy policies on monitoring.