Data Privacy, Control, Transparency, and Regulation

I’ve written about privacy and personal data a few times before, and my conclusion generally remains the same: our data has value, and we should be able to benefit from the use of it, but we must be provided with control and transparency, backed up by strong regulation.

Pertinent to this, I was interested to read The Future Is Data Integrity, Not Confidentiality. This is an extract from a talk by Toomas Hendrik Ilves, President of Estonia, a country in the process of creating a digital society. In this talk he says:

“We have a law that says you own your own data. And you can see who has tried to access your data.”

And in What Happens Next Will Amaze You, the latest in a long line of excellent talks/essays by Maciej Cegłowski, he lays out six fixes for the busted internet power model (where users are somewhere near the bottom). These fixes include:

You should have the right to download data that you have provided, or that has been collected by observing your behavior, in a usable electronic format.

You should have the right to completely remove [your] account and all associated personal information from any online service, whenever [you] want.

Companies should only be allowed to store behavioral data for 90 days. Companies should be prohibited from selling or otherwise sharing behavioral data.

And, perhaps most important of all, there is a requirement for:

A legal mechanism to let companies make enforceable promises about their behavior.

This is exactly what I mean. This is what I think the future should look like: we benefit from our personal and aggregated public data, with control and transparency, backed up by strong regulation. Who do we talk to, to make this happen?


Data use and privacy in Web services

Tim Cook recently made a speech attacking Silicon Valley companies (e.g. Google and Facebook) for making money by selling their users’ privacy. The problem with what he said is that, first of all, it’s fundamentally incorrect. As Ben Thompson points out (subscription required):

It’s simply not true to say that Google or Facebook are selling off your data. Google and Facebook do know a lot about individuals, but advertisers don’t know anything — that’s why Google and Facebook can charge a premium! [They] are highly motivated to protect user data – their competitive advantage in advertising is that they have data on customers that no one else has.

Cennydd Bowles also argues the same point:

The “you are the product” thing is pure sloganeering. It sounds convincing on first principles but doesn’t hold up to analysis. It’s essentially saying all two-sided platforms are immoral, which is daft.

The @StartupLJackson Twitter account puts this more plainly:

People who argue free-to-customer data companies (FB/Goog/etc) are selling data & hurting consumers are the anti-vaxxers of our industry.

I’ve always maintained that this is about a value exchange – you can use my data, as long as I get control and transparency over who sees it, and a useful service in return. But beyond that, another problem with making premium services where you pay for privacy is that you make a two-tier system. Cennydd again:

The supposition that only a consumer-funded model is ethically sound is itself political and exclusionary (of the poor, children, etc).

And Kate Crawford:

Two-tier social media: the rich pay to opt out of Facebook ads, the poor get targeted endlessly. Privacy becomes a luxury good.

Aside: Of course this suits Apple: if wealthier customers can afford to opt out of advertising, then advertising itself becomes less valuable – as do, in turn, Google and Facebook.

The fact that people are willing to enter into a data exchange which benefits them when they get good services in return highlights the second problem with Tim Cook’s attack: Apple are currently failing to provide good services. As Thomas Ricker says in his snappily-titled Tim Cook brings a knife to a cloud fight:

Fact is, Apple is behind on web services. Arguably, Google Maps is better than Apple Maps, Gmail is better than Apple Mail, Google Drive is better than iCloud, Google Docs is better than iWork, and Google Photos can “surprise and delight” better than Apple Photos.

And even staunch Apple defender John Gruber agreed:

Apple needs to provide best-of-breed services and privacy, not second-best-but-more-private services. Many people will and do choose convenience and reliability over privacy. Apple’s superior position on privacy needs to be the icing on the cake, not their primary selling point.

As this piece by Jay Yarow for Business Insider points out, in the age of machine learning, more data makes better services. Facebook and Google are ahead in services because they make products that understand their users better than Apple do.


Samsung, Voice Control, and Privacy. Many Questions.

It’s interesting to see the fuss around Samsung’s use of voice control in its Smart TVs, because we’re going to see this happening with increasing frequency and urgency as voice-powered devices are more deeply integrated into our personal spaces. As well as other Smart TV models, Microsoft Kinect is already in millions of homes, and Amazon Echo is beginning to roll out.

These devices work in similar ways: you activate voice search with an opt-in command (“Hi TV”; “Xbox On”; “Alexa”). Android (“OK Google”) and iOS (“Hey Siri”) devices also function this way, but usually require a button press to use voice search (except when on the home screen of an unlocked device) – although I imagine future iterations will more widely use activation commands, especially on home systems like Android TV and Apple TV (with HomeKit).

Whatever system is used, once it’s activated by voice, a brief audio clip of the user’s command or query is recorded and transmitted to a cloud server stack, which runs the deep learning algorithms necessary to make sense of human speech.
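That flow can be sketched in a few lines of Python. This is a toy simulation – text “frames” stand in for audio, and every name in it is my own invention rather than any vendor’s real API – but it captures the important division of labour: keyword spotting happens locally on the device, and only the short clip following activation is sent to the cloud.

```python
# Toy simulation of the wake-word flow: text "frames" stand in for audio,
# and cloud_transcribe() stands in for a remote speech-recognition service.
# None of these names correspond to a real vendor API.

WAKE_WORDS = {"hi tv", "xbox on", "alexa", "ok google", "hey siri"}
CLIP_FRAMES = 3  # only a brief window after activation is captured

def is_wake_word(frame: str) -> bool:
    """Local, on-device check: nothing leaves the home at this stage."""
    return frame.strip().lower() in WAKE_WORDS

def cloud_transcribe(clip: list) -> str:
    """Stand-in for the remote deep-learning speech service."""
    return " ".join(clip)

def handle_stream(frames: list) -> list:
    """Return what gets sent to the cloud: only post-activation clips."""
    sent = []
    i = 0
    while i < len(frames):
        if is_wake_word(frames[i]):
            clip = frames[i + 1 : i + 1 + CLIP_FRAMES]
            sent.append(cloud_transcribe(clip))  # the privacy-sensitive step
            i += 1 + len(clip)
        else:
            i += 1  # everything else is discarded locally, never transmitted
    return sent

# Everything before the wake word stays on the device:
print(handle_stream(["chatter", "Alexa", "what's", "the", "weather", "chatter"]))
```

The accidental-activation fear, then, is a false positive in `is_wake_word`: the recording window opens at the wrong moment, and whatever is said in the next few seconds is what gets transmitted.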

The fear is that with any of these devices you could accidentally activate the voice service, then reveal personal data in the following few seconds of audio, which would be transmitted to the cloud servers – and potentially made available to untrusted third parties.

Given that this risk is present on all devices with voice activation, the differences I can see in the case of Samsung’s Smart TV are:

  1. the terms explicitly warn you that data leak is a possibility;
  2. the voice analysis uses third-party deep learning services instead of their own;
  3. Samsung don’t say who those third parties are, or why they’re needed; and
  4. it’s on your TV.

This leaves me with a lot of questions (and, I’m afraid, no good answers yet).

Could the first point really be at the root of the unease? Is it simply the fact that this potential privacy breach has been made clear and now we must confront it? Would ignorance be preferable to transparency?

If Microsoft’s Kinect is always listening for a voice activation keyword, and uses Azure cloud services for analysing your query, does the only difference lie in Samsung’s use of a third party? Or is it their vague language around that third party; would it make a difference if they made clear it would only be shared with Nuance (who also provide services for Huawei, LG, Motorola and more)? When the Xbox One launched there were concerns around the ‘always listening’ feature, which Microsoft alleviated with clear privacy guidelines. Is better communication all that’s needed?

If our options are to put trust in someone, or go without voice control altogether (something that’s going to be harder to resist in the future), then who do you trust with the potential to listen to you at home? Private corporations, as long as it’s them alone? No third parties at all, or third parties if they’re named and explained? Or what about if a government set up a central voice data clearing service, would you trust that? What safeguards and controls would be sufficient to make us trust our choice?

Aside: what would be the effect if the service we’ve trusted with our voice data began acting on it? Say, if Cortana recognised your bank details, should it let you know that you’ve leaked them accidentally? What are the limits of that? Google in Ireland reports the phone number of the Samaritans when you use text search to find information about suicide; would it be different if it learned that from accidental voice leaks? What if a child being abused by an adult confided in Siri; would you want an automated system on Apple’s servers to contact an appropriate authority?

Finally, could the difference be as simple as the fact that Samsung have put this in a TV? Is it unexpected behaviour from an appliance that’s had a place in our living rooms for sixty years? If it were a purpose-built appliance such as Amazon’s Echo, would that change the way we feel about it?

This is just a small selection of the types of questions with which we’re going to be confronted with increasing frequency. There’s already a tension between privacy and convenience, and it’s only going to become stronger as voice technology moves out of our pockets and into our homes.

As I said, I don’t have answers for these questions. I do, however, have some (hastily considered) suggestions for companies that want to record voice data in the home:

  • Privacy policies which clearly state all parties that will have access to data, and why, and give clear notice of any changes.
  • A plainly-written explanation of the purpose of voice control, with links to the privacy policy, as part of the device setup process.
  • The ability to opt out of using voice activation, with a hardware button to instigate actions instead.
  • Obvious audio and visual indicators that voice recording has started, and is taking place.
  • An easily-accessible way to play back, manage and delete past voice clips.

Many companies supply some or all of these already; I think we should be looking at this as a minimum for the next wave of devices.
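For what it’s worth, that minimum could even be expressed as a checklist a vendor audits a device against. The sketch below is purely illustrative – the field names are my own, not any standard:

```python
from dataclasses import dataclass

@dataclass
class VoicePrivacyChecklist:
    """Minimum privacy features for an in-home voice device.
    Field names are illustrative, not an industry standard."""
    third_parties: dict          # party name -> why they need access
    explained_at_setup: bool     # plain-language explanation during setup
    hardware_fallback: bool      # can opt out of voice activation entirely
    recording_indicators: tuple  # e.g. ("chime", "LED") when recording
    clip_management: bool        # play back, manage and delete past clips

    def meets_minimum(self) -> bool:
        return all([
            bool(self.third_parties),      # parties named, with reasons
            self.explained_at_setup,
            self.hardware_fallback,
            bool(self.recording_indicators),
            self.clip_management,
        ])
```

A device that names its speech provider and explains why at setup, offers a hardware button, chimes and lights an LED while recording, and lets you delete past clips would pass; a device with an unnamed third party fails the first check.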

Update: Here’s a look at how other companies communicate their privacy policies on monitoring.


Some Further Thoughts On Privacy

The US has a (largely religion-driven) abstinence-until-marriage movement; in some states, schools are not required to provide sexual education to teens, and where it is provided, abstinence from intercourse is promoted as the best method of maintaining sexual health. But a 2007 meta-study found that abstinence-only education at best had no effect at all on teen sexual health, and at worst led to higher rates of sexually-transmitted infections: in communities with more than 20% of teens in abstinence-only programs, rates of STDs were over 60% higher than in communities with regular programs.

Ignorance of their options meant these teens were less likely to use contraception when they did have sex, were more likely to engage in oral and anal sex, and less likely to seek medical testing or treatment.

I worry that ‘total privacy’ advocates are causing similar ignorance in people online. An article in the latest Wired UK heavily plays up the fear of your data being publicly available, without explaining why that’s bad or how you can take back control, beyond blocking all data sharing. By promoting zero-tolerance privacy – encouraging people to leave social networks or uninstall apps that share data – total privacy advocates fail to educate people on the privacy options available to them, and the ways they can use data to their own advantage.

Facebook, for example, has excellent explanations of how they use your data, filters and preferences that let you control it, and links to external websites that explain and provide further controls for digital advertising.

My concern is that, if you advise only a zero-tolerance policy, you run the risk of driving people away to alternatives that are less forthcoming with their privacy controls, or of making them feel helpless to the point where they decide to ignore the subject entirely. Either way, they’ve lost control over their personal data, and are missing out on the value it could give them.

And I strongly believe there is value in my data. There is value in it for me: I can use it to be more informed about my health, to get a smarter personal assistant, to see ads that can be genuinely relevant to me. And there is value in it for everyone: shared medical data can be used to find environmental and behavioural patterns and improve the quality of public preventative healthcare.

I’m not blithe about it; I don’t want my data sold to unknown third parties, or used against me by insurers. I’m aware of the risks of the panopticon of small HD cameras that could lead to us all becoming witting or unwitting informants, and monitoring of communication by people who really have no business monitoring it.

What we need is not total privacy, but control over what we expose. We need transparency in seeing who gets our data, we need legislation to control the flow of data between third parties, we need the right to opt out, and we need better anonymity of our data when we choose to release it into large datasets.

Knowledge is power, and I’d rather have control of that power myself than completely deny it a place in the world.

Sources and further reading