Data Privacy, Control, Transparency, and Regulation

I’ve written about privacy and personal data a few times before, and my conclusion generally remains the same: our data has value, and we should be able to benefit from the use of it, but we must be provided with control and transparency, backed up by strong regulation.

Pertinent to this, I was interested to read The Future Is Data Integrity, Not Confidentiality. This is an extract from a talk by Toomas Hendrik Ilves, President of Estonia, a country that is building a digital society. In this talk he says:

“We have a law that says you own your own data. And you can see who has tried to access your data.”

And in What Happens Next Will Amaze You, the latest in a long line of excellent talks/essays by Maciej Cegłowski, he lays out six fixes for the busted internet power model (where users are somewhere near the bottom). These fixes include:

You should have the right to download data that you have provided, or that has been collected by observing your behavior, in a usable electronic format.

You should have the right to completely remove [your] account and all associated personal information from any online service, whenever [you] want.

Companies should only be allowed to store behavioral data for 90 days. Companies should be prohibited from selling or otherwise sharing behavioral data.

And, perhaps most important of all, there is a requirement for:

A legal mechanism to let companies make enforceable promises about their behavior.

This is exactly what I mean. This is what I think the future should look like: we benefit from our personal and aggregated public data, with control and transparency, backed up by strong regulation. Who do we talk to, to make this happen?


Data use and privacy in Web services

Tim Cook recently made a speech attacking Silicon Valley companies (e.g. Google and Facebook) for making money by selling their users’ privacy. The first problem with what he said is that it’s fundamentally incorrect. As Ben Thompson points out (subscription required):

It’s simply not true to say that Google or Facebook are selling off your data. Google and Facebook do know a lot about individuals, but advertisers don’t know anything — that’s why Google and Facebook can charge a premium! [They] are highly motivated to protect user data — their competitive advantage in advertising is that they have data on customers that no one else has.

Cennydd Bowles also argues the same point:

The “you are the product” thing is pure sloganeering. It sounds convincing on first principles but doesn’t hold up to analysis. It’s essentially saying all two-sided platforms are immoral, which is daft.

The @StartupLJackson Twitter account puts this more plainly:

People who argue free-to-customer data companies (FB/Goog/etc) are selling data & hurting consumers are the anti-vaxxers of our industry.

I’ve always maintained that this is about a value exchange — you can use my data, as long as I get control and transparency over who sees it, and a useful service in return. But beyond that, another problem with premium services where you pay for privacy is that they create a two-tier system. Cennydd again:

The supposition that only a consumer-funded model is ethically sound is itself political and exclusionary (of the poor, children, etc).

And Kate Crawford:

Two-tier social media: the rich pay to opt out of Facebook ads, the poor get targeted endlessly. Privacy becomes a luxury good.

Aside: Of course this suits Apple: if wealthier customers can afford to opt out of advertising, then advertising itself becomes less valuable — as do, in turn, Google and Facebook.

The fact that people are willing to enter into a data exchange when they get good services in return highlights the second problem with Tim Cook’s attack: Apple are currently failing to provide good services. As Thomas Ricker says in his snappily-titled Tim Cook brings a knife to a cloud fight:

Fact is, Apple is behind on web services. Arguably, Google Maps is better than Apple Maps, Gmail is better than Apple Mail, Google Drive is better than iCloud, Google Docs is better than iWork, and Google Photos can “surprise and delight” better than Apple Photos.

And even staunch Apple defender John Gruber agreed:

Apple needs to provide best-of-breed services and privacy, not second-best-but-more-private services. Many people will and do choose convenience and reliability over privacy. Apple’s superior position on privacy needs to be the icing on the cake, not their primary selling point.

As this piece by Jay Yarow for Business Insider points out, in the age of machine learning, more data makes better services. Facebook and Google are ahead in services because they make products that understand their users better than Apple do.


Samsung, Voice Control, and Privacy. Many Questions.

It’s interesting to see the fuss around Samsung’s use of voice control in its Smart TVs, because we’re going to see this happening with increasing frequency and urgency as voice-powered devices are more deeply integrated into our personal spaces. As well as other Smart TV models, Microsoft Kinect is already in millions of homes, and Amazon Echo is beginning to roll out.

These devices work in similar ways: you activate voice search with an opt-in command (“Hi TV”; “Xbox On”; “Alexa”). Android (“OK Google”) and iOS (“Hey Siri”) devices also function this way, but usually require a button press to use voice search (except when on the home screen of an unlocked device) — although I imagine future iterations will more widely use activation commands, especially on home systems like Android TV and Apple TV (with HomeKit).

Whatever system is used, once it’s activated by voice a brief audio clip of the user’s command or query is recorded and transmitted to a cloud server stack, which runs the deep learning algorithms necessary to make sense of human speech.

The fear is that with any of these devices you could accidentally activate the voice service, then reveal personal data in the following few seconds of audio, which would be transmitted to the cloud servers — and potentially made available to untrusted third parties.
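The shared flow can be sketched in a few lines. To be clear, everything here is a hypothetical stand-in of my own (the wake-word list, the string-matching detector, the simulated frame stream), not any vendor’s actual API; real devices run a small on-device model for wake-word detection and only transmit the clip that follows it.

```python
# Minimal sketch of the shared flow: listen continuously on-device,
# and only the short clip following an activation command leaves for
# the cloud. WAKE_WORDS, detect_wake_word and the simulated frame
# stream are invented stand-ins, not any vendor's real API.

CLIP_FRAMES = 3  # length of the clip captured after activation

WAKE_WORDS = {"hi tv", "xbox on", "alexa"}

def detect_wake_word(frame: str) -> bool:
    # Real devices run a small local model for this; a string
    # comparison stands in for that model here.
    return frame.lower().strip() in WAKE_WORDS

def capture_sessions(frames):
    """Return only the audio that would be transmitted to the cloud."""
    transmitted = []
    stream = iter(frames)
    for frame in stream:
        if detect_wake_word(frame):
            # Record a short, fixed-length clip; a real device would now
            # send it to the vendor's speech-recognition servers.
            clip = [f for f in (next(stream, "") for _ in range(CLIP_FRAMES)) if f]
            transmitted.append(" ".join(clip))
    return transmitted

print(capture_sessions(["(chatter)", "Alexa", "what's", "the", "weather"]))
# → ["what's the weather"]
```

The accidental-activation fear maps directly onto this sketch: if detect_wake_word fires falsely, the next few seconds of audio are transmitted regardless of what they contain.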

Given that this risk is present on all devices with voice activation, the differences I can see in the case of Samsung’s Smart TV are:

  1. the terms explicitly warn you that a data leak is a possibility;
  2. the voice analysis uses third-party deep learning services instead of their own;
  3. Samsung don’t say who those third parties are, or why they’re needed; and
  4. it’s on your TV.

This leaves me with a lot of questions (and, I’m afraid, no good answers yet).

Could the first point really be at the root of the unease? Is it simply the fact that this potential privacy breach has been made clear and now we must confront it? Would ignorance be preferable to transparency?

If Microsoft’s Kinect is always listening for a voice activation keyword, and uses Azure cloud services for analysing your query, does the only difference lie in Samsung’s use of a third party? Or is it their vague language around that third party; would it make a difference if they made clear it would only be shared with Nuance (who also provide services for Huawei, LG, Motorola and more)? When the Xbox One launched there were concerns around the ‘always listening’ feature, which Microsoft alleviated with clear privacy guidelines. Is better communication all that’s needed?

If our options are to put trust in someone, or go without voice control altogether (something that’s going to be harder to resist in the future), then who do you trust with the potential to listen to you at home? Private corporations, as long as it’s them alone? No third parties at all, or third parties if they’re named and explained? Or what if a government set up a central voice data clearing service; would you trust that? What safeguards and controls would be sufficient to make us trust our choice?

Aside: what would be the effect if the service we’ve trusted with our voice data began acting on it? Say, if Cortana recognised your bank details, should it let you know that you’ve leaked them accidentally? What are the limits of that? Google in Ireland shows the phone number of the Samaritans when you use text search to find information about suicide; would it be different if it learned that from accidental voice leaks? What if a child being abused by an adult confided in Siri; would you want an automated system on Apple’s servers to contact an appropriate authority?

Finally, could the difference be as simple as the fact that Samsung have put this in a TV? Is it unexpected behaviour from an appliance that’s had a place in our living rooms for sixty years? If it were a purpose-built appliance such as Amazon’s Echo, would that change the way we feel about it?

This is just a small selection of the types of questions with which we’re going to be confronted with increasing frequency. There’s already a tension between privacy and convenience, and it’s only going to become stronger as voice technology moves out of our pockets and into our homes.

As I said, I don’t have answers for these questions. I do, however, have some (hastily considered) suggestions for companies that want to record voice data in the home:

  • Privacy policies which clearly state all parties that will have access to data, and why, and give clear notice of any changes.
  • A plainly-written explanation of the purpose of voice control, with links to the privacy policy, as part of the device setup process.
  • The ability to opt out of voice activation, with a hardware button to instigate actions instead.
  • Obvious audio and visual indicators that voice recording has started, and is taking place.
  • An easily-accessible way to play back, manage and delete past voice clips.

Many companies supply some or all of these already; I think we should be looking at this as a minimum for the next wave of devices.
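One way that minimum could be made checkable is to express the list as a machine-readable manifest shipped with the device. This is purely a sketch of my own; the field names are invented for illustration and no such standard exists.

```python
# Hypothetical sketch: the five suggestions above as a machine-readable
# "voice privacy manifest". Field names are invented, not a real standard.

REQUIRED_FIELDS = {
    "data_recipients",      # every party with access to voice data, and why
    "setup_explanation",    # plain-language purpose, shown during setup
    "voice_opt_out",        # can voice activation be disabled entirely?
    "recording_indicator",  # audible/visible cue while recording
    "clip_management",      # where to play back, manage and delete clips
}

def meets_minimum(manifest: dict) -> bool:
    """True if the manifest covers every item on the checklist."""
    return REQUIRED_FIELDS.issubset(manifest)

smart_tv = {
    "data_recipients": [{"party": "Nuance", "purpose": "speech recognition"}],
    "setup_explanation": "Voice search sends short audio clips to our speech partner.",
    "voice_opt_out": True,
    "recording_indicator": "on-screen microphone icon plus chime",
    "clip_management": "Settings > Privacy > Voice history",
}

print(meets_minimum(smart_tv))  # → True
```

A device that named its third parties in data_recipients, as in the sketch, would answer the vague-language complaint against Samsung above.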

Update: Here’s a look at how other companies communicate their privacy policies on monitoring.


Some Further Thoughts On Privacy

The US has a (largely religion-driven) abstinence-until-marriage movement; in some states, schools are not required to provide sexual education to teens, and where it is provided, abstinence from intercourse is promoted as the best method of maintaining sexual health. But a 2007 meta-study found that abstinence-only education at best had no effect at all on teen sexual health, and at worst led to higher rates of sexually-transmitted infections: in communities where more than 20% of teens were in abstinence-only programs, rates of STDs were over 60% higher than in communities with regular programs.

Ignorance of their options meant these teens were less likely to use contraception when they did have sex, more likely to engage in oral and anal sex, and less likely to seek medical testing or treatment.

I worry that ‘total privacy’ advocates are causing similar ignorance in people online. An article in the latest Wired UK heavily hypes up the scare of your data being publicly available, but without offering any explanation of why that’s bad or how you can take back control, beyond blocking all data sharing. By promoting zero-tolerance privacy, encouraging people to leave social networks or uninstall apps that share data, total privacy advocates fail to educate people on the privacy options that are available to them, and the ways they can use data to their own advantage.

Facebook, for example, has excellent explanations of how they use your data, filters and preferences that let you control it, and links to external websites that explain and provide further controls for digital advertising.

My concern is that, if you advise only a zero-tolerance policy, you run the risk of driving people away to alternatives that are less forthcoming with their privacy controls, or making them feel helpless to the point where they decide to ignore the subject entirely. Either way they’ve lost power over the way they control their personal data, and are missing out on the value it could give them.

And I strongly believe there is value in my data. There is value in it for me: I can use it to be more informed about my health, to get a smarter personal assistant, to see ads that can be genuinely relevant to me. And there is value in it for everyone: shared medical data can be used to find environmental and behavioural patterns and improve the quality of public preventative healthcare.

I’m not blithe about it; I don’t want my data sold to unknown third parties, or used against me by insurers. I’m aware of the risks of the panopticon of small HD cameras that could lead to us all becoming witting or unwitting informants, and of the monitoring of communication by people who really have no business monitoring it.

What we need is not total privacy, but control over what we expose. We need transparency in seeing who gets our data, we need legislation to control the flow of data between third parties, we need the right to opt out, and we need better anonymisation of our data when we choose to release it into large datasets.

Knowledge is power, and I’d rather have control of that power myself than completely deny it a place in the world.

Sources and further reading