Talking to Léonie Watson about computer vision and blindness

I recently had the pleasure of talking to Léonie Watson for a section on the rehab Tech Talks podcast which I co-present for the company I work for. Léonie works with The Paciello Group on web standards accessibility, and I’ve met her on a handful of occasions through conferences that she attends and speaks at. She is also completely blind, having lost her sight as an adult due to complications of diabetes (I highly recommend her personal account of how it happened, Losing Sight).

As our podcast topic was computer vision—extracting semantic data from photos and video using machine learning—I was keen to find out how this could help people with impaired vision. I was absolutely amazed and delighted by what I learned, and I wanted to share this extract from our conversation.

Peter: So regarding computer vision, what I want to reference specifically is using it as a seeing assistant; as an assistive aid, as it were. But this is kind of not that new; I was reading about a service called VizWiz from about six years ago which uses a combination of computer vision plus crowdsourcing and social media to… so you as a blind person could be holding an object—let’s say, a can of something—and not be aware of what it is; and this gives you the opportunity to… either for the computer to recognise it and read the label to you, or if not for you to put out a plea to people to ask them to tell you what it is.

Léonie: Yes, I just always found those early apps a little bit too cumbersome to be really convenient. I think with that particular app they used the Amazon service [Mechanical Turk] and quite often even when you tried to go down the human route you wouldn’t get a reply or something wouldn’t happen. So, yeah, I never found them entirely useful, at the end of the day.

Peter: And I mean, I’ve used a quite, I suppose, simple example, but we could be talking about things which have potentially harmful effects to you; if you don’t know which medicine you’re taking out of a cabinet, for example.

Léonie: Right. Yes, absolutely.

Peter: So you said you didn’t find that useful at the time, have you had the opportunity to use any more recent versions of that? Especially since the breakthroughs we’ve had in computer vision in the last few years?

Léonie: I think that particular app has gone away now, but Microsoft have just released one this year called Seeing AI, and it’s a free app, and it’s absolutely extraordinary. I knew friends in America that had it before it was released here in the UK and I was dreadfully envious of them because it does a whole bunch of things: it recognises barcodes on most products; you can get it to read short text… so as soon as I got this app I went off exploring the fridge! Just scanning all the things in the fridge, and it came to one thing that I didn’t recognise, so I just flipped it into a different mode and got it to read the label on the package instead. It will also read documents, although I must admit I haven’t played with that yet.

But what’s really remarkable about it from my point of view is that you can just scan the phone around—just wave your phone around in front of you—and it’ll tell you about people that have come into the viewport of the camera; so it’ll tell you there is a person standing centre of the picture, or a group of people standing twenty feet away to the left. And what you can do is you can take a picture of a person—with their permission of course—and you can then label that picture so the next time you’re waving your phone around, if it spots that person it’ll announce them by name.

Now, waving a phone around’s never an entirely socially convenient way of getting that! But it’s just… it’s incredible. And it will also take pictures of your environment so you can sit… I was sitting in a hotel having breakfast not long ago and just held up my phone and took a quick snapshot and it told me I was sitting opposite a window, and told me what it could see out the window; and that’s just information I would never have had unless I’d happened to sort of ask whoever I was with to describe it to me. But having that ability to just do that independently is really quite remarkable.

Peter: That’s incredible. So you said that uses still images, not live video input yet…

Léonie: So for the taking pictures of the sort of scenery around you, yes it’s still just a snapshot you have to take. With the recognising people, it will do that as you pan around but if you want to get detailed information about someone you have to still take a picture. And it’s often comically accurate or comically inaccurate; I’ve seen it describe a friend of mine as being aged 40 when he was three days off his fortieth birthday, and then somebody pointed the same app at me at the time and it described me as being a 67-year-old—and I’m in my early 40s so I was somewhat miffed about that! But, you know…

Peter: How rich are the descriptions it gives you?

Léonie: They’re pretty good! I think they’ve got the balance right in this app between short and sweet but suitably descriptive. So I was with a friend, who’s just moved into a new house, this afternoon and was just using it to kind of look around and it sort of described that I was in a room next to a window, there was a desk, and a chair, and a stuffed animal, stuffed toy, sitting on the desk. So it gives you enough information to really get a good sense of what’s around you but it doesn’t take ten minutes to do it, which is nice.

Peter: So it sounds like you’re really impressed with that, and it’s something you’d continue to use?

Léonie: Absolutely. I’ve had it since it came out in the UK… what, I think maybe a week ago and, yes. Christmas shopping, actually, it’s been absolutely brilliant because I’ve just had all these parcels turning up from online shopping and normally I have a hell of a job trying to remember a) what I’ve ordered, and b) trying to identify these things that turn up. But all I’ve done now is just scan barcodes, point this thing at it, and every single time… there isn’t anything yet I haven’t managed to identify for myself which is… yeah that’s pretty cool.

Peter: That’s amazing! Is there a risk as we mentioned earlier of it misidentifying something? And how does it handle errors, as such?

Léonie: So yes, it’s obviously quite capable of misidentifying things. Actually the friend I was with this afternoon is female and it thought she was a chap, which… it’s embarrassing as much as anything. She and I were in stitches! But in terms of sort of barcodes and things like that, anything that it hasn’t recognised it’s just simply said ‘not recognised’. With the text, when it reads it it’s quite apparent when it gets it wrong because it’s just garbled.

I haven’t used it to try and do anything sort of life-depending, if that makes sense, and I don’t know that I would. If I absolutely had to… so I’m diabetic and I use insulin and I have two different sorts and they’re kept in different parts of the shelf in the fridge but usually when I need to get one out I will get my husband just to double-check it, just to make absolutely sure. And I think I probably would use this app if I needed to do it and he wasn’t around to do that double check for me—but I would do it very, very cautiously, I must admit.

Peter: One of the other problems that people flag with computer vision is bias in the data. I’ll give you a very extreme example: you may have read about this a few years ago, when Google launched their new machine learning powered Photos product, there was an extremely bad case where it labeled black people as gorillas. Because there was horrific bias in the data; it hadn’t been trained on a sufficiently diverse body of data. Now, hopefully that would never happen with this app, but do you think the potential bias could be an issue for something like this?

Léonie: Almost inevitably. At least in the early days. I mean, as you say that Google example—I remember it, that was a really terrible example of it. But data inevitably, I think, when we start feeding these systems, is going to be biased in some direction. Not necessarily a good or a bad one like that, that clearly was an awful bias to have discovered. But you’ve got to start somewhere, you can’t feed these systems all of the data, all at once.

So inevitably you’ve got to start somewhere and keep adding to it, adding to it, and adding to it. So in the early days more… yes, I think there are going to be biases; hopefully a lot of the time they’ll be biases that are either inconsequential or go by unnoticed. But yes, I don’t see how it can be any other way. I think what we’ve got to try to do is not do what Google did and make stupid mistakes like that. But yes, we have to start with some chunk of data somewhere, and until we can keep adding to that and building it, there is always going to be some bias towards what it knows already as opposed to what it doesn’t.

Peter: You’ve said obviously you can’t just wave your phone around all the time. But with mobiles and the mobile market is coming this drive towards miniaturisation of components; and presumably this is something that could be almost, like, a discreet device that’s always on. And even potentially in the future integrated into your glasses and giving you descriptions through bone conduction. Do you think that’s where this would head?

Léonie: Absolutely. I hope so. I did send a tweet in Microsoft’s direction to say, look, is there any chance we’re going to get this app to work with a Bluetooth camera? Because there are some pretty tiny Bluetooth cameras out there already. So if you teamed that up with a Bluetooth earpiece you could leave your phone in the pocket, and have access to that, attach it to a pair of glasses. It might look a little bit cumbersome initially, but not nearly as idiotic as waving your phone around in front of you.

And also not as insecure; that’s the other thing, I joke about being at conferences where I have a terrible time recognising people’s voices, even people I’ve met before, and you can’t really wander around with your phone just stretched out in front of you and hope for the best. It’s… there are privacy concerns and it’s socially awkward. But if you could sort of minimise it down to a set of glasses or something, it at least overcomes the socially awkward part—if not necessarily the privacy thing.

Peter: And that would, presumably, be kind of life changing for you—to have this thing constantly describing the environment around you.

Léonie: Oh it would be extraordinary! Absolutely extraordinary!

If you found this interesting, I suggest listening to the full episode where we talk a little more about what computer vision can add to the accessibility of virtual reality, as well as some chat with my colleagues about a computer vision prototype we worked on. You can also subscribe to our podcast on iTunes and Pocket Casts.

Google might be taking another tilt at messaging

I have a theory. Yes, another one. This time it’s about Google, and how I think they’re taking another bite at the messaging apple. And if I’m right, I think they have a better chance of success than previous efforts.

tl;dr: I think Google are going to use some of their biggest existing properties to launch their third wave of messaging.

The Story So Far

Google have already made many attempts at making a messaging app. Google Talk / Gchat was launched in 2005, but discontinued this year. Hangouts came with Google+ in 2011, was spun out into its own product in 2013, subsequently suffered a series of confusing updates (including an ill-fated attempt to merge with SMS), and has now been refocused as a business video conferencing tool. Speaking of SMS, there’s the ongoing attempt to make a competitor to Apple’s iMessage with Rich Communication Services (RCS) in Android Messages, but this is dependent on mobile carrier support, and is still far from widespread. Then there’s Allo, of which more shortly.

To be clear, Google need to be in messaging — it’s an incredibly important space that’s currently being dominated by Facebook (Messenger, with 1.3bn monthly active users (MAUs), WhatsApp (1.3bn MAUs), and Instagram (800mn MAUs)), Microsoft (Skype), and Apple (iMessage)—and that’s without mentioning the Asian giants. People are increasingly spending more time in messaging apps, and the more this behavioural data is missing from Google, the less powerful (and less valuable) their own data becomes. Also, messaging can potentially be monetised, as Facebook are currently trying with Messenger (and, potentially, WhatsApp Business).

Allo was intended to be their Messenger / WhatsApp contender, but it was launched in 2016 into an already full and maturing market, and offered very little reason for people to switch from their current preferred messaging app—not least the network cost of switching the social graph. One of the few things Allo did well was stickers, and they seemed to be popular; it was one of the few features that has been regularly enhanced and updated since launch. But that’s not enough for most people to make the switch, and so Allo currently languishes with some 10 million downloads on potentially billions of devices.

The Third Wave

This year, Google announced enhanced sharing features in two of their biggest properties: Photos (500mn MAUs) and YouTube (1.5bn MAUs, largely on mobile). In both, the sharing takes the form of private or group messaging chats.

Right now, neither chat is very rich. They support web links (without previews) and emoji, but don’t have the features that modern messaging apps do: stickers or GIFs, for example. But Gboard, Google’s mobile keyboard, does have these features, along with translation, search results and a lot more.

Photos, YouTube and Gboard all work cross-platform. Photos and YouTube provide a huge amount of reach, and the rich features of modern messenger apps can be supplied by Gboard.

(It must be pointed out that Gboard’s advanced features don’t currently work in either YouTube or Photos sharing. But I don’t think it’s too much of a stretch that this can be enabled—and, as an added future feature, with Assistant too.)

It could be that Allo becomes less of a discrete app, and more of a framework for messaging in Google’s other apps. There is some precedent for this, as Allo’s companion app, Duo, is being steadily integrated into Android’s core apps.

So that’s my theory: Google are going to use some of their biggest properties to launch their third wave of mobile messaging. What do you think?

On the iPhone X’s notch and being distinctive

I’ve been thinking about the ‘notch’ in the iPhone X. In case you’ve no idea what I’m talking about, the X has an ‘all-screen’ design; the home button is gone, and the front of the device no longer has bezels above and below the screen except for a curving indent at the top which holds image sensors necessary for the camera and the new facial authentication feature.

It seems somehow like a design compromise; the sensors are of course necessary, but it feels like there could have been a full-width narrow bezel at the top of the device rather than the slightly odd notch that requires special design consideration.
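For web content, that design consideration shows up concretely in the safe-area support Apple shipped alongside the X: a page opts in to the full screen with `viewport-fit=cover`, then pads its layout with the `env(safe-area-inset-*)` values so nothing disappears behind the sensor housing. A minimal sketch (the `header` selector is just an illustrative element, not anything Apple mandates):

```css
/* Opt in to laying out under the notch via the viewport meta tag:
   <meta name="viewport" content="width=device-width, viewport-fit=cover"> */

/* Then pad edge-hugging elements using the safe-area insets,
   which are zero on devices without a notch. */
header {
  padding-top: env(safe-area-inset-top);
  padding-left: env(safe-area-inset-left);
  padding-right: env(safe-area-inset-right);
}
```

On any phone with a conventional bezel the insets resolve to zero, so the same stylesheet works everywhere—which is presumably how Apple hoped to keep the extra work palatable.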

But my thought was: if they chose a full-width bezel, what would make the iPhone distinctive? Put one face-up on the table next to, say, a new LG or Samsung Galaxy phone: how could you tell, at a glance, which was the iPhone?

[Image: two rows of icons for smartphone functions, each using an outline that looks similar to an iPhone]
Icons from the Noun Project

The iPhone’s single button design is so distinctive that it’s become the de facto icon for smartphones. Without it, the phone looks like every other modern smartphone (until you pick it up or unlock it). The notch gives the X a unique look that continues to make it unmistakably an Apple product, even with the full-device screen. It makes it distinctive enough to be iconic, and to protect legally—given Apple’s litigious history, not a small consideration.

Of course it requires more work from app designers and developers to make their products look good, but Apple is one of the few (perhaps only) companies with enough clout, and a devoted following, to put in the extra work—you can’t imagine LG being able to convince Android app makers to put in the extra shift in that way. So perhaps it’s still somewhat of a design kludge, but it’s a kludge with purpose.

Augmented reality demos hint at the future of immersion

Twitter is awash with impressive demos of augmented reality using Apple’s ARKit or Google’s ARCore. I think it’s cool that there’s a palpable sense of excitement around AR—I’m pretty excited about it myself—but I think that there’s perhaps a little too much early hype, and that what the demos don’t show is perhaps more suggestive of the genuinely exciting future of AR.

Below is an example of the demos I’m talking about — a mockup of an AR menu that shows each of the individual dishes as a rendered 3D model, digitally placed into the environment (and I want to make clear I’m genuinely not picking on this, just using it as an illustration):

This raises a few questions, not least around delivery. As a customer of this restaurant, how do I access these models? Do I have to download an app for the restaurant? Is it a WebAR experience that I see by following a URL?

There’s so much still to be defined about future AR platforms. Ben Evans’ post, The First Decade of Augmented Reality, grapples with a lot of the issues of how AR content will be delivered and accessed:

Do I stand outside a restaurant and say ‘Hey Foursquare, is this any good?’ or does the device’s OS do that automatically? How is this brokered — by the OS, the services that you’ve added or by a single ‘Google Brain’ in the cloud?

The demo also raises important questions about utility; for example, why is seeing a 3D model of your food on a table better than seeing a 3D model in the web page you visit, or the app you download? Or, why is it better even than seeing a regular photo, or just reading the description on the menu? Do you get more information from seeing a model in AR than from any other medium?

Matt Miesnieks’ essay, the product design challenges of AR on smartphones, details what’s necessary to make AR truly useful, and it proceeds from a very fundamental basis:

The simple question “Why do this in AR, wouldn’t a regular app be better for the user?” is often enough to cause a rethink of the entire premise.

And a series of tweets by Steven Johnson nails the issue with a lot of the demos we’re seeing:

Again, I’m not setting out to criticise the demos; I think experimentation is critical to the development of a new technology—even if, as Miesnieks points out in a separate essay, a lot of this experimentation has already happened before:

I’m seeing lots of ARKit demos that I saw 4 years ago built on Vuforia and 4 years before that on Layar. Developers are re-learning the same lessons, but at much greater scale.

But placing 3D objects into physical scenes is just one narrow facet of the greater potential of AR. When we can extract spatial data and information from an image, and also manipulate that image digitally, augmented reality becomes something much more interesting.

In Matthew Panzarino’s review of the new iPhones he talks about the Portrait Lighting feature—which uses machine learning smarts to create studio-style photography—as augmented reality. And it is.

AR isn’t just putting a virtual bird on it or dropping an Ikea couch into your living room. It’s altering the fabric of reality to enhance, remove or augment it.

The AR demos we’re seeing now are fun and sometimes impressive, but my intuition is that they’re not really representative of what AR will eventually be, and there are going to be a few interesting years until we start to see that revealed.