Talking to Léonie Watson about computer vision and blindness

I recent­ly had the plea­sure of talk­ing to Léonie Wat­son for a sec­tion on the rehab Tech Talks pod­cast which I co-present for the com­pa­ny I work for. Léonie works with The Paciel­lo Group on web stan­dards acces­si­bil­i­ty, and I’ve met her on a hand­ful of occa­sions through con­fer­ences that she attends and speaks at. She is also com­plete­ly blind, hav­ing lost her sight as an adult due to com­pli­ca­tions of dia­betes (I high­ly rec­om­mend her per­son­al account of how it hap­pened, Los­ing Sight).

As our pod­cast top­ic was com­put­er vision—extracting seman­tic data from pho­tos and video using machine learning—I was keen to find out how this could help peo­ple with impaired vision. I was absolute­ly amazed and delight­ed by what I learned, and I want­ed to share this extract from our con­ver­sa­tion.

Peter: So regard­ing com­put­er vision, what I want to ref­er­ence specif­i­cal­ly is using it as a see­ing assis­tant; as an assis­tive aid, as it were. But this is kind of not that new; I was read­ing about a ser­vice called VizWiz from about six years ago which uses a com­bi­na­tion of com­put­er vision plus crowd­sourc­ing and social media to… so you as a blind per­son could be hold­ing an object—let’s say, a can of something—and not be aware of what it is; and this gives you the oppor­tu­ni­ty to… either for the com­put­er to recog­nise it and read the label to you, or if not for you to put out a plea to peo­ple to ask to tell you what it is.

Léonie: Yes, I just always found those ear­ly apps a lit­tle bit too cum­ber­some to be real­ly con­ve­nient. I think with that par­tic­u­lar app they used the Ama­zon ser­vice [Mechan­i­cal Turk] and quite often even when you tried to go down the human route you would­n’t get a reply or some­thing would­n’t hap­pen. So, yeah, I nev­er found them entire­ly use­ful, at the end of the day.

Peter: And I mean, I’ve used a quite, I sup­pose, sim­ple exam­ple, but we could be talk­ing about things which have poten­tial­ly harm­ful effects to you; if you don’t know which med­i­cine you were tak­ing out of a cab­i­net for exam­ple.

Léonie: Right. Yes, absolute­ly.

Peter: So you said you did­n’t find that use­ful at the time, have you had the oppor­tu­ni­ty to use any more recent ver­sions of that? Espe­cial­ly since the break­throughs we’ve had in com­put­er vision in the last few years?

Léonie: I think that particular app has gone away now, but Microsoft have just released one this year called Seeing AI, and it’s a free app, and it’s absolutely extraordinary. I knew friends in America that had it before it was released here in the UK and I was dreadfully envious of them because it does a whole bunch of things: it recognises barcodes on most products; you can get it to read short text… so as soon as I got this app I went off exploring the fridge! Just scanning all the things in the fridge, and it came to one thing that I didn’t recognise, so I just flipped it into a different mode and got it to read the label on the package instead. It would also read documents although I must admit I haven’t played with that yet.

But what’s real­ly remark­able about it from my point of view is that you can just scan the phone around—just wave your phone around in front of you—and it’ll tell you about peo­ple that have come into the view­port of the cam­era; so it’ll tell you there is a per­son stand­ing cen­tre of the pic­ture, or a group of peo­ple stand­ing twen­ty feet away to the left. And what you can do is you can take a pic­ture of a person—with their per­mis­sion of course—and you can then label that pic­ture so the next time you’re wav­ing your phone around, if it spots that per­son it’ll announce them by name.

Now, waving a phone around’s never an entirely socially convenient way of getting that! But it’s just… it’s incredible. And it will also take pictures of your environment so you can sit… I was sitting in a hotel having breakfast not long ago and just held up my phone and took a quick snapshot and it told me I was sitting opposite a window, and told me what it could see out the window; and that’s just information I would never have had unless I’d happened to sort of ask whoever I was with to describe it to me. But having that ability to just do that independently is really quite remarkable.

Peter: That’s incred­i­ble. So you said that uses still images, not live video input yet…

Léonie: So for the tak­ing pic­tures of the sort of scenery around you, yes it’s still just a snap­shot you have to take. With the recog­nis­ing peo­ple, it will do that as you pan around but if you want to get detailed infor­ma­tion about some­one you have to still take a pic­ture. And it’s often com­i­cal­ly accu­rate or com­i­cal­ly inac­cu­rate; I’ve seen it describe a friend of mine as being aged 40 when he was three days off his for­ti­eth birth­day, and then some­body point­ed the same app at me at the time and it described me as being a 67 year old—and I’m in my ear­ly 40s so I was some­what miffed about that! But, you know…

Peter: How rich are the descrip­tions it gives you?

Léonie: They’re pret­ty good! I think they’ve got the bal­ance right in this app between short and sweet but suit­ably descrip­tive. So I was with a friend, who’s just moved into a new house, this after­noon and was just using it to kind of look around and it sort of described that I was in a room next to a win­dow, there was a desk, and a chair, and a stuffed ani­mal, stuffed toy, sit­ting on the desk. So it gives you enough infor­ma­tion to real­ly get a good sense of what’s around you but it does­n’t take ten min­utes to do it, which is nice.

Peter: So it sounds like you’re real­ly impressed with that, and it’s some­thing you’d con­tin­ue to use?

Léonie: Absolutely. I’ve had it since it came out in the UK… what, I think maybe a week ago and, yes. Christmas shopping, actually, it’s been absolutely brilliant because I’ve just had all these parcels turning up from online shopping and normally I have a hell of a job trying to remember a) what I’ve ordered, and b) trying to identify these things that turn up. But all I’ve done now is just scan barcodes, point this thing at it, and every single time, there isn’t anything yet I haven’t managed to identify for myself which is… yeah that’s pretty cool.

Peter: That’s amazing! Is there a risk as we mentioned earlier of it misidentifying something? And how does it handle errors, as such?

Léonie: So yes, it’s obvi­ous­ly quite capa­ble of misiden­ti­fy­ing things. Actu­al­ly the friend I was with this after­noon is female and it thought she was a chap, which… it’s embar­rass­ing as much as any­thing. She and I were in stitch­es! But in terms of sort of bar codes and things like that, any­thing that it has­n’t recog­nised it’s just sim­ply said ‘not recog­nised’. With the text, when it reads it it’s quite appar­ent when it gets it wrong because it’s just gar­bled.

I haven’t used it to try and do anything sort of life depending, if that makes sense, and I don’t know that I would. If I absolutely had to… so I’m diabetic and I use insulin and I have two different sorts and they’re kept in different parts of the shelf in the fridge but usually when I need to get one out I will get my husband to just double check it just to make absolutely sure. And I think I probably would use this app if I needed to do it and he wasn’t around to do that double check for me, but I would do it very, very cautiously, I must admit.

Peter: One of the oth­er prob­lems that peo­ple flag with com­put­er vision is bias in the data. I’ll give you a very extreme exam­ple: you may have read about this a few years ago, when Google launched their new machine learn­ing pow­ered Pho­tos prod­uct there was an extreme­ly bad case where it labeled black peo­ple as goril­las. Because there was hor­rif­ic bias in the data, it had­n’t been trained on a suf­fi­cient­ly diverse body of data. Now, hope­ful­ly that would nev­er hap­pen with this app but do you think the poten­tial bias could be an issue for some­thing like this?

Léonie: Almost inevitably. At least in the ear­ly days. I mean, as you say that Google example—I remem­ber it, that was a real­ly ter­ri­ble exam­ple of it. But data inevitably, I think, when we start feed­ing these sys­tems, is going to be biased in some direc­tion. Not nec­es­sar­i­ly a good or a bad one like that, that clear­ly was an awful bias to have dis­cov­ered. But you’ve got to start some­where, you can’t feed these sys­tems all of the data, all at once.

So inevitably you’ve got to start some­where and keep adding to it, adding to it, and adding to it. So in the ear­ly days more… yes, I think there are going to be bias­es; hope­ful­ly a lot of the time they’ll be bias­es that are either incon­se­quen­tial or go by unno­ticed. But yes, I don’t see how it can be any oth­er way. I think what we’ve got to try to do is not do what Google did and make stu­pid mis­takes like that. But yes, we have to start with some chunk of data some­where, and until we can keep adding to that and build­ing it, there is always going to be some bias towards what it knows already as opposed to what it does­n’t.

Peter: You’ve said obvi­ous­ly you can’t just wave your phone around all the time. But with mobiles and the mobile mar­ket is com­ing this dri­ve towards minia­tur­i­sa­tion of com­po­nents; and pre­sum­ably this is some­thing that could be almost, like, a dis­creet device that’s always on. And even poten­tial­ly in the future inte­grat­ed into your glass­es and giv­ing you descrip­tions through bone con­duc­tion. Do you think that’s where this would head?

Léonie: Absolute­ly. I hope so. I did send a tweet in Microsoft­’s direc­tion to say, look, is there any chance we’re going to get this app to work with a blue­tooth cam­era? Because there are some pret­ty tiny blue­tooth cam­eras out there already. So if you teamed that up with a blue­tooth ear­piece you could leave your phone in the pock­et, and have access to that, attach it to a pair of glass­es. It might look a lit­tle bit cum­ber­some ini­tial­ly, but not near­ly as idi­ot­ic as wav­ing your phone around in front of you.

And also not as inse­cure; that’s the oth­er thing, I joke about being at con­fer­ences where I have a ter­ri­ble time recog­nis­ing peo­ple’s voic­es, even peo­ple I’ve met before, and you can’t real­ly wan­der around with your phone just stretched out in front of you and hope for the best. It’s… there are pri­va­cy con­cerns and it’s social­ly awk­ward. But if you could sort of min­imise it down to a set of glass­es or some­thing, it at least over­comes the social­ly awk­ward part—if not nec­es­sar­i­ly the pri­va­cy thing.

Peter: And that would, pre­sum­ably, be kind of life chang­ing for you—to have this thing con­stant­ly describ­ing the envi­ron­ment around you.

Léonie: Oh it would be extra­or­di­nary! Absolute­ly extra­or­di­nary!

If you found this inter­est­ing, I sug­gest lis­ten­ing to the full episode where we talk a lit­tle more about what com­put­er vision can add to the acces­si­bil­i­ty of vir­tu­al real­i­ty, as well as some chat with my col­leagues about a com­put­er vision pro­to­type we worked on. You can also sub­scribe to our pod­cast on iTunes and Pock­et Casts.

Also pub­lished on Medi­um.