Talking to Léonie Watson about computer vision and blindness

I recently had the pleasure of talking to Léonie Watson for a section on the rehab Tech Talks podcast which I co-present for the company I work for. Léonie works with The Paciello Group on web standards accessibility, and I’ve met her on a handful of occasions through conferences that she attends and speaks at. She is also completely blind, having lost her sight as an adult due to complications of diabetes (I highly recommend her personal account of how it happened, Losing Sight).

As our podcast topic was computer vision—extracting semantic data from photos and video using machine learning—I was keen to find out how this could help people with impaired vision. I was absolutely amazed and delighted by what I learned, and I wanted to share this extract from our conversation.

Peter: So regarding computer vision, what I want to reference specifically is using it as a seeing assistant; as an assistive aid, as it were. But this is kind of not that new; I was reading about a service called VizWiz from about six years ago which used a combination of computer vision, crowdsourcing, and social media… so you as a blind person could be holding an object—let’s say, a can of something—and not be aware of what it is; and this gives you the opportunity either for the computer to recognise it and read the label to you, or if not, for you to put out a plea asking people to tell you what it is.

Léonie: Yes, I just always found those early apps a little bit too cumbersome to be really convenient. I think with that particular app they used the Amazon service [Mechanical Turk] and quite often even when you tried to go down the human route you wouldn’t get a reply or something wouldn’t happen. So, yeah, I never found them entirely useful, at the end of the day.

Peter: And I mean, I’ve used quite a simple example, I suppose, but we could be talking about things which have potentially harmful effects on you; if you don’t know which medicine you’re taking out of a cabinet, for example.

Léonie: Right. Yes, absolutely.

Peter: So you said you didn’t find that useful at the time, have you had the opportunity to use any more recent versions of that? Especially since the breakthroughs we’ve had in computer vision in the last few years?

Léonie: I think that particular app has gone away now, but Microsoft have just released one this year called Seeing AI, and it’s a free app, and it’s absolutely extraordinary. I knew friends in America who had it before it was released here in the UK and I was dreadfully envious of them because it does a whole bunch of things: it recognises barcodes on most products; you can get it to read short text… so as soon as I got this app I went off exploring the fridge! Just scanning all the things in the fridge, and it came to one thing that I didn’t recognise, so I just flipped it into a different mode and got it to read the label on the package instead. It will also read documents, although I must admit I haven’t played with that yet.

But what’s really remarkable about it from my point of view is that you can just scan the phone around—just wave your phone around in front of you—and it’ll tell you about people that have come into the viewport of the camera; so it’ll tell you there is a person standing centre of the picture, or a group of people standing twenty feet away to the left. And what you can do is you can take a picture of a person—with their permission of course—and you can then label that picture so the next time you’re waving your phone around, if it spots that person it’ll announce them by name.

Now, waving a phone around’s never an entirely socially convenient way of getting that! But it’s just… it’s incredible. And it will also take pictures of your environment, so you can sit… I was sitting in a hotel having breakfast not long ago and just held up my phone and took a quick snapshot, and it told me I was sitting opposite a window, and told me what it could see out the window; and that’s just information I would never have had unless I’d happened to sort of ask whoever I was with to describe it to me. But having the ability to just do that independently is really quite remarkable.

Peter: That’s incredible. So you said that uses still images, not live video input yet…

Léonie: So for taking pictures of the sort of scenery around you, yes, it’s still just a snapshot you have to take. With recognising people, it will do that as you pan around, but if you want to get detailed information about someone you still have to take a picture. And it’s often comically accurate or comically inaccurate; I’ve seen it describe a friend of mine as being aged 40 when he was three days off his fortieth birthday, and then somebody pointed the same app at me at the time and it described me as being a 67 year old—and I’m in my early 40s, so I was somewhat miffed about that! But, you know…

Peter: How rich are the descriptions it gives you?

Léonie: They’re pretty good! I think they’ve got the balance right in this app between short and sweet but suitably descriptive. So I was with a friend, who’s just moved into a new house, this afternoon and was just using it to kind of look around and it sort of described that I was in a room next to a window, there was a desk, and a chair, and a stuffed animal, stuffed toy, sitting on the desk. So it gives you enough information to really get a good sense of what’s around you but it doesn’t take ten minutes to do it, which is nice.

Peter: So it sounds like you’re really impressed with that, and it’s something you’d continue to use?

Léonie: Absolutely. I’ve had it since it came out in the UK… what, I think maybe a week ago and, yes. Christmas shopping, actually, it’s been absolutely brilliant, because I’ve just had all these parcels turning up from online shopping and normally I have a hell of a job trying to remember a) what I’ve ordered, and b) trying to identify these things that turn up. But all I’ve done now is just scan barcodes, point this thing at them, and every single time… there isn’t anything yet that I haven’t managed to identify for myself, which is… yeah, that’s pretty cool.

Peter: That’s amazing! Is there a risk, as we mentioned earlier, of it misidentifying something? And how does it handle errors, as such?

Léonie: So yes, it’s obviously quite capable of misidentifying things. Actually the friend I was with this afternoon is female and it thought she was a chap, which… it’s embarrassing as much as anything. She and I were in stitches! But in terms of sort of barcodes and things like that, anything that it hasn’t recognised, it’s just simply said ‘not recognised’. With the text, when it reads it, it’s quite apparent when it gets it wrong because it’s just garbled.

I haven’t used it to try and do anything sort of life-depending, if that makes sense, and I don’t know that I would. If I absolutely had to… so I’m diabetic and I use insulin and I have two different sorts and they’re kept in different parts of the shelf in the fridge, but usually when I need to get one out I will get my husband just to double-check it, to make absolutely sure. And I think I probably would use this app if I needed to do it and he wasn’t around to do that double check for me—but I would do it very, very cautiously, I must admit.

Peter: One of the other problems that people flag with computer vision is bias in the data. I’ll give you a very extreme example: you may have read about this a few years ago, when Google launched their new machine-learning-powered Photos product there was an extremely bad case where it labelled black people as gorillas. Because there was horrific bias in the data: it hadn’t been trained on a sufficiently diverse body of data. Now, hopefully that would never happen with this app, but do you think potential bias could be an issue for something like this?

Léonie: Almost inevitably. At least in the early days. I mean, as you say that Google example—I remember it, that was a really terrible example of it. But data inevitably, I think, when we start feeding these systems, is going to be biased in some direction. Not necessarily a good or a bad one like that, that clearly was an awful bias to have discovered. But you’ve got to start somewhere, you can’t feed these systems all of the data, all at once.

So inevitably you’ve got to start somewhere and keep adding to it, adding to it, and adding to it. So in the early days more… yes, I think there are going to be biases; hopefully a lot of the time they’ll be biases that are either inconsequential or go by unnoticed. But yes, I don’t see how it can be any other way. I think what we’ve got to try to do is not do what Google did and make stupid mistakes like that. But yes, we have to start with some chunk of data somewhere, and until we can keep adding to that and building it, there is always going to be some bias towards what it knows already as opposed to what it doesn’t.

Peter: You’ve said obviously you can’t just wave your phone around all the time. But in mobiles and the mobile market there’s this drive towards miniaturisation of components; and presumably this is something that could become almost, like, a discreet device that’s always on. And even, potentially, in the future, integrated into your glasses and giving you descriptions through bone conduction. Do you think that’s where this would head?

Léonie: Absolutely. I hope so. I did send a tweet in Microsoft’s direction to say, look, is there any chance we’re going to get this app to work with a Bluetooth camera? Because there are some pretty tiny Bluetooth cameras out there already. So if you teamed that up with a Bluetooth earpiece you could leave your phone in your pocket, and have access to that; attach it to a pair of glasses. It might look a little bit cumbersome initially, but not nearly as idiotic as waving your phone around in front of you.

And also not as insecure; that’s the other thing, I joke about being at conferences where I have a terrible time recognising people’s voices, even people I’ve met before, and you can’t really wander around with your phone just stretched out in front of you and hope for the best. It’s… there are privacy concerns and it’s socially awkward. But if you could sort of minimise it down to a set of glasses or something, it at least overcomes the socially awkward part—if not necessarily the privacy thing.

Peter: And that would, presumably, be kind of life changing for you—to have this thing constantly describing the environment around you.

Léonie: Oh it would be extraordinary! Absolutely extraordinary!

If you found this interesting, I suggest listening to the full episode where we talk a little more about what computer vision can add to the accessibility of virtual reality, as well as some chat with my colleagues about a computer vision prototype we worked on. You can also subscribe to our podcast on iTunes and Pocket Casts.

Also published on Medium.