There are two important concepts arriving in smartphone AR technology at the moment: believability, and persistence.
Believability comes from digital and physical objects appearing to naturally occupy a space together: for example, if you move a physical object in front of a digital object, the digital one should appear to be partly obscured. In AR parlance this is occlusion, or blending.
You can see the advantage of blending in the two photos at the top of this post. In the photo on the left I’ve disabled blending, so the image of the goat appears in front of the table, flattening the depth in the picture. In the second, blending is enabled, so the goat appears occluded by the table, as you’d naturally expect it to be; it’s believable.
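At its simplest, occlusion comes down to a per-pixel depth comparison. Here is a minimal sketch in Python, assuming the device can supply a distance for every camera pixel. The function name `composite` and the tiny hand-written grids are hypothetical illustrations, not part of ARKit or ARCore.

```python
def composite(camera, camera_depth, virtual, virtual_depth):
    """Blend a virtual layer into a camera frame using a per-pixel depth test.

    All four arguments are 2D lists of the same size. A virtual pixel of
    None means "nothing rendered here". Depths are distances from the camera.
    """
    out = []
    for y, row in enumerate(camera):
        out_row = []
        for x, real_pixel in enumerate(row):
            v = virtual[y][x]
            # Show the virtual pixel only where it is closer than the real surface.
            if v is not None and virtual_depth[y][x] < camera_depth[y][x]:
                out_row.append(v)
            else:
                out_row.append(real_pixel)
        out.append(out_row)
    return out


# One row of pixels (all values are made up for illustration):
# a wall ("W") 3.0 m away, a table ("T") 1.0 m away, a goat ("G") 2.0 m away.
camera = [["W", "W", "T", "T"]]
camera_depth = [[3.0, 3.0, 1.0, 1.0]]
virtual = [[None, "G", "G", None]]
virtual_depth = [[0.0, 2.0, 2.0, 0.0]]

blended = composite(camera, camera_depth, virtual, virtual_depth)
```

With the depth test in place, the goat shows only where nothing real is nearer, so it is cut off where the table begins. Treating every real surface as infinitely far away reproduces the ‘blending disabled’ photo, with the goat drawn over the table.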
At the start of each new year I like to clarify my thoughts by writing about a few things I think are worth keeping an eye on in the year ahead. They’re not predictions; I’m not a futurist. In previous years I’ve described these as trends, but they’re better thought of as signals. Or, even better, just some things I think are interesting.
In this article I’m going to be talking about a few current trends in digital technology as we move into 2018. It’s not a predictions piece—I’m a technologist, not a futurist. And there’s so much to talk about that this was at risk of turning into an essay, so I’ve limited it to some of the things that are interesting to me and relevant to my job, rather than the fullest/broadest scope of tech. As I said last year, it’s somewhat informed, purposely skimpy on detail, and very incomplete.
Computers with Eyes
One of the most interesting developments over the past couple of years has been the transition from cameras to eyes; from taking pictures, to seeing. This has two parts: the first, computer vision, recognises objects in an image; the second, augmented reality, modifies the image before it reaches your eyes.
Computer vision means understanding the content of photos: who is in them and what they are doing, where they are, and what else is around. This unlocks visual search—that is, finding other images that are thematically similar to your photos, rather than visually similar (‘is this mostly blue?’ becomes ‘is this mostly sky?’).
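To make the difference between ‘mostly blue’ and ‘mostly sky’ concrete, here is a minimal sketch in Python. It assumes a computer vision model has already attached a set of labels to each photo; the file names, the labels, and the functions `jaccard` and `thematic_search` are all hypothetical, and real visual search systems are far more sophisticated than a label overlap.

```python
def jaccard(a, b):
    """Overlap between two label sets: 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def thematic_search(query_labels, library, top_n=2):
    """Rank library images by how many recognised labels they share with the query."""
    scored = [(jaccard(query_labels, labels), name) for name, labels in library.items()]
    # Highest score first; ties broken alphabetically so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


# Hypothetical library: labels a classifier might have produced for each photo.
library = {
    "beach.jpg": {"sky", "sea", "sand"},
    "blue_car.jpg": {"car", "road"},
    "mountain.jpg": {"sky", "mountain", "snow"},
}

results = thematic_search({"sky", "sea"}, library)
```

Here `results` lists the beach photo first and the mountain photo second. The blue car never appears, however blue it is, because it shares no subject matter with the query.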
Amazon, ASOS, eBay, and Pinterest (among others) use visual search to recommend products similar to the one you photograph (‘this picture is of a denim skirt; here is our range of denim skirts’), which helps mitigate the problem of using text input to describe the product you want. Microsoft’s Seeing AI is changing the lives of people with visual impairments by using computer vision to describe their immediate environment (‘three people at a table near a window’).
The next step for visual search is to move from classifying objects in an image to providing contextual information about them. Snapchat offers relevant filters based on the content of photos; Pinterest will start offering looks (‘this is a denim skirt; here are products which combine well with this…’). The first mass market general-purpose visual search is Google Lens which, while fairly limited now (it can recognise landmarks, books/media, and URLs/phone numbers), will get smarter through the year, with recognition of apparel and home goods already teased as coming soon.
People will begin to expect their cameras to be smarter, capable of not just capturing a scene, but understanding it. And it’s likely they will expect a single clear answer, rather than pages of search results; this will diminish organic search, but makes the result monetisable (brands can pay to have their products placed in it). Google’s years of search experience and expansive knowledge graph give them a huge software lead over Apple, but I wouldn’t be surprised to see a ‘Siri Lens’ sometime; Bing also has a pretty good knowledge graph they can use.
Augmented reality, in its current form (placing digital objects into a camera image of a physical environment), has been around for a few years without much impact on public consciousness, but has recently moved into mainstream awareness. Snapchat broke the ground with their face-changing Lenses, then used horizontal plane detection to drop animated 3D digital models into the real world (the dancing hotdog); both were subsequently copied and taken to greater scale by Facebook’s Camera Effects platform.
It’s now being pushed further by deeper integration into the phone OS (Apple’s ARKit and Google’s ARCore both take care of the complex calculations required for AR, reducing the burden on apps), and better hardware—Apple have a major lead here with the new camera setup in the iPhone X, which will doubtless come to all their models in 2018. Google need to rely on their hardware partners to provide the cameras and chips for AR, so will polyfill it with software until that happens (I strongly suspect the Pixel 3 will be heavily optimised through chips and sensors).
IKEA Place and Amazon, amongst others, are using current-stage AR technology to let you see what their products would look like in your home before you buy them. But finding use cases beyond product previews, toys (animoji, AR Stickers), and games (Pokémon Go, the forthcoming Harry Potter) will, I imagine, occupy much of the first part of the year, and possibly beyond; there is much discovery still to be done. It may require an ‘AR Cloud’—a permission/coordinate space that allows digital enhancements to be shareable and persistent, so multiple people can see the same thing, in the same place, in the same state—before it becomes really useful.
The next stage for AR is to provide a map of your immediate environment through infrared scanning. Microsoft’s HoloLens does this, and the required scanners are now in the iPhone X (Apple bought PrimeSense, whose technology powered the Kinect), although not yet enabled. This allows digital objects not merely to appear overlaid in two dimensions, but to move around in a space with awareness of the objects in it; this is commonly called mixed reality. It unlocks new categories, such as indoor wayfinding. Google teased this at I/O 2017 with its ‘visual positioning service’ (VPS), the indoor equivalent of GPS, but VPS was a feature of the Tango project, which has since been wound down; without the required hardware in Android phones, Apple could leapfrog them here.
Computers with Ears
Voice recognition has improved massively in recent years, and there’s a growing public acceptance of interacting through voice. Voice assistants have moved from phones to smart speakers (Echo, Home, HomePod), to cars (CarPlay, Android Auto), to wrists (Apple Watch, Android Wear), to ears (AirPods, Pixel Buds). Of the major digital assistants, Google’s Assistant is much more useful than the others.
In voice-first (or -only) devices, Amazon’s Echo family has the lead in hardware sales over Google’s Home range, although Assistant has greater reach thanks to its presence on Android phones. Apple’s HomePod will launch soon, but is coming in at a high price in a market being contested at the low end (Echo Dot and Home Mini are the big sellers) and may come too late. Both Amazon and Google (and competitors such as Microsoft’s Cortana and Samsung’s Bixby) are now competing to get their assistants embedded in devices made by third-party manufacturers. All voice-first devices, however, have two major problems which they’ll need to address this year.
The first problem is discovery: with no interface, how do people know what they can do? Alexa currently has around 25k skills on its platform, and although Google are prioritising quality over quantity (by working more closely with brands), getting found is still an issue. For now brands will still have to run off-platform advertising/awareness campaigns, although that’s likely to change (I’ll come back to that later).
The second is in being proactive; right now, both Alexa and Assistant skills are explicitly invoked, so the user has to ask if anything has changed (‘is there an update on my delivery?’). Both Amazon and Google are in the process of enabling notifications on their devices, but they will need careful consideration to avoid notification overload; it’s already considered a problem on phones, and could be worse on voice UI if you have to sit and listen to a stack of spoken notifications.
Audio recognition is capable of understanding more than the human voice. Always-on song recognition (running on-device, not sending data to servers) is a major feature of the Pixel 2, and Apple recently acquired Shazam (Siri already has a Shazam service built-in). The next stage of audio recognition will be to understand other environmental sounds (TV is an area that’s being actively explored) and provide context about what is being listened to.
Computers with Brains
With more devices becoming more capable of extracting information about the world around us, we require better tools to provide context and make decisions about what’s useful. This becomes a virtuous circle, as tools make more data, and more data makes tools more useful.
Recommendations based on visual search become more useful by knowing your taste through your photo history; not just what you wear, but your tastes in furniture, home goods… At the moment the visual search of ASOS and Pinterest gives recommendations based on recognising a single product, but given, say, your Instagram history, it could refine those recommendations with inferences from your broader tastes (‘people who like art deco furniture tend to wear…’).
Algorithmic recommendation could help solve one of the problems facing any future mixed reality interface: as you have a potentially unlimited number of things to look at (it’s the whole world around you), how does your interface decide what is the most appropriate contextual information to provide, and who provides it for you? An app-like experience (‘open TopTable and tell me about this restaurant’) limits discovery, so it may be better to take a search engine approach, where the system tries to infer the best content to offer based on a number of ranking factors.
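As a rough illustration of the search engine approach, here is a minimal sketch in Python that combines a few ranking factors into one score and surfaces a single result. Everything in it is an assumption made for the example: the `Candidate` fields, the three factors, and the weights are hypothetical, and a real system would tune or learn them.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distance_m: float  # how far the object is from the viewer
    relevance: float   # 0..1, match with the user's known interests
    rating: float      # 0..5, e.g. an average review score


# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"proximity": 0.5, "relevance": 0.3, "rating": 0.2}


def score(c: Candidate, max_distance_m: float = 100.0) -> float:
    """Combine several ranking factors into a single number between 0 and 1."""
    proximity = max(0.0, 1.0 - c.distance_m / max_distance_m)
    return (
        WEIGHTS["proximity"] * proximity
        + WEIGHTS["relevance"] * c.relevance
        + WEIGHTS["rating"] * (c.rating / 5.0)
    )


def best_annotation(candidates):
    """Return the single candidate the interface should surface, or None."""
    return max(candidates, key=score, default=None)


# Made-up things in view, for illustration only.
candidates = [
    Candidate("restaurant", distance_m=10.0, relevance=0.9, rating=4.0),
    Candidate("bus stop", distance_m=5.0, relevance=0.1, rating=2.5),
    Candidate("museum", distance_m=80.0, relevance=0.8, rating=5.0),
]
top = best_annotation(candidates)
```

With these made-up numbers the restaurant wins: the bus stop is nearer, and the museum is better rated, but neither does as well across all three factors. Changing the weights changes the answer, which is exactly where the interesting (and monetisable) decisions lie.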
Mixed reality is a display problem, a sensor problem and a decision problem. Show an image that looks real, work out what’s in the world and where to put that image, and work out what image you should show. — Ben Evans.
As I mentioned earlier, voice-first/-only devices suffer from a lack of discoverability. Alexa and Google Assistant are trying to solve this using intent; if a user asks for something that the assistant doesn’t cover, it will recommend a third-party app. Google calls these implicit invocations; a voice action from, say, Nike, can suggest itself as appropriate if a user asks for running advice rather than explicitly invoking Nike by name (this works like organic search, but there’s future scope for this to be monetised like paid search using an AdWords-like system).
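The difference between explicit and implicit invocation can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the `ACTIONS` registry, the `route` function, and the plain substring matching are all hypothetical, and none of it reflects how Alexa or Google Assistant actually resolve intents.

```python
# Hypothetical registry: each third-party action declares the phrases it can handle.
ACTIONS = {
    "Nike": {"running advice", "training plan"},
    "TopTable": {"book a table", "restaurant review"},
}


def route(utterance):
    """Return (action, how) for a request, or (None, "unhandled")."""
    text = utterance.lower()
    # Explicit invocation: the user names the action.
    for name in ACTIONS:
        if name.lower() in text:
            return name, "explicit"
    # Implicit invocation: no name given, so match on declared intent instead.
    for name, intents in ACTIONS.items():
        if any(intent in text for intent in intents):
            return name, "implicit"
    return None, "unhandled"
```

Asking ‘Ask Nike for a training plan’ is routed explicitly, while ‘I need some running advice’ reaches the same action implicitly, without the user ever knowing it exists. A paid model would only need to change which action wins when several declare the same intent.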
The Natural User Interface
With computers being more aware of what’s around them through their ‘eyes and ears’, the next step will be to bring them together: using computer vision, audio recognition and mixed reality to create meaningful, contextual connections between the physical and digital—a virtual map of the immediate environment, with an awareness and understanding of the things in it, and contextual information provoking relevant interaction with digital objects.
Placing 3D objects into a scene is one part of this, but images can also be enhanced in different ways, enriching and enlivening the world around us. We can ask the question: what would augment reality? Answers range from providing explanations of and instructions for physical objects, to translating foreign languages in situ, to showing user reviews or price comparisons. With motion magnification, almost imperceptible movements (like a pulse, or a baby’s breathing) can be amplified to become visible. Really, we’re just at the start of what’s possible.
Different services, powered by machine learning—computer vision, contextual recommendations, mixed reality, and voice recognition—could eventually come together to create the post-mobile interface: understanding the physical environment and enhancing it with a contextual digital layer, and distributing it into devices beyond the phone. Whether anyone will actually achieve that in 2018 is up for debate (but unlikely).
There were signs this year that open social might have peaked. Sharing on Facebook has been declining for a couple of years, offset somewhat by increased sharing on Messenger and WhatsApp. It’s too soon to say it’s definitely peaked (or why), but certainly in the broader media narrative open social (and Facebook in particular) was blamed as the flashpoint for conflicts of the values of different groups and generations. Facebook can’t have failed to notice the decrease, and recent bouts of soul-searching led to them deprioritising articles from the News Feed (with a corresponding drop in engagement for publishers), and promoting sharing and personal updates, even to the extent of trialling a separated news feed, with all articles in a separate (hidden) view; splitting the social from the media.
None of the big open social apps do truly sequential timelines any more; Twitter and Instagram have followed Facebook by showing algorithmically sorted timelines so you don’t miss the good stuff (or, what they understand to be what you think is the good stuff). More sharing on Instagram is going into direct messages; another experiment is underway to move DMs into their own app, which would become Facebook’s fifth messaging app (after Messenger, Messenger Kids, WhatsApp, and recently purchased teen-focused app, tbh). Instagram’s Stories have been one of their successes, quickly surpassing the usage of Snapchat (from which they stole the format), although Snapchat is increasingly more popular with teens, which is perhaps another reason for the tbh purchase.
The Messenger (bot) platform seems to be settling around customer service, with brands (rather than services) coming to realise that it’s not a great fit for campaigns, but not always able to see another way into it. The early promise of conversational interaction in messaging has hit the reality that natural language requires a great investment in training, scripting, and testing, so bots have tended to fall back into button/prompt UI, which is often a worse experience than using a rich Web or native app interface. With many brands not willing to invest without clear return on investment there is a vicious circle (low investment, diminished experience, low user uptake, and repeat) indicating that messaging is likely to take a while longer to fulfil its promise.
Those are the major trends I’m interested in for (early) 2018, but there’s plenty more to be aware of.
Smarter and Cheaper Devices
Machine learning is increasingly being run on-device (mostly phone) rather than cloud servers. On-device ML is good for getting fast results, lowering network data usage, and improving privacy. Google’s TensorFlow Lite seems set to become the early standard for on-device learning, using pre-trained models accelerated by device APIs (Android 8.1’s Neural Networks API, iOS 11’s Core ML). Many of Apple’s iOS machine learning models, such as face recognition, are already on-device, and Google’s recent photography ‘appsperiments’ (ugh) also show that’s a way forward they’re embracing.
On-device learning combined with cheap, miniaturised hardware (a product of the smartphone boom) opens up a new category of smart, single-purpose devices. Google Clips is one example: a camera with a pre-trained computer vision model that detects when an ‘interesting’ moment happens, captures it in a short video clip and sends it to your phone, with no operator required.
This could extend to other phone/smart device functions, such as voice-controlled speakers that don’t require the full power of Alexa or Assistant, instead using pre-trained models to control music playback. And research repeatedly shows that some of the most-used functions on smart speakers are setting alarms and timers, and unit conversion (for cookery), so it’s not a stretch to imagine a cheap kitchen timer that has the limited smarts to carry out those core functions.
The Decline of the Ad-funding Model
The steady growth of ad-blockers indicates that users are tired of ads and, in particular, invasive tracking, leading to more device-native ad-blocking; Apple’s Safari browser recently started blocking a number of third-party tracking scripts (the impact of that is already being felt), and from early 2018 Google’s Chrome will start to blacklist sites that persistently violate the Better Ads Standards. The EU’s General Data Protection Regulation (GDPR) will become enforceable in May 2018, which should make it harder for companies to (legally) track users and share their data with other services. All of this may have a knock-on effect on advertising revenue (especially to those operating in murky areas who deserve punishment).
It seems strange to talk about a decline when digital ad spend continues to grow (and, in 2017, overtook TV spend for the first time), but the problem is that Google and Facebook already take around 2/3 of advertising spend, and Amazon (including Alexa) is on course to join them (as they become the de facto pre-purchase search engine). This leaves digital media publishers with less revenue, and 2017 saw businesses relying on the ad-funding model—such as Buzzfeed and Vice—facing job cuts and restructuring.
Many publishers have opted for paywalls/paygates, but these limit reach and have a natural cap—how many people can afford to pay for one or more subscriptions? A few publishers are trying reader donation services to make up for the drop in ad revenue—the Guardian and New York Times have had some success with this model. With better payment methods arriving in browsers (Apple Pay, the Payment Request API), it’s possible that some ad revenue loss could be offset by micropayments.
The UK’s Open Banking API standard rolls out in early 2018, with the EU Second Payment Services Directive (PSD2) following shortly after. The two are set to have a huge impact on banking and personal finance in Europe, bringing a wave of new banks and savings applications and shaking up the existing institutions.
As for cryptocurrencies… I have a hard time with these. The leading cryptocurrency, Bitcoin, has basically failed to meet every one of its promises, and only really works as an investment vehicle. The underlying blockchain technology promises more benefits, but most of them seem to be B2B; I haven’t really seen any convincing consumer use cases. One area that I am intrigued by is using blockchains to create digital scarcity, like CryptoKitties; playful use cases can often lead to more interesting outcomes, and adding value to digital art sounds useful. For everything else… I’ll wait and see.
Although there is growing opportunity in VR gaming, I still can’t see this breaking into the mainstream. Phone-based VR has serious technical limitations to overcome, and tethered headsets are too expensive and cumbersome (and don’t seem to have sold well, although recent price cuts have helped a little). The next generation standalone headsets (Oculus Go, Vive Focus, Daydream) could open the market a little more, but I still think it has to overcome its biggest problems: isolation, and requiring exceptional behaviour (it’s not as easy as watching TV or using a phone). This may be mitigated by future technology, but I can’t see any immediate signs of that happening.
There’s little point in talking about machine learning as a separate technology; it’s the fuel powering much of everything interesting that’s happening. One area of particular interest for 2018 will be authenticity: ‘fake’ images and audio generated with machine learning algorithms are getting increasingly convincing, and it seems alarmingly easy to use adversarial examples to ‘trick’ computer vision models into seeing something other than what we do.
There’s little point in talking about the Web as a separate technology; it’s the data layer connecting much of everything interesting that’s happening. While the major operating systems and platforms refuse to cooperate, the Web still provides the broadest reach, especially in developing markets using lower-powered devices and without access to closed app stores. It’s interesting to see previously closed platforms like Instagram and Snapchat more willing to go to the Web as they move to scale.
Twitter is awash with impressive demos of augmented reality using Apple’s ARKit or Google’s ARCore. It’s cool that there’s a palpable sense of excitement around AR (I’m pretty excited about it myself), but there’s perhaps a little too much early hype, and what the demos don’t show is perhaps more suggestive of the genuinely exciting future of AR.
Below is an example of the demos I’m talking about — a mockup of an AR menu that shows each of the individual dishes as a rendered 3D model, digitally placed into the environment (and I want to make clear I’m genuinely not picking on this, just using it as an illustration):
This raises a few questions, not least around delivery. As a customer of this restaurant, how do I access these models? Do I have to download an app for the restaurant? Is it a WebAR experience that I see by following a URL?
There’s so much still to be defined about future AR platforms. Ben Evans’ post, The First Decade of Augmented Reality, grapples with a lot of the issues of how AR content will be delivered and accessed:
Do I stand outside a restaurant and say ‘Hey Foursquare, is this any good?’ or does the device’s OS do that automatically? How is this brokered – by the OS, the services that you’ve added or by a single ‘Google Brain’ in the cloud?
The demo also raises important questions about utility; for example, why is seeing a 3D model of your food on a table better than seeing a 3D model in the web page you visit, or the app you download? Or, why is it better even than seeing a regular photo, or just reading the description on the menu? Do you get more information from seeing a model in AR than from any other medium?
Again, I’m not setting out to criticise the demos; I think experimentation is critical to the development of a new technology—even if, as Miesnieks points out in a separate essay, a lot of this experimentation has already happened before…
I’m seeing lots of ARKit demos that I saw 4 years ago built on Vuforia and 4 years before that on Layar. Developers are re-learning the same lessons, but at much greater scale.
But placing 3D objects into physical scenes is just one narrow facet of the greater potential of AR. When we can extract spatial data and information from an image, and also manipulate that image digitally, augmented reality becomes something much more interesting.
In Matthew Panzarino’s review of the new iPhones he talks about the Portrait Lighting feature—which uses machine learning smarts to create studio-style photography—as augmented reality. And it is.
AR isn’t just putting a virtual bird on it or dropping an Ikea couch into your living room. It’s altering the fabric of reality to enhance, remove or augment it.
The AR demos we’re seeing now are fun and sometimes impressive, but my intuition is that they’re not really representative of what AR will eventually be, and there are going to be a few interesting years until we start to see that revealed.