A range of new flagship phones were shown off at the MWC19 trade fair. At one end of the scale, Samsung introduced three variations of its premium Galaxy S10 and a new model, the Galaxy Fold, with its innovative folding screen and almost $2,000 price tag. At the other end was the Wizphone WP006, a phone made only for Indonesia (where it will be sold in vending machines), costing about $7.
The WP006 is a featurephone: it has a hardware keyboard and no touchscreen, offers 4G connectivity, runs on KaiOS (an operating system based on the abandoned Firefox OS project), and has a prominent microphone button. It’s a voice-forward phone, powered by Google Assistant.
At work, we talk a lot about ‘voice’: what is it good for? Is it the post-mobile platform? Our clients ask us a lot about ‘voice’ too, and about how to build a branded app. But I’m not sure everyone is talking about the same thing, and I’m just as unsure that anyone knows what makes a really good branded ‘voice’ app. I mean, I’m fairly sure I don’t.
This article is my attempt to define what we mean when we talk about ‘voice’ and, based on my experience as a user and developer of ‘voice’ apps, to nail down some of the opportunities for branded third-party apps.
In this article I’m going to talk about a few current trends in digital technology as we move into 2018. It’s not a predictions piece: I’m a technologist, not a futurist. There’s so much to talk about that this risked turning into an essay, so I’ve limited it to things that interest me and are relevant to my job, rather than the full breadth of tech. As I said last year, it’s somewhat informed, purposely skimpy on detail, and very incomplete.
Computers with Eyes
One of the most interesting developments over the past couple of years has been the transition from cameras to eyes: from taking pictures to seeing. This has two parts: the first, computer vision, recognises objects in an image; the second, augmented reality, modifies the image before it reaches your eyes.
Computer vision means understanding the content of photos: who is in them and what they are doing, where they are, and what else is around them. This unlocks visual search: finding other images that are thematically similar to your photos, rather than visually similar (‘is this mostly blue?’ becomes ‘is this mostly sky?’).
Amazon, ASOS, eBay, and Pinterest (among others) use visual search to recommend products similar to the one you photograph (‘this picture is of a denim skirt; here is our range of denim skirts’), which avoids the problem of having to describe the product you want in text. Microsoft’s Seeing AI is changing the lives of people with visual impairments by using computer vision to describe their immediate environment (‘three people at a table near a window’).
The next step for visual search is to move from classifying objects in an image to providing contextual information about them. Snapchat offers relevant filters based on the content of photos; Pinterest will start offering looks (‘this is a denim skirt; here are products which combine well with this…’). The first mass-market, general-purpose visual search is Google Lens which, while fairly limited now (it can recognise landmarks, books and media, and URLs and phone numbers), will get smarter through the year; recognition of apparel and home goods has already been teased as coming soon.
People will begin to expect their cameras to be smarter, capable of not just capturing a scene but understanding it. And they will likely expect a single clear answer rather than pages of search results; this will diminish organic search, but the single answer becomes monetisable (brands can pay to have their products placed in the result). Google’s years of search experience and expansive knowledge graph give them a huge software lead over Apple, but I wouldn’t be surprised to see a ‘Siri Lens’ at some point; Bing also has a pretty good knowledge graph that Apple could use.
Augmented reality in its current form (placing digital objects into a camera image of a physical environment) has been around for a few years without much impact on public consciousness, but has recently moved into mainstream awareness. Snapchat broke ground with their face-changing Lenses, then used horizontal plane detection to drop animated 3D models into the real world (the dancing hotdog); both were subsequently copied and taken to greater scale by Facebook’s Camera Effects platform.
It’s now being pushed further by deeper integration into the phone OS (Apple’s ARKit and Google’s ARCore both take care of the complex calculations required for AR, reducing the burden on apps) and by better hardware. Apple have a major lead here with the new camera setup in the iPhone X, which will doubtless come to all their models in 2018. Google need to rely on their hardware partners to provide the cameras and chips for AR, so will polyfill it with software until that happens (I strongly suspect the Pixel 3 will be heavily optimised for AR through chips and sensors).
IKEA Place and Amazon, amongst others, are using current-stage AR technology to let you see what their products would look like in your home before you buy them. But finding use cases beyond product previews, toys (Animoji, AR Stickers), and games (Pokémon Go, the forthcoming Harry Potter game) will, I imagine, occupy much of the first part of the year, and possibly beyond; there is much discovery still to be done. Before AR becomes really useful, it may require an ‘AR Cloud’: a permission/coordinate space that allows digital enhancements to be shareable and persistent, so multiple people can see the same thing, in the same place, in the same state.
The next stage for AR is to map your immediate environment through infrared scanning. Microsoft’s HoloLens does this, and the required scanners are now in the iPhone X (Apple bought PrimeSense, whose technology powered the Kinect), although not yet enabled for environment mapping. This lets digital objects do more than appear overlaid in two dimensions: they can move around in a space with awareness of the objects in it, which is commonly called mixed reality. This unlocks new categories, such as indoor wayfinding. Google teased this at I/O 2017 with its ‘visual positioning service’ (VPS), the indoor equivalent of GPS, but VPS was a feature of the Tango project, which has since been wound down; without the required hardware in Android phones, Apple could leapfrog Google here.
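To make the ‘mostly blue’ versus ‘mostly sky’ distinction concrete, here is a toy Python sketch. It assumes each image has already been reduced to a coarse colour histogram (for visual similarity) and to a set of labels from some image classifier (for thematic similarity). The image names, histograms and labels are all made up, and real visual search systems use learned embeddings rather than anything this simple.

```python
from math import sqrt

# Made-up images: a coarse colour histogram (share of red, green, blue)
# and the labels a hypothetical image classifier might return.
IMAGES = {
    "blue_sky": {"colour": [0.05, 0.10, 0.85], "labels": {"sky", "cloud"}},
    "sunset_sky": {"colour": [0.70, 0.20, 0.10], "labels": {"sky", "cloud", "sun"}},
    "blue_car": {"colour": [0.05, 0.05, 0.90], "labels": {"car", "road"}},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Share of labels two images have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def visual(a, b):
    # 'Is this mostly blue?': compare raw colour distributions.
    return cosine(a["colour"], b["colour"])


def thematic(a, b):
    # 'Is this mostly sky?': compare what the classifier recognised.
    return jaccard(a["labels"], b["labels"])


def most_similar(query, measure, images=IMAGES):
    """Return the name of the other image that scores highest under `measure`."""
    others = [name for name in images if name != query]
    return max(others, key=lambda name: measure(images[query], images[name]))


print(most_similar("blue_sky", visual))    # blue_car
print(most_similar("blue_sky", thematic))  # sunset_sky
```

The blue sky’s nearest visual match is the blue car, while its nearest thematic match is the sunset, even though the sunset is mostly red.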
Computers with Ears
Voice recognition has improved massively in recent years, and there’s growing public acceptance of interacting through voice. Voice assistants have moved from phones to smart speakers (Echo, Home, HomePod), to cars (CarPlay, Android Auto), to wrists (Apple Watch, Android Wear), to ears (AirPods, Pixel Buds). Of the major digital assistants, Google Assistant is much more useful than the others.
In voice-first (or -only) devices, Amazon’s Echo family leads Google’s Home range in hardware sales, although Assistant has greater reach thanks to its presence on Android phones. Apple’s HomePod will launch soon, but is coming in at a high price in a market being fought over at the low end (Echo Dot and Home Mini are the big sellers), and may arrive too late. Amazon and Google (and competitors such as Microsoft’s Cortana and Samsung’s Bixby) are now competing to get their assistants embedded in devices made by third-party manufacturers. All voice-first devices, however, have two major problems which they’ll need to address this year.
The first problem is discovery: with no visual interface, how do people know what they can do? Alexa currently has some 25,000 skills on its platform, and although Google are prioritising quality over quantity (by working more closely with brands), getting found is still an issue. For now, brands will still have to run off-platform advertising and awareness campaigns, although that’s likely to change (I’ll come back to that later).
The second is proactivity: right now, both Alexa skills and Assistant actions are explicitly invoked, so the user has to ask whether anything has changed (‘is there an update on my delivery?’). Amazon and Google are both in the process of enabling notifications on their devices, but these will need careful design to avoid notification overload; it’s already considered a problem on phones, and could be worse in a voice UI if you have to sit and listen to a stack of spoken notifications.
Audio recognition can understand more than the human voice. Always-on song recognition (running on-device, not sending data to servers) is a major feature of the Pixel 2, and Apple recently acquired Shazam (Siri already has a Shazam service built in). The next stage of audio recognition will be to understand other environmental sounds (TV is an area being actively explored) and provide context about what is being listened to.
Computers with Brains
As more devices become capable of extracting information about the world around us, we need better tools to provide context and decide what’s useful. This becomes a virtuous circle: tools generate more data, and more data makes tools more useful.
Recommendations based on visual search become more useful when they know your taste from your photo history: not just what you wear, but your taste in furniture, home goods, and so on. At the moment, ASOS’s and Pinterest’s visual search give recommendations based on recognising a single product; given, say, your Instagram history, they could refine your recommendations with inferences from your broader tastes (‘people who like art deco furniture tend to wear…’).
Algorithmic recommendation could help solve one of the problems facing any future mixed reality interface: with a potentially unlimited number of things to look at (it’s the whole world around you), how does the interface decide which contextual information is most appropriate, and who provides it? An app-like experience (‘open TopTable and tell me about this restaurant’) limits discovery, so it may be better to take a search-engine approach, where the system infers the best content to offer based on a number of ranking factors.
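The ‘people who like art deco furniture tend to wear…’ inference is essentially item co-occurrence. Here is a minimal Python sketch, assuming each user’s photo history has already been turned into a set of style tags by some hypothetical classifier. The tags and users are invented for illustration, and a production recommender would normalise for popularity rather than rely on raw counts.

```python
from collections import Counter
from itertools import permutations

# Invented example data: style tags a (hypothetical) classifier might
# infer from five users' photo histories.
HISTORIES = [
    {"art deco furniture", "tailored coat", "brogues"},
    {"art deco furniture", "tailored coat"},
    {"mid-century furniture", "denim jacket"},
    {"art deco furniture", "brogues", "tailored coat"},
    {"mid-century furniture", "denim jacket", "trainers"},
]

# Only recommend clothing, even though the seed tag is furniture.
APPAREL = {"tailored coat", "brogues", "denim jacket", "trainers"}


def co_occurrence(histories):
    """Count how often each ordered pair of tags appears in the same history."""
    counts = Counter()
    for tags in histories:
        for a, b in permutations(tags, 2):
            counts[(a, b)] += 1
    return counts


def recommend(seed, histories, candidates, top_n=2):
    """Return the candidate tags that most often co-occur with `seed`."""
    counts = co_occurrence(histories)
    scores = Counter({b: n for (a, b), n in counts.items()
                      if a == seed and b in candidates})
    return [tag for tag, _ in scores.most_common(top_n)]


print(recommend("art deco furniture", HISTORIES, APPAREL))
# ['tailored coat', 'brogues']
```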
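The search-engine approach can be sketched as a weighted score over ranking factors. In this Python sketch the three factors (relevance, proximity, affinity), their weights and the candidate sources are all hypothetical; a real system would use many more signals and would learn the weights rather than hard-code them.

```python
from dataclasses import dataclass

# Hypothetical weights; a real system would learn these from user behaviour.
WEIGHTS = {"relevance": 0.5, "proximity": 0.2, "affinity": 0.3}


@dataclass
class Candidate:
    source: str       # who would provide the contextual information
    relevance: float  # how well it matches the recognised object (0 to 1)
    proximity: float  # how close the object is to the user (0 to 1)
    affinity: float   # inferred user interest in this kind of content (0 to 1)


def score(c: Candidate) -> float:
    """Linear combination of the ranking factors."""
    return (WEIGHTS["relevance"] * c.relevance
            + WEIGHTS["proximity"] * c.proximity
            + WEIGHTS["affinity"] * c.affinity)


def best_overlay(candidates):
    """Pick the single piece of contextual information to show."""
    return max(candidates, key=score)


candidates = [
    Candidate("restaurant reviews", 0.9, 0.8, 0.7),
    Candidate("bus timetable", 0.4, 0.9, 0.2),
    Candidate("shop offer", 0.6, 0.5, 0.9),
]
print(best_overlay(candidates).source)  # restaurant reviews
```

Returning one best item rather than a ranked list mirrors the single-answer expectation discussed earlier for visual search.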
Mixed reality is a display problem, a sensor problem and a decision problem. Show an image that looks real, work out what’s in the world and where to put that image, and work out what image you should show. — Ben Evans.
As I mentioned earlier, voice-first/-only devices suffer from a lack of discoverability. Alexa and Google Assistant are trying to solve this using intent: if a user asks for something that the assistant doesn’t cover, it will recommend a third-party app. Google calls these implicit invocations; a voice action from, say, Nike can suggest itself as appropriate if a user asks for running advice, without the user explicitly invoking Nike by name. This works like organic search, but there’s future scope to monetise it like paid search, using an AdWords-like system.
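Implicit invocation amounts to matching what the user asked for against the intents that third-party actions have declared. Real assistants use trained natural-language understanding for this; the Python sketch below swaps in naive keyword overlap purely to show the shape of the idea, and the action names and keyword lists are hypothetical.

```python
import re

# Hypothetical registry of third-party voice actions and the topics they declare.
ACTIONS = {
    "running coach": {"running", "run", "marathon", "training"},
    "recipe helper": {"recipe", "cook", "cooking", "dinner"},
}


def tokens(utterance):
    """Lower-case the utterance and split it into words."""
    return set(re.findall(r"[a-z]+", utterance.lower()))


def suggest_action(utterance, actions=ACTIONS):
    """Return the action whose keywords overlap most with the request, or None."""
    words = tokens(utterance)
    best, best_overlap = None, 0
    for name, keywords in actions.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best


print(suggest_action("Any tips for running my first marathon?"))  # running coach
print(suggest_action("What's the weather like?"))                 # None
```

No brand name appears in either request; the match comes entirely from the topic, which is what makes this behave like organic search.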
The Natural User Interface
With computers becoming more aware of what’s around them through their ‘eyes and ears’, the next step will be to bring them together: using computer vision, audio recognition and mixed reality to create meaningful, contextual connections between the physical and digital. That means a virtual map of the immediate environment, with an awareness and understanding of the things in it, and contextual information prompting relevant interaction with digital objects.
Placing 3D objects into a scene is one part of this, but images can also be enhanced in other ways, enriching and enlivening the world around us. We can ask the question: what would augment reality? Answers range from explanations of and instructions for physical objects, to translating foreign languages in situ, to showing user reviews or price comparisons. With motion magnification, almost imperceptible movements (like a pulse, or a baby’s breathing) can be amplified to become visible. Really, we’re just at the start of what’s possible.
Different services powered by machine learning (computer vision, contextual recommendations, mixed reality, and voice recognition) could eventually come together to create the post-mobile interface: one that understands the physical environment, enhances it with a contextual digital layer, and extends to devices beyond the phone. Whether anyone will actually achieve that in 2018 is up for debate (but unlikely).
There were signs this year that open social might have peaked. Sharing on Facebook has been declining for a couple of years, offset somewhat by increased sharing on Messenger and WhatsApp. It’s too soon to say it has definitely peaked, or why, but in the broader media narrative open social (and Facebook in particular) was certainly blamed as the flashpoint for clashes between the values of different groups and generations. Facebook can’t have failed to notice the decrease, and recent bouts of soul-searching led them to deprioritise articles in the News Feed (with a corresponding drop in engagement for publishers) and promote sharing and personal updates, even to the extent of trialling a split news feed with all articles moved to a separate (hidden) view, separating the social from the media.
None of the big open social apps do truly sequential timelines any more; Twitter and Instagram have followed Facebook in showing algorithmically sorted timelines so you don’t miss the good stuff (or what they think you think is the good stuff). More sharing on Instagram is going into direct messages; another experiment is underway to move DMs into their own app, which would become Facebook’s fifth messaging app (after Messenger, Messenger Kids, WhatsApp, and the recently purchased teen-focused app tbh). Instagram’s Stories have been one of their successes, quickly surpassing the usage of Snapchat (from which they stole the format), although Snapchat is increasingly popular with teens, perhaps another reason for the tbh purchase.
The Messenger (bot) platform seems to be settling around customer service, with brands (rather than services) coming to realise that it’s not a great fit for campaigns, but not always able to see another way into it. The early promise of conversational interaction in messaging has hit the reality that natural language requires significant investment in training, scripting, and testing, so bots have tended to fall back on button/prompt UI, which is often a worse experience than a rich Web or native app interface. With many brands unwilling to invest without a clear return on investment, there is a vicious circle (low investment, diminished experience, low user uptake, repeat), suggesting that messaging will take a while longer to fulfil its promise.
Those are the major trends I’m interested in for (early) 2018, but there’s plenty more to be aware of.
Smarter and Cheaper Devices
Machine learning is increasingly run on-device (mostly on phones) rather than on cloud servers. On-device ML is good for getting fast results, lowering network data usage, and improving privacy. Google’s TensorFlow Lite seems set to become the early standard for on-device machine learning, using pre-trained models accelerated by device APIs (Android 8.1’s Neural Networks API, iOS 11’s Core ML). Many of Apple’s iOS machine learning models, such as face recognition, already run on-device, and Google’s recent photography ‘appsperiments’ (ugh) show it’s a way forward they’re embracing too.
On-device learning combined with cheap, miniaturised hardware (a product of the smartphone boom) opens up a new category of smart, single-purpose devices. Google Clips is one example: a camera with a pre-trained computer vision model that detects when an ‘interesting’ moment happens, captures it in a short video clip and sends it to your phone, no operator required.
This could extend to other phone and smart device functions, such as voice-controlled speakers that don’t need the full power of Alexa or Assistant and instead use pre-trained models to control music playback. And research repeatedly shows that some of the most-used functions on smart speakers are setting alarms and timers, and unit conversion (for cookery), so it’s not a stretch to imagine a cheap kitchen timer with just enough smarts to carry out those core functions.
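To illustrate how limited the ‘smarts’ of such a device could be, here is a Python sketch of a fixed command grammar covering timers and a few cooking conversions. The phrasings, units and function names are assumptions, and it presumes speech has already been transcribed to text (for instance by a small on-device model), which is the hard part this sketch skips.

```python
import re

GRAMS_PER_OUNCE = 28.349523125  # definition of the avoirdupois ounce
SECONDS = {"second": 1, "minute": 60, "hour": 3600}

# The device only understands these two fixed phrasings.
TIMER = re.compile(r"set a timer for (\d+) (second|minute|hour)s?")
CONVERT = re.compile(
    r"convert (\d+(?:\.\d+)?) (grams|ounces|celsius) to (ounces|grams|fahrenheit)"
)


def handle(command):
    """Map a transcribed command to an action tuple, or None if unsupported."""
    text = command.lower().strip()
    if m := TIMER.fullmatch(text):
        return ("timer", int(m.group(1)) * SECONDS[m.group(2)])
    if m := CONVERT.fullmatch(text):
        value, src, dst = float(m.group(1)), m.group(2), m.group(3)
        if (src, dst) == ("grams", "ounces"):
            return ("convert", value / GRAMS_PER_OUNCE)
        if (src, dst) == ("ounces", "grams"):
            return ("convert", value * GRAMS_PER_OUNCE)
        if (src, dst) == ("celsius", "fahrenheit"):
            return ("convert", value * 9 / 5 + 32)
    return None  # anything else is out of scope for this device


print(handle("Set a timer for 10 minutes"))         # ('timer', 600)
print(handle("convert 180 celsius to fahrenheit"))  # ('convert', 356.0)
```

Anything outside the grammar simply returns None, which is the trade-off that keeps such a device cheap.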
The Decline of the Ad-funding Model
The steady growth of ad-blockers indicates that users are tired of ads and, in particular, of invasive tracking, which is leading to more ad-blocking built into browsers. Apple’s Safari browser recently started blocking a number of third-party tracking scripts (the impact of that is already being felt), and from early 2018 Google’s Chrome will start to blacklist sites that persistently violate the Better Ads Standards. The EU’s General Data Protection Regulation (GDPR) will apply from May 2018, which should make it harder for companies to (legally) track users and share their data with other services. All of this may have a knock-on effect on advertising revenue (especially for those operating in murky areas, who deserve punishment).
It seems strange to talk about a decline when digital ad spend continues to grow (and, in 2017, overtook TV spend for the first time), but the problem is that Google and Facebook already take around two-thirds of digital advertising spend, and Amazon (including Alexa) is on course to join them as it becomes the de facto pre-purchase search engine. This leaves digital media publishers with less revenue, and 2017 saw businesses relying on the ad-funding model, such as BuzzFeed and Vice, facing job cuts and restructuring.
Many publishers have opted for paywalls/paygates, but these limit reach and have a natural cap—how many people can afford to pay for one or more subscriptions? A few publishers are trying reader donation services to make up for the drop in ad revenue—the Guardian and New York Times have had some success with this model. With better payment methods arriving in browsers (Apple Pay, the Payment Request API), it’s possible that some ad revenue loss could be offset by micropayments.
The UK’s Open Banking API standard rolls out in early 2018, with the EU’s Second Payment Services Directive (PSD2) following shortly after. The two are set to have a huge impact on banking and personal finance in Europe, bringing a wave of new banks and savings applications and shaking up the existing institutions.
As for cryptocurrencies… I have a hard time with these. The leading cryptocurrency, Bitcoin, has basically failed to meet every one of its promises, and only really works as an investment vehicle. The underlying blockchain technology promises more benefit, but most of its use cases seem to be B2B; I haven’t seen any convincing consumer use cases. One area I am intrigued by is using blockchains to create digital scarcity, like CryptoKitties; playful use cases can often lead to more interesting outcomes, and adding value to digital art sounds useful. For everything else… I’ll wait and see.
Although there is growing opportunity in VR gaming, I still can’t see VR breaking into the mainstream. Phone-based VR has serious technical limitations to overcome, and tethered headsets are too expensive and cumbersome (and don’t seem to have sold well, although recent price cuts have helped a little). The next generation of standalone headsets (Oculus Go, Vive Focus, Daydream) could open the market a little more, but I still think VR has to overcome its biggest problems: isolation, and requiring exceptional behaviour (it’s not as easy as watching TV or using a phone). This may be mitigated by future technology, but I can’t see any immediate signs of that happening.
There’s little point in talking about machine learning as a separate technology; it’s the fuel powering much of what’s interesting right now. One area of particular interest for 2018 will be authenticity: ‘fake’ images and audio generated with machine learning algorithms are getting increasingly convincing, and it seems alarmingly easy to use adversarial examples to ‘trick’ computer vision models into seeing something other than what we see.
There’s little point in talking about the Web as a separate technology; it’s the data layer connecting much of what’s interesting right now. While the major operating systems and platforms refuse to cooperate, the Web still provides the broadest reach, especially in developing markets with lower-powered devices and without access to closed app stores. It’s interesting to see previously closed platforms like Instagram and Snapchat more willing to go to the Web as they move to scale.
Thanks for reading. If you’re interested in stories about technology’s role in culture, society, science, and philosophy, you might want to subscribe to my newsletter, The Thoughtful Net.
Alright, stand back everyone: I’m about to have some opinions about technology in 2017. Because obviously there’s been a shortage of those.
As part of my Technologist role at +rehabstudio I put together internal briefings about digital media, consumer technology, where the digital marketing industry could go in the near future, and what we should be communicating to our clients. The aim isn’t to make predictions, but to follow trends.
This article is based on my latest briefing. It’s somewhat informed, purposely skimpy on detail, and very incomplete: I have some thoughts on advertising and publishing that I can’t quite distil yet, and machine learning is a vast surface that I can barely scratch.
If only in terms of press coverage, 2016 was the year of messaging and the explosion of the messaging bot. The biggest player in the game, Facebook Messenger, launched its bot platform in April, and by November some 33,000 bots had been released. Recent additions to the platform include embedded webviews, HTML5 games, and in-app payments.
The first six months of bots were largely the ‘fart app’ stage, but there are signs that brands and services are finally starting to see the real opportunity in messaging: removing friction from their users’ interactions with them, such as the friction of app management and UI complexity.
“OK Google, play Stranger Things from Netflix on My TV.”
Home assistants make the smart home easier to manage. No more separate apps for Wemo, Hue, Nest, etc.; a single voice interface (perhaps glued together with a cloud service like IFTTT) controls all the different devices in your home.
The app only appears in a particular context when necessary and in the format which is most convenient for the user.
While native mobile apps are still a growth area, it’s becoming much harder to get users to download and engage with apps outside of a small popular core. This is especially true for retail, where consumers are more omnivorous and like to browse widely.
Improvements in the capabilities of web apps (especially on Chrome for Android) suggest an alternative to native apps in some cases. This has been demonstrated by the success of new web apps from major retail brands like Flipkart and Alibaba in developing economies, where an official app store may not be available or network costs may make app downloads undesirable.
Web apps require no installation, avoiding the app store problem. They’re starting to get important features like push notifications and payment APIs. And messaging platforms, with their large installed user base, provide the web with a social and distribution layer that the browser never did:
Messaging apps and social networks [are] wrappers for the mobile web. They’re actually browsers… [and] give us the social context and connections we crave, something traditional browsers do not.
Many brands are finding that their mobile apps are not paying off.
The most important app on your phone could be the camera, and it will become increasingly important this year, first by revealing the ‘dark matter’ of the internet: images, video and sound. So much of this data is uploaded every day, but without the semantic value of text, its meaning is lost to non-human readers such as search engines. But machine learning is becoming very good at understanding the content of this opaque data, meaning the role of the camera changes:
It’s not really a camera, taking pictures; it’s an eye, that can see.
It can see faces, landmarks, logos, objects; hear background chat and music. That’s understanding context, location, purchase history, and behaviour, without being explicitly told anything. This is why Facebook, through Messenger and Instagram, are furiously copying Snapchat’s best features: they want their young audience and the data they bring.
Will it be intrusive? Yes. Will it happen? Yes. I’ve tried to avoid making hard predictions in this piece, but I am as confident as I can be that our image and video history will be used for marketing data.
Cameras will also be important in altering the images shown to users. Augmented reality is an exciting technology, although good-enough dedicated hardware is still a while away. But there’s a definite market drift in that direction, and leading it is Snapchat: they’re stealthily introducing AR by modifying the base layer of reality, starting with altering faces using their Lenses. This isn’t frivolous; it’s expanding the range of digital communication, like emoji do for text.
If people are talking in pictures, they need those pictures to be capable of expressing the whole range of human emotion.
Recent Snapchat lenses have started altering voices, and your environment. They’ve recently bought a company that specialises in adding 3D objects into real environments. With Spectacles they’re not only removing friction from the process of taking a photo, they’re prototyping hardware at scale. This is the road to AR. Snap Inc. want to be the camera company — not in the way that Nikon was, but in the way that Facebook is the social company.
The companion to augmented reality is virtual reality, but I don’t believe we’ll see VR going mainstream in 2017, and I say that as a proponent. It’s static, isolating, and requires people to form a new behaviour. It’s interesting to see creators experiment with the form, and I’ve no doubt that we’ll see some very interesting experiences launched this year. But domestic sales aren’t huge: high-end units are too expensive, and low-end ones aren’t quite up to scratch yet. I still think it will be big for gamers, though.
I have more. A lot more. But I think it will all be better explained in a series of subsequent blog posts, so I’ll aim to write those. In the meantime, I’d love to hear your thoughts, arguments, objections, and conclusions.