
The Future of News is Humans Talking to Machines

This post was originally published by Trushar Barot for Harvard's Nieman Lab.

This year, the iPhone turned 10. Its launch heralded a new era in audience behavior that fundamentally changed how news organizations think about the way their work is discovered, distributed, and consumed.

This summer, as a Knight Visiting Nieman Fellow at Harvard, I've been researching a technology I think could lead to a similar step change in how publishers relate to their audiences: AI-driven voice interfaces, such as Amazon's Alexa, Google's Home and Assistant, Microsoft's Cortana, and Apple's upcoming HomePod. The more I've spoken to the editorial and technical leads building on these platforms at different news organizations, as well as the tech companies developing them, the more I've come to this view: this is potentially bigger than the impact of the iPhone. In fact, I'd describe these smart speakers, and the associated AI and machine learning they'll interface with, as the huge iceberg the news industry doesn't even know it's standing on.

This wasn't how I planned to open this piece even a week before my Nieman fellowship ended. But as I tied together the research I'd done with the conversations I'd had with people across the industry, something became clear: as an industry, we're far behind the thinking of the technology companies investing heavily in AI and machine learning. Over the past year, the CEOs of Google, Microsoft, Facebook, and other global tech giants have all said, in different ways, that they now run AI-first companies. I can't remember a single senior news exec ever mentioning AI and machine learning at any industry keynote address over the same period.

Of course, that's not necessarily surprising. "We're not technology companies" is a refrain I've heard a lot. And there are plenty of other important issues to occupy industry minds: the rise of fake news, continued uncertainty in digital advertising, new tech such as VR and AR, and the ongoing conundrum of responding to the latest strategic moves of Facebook.

But as a result of all these issues, AI is largely being missed as an industry priority; to switch analogies, it feels like we're the frog in slowly boiling water, not perceiving the danger until it's too late to jump out.

"In all the speeches and presentations I've made, I've been shouting about voice AI until I'm blue in the face. I don't know to what extent any of the leaders in the news industry are listening," one futurist and author told me. As she put it in a piece she wrote for Nieman Reports recently:

Talking to machines, rather than typing on them, isn't some temporary gimmick. Humans talking to machines, and eventually machines talking to each other, represents the next major shift in our news information ecosystem. Voice is the next big threat for journalism.

My original goal for this piece was to share what I'd learned: examples of what different newsrooms are trying with smart speakers and where the challenges and opportunities lie. There's more on all that below. But I first want to emphasize the critical and urgent nature of what the news industry is about to be confronted with, and how, if it's not careful, it'll miss the boat just as it did when the Internet first spread from its academic cocoon to the rest of the world. Later, I'll share how I think the news industry can respond.

Talking to objects isn't weird anymore

In the latest version of her annual digital trends report, Kleiner Perkins' Mary Meeker revealed that 20 percent of all Google searches now happen through voice rather than typing. Sales of smart speakers like Amazon's Echo are also increasing fast.

It's becoming clear that users find it useful to interact with devices through voice. "We're treating voice as the third wave of technology, following the point-and-click of PCs and the touch interface of smartphones," Francesco Marconi, a media strategist at the Associated Press, told me. He recently coauthored AP's report on how artificial intelligence will impact journalism. The report gives some excellent insights into the broader AI landscape, including automation of content creation, data journalism through machine learning, robotic cameras, and media-monitoring systems. It highlights smart speakers as a key gateway into the world of AI.

Since the release of the Echo, a number of outlets have tried to learn what content works (or doesn't) on this class of devices. Radio broadcasters have been at an understandable advantage, being able to adapt their content relatively seamlessly.

In the U.S., NPR was among the first launch partners on these platforms. Hamano, a senior product manager at NPR working on voice AI, described the newscast as "the gateway to NPR's content."

"We're very bullish on the opportunity with voice," Hamano said. She cited research showing 32 percent of people aged 18 to 34 don't own a radio in their home, "which is a terrifying stat when you're trying to reach and grow an audience. These technologies allow NPR to fit into their daily routine at home, or wherever they choose to listen."

NPR was available at launch on the Echo and Google Home, and will soon be on Apple's HomePod. "We think of the newscast as the gateway to the rest of NPR's news and storytelling," she said. "It's a low lift for us to get the content we already produce onto these platforms. The challenge is finding the right content for this new way of listening."

The API that drives NPR made it easy for Hamano's team to integrate the network's content into Amazon's system. NPR's skills, the voice-driven apps that Amazon's voice assistant Alexa recognizes, can respond to requests like "Alexa, ask NPR One to recommend a podcast" or "Alexa, ask NPR One to play Hidden Brain."
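The kind of routing such a skill performs can be sketched roughly as follows (a simplified Python illustration; the intent and slot names are hypothetical, not NPR's actual schema):

```python
# Simplified sketch of how a voice skill might route parsed spoken requests.
# Intent and slot names here are invented for illustration.

def handle_request(intent: str, slots: dict) -> str:
    """Map a parsed voice intent to a content action string."""
    if intent == "RecommendPodcastIntent":
        # "Alexa, ask NPR One to recommend a podcast"
        return "play:recommended_podcast"
    if intent == "PlayProgramIntent":
        # "Alexa, ask NPR One to play Hidden Brain"
        program = slots.get("program", "").lower()
        return "play:" + program.replace(" ", "_")
    # Fall back to the default live stream when the request isn't understood
    return "play:live_stream"
```

So a parsed request like `handle_request("PlayProgramIntent", {"program": "Hidden Brain"})` would resolve to the `hidden_brain` stream, while anything unrecognized falls back to the live stream.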

Voice AI: What's happening now

  • Flash briefings (e.g., NPR, BBC, CNN)
  • Podcast streaming (e.g., NPR)
  • News quizzes (e.g., The Washington Post)
  • Recipes and cooking aids (e.g., Hearst)

The Washington Post, owned by Amazon CEO Jeff Bezos, is also an early adopter, running a number of experiments on Amazon's and Google's smart speaker platforms. Senior product manager Joseph Price has been leading this work. "I think we're at the early stages of what I'd call ambient computing: technology that reduces the 'friction' between what we want and actually getting it in terms of our digital activity," he said. "It will actually mean we'll spend less time being distracted by technology, as it effectively recedes into the background as soon as we are finished with it. That's the starting point for us when we think about what voice experiences will work for users in this space."

Not being a radio broadcaster, the Post has had to experiment with different forms of audio, from having Alexa's automated voice read stories from its website to a Post reporter sharing a particular story in their own voice. Other experiments included an Olympics skill, where users could ask the Post who had won medals during last year's Olympics. That was an example of something that didn't work, though: Amazon soon built the same capability into the main Alexa platform itself.

"That was a really useful lesson for us," Price said. "We realized that in big public events like these, where there's an open data set about who has won what, it made much more sense for a user to just ask Alexa who had won the most medals, rather than specifically asking The Washington Post on Alexa the same question." That's a broader lesson: "We have to think about what unique or exclusive information, content, or voice experience can The Washington Post specifically offer that the main Alexa interface can't."

One area that Price's team is currently working on is the upcoming release of notifications on both Amazon's Alexa and Google's Home platforms. For instance, if there's breaking news, the Post will be able to make a user's Echo chime and flash green, at which point the user can ask "Alexa, what did I miss?" or "Alexa, what are my notifications?" Users will have to opt in before getting alerts on their device, and they'll be able to disable alerts temporarily through a do-not-disturb mode.
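In outline, the opt-in and do-not-disturb behaviour described above might look something like this (a hypothetical Python sketch, not the Post's or Amazon's implementation):

```python
from datetime import time

# Hypothetical sketch of opt-in plus do-not-disturb gating for breaking-news
# alerts on a smart speaker. The API shape is invented for illustration.

def should_alert(opted_in: bool, dnd_start: time, dnd_end: time, now: time) -> bool:
    """Chime only for opted-in users who are outside their do-not-disturb window."""
    if not opted_in:
        return False
    if dnd_start <= dnd_end:
        in_dnd = dnd_start <= now < dnd_end
    else:
        # Window crosses midnight, e.g. 22:00 to 07:00
        in_dnd = now >= dnd_start or now < dnd_end
    return not in_dnd
```

With a 22:00-07:00 quiet window, an opted-in user's Echo would chime for a noon alert but stay silent at 23:30, and a user who never opted in would hear nothing.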

Publishers like the Post that produce little or no native audio content have to work out the right way of presenting their text-based content on a voice-driven platform. One option is to allow Alexa to read stories that have been published; that's easy to scale up. The other is getting journalists to voice articles or columns or create native audio for the platform. That's much more difficult to scale, but several news organizations told me initial audience feedback suggests this is users' preferred experience.

For TV broadcasters like CNN, early experiments have focused on trying to figure out when their users would most want to listen to a bulletin, as opposed to watching one, and how much time they might have to do so via a smart speaker. Elizabeth Johnson, a senior editor at CNN Digital, has been leading the work on developing flash-briefing content for these platforms.

"We assumed some users would have their device in the kitchen," she said. "This led us to ask, what are users probably doing in the kitchen in the morning? Making breakfast. How long does it take to make a bagel? Five minutes. So that's probably the amount of time a user has to listen to us, so let's make sure we can update them in less than five minutes. For other times of the day, we tried to understand what users might be doing: Are they doing the dishes? Are they watching a commercial break on TV or brushing their teeth? We know that we're competing against a multitude of options, so what sets us apart?"

With Amazon's recent release of the Echo Show, which has a built-in screen, CNN is applying the "bagel time" philosophy to developing a dedicated video news briefing of the same length as its audio equivalent.

CNN is also thinking hard about when notifications will and won't work. "If you send a notification at noon, but the user doesn't get home until 6 p.m., does it make sense for them to see that notification?" Johnson asked. "What do we want our users to hear when they come home? What content do we have that makes sense in that space, at that time? We already consider the CNN mobile app and Apple News alerts to be different, as are banners on CNN.com; they each serve different purposes. Now, we have to figure out how to best serve the audience alerts on these voice-activated platforms."

What's surprised many news organizations is how broad the age range of their smart-speaker audiences is. Early adopters in this space are very different from early adopters of other technologies. Many didn't buy these smart speakers themselves but were given them as gifts, particularly around Christmas. The fact that there's very little learning curve means the technical bar is much lower. Speaking to the device is intuitive.

Edison Research was commissioned to find out more about what these users are doing with their devices. Music was, unsurprisingly, the top reason for using them, but coming in second was the ability to "ask questions without needing to type." Also high on the list was an interest in listening to news and information, which is encouraging for news organizations.

While screens aren't going away (people will always want to see and touch things) there's no doubt that voice as an interface for devices is already becoming ingrained as a natural behavior among our audiences. If you're not convinced, watch children interact with smart speakers: just as we've seen the first Internet-connected generation grow up, we're about to see the "voice generation" arrive feeling completely at ease with this way of engaging with technology.

The NPR-Edison research has also highlighted this trend. Households with kids that have smart speakers report high engagement with these devices. Unlike phones or tablets, smart speakers are communal experiences, which also raises the likelihood of families spending time together, whether for education or entertainment purposes.

(It's worth noting here that some concerns have been raised about whether children asking for, or demanding, content from a device without saying "please" or "thank you" could have downsides. As San Francisco VC and dad Hunter Walk put it last year: "Amazon Echo is magical. It's also turning my kid into an asshole." To curb this, skills or apps for children could be designed in the future with voice responses that require politeness.)

For the BBC, where I work, developing a voice-led digital product for children is an exciting possibility. It already has considerable experience with content for children on TV, radio, and digital platforms.

"Offering the ability to seamlessly navigate our rich content estate represents a great opportunity for us to forge a closer relationship with our audience and to serve them better," said Rosenberg, a senior distribution manager at the BBC. "The current use cases for voice suggest there is demand that sits squarely in the content areas where we consistently deliver on our ambitions: radio, news, and children's genres."

BBC News recently formed a working group to rapidly develop prototypes for new forms of digital audio using voice as the primary interface. Expect to hear more about this in the near future.

Rosenberg also highlighted studies that have found voice AI interfaces appear to significantly increase consumption of audio content. This came out strongly in the NPR-Edison research too:

Owning a smart speaker can lead to a sizeable increase in consumption of music, news and talk content, podcasts, and audiobooks. Media organizations that have such content have a real opportunity if they can figure out how to make it as easily accessible through these devices as possible. That's where we get to the tricky part.

Challenges: Discovery, distribution, analytics, monetization

In all the conversations I've had with product and editorial teams working on voice within news organizations, the biggest issue that comes up repeatedly is discovery: how do users find the content, whether a skill or an app, that's available to them?

With screens, those paths to discovery are relatively straightforward: app stores, social media, websites. These are tools most smartphone users have learned to navigate pretty easily. With voice, it's more difficult: while the companion mobile apps can help you navigate what a smart speaker can do, in most cases that isn't the natural way users will want to behave.

If I were to say, "Hey Alexa/Google/Siri, what's in the news today?", what are these voice assistants doing in the background to deliver an appropriate response back to me? Big news brands have a distinct advantage here. In the U.K., most users who want news are very likely to ask for the BBC. In the U.S., it might be CNN or NPR. It will be more challenging for news brands without a natural broadcast presence to come immediately to mind when users talk to a smart speaker for the first time; how likely is it that a user wanting news would first think of a newspaper brand on these devices?
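One way an assistant might decide which brand answers a generic news request can be sketched as follows (hypothetical Python; the provider names and the preference mechanism are illustrative assumptions, not any platform's actual behaviour):

```python
# Illustrative sketch: an explicit brand mention in the utterance wins;
# otherwise the user's configured default news source is used.

KNOWN_PROVIDERS = {"bbc", "cnn", "npr", "washington post"}

def resolve_news_provider(utterance: str, default_provider: str) -> str:
    """Pick the news source for a spoken request."""
    lowered = utterance.lower()
    for provider in KNOWN_PROVIDERS:
        if provider in lowered:
            return provider
    return default_provider
```

Under this sketch, "What's in the news today?" falls through to whichever source the user configured during setup, which is exactly why being that default is so valuable to a news brand.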

Beyond that, there's still a lot of work to be done by the tech platforms to make discovery and navigation easier. In my conversations with them, they've made it clear they're acutely aware of this and are working hard on it. At the moment, when you set up a smart speaker, you set preferences through the accompanying mobile app, including prioritizing the sources of content you want, whether for music, news, or something else. There are plenty of skills or apps you can add on. But as one app product manager at Quartz put it: "How would you remember how to come back to it? There are no screens to show you how to navigate back, and there are no standard voice commands that have emerged to make that process easier to remember."

Another concern that came up frequently: the lack of industry standards for voice terms or for tagging and marking up content for these smart speakers. The devices have been built with natural language processing, so they can understand normal speech patterns and derive instructional meaning from them. So "Alexa, play me some music from Adele" should be understood in the same way as "Alexa, play Adele." But learning to use the right words can still sometimes be a puzzle. One solution is likely to be improving the introductory training that runs when a smart speaker is first connected. It's a very rudimentary experience so far, but over the next few months this should improve, giving users a clearer idea of what content is available and how they can skip to the next thing, go back, or go deeper.
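The "Adele" example above can be mimicked with a toy normalization layer (a Python sketch; real assistants use trained natural-language models rather than regexes, and the intent names here are invented):

```python
import re

# Toy sketch of utterance normalization: several phrasings of the same
# request collapse to one intent plus a slot value.

PATTERNS = [
    (re.compile(r"play (?:me )?(?:some )?(?:music (?:from|by) )?(?P<artist>.+)", re.I),
     "PlayMusicIntent"),
]

def parse(utterance: str):
    """Return (intent, artist) for a recognized utterance, else a fallback."""
    for pattern, intent in PATTERNS:
        m = pattern.match(utterance.strip())
        if m:
            return intent, m.group("artist")
    return "FallbackIntent", None
```

Both "play me some music from Adele" and "play Adele" resolve to the same `PlayMusicIntent` with artist "Adele", which is the behaviour the lack-of-standards complaint is really about: every skill currently invents its own version of this mapping.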

Voice AI: Challenges

  • Discoverability
  • Navigation
  • Consistent taxonomies
  • Data analytics/insights
  • Monetization
  • Having a "sound" for your news brand

A corporate vice president at Microsoft, which develops its own AI interface, Cortana, said recently: "Web pages, for example, all have back buttons and they do searches. Conversational apps need those same primitives. You need to be like, 'Okay, what are the five things that I can always do predictably?' These understood rules are just starting to be determined."

For news organizations building native experiences for these platforms, a lot of work will need to be done in rethinking the taxonomy of content. How can you tag items of text, audio, and video to make it easy for voice assistants to understand their context and when each item would be relevant to deliver to a user?

The AP's Marconi described what they're already working on and where they want to get to in this space:

At the moment, the industry is tagging content with standardized subjects, people, organizations, geographic locations, and dates, but this can be taken to the next level by finding relationships between each tag. For example, AP developed a robust tagging system designed to organically evolve with a news story as it moves through related news cycles.

Take the 2016 water crisis in Flint, Michigan. Until it became a national story, Flint hadn't been associated with pollution, but as soon as it became a recurrent topic of discussion, AP taxonomists wrote rules to automatically tag and aggregate any story related to Flint, or any general story about water safety, moving forward. The goal is to help reporters build greater context into their stories by automating the tedious process of searching for related stories on a specific topic or event.

The next wave of tagging systems will identify which device a certain story should be consumed on, the situation, and even other attributes relating to emotion and sentiment.
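The rule-based evolution Marconi describes can be sketched in miniature (hypothetical Python; AP's actual system is far richer, and this rule table is invented for illustration):

```python
# Illustrative sketch of rule-based auto-tagging: once an entity becomes tied
# to a topic (Flint -> water safety), a rule propagates the tags automatically.

RULES = {
    # trigger phrase in story text -> tags added automatically
    "flint": {"flint-water-crisis", "water-safety"},
    "water safety": {"water-safety"},
}

def auto_tag(story_text: str, base_tags: set) -> set:
    """Return the story's tags, expanded by any matching taxonomy rules."""
    tags = set(base_tags)
    lowered = story_text.lower()
    for trigger, extra_tags in RULES.items():
        if trigger in lowered:
            tags |= extra_tags
    return tags
```

A new lead-testing story mentioning Flint would automatically pick up the water-safety tags alongside whatever a reporter tagged by hand, which is the aggregation step that makes related-story lookup cheap.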

As voice interfaces move beyond smart speakers to all the devices around you, including cars and smart appliances, Marconi said the next wave of tagging could identify new entry points for content: "These devices will have the ability to detect a person's situation as well as their state of mind at a particular time, enabling them to determine how they interact with the person at that moment. Is the person in an Uber on the way to work? Are they chilling out on the couch at home, or are they with family? These are all new types of data points that we will need to start thinking about when tagging our content for distribution on new platforms."

This is where industry-wide collaboration to develop these standards is going to be so important; these are not things that will be done effectively in the silos of individual newsrooms. Wire services like AP, which serve multiple news clients, could be in an influential position to help form these standards.

Audience data and measuring success

As with so many new platforms that news organizations try out, there's an early common complaint: we don't have enough data about what we're doing, and we don't know enough about our users. Of the dozen or so news organizations I've talked to, nearly all raised similar issues in getting enough data to understand how effective their presence on these platforms has been. A lot seems to depend on the analytics platform they use on their existing websites and how easily it integrates with the Amazon Echo and Google Home systems. Amazon and Google provide some data; though it's basic at this stage, it is likely to improve.

With smart speakers, there are additional considerations beyond the standard industry metrics of unique users, time spent, and engagement. What, for example, is a good engagement rate: the length of time someone talks to these devices? The number of times they use a particular skill or app? Another interesting possibility is measuring the sentiment behind a user's experience of a particular skill or app through the tone of their voice. It may be possible in the future to tell whether a user sounded happy, angry, or frustrated, metrics we can't currently capture with existing digital services.
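A minimal sketch of what such skill-level metrics might look like, assuming a hypothetical session log (the field names are invented, not Amazon's or Google's actual analytics):

```python
# Illustrative sketch: aggregate a voice skill's session log into the kinds of
# engagement figures discussed above. The log format is a hypothetical example.

def engagement_summary(sessions):
    """sessions: list of dicts with 'user', 'seconds', and 'invocations' keys."""
    if not sessions:
        return {"unique_users": 0, "avg_session_seconds": 0.0,
                "invocations_per_user": 0.0}
    users = {s["user"] for s in sessions}
    total_seconds = sum(s["seconds"] for s in sessions)
    total_invocations = sum(s["invocations"] for s in sessions)
    return {
        "unique_users": len(users),
        "avg_session_seconds": total_seconds / len(sessions),
        "invocations_per_user": total_invocations / len(users),
    }
```

Even this crude rollup surfaces the open questions in the paragraph above: is a long average session good engagement or a sign the user got lost? The numbers alone don't say, which is why publishers want richer signals such as vocal sentiment.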

And if these areas weren't challenging enough, there's then the "M" word to think about…

Money, money, money

How do you monetize on these platforms? Understandably, many news execs will be cautious about placing big bets on new technologies unless they can see a path toward future audience reach or revenue (ideally both). For digital providers, there is a natural temptation to try to figure out how these voice interfaces could help drive referrals or subscriptions. However, a more instructive way of looking at this is through the experience of radio. Internal research commissioned by some radio broadcasters, which I've seen, suggests users of smart speakers have a very high recall rate for adverts heard while streaming radio on these devices. Because many people are used to hearing ads in this way, they may have a higher tolerance for such ads via smart speakers than for pop-up ads on websites.

One of the first ad networks developed for voice assistants, by VoiceLabs, gave some early indications of how advertising could work on these devices in the future, with interactive ads that converse with users. After a recent update to Amazon's terms, however, VoiceLabs suspended the service. Amazon's updated terms still allow advertising within flash briefings, podcasts, and streaming skills.

Another revenue possibility arises because smart speakers, particularly Amazon's at this stage, are hard-wired into shopping accounts. Any action a user takes that leads to a purchase after hearing a broadcast or interacting with a voice assistant could open up additional revenue streams.

For news organizations that don't have much broadcast content and are more focused online, the one to watch is The Washington Post. I'd expect to see it beta-test different revenue models through its close relationship with Amazon over the coming months, potentially including a mix of sponsored content, in-audio ads, and referral mechanisms to its website and native apps. These and other methods are likely to be offered by Amazon to partners for testing in the near future too.

Known unknowns and unknown unknowns

While some of the challenges, around discovery, tagging, and monetization, are becoming well defined as areas to focus on, there are a number of others that could lead to fascinating new voice experiences, or down blind alleys.

Some think that a truly native interactive voice experience will require news content to replicate the dynamics of a normal human conversation. So rather than just hearing a podcast or news bulletin, a user could have a conversation with a news brand. What could that experience be? One example: letting users speak to news presenters or reporters.

Rather than just listening to a CNN broadcast, could a user have a conversation with Anderson Cooper? It wouldn't have to be the actual Anderson Cooper; it could be a CNN app with his voice, powered by natural language processing to give it a bit of Cooper's personality. Similar experiences could be developed around well-known presenters and pundits for sports broadcasters. This would retain the clear brand association while offering a unique experience that could only happen through these interfaces.

Another example: entertainment shows could bring their audience into their programmes, quite literally. Imagine a reality TV show where, rather than having members of the public perform on stage, the show connects to them through their home smart speakers and gets them to do karaoke from home. With screens and cameras coming to some of these smart speakers (e.g., the Amazon Echo Show and Echo Look), TV shows could link up live with the homes of their viewers. Some UK TV viewers of a certain age may remember something similar (warning: Noel's House Party).

Voice AI: Future use cases

  • Audiences talking to news/media personalities
  • Bringing audiences into live shows directly from their homes
  • Limited-lifespan apps/skills for live events (e.g., elections)
  • Time-specific experiences (e.g. for when you wake up)
  • Room-optimized apps/skills for specific home locations

Say that out loud

Both Amazon and Google have been keen to emphasize the importance of news brands getting their "sound" right. While it may be easy to carry over the sound identity of radio and TV broadcasters, it is something print and online players will have to think carefully about.

The name of the skill or app a news brand creates will also need careful consideration. The Amazon skill for the news site Mic (pronounced "mike") is named "Mic Now" rather than just Mic, as Alexa would otherwise find it difficult to distinguish from a microphone. The clear advice: stay away from generic-sounding names on these platforms, and keep the sound distinct.

Apart from these established branded news services, we could start to see experimentation with hyper-specific or limited-lifespan apps. There is increasing evidence that as these speakers appear not just in the living room (currently their most common location) but also in kitchens, bathrooms, and bedrooms, apps could be developed to work primarily around those locations.

Hearst Media has already successfully rolled out a cooking and recipe app on Alexa for one of its magazines, intended specifically for use in the kitchen to help people cook. Bedtime-story or lullaby apps could be launched to help children fall asleep in their bedrooms. Industry evidence is emerging that the smart speaker could replace the mobile phone as the first and last device we interact with each day. Taking advantage of this, could there be an app designed specifically to engage you in the first minute or two after your eyes open in the morning, before you get out of bed? A common behaviour currently is to pick up the phone and check your messages and social media feeds. Could that be replaced by talking to your smart speaker when you wake up instead?

Giving voice to a billion people

While these future developments are certainly interesting possibilities, there is one thing I find incredibly exciting: the transformative impact voice AI could have in emerging markets and the developing world. Over the next three or four years, a billion people, often termed "the next billion," will connect to the internet for the first time in their lives. But just having a phone with an internet connection isn't in itself going to be that useful, as they will have no experience of navigating a website, using search, or any of the online services we take for granted in the West. What could be genuinely transformative is if they are greeted by a voice-led assistant that speaks their language, talks them through how to use their new smartphone, and helps them navigate the web and online services.

Many of the big tech giants know there is a big prize for them if they can help connect these next billion users, and there are a number of efforts from the likes of Google and Facebook to make internet access easier and cheaper for such users in the future. However, none of the tech giants are currently focused on bringing their voice technology to these parts of the world, where literacy levels are lower and oral traditions are strong: a natural environment in which voice AI would thrive, if the effort is made to develop it in non-English languages. Another big problem is that the machine learning that voice AI will be built on is currently dominated by English datasets, with very little being done in other languages.

Some examples of the impact voice assistants on phones could have for these "next billion" users in the developing world:

Voice AI: Use cases for the 鈥渘ext billion鈥

  • Talking users through how to use phone functions for the first time
  • Setting voice reminders for taking medicines on time
  • Reading out text after pointing at signs/documents
  • Giving weather warnings and updating on local news

There will be opportunities here for news organizations to develop voice-specific experiences for these users, helping to educate and inform them about the world they live in. Considering the huge scale of the potential audiences that could be tapped, this is a big opportunity for those news organizations positioned to work on it. It's an area I'll continue to explore in a personal capacity in the coming months; do get in touch if you have ideas.

Relationship status: It鈥檚 complicated

Voice interfaces are still very new, and as a result there are ethical grey areas that will come to the fore as they mature. One of the most interesting findings from the NPR-Edison research backs up other work suggesting users develop an emotional connection with these devices very quickly, in a way that just doesn't happen with a phone, tablet, radio, or TV. Users report feeling less lonely and seem to develop an emotional connection to these devices similar to having a pet. This tendency for people to attribute human characteristics to a computer or machine has some history to it, with its own term, the "Eliza effect," dating back to 1966.

What does that do to the way users relate to the content shared with them through the voice of these interfaces? Speaking at a recent event in New York, Judith Donath, of the Berkman Center for Internet and Society at Harvard, explained the possible impact: "These devices have been deliberately designed to make you anthropomorphize them. You try to please them; you don't do that to newspapers. If you get the news from Alexa, you get it in Alexa's voice and not in The Washington Post's voice or Fox News' voice."

One possible implication is that users lose the ability to distinguish between different news sources and their potential editorial leanings and agendas, as all their content is spoken in the same voice. In addition, because it comes from a device we are forming a bond with, we are less likely to challenge it. Donath explains:

"When you deal with something that you see as having agency, and potentially having an opinion of you, you tend to strive to make it an opinion you find favourable. It would be quite a struggle to not try and please them in some way. That's an extremely different relationship to what you tend to have with, say, your newspaper."

As notification features begin to roll out on these devices, news organizations will naturally be interested in serving breaking news. However, with the majority of these smart speakers sitting in living rooms, often used communally by the whole family, another ethical challenge arises. Elizabeth Johnson of CNN highlights one possible scenario: "Sometimes we have really bad news to share. These audio platforms are far more communal than a personal mobile app or desktop notification. What if there is a child in the room? Do you want your five-year-old to hear about a terror attack? Is there a parental safety function to be developed for graphic breaking news content?"

Parental controls such as these are likely to be developed, giving more control to parents over how children will interact with these platforms.

One of the murkiest ethical areas is one the tech platforms will need to keep demonstrating transparency over: with the "always listening" function of these devices, what happens to the words and sounds their microphones pick up? Is everything being recorded in anticipation of the "wake" word or phrase? When stories looking into this surfaced last December, Amazon made clear that its Echo speakers only begin recording once they hear the wake word. Audience research suggests, however, that this remains a concern for many potential buyers of these devices.

Voice AI: The ethical dimension

  • Kids unlearning manners
  • Users developing emotional connections with their devices
  • Content from different news brands spoken in the same voice
  • Inappropriate news alerts delivered in communal family environments
  • Privacy implications of 鈥渁lways-listening鈥 devices

Jumping out of boiling water before it's too late

As my Nieman Fellowship concludes, I want to return to the message at the start of this piece. Everything I've seen and heard so far about smart speakers suggests to me that they shouldn't be treated as simply another new piece of technology to try out, like messaging apps, bots, or virtual and augmented reality (as important as those are). In and of themselves, they may not appear much more significant, but their real impact will come through the AI and machine learning technology that will increasingly power them (at this stage, still very rudimentary). All indications are that voice will become one of the primary interfaces for this technology, complementing screens by providing a more "frictionless" experience in cars, in smart appliances and in places around the home. There is still time: the tech is new and still maturing. If news organizations start placing strategic bets now on developing native experiences for voice devices, they will be future-proofing themselves as the technology rapidly proliferates.

What does that mean in reality? It means coming together as an industry to collaborate and discuss what is happening in this space, engaging with the tech companies developing these platforms and being a voice in the room when big industry decisions are made on standardising best practices on AI.

It means investing in machine learning in newsrooms and in R&D to understand the fundamentals of what the technology can do. That's easy to say, of course, and much harder to do with diminishing resources. That's why an industry-wide effort is so important. There is an AI industry body called the Partnership on AI which is making real progress in discussing issues around ethics and standardisation of AI technology, among other areas. Its members include Google, Facebook, Apple, IBM, Microsoft, Amazon and a host of other think tanks and tech companies. There's no news or media industry representation; largely, I suspect, because no one has asked to join. If, despite their competitive pressures, these tech giants can collaborate, surely it behoves the news industry to do so too?

Other partnerships have already proven successful and offer blueprints for what could be achieved in the future. During the recent US elections, the Laboratory for Social Machines at MIT's Media Lab partnered with the Knight Foundation, Twitter, CNN, The Washington Post, Bloomberg, Fusion and others to power real-time analytics on public opinion, drawing on MIT's AI and machine learning expertise.

Voice AI: How the news industry should respond

  • Experiment with developing apps and skills on voice AI platforms
  • Organize regular news industry voice AI forums
  • Invest in AI and machine learning R&D and talent
  • Collaborate with AI and machine learning institutions
  • Hold regular internal brainstorms on how to use voice as a primary interface for your audiences

It is starting to happen. As part of my fellowship, to test the waters, I convened an informal off-the-record forum, with the help of the Nieman Foundation and the AP, bringing together some of the key tech and editorial leads of a dozen different news organizations. They were joined by representatives from some of the main tech companies developing smart speakers, and the conversation focused on the challenges and opportunities of the technology. It was the first time such a gathering had taken place, and those present were keen to do more.

Last month, Amazon and Microsoft announced that their respective voice assistants, Alexa and Cortana, would talk to each other, helping to improve the experience of their users. It's the sort of bold collaboration that the media industry will also need to build to ensure it can, pardon the pun, have a voice in the development of the technology too. There's still time for the frog to jump out of the boiling water. After all, if Alexa and Cortana can talk to each other, there really isn't any reason why we can't too.

Nieman and the AP are looking into how they can keep the momentum going with future forums, inviting a wider network across the industry. If you're interested, get in touch with the Nieman Foundation or the AP. It's a small but important step in the right direction. If you want to read more on voice AI, I've been using a hashtag on Twitter to flag interesting stories in the news industry on this subject, and have compiled a list of the best accounts to follow.
