Speech to text —
Publishers say Audible’s new captions feature is illegal. Is it?
Seven of the nation’s top book publishers sued Amazon subsidiary Audible on Friday, asking a federal court to block the company from releasing a new feature called Audible Captions that’s due out next month. The technology does exactly what it sounds like: it displays text captions on the screen of your phone or tablet as the corresponding words are read in the audio file.
The publishers argue that this is straight-up copyright infringement. In their view, the law gives them the right to control the distribution of their books in different formats. Audio is a different format from text, they reason, so Audible needs a separate license.
This would be a slam-dunk argument if Audible were generating PDFs of entire books and distributing them to customers alongside the audio files. But what Audible is actually doing is subtly different—in a way that could provide the company with firm legal ground to stand on.
The caption feature “is not and was never intended to be a book,” Audible explained in an online statement following the lawsuit. “Listeners cannot read at their own pace or flip through pages as they could with a print book or eBook.” Instead, the purpose is to allow “listeners to follow along with a few lines of machine-generated text as they listen to the audio performance.”
“We disagree with the claims that this violates any rights and look forward to working with publishers and members of the professional creative community to help them better understand the educational and accessibility benefits of this innovation,” Audible added.
Audible’s case likely rests on two landmark video recording precedents
Audible hasn’t fully explained how Audible Captions work, but the company has said enough to provide a pretty clear picture. In a video about the service released earlier this month, an Audible executive explained that the technology was “built on publicly available technology through AWS Transcribe.” That’s Amazon’s cloud-based service for automatic text transcription.
So it seems that the Audible app is generating text captions in real time as the user plays an audio file. The app sends snippets of audio files to an Amazon server and gets back corresponding sections of text, which it then displays on the screen one word at a time. (It’s possible that AWS Transcribe has an offline mode that allows the transcription to happen on-device, but I haven’t found any documentation about this. I’ve asked Audible about this and will update if they respond.)
Audible likely designed the feature this way because it strengthens the company’s argument that it doesn’t need a license from publishers to offer it.
To see why, it’s helpful to review two of the most important copyright decisions of the modern era. The first was the Supreme Court’s 1984 decision in Sony v. Universal, which declared the VCR legal. Hollywood argued that the “record” button on a VCR was an invitation for customers to infringe their copyrights. But the justices disagreed, ruling that copyright’s fair use doctrine allowed “time shifting”: recording a show now to play it later.
The courts built on this decision with a 2008 ruling known as Cartoon Network v. Cablevision. In that case, a bunch of media companies sued the cable company Cablevision because it was offering customers a “remote DVR.” Like a conventional DVR (or a VCR before that), Cablevision’s technology allowed customers to record and play back television shows at their convenience. But unlike a conventional DVR, the remote DVR was located in a Cablevision data center, not in the customer’s home.
Television content owners argued that Cablevision was infringing their copyrights by making unauthorized copies of their shows on a massive scale. Cablevision disagreed, arguing that the copies were being made by customers, not by Cablevision. The physical DVR might be owned and maintained by Cablevision, but the customer was deciding which shows to record. And the customer was entitled to do that under the earlier Sony ruling. An appeals court ultimately accepted this argument.
The Cablevision ruling provided a legal foundation for cloud-based “storage locker” services that allowed customers to upload, save, and stream (but not share) their music and video collections.
Expect Cablevision to be central to Audible’s defense
That brings us back to Audible’s new transcription technology. Audible doesn’t have the legal right to sell text versions of audiobooks to customers without publishers’ permission. But we can expect Audible to argue that it does have a right to sell software tools that allow customers to do speech-to-text conversion.
Audible’s case will likely be strengthened by the fact that its app never creates or saves a permanent, full transcript of an audiobook. Instead, the software only displays a few words on the screen at a time.
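To make that distinction concrete, here is a minimal Python sketch of a rolling caption display of the kind described above. It is not Audible’s code: the `transcribe` callback (a stand-in for a remote speech-to-text service such as AWS Transcribe), the `rolling_captions` name, and the window size are all hypothetical assumptions. The point it illustrates is that a bounded buffer can show a few words at a time while never holding a complete transcript.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def rolling_captions(
    audio_chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    window: int = 8,
) -> Iterator[str]:
    """Yield the caption text visible on screen after each new word arrives.

    Only the most recent `window` words are ever kept; older words are
    discarded automatically, so no full transcript is ever assembled.
    """
    visible: deque[str] = deque(maxlen=window)
    for chunk in audio_chunks:
        # Hypothetical speech-to-text call; a real app might send the chunk
        # to a cloud service and receive text back.
        for word in transcribe(chunk).split():
            visible.append(word)  # oldest word drops off once the window is full
            yield " ".join(visible)


# Usage with a fake transcriber standing in for the remote service.
fake_text = {
    b"chunk1": "It was the best of times",
    b"chunk2": "it was the worst of times",
}
frames = list(rolling_captions([b"chunk1", b"chunk2"], fake_text.__getitem__, window=4))
print(frames[-1])  # prints "the worst of times"
```

Whether a design like this actually matters legally is exactly the open question: it shows only that "a few words on screen" and "a saved copy of the book" are technically different things.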
If Audible is sending audio files to Amazon’s servers for transcription, publishers are likely to argue that this means Amazon, not users, is creating the transcripts. But this seems closely analogous to the Cablevision case: the conversion is being done by Amazon servers, but only when explicitly requested by users. And each transcription is sent back only to the user who requested it.
Of course, outcomes in disputes like this aren’t easy to predict. Courts could decide that a conversion from audio to text doesn’t have the same fair use protections as the time-shifting activities the courts have blessed in the past. The courts could decide that Amazon plays too active a role in the conversion process to portray itself as a passive supplier of technology like the maker of a VCR. Or they might decide that this whole legal argument is a too-clever-by-half ploy to justify reproducing publishers’ copyrighted books. Ultimately, we’ll have to wait and see what the courts say.
Interestingly, Amazon faced the same battle in reverse a decade ago when it proposed adding a text-to-speech feature to Kindle. Like Audible today, Amazon was planning to offer the text-to-speech feature for all e-books without getting approval from book publishers. But after a backlash from publishers, Amazon caved and gave publishers the right to disable the feature. Amazon now says that “most Kindle content and personal documents are eligible for Text-to-Speech.”