Just Artificial, Not Intelligence
By Porter Anderson (@Porter_Anderson) | July 29, 2022

Image – Getty iStockphoto: Phonlamai Photo
Scaring Up Some Audiobooks
Recently, a distributor of digital books – both ebooks and audiobooks – announced that it was adding a new offering for publishers: “AI voicing” for audiobooks. The company could barely clear the word audiobooks before rushing to assure everyone that “AI narration” would never match the primacy of audio work using human readers and production. The caveat, going on and on, came across almost as an apology before any offense had been committed.
True, a certain resistance to the idea of machine-generated audiobooks is hardly eased by such headlines as Synthetic Voices Want To Take Over Audiobooks (Wired, January 27). No, they don’t. Synthetic voices don’t want to take over audiobooks. They don’t want anything. They’re synthetic. But book publishing is an industry that’s never accepted digital developments easily. Even after e-commerce and digital products played a key role in the US market’s comparative success during the still ongoing pandemic, those “synthetic voices” seem to murmur something sinister.
The many vendors now offering machine-generated audio narration know that this is the pushback to expect. It’s a minefield of emotional reaction. They’re nervous about it.
Some defensiveness isn’t without reason. The business of gifted human narrators – who are actually readers, voice actors, interpreters, not narrators; the term has never been quite right – is supported by many additional workers in important roles. Those workers include sound technicians, audio editors, studio and tracking-booth providers, producers, in some cases directors, and more folks. Jobs are involved, and they comprise a lot of talent and many skill sets. Organizations like the Audio Publishers Association support these workers, and the APA’s Audie Awards rightly honor their work in 25 categories.
Nevertheless, there are compelling reasons for publishers to listen to machine-generated audiobook readings. The kind of work they can handle is unlikely ever to be produced in human readings because of the cost factor.
As many publishing professionals readily agree, machine-produced voicings may be best for nonfiction, which is generally thought not to need the emotional and aesthetic nuance of fiction. But of course, in a great many cases, nonfiction is read by the human author, who may be untrained and inexperienced at the microphone. While there’s always someone asserting that those “synthetic voices” feature many mistakes in pronunciation, so does the work of many human authors.
Just days ago, I heard a very fine, prominent nonfiction author in his reading of his own book pronounce scathing with a short a, making that first syllable rhyme with cat. Most of us have had the experience of discovering, red-faced, that we’ve been pronouncing something wrongly for years. The audio edition of one of last year’s most important American political books was at times almost comical in its mispronunciations by its much-praised author. In both machine-generated and human-produced readings, proof-listening is critical to catch these things.
Still, the imperative for publishers regarding audio actually goes beyond nonfiction.
Listening Out for the Backlist
There are many cases in which no audio edition of a book has been made or will be made because of the expense of standard human labor-intensive production. Consider a publisher with a large backlist of important titles never given audio treatments. Is the author helped by the fact that no audiobook edition of her or his book is available? Of course not. There are customers who want audio. Some of them consume books only in audio renditions. Is that money to be left on the table simply because a human-produced rendition can’t be afforded?

Provocations graphic by Liam Walsh
What’s more, machine-generated voices have improved dramatically in the last two to four years. Have you heard Amazon’s male Alexa voice? While we weren’t listening, the quality of those “synthetic voices” has been making progress.
The vendor called Speechki now offers 364 synthetic voices (lots of accents and dialects) in 77 languages. On its home page, listen briefly to the short demo under the header “Your Audiobook Could Sound Like This!” What do you think? Try your ear in the 10-file quiz in which the company challenges you, “Bet you can’t tell a robot from a human!” No self-respecting robot would use as many gratuitous exclamation points as Speechki and many other excited vendors do, but you may be surprised how you score on that quiz.
The cost factor of standard audiobook production is daunting, especially if you have a big catalogue of good backlist that needs audio renditions. While a well-made audiobook with a standard human reading can cost thousands of dollars, the digitally produced edition can come in at several hundred bucks. It also takes less than a day to produce an audiobook when the talent is a distant cousin of the elevator that tells you, “Doors opening.”
But where so much of the discussion goes subtly awry is in overheated connotations of the term artificial intelligence. The commercial sector is fond of that term, all robot-y and Ex Machina sexy, but it’s wrongly applied here, just as it’s being wrongly applied in so many parts of industry and entertainment.
The “synthetic voices” – usually sampled, of course, from human voices – have zero intelligence. They’re digitally manipulated to sound as realistic as possible. They’re not thinking when they scan your book. Code is simply rendering text into pre-designed sounds.
By tossing the phrase AI around all over the place, many of the biggest advocates of machine-generated audio are doing themselves a disservice and not helping the publishing industry dispassionately consider its unvoiced backlist problem. The people who love those exclamation points are their own worst enemies, triggering knee-jerk objections with the implication that another kind of intelligence is coming to getcha! Those marketing folks need to sit down, get over it, and quietly run a search and replace to put periods where all their exclamation points are.
So many things today are unnecessarily called artificial intelligence, processes that make no selections, have no prerogatives of their own, and certainly no consciousness. What we forget – what some never knew – is that artificial intelligence is defined as “the capability of a machine to imitate intelligent human behavior” (emphasis mine, and we’re getting that definition from Merriam-Webster’s Unabridged Dictionary). Popular usage – I’m looking at you, Hollywood – has morphed the term into something much more menacing than it really is.
Is it possible that the scare factor in popular speculation about AI could make it harder for publishing people to weigh the authentic advantages and disadvantages of synthetically generated audiobooks needed by the industry?
What do you think? Could the same logic behind changing the term UFO to UAP (unidentified aerial phenomenon) – getting us past the hype – help publishers, authors, and readers more rationally debate the question of how best to produce the audiobooks they need to be selling?
The earliest versions of the Kindle had a “listen” capability. Out of necessity, I tried it a few times. Granted, AI voices are better today as you suggested in your post, but judging from Siri’s sometimes comical efforts to read back my texts, I’m not convinced we’re there yet.
Regarding the Hollywood model of scary, world-dominion hungry machines, I think you’re right. Totally overblown. The little chess player who had his finger broken this week by the AI chess machine might disagree however. :-O
Hey, Janee, thanks for reading me and dropping a note.
I think Siri and other voice assistants (even the male Alexa voice, which has some improvements) may not be the best gauge of what the voices being designed for book-reading are sounding like. But some observers have suggested that we don’t even need or want them to sound exactly like humans. We just need good, strong, serviceable reads that can be affordable enough to get more literature into the audio arena where so many consumers are waiting for it. (And dyslexic consumers, blind consumers, and others can enjoy it. Audio now has a big place in working toward accessibility in publishing.)
Agree about Hollywood. When the robots do rise up to kill us all, I’m pretty sure they’re going to start on Wilshire Boulevard in LA to get revenge for making them look so bad. :)
Thanks again,
-p.
On Twitter: @Porter_Anderson
AI is far from intelligent: At best, it plays chess with blunt force. It cannot actually intuit, infer or invent. Like a serial killer it can mimic but not genuinely feel. Its best quality and greatest danger is cost efficiency.
It is antithetical to art. Audiobook consumers value the reader as much as the author so it is hard to see why any author or publisher would want to devalue the audio experience. I’m sure the quality is better than the voice giving me driving directions in my car but I’m for preserving what is artful and human in storytelling.
Resist! 1984 was written by a human. Imagine it read by a machine. When that happens, Big Brother has won.
Hey, Ben, thanks for your note and for reading me, as always. Great to hear from you again. You always challenge us.
While I enjoy a good resistance as much as anyone (say it the French way, so much better, lol), I think you may be surprised some day when you discover that the prose or poetry you’ve just heard artfully read was delivered by a synthetic voice. Remember, these are samplings of real people. And they’re designed by real people. And all books aren’t our beloved “1984.” More than once I’ve had to abandon an audiobook, not because the book wasn’t good but because the reading was terrible. Even when we dwell only in the land of genuine human utterances, success isn’t always ours.
You’re absolutely entitled to your own opinion, of course, and I respect you (and always appreciate your comments). But I’d just invite you to remember that giving things a try often leads to discoveries of enormous value.
The arts are often — to all of our surprise — buoyed by contemporary capabilities. And if more people can listen to more literature because more voices are made available by more accomplishments in machine-generated voices, I’m willing at least to listen and see how it sounds. And I hope you will be, too. After all, the book we hear produced this way at some point may be yours. And Big Brother can eat his heart out. :)
Take good care, my friend,
-p.
Hi Porter, you educate me by laying all of this out, but I’m with Benjamin on this one. When I lived in Des Moines and drove monthly to Chicago to visit my mother, audio books made the trip fun. I still remember a female voice, narrating a story about a marriage, the husband a professor at a small college in Ohio, cheating on the narrator with one of his students. SEE? I identified so much with the human voice, that the story probably meant more to me than if I simply had a copy of the book. Yes, that’s only one example, but I will never forget it. I looked up the narrator, who was also an actress in a popular police procedural. Sometimes it’s hard to move into the future, Beth
Hey, Beth, great to hear from you, hope you’re well.
I’d moderate your last line only to say that sometimes the future comes and gets you, making it easier than you expect to move forward. :) And the production of voices (that start as samples of actual humans — maybe your actress in the popular police procedural) probably has come farther than you realize. The kind of synthetic voices I’m talking about (still in my own voice, by the way! lol) sound closer to regular readers than you might expect.
But you needn’t worry. As I was saying to Ben, someday, you’ll just hear some well-read literature, check to see if your actress is around, and discover that the voice was automated. And in such a case, it may be that you couldn’t have heard it in an audio format without the availability of these quickly moving developments, possibly because expense or market conditions would have left that literature off the audiobook list for a given publisher.
So no worries, the future will wait for you and pick you up when you’re ready.
Thanks again,
-p.
In the mid-1980s I had a mid-level recording studio and thought that I might be able to leverage a venture into “Books on Tape”, the early appellation for audiobooks. Although I knew little about the market, I did have the necessary gear and access to a wide variety of technical talent so I spent a few thousand dollars on communications consultants to bring me up to speed. The consensus was that a blockbuster novel narrated by a major actor with impeccable elocution was necessary for best results (Think Marlon Brando narrating Mario Puzo’s “The Godfather”) and I was cautioned that absent a budget of at least $100,000 (about $250,000 today), success would be elusive. Disheartened, I bid adios to audios.
About the same time technology was morphing from analog to digital which led to the development of electronic musical instruments like the “drum machine”, the digital avatar of a sentient percussionist. It didn’t take but a few heartbeats for adherents of the meme of a drummer being “the guy who hangs out with the musicians” to tout the drum machine as “never talking back or showing up late or calling in sick, never complaining about the venue or the pay scale, never zonked out on dope and always there to get the job done”. Many drummers laughed it off. Others sweated bloody bullets.
Forty-odd years on, it’s not ‘de rigueur’ to have Louis L’Amour narrated by John Wayne (or his digital ghost) and the drum machine has carved out a comfortable niche in the music business. Moral of the story: Different Strokes for Different Folks.
Hey, Jay, thanks for reading my column and for your interesting response.
It’s cool that you were looking into those “books on tape” in the ’80s. I actually was hearing some things at the time, myself, and boy, am I glad we got past those cassette tapes (and CDs!). Much of the field of audio in publishing now is enjoying its success thanks to digital downloads and streaming making audiobooks a kind of born-again format. It’s the format that has the most reliable sales gains each year, in fact, though still about an 8-to-11-percent share of the overall formats market.
Your drum machine anecdote is especially apt, in that so many things in sound production have been basically redeveloped in the last decades. I wish that I could sit with George Frederic Handel in a Carnegie Hall audience when he heard what his Messiah can sound like with modern instrumentation and choral forces. I think he’d be ecstatic, even before he heard what happened with the same players and singers on a soundstage for a recording. (The best example of this is Thomas Beecham’s direction of the oratorio with the Royal Philharmonic, the “Handel-Hardy” setting that was created specifically to use contemporary orchestral and studio capabilities to fully explore the potential, respectfully but with sonic capacities that Handel didn’t have to draw on. And yeah, with bassoonists who actually came in when the conductor looked at them.)
And careful what you ask for — voice synthesists today could pull off a pretty convincing John Wayne for those Louis L’Amour books. (Audible has free audio renditions of some of those books for members, by the way. Lance Axt seems to do a lot of the reading.)
Thanks again!
-p.