I definitely do not want to support this practice, but there’s no way to filter these out 😠.
I listened to one of the Audible samples, labeled Virtual Voice. Apple had one labeled as “Madison,” so who knows whether they’re all going to be labeled so clearly.
It sounded like a TikTok narrator, passable but at the quality level I would expect from a Netflix second-screen show. The book was at the same quality level, too. (The author does “life and business coaching with innovative and adaptable strategies, transcending traditional boundaries.”)
I consider these kinds of books and narration to be slop, so I’m definitely not the target market. My worry is that publishers will use AI narrators as virtual scabs to lowball actual creators.
Yeah, I don’t think I would like them at all. I had the audiobooks for a whole series by one author, and they were all read by the same narrator except the latest book. I couldn’t handle it; it was a real person, not AI, but they were just terrible (but still better than an AI-generated voice).
Did it do distinct voices for the characters? I could maybe see biographies being tolerable read by a machine (though ironic), but books with multiple characters interacting would be a mess. That’s one of the things I appreciate about John Lee (who narrated almost that entire series). He even used the same voices for the same characters across books.
I don’t know about distinct voices. The sample didn’t cover dialogue. It’s possible with AI, but I wouldn’t expect it from low-effort AI-generation.
I very much understand the misgivings about this, and certain parts make me uncomfortable with it, too. But this could be revolutionary for media accessibility, and in my mind could easily be worth it for the ability to make new media immediately accessible to folks with vision challenges, deaf and hard of hearing individuals, and a lot of other folks for whom most media is not easily interactive/accessible. For many people in this situation, you wait months after a traditional version of something is published before an accessible version is released, if it ever is. Often, it’s just not seen as worth a publisher’s time to make their content accessible to an audience they don’t see as significantly profitable.
Just as the printing press took jobs from scribes but had a far more significant impact by democratizing information and education, so might AI in the long run.
But this could be revolutionary for media accessibility, and in my mind could easily be worth it for the ability to make new media immediately accessible to folks with vision challenges, deaf and hard of hearing individuals, and a lot of other folks for whom most media is not easily interactive/accessible
As an accessibility add-on / upgrade to standard TTS, sure. Sounds great, even. But I will not accept soulless, robotic, AI-generated voices for something being sold as an audiobook. I just won’t.
What about if we sweeten the deal and allow you to choose the voice actor on the fly? Want Star Wars novels read by James Earl Jones, or Tolkien read by Arnold Schwarzenegger? You can have that with AI voices.
Lol, there is no sweetening the deal. Every one of those is a lost job opportunity for an actual voice actor.
Except many books get read by the author or never get read at all, so it opens up new opportunities for people who wouldn’t otherwise use audiobooks and for books that would never get an audio version. Now vision-impaired people have the option to hear them.
One blogger cited in the report claimed converting an ebook to audio using the AI narration took just 52 minutes
This does not inspire confidence. The technology is there to do this very well, but it takes skill and effort. The technology to automate it end to end with high quality does not yet exist.
52 minutes. That’s maybe 1/10th the time it would take to listen to it. I wonder how many of these 40,000 books were even proof-listened once.
Honestly, I don’t really care if the LLM can spit out a perfect replica of Stephen Fry with every inflection and intonation possible and in the correct spots.
Tools like these can and will be used to take jobs from actual voice actors. I want no part of it.
I get where you’re coming from, but it doesn’t sit quite right with me. The whole point of technology is to save human time and effort. That should be a good thing. The problem is the capitalist hellscape that is the status quo. I don’t think we should put the onus of propping up that capitalist hellscape onto book authors. I mean, maybe that’s the easiest way to maintain the status quo, but the status quo was never sustainable in the first place.
I don’t know. This is not a fully fleshed out philosophy. At some level I’m sure it’s the same old idealism-vs-pragmatism debate.
Let me rephrase the issue for you and see if you have a different emotional reaction.
A person’s job was replaced with a capitalist’s robot, and now the capitalist earns all the money.
I know I’m way late to the party but…
A person’s job was replaced with a capitalist’s robot, and now the capitalist earns all the money.
Not necessarily. A lot of Text-to-Speech (TTS) tech comes out of academia and free, open source software (FOSS). That includes AI models and voice changing tools like RVC (Retrieval-based Voice Conversion). It is fully open source, and there are thousands upon thousands of voices to choose from that are also free. Not one of them is an exact replica of a real person’s voice (because it doesn’t do that good a job; it just gets close). Many of the most popular voices are mashups of many different voices anyway.
You can use any number of FOSS TTS tools (some of the newer open source AI models are great) to have it read your text and then have it processed through RVC into whatever voices you want.
Alternatively, you could just read the text yourself and change the voices using RVC. That works far better than you’d think it would but it requires reading your whole book out loud which requires overcoming laziness haha.
TL;DR: A person’s job could be replaced with a FOSS robot, and now the author earns all the money.
If they are as shitty as the obviously AI-powered closed captioning we are seeing now, they will hopefully be easy to recognize.





