Read Aloud

Read Aloud lets visitors listen to your FlipBook instead of reading it. The toolbar gets a "Read Aloud" button; clicking it starts narration that highlights each word as it's spoken. There are two playback options: free browser TTS (works out of the box) and Pro cloud TTS (studio-quality voices, pre-generated per FlipBook).

This article walks through both options end to end.

Why offer Read Aloud

  • Accessibility — readers with visual impairments or reading difficulties can listen.
  • Multitasking — readers can listen while commuting, exercising, or doing other things.
  • Language learners — hearing pronunciation alongside text helps comprehension.
  • Reach — younger audiences who grew up with podcasts increasingly prefer audio content.

The two tiers

Browser TTS (free, all versions)

  • Uses the reader's device voice — the same voice their phone or computer uses for accessibility features.
  • Each sentence is spoken in sequence; the current word is highlighted as it's read.
  • Voice quality and language coverage depend entirely on the reader's device.
  • Zero setup required. Works on any visitor's browser.
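You can picture the browser tier as a loop over sentences. In browsers, built-in speech usually goes through the Web Speech API. Each `SpeechSynthesisUtterance` fires `boundary` events that carry a `charIndex`, and mapping that index back to a word is what drives the highlight. The Python sketch below models only that bookkeeping. The function names and the naive sentence rule are illustrative assumptions, not the plugin's code.

```python
import re
from typing import List, Tuple

# Naive rule: a sentence ends at ., ! or ? followed by whitespace.
# Real text (abbreviations like "e.g.", decimals) needs a smarter splitter.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
_WORD = re.compile(r"\S+")


def split_sentences(text: str) -> List[str]:
    """Split page text into sentences that are spoken one after another."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def word_spans(sentence: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of each word in a sentence."""
    return [m.span() for m in _WORD.finditer(sentence)]


def word_at(sentence: str, char_index: int) -> int:
    """Index of the word containing char_index (e.g. from a boundary event), or -1."""
    for i, (start, end) in enumerate(word_spans(sentence)):
        if start <= char_index < end:
            return i
    return -1
```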

Cloud TTS (Pro)

  • Uses professional cloud voice services to generate audio ahead of time.
  • Audio is generated once, ahead of time, and stored on your server, so readers never wait for speech synthesis during playback.
  • Voice quality is dramatically better than browser TTS.
  • Requires a provider API key (Google, ElevenLabs, Azure, Amazon Polly, or OpenAI) — you pay the provider directly.
  • One-time setup per provider, then any FlipBook can use it.
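Pre-generation with caching boils down to two rules: synthesize on a cache miss, and reuse the file on a hit. In this sketch the cache key is a hash of provider, voice and sentence text, and `synthesize` stands in for any provider call. Both are assumptions, because this article doesn't document the plugin's internal file naming.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(provider: str, voice: str, text: str) -> str:
    """Stable file name derived from provider, voice and sentence text."""
    digest = hashlib.sha256(f"{provider}\0{voice}\0{text}".encode("utf-8")).hexdigest()
    return digest[:16] + ".mp3"


def get_or_generate(cache_dir: Path, provider: str, voice: str, text: str,
                    synthesize: Callable[[str], bytes]) -> Path:
    """Return the cached MP3 for this sentence, calling synthesize() only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / cache_key(provider, voice, text)
    if not path.exists():
        path.write_bytes(synthesize(text))  # the only (paid) provider call
    return path
```

The key includes the voice, so generating a second voice for the same FlipBook produces a separate set of files and leaves the first set untouched.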

Comparing cloud providers

| Provider | Best for | Notes |
| --- | --- | --- |
| ElevenLabs | Most natural-sounding voices | Priciest of the five; subscription model |
| OpenAI TTS | High quality, low cost | 6 voices: alloy, nova, echo, fable, onyx, shimmer |
| Google Cloud TTS | Wide language coverage | Neural2 and WaveNet voices; pay-as-you-go |
| Azure Cognitive Services | Already on Azure infrastructure | Hundreds of neural voices; enterprise-friendly |
| Amazon Polly | AWS-native stacks | Standard and Neural voices; cheapest tier |

You don't have to pick just one. You can set up multiple providers and assign different voices to different FlipBooks.

Setting up cloud providers (one-time per provider)

All keys live under FlipBooks → Settings → Integrations. Set up only the providers you plan to use.

ElevenLabs

  1. Visit elevenlabs.io and sign in (free trial available).
  2. Click your profile (top right) → Profile + API Key.
  3. Copy the API key.
  4. In WordPress, go to FlipBooks → Settings → Integrations.
  5. Paste into the ElevenLabs API Key field.
  6. Save.
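Seeing the raw API calls can help you check what an ElevenLabs key actually unlocks. The sketch below only builds the requests with Python's standard library. It assumes the public `GET /v1/voices` and `POST /v1/text-to-speech/{voice_id}` endpoints with the `xi-api-key` header. The default `model_id` is an example choice and may not be what the plugin sends. A voice that is missing from the `/voices` response can't be used through the API, which matches the "My voice doesn't show in the dropdown" case in Troubleshooting.

```python
import json
import urllib.request
from typing import List, Tuple

API = "https://api.elevenlabs.io/v1"


def _headers(api_key: str) -> dict:
    # strip() guards against the stray whitespace behind many "invalid API key" errors
    return {"xi-api-key": api_key.strip()}


def voices_request(api_key: str) -> urllib.request.Request:
    """GET /voices: the voices this account can use."""
    return urllib.request.Request(f"{API}/voices", headers=_headers(api_key))


def parse_voices(body: bytes) -> List[Tuple[str, str]]:
    """Extract (voice_id, name) pairs from the /voices JSON response."""
    return [(v["voice_id"], v["name"]) for v in json.loads(body).get("voices", [])]


def tts_request(api_key: str, voice_id: str, text: str,
                model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """POST /text-to-speech/{voice_id}: returns MP3 audio for one piece of text."""
    headers = _headers(api_key)
    headers.update({"Content-Type": "application/json", "Accept": "audio/mpeg"})
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(f"{API}/text-to-speech/{voice_id}",
                                  data=body, headers=headers, method="POST")
```

Sending either request with `urllib.request.urlopen(req).read()` returns JSON or MP3 bytes when the key is valid.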

OpenAI

  1. Visit platform.openai.com/api-keys and sign in.
  2. Click Create new secret key, name it (e.g. "FlipBook TTS").
  3. Copy the key (you won't see it again).
  4. Make sure your OpenAI account has billing set up.
  5. In WordPress: FlipBooks → Settings → Integrations → paste into OpenAI API Key → save.
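The OpenAI equivalent is a single `POST` to the `/v1/audio/speech` endpoint, authenticated with a Bearer token. This is a request-building sketch only. The model name `tts-1` and the `alloy` default are assumptions for illustration, and the plugin may use a different model.

```python
import json
import urllib.request


def openai_speech_request(api_key: str, text: str, voice: str = "alloy",
                          model: str = "tts-1") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request that returns MP3 bytes."""
    payload = {"model": model, "input": text, "voice": voice, "response_format": "mp3"}
    return urllib.request.Request(
        "https://api.openai.com/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key.strip()}",  # remove pasted whitespace
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

If billing is not set up on the account, a request like this is rejected even though the key itself is valid.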

Google Cloud TTS

  1. Open the Google Cloud Console and select or create a project.
  2. Enable the Cloud Text-to-Speech API (search for it in the API library and click Enable).
  3. Go to APIs & Services → Credentials → + Create Credentials → API Key.
  4. Copy the key.
  5. In WordPress: FlipBooks → Settings → Integrations → paste into Google Cloud TTS API Key → save.
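Google Cloud TTS accepts the API key as a `key` query parameter on `text:synthesize` and returns JSON with the audio base64-encoded in `audioContent`. The voice name below is an example Neural2 voice. This is a request-building sketch, not the plugin's implementation. If the Text-to-Speech API isn't enabled on the project, a call like this fails even with a valid key.

```python
import base64
import json
import urllib.parse
import urllib.request


def google_tts_request(api_key: str, text: str, voice_name: str = "en-US-Neural2-F",
                       language_code: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a Cloud Text-to-Speech text:synthesize request."""
    url = ("https://texttospeech.googleapis.com/v1/text:synthesize?"
           + urllib.parse.urlencode({"key": api_key.strip()}))
    payload = {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def mp3_from_response(response_body: bytes) -> bytes:
    """Decode the base64 'audioContent' field of the JSON response into MP3 bytes."""
    return base64.b64decode(json.loads(response_body)["audioContent"])
```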

Azure Cognitive Services

  1. In the Azure Portal, create a Speech resource (under AI + Machine Learning).
  2. Open the resource → Keys and Endpoint.
  3. Copy Key 1 and note the Region (e.g. eastus).
  4. In WordPress: FlipBooks → Settings → Integrations → paste Azure Speech Key → select matching Region → save.
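Azure's Speech REST endpoint puts the region in the hostname. That is why the Region you select must match the Speech resource: a key from an `eastus` resource won't authenticate against a `westeurope` host. The sketch assumes the `cognitiveservices/v1` REST endpoint with an SSML body. The voice name, output format and User-Agent value are example choices.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def azure_tts_request(key: str, region: str, text: str,
                      voice: str = "en-US-JennyNeural") -> urllib.request.Request:
    """Build (but do not send) an Azure Speech request; the region is part of the host."""
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"  # escape &, <, >
        "</speak>"
    )
    host = f"{region.strip().lower()}.tts.speech.microsoft.com"  # region ids are lowercase
    return urllib.request.Request(
        f"https://{host}/cognitiveservices/v1",
        data=ssml.encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key.strip(),
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": "audio-24khz-48kbitrate-mono-mp3",
            "User-Agent": "flipbook-tts-sketch",
        },
        method="POST",
    )
```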

Amazon Polly

  1. In the AWS Console, create an IAM user with the AmazonPollyReadOnlyAccess policy.
  2. Generate access keys for that user.
  3. Copy the Access Key ID and Secret Access Key.
  4. In WordPress: FlipBooks → Settings → Integrations → paste both, choose your AWS Region → save.
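For Polly, the usual route from Python is boto3's `synthesize_speech`, which signs each request with the IAM access keys for you. To keep the sketch testable, it takes an already-created client. The default voice `Joanna` and the `neural` engine are example choices, not the plugin's settings.

```python
from typing import Any


def polly_mp3(client: Any, text: str, voice_id: str = "Joanna",
              engine: str = "neural") -> bytes:
    """Synthesize one sentence with an Amazon Polly boto3 client and return MP3 bytes."""
    response = client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine=engine,  # "standard" is cheaper; "neural" sounds more natural
    )
    # AudioStream is a streaming body; read it fully into memory.
    return response["AudioStream"].read()
```

With boto3 installed, you could call it as `polly_mp3(boto3.client("polly", region_name="us-east-1"), "Hello.")`, using the same region you chose in Integrations.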

Generating audio for a FlipBook (Pro)

Once at least one provider is set up:

  1. Open the FlipBook for editing.
  2. Switch to the Read Aloud tab.
  3. Pick a Provider and a Voice from the dropdowns.
  4. Click Generate Audio.
  5. Progress is shown live — leave the tab open until it finishes (a few minutes for short books, longer for big ones).
  6. When complete, the new audio set appears in the list with Set Default and Delete buttons.
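The live progress display amounts to a loop over every sentence of every page that reports `(done, total)` after each provider call. This sketch also retries network errors with exponential backoff. That retry logic is one way to make long runs robust and is not documented plugin behavior. `synthesize` and `on_progress` are hypothetical callbacks.

```python
import time
from typing import Callable, List, Tuple


def generate_set(pages: List[List[str]],
                 synthesize: Callable[[str], bytes],
                 on_progress: Callable[[int, int], None],
                 max_retries: int = 3,
                 base_delay: float = 1.0) -> List[Tuple[int, int, bytes]]:
    """Synthesize every sentence, reporting (done, total) after each one.

    Pages with no sentences (no extractable text) are simply skipped.
    Returns (page_no, sentence_no, audio) tuples, both numbers 1-based.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    total = sum(len(sentences) for sentences in pages)
    done = 0
    results = []
    for page_no, sentences in enumerate(pages, start=1):
        for sentence_no, sentence in enumerate(sentences, start=1):
            for attempt in range(max_retries):
                try:
                    audio = synthesize(sentence)
                    break
                except OSError:  # urllib network errors subclass OSError
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(base_delay * 2 ** attempt)  # exponential backoff
            results.append((page_no, sentence_no, audio))
            done += 1
            on_progress(done, total)
    return results
```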

You can generate multiple sets per FlipBook — one per language, one per provider, one per voice. Readers pick from a dropdown in the viewer toolbar.
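Keeping several audio sets per FlipBook works like a small registry: add a set, mark one as default, delete one, and list labels for the viewer's dropdown. The class below is a hypothetical model of that behavior. In particular, promoting the first set to default, and falling back to a remaining set when the default is deleted, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AudioSet:
    key: str    # hypothetical identifier, e.g. "openai-alloy-en"
    label: str  # what readers see in the viewer dropdown


@dataclass
class FlipBookAudio:
    sets: Dict[str, AudioSet] = field(default_factory=dict)
    default_key: Optional[str] = None

    def add(self, audio_set: AudioSet) -> None:
        self.sets[audio_set.key] = audio_set
        if self.default_key is None:  # first set becomes the default
            self.default_key = audio_set.key

    def set_default(self, key: str) -> None:
        if key not in self.sets:
            raise KeyError(key)
        self.default_key = key

    def delete(self, key: str) -> None:
        del self.sets[key]
        if self.default_key == key:  # fall back to any remaining set
            self.default_key = next(iter(self.sets), None)

    def dropdown(self) -> List[str]:
        """Labels for the viewer's voice dropdown, default first."""
        ordered = sorted(self.sets.values(), key=lambda s: s.key != self.default_key)
        return [s.label for s in ordered]
```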

Browser TTS workflow

If you'd rather not pay for cloud TTS, Read Aloud still works out of the box using the reader's browser TTS:

  1. No setup required.
  2. When a reader clicks the Read Aloud button, the browser uses its built-in voice.
  3. Voice quality depends on the device: iOS and macOS ship with natural-sounding voices, while Windows and older Android devices tend to sound less natural.

You don't need to generate anything per FlipBook for browser TTS.

How readers use Read Aloud

When the Read Aloud button is enabled in the toolbar:

  1. A reader clicks the Read Aloud button.
  2. A small dropdown lets them pick a voice (if multiple sets exist).
  3. Playback starts. The current word highlights as it's spoken.
  4. Pause / Resume controls appear in the toolbar.
  5. Speed control (0.5x to 2x) is available.
  6. When a page finishes, playback auto-advances to the next page.
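On the reader side, the controls come down to two small pieces of logic: clamping the speed to the 0.5x to 2x range, and moving a cursor to the next sentence or the next page. In a browser, the speed would typically map to `HTMLMediaElement.playbackRate` for cloud audio or `SpeechSynthesisUtterance.rate` for browser TTS. This sketch also skips pages with no text during playback, which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_RATE, MAX_RATE = 0.5, 2.0  # the 0.5x to 2x range offered in the toolbar


def clamp_rate(rate: float) -> float:
    """Keep a requested playback speed inside the supported range."""
    return max(MIN_RATE, min(MAX_RATE, rate))


@dataclass
class Cursor:
    page: int      # 0-based page index
    sentence: int  # 0-based sentence index within the page


def next_position(pages: List[List[str]], cur: Cursor) -> Optional[Cursor]:
    """Next sentence on this page, else the first sentence of the next page with text."""
    if cur.sentence + 1 < len(pages[cur.page]):
        return Cursor(cur.page, cur.sentence + 1)
    for page in range(cur.page + 1, len(pages)):
        if pages[page]:
            return Cursor(page, 0)
    return None  # end of the book
```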

Switching voices mid-reading

If a reader wants a different voice partway through:

  1. They click the voice dropdown again.
  2. They pick a new voice.
  3. Playback stops cleanly and restarts from sentence 1 of the current page in the new voice.
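Switching voices restarts the current page instead of resuming mid-sentence. Here is a sketch of that state transition, using a hypothetical `PlayerState`:

```python
from dataclasses import dataclass


@dataclass
class PlayerState:
    voice: str
    page: int      # current page index
    sentence: int  # current sentence index on that page
    playing: bool


def switch_voice(state: PlayerState, new_voice: str) -> PlayerState:
    """Restart the current page from its first sentence in the new voice."""
    if new_voice == state.voice:
        return state  # nothing to change
    # A real player would stop the current audio here before restarting.
    return PlayerState(voice=new_voice, page=state.page, sentence=0, playing=True)
```

Restarting at a sentence boundary is the simplest safe choice. Two voices speak at different speeds, so a time offset in one set has no exact counterpart in the other.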

What pages get audio

Audio is generated only for pages with extractable text. The plugin uses two methods to extract text:

  1. pdfparser (fast, pure PHP) — for most PDFs.
  2. pdftotext (fallback) — for encrypted or unusual PDFs.

Pages with no extractable text (e.g. image-only scanned PDFs) get skipped during generation. To fix: run OCR on the PDF first, then re-upload.
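The plugin uses the PHP library pdfparser first and the `pdftotext` command-line tool as a fallback. The Python sketch below mirrors only that order. `primary` stands in for pdfparser, the fallback shells out to poppler's `pdftotext` for a single page, and a page where both come back empty is skipped. Falling back on empty output, and not just on errors, is an assumption.

```python
import subprocess
from typing import Callable, Optional


def pdftotext_page(pdf_path: str, page: int) -> str:
    """Fallback extractor: poppler's pdftotext for one 1-based page, written to stdout ("-")."""
    result = subprocess.run(
        ["pdftotext", "-f", str(page), "-l", str(page), pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def extract_page_text(pdf_path: str, page: int,
                      primary: Callable[[str, int], str],
                      fallback: Callable[[str, int], str] = pdftotext_page) -> Optional[str]:
    """Try the primary extractor, then the fallback; None means "skip this page"."""
    for extractor in (primary, fallback):
        try:
            text = extractor(pdf_path, page).strip()
        except Exception:
            continue  # e.g. encrypted PDF, parser error, missing pdftotext binary
        if text:
            return text
    return None  # image-only page: needs OCR before it can get audio
```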

For Image Gallery FlipBooks, there's no text to extract — Read Aloud isn't available. (The toolbar button auto-hides.)

Costs

You pay your chosen provider directly. Rough guidance based on a 50-page PDF (~10,000 words):

| Provider | Estimated cost (50-page PDF) |
| --- | --- |
| ElevenLabs | $2-5 (premium subscription tiers reduce this) |
| OpenAI TTS | ~$0.15 |
| Google Cloud TTS | ~$1.60 (Neural2 voices) |
| Azure Cognitive Services | ~$1.60 |
| Amazon Polly | ~$0.40 (Neural) |

Check each provider's current pricing page before generating large books.
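Most of these providers bill by characters of input text, so a rough estimate is characters sent multiplied by the price per character. The sketch assumes about 6 characters per English word including the trailing space. Take the rate you pass in from your provider's pricing page. The value in the example below is a placeholder, not any provider's actual price.

```python
def estimate_cost(words: int, usd_per_million_chars: float,
                  chars_per_word: float = 6.0) -> float:
    """Rough TTS cost: characters sent to the API times the per-character price.

    chars_per_word ~6 (average English word plus a space) is an assumption.
    """
    characters = words * chars_per_word
    return characters / 1_000_000 * usd_per_million_chars
```

For example, 10,000 words at 6 characters each is about 60,000 characters. At a placeholder rate of $10 per million characters, `estimate_cost(10_000, 10.0)` comes to about $0.60.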

Audio storage

Generated audio files are stored on your WordPress site under wp-content/uploads/tncfb3d-tts/{flipbook_id}/{set_key}/. They're served directly from your server.

Each sentence becomes its own MP3 file — typical size 10-50 KB per sentence. A 50-page PDF generates roughly 5-15 MB of audio total.
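The storage layout and size estimate both follow from one MP3 per sentence. In this sketch, `set_key` is whatever identifier the plugin assigns to a set, and the example value is hypothetical. The per-sentence size is an input you can take from the 10-50 KB range quoted above.

```python
from pathlib import PurePosixPath


def audio_set_dir(flipbook_id: int, set_key: str) -> PurePosixPath:
    """Directory holding one audio set, relative to the WordPress root."""
    return PurePosixPath("wp-content/uploads/tncfb3d-tts") / str(flipbook_id) / set_key


def estimate_storage_mb(sentences: int, kb_per_sentence: float) -> float:
    """Total size of one set in MB: one MP3 file per sentence."""
    return sentences * kb_per_sentence / 1024
```

For example, about 500 sentences at 20 KB each comes to roughly 10 MB, which falls inside the 5-15 MB range for a 50-page PDF.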

You can delete a set anytime with the Delete button on the Read Aloud tab. This frees up storage but means readers can no longer use that voice.

Best practices

  • If your audience is international, generate at least two voices per FlipBook: one for English/global readers and one with a localized accent.
  • Pick a Neural voice when available — Standard voices on Google/Azure/Polly sound robotic.
  • Test playback before publishing — preview the first page on the Read Aloud tab.
  • For multilingual FlipBooks, generate one set per language and label them clearly.

Troubleshooting

"Generation failed: invalid API key."

  • Re-paste the key in Integrations — make sure there's no whitespace at either end.
  • For Google Cloud TTS, confirm the Text-to-Speech API is enabled on the project; creating the key alone is not enough.
  • For Azure, confirm the Region matches the region of your Speech resource.

"Some pages have no audio."

  • That page has no extractable text. Run OCR on the PDF (Adobe Acrobat → Recognize Text, or a free tool) and re-upload.

"Long pauses between sentences the first time through." Each sentence's audio file downloads the first time a reader listens. Later listens are instant once the browser has cached the files.

"Generation takes a long time." Cloud TTS sends each sentence as a separate request, so a 100-page PDF can mean 1,000+ API calls. Expect roughly 5-15 minutes for a typical PDF, and leave the tab open.

"My voice doesn't show in the dropdown." The voice list is pulled from the provider's API. If a voice you expect is missing, your account may not have access to it (e.g. preview voices on ElevenLabs).

"Generation cost more than expected." Rate limits alone won't cap spending. Set usage or budget limits in your provider account where available (ElevenLabs and OpenAI both offer per-month caps).
