Frequently Asked Questions

Overview

What is TTS?

TTS stands for Text-to-Speech – a technology that transforms written text into spoken language. A previously created voice clone reads the text aloud, delivering natural-sounding voice output with ease.

What is STS?

STS stands for Speech-to-Speech – a form of speech synthesis that uses an existing voice recording as input. The voice clone speaks with the same intonation and speed as the original, preserving the natural rhythm and expression.

What is STT?

STT stands for Speech-to-Text. This feature converts spoken audio into written text, automatically recognizing the language used in the recording.

What are Voice Operations?

A voice operation is the creation or editing of a voice clone, a voice design, or a voice remix. Each subscription includes a specific number of voice operations, and every operation counts toward your monthly quota. You can find the exact number included in your subscription details.

What is a Voice Clone?

A voice clone is a virtual reproduction of a real voice, created using AI technology. It mimics the tone, style, and character of the original speaker so naturally that it sounds like the real person is talking.

Each subscription includes a curated library of high-quality voices. In addition, every plan comes with a quota for custom voice clones that users can create themselves.*

What is Voice Design?

Voice design is a feature that allows users to create custom voices from scratch by describing them through text prompts. You can specify attributes such as age, gender, accent, tone and emotion. This enables you to generate entirely new, realistic voices tailored precisely to your needs.

What is Voice Remixing?

Voice Remixing allows you to modify a cloned or custom-created voice by entering a text prompt. You can specify attributes such as age, gender, accent, tone, and emotion, making it easy to adjust a voice to fit your exact needs.

Voice Remixing is available for voices in the VoiceWunder library. It is not supported for shared voices.

What is Voice Sharing (coming soon)?

With the Voice Sharing plan, a voice talent can make up to 20 different clones of their own voice available to other VoiceWunder users with Studio plans – for example, to capture different emphasis styles or tonal variations. This may help optimize workflows, especially with producers the voice talent works with regularly.

Granting access is simple and fully managed by the voice talent. To enable access, the talent requests the studio’s VoiceWunder ID and enters it in their user area, where access can also be revoked at any time with equal ease.

While standard VoiceWunder speech synthesis does not create any logs, voice sharing provides a transparent usage record that is visible only to the voice talent. This record shows exactly which user generated which text with the shared voice, and when – almost as if the talent had been present in the recording studio.

The voice talent is solely responsible for granting the user the necessary usage rights and for obtaining the user’s prior authorization for activity logging before granting access. All agreements regarding such rights and authorizations are concluded directly between the voice talent and the respective user. The sharing of voice clones is strictly limited to a talent’s own voice and is available exclusively with the Voice Sharing subscription. Shared voices can be used only by users with a Studio subscription.

The Voice Library

VoiceWunder offers a range of default voices in its Voice Library, tailored to each subscription plan.

Basic Plan: Includes a curated selection of professional-quality default voices.

Studio Plan: Includes all Basic voices plus an extended voice library featuring exceptionally high-quality voices.

For both the Basic and Studio plans, commercial use of the default voices is permitted and royalty-free – even after the subscription has ended.

What factors should I keep in mind when developing a Voice Clone?

For best results, use the highest possible audio quality, free from background noise. We recommend a loudness level of approximately -23 dB to -18 dB RMS, with a True Peak no higher than -6 dB to prevent distortion. The recording should be spoken in a consistent tone and volume throughout. The more consistent the input, the more natural and coherent the voice clone will sound. A recording duration of 1–3 minutes is usually sufficient.*
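
As a rough pre-check before uploading a recording, the recommended levels can be verified with a short script. The following Python sketch is an illustration only and is not part of VoiceWunder. It assumes that the dB values refer to digital full scale (dBFS) and that the recording is already available as floating-point samples between -1.0 and 1.0. It measures the sample peak, which can read slightly lower than a true-peak meter because true-peak measurement requires oversampling. All function names are hypothetical.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (assumes samples in -1.0..1.0)."""
    if not samples:
        raise ValueError("no samples given")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)


def sample_peak_dbfs(samples):
    """Highest absolute sample value in dBFS (an approximation of True Peak)."""
    if not samples:
        raise ValueError("no samples given")
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak)


def levels_ok(samples, rms_range=(-23.0, -18.0), peak_ceiling=-6.0):
    """Check a recording against the recommended RMS range and peak ceiling."""
    rms = rms_dbfs(samples)
    peak = sample_peak_dbfs(samples)
    return rms_range[0] <= rms <= rms_range[1] and peak <= peak_ceiling


# Example: one second of a 440 Hz test tone at 48 kHz with amplitude 0.15.
tone = [0.15 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(48000)]
print(round(rms_dbfs(tone), 1), round(sample_peak_dbfs(tone), 1), levels_ok(tone))
```

If the check fails, adjust the gain in your audio editor and measure again. For a final decision, rely on a dedicated loudness meter.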

Coins/Billing

Each subscription includes a specific number of coins, which are used for different forms of speech synthesis:

  • TTS (Text-to-Speech): 1 character = 1 coin
  • STS (Speech-to-Speech): 1 minute of audio = 1,000 coins, regardless of how much text it contains
  • STT (Speech-to-Text): 1 minute of audio = 1,000 coins, regardless of how much text it contains
  • WUNDER BUTTON: 1 minute of audio = 1,000 coins, regardless of how much text it contains

Unused coins expire at the end of the month, and the coin balance is refilled at the beginning of each month according to the subscription plan.
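
To estimate how many coins a job will use, the rates above can be turned into a small calculation. The Python sketch below is illustrative only. The function and constant names are hypothetical, and two points are assumptions that the rates above do not settle: that spaces and punctuation count as characters for TTS, and that partial minutes of audio are billed proportionally and rounded up to the next whole coin.

```python
import math

# Rates taken from the list above; the names are illustrative only.
COINS_PER_TTS_CHARACTER = 1
COINS_PER_AUDIO_MINUTE = 1_000  # applies to STS, STT and WUNDER BUTTON


def tts_coins(text: str) -> int:
    """Coins for a TTS request: one coin per character.

    Assumption: every character counts, including spaces and punctuation.
    """
    return len(text) * COINS_PER_TTS_CHARACTER


def audio_coins(duration_seconds: float) -> int:
    """Coins for STS, STT or WUNDER BUTTON audio of the given length.

    Assumption: partial minutes are billed proportionally and rounded up.
    """
    if duration_seconds < 0:
        raise ValueError("duration must not be negative")
    return math.ceil(duration_seconds * COINS_PER_AUDIO_MINUTE / 60)


# Example: a short script read via TTS plus a 90-second STS take.
total = tts_coins("Hello, world!") + audio_coins(90)
print(total)
```

Compare the estimate with the remaining balance shown in your account before starting larger jobs, since unused coins do not carry over to the next month.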

Consent

Due to legal requirements, it is essential to obtain the explicit consent of the respective speaker before creating a voice clone. Use of the plug-in or platform is permitted only with material for which the user holds the appropriate rights or authorization.

All AI speech synthesis from VoiceWunder must be labeled. We will comply with the EU AI rules once they are finalized, which is expected by late 2025; current EU requirements are not yet defined. Please also follow platform-specific rules (e.g., YouTube, TikTok).

Optimizing TTS results

If the voice output doesn’t meet your expectations, try the following adjustments:

  • Re-render the text: The voice often adjusts its emphasis slightly with each rendering.
  • Correct pronunciation: Spell words phonetically to match how they should sound.
  • Use SSML Phoneme Tags: For English, phonetic notation allows more precise control.
  • Adjust rhythm and emphasis: Add punctuation like “, . ; : – ! ?” to influence pauses and tone.
  • Use CAPITAL LETTERS: This can emphasize certain words or syllables.
  • Enable safe mode: If you experience issues with pronunciation or language usage, try enabling safe mode in the settings.
  • Use advanced mode and emotional tags: Emotional tags are inline cues placed in square brackets (e.g., [sigh], [excited]) within the text you want to synthesize. They guide how the voice clone speaks by adding emotional, non-verbal, or stylistic elements. To enable advanced mode, open the settings page and select “Advanced Mode” under “Speech Generation.” Once enabled, you can also choose emotional tags from a pop-up list by clicking the icon in the lower-right corner of the text box. You can also try specifying a language here if the voice has a particular accent or coloration.
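
The SSML phoneme tip above can be illustrated with a short example. In the W3C SSML standard, a phoneme tag wraps a word and carries its pronunciation in the ph attribute. The Python sketch below builds such a tag with the IPA alphabet. Which alphabets VoiceWunder accepts is not stated here, so treat the alphabet value and the IPA string as assumptions to verify in the application. The helper name is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_tag(word: str, ipa: str) -> str:
    """Wrap a word in an SSML <phoneme> element with an IPA pronunciation.

    Assumption: the target system accepts alphabet="ipa".
    """
    # quoteattr adds the surrounding quotes and escapes the attribute value;
    # escape protects characters such as & and < in the visible word.
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


# Example: force the British pronunciation of "tomato".
text = f"I would like a {phoneme_tag('tomato', 'təˈmɑːtəʊ')}, please."
print(text)
```

Paste the resulting text into the text box and re-render. If the tag is read aloud or ignored, fall back to phonetic spelling as described above.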

Optimizing STS results

To achieve the best results with Speech-to-Speech (STS), consider the following guidelines:

  • Match vocal characteristics: The original speaker and the voice clone should sound similar in tone and style.
  • Mimic the voice clone: Have the speaker imitate the tone of the target voice for a more natural output.
  • Maintain proper audio levels: Both recordings should be between -23 dB and -18 dB RMS, with a True Peak no higher than -6 dB.
  • Clean the audio: Remove background noise, mouth clicks, and other artifacts before use.
  • Use consistent training data: Clone the voice with 1–3 minutes of speech spoken in a steady pitch and tone.
  • Avoid mixed sources: Do not use recordings with multiple speakers when cloning a voice.*

Optimizing Voice Design

If the voice you designed doesn’t sound quite right, try these tips to fine-tune the result:

  • Adjust Loudness: If the voice sounds distorted or overly intense, lower the Loudness value. This affects both the speaker’s tone and overall volume. Default: 50
  • Increase Prompt Strength: If the voice doesn’t match your prompt closely enough, raise the Prompt Strength value. This makes the system rely more heavily on your text description. Default: 0.5 – very high values may produce unintended results.
  • Regenerate Previews: If you’re not satisfied with the suggested voices, click “Create Previews” again to generate new variations.
  • Refine Your Prompt: Experiment with different character traits and emphasize key qualities using modifiers like “very,” “slightly,” “deep,” “warm,” or “gentle.”
  • Use Advanced Mode and Emotional Tags: You can add emotional or performance cues directly into the preview text using square-bracket tags such as [sigh], [excited], or [whispering]. These guide the voice’s emotion, delivery, and style. To enable this feature, go to Settings → Speech Generation → Advanced Mode. Once enabled, you can insert emotional tags using the icon in the lower-right corner of the preview text field.
  • Specify the Language: If the voice will speak a particular language, include it in your prompt so the voice is optimized for pronunciation and cadence.

Optimizing Voice Remixing

If the voice you remixed doesn’t sound quite right, try these tips to fine-tune the result:

  • Lower Loudness if the voice sounds distorted or too intense. This affects both tone and volume. Default: 50.
  • Increase Prompt Strength if the voice doesn’t match your description closely enough (Default: 0.5 – very high values may produce unintended results).
  • Click “Create Previews” again to generate new voice variations if you’re not satisfied.
  • Refine your prompt by adding clear character traits and emphasizing words like “very,” “warm,” “soft,” “deep,” or “energetic.”
  • Use Advanced Mode and emotional tags in the Preview Text field. Emotional tags are placed in square brackets (e.g., [sigh], [excited]) to guide emotion, delivery, and style. Enable this by going to Settings → Speech Generation → Advanced Mode, then use the emotion tag icon in the preview field.
  • Specify the language if you want the voice optimized for a particular language.

Can I install the plug-in on multiple workstations?

Yes. With a paid plan, you can install the plug-in on multiple workstations (Basic: up to 3, Studio: up to 5). However, the plug-in can only be active on one workstation at a time. To switch devices, simply close the plug-in window on your current workstation before opening it on another.

Commercial usage

All voice output generated using voice design/remix or the provided voice clones can be used without restrictions – including for commercial purposes – if created under a Basic or Studio subscription. This usage remains valid even after the subscription ends.

For custom voice clones you create, commercial use depends on your individual agreement with the respective speaker.*

Please note: Commercial use of any material generated under a Free subscription is strictly prohibited.

Prohibited use

Any use that violates ethical standards or applicable laws is strictly prohibited. This includes, but is not limited to:

  • Inciting violence, discrimination, or criminal behavior
  • Promoting drug use
  • Supporting fraudulent, exploitative, or abusive practices

System requirements

The plug-in is compatible with any internet-enabled Mac or Windows PC running Avid Pro Tools (version 2025.6), Steinberg Nuendo/Cubase (version 14), Apple Logic Pro (version 11.2 with Rosetta enabled), PreSonus Studio One Pro (version 7), or Cockos Reaper (version 7.43). A high-speed internet connection is strongly recommended, as audio transmission may involve large data volumes.

Ending a Subscription

You can easily cancel your subscription at https://voicewunder.ai in the user area under “Subscription.”
After the subscription period ends, your account and all voice clones and voice designs you created will be deleted.

Security

Your personal data is used exclusively for generating the requested speech synthesis and is never utilized for model training purposes. All data is transmitted securely using SSL encryption.
We offer European data residency and adhere fully to GDPR and CCPA regulations. For your security, our team will never request your password.

Which languages are supported?

Currently, up to 74 languages are supported: Afrikaans, Arabic, Armenian, Assamese, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Catalan, Cebuano, Chichewa, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Kirghiz, Korean, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malay, Malayalam, Mandarin Chinese, Marathi, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Sindhi, Slovak, Slovenian, Somali, Spanish, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh.

* = The prerequisite for cloning a voice is the explicit consent of the respective speaker.
© 2026 VoiceWunder UG (haftungsbeschränkt) · All rights reserved

All prices are net prices; additional VAT (19%) may apply.
The products are only available to commercial customers.
Powered by VoiceWunder® & ElevenLabs.