TTS stands for Text-to-Speech – a technology that transforms written text into spoken language. A previously created voice reads the text aloud, producing natural-sounding speech.
STS stands for Speech-to-Speech – a form of speech synthesis that uses an existing voice recording as input. The selected voice then speaks the content with the same intonation and speed as the original recording, preserving its natural rhythm and expression.
STT stands for Speech-to-Text. This feature converts spoken audio into written text, automatically recognizing the language used in the recording.
A voice operation is the creation or editing of a voice, a voice design, or a voice remix. Each subscription includes a set number of voice operations, and every operation counts toward your monthly quota. You can find the exact number in your subscription details.
Voice synthesis is a virtual reproduction of a real voice, created using AI technology. It mimics the tone, style, and character of the original speaker so naturally that it sounds like the real person is talking.
Each subscription includes a curated library of high-quality voices. In addition, every plan comes with a quota for custom synthetic voices that users can create themselves.
Voice Design is a feature that allows users to create custom voices from scratch by describing them through text prompts. You can specify attributes such as age, gender, accent, tone, and emotion. This enables you to generate entirely new, realistic voices tailored precisely to your needs.
The Voice Design feature is not available in the Voice Talent plan.
Voice Remixing allows you to modify a synthesized or custom-created voice by entering a text prompt. You can specify attributes such as age, gender, accent, tone, and emotion, making it easy to adjust a voice to fit your exact needs.
Voice Remixing is available for voices in the VoiceWunder library. It is not supported for shared voices.
The Voice Remixing feature is not available in the Voice Talent plan.
What is Voice Sharing?
Voice Sharing allows voice talents to create licensed digital versions of their own voice and grant selected studios access. Studios can use these licensed voices for production, while licensing and ownership remain with the voice talent.
Who owns the voice?
The voice talent always remains the owner of their voice and their synthetic voices. Voice Sharing provides the technology. Ownership does not change.
Who controls access?
Only the voice talent can grant or revoke access. Studios cannot access a voice unless the talent has explicitly approved it.
Can I see when my voice is used?
Yes. Every use of your voice is logged and visible to you. You can see who used your voice, when it was used, and what was created with it.
Can my voice be used without my knowledge?
No. Studios can only use your voice after you grant access. All usage is visible in your activity log.
Can I revoke access?
Yes. You can revoke access at any time.
Who defines the license terms?
License terms, scope, and fees are agreed directly between the voice talent and the studio. Voice Sharing does not set license fees and does not take commissions.
Do I get paid when my voice is used?
Voice Sharing enables licensing between you and the studio. Compensation is agreed directly between both parties.
Is there a recommended pricing model?
Voice Sharing provides an example price matrix as a reference. You are free to define your own pricing.
Can my voice still be used for regular recordings?
Yes. Voice Sharing complements traditional voice production. It does not replace recording sessions.
Can someone use my voice without permission?
No. Voice Sharing is limited to voices created and shared by the voice talent. Studios cannot create or access a voice without permission.
Can Voice Sharing voices be resold?
No. Voice Sharing voices are licensed directly by the voice talent. Voice Sharing does not sell or sublicense voices.
How do I share my voice?
Upload up to 20 different versions of your voice and grant access to selected studios – conveniently within a DAW or via a simple browser upload. You remain in control at every step.
How do studios get access?
Studios provide their VoiceWunder ID to the voice talent. The talent grants access.
Still have questions?
Contact our support team.
VoiceWunder offers a range of default voices in its Voice Library, tailored to each subscription plan.
Basic Plan: Includes a curated selection of professional-quality default voices.
Studio Plan: Includes all Basic voices plus an additional extended voice library featuring exceptionally high-quality voices.
For both the Basic and Studio plans, commercial use of the default voices is permitted and royalty-free – even after the subscription has ended.
For best results, use the highest possible audio quality, free from background noise. We recommend an RMS loudness level of approximately -23 to -18 dBFS, with true peaks no higher than -6 dBTP to prevent distortion. Speak in a consistent tone and at a consistent volume throughout: the more consistent the input, the more natural and coherent the voice output will sound. A recording duration of 1–3 minutes is usually sufficient.
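If you want to check a recording against these level and duration recommendations before uploading it, the following Python sketch shows the arithmetic. It assumes mono floating-point samples in the range -1.0 to 1.0 (full scale = 1.0), so levels come out in dBFS. All function and class names are hypothetical and not part of VoiceWunder. Note that it measures the sample peak, which is only a lower bound for the true peak (a true-peak meter oversamples the signal), so a file that passes here may still slightly exceed -6 dBTP.

```python
import math
from dataclasses import dataclass, field

# Targets copied from the recommendations above. The peak check uses the
# sample peak, which is only a lower bound for the true peak.
RMS_MIN_DBFS = -23.0
RMS_MAX_DBFS = -18.0
PEAK_CEILING_DBFS = -6.0
MIN_SECONDS = 60.0
MAX_SECONDS = 180.0


def to_dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (full scale = 1.0) to dBFS."""
    return 20.0 * math.log10(amplitude) if amplitude > 0 else float("-inf")


@dataclass
class RecordingReport:
    rms_dbfs: float
    sample_peak_dbfs: float
    seconds: float
    warnings: list = field(default_factory=list)


def check_recording(samples: list[float], sample_rate: int) -> RecordingReport:
    """Check mono float samples in [-1.0, 1.0] against the recommended ranges."""
    if not samples or sample_rate <= 0:
        raise ValueError("need at least one sample and a positive sample rate")

    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    report = RecordingReport(to_dbfs(rms), to_dbfs(peak), len(samples) / sample_rate)

    if not RMS_MIN_DBFS <= report.rms_dbfs <= RMS_MAX_DBFS:
        report.warnings.append(
            f"RMS level {report.rms_dbfs:.1f} dBFS is outside "
            f"{RMS_MIN_DBFS:.0f} to {RMS_MAX_DBFS:.0f} dBFS"
        )
    if report.sample_peak_dbfs > PEAK_CEILING_DBFS:
        report.warnings.append(
            f"sample peak {report.sample_peak_dbfs:.1f} dBFS exceeds "
            f"{PEAK_CEILING_DBFS:.0f} dBFS"
        )
    if not MIN_SECONDS <= report.seconds <= MAX_SECONDS:
        report.warnings.append(f"duration {report.seconds:.0f} s is outside 1-3 minutes")
    return report


if __name__ == "__main__":
    # Synthetic test signal: 90 s of a 440 Hz sine at 8 kHz. A sine's RMS is
    # its amplitude divided by sqrt(2), so this amplitude targets -20 dBFS RMS.
    rate = 8000
    amplitude = 10 ** (-20 / 20) * math.sqrt(2)
    tone = [amplitude * math.sin(2 * math.pi * 440 * n / rate) for n in range(90 * rate)]
    report = check_recording(tone, rate)
    print(f"RMS {report.rms_dbfs:.1f} dBFS, sample peak {report.sample_peak_dbfs:.1f} dBFS")
    print(report.warnings or "no warnings")
```

The duration check reports a warning rather than an error, since 1–3 minutes is described as usually sufficient, not as a hard limit. To use it on a real file, decode the audio to floats first with any audio library of your choice.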
Each subscription includes a specific number of coins, which are used for different forms of speech synthesis:
Unused coins expire at the end of the month, and the coin balance is refilled at the beginning of each month according to the subscription plan.
For legal reasons, you must obtain the explicit consent of the respective speaker before creating a synthetic voice. The plug-in and platform may be used only with material for which the user holds the appropriate rights or authorization.
If the voice output doesn’t meet your expectations, try the following adjustments:
Use advanced mode and emotional tags. Emotional tags are inline cues in square brackets (e.g., [sigh], [excited]) that you place within the text to be synthesized. They guide how the voice speaks by adding emotional, non-verbal, or stylistic elements. To enable advanced mode, open the settings page and select “Advanced Mode” under “Speech Generation.” Once it is enabled, you can also pick emotional tags from a pop-up list by clicking the icon in the lower-right corner of the text box. In advanced mode, you can also try specifying a language, which may help if the voice has a particular accent or coloration.
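If you prepare longer scripts outside the plug-in, a small helper can show which parts of the text are emotional tags and which parts will be spoken, and flag tags you did not intend. The Python sketch below is only an illustration: the plug-in interprets the tags itself, and the allow-list `KNOWN_TAGS` is a hypothetical placeholder, since the actual set of supported tags is whatever the pop-up list offers.

```python
import re

# An inline cue is text in square brackets with no nested brackets, e.g. [sigh].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

# Hypothetical allow-list for illustration only; the real set of supported
# tags is whatever the plug-in's pop-up list offers.
KNOWN_TAGS = {"sigh", "excited"}


def split_tags(text: str) -> list[tuple[str, str]]:
    """Split text into ordered ("text", ...) and ("tag", ...) segments."""
    segments = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        spoken = text[pos:match.start()].strip()
        if spoken:
            segments.append(("text", spoken))
        segments.append(("tag", match.group(1).strip()))
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        segments.append(("text", rest))
    return segments


def unknown_tags(text: str, known: set[str] = KNOWN_TAGS) -> list[str]:
    """Return tags in text that are not in the (hypothetical) allow-list."""
    return [value for kind, value in split_tags(text)
            if kind == "tag" and value.lower() not in known]


if __name__ == "__main__":
    script = "[sigh] I really thought it would work. [excited] But look at this!"
    for kind, value in split_tags(script):
        print(f"{kind:>4}: {value}")
    print(unknown_tags("[whisper] Can you hear me? [sigh]"))
```

Tag matching is case-insensitive against the allow-list, so [Excited] and [excited] are treated the same; adjust this if the plug-in turns out to be case-sensitive.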
To achieve the best results with Speech-to-Speech (STS), consider the following guidelines:
If the voice you designed doesn’t sound quite right, try these tips to fine-tune the result:
If the voice you remixed doesn’t sound quite right, try these tips to fine-tune the result:
Yes. With a paid plan, you can install the plug-in on multiple workstations (Basic: up to 3, Studio: up to 5). However, the plug-in can only be active on one workstation at a time. To switch devices, simply close the plug-in window on your current workstation before opening it on another.
All voice output generated using voice design/remix or the provided voice library can be used without restrictions – including for commercial purposes – if created under a Basic or Studio subscription. This usage remains valid even after the subscription ends.
For custom voices you create, or for shared voices, commercial use depends on your individual agreement with the respective speaker.
Please note: Commercial use of any material generated under a Free subscription is strictly prohibited.
Any use that violates ethical standards or applicable laws is strictly prohibited. This includes, but is not limited to:
Pro Tools often has trouble accessing audio files embedded in an OMF via ARA. When importing the OMF, select “Copy from source media” under Audio Media Options. This copies the audio files from the OMF into the session’s Audio Files folder.
The plug-in is compatible with any internet-enabled Mac or Windows PC running Avid Pro Tools (version 2025.6), Steinberg Nuendo/Cubase (version 14), Apple Logic Pro (version 11.2 with Rosetta enabled), PreSonus Studio One Pro (version 7), or Cockos Reaper (version 7.43). A high-speed internet connection is strongly recommended, as audio transmission may involve large data volumes.
You can easily cancel your subscription at https://account.voicewunder.ai/ in the user area under “Subscription|Manage Subscription….”
After the subscription period ends, your account and all voices and voice designs you created will be deleted.
Your personal data is used exclusively to generate the requested speech synthesis and is never used for model training. All data is transmitted securely using SSL/TLS encryption.
We offer European data residency and fully comply with the GDPR and CCPA. For your security, our team will never ask for your password.
Currently, up to 74 languages are supported: Afrikaans, Arabic, Armenian, Assamese, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Catalan, Cebuano, Chichewa, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Kirghiz, Korean, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malay, Malayalam, Mandarin Chinese, Marathi, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Sindhi, Slovak, Slovenian, Somali, Spanish, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh.
If you subscribed before 10 February 2026, your invoices are issued by Lemon Squeezy.
If you subscribed after 10 February 2026, your invoices are issued by VoiceWunder GmbH.
All prices are net prices and may be subject to applicable taxes.
Our services are intended exclusively for businesses, freelancers, voice talents and professional users. Private consumers are not eligible to use our services.
Voice creation requires speaker consent.