Mandarin Text-to-Speech API Documentation

1. API Overview

Bring Your Mandarin Text to Life with Advanced AI Voice Synthesis.

Our Mandarin Text-to-Speech (TTS) API provides a powerful and easy-to-use solution for converting written Mandarin content into high-quality, lifelike audio streams. Leveraging the cutting-edge AI technology of Amazon Polly, this API ensures exceptional voice clarity, natural intonation, and reliable performance for all your voice generation needs.

Audio can be returned as MP3, Ogg Vorbis, or raw PCM. Support for SSML (Speech Synthesis Markup Language) provides fine-grained control over speech characteristics. Each request accepts up to 1,000 characters of input.

This API is ideal for developers building voice assistants, e-learning platforms, IVR systems, audio content creation, accessibility tools, and any application requiring dynamic, high-quality Mandarin speech output.

2. Authentication

This API uses RapidAPI Key authentication. To access the API, you must send your key in the `X-RapidAPI-Key` request header. You can find your key in the headers of the code snippets provided on the RapidAPI platform.
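
As a rough illustration of the header requirement above, the following Python sketch builds the request headers. The helper name `build_headers` and the `RAPIDAPI_KEY` environment variable are hypothetical, and the `X-RapidAPI-Host` header is an assumption based on what RapidAPI snippets commonly send; only `X-RapidAPI-Key` is stated by this documentation.

```python
import os

# Host part of the endpoint URL documented in section 3.1.
API_HOST = "mandarin-text-to-speech.p.rapidapi.com"


def build_headers(api_key: str) -> dict:
    """Return the HTTP headers for an authenticated JSON request."""
    if not api_key:
        raise ValueError("A RapidAPI key is required")
    return {
        "X-RapidAPI-Key": api_key,
        # Assumption: RapidAPI snippets usually send this header as well.
        "X-RapidAPI-Host": API_HOST,
        "Content-Type": "application/json",
    }


# Assumption: the key is kept in an environment variable named RAPIDAPI_KEY.
headers = build_headers(os.environ.get("RAPIDAPI_KEY", "your-key-here"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.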

3. Endpoint

3.1. Text-to-Speech (TTS) Conversion

Converts provided Mandarin text (or SSML) into an audio stream.

Method: POST
URL: https://mandarin-text-to-speech.p.rapidapi.com/

Request Body Parameters:

The request body should be a JSON object with the following parameters:

Parameter	Type	Required	Description	Valid Values	Default
`text`	`string`	Yes	The text content to be synthesized into speech. If `textType` is specified as `ssml`, the input must conform to the Speech Synthesis Markup Language (SSML) format. Note: When sending SSML content, ensure the entire SSML string is on a single line, with all newline characters removed, within the JSON `text` parameter to prevent parsing errors.	Any valid string (Max 1000 characters per request)	N/A
`outputFormat`	`string`	Yes	The speech audio format.	`mp3`, `ogg_vorbis`, `pcm`	N/A
`sampleRate`	`string`	No	The output audio sample rate in Hz. For mp3 and ogg_vorbis formats, supported values are 8000, 16000, 22050, and 24000, with 22050 as the default if omitted. For pcm output, valid options are 8000 and 16000, with 16000 as the default.	`8000`, `16000`, `22050`, `24000` (Note: All values must be sent as strings, e.g., "8000")	See description
`textType`	`string`	No	Specifies the format of the input text field: either plain text or Speech Synthesis Markup Language (SSML).	`ssml`, `text`	`text`
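
The rules in the table above can be checked on the client before a request is sent. The sketch below is one possible way to do so; `build_body` is a hypothetical helper, and counting characters with Python's `len` is an assumption, since this documentation does not say how the server counts the 1,000-character limit.

```python
# Allowed sample rates per output format, taken from the parameter table.
VALID_SAMPLE_RATES = {
    "mp3": {"8000", "16000", "22050", "24000"},
    "ogg_vorbis": {"8000", "16000", "22050", "24000"},
    "pcm": {"8000", "16000"},
}
MAX_CHARS = 1000  # Assumption: measured in Unicode code points via len().


def build_body(text, output_format, sample_rate=None, text_type=None):
    """Validate the parameters and return the JSON-ready request body."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    if output_format not in VALID_SAMPLE_RATES:
        raise ValueError(f"unsupported outputFormat: {output_format!r}")

    body = {"text": text, "outputFormat": output_format}

    if sample_rate is not None:
        sample_rate = str(sample_rate)  # the API expects a string value
        if sample_rate not in VALID_SAMPLE_RATES[output_format]:
            raise ValueError(
                f"sampleRate {sample_rate} is not valid for {output_format}"
            )
        body["sampleRate"] = sample_rate

    if text_type is not None:
        if text_type not in ("text", "ssml"):
            raise ValueError(f"unsupported textType: {text_type!r}")
        body["textType"] = text_type

    return body
```

Optional parameters are left out of the body when not supplied, so the server-side defaults described in the table apply.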

Example Request Body (JSON):

{
  "outputFormat": "mp3",
  "text": "我可以阅读你发给我的任何普通话文本。"
}
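
A request carrying the body above might be sent as in the following sketch, which uses only Python's standard library. The function names are hypothetical, and the `X-RapidAPI-Host` header is an assumption based on typical RapidAPI snippets rather than something this documentation specifies.

```python
import json
import urllib.request

API_URL = "https://mandarin-text-to-speech.p.rapidapi.com/"


def make_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a synthesis call."""
    # ensure_ascii=False keeps the Mandarin text as UTF-8 instead of \u escapes.
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-RapidAPI-Key": api_key,
            # Assumption: commonly sent by RapidAPI clients.
            "X-RapidAPI-Host": "mandarin-text-to-speech.p.rapidapi.com",
        },
    )


def synthesize_to_file(api_key: str, body: dict, path: str, timeout: float = 30.0) -> int:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(make_request(api_key, body), timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)  # number of bytes written
```

Calling `synthesize_to_file(key, {"outputFormat": "mp3", "text": "我可以阅读你发给我的任何普通话文本。"}, "speech.mp3")` would then save the response as an MP3 file, provided the key is valid and the subscription is active.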

Success Response:

Status Code: 200 OK
Content-Type: audio/mpeg (for mp3), audio/ogg (for ogg_vorbis), audio/pcm (for pcm)
Description: The generated audio content is returned as a binary stream.
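
Because the response is a raw binary stream, the client has to decide how to store it. The sketch below maps the documented content types to file extensions and wraps PCM output in a WAV container so that ordinary players can open it. It assumes the PCM stream is 16-bit signed little-endian mono, which is the format Amazon Polly documents for `pcm`; whether this API passes it through unchanged is not confirmed here. Both helper names are hypothetical.

```python
import io
import wave

# Content types listed under "Success Response".
EXTENSIONS = {"audio/mpeg": ".mp3", "audio/ogg": ".ogg", "audio/pcm": ".pcm"}


def extension_for(content_type: str) -> str:
    """Pick a file extension from the response Content-Type header."""
    # Drop any parameters (e.g. "; charset=...") before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in EXTENSIONS:
        raise ValueError(f"unexpected Content-Type: {content_type!r}")
    return EXTENSIONS[media_type]


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw PCM samples in a WAV header.

    Assumption: samples are 16-bit signed little-endian, one channel.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The `sample_rate` passed to `pcm_to_wav` should match the `sampleRate` used in the request (16000 when the parameter is omitted for `pcm`), otherwise playback speed will be wrong.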

Example Error Responses:

Status Code	Description	Example Error Body (JSON)
`400 Bad Request`	Indicates invalid or missing input parameters in the request body.	`{ "message": "For pcm, sampleRate must be one of [8000, 16000]" }`
`502 Bad Gateway`	Indicates an Amazon Polly server error.	`{ "message": "The Amazon Polly response did not contain a valid audio stream." }`
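
Error bodies of the shape shown in the table can be turned into exceptions on the client. The following is a minimal sketch: `TTSError` and `parse_error` are hypothetical names, and treating 5xx responses as retryable is a design choice of this example, not a documented guarantee of the API.

```python
import json


class TTSError(Exception):
    """Raised when the API answers with a non-200 status."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        # Assumption: server-side (5xx) failures may succeed on retry,
        # while 4xx failures need a corrected request.
        self.retryable = status >= 500


def parse_error(status: int, raw_body: bytes) -> TTSError:
    """Build a TTSError from the status code and the raw response body."""
    message = ""
    try:
        payload = json.loads(raw_body.decode("utf-8"))
        if isinstance(payload, dict):
            message = str(payload.get("message", ""))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON.
        pass
    return TTSError(status, message or "no error message in response body")
```

A caller using `urllib.request` could catch `urllib.error.HTTPError`, pass its `code` and `read()` result to `parse_error`, and raise the returned exception.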

4. Voice

This API uses Amazon Polly's Zhiyu voice, a high-quality female Mandarin voice, for all synthesis requests.

5. SSML (Speech Synthesis Markup Language) Guide

Our API supports Speech Synthesis Markup Language (SSML) to give you enhanced control over how your text is spoken. SSML allows you to specify details such as pauses, pronunciation, volume, and speaking rate, resulting in more expressive and natural-sounding audio.

For a comprehensive guide on SSML syntax and best practices, we recommend referring to the official W3C SSML Specification or Amazon Polly's SSML documentation, as our API adheres to these standards.
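
As noted for the `text` parameter, SSML must be sent on a single line. The sketch below shows one way to flatten a multi-line SSML document before placing it in the request body. `flatten_ssml` is a hypothetical helper; the `<break>` and `<prosody>` tags are standard SSML elements, used here purely as an illustration.

```python
def flatten_ssml(ssml: str) -> str:
    """Join the non-empty lines of an SSML document into a single line."""
    return " ".join(line.strip() for line in ssml.splitlines() if line.strip())


ssml = """
<speak>
    你好。
    <break time="500ms"/>
    <prosody rate="slow">很高兴认识你。</prosody>
</speak>
"""

body = {
    "text": flatten_ssml(ssml),
    "textType": "ssml",   # tells the API to interpret `text` as SSML
    "outputFormat": "mp3",
}
```

Lines are joined with a single space so that attributes split across lines stay separated; the extra spaces between tags should not affect the spoken result.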

6. Getting Started

To start using the Mandarin Text-to-Speech API:

Subscribe to the API on RapidAPI.
Select your preferred programming language on the endpoint page.
Copy the generated code snippet.
Paste the code into your application and begin converting Mandarin text to speech!