Mandarin Text-to-Speech API Documentation

1. API Overview

Bring Your Mandarin Text to Life with Advanced AI Voice Synthesis.

Our Mandarin Text-to-Speech (TTS) API converts written Mandarin content into high-quality, lifelike audio streams through a single, easy-to-use endpoint. It is powered by Amazon Polly, which provides clear pronunciation, natural intonation, and reliable performance.

Audio can be returned as MP3, Ogg Vorbis, or raw PCM. SSML (Speech Synthesis Markup Language) input gives you fine-grained control over speech characteristics, and each request accepts up to 1,000 characters of text.
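Because each request accepts at most 1,000 characters, longer passages have to be sent in several requests. The sketch below splits plain Mandarin text into chunks that fit the limit, preferring to break after sentence-ending punctuation. It is illustrative only: `split_for_tts` is a hypothetical helper, and it assumes the limit counts Unicode characters (what Python's `len()` returns), which this documentation does not specify. It is meant for plain text; splitting SSML this way could cut through tags.

```python
import re

# Assumption: the 1,000-character limit counts Unicode code points,
# so each Chinese character counts as 1 (this is what len() returns).
MAX_CHARS = 1000


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split plain text into chunks of at most `limit` characters,
    breaking after sentence-ending punctuation where possible."""
    # Zero-width split after full-width and ASCII terminators keeps the
    # punctuation attached to its sentence; drop empty pieces.
    sentences = [s for s in re.split(r"(?<=[。！？!?；;])", text) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit has to be hard-split.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) > limit:
            chunks.append(current)  # current is non-empty here
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the `text` of its own request and the resulting audio played back in order.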

This API is ideal for developers building voice assistants, e-learning platforms, IVR systems, audio content creation, accessibility tools, and any application requiring dynamic, high-quality Mandarin speech output.

2. Authentication

This API uses RapidAPI key authentication. To access the API, include your key in the X-RapidAPI-Key request header. You can copy your RapidAPI key from the request headers shown in the code snippets on the RapidAPI platform.
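A minimal sketch of the authentication headers in Python. The `X-RapidAPI-Key` header comes from this section; the `X-RapidAPI-Host` header and its placeholder value are assumptions based on how RapidAPI code snippets are usually generated, so copy the exact header names and values from your own snippet. `build_headers` is a hypothetical helper.

```python
# Hypothetical placeholder: copy the real host value from your RapidAPI
# code snippet; it is not given in this documentation.
RAPIDAPI_HOST = "YOUR-API-HOST"


def build_headers(api_key: str, host: str = RAPIDAPI_HOST) -> dict[str, str]:
    """Return the headers for a JSON request to the TTS endpoint."""
    if not api_key:
        raise ValueError("An X-RapidAPI-Key is required")
    return {
        "Content-Type": "application/json",
        "X-RapidAPI-Key": api_key,  # required, per this section
        "X-RapidAPI-Host": host,    # assumption: RapidAPI snippets usually send this too
    }
```

Keeping the key out of source code is good practice, for example `build_headers(os.environ["RAPIDAPI_KEY"])`, where the environment variable name is your own choice.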

3. Endpoint

3.1. Text-to-Speech (TTS) Conversion

Converts provided Mandarin text (or SSML) into an audio stream.

Request Body Parameters:

The request body should be a JSON object with the following parameters:

text (string, required)
  The text to synthesize into speech, up to 1,000 characters per request. If textType is ssml, the value must be valid SSML (Speech Synthesis Markup Language). When sending SSML, put the entire SSML string on a single line, with all newline characters removed, inside the JSON text value to prevent parsing errors.
  Valid values: any string of at most 1,000 characters. Default: none.

outputFormat (string, required)
  The audio format of the synthesized speech.
  Valid values: mp3, ogg_vorbis, pcm. Default: none.

sampleRate (string, optional)
  The output audio sample rate in Hz. Values must be sent as strings (for example, "8000", not 8000).
  Valid values: "8000", "16000", "22050", or "24000" for mp3 and ogg_vorbis (default "22050"); "8000" or "16000" for pcm (default "16000").

textType (string, optional)
  The format of the text field: plain text or SSML.
  Valid values: text, ssml. Default: text.
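The parameter rules above can be checked on the client before a request is sent, which catches most 400 Bad Request errors early. The sketch below does that; `build_body` is a hypothetical helper, and it assumes the 1,000-character limit applies to the raw `text` value (including any SSML tags), which the documentation does not state explicitly.

```python
# Constraints taken from the request body parameter list.
MAX_CHARS = 1000
VALID_TEXT_TYPES = {"text", "ssml"}
VALID_RATES = {
    "mp3": {"8000", "16000", "22050", "24000"},
    "ogg_vorbis": {"8000", "16000", "22050", "24000"},
    "pcm": {"8000", "16000"},
}


def build_body(text: str, output_format: str, sample_rate=None, text_type: str = "text") -> dict:
    """Validate the parameters and return a JSON-ready request body.

    Raises ValueError listing every problem found.
    """
    errors = []
    if not text:
        errors.append("text is required")
    elif len(text) > MAX_CHARS:  # assumption: limit applies to len() of the raw value
        errors.append(f"text is {len(text)} characters; the maximum is {MAX_CHARS}")
    if output_format not in VALID_RATES:
        errors.append(f"outputFormat must be one of {sorted(VALID_RATES)}")
    if text_type not in VALID_TEXT_TYPES:
        errors.append("textType must be 'text' or 'ssml'")
    if text_type == "ssml" and ("\n" in text or "\r" in text):
        errors.append("SSML must be sent on a single line")
    # The API expects sampleRate as a string, so convert ints such as 8000.
    rate = None if sample_rate is None else str(sample_rate)
    if rate is not None and output_format in VALID_RATES and rate not in VALID_RATES[output_format]:
        errors.append(f"sampleRate {rate} is not valid for {output_format}")
    if errors:
        raise ValueError("; ".join(errors))

    # Omit optional fields so the server-side defaults apply.
    body = {"text": text, "outputFormat": output_format}
    if rate is not None:
        body["sampleRate"] = rate
    if text_type != "text":
        body["textType"] = text_type
    return body
```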

Example Request Body (JSON), requesting MP3 audio for the sentence "I can read any Mandarin text you send me.":

{
  "outputFormat": "mp3",
  "text": "我可以阅读你发给我的任何普通话文本。"
}
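The example body above could be sent with Python's standard library as follows. This is a sketch under several assumptions: the endpoint URL and host are hypothetical placeholders (the path is not given in this documentation), the request is assumed to be an HTTP POST, and a successful response body is assumed to be the raw audio bytes.

```python
import json
import urllib.request

# Hypothetical placeholders: copy the real values from the RapidAPI endpoint page.
ENDPOINT_URL = "https://YOUR-API-HOST/YOUR-ENDPOINT-PATH"
RAPIDAPI_HOST = "YOUR-API-HOST"


def make_request(body: dict, api_key: str, url: str = ENDPOINT_URL) -> urllib.request.Request:
    """Build a POST request carrying `body` as UTF-8 encoded JSON."""
    # ensure_ascii=False keeps Mandarin characters as-is; UTF-8 encodes them.
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-RapidAPI-Key": api_key,
        "X-RapidAPI-Host": RAPIDAPI_HOST,  # assumption: usually included by RapidAPI snippets
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def synthesize(body: dict, api_key: str, url: str = ENDPOINT_URL) -> bytes:
    """Send the request and return the response body (assumed to be audio bytes).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses such as 400 or 502.
    """
    with urllib.request.urlopen(make_request(body, api_key, url), timeout=30) as resp:
        return resp.read()
```

For example, `audio = synthesize({"outputFormat": "mp3", "text": "我可以阅读你发给我的任何普通话文本。"}, api_key)`, then write `audio` to a file such as `output.mp3` opened in binary mode.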

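Success Response:

On success, the API returns the synthesized speech as an audio stream in the requested outputFormat.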
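Raw pcm output has no file header, so most media players cannot open it directly. Amazon Polly documents its PCM output as signed 16-bit, mono, little-endian samples; assuming this API passes Polly's stream through unchanged, the sketch below wraps it in a WAV container using Python's standard `wave` module. Pass the same sample rate you requested (16000 if `sampleRate` was omitted). `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw PCM (signed 16-bit, mono, little-endian) in a WAV container."""
    if len(pcm) % 2:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)          # wave writes little-endian PCM
    return buf.getvalue()
```

The resulting bytes can be saved as a `.wav` file and played in any standard audio player.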

Example Error Responses:

400 Bad Request
  Invalid or missing input parameters in the request body.
  Example body: { "message": "For pcm, sampleRate must be one of [8000, 16000]" }

502 Bad Gateway
  An Amazon Polly server error.
  Example body: { "message": "The Amazon Polly response did not contain a valid audio stream." }
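Both documented error responses carry a JSON body with a `message` field. The sketch below turns a failed response into a Python exception; `TTSError` and `parse_error` are hypothetical names, and flagging 502 as retryable is a design assumption (a Polly-side failure may be transient, while a 400 fails again until the request is fixed), not documented API behavior.

```python
import json


class TTSError(Exception):
    """An error response from the TTS endpoint."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message
        # Assumption: only 502 (Amazon Polly server error) is worth retrying;
        # a 400 means the request body itself must be corrected.
        self.retryable = status == 502


def parse_error(status: int, body: bytes) -> TTSError:
    """Extract the `message` field from an error body, tolerating non-JSON bodies."""
    try:
        message = json.loads(body.decode("utf-8")).get("message", "")
    except (UnicodeDecodeError, json.JSONDecodeError, AttributeError):
        message = ""
    return TTSError(status, message or "Unrecognized error body")
```

With urllib, this fits inside an `except urllib.error.HTTPError as err:` block as `raise parse_error(err.code, err.read())`.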

4. Voice

This API uses Amazon Polly's Zhiyu voice, a high-quality female Mandarin voice, for all synthesis requests.

5. SSML (Speech Synthesis Markup Language) Guide

Our API supports Speech Synthesis Markup Language (SSML) to give you enhanced control over how your text is spoken. SSML allows you to specify details such as pauses, pronunciation, volume, and speaking rate, resulting in more expressive and natural-sounding audio.

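For a comprehensive guide to SSML syntax and best practices, refer to Amazon Polly's SSML documentation and the official W3C SSML Specification. Because synthesis runs on Amazon Polly, the SSML you can use is what Polly supports: a subset of the W3C specification plus some Amazon-specific tags. To send SSML, set textType to ssml.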
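As a sketch of the SSML workflow, the helpers below wrap sentences in `<speak>` with `<break>` pauses between them (both tags are supported by Amazon Polly), escape characters that would break the markup, and collapse newlines so the SSML travels as a single line in the JSON `text` field, as the parameter notes require. The helper names and the 500 ms default pause are illustrative choices, not part of the API.

```python
import json
import re
from xml.sax.saxutils import escape


def build_ssml(sentences: list[str], pause_ms: int = 500) -> str:
    """Join sentences inside <speak>, with a <break> pause between each pair."""
    # Escape &, < and > so user text cannot break the SSML markup.
    parts = [escape(s) for s in sentences]
    joined = f'<break time="{pause_ms}ms"/>'.join(parts)
    return f"<speak>{joined}</speak>"


def ssml_request_body(ssml: str, output_format: str = "mp3") -> str:
    """Return a JSON request body with the SSML collapsed onto one line."""
    # Replace each run of whitespace containing a newline with a single space.
    single_line = re.sub(r"\s*[\r\n]+\s*", " ", ssml).strip()
    body = {"text": single_line, "outputFormat": output_format, "textType": "ssml"}
    return json.dumps(body, ensure_ascii=False)
```

For example, `ssml_request_body(build_ssml(["你好。", "欢迎使用。"]))` produces a body whose `text` is `<speak>你好。<break time="500ms"/>欢迎使用。</speak>`.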

6. Getting Started & Quick Start

To start using the Mandarin Text-to-Speech API:

  1. Subscribe to the API on RapidAPI.
  2. Select your preferred programming language on the endpoint page.
  3. Copy the generated code snippet.
  4. Paste the code into your application and begin converting Mandarin text to speech!