Kling AI / Kling V3 / Lip sync
kling_ai/kling-v3/lip_sync
Async
Makes a person in a video speak, using either text synthesized with a Kling TTS voice or a supplied audio clip. The mouth is re-animated to match the speech while the rest of the video is preserved.
Parameters
| Name | Type | Required | Description | Allowed values | Bundle dim. |
|---|---|---|---|---|---|
| mode | string | yes | text2video: synthesize speech from `text` with `voice_id`. audio2video: lip-sync to `audio_url`. | text2video, audio2video | — |
| text | string | no | Spoken text (text2video only). Max 120 characters. | — | — |
| video_id | string | no | ID of a Kling-generated video (5s/10s, within the last 30 days) to lip-sync. Alternative to `input_video`. | — | — |
| voice_id | string | no | TTS voice identifier (text2video only). E.g. oversea_male1, uk_boy1, uk_man2, genshin_vindi2. | — | — |
| audio_url | string | no | Audio clip URL for audio2video (.mp3/.wav/.m4a/.aac, ≤5MB, 2–60s). | — | — |
| input_video | string | no | Source video URL (.mp4/.mov, 2–10s, 720p/1080p, ≤100MB). Provide this OR `video_id`. | — | — |
| voice_speed | number | no | Speech rate for text2video, 0.8–2.0. | — | — |
| callback_url | string | no | Webhook URL invoked when the async task completes. | — | — |
| voice_language | string | no | Voice language for text2video (default en). | zh, en | — |
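
The constraints in the table can be checked on the client before a task is submitted. The following is a minimal sketch, not part of the API: the function name `validate_lip_sync_params` is hypothetical, and it assumes that `text` and `voice_id` are needed in text2video mode, that `audio_url` is needed in audio2video mode, and that supplying both `input_video` and `video_id` is an error. The table marks those fields as not required, so the server may be more lenient.

```python
ALLOWED_MODES = {"text2video", "audio2video"}
ALLOWED_LANGUAGES = {"zh", "en"}
MAX_TEXT_CHARS = 120


def validate_lip_sync_params(params: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    errors = []

    mode = params.get("mode")
    if mode not in ALLOWED_MODES:
        errors.append("mode must be one of " + ", ".join(sorted(ALLOWED_MODES)))

    # The table says to provide input_video OR video_id. Treating "both"
    # as an error is an assumption of this sketch.
    has_url = bool(params.get("input_video"))
    has_id = bool(params.get("video_id"))
    if has_url == has_id:
        errors.append("provide exactly one of input_video or video_id")

    if mode == "text2video":
        # Assumption: text and voice_id are needed in this mode.
        text = params.get("text")
        if not text:
            errors.append("text is required for text2video")
        elif len(text) > MAX_TEXT_CHARS:
            errors.append(f"text exceeds {MAX_TEXT_CHARS} characters")
        if not params.get("voice_id"):
            errors.append("voice_id is required for text2video")

        speed = params.get("voice_speed")
        if speed is not None and not 0.8 <= speed <= 2.0:
            errors.append("voice_speed must be between 0.8 and 2.0")

        # voice_language defaults to en when omitted.
        if params.get("voice_language", "en") not in ALLOWED_LANGUAGES:
            errors.append("voice_language must be zh or en")
    elif mode == "audio2video":
        # Assumption: audio_url is needed in this mode.
        if not params.get("audio_url"):
            errors.append("audio_url is required for audio2video")

    return errors
```

File-level limits such as duration, resolution, and size are not checked here, since verifying them would require downloading and inspecting the media.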
Example request
{
  "provider": "kling_ai",
  "model": "kling-v3",
  "method": "lip_sync",
  "params": {
    "mode": "text2video",
    "text": "Hello! Welcome to our product demo.",
    "voice_id": "oversea_male1",
    "input_video": "https://example.com/talking-head.mp4",
    "voice_speed": 1,
    "voice_language": "en"
  }
}
Example response
{
  "status": "queued",
  "task_id": "tsk_01H..."
}
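
The example request covers text2video only. As a rough illustration of the other mode, the sketch below builds an audio2video request body using the same envelope (`provider`, `model`, `method`, `params`) and reads `task_id` from a submission response. The helper names `build_audio2video_request` and `extract_task_id` are hypothetical, the URLs are placeholders, and no HTTP call is made because the endpoint and authentication scheme are not described in this section.

```python
import json


def build_audio2video_request(audio_url, input_video=None, video_id=None, callback_url=None):
    """Build a lip_sync request body for audio2video mode (illustrative only)."""
    # Assumption: exactly one video source should be supplied.
    if (input_video is None) == (video_id is None):
        raise ValueError("provide exactly one of input_video or video_id")

    params = {"mode": "audio2video", "audio_url": audio_url}
    if input_video is not None:
        params["input_video"] = input_video
    else:
        params["video_id"] = video_id
    if callback_url is not None:
        params["callback_url"] = callback_url

    return {
        "provider": "kling_ai",
        "model": "kling-v3",
        "method": "lip_sync",
        "params": params,
    }


def extract_task_id(response):
    """Return the task_id from a submission response, or raise if it is absent."""
    task_id = response.get("task_id")
    if not task_id:
        raise ValueError("response does not contain a task_id")
    return task_id


body = build_audio2video_request(
    "https://example.com/narration.mp3",          # placeholder audio URL
    input_video="https://example.com/talking-head.mp4",
    callback_url="https://example.com/webhook",   # placeholder webhook URL
)
print(json.dumps(body, indent=2))
```

Because the task is asynchronous, the returned `task_id` is the value to keep for matching the later webhook call to this request. The webhook payload and any status values other than `queued` are not documented here.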