Kling AI / Kling V3 / Lip sync

kling_ai/kling-v3/lip_sync

Async

Make a person in a video speak, using either text synthesized with a Kling TTS voice or a supplied audio clip. The mouth is re-animated to match the speech while the rest of the video is preserved.

Parameters

Name	Type	Required	Description	Allowed values	Bundle dim.
mode	string	yes	text2video: synthesize speech from `text` with `voice_id`. audio2video: lip-sync to `audio_url`.	text2video, audio2video	—
text	string	no	Spoken text (text2video only). Max 120 characters.	—	—
video_id	string	no	ID of a Kling-generated video (5s/10s, within the last 30 days) to lip-sync. Alternative to `input_video`.	—	—
voice_id	string	no	TTS voice identifier (text2video only). E.g. oversea_male1, uk_boy1, uk_man2, genshin_vindi2.	—	—
audio_url	string	no	Audio clip URL for audio2video (.mp3/.wav/.m4a/.aac, ≤5MB, 2–60s).	—	—
input_video	string	no	Source video URL (.mp4/.mov, 2–10s, 720p/1080p, ≤100MB). Provide this OR `video_id`.	—	—
voice_speed	number	no	Speech rate for text2video, 0.8–2.0.	—	—
callback_url	string	no	Webhook URL invoked when the async task completes.	—	—
voice_language	string	no	Voice language for text2video (default en).	zh, en	—
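
The constraints in the table can be checked on the client before a task is submitted. The following is a minimal sketch, not part of the API: `validate_lip_sync_params` is a hypothetical helper, and two of its rules are inferences from the descriptions rather than documented behavior. It treats `text` and `voice_id` as needed for text2video and `audio_url` as needed for audio2video (the table marks them as not required overall), and it rejects requests that supply both `input_video` and `video_id`. File-level limits such as duration, resolution, and size are not checked.

```python
from typing import Any

MODES = {"text2video", "audio2video"}
LANGUAGES = {"zh", "en"}
MAX_TEXT_CHARS = 120
SPEED_RANGE = (0.8, 2.0)


def validate_lip_sync_params(params: dict[str, Any]) -> list[str]:
    """Return a list of problems found in `params`; an empty list means none."""
    errors: list[str] = []

    mode = params.get("mode")
    if mode not in MODES:
        errors.append("mode must be text2video or audio2video")

    # The table says to provide input_video OR video_id. Rejecting requests
    # that carry both is an assumption of this sketch.
    has_url = bool(params.get("input_video"))
    has_id = bool(params.get("video_id"))
    if has_url == has_id:
        errors.append("provide exactly one of input_video or video_id")

    if mode == "text2video":
        # Assumption: text and voice_id are needed whenever mode is text2video.
        text = params.get("text") or ""
        if not text:
            errors.append("text is needed for text2video")
        elif len(text) > MAX_TEXT_CHARS:
            errors.append("text exceeds 120 characters")
        if not params.get("voice_id"):
            errors.append("voice_id is needed for text2video")
        speed = params.get("voice_speed")
        if speed is not None and not SPEED_RANGE[0] <= speed <= SPEED_RANGE[1]:
            errors.append("voice_speed must be between 0.8 and 2.0")
        if params.get("voice_language", "en") not in LANGUAGES:
            errors.append("voice_language must be zh or en")
    elif mode == "audio2video":
        # Assumption: audio_url is needed whenever mode is audio2video.
        if not params.get("audio_url"):
            errors.append("audio_url is needed for audio2video")

    return errors
```

Returning a list instead of raising on the first failure lets a caller report every problem in one pass.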

Example request

{
  "provider": "kling_ai",
  "model": "kling-v3",
  "method": "lip_sync",
  "params": {
    "mode": "text2video",
    "text": "Hello! Welcome to our product demo.",
    "voice_id": "oversea_male1",
    "input_video": "https://example.com/talking-head.mp4",
    "voice_speed": 1,
    "voice_language": "en"
  }
}
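
The request above uses text2video. For comparison, here is a sketch of how the audio2video variant of the same body might be assembled. The envelope keys (`provider`, `model`, `method`, `params`) are copied from the example request. The function name `build_audio2video_request` and the example URLs are placeholders, and the sketch only builds the JSON string, since this page does not state the endpoint or the authentication scheme.

```python
import json
from typing import Optional


def build_audio2video_request(
    audio_url: str,
    input_video: str,
    callback_url: Optional[str] = None,
) -> str:
    """Serialize an audio2video lip-sync request body as a JSON string."""
    params = {
        "mode": "audio2video",
        "audio_url": audio_url,      # .mp3/.wav/.m4a/.aac, per the table
        "input_video": input_video,  # alternative: pass video_id instead
    }
    # callback_url is optional, so it is only included when supplied.
    if callback_url is not None:
        params["callback_url"] = callback_url

    body = {
        "provider": "kling_ai",
        "model": "kling-v3",
        "method": "lip_sync",
        "params": params,
    }
    return json.dumps(body, indent=2)


print(build_audio2video_request(
    "https://example.com/voiceover.mp3",
    "https://example.com/talking-head.mp4",
))
```

The text2video-only fields (`text`, `voice_id`, `voice_speed`, `voice_language`) are left out because the table scopes them to text2video.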

Example response

{
  "status": "queued",
  "task_id": "tsk_01H..."
}
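
Because the method is async, the response confirms only that the task was accepted. The result arrives later, for example through `callback_url`. A caller will usually want to keep the `task_id` so the eventual result can be matched to the request. The sketch below assumes the response body has the two fields shown above. `extract_task_id` is a hypothetical helper, and status values other than `queued` are not documented on this page.

```python
import json


def extract_task_id(response_body: str) -> str:
    """Return the task_id from a submission response, or raise ValueError."""
    data = json.loads(response_body)
    task_id = data.get("task_id")
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response has no task_id")
    return task_id


# "tsk_123" is a made-up ID used only for illustration.
print(extract_task_id('{"status": "queued", "task_id": "tsk_123"}'))
```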

Pricing: see your dashboard (auth required).