Aigenway Documentation

Kling AI / Kling V3 / Lip sync

kling_ai/kling-v3/lip_sync
Async

Make a person in a video speak either synthesized text (using Kling TTS voices) or a supplied audio clip. The mouth is re-animated to match the speech, while the rest of the video is preserved.

Parameters

| Name | Type | Required | Description | Allowed values | Bundle dim. |
|---|---|---|---|---|---|
| mode | string | yes | text2video: synthesize speech from `text` with `voice_id`. audio2video: lip-sync to `audio_url`. | text2video, audio2video | |
| text | string | no | Spoken text (text2video only). Max 120 characters. | | |
| video_id | string | no | ID of a Kling-generated video (5s/10s, within the last 30 days) to lip-sync. Alternative to `input_video`. | | |
| voice_id | string | no | TTS voice identifier (text2video only). E.g. oversea_male1, uk_boy1, uk_man2, genshin_vindi2. | | |
| audio_url | string | no | Audio clip URL for audio2video (.mp3/.wav/.m4a/.aac, ≤5MB, 2–60s). | | |
| input_video | string | no | Source video URL (.mp4/.mov, 2–10s, 720p/1080p, ≤100MB). Provide this OR `video_id`. | | |
| voice_speed | number | no | Speech rate for text2video, 0.8–2.0. | | |
| callback_url | string | no | Webhook URL invoked when async task completes. | | |
| voice_language | string | no | Voice language for text2video (default en). | zh, en | |
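
The constraints in the parameter table can be checked on the client before a task is submitted. The following is a minimal sketch, not part of the API: the function name `validate_lip_sync_params` is hypothetical, and it assumes that exactly one of `input_video` or `video_id` is expected, that `text` and `voice_id` are both needed in text2video mode, and that `audio_url` is needed in audio2video mode. The table marks these fields as optional, so the service may behave differently.

```python
from typing import Any

ALLOWED_MODES = {"text2video", "audio2video"}
ALLOWED_LANGUAGES = {"zh", "en"}
MAX_TEXT_CHARS = 120
MIN_SPEED, MAX_SPEED = 0.8, 2.0


def validate_lip_sync_params(params: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means no documented rule was violated."""
    errors: list[str] = []

    mode = params.get("mode")
    if mode not in ALLOWED_MODES:
        errors.append("mode must be text2video or audio2video")

    # Assumption: the table's "Provide this OR video_id" means exactly one source.
    has_url = bool(params.get("input_video"))
    has_id = bool(params.get("video_id"))
    if has_url == has_id:
        errors.append("provide exactly one of input_video or video_id")

    if mode == "text2video":
        # Assumption: text and voice_id are both needed in this mode.
        text = params.get("text") or ""
        if not text:
            errors.append("text is needed for text2video")
        elif len(text) > MAX_TEXT_CHARS:
            errors.append(f"text exceeds {MAX_TEXT_CHARS} characters")
        if not params.get("voice_id"):
            errors.append("voice_id is needed for text2video")

        speed = params.get("voice_speed")
        if speed is not None and not (MIN_SPEED <= speed <= MAX_SPEED):
            errors.append(f"voice_speed must be between {MIN_SPEED} and {MAX_SPEED}")

        # The documented default language is en.
        if params.get("voice_language", "en") not in ALLOWED_LANGUAGES:
            errors.append("voice_language must be zh or en")
    elif mode == "audio2video":
        # Assumption: audio_url is needed in this mode.
        if not params.get("audio_url"):
            errors.append("audio_url is needed for audio2video")

    return errors
```

File-level limits such as duration, resolution, and size (for example ≤100MB for `input_video`) cannot be verified from a URL alone and are left out of this sketch.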

Example request

{
  "provider": "kling_ai",
  "model": "kling-v3",
  "method": "lip_sync",
  "params": {
    "mode": "text2video",
    "text": "Hello! Welcome to our product demo.",
    "voice_id": "oversea_male1",
    "input_video": "https://example.com/talking-head.mp4",
    "voice_speed": 1,
    "voice_language": "en"
  }
}
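
The example above uses text2video mode. For comparison, an audio2video request body might be assembled as in the sketch below. The helper `build_lip_sync_request` and the audio URL are hypothetical; the envelope fields (`provider`, `model`, `method`, `params`) are copied from the example request. The endpoint and authentication scheme are not described on this page, so the sketch only builds and prints the JSON body.

```python
import json
from typing import Any


def build_lip_sync_request(params: dict[str, Any]) -> dict[str, Any]:
    """Wrap lip-sync params in the envelope shown in the example request."""
    return {
        "provider": "kling_ai",
        "model": "kling-v3",
        "method": "lip_sync",
        "params": dict(params),  # copy so the caller's dict is not shared
    }


audio_request = build_lip_sync_request(
    {
        "mode": "audio2video",
        # Hypothetical URL; the clip must be .mp3/.wav/.m4a/.aac, <=5MB, 2-60s.
        "audio_url": "https://example.com/voiceover.mp3",
        "input_video": "https://example.com/talking-head.mp4",
    }
)

print(json.dumps(audio_request, indent=2))
```

The text2video-only fields (`text`, `voice_id`, `voice_speed`, `voice_language`) are omitted here because the table scopes them to text2video.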

Example response

{
  "status": "queued",
  "task_id": "tsk_01H..."
}
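
Because the method is asynchronous, the response only acknowledges the task; the `task_id` is what a client would keep in order to match a later result. A minimal sketch of reading the response follows. The function name `parse_submit_response` and the sample ID `tsk_example` are placeholders, and `queued` is the only status value shown on this page, so no other values are assumed.

```python
import json


def parse_submit_response(raw: str) -> tuple[str, str]:
    """Extract (status, task_id) from a submit response, or raise ValueError."""
    data = json.loads(raw)
    status = data.get("status")
    task_id = data.get("task_id")
    if not isinstance(status, str) or not isinstance(task_id, str) or not task_id:
        raise ValueError("response is missing status or task_id")
    return status, task_id


# Placeholder payload shaped like the example response.
status, task_id = parse_submit_response('{"status": "queued", "task_id": "tsk_example"}')
print(status, task_id)
```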
Pricing: see your dashboard (auth required).