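At CES in early 2026, Lightricks unveiled LTX-2, an open-source audio-video model that made waves across the AI industry and marked the start of a new era of synchronized audio and video generation. Earlier models could only produce silent footage. LTX-2 is a powerful audio-video foundation model whose defining feature is that it generates motion, character dialogue, ambient background sound, and music together in a single generation pass (Single Pass). This tight coupling removes a long-standing pain point of AI video, where clips had to be dubbed after generation and the sound often drifted out of sync with the picture, and it gives creators a great deal of freedom and control.
Even better, ComfyUI shipped native support on day one, so users can precisely control T2V (text-to-video), I2V (image-to-video), and several ControlNet modes through node-based workflows. For consumer hardware, Lightricks worked closely with NVIDIA to optimize quantized formats such as NVFP8 and NVFP4, which substantially reduce the load on video memory (VRAM). The test machine for this article is a current-generation RTX 5070 Ti (16 GB VRAM) with 64 GB of system RAM. LTX-2 has 19B parameters, and its total weights (including the Gemma 3 12B text encoder) far exceed the available VRAM, yet thanks to ComfyUI's weight streaming it still produced high-quality 720p audio-video clips reliably. In our tests, generating a 5-second clip from a text prompt took about 4 to 5 minutes and used nearly 50 GB of system RAM; in return, it delivered strong camera movement and spatial, immersive sound.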
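A quick back-of-the-envelope calculation shows why the weights cannot all stay in VRAM at once. The sketch below is only an estimate under our own assumptions: 2 bytes per parameter for BF16, 1 for FP8, and 0.5 for a 4-bit format (NVFP4's per-block scale factors are ignored), with the Gemma 3 12B text encoder assumed to be stored in BF16. The 2 GiB activation headroom is likewise an arbitrary illustrative value.

```python
# Back-of-the-envelope weight sizes. Bytes-per-parameter values are
# assumptions; quantization scale factors and any layers kept at higher
# precision are ignored.
GIB = 2**30

BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8": 1.0,
    "fp4": 0.5,  # a 4-bit format such as NVFP4 also stores block scales (not counted)
}


def weight_gib(num_params: float, dtype: str) -> float:
    """Approximate size of the raw weights in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def fits_in_vram(weights_gib: float, vram_gib: float, headroom_gib: float = 2.0) -> bool:
    """True if the weights plus an assumed activation headroom fit in VRAM."""
    return weights_gib + headroom_gib <= vram_gib


if __name__ == "__main__":
    ltx2_fp8 = weight_gib(19e9, "fp8")      # 19B-parameter LTX-2 in FP8
    ltx2_fp4 = weight_gib(19e9, "fp4")      # same model in a 4-bit format
    gemma_bf16 = weight_gib(12e9, "bf16")   # Gemma 3 12B, assumed BF16
    for name, size in [("LTX-2 FP8", ltx2_fp8), ("LTX-2 FP4", ltx2_fp4),
                       ("Gemma 3 12B BF16", gemma_bf16)]:
        print(f"{name}: {size:.1f} GiB")
    print("FP8 model + encoder fit on a 16 GiB card:",
          fits_in_vram(ltx2_fp8 + gemma_bf16, 16))
```

The Gemma 3 estimate comes out at about 22.4 GiB, in line with the ~22 GB download, while the FP8 estimate for the 19B model (about 17.7 GiB) is below the ~25 GB file size, so the real checkpoint is not just 19B one-byte weights; we have not inspected what accounts for the difference. Either way, the combined weights exceed 16 GB, which is why ComfyUI has to stream weights between system RAM and VRAM.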

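▲ Before trying LTX-2, prepare the matching workflow and make sure ComfyUI is updated to the latest version. Use "File → Open" in the ComfyUI menu to load the official JSON workflow file (direct download / source URL).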
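ComfyUI's missing-model dialog already performs this check, but if you want to see which model files a downloaded workflow mentions before opening it, a small script can walk the JSON. This is a sketch under one assumption: that model file names appear as plain strings somewhere in the file (for example, in node widget values). It deliberately does not depend on the exact workflow schema, so it also catches names inside nested structures such as subgraphs, and it will match URLs that end in a model extension too.

```python
import json
from pathlib import Path
from typing import Any, Iterator

# File extensions treated as model weights (an assumption, extend as needed).
MODEL_EXTENSIONS = (".safetensors", ".ckpt", ".pt", ".gguf")


def iter_model_names(node: Any) -> Iterator[str]:
    """Recursively yield every string in the JSON tree that looks like a model file."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_model_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_model_names(item)
    elif isinstance(node, str) and node.lower().endswith(MODEL_EXTENSIONS):
        yield node  # note: URLs ending in a model extension match as well


def referenced_models(workflow_path: str) -> list[str]:
    """Sorted, de-duplicated model file names referenced by a workflow JSON file."""
    data = json.loads(Path(workflow_path).read_text(encoding="utf-8"))
    return sorted(set(iter_model_names(data)))
```

Usage: `referenced_models("ltx2_t2v.json")`, where the file name is a hypothetical stand-in for the workflow you downloaded, returns a sorted list of the matching names.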

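▲ The first time you load the LTX-2 workflow, ComfyUI shows a reminder dialog if it detects missing models. As the list shows, LTX-2 needs a lot of model data: an FP8 main model of about 25 GB, a Gemma 3 text encoder of about 22 GB, plus a LoRA and an upscaler, so set aside plenty of disk space. Click "Download" to fetch the models.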
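Since the two large files alone add up to roughly 47 GB, it is worth confirming free space before clicking Download. This sketch uses Python's `shutil.disk_usage`; the 10 GB margin covering the LoRA, the upscaler (whose sizes are not given here), and temporary files is a hypothetical allowance, and `"."` is a placeholder for your ComfyUI models directory.

```python
import shutil

GB = 10**9

# Approximate download sizes quoted in the missing-model dialog.
KNOWN_DOWNLOADS_GB = {"LTX-2 FP8 main model": 25, "Gemma 3 text encoder": 22}
# Hypothetical allowance for the LoRA, the upscaler and temporary files.
SAFETY_MARGIN_GB = 10


def required_bytes(sizes_gb: dict[str, float], margin_gb: float = SAFETY_MARGIN_GB) -> int:
    """Total bytes needed for the listed downloads plus a margin."""
    return int((sum(sizes_gb.values()) + margin_gb) * GB)


def has_room(path: str, needed: int) -> bool:
    """True if the drive holding `path` has at least `needed` free bytes."""
    return shutil.disk_usage(path).free >= needed


if __name__ == "__main__":
    need = required_bytes(KNOWN_DOWNLOADS_GB)
    models_dir = "."  # placeholder: point this at ComfyUI's models directory
    status = "OK" if has_room(models_dir, need) else "not enough free space"
    print(f"Need about {need / GB:.0f} GB: {status}")
```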

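▲ Once all the models have downloaded, they are placed automatically in the correct directories (such as checkpoints and text_encoders). You can then close the dialog and move on to configuring the parameters.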
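To double-check the result, a few lines of `pathlib` can confirm that each file sits in the expected subfolder. The folder names `checkpoints` and `text_encoders` come from the paths mentioned above; the file names in `EXPECTED` and the `"ComfyUI/models"` root are placeholders, not the real LTX-2 file names or your install path, so replace them with what the download dialog shows.

```python
from pathlib import Path

# Placeholder file names: replace them with the names shown in the download
# dialog. Only the folder names come from the workflow's model paths.
EXPECTED = {
    "checkpoints": ["ltx-2-main-fp8.safetensors"],
    "text_encoders": ["gemma-3-12b-text-encoder.safetensors"],
}


def missing_files(models_root: str, expected: dict[str, list[str]]) -> list[str]:
    """Return 'folder/name' for every expected file not found under models_root."""
    root = Path(models_root)
    return [
        f"{folder}/{name}"
        for folder, names in expected.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_files("ComfyUI/models", EXPECTED)  # adjust to your install path
    print("All models in place" if not missing else f"Missing: {missing}")
```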

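▲ The main canvas shows the Text to Video (LTX 2.0) node. The prompt should describe not only the visuals but can also include sound details, such as rain, singing, or a specific melody, and the model realizes all of them in one pass. Let's click into the node and take a look at its subgraph!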
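The demo prompts later in this article follow a simple pattern: the visual description first, then labeled `Camera movement:` and `Audio:` sections. A tiny helper like the hypothetical `build_av_prompt` below keeps that structure consistent across chapters. The labels are only a writing convention borrowed from those prompts; we are not aware of LTX-2 requiring any particular syntax.

```python
def build_av_prompt(visual: str, camera: str = "", audio: str = "") -> str:
    """Join visual, camera and audio descriptions into one prompt string.

    The 'Camera movement:' and 'Audio:' labels mirror the demo prompts in
    this article; they are a convention, not syntax the model requires.
    """
    parts = [visual.strip()]
    if camera:
        parts.append(f"Camera movement: {camera.strip()}")
    if audio:
        parts.append(f"Audio: {audio.strip()}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_av_prompt(
        "A girl walks through an enchanted forest under a glowing umbrella.",
        camera="A slow tracking shot following her steps.",
        audio="Gentle rain and her sweet singing voice.",
    ))
```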

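▲ Expanding the full LTX-2 subgraph reveals a fairly complex node graph. The workflow uses multi-stage sampling and latent-space upscaling, so that even with 16 GB of VRAM it can handle text encoding, audio-video sampling, and the final VAE decode in separate stages, making the most of the 5070 Ti's compute.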
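To see why sampling at a lower resolution first and then upscaling in latent space saves work, here is a sketch of the latent-grid arithmetic. It rests on assumptions we have not verified against the workflow: that LTX-2 keeps LTX-Video's VAE compression of 32x spatially and 8x temporally (so frame counts take the form 8k + 1), and that the first stage runs at half the target width and height before a 2x latent upscale. It counts video latents only and ignores the audio stream.

```python
SPATIAL = 32   # assumed spatial compression of the video VAE
TEMPORAL = 8   # assumed temporal compression of the video VAE


def latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int]:
    """(latent frames, latent height, latent width) for a pixel-space clip."""
    if (frames - 1) % TEMPORAL or height % SPATIAL or width % SPATIAL:
        raise ValueError("frames must be 8k+1 and height/width multiples of 32")
    return ((frames - 1) // TEMPORAL + 1, height // SPATIAL, width // SPATIAL)


def num_positions(shape: tuple[int, int, int]) -> int:
    """Number of latent positions the sampler has to process."""
    t, h, w = shape
    return t * h * w


if __name__ == "__main__":
    final = latent_shape(121, 704, 1280)   # ~5 s at 24 fps, 1280x704
    stage1 = latent_shape(121, 352, 640)   # assumed half-resolution first pass
    print("stage 1:", stage1, num_positions(stage1))
    print("final:  ", final, num_positions(final))
```

Under these assumptions the half-resolution pass works on 3,520 latent positions versus 14,080 at full resolution, a 4x reduction; because self-attention cost grows faster than linearly with sequence length, the compute saved in the first stage is larger still.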

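▲ After setting the prompt and checking the parameters, click "Run" to start generating. ComfyUI loads the models first; because they total more than 50 GB, system RAM pressure rises sharply at this point, so close any unnecessary background programs during generation for the best performance.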
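Clicking Run queues the workflow on the local ComfyUI server, and the same step can be scripted. ComfyUI's own API example posts a workflow exported in API format as JSON to the server's `/prompt` endpoint. The sketch below assumes the default address `127.0.0.1:8188` and only builds the request, so nothing is sent until you choose to.

```python
import json
import urllib.request
import uuid


def build_queue_request(api_workflow: dict, host: str = "127.0.0.1:8188") -> urllib.request.Request:
    """Build (but do not send) a POST /prompt request for an API-format workflow."""
    payload = {
        "prompt": api_workflow,          # node-id -> {"class_type", "inputs"} mapping
        "client_id": str(uuid.uuid4()),  # lets a WebSocket listener match progress events
    }
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To queue the job, load the exported API-format file with `json.load` and pass the resulting request to `urllib.request.urlopen`; the server replies with JSON that should include the queued job's `prompt_id`. The random `client_id` only matters if you also listen for progress updates over ComfyUI's WebSocket.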

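▲ Generation complete! On the RTX 5070 Ti test machine, a 5-second 720p, 24 fps clip took about 260.29 seconds. The result has fine visual detail and smooth camera movement, and, most importantly, the resulting MP4 file already contains the synchronously generated ambient sound and music. The result is genuinely impressive.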
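Expressed as a rate, the 260.29-second run is easier to compare with other settings. The sketch divides the measured time by the clip length and by the frame count; the 121-frame figure assumes the workflow rounds 5 s x 24 fps up to an 8k + 1 frame count, which we have not verified (use 120 if it does not).

```python
def seconds_per_output_second(gen_seconds: float, clip_seconds: float) -> float:
    """Compute time spent per second of finished video."""
    return gen_seconds / clip_seconds


def seconds_per_frame(gen_seconds: float, frames: int) -> float:
    """Compute time spent per generated frame."""
    return gen_seconds / frames


if __name__ == "__main__":
    measured = 260.29  # seconds, from the RTX 5070 Ti run above
    frames = 121       # assumed: 5 s x 24 fps rounded up to 8k + 1
    print(f"{seconds_per_output_second(measured, 5):.1f} s per second of video")
    print(f"{seconds_per_frame(measured, frames):.2f} s per frame")
```

For this run that works out to about 52 seconds of compute per second of video, or roughly 2.15 seconds per frame.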
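▲ Demo video (1): 《ᴴᴰ【LTX-2】Text-to-Video Test: A Fantasy Forest Fairy Tale, A Realistic Youthful Afternoon》
This demo was generated at 1280 x 704 px, 24 fps, in 5-second chapters; two chapters were combined into one complete short film, with each chapter taking about 5 to 6 minutes to generate on average.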
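The 704 px height (and the 1088 px height of demo video 2) look odd next to 720p and 1080p. This is consistent with a requirement that width and height be multiples of 32, and with frame counts of the form 8k + 1; both constraints are our assumption, carried over from the earlier LTX-Video model rather than confirmed for LTX-2. The sketch snaps a target size to the nearest multiple of 32 (exact ties round down) and picks the smallest valid frame count covering a duration.

```python
def snap_to_multiple(value: int, multiple: int = 32) -> int:
    """Round to the nearest multiple; an exact tie rounds down."""
    q, r = divmod(value, multiple)
    return (q + 1) * multiple if r > multiple / 2 else q * multiple


def frames_for(seconds: float, fps: int = 24, step: int = 8) -> int:
    """Smallest frame count of the form step*k + 1 that is >= seconds * fps."""
    target = round(seconds * fps)
    k = -(-(target - 1) // step)  # ceiling division
    return step * k + 1


if __name__ == "__main__":
    for w, h in [(1280, 720), (1920, 1080)]:
        print(f"{w}x{h} -> {snap_to_multiple(w)}x{snap_to_multiple(h)}")
    print("5 s ->", frames_for(5), "frames; 10 s ->", frames_for(10), "frames")
```

Both 704 (from 720) and 1088 (from 1080) fall out of this rounding, and `frames_for(5)` and `frames_for(10)` give 121 and 241 frames.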
🎬 Demo video (1) chapters:
1. A Fantasy Forest Fairy Tale: Step into a dreamlike world where a glowing umbrella meets a magical forest, and enjoy its innocent, childlike charm.
📝 Prompt: A wide shot of a cute little girl walking through a dreamy, enchanted forest under a soft glowing umbrella. Ethereal light filters through the ancient trees as gentle rain falls, making the surroundings sparkle. She strolls rhythmically along a mossy path, swaying with joy. She looks up and sings in a sweet, clear voice: "It's raining, it's raining!" The camera follows her movement as she skips slightly. The background features glowing flora and soft mist, creating a whimsical and magical atmosphere. High-quality cinematic lighting, 8k, vibrant colors.
2. A Realistic Youthful Afternoon: A girl's quiet moment in golden light, testing the limits of LTX-2's photorealism on skin texture and natural lighting.
📝 Prompt: Photorealistic cinematic film, 8k resolution, shot on 35mm lens. A direct frontal medium shot of a real Japanese teenage girl wearing a realistic modern school uniform and high-quality white over-ear headphones. She is sitting on actual lush green grass in a sun-lit public park, surrounded by natural wildflowers and blooming cherry blossom trees. The lighting is authentic golden hour sunlight with natural lens flares and soft shadows. Starting with a relaxed posture, she slowly and naturally raises her head to look toward the distant horizon, a subtle, peaceful smile of genuine enjoyment on her face. Her hair moves naturally in the wind. The camera performs a slow, smooth optical zoom-in on her face. Audio: A beautiful, emotive, and immersive piano track plays clearly, representing the music she is listening to. Background ambient sounds of nature, gentle wind rustling through real leaves, and distant birds chirping blend softly. High dynamic range, sharp focus, realistic skin textures.
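▲ Demo video (2): 《ᴴᴰ【LTX-2】Text-to-Video Test: An Anime Girl's Everyday Evening》
This demo was generated at 1920 x 1088 px, 24 fps, in 10-second chapters; four chapters were combined into one complete short film, with each chapter taking about 7 to 9 minutes to generate on average.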
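Multiplying out the per-chapter figures gives a rough production budget for each demo. The helper below is plain arithmetic on the numbers quoted in this article and ignores editing time and any re-generated takes.

```python
def film_budget(chapters: int, chapter_seconds: float,
                minutes_low: float, minutes_high: float) -> tuple[float, float, float]:
    """Return (film length in seconds, min total minutes, max total minutes)."""
    return (chapters * chapter_seconds,
            chapters * minutes_low,
            chapters * minutes_high)


if __name__ == "__main__":
    length, low, high = film_budget(4, 10, 7, 9)  # demo video (2)
    print(f"Demo (2): {length:.0f} s of video, about {low:.0f}-{high:.0f} min of generation")
```

For demo (2), `film_budget(4, 10, 7, 9)` gives a 40-second film for roughly 28 to 36 minutes of generation; for demo (1), `film_budget(2, 5, 5, 6)` gives 10 seconds of video for roughly 10 to 12 minutes.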
🎬 Demo video (2) chapters:
1. Dinner: Sharing a dinner full of laughter with the family in a wood-toned home.
📝 Prompt: A cozy Japanese anime style video clip set at night. Inside a warmly lit home with extensive wooden architecture and furniture, a family is sharing a dinner meal. A small stature young girl with shoulder-length white hair and bright blue eyes is the focus, smiling as she eats at a wooden table. Through a large window in the background, a sparkling city night skyline is clearly visible. Camera movement: A slow, smooth push-in (zoom in) towards the white-haired girl's joyful face. Audio background: The scene is filled with the cheerful sounds of family chatter, laughter, clinking plates, and a warm ambient atmosphere.
2. Bath: A soothing bubble bath in warm mist, together with a yellow rubber duck.
📝 Prompt: High-quality Japanese anime style, Makoto Shinkai aesthetic, vibrant colors, shimmering atmosphere. A small girl with shoulder-length white hair and bright blue eyes is soaking comfortably in a bathtub filled with thick, fluffy white bubbles. A cute yellow rubber duck floats on the water's surface beside her. The bathroom is filled with a soft, thick mist and warm, gentle light, creating a cozy and relaxing environment. She is playfully scrubbing her shoulder with a bubbly washcloth, a peaceful and joyful smile on her face. The camera performs a slow, smooth cinematic zoom-in towards her face. Audio: The immersive sound of gentle water splashing, the soft rhythmic sound of scrubbing bubbles, a faint squeak of the rubber duck, and the echoing, peaceful ambient atmosphere of a warm, misty bathroom.
3. Bedtime: Snuggling into a fluffy, warm bed as the city lights twinkle outside the window.
📝 Prompt: High-quality Japanese anime style, Makoto Shinkai aesthetic, vibrant colors, shimmering atmosphere. A small girl with shoulder-length white hair and bright blue eyes is in a cozy, dimly lit bedroom at night. She walks gracefully towards a soft bed with fluffy pillows. She sits on the edge of the bed for a moment, then tucks herself comfortably under the thick blankets, pulling them up to her chin. Through a large window in the background, a breathtaking and sparkling city night skyline is visible. The camera follows her movement with a smooth, cinematic tracking shot. Audio: A peaceful, soothing, and relaxing sleep-inducing light music track plays softly in the background, creating a calm and sleepy ambient atmosphere.
4. Late-Night Cityscape: The camera pulls back from the warm little house to reveal the grandeur of the whole city beneath the stars.
📝 Prompt: High-quality Japanese anime style, Makoto Shinkai aesthetic, cinematic grand scale. The scene begins with a close-up of a single house window glowing with warm, golden light against the dark night. The camera performs a smooth, continuous, and expansive zoom-out (pull back) shot, revealing the quiet neighborhood, then the sprawling city landscape. The small house becomes a tiny speck of light among a vast, breathtaking city night skyline filled with millions of sparkling lights. Above, a deep indigo night sky is adorned with shimmering stars and soft clouds. The atmosphere is serene and majestic. Audio: A peaceful and quiet night ambient soundscape, featuring the faint, distant hum of the city, the soft whisper of a night breeze, and a profound sense of nocturnal silence.