ComfyUI x Wan 2.1: High-Quality AI Text-to-Video on Your Own Machine, Fast!

2025/03/10 Software Applications, Artificial Intelligence, AI Video

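With AI advancing so quickly, creating high-quality images and videos with generative models is no longer a dream. In this article I show how to combine ComfyUI with the Wan 2.1 model to produce impressive AI text-to-video (Text-to-Video) results locally. The whole toolchain can be deployed and run on your own machine with no dependence on cloud services, so there are no quotas on what you create or how many videos you generate. Whether you are a creator, a designer, or an AI enthusiast, this article will get you started quickly, using my own setup (an RTX 3060 graphics card with 12GB of VRAM) as the example, along with practical preparation steps and recommendations.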

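What is ComfyUI?

ComfyUI is an open-source graphical interface built for generative AI models. It is centered on a node-based workflow: you drag and connect functional modules to build generation pipelines ranging from text-to-image (Text-to-Image) and image-to-image (Image-to-Image) to text-to-video (Text-to-Video). Compared with traditional command-line tools, ComfyUI's strength is its visual design, which lets beginners get started quickly while still giving advanced users room to customize.

More importantly, ComfyUI supports local deployment. All computation runs on your own computer, with no need to upload data to the cloud, which protects your privacy and removes any dependence on network access. As long as your hardware is capable enough, what you can create is limited only by your imagination. For users like me on an RTX 3060 (12GB VRAM), ComfyUI offers enough flexibility to achieve high-quality results with limited resources.

For ComfyUI installation instructions, see my earlier article: ComfyUI Desktop: Easily Try Local AI Image Generation on Windows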

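What is the Wan 2.1 model?

Wan 2.1 is an open-source AI video generation model developed by Alibaba and released in February 2025. According to its developers, it outperforms the leading open-source models as well as OpenAI's closed-source Sora, with a reported 2.5x speed improvement. The model supports text-to-video (T2V) and image-to-video (I2V) generation, and a distinctive feature is that it can render Chinese and English text inside the video. Its applications span advertising, education, and entertainment. The lightweight 1.3B version suits consumer GPUs: it needs only 8.19 GB of VRAM and can generate a 5-second 480p video in about 4 minutes on an NVIDIA RTX 4090. The 14B version supports resolutions up to 720p. Wan 2.1 uses a Diffusion Transformer architecture with Flow Matching and is released under the Apache 2.0 license, which encourages community participation and innovation. Its low cost and efficiency make it a challenger to commercial offerings, and it is particularly well suited to small and mid-sized creators and to businesses doing rapid prototyping, which has made it one of the most closely watched video generation tools of 2025.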

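For users like me on an NVIDIA RTX 3060 (12GB VRAM), picking the right model version is critical. Of the available options, wan2.1_t2v_1.3B_fp16.safetensors is the text-to-video model I recommend most. The file is about 2.84GB and needs roughly 8 to 9 GB of VRAM at runtime, so it fits comfortably within 12GB while keeping the best quality (quality ranking: fp16 > bf16 > fp8_scaled > fp8_e4m3fn). By contrast, the 14B versions (for example wan2.1_t2v_14B_fp16.safetensors, about 28.6GB on disk and needing more than 32GB at runtime) are too heavy for consumer GPUs, so I do not recommend them.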
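The file sizes quoted above follow roughly from parameter count times bytes per parameter. Here is a back-of-the-envelope sketch, assuming the nominal parameter counts of 1.3 and 14 billion and ignoring file metadata; `weight_size_gb` is a hypothetical helper written for this illustration. Actual VRAM use at runtime is higher than the weights alone, because the text encoder, the VAE, and intermediate activations also need memory.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1}


def weight_size_gb(params_billion: float, dtype: str) -> float:
    """Approximate size of the weights in GB (10^9 bytes).

    params_billion * 1e9 parameters * bytes each / 1e9 bytes per GB
    simplifies to params_billion * bytes per parameter.
    """
    return params_billion * BYTES_PER_PARAM[dtype]


for label, params, dtype in [
    ("1.3B fp16", 1.3, "fp16"),
    ("14B fp16", 14.0, "fp16"),
    ("14B fp8", 14.0, "fp8"),
]:
    print(f"{label}: about {weight_size_gb(params, dtype):.1f} GB of weights")
```

The estimates of about 2.6 GB and 28 GB land close to the 2.84GB and 28.6GB files mentioned above, which is why the 14B fp16 model is out of reach for a 12GB card.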

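It is worth noting that Wan 2.1 is not limited to text-to-video. With more powerful hardware you can also use image-to-video models such as wan2.1_i2v_480p_14B_fp16.safetensors to turn a still image into a moving video. These models are extremely demanding, however, and 12GB of VRAM is not enough for them, so this article starts with text-to-video as the demonstration.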

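Preparation: Setting Up Your Local AI Creation Environment

Before you start using ComfyUI with Wan 2.1, complete the following preparation:

  • Check your hardware
    My machine has an RTX 3060 with 12GB of VRAM, which serves as the baseline for running Wan 2.1 in this article. As long as your graphics card supports CUDA and has enough VRAM, you can deploy everything locally without trouble.
  • Install ComfyUI
    Downloading the desktop installer is the best option; follow my tutorial article to install it.
  • Download the required files
    To generate videos you need to download three model files in advance; they are listed in the download step further below.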

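Why deploy locally?

The biggest advantage of running ComfyUI and Wan 2.1 locally is creative freedom. You do not need to worry about cloud usage limits, subscription fees, or privacy, because all computation happens on your own computer. As long as your hardware allows it, you can generate as many videos as you like, tune parameters freely, and explore what the model can do. For RTX 3060 users like me, the largest 14B model is out of reach, but the 1.3B fp16 version is already enough to produce impressive results.

Next comes the step-by-step illustrated tutorial, which walks through the full process of going from text to video and lays the groundwork for your own text-to-video projects. Get your ComfyUI ready, and let's begin!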


▲ Download three files from Hugging Face: umt5_xxl_fp8_e4m3fn_scaled.safetensors (Text Encoder), wan_2.1_vae.safetensors (VAE), and wan2.1_t2v_1.3B_fp16.safetensors (model), and place them in ComfyUI's text_encoders, vae, and diffusion_models folders (under models) respectively.

Note: ComfyUI's default root directory is C:\Users\[username]\Documents\ComfyUI\
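Before opening the workflow, it can save time to confirm that the three files ended up in the right folders. The sketch below makes two assumptions: that the folders live under a `models` directory inside the ComfyUI root, and that the VAE is named `wan_2.1_vae.safetensors`, which is the name the workflow's VAELoader node expects. `missing_model_files` is a hypothetical helper, not part of ComfyUI.

```python
from pathlib import Path

# Assumed layout: <ComfyUI root>/models/<subfolder>/<file name>.
EXPECTED_FILES = {
    "text_encoders": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae": "wan_2.1_vae.safetensors",
    "diffusion_models": "wan2.1_t2v_1.3B_fp16.safetensors",
}


def missing_model_files(comfy_root: Path) -> list[Path]:
    """Return the expected model paths that do not exist yet."""
    models_dir = comfy_root / "models"
    expected = [models_dir / sub / name for sub, name in EXPECTED_FILES.items()]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Default root from the note above; adjust if you installed elsewhere.
    root = Path.home() / "Documents" / "ComfyUI"
    missing = missing_model_files(root)
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All three model files are in place.")
```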


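Go to the ComfyUI example workflow download page; once there, right-click and save the file to your computer in json format. I have also backed up a copy of the workflow below, which you can use directly: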

📝 text_to_video_wan.min.json

{"last_node_id":48,"last_link_id":95,"nodes":[{"id":8,"type":"VAEDecode","pos":[1210,190],"size":[210,46],"flags":{},"order":8,"mode":0,"inputs":[{"name":"samples","type":"LATENT","link":35},{"name":"vae","type":"VAE","link":76}],"outputs":[{"name":"IMAGE","type":"IMAGE","links":[56,93],"slot_index":0}],"properties":{"Node name for S&R":"VAEDecode"},"widgets_values":[]},{"id":39,"type":"VAELoader","pos":[866.3932495117188,499.18597412109375],"size":[306.36004638671875,58],"flags":{},"order":0,"mode":0,"inputs":[],"outputs":[{"name":"VAE","type":"VAE","links":[76],"slot_index":0}],"properties":{"Node name for S&R":"VAELoader"},"widgets_values":["wan_2.1_vae.safetensors"]},{"id":28,"type":"SaveAnimatedWEBP","pos":[1460,190],"size":[870.8511352539062,643.7430419921875],"flags":{},"order":9,"mode":0,"inputs":[{"name":"images","type":"IMAGE","link":56}],"outputs":[],"properties":{},"widgets_values":["ComfyUI",16,false,90,"default",""]},{"id":7,"type":"CLIPTextEncode","pos":[413,389],"size":[425.27801513671875,180.6060791015625],"flags":{},"order":5,"mode":0,"inputs":[{"name":"clip","type":"CLIP","link":75}],"outputs":[{"name":"CONDITIONING","type":"CONDITIONING","links":[52],"slot_index":0}],"title":"CLIP Text Encode (Negative Prompt)","properties":{"Node name for S&R":"CLIPTextEncode"},"widgets_values":["色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"],"color":"#322","bgcolor":"#533"},{"id":38,"type":"CLIPLoader","pos":[12.94982624053955,184.6981658935547],"size":[390,82],"flags":{},"order":1,"mode":0,"inputs":[],"outputs":[{"name":"CLIP","type":"CLIP","links":[74,75],"slot_index":0}],"properties":{"Node name for S&R":"CLIPLoader"},"widgets_values":["umt5_xxl_fp8_e4m3fn_scaled.safetensors","wan","default"]},{"id":40,"type":"EmptyHunyuanLatentVideo","pos":[520,620],"size":[315,130],"flags":{},"order":2,"mode":0,"inputs":[],"outputs":[{"name":"LATENT","type":"LATENT","links":[91],"slot_index":0}],"properties":{"Node name for S&R":"EmptyHunyuanLatentVideo"},"widgets_values":[832,480,33,1]},{"id":47,"type":"SaveWEBM","pos":[2367.213134765625,193.6114959716797],"size":[315,130],"flags":{},"order":10,"mode":4,"inputs":[{"name":"images","type":"IMAGE","link":93}],"outputs":[],"properties":{"Node name for S&R":"SaveWEBM"},"widgets_values":["ComfyUI","vp9",24,32]},{"id":3,"type":"KSampler","pos":[863,187],"size":[315,262],"flags":{},"order":7,"mode":0,"inputs":[{"name":"model","type":"MODEL","link":95},{"name":"positive","type":"CONDITIONING","link":46},{"name":"negative","type":"CONDITIONING","link":52},{"name":"latent_image","type":"LATENT","link":91}],"outputs":[{"name":"LATENT","type":"LATENT","links":[35],"slot_index":0}],"properties":{"Node name for S&R":"KSampler"},"widgets_values":[82628696717253,"randomize",30,6,"uni_pc","simple",1]},{"id":48,"type":"ModelSamplingSD3","pos":[440,50],"size":[210,58],"flags":{},"order":6,"mode":0,"inputs":[{"name":"model","type":"MODEL","link":94}],"outputs":[{"name":"MODEL","type":"MODEL","links":[95],"slot_index":0}],"properties":{"Node name for S&R":"ModelSamplingSD3"},"widgets_values":[8]},{"id":37,"type":"UNETLoader","pos":[20,40],"size":[346.7470703125,82],"flags":{},"order":3,"mode":0,"inputs":[],"outputs":[{"name":"MODEL","type":"MODEL","links":[94],"slot_index":0}],"properties":{"Node name for S&R":"UNETLoader"},"widgets_values":["wan2.1_t2v_1.3B_fp16.safetensors","default"]},{"id":6,"type":"CLIPTextEncode","pos":[415,186],"size":[422.84503173828125,164.31304931640625],"flags":{},"order":4,"mode":0,"inputs":[{"name":"clip","type":"CLIP","link":74}],"outputs":[{"name":"CONDITIONING","type":"CONDITIONING","links":[46],"slot_index":0}],"title":"CLIP Text Encode (Positive Prompt)","properties":{"Node name for S&R":"CLIPTextEncode"},"widgets_values":["a fox moving quickly in a beautiful winter scenery nature trees mountains daytime tracking camera"],"color":"#232","bgcolor":"#353"}],"links":[[35,3,0,8,0,"LATENT"],[46,6,0,3,1,"CONDITIONING"],[52,7,0,3,2,"CONDITIONING"],[56,8,0,28,0,"IMAGE"],[74,38,0,6,0,"CLIP"],[75,38,0,7,0,"CLIP"],[76,39,0,8,1,"VAE"],[91,40,0,3,3,"LATENT"],[93,8,0,47,0,"IMAGE"],[94,37,0,48,0,"MODEL"],[95,48,0,3,0,"MODEL"]],"groups":[],"config":{},"extra":{"ds":{"scale":1.1167815779425205,"offset":[-5.675057867608515,8.013751263058214]}},"version":0.4}
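Because the workflow is plain JSON, you can check which model files and settings it references before opening it in ComfyUI. This is a minimal sketch, assuming the workflow has been saved as text_to_video_wan.min.json with no stray line breaks inside it. `load_workflow` and `summarize_workflow` are hypothetical helper names, not ComfyUI functions.

```python
import json

# Node types whose widget values are worth a quick look.
NODE_TYPES_OF_INTEREST = {
    "UNETLoader",
    "CLIPLoader",
    "VAELoader",
    "EmptyHunyuanLatentVideo",
    "KSampler",
}


def load_workflow(path: str) -> dict:
    """Read a ComfyUI workflow saved from the UI as JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_workflow(workflow: dict) -> dict:
    """Map each node type of interest to its widgets_values list."""
    return {
        node["type"]: node.get("widgets_values", [])
        for node in workflow["nodes"]
        if node["type"] in NODE_TYPES_OF_INTEREST
    }


# Usage, once the file is on disk:
#   for node_type, values in summarize_workflow(
#       load_workflow("text_to_video_wan.min.json")
#   ).items():
#       print(node_type, values)
```

For the workflow above, the KSampler entry holds the seed, the seed control mode, 30 steps, a CFG of 6, the uni_pc sampler, the simple scheduler, and a denoise value of 1. Reading the list in that order is an assumption based on how the KSampler node lays out its inputs.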


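▲ At the top left, choose "Workflow" → "Open" to open the example workflow you just downloaded.

▲ Check the areas boxed in red to confirm that all three models are loaded, then write your positive prompt; the negative prompt can stay at its default. The AI parameters (seed, steps, CFG, sampler, and so on) and the output video parameters (width, height, length, and so on) can all be adjusted to taste.

▲ Finally, click "Queue" to run the workflow and start generating the video. Depending on your parameters and hardware, this can take anywhere from a few minutes to several hours; with the default parameters it took about five minutes on my machine. When generation finishes, a preview appears inside the workflow, and the file is saved under ComfyUI\output in .webp format.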
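The prompt and video size described in the steps above can also be changed outside the UI by editing the workflow JSON. The sketch below works on an in-memory dictionary and assumes the widget order seen in the workflow file: the prompt text first in CLIPTextEncode, then width, height, length, and batch size in EmptyHunyuanLatentVideo. `customize_workflow` is a hypothetical helper, and `sample` is a trimmed stand-in for the real file.

```python
import copy


def customize_workflow(workflow: dict, prompt: str,
                       width: int, height: int, length: int) -> dict:
    """Return a copy of a UI-format workflow with a new prompt and video size."""
    wf = copy.deepcopy(workflow)  # leave the caller's dict untouched
    for node in wf["nodes"]:
        is_text_node = node["type"] == "CLIPTextEncode"
        if is_text_node and "Positive" in node.get("title", ""):
            node["widgets_values"][0] = prompt
        elif node["type"] == "EmptyHunyuanLatentVideo":
            # Assumed order: width, height, length (frames), batch size.
            node["widgets_values"][:3] = [width, height, length]
    return wf


# Stand-in for the real file: two nodes trimmed from the workflow.
sample = {"nodes": [
    {"type": "CLIPTextEncode",
     "title": "CLIP Text Encode (Positive Prompt)",
     "widgets_values": ["a fox"]},
    {"type": "EmptyHunyuanLatentVideo",
     "widgets_values": [832, 480, 33, 1]},
]}

portrait = customize_workflow(sample, "a cat jogging in a park", 480, 848, 93)
print(portrait["nodes"][1]["widgets_values"])  # [480, 848, 93, 1]
```

To apply this to the real workflow, load the file with `json.load`, pass the result through the function, write it back with `json.dump` under a new name, and open that file in ComfyUI.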

Demo 1: A Japanese anime girl in a sailor uniform dashing down the street to school with toast in her mouth

Video size: 832 x 480 px / Frames: 33 / Generation time: about 5 minutes / Original WEBP output, unconverted

📝 Prompt: A Japanese anime-style schoolgirl with short black hair and big expressive eyes is running hurriedly on a bustling city street in the morning. She is wearing a traditional blue and white sailor school uniform with a red ribbon and holding a school bag in one hand. A piece of toast is clamped between her teeth as she dashes forward, looking determined and slightly panicked. The urban background features cherry blossom trees, sunlight filtering through buildings, and other students in the distance heading to school. The scene has a dynamic motion blur effect, emphasizing her rushing movement.

Demo 2: A cat as the master chef of a sushi restaurant

Video size: 480 x 848 px / Frames: 93 / Generation time: about 25 minutes / Converted to MP4 at 16 frames per second, just under 6 seconds long

📝 Prompt: A chubby calico cat is a sushi master in a traditional Japanese sushi restaurant. The cat is wearing a white chef’s hat and a kimono-style uniform, expertly preparing sushi on a wooden counter. Behind the cat, shelves filled with fresh ingredients and Japanese décor create an authentic atmosphere.

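Demo 3: A cat in sportswear jogging through a sunny park

Video size: 480 x 848 px / Frames: 93 / Generation time: about 25 minutes / Converted to MP4 at 16 frames per second, just under 6 seconds long

📝 Prompt: Anthropomorphic cat in sportswear jogging through a sunny park, realistic running motion with wind in fur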

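Demo 4: A cat in a helmet skateboarding smoothly down a steep ramp

Video size: 480 x 848 px / Frames: 93 / Generation time: about 25 minutes / Converted to MP4 at 16 frames per second, just under 6 seconds long

📝 Prompt: Personified cat in a helmet skateboarding smoothly down a steep ramp, balanced stance with tail swaying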

Demo 5: A farmer cat watering flowers in a garden

Video size: 480 x 848 px / Frames: 93 / Generation time: about 25 minutes / Converted to MP4 at 16 frames per second, just under 6 seconds long

📝 Prompt: A farmer cat watering flowers in a lush garden, wearing a straw hat and overalls, using a small watering can.

Demo 6: A hiker cat walking along a trail

Video size: 480 x 848 px / Frames: 93 / Generation time: about 25 minutes / Converted to MP4 at 16 frames per second, just under 6 seconds long

📝 Prompt: A hiker cat trekking on a mountain trail, wearing a backpack and holding trekking poles, surrounded by scenic nature.

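Demo 7: Ice cubes sliding slowly across a glass surface

Video size: 480 x 848 px / Frames: 93 / Generation time: about 25 minutes / Converted to MP4 at 16 frames per second, just under 6 seconds long

📝 Prompt: Ice cubes sliding across a glassy surface in slow motion, leaving faint trails of mist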
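The clip lengths quoted for the demos follow directly from frame count divided by frame rate. The sketch below shows that arithmetic, plus a helper for going the other way. The second helper assumes frame counts of the form 4n + 1, which matches the 33 and 93 used in this article but should be treated as an assumption rather than a documented rule. Both function names are hypothetical.

```python
def clip_seconds(frames: int, fps: float) -> float:
    """Playback duration of a clip in seconds."""
    return frames / fps


def frames_for_duration(seconds: float, fps: float) -> int:
    """Nearest frame count of the form 4n + 1 for a target duration.

    The 4n + 1 pattern is an assumption inferred from the values
    used in this article (33 = 4*8 + 1, 93 = 4*23 + 1).
    """
    return 4 * round(seconds * fps / 4) + 1


print(round(clip_seconds(93, 16), 2))  # 5.81, "just under 6 seconds"
print(round(clip_seconds(33, 16), 2))  # 2.06, the default workflow length
print(frames_for_duration(5, 16))      # 81
```

Since the 93-frame demos took about 25 minutes against about 5 minutes for 33 frames at a similar resolution, it is worth working out the frame count you actually need before queuing a long run.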

👉 More AI animation works: https://mnya.tw/2d/word/category/works/ai-animation
