Python: Extract All Images from an MT File to Local Storage

2023/04/19 · Programming, Python

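I wanted a quick way to download every photo referenced in a blog's plain-text MT (Movable Type) export file, either as a local backup or for later reuse, so I worked with ChatGPT to write this Python script. Xuite (隨意窩) is shutting down at the end of August, and this small program should help quite a few people. If it helps you, please consider sponsoring me through the button on the right. Thank you!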

GitHub: https://github.com/qwe987299/MTfileGetImagesToLocal

Usage

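1. Rename the plain-text MT file you want to process to input.txt and put it in the same folder as run.py.
2. Double-click the run.bat batch file to start.
3. When the terminal prints "ALL DONE!!!", processing is complete.
4. The images subfolder holds all the downloaded image files, and output.txt is the new plain-text MT file with its image URLs rewritten to point at the local copies.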
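Here is what step 4 means in practice. This is a minimal, self-contained sketch of the transformation applied to one line. The line and URL are made up, and the regular expression is one reasonable way to capture src values. It is not necessarily identical to the pattern in the script. The sketch collects each img source and then swaps the remote URL for images/<file name>. Any query string is dropped so the file name stays valid on Windows, where "?" is not allowed.

```python
import os
import re
from urllib.parse import urlparse

# A made-up line of post HTML from an MT file
line = '<p><img src="https://example.com/photos/cat.jpg?v=2" alt="cat" /></p>'

# Capture the src value of every <img> tag (self-closing or not, single or double quotes)
pattern = re.compile(r'<img\b[^>]*?\ssrc=["\']([^"\']+)["\']', re.IGNORECASE)
urls = pattern.findall(line)

# Swap each remote URL for images/<file name>, dropping any query string
rewritten = line
for url in urls:
    name = os.path.basename(urlparse(url).path)
    rewritten = rewritten.replace(url, f"images/{name}")

print(urls)       # ['https://example.com/photos/cat.jpg?v=2']
print(rewritten)  # <p><img src="images/cat.jpg" alt="cat" /></p>
```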

Full Source Code

import os
import re
from urllib.parse import urlparse

import requests  # third-party package: pip install requests

# Matches the src attribute of <img> tags, whether or not the tag ends with "/>",
# and whether the value is in double or single quotes
IMG_SRC_PATTERN = re.compile(r'<img\b[^>]*?\ssrc=["\']([^"\']+)["\']', re.IGNORECASE)


def local_name(url):
    # File name taken from the URL path, so a query string such as "?v=2" is dropped
    return os.path.basename(urlparse(url).path)


def download_image(url, output_dir, num_retries=2, timeout=30):
    """Download one image into output_dir. Return True on success, False otherwise."""
    image_name = local_name(url)
    if not image_name:
        print(f"Skipped {url}: cannot derive a file name")
        return False
    for i in range(num_retries):
        try:
            # A timeout keeps one unresponsive server from hanging the whole run
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
            with open(os.path.join(output_dir, image_name), "wb") as f:
                f.write(response.content)
            print(f"Downloaded {image_name} to {output_dir}")
            return True
        except (requests.exceptions.RequestException, OSError) as e:
            print(f"Failed to download {url} (attempt {i + 1}/{num_retries})")
            if i == num_retries - 1:
                print(f"Gave up downloading {url}: {e}")
    return False


def main(input_file, output_file="output.txt", output_dir="images"):
    # Create the output directory if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)

    # Read the whole MT file
    with open(input_file, "r", encoding="utf-8") as f:
        lines = f.readlines()

    # Collect every <img> src URL; dict.fromkeys removes duplicates but keeps the original order
    img_urls = []
    for line in lines:
        img_urls.extend(IMG_SRC_PATTERN.findall(line))
    img_urls = list(dict.fromkeys(img_urls))

    # Download remote images and remember which ones actually succeeded
    downloaded = []
    for img_url in img_urls:
        if not img_url.startswith(("http://", "https://")):
            print(f"Skipped non-HTTP source: {img_url[:80]}")
            continue
        if download_image(img_url, output_dir):
            downloaded.append(img_url)

    # Point only the successfully downloaded URLs at their local copies, so failed
    # downloads keep their original remote URL. Longer URLs are replaced first so that
    # a URL which is a prefix of another one is not substituted inside it.
    downloaded.sort(key=len, reverse=True)
    output_lines = []
    for line in lines:
        for img_url in downloaded:
            line = line.replace(img_url, f"{output_dir}/{local_name(img_url)}")
        output_lines.append(line)

    with open(output_file, "w", encoding="utf-8") as f:
        f.writelines(output_lines)


if __name__ == "__main__":
    main("input.txt")
    print("ALL DONE!!!")
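The script has one limitation worth knowing about. It names each saved file after the last part of the image URL. Two different images that share a file name, such as .../2019/a/photo.jpg and .../2020/b/photo.jpg, would therefore overwrite each other in the images folder. Below is a minimal sketch of one way around this. It uses a hypothetical helper, unique_local_name, that appends the first 8 hex digits of a SHA-1 hash of the full URL. This is an illustrative choice, not something the original script does.

```python
import hashlib
import os
from urllib.parse import urlparse


def unique_local_name(url):
    """Hypothetical helper: URL file name plus a short hash of the full URL.

    Different URLs that end in the same file name get different local names,
    while the same URL always maps to the same local name.
    """
    base = os.path.basename(urlparse(url).path) or "image"
    stem, ext = os.path.splitext(base)
    # First 8 hex digits of SHA-1 over the whole URL (collisions are very unlikely, not impossible)
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{stem}_{digest}{ext}"


a = unique_local_name("https://example.com/2019/a/photo.jpg")
b = unique_local_name("https://example.com/2020/b/photo.jpg")
print(a)
print(b)
```

If you adopt something like this, use the same helper in both places: when saving the file and when rewriting its URL in output.txt. That way the two names always match.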


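▲ The plain-text MT file to process must first be renamed to input.txt. It contains many remote image URLs, which the program detects and downloads automatically. Run the run.bat batch file to start.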

▲ The run.bat batch file is now running! The terminal shows whether each download succeeded or failed.
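When a download fails, the script tries again right away, with no pause between attempts. If a server is busy or rate-limiting requests, waiting a little longer before each new attempt may work better. The sketch below shows generic exponential backoff. The names with_backoff and fetch are hypothetical. fetch stands for any function that takes a URL and returns True on success, so you would need to adapt your download function to report success that way. The requests library can also retry by itself if you mount an HTTPAdapter configured with urllib3's Retry class on a Session.

```python
import time


def with_backoff(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, doubling the wait after each failure.

    `fetch` is a hypothetical interface: any function returning True on success
    and False on failure.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        if fetch(url):
            return True
        if attempt < attempts:
            print(f"Retrying {url} in {delay:g}s (attempt {attempt + 1}/{attempts})")
            sleep(delay)
            delay *= 2  # exponential backoff: 1s, 2s, 4s, ... with the default base_delay
    return False
```

The sleep parameter exists only so the function can be tested without actually waiting. In normal use, leave it as time.sleep.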

▲ When the terminal prints "ALL DONE!!!", processing is complete.

▲ The images subfolder in the project holds all the downloaded image files, and output.txt is the new plain-text MT file with its image URLs rewritten.
