Saturday, 19 July 2025

Download Large Files in Chunks Automatically Using curl and Python

Downloading large files from the internet can be time-consuming and error-prone. One efficient technique is to download the file in smaller parts (chunks) and merge them once all parts have completed. In this guide, we’ll show you how to automate and speed up chunked downloads by driving curl from Python with parallel threads.

Why Parallel Chunk Downloads?

  • Faster downloads, since several byte ranges are fetched at once
  • Easier recovery on poor connections: a failed chunk can be retried without restarting the whole file
  • Finer-grained control over large transfers

Requirements

  • Python 3.x
  • The requests library (pip install requests)
  • curl installed on your system
  • A server that supports HTTP Range requests (you can verify this with the check below)
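
You can verify Range support before running the script. Here is a minimal check using requests; the URL is a placeholder:

import requests

def supports_ranges(url):
    # Servers that honor partial requests usually advertise "Accept-Ranges: bytes".
    response = requests.head(url, allow_redirects=True)
    return response.headers.get("Accept-Ranges", "").lower() == "bytes"

print(supports_ranges("https://example.com/big.iso"))  # hypothetical URL

Some servers honor Range requests without advertising the header, so treat a negative result as inconclusive.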

Python Script for Parallel Download

Save the following code as parallel_chunk_download.py:

import os
import math
import threading
import subprocess
import requests

def get_file_size(url):
    # A HEAD request fetches only the headers, so we can read the size
    # without downloading the body.
    response = requests.head(url, allow_redirects=True)
    if 'Content-Length' in response.headers:
        return int(response.headers['Content-Length'])
    else:
        raise Exception("Cannot determine file size. Server does not return 'Content-Length'.")

def download_chunk(url, start, end, part_num):
    filename = f"part{part_num:03d}.chunk"
    # -r requests the byte range, -f makes curl fail on HTTP errors instead
    # of saving the error page as a chunk, and -L follows redirects.
    cmd = ["curl", "-s", "-f", "-L", "-r", f"{start}-{end}", "-o", filename, url]
    subprocess.run(cmd, check=True)

def merge_chunks(total_parts, output_file):
    # Concatenate the chunks in order, deleting each part once it is merged.
    with open(output_file, "wb") as out:
        for i in range(total_parts):
            part = f"part{i:03d}.chunk"
            with open(part, "rb") as pf:
                out.write(pf.read())
            os.remove(part)

def main():
    url = input("Enter file URL: ").strip()
    output_file = input("Enter output filename: ").strip()
    chunk_size = 100 * 1024 * 1024  # 100 MB

    total_size = get_file_size(url)
    total_parts = math.ceil(total_size / chunk_size)

    print(f"Total size: {total_size} bytes")
    print(f"Starting parallel download in {total_parts} chunks...")

    threads = []
    for i in range(total_parts):
        # Each range is inclusive on both ends; the last chunk is clamped
        # to the final byte of the file.
        start = i * chunk_size
        end = min(start + chunk_size - 1, total_size - 1)
        # One thread per chunk. For very large files this opens many
        # connections at once; see the Tips section below.
        t = threading.Thread(target=download_chunk, args=(url, start, end, i))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()  # wait for every chunk to finish before merging

    print("Merging chunks...")
    merge_chunks(total_parts, output_file)
    print(f"Download complete: {output_file}")

if __name__ == "__main__":
    main()
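
Run it with python parallel_chunk_download.py, then enter the URL and output filename when prompted. The part files are written to the current directory and removed as they are merged.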

How It Works

  1. The script sends an HTTP HEAD request (via requests) to find the total file size
  2. Divides the file into 100 MB chunks
  3. Spawns a thread for each chunk, each running curl with a specific byte range
  4. Merges all parts in order once every thread has finished
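
As an illustration of steps 2 and 3, take a hypothetical 250 MB file (262,144,000 bytes) at https://example.com/big.iso (a placeholder URL). The script computes three ranges, and each thread runs the equivalent of:

curl -s -f -L -r 0-104857599 -o part000.chunk https://example.com/big.iso
curl -s -f -L -r 104857600-209715199 -o part001.chunk https://example.com/big.iso
curl -s -f -L -r 209715200-262143999 -o part002.chunk https://example.com/big.iso

Ranges are inclusive on both ends, and the last range is clamped to the file’s final byte.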

Tips

  • Adjust chunk_size to match your bandwidth and the server’s limits: fewer, larger chunks open fewer connections, while more, smaller chunks parallelize better
  • Threads are enough here, since the actual transfer happens in separate curl processes; to cap the number of simultaneous connections, use a thread pool such as concurrent.futures.ThreadPoolExecutor
  • For unstable connections, make sure failed chunks are re-attempted (see the sketch below)
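
Here is a minimal retry sketch for the last tip. download_chunk_with_retry is a hypothetical helper, not part of the original script, and the attempt counts and delays are arbitrary; it combines curl’s built-in --retry flag with a Python-level retry loop:

import subprocess
import time

def download_chunk_with_retry(url, start, end, part_num, attempts=3):
    filename = f"part{part_num:03d}.chunk"
    # --retry lets curl handle transient network errors itself; the outer
    # loop re-runs curl if it still exits with a non-zero status.
    cmd = ["curl", "-s", "-f", "-L", "--retry", "3",
           "-r", f"{start}-{end}", "-o", filename, url]
    for attempt in range(1, attempts + 1):
        try:
            subprocess.run(cmd, check=True)
            return
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise
            time.sleep(2 * attempt)  # simple backoff before retrying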

Conclusion

Using Python and curl together allows you to automate and optimize file downloads, especially when working with large files. Parallel chunk downloading is an efficient and scriptable way to speed up your workflow.
