Downloading large files from the internet can be time-consuming and error-prone. One efficient technique is to download the file in smaller parts (chunks) and merge them after completion. In this guide, we’ll show you how to automate and accelerate chunk downloads using curl with parallel threads in Python.
Why Parallel Chunk Downloads?
- Faster downloads using multiple threads
- More stable over poor connections
- Improved control over large files
Requirements
- Python 3.x
- curl installed on your system
- A server that supports HTTP Range requests (a quick way to check is shown below)
- The requests Python package, used to read the file size (pip install requests)
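If you are unsure whether a server honors Range requests, a quick probe like the following can tell you. This is a minimal sketch using the same requests library; https://example.com/file.bin is just a placeholder URL:

import requests

def supports_ranges(url):
    # Ask for the first byte only; a server that honors Range replies with 206 Partial Content.
    resp = requests.get(url, headers={"Range": "bytes=0-0"}, stream=True, allow_redirects=True)
    return resp.status_code == 206 or resp.headers.get("Accept-Ranges") == "bytes"

print(supports_ranges("https://example.com/file.bin"))  # placeholder URL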
Python Script for Parallel Download
Save the following code as parallel_chunk_download.py:
import os
import math
import threading
import subprocess
import requests

def get_file_size(url):
    # A HEAD request is enough to read the Content-Length header.
    response = requests.head(url, allow_redirects=True)
    if 'Content-Length' in response.headers:
        return int(response.headers['Content-Length'])
    else:
        raise Exception("Cannot determine file size. Server does not return 'Content-Length'.")

def download_chunk(url, start, end, part_num):
    filename = f"part{part_num:03d}.chunk"
    # -L follows redirects (matching the HEAD request above); -r requests a specific byte range.
    cmd = ["curl", "-s", "-L", "-r", f"{start}-{end}", "-o", filename, url]
    subprocess.run(cmd, check=True)

def merge_chunks(total_parts, output_file):
    # Concatenate the parts in order, deleting each one after it has been written.
    with open(output_file, "wb") as out:
        for i in range(total_parts):
            part = f"part{i:03d}.chunk"
            with open(part, "rb") as pf:
                out.write(pf.read())
            os.remove(part)

def main():
    url = input("Enter file URL: ").strip()
    output_file = input("Enter output filename: ").strip()
    chunk_size = 100 * 1024 * 1024  # 100 MB

    total_size = get_file_size(url)
    total_parts = math.ceil(total_size / chunk_size)

    print(f"Total size: {total_size} bytes")
    print(f"Starting parallel download in {total_parts} chunks...")

    threads = []
    for i in range(total_parts):
        start = i * chunk_size
        end = min(start + chunk_size - 1, total_size - 1)
        t = threading.Thread(target=download_chunk, args=(url, start, end, i))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

    print("Merging chunks...")
    merge_chunks(total_parts, output_file)
    print(f"Download complete: {output_file}")

if __name__ == "__main__":
    main()
How It Works
- The script uses requests to find the total file size
- Divides the file into 100 MB chunks
- Spawns a thread for each chunk, each running curl with a specific byte range (illustrated below)
- Merges all parts after the download finishes
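To make the byte-range arithmetic concrete, here is a small sketch that prints the ranges each curl call would request; the 1 GB file size is just an example value:

import math

total_size = 1 * 1024 * 1024 * 1024   # example: a 1 GB file
chunk_size = 100 * 1024 * 1024        # 100 MB per chunk, as in the script
total_parts = math.ceil(total_size / chunk_size)

for i in range(total_parts):
    start = i * chunk_size
    end = min(start + chunk_size - 1, total_size - 1)
    # Each line corresponds to one "curl -r start-end" invocation.
    print(f"part{i:03d}: bytes={start}-{end}")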
Tips
- Adjust chunk_size for optimal performance
- Since each chunk is fetched by a separate curl process, threading works well for this I/O-bound task; consider multiprocessing only if CPU-bound post-processing (such as hashing) becomes the bottleneck
- For unstable connections, ensure partial downloads are re-attempted (a retry sketch follows this list)
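One simple way to re-attempt failed chunks is to wrap the curl call in a retry loop. The sketch below assumes the same part-file naming as the script above; the retry count and pause are arbitrary example values:

import subprocess
import time

def download_chunk_with_retry(url, start, end, part_num, retries=3):
    # Same byte-range download as download_chunk, re-attempted a few times on failure.
    filename = f"part{part_num:03d}.chunk"
    cmd = ["curl", "-s", "-L", "-r", f"{start}-{end}", "-o", filename, url]
    for attempt in range(1, retries + 1):
        try:
            subprocess.run(cmd, check=True)
            return
        except subprocess.CalledProcessError:
            if attempt == retries:
                raise
            time.sleep(2)  # short pause before the next attempt

curl also offers a built-in --retry option for transient network errors, which can be used alongside this loop.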
Conclusion
Using Python and curl together allows you to automate and optimize file downloads, especially when working with large files. Parallel chunk downloading is an efficient and scriptable way to speed up your workflow.