Intro
I have been following Modal for a while but never used it until last week. I recently wrote a blog post on using Axolotl to fine-tune a decoder LLM for the first time. For that work I needed to transcribe some YouTube podcasts with the OpenAI Whisper model. I figured it would be a cool use case for Modal and a chance to learn something new.
I’m no stranger to parallel processing with Python. I have used celery extensively with a Redis backend in production for doing all sorts of things, including deployment of ML models. It’s actually given me a good understanding of tasks, queues, concurrency, CPU/MEM usage, Redis, and so on. But I didn’t want to have to deal with all that for this personal project. I didn’t want to think about infrastructure, Kubernetes, helm charts, node groups, ec2 instances, etc. I just wanted to write some Python code and get the job done.
In this post I will describe how I used Modal to accomplish this task. I'm no expert in using Modal because this is the first project I've used it on. But I did love every bit of it. Honestly, Modal is amazing. It's pure magic. I cannot wait to use it for my next project.
Setup
First I signed up and logged in via GitHub. Then I simply followed the instructions on screen.
I created a virtual env and installed `modal`, as well as `python-dotenv` and `pytube`:

```shell
python3 -m venv env
source env/bin/activate
pip install python-dotenv
pip install pytube
pip install modal
python3 -m modal setup
```
Task Description
The Modal docs are amazing, so I’m not going to repeat that here. Just go read the docs and try some of the simple hello world type of examples.
For the task of transcribing a YouTube video with Whisper I want a Python function that will do the following:

- takes a YouTube video `url` as input –> `https://www.youtube.com/watch?v=<video_id>`.
- checks if the url, i.e. `<video_id>`, has already been processed by checking if the transcript json file `<video_id>.json` is already in s3. If so, then exit the function.
- saves the audio as an mp4 file to s3: `s3://<s3-bucket>/youtube_downloader/audio_files/<video_id>.mp4`.
- saves the audio transcript to s3: `s3://<s3-bucket>/youtube_downloader/transcripts/<video_id>.json`.
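The naming scheme above can be sketched as a small stdlib-only helper. This is just a local illustration of the key layout (the actual Modal function later in this post builds these paths inline with `os.path.join`):

```python
import re
import posixpath

S3_PREFIX = "youtube_downloader/"  # mirrors the S3_PREFIX env var used later


def s3_keys_for(url):
    """Derive the S3 object keys for a video's audio file and transcript."""
    # YouTube video ids are 11 characters from [a-zA-Z0-9_-]
    video_id = re.search(r"v=([a-zA-Z0-9_-]{11})", url).group(1)
    audio_key = posixpath.join(S3_PREFIX, "audio_files", f"{video_id}.mp4")
    transcript_key = posixpath.join(S3_PREFIX, "transcripts", f"{video_id}.json")
    return video_id, audio_key, transcript_key
```

Because the keys are a pure function of the video id, re-running the pipeline on the same url always points at the same objects, which is what makes the skip-if-already-processed check work.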
Then I can use the Modal `map` feature to fan out this function/task in parallel.
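Modal's `map` has the same shape as Python's built-in `map`: you hand it an iterable of inputs and iterate over the results. A rough local analogue using a thread pool (this runs on your machine and is only meant to illustrate the fan-out pattern; with Modal each call runs in its own cloud container):

```python
from concurrent.futures import ThreadPoolExecutor


def process_video(url):
    # Stand-in for the real Modal function defined later in this post.
    return f"processed {url}"


urls = [f"https://www.youtube.com/watch?v=video{i:06d}" for i in range(5)]

# Modal's process_video.map(urls) fans out like executor.map below,
# except each call lands on Modal's infrastructure instead of a local thread.
with ThreadPoolExecutor(max_workers=5) as executor:
    for msg in executor.map(process_video, urls):
        print(msg)
```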
Show me the Code
There are various ways to use secrets in Modal. One simple way is through `dotenv`. I defined some environment variables in a local `.env` file:

```
S3_BUCKET=<my-s3-bucket>
S3_PREFIX=youtube_downloader/
AWS_ACCESS_KEY_ID=<my access key>
AWS_SECRET_ACCESS_KEY=<my secret access key>
```
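Under the hood, `python-dotenv` just parses `KEY=VALUE` lines from that file and exports them into the process environment. A simplified stdlib-only sketch of that behavior (the real library also handles quoting, interpolation, and more; the `DEMO_` variable names here are made up for illustration):

```python
import io
import os


def load_env_text(text):
    """Simplified sketch of python-dotenv's load_dotenv: parse KEY=VALUE
    lines and export them, without overriding existing variables."""
    for line in io.StringIO(text):
        line = line.strip()
        # skip blanks and comments
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())


load_env_text("DEMO_S3_BUCKET=my-demo-bucket\n# a comment\nDEMO_S3_PREFIX=youtube_downloader/\n")
```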
Here is the code in a file `transcribe_video.py`:
```python
import os
import io
import json
import modal
from dotenv import load_dotenv

load_dotenv()
S3_BUCKET = os.environ["S3_BUCKET"]
S3_PREFIX = os.environ["S3_PREFIX"]

stub = modal.Stub("video-transcription")

# custom container image: debian-slim + ffmpeg + the Python dependencies
image = (
    modal.Image.debian_slim(python_version="3.10")
    .run_commands("apt-get update", "apt update && apt install ffmpeg -y")
    .pip_install(
        "openai-whisper",
        "pytube",
        "boto3",
        "python-dotenv",
    )
)


def upload_file(filename, s3_filename, bucket_name):
    import boto3

    client = boto3.client("s3")
    headers = {"ACL": "public-read"}
    headers["CacheControl"] = "max-age=%d" % (3600 * 24 * 365)
    client.upload_file(filename, bucket_name, s3_filename, ExtraArgs=headers)
    return f"https://{bucket_name}.s3.amazonaws.com/{s3_filename}"


def check_file(filename, bucket_name):
    import boto3
    from botocore.errorfactory import ClientError

    client = boto3.client("s3")
    file_exists = True
    try:
        client.head_object(Bucket=bucket_name, Key=filename)
    except ClientError:
        file_exists = False
    return file_exists


def upload_fileobj(file_object, s3_filename, bucket_name):
    import boto3

    client = boto3.client("s3")
    headers = {"ACL": "public-read"}
    headers["CacheControl"] = "max-age=%d" % (3600 * 24 * 365)
    client.upload_fileobj(file_object, bucket_name, s3_filename, ExtraArgs=headers)
    return f"https://{bucket_name}.s3.amazonaws.com/{s3_filename}"


def dict_to_s3(record, s3_filename, bucket_name):
    # serialize the dict to an in-memory JSON file and upload it
    in_mem_file = io.BytesIO()
    in_mem_file.write(json.dumps(record, sort_keys=True, indent=4).encode())
    in_mem_file.seek(0)
    upload_fileobj(in_mem_file, s3_filename, bucket_name)
    return f"https://{bucket_name}.s3.amazonaws.com/{s3_filename}"


@stub.function(
    image=image,
    secrets=[modal.Secret.from_dotenv()],
    cpu=2,
    memory=1024 * 3,
    gpu="A10G",
    timeout=600,
)
def process_video(url):
    import re
    from pytube import YouTube
    import whisper

    video_id = re.search(r"v=([a-zA-Z0-9_-]{11})", url).group(1)
    file_name_audio = f"{video_id}.mp4"
    s3_file_name_audio = os.path.join(S3_PREFIX, "audio_files", file_name_audio)
    s3_file_name_transcript = os.path.join(S3_PREFIX, "transcripts", f"{video_id}.json")
    # skip videos that already have a transcript in s3
    if check_file(s3_file_name_transcript, S3_BUCKET):
        print(f"Already processed {s3_file_name_audio}. Skipping.")
        return
    # download the audio stream and back it up to s3
    yt = YouTube(url)
    audio_stream = yt.streams.get_audio_only()
    audio_stream.download(filename=file_name_audio)
    upload_file(file_name_audio, s3_file_name_audio, S3_BUCKET)
    # transcribe video
    model = whisper.load_model("small", device="cuda")
    audio = whisper.load_audio(file_name_audio)
    # audio = whisper.pad_or_trim(audio)  # useful for debugging
    result = model.transcribe(audio, fp16=True)
    # for debugging in ipython shell
    # modal.interact()
    # import IPython
    # IPython.embed()
    return dict_to_s3(result, s3_file_name_transcript, S3_BUCKET)


@stub.local_entrypoint()
def main():
    from pytube import Playlist
    import random

    all_urls = set()
    for playlist_url in [
        "https://www.youtube.com/playlist?list=PL8xK8kBHHUX4NW8GqUsyFhBF_xCnzIdPe",
        "https://www.youtube.com/playlist?list=PL8xK8kBHHUX7VsJPqv6OYp71Qj24zcTIr",
        "https://www.youtube.com/playlist?list=PL8xK8kBHHUX5X-jGZlltoZOpv5sKXeGVV",
    ]:
        p = Playlist(playlist_url)
        for url in p.video_urls:
            all_urls.add(url)
    urls = random.sample(list(all_urls), 20)
    for msg in process_video.map(urls):
        print(msg)
```
You can execute it with Modal using `modal run transcribe_video.py`.
I think there are more optimizations that could be done, such as caching the model download, but I have not looked into it.
Some comments on the code:
- You define your own custom container `image` and install whatever you like on it.
- For the task function `process_video` we simply decorate it with `@stub.function` and can specify things such as the `image`, mem/cpu resources, secrets, etc. In this case we can even run it on an A10G GPU for faster inference with Whisper. I increased the timeout because I was transcribing longer videos.
- I am using `from pytube import Playlist` to list out the video urls in a given YouTube playlist within the `main` function. That logic runs on my local machine. You can add whatever logic you want there, different video urls, etc. The main idea is that `process_video` takes the video `url` as input. You can increase the sample size in `urls = random.sample(list(all_urls), 20)` to whatever you like. This command, `modal run transcribe_video.py`, can be run over and over again, and any video that was already transcribed and saved to s3 will be skipped automatically.
- `process_video.map(urls)` fans out the tasks, and it runs entirely on Modal's cloud infrastructure. Not locally!
The output is saved to s3.
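Once transcripts land in s3 you can pull them down and work with them locally. Whisper's transcription result is a dict containing (among other keys) the full `text` plus timestamped `segments`, and that is what `dict_to_s3` serializes. A small sketch of reading such a record back (the sample transcript below is made up):

```python
import json

# A made-up record shaped like openai-whisper's transcribe() output.
sample = json.dumps({
    "language": "en",
    "text": " Hello and welcome to the podcast.",
    "segments": [
        {"id": 0, "start": 0.0, "end": 2.5, "text": " Hello and welcome"},
        {"id": 1, "start": 2.5, "end": 4.0, "text": " to the podcast."},
    ],
})

record = json.loads(sample)
full_text = record["text"].strip()
# Segment-level timestamps are handy for aligning text back to the audio.
timestamped = [(s["start"], s["end"], s["text"].strip()) for s in record["segments"]]
print(full_text)
```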
Jumping into Ipython Shell for Interactive Debugging
You can even jump into the `ipython` shell on the server at any point in the function logic to debug. That is pretty cool! Just uncomment these lines in `process_video`:

```python
# for debugging in ipython shell
modal.interact()
import IPython
IPython.embed()
```
I also uncommented `audio = whisper.pad_or_trim(audio)` during debugging so the transcription covered just the first 30 seconds, and hence the code runs much faster.
Then just run the code in interactive mode with `modal run -i transcribe_video.py`.
I did this several times to check the output of `result = model.transcribe(audio, fp16=True)`. I also debugged the files being saved locally on the image container. It gave the same feeling as developing locally, but it was all running in the cloud! I did not have to get ffmpeg and OpenAI Whisper running locally on a GPU or mess around with cuda drivers. It just worked, surprisingly. Here are some screenshots of the interactive `ipython` shell:
Conclusion
I hope this got you excited to go and try out Modal. They actually give you $30 of compute each month. And the dashboard is great. It can show you detailed logs, resource consumption, and the billing is very transparent (updated in real time).