Too Much AI Content to Watch? 3 Smart Ways to Turn YouTube Videos into Text Fast
Let the bots watch the hour-long video - your job is to skim the five-minute summary.
I love listening to AI podcasts, and I also subscribe to several AI YouTube channels. There's so much content, though, that it's impossible to listen to or watch it all. Often, instead of watching a video for an hour, I just want a quick text-based summary.
The question is: what's the best way to do this? In this article, I will focus on YouTube videos, and in another article, I'll address podcasts. As my example, I'll use a video from Riley Brown. I like his content a lot - he's an absolutely incredible vibe-coder. Last week he released a video titled "Can I Clone a $250M App Better Than a Pro Developer?" (link here:).
Three ways to get transcripts:
OPTION #1:
Many YouTubers have transcripts enabled on their videos. If they do, open the video, click the "...more" button just below it, then click "Show transcript" at the bottom of the description. The full transcript appears on the right-hand side of the video, and you can copy and paste it from there.
You have to select all the transcript text manually. The downside is that when you paste it somewhere else for further processing, it comes out in this format:
0:00
Can a Vive coder make a better app than a senior iOS engineer? Well, that's exactly what we're going to try and find
0:06
out today. In this video, I Riley Brown, a Vive Coder, am taking on Vishall Dwey,
0:12
a senior iOS developer who's been building iOS apps for the past 10 years. We're going head-to-head to see who can
0:19
build a better clone of Granola, a $250 million app with multiple AI features.
0:26
We each only have five prompts to build this app and we can only use AI. And in
Really not the greatest format. It works, but it's kind of ugly. You can take that text and dump it into your favorite LLM to do more processing and get a good summary.
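If you'd rather tidy the paste before handing it to an LLM, a few lines of Python will do it. Here's a minimal sketch. The function name clean_transcript and the timestamp pattern are assumptions for illustration, based on the paste shown above (a timestamp alone on one line, followed by a line of speech). If a video has chapter titles mixed into the transcript, they will pass through as regular text.

```python
import re

# Assumption: a timestamp sits alone on its line, e.g. 0:06, 12:34 or 1:02:03.
TIMESTAMP_LINE = re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?$")


def clean_transcript(raw: str) -> str:
    """Drop timestamp-only lines and join the remaining text into one paragraph."""
    kept = []
    for line in raw.splitlines():
        text = line.strip()
        # Skip blank lines and lines that contain nothing but a timestamp.
        if not text or TIMESTAMP_LINE.match(text):
            continue
        kept.append(text)
    return " ".join(kept)


# First few lines of the pasted transcript from the example above.
sample = """0:00
Can a Vive coder make a better app than a senior iOS engineer? Well, that's exactly what we're going to try and find
0:06
out today. In this video, I Riley Brown, a Vive Coder, am taking on Vishall Dwey,
0:12
a senior iOS developer who's been building iOS apps for the past 10 years."""

print(clean_transcript(sample))
```

This only removes the timestamps and rejoins the broken lines. It does not fix misheard words like "Vive coder", so you still want the LLM for the real cleanup and the summary.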
OPTION #2:
If you've been following my writing for a while, you know I am a massive fan of Google's AI Studio. It lets you access the full power of the Gemini models and do so much more. If you enter AI Studio and open a new chat using the Gemini 2.5 Pro model, towards the bottom right-hand side of the screen is a small toggle labeled "URL Content". Turn that on, paste the link to the video, and give it some instructions, something like "Create a detailed transcript from this video:"
What's great is that you can see how many tokens the video uses. In this case, this 55-minute video took 987K out of the 1M tokens. Just barely fit into the context window. But when you then ask it to give you a detailed transcript, you get a much better version that looks like this:
Can a VibeCoder make a better app than a senior iOS engineer? Well, that's exactly what we're going to try and find out today. In this video, I, Riley Brown, a VibeCoder, am taking on Vishal Dubey, a senior iOS developer who's been building iOS apps for the past 10 years. We're going head-to-head to see who can build a better clone of Granola, a $250 million app with multiple AI features. We each only have five prompts to build this app, and we can only use AI. And in this video, we're going to see whose app is better. And so Vishal is going to be using Claude Code, and I'm going to be using a tool made specifically for vibe coding mobile apps. And you're going to be the judge of whose app is better in the comments. And we're going to be giving $1,000 of credits to three people who vote in the comments below. And if you like videos like these, make sure to hit that like and subscribe button because if this video does well, we're just going to keep doing them. Let's not waste any more time. Let's dive into the video.
I could have asked for timestamps - but that's not something I needed. The best part is that now that the full transcript is in memory, you can ask follow-up questions, such as "What were the three most important takeaways from this video?" or "What were the best quotes from this video that I should reference in a blog article?"
OPTION #3:
What happens, though, if the video you want to access doesn't have a transcript built in, or it's too big to fit in Gemini's 1M token context window? Here's the link to a recent Diary of A CEO video I watched (link here:) that is an hour and 32 minutes long. If you try to load it, it comes to over 1.6 million tokens, making it too large for even Gemini. Now this one happens to have a transcript, so you could use option 1, but I often come across videos that don't have transcripts, and I really want the content from them.
So, what do you do? The easiest thing to do is to get a tool like “4K Video Downloader+.” It allows you to paste in the URL for a YouTube video and download the video file, or just the audio from it. You can try it for free for 10 videos, after which you will need to pay. I forked over the $45 for lifetime unlimited access, as this is a tool I use a lot. If you download the YouTube video as a video file, you will have to go through an extra step to extract the audio (on the Mac, load the file into the QuickTime Player app and extract the audio). But with 4K Video Downloader+, you can paste in the URL for the video and tell it to download the audio only (it downloads as a .m4a file).
For that 1.6M token video, the audio file comes in at 177K tokens, definitely small enough to fit in the Gemini context window. In fact, you could have a video that is approximately 5x as long (about 7.5 hours), and if you extract just the audio, you could still process it. Load that audio file up and say "Generate a detailed transcript from this audio file," and you'll get a fantastic word-for-word transcript. You can then ask any follow-up questions or conduct further analysis on the text as needed.
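If you want to sanity-check whether a file will fit before you upload it, the numbers above are enough for a back-of-the-envelope estimate. The sketch below derives tokens-per-second rates from the figures in this article (roughly 1.6M tokens for the 1 hour 32 minute video, 177K for its audio) and works out how much content fits in 1M tokens. The function names are made up for illustration, and the rates are estimates from one video, not official numbers.

```python
CONTEXT_WINDOW = 1_000_000  # Gemini context window size quoted in this article


def tokens_per_second(tokens: int, hours: int, minutes: int) -> float:
    """Average token cost per second of content, from one observed upload."""
    seconds = hours * 3600 + minutes * 60
    return tokens / seconds


def max_hours(rate: float, budget: int = CONTEXT_WINDOW) -> float:
    """How many hours of content fit in the budget at the given rate."""
    return budget / rate / 3600


# Figures from the 1 hour 32 minute video discussed above (approximate).
video_rate = tokens_per_second(1_600_000, 1, 32)
audio_rate = tokens_per_second(177_000, 1, 32)

print(f"video: ~{video_rate:.0f} tokens/sec, ~{max_hours(video_rate):.1f} hours fit")
print(f"audio: ~{audio_rate:.0f} tokens/sec, ~{max_hours(audio_rate):.1f} hours fit")
```

By this estimate, full video runs out of room at a little under an hour, which matches the 55-minute video barely fitting. Audio-only comes out at a bit more than eight hours with nothing else in the context, so the 5x figure above leaves some headroom for your prompt and the model's response.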
I hope you found this helpful. I've gone through this process several times in the past few weeks and figured it was worth sharing since not everyone knows how to do this! Once you have the transcripts in Gemini (or any other LLM), you can do all sorts of fantastic analysis.
If you enjoyed this article, please subscribe to my newsletter and share it with your network! Looking for help to really drive the adoption of AI in your organization? Want to use AI to transform your team’s productivity? Reach out to me at: steve@intelligencebyintent.com
Magnus and I are on a weeklong adventure up in Oregon, and he’s loving exploring the parks and the river.