12 minutes

Summarise.live

Gorov's project uses AI to summarize YouTube videos, providing key insights and chapters. It's available as a Chrome plugin for content-heavy users. Aiming for high quality, Gorov is open to user feedback and collaboration for further development.
00:00

>> As it's coming, so hello everybody. My name is Gorov and I just want to thank Lucas and a few other people here for giving me the opportunity to present here. It's because I was an engineer maybe like 10 years ago. I would say I was a 0.7x engineer. Now I'm a 1.1x engineer thanks to Cursor, GitHub Copilot and so on. So this is like my side project. I'm getting back into the engineering world and it's given me some aha and wow moments.

00:29

So I'm trying to solve a problem of my own. So how many of you have like YouTube videos in your watch later list now? Okay. How many of you have more than 20? That's quite a few. How many of you know that you'll never go and watch it again? Okay, great. So I think there's a boom of amazing videos and podcasts, partly it's therapy for a lot of people who just like start a new video. But there's a lot of gems in it as well. I'm like trying to figure out how I can get to the insights.

01:02

So I started building something thanks to all of the cursors. I want to show you what it is about. So before I begin, I know there were some questions technically. So I actually drew how I'm doing this. So, Summarise.live is basically like a quality summariser of long videos. It's currently with YouTube. So basically I just take the transcript from YouTube. But with their own APIs, it's not very well done in terms of like speaker labeling or punctuation and so on. But that's okay because LLMs can understand that.

01:34

I try and punctuate if you're looking for the transcript. And I'm trying to use a local model that works in the browser itself. If there is compute, I try and do that. Otherwise, if the user is not looking for punctuation, I try and go after like essentially figuring out the chapters or topics if it's a really long, like one-plus-hour, video. And then for every chapter, I try and figure out what the highlights or what the insights are and then you get like maybe a couple of sentences that really tell you what the gist of the conversation is. Today on YouTube, you'll see sections that talk about different topics. That's actually coming from the person who uploaded it.
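The per-chapter step described here could be sketched roughly as follows. This is only an illustration in Python, not the plugin's code: `group_by_chapter`, `highlights_per_chapter`, and the pluggable `summarise` callable (standing in for an LLM call) are hypothetical names, and the transcript is assumed to arrive as (start time, text) pairs.

```python
from bisect import bisect_right
from typing import Callable

def group_by_chapter(segments: list[tuple[float, str]], starts: list[float]) -> list[str]:
    """Join transcript text per chapter.

    segments: (start_seconds, text) pairs (assumed transcript shape).
    starts: sorted chapter start times in seconds.
    """
    if not starts:
        return []
    buckets: list[list[str]] = [[] for _ in starts]
    for t, text in segments:
        # Index of the last chapter that starts at or before t.
        idx = max(bisect_right(starts, t) - 1, 0)
        buckets[idx].append(text)
    return [" ".join(b) for b in buckets]

def highlights_per_chapter(segments, starts, summarise: Callable[[str], str]):
    """Run a summariser (hypothetical stand-in for an LLM call) on each chapter."""
    texts = group_by_chapter(segments, starts)
    return [{"start": s, "highlights": summarise(t)} for s, t in zip(starts, texts)]
```

Passing the summariser in as a function keeps the grouping logic independent of whichever model ends up producing the highlights.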

02:10

But it might not be the best thing. It might just be some questions they ask. But I believe we can do much better in like segmenting or chapterisation to actually get to the insights, right? So that's the process in the back end. Of course, you know, I started with YouTube, but I think there's a lot of opportunity to go after other platforms. But let me just show you a demo. So some principles of how I built this: you know, reading is faster than listening, and you'll never have time to watch all these videos with more and more coming.

02:42

The second is if you are productivity-focused and you're looking at a particular topic that you want to understand, there's actually a lot of content there. Often people come and say, hey, listen to this or take care of this. But you don't have the time, and you want to quickly skim and know if it's worth your time. So I'm trying to solve the problem, which is, hey, can you skim and figure out if you want to watch it? And then once you skim, to go deeper. I'm yet to build the deeper bits. But let me show you an example.

03:11

So I actually, as a demo, wanted to take one of Lucas's talks. So I went to his site, and do you have a recommendation, Lucas, which one I should go after? >> The photogrammetry one. >> Okay, great. So this is quite a long talk, right? So I actually built a Chrome plugin, so that is the Summarise.live button. And then you can go directly. So what it does now is pull the transcripts from YouTube. And okay, so by the way, all of the UI and the components actually came from Cursor.

03:54

So I divide the context into what I think of as key takeaways. So if you just have one or two minutes, this is what you would read. And Lucas is a good person to dictate quality here, because he actually spoke about it, right? So that's the key takeaways. And then on the left, you start seeing that I'm actually chapterizing. If the chapters exist on YouTube, I'm using that. Otherwise, I try and create that myself. Either based on when I know a question is being asked in the transcript, so I just use regex for that now, but I think I can do better. Or not more than a seven to ten minute window.
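A minimal sketch of that fallback, in Python for illustration only: use the uploader's chapters when they exist, otherwise start a new chapter when a question is detected or when the current chapter reaches the upper bound. The `Segment` type, the question pattern, and the 60-second minimum gap are assumptions; the talk only says "regex" and "seven to ten minutes".

```python
import re
from dataclasses import dataclass

@dataclass
class Segment:
    """Hypothetical transcript segment: start time in seconds plus caption text."""
    start: float
    text: str

# Crude question detector (an assumption; captions often lack "?",
# so leading question words are checked as well).
QUESTION_RE = re.compile(
    r"\?\s*$|^\s*(?:so\s+)?(?:what|why|how|when|where|who|do|does|can|could)\b",
    re.IGNORECASE,
)

MAX_WINDOW_S = 10 * 60  # upper bound on chapter length, per the talk
MIN_WINDOW_S = 60       # hypothetical floor so back-to-back questions do not fragment chapters

def chapter_starts(segments, uploader_chapters=None):
    """Return chapter start times in seconds."""
    if uploader_chapters:
        # Chapters supplied by the uploader take priority.
        return sorted(uploader_chapters)
    if not segments:
        return []
    starts = [segments[0].start]
    for seg in segments[1:]:
        elapsed = seg.start - starts[-1]
        is_question = bool(QUESTION_RE.search(seg.text))
        if elapsed >= MAX_WINDOW_S or (is_question and elapsed >= MIN_WINDOW_S):
            starts.append(seg.start)
    return starts
```

The time cap guarantees a split even in long monologues where no question is ever asked.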

04:34

And so that figures out what the topic is, and then I create the context. So this is like the quick key takeaways, right? We're talking about machine learning, impact on developers and so on. But then if you really want to go deep, you can actually go to a particular chapter. So this could be like, does Unity empower artists, right? And so that's a couple of things. So if you actually have no idea what scriptable render pipelines are, it turns out that LLMs are pretty good at linking to Wikipedia. So okay, maybe this is not the best example, but you can quickly go and figure out that there is something there.

05:12

So all of this was happening live, I've never done this before. The plugin is basically no login, no bullshit basically. I don't want to know who you are, you can just run it on Chrome by yourself. Of course I limit the number based on your IP, how many times you can run it, because I'm paying for OpenAI costs. But what I've done is essentially said, hey, if you want to run it forever, just pay me $20. And I actually, thanks to my group of Slack communities and so on, managed to get like 30 plus subscribers over the last month, just by iterating with them.
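The per-IP limit with a paid bypass could look something like the sliding-window sketch below. It is an illustration under assumptions, not the actual backend: the `IpRateLimiter` class, the limit of five runs, and the 24-hour window are all hypothetical, since the talk gives no numbers.

```python
import time
from collections import defaultdict, deque

class IpRateLimiter:
    """Sliding-window limiter keyed by IP address (hypothetical sketch)."""

    def __init__(self, max_runs=5, window_s=24 * 3600, clock=time.monotonic):
        self.max_runs = max_runs    # assumed free quota per window
        self.window_s = window_s    # assumed window length in seconds
        self.clock = clock          # injectable clock, useful for testing
        self._runs = defaultdict(deque)

    def allow(self, ip, is_subscriber=False):
        """Return True if this request may run; records it when allowed."""
        if is_subscriber:
            # Paying users are not limited.
            return True
        now = self.clock()
        runs = self._runs[ip]
        # Drop timestamps that have aged out of the window.
        while runs and now - runs[0] >= self.window_s:
            runs.popleft()
        if len(runs) >= self.max_runs:
            return False
        runs.append(now)
        return True
```

Keying on IP keeps the no-login promise, at the cost of users behind a shared address also sharing one quota.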

05:47

And I'm excited to see where this goes. Talking about where it goes, I actually like topic graphs like these. So essentially to take a big long video and, if you want to really understand what's going on, to be able to create these tree graphs. So I'm trying to work on this, and it actually came as feedback from some of the academicians who have these long videos to watch. I have a product background, I help founders with go-to-market and stuff. So I'm actively thinking about distribution, the customer segments that I have to go after. I have a few in mind, but I know this is a technical community.

06:20

So I've been very excited to actually explore all of these different models, what the current prompt should be, focusing on quality, how to do the chunking, and making sure that this is as high quality as possible. Of course, if you're interested in a particular thing, you can just play. >> [INAUDIBLE] >> Very specific content for your needs. >> Oops, sorry. >> And it really helps you achieve the results that you want. >> While giving you a ton of your. >> So you can just play the video at the particular time.
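One way to jump to a chapter's start is a YouTube watch link with the `t` query parameter. The sketch below assumes that approach; the plugin may instead seek the embedded player directly, and `chapter_link` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def chapter_link(video_id: str, start_seconds: float) -> str:
    """Build a watch URL that starts playback at the given offset."""
    # YouTube accepts t=<seconds>s; fractional seconds are truncated here.
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"
```

For example, `chapter_link("abc123", 408.7)` yields a link that opens the placeholder video `abc123` at 6 minutes 48 seconds.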

06:48

And what is interesting is, so people are mostly interested in the takeaways and the summaries, which is here. But then I also have tags on the left, so you kind of broadly know the input before the tree structure comes in. I just wanted to also show my local environment, so let's see. So this is my local environment and I just want to show you what the transcript looks like from YouTube. It's not great, right, with punctuation and so on. So I'm running like a transformer from Hugging Face, which runs in the browser, essentially in a web worker, to punctuate each of these transcripts.

07:33

Of course, it takes a lot of CPU power, but I'm trying to see if I can fix that as well. But this is more for like deeper use cases, if you really want to go read the transcript. Not many people want to do that. That's it, so I would love for you to check it out. I'm not tracking anything, it's called Summarise.live. And I'm going to be introducing not only the tree structure, but also different languages. So I know a lot of you might speak different languages. And so I would love for you to test and then give me some feedback.

08:00

Thank you very much. >> [APPLAUSE] >> So one burning question. >> Yeah. >> Or two? >> Do you think it's sustainable? I mean, I can imagine that YouTube is working quite hard on shipping a feature like that. >> What I understand, actually, even if you go take the same YouTube video and then put it up on Google Gemini, it comes up with some kind of a summary, but it's not very deep with the chapters and so on.

08:36

And so that's another reason why I started working on it. This was before the whole 1 million context window with Gemini. That being said, they're focused on different kinds of problems, right? So this is a very unique use case, essentially I'm optimizing a workflow. So there are a few people who are willing to see the value of it. So I think that's kind of the segment I'm going after. The challenge generally in the AI app layer is like there's somebody who can build this at any point in time, right? And so I'm not too concerned about it as long as I'm able to cater to a few segments. So I'm just focusing on building a good product but also thinking about distribution, because there's no differentiation in the product.

09:14

I don't know if I answered your question. >> Hi, thanks for the presentation. I definitely could use something like this, and I'm kind of using an alternative. But I was curious, what is the big, bright future, the big features that you see that you'll be building next to take this to the next level, maybe differentiate more? >> I don't know. >> [LAUGH] >> That's the true answer. >> Thank you for being honest.

09:52

>> Yeah, yeah, yeah, I have no idea. I think that for me, I really want to focus on quality. I mean, you guys know if something is written by ChatGPT, right? Internally, you know, when you read it. So I don't want to be that. So I'm trying to figure out what prompting techniques can get me there. The other is also really thinking about learning. So if you're concerned about sleep and there's this eight-hour podcast about sleep, what is it that makes sense to you? I think that's the question I'm trying to answer, instead of you searching the transcript or looking at a particular kind of advice.

10:23

I don't know how I'm going to do that, but basically the folks that I've sent this to, who paid for it, keep giving me some feedback. So I'm essentially wearing this product hat and trying to go back to the engineering hat, builds, and so on. So yeah, it's not like a complicated product. It's pretty simple workflow optimization. >> Can I put in my own API key, an OpenAI API key, or even customize the API endpoint in your plugin? Or do you even plan to do it? Or is it not reasonable? >> So I didn't think of that use case.

11:07

I am interested, so if you want to get access to that repository, happy to share it with you. So you can have your own API keys, right? So I am using Gemini, I'm using OpenAI, and I'm also using Grok. I'm trying to figure out both prompts and context windows for different use cases, chapterization and summary. But you'll have to play around with it to know what works well for you. Currently, I've just defaulted to Gemini for the summary of a chapter, and then I'm using OpenAI for the chapterization. >> Actually, there are some open source models as well, which helps, which you can try or I can try.

11:50

And it could work not even worse, the same quality, but cheaper or even free. >> Cool, yeah, I'd love to know how to talk to you.