Whisper Clipboard

00:00

So hi, I'm Kanish. I'm actually from the US, and I'm only in Amsterdam for a month for work. I've been to an AI Tinkerers event in SF, and I liked it, so I decided to come to one here. For context, me and my friend have been working on this tool for a bit.

00:21

And it's called Whisper Clipboard. Basically, a lot of the time when we want to craft a prompt for a language model, we want to intersperse questions or thoughts we have with context about the problem. And this helps us do that.

00:45

So for example, let's say I'm reading an algorithms textbook, and this is a description of the solution to the N-Queens problem. So one thing I can do is hit Command Shift X to start recording, and I'll ask, hey, Claude, I don't quite understand this description of the N-Queens algorithm.

01:10

Can you explain it? Here it is. So that's my recording. And then I can highlight the part that I want to include. And whenever I copy it, that copied part will show up there as well.

01:25

And then I can give it some more context. And here is a more formal pseudocode description of it. Could you maybe please also give me an example that works through it? And then I can highlight the next part, and it'll show up as well.

01:47

And the cool thing, the UI feature, or I guess UX feature, of this that I like a lot is that it's like a queue. I can, like, Control V. Control V will go to the next one. Next one. Next one. And I can press Enter.
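
The queue behavior described here can be sketched roughly as follows. This is only an illustration under assumptions, not the app's actual code: the names `Clip`, `ClipQueue`, `add_transcript`, `add_copy`, and `paste_next` are hypothetical, and the real tool's hotkey and clipboard handling is not shown.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clip:
    kind: str  # "voice" for a transcribed recording, "copy" for copied text
    text: str


class ClipQueue:
    """Hypothetical model: clips are kept in capture order, one is handed back per paste."""

    def __init__(self) -> None:
        self._clips = deque()

    def add_transcript(self, text: str) -> None:
        # Called when a recording has been transcribed.
        self._clips.append(Clip("voice", text.strip()))

    def add_copy(self, text: str) -> None:
        # Called when the user copies a highlighted selection.
        self._clips.append(Clip("copy", text.strip()))

    def paste_next(self) -> Optional[str]:
        """Return the oldest unpasted clip, or None once the queue is empty."""
        if not self._clips:
            return None
        return self._clips.popleft().text

    def __len__(self) -> int:
        return len(self._clips)


# Recreate the demo: voice note, copied text, voice note, copied text.
queue = ClipQueue()
queue.add_transcript("Hey Claude, I don't quite understand this description. Here it is.")
queue.add_copy("<copied textbook paragraph>")
queue.add_transcript("And here is a more formal pseudocode description of it.")
queue.add_copy("<copied pseudocode>")

# Each paste takes the next clip, so pasting repeatedly rebuilds the prompt in order.
pasted = []
while (clip := queue.paste_next()) is not None:
    pasted.append(clip)
prompt = "\n\n".join(pasted)
```

A first-in, first-out queue is what keeps the spoken instructions and the copied context interleaved in the order they were captured.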

01:59

So this is trying to reduce the activation energy of using stuff like Claude, where we need to give context from multiple sources along with instructions. It intersperses them. And yeah, that's basically it.

02:17

[APPLAUSE] This is very cool. I'm curious. Essentially, you're kind of maintaining a clipboard-like thing. Would that be fair? Yes.

02:32

How do you build that context across different kinds of-- if it's the web browser, or even if it's just like an application that you're using, and also both text and voice. How do you bring all of these things together? Because as a user, you're always thinking about, I'm in this app or on that web browser or this tab.

02:52

So how do you think about that? So right now, we're triggering off of the Command C, like, set of key presses. And this app actually started out more as just a transcription app that my friend made for himself. Because he kind of injured his fingers.

03:08

He got RSI from, like, playing too much basketball. So he wanted to, like, make a transcription app. And so right now, all we're doing is Command C. But we have seen other transcription apps that, for example, will take in the current focused window as context when feeding your transcription to models like Whisper.

03:31

So for example, if I'm in my messages app, and I then hit Command Shift X here and start recording, some other apps will take what's on your screen and feed it as context into Whisper. And the value of this is, if there are proper nouns, like names or Twitter handles or stuff like that, Whisper will actually be able to get them.
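
As a rough sketch of that idea: the talk does not say how those other apps build their context, so the heuristic below is purely an assumption. The function name `build_whisper_prompt`, the regexes, and the sample screen text (with made-up names) are all hypothetical. It pulls likely names and @handles out of on-screen text and joins them into a short hint string.

```python
import re


def build_whisper_prompt(screen_text: str, max_terms: int = 20) -> str:
    """Collect likely proper nouns and @handles from on-screen text (crude heuristic)."""
    # Handles such as @someone.
    handles = re.findall(r"@\w+", screen_text)
    # Runs of capitalized words, e.g. "Priya Raman". This will also catch ordinary
    # sentence-initial words, so a real implementation would need better filtering.
    names = re.findall(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b", screen_text)

    # De-duplicate while preserving first-seen order.
    terms = []
    for term in handles + names:
        if term not in terms:
            terms.append(term)
    return ", ".join(terms[:max_terms])


# Made-up screen contents standing in for a messages window.
screen_text = "Priya Raman: did you see the post from @quietkernel about Rotterdam?"
hint = build_whisper_prompt(screen_text)

# With the open-source openai-whisper Python package, a string like this could be
# passed as the initial_prompt argument of model.transcribe(...), which biases
# decoding toward the spellings it contains. That call is not made here.
```

Keeping the hint to a short list of terms, rather than the full screen contents, is a design choice made here for simplicity and is not something the talk specifies.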

03:54

So this is like a future extension that I've seen in other transcription apps. But right now we only trigger off of Command C. I imagine going through a longer research session that the clipboard environment could get quite noisy. And have you thought of-- like in my experimentation, I found that some models are better if you have a series, a stack of clips, and some are completely outside of the topic of what the others are.

04:27

I mean, have you thought of grouping them or sort of making some judgment around, yeah, this looks like you just copied a message from a random DM, as opposed to, this is topical to the prompt that you're currently building? We don't have that yet.

04:42

Actually, he fixed it this morning so I could demo it. And we have not released it to anyone yet. Yeah, we don't have that yet. But on more of the transcript side, he does have plans to add more LLM-based features to the transcripts.

05:04

I've started a job, so I've stopped working on this. If you're interested in using it, I'll forward, like, your Twitter to him. He would love for you guys to try it out. Yeah. Just curious how you thought about this.

05:25

Now you-- I think it's awesome that you can very easily sort of create context around copying something, transcribing, et cetera. But you still now have to paste it into Claude. What is the reasoning behind not actually just hitting Enter on the clipboard, basically, and getting the answer there? Or have you thought about that? I just haven't thought about it.

05:47

This-- the stack-based queue, it's fairly easy. You just press Command V five times. Yeah, we just didn't think about it. OK. OK. Yeah.
