16 minutes

speedrun blooper reel of a year of ai app development.

In this video, I share how AI has made my life easier through automating daily tasks. From transcriptions to dealing with official letters, AI assists. But, human supervision is key.
00:00

I spent the last 15 years building a company called Unity. We made a game engine that powers most of the games on your phone and on your PlayStation, on your Xbox. And this talk is about me being really fucking annoyed about things that should exist, but they don't. And things that shouldn't exist, but they do. A few weeks ago I hurt my knee in the gym. Oh, actually I should say this first. A lot of the stuff in this talk is annoyance about AI being so great and all these models being so great and open source models and Llamas being so great and everything's amazing. But in my life shit isn't amazing. I do a lot of boring stuff that I don't want to do all the time and I hate it.

00:52

And I'm here going to complain about that for 15 minutes to you. I hurt my knee a while ago in the gym, so I went to the physiotherapist. You would think that the experience is kind of like this, where they check out your knee, but actually the experience is more like this. Where this dude is typing into the computer, very slowly, answers to questions already answered by my driver's license, which is right in front of him. 25 minutes into this intake, time is up. I have to leave and my knee still hurts. Makes me fucking pissed off. And since I have some time on my hands now, whenever I'm pissed off I try to do something about it. So for this guy I figured I'm going to build an app, and I made it a website. It's called Scravené for me.

01:43

It basically listens to a conversation, and then when you're done with the conversation you click a button saying you're done. And then you get some kind of report at the end. And it's not really a report like meeting notes. It's the kind of report that the person who was having the conversation wants to have. So this one I made for the huisarts, which is like a doctor or GP. When they have to put my information in the computer system, it has to be in a very specific format. For every complaint you have to do like subjectief, objectief, evaluatie en plan. So we made sure that whatever was said in the conversation comes out in that format, so that the doctor can actually spend some time on me instead of on his slow computer.

02:30

You'd think that making it would be kind of simple, or at least I assumed so, but it was harder than I thought. Because going from a microphone to audio transcription, if the conversation is like an hour, at the end of the hour you don't want to wait for like five or seven or ten minutes for the thing to be transcribed. So I figured there should really be something that does real-time transcription. So either I'm not good at using the internet and it already exists, and that would be silly. But I figured like, okay, I guess I have to do it myself. Fucking annoying. So we go from the browser's microphone over a WebSocket, it goes to my server, and then on the server I have to pipe it to ffmpeg to chunk it up in like 30-second segments.

03:23

And then I send the 30-second segments to Whisper, actually I send them to the one from Groq. Yeah, what about the fucking rate limits, dude? We cannot work like this. I'm working on it. I get 60 requests a day, like this is unusable. So we have these 30-second segments and you get back the transcript for that. And it kind of works, but it really breaks at the edges of these segments, right? If the segment cuts right in the middle of a word, you might get like a weirdo word at the end of a segment or like a weirdo word at the beginning of the next one. So to fix that, we do like overlapping chunks. So I take a few seconds of the previous segment and basically double it up, put it in front of this one. It makes it so that the transcript actually has a lot of double stuff in it, but at least you catch every word.
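A minimal sketch of the overlap idea described above, assuming 30-second segments with the last 3 seconds of the previous segment prepended. The function name and the 3-second overlap are illustrative assumptions, not details from the talk; the talk only says "a few seconds".

```python
def overlapping_windows(total_seconds, segment_seconds=30.0, overlap_seconds=3.0):
    """Return (start, end) times in seconds for each chunk to transcribe.

    Every chunk after the first starts overlap_seconds early, so a word cut
    at a segment boundary appears whole in at least one chunk.
    """
    if overlap_seconds < 0 or overlap_seconds >= segment_seconds:
        raise ValueError("overlap must be non-negative and shorter than a segment")
    windows = []
    start = 0.0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        # Prepend the tail of the previous segment (clamped at 0 for the first one).
        windows.append((max(0.0, start - overlap_seconds), end))
        start += segment_seconds
    return windows


# A 70-second recording becomes three chunks; the second and third
# repeat the last 3 seconds of the chunk before them.
print(overlapping_windows(70))
```

The repeated audio means the raw transcript contains duplicated words at each boundary, which is why a cleanup pass afterwards is needed.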

04:17

And then you just fix it up with an LLM at the end. You basically explain the shit you were doing and it fixes it up. You know what Whisper actually returns if one of these 30-second chunks is kind of empty? Yeah, usually it hallucinates something. It does. You would get shit like this. So I'm actually going to watch another YouTube. I'll just mention that it's interesting. Turns out it's Polish for "Please like my YouTube channel below", like my content. And on another day when you send an almost empty chunk, which actually kind of happens in an hour-long conversation sometimes, you know, it happens a lot, you would get a copyright notice on top of it.

05:03

I don't actually have a dog in the whole AI model copyright fight, but to openly admit to it like this, I still find that kind of hilarious. So yeah, with these empty chunks it feels like you opened up a portal to the bottom half of the internet. I thought it was funny. All right, more things that piss me off. You know how you get an invoice on your phone and then you want to pay it, but you have to copy-paste the IBAN out of the thing, and then you have to do the same thing for the amount and for the account holder and for the invoice number. Fuck that shit, that should not exist. It really should not.

05:56

Fucking annoying. So meet Alfred. Alfred is my personal assistant. You can tell because he looks chronically miserable. It is an OpenAI assistant inside of a Telegram bot. It works like this. I get an invoice on my iPhone and then I use the built-in share feature up there. I click it and then, there you are, I see Alfred in my friends list, and then I send it to Alfred and Alfred comes back saying like, "Hey, great. I'm going to pay this invoice to Sneaky with this IBAN number, with this account, you know, like basically with all the data that you don't have to copy anymore." Every time I use this, it's great.

06:49

It feels so great. It actually works. It's also not always correct. Let me get to that. So this is one of the first invoices I tried to do. It's a video invoice. I'm going to zoom in a little bit, or maybe not. It's not my invoice. But it's one of the first ones I tested. The total amount here is like 36 euros, 26. You know what is in the text layer of the PDF? Zero, zero, zero. It's in the text layer of this PDF.

07:21

It is 3626. You know what's not in the PDF text layer? The fucking comma. They draw the comma on top with an image layer. Image-based is just the only way. Exactly. So bad. Actually, a lot of the AI stuff I've been digging around with, it comes back to that. Like, whenever it doesn't work, you just have to make it work like a human would. So now Alfred basically converts it to an image and then sends it to the OpenAI Vision model. And then that gets it out. And that was a bit shaky, but now with the Omni model that came out, it actually got a lot better.

08:02

You could power the EU with all the W3C people that are spinning in their graves. Alright, so that was one funny problem. Another one is sort of like, how it works. So I share a message to Alfred and Telegram sends my server a webhook. My server receives it and then starts talking to OpenAI and function calls, and it verifies the IBAN number with the checksum and all that stuff. And then it does a bunch of stuff. And then it does an API call to my bank to actually pay the money. Because if it was fake, it would be no use. But what happens if, after you make the API call to the bank, there's a server bug?
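The IBAN checksum step mentioned above can be sketched like this. This is the standard mod-97 check for IBANs, not necessarily the exact code Alfred runs; the function name is an assumption, and the sketch does not validate country-specific lengths.

```python
def iban_checksum_ok(iban: str) -> bool:
    """Check an IBAN's two check digits using the mod-97 rule."""
    s = iban.replace(" ", "").upper()
    # IBANs are 15 to 34 ASCII letters and digits.
    if not (15 <= len(s) <= 34) or not s.isascii() or not s.isalnum():
        return False
    # Move country code and check digits to the end.
    rearranged = s[4:] + s[:4]
    # Replace letters with numbers: A=10, B=11, ..., Z=35 (digits stay as-is).
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # A valid IBAN leaves remainder 1 when divided by 97.
    return int(digits) % 97 == 1


print(iban_checksum_ok("GB82 WEST 1234 5698 7654 32"))  # widely published example IBAN
print(iban_checksum_ok("GB82 WEST 1234 5698 7654 33"))  # last digit changed
```

The check catches any single mistyped digit, which makes it a cheap sanity test on whatever the model extracted before money moves.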

08:52

Then the Telegram webhook sees an HTTP 500 internal server error. And I didn't really think about that. But what do you think Telegram does when it gets an HTTP 500? It doesn't throw out your message. That wouldn't matter. That wouldn't matter if that actually happened. It calls you again. And again, until you run out of money, which is a true story that actually happened. Fucking hell. But yes, it needs babysitting, so I had to fix a bunch of stuff. So now before it actually sends money, it does this Telegram multiple choice thing where you can say yes or no.
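One common way to defuse the retry loop described above is to make the webhook handler idempotent. The sketch below is an assumption about how that could look, not the fix used in the talk: it remembers each update's `update_id` (a real field on Telegram updates) before doing anything with side effects, and always answers 200. The names `handle_update` and `pay_invoice` are hypothetical.

```python
# Hypothetical in-memory store; a real bot would persist this across restarts.
processed_update_ids = set()


def handle_update(update: dict, pay_invoice) -> int:
    """Process one webhook delivery and return the HTTP status to send back."""
    update_id = update.get("update_id")
    if update_id in processed_update_ids:
        # A redelivery of something already handled: acknowledge, do nothing.
        return 200
    # Record the update BEFORE the side effect, so a crash after paying
    # cannot lead to a second payment when the same update is delivered again.
    processed_update_ids.add(update_id)
    try:
        pay_invoice(update)
    except Exception:
        # In a real bot this would be logged and surfaced to a human.
        # Still acknowledge, so the sender does not redeliver the update.
        pass
    return 200
```

The trade-off is that a failed payment is not retried automatically; with money involved, asking a human is arguably the safer default.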

09:49

And I added a database to check if it paid this exact invoice before, and it checks if you already paid money to this person before. But yeah, it's sort of in production in the sense that I wrote it only for me, but it is paying all my invoices. My next up for Alfred is getting money back from the Deutsche Bahn. I'm not sure if there's any fans here of Deutsche Bahn. I am not. I went on a night train to Basel. It was supposed to look like this. But when I got into the train at 11pm, it looked like this. And they say something about some German bureaucracy. So my next task is actually for Alfred to go fix this problem. To search the internet for how to get my money back.
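The two database checks described above (was this exact invoice paid before, and has this payee been paid before) could look roughly like this. It is a sketch using SQLite from the Python standard library; the table layout and all function names are assumptions, since the talk does not say which database or schema is used.

```python
import sqlite3


def open_ledger(path=":memory:"):
    """Open (or create) the payments ledger. Schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS payments ("
        " iban TEXT NOT NULL,"
        " invoice_number TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL,"
        " UNIQUE (iban, invoice_number))"  # the database itself refuses duplicates
    )
    return db


def already_paid(db, iban, invoice_number):
    """True if this exact invoice was recorded as paid before."""
    row = db.execute(
        "SELECT 1 FROM payments WHERE iban = ? AND invoice_number = ?",
        (iban, invoice_number),
    ).fetchone()
    return row is not None


def known_payee(db, iban):
    """True if any earlier payment went to this account."""
    row = db.execute("SELECT 1 FROM payments WHERE iban = ? LIMIT 1", (iban,)).fetchone()
    return row is not None


def record_payment(db, iban, invoice_number, amount_cents):
    """Record a payment; raises sqlite3.IntegrityError on a duplicate invoice."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO payments (iban, invoice_number, amount_cents) VALUES (?, ?, ?)",
            (iban, invoice_number, amount_cents),
        )
```

Storing the amount in cents avoids the comma problem from the PDF story: 36,26 euros is recorded as 3626.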

10:41

To download a bunch of PDFs or whatever it needs to do, fill them in, send them out, and then do whatever it needs to do. Email follow-ups, phone call follow-ups, checking if the money is back in my account. It's kind of a stretch goal, but you're trying to destroy Germany. I'm just trying to get to Basel. I'm just trying to get to Basel. So these are all the things that I hope it will one day do, but it doesn't do yet. More things that piss me off: when you get a letter like this. I'm going to zoom in a little bit. This letter is a bottle check. It's from the Openbaar Ministerie, and it says that for the meeting on January 5th, 2023 at 2pm at the Openbaar Ministerie on the Prins Clauslaan in The Hague, it is very important that you do not show up.

11:37

I'm doing it. So this is actually a project that my girlfriend started to help people that get letters, you know, that get annoying letters from the IRS or, you know, overly complicated letters, to help deal with them. It's called the Lees Simpel app. It's actually very popular and it gets used a lot. It works like this. You grab a letter, you grab your phone, you use the app to take a photo, and then you get a compliment. And then you get a summary in very simple to understand language, in a very visually pleasing, calm manner. And people use this, like people use this a lot.

12:24

It's a free app, sort of our community service project. This compliment is actually a fun thing about the project. Whenever you take a picture and you get this, everyone just lights up. You see people smile, they physically relax, the shoulders drop back. It's the best part of the app. It's also complete bullshit. At this point, we haven't even looked at the photo. It's just a placeholder in the beginning, but everyone gets triggered by it. They get triggered by it so strongly that we just decided to keep it, like, you know, if it makes the world a better place, let's keep it in. One of the first versions of the app had the problem, I guess, that every AI wannabe app has.

13:09

If you take a picture of a white wall, you would get a summary like this, with a completely confabulated bullshit letter that never existed. So we had to check for that. Another thing we learned on the project: we have a Mixpanel with all these stats about the app and installs and all these things. Then we have OpenTelemetry data to check out the latency of all the different bullet points that slide in, and the first one versus the rest of them. Then we have latency distribution graphs over time of the different language model providers, and all this stuff. We basically never fucking use it. We never use it.

13:51

What we really use is a Telegram channel, and the server just sends an emoji when someone installs the app. You get a text message when someone leaves a message. We use that 100 times more than all this fancy-pants stuff. I have one more that didn't actually happen, but we're just going to pretend that it happened. This guy. Take a moment to let this sign sink into your brain, and I would love for someone to answer the question if on Tuesday at 11am I am allowed to park or not. Anyone? No. That's the first one, right? Anyone at all? I just want to find out.

14:39

If you throw this thing into the new Omni model and ask it if you're allowed to park, it will actually write you a Python program, according to the stuff on the sign, to figure it out. Then if you ask it to turn that Python program into a much more user-friendly sign, it will actually make this. Wow. And it's actually correct. So, you know, with a whole room of really smart people, and only one and a half, maybe, knowing if Tuesday at 11am is okay for parking. If that's not AGI, I don't know what the fuck it is. That's it for me. Some of the stuff in this talk was slightly exaggerated, but I thought it would be okay because if anyone cared about accuracy, there wouldn't be an AI model.

15:38

Please, it really sucks that I have to build all this stuff. Why don't you build all this stuff that solves my problems? That would be great. If you do, I would love to check it out. You can find me at lucasmeijer. I host a small AI hackathon from time to time. It's like 15 people. If you'd like to join one day, come find me, and that's it. [applause]