>> I think I can. Hello, do you hear me? >> No. >> Turn it on. >> Do you hear me now? >> Yes. >> Perfect. Hello, everyone. I'm just going to quickly go through the presentation for one minute, and then the rest is going to be just demoing what we are building. My name is Amir, and I love building things. I have been building Unbody since 2018, through five side projects, and last February, together with my friend, we turned it into a little startup based in Rotterdam. Today I'm going to talk about what we are solving and how we are solving it.
>> Is this supposed to work? >> That's for the music, I think. >> No, it's fine. No, it's all right. Okay, so quickly, the context: we are seeing that AI-first is becoming the new mobile-first. It's becoming a standard; every little website or app is expected to have AI features: semantic search, generative search, and so on. A lot of problems could be solved, but we don't see many of them actually being solved, and for one reason: complexity.
But before getting into that, let's look at what we mean by an AI-native app, what we mean when we talk about AI in applications and websites. At Unbody, we break this down into three pillars: one is personalization, the second is context-awareness, and the third is proactive interaction. Personalization can happen when we have access to data, when you can personalize for a particular set of data. That data can come in different formats, structured or unstructured: a PDF file, an audio file, or a lot of records in your database. The second part is context. Context matters, and an AI-native app should be aware of it. Context is nothing but all the data we create and produce on a daily basis, scattered across various platforms and places in various forms and formats.
The third is basically a shift in the mental model of the user experience. We are no longer looking for information; we are looking for answers. We are not going to click around to find a particular answer. We just want to ask a question and get the answer back from the computer. We learned that we can start a dialogue with computers once ChatGPT came around. That means we need to offer applications that not only understand human language but are also able to speak it. So let's put these three parts together and see what it takes to make a quick chat app. The problem is complexity. AI is complex. Here is a simple diagram: you want a simple chat, but built on top of your PDFs on Google Drive.
You need to put together ten different frameworks, a lot of complexity, and that's assuming you are not running your own LLMs; this is just wiring toolchains together to make one particular application. That makes it super complicated. And this is the flow diagram of a RAG, a retrieval-augmented generation, pipeline. It's a very difficult one: a lot of processes, eight to nine different steps involved. What we do is simplify this into one plug-and-play solution. How does it work? Basically, you tell it where your data is. Then you set up your AI stack: you say, "Okay, I want to use this model from Hugging Face or OpenAI," or, say, a model from Gemini, etc.
Then once you have that done, the setup is complete and it gives you a GraphQL endpoint that you can start building with. Today we are going to see what it does. Let's start by going to the website. There is a button here where you can open the dashboard, and you can start by creating a project. A project is basically what your data is scoped to; you have a tenant, and one project is one isolated set of data. One project, let's say, M06. Then here you can select from the presets. You can say, "Okay, I mainly want search functionality." You can select from either our preset models or from OpenAI and Cohere. From each of these you can select different variations or generative features, or you can go with use cases.
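Just to make that GraphQL endpoint concrete before going further: here is a minimal sketch of what a query against it could look like. The endpoint URL, headers, class name (GoogleDoc), and fields are assumptions for illustration, not the verified Unbody contract.

```ts
// Minimal sketch, assuming a Weaviate-style GraphQL schema behind the
// project endpoint. URL, headers, and field names are illustrative.
const ENDPOINT = "https://graphql.unbody.io"; // assumed endpoint

async function semanticSearch(concept: string) {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer <API_KEY>", // assumed auth scheme
      "X-Project-Id": "<PROJECT_ID>",    // assumed project header
    },
    body: JSON.stringify({
      query: `{
        Get {
          GoogleDoc(nearText: { concepts: ["${concept}"] }, limit: 5) {
            title
            autoSummary
          }
        }
      }`,
    }),
  });
  const { data } = await res.json();
  return data.Get.GoogleDoc;
}
```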
I assume there are a lot of people in this room who know a lot about AI, so let's go with the advanced options. You can set up your own text vectorizer. You can select open-source ones, the usual transformers, which we self-host. Or you can go with, for example, OpenAI and choose one of the text embedding models, or Cohere, for example their multilingual models, etc. The same goes for the image vectorizer, because we are going to build a multimodal RAG. For Q&A modules there are a couple of options, two of them at the moment, and it's really fine to have as many of them as possible. Or you can also have the typical generative module on top of that.
That would be, again, from providers; for now we support OpenAI and Cohere. We don't have a self-hosted model for this yet, but one will be added soon. You can also have some content enhancement modules: you can enable or disable auto-captioning of images, and auto-transcription of video and audio comes by default once you set it up. And by the way, you can use that without any rate limit. [laughter] Automatic summarization, automatic keyword extraction, etc. And a few utilities like re-rankers, spell check, etc. Let's create the project.
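That is, the project's stack now amounts to something like the following. This object is purely illustrative, a summary of the dashboard options just described, not Unbody's actual settings schema; the module names are assumptions.

```ts
// Illustrative only: NOT the real settings schema, just a summary of
// the knobs picked in the dashboard above.
const projectStack = {
  textVectorizer: "openai/text-embedding-ada-002", // or self-hosted transformers, Cohere multilingual, ...
  imageVectorizer: "multi2vec-clip",               // needed for a multimodal RAG
  qna: "qna-transformers",                         // one of the two Q&A modules
  generative: "openai/gpt-3.5-turbo",              // OpenAI or Cohere; self-hosted coming later
  enhancers: {
    autoCaption: true,     // image captioning (just released)
    autoTranscribe: true,  // video/audio transcription
    autoSummary: true,
    autoKeywords: true,
  },
  utilities: ["reranker", "spellcheck"],
};
```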
Now we can start adding sources, right? We have various sources, from Google Drive to Discord messages, GitHub, etc., and we can add them to the project. Local folders will also be supported. Let's go with Google Drive. The next step is to connect it, so here I'm allowing Unbody to access those files. Once that is done, if I go here, you can see I have My Drive, shared drives, and my computer; under the shared drives there is an Unbody drive. So if I go back to this screen, you see they automatically get listed. From My Drive or the Unbody drive, it doesn't matter: whether you select a folder or the entire drive, that's the entry point, that's where the data comes into Unbody from. So let's select one folder, demo one, and press Initialize. At this moment it's going to go index and process all the files.
From this point on, any changes you make are automatically processed by Unbody, so you never need to come back here. Once this initialization is done, you basically have the project ready. Let me save some time: I have this project already connected to two GitHub repositories, one Discord channel, and one Google Drive. To give you a bit of context, in this Google Drive I have various files: a video about Tarantino, a couple of Markdown files about Unbody, and another two articles about Tarantino. Each of them can contain multiple images, text, etc. And a table with data about Maradona; by the way, I generated all of this.
So I don't know everything that's going on in there, but I'm going to use it. And an article about electric cars. I'm going to use this folder and the GitHub repositories for the demo. So the first demo, let's go with semantic search, right? Pretty straightforward; everyone knows what semantic search is. So here, instead of "Tarantino," I can say "a movie director," and then search, and the top two results are going to be Tarantino. Or I can select different sources; for example, here I can search on top of Discord messages. This is pretty basic stuff, right? Now I can ask something like how the TypeScript client works, and: no results. Why? Because we are searching our Google Docs, and the Google Docs contain general-purpose material; they don't know anything about the TypeScript client. But as soon as I change this to our GitHub repositories, for example GitHub comments, and ask again, it's going to find the answer and generate a response, with examples and so on.
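As a sketch of that source switch in code, assuming Unbody's JavaScript client and its query-builder style (the package name and exact method chain here are assumptions, not verified API):

```ts
import { Unbody } from "@unbody-io/ts-client"; // assumed package name

const unbody = new Unbody({
  apiKey: process.env.UNBODY_API_KEY!,
  projectId: process.env.UNBODY_PROJECT_ID!,
});

// Same question, two different sources. Against Google Docs
// (general-purpose material) this finds nothing useful...
const fromDocs = await unbody.get.googleDoc
  .search.about("how do I use the TypeScript client?")
  .limit(2)
  .exec();

// ...while against the GitHub source it finds the answer.
const fromGithub = await unbody.get.githubComment
  .search.about("how do I use the TypeScript client?")
  .limit(2)
  .exec();
```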
So you can basically do all of these together; you can chain all the sources together. Another example I want to show you: I have a messy folder with thousands of files, the kind you probably have too, a lot of screenshots, a lot of invoices that I have no idea what's going on with. So yesterday I was just sitting at home and, because we had just released image captioning yesterday, I thought, okay, and I built this demo quickly with Unbody. I put all my data into Google Drive and processed it via Unbody. So here I can ask something; let's start with something simple. For example, you can search "Finder icon," right? And the first result is the Finder icon.
Or you can say, for example, "the meme about London," or "flying to London." I was giving a talk in London a few weeks ago, and I made this meme before going there, so you understand. And then if you remove "the meme" from the query and search again, it finds my boarding pass. So now I can also start extracting information, because it already knows it. Another cool thing: you can say, for example, "dark theme UI," and it shows all the dark-theme UIs. Or you can say, for example, "marketing slide from an Unbody presentation." I think I have one marketing slide.
Yeah, it's one of these; this is the first one. And this is still without re-ranking, right? This is just right out of the box. And now I can actually do a bit of generative stuff too. For example, I can search "a cool or fun evening at an Irish pub," and it finds exactly that, because that's exactly where we were. Now let's move on to another example, where you can start asking questions. For example, we can ask what the presentation at MongoDB was about. This is the talk I gave the other week at MongoDB. Let's see if it can find the answer based only on the screenshots that I have. Right? So here you go.
"The presentation at MongoDB was about what an AI-native app is, by Unbody, featuring MongoDB." Again, this is with no text material, no PDFs, only screenshots. Imagine hooking all your other data into it. We have a workshop in about ten days on how to build these. But now let's move on to the coolest part, which is the code. You all saw the complexity, right? And this is basically the entire code you need to make one search. Or let's make it a bit more complicated: this is the entire code you need with Unbody to build an entire RAG pipeline. Image block, select this particular field, search about the query, limit to the top two results, then generate by grouping all these results together and doing this task based on these properties.
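In code, the chain being described looks roughly like this (same assumed client as in the earlier sketch; `autoCaption` as the selected field and `generate.fromMany` for the grouped generation are my guesses at the names shown on screen):

```ts
// The whole RAG pipeline as one chained call: select a field, search
// about the query, keep the top two results, and generate one answer
// grouped over those results. Method names are assumptions.
const { data } = await unbody.get.imageBlock
  .select("autoCaption", "originalName")  // "select this particular field"
  .search.about("What was the presentation at MongoDB about?")
  .limit(2)                               // "limit to the top two results"
  .generate.fromMany(                     // group all results together...
    "Answer the question using these screenshot captions:",
    ["autoCaption"]                       // ...doing the task based on these properties
  )
  .exec();
```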
And done. And we get the answers back very, very fast, faster than any comparable API out there. The goal is to make this easy mainly for people outside the AI realm, right? Because most of us here know AI; you have a lot of knowledge in AI. But the problem is that something like 80 or 90 percent of developers out there are not AI engineers. They are front-end developers, iOS developers, Kotlin developers. And they don't know this complexity. That's one of the reasons we have a huge gap between how AI is advancing in terms of technology and science and how real-world applications are coming along.
Because those developers don't have access, they don't have the tools; they don't know what vector databases are, they have no idea what re-ranking is. Trust me, they are senior developers and they don't know it. And they don't need to know. So that's why we built Unbody: to simplify this process. Let me check if I have, oh yeah. So again, just to show you: if you look here, you can see image block, Google Doc, text block, text document, Discord messages, GitHub content, video files, audio files, etc. All these processes can be applied on top of any type of file, as the sketch below illustrates.
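Because every source is normalized into those content types, one query shape covers all of them. A hedged sketch (class and field names assumed from the schema shown on screen):

```ts
// One GraphQL request across several content types; names assumed.
const query = `{
  Get {
    GoogleDoc(nearText: { concepts: ["Tarantino"] }, limit: 2) { title }
    DiscordMessage(nearText: { concepts: ["Tarantino"] }, limit: 2) { content }
    ImageBlock(nearText: { concepts: ["Tarantino"] }, limit: 2) { autoCaption }
    VideoFile(nearText: { concepts: ["Tarantino"] }, limit: 2) { originalName }
  }
}`;
```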
You can chain them together with just a few lines of code. Actually, technically this is just one line of code in JavaScript; I don't know what you'd call it in Python or other languages, but it's just a chain of commands. But yeah, that was my talk. I hope you enjoyed it. Thank you. [applause] >> So why does the UI make me select which source something came from? >> It doesn't have to be that way. Those are just the preset use cases we made for developer communities, to simplify it.
But you can of course chain them all together. Right. It doesn't really matter; this is just for the sake of clarity. That's it.