
Fine-tuning LLMs for function calling

In this video, I present a novel method for fine-tuning LLMs to enhance their ability to call functions. My approach leverages the LLM's existing general knowledge and adds specification registration, enabling it to generate values accurately and efficiently.
00:00

…and compact. Java is altogether different, with camelCase and its own conventions, and Python is different again, so not every model understands every convention equally well. Most people choose the familiar option, say a Python function signature, but we found that JSON schema is much better placed: you don't generate code or a tool spec tied to a particular programming language; you generate a JSON spec instead, and it also handles complex arguments better. There are still problems generating this, though. Even with a JSON schema, just as with a Python function signature, there are complex parameter types, say an object nested inside another object, and if it gets too complicated or too deeply nested, it's still a bit difficult. But JSON schema is easier, and I'll explain why it's easier on the next slide, I guess.
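As an illustration of the two spec styles being compared, here is a small sketch; the function and field names are my own, not from the talk.

```python
# Illustrative only: the function and field names are mine, not from the
# talk. Two ways to describe the same tool to an LLM.

# Style 1: a Python function signature. Compact, but conventions differ
# per language (camelCase in Java, snake_case in Python), and nested
# parameter types get awkward fast.
def search_movies(genre: str, year: int, filters: dict | None = None) -> list[dict]:
    """Search movies by genre and release year."""
    ...

# Style 2: a JSON-schema tool spec. Language-neutral, and complex or
# nested arguments are spelled out explicitly.
search_movies_spec = {
    "name": "search_movies",
    "description": "Search movies by genre and release year.",
    "parameters": {
        "type": "object",
        "properties": {
            "genre": {"type": "string"},
            "year": {"type": "integer"},
            "filters": {
                "type": "object",
                "properties": {
                    "min_rating": {"type": "number"},
                    "language": {"type": "string"},
                },
            },
        },
        "required": ["genre", "year"],
    },
}
```

The JSON form reads the same no matter which language the caller works in, which is roughly the portability point made above; the nested `filters` object also shows where plain signatures start to strain.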

00:58

So, as I keep saying, preserving the base model's capabilities is very important, even when you are only doing a simple fine-tune. If you do very specific instruction tuning, the model loses its existing knowledge: it can only generate full tool calls, or only some mix between regular chat and tool calls. It loses the base knowledge; for example, it may no longer handle multi-turn conversation, or behave normally when you want a regular answer but still have tools available. You don't want real-time information every single time, right? Sometimes you just want to ask what movies were released in 2020 and get the list; "what movies are playing here" is something you might ask next, and that could be a tool call. Models fail at exactly that point: not every turn needs to be an API call or a tool call, so preserving the base model capabilities matters a lot.

One thing that helped here was curating the number of training samples and increasing the data quality; the data played an important role. In fact, what I have observed with LLMs is that if you have good data for your fine-tuning, and you use tools to clean up a bunch of these open-source datasets into a proper instruction fine-tuning dataset, that gets you most of the way to a good model, like GPT. Most of these strong models have good data, with good responses curated along the way. So if you are ever looking at fine-tuning, make sure you have a good dataset; that will take care of a lot of model and LLM issues downstream. I'll skip over the parts about LoRA serving versus full fine-tuning and so on, because it's taking more time.

I want to talk more about constrained generation instead. What we did with this specific model, the V2, which is a departure from the approach we used in the past, is that the tool spec is registered up front, like I showed in the app: the entire spec, with all the parameters. The only tokens the model generates are the values for the attributes, because the structure is already given (see the sketch below). The model performs better this way; since it understands what is to be generated, it has better context. There are still problems: it might not always fill in exactly the right parameters, but it fills the value tokens correctly most of the time, and it does better on many benchmarks, like MT-Bench. We also have a benchmark of our own that I can show. It enforces the structure, and the generation speed is also remarkably good; it is really, really fast.

This is the benchmark I was talking about. There are a bunch of benchmarks, and I think the results are really good when you compare against GPT-4o and that family of models that we are training and custom-tuning against. There is another function-calling model you may have heard of, Hermes 2 Pro, with data and other resources from Nous Research; it's an open-weights model that you should definitely consider looking at.
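To make the constrained-generation idea concrete, here is a toy sketch under my own assumptions; it is not the actual implementation from the talk. The JSON skeleton comes straight from the registered spec, so the model never emits braces, quotes, or key names, only value tokens; `sample_value` stands in for real LLM decoding.

```python
import json

# Toy sketch of the constrained-decoding idea, under my own assumptions;
# not the actual implementation from the talk.

SPEC = {
    "name": "search_movies",
    "parameters": {
        "type": "object",
        "properties": {"genre": {"type": "string"}, "year": {"type": "integer"}},
        "required": ["genre", "year"],
    },
}

def sample_value(prompt: str, key: str, schema: dict) -> object:
    """Placeholder for the LLM: return a value constrained to `schema`."""
    canned = {"genre": "comedy", "year": 2020}  # pretend model output
    return canned[key]

def constrained_call(prompt: str, spec: dict) -> str:
    # Keys and punctuation are copied from the spec itself; only the
    # values are "generated". That enforces valid structure and cuts the
    # number of free tokens, which is where the speed win comes from.
    props = spec["parameters"]["properties"]
    args = {k: sample_value(prompt, k, props[k])
            for k in spec["parameters"]["required"]}
    return json.dumps({"name": spec["name"], "arguments": args})

print(constrained_call("find comedy movies from 2020", SPEC))
# {"name": "search_movies", "arguments": {"genre": "comedy", "year": 2020}}
```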
If you want to give it a shot on the playground with some of the tool calls, just scan the QR code; you can try it out there, and if you have more questions I'm happy to answer those as well. This is the model page if you want to download and run it. It's a 70B model, so it's really heavy, but you can quantize it. I'm looking to take some time, maybe over the weekend, to see if I can put it on Ollama so that people can run it; some quantized 70B models are already on Ollama at around 40 GB. You can probably run it if you have a Mac with an M-series Max or Ultra chip; it still runs, if slowly, but otherwise it's a bit harder to run models at that scale.

[Audience] Where do you see these function-calling models a year from now? Beyond just more accuracy and more functions, what do you see as the more exciting direction?

I come from a search background; I used to work at Elasticsearch, and search used to power e-commerce and a lot of those solutions. With RAG it has all become exciting: it's easy to build a nice experience for customers, whereas in the old days building the search bar itself took a lot of effort. Now, with function calling, it's not just about search; it's also about gathering real-time information. Say you have data in S3 or similar object storage, in a specific schema: you can use a function-calling model, register a bunch of functions, and start querying that data directly (sketched below, after the next answer). You are effectively shortcutting the SQL, the glue programs, the database layer. Eventually you can build great workflows on top, and that will be another wave. You may also stop reaching for a traditional database for everything; probably you'd still use it for storage, but I think that's where this is going, and search is going to be exciting. There are more enterprise search use cases and things like that; we're also building something in this area, still under wraps, but more will come.

[Audience] You mentioned the importance of data quality today. Can you tell us a little about how you curated the training data? Was it generated?

It's a combination of synthetic data and, to be honest, what we have open sourced; most of it is also custom-made for what you want the model to do. It took quite some time. In fact, we started this process early on and released the V1 along the way; it took a while to get good data, and by then we knew the problems. We saw the 8x22B mixture-of-experts wave come in, and that wave passed; then Llama 3 came in, and of course it performed really well. So it took some time, but it's a mix of synthetic and open-source data, plus choosing the samples carefully, like that tool call we were talking about: samples with multi-turn conversations, and samples that are not just RAG-style, where you dump in all the tool calls, the model trains on that, and it ends up always wanting to call a tool.
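Going back to the S3 workflow idea from a couple of answers above, here is a hedged sketch of what registering a function over stored data and completing the round trip might look like with an OpenAI-compatible client. The endpoint, model name, and the `query_s3_dataset` helper are illustrative assumptions of mine, not the speaker's actual setup.

```python
import json
from openai import OpenAI

# Hedged sketch, not the speaker's setup: the endpoint, model name, and
# the `query_s3_dataset` helper are all placeholders.
client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")

tools = [{"type": "function", "function": {
    "name": "query_s3_dataset",
    "description": "Run a filtered query over a dataset stored in S3.",
    "parameters": {"type": "object",
                   "properties": {"table": {"type": "string"},
                                  "filter": {"type": "string"}},
                   "required": ["table"]},
}}]

def query_s3_dataset(table: str, filter: str = "") -> str:
    """Stand-in for the real data-access layer."""
    return json.dumps({"rows": 1234})  # fake result for illustration

messages = [{"role": "user", "content": "How many orders shipped in 2020?"}]
resp = client.chat.completions.create(
    model="your-function-calling-model", messages=messages, tools=tools)
msg = resp.choices[0].message

if msg.tool_calls:  # the model chose to call the function
    call = msg.tool_calls[0]
    result = query_s3_dataset(**json.loads(call.function.arguments))
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": result}]
    final = client.chat.completions.create(
        model="your-function-calling-model", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)  # it answered without needing the tool
```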
Yes. It does really well up to about 30; for example, we have a use case where someone is using more than 100 specs. I don't know exactly how far it scales, but you can definitely do 20 to 30 and it works really well, and we have used it with multiple tools. I built a small demo as well; let me show that. It still responds to normal questions, you see; it doesn't always try to call a tool. That's really the crux of it: we don't want to kill the core model capabilities and make it function-call-only. We want it to work well for normal conversations and call a function only when it's really needed. If I'm learning about the history of Nvidia, it should not also pull in the Nvidia stock price; it could, a bit, but that's not what I'm looking for; it should feel like chatting with a good chat-based model. So I made a small demo on the same idea, not the sample app, but something I built using a search API: say I want to ask "what comedy movies or horror movies are playing", and see how it works. I asked a bunch of questions against APIs that are out in the wild, and it did work. So try it out with multiple APIs; try it with Jira, and it can create a Jira ticket, things like that (a minimal sketch of this multi-tool behaviour follows below). It seems much better in that manner. But I think the pizza is up here, so we can talk more; happy to answer more questions as well.
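To make the multi-tool routing behaviour above concrete, here is a minimal, hedged sketch: several function specs registered at once, with the model left free to answer normal questions directly. The endpoint, model name, and tool names are placeholders of my own, not the speaker's actual demo.

```python
import json
from openai import OpenAI

# Minimal sketch of the demo behaviour described above; endpoint, model,
# and tool names are placeholders, not the speaker's actual demo.
client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")

def tool(name: str, desc: str, props: dict, required: list) -> dict:
    """Small helper to build an OpenAI-style function spec."""
    return {"type": "function", "function": {
        "name": name, "description": desc,
        "parameters": {"type": "object", "properties": props,
                       "required": required}}}

tools = [
    tool("search_movies", "Find movies now playing, by genre.",
         {"genre": {"type": "string"}, "city": {"type": "string"}}, ["genre"]),
    tool("create_jira_ticket", "Create a Jira ticket.",
         {"summary": {"type": "string"}, "project": {"type": "string"}},
         ["summary"]),
]

for question in ["Tell me about the history of Nvidia.",    # plain answer expected
                 "What comedy movies are playing nearby?"]:  # tool call expected
    resp = client.chat.completions.create(
        model="your-function-calling-model",
        messages=[{"role": "user", "content": question}], tools=tools)
    msg = resp.choices[0].message
    if msg.tool_calls:
        fn = msg.tool_calls[0].function
        print(question, "->", fn.name, json.loads(fn.arguments))
    else:
        print(question, "->", msg.content)
```

The history question should come back as plain content, while the movie question should come back as a `search_movies` call; that split is the "don't kill the base model" property the talk keeps returning to.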

11:08

All right. [APPLAUSE]