
Fine-tuning LLMs for function calling

In this video, I present a novel method for fine-tuning LLMs to enhance their ability to call functions. My approach leverages the LLM's existing general knowledge and adds specification registration, enabling it to generate values accurately and efficiently.
00:00

…and compact. Java is altogether different, with camelCase and its own conventions, and Python is different again, so not every model understands every convention equally well. Most people choose the familiar option, say a Python function signature, but we found that JSON schema is much better placed: you don't generate code or a tool spec tied to a particular programming language; you generate a JSON spec instead, and it also handles complex arguments better. There are still problems generating this, though. Even with a JSON schema, just as with a Python function signature, there are complex parameter types, say an object nested inside another object, and if it gets too complicated or too deeply nested, it's still a bit difficult. But JSON schema is easier, and I'll explain why it's easier on the next slide, I guess.
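As an illustration of the two spec styles being compared, here is a small sketch; the function and field names are my own, not from the talk.

```python
# Illustrative only: the function and field names are mine, not from the
# talk. Two ways to describe the same tool to an LLM.

# Style 1: a Python function signature. Compact, but conventions differ
# per language (camelCase in Java, snake_case in Python), and nested
# parameter types get awkward fast.
def search_movies(genre: str, year: int, filters: dict | None = None) -> list[dict]:
    """Search movies by genre and release year."""
    ...

# Style 2: a JSON-schema tool spec. Language-neutral, and complex or
# nested arguments are spelled out explicitly.
search_movies_spec = {
    "name": "search_movies",
    "description": "Search movies by genre and release year.",
    "parameters": {
        "type": "object",
        "properties": {
            "genre": {"type": "string"},
            "year": {"type": "integer"},
            "filters": {
                "type": "object",
                "properties": {
                    "min_rating": {"type": "number"},
                    "language": {"type": "string"},
                },
            },
        },
        "required": ["genre", "year"],
    },
}
```

The JSON form reads the same no matter which language the caller works in, which is roughly the portability point made above; the nested `filters` object also shows where plain signatures start to strain.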

00:58

So, as I keep saying, preserving the base model's capabilities is very important, even when you are only doing a simple fine-tune. If you do very specific instruction tuning, the model loses its existing knowledge: it can only generate full tool calls, or only some mix between regular chat and tool calls. It loses the base knowledge; for example, it may no longer handle multi-turn conversation, or behave normally when you want a regular answer but still have tools available. You don't want real-time information every single time, right? Sometimes you just want to ask what movies were released in 2020 and get the list; "what movies are playing here" is something you might ask next, and that could be a tool call. Models fail at exactly that point: not every turn needs to be an API call or a tool call, so preserving the base model capabilities matters a lot.

One thing that helped here was curating the number of training samples and increasing the data quality; the data played an important role. In fact, what I have observed with LLMs is that if you have good data for your fine-tuning, and you use tools to clean up a bunch of these open-source datasets into a proper instruction fine-tuning dataset, that gets you most of the way to a good model, like GPT. Most of these strong models have good data, with good responses curated along the way. So if you are ever looking at fine-tuning, make sure you have a good dataset; that will take care of a lot of model and LLM issues downstream. I'll skip over the parts about LoRA serving versus full fine-tuning and so on, because it's taking more time.

I want to talk more about constrained generation instead. What we did with this specific model, the V2, which is a departure from the approach we used in the past, is that the tool spec is registered up front, like I showed in the app: the entire spec, with all the parameters. The only tokens the model generates are the values for the attributes, because the structure is already given (see the sketch below). The model performs better this way; since it understands what is to be generated, it has better context. There are still problems: it might not always fill in exactly the right parameters, but it fills the value tokens correctly most of the time, and it does better on many benchmarks, like MT-Bench. We also have a benchmark of our own that I can show. It enforces the structure, and the generation speed is also remarkably good; it is really, really fast.

This is the benchmark I was talking about. There are a bunch of benchmarks, and I think the results are really good when you compare against GPT-4o and that family of models that we are training and custom-tuning against. There is another function-calling model you may have heard of, Hermes 2 Pro, with data and other resources from Nous Research; it's an open-weights model that you should definitely consider looking at.
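To make the constrained-generation idea concrete, here is a toy sketch under my own assumptions; it is not the actual implementation from the talk. The JSON skeleton comes straight from the registered spec, so the model never emits braces, quotes, or key names, only value tokens; `sample_value` stands in for real LLM decoding.

```python
import json

# Toy sketch of the constrained-decoding idea, under my own assumptions;
# not the actual implementation from the talk.

SPEC = {
    "name": "search_movies",
    "parameters": {
        "type": "object",
        "properties": {"genre": {"type": "string"}, "year": {"type": "integer"}},
        "required": ["genre", "year"],
    },
}

def sample_value(prompt: str, key: str, schema: dict) -> object:
    """Placeholder for the LLM: return a value constrained to `schema`."""
    canned = {"genre": "comedy", "year": 2020}  # pretend model output
    return canned[key]

def constrained_call(prompt: str, spec: dict) -> str:
    # Keys and punctuation are copied from the spec itself; only the
    # values are "generated". That enforces valid structure and cuts the
    # number of free tokens, which is where the speed win comes from.
    props = spec["parameters"]["properties"]
    args = {k: sample_value(prompt, k, props[k])
            for k in spec["parameters"]["required"]}
    return json.dumps({"name": spec["name"], "arguments": args})

print(constrained_call("find comedy movies from 2020", SPEC))
# {"name": "search_movies", "arguments": {"genre": "comedy", "year": 2020}}
```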
If you want to give it a shot on the playground with some of the tool calls, just scan the QR code; you can try it out there, and if you have more questions I'm happy to answer those as well. This is the model page if you want to download and run it. It's a 70B model, so it's really heavy, but you can quantize it. I'm looking to take some time, maybe over the weekend, to see if I can put it on Ollama so that people can run it; some quantized 70B models are already on Ollama at around 40 GB. You can probably run it if you have a Mac with an M-series Max or Ultra chip; it still runs, if slowly, but otherwise it's a bit harder to run models at that scale.

[Audience] Where do you see these function-calling models a year from now? Beyond just more accuracy and more functions, what do you see as the more exciting direction?

I come from a search background; I used to work at Elasticsearch, and search used to power e-commerce and a lot of those solutions. With RAG it has all become exciting: it's easy to build a nice experience for customers, whereas in the old days building the search bar itself took a lot of effort. Now, with function calling, it's not just about search; it's also about gathering real-time information. Say you have data in S3 or similar object storage, in a specific schema: you can use a function-calling model, register a bunch of functions, and start querying that data directly (sketched below, after the next answer). You are effectively shortcutting the SQL, the glue programs, the database layer. Eventually you can build great workflows on top, and that will be another wave. You may also stop reaching for a traditional database for everything; probably you'd still use it for storage, but I think that's where this is going, and search is going to be exciting. There are more enterprise search use cases and things like that; we're also building something in this area, still under wraps, but more will come.

[Audience] You mentioned the importance of data quality today. Can you tell us a little about how you curated the training data? Was it generated?

It's a combination of synthetic data and, to be honest, what we have open sourced; most of it is also custom-made for what you want the model to do. It took quite some time. In fact, we started this process early on and released the V1 along the way; it took a while to get good data, and by then we knew the problems. We saw the 8x22B mixture-of-experts wave come in, and that wave passed; then Llama 3 came in, and of course it performed really well. So it took some time, but it's a mix of synthetic and open-source data, plus choosing the samples carefully, like that tool call we were talking about: samples with multi-turn conversations, and samples that are not just RAG-style, where you dump in all the tool calls, the model trains on that, and it ends up always wanting to call a tool.
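Going back to the S3 workflow idea from a couple of answers above, here is a hedged sketch of what registering a function over stored data and completing the round trip might look like with an OpenAI-compatible client. The endpoint, model name, and the `query_s3_dataset` helper are illustrative assumptions of mine, not the speaker's actual setup.

```python
import json
from openai import OpenAI

# Hedged sketch, not the speaker's setup: the endpoint, model name, and
# the `query_s3_dataset` helper are all placeholders.
client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")

tools = [{"type": "function", "function": {
    "name": "query_s3_dataset",
    "description": "Run a filtered query over a dataset stored in S3.",
    "parameters": {"type": "object",
                   "properties": {"table": {"type": "string"},
                                  "filter": {"type": "string"}},
                   "required": ["table"]},
}}]

def query_s3_dataset(table: str, filter: str = "") -> str:
    """Stand-in for the real data-access layer."""
    return json.dumps({"rows": 1234})  # fake result for illustration

messages = [{"role": "user", "content": "How many orders shipped in 2020?"}]
resp = client.chat.completions.create(
    model="your-function-calling-model", messages=messages, tools=tools)
msg = resp.choices[0].message

if msg.tool_calls:  # the model chose to call the function
    call = msg.tool_calls[0]
    result = query_s3_dataset(**json.loads(call.function.arguments))
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": result}]
    final = client.chat.completions.create(
        model="your-function-calling-model", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)  # it answered without needing the tool
```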
Yes. It does really well up to about 30; for example, we have a use case where someone is using more than 100 specs. I don't know exactly how far it scales, but you can definitely do 20 to 30 and it works really well, and we have used it with multiple tools. I built a small demo as well; let me show that. It still responds to normal questions, you see; it doesn't always try to call a tool. That's really the crux of it: we don't want to kill the core model capabilities and make it function-call-only. We want it to work well for normal conversations and call a function only when it's really needed. If I'm learning about the history of Nvidia, it should not also pull in the Nvidia stock price; it could, a bit, but that's not what I'm looking for; it should feel like chatting with a good chat-based model. So I made a small demo on the same idea, not the sample app, but something I built using a search API: say I want to ask "what comedy movies or horror movies are playing", and see how it works. I asked a bunch of questions against APIs that are out in the wild, and it did work. So try it out with multiple APIs; try it with Jira, and it can create a Jira ticket, things like that (a minimal sketch of this multi-tool behaviour follows below). It seems much better in that manner. But I think the pizza is up here, so we can talk more; happy to answer more questions as well.
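To make the multi-tool routing behaviour above concrete, here is a minimal, hedged sketch: several function specs registered at once, with the model left free to answer normal questions directly. The endpoint, model name, and tool names are placeholders of my own, not the speaker's actual demo.

```python
import json
from openai import OpenAI

# Minimal sketch of the demo behaviour described above; endpoint, model,
# and tool names are placeholders, not the speaker's actual demo.
client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")

def tool(name: str, desc: str, props: dict, required: list) -> dict:
    """Small helper to build an OpenAI-style function spec."""
    return {"type": "function", "function": {
        "name": name, "description": desc,
        "parameters": {"type": "object", "properties": props,
                       "required": required}}}

tools = [
    tool("search_movies", "Find movies now playing, by genre.",
         {"genre": {"type": "string"}, "city": {"type": "string"}}, ["genre"]),
    tool("create_jira_ticket", "Create a Jira ticket.",
         {"summary": {"type": "string"}, "project": {"type": "string"}},
         ["summary"]),
]

for question in ["Tell me about the history of Nvidia.",    # plain answer expected
                 "What comedy movies are playing nearby?"]:  # tool call expected
    resp = client.chat.completions.create(
        model="your-function-calling-model",
        messages=[{"role": "user", "content": question}], tools=tools)
    msg = resp.choices[0].message
    if msg.tool_calls:
        fn = msg.tool_calls[0].function
        print(question, "->", fn.name, json.loads(fn.arguments))
    else:
        print(question, "->", msg.content)
```

The history question should come back as plain content, while the movie question should come back as a `search_movies` call; that split is the "don't kill the base model" property the talk keeps returning to.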

11:08

All right. [APPLAUSE]