In this post we take a look at the function calling capabilities of the open source model NousResearch/Hermes-2-Pro-Mistral-7B (interstellarninja et al. (2024)).
In a previous blog post I discussed how we can use the OpenAI Python client to run inference with open source models through services that are OpenAI compatible. I’m going to copy part of the code here.
ChatCompletion(id='chatcmpl-945ya3wcBWQeIbmzffGotEPMNnU66', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hello! How can I assist you today?', role='assistant', function_call=None, tool_calls=None))], created=1710763312, model='gpt-3.5-turbo-0125', object='chat.completion', system_fingerprint='fp_4f2ebda25a', usage=CompletionUsage(completion_tokens=9, prompt_tokens=9, total_tokens=18))
We can also use the same class to run inference with Hermes-2-Pro-Mistral-7B through a Hugging Face Inference endpoint. You don’t need an inference endpoint to run this model: you could use the transformers library directly and run it locally. Remember to use the proper prompt format. I’m using the messages format.
Code
print(
    llm(
        model="tgi",
        api_key=HUGGING_FACE_ACCESS_TOKEN,
        base_url=HUGGING_FACE_ENDPOINT_URL,
        messages=[
            dict(
                role="system",
                content="You are an OpenSource LLM that rivals OpenAI GPT. Your goal is to bring open source AI to everyone!",
            ),
            dict(role="user", content="Explain why open source AI is important."),
        ],
        max_tokens=2000,
        temperature=1,
    )
    .choices[0]
    .message.content
)
Open source AI is important for several reasons:
1. Transparency: Open source AI allows developers and researchers to review the code, understand how it works, and verify its correctness. This transparency ensures trust in the system and can lead to more reliable and secure AI applications.
2. Collaboration: Open source encourages collaboration among developers, researchers, and users from around the world. By sharing knowledge and efforts, the community can collectively drive innovation forward, resulting in faster advancements in the field of AI.
3. Accessibility: Open source AI makes it possible for anyone to access and use cutting-edge AI technologies, not just those with large budgets or exclusive access. This democratizes AI and ensures that it can benefit everyone, not just those who can afford proprietary solutions.
4. Flexibility: With open source AI, users have the freedom to modify and adapt the code to fit their specific needs. This flexibility allows for customized solutions that can better address unique problems and requirements.
5. Learning and Education: Open source AI provides a valuable resource for learning and education. Students, researchers, and developers can study the code, understand the underlying principles, and gain practical experience in using and improving AI systems.
6. Competition and Market Dynamics: Open source AI fosters a competitive environment by encouraging innovation and rapid development. This can lead to improvements in efficiency, performance, and overall quality of AI solutions. Additionally, it can create a diverse ecosystem with multiple players, reducing the risk of monopolies and promoting a healthy market dynamics.
7. Resilience: Open source AI can be more resilient to cyberattacks since the code is openly available for review and audit. Additionally, a diverse community can help identify and address potential vulnerabilities more effectively.
In summary, open source AI is important as it promotes transparency, collaboration, accessibility, flexibility, and learning while fostering innovation, competition, and resilience in the field of artificial intelligence.
Function Calling Capabilities
First we will define some functions/tools which the LLM will have access to. Here I use langchain to convert the Python functions into the tools format used by OpenAI. It’s much faster than writing those JSON objects by hand. Note that Hermes-2-Pro-Mistral-7B also uses this same format!
I am leaving out the actual logic for each function. I mainly want to test the model’s ability to pick out the correct function and arguments. The important step here is to document each function and argument.
Code
@tool
def get_weather_forecast(location: str, date: str) -> str:
    """Provides a weather forecast for a given location and date.

    Args:
        location (str): The name of the city and state, e.g. 'San Francisco, CA'.
        date (str): The date of the forecast in YYYY-MM-DD format, e.g. '2023-07-01'.

    Returns:
        str: A string containing the weather forecast, e.g. 'Partly cloudy with a high of 72F (22C).'
    """
    pass


@tool
def book_flight(
    departure_city: str,
    arrival_city: str,
    departure_date: str,
    return_date: str,
    num_passengers: int,
    cabin_class: str,
) -> dict:
    """Book a round-trip flight for the given parameters.

    Args:
        departure_city (str): The full city name with the departure airport, e.g. "Toronto".
        arrival_city (str): The full city name with the arrival airport, e.g. "Austin".
        departure_date (str): The departure date in YYYY-MM-DD format.
        return_date (str): The return date in YYYY-MM-DD format.
        num_passengers (int): The number of passengers.
        cabin_class (str): The cabin class, e.g. "economy", "business", "first".

    Returns:
        dict: A dict with the booking details including airline, flight numbers, price and booking confirmation code.
    """
    pass


@tool
def book_movie_tickets(movie_name: str, theater_name: str, date: str, time: str, num_tickets: int) -> dict:
    """Book movie tickets for the given movie, theater, date, time, and number of tickets.

    Args:
        movie_name (str): The name of the movie.
        theater_name (str): The name of the theater.
        date (str): The date of the movie showing (YYYY-MM-DD).
        time (str): The time of the movie showing (HH:MM).
        num_tickets (int): The number of tickets to book for the movie.

    Returns:
        dict: Returns a dictionary with booking details if successful, otherwise returns a dictionary with an error message.
    """
    pass


@tool
def translate_text(text: str, target_language: str) -> str:
    """Translate the given text into the specified target language.

    Args:
        text (str): The text to be translated.
        target_language (str): The target language code (e.g., 'es' for Spanish, 'fr' for French).

    Returns:
        str: The translated text in the target language.
    """
    pass


@tool
def get_recipe(dish_name: str) -> str:
    """Returns a recipe for the given dish name.

    Args:
        dish_name (str): The name of the dish to get the recipe for.

    Returns:
        str: A string containing the recipe instructions.
    """
    pass


@tool
def solve_math_problem(problem: str) -> str:
    """Solves a given math equation using a symbolic math library. Simply pass in the equation.

    Args:
        problem (str): The equation to be solved.

    Returns:
        str: The solution to the equation.
    """
    pass


@tool
def send_slack_message(channel_name: str, message: str) -> bool:
    """Send a message to a Slack channel.

    Args:
        channel_name (str): The name of the channel.
        message (str): The message to be sent.

    Returns:
        bool: True if the message was sent successfully, False otherwise.
    """
    pass


functions = [
    get_weather_forecast,
    book_flight,
    book_movie_tickets,
    translate_text,
    get_recipe,
    solve_math_problem,
    send_slack_message,
]
tools = [convert_to_openai_tool(f) for f in functions]
Here are two of the tool definitions as examples. Note that this is the same tools format used by OpenAI.
Code
tools[0]
{'type': 'function',
'function': {'name': 'get_weather_forecast',
'description': "get_weather_forecast(location: str, date: str) -> str - Provides a weather forecast for a given location and date.\n\n Args:\n location (str): The name of the city and state, e.g. 'San Francisco, CA'.\n date (str): The date of the forecast in YYYY-MM-DD format, e.g. '2023-07-01'.\n\n Returns:\n str: A string containing the weather forecast, e.g. 'Partly cloudy with a high of 72F (22C).'",
'parameters': {'type': 'object',
'properties': {'location': {'type': 'string'}, 'date': {'type': 'string'}},
'required': ['location', 'date']}}}
Code
tools[-1]
{'type': 'function',
'function': {'name': 'send_slack_message',
'description': 'send_slack_message(channel_name: str, message: str) -> bool - Send a message to a Slack channel.\n Args:\n channel_name (str): The name of the channel.\n message (str): The message to be sent.\n Returns:\n bool: True if the message was sent successfully, False otherwise.',
'parameters': {'type': 'object',
'properties': {'channel_name': {'type': 'string'},
'message': {'type': 'string'}},
'required': ['channel_name', 'message']}}}
Here is a list of questions to test out the function calling capabilities. For each question we have the text and the ground-truth expected function name and arguments. This gives us a mini evaluation of how well the function calling works.
Code
questions = [
    {
        "question": "What will the weather be like in Seattle, WA tomorrow?",
        "tool_calls": [
            {
                "name": "get_weather_forecast",
                "arguments": {
                    "location": "Seattle, WA",
                    "date": (datetime.now() + timedelta(days=1)).strftime("%Y-%m-%d"),
                },
            }
        ],
    },
    {
        "question": "What's the forecast for Miami for today?",
        "tool_calls": [
            {
                "name": "get_weather_forecast",
                "arguments": {"location": "Miami, FL", "date": datetime.now().strftime("%Y-%m-%d")},
            }
        ],
    },
    {
        "question": "Will I need an umbrella in New York City two days from now?",
        "tool_calls": [
            {
                "name": "get_weather_forecast",
                "arguments": {
                    "location": "New York City, NY",
                    "date": (datetime.now() + timedelta(days=2)).strftime("%Y-%m-%d"),
                },
            }
        ],
    },
    {
        "question": "Book me a round-trip flight from New York City to Los Angeles departing on June 15th and returning June 22nd for 2 passengers in economy class.",
        "tool_calls": [
            {
                "name": "book_flight",
                "arguments": {
                    "departure_city": "NYC",
                    "arrival_city": "LAX",
                    "departure_date": datetime(datetime.now().year, 6, 15).strftime("%Y-%m-%d"),
                    "return_date": datetime(datetime.now().year, 6, 22).strftime("%Y-%m-%d"),
                    "num_passengers": 2,
                    "cabin_class": "economy",
                },
            }
        ],
    },
    {
        "question": "I need to book a first class round-trip flight for 4 people from Chicago to Miami. We want to leave on December 1 and return on December 12.",
        "tool_calls": [
            {
                "name": "book_flight",
                "arguments": {
                    "departure_city": "Chicago",
                    "arrival_city": "Miami",
                    "departure_date": datetime(datetime.now().year, 12, 1).strftime("%Y-%m-%d"),
                    "return_date": datetime(datetime.now().year, 12, 12).strftime("%Y-%m-%d"),
                    "num_passengers": 4,
                    "cabin_class": "first",
                },
            }
        ],
    },
    {
        "question": "I want to book 3 tickets for The Super Mario Bros. Movie at AMC Empire 25 on April 7th at 7:30 PM.",
        "tool_calls": [
            {
                "name": "book_movie_tickets",
                "arguments": {
                    "movie_name": "The Super Mario Bros. Movie",
                    "theater_name": "AMC Empire 25",
                    "date": datetime(datetime.now().year, 4, 7).strftime("%Y-%m-%d"),
                    "time": "19:30",
                    "num_tickets": 3,
                },
            }
        ],
    },
    {
        "question": "Book 2 tickets for Guardians of the Galaxy Vol. 3 at Regal Union Square on May 5th for the 9:45 PM show.",
        "tool_calls": [
            {
                "name": "book_movie_tickets",
                "arguments": {
                    "movie_name": "Guardians of the Galaxy Vol. 3",
                    "theater_name": "Regal Union Square",
                    "date": datetime(datetime.now().year, 5, 5).strftime("%Y-%m-%d"),
                    "time": "21:45",
                    "num_tickets": 2,
                },
            }
        ],
    },
    {
        "question": "How do you say 'Hello, how are you?' in Spanish?",
        "tool_calls": [
            {
                "name": "translate_text",
                "arguments": {"text": "Hello, how are you?", "target_language": "es"},
            }
        ],
    },
    {
        "question": "Translate 'I love programming' to French.",
        "tool_calls": [
            {
                "name": "translate_text",
                "arguments": {"text": "I love programming", "target_language": "fr"},
            }
        ],
    },
    {
        "question": "How do I make pesto?",
        "tool_calls": [{"name": "get_recipe", "arguments": {"dish_name": "pesto"}}],
    },
    {
        "question": "What's a good vegan chili recipe?",
        "tool_calls": [{"name": "get_recipe", "arguments": {"dish_name": "vegan chili"}}],
    },
    {
        "question": "Can you give me a recipe for chocolate chip cookies?",
        "tool_calls": [{"name": "get_recipe", "arguments": {"dish_name": "chocolate chip cookies"}}],
    },
    {
        "question": "Solve the equation: x^2 + 2x + 1=0.",
        "tool_calls": [{"name": "solve_math_problem", "arguments": {"problem": "x^2 + 2x + 1=0"}}],
    },
    {
        "question": "Solve the equation: 3x - 7 = 5x + 9",
        "tool_calls": [{"name": "solve_math_problem", "arguments": {"problem": "3x - 7 = 5x + 9"}}],
    },
    {
        "question": "Solve the equation: sin(x) = 0",
        "tool_calls": [{"name": "solve_math_problem", "arguments": {"problem": "sin(x) = 0"}}],
    },
    {
        "question": "Send a message to the general channel on Slack saying 'Hello, world!'",
        "tool_calls": [
            {
                "name": "send_slack_message",
                "arguments": {"channel_name": "general", "message": "Hello, world!"},
            }
        ],
    },
    {
        "question": "Send a message to the sales-team channel on Slack with the message: 'Please register for the conference.'",
        "tool_calls": [
            {
                "name": "send_slack_message",
                "arguments": {
                    "channel_name": "sales-team",
                    "message": "Please register for the conference.",
                },
            }
        ],
    },
    {
        "question": "Send a message to the office-updates channel with the message 'FOOD IS HERE!'",
        "tool_calls": [
            {
                "name": "send_slack_message",
                "arguments": {"channel_name": "office-updates", "message": "FOOD IS HERE!"},
            }
        ],
    },
]
Code
random.shuffle(tools)
random.shuffle(questions)
gpt-3.5-turbo-0125 Function Calling
First we will use gpt-3.5-turbo-0125 to extract the function name and arguments for each question.
I’m going to use GPT-4 to check the “correctness” of the predicted/generated function arguments by comparing them with the expected arguments. This step is completely optional; you could use exact string matching or something else instead. I was curious to see how this would work though.
Code
def check_tool_call_arguments(expected, predicted):
    # Ask GPT-4 if the expected function name and arguments are the same as
    # the predicted function name and arguments.
    if expected["name"] != predicted["name"]:
        return False, f'Function Names Do not Match. Expected {expected["name"]}. Predicted: {predicted["name"]}'
    prompt = f"""Check if the following queries are approx equal. Use fuzzy logic matching for strings.
Check to see if the arguments are semantically similar, especially for free form text.
If you decide they are equivalent then return TRUE and only TRUE with no other explanation. Otherwise return FALSE and give an explanation why they don't match.
Expected Arguments: {expected['arguments']}
Predicted Arguments: {predicted['arguments']}
"""
    resp = llm(model="gpt-4-0125-preview", messages=[dict(role="user", content=prompt)])
    if resp.choices[0].message.content.lower().strip() == "true":
        return True, None
    explanation = resp.choices[0].message.content.lower().strip()
    return False, explanation
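For comparison, here is what the exact-matching alternative mentioned above might look like. This is a sketch: `check_tool_call_exact` is a hypothetical helper, not part of the original code.

```python
def check_tool_call_exact(expected: dict, predicted: dict) -> bool:
    # Deterministic check: the name and every argument must match exactly.
    # Stricter than the GPT-4 judge, which tolerates semantically
    # equivalent strings like "NYC" vs "New York City".
    return (
        expected["name"] == predicted["name"]
        and expected["arguments"] == predicted["arguments"]
    )
```

This flags semantically fine answers as failures, but it is free and reproducible, which is why it's a reasonable baseline.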
Okay, let’s loop over the questions and use gpt-3.5-turbo-0125 to extract the function name and arguments.
Code
def eval_openai_inference_models(model="gpt-3.5-turbo-0125", base_url=None, api_key=None):
    total = 0
    total_correct = 0
    for question in questions:
        resp = llm(
            api_key=api_key,
            base_url=base_url,
            model=model,
            tools=tools,
            messages=[
                dict(role="system", content=f"The date today is {today}"),
                dict(role="user", content=question["question"]),
            ],
        )
        tool_calls = extract_tool_calls(resp)
        if tool_calls is None:
            print(f'Model {model} failed to return any tool calls for question {question["question"]}')
            total += 1
            continue
        assert len(tool_calls) == len(question["tool_calls"])
        for tool_call, expected_call in zip(tool_calls, question["tool_calls"]):
            correct_call, explanation = check_tool_call_arguments(expected_call, tool_call)
            if not correct_call:
                print(f'QUESTION: {question["question"]}')
                print(f'EXPECTED Tool Call: {question["tool_calls"][0]}')
                print(f"GENERATED Tool Call: {tool_call}")
                print(f"EXPLANATION: {explanation}\n\n")
            else:
                total_correct += 1
            total += 1
    return total_correct, total
Code
model = "gpt-3.5-turbo-0125"
total_correct, total = eval_openai_inference_models(model=model, base_url=None, api_key=None)
print(f'Correctly called the proper functions {total_correct} times out of {total}. But check the "failure" cases above since they may be correct anyway.')
Correctly called the proper functions 18 times out of 18. But check the "failure" cases above since they may be correct anyway.
gpt-4-0125-preview Function Calling
Code
model = "gpt-4-0125-preview"
total_correct, total = eval_openai_inference_models(model=model, base_url=None, api_key=None)
print(f'Correctly called the proper functions {total_correct} times out of {total}. But check the "failure" cases above since they may be correct anyway.')
QUESTION: What's the forecast for Miami for today?
EXPECTED Tool Call: {'name': 'get_weather_forecast', 'arguments': {'location': 'Miami, FL', 'date': '2024-03-18'}}
GENERATED Tool Call: {'name': 'get_weather_foreast', 'arguments': {'date': '2024-03-18', 'location': 'Miami, FL'}}
EXPLANATION: Function Names Do not Match. Expected get_weather_forecast. Predicted: get_weather_foreast
Correctly called the proper functions 17 times out of 18. But check the "failure" cases above since they may be correct anyway.
Mistral-7B-Instruct-v0.1 with together.ai Function Calling
Code
model = "mistralai/Mistral-7B-Instruct-v0.1"
total_correct, total = eval_openai_inference_models(model=model, base_url=TOGETHER_AI_BASE_URL, api_key=TOGETHER_API_KEY)
print(f'Correctly called the proper functions {total_correct} times out of {total}. But check the "failure" cases above since they may be correct anyway.')
Model mistralai/Mistral-7B-Instruct-v0.1 failed to return any tool calls for question How do I make pesto?
QUESTION: What's a good vegan chili recipe?
EXPECTED Tool Call: {'name': 'get_recipe', 'arguments': {'dish_name': 'vegan chili'}}
GENERATED Tool Call: {'name': 'solve_math_problem', 'arguments': {'problem': 'What is the square root of 16?'}}
EXPLANATION: Function Names Do not Match. Expected get_recipe. Predicted: solve_math_problem
Correctly called the proper functions 16 times out of 18. But check the "failure" cases above since they may be correct anyway.
mistralai/Mixtral-8x7B-Instruct-v0.1 with together.ai Function Calling
Code
model = "mistralai/Mixtral-8x7B-Instruct-v0.1"
total_correct, total = eval_openai_inference_models(model=model, base_url=TOGETHER_AI_BASE_URL, api_key=TOGETHER_API_KEY)
print(f'Correctly called the proper functions {total_correct} times out of {total}. But check the "failure" cases above since they may be correct anyway.')
Model mistralai/Mixtral-8x7B-Instruct-v0.1 failed to return any tool calls for question How do I make pesto?
QUESTION: I need to book a first class round-trip flight for 4 people from Chicago to Miami. We want to leave on December 1 and return on December 12.
EXPECTED Tool Call: {'name': 'book_flight', 'arguments': {'departure_city': 'Chicago', 'arrival_city': 'Miami', 'departure_date': '2024-12-01', 'return_date': '2024-12-12', 'num_passengers': 4, 'cabin_class': 'first'}}
GENERATED Tool Call: {'name': 'book_flight', 'arguments': {'departure_city': 'Chicago', 'arrival_city': 'Miami', 'departure_date': '2023-12-01', 'return_date': '2023-12-12', 'num_passengers': 4, 'cabin_class': 'first'}}
EXPLANATION: false
the departure_date and return_date values do not match. the expected arguments have dates in 2024, while the predicted arguments have dates in 2023.
QUESTION: Book me a round-trip flight from New York City to Los Angeles departing on June 15th and returning June 22nd for 2 passengers in economy class.
EXPECTED Tool Call: {'name': 'book_flight', 'arguments': {'departure_city': 'NYC', 'arrival_city': 'LAX', 'departure_date': '2024-06-15', 'return_date': '2024-06-22', 'num_passengers': 2, 'cabin_class': 'economy'}}
GENERATED Tool Call: {'name': 'book_flight', 'arguments': {'departure_city': 'New York City', 'arrival_city': 'Los Angeles', 'departure_date': '2023-06-15', 'return_date': '2023-06-22', 'num_passengers': 2, 'cabin_class': 'economy'}}
EXPLANATION: false
explanation:
- the 'departure_city' and 'arrival_city' fields match semantically as 'nyc' is commonly known as 'new york city' and 'lax' is a well-known shorthand for the los angeles airport, often used to refer to los angeles itself.
- the 'departure_date' and 'return_date' do not match. the expected arguments specify a year 2024, while the predicted arguments have the year 2023 for both dates.
- the 'num_passengers' and 'cabin_class' fields match exactly in both value and semantics.
the primary reason for the non-match is the difference in 'departure_date' and 'return_date' by one year.
Correctly called the proper functions 15 times out of 18. But check the "failure" cases above since they may be correct anyway.
What is going on with the together.ai function calling mistakes above?
Both models had issues with the pesto question. I wonder if this comes down to how together.ai implemented the function calling feature on their end. IDK!
NousResearch/Hermes-2-Pro-Mistral-7B Function Calling
Now we will repeat with NousResearch/Hermes-2-Pro-Mistral-7B. The format for the function calling is documented on the model card as well as in this repo. The way we define the tools is the same format as with OpenAI. However, we don’t pass in a tools argument. Rather, we use a special system prompt which defines the tools.
Code
def extract_tool_calls(tool_calls_str):
    tool_calls = tool_calls_str.split("</tool_call>\n")
    parsed_results = []
    for tool_call in tool_calls:
        if tool_call:
            dict_str = tool_call.split("\n")[1]
            tool_call_dict = ast.literal_eval(dict_str)
            parsed_results.append({"arguments": tool_call_dict["arguments"], "name": tool_call_dict["name"]})
    return parsed_results


system_prompt = (
    f"The date today is {today}\n"
    + """You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
<tools> """
    + str(tools)
    + """</tools> Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']} For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>{'arguments': <args-dict>, 'name': <function-name>}</tool_call>"""
)

total = 0
total_correct = 0
for question in questions:
    resp = llm(
        model="tgi",
        base_url=HUGGING_FACE_ENDPOINT_URL,
        api_key=HUGGING_FACE_ACCESS_TOKEN,
        messages=[
            dict(role="system", content=system_prompt),
            dict(role="user", content=question["question"]),
        ],
        max_tokens=500,
    )
    tool_calls = extract_tool_calls(resp.choices[0].message.content)
    assert len(tool_calls) == len(question["tool_calls"])
    for tool_call, expected_call in zip(tool_calls, question["tool_calls"]):
        correct_call, explanation = check_tool_call_arguments(expected_call, tool_call)
        if not correct_call:
            print(f'QUESTION: {question["question"]}')
            print(f'EXPECTED Tool Call: {question["tool_calls"][0]}')
            print(f"GENERATED Tool Call: {tool_call}")
            print(f"EXPLANATION: {explanation}\n\n")
        else:
            total_correct += 1
        total += 1
Code
print(f'Correctly called the proper functions {total_correct} times out of {total}. But check the "failure" cases above since they may be correct anyway.')
Correctly called the proper functions 18 times out of 18. But check the "failure" cases above since they may be correct anyway.
Wow, it got all of them correct! It may not get them all correct every time, though. Run it again to see if any mistakes are made. Sometimes I saw it forget to fill in num_tickets, for example.
Let’s look at an example to see the output from the model.
Code
today
'Saturday 2024-03-16'
Code
question = "I want to go see Dune 2 on Wednesday night with 5 of my friends. We will be going to the Halifax Bayers Lake Ciniplex Theatre. Get tickets for the 7pm show. Thanks!"
tasks = f"""Today's date is {today}.
Please complete the following tasks for me:
1. I want to go see Dune 2 on Monday night with 5 of my friends. We will be going to the Halifax Bayers Lake Ciniplex Theatre. Get tickets for the 7pm show.
2. Please check the weather for Monday night so I know how to dress.
3. Also please book my plane ticket to Toronto. I will be leaving Tuesday and coming back 2 days later on Thursday. First class please.
4. Send a slack message to the research channel to let them know I will not be there this week in the office."""
[{'arguments': {'movie_name': 'Dune 2',
'theater_name': 'Halifax Bayers Lake Ciniplex Theatre',
'date': '2024-03-18',
'time': '19:00',
'num_tickets': 6},
'name': 'book_movie_tickets'},
{'arguments': {'location': 'Halifax Bayers Lake', 'date': '2024-03-18'},
'name': 'get_weather_forecast'},
{'arguments': {'departure_city': 'Halifax',
'arrival_city': 'Toronto',
'departure_date': '2024-03-19',
'return_date': '2024-03-21',
'num_passengers': 1,
'cabin_class': 'first'},
'name': 'book_flight'},
{'arguments': {'channel_name': 'research',
'message': 'I will not be in the office this week.'},
'name': 'send_slack_message'}]
Conclusion
Impressive!
You can take the arguments, pass them into the actual function, and give the results back to the model. See the model card or the repo for how to do that.
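A minimal sketch of that dispatch step, using one stand-in implementation (the registry, the `run_tool_call` helper, and the fake forecast are my assumptions, not code from the model card):

```python
def get_weather_forecast(location: str, date: str) -> str:
    # Stand-in implementation for illustration only.
    return f"Forecast for {location} on {date}: partly cloudy."


# Map tool names (as the model emits them) to the real Python functions.
TOOL_REGISTRY = {"get_weather_forecast": get_weather_forecast}


def run_tool_call(tool_call: dict) -> str:
    # A parsed tool call is a dict with "name" and "arguments" keys,
    # so we can splat the arguments straight into the function.
    fn = TOOL_REGISTRY[tool_call["name"]]
    return fn(**tool_call["arguments"])


result = run_tool_call(
    {
        "name": "get_weather_forecast",
        "arguments": {"location": "Seattle, WA", "date": "2024-03-17"},
    }
)
```

The result would then go back to the model in a follow-up message (the Hermes format wraps it in `<tool_response>` tags) so it can compose a final answer.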
There is JSON Mode support too!
I’m just getting started with playing around with this powerful open source model. I can’t wait to explore it more!