OpenAI’s backend converting messy unstructured data to structured data via functions
OpenAI’s “Function Calling” might be the most groundbreaking yet under appreciated feature released by any software company… ever.
What are GPT Functions
Functions allow you to turn unstructured data into structured data. This might not sound all that groundbreaking but when you consider that 90% of data processing and data entry jobs worldwide exist for this exact reason, it’s quite a revolutionary feature that went somewhat unnoticed.
Have you ever found yourself begging GPT (3.5 or 4) to spit out the answer you want and absolutely nothing else? No “Sure, here is your…” or any other useless fluff surrounding the core answer. GPT Functions are the solution you’ve been looking for.
How are Functions meant to work?
OpenAI’s docs on function calling are extremely limited. You’ll find yourself digging through their developer forum for examples of how to use them. I dug around the forum for you and have many example coming up.
Here’s one of the only examples you’ll be able to find in their docs:
functions = [
{
“name”: “get_current_weather”,
“description”: “Get the current weather in a given location”,
“parameters”: {
“type”: “object”,
“properties”: {
“location”: {
“type”: “string”,
“description”: “The city and state, e.g. San Francisco, CA”,
},
“unit”: {“type”: “string”, “enum”: [“celsius”, “fahrenheit”]},
},
“required”: [“location”],
},
}
]
A function definition is a rigid JSON format that defines a function name, description and parameters. In this case, the function is meant to get the current weather. Obviously GPT isn’t able to call this actual API (since it doesn’t exist) but using this structured response you’d be able to connect the real API hypothetically.
At a high level however, functions provide two layers of inference:
Picking the function itself:
You may notice that functions are passed into the OpenAI API call as an array. The reason you provide a name and description to each function are so GPT can decide which to use based on a given prompt. Providing multiple functions in your API call is like giving GPT a Swiss army knife and asking it to cut a piece of wood in half. It knows that even though it has a pair of pliers, scissors and a knife, it should use the saw!
Function definitions contribute towards your token count. Passing in hundreds of functions would not only take up the majority of your token limit but also result in a drop in response quality. I often don’t even use this feature and only pass in 1 function that I force it to use. It is very nice to have in certain use cases however.
Picking the parameter values based on a prompt:
This is the real magic in my opinion. GPT being able to choose the tool in it’s tool kit is amazing and definitely the focus of their feature announcement but I think this applies to more use cases.
You can imagine a function like handing GPT a form to fill out. It uses its reasoning, the context of the situation and field names/descriptions to decide how it will fill out each field. Designing the form and the additional information you pass in is where you can get creative.
GPT filling out your custom form (function parameters)
5 Useful Applications
Data Extraction
One of the most common things I use functions for to extract specific values from a large chunk of text. The sender’s address from an email, a founders name from a blog post, a phone number from a landing page.
I like to imagine I’m searching for a needle in a haystack except the LLM burns the haystack, leaving nothing but the needle(s).
GPT Data Extraction Personified.
Use case: Processing thousands of contest submissions
I built an automation that iterated over thousands of contest submissions. Before storing these in a Google sheet I wanted to extract the email associated with the submission. Heres the function call I used for extracting their email.
{
“name”:”update_email”,
“description”:”Updates email based on the content of their submission.”,
“parameters”:{
“type”:”object”,
“properties”:{
“email”:{
“type”:”string”,
“description”:”The email provided in the submission”
}
},
“required”:[
“email”
]
}
}
Scoring
Assigning unstructured data a score based on dynamic, natural language criteria is a wonderful use case for functions. You could score comments during sentiment analysis, essays based on a custom grading rubric, a loan application for risk based on key factors. A recent use case I applied scoring to was scoring of sales leads from 0–100 based on their viability.
Use Case: Scoring Sales leads
We had hundreds of prospective leads in a single google sheet a few months ago that we wanted to tackle from most to least important. Each lead contained info like company size, contact name, position, industry etc.
Using the following function we scored each lead from 0–100 based on our needs and then sorted them from best to worst.
{
“name”:”update_sales_lead_value_score”,
“description”:”Updates the score of a sales lead and provides a justification”,
“parameters”:{
“type”:”object”,
“properties”:{
“sales_lead_value_score”:{
“type”:”number”,
“description”:”An integer value ranging from 0 to 100 that represents the quality of a sales lead based on these criteria. 100 is a perfect lead, 0 is terrible. Ideal Lead Criteria:n- Medium sized companies (300-500 employees is the best range)n- Companies in primary resource heavy industries are best, ex. manufacturing, agriculture, etc. (this is the most important criteria)n- The higher up the contact position, the better. VP or Executive level is preferred.”
},
“score_justification”:{
“type”:”string”,
“description”:”A clear and conscise justification for the score provided based on the custom criteria”
}
}
},
“required”:[
“sales_lead_value_score”,
“score_justification”
]
}
Categorization:
Define custom buckets and have GPT thoughtfully consider each piece of data you give it and place it in the correct bucket. This can be used for labelling tasks like selecting the category of youtube videos or for discrete scoring tasks like assigning letter grades to homework assignments.
Use Case: Labelling news articles.
A very common first step in data processing workflows is separating incoming data into different streams. A recent automation I built did exactly this with news articles scraped from the web. I wanted to sort them based on the topic of the article and include a justification for the decision once again. Here’s the function I used:
{
“name”:”categorize”,
“description”:”Categorize the input data into user defined buckets.”,
“parameters”:{
“type”:”object”,
“properties”:{
“category”:{
“type”:”string”,
“enum”:[
“US Politics”,
“Pandemic”,
“Economy”,
“Pop culture”,
“Other”
],
“description”:”US Politics: Related to US politics or US politicians, Pandemic: Related to the Coronavirus Pandemix, Economy: Related to the economy of a specific country or the world. , Pop culture: Related to pop culture, celebrity media or entertainment., Other: Doesn’t fit in any of the defined categories. “
},
“justification”:{
“type”:”string”,
“description”:”A short justification explaining why the input data was categorized into the selected category.”
}
},
“required”:[
“category”,
“justification”
]
}
}
Option-Selection:
Often times when processing data, I give GPT many possible options and want it to select the best one based on my needs. I only want the value it selected, no surrounding fluff or additional thoughts. Functions are perfect for this.
Use Case: Finding the “most interesting AI news story” from hacker news
I wrote another medium article here about how I automated my entire Twitter account with GPT. Part of that process involves selecting the most relevant posts from the front pages of hacker news. This post selection step leverages functions!
To summarize the functions portion of the use case, we would scrape the first n pages of hacker news and ask GPT to select the post most relevant to “AI news or tech news”. GPT would return only the headline and the link selected via functions so that I could go on to scrape that website and generate a tweet from it.
I would pass in the user defined query as part of the message and use the following function definition:
{
“name”:”find_best_post”,
“description”:”Determine the best post that most closely reflects the query.”,
“parameters”:{
“type”:”object”,
“properties”:{
“best_post_title”:{
“type”:”string”,
“description”:”The title of the post that most closely reflects the query, stated exactly as it appears in the list of titles.”
}
},
“required”:[
“best_post_title”
]
}
}
Filtering:
Filtering is a subset of categorization where you categorize items as either true or false based on a natural language condition. A condition like “is Spanish” will be able to filter out all Spanish comments, articles etc. using a simple function and conditional statement immediately after.
Use Case: Filtering contest submission
The same automation that I mentioned in the “Data Extraction” section used ai-powered-filtering to weed out contest submissions that didn’t meet the deal-breaking criteria. Things like “must use typescript” were absolutely mandatory for the coding contest at hand. We used functions to filter out submissions and trim down the total set being processed by 90%. Here is the function definition we used.
{
“name”:”apply_condition”,
“description”:”Used to decide whether the input meets the user provided condition.”,
“parameters”:{
“type”:”object”,
“properties”:{
“decision”:{
“type”:”string”,
“enum”:[
“True”,
“False”
],
“description”:”True if the input meets this condition ‘Does submission meet the ALL these requirements (uses typescript, uses tailwindcss, functional demo)’, False otherwise.”
}
},
“required”:[
“decision”
]
}
}
If you’re curious why I love functions so much or what I’ve built with them you should check out AgentHub!
AgentHub is the Y Combinator-backed startup I co-founded that let’s you automate any repetitive or complex workflow with AI via a simple drag and drop no-code platform.
“Imagine Zapier but AI-first and on crack.” — Me
Automations are built with individual nodes called “Operators” that are linked together to create power AI pipelines. We have a catalogue of AI powered operators that leverage functions under the hood.
Our current AI-powered operators that use functions!
Check out these templates to see examples of function use-cases on AgentHub: Scoring, Categorization, Option-Selection,
If you want to start building AgentHub is live and ready to use! We’re very active in our discord community and are happy to help you build your automations if needed.
Feel free to follow the official AgentHub twitter for updates and myself for AI-related content.
GPT Function Calling: 5 Underrated Use Cases was originally published in Better Programming on Medium, where people are continuing the conversation by highlighting and responding to this story.