This looks great; very useful for (for example) ranking outputs by confidence so you can do human reviews of the low-confidence ones.
Any chance we can get Pydantic support?
FYI, logprobs !== confidence.
If you run "bananas,fishbowl,phonebook," and get {"sponge": 0.76},
it doesn't mean that "sponge" was the correct answer with 76% probability, just that "sponge" was the most likely next word for the model to generate.
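To make the distinction concrete, here is a minimal sketch. The logprob values are invented for illustration and are not real model output; `to_probabilities` is a hypothetical helper, not part of any library.

```python
import math

# Hypothetical top logprobs for the next token after "bananas,fishbowl,phonebook,".
# These numbers are made up; a real API would return its own values.
top_logprobs = {
    "sponge": math.log(0.76),
    "lamp": math.log(0.15),
    "desk": math.log(0.05),
}

def to_probabilities(logprobs: dict[str, float]) -> dict[str, float]:
    """Convert natural-log token probabilities back to plain probabilities."""
    return {token: math.exp(lp) for token, lp in logprobs.items()}

probs = to_probabilities(top_logprobs)

# probs["sponge"] is about 0.76: the chance the model *emits* "sponge" next,
# not the chance that "sponge" is a correct answer to anything.
listed_mass = sum(probs.values())  # below 1.0; the rest belongs to unlisted tokens
```

The number describes the model's sampling distribution over tokens. Treating it as calibrated correctness would need a separate evaluation against labeled data.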
Actually, OpenAI provides Pydantic support for structured output (see in
The library is compatible with that but does not use Pydantic further than that.
Right, the hope was to go further. E.g. if the input is:
```
class Classification(BaseModel):
    color: Literal['red', 'blue', 'green']
```
then the output type would be:
```
class ClassificationWithLogProbs(BaseModel):
    color: Dict[Literal['red', 'blue', 'green'], float]
```
Don't take this too literally; I'm not convinced that this is the right way to do it. But it would provide structure and scores without dealing with a mess of complex JSON.
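For what it's worth, a rough sketch of how a `Dict[label, float]` like that could be filled in. This is not how the library works; `label_scores` is a hypothetical helper, the logprobs are invented, and it assumes each label shows up as a single token in the top logprobs (real tokenizers may split labels into several tokens).

```python
import math
from typing import Dict, Sequence

def label_scores(top_logprobs: Dict[str, float], labels: Sequence[str]) -> Dict[str, float]:
    """Renormalize token probabilities over a closed set of labels.

    Labels that do not appear in `top_logprobs` get a score of 0.0.
    """
    raw = {
        label: math.exp(top_logprobs[label]) if label in top_logprobs else 0.0
        for label in labels
    }
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("none of the labels appear in top_logprobs")
    return {label: p / total for label, p in raw.items()}

# Invented logprobs for the token emitted at the `color` position.
scores = label_scores(
    {"red": math.log(0.6), "blue": math.log(0.3), '"': math.log(0.1)},
    labels=("red", "blue", "green"),
)
```

Renormalizing over the allowed labels discards probability mass on other tokens, so the scores sum to 1 by construction. Whether that is the right choice is exactly the kind of design question raised above.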
But this ultimately just converts to JSON Schema, or to the OpenAI function-calling definition format.
One question I've always had: what about the descriptions you can attach to the class and its attributes (`= Field(description=...)` in Pydantic)? Is the model made aware of those descriptions?
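One way to check at least the Pydantic half of that: the sketch below assumes Pydantic v2 (where `model_json_schema()` exists) and uses illustrative description strings. It shows that both the class docstring and the `Field` description end up in the generated JSON Schema.

```python
from typing import Literal

from pydantic import BaseModel, Field

class Classification(BaseModel):
    """Classify the dominant color of an item."""

    color: Literal["red", "blue", "green"] = Field(
        description="Dominant color of the item"
    )

# Pydantic v2: generate the JSON Schema that a client could send along.
schema = Classification.model_json_schema()

class_description = schema["description"]  # taken from the docstring
color_description = schema["properties"]["color"]["description"]  # from Field(...)
```

Whether the model actually sees those strings depends on whether the client includes this schema in the request, which this sketch does not exercise.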