Note that Microsoft do have their own LLM team, and their own model called Phi-4.
Recently I was looking for a small LLM that could perform reasonably well while answering questions with low latency, for near realtime conversations running on a single RTX 3090. I settled on Microsoft’s Phi-4 model so far. However I’m not sure yet if my choice is good and open to more suggestions!
I've been using claude running via Ollama (incept5/llama3.1-claude) and I've been happy with the results. The only annoyance I have is that it won't search the internet for information because that capability is disabled via flag.
That's.. that's not the Claude people talk about when they say Claude. Just to be sure.