Phi-3 is five months old now. I suggest trying Phi-3.5 instead: it's effectively the same size (2.2GB from HF; Phi-3 Mini is 2.2GB as well) but should provide better results.
If you have Ollama installed for that there are plenty of other interesting models to try out too. I like the Llama 3.2 small models, or if you have a whole lot of RAM (I use 64GB on an M2 MacBook Pro) you can run Llama 3.3 70B which is genuinely GPT-4 class: https://simonwillison.net/2024/Dec/9/llama-33-70b/
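If you want to script against a local model rather than chat in the terminal, something like this works (a minimal sketch, assuming the `ollama` Python client is installed and the model has already been pulled):

    import ollama  # pip install ollama; talks to the local Ollama server

    # Ask a locally running Llama 3.2 model a question.
    response = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": "Explain what an SLM is in one sentence."}],
    )
    print(response["message"]["content"])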
pkaye 21 days ago [-]
How does Llama-3.2 3B compare to Phi-3.5?
ron0c 22 days ago [-]
This is the AI I am excited for: data and execution local to my machine. I think Intel is betting on this with the Copilot-branded processors. I hope Ollama or other local AI services will be able to utilize these co-processors soon.
ekianjo 22 days ago [-]
The NPUs on laptops don't have access to enough memory to run very large models.
talldayo 22 days ago [-]
Oftentimes they do. If they don't, it's not very hard to page memory to and from the NPU until the operation is completed.
The bigger problem is that this NPU hardware isn't built around scaling to larger models. It's laser-focused on dense computation and low-precision inference, which usually isn't much more efficient than running the same matmul as a compute shader. For Whisper-scale models that don't require insanely high precision or super sparse decoding, NPU hardware can work great. For LLMs it is almost always going to be slower than a well-tuned GPU.
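To make "low-precision inference" concrete, here's the kind of dense op that hardware is built for: quantize fp32 weights to int8, do an integer matmul, rescale once at the end (a toy NumPy sketch, nothing NPU-specific):

    import numpy as np

    # fp32 weights and input, as a model would normally store them
    w = np.random.randn(256, 256).astype(np.float32)
    x = np.random.randn(256).astype(np.float32)

    # Symmetric int8 quantization: map each tensor into [-127, 127]
    sw = np.abs(w).max() / 127.0
    sx = np.abs(x).max() / 127.0
    w8 = np.round(w / sw).astype(np.int8)
    x8 = np.round(x / sx).astype(np.int8)

    # The integer matmul is the cheap part NPUs accelerate; one float
    # rescale at the end recovers an approximation of the fp32 result.
    y_int8 = (w8.astype(np.int32) @ x8.astype(np.int32)) * (sw * sx)
    print("max abs error vs fp32:", np.abs(y_int8 - w @ x).max())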
650REDHAIR 22 days ago [-]
Right, but do most people need access to a huge model locally?
e12e 21 days ago [-]
AFAIU NPUs are for things like voice input/output, computer vision/hand-gesture I/O, knowing how many people are in front of the camera (and who they are), etc. Always-on, real-time "AI peripherals", not content generation.
I believe Microsoft calls them "SLMs - Small Language Models".
ben_w 22 days ago [-]
Most people shouldn't host locally at all.
Of those who do, I can see students and researchers benefiting from small models. Students in particular are famously short on money for fancy hardware.
My experience trying one of the Phi models (I think 3, might have been 2) was brief, because it failed so hard: my first test was to ask for a single-page web-app Tetris clone. Not only did the first half of the output simply do that task wrong, the second half took a sudden sharp turn into Python code for training an ML model. It didn't even delimit the transition: one line was JavaScript, the next Python.
diggan 21 days ago [-]
> My experience trying one of the Phi models (I think 3, might have been 2) was brief
The Phi models are tiny LMs; SLM (Small Language Model) is maybe a more fitting label than LLM. As such, you cannot throw even semi-complicated problems at them. Things like autocomplete and other simpler tasks are the use cases you'd use them for, not "code this game for me"; you'll need something much more powerful for that.
ben_w 21 days ago [-]
> Things like autocomplete and other simpler tasks are the use cases you'd use them for, not "code this game for me"; you'll need something much more powerful for that.
Indeed, clearly.
However, it was tuned for chat, and people kept telling me it was competitive with the OpenAI models for coding.
ron0c 13 days ago [-]
Asking even a leading LLM to "code a game" is a tall order. I have found a lot of success using self-hosted small models for coding that would have taken me months without them. I just break "code me a game" down into its parts.
Think of it like an extended autocomplete.
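For example, rather than one giant prompt, I'll feed the model one function-sized task at a time (a rough sketch of my loop; the `ollama` client and the prompts are just placeholders for whatever stack you use):

    import ollama  # assumes a local Ollama server with a small model pulled

    # Decompose "code me a game" into bite-sized, autocomplete-shaped tasks.
    subtasks = [
        "Write a Python function rotate_piece(grid) that rotates a tetromino 90 degrees clockwise.",
        "Write a Python function clear_full_rows(board) that removes completed rows and returns their count.",
        "Write a Python function check_collision(board, piece, x, y) that returns True on overlap.",
    ]

    for task in subtasks:
        response = ollama.chat(
            model="llama3.2",  # any small local model works here
            messages=[{"role": "user", "content": task}],
        )
        print(response["message"]["content"], "\n" + "-" * 40)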
miohtama 21 days ago [-]
Maybe a better solution is a privately hosted cloud, or just any SaaS that cannot violate data privacy by design.
sofixa 21 days ago [-]
> any SaaS that cannot violate data privacy by design
And that is hosted in a jurisdiction that forces them to take it seriously, e.g. Mistral in France that has to comply with GDPR and any AI and privacy regulations out of the EU.
msoad 22 days ago [-]
In my opinion there is room for both small, fast models and large, slow but much smarter ones. Use cases like phone keyboard autocomplete, or next-few-words suggestions in coding and writing, need very fast models, which are by definition small. Very large models that are much smarter are also useful, for instance for debugging issues or proofreading long letters.
Cursor really aced this. The Cursor model is very fast to suggest useful inline completions and then leaves big problems to big models.
mycall 22 days ago [-]
Could chaining models together via tool calls, with benchmark-based routing that redirects each request to the best model, allow smaller models to perform as well as bigger models in memory-constrained/local environments?
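Roughly what I have in mind (a hypothetical sketch; the model names and the difficulty heuristic are invented for illustration):

    import ollama  # assumes both models are pulled into a local Ollama server

    SMALL, LARGE = "llama3.2", "llama3.3:70b"

    def route(prompt: str) -> str:
        # Let the small model triage difficulty, then answer with the
        # cheapest model that should be able to handle the request.
        verdict = ollama.chat(
            model=SMALL,
            messages=[{"role": "user", "content":
                       f"Answer only EASY or HARD. Is this request hard?\n\n{prompt}"}],
        )["message"]["content"]
        chosen = LARGE if "HARD" in verdict.upper() else SMALL
        return ollama.chat(
            model=chosen,
            messages=[{"role": "user", "content": prompt}],
        )["message"]["content"]

    print(route("What's 2 + 2?"))  # should stay on the small model

In practice you'd replace the self-reported EASY/HARD verdict with routing tables built from per-category benchmark results.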
Are there any other tools similar to pieces.app, preferably open-source, that can be integrated into the developer workflow? I've used Heynote, which helps to some extent, but it's not a direct fit and isn't a complete AI developer-workflow companion.
maccam912 22 days ago [-]
Is there any rule of thumb for small language models vs large language models? I've seen Phi-4 called a small language model, but at 14 billion parameters it's larger than some large language models.
ekianjo 22 days ago [-]
7B to 9B is usually what we call small. The rule of thumb is a model that you can run on a single GPU.
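The back-of-envelope math behind that (a rough sketch; real usage adds KV cache and framework overhead on top of the weights):

    def weights_gb(params_billion: float, bits_per_weight: int) -> float:
        # Approximate VRAM needed just to hold the weights.
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for params in (3, 7, 14, 70):
        print(f"{params}B params: fp16 ~ {weights_gb(params, 16):.0f} GB, "
              f"4-bit ~ {weights_gb(params, 4):.1f} GB")

    # 7B at fp16 needs ~14 GB; quantized to 4 bits it's ~3.5 GB, which is
    # why 7B-9B models fit comfortably on a single consumer GPU.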
exitb 21 days ago [-]
It’s not a useful distinction. The first LLMs had less than 1 billion parameters anyway.
kittikitti 21 days ago [-]
I would claim that even 500 million parameters could be considered large.
akudha 22 days ago [-]
Apologies for the dumb question: can these models be used at my work, i.e., for commercial purposes? What is the legality of it?
Do we know for sure that the model was not trained on copyrighted material or GPL-licensed code? That is the biggest issue right now.
minimaxir 21 days ago [-]
That is the case with every LLM (except a couple of research experiments) and won't be resolved until the courts resolve it.
Literally every tech company that uses LLMs would be in legal trouble if that becomes the precedent.
nicce 21 days ago [-]
Yes. It is a bigger problem than the correctness of the model's license, and I feel the original commenter is not aware of that.
Many companies are waiting for court decisions and are not even using GitHub Copilot. There is even a growing business in analyzing binaries and source code to determine whether they contain GPL code.
smallerize 22 days ago [-]
In the USA, code generated by a computer cannot be copyrighted. So you can use it for commercial purposes, but you can't control it the way you could with code that you wrote yourself. And that's legally fine, but your company's legal department might not like that idea.
lodovic 22 days ago [-]
That's not entirely accurate. In the US, computer-generated code can be copyrighted. The key point is that copyright protection extends to the original expression in the code, but not to its functional aspects, such as algorithms, system design, or logic.
kittikitti 21 days ago [-]
This person has no idea what they're talking about. "Code generated by a computer"?
smallerize 21 days ago [-]
"works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author"
https://arstechnica.com/information-technology/2023/02/us-co...