Is there a terminology battle happening in some circles? And if so, what are the consequences of being wrong and using the wrong terminology?
I follow the R&D and progress in this space and I haven't heard anyone make a fuss about it. They are all LLMs or transformers or neural nets, but they can be trained or optimized to do different things. For sure, there are terms like Reasoning models or Chat models or Instruct models, and yes, they're all LLMs.
But you can now start combining them to have hybrid models too. Are Omni models that handle audio and visual data still "language" models? This question is interesting in its own right for many reasons, but not as a way to justify or bemoan the use of the term LLM.
LLM is a good term, it's a cultural term too. If you start getting pedantic, you'll miss the bigger picture and possibly even the singularity ;)
bluejay2387 33 days ago [-]
So there is a language war going on in the industry, and some of it is justified and some of it isn't. Take 'agents' as an example. I have seen a case where a low-code/no-code service dropped an LLM node into a 10+ year old product, started calling themselves an 'agent platform', and jacked up their price by a large margin. This is probably a case where a debate as to what qualifies as an 'agent' is appropriate.
Alternatively, I have seen debates as to what counts as a 'Small Language Model' that probably are nonsensical, particularly because in my personal language war the term 'small language model' shouldn't even exist (no one knows what the threshold is, and our 'small' language models are bigger than the 'large' language models from just a few years ago).
This is fairly typical of new technology. Marketing departments will constantly come up with new terms or try to take over existing ones to push agendas. Terms with defined meanings will get abused by casual participants and lose all real meaning. Individuals new to the field will latch on to popular misuses of terms as they try to figure out what everyone is talking about, and perpetuate definition creep. Old hands will overly focus on hair-splitting exercises that no one else really cares about and sigh in dismay as their carefully cultivated taxonomies collapse under the expansion of interest in their field.
It will all work itself out in 10 years or so.
BoiledCabbage 33 days ago [-]
There is a reason why cars and computers are sold with specs. 0-60 time, fuel efficiency...
People need to know the performance they can expect from LLMs or agents. What are they capable of?
graypegg 33 days ago [-]
A 2009 Honda Civic can get an under-5-second 0-60 easily... however it does involve a high cliff.
Result specs (as in measuring output/experimental results) need strict definitions to be useful, and I think the current ones we have for LLMs are pretty weak (mostly benchmarks that model one kind of interaction, and usually not any sort of useful interaction).
Well, I don't see why we need to mangle the jargon. "Language model" has an old meaning from NLP (which still applies): a computer model of language itself, most commonly a joint probability distribution over words or sequences of words, which is what LLMs are too. Prompted replies are literally conditional distributions, conditioned on the context you give it. "Foundation model" is a more general term I see a lot.
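To make that concrete, here's a minimal sketch of what "conditional distribution" means in practice. It assumes the Hugging Face transformers library and uses GPT-2 purely as a small stand-in; any causal LM exposes the same interface:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # GPT-2 only because it is small; the point is generic.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # The "prompt" is just the conditioning context.
    context = "The capital of France is"
    input_ids = tokenizer(context, return_tensors="pt").input_ids

    with torch.no_grad():
        logits = model(input_ids).logits      # (1, seq_len, vocab_size)

    # P(next token | context): a conditional distribution over the vocabulary.
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top_probs, top_ids = next_token_probs.topk(5)
    for p, i in zip(top_probs, top_ids):
        print(f"{tokenizer.decode(int(i))!r}: {p.item():.3f}")

Generation is just repeated sampling from that distribution; "reasoning" or "instruct" training changes what the distribution looks like, not what kind of object it is.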
To say a model is "just an LLM" is presumably to complain that it has no added bells or whistles that someone thinks are required beyond the above statistical model. And maybe I missed the point, but the author seems to be saying "yes, it's just an LLM, but LLMs are all you need".
janalsncm 32 days ago [-]
There was an HN thread that talked about how "just" is a four-letter word. Using it significantly risks underestimating emergent properties and behaviors.
Every time you see “X is just Y” you should think of emergent behaviors. Complexity is difficult to predict.
> R1 Zero has similar reasoning capabilities of R1 without requiring any SFT
In fact, R1 Zero was slightly better. This is an argument that RL and thinking tokens were a genuinely useful technique, which I see as counter to the author's thesis.
I also think a lot of what the author is referring to was more generously arguing against next token prediction (exact match of an answer) rather than the sequence-level rewards in R1.
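As a toy illustration of that distinction (a sketch only, not DeepSeek's actual GRPO recipe; the tensors and the "verifier" below are made up), next-token training penalizes every position that doesn't match a reference, while a sequence-level reward scores the whole sampled completion at once:

    import torch
    import torch.nn.functional as F

    vocab_size = 8
    # Toy stand-ins: logits the model produced for a 5-token completion.
    logits = torch.randn(5, vocab_size, requires_grad=True)
    reference = torch.tensor([1, 4, 2, 7, 0])  # "correct" tokens

    # (a) Next-token prediction (SFT-style): every position is scored
    # against the reference token at that position.
    sft_loss = F.cross_entropy(logits, reference)

    # (b) Sequence-level RL (REINFORCE-style): sample a whole completion,
    # give it one scalar reward (e.g. "did the final answer check out?"),
    # and reinforce the log-probability of the entire sequence.
    probs = F.softmax(logits, dim=-1)
    sampled = torch.multinomial(probs, num_samples=1).squeeze(1)
    seq_log_prob = F.log_softmax(logits, dim=-1)[torch.arange(5), sampled].sum()
    reward = 1.0 if int(sampled[-1]) == int(reference[-1]) else 0.0  # toy verifier
    rl_loss = -reward * seq_log_prob

In (b) the intermediate tokens are free to wander (that's where thinking tokens live); only the outcome is judged, which is the sequence-level part that R1's rule-based rewards build on.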
“The architecture of the DeepSeek SYSTEM includes a model, and RL architecture that leverages symbolic rule.”
Marcus has long been a critic of deep learning and LLMs, saying they would “hit a wall”.
throwaway314155 33 days ago [-]
> They say: “the progresses we are seeing are due to the fact that models like OpenAI o1 or DeepSeek R1 are not just LLMs”.
Would be nice if the author could cite even one example of this as it doesn't match my experience whatsoever.
tucnak 32 days ago [-]
Your experience doesn't include LeCun, Chollet, et al.?
throwaway314155 32 days ago [-]
It doesn't. This is particularly tough to search for and I'm not on social media. I'd be surprised if LeCun somehow thought these reasoning models were architecturally unique from a good old LLM. It's all in the training regime, right?
In any case I'll take your word for it, but that's still surprising to me.
Some argue they were kind of tricked into thinking that; see https://www.interconnects.ai/p/openais-o1-using-search-was-a... and some other writing by Lambert which has turned out pretty much on-point as far as RL and verifiers are concerned.
* applies only to meatreaders