> 1 The tech industry has accidentally invented the initial stages a completely new kind of mind, based on completely unknown principles...
> 2) The intelligence illusion is in the mind of the user and not in the LLM itself.
I've felt as though there is something in between. Maybe:
3) The tech industry invented the initial stages of a kind of mind that, though it misses the mark, is approaching something not too dissimilar to how an aspect of human intelligence works.
> By using validation statements, … the chatbot and the psychic both give the impression of being able to make extremely specific answers, but those answers are in fact statistically generic.
"Mr. Geller, can you write some Python code for me to convert a 1-bit .bmp file to a hexadecimal string?"
Sorry, even if you think the underlying mechanisms have some sort of analog, there's real value in LLMs; not so with psychics doing "cold readings".
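(For what it's worth, here is a minimal sketch of the kind of answer that request expects, assuming "hexadecimal string" simply means the file's raw bytes; pulling only the 1-bit pixel array out of the BMP header would take a little more work, and the filename is a placeholder. Unlike a cold reading, the answer can be checked by running it.)

    # Minimal sketch: read a 1-bit .bmp and return its contents as a hex string.
    from pathlib import Path

    def bmp_to_hex(path: str) -> str:
        # Hex-encode the raw bytes of the file.
        return Path(path).read_bytes().hex()

    if __name__ == "__main__":
        print(bmp_to_hex("image.bmp"))  # "image.bmp" is a placeholder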
everdrive 34 days ago [-]
I think it's fair to argue that part of human intelligence is actually just statistical matching. The best example which comes to mind is actually grammar. Grammar has a very complex set of rules; however, most people cannot really name them or describe them accurately; instead, they just know whether or not a sentence _sounds_ correct. This feels a lot like the same statistical matching performed by LLMs. An individual's reasoning iterates in their mind over which words follow each other and which phrases are likely.
Outside of grammar, you can hear a lot of this when people talk; their sentences wander, and they don't always seem to know ahead of time where their sentences will end up. They start anchored to a thought, and seem to hope that the correct words end up falling into place.
Now, does all thought work like this? Definitely not, and more importantly, there are many other facets of thought which are not present in LLMs. When someone has wandered badly when trying to get a sentence out, they are often also able to introspect and see that they failed to articulate their thought. They can also slow down their speaking, or pause, and plan out ahead of time; in effect, using this same introspection to prevent themselves from speaking poorly in the first place. Of course there's also memory, consciousness, and all sorts of other facets of intelligence.
What I'm on the fence about is whether this point, or your point actually detracts from the author's argument.
mock-possum 33 days ago [-]
Recognizing a pattern and reproducing it is a huge part of the human experience, and that's the main thing that's intelligent-like about LLMs. A lot of the time they lack context/cohesion, and they're at a disadvantage for not being able to follow normal human social cues to course correct, as you point out.
ianbicking 34 days ago [-]
Yeah, the basic premise is off because LLM responses are regularly tested against ground truth (like running the code they produce), and LLMs don't get to carefully select what requests they fulfill. To the contrary they fulfill requests even when they are objectively incapable of answering correctly, such as incomplete or impossible questions.
I do think there is a degree of mentalist-like behavior that happens, maybe especially because of the RLHF step, where the LLM is encouraged to respond in ways that seem more truthful or compelling than is justified by its ability. We appreciate the LLM bestowing confidence on us, and rank an answer more highly if it gives us that confidence... not unlike the person who goes to a spiritualist wanting to receive comforting news of a loved one who has passed. It's an important attribute of LLMs to be aware of, but not the complete explanation the author is looking for.
pockmarked19 33 days ago [-]
Which aspect of how a human leg works is a truck tire similar to?
Terr_ 34 days ago [-]
There's another illusory effect here: Humans are being encouraged to confuse a fictional character with the real-world "author" system.
I can create a mad-libs program which dynamically reassembles stories involving a kind and compassionate Santa Claus, but that does not mean the program shares those qualities. I have not digitally reified the spirit of Christmas, not even if excited human kids contribute some of the words that shape its direction and clap with glee.
P.S.: This "LLM just makes document bigger" framing is also very useful understanding how prompt injection and hallucinations are constant core behaviors, which we just ignore except when they inconvenience us The assistant-bot in the story can be twisted or vanish so abruptly because it's just something in a digital daydream.
bloomingkales 34 days ago [-]
And only to your eyes and those you force your vision onto. The rest of the universe never sees it. You don’t exist to much of the universe (if a tree falls and no one is around to hear it, you understand what I mean).
So you simultaneously exist and don’t exist. Sorry about this, your post took me on this tangent.
GuB-42 34 days ago [-]
"Do LLMs think?" is a false problem outside of the field of philosophy.
The real question that gets billions invested is "Is it useful?".
If the "con artist" solves my problem, that's fine by me. It is like having a mentalist tell me "I see that you are having a leaky faucet and I see your future in a hardware store buying a 25mm gasket and teflon tape...". In the end, I will have my leak fixed and that's what I wanted, who care how it got to it?
lukev 34 days ago [-]
I don't disagree that "is it useful" is the important question.
The amount of money being invested is very clearly disproportionate to the current utility, and much of it is obviously based on the premise that LLMs can (or will soon) be able to think.
edanm 34 days ago [-]
> The amount of money being invested is very clearly disproportionate to the current utility,
I don't think this is so clear. At least, if by "current utility" you also include potential future utility even without any advance in the underlying models.
Most of the money invested in the 2000 bubble was lost, but that didn't mean the utility of the internet was overblown.
lukev 33 days ago [-]
Well, I find myself forced to agree with you there. I think these models are tremendously useful for data processing, and that chasing "reasoning" and the production of artifacts for human consumption are entirely red herrings.
everdrive 34 days ago [-]
I think at best, there's a wide gulf between what LLMs can actually do, and what people _believe_ they can do. This is not to say that LLMs are not useful, but just that people broadly and regularly misunderstand their usefulness. This is not necessarily the fault of LLMs, but it does render their usage and adoption a bit problematic.
seunosewa 32 days ago [-]
I think that people figure out what LLMs are capable of doing well pretty quickly after they start using them. I'd say that their capabilities are underestimated by many people who once tried to use non-reasoning LLMs to do things that reasoning LLMs can do very well today.
yannyu 34 days ago [-]
It has to be both useful and economical. If that answer cost $2 to get and a search plus a YouTube video would have been just as effective and much cheaper, then it's possible that the new way to get the answer isn't significantly better than the old way.
The graveyards of startups are littered with economically infeasible solutions to problems.
34 days ago [-]
cratermoon 34 days ago [-]
So far it seems LLM-based systems are still reaching for a use.
crummy 34 days ago [-]
Does copilot not count?
cratermoon 33 days ago [-]
If copilot is so good, why does MS/Github keep pushing it on people who haven't asked for it and don't want it?
They're now giving it away for free, just to be able to put it everywhere.
fragmede 33 days ago [-]
The original grocery store that invented the shopping cart had to hire actors to pretend to use them before customers "got it".
https://www.afstores.com/the-intriguing-story-of-how-the-sho...
or if those sources aren't your cup of tea, how about Fox News
https://www.foxnews.com/lifestyle/meet-american-invented-sho...
> The public — naturally — hated the idea.
I wouldn't read too much into customers knowing what they want.
I don't think trial versions of software reek of desperation. They are pushing trials because they think you'll like it and sign up for the paid version.
>LLMs <snip> do not meaningfully share any of the mechanisms that animals or people use to reason or think.
This seems to be a hard assumption the entire post, and many other similar ones, rely upon. But how do you know how people think or reason? How do you know human intelligence is not an illusion? Decades of research were unable to answer this. Now when LLMs are everywhere, suddenly everybody is an expert in human thinking with extremely strong opinions. To my vague intuition (based on understanding of how LLMs work) it's absolutely obvious they do share at least some fundamental mechanisms, regardless of vast low-level architecture/training differences. The entire discussion on whether it's real intelligence or not is based on ill-defined terms like "intelligence", so we can keep going in circles with it.
By the way, OpenAI does nothing of this, see [1]:
>artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work
Neither do others. So the author describes a "tech industry" unknown to me.
> The field of AI research has a reputation for disregarding the value of other fields [...] It’s likely that, being unaware of much of the research in psychology on cognitive biases or how a psychic’s con works, they stumbled into a mechanism and made chatbots that fooled many of the chatbot makers themselves.
Just because cognitive scientists don't know everything about how intelligence works (or on what it is) doesn't mean that they know nothing. There has been a lot of progress in cognitive science, in the last decade in particular on reasoning.
> based on ill-defined terms like "intelligence".
The whole discussion is about "artificial intelligence". Arguably AI researchers ought to have a fairly well defined stance of what "intelligence" means and can't use a trick like "nobody knows what intelligence is" to escape criticism.
orbital-decay 34 days ago [-]
>Just because cognitive scientists don't know everything about how intelligence works (or on what it is) doesn't mean that they know nothing.
The author's claim is pretty strong: that human intelligence and what is called GenAI have no common ground at all. This is untrue at least intuitively for the entire field of ML. "Intuitively" because it cannot be proven or disproven until you know exactly what human intelligence is, or whatever the author means by thinking or reasoning.
>Arguably AI researchers ought to have a fairly well defined stance of what "intelligence" means and can't use a trick like "nobody knows what intelligence is" to escape criticism.
If you don't formalize your definition, the discussion can be easily escaped by either side by simply moving along the abstraction tree. The tiresome stochastic parrot/token predictor argument is about trees while human intelligence is discussed in terms of a forest. And if you do formalize it, it's possible to discover that human intelligence is not what it seems either. I'm not even starting on the difference between individual intelligence, collective intelligence, and biological evolution; it's not easy to define where one ends and another begins.
AI researchers mainly focus on usefulness (see the definition above). The proof is in the pudding. Philosophical discussions are fine to have, but pretty meaningless at best and designed to support a narrative at worst.
aaplok 34 days ago [-]
>Author's claim is pretty strong: that human intelligence and what is called GenAI have no common ground at all.
I feel that you're being unfair to the author here; the quote you responded to in your GP post alluded to "reason or think", and their argument is that LLMs don't. This is more specific than the sweeping statement you attribute to them.
> AI researchers mainly focus on usefulness (see the definition above).
And usefulness is something the article doesn't touch on, I think? The point of the article is that some users attribute capabilities to LLMs that can be explained with the same mechanisms as the capabilities they attribute to psychics (which are well understood to be nonexistent).
> Philosophical discussions are fine to have, but pretty meaningless at best and designed to support a narrative at worst.
Why this is relevant to AI research is (in my interpretation) that it is known to be hard for humans collectively to evaluate how intelligent an entity really is. We are easily fooled into seeing "intelligence" where it is not. This is something that cognitive scientists have spent a lot of time thinking about, and may perhaps be able to comment on.
For what it's worth, I've seen cognitive scientists use AI to do cool stuff. I remember seeing someone who was using AI to show that it is possible to build inference without language (I am speaking from memory here; it was a while ago and I sadly lost the reference, so I hope I'm not distorting it too much). She was certainly not claiming that her experiments showed how intelligence worked, but only that they pointed to the fact that language does not have to be a prerequisite for building inferences. Interesting stuff, though without the sensationalism that sometimes accompanies the advocacy of LLMs.
BehindBlueEyes 30 days ago [-]
> The field of AI research has a reputation for disregarding the value of other fields [...] It’s likely that, being unaware of much of the research in psychology on cognitive biases or how a psychic’s con works
Is the reputation warranted? Just a US thing? Or maybe the question is "since when did this change?", because in the mid-2000s in France at least, LLM research was led by cognitive psychology professors who dabbled in programming or had partnerships with a nearby technical university.
aaplok 30 days ago [-]
I am not based in the US, nor in France, so I can't say what the situation is there. I was only pointing out that quote to GP.
My experience isn't much, since I am neither doing AI nor cognitive science but I have seen cognitive scientists do cool stuff with AI as a means to study cognitive science, and I have seen CS researchers involving themselves into the world of AI with varying success.
I would not be as emphatic as the author of the article, but I would say that a good portion of AI research lost focus on what intelligence is and instead just aimed at getting computers to perform various tasks. Which is completely fine, until these researchers start claiming that they have produced intelligent machines (a manifestation of the Dunning-Kruger effect).
mewpmewp2 34 days ago [-]
As far as I know the best definition of intelligence is "ability to solve problems".
nurettin 34 days ago [-]
The best we've got is "produce some speech that may contain the answer, BUT craft the question in a way that is more likely to generate an answer".
aaplok 34 days ago [-]
best definition according to whom?
There is a lot of work to define intelligence, both in the context of cognitive science and in the context of AI [0].
I haven't spent enough time looking for a good review article, but for example, this article [1] says this: "Intelligence in the strict sense is the ability to know with conscience. Knowing with conscience implies awareness." and contrasts it with this: "Intelligence, in a broad sense, is the ability to process information. This can be applied to plants, machines, cells, etc. It does not imply knowledge." If you are interested in the topic the whole article is interesting and worth reading.
> Intelligence in the strict sense is the ability to know with conscience. Knowing with conscience implies awareness.
This one I would disagree with.
> Intelligence, in a broad sense, is the ability to process information. This can be applied to plants, machines, cells, etc. It does not imply knowledge.
This one starts with one part of what I consider intelligence, but besides processing it also needs to be able to use the information to solve problems. Because you could process information, which everything in the World actually does technically, but you would not be using that information to do anything.
So ultimately, maybe it would be best to define it as the ability to take in and use information to solve some sort of problem.
swaraj 34 days ago [-]
You should try the arc agi puzzles yourself, and then tell me you think these things aren't intelligent
I wouldn't say it's full agi or anything yet, but these things can definitely think in a very broad sense of the word
gessha 34 days ago [-]
[Back in the 1980s]
You should try to play chess yourself, and then tell me you think these things aren't intelligent.
jhbadger 34 days ago [-]
While I agree that we should be skeptical about the reasoning capabilities of LLMs, comparing them to chess programs misses the point. Chess programs were specifically created to play chess. That's all they could do. They couldn't generalize and play other board games, even related games like Shogi and Xiangqi, the Japanese and Chinese versions of chess. LLMs are amazing at being able to do things they never were programmed to do simply by accident.
gessha 34 days ago [-]
Are they though? They’ve been shown to generalize poorly to tasks where you switch up some of the content.
jhbadger 32 days ago [-]
Here's an example. I'm interested in obscure conlangs like Volapük. I can feed an LLM (which had no idea what Volapük was) an English-language grammar of Volapük and suddenly it can translate to and from the language. That couldn't work with a chess program. I couldn't give it a rule book of Shogi and have it play that.
Apologies, I was a bit curt because this is a well-worn interaction pattern.
I don't mean anything by the following either, other than that the goalposts have moved:
- This doesn't say anything about generalization, nor does it claim to.
- The occurrences of the prefix general* refer to "Can fine-tuning with synthetic logical reasoning tasks improve the general abilities of LLMs?"
- This specific suggestion was accomplished publicly to some acclaim in September
- To wit, the benchmark the article is centered around hasn't been updated since September, because the preview of the large model accomplishing that blew it out of the water, 33% on all at the time, 71%: https://huggingface.co/spaces/allenai/ZebraLogic
- these aren't supposed to be easy, they're constraint satisfaction problems, which they point out are used on the LSAT
- The major other form of this argument is the Apple paper, which shows a 5 point drop from 87% to 82% on a home-cooked model
daveguy 34 days ago [-]
LLMs don't do too well on those ARC-AGI problems. Even though they're pretty easy for a person.
Let me know when they can perform that well without a 300-shot. Or that well on unseen ARC-AGI-2.
aoeusnth1 34 days ago [-]
Two years give or take 6 months.
mrbungie 34 days ago [-]
Give a group of "average human" two years, give or take 6 months, and they will also saturate the benchmark and probably some humans would beat the SOTA LLM/RLM.
People tend to do so all the time, with games for example.
aoeusnth1 34 days ago [-]
Average humans cannot be copy-pasted.
daveguy 32 days ago [-]
Average companies also don't pay humans to complete a benchmark consisting of a fixed set of problems.
refulgentis 34 days ago [-]
Done (link says 6 samples?)
daveguy 33 days ago [-]
> OpenAI shared they trained the o3 we tested on 75% of the Public Training set.
I'm talking transfer learning and generalization. A human who has never seen the problem set can be told the rules of the problem domain and then get 85+% on the rest. o3 high compute requires 300 examples using SFT to perform similarly. An impressive feat, but obviously not enough to just give an agent instructions and let it go. 300 examples for human level performance on the specific task, but that's still impressive compared to SOTA 2 years ago. It will be interesting to see performance on ARC-AGI-2.
33 days ago [-]
dosinga 34 days ago [-]
This feels rather forced. The article seems to claim both that LLMs don't actually work (it is all an illusion) and that of course the LLMs know everything, having stolen all our work from the last 20 years by scraping the internet and underpaying people to produce content. If it was a con, it wouldn't have to do that. Or in other words, if you had a psychic who actually memorized all biographies of all people ever, they wouldn't need their cons
pona-a 34 days ago [-]
Why would it have to be one or the other? Yes, it's been proven that LLMs do create world models; how good they are is a separate matter. There could still be goal misalignment, especially when it comes to RLHF.
If the model's internal world model contains the knowledge that it likely does not know how to solve a coding question, but the RLHF stage has reviewers rate refusals lower, that would in turn force its hand toward the tricks it knows it can pull based on its model of human reviewers. It can only implement the surface-level boilerplate and pass that off as a solution, write its code in APL to obfuscate its lack of understanding, or keep misinterpreting the problem into a simpler one.
A psychic who had read ten thousand biographies might start to recall them, or he might interpolate the blanks with a generous dose of BS, or more likely do both in equal measure.
dosinga 34 days ago [-]
Thanks. That is a good point. The RLHF phase indeed might "force" the LLM to adopt con artist tricks and probably does.
pama 34 days ago [-]
This is from 2023 and is clearly dated. It is mildly interesting to notice how quickly things changed since then. Nowadays models can solve original math puzzles much of the time and it is harder to argue they cannot reason when we have access to R1, o1, and o3-mini.
Terr_ 34 days ago [-]
> Nowadays models can solve original math puzzles much of the time
Isn't that usually by not even trying, and delegating the work to regular programs?
bbor 34 days ago [-]
In what way is your mathematical talent truly you, but a python tool called by an LLM-centric agent not truly that agent?
Terr_ 34 days ago [-]
For starters, it means you should not take the success of the math and ascribe it to an advance in the LLM, or whatever phrase is actually being used to describe the new fancy target of hype and investment.
An LLM is, at best, a possible future component of the speculative future being sold today.
How might future generations visualize this? I'm imagining some ancient Greeks, who have invented an inefficient reciprocating pump, which they declare is a heart and that means they've basically built a person. (At the time, many believed the brain was just there to cool the blood.) Look! The fluid being pumped can move a lever: It's waving to us.
bbor 34 days ago [-]
Interesting metaphor, but I'm not sure you're fully appreciating the hypothetical. The agent didn't just seem like it was going to solve a math problem; it did.
Before intuitive computing, the best we could do with word problems was Wolfram-esque regex stuff, which I’m guessing we all know was quite error-prone. Now, we have agents that can take quite vague word problems and use any sequence of KB/web searches, python programs, and further intuitive reasoning steps to arrive at the requested answer. That’s pretty impressive, and I don’t think “well technically it relies on tools” makes it less impressive! Something that wasn’t possible yesterday is possible today; that alone matters.
Re:general skepticism, I’ve given up on convincing people that AGI is close, so all ill say is “hedge your bets” ;)
34 days ago [-]
34 days ago [-]
tmnvdb 34 days ago [-]
I'm amazed people are upvoting this piece which does not grapple with any of the real issues in a serious way. I guess some folks just really want AI to go away and are longing to hear that it is just all newfangled nonsense from the city slickers!
grayhatter 34 days ago [-]
how come you elected not to enumerate *any* of the real issues you expected?
ImPostingOnHN 34 days ago [-]
looks like nobody asked
if you wish for GP to do that, ask them to do that
EagnaIonat 34 days ago [-]
I was hoping it was talking about how it can resonate with users using those techniques. Or some experiments to prove the point. But it is not even that.
There is nothing of substance in this and it feels like the author has a grudge against LLMs.
thinkingemote 34 days ago [-]
Interesting. Resonating with people is what a mentalist does. More generically, a good bedside manner is what a good doctor does, and it's the same thing. Many times we like being cold read and comforted and comfortable!
Agreed on the experiments. What would they look like? Can a chat bot give the same info without any bedside manner?
manmal 34 days ago [-]
Well they have a book to sell, at the bottom of the article.
bbor 34 days ago [-]
Absolutely -- there is a whole cottage industry of people who know that LLMs are big news and decide to chime in, without ever considering that maybe this is a scientific discussion, not a cultural/"take-driven" one. Bluesky in particular is probably their biggest source of income!
You can spot them easily, because instead of critiquing some specific thing and sticking to it, they can't resist throwing in "obviously, LLMs are all 100% useless and anyone who says otherwise is a Tech Bro" somewhere. Like:
> completely unknown processes that have no parallel in the biological world.
c'mon... Anyone who knows a tiny bit about ML knows that both of those claims are just absurdly off base.
s1mplicissimus 34 days ago [-]
[flagged]
EagnaIonat 34 days ago [-]
Well happy to hear your conclusions from reading the article.
34 days ago [-]
karmakaze 34 days ago [-]
AlphaGo also doesn't reason. That doesn't mean it can't do things that humans do by reasoning. It doesn't make sense to make these comparisons. It's like saying that planes don't really fly because they aren't flapping their wings.
Edit: Don't conflate mechanisms with capabilities.
psytrancefan 34 days ago [-]
At this point I think it is because of a type of religious sentimentality about the sanctity of human reasoning.
To take your analogy even further, it is like asking when the plane is going to improve enough that it can really fly by flapping its wings.
throwaway87543 34 days ago [-]
Some people seem to want to believe that true thought is spiritual in nature. They will never accept that something physical and made by man could do it. They would stop believing humans are intelligent if given conclusive proof of how the brain works.
viach 34 days ago [-]
> 1 The tech industry has accidentally invented the initial stages a completely new kind of mind, based on completely unknown principles...
> 2) The intelligence illusion is in the mind of the user and not in the LLM itself.
3) The intelligence of the users is an illusion too?
scandox 34 days ago [-]
You're right! AI makes us ask really important questions about our own intelligence. I think it will lead to greater recognition that we are first and foremost animals: creatures of intention and action. I think we've put way too much emphasis on our intellectual dimension in the last few hundred years. To the point that some people started to believe that was what we are.
viach 34 days ago [-]
Yup. And the real danger of AI is not that it enslaves humans but that it will bring great disillusionment and existential crisis.
Someone should write a blog post about this to warn humanity.
ImPostingOnHN 34 days ago [-]
the real danger isn't that it enslaves humans, but that it makes most humans useless to those with capital and robots and AI
at that point, capital can tell most humans to just go away and die, and can use their technology to protect themselves in the meantime
Earw0rm 34 days ago [-]
Perhaps we're confusing intelligence with awareness.
What LLMs seem to emulate surprisingly well is something like a person's internal monologue, which is part of but not the whole of our mind.
It's as if it has the ability to talk to itself extremely quickly and while plugged directly into ~all of the written information humanity has ever produced, and what we see is the output of that hidden, verbally-reasoned conversation.
Something like that could be called intelligent, in terms of its ability to manipulate symbols and rearrange information, without having even a flicker of awareness, and entirely lacking the ability to synthesise new knowledge based on an intuitive or systemic understanding of a domain, as opposed to a complete verbal description of said domain.
Or to put it another way - it can be intelligent in terms of its utility, without possessing even an ounce of conscious awareness or understanding.
jbay808 34 days ago [-]
I was interested in this question so I trained NanoGPT from scratch to sort lists of random numbers. It didn't take long to succeed with arbitrary reliability, even given only an infinitesimal fraction of the space of random and sorted lists as training data. Since I can evaluate the correctness of a sort arbitrarily, I could be certain that I wasn't projecting my own beliefs onto its response, and reading more into the output than was actually there.
That settled this question for me.
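(For readers who want to reproduce something like this, here is a rough sketch of how such training text might be generated; the format, list lengths, and value range below are assumptions, not the commenter's actual setup. The model only ever sees plain text, and the correctness of its output can then be checked directly against sorted().)

    import random

    # Generate "unsorted -> sorted" examples as plain text for next-token training.
    def make_example(max_len=8, max_val=99):
        xs = [random.randint(0, max_val) for _ in range(random.randint(2, max_len))]
        return f"sort {' '.join(map(str, xs))} -> {' '.join(map(str, sorted(xs)))}\n"

    with open("sort_train.txt", "w") as f:
        for _ in range(100_000):
            f.write(make_example())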
dartos 34 days ago [-]
I don’t really understand what you’re testing for?
Language, as a problem, doesn’t have a discrete solution like the question of whether a list is sorted or not.
Seems weird to compare one to the other, unless I’m misunderstanding something.
What’s more, the entire notion of a sorted list was provided to the LLM by how you organized your training data.
I don’t know the details of your experiment, but did you note whether the lists were sorted ascended or descended?
Did you compare which kind of sorting was most common in the output and in the training set?
Your bias might have snuck in without you knowing.
jbay808 34 days ago [-]
> I don’t really understand what you’re testing for?
For this hypothesis: The intelligence illusion is in the mind of the user and not in the LLM itself.
And yes, the notion was provided by the training data. It indeed had to learn that notion from the data, rather than parrot memorized lists or excerpts from the training set, because the problem space is too vast and the training set too small to brute force it.
The output lists were sorted in ascending order, the same way that I generated them for the training data. The sortedness is directly verifiable without me reading between the lines to infer something that isn't really there.
IshKebab 34 days ago [-]
A large number of commenters are under the illusion that LLMs are "just" stochastic parrots and can't generalise to inputs not seen in their training data. He was proving that that isn't the case.
dartos 34 days ago [-]
Not saying I disagree with the thesis, but I don’t think this proves anything.
If every pair of digits appears sorted in the dataset, then that could still be “just” a stochastic parrot.
I’m kind of interested to see if an LLM can sort when the dataset specifically omits comparisons between certain pairs of numbers.
Also I don’t think OC was responding to commenters, but the article
jbay808 34 days ago [-]
It might seem like you could sort with just pairwise correlations, but on closer analysis, you cannot. Generating the next correct token requires correctly weighing the entire context window.
dartos 34 days ago [-]
Of course, that’s how attention works, after all.
But by specifically avoiding certain cases, we could verify whether the model is generalizing or not.
jbay808 34 days ago [-]
I mean that needing to scan the full context of tokens before the nth is inherent to the problem of sorting. Transformers do scan that input, which is good; it's not surprising that they're up to the task. But pairwise numeral correlations will not do the job.
As for avoiding certain cases, that could be done to some extent. But remember that the untrained transformer has no preconception of numbers or ordering (it doesn't use the hardware ALU or integer data type) so there has to be enough data in the training set to learn 0<1<2<3<4<5<6, etc.
dartos 34 days ago [-]
> there has to be enough data in the training set to learn 0<1<2<3<4<5<6
This is the kind of thing I’d want it to generalize.
If I avoid having 2 and 6 in the same unsorted list in the training set, will sets containing those numbers be correctly sorted in the same list in the test set, and at the same rate as other lists?
My intuition is that, yes, it would. But it’d be nice to see and would be a clear demonstration of the ability to generalize at all.
tossandthrow 34 days ago [-]
The commenter is merely saying that LLMs indeed are able to approximate arbitrary functions, exemplified here through sorting.
It is nothing new and has been well established in the literature since the 90s.
The shared article really is not worth the read and mostly reveals an author who does not know what he writes about.
dartos 34 days ago [-]
You’re talking specifically about perceptrons and feed forward neural networks.
LLMs didn’t exist in then. Attention only came out in 2017…
tossandthrow 34 days ago [-]
Yes? Are you saying that attention is less expressive?
dartos 33 days ago [-]
I’m saying that LLMs (models trained on language specifically) are not automatically capable of the same generic function solving.
The network itself can be trained to solve most functions (or all, I forget precisely if NNs can solve all functions)
But the language model is not necessarily capable of solving all functions, because it was already trained on language.
manmal 34 days ago [-]
Have you considered that the nature of numeric characters is just so predictable that they can be sorted without actually understanding their numerical value?
jbay808 34 days ago [-]
Can you say more precisely what you mean?
manmal 34 days ago [-]
I mean that maybe gradient descent is a passable sorting algorithm, once the weights have been learned to properly describe ordering. It may be a speciality of transformers that they can sort things well. Which wouldn’t tell us that much about whether they are mentalists or not.
twobitshifter 34 days ago [-]
Lost me here - “LLMs are not brains and do not meaningfully share any of the mechanisms that animals or people use to reason or think.“
“the initial stages a completely new kind of mind, based on completely unknown principles, using completely unknown processes that have no parallel in the biological world.”
We just call it a neural network because we wanted to confuse biology with math for the hell of it?
“There is no reason to believe that it thinks or reasons—indeed, every AI researcher and vendor to date has repeatedly emphasised that these models don’t think.”
I don’t understand the denialism behind replicating minds and thoughts with technology - that had been the entire point from the start.
exclipy 34 days ago [-]
Yeah I was expecting the article to give an argument to back up this claim by talking about the mechanisms behind LLMs and the mechanisms behind human thought and demonstrating a lack of overlap.
But I don't see any discussion of multilayer perceptrons or multi-head attention.
Instead, the rest of the article is just saying "it's a con" with a lot of words.
cratermoon 34 days ago [-]
I've never gotten a good answer to my question regarding why OpenAI chose a chat UI for their GPT, but this article comes closest to explaining it.
habitue 34 days ago [-]
This kind of "LLMs don't really do anything, it's all a trick" / "they're stochastic parrots" argument was kind of maybe defensible a year and a half ago. At this point, if you're making these arguments you're willfuly ignorant of what is happening.
LLMs write code, today, that works. They solve hard PhD level questions, today.
There is no trick. If anything, it's clear they haven't found a trick and are mostly brute forcing the intelligence they have. They're using unbelievable amounts of compute and are getting close to human level. Clearly humans still have some tricks that LLMs dont have yet, but that doesn't diminish what they can objectively do.
habitue 34 days ago [-]
Apparently, this article was written almost exactly a year and a half ago so... I guess the author is forgiven!
ramesh31 34 days ago [-]
It feels like we are stuck in two different worlds with this stuff right now. One being the AI users that interface solely through app based things like ChatGPT, who have been burned over and over again by hallucinations or lack of context, to the point of disillusionment. The other world is the one where developers who are working with agentic systems built on frontier models right now are literally watching AGI materialize in front of us in real time. I think 2025 will be the year those worlds converge (for the better).
olddustytrail 34 days ago [-]
> One of the issues in during this research—one that has perplexed me—has been that many people are convinced that language models, or specifically chat-based language models, are intelligent.
Different people have different definitions of intelligence. Mine doesn't require thinking or any kind of sentience so I can consider LLMs to be intelligent simply because they provide intelligent seeming answers to questions.
If you have a different definition, then of course you will disagree.
It's not rocket science. Just agree on a definition beforehand.
fleshmonad 34 days ago [-]
Unfounded cope. And I know this will get me downvoted, as these arguments seem to be popular among the intellectuals on this glorious page.
The mechanism of intelligence is not understood. There isn't even a rigorous definition of what intelligence is. "All it does is combine parts it has seen in its training set to give an answer"; well, then the magic lies in how it knows which parts to combine, if one wants to go with this argument. Also, conveniently, the fact that we have millions of years of evolution behind us, plus exabytes of training data over the years in the form of different stimuli since birth, gets shoved under the rug. I don't want to say that the conclusion is necessarily wrong, but the argument is always bad. I know it is hard to come to terms with the thought that intelligence may be more fundamental in nature and not exclusively a capability of carbon-based life forms.
kelseyfrog 34 days ago [-]
Intelligence feels like a hard scientific concept, but scratch the surface and you find a circular definition: we measure it with tools we designed for the purpose, then declare it real because the tools say so. That’s affirming the consequent.
If intelligence were an objective property of the universe, we’d define it like mass or charge—quantifiable, invariant, fundamental. Instead, it shifts to match whatever we decide to measure. The instruments don’t quantify intelligence; they create it.
Why does an LLM have an incentive to provide an answer that sounds suitable regardless of quality? Is there a feedback loop? If I am asking for the first time without previous context, does it judge quality and suitability based on other users' interactions?
yetihehe 34 days ago [-]
> Why does an LLM have an incentive to provide an answer that sounds suitable regardless of quality? Is there a feedback loop?
Yes, that is how LLMs work. They are trained with feedback loops to answer plausibly.
casey2 33 days ago [-]
It's very telling that this author conflates thinking, reasoning and intelligence. It shows a heavy European bias. Since this article is biased and confused it can be easily dismissed with no further reasoning needed.
throw-qqqqq 26 days ago [-]
> It shows a heavy European bias
Can you please explain that a bit further? I don’t catch the connection you’re making between the conflation and being european.
casey2 33 days ago [-]
As an example, standards (and European) committees have a great deal of thinking and reasoning, but very little intelligence.
ripped_britches 34 days ago [-]
This article confuses conscious, felt experience with intelligence.
prideout 34 days ago [-]
I lost interest fairly quickly because the entire article seems to rely on a certain definition of "intelligent" that is not made clear in the beginning.
34 days ago [-]
zahlman 34 days ago [-]
Original title (too long for submission):
> The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic's con
dist-epoch 34 days ago [-]
Yeah, when I read about AI solving international math olympiad problems it's not intelligence, it's just me projecting my math skills upon the model.
> LLMs are a mathematical model of language tokens. You give a LLM text, and it will give you a mathematically plausible response to that text.
> The tech industry has accidentally invented the initial stages a completely new kind of mind, based on completely unknown principles, using completely unknown processes that have no parallel in the biological world.
Or maybe our mind is based on a bunch of mathematical tricks too.
pona-a 34 days ago [-]
> AI solving international math olympiad problems is not intelligence
But couldn't it be overfitting? LLMs are very good at deriving patterns, many of which humans simply can't tell apart from noise. With a few billion parameters and whatever black magic is going on inside CoT, it's not unreasonable to think even small amounts of fine-tuning combined with many epochs of training would be enough for it to conjure a compressed representation of that problem type.
Without an extensive audit, I'd be skeptical of OpenAI's claims, especially given how o1 is often wrong on much more trivial compositional questions.
What defines intelligence is generalization, the ability to learn new tasks from few examples, and while LLMs have made some significant progress here, they are still many orders of magnitude below a child and arguably even below many animals.
svachalek 34 days ago [-]
I suspect that's actually what's going on: LLMs are finding patterns that apply to their question and figuring out how to combine them in the correct way. However, I'd also say this is how the vast majority of humans solve math problems. What I've seen from o1/R1 is that they are more capable at this process than the average human, indeed more capable than the vast majority of humans.
We can say that they're not "intelligent" because they're not capable of solving problems they can't map to something in their training at all, but that would also put 99.9% of humanity in the unintelligent bucket.
dist-epoch 34 days ago [-]
A trained LLM can learn from a few examples.
A human takes 14+ years until it's intelligent, and also requires extensive training.
s1mplicissimus 34 days ago [-]
6 year olds can fairly reliably count the number of occurrences of a letter in a word, at least according to the school system I attended.
LLMs will never be able to do it due to their inherent limitations (being statistical next-word predictors)
kevin0091 34 days ago [-]
It is a calculation, not learning.
aoeusnth1 34 days ago [-]
the human, or the LLM?
vharuck 34 days ago [-]
>Or maybe our mind is based on a bunch of mathematical tricks too.
Some people used to push the theory that quantum probability was where free will and the soul reside. That is to say, people will imagine how the hard questions of old neatly fit into the hard questions of today. Nothing wrong with that; it's how we explore different paths and make progress. But I'm not one of those exploring experts, so I'll wait for stricter definitions and experimental data.
tomohelix 34 days ago [-]
On a bit of a tangent and hypothetical: what if we pooled enough resources together to do a training run that includes everything a human can experience? I am thinking of all the five senses and all the data that comes with them, e.g. books, movies, songs, recitals, landscapes, the wind brushing against the "skin", the pain of getting burned, the smell of coffee in the morning, the itchiness of a mosquito's bite, etc.
It is not impossible, I think; it just requires so much effort, talent, and funding that the last thing resembling such an endeavor was the Manhattan Project. But if it succeeded, the impact could rival or even exceed what nuclear power has done.
Or am I deluded and there is some sort of fundamental limit or restriction on the transformer that would completely prevent this from the start?
GuB-42 34 days ago [-]
But why would we do that even if we could? Making a very expensive machine act like a human is essentially useless, it is not like there is a shortage of humans on Earth. It wouldn't even be a great model of a human brain.
The reason we are doing all that is for its potential uses. Write letters, code, help customers, find information, etc... Even AGI is not about making artificial humans, it is about solving general problems (that's the "G").
And even if we could make artificial humans, there would be a philosophical problem. Since the idea is to make these AIs work for us, if we make these AIs as human-like as possible, isn't it slavery? It is like making artificial meat but insisting on making the meat-making machine conscious so that it can feel being slaughtered.
tomohelix 34 days ago [-]
Because right now the major reason people still deny that LLMs are "intelligent" is that they have no connection to or understanding of the things they are saying. You can make it say 1+1=2, but it inherently does not have a real concept of what is one thing and what are two things. Its neural network has just learned the weights that give the most statistically correct answer based on what it was modeled on, i.e. text.
So instead of training it that way, the network can potentially be trained to "perceive" or "model" the reality beyond the digital world. The only way we know or have enough experience and data to do so is through our own experience. An embodied AI is what I think is required for anything to actually grasp the real concepts, or at least as close as possible to them.
And without that inherent understanding, no matter how useful a model is, it will never be a "general" intelligence.
GuB-42 33 days ago [-]
It makes sense to have an embodied AI, i.e. a robot. Self driving cars count.
But it doesn't have to be modeled after humans. The purpose of humans, if we can call it that, is to make more of themselves, like all forms of life. That's not what we build robots for. We don't even give robots the physical abilities to do that. Giving them a human mind (assuming we could) would not be adequate. Wrong body, wrong purpose.
cratermoon 34 days ago [-]
How would you model embodiment and embodied experience?
tomohelix 34 days ago [-]
Sight and sound are quite obvious.
Taste and smell are matters of chemical composition. It will take an incredible effort, but something similar to a mass spectrometer can be used to detect every taste and smell we can think of and beyond. How fast and how efficient they can be is probably the main challenge.
Touch is difficult. We don't even know fully why or how an itch "works". But force, temperature, atmospheric, humidity sensors, etc. are widely available. They can provide a crude approximation, imo.
Just off the top of my head. I am sure smarter people can come up with much more suitable ways to "embody" a machine learning model.
cratermoon 33 days ago [-]
I suggest, before throwing around words like "obvious" and "crude approximation", reading some Martin Heidegger, Hubert Dreyfus, or Joseph Weizenbaum.
Half-baked attempts at mechanistic and reductionist implementations of embodiment are a dime a dozen.
selfhoster11 33 days ago [-]
I mean, sight and sound in large language models are obvious by now, in that rendering them into a token-based representation that an LLM can manipulate and learn from (however well it actually succeeds in picking up on the patterns - nothing about that is guaranteed, but the information is there theoretically) is currently a conceptually solved problem that will be gradually improved upon.
If reproducing the artifacts and failure modes of human modes of interpretation of this physical data (say, yanny/laurel, or optical illusions, or persistence of vision phenomena) is deemed important, that's another matter. If all that's required is a black-box understanding that is idiosyncratic to LLMs in particular, but where it's functionally good enough to be used as sight and hearing, then I don't see why it can't be called "solved" for most intents and purposes in six months' time.
I guess it boils down to this: do you want "sight" to mean "machine" sight or "human" sight. The latter is a hard problem, but I'd prefer to let machines be machines. It's less work, and gives us a brand-new cognitive lens to analyse what we observe, a truly alien perspective that might prove useful.
cratermoon 32 days ago [-]
This seems to give up on the GP comment's goal of "everything a human can experience" and create nothing more than a fancy Mechanical Turk.
selfhoster11 31 days ago [-]
If the goal is to build a human experience simulator that reacts in the same ways as a human would, then you can't just collect the sensory data; you need to gather data on how humans react (or have the model learn unsupervised from recorded footage of humans exposed to these stimuli). Unless maybe it's good enough to learn associations from literature and poetry.
No matter how you build it, it is still experiencing everything a human can experience. There's just no guarantee it would react the same way to the same stimuli. It would react in its own idiosyncratic way that might both overlap and contrast with a human experience.
A more "human" experience simulator would paradoxically be more and less authentic at the same time - more authentic in showing a human-style reaction, but at the cost of erasure of model's own emergent ones.
s1mplicissimus 34 days ago [-]
Is there anything except sensory input that you assume part of the embodied experience? What would that be?
Apart from that, I'm afraid that at this point research on sensory input apart from audio and visual needs much more advancement. For example, it's not clear to me what kind of data structure would be a good fit for olfactory or sensory training data
tomohelix 34 days ago [-]
As mentioned above, olfactory data can be just chemical fingerprints. Mass spectrometers already do this and provide very distinct signals for every chemical component.
Touch and such can have some approximation done through various sensors like temperature, force, humidity, electromagnetic, etc.
s1mplicissimus 34 days ago [-]
Sure you can punch in a chemical fingerprint for say the smell of a specific type of rose.
Maybe it doesn't matter for the learning process that in an equivalent human experience it was preceded by someone making you a compliment a couple of minutes earlier, or that it was combined with all the other chemical fingerprints present at the moment: maybe it just rained shortly before and there's a slight smell of wet earth in the air, or someone smoking a cigarette walked by and there are minimal leftovers of that, or the window wasn't opened for a couple of hours and everything has a slight tint of "used air" to it, which might add a factor of dampened learning, which might be necessary for the specific learning process to happen slowly enough to sink in properly, etc...
Don't get me wrong I would be curious to see such research done to see whether it would improve anything above the stochastic parrot level - it's just going to take a while to figure out what is even relevant
tomohelix 34 days ago [-]
I think those factors you mentioned are important but ultimately additional context to the main data, i.e. "rose smell". They certainly can add additional meanings and alter how the main data is processed, but they are just "context" added on, just like how the word "lie" is very context dependent and by itself it is nigh impossible to know what it means. Is it a verb? noun? Which verb, lying to someone or lie on a couch?
But an LLM has no problem at all deciphering and processing and most importantly, responding meaningfully to all the ways we can use or encounter the word "lie". I contend that if a model large enough is trained on enough data, the concepts will automatically blend and explain each other sufficiently, or at least enough to cover your example and those similar to it.
34 days ago [-]
Centigonal 33 days ago [-]
My hot take is that "consciousness" is a gestalt perception phenomenon that occurs (as far as I know) exclusively inside the minds of human observers.
IshKebab 34 days ago [-]
> But there isn’t any mechanism inherent in large language models (LLMs) that would seem to enable this
Stopped reading here. What is the mechanism in humans that enables intelligence? You don't know? Didn't think so. So how do you know LLMs don't have the required mechanism?
xg15 34 days ago [-]
> But that isn’t how language models work. LLMs model the distribution of words and phrases in a language as tokens. Their responses are nothing more than a statistically likely continuation of the prompt.
Not saying the author is wrong in general, but this kind of argument always annoys me. It's effectively a Forer statement for the "sceptics" side: It appears like a full-on refutation, but really says very little. It also evokes certain associations which are plain incorrect.
LLMs are functions that return a probability distribution of the next word given the previous words; this distribution is derived from the training data. That much is true. But this does not tell anything about how the derivation and probability generation processes actually work or how simple or complex they are.
What it does however, is evoke two implicit assumptions without justifying them:
1) LLMs fundamentally cannot have humanlike intelligence, because humans are qualitatively different: An LLM is a mathematical model and a human is, well, a human.
Sounds reasonable until you have a look at the human brain and find that human consciousness and thought too could be represented as nothing more than interactions between neurons. At which point, it gets metaphysical...
2) It implies that because LLMs are "statistical models", they are essentially slightly improved Markov chains. So if an LLM predicts the next word, it would essentially just look up where the previous words appeared in its training data most often and then return the next word from there.
That's not how LLMs work at all. For starters, the most extensive Markov chains have a context length of 3 or 4 words, while LLMs have a context length of many thousand words. Your required amounts of training data would go to "number of atoms in the universe" territory if you wanted to create a Markov chain with comparable context length.
Secondly, as current LLMs are based on the mathematical abstraction of neural networks, the relationship between training data and the eventual model weights/parameters isn't even fully deterministic: The weights are set to initial values based on some process that is independent of the training data - e.g. they are set to random values - and then incrementally adjusted so that the model can increasingly replicate the training data. This means that the "meaning" of individual weights and their relationship to the training data remains very unclear, and there is plenty of space in the model where higher-level "semantic" representations might evolve.
None of that is proof that LLMs have "intelligence", but I think it does show that the question can't be simply dismissed by saying that LLMs are statistical models.
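(To make the Markov-chain contrast concrete, here is a toy order-2 word-level chain; the corpus and order are made up for illustration. It can only continue two-word contexts it has seen verbatim in training, which is why scaling this lookup-table approach to contexts of thousands of tokens is hopeless, whereas an LLM learns a parametric next-token distribution rather than a literal table.)

    import random
    from collections import defaultdict

    def train(corpus, order=2):
        # Map each `order`-word context to the words observed to follow it.
        words = corpus.split()
        table = defaultdict(list)
        for i in range(len(words) - order):
            table[tuple(words[i:i + order])].append(words[i + order])
        return table

    def generate(table, seed, n=10, order=2):
        out = list(seed)
        for _ in range(n):
            followers = table.get(tuple(out[-order:]))
            if not followers:
                break  # unseen context: the chain has nothing to say
            out.append(random.choice(followers))
        return " ".join(out)

    corpus = "the cat sat on the mat and the dog sat on the rug"
    print(generate(train(corpus), ("the", "cat")))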
I can create a mad-libs program which dynamically reassembles stories involving a kind and compassionate Santa Claus, but that does not mean the program shares those qualities. I have not digitally reified the spirit of Christmas, not even if excited human kids contribute some of the words that shape its direction and clap with glee.
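For concreteness, the kind of program I mean is roughly this (a throwaway sketch; the templates and word lists are invented for the example):

    import random

    # Templates and word lists made up for this example; the program only
    # fills blanks, it models nothing about kindness or compassion.
    TEMPLATES = [
        "Santa {verb} the {adjective} children and gave them {noun}.",
        "With a {adjective} smile, Santa {verb} every {noun} in town.",
    ]
    WORDS = {
        "verb": ["comforted", "visited", "thanked"],
        "adjective": ["kind", "cheerful", "grateful"],
        "noun": ["presents", "cookies", "letters"],
    }

    def tell_story(kid_words=None):
        """Fill a random template, letting the kids override some blanks."""
        chosen = {slot: random.choice(options) for slot, options in WORDS.items()}
        chosen.update(kid_words or {})
        return random.choice(TEMPLATES).format(**chosen)

    print(tell_story({"noun": "puppies"}))

Nothing in there is kind or compassionate; it just slots words into blanks, and the kids' contributions only steer which words get slotted.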
P.S.: This "LLM just makes document bigger" framing is also very useful for understanding how prompt injection and hallucinations are constant core behaviors, which we just ignore except when they inconvenience us. The assistant-bot in the story can be twisted or vanish so abruptly because it's just something in a digital daydream.
So you simultaneously exist and don’t exist. Sorry about this, your post took me on this tangent.
The real question that gets billions invested is "Is it useful?".
If the "con artist" solves my problem, that's fine by me. It is like having a mentalist tell me "I see that you are having a leaky faucet and I see your future in a hardware store buying a 25mm gasket and teflon tape...". In the end, I will have my leak fixed and that's what I wanted, who care how it got to it?
The amount of money being invested is very clearly disproportionate to the current utility, and much of it is obviously based on the premise that LLMs can (or will soon) be able to think.
I don't think this is so clear. At least, if by "current utility" you also include potential future utility even without any advance in the underlying models.
Most of the money invested in the 2000 bubble was lost, but that didn't mean the utility of the internet was overblown.
The graveyards of startups are littered with economically infeasible solutions to problems.
I wouldn't read too much into customers knowing what they want.
https://www.afstores.com/the-intriguing-story-of-how-the-sho...
or if those sources aren't your cup of tea, how about Fox News
https://www.foxnews.com/lifestyle/meet-american-invented-sho...
> The public — naturally — hated the idea.
This seems to be a hard assumption that the entire post, and many other similar ones, rely upon. But how do you know how people think or reason? How do you know human intelligence is not an illusion? Decades of research were unable to answer this. Now that LLMs are everywhere, suddenly everybody is an expert in human thinking with extremely strong opinions. To my vague intuition (based on an understanding of how LLMs work) it's absolutely obvious they do share at least some fundamental mechanisms, regardless of vast low-level architecture/training differences. The entire discussion of whether it's real intelligence or not is based on ill-defined terms like "intelligence", so we can keep going in circles with it.
By the way, OpenAI claims nothing of the sort; see [1]:
>artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work
Neither do the others. So the author describes a "tech industry" unknown to me.
[1] https://openai.com/charter/
From the article:
> The field of AI research has a reputation for disregarding the value of other fields [...] It’s likely that, being unaware of much of the research in psychology on cognitive biases or how a psychic’s con works, they stumbled into a mechanism and made chatbots that fooled many of the chatbot makers themselves.
Just because cognitive scientists don't know everything about how intelligence works (or about what it is) doesn't mean that they know nothing. There has been a lot of progress in cognitive science, particularly on reasoning in the last decade.
> based on ill-defined terms like "intelligence".
The whole discussion is about "artificial intelligence". Arguably AI researchers ought to have a fairly well-defined stance on what "intelligence" means, and can't use a trick like "nobody knows what intelligence is" to escape criticism.
The author's claim is pretty strong: that human intelligence and what is called GenAI have no common ground at all. This is untrue at least intuitively for the entire field of ML. "Intuitively" because it cannot be proven or disproven until you know exactly what human intelligence is, or whatever the author means by thinking or reasoning.
>Arguably AI researchers ought to have a fairly well defined stance of what "intelligence" means and can't use a trick like "nobody knows what intelligence is" to escape criticism.
If you don't formalize your definition, the discussion can easily be escaped by either side by simply moving along the abstraction tree. The tiresome stochastic parrot/token predictor argument is about trees, while human intelligence is discussed in terms of a forest. And if you do formalize it, it's possible to discover that human intelligence is not what it seems either. I'm not even starting on the difference between individual intelligence, collective intelligence, and biological evolution; it's not easy to define where one ends and another begins.
AI researchers mainly focus on usefulness (see the definition above). The proof is in the pudding. Philosophical discussions are fine to have, but pretty meaningless at best and designed to support a narrative at worst.
I feel that you're being unfair to the author here; the quote you responded to in your GP post alluded to "reason or think", and their argument is that LLMs don't. This is more specific than the sweeping statement you attribute to them.
> AI researchers mainly focus on usefulness (see the definition above).
And usefulness is something the article doesn't touch on, I think? The point of the article is that some users attribute capabilities to LLMs that can be explained with the same mechanisms as the capabilities they attribute to psychics (which are well understood to be nonexistent).
> Philosophical discussions are fine to have, but pretty meaningless at best and designed to support a narrative at worst.
Why this is relevant to AI research is (in my interpretation) that it is known to be hard for humans collectively to evaluate how intelligent an entity really is. We are easily fooled into seeing "intelligence" where it is not. This is something that cognitive scientists have spent a lot of time thinking about, and may perhaps be able to comment on.
For what it's worth, I've seen cognitive scientists use AI to do cool stuff. I remember someone who was using AI to show that it is possible to build inference without language (I am speaking from memory here; it was a while ago and sadly I lost the reference, so I hope I'm not distorting it too much). She was certainly not claiming that her experiments showed how intelligence worked, only that they pointed to the fact that language does not have to be a prerequisite for building inferences. Interesting stuff, though without the sensationalism that sometimes accompanies the advocacy of LLMs.
Is the reputation warranted? Just a US thing? Or maybe the question is "since when did this change?", because in the mid-2000s, in France at least, language-model research was led by cognitive psychology professors who dabbled in programming or had partnerships with a nearby technical university.
My experience isn't much, since I am neither doing AI nor cognitive science but I have seen cognitive scientists do cool stuff with AI as a means to study cognitive science, and I have seen CS researchers involving themselves into the world of AI with varying success.
I would not be as emphatic as the author of the article, but I would say that a good portion of AI research lost focus on what intelligence is and instead just aimed at getting computers to perform various tasks. Which is completely fine, until these researchers start claiming that they have produced intelligent machines (a manifestation of the Dunning-Kruger effect).
There is a lot of work to define intelligence, both in the context of cognitive science and in the context of AI [0].
I haven't spent enough time looking for a good review article, but for example, this article [1] says this: "Intelligence in the strict sense is the ability to know with conscience. Knowing with conscience implies awareness." and contrasts it with this: "Intelligence, in a broad sense, is the ability to process information. This can be applied to plants, machines, cells, etc. It does not imply knowledge." If you are interested in the topic, the whole article is worth reading.
[0] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&as_ylo...
[1] https://www.cell.com/heliyon/fulltext/S2405-8440(21)00373-X
It's what makes most sense to me.
> Intelligence in the strict sense is the ability to know with conscience. Knowing with conscience implies awareness.
This one I would disagree with.
> Intelligence, in a broad sense, is the ability to process information. This can be applied to plants, machines, cells, etc. It does not imply knowledge.
This one starts with one part of what I consider intelligence, but besides processing, it also needs to be able to use the information to solve problems. You could process information (which technically everything in the world does) without ever using that information to do anything.
So ultimately, maybe it would be best to define it as the ability to take in and use information to solve some sort of problem.
https://arcprize.org/blog/openai-o1-results-arc-prize
I wouldn't say it's full AGI or anything yet, but these things can definitely think in a very broad sense of the word.
You should try to play chess yourself, and then tell me you think these things aren't intelligent.
https://huggingface.co/blog/yuchenlin/zebra-logic
I don't mean anything by the following either, other than, the goalposts have moved:
- This doesn't say anything about generalization, nor does it claim to.
- The occurrences of the prefix general* refer to "Can fine-tuning with synthetic logical reasoning tasks improve the general abilities of LLMs?"
- This specific suggestion was accomplished publicly to some acclaim in September
- To wit, the benchmark the article is centered around hasn't been updated since September, because the preview of the large model that accomplished it blew the benchmark out of the water: the best score on the full set at the time was 33%, versus its 71%: https://huggingface.co/spaces/allenai/ZebraLogic
- These aren't supposed to be easy; they're constraint satisfaction problems, which they point out are used on the LSAT
- The other major form of this argument is the Apple paper, which shows a 5-point drop, from 87% to 82%, on a home-cooked model
People tend to do so all the time, with games for example.
I'm talking transfer learning and generalization. A human who has never seen the problem set can be told the rules of the problem domain and then get 85+% on the rest. o3 high compute requires 300 examples using SFT to perform similarly. An impressive feat, but obviously not enough to just give an agent instructions and let it go. 300 examples for human level performance on the specific task, but that's still impressive compared to SOTA 2 years ago. It will be interesting to see performance on ARC-AGI-2.
If the model's internal world model includes the knowledge that it likely does not know how to solve a coding question, but the RLHF stage had reviewers rate refusals lower, that in turn forces its hand toward the tricks it knows it can pull based on its model of human reviewers. It can only implement the surface-level boilerplate and pass that off as a solution, write its code in APL to obfuscate its lack of understanding, or keep misinterpreting the problem into a simpler one.
A psychic who has read ten thousand biographies might start to recall them, or he might interpolate the blanks with a generous dose of BS, or more likely do both in equal measure.
Isn't that usually done by not even trying, and delegating the work to regular programs?
An LLM is at best, a possible future component of the speculative future being sold today.
How might future generations visualize this? I'm imagining some ancient Greeks, who have invented an inefficient reciprocating pump, which they declare is a heart and that means they've basically built a person. (At the time, many believed the brain was just there to cool the blood.) Look! The fluid being pumped can move a lever: It's waving to us.
Before intuitive computing, the best we could do with word problems was Wolfram-esque regex stuff, which I’m guessing we all know was quite error-prone. Now, we have agents that can take quite vague word problems and use any sequence of KB/web searches, python programs, and further intuitive reasoning steps to arrive at the requested answer. That’s pretty impressive, and I don’t think “well technically it relies on tools” makes it less impressive! Something that wasn’t possible yesterday is possible today; that alone matters.
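The loop behind those agents is conceptually tiny; something like this sketch, where call_llm and the tools you wire in are stand-ins for whatever model endpoint and search/code-execution tools a real setup uses, not any particular product's API:

    # Sketch of a generic tool-using agent loop. call_llm and the tools dict
    # are placeholders for a real model endpoint and real tools.
    def solve(problem, call_llm, tools, max_steps=10):
        transcript = [{"role": "user", "content": problem}]
        for _ in range(max_steps):
            step = call_llm(transcript)  # e.g. {"action": "web_search", "input": "..."}
            if step["action"] == "answer":
                return step["input"]     # the model decided it has the final answer
            result = tools[step["action"]](step["input"])  # run the chosen tool
            transcript.append({"role": "tool", "content": str(result)})
        return None  # gave up after max_steps without a final answer

Everything interesting lives in the model deciding which tool to reach for next; the loop itself is just bookkeeping.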
Re: general skepticism, I've given up on convincing people that AGI is close, so all I'll say is "hedge your bets" ;)
if you wish for GP to do that, ask them to do that
There is nothing of substance in this and it feels like the author has a grudge against LLMs.
Agreed on the experiments. What would they look like? Can a chatbot give the same info without any bedside manner?
You can spot them easily, because instead of critiquing some specific thing and sticking to it, they can't resist throwing in "obviously, LLMs are all 100% useless and anyone who says otherwise is a Tech Bro" somewhere. Like:
c'mon... Anyone who knows a tiny bit about ML knows that both of those claims are just absurdly off base. Edit: Don't conflate mechanisms with capabilities.
To take your analogy even further, it is like asking when the plane is going to improve enough that it can really fly by flapping its wings.
> 2) The intelligence illusion is in the mind of the user and not in the LLM itself.
3) The intelligence of the users is an illusion too, then?
Someone should write a blog post about this to warn humanity.
at that point, capital can tell most humans to just go away and die, and can use their technology to protect themselves in the meantime
What LLMs seem to emulate surprisingly well is something like a person's internal monologue, which is part of but not the whole of our mind.
It's as if it has the ability to talk to itself extremely quickly and while plugged directly into ~all of the written information humanity has ever produced, and what we see is the output of that hidden, verbally-reasoned conversation.
Something like that could be called intelligent, in terms of its ability to manipulate symbols and rearrange information, without having even a flicker of awareness, and entirely lacking the ability to synthesise new knowledge based on an intuitive or systemic understanding of a domain, as opposed to a complete verbal description of said domain.
Or to put it another way - it can be intelligent in terms of its utility, without possessing even an ounce of conscious awareness or understanding.
That settled this question for me.
Language, as a problem, doesn’t have a discrete solution like the question of whether a list is sorted or not.
Seems weird to compare one to the other, unless I’m misunderstanding something.
What’s more, the entire notion of a sorted list was provided to the LLM by how you organized your training data.
I don’t know the details of your experiment, but did you note whether the lists were sorted ascended or descended?
Did you compare which kind of sorting was most common in the output and in the training set?
Your bias might have snuck in without you knowing.
For this hypothesis: The intelligence illusion is in the mind of the user and not in the LLM itself.
And yes, the notion was provided by the training data. It indeed had to learn that notion from the data, rather than parrot memorized lists or excerpts from the training set, because the problem space is too vast and the training set too small to brute force it.
The output lists were sorted in ascending order, the same way that I generated them for the training data. The sortedness is directly verifiable without me reading between the lines to infer something that isn't really there.
If every pair of digits appears sorted in the dataset, then that could still be “just” a stochastic parrot.
I’m kind of interested to see if an LLM can sort when the dataset specifically omits comparisons between certain pairs of numbers.
Also, I don't think OC was responding to commenters, but to the article.
But by specifically avoiding certain cases, we could verify whether the model is generalizing or not.
As for avoiding certain cases, that could be done to some extent. But remember that the untrained transformer has no preconception of numbers or ordering (it doesn't use the hardware ALU or integer data type) so there has to be enough data in the training set to learn 0<1<2<3<4<5<6, etc.
This is the kind of thing I’d want it to generalize.
If I avoid having 2 and 6 in the same unsorted list in the training set, will lists containing both numbers be correctly sorted in the test set, and at the same rate as other lists?
My intuition is that, yes, it would. But it’d be nice to see and would be a clear demonstration of the ability to generalize at all.
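That experiment is cheap to set up. A sketch of the data generation, assuming single-digit lists and an "unsorted -> sorted" text format (both choices are mine for illustration, not the OP's actual setup):

    import random

    HELD_OUT = {2, 6}  # this pair never co-occurs in a training example

    def make_example(length=8):
        xs = [random.randrange(10) for _ in range(length)]
        return xs, f"{' '.join(map(str, xs))} -> {' '.join(map(str, sorted(xs)))}"

    def training_set(n):
        out = []
        while len(out) < n:
            xs, text = make_example()
            if not HELD_OUT <= set(xs):   # drop lists containing both 2 and 6
                out.append(text)
        return out

    def test_set(n):
        out = []
        while len(out) < n:
            xs, text = make_example()
            if HELD_OUT <= set(xs):       # keep only lists containing both 2 and 6
                out.append(text)
        return out

If the model sorts the test set at the same rate as everything else, it has generalized past pairwise co-occurrence; if accuracy craters only on 2-vs-6 comparisons, the stochastic-parrot reading looks better.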
It is nothing new and has been well established in the literature since the 90s.
The shared article really is not worth the read and mostly reveals an author who does not know what he writes about.
LLMs didn't exist then. Attention only came out in 2017…
The network itself can be trained to approximate most functions (or all; I forget precisely whether NNs can represent every function, though the universal approximation results cover at least continuous functions on compact domains).
But the language model is not necessarily capable of solving all functions, because it was already trained on language.
“the initial stages a completely new kind of mind, based on completely unknown principles, using completely unknown processes that have no parallel in the biological world.”
We just call it a neural network because we wanted to confuse biology with math for the hell of it?
“There is no reason to believe that it thinks or reasons—indeed, every AI researcher and vendor to date has repeatedly emphasised that these models don’t think.”
I mean, just look at the Nobel Prize winners for counterexamples to all of this: https://www.cnn.com/2024/10/08/science/nobel-prize-physics-h...
I don't understand the denialism behind replicating minds and thoughts with technology; that has been the entire point from the start.
But I don't see any discussion of multilayer perceptrons or multi-head attention.
Instead, the rest of the article is just saying "it's a con" with a lot of words.
LLMs write code, today, that works. They solve hard PhD level questions, today.
There is no trick. If anything, it's clear they haven't found a trick and are mostly brute-forcing the intelligence they have. They're using unbelievable amounts of compute and are getting close to human level. Clearly humans still have some tricks that LLMs don't have yet, but that doesn't diminish what they can objectively do.
Different people have different definitions of intelligence. Mine doesn't require thinking or any kind of sentience so I can consider LLMs to be intelligent simply because they provide intelligent seeming answers to questions.
If you have a different definition, then of course you will disagree.
It's not rocket science. Just agree on a definition beforehand.
The mechanism of intelligence is not understood. There isn't even a rigorous definition of what intelligence is. "All it does is combine parts it has seen in its training set to give an answer"; well, then the magic lies in how it knows which parts to combine, if one wants to go with this argument. Also, conveniently, the fact that we have millions of years of evolution behind us, plus exabytes of training data over the years in the form of different stimuli since birth, gets shoved under the rug. I don't want to say that the conclusion is necessarily wrong, but the argument is always bad. I know it is hard to come to terms with the thought that intelligence may be more fundamental in nature and not exclusively a capability of carbon-based life forms.
If intelligence were an objective property of the universe, we’d define it like mass or charge—quantifiable, invariant, fundamental. Instead, it shifts to match whatever we decide to measure. The instruments don’t quantify intelligence; they create it.
Yes, that is how LLMs work. They are trained with feedback loops to answer plausibly.
Can you please explain that a bit further? I don’t catch the connection you’re making between the conflation and being european.
> The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic's con
> LLMs are a mathematical model of language tokens. You give a LLM text, and it will give you a mathematically plausible response to that text.
> The tech industry has accidentally invented the initial stages a completely new kind of mind, based on completely unknown principles, using completely unknown processes that have no parallel in the biological world.
Or maybe our mind is based on a bunch of mathematical tricks too.
But couldn't it be overfitting? LLMs are very good at deriving patterns, many of which humans simply can't tell apart from noise. With a few billion parameters and whatever black magic is going on inside CoT, it's not unreasonable to think even small amounts of fine-tuning combined with many epochs of training would be enough for it to conjure a compressed representation of that problem type.
Without an extensive audit, I'd be skeptical of OpenAI's claims, especially given how o1 is often wrong on much more trivial compositional questions.
What defines intelligence is generalization, the ability to learn new tasks from few examples, and while LLMs have made some significant progress here, they are still many orders of magnitude below a child, and arguably even many animals.
We can say that they're not "intelligent" because they're not capable of solving problems they can't map to something in their training at all, but that would also put 99.9% of humanity in the unintelligent bucket.
A human takes 14+ years until it's intelligent, and it also requires extensive training.
Some people used to push the theory that quantum probability was where free will and the soul reside. That is to say, people will imagine how the hard questions of old neatly fit into the hard questions of today. Nothing wrong with that; it's how we explore different paths and make progress. But I'm not one of those exploring experts, so I'll wait for stricter definitions and experimental data.
It is not impossible, I think; it would just require so much effort, talent, and funding that the last thing resembling such an endeavor was the Manhattan Project. But if it succeeded, the impact could rival or even exceed what nuclear power has done.
Or am I deluded and there is some sort of fundamental limit or restriction on the transformer that would completely prevent this from the start?
The reason we are doing all that is for its potential uses. Write letters, code, help customers, find information, etc... Even AGI is not about making artificial humans, it is about solving general problems (that's the "G").
And even if we could make artificial humans, there would be a philosophical problem. Since the idea is to make these AIs work for us, if we make them as human-like as possible, isn't it slavery? It is like making artificial meat but insisting on making the meat-making machine conscious so that it can feel being slaughtered.
So instead of training it that way, the network can potentially be trained to "perceive" or "model" the reality beyond the digital world. The only way we know or have enough experience and data to do so is through our own experience. An embodied AI is what I think is required for anything to actually grasp the real concepts, or at least as close as possible to them.
And without that inherent understanding, no matter how useful a model is, it will never be a "general" intelligence.
But it doesn't have to be modeled after humans. The purpose of humans, if we can call it that, is to make more of themselves, like all forms of life. That's not what we build robots for. We don't even give robots the physical abilities to do that. Giving them a human mind (assuming we could) would not be adequate. Wrong body, wrong purpose.
Taste and smell are matters of chemical composition. It will take an incredible effort, but something similar to a mass spectrometer could be used to detect every taste and smell we can think of and beyond. How fast and how efficient they can be is probably the main challenge.
Touch is difficult. We don't even fully know why or how an itch "works". But force, temperature, atmospheric, and humidity sensors, etc. are widely available. They can provide a crude approximation, imo.
Just off the top of my head. I am sure smarter people can come up with much more suitable ways to "embody" a machine learning model.
If reproducing the artifacts and failure modes of human interpretation of this physical data (say, yanny/laurel, or optical illusions, or persistence-of-vision phenomena) is deemed important, that's another matter. If all that's required is a black-box understanding that is idiosyncratic to LLMs in particular, but functionally good enough to be used as sight and hearing, then I don't see why it can't be called "solved" for most intents and purposes in six months' time.
I guess it boils down to this: do you want "sight" to mean "machine" sight or "human" sight. The latter is a hard problem, but I'd prefer to let machines be machines. It's less work, and gives us a brand-new cognitive lens to analyse what we observe, a truly alien perspective that might prove useful.
No matter how you build it, it is still experiencing everything a human can experience. There's just no guarantee it would react the same way to the same stimuli. It would react in its own idiosyncratic way that might both overlap and contrast with a human experience.
A more "human" experience simulator would paradoxically be more and less authentic at the same time - more authentic in showing a human-style reaction, but at the cost of erasure of model's own emergent ones.
Apart from that, I'm afraid that at this point research on sensory input other than audio and visual needs much more advancement. For example, it's not clear to me what kind of data structure would be a good fit for olfactory or other sensory training data.
Touch and such can have some approximation done through various sensors like temperature, force, humidity, electromagnetic, etc.
Don't get me wrong, I would be curious to see such research done, to see whether it would improve anything above the stochastic-parrot level; it's just going to take a while to figure out what is even relevant.
But an LLM has no problem at all deciphering, processing, and, most importantly, responding meaningfully to all the ways we can use or encounter the word "lie". I contend that if a model large enough is trained on enough data, the concepts will automatically blend and explain each other sufficiently, or at least enough to cover your example and those similar to it.
Stopped reading here. What is the mechanism in humans that enables intelligence? You don't know? Didn't think so. So how do you know LLMs don't have the required mechanism?
Not saying the author is wrong in general, but this kind of argument always annoys me. It's effectively a Forer statement for the "sceptics" side: It appears like a full-on refutation, but really says very little. It also evokes certain associations which are plain incorrect.
LLMs are functions that return a probability distribution of the next word given the previous words; this distribution is derived from the training data. That much is true. But this does not tell anything about how the derivation and probability generation processes actually work or how simple or complex they are.
What it does, however, is evoke two implicit assumptions without justifying them:
1) LLMs fundamentally cannot have humanlike intelligence, because humans are qualitatively different: An LLM is a mathematical model and a human is, well, a human.
Sounds reasonable until you have a look at the human brain and find that human consciousness and thought too could be represented as nothing more than interactions between neurons. At which point, it gets metaphysical...
2) It implies that because LLMs are "statistical models", they are essentially slightly improved Markov chains. So if an LLM predicts the next word, it would essentially just look up where the previous words appeared in its training data most often and then return the next word from there.
That's not how LLMs work at all. For starters, the most extensive Markov chains have a context length of 3 or 4 words, while LLMs have a context length of many thousands of words. The required amount of training data would go into "number of atoms in the universe" territory if you wanted to create a Markov chain with a comparable context length.
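To make the contrast concrete, here is what that lookup-style Markov chain amounts to (a toy sketch, emphatically not how an LLM computes its next-word distribution). With a vocabulary of size V and a context of n-1 words, the table needs on the order of V^(n-1) rows, which is where the atoms-in-the-universe arithmetic comes from:

    import random
    from collections import Counter, defaultdict

    def train_markov(tokens, n=3):
        """Count, for every (n-1)-word context seen in training, which word follows it."""
        table = defaultdict(Counter)
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            table[context][tokens[i + n - 1]] += 1
        return table

    def next_word(table, context):
        """Sample the next word in proportion to how often it followed this exact context."""
        counts = table[tuple(context)]
        if not counts:
            return None  # context never seen verbatim: the chain has nothing to say
        words, weights = zip(*counts.items())
        return random.choices(words, weights=weights)[0]

An LLM, by contrast, never stores or looks up contexts verbatim; the distribution comes out of the learned weights, which is the second point below.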
Secondly, as current LLMs are based on the mathematical abstraction of neural networks, the relationship between training data and the eventual model weights/parameters isn't even fully deterministic: The weights are set to initial values based on some process that is independent of the training data - e.g. they are set to random values - and then incrementally adjusted so that the model can increasingly replicate the training data. This means that the "meaning" of individual weights and their relationship to the training data remains very unclear, and there is plenty of space in the model where higher-level "semantic" representations might evolve.
None of that is proof that LLMs have "intelligence", but I think it does show that the question can't be simply dismissed by saying that LLMs are statistical models.