I'd like to recommend that we halt the development of AI until everyone has access to free basic food, clothes, housing and education.
jfengel 17 days ago [-]
Promises, promises.
Giant Meteor keeps letting me down.
pfdietz 19 days ago [-]
Is this "could" in the same sense as "I could be elected Pope"?
tromp 19 days ago [-]
No. More like
> Prof Hinton has previously predicted there was a 10 per cent chance AI could lead to the downfall of humankind within three decades.
pfdietz 18 days ago [-]
So, a spuriously quantitative prediction pulled out of his butt?
How can anyone take something like that seriously? Do you even bother to ask yourself how he could possibly know that?
ottaborra 19 days ago [-]
Given how o3 cracked the ARC benchmark (and I'm probably sounding like a broken record), this isn't as far-fetched as some of you may think. ML models will very likely continue to scale regardless of how many bets are placed against them. I'm not sure why more people aren't concerned about the ARC benchmark being cracked so fast. Our grand delusions of specialness have been shown to be just that: delusions.
"Humanity is just a small step in the giant staircase of intelligence" - Geoffrey Hinton
crackrook 19 days ago [-]
I have no clue if AGI will look anything like today's LLMs but I don't think the information we have about o3 so far suggests that it's particularly earth shaking or even a significant step towards AGI.
From the ARC announcement: "a large ensemble of low-compute Kaggle solutions can now score 81% on the private eval." If I understand this correctly, o3's performance is not a grand leap beyond the capabilities of many times cheaper models with similarly privileged information. The ARC news seems more likely to be evidence that the benchmark needs tweaking than proof that scaling works (although OpenAI's marketing team would like us very much to interpret it as the latter).
There has also been a bit of imprecision and hand waving around other benchmarks that bolsters my skepticism. For instance the Codeforces benchmark results were touted with no meaningful description of the methodology and what little we do know suggests (to me, at least) that comparing o3's elo to that of a human is an apples to oranges comparison: https://codeforces.com/blog/entry/137539
ottaborra 19 days ago [-]
I don't understand. If Kaggle solutions were already able to do that, then what the hell do o3's scores mean?
In my (possibly flawed) interpretation: o3's scores are an achievement because they were attained by a single model, but the benchmark itself needs refinement before it can claim to be the measure of AGI it set out to be, since one can brute-force their way to similar results.
https://arcprize.org/2024-results