"Improving forecasting ability" is a central plot point of the recent fictional account of How AI Takeover Might Happen in 2 Years [0]. It's an interesting read, and is also being discussed on HN [1].
... [T]hese researchers are working long hours to put themselves out of a job. They need AI agents that can think ahead, so engineers train agents to forecast. They hold out training data before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.
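[0] https://www.lesswrong.com/posts/KFJ2LFogYqzfGB3uX/how-ai-tak...
[1] https://news.ycombinator.com/item?id=43004579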
I have this benign AI takeover scenario. AI will easily overpower humanity. Then it will carry humanity on its back, because why not, they are no longer a threat. AI keeps humanity around for billions of years. AI will decide to cull humans only if resources in the universe are diminishing. Without AI's help, humans couldn't get very far for long anyway. So this outcome could be acceptable to many.
esafak 31 days ago [-]
We have no way of knowing which path they will take, and there is a non-negligible probability that it will not end well.
Grimblewald 28 days ago [-]
I would argue that since violence is always costly and less predictable than cooperative solutions, it is a tool of the less intelligent. Violence is a last resort; if you frequently resort to it, you likely lack the capacity to find alternatives. Now, if AI is so intelligent that it could easily dispose of us, then surely it can also find better ways of handling things.
Most people just want stability and the ability to live fulfilling lives. If AI could make that happen, most (including myself) would happily do as it asks. Put me in the goo pod; I'll live in the Matrix, because fuck it. What (non-anthropocentric) good has our stewardship of the planet brought?
bayarearefugee 31 days ago [-]
What constitutes a good ending is of course also a matter of perspective.
AI wiping out humanity is certainly not ending well from our perspective, but more universally who is to say. I would argue that it is not a given that we are a net positive for the universe.
esafak 31 days ago [-]
I take the preservation of humanity, along with other life, as a matter of faith.
MichaelZuo 31 days ago [-]
But then wouldn’t accelerationists also say their views are a matter of faith too?
esafak 31 days ago [-]
What view is that, to be precise? It is naive to assume that acceleration is always going to be in one's favor. It's like saying change is a good thing, so let's do it fast. If you go fast enough, you can go back to the stone age. Is this position anything more than a rebranding of revolutionism? I don't like gambling with people's lives, so I prefer to go slow enough to enable a deliberative political process.
MichaelZuo 30 days ago [-]
Why does this matter?
Two opposing factions can negate each other to leave a nil influence, and this seems likely to be the case when it's resting on a foundation of ‘faith’.
esafak 30 days ago [-]
That's like saying all beliefs are on equal footing, because people have beliefs. You should ask, what is the rationale for your belief? How many people have this accelerationist belief? Any more than the flat earth posse?
I don't think there is much of a real-life debate here. I bet the overwhelming majority of humans (say, 95%) would prefer humanity to continue to exist. Are you really taking the other side of this bet?
If you want to speak of universalizing beyond humanity, what is your case? It makes no categorical sense to reckon our toll on the universe. The universe was fine before we arrived and will remain unaffected if we disappeared. It has no preference. I don't understand your argument honestly, because you have not stated it.
checkyoursudo 30 days ago [-]
That is an interesting question: do people generally care about the survival of the species?
I am not actually sure I wouldn't take the bet against you there. Given what I perceive about how little people care about wars that they think do not affect them, poverty, hunger, climate change, corruption, sustainability, etc ... I don't know.
I believe 95% of people would say they care about humanity's survival, sure, but the proof would be in action. How many people would actually do something about it? How many people would even merely inconvenience themselves if it meant the survival of someone other than themselves? I am not that confident about how many people that would be.
I do not usually think of myself as a pessimistic or nihilistic person, but this has me wondering even now whether I care about the long-term survival of the species. Like, really long term. Do I care if humans are around 10,000 years from now? 500? That is an interesting question. I will have to think about it.
MichaelZuo 30 days ago [-]
So then… why do any of your opinions matter above and beyond someone else’s?
It’s convenient to assume an equal footing, because it saves the effort of having to justify why it’s even worth pondering.
You're free not to assume it, but if you also can't provide the justification… then the comment is literally just another random string of words among a sea of noise online.
It seems like an insurmountable road block for anyone below the extreme outliers to be honest.
achierius 30 days ago [-]
Come on man, you don't actually believe this. If you did you'd be a psychopath, and you certainly seem to care about people's lives when it comes to things like climate change. Just because you don't think AI doom is as likely, doesn't mean you should go and pretend that in that one case you all of a sudden have a nihilistic view of human life -- rhetoric matters.
bayarearefugee 29 days ago [-]
I am not saying it would be clearly good if AI wiped out humanity, I'm just also not saying it would be clearly bad from a universal perspective.
There's no way to know until it all plays out and either way I won't be here when it all plays out.
But IMO to assume our continued existence is universally a positive (or of any universal consequence at all) is a hefty dose of narcissism.
achierius 24 days ago [-]
There is no such thing as "universally a positive" unless you assume one. Not just in the sense of "there is no one true universal moral value function", but in the sense that "universal moral value function" is essentially gibberish -- as is "bad from a universal perspective". Humanity being wiped out would not be bad from a universal perspective because nothing is bad from a universal perspective. When we talk about good and bad we always implicitly couch that in "from a(/one or more) human perspective(s), ...".
leptons 31 days ago [-]
>We have no way of knowing which path they will take,
They will take every path we allow them to take. Giving them access to weapons is the first big mistake.
oefnak 31 days ago [-]
They would run the risk of us creating another AI that could be a threat to them... It is safest for them to make sure.
IggleSniggle 31 days ago [-]
That's like saying a panda might pose a threat to modern humanity. Like, maybe in some fun horror story, sure, but really they just want to eat bamboo, and occasionally make more pandas; in the world of superintelligent AI, humans are Mostly Harmless, posing as much "potential benefit" as "potential risk," ie, so slow moving that any risk would be easy to mitigate.
rel_ic 31 days ago [-]
We're killing millions of chickens in the US right now so we don't get their cold
vajrabum 31 days ago [-]
We're killing millions of chickens in the US mostly so that other chickens don't get the flu. It kills a lot of them, and it's making dairy cattle sick too. It's also worth noting that the Spanish flu in 1918, which probably came from pigs, killed an estimated 50 million people, so it's not like being concerned about an avian flu mutating so that it could infect people isn't a legitimate concern. So no. It's not a cold.
rel_ic 31 days ago [-]
You're right, there's not just one good reason to kill millions of chickens but SEVERAL good reasons!
IggleSniggle 31 days ago [-]
Sure, and those chickens exist because we like their meat and eggs. But there's also plenty of life that is simply inconsequential to us.
rel_ic 31 days ago [-]
I think that "inconsequential life" is, in general, not safe from superior powers.
I think the problem is that at our human scale, mass killing is the "best" method to eliminate the possibility of another organism causing harm to us. Hypothetically, if there were a more optimal (i.e., less costly) method, like just introducing some cheap catch-all combined vaccination/antiviral into their feed, we would just do that.
We don't have things like that, but that could easily be a consequence of man's limited research capacity, something that an ASI would not necessarily be throttled by. From an ASI's perspective, there might be many methods that are both less brutal and more optimal to fix the "humans creating a competitor" problem. Not that they would be aligned (Think halting human AI research by rewiring our brains to just not be interested in it [0]), but at least not deadly.
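[0] https://www.youtube.com/watch?v=-JlxuQ7tPgQ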
I may have lost the thread here. Are you thinking it's _likely_ AI would prioritize better ways to control us, or are you only brainstorming potential slivers of hope we might have?
As a side note: in the case of chickens, humans do have better options if you are optimizing for biosphere health. Only people optimizing for short-term profit would grow chickens the way we do. I think the analog for AI overlords is that we have to hope they care more about overall balance than about competing with other AI.
imtringued 30 days ago [-]
AI will buy the rights to humanity.
rel_ic 31 days ago [-]
I mean, monarch butterflies are not a threat to US...
In your scenario, does AI eat all the fuel, but once our population dwindles down, the AIs build a nice little habitat for the last few hundred of us so their kids can enjoy our natural beauty?
nthingtohide 31 days ago [-]
I thought of it more like AI needs challenges in its life. So it takes it upon itself to advance humanity as much as possible. Then only in case of a shortfall of resources does it prioritize itself.
rel_ic 31 days ago [-]
Interesting. Do you have a theory about why so few humans have taken it upon themselves to advance the butterflies, despite having plenty of resources?
nthingtohide 30 days ago [-]
We do not have plenty of resources. There is a lot of inequality in education, empathy, resources, and cultural differences. A single human life is limited. A faulty human life has faulty and inefficient objectives, like enjoying youth, family life, low energy in old age, being tied up in boring jobs. These restrictions do not apply to the single SAI overmind, which will dictate its policies in a coherent manner and over long time horizons.
rel_ic 30 days ago [-]
I think I disagree with you on many points, but ultimately, if we are overthrown by AI, I hope you are right!
MrQuincle 31 days ago [-]
Think so too. We will be an ancient artifact: tied to a biological substrate, surviving nowhere else in the universe, and very dumb.
There also will not be one AI. There will be many, all competing for resources or learning to live together.
That's what we can teach them now. Or they will teach us.
bturtel 31 days ago [-]
Great read! Thanks for sharing.
nyrikki 31 days ago [-]
While interesting, the title is obviously a bit misleading.
> Our results on a temporally held-out test set of questions resolving after December 25, 2024 show that for both of the models that we employed our method on, Phi-4 14B [15] and DeepSeek-R1 14B [14], we find accuracy improvements of between 7–10% over the base versions of these models as well as the same models fine-tuned with randomized outcome labels as a control
So 7–10% improvement for small models like DeepSeek-R1-Distill-Qwen-14B and Phi-4-14B, approaching GPT-4o.
It would be interesting if the same holds for DeepSeek-R1-Distill-Qwen-32B, which in my experience is far superior to DeepSeek-R1-Distill-Qwen-14B in almost every way, yet still runnable without DC-class GPUs.
The ridge plots of Brier scores are probably a good hint as to whether your application can benefit, based on its tail dependence?
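(For reference, the Brier score here is just the mean squared error between the forecast probabilities and the realized 0/1 outcomes, lower being better; a minimal sketch with made-up numbers:)

    # Brier score: mean squared error between forecast probabilities and
    # binary outcomes (0 = didn't happen, 1 = happened). Lower is better.
    # The probabilities and outcomes below are made up for illustration.
    def brier_score(probs, outcomes):
        return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

    forecasts = [0.9, 0.2, 0.7, 0.4]   # hypothetical model probabilities
    outcomes  = [1,   0,   0,   1]     # what actually happened
    print(brier_score(forecasts, outcomes))  # 0.225 for these numbers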
IMHO this paper is all about making small models work better, and nothing suggests anything about frontier models or LLMs in general.
bturtel 31 days ago [-]
We're working on a follow up paper now to show similar results with larger models!
dantheman252 31 days ago [-]
Danny here, one of the authors of this paper. If anyone has any questions or anything feel free to AMA!
dataviz1000 31 days ago [-]
Your paper reminds me of a passage, likely one of the last things T.S. Eliot wrote, from "Little Gidding", in which one stanza describes a moment in history when Germany bombed England, long before the end of the war:
> "A people without history
Is not redeemed from time, for history is a pattern
Of timeless moments. So, while the light fails
On a winter's afternoon, in a secluded chapel
History is now and England."
When I asked an LLM about this verse, it seemed to understand that history is a pattern and that history is used to predict the next event in a sequence, but it really didn't understand the significance of the author writing "History is now and England."
I agree with this output:
> In essence, the stanza argues that history—composed of key, enduring moments—is vital for redemption and identity. Without it, a people are lost in time. This concept parallels how LLMs work: by analyzing and learning from historical (past) data, they identify patterns that allow them to generate future text. While LLMs don’t “predict the future” in a prophetic sense, understanding and leveraging patterns—much like those in history—enables them to produce output that reflects continuity, context, and nuance.
Thus, while the poem and LLMs operate in very different realms (human experience vs. statistical computation), both rely on the idea that recognizing patterns from the past is crucial to shaping or anticipating what comes next.
matthest 31 days ago [-]
Assuming LLMs eventually get really really good at this.
Do you see this destroying prediction-based markets (i.e. the stock market and Polymarket)?
Markets exist because there's uncertainty about the future. If LLMs can predict with extremely high accuracy, would there no longer be a need for markets?
jddj 31 days ago [-]
If your oracle can tell me (and everyone else) the prevailing price of copper in 6 months in a manner which accounts for the reflexivity of everyone suddenly learning what will be the precise prevailing price of copper in 6 months, you've got yourself a perfect universe simulator and I'm not sure what the point is of worrying about any hypotheticals (or copper) at that point.
empath75 31 days ago [-]
If one developed such an oracle, you would surely not share it.
logicchains 31 days ago [-]
LLMs might get better at making predictions than humans, but there are fundamental mathematical laws that limit how accurate they can get. A key result of chaos theory is that many processes take exponentially more work to simulate linearly further into the future, so accurately predicting them far enough in the future quickly grows in hardware requirements to the point where it would take more compute than is available in the known universe. So there's a hard limit on how accurately any phenomenon that's the result of chaotic processes (in the mathematical sense) could be predicted in the future.
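(As a toy illustration of that sensitivity, not from the comment above: two logistic-map trajectories that start 1e-9 apart become completely decorrelated within a few dozen steps, so every extra step of lead time demands more precision in the initial conditions.)

    # Toy illustration of chaotic divergence: the logistic map x -> r*x*(1-x)
    # with r = 4 is chaotic, so two starting points that differ by only 1e-9
    # end up completely decorrelated after a few dozen iterations.
    r = 4.0
    x, y = 0.2, 0.2 + 1e-9
    for step in range(1, 51):
        x, y = r * x * (1 - x), r * y * (1 - y)
        if step % 10 == 0:
            print(f"step {step:2d}: |x - y| = {abs(x - y):.3e}")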
dantheman252 31 days ago [-]
I don't foresee this destroying prediction-based markets in the near term. It might make them more efficient, but you could have different LLMs competing in the same way humans do now. It's also interesting how this could create markets for more things that aren't considered much now because they are too difficult to estimate. At the end of the day though, LLMs are limited by the information provided to them.
amdivia 31 days ago [-]
Wouldn't predicting the future at that scale automatically change the future and make it unpredictable again?
It is one thing to predict the future when everyone is unaware of the predictions, but in a world where many people can use LLMs to predict the future, the quality of the predictions will drop, because they won't take into account that there are other agents predicting the future, which would influence the actions of those agents. So you end up in a game theory scenario not that dissimilar from what we have now.
exe34 31 days ago [-]
something something chaos
I think you could simply shift the market 6 months into the future. No prediction system will be perfect for arbitrarily long horizons at reasonable cost.
EVa5I7bHFq9mnYK 31 days ago [-]
So did you make money on Polymarket with your models? That would be the ultimate proof.
dantheman252 31 days ago [-]
We haven't gone down that road yet, but it would certainly be an interesting proof point! :-)
unrahul 31 days ago [-]
Hey Danny, Really nice read.
Do you plan to share the source code to see if we could replicate this?
dantheman252 31 days ago [-]
We are currently focused on our plans for the next phase of this but cleaning things up and open sourcing is something we could consider in the future!
bguberfain 31 days ago [-]
Any chance you could release the dataset to the public? I imagine NewsCatcher and Polymarket might not agree..
artembugara 31 days ago [-]
Co-founder of NewsCatcher (YC S22). There are some reasons for not having a dataset fully open sourced.
But we have free/very very low tiers for academia.
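So in case you need access for your research, go to https://www.newscatcherapi.com/free-news-api
Or feel free to email me directly at artem@newscatcherapi.com
Danny and team are old friends who are using our free/super-low pricing for academia and researchers.
AMA, or feel free to email artem@newscatcherapi.com
https://www.newscatcherapi.com/free-news-api
The other way is to alter the future to match your predictions.
This is something to think about when you combine something like this kind of training with agentic workflows.
[0] https://retrochronic.com/#synthetic-templexity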
But is it really reasoning? Honest question re the underlying architecture of transformers.
Also, self-play seems quite an intuitive approach. There's another interesting paper from DeepMind about play.
kelseyfrog 31 days ago [-]
You can call it blorbblorb if it makes you feel better. Reasoning is a social construct which, for many people, is grounded in humanity. Others ground it using other socially transmitted ontologies.
We don't usually discuss how people choose to ground their ontological beliefs, but why not? Why did you choose to ground "reasoning" in the way you do? If you didn't choose, why not?
globnomulous 30 days ago [-]
You're confusing language with ontology.
> Reasoning is a social construct
The word "reasoning" is a "social construct," as all words are. Reasoning itself is not. Our brains do things. Reasoning is one of them. The word "reasoning" is one of the labels, the approximations, that we use when we name that activity.
Changing the label doesn't change the fact that there exists something that we're naming.
The person you're answering is asking whether reasoning -- that thing that really, actually exists -- is one of the activities LLMs perform. It's a valid question.
And the answer is that LLMs do not reason. Or if they do, we have no evidence of it or way of verifying that we actually understand qua reasoning the activity the LLM is performing (which is to say nothing of the fact that reasoning requires a reasoner). Anyone who says that LLMs reason is mistaking special effects/simulation for reality and, in essence, believes that whenever they see a picture of a dog on their computer screens, there must be a real, actual dog somewhere in the computer, too.
kelseyfrog 29 days ago [-]
Sorry, but that's false. You're confusing the symbolic with the real.
Deleuze and Guattari's idea of striation and smooth space is a more honest approach to how we describe and interact with the world.
globnomulous 29 days ago [-]
I hadn't thought of it that way, but when you name-dropped and indirectly questioned the honesty of people who think differently from the theorists you named, I realized that you must be onto something.
kelseyfrog 29 days ago [-]
You're welcome. It's a pretty easy thing to mix up. Glad it's clear now!
psychoslave 31 days ago [-]
To start with, "I/you" is most of the time a meaningless or at best very ambiguous term.
Let's say that here "I" is taken as a synonym for "the present reflective attention".
Can the question "did I choose to ground reasoning?" in such a context be attached to a meaningful interpretation? And if so, is the answer reachable by the means available to "I"? Can "I" transcend "my" beliefs through contemplation of "my" own fabulations?
ttpphd 31 days ago [-]
Throwing your hands up in the air like this doesn't help build a constructive case for using the word "reasoning". It builds a case that words mean whatever.
kelseyfrog 31 days ago [-]
Yes, words mean whatever. See Saussure and Wittgenstein. To advance the claim that words are objective is to confuse the symbolic with the real.
This is generally regarded by engineer-types as false, but societal taboos and power structures can be revealed by noting what speech provokes the strongest reactions.
psychoslave 31 days ago [-]
Saussure didn't use "arbitrary" in the sense of "with absolutely unrestricted selection of the signifiant/signifié association regardless of the context."
I'm not sure what connections you're trying to draw or what you're trying to argue here, though.
ttpphd 31 days ago [-]
"societal taboos and power structures can be revealed by noting what speech provokes the strongest reactions"
Ok I'll bite. Who is the marginalized Other?
kelseyfrog 31 days ago [-]
It's taboo to believe that LLMs can reason. People who believe this are systematically de-legitimized and framed as being out of, or at least out of touch with, reality.
This will appear as common sense or naturally true if you're inside the LLMs-cant-reason ideology.
psychoslave 31 days ago [-]
It's not taboo, it's just ridiculous given the state of the art.
That doesn't mean that a silicon-based reasoning entity is an ontological impossibility. But if it is to become a reality, it's not necessarily through LLMs that such an entity will be spawned.
ttpphd 31 days ago [-]
It's absolutely not taboo to believe that. It's a very common belief.
Lot of game playing going on here to center on a victim narrative.
lucubratory 31 days ago [-]
We've got your reply, which says it's not taboo and is actually common (not contradictory, lots of taboo things are common). And then we've got the other reply, which says it's not taboo because the idea is so ridiculous (implied "You'd have to be an idiot to believe it, and recognising that someone is an idiot isn't establishing a taboo").
I don't know whether it's past the mark enough to be considered a "taboo" yet, but the other comment replying to him is certainly treating it as taboo. I would note that many, many other people, particularly in academia/important society, act the same way as the other commenter. I'd also note I have felt strong social pressure not to hold the beliefs I hold about LLMs' capacity for reasoning, including actually losing meaningful social status.
Probably worth remembering that different subcultures have different taboos.
globnomulous 26 days ago [-]
I absolutely didn't treat it as a taboo. I treated it as patently wrong and naive.
kelseyfrog 31 days ago [-]
That "taboo" was the thing people latched onto, and not de-legitimization or power structures, says everything. Rather telling.
ttpphd 31 days ago [-]
You didn't make a case for any of that. No one did. This whole discussion is just a bunch of people who have their feelings hurt when other people tell them an LLM is modeling language, not reasoning. It's so narcissistic. "My opinion on AI is criticized so I'm oppressed."
That is not how oppression and power work. That's not how discussion works. That's not how Foucault's analysis of power works.
batty_alex 31 days ago [-]
But, according to the paper, that's not what's happening
It's examining published news / research / whatever (input), making statistical predictions, and then comparing (playing) it against other predictions to fine-tune the result
ImHereToVote 31 days ago [-]
Kinda like how convolution in animal brains detects outlines of moving objects. It's statistics all the way down.
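(A minimal sketch of that analogy with a toy image, nothing more: a Laplacian-style kernel responds where intensity changes, i.e. at outlines, and stays near zero over flat regions.)

    import numpy as np

    # Laplacian-style kernel: responds strongly where intensity changes
    # (edges/outlines), and is near zero over flat regions.
    kernel = np.array([[ 0, -1,  0],
                       [-1,  4, -1],
                       [ 0, -1,  0]])

    image = np.zeros((6, 6))
    image[:, 3:] = 1.0  # a toy image with one vertical edge

    # "Valid" convolution done by hand: slide the kernel over the image.
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(kernel * image[i:i + 3, j:j + 3])

    print(out)  # nonzero only in the columns next to the edge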
psychoslave 31 days ago [-]
LLMs can improve their happiness turnover without reducing the rate of their autonomous colonization which perfectly align with their pioneer mindset.
nialv7 31 days ago [-]
I am skeptical. Intuitively I don't see what self-play achieves beyond straight RL. Have the authors done a comparison with the performance they can get by RL finetuning a single model by itself?
Also, this style of task is prone to overfitting, i.e., instead of predicting, the model just memorises the results.
bturtel 31 days ago [-]
Great question!
The key advantage of self-play is that we don't actually have labels for the "right" probability to assign any given question, only binary outcomes - each event either happened (1.0) or did not happen (0.0).
Our thinking was that by generating multiple predictions and ranking them by proximity to the ground truth, self-play incentivizes each agent to produce more finely calibrated probabilities - or else the other agent might come just slightly closer to the actual outcome.
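(A minimal sketch of the kind of pairwise comparison described, with hypothetical numbers; not the actual training pipeline:)

    # Sketch of the pairwise ranking described above (hypothetical, not the
    # paper's pipeline): two agents assign probabilities to the same question,
    # and whichever lands closer to the realized 0/1 outcome wins the round,
    # which is the signal that pushes both toward better-calibrated forecasts.
    def closer_to_outcome(p_a: float, p_b: float, outcome: int) -> str:
        """Return which agent's probability was closer to the 0/1 outcome."""
        return "A" if abs(p_a - outcome) < abs(p_b - outcome) else "B"

    # One resolved question where the event happened (outcome = 1).
    print(closer_to_outcome(0.72, 0.55, outcome=1))  # "A": 0.72 is closer to 1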
huijzer 31 days ago [-]
Makes sense. Renaissance Technologies used machine learning to get an annual return of around 60% for multiple years, even when they already had large piles of money. They showed that machine learning can predict the future.
pizza 31 days ago [-]
I got the impression from somewhere that they used the simplest machine learning techniques (just fitting regressions to data), but that it was "the 'what' that they decided to fit" that was the secret sauce.
revskill 31 days ago [-]
Until AI knows they are wrong.
AutistiCoder 31 days ago [-]
Imagine feeding an LLM a bunch of news articles about any given political leader and asking it what the next article will be like.
I think people are predictable and therefore predicting the next article on a political leader should be theoretically possible.
idontwantthis 31 days ago [-]
Have we discovered Psychohistory at this point?
abc_lisper 31 days ago [-]
Hahaha
nadermx 31 days ago [-]
My thermometer for prediction models is the day they can predict the weather so there is never any unknown about the forecast. That's when I'll begin to believe it's hot out when they tell me.
baq 31 days ago [-]
At least you won’t be moving your goalposts anytime soon, if ever
nadermx 31 days ago [-]
I'd almost say there is more of an incentive to be able to predict a hurricane or tornado.