Funny to see this coming up again after all these years. I was the person who reported this issue to the Chromium team, 3 years before that blog posts. They marked it as won't fix, multiple times (other people reported it too), before they fixed it. Safari, Firefox, and IE were all already using more robust PRNGs for Math.random() that did not exhibit this issue.
I came across it because of some tests I had for a statistics library I was working on. There was a test that created an array with 100k random values that shouldn't have had any collisions, but I consistently got multiple collisions, only in Chrome.
magicalist 34 days ago [-]
Pretty sure it was [1] that finally got them moving (old hacker news discussion here[2]). Not to take away from your bug report, but there were a bunch :) The main issues were that the spec allows engines to pick their own implementation (including only having 2^32 possible values) and there were benchmarks like sunspider where overfitting encouraged the implementation to be as fast as possible. That was a lot less of an excuse for the RNG they used in 2012-2015 compared to when it was added in 2007 or 2008, but by then there was a decent argument that developers should go to crypto.getRandomValues() if they wanted actual random values, which probably stalled action longer than it should have.
> Safari, Firefox, and IE were all already using more robust PRNGs for Math.random
This wasn't true at the time. I actually don't remember what Safari was doing in 2012 so I won't say that one for sure, but Firefox and IE were both using garbage RNGs as well, they were probably just generating across the full (or at least more of the) (0, 1] double range so didn't produce the collisions you saw in your testing.
Safari, Firefox, and IE all had no trouble producing 100k random values without a collision. Chrome couldn't do it without at least two collisions. Chrome's `Math.random()` performance/speed was not measurably better despite, but it could only produce 2^32 random values despite IEEE-754 FP supporting up to 2^53. That is how I found this bug, because only Chrome used low enough entropy. Which is definitely worse.
My recollection of that time working on that stats library was that Chrome took many shortcuts to max out things like SunSpider. There were other Chrome specific workarounds I had to implement to address what I considered bugs, but they were definitely in lightly defined areas like `Math.random()`.
I couldn't find the original bug I file, but the discussion on it specifically noted that they could just copy the Firefox one (which was noted to be an improvement over Chrome's) but they decided on Won't Fix until someone more important than me drew attention to it. It was trivially fixed within a week once that happened.
hashseed 33 days ago [-]
Hi. I'm the V8 engineer who implemented the fix and wrote the blog post.
The reason the initial bug report [0] was marked as WAI but the Medium blog post got a lot more traction is very simple and much more human.
When the initial issue was filed to the chromium bug tracker, it was routed to Chrome's security team. From a security perspective, Math.random() provides no guarantees about cryptographic safety, so naturally it was marked as WAI – nobody from the V8 team actually saw this issue. One of the suggestions "3. Make crypto.random(size)" was acted upon though, and so crypto.getRandomValues() was introduced in Chrome 11, about 10 months after the issue report. In terms of specifying and introducing new Web APIs, this was incredibly fast.
After the Medium blog post was published, someone filed an issue directly to the V8 project. It did not question the spec compliance, but pointed out that the PRNG quality could be better. That nerd-sniped me into researching this topic and after reading a few papers, I implemented the fix with xorshift128+. I'm thankful that my team lead allowed me to set aside some time to work on this even though it was not on our project roadmap.
Performance regressions that could affect our benchmark performance was of course part of the consideration, but there were many options to avoid a performance regression. In V8, crossing the boundary between machine code compiled from JS to C++ runtime builtins is fairly expensive. I could have either ported xorshift128+ to assembly so that this boundary crossing was not necessary, or I could have amortized the boundary crossing cost by buffering multiple random values. I chose the latter because porting to assembly is error prone and I would have to do this for each platform. There are better options in V8 nowadays, by expressing the algorithm in an intermediate representation, but back then that was not available.
I'm kind of glad now that past-me decided to prepend a simple counter to my randomly generated IDs (using Math.random). I haven't observed any collisions with that approach (but then again... I could also just drop the Math.random part and still not likely have collisions unless a human manually chose an ID like "_1" ... but also I'm the only human in this equation)
somat 35 days ago [-]
ah, the seed, an innocent implementation detail, but many people latch on to this and missuse random as a hash function. The one I remember was this screwball long term archival system from the early 2000's, zipped business archives on cd, the zip files were password protected. with the password being a hash of several factors about the data(unit, date, etc). A little stupid, but not the worst thing I have ever seen. The bad part was the hashing algorithm used, it was perls rand(). When I saw that I was vaguely horrified and a few test's confirmed my fears, the perl rand function would vary across operating systems and even different version of perl would change it. I think they ended up having to keep an explicit database of all the passwords just to have a fighting chance at opening those files later. not that they ever would. no one ever looks at long term records once they are stored.
Anyway, I like how the openbsd folk fixed the old C rand family of functions. they made their seed functions do nothing and tied the generators to a strong kernel random source, that is, they made rand actually random. If your application actually depends on rand as a hash function they provided the old behavior under srand_deterministic(3)
"Standards insist that this interface return deterministic results. Unsafe usage is very common, so OpenBSD changed the subsystem to return non-deterministic results by default."
That is a phenomenally stupid thing to do. There is legitimate need for reproducible pseudo-random sequences derived from a known random seed. If you break seeding that can't be done. This is how fuzz testing with generated inputs works.
twic 35 days ago [-]
The spec [1] only requires that the sequence of pseudo-random numbers is repeated for a given seed within a single process ("subsequent" and "is then"). That means rand as specified is useless for reproducible sequences in fuzzing and so on. So, programs which do that already need to supply their own random number generators, and are not affected by this change.
Whereas lots of programs use rand as a source of general-purpose random numbers, but get the seeding wrong, and so suffer from various problems. This change fixes those programs.
That's an interesting way to read the specification, seems contrary to the intent and differs from how historic unix systems have implemented it. The author of the man page also disagrees with you on it being allowed by the standard ("If the standardized behavior is required srand_deterministic() can be substituted for srand()...").
Fucking things up for people who understand how PRNGs work just to fix things for people who don't seems backwards; things like this is why configure scripts take over a minute to run. I might be okay with the non-determinism in rand() if srand isn't ever called, but intentionally ignoring the stated intent of the program just pisses me off.
kbolino 35 days ago [-]
> non-determinism in rand() if srand isn't ever called
This is the path Go took for its math/rand library eventually: if you don't seed the RNG, it's a CSPRNG, but if you do seed it, it's deterministic.
However, there is lots of "advice" out there to seed your RNG with "random" values like the current time to make it "more secure" (and to be fair, in some cases, you may get sufficient entropy for the purpose at hand by doing this, but in most cases, you won't get enough and you'll spend it very fast). So a call to srand may indicate that you know what you're doing and want determinism and understand the consequences, but it doesn't necessarily mean that.
cpeterso 35 days ago [-]
> seed your RNG with "random" values like the current time
Good point. srand(time(NULL)) is a common, yet nondeterministic, seed. The caller clearly doesn't expect rand() to generate a particular sequence of numbers, so srand() could check whether the given seed looks like a time_t close to the current time() and, if it does, then use a proper CSPRNG instead.
leni536 35 days ago [-]
That's a horrifying suggestion.
kbolino 35 days ago [-]
Since any good solution would start with deleting rand and srand from the C standard library entirely, but that's never going to happen, we only have not-good solutions to consider.
The spec itself contains an informative section which says:
> The following code defines a pair of functions that could be incorporated into applications wishing to ensure that the same sequence of numbers is generated across different machines.
And then gives code which does not use rand/srand. The intent of the spec is very clearly not for it to be portably reproducible.
crdrost 35 days ago [-]
That is a legitimate need, but it should not be the default.
Another example of when you need it, is games. I want to store in the game state, that on Day 32 of being in this cave, there is randomly a cache of diamonds if you mine the east wall. If I generate this random number dynamically, then I permit save-scumming behaviors which may distract from the artistic points of my game. So I basically want a random seed and a hash function, hashing (seed, day, activity, location, attempt_number) to see if something is successful.
duskwuff 34 days ago [-]
> There is legitimate need for reproducible pseudo-random sequences derived from a known random seed.
Correct, but rand() / srand() aren't an effective way of providing for that need. The state of the RNG is process-global; it rapidly becomes nondeterministic in multithreaded applications, let alone if a library you're using calls rand() internally. And, as mentioned upthread, the algorithm is implementation-defined, and isn't guaranteed to remain the same over time.
kbolino 35 days ago [-]
Yes, there is a legitimate need, and when you know you need it, you have already demonstrated you know what you are doing, and therefore can find the proper tool, even if it's not the default.
Whereas, when you don't know what you're doing, the default should not be a footgun.
kevin_thibedeau 35 days ago [-]
C has a standard library with known guarantees. Pulling the rug on those guarantees is irresponsible.
The appropriate solution is to not break anything and use tooling that flags use of banned symbols that can have negative safety or security implications.
kbolino 35 days ago [-]
The problem is not that OpenBSD doesn't follow the standard, it's that the standard is wrong. The default random number facility should be cryptographically secure. Most developers do not know the difference; giving them a deterministic, somewhat more performant, but completely insecure RNG by default expresses the fallacious assumption that they do. This is not the 1980s anymore, virtually every computer is connected to the Internet, there are hundreds of millions of software developers across the globe across all skill levels, and now we have them using LLMs to contend with as well, such naivete is no longer sustainable.
somat 34 days ago [-]
Yes, and that function goes by several names, I call it the hash function, a deterministic operation that returns unordered results. The non-deterministic variant of this I call random(). if you want deterministic results, you should use the deterministic function.
mystified5016 35 days ago [-]
Having your random number generator be deterministic by default is extremely bad. Using a deterministic seed should be the exceptional case.
In those cases, they provide deterministic operation under a different set of functions so nobody does anything as stupid as using deterministic RNG as encryption
Someone 35 days ago [-]
But if you compile some third-party code that doesn’t know of that API change, and then use it, that can lead to disaster. For an example, see the GP post at https://news.ycombinator.com/item?id=42971287.
I think it would have been better to remove rand and srand, and provide two replacement functions with different names, or require all users to pick one using a #define.
lionkor 35 days ago [-]
That's an interesting story, thanks for sharing.
However, it is the reason that (what feels like) every single C function has a footnote, along the lines of "*except on FordBAT OS, where mkdir doesn't create a directory, all floats are inf, and negatives are positive -- because it seemed convenient"
GrantMoyer 35 days ago [-]
Xorshift is really cool because of how simple it is to comprehend and implement despite how well it works as a PRNG. The initial paper describing it, "Xorshift RNGs"[1], is surprisingly short and approachable, and I encourage anyone with any interest to read through it.
It doesn't seem to work well for smaller sizes when shifting left by 1. I'm sitting in a waiting room, playing with this on paper with four-digit binary numbers, and so far it seems to rapidly fall into a loop of 1,3,5,15,1...
edit: they all fall into their own loops. I forget the term for it, but the pattern is, unless I've made an arithmetic mistake (likely!):
1,3,5,15,1...
2,6,10,14,2...
4,12,4...
7,9,11,(1,3,5,15,1...)
8,8...
14,7,9,11,(1,3,5,15,1...)
GrantMoyer 34 days ago [-]
Note that you need to use 3 shifts. All the triplets of shifts listed in the paper should generate sequences with period 2^32-1 or 2^64-1 depending on the state size, because the tables were generated with that constraint. For example the first triplet for a 32 bit state corresponds to the algortihm:
def advance_state(state):
state ^= state << 1
state ^= state >> 3
state ^= state << 10
return state
pavel_lishin 34 days ago [-]
I should really have read the whole paper, but it's hard to read a PDF on my phone :/
I also definitely wasn't up for working with 8-digit binary numbers on my graph paper. What I consider may be weird, but it's not that weird :P
(But I did do a little reading about LFSRs, and I wonder if the amount the numbers are shifted by is related to the stuff about arithmetic fields & groups & such... maybe I'll look into it later, but statistically speaking, probably not :) )
wruza 35 days ago [-]
For the curious, xorshift128+:
uint64_t a = s[0];
uint64_t b = s[1];
s[0] = b;
a ^= a << 23;
a ^= a >> 18;
a ^= b;
a ^= b >> 5;
s[1] = a;
return a + b;
Fun fact, the state update function can be modelled as the multiplication of the state vector by a constant matrix (where the vector/matrix elements are all in GF(2)). By raising that matrix to a power n, you can use the resultant matrix to compute n state updates at once, allowing you to "skip ahead" arbitrarily into the output stream.
Hypothetically, you could use this to vectorize/parallelize computation of xorshift128+ output sequences in large batches, but I'm not sure anyone's ever bothered to do this because the naive sequential method is so cheap.
8n4vidtmkvmk 34 days ago [-]
I could see some speedrunners wanting to do this. I think it was in Windwaker some speedrunners ran some statistical analysis so they could beat this one mini-game where the odds were stacked against you.
35 days ago [-]
shiftingleft 35 days ago [-]
The easiest way to parallelize this RNG is to just run it in parallel on multiple states.
maxloh 35 days ago [-]
I learned in college that most random number generators use the current timestamp to generate a pseudo-random number.
How does this differ cryptographically from algorithms like MWC1616 and xorshift128+ in backend applications? How can a random number generator be vulnerable to attacks in real life?
Does it really matter which random number generator you use on a webpage? It's already considered an insecure environment anyway (that's why we have server-side validation).
rkangel 35 days ago [-]
They're call PRNG because they are pseudo random. If you know the starting state, you know what sequence they're going to generate. Putting it another way, if you always start them in the same state, they'll always produce the same sequence.
So a PRNG needs "seeding". You need some value to put in at the start that is different every time. Using the time on the computer results in a different seed on each run, and so a different number sequence each time.
If you need random numbers for cryptography (very good quality random numbers are important for lots of crypto processes) then you need to do more - they need to be completely unpredictable. If you just used the time, someone could seed a copy of the RNG with the same time, and predict the randomness sequence. So what we want is a seed that comes from actual randomness (also called "entropy"). That entropy has to come from somewhere - ideally some hardware source using some statistically random physical effect.
magicalhippo 35 days ago [-]
> Putting it another way, if you always start them in the same state, they'll always produce the same sequence.
This can be easy to bump into in fairly trivial cases.
For example, initializing by time alone can be problematic if you spawn N threads at once which all do their own seeding. Then they can all reach the seeding process at the same time, relative to the clock used for the seeding.
I had that problem with one of my earlier ray tracers, and so had to add add the thread id into the mix to seed the PRNGs.
markh1967 35 days ago [-]
A demonstration of exactly this problem (only in older versions of .Net, newer versions use a different seeding method)
Agreed. I do a lot of Monte Carlo simulations of physical systems (radars, scientific models) in which same or close seeds will cause trouble.
You do want a deterministic PRNG (for repeatability of the simulation). But if you seed with time alone, and spawn 100 parallel jobs, you will see repeats or near-repeats which will bias the Monte Carlo averages resulting from your simulation ensemble.
In ancient times, you could xor in the PID and the IP address of the host to ensure no seed overlap. Nowadays there are other ways to introduce entropy to the seed, e.g. with dedicated system calls or /dev/random.
UncleMeat 35 days ago [-]
The key thing with crypto is not the source of the seed, but the algorithm. Throwing true randomness into a random number generator that isn't mean to provide cryptographic hardness won't save you and a cryptographically secure pseudo random number generator is often just fine when seeded without true randomness.
toast0 35 days ago [-]
It's both. Cryptographic generators do have better properties on their outputs than noncryptographic hashes (usually at a cost in cpu time, although the cost is often small enough to ignore); but if the seed is guessable, the output sequence is guessable too.
Using timestamp in seconds and process id for example would be easy enough to guess. There was some good work about a decade ago to provide more usable apis to get a quality seed from the OS (getentropy, getrandom), so using a self built seed is pretty suboptimal unless you have a case where you need to repeat the sequence later (recording the seed from the OS is also an option)
UncleMeat 35 days ago [-]
It depends on what you mean by guessable. Guessing a few bits of the seed with confidence won't help you crack a csprng. Yes, people shouldn't be using a timestamp in seconds as a seed. But you also don't need to spring for hardware randomness in order to get strong protections.
kbolino 35 days ago [-]
The problem with using the current time is that "guessing a few bits with confidence" is the same as guessing the whole thing because there's only a few bits that need to be guessed. You might think a time_t value has 55 and growing years worth of values to guess, but in reality, if I'm attacking you or something related to you, I have probably narrowed it down to days, hours, or even just minutes worth of values to guess. Higher-precision timers confer some additional entropy but even then it's usually not enough -- if I'm intercepting live traffic I have a very good idea of the current time, if I'm forging a signature from a certificate I know when it was issued, if I'm trying to decrypt emails or files I usually have attached metadata to narrow my guesses, etc.
The CSPRNG has only made it practically impossible to predict future outputs from past outputs alone; however, if I can guess the seed, I can predict every output past, present, and future. CSPRNGs aren't magic and have to be used properly, and in any security-sensitive context, properly means with a truly random seed. The best CSPRNG in the world cannot create entropy out of thin air. It can, however, stretch a good source of entropy very far. There are cases when using a CSPRNG with a low-quality seed is appropriate, but the minimum acceptable entropy in the seed depends on the application, not on the RNG.
UncleMeat 35 days ago [-]
I'm not saying to use the current time in seconds. I'm saying that the need for "actual randomness" and being "completely unpredictable" in the comment I responded to is largely overblown. Seeds that aren't drawn from true randomness can be totally fine. Seeds that have a handful of highly biased bits can be fine.
There are some cases in crypto where a secret that has even a single bit biased more than epsilon will break a whole scheme. CSPRNGs don't tend to work that way.
kbolino 35 days ago [-]
Ok, yes, there were some algorithms in the past which had problems with carrying input biases to their output and sometimes even getting totally broken with certain seeds/keys. Nowadays, this isn't the case, and the view is that such issues are fundamental weaknesses rather than mere errata to be worked around.
However, the quality of a modern CSPRNG doesn't save you from a low-entropy seed. A couple biased bits in a 128-bit seed may not be a serious issue (but why take the chance?!), however good entropy estimation is a difficult task which is best left to the operating system. Modern operating systems will ingest sources of hardware randomness and seed their system-level CSPRNGs, which should then be used directly by application software (for key material) or as seeds to local CSPRNGs (for less security-sensitive but more performance-sensitive needs). Do not play around with seed entropy just to be clever.
In the absence of a full-fledged operating system, then things like high-precision boot timers can be used for seeds, but only because there's nothing better available. The lack of good randomness is a real problem in resource-constrained environments, though usually there are easier-to-exploit vulnerabilities that get hit first.
35 days ago [-]
simonmales 35 days ago [-]
Well written, h/t.
masklinn 35 days ago [-]
> I learned in college that most random number generators use the current timestamp to generate a pseudo-random number.
> How does this differ cryptographically from algorithms like MWC1616 and xorshift128+ in backend applications?
That question doesn’t make any sense. The current timestamp would be a seed to initialise the PRNG, which can be MWC or xorshift.
> Does it really matter which random number generator you use on a webpage?
That is application dependent. If you’re coding a roulette wheel no, if you’re coding an E2E chat yes.
> It's already considered an insecure environment anyway (that's why we have server-side validation).
That is also application dependent. The client is considered insecure from the server pov because it is entirely under user control, so the user can mess with anything. But if it is acting entirely on behalf of the user then that can be the point and by design.
In the example above, you don’t care that a user fucks up their own crypto, but you do care that the crypto holds for the average user.
Even the qualitative difference between two non-CS PRNG can make or break a project due to too small a state space or short a cycle.
sverhagen 35 days ago [-]
I don't really disagree with anything you said, but I think the roulette wheel and the E2E chat are funny examples, you could have a high value roulette game and a casual chat about the weather, where the stakes are opposite to your point ;-)
cratermoon 35 days ago [-]
I was thinking this myself.
When significant amounts of real money are sitting on the spin of the wheel and bouncing of the ball,
a bad PRNG like the one discussed in the article would be disastrous.
masklinn 35 days ago [-]
Yeah I was thinking less casino thingie and more https://pickerwheel.com, I guess I used the wrong term for it.
tetha 34 days ago [-]
> That is application dependent. If you’re coding a roulette wheel no, if you’re coding an E2E chat yes.
Or a prominent example are games.
The whole idea of sharing "cool minecraft worlds" or "seeds for cool runs in Balatro/Slay the Spire/..." are possible due to careful use of PRNGs in those games for world generation.
In fact, games often run at least 2 RNGs and possibly more in a multiplayer setting involving networking and encryption of that. You want some kind of "World Generation seed", which is usually a PRNG players can enter funny seeds like "CAFEBABE" into to share worlds. However, you want to protect this PRNG, because otherwise player action changes the world generation.
So you end up with more RNGs of some kind, for example for AI actions, particle movements. And depending on how these RNGs are used, you might link different subsystems together, allowing AI manipulation and such. It's a very interesting thing to think about and design.
tomsmeding 35 days ago [-]
In a class we had to reverse-engineer an application that had encrypted some text file — we had to decrypt it without the key. Turned out the application just used the current unix timestamp as the key. A brute-force search of the past few months didn't take very long.
If you include nanoseconds in your key then you have more entropy, of course, but timestamps are guessable, so it's risky.
sverhagen 35 days ago [-]
Those nanoseconds aren't really entropy then, are they?
lmm 35 days ago [-]
They're entropy for most practical purposes. But there's only 1000 possibilities, so that's less than 10 bits' worth.
boothby 35 days ago [-]
Especially if you're writing a file, the creation time is a big hint! Even nanoseconds won't help if you're leaking that many bits from your seed.
tomsmeding 35 days ago [-]
That's milliseconds; nanoseconds are 1e-9 seconds. Still, the file creation time remark of sibling is pertinent.
lmm 34 days ago [-]
I was thinking of the step from milli to nano, so yeah 1e6 not 1e3, 20 bits not 10. But the point stands.
lmm 35 days ago [-]
> I learned in college that most random number generators use the current timestamp to generate a pseudo-random number.
> How does this differ cryptographically from algorithms like MWC1616 and xorshift128+ in backend applications? How can a random number generator be vulnerable to attacks in real life?
In the early days of online poker some people made a killing by figuring out all the possible hands from the PRNGs being used and then they could just start a game and look up the hands they'd been dealt in their database.
Retr0id 35 days ago [-]
xorshift128+ has exactly 128 bits of internal state. After collecting a small number of outputs, it is trivial to determine the internal state by constructing and then solving an equation (where the variables are GF(2) elements), regardless of how the "seed" was initialized. Once you have the internal state, you can predict all future outputs with 100% certainty.
This only becomes a problem when people assume this isn't possible (for example, using it to pick lottery numbers) - this is more likely to happen on the server-side than the client-side (and v8 runs on both), but it can be security-relevant in client-only apps too (consider an insecure "password generator" app).
By the way, the "password generator app" is not a contrived hypothetical, it was actually one of my first ever bug-bounty payouts.
spacechild1 35 days ago [-]
> I learned in college that most random number generators use the current timestamp to generate a pseudo-random number.
People typically seed a PRNG with the current time. The actual output sequence is still created with a determinstic algorithm. These two things are really orthogonal.
markh1967 35 days ago [-]
It's been a while so I can't provide a link but I remember reading about someone who got hold of a video blackjack machine that used a pseudo random number generator and was able to find the seed from only a few hands and then knew exactly which cards would be dealt next. He then went on to empty every machine he could find in the local casinos.
atoav 35 days ago [-]
You can go back into a time before computers and use the most famous examples of how bad RNGs endanger crypto: One Time Pads (OTP). This is basically just a XOR of your message with your random numbers. If your random numbers are bad they leave patterns in the XOR-ed message that can be statistically analyzed and potentially decrypted.
Bad PRNGs repeat early and have predictable patterns, as such they form a special attack vector.
Whether that is really an issue for any given application, depends on the application (e.g. how shortlived the crypto is and how bad decryption acter the fact is). But if you are asking that question it likely means you don't know enough knowledge to roll your own crypto unless you wanna be in a world of pain.
Just because PRNGs appear unpredictable to you does not mean they aren't.
If you really wanna go the route, at least add a salt, so something like
One attack might be exploiting One Time Passwords. Lets say you send yourself enough tokens to calculate the state, you will then be able to calculate the next token and login as another user. However that should be easily fixed by adding some salt relating to the user, meaning you would only be able to figure out _your_ next token.
codesnik 35 days ago [-]
One mistake that bitten me a couple of times is not to reseed after fork(). AFAIR Perl calls srand automatically on fork, and I've been surprised that ruby at circa version 1.8 - didn't, which gave me quite some interesting behavior on production.
// Global PRNG: set Math.random.
seedrandom('hello.', { global: true });
console.log(Math.random()); // Always 0.9282578795792454
I found this useful when using worker threads to 'render' a construction by slices that included some randomness (that should be the same for each thread)
svachalek 34 days ago [-]
Replacing a system function is a quick and easy hammer but not great for maintenance. It's not clear that calls to Math.random() are now actually going to an npm, and if someone deletes that seedrandom line it's not clear that calls to Math.random() are now not actually going to an npm.
For new/clean code it's better to just import a seeded random function for clarity.
What applications care about this improvement, considering it's not cryptographically secure?
GrantMoyer 35 days ago [-]
I've hit issues with bad quality PRNGs when using them for Monte Carlo path tracing. With a non-uniform PRNG, the simulation can take a lot longer to converge to a clear image, because some ray paths can be severely under or over represented.
brunoborges 35 days ago [-]
I am curious what other languages (and runtimes) have been using so far. Anyone knows?
GrantMoyer 34 days ago [-]
C++ has had a few options since C++11[1].
- Linear congruential generator[2]:
xₜ = a xₜ₋₁ + c mod m
- Mersenne Twister[3]:
too complicated for a few lines
- Subtract with carry[3]:
k < p
carryₜ = [xₜ₋ₖ - xₜ₋ₚ - carryₜ₋₁ < 0]
xₜ = xₜ₋ₖ - xₜ₋ₚ - carryₜ₋₁ mod 2ⁿ
All three algorithms are fairly popular.
(Please excuse the unidiomatic index and constant names, I'm at the mercy of unicode)
Note that all 3 of these are fairly bad. Xoshiro and PCG are the 2 modern families of good fast rngs.
kittikitti 35 days ago [-]
Thank you so much for this, by coincidence I was looking for good sources about Pseudo Random Number Generators and how we program them. This does the trick and explains the levels of determinism and the most popular PRNG algorithms.
paulpauper 35 days ago [-]
the solution is to the use the RNG to build a key , not create the key with the bad RNG
https://stackoverflow.com/questions/9550796/why-is-google-ch... https://issues.chromium.org/issues/40404440
I came across it because of some tests I had for a statistics library I was working on. There was a test that created an array with 100k random values that shouldn't have had any collisions, but I consistently got multiple collisions, only in Chrome.
> Safari, Firefox, and IE were all already using more robust PRNGs for Math.random
This wasn't true at the time. I actually don't remember what Safari was doing in 2012 so I won't say that one for sure, but Firefox and IE were both using garbage RNGs as well, they were probably just generating across the full (or at least more of the) (0, 1] double range so didn't produce the collisions you saw in your testing.
[1] https://medium.com/@betable/tifu-by-using-math-random-f1c308...
[2] https://news.ycombinator.com/item?id=10598065
My recollection of that time working on that stats library was that Chrome took many shortcuts to max out things like SunSpider. There were other Chrome specific workarounds I had to implement to address what I considered bugs, but they were definitely in lightly defined areas like `Math.random()`.
I couldn't find the original bug I file, but the discussion on it specifically noted that they could just copy the Firefox one (which was noted to be an improvement over Chrome's) but they decided on Won't Fix until someone more important than me drew attention to it. It was trivially fixed within a week once that happened.
The reason the initial bug report [0] was marked as WAI but the Medium blog post got a lot more traction is very simple and much more human.
When the initial issue was filed to the chromium bug tracker, it was routed to Chrome's security team. From a security perspective, Math.random() provides no guarantees about cryptographic safety, so naturally it was marked as WAI – nobody from the V8 team actually saw this issue. One of the suggestions "3. Make crypto.random(size)" was acted upon though, and so crypto.getRandomValues() was introduced in Chrome 11, about 10 months after the issue report. In terms of specifying and introducing new Web APIs, this was incredibly fast.
After the Medium blog post was published, someone filed an issue directly to the V8 project. It did not question the spec compliance, but pointed out that the PRNG quality could be better. That nerd-sniped me into researching this topic and after reading a few papers, I implemented the fix with xorshift128+. I'm thankful that my team lead allowed me to set aside some time to work on this even though it was not on our project roadmap.
Performance regressions that could affect our benchmark performance was of course part of the consideration, but there were many options to avoid a performance regression. In V8, crossing the boundary between machine code compiled from JS to C++ runtime builtins is fairly expensive. I could have either ported xorshift128+ to assembly so that this boundary crossing was not necessary, or I could have amortized the boundary crossing cost by buffering multiple random values. I chose the latter because porting to assembly is error prone and I would have to do this for each platform. There are better options in V8 nowadays, by expressing the algorithm in an intermediate representation, but back then that was not available.
[0] https://g-issues.chromium.org/issues/40404440 [1] https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getR... [2] https://g-issues.chromium.org/issues/40643979
Anyway, I like how the openbsd folk fixed the old C rand family of functions. they made their seed functions do nothing and tied the generators to a strong kernel random source, that is, they made rand actually random. If your application actually depends on rand as a hash function they provided the old behavior under srand_deterministic(3)
"Standards insist that this interface return deterministic results. Unsafe usage is very common, so OpenBSD changed the subsystem to return non-deterministic results by default."
makes me laugh every time.
http://man.openbsd.org/rand
That is a phenomenally stupid thing to do. There is legitimate need for reproducible pseudo-random sequences derived from a known random seed. If you break seeding that can't be done. This is how fuzz testing with generated inputs works.
Whereas lots of programs use rand as a source of general-purpose random numbers, but get the seeding wrong, and so suffer from various problems. This change fixes those programs.
[1] https://pubs.opengroup.org/onlinepubs/009696699/functions/ra...
Fucking things up for people who understand how PRNGs work just to fix things for people who don't seems backwards; things like this is why configure scripts take over a minute to run. I might be okay with the non-determinism in rand() if srand isn't ever called, but intentionally ignoring the stated intent of the program just pisses me off.
This is the path Go took for its math/rand library eventually: if you don't seed the RNG, it's a CSPRNG, but if you do seed it, it's deterministic.
However, there is lots of "advice" out there to seed your RNG with "random" values like the current time to make it "more secure" (and to be fair, in some cases, you may get sufficient entropy for the purpose at hand by doing this, but in most cases, you won't get enough and you'll spend it very fast). So a call to srand may indicate that you know what you're doing and want determinism and understand the consequences, but it doesn't necessarily mean that.
Good point. srand(time(NULL)) is a common, yet nondeterministic, seed. The caller clearly doesn't expect rand() to generate a particular sequence of numbers, so srand() could check whether the given seed looks like a time_t close to the current time() and, if it does, then use a proper CSPRNG instead.
> The following code defines a pair of functions that could be incorporated into applications wishing to ensure that the same sequence of numbers is generated across different machines.
And then gives code which does not use rand/srand. The intent of the spec is very clearly not for it to be portably reproducible.
Another example of when you need it, is games. I want to store in the game state, that on Day 32 of being in this cave, there is randomly a cache of diamonds if you mine the east wall. If I generate this random number dynamically, then I permit save-scumming behaviors which may distract from the artistic points of my game. So I basically want a random seed and a hash function, hashing (seed, day, activity, location, attempt_number) to see if something is successful.
Correct, but rand() / srand() aren't an effective way of providing for that need. The state of the RNG is process-global; it rapidly becomes nondeterministic in multithreaded applications, let alone if a library you're using calls rand() internally. And, as mentioned upthread, the algorithm is implementation-defined, and isn't guaranteed to remain the same over time.
Whereas, when you don't know what you're doing, the default should not be a footgun.
The appropriate solution is to not break anything and use tooling that flags use of banned symbols that can have negative safety or security implications.
In those cases, they provide deterministic operation under a different set of functions so nobody does anything as stupid as using deterministic RNG as encryption
I think it would have been better to remove rand and srand, and provide two replacement functions with different names, or require all users to pick one using a #define.
However, it is the reason that (what feels like) every single C function has a footnote, along the lines of "*except on FordBAT OS, where mkdir doesn't create a directory, all floats are inf, and negatives are positive -- because it seemed convenient"
[1]: https://www.jstatsoft.org/article/download/v008i14/916
edit: they all fall into their own loops. I forget the term for it, but the pattern is, unless I've made an arithmetic mistake (likely!):
1,3,5,15,1...
2,6,10,14,2...
4,12,4...
7,9,11,(1,3,5,15,1...)
8,8...
14,7,9,11,(1,3,5,15,1...)
I also definitely wasn't up for working with 8-digit binary numbers on my graph paper. What I consider may be weird, but it's not that weird :P
(But I did do a little reading about LFSRs, and I wonder if the amount the numbers are shifted by is related to the stuff about arithmetic fields & groups & such... maybe I'll look into it later, but statistically speaking, probably not :) )
Hypothetically, you could use this to vectorize/parallelize computation of xorshift128+ output sequences in large batches, but I'm not sure anyone's ever bothered to do this because the naive sequential method is so cheap.
How does this differ cryptographically from algorithms like MWC1616 and xorshift128+ in backend applications? How can a random number generator be vulnerable to attacks in real life?
Does it really matter which random number generator you use on a webpage? It's already considered an insecure environment anyway (that's why we have server-side validation).
So a PRNG needs "seeding". You need some value to put in at the start that is different every time. Using the time on the computer results in a different seed on each run, and so a different number sequence each time.
If you need random numbers for cryptography (very good quality random numbers are important for lots of crypto processes) then you need to do more - they need to be completely unpredictable. If you just used the time, someone could seed a copy of the RNG with the same time, and predict the randomness sequence. So what we want is a seed that comes from actual randomness (also called "entropy"). That entropy has to come from somewhere - ideally some hardware source using some statistically random physical effect.
This can be easy to bump into in fairly trivial cases.
For example, initializing by time alone can be problematic if you spawn N threads at once which all do their own seeding. Then they can all reach the seeding process at the same time, relative to the clock used for the seeding.
I had that problem with one of my earlier ray tracers, and so had to add add the thread id into the mix to seed the PRNGs.
https://dotnetfiddle.net/tLGMp6
You do want a deterministic PRNG (for repeatability of the simulation). But if you seed with time alone, and spawn 100 parallel jobs, you will see repeats or near-repeats which will bias the Monte Carlo averages resulting from your simulation ensemble.
In ancient times, you could xor in the PID and the IP address of the host to ensure no seed overlap. Nowadays there are other ways to introduce entropy to the seed, e.g. with dedicated system calls or /dev/random.
Using timestamp in seconds and process id for example would be easy enough to guess. There was some good work about a decade ago to provide more usable apis to get a quality seed from the OS (getentropy, getrandom), so using a self built seed is pretty suboptimal unless you have a case where you need to repeat the sequence later (recording the seed from the OS is also an option)
The CSPRNG has only made it practically impossible to predict future outputs from past outputs alone; however, if I can guess the seed, I can predict every output past, present, and future. CSPRNGs aren't magic and have to be used properly, and in any security-sensitive context, properly means with a truly random seed. The best CSPRNG in the world cannot create entropy out of thin air. It can, however, stretch a good source of entropy very far. There are cases when using a CSPRNG with a low-quality seed is appropriate, but the minimum acceptable entropy in the seed depends on the application, not on the RNG.
There are some cases in crypto where a secret that has even a single bit biased more than epsilon will break a whole scheme. CSPRNGs don't tend to work that way.
However, the quality of a modern CSPRNG doesn't save you from a low-entropy seed. A couple biased bits in a 128-bit seed may not be a serious issue (but why take the chance?!), however good entropy estimation is a difficult task which is best left to the operating system. Modern operating systems will ingest sources of hardware randomness and seed their system-level CSPRNGs, which should then be used directly by application software (for key material) or as seeds to local CSPRNGs (for less security-sensitive but more performance-sensitive needs). Do not play around with seed entropy just to be clever.
In the absence of a full-fledged operating system, then things like high-precision boot timers can be used for seeds, but only because there's nothing better available. The lack of good randomness is a real problem in resource-constrained environments, though usually there are easier-to-exploit vulnerabilities that get hit first.
> How does this differ cryptographically from algorithms like MWC1616 and xorshift128+ in backend applications?
That question doesn’t make any sense. The current timestamp would be a seed to initialise the PRNG, which can be MWC or xorshift.
> Does it really matter which random number generator you use on a webpage?
That is application dependent. If you’re coding a roulette wheel no, if you’re coding an E2E chat yes.
> It's already considered an insecure environment anyway (that's why we have server-side validation).
That is also application dependent. The client is considered insecure from the server pov because it is entirely under user control, so the user can mess with anything. But if it is acting entirely on behalf of the user then that can be the point and by design.
In the example above, you don’t care that a user fucks up their own crypto, but you do care that the crypto holds for the average user.
Even the qualitative difference between two non-CS PRNG can make or break a project due to too small a state space or short a cycle.
Or a prominent example are games.
The whole idea of sharing "cool minecraft worlds" or "seeds for cool runs in Balatro/Slay the Spire/..." are possible due to careful use of PRNGs in those games for world generation.
In fact, games often run at least 2 RNGs and possibly more in a multiplayer setting involving networking and encryption of that. You want some kind of "World Generation seed", which is usually a PRNG players can enter funny seeds like "CAFEBABE" into to share worlds. However, you want to protect this PRNG, because otherwise player action changes the world generation.
So you end up with more RNGs of some kind, for example for AI actions, particle movements. And depending on how these RNGs are used, you might link different subsystems together, allowing AI manipulation and such. It's a very interesting thing to think about and design.
If you include nanoseconds in your key then you have more entropy, of course, but timestamps are guessable, so it's risky.
> How does this differ cryptographically from algorithms like MWC1616 and xorshift128+ in backend applications? How can a random number generator be vulnerable to attacks in real life?
In the early days of online poker some people made a killing by figuring out all the possible hands from the PRNGs being used and then they could just start a game and look up the hands they'd been dealt in their database.
This only becomes a problem when people assume this isn't possible (for example, using it to pick lottery numbers) - this is more likely to happen on the server-side than the client-side (and v8 runs on both), but it can be security-relevant in client-only apps too (consider an insecure "password generator" app).
https://en.wikipedia.org/wiki/GF(2)
People typically seed a PRNG with the current time. The actual output sequence is still created with a determinstic algorithm. These two things are really orthogonal.
Bad PRNGs repeat early and have predictable patterns, as such they form a special attack vector.
Whether that is really an issue for any given application, depends on the application (e.g. how shortlived the crypto is and how bad decryption acter the fact is). But if you are asking that question it likely means you don't know enough knowledge to roll your own crypto unless you wanna be in a world of pain.
Just because PRNGs appear unpredictable to you does not mean they aren't.
If you really wanna go the route, at least add a salt, so something like
I'm sure some other runtimes don't do it, too.
For new/clean code it's better to just import a seeded random function for clarity.
There's Math.random(), and then there's Math.random() (2015) - https://news.ycombinator.com/item?id=38188421 - Nov 2023 (49 comments)
There's Math.random(), and then there's Math.random() - https://news.ycombinator.com/item?id=10751396 - Dec 2015 (106 comments)
- Linear congruential generator[2]:
- Mersenne Twister[3]: - Subtract with carry[3]: All three algorithms are fairly popular.(Please excuse the unidiomatic index and constant names, I'm at the mercy of unicode)
[1]: https://en.cppreference.com/w/cpp/header/random
[2]: https://en.wikipedia.org/wiki/Linear_congruential_generator
[3]: https://en.wikipedia.org/wiki/Mersenne_Twister
[4]: https://en.wikipedia.org/wiki/Subtract_with_carry