All the graphics for these are made in Chalk (https://github.com/chalk-diagrams/chalk), which is a Python port of Haskell's Diagrams library. Honestly I mostly make the puzzles as an excuse to hack on the graphics library, which I find pretty interesting.
cherryteastain 297 days ago [-]
I really like the concept, but both Colab and locally running jupyter notebook seem to have issues. I'm getting an error related to "env.height" (can send you the full stacktrace if interested) in the very first puzzle.
srush 297 days ago [-]
Oh no, yes, please send a stack trace (although if it is in Colab I should be able to repro it).
cherryteastain 297 days ago [-]
Nevermind, I think it was just me being silly and not running the bit with wget at the top!
andriym 297 days ago [-]
keep 'em coming!
blt 297 days ago [-]
This is cool! It might sound silly, but figuring out `None` and `...` indexing in numpy was one of the more delightful moments of learning as a programmer for me.
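For anyone who hasn't run into those yet, here is a quick sketch (my own example, not from the puzzles) of what `None` and `...` do in NumPy indexing:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)  # shape (2, 3)

# None (alias np.newaxis) inserts a new axis of length 1:
col = x[:, :, None]   # shape (2, 3, 1)
row = x[None]         # shape (1, 2, 3)

# ... (Ellipsis) stands in for "all the remaining axes":
same = x[..., 0]      # equivalent to x[:, 0], shape (2,)

print(col.shape, row.shape, same.shape)
```

`None` is what makes broadcasting tricks like `a[:, None] * b[None, :]` possible, and `...` keeps indexing code working when the number of leading axes isn't fixed.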
Tangent: I wish we had not settled on the word "tensor" for multidimensional arrays. Yes there is some isomorphism with the multilinear map definition, but the huge majority of code using >2D arrays has nothing to do with multilinear maps.
olives 297 days ago [-]
If you like these puzzles, you might also enjoy the minitorch [1] teaching library by the same author: a course where you implement a simplified version of the PyTorch library from scratch.
Completing that course and understanding the differences between the simplified and full versions has been the most useful deep learning teaching resource for me to date.
These are very fun. I would say that they are not entirely in order of difficulty so if you get stuck you should try some of the later ones (you can shim the ones you didn’t solve yet if you’d like).
The visualizations are great. Incredibly useful when debugging without being too noisy.
blt 296 days ago [-]
@srush some feature requests after finishing these:
- It would be nice if we could use Boolean arrays as 0/1 values in arithmetic without multiplying by 1 first, as Torch and NumPy allow.
- The coloration of the output diagrams is hard to read - it seems to separate positive, negative and zero based on yellow/blue/white, but it's almost impossible to tell the different shades of yellow and blue apart.
- Some of the randomized test cases seem wrongly scaled - for example, the target outputs for "bucketize" tend to be almost all zeros.
- I agree with the other comment that the difficulty is not monotonic. In particular the puzzles after "scatter_add" should probably be moved in the neighborhood of "cumsum".
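For reference, this is the behavior the first request above is asking for; in NumPy (and PyTorch) booleans promote to 0/1 in arithmetic, so no `* 1` is needed (a small sketch, not the puzzle library's API):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
mask = a > 2              # boolean array: [False, False, True, True]

# Booleans act as 0/1 directly in arithmetic:
count = mask.sum()        # number of elements > 2
zeroed = a * mask         # keeps elements where mask is True, zeros elsewhere

print(count, zeroed)
```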
Really fun and genuinely useful. I'm making these requests not just to be picky, but because I plan to suggest this to people getting started in the field. Can make PRs if you don't have time.
Looking forward to trying your other puzzle sets!
triyambakam 297 days ago [-]
Would these be useful to work through as someone learning ML engineering?
Edit: and if so, please explain the didactic benefit.
srush 297 days ago [-]
Yup. I often find that people learning ML engineering struggle a lot with shapes and broadcasting. The goal of these puzzles is to force you to really learn the semantics of broadcasting and to internalize how data shapes in ML correspond to the loops most people would otherwise write.
The motivation was primarily teaching point-free, array programming. I don't think it is a great style, but it is fun as a brain teaser.
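A small illustration of that loop correspondence (my sketch, not taken from the puzzles): a broadcasted outer sum computes exactly what most people would write as two nested loops.

```python
import numpy as np

a = np.arange(3)          # shape (3,)
b = np.arange(4)          # shape (4,)

# Point-free, broadcasted version: shapes (3, 1) + (1, 4) -> (3, 4)
outer = a[:, None] + b[None, :]

# The explicit nested loops it corresponds to:
loop = np.empty((3, 4), dtype=int)
for i in range(3):
    for j in range(4):
        loop[i, j] = a[i] + b[j]

assert (outer == loop).all()
```

Reading the broadcasted shapes as "for each i, for each j" is the mental move the puzzles drill.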
If you enjoy this type of thing, I made a bunch more. They're all kind of ML + PL in style.
- https://github.com/srush/gpu-puzzles
- https://github.com/srush/tensor-puzzles
- https://github.com/srush/autodiff-puzzles
- https://github.com/srush/transformer-puzzles
- https://github.com/srush/LLM-Training-Puzzles
- https://github.com/srush/triton-puzzles
[1] https://minitorch.github.io/