I'll agree with a lot of his critiques, but one of his smaller ones that echoed a lot of my own frustration was the dictum to prefer small definitions.
> Quoting Chuck Moore:
> Forth is highly factored code. I don't know anything else to say except that Forth is definitions. If you have a lot of small definitions you are writing Forth. In order to write a lot of small definitions you have to have a stack.
It seemed like apologetics, making a virtue out of necessity, given that I can't do stack acrobatics in my head live. The only way I could read a function in my head, without taking pencil to paper, _was_ small functions. But I found that clashed with the way some algorithms and procedures naturally express themselves in a longer, multi-step style, and it actually ended up more verbose and tangled across multiple top-level definitions.
It turns out that local variables that compile to C-style indirect (SP + i) accesses are only mildly more expensive than stack acrobatics, but still give the flexibility of Forth-style metaprogramming. [1]
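For what it's worth, here is the kind of rewrite I mean, using the Forth-2012 {: ... :} locals syntax (NORM2 is just an illustrative name, not anything from the article):

    \ stack-juggling version:
    : norm2  ( x y -- x*x+y*y )  dup *  swap dup *  + ;
    \ the same word with locals; a compiler can turn X and Y into
    \ frame-relative fetches, much like C's (SP + i) accesses:
    : norm2  ( x y -- x*x+y*y )  {: x y :}  x x *  y y *  + ;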
Ultimately, the author's point about keeping the "Forth philosophy" (and extremely spare code) but not Forth-the-language itself rings true to me.
Given my limitations, life is too short to chase as minimalist an implementation as you'd like, or an interactive development environment in <128k. For me, it's hard enough to implement the "subject" that I'm programming efficiently (algorithmically, data-driven-ly, amortizing-computation-ly).
That is, why are you factoring the code to use the stack when you have globals?
(Mumble mumble structured programming, recursion, reuse)
If you move towards a global-first approach (which is what Chuck Moore seems to have moved towards, from anecdote), what changes is that you can substitute the word defining the variable for another word, later down the line, that adds the context, indirection and error handling you need. The mechanism can be added without changing how the word is used, and you can still write a divide-and-conquer kind of algorithm this way; it's just more classically imperative in style, with more setting of mutable global temporaries and every byte of bookkeeping directly accounted for.
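A sketch of what I mean using DEFER/IS; the names (PRESSURE, SENSOR-BANK, LOG-SAMPLE) are made up for illustration:

    defer pressure  ( -- addr )            \ starts life as a bare global cell
    variable pressure-cell
    ' pressure-cell is pressure
    : log-sample  ( n -- )  pressure ! ;   \ callers only know PRESSURE yields an address
    \ later: substitute a word that adds context and error handling,
    \ without touching LOG-SAMPLE or any other user of PRESSURE
    variable sensor-bank
    : pressure-checked  ( -- addr )  sensor-bank @  dup 0= abort" no sensor bank" ;
    ' pressure-checked is pressure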
Part of the minimalist freedom in Forth is that it is agnostic about whether you're using the stack or the dictionary. If you want a word to be unambiguously about a particular piece of memory, it makes sense to define it first in terms of "it accesses this static location" instead of "it consumes three things on the stack, shuffles them around, indirects one to a memory location and adds the other two", because the latter inclines all your words to be about the stack. Take the primitive approach - the one that maps well to assembly - first and see how far it goes. You stay in control of how you're extending the language that way. C preempts that because the compiler hides the stack manipulation, so the semantics of a function default towards locals, and further extension is guided around fitting into that.
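Concretely, the primitive-first version might start out as nothing more than this (CURSOR and friends are illustrative names):

    create cursor  2 cells allot
    : cursor-x  ( -- addr )  cursor ;
    : cursor-y  ( -- addr )  cursor cell+ ;
    : home      ( -- )  0 cursor-x !  0 cursor-y ! ;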
(And it's true that the compiler gets you to an answer faster, and black-boxes the interface so you can use code without reading code - but that comes at the expense of precision around details like this. Forth is probably not the right way if it's Conway's law that you're up against.)
[1] https://www.novabbs.com/devel/article-flat.php?id=26347&grou...
rep_lodsb 21 days ago [-]
In most assembly languages, accessing local variables on the stack is easy, plus you have multiple registers for temporary data. Forth feels extremely limiting compared to that.
On an architecture without those features, like the 6502, Forth may be a good idea, and possibly faster than C - but only if it's compiled to machine code with some peephole optimizations, so that e.g. "123 MY-VAR C!" translates into "LDA #123 ; STA MY-VAR", instead of a naive implementation where the address and constant would first be pushed onto the stack.
And any more complicated optimizations would probably require first "decompiling" the Forth code back into a higher level of abstraction. It's practically the same as assembler macros otherwise.
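To make the comparison concrete (the threaded and native listings in the comments are only illustrative, not the output of any particular Forth):

    variable my-var
    : init  ( -- )  123 my-var c! ;
    \ a naive threaded compile of INIT pushes and pops everything:
    \   (LIT) 123   (LIT) my-var   C!   EXIT
    \ a peephole-optimizing native compiler could emit just:
    \   LDA #123
    \   STA my-var
    \   RTS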
edit: fixed order of operands. I originally wrote "MY-VAR 123 C!", but then remembered that the address to store to has to be on top of stack. IMHO, infix notation is less confusing, and writing a recursive-descent parser to handle it isn't that hard compared to everything else in implementing a compiler. And of course in an infix language, "123 := MYVAR" would be a syntax error, instead of storing (the low byte of) the address of MYVAR into memory location 123.
nxobject 21 days ago [-]
I appreciate your nudge towards thinking in a global-style manner – it does remind me of the style in which early Mac and PC software was written, with memory and all lifetime data accounted for before the algorithms. (Maybe instead of the structured-programming "Algorithms and Data Structures" (Wirth) approach, it's "Data Structures and Algorithms".)
DonHopkins 21 days ago [-]
I just posted some historical info about Chuck Moore's work for HOMER and Associates on a real-time visual mixing console that produced many music videos that ran endlessly on MTV back when they actually played music videos, and special effects for blockbuster films like RoboCop and Total Recall, to the discussion about abstractions:
https://news.ycombinator.com/item?id=42532404
I appreciate Yossi's plain honesty in this article, and it's a fun and interesting read; I've read it before.
I can relate to, but not endorse, designing a CPU and dialect for an interesting language you've never properly used. This turned out to be very painful, and Yossi argues convincingly that it is essentially not practical to use Forth at all, debating Jeff Fox's position. However, there is some evidence[1] that Forth actually might be practical, and it certainly seemed to have a niche in the 80's.
Yossi made some errors I've seen among new Forth programmers. A lot of people, before writing real programs, think Forth is like lisp from another universe. They visualise Forth primarily as a sort-of functional, concatenative, highly refactor-focused language. They likewise tend to throw out all the normal Forth defining words and use Forth as 'lisp without parentheses'. They try and put all their data on the stack, 'point-free', rather than using variables. And often their projects eventually devolve into C envy, every line with stack comments and equivalent C code to help, as shown in the article.
But go look at real, working, classic Forth code, of which there is much, and you'll see that there is a prevailing style that's easy to read and not actually that 'smart' or 'academic'. No more than 1-2 stack items need to be mentally 'juggled' for 99% of the code, and lots of variables and buffers are used whenever that's easier. Yes, the classic variables are 'global', but it doesn't matter if the relevant code isn't recursive or touched in interrupts, and is only used by a cluster of related words. Newer Forths do have local variables, in spite of Jeff Fox's disapproval!
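Something like this is the flavor I mean (an illustrative sketch, not from any particular program):

    variable total    variable samples
    : record   ( n -- )  total +!  1 samples +! ;
    : average  ( -- n )  total @  samples @ / ;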
The classic code I'm talking about matches what I think Jeff Fox is trying to coerce you towards. Ultimately I disagree with Yossi's views because I think if he had actually tried to implement what Jeff Fox proposes, and got some practice first on a more realistic project, he would have had a much better shot. It's impressive how well the project turned out in spite of the approach, and how Yossi wrote a backend for his architecture in a week: a testament to both his skills and LLVM's design. But it's worth reflecting, as engineers, on how arrogant (yet relatable) it is to make a CPU and compiler for a language you've never properly used.
[1] https://www.hardware-x.com/article/S2468-0672(22)00025-6/ful...
I have ported Forth to a dozen small microcontrollers, and my experience writing much of the bootstrap code in Forth tells me that you are better off coding Forth in a "vertical" style (i.e. one word per line with stack picture comments), rather than the terse "horizontal" style of "everything on one line" that many of the folks using Forth (including @yosefk, the author) seem to prefer.
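A small made-up example of what that looks like, with a stack picture after every line:

    : third   ( a b c -- a b c a )
      >r      ( a b )      \ r: c
      over    ( a b a )
      r>      ( a b a c )
      swap    ( a b c a )
    ;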
Given how close Forth is to assembly (seen from an implementer's point of view), it makes sense to write Forth in a "vertical" style which reflects the "vertical" style in which assembly code is written. This has the advantage that the "stack picture comments" on each line of code can stand in for Hoare triples, so that the code and its - I'll call it - "proof" can be written hand in hand at the same time.
This is how all of the Forth code that I've written in https://github.com/romforth/romforth is structured. It does make the code appear less compact, though, so you are not going to win any code golf prizes.
A text display with an auto pretty-printed view would serve people who like both code styles well. A newline per stack-reducing operation, with the next line indented by stack depth, would make it close to your style and could be quite automatic.
https://factorcode.org/
Also 8th. I find the free version fun to play with as a modern desktop Forth with built-in support for things like databases, ODBC, JSON, matrices, etc. The commercial aspect won't be most people's cup of tea, of course, but I still think it's neat, and 8th is probably the most approachable way to write a desktop app with Forth.
https://8th-dev.com/
https://8th-dev.com/manual.html
Something like an automated proof assistant to help annotate the stack while coding would be awesome, but I'm not aware of any. These might be famous last words, but if switching between compile/interpret modes is ignored, I think it shouldn't be too hard to implement.
nxobject 20 days ago [-]
I think that's the rub with any language that works with procedural macroexpansion: it's conceptually hard to make diagnostics correspond one-to-one with the original syntax. I think it might be especially hard with Common Lisp-style reader macros (i.e. procedures triggered at the parsing stage when a character is read from the input stream).
volemo 21 days ago [-]
We removed explicit arguments from the language so you can write comments with arguments after every function call. :D
OldGuyInTheClub 21 days ago [-]
I have read this article many times, usually whenever I look at Forth for whatever reason. As a non-computer scientist, I learn a little about language design every time I look at it. But I can never figure out what the author actually thinks about Forth. It seems like he is intrigued by it and sometimes amazed by it, but has concerns and ultimately decides it is not for him. I could be way off on multiple axes, though.
matheusmoreira 21 days ago [-]
He thinks he's not smart enough for it.
He describes the good Forth programmers as people who are so smart they can focus on nothing but the problem. To such people, programming languages, operating systems, libraries, even the chips that run all this stuff are all non-problems, to be reduced or eliminated. And doing that requires a mind able to compensate for all the comfort those layers would have afforded.
> Why pay this cost? [The cost of adding junk you don't actually need]
> Because I don't do algorithms, other people do, so I have to trust them and respect their judgment to a large extent.
> Because you need superhuman abilities to work without layers.
> The Forth way of focusing on just the problem you need to solve seems to more or less require that the same person or a very tightly united group focus on all three of these things, and pick the right algorithms, the right computer architecture, the right language, the right word size, etc.
> I don't know how to make this work.
> having people being able to do what at least 3 people in their respective areas normally do, and concentrating on those 3 things at the same time
> Doing the cross-layer global optimization.
A sufficiently smart person is capable of analyzing and optimizing the problem from the requirements all the way down to the literal chip running the software. They come up with custom, unconventional algorithms and implement weird chips with 18-bit words.
These are people who are smart enough to modify standard engineering equations. Those things weren't sacred to them; they understood them well enough to feel confident changing them so that they could be efficiently implemented in some weirdly minimal custom chips[1]. And this somehow yielded more efficient algorithms which were easy to implement in Forth.
I share their enthusiasm for minimalism and eliminating the dependencies and the "junk". It always ends with me learning everything I can about the junk and doing it myself, maybe even reimagining it, hopefully better and smaller this time. I can only hope to one day be half as smart as some of these folks seem to be, though. Every time I try to get into hardware, I discover my limits.
[1]: The Yamaha DX7 is an example. People understood things and reframed the problems until the solution fit the available resources. They needed a sine wave...
https://www.righto.com/2021/12/yamaha-dx7-reverse-engineerin...
> multiplying the sine wave by the envelope level yields the output
> However, fast multiplication required too much hardware in the 1980s, so the DX7 uses a mathematical shortcut: adding logarithms is equivalent to multiplying the values.
> The obvious problem is that computing logarithms is harder than multiplying, but the trick is to store the (negated) logarithm of the sine wave in the lookup table (below) instead of the sine wave.
> This provides the logarithm for free.
> The implementation takes advantage of the symmetry of the sine wave so only a quarter-wave needs to be stored.
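(The identity itself is easy to sketch in Forth floats, just to make the trick concrete; the DX7 of course does it all with fixed-point lookup tables rather than FLN/FEXP:)

    \ multiply two positive values by adding their logs:
    : *via-logs  ( F: a b -- a*b )  fln fswap fln f+ fexp ;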
OldGuyInTheClub 21 days ago [-]
Thanks for clarifying! In that case, he and I have the same "problem" although at very different levels. I've written in other posts about Forth and Lisp that I would very much like to get over the hump of changing how I think about problems. Unfortunately, I have never been able to do it.
There are times where I see how Moore and McCarthy are using a language to define itself. I can sort-of understand how Moore creates a custom language in Forth to tackle something. But, damned if I can explain it or apply that to even a toy problem. I am thankful that there are people out there who can think differently.
nxobject 21 days ago [-]
That being said, I wonder whether those applications tend to be embedded systems or devices, which lend themselves to "first, you must invent the world" style thinking (which is good!).
The last task I was doing on a side project was making sure some bidirectional content was accessible by a screen reader, which was the ultimate exercise in relying on other people's work.
Ultimately, I can't imagine having to think through, in an integrated manner, all of the layers that power this TTS – from the Unicode parsing, through the NN models to the sound synthesis, all the way to dealing with sound output.
vok 21 days ago [-]
I think that "Good Forth programmers arrange things so that they flow on the stack" has analogs in other languages. For example, arranging things in J so that short tacit expressions naturally provide the functions you need.
codr7 21 days ago [-]
I feel like Forth will always have a place in embedded contexts. And it's a good language to start with when learning how to write interpreters/compilers.
The second you start building higher-level apps in Forth, you lose most of its advantages, in my experience.
While Forth is usable as an in-app scripting language, I would pick Lisp any day.
vdupras 21 days ago [-]
The opinion that Forth doesn't climb the abstraction ladder well is popular, but I'd be tempted to qualify it as a misconception.
My own attempt[1] at a Forth that climbs that ladder is, I think, a good counterexample. In my opinion, its HAL compares favorably to, for example, SBCL's native code compiler. Its almost-C compiler compares favorably to, I think, Tiny CC.
This misconception stems, I think, from the fact that you can very well reap the rewards of Forth in a low-level environment without needing to "think in Forth". For example, by mastering immediate mechanics.
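A toy example of the kind of immediate mechanics I mean (my own sketch, nothing from [1]; TOGGLE-LED is hypothetical):

    \ TIMES[ ... ]TIMES extends the compiler itself with a counted loop
    : times[  ( run: n -- )  0 postpone literal  postpone ?do ; immediate
    : ]times  ( -- )         postpone loop ; immediate
    \ usage:  : blink3  3 times[ toggle-led ]times ;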
Someone who hasn't invested the effort to twist their mind to Forth-think will, yes, end up having trouble climbing the abstraction ladder.
This is not unlike, I think, "macro heavy" lisp, which many lispers actively avoid. But at the same time, much of lisp's power comes from it.
[1]: http://duskos.org/
https://www.amazon.com/Stack-Computers-Wave-Philip-Koopman/d...
but in 1989 these were not a new wave at all, but another CPU without a future, like the Lisp machines.
abrax3141 21 days ago [-]
If you want to be truly amazed, check out Newell’s IPL-V, which is a machine language for a stack machine, developed in the 1950s and used to implement the first AIs. It had every idea in Lisp except the parens.
mud_dauber 21 days ago [-]
I got a taste for this years ago at Harris Semi. Grokking their RTX microcontrollers was a ton of fun. (Not so much for customers, who'd never seen such a thing.)
GlenTheMachine 21 days ago [-]
I desperately wish I had been introduced to Forth when I was learning assembly circa 1984 on my Commodore. Would have changed my life.
SoftTalker 21 days ago [-]
I had Forth for my TI-99/4A, but at age 14 I lacked the background and context to understand it, or a mentor to get me started. I took a couple of stabs at it, but the book I was using was just over my head, when all I had for comparison was the programming I had done in BASIC. No internet in those days, of course.
osullivj 21 days ago [-]
I got hold of the FIG UK 8080 listing and ported it to Z80 to run on my Camputers Lynx.
hlehmann 21 days ago [-]
I had a job in 1982 or so that involved programming in Forth. It kinda made sense at the time to my young impressionable self, and one of the old timers thought it was ideal for what we were working on at the time. It all ran on a single thread; I don't even recall it having interrupts. I can't imagine using it for anything practical today.