The spiral rule works only if there is no pointer-to-pointer or array-of-array in the type. In other words, it is an incorrect rule. But take this for example:
int **VAR[1][2][3];
The type of VAR is a [1-element] array of [2-element] array of [3-element] array of pointer to pointer to int. I drew a spiral that passes through each specifier in the correct order. To make the spiral correct, it has to skip the pointer specifiers in the first three loops. This is marked by ¦.
The spiral rule can be modified to process all array specifiers before all pointer specifiers, but then you'd have to specify that the order to do so is right and then left. At that point it's just the Right-Left Rule.
rcxdude 18 days ago [-]
I've found that 'declaration follows use' is actually the easiest way to understand C type declarations. The only annoying thing to remember is which way the various type specifiers like const bind.
wruza 18 days ago [-]
In physics that usually means that we are missing something in the ¦'s. Like these hypothetical/imaginary "arr" keywords, look:
int ** arr arr arr VAR[1][2][3];
By introducing these virtual storage kind specifiers, the spiral model works! It becomes apparent that these specifiers are actually real, because the model wouldn't be consistent otherwise, and there's so much evidence for it all over the C codebases. We just tend to see the "real" part of it (sigh, stuck forever with that legacy terminology), while this is what actually happens:
long double _Complex ** arr arr arr VAR[1][-1][sqrtl(-1)];
which is naturally homomorphic to most C compilations.
(spoiler for those who need it: jk)
tpoacher 18 days ago [-]
My own interpretation (which effectively matches the right-left rule, at least algorithmically) is that, (), [], and * are operators, which have different semantics at the declaration vs the 'usage/expression' level, where:
() and [] are right-associative unary type-operators, with high fixity, (or equivalently, binary type-operators when () or [] have arguments)
and * is a binary type-operator, with low fixity (i.e. lower than () or []), where the 'left' argument is effectively the context that's left after removing the asterisk and the 'right' argument (rather than what's to the left of the asterisk in a purely 'anatomical' sense)
(whereas, at the expression level, * is a unary value-operator instead, while () and [] behave the same as in their type-form, except acting as value-operators instead)
pwdisswordfishz 18 days ago [-]
If you're going to adjust willy-nilly which loop of the spiral a given element appears on, then it's not much of a rule, is it? I mean, why not write this?
It's worth remembering that C "happened" to stick around because of UNIX, but it's been just another iteration of what started as BCPL; with Go coming from the same people who made Plan 9 (the spiritual successor to Research UNIX) and Alef/Limbo. These guys are equally interested in pushing both PL/OS research and practice.
(I also have no doubt that just like Go fixed C's type declarations, in another 20-30 years, Go's successor will finally fix error handling.)
unrealhoang 18 days ago [-]
hence why every recent programming language puts the type after the name (Go included).
gf000 18 days ago [-]
I think a proper type system that doesn't special-case arrays is better, e.g. Array<String>. Pointers may also be part of this uniform structure, e.g. Array<Ref<String>>, and there is zero question about how to parse it even if you are not familiar.
unrealhoang 18 days ago [-]
that's the beauty of type after name: it's all left to right, even in the special cases of arrays, references, and the most confusing contributor to cdecl, functions. `func(int)[5]*string`: perfectly clear, just read left to right.
gf000 18 days ago [-]
What about a Map<int, String>?
You can't really encode a tree structure with only linear semantics (without at least something like s-expressions).
unrealhoang 18 days ago [-]
And type declarations are s-exprs.
The advantage of type after name is that it keeps the traversing order consistently pre-order (node, left, right), instead of either:
- the notoriously ridiculous spiral cdecl: reading stuff right (output), node (func name) and left (input)
- create a new name to describe the function in pre-order: Func<Input, Output>
gf000 17 days ago [-]
But you can't know how many parameters a user-defined "node" has. I don't see how it can work for generic data structures without a specific syntax for "the children of this node". Go also uses [] for generics, doesn't it?
At that point I find a uniform system (without special-casing arrays and pointers - they are also just types) would be simpler.
atiedebee 18 days ago [-]
I like the way Dlang does it. It keeps the types on the left like C, but they are actually readable.
tpoacher 18 days ago [-]
even better, they should have it above or below!
or, declare the type separately, like in fortran or haskell (or in fact, pre-C99 c)
unrealhoang 18 days ago [-]
declaring the type separately didn't solve the hard-to-read cdecl problem.
tpoacher 17 days ago [-]
true; but at the same time, any sufficiently complex type is begging to be type-aliased into something more legible/sensible in the first place ... and this is also true for "right-typed" systems.
I've seen julia types unnecessarily fill a whole terminal for otherwise simple operations before, and I have to say I wasn't too impressed...
wruza 18 days ago [-]
The proper way is to use /usr/bin/cdecl or https://cdecl.org and extract as many typedefs as reasonable from the gibberish, because in C most of the times you’ll need these anyway to address lifetimes and ownership/borrowing points.
HexDecOctBin 18 days ago [-]
I just spent a week trying to write a parser for C declarations. Recursive Descent didn't work, Top Down Operator Precedence didn't work. I finally found a simple implementation of cdecl in K&R2, but it printed out a textual description instead of creating an AST. Converting it to create an AST was a whole different challenge. In the end, I got it working by adding the idea of a blank-type and doing some tree inversions (almost using each AST as a program that executes over the nested chunk of AST).
It was only about 200 lines of code, and yet never have I been happier to finish a solution and never having to think about it again.
mananaysiempre 18 days ago [-]
Rec-descent absolutely does work (source: wrote a working parser), but it's a bit annoying to make it build a syntax tree. Making it build a prefix encoding of the declaration is much easier.
If you have to, you can make a syntax tree work, too, but you'll have to thread through the declarator parser a double (triple?) pointer for the hole in the tree where the next part goes, and at the end plug it with the specifiers-and-qualifiers part you parsed before that. Or at least that's what I had working before I gave up and switched to a prefix code. It'd probably pay to check the LCC book (Fraser & Hanson, A Retargetable C Compiler: Design and Implementation) to see what they do there, because their type representation is in fact a hash-consed tree. (The LuaJIT FFI parser needs some sort of magic fix-up pass at the end that I didn't much like, but I don't remember the specifics.)
foldr 18 days ago [-]
I think the most straightforward recursive-descent approach is just to parse the expression part of the type as an expression and then turn it inside out to get the type. So if you have e.g. 'int (*(p[5]))(char)', then your parse tree is something like
declaration
  int
  function call
    *
      []
        p
        5
    char
and you need to turn that inside out to get something like
array
  pointer to
    function
      int
      (char)
This way you have a simple expression parser combined with a simple recursive tree transformation function. (The tree transformation took about 50 LOC when I implemented it for a toy parser.)
HexDecOctBin 18 days ago [-]
> Rec-descent absolutely does work (source: wrote a working parser), but it's a bit annoying to make it build a syntax tree.
Yeah, it required too much backtracking and state snap-shotting and resets, and I couldn't figure out a decent way of reporting good errors.
Thanks for the references. The code I have now is pretty elegant and functional, so I'm not in the mood of diving back into it. But if I ever need to change it, I'll take a look.
keyle 18 days ago [-]
Reading this makes me happy. People still spend time on fun & nerdy stuff like this!
dataflow 18 days ago [-]
How can recursive descent not work? Did you not do backtracking? It might be slow or annoying but it should work.
HexDecOctBin 18 days ago [-]
I mean, with enough hacking around, anything can work. But the code had gotten extremely complicated and brittle, and it was still not handling all the edge cases. Readability was important for this project, since I want to be able to modify the parser later to add new extensions to the language.
dataflow 18 days ago [-]
Ah gotcha, thanks! Out of curiosity do you recall examples of any of the hardest cases it couldn't properly handle?
HexDecOctBin 18 days ago [-]
Stuff like:
char *(*(**foo[][8])())[]
dataflow 18 days ago [-]
And it would fail to parse at all, or parse as the wrong thing?
HexDecOctBin 18 days ago [-]
Recursive descent required too much backtracking and state resets, which made the code buggy and unreadable (and the error reporting was awful). Pratt parser simply parsed wrong (presumably because I couldn't figure out the right way to sort the precedences).
dataflow 18 days ago [-]
Gotcha! I've been thinking of trying my hand at a C parser but it always gave me the sense that it was more annoying than it should be haha. Interesting to hear your experience, thanks for sharing!
not-my-account 18 days ago [-]
What are you working on?
HexDecOctBin 18 days ago [-]
A C parser for a meta-programming layer (automatically generating serialisation routines, etc.)
Galanwe 18 days ago [-]
Just my 2 cents:
The _external_ way of doing introspection (a parser like yours) is generally very limiting in real life, as you will have to interpret all the preprocessor directives that your code follows.
Not only will you have to feed your parser the exact same inputs as your build system, but also some directives are built into the compiler and may be hard to replicate.
The easiest way to do introspection is _intrusive_, even though it pollutes the code a bit.
A long time ago, Qt had a sort of hybrid intrusive parser like this (MOC?); that was 20 years ago though, things may have changed.
HexDecOctBin 18 days ago [-]
Regarding pre-processing, I feed the result of `clang -E` to the code-generator. Since I use a unity build (single translation unit), it works out fine. In fact, the parser treats pre-processor directives as comments (to work around #line and `-dD`, etc.)
Regarding external and intrusive, I used to do it in the intrusive way but found it too limiting. Here, I not only generate code but can also (potentially) add whole new extensions to the language. This was the reason I wrote a new parser instead of just using libclang's JSON AST dump. Well, that and the fact that libclang is a multi-MB dependency while my parser is ~3000 lines of C code.
userbinator 18 days ago [-]
I finally found an simple implementation of cdecl in K&R2
Perhaps you should've started learning C with K&R. ;-)
HexDecOctBin 18 days ago [-]
I did, 10 years ago. Didn't remember every piece of code.
fc417fc802 18 days ago [-]
It's unfortunate that we're stuck with syntax that so many people struggle to accurately decipher.
userbinator 18 days ago [-]
It's unfortunate that so many people never read K&R.
A very small yet readable book, written by the original authors of C, that very clearly covers the language and even presents a C program to parse the declarations.
xigoi 18 days ago [-]
If you need a program to help you read C programs, that suggests a serious flaw in C.
f1shy 18 days ago [-]
You do not need it. But it is a clear and handy way to present the logic/algorithm, if you are learning C anyway.
I would even say that if C allows for a description of the procedure that is clearer than plain text, it indicates how good and expressive C is…
gf000 18 days ago [-]
There are books on hieroglyphs as well, but there is a good reason we don't use them as types in programming languages.
masklinn 18 days ago [-]
You’re only “stuck” with it to the extent that you’re stuck with C(++). Free yourself from that and the rest follows.
immibis 18 days ago [-]
C++ is better, actually, because a template type has a straightforward syntax.
tsimionescu 18 days ago [-]
C++ template syntax exists in addition to all of the C type syntax. And it adds its own ugliness besides (such as knowing when you can say T vs having to say typename T).
It's not like there is an array<int> or ptr<int> syntax that can replace int[] or int*.
I never understood why the asterisk is with the variable name in C and not by the type. Apart from declaring multiple variables of pointer and non-pointer type at the same time is there any other reason for it?
dataflow 18 days ago [-]
It's because they chose the declaration syntax to match the usage syntax, rather than as a modification of the type. i.e.
long (*f)(int);
means, if you dereference f (i.e. write *f), you get long(int). If you then call it with an int, you get long.
It's pattern matching against the usage.
tpoacher 18 days ago [-]
It's not (unless you mean "by popular convention", which actually is not necessarily that popular).
int* c is perfectly valid syntax
and so is int * c.
The latter is my preferred convention: it helps me think of the asterisk as an operator, which, when acting at the "declaration" level, is binary in nature.
Whereas when the asterisk is used at the 'expression' level, it is unary in nature, and can be thought of effectively as a completely different operator that happens to share the same name/symbol as the declaration-based one.
tengwar2 18 days ago [-]
I'd advise not doing "int* c" as it is misleading. Consider
int* a, b;
This looks as though it is equivalent to
int* a;
int* b;
but actually means
int* a;
int b;
Most people trying it your way hit this issue fairly quickly and revert to "int *a". It's not just a meaningless convention; it's a convention which reflects the grammar of the language just as conventions on indentation reflect it.
"int * a" is not as bad, in that it is ambiguous rather than misleading, but I would avoid it for similar reasons.
tpoacher 17 days ago [-]
Fair enough. Though, to some extent, one could counter this by pointing out that multiple declarations via the comma type-operator are generally considered ambiguous / bad practice in themselves; in which case I would prefer explicit spacing to discourage such bad habits. Otherwise, ideally the asterisk should be grouped anyway (i.e. int (* a), b; ).
Besides, a similar argument could be made of simultaneous array declarations, or things like int *f(), (*g)(); in the latter example, if I were lazy enough to use such a simultaneous declaration, at least I would still prefer to use proper grouping and spacing: int (* f()), (* g)();
Or, rely on the knowledge that the comma type-operator in this context binds less strongly than the asterisk type-operator, but more strongly than the type primitive, which causes it to be distributed to all the operands of the comma operator. Which is probably better than relying on proximity-by-convention heuristics in the first place.
tengwar2 16 days ago [-]
The comma operator does exist, of course, but I've never heard of the comma in the context of C variable declarations being described as an operator, nor does a web search on "comma type operator" return any results. It doesn't behave like an operator (it does not return a result). You might argue that in some sense it "returns a type", but this is C: types are not objects in the language, and in any case the preceding declaration would also be held to "return a type". No, it's just a separator in C.
Yes, some people recommend one declaration per line. It's generally a defensive habit, needed if you use spacing which doesn't reflect the grammar of the language. It's the C programming equivalent of double-knotting shoe-laces to stop them coming undone rather than recognising that the bow has been tied with a granny knot rather than a reef knot.
I do concede that if I had multiple complex declarations like int *f(), (*g)(); I would separate them on to different lines, but there is no need for that with easily read declarations such as int *a, b;.
weinzierl 18 days ago [-]
Are there any tricks to make Rust type declarations easier?
unrealhoang 18 days ago [-]
yes, read it from left to right, don't use spiral.
weinzierl 18 days ago [-]
Sure, no spiral, but I still find the associativity of modifiers confusing. For example could you tell the difference between & mut &, & & mut and & mut & mut right off the top of your head?
unrealhoang 18 days ago [-]
you should just see &mut and & as 2 separate modifiers, maybe that'd help.
- & &mut doesn't make sense: a variable of type & &mut can't mutate the underlying object, so it's practically equivalent to a & &. Even worse, since the inner is &mut, you can't have 2 & &mut that point to the same &mut (mut xor shared).
- &mut &: a mutable reference that points to a shared reference, meaning you can change the outer to point to another shared reference, but you can't change the content of the inner. For example:
let a: &str = "hello";
let b: &str = "world";
let mut c = &a;
let d = &mut c;
*d = &b;
// (*d).make_ascii_lowercase(); // not allowed
- &mut &mut: similar to the above, but you can also change the content of the inner.
foldr 18 days ago [-]
After implementing (parts of) a recursive descent parser for C, I had a sudden enlightenment about how easy it is to understand C declarations once you get the basic principle. There's already a comment on this (https://news.ycombinator.com/item?id=42565459), but I'll try to go into a little more detail.
A lot of people start with the idea that the syntax of C declarations is '<type> <variable_name>'. This works fine for simple cases, but it's completely wrong. What a declaration like 'int x' actually means is the following:
declare a variable x of a type such that the expression x is of type int
In such a simple case this seems unnecessarily long-winded, but now let's look at a more complex case:
int (*p[4]) (int x, int y);
declare a variable p of a type such that (*p[i])(x,y) is of type int
If dereferencing the ith element of p and then calling the result with two arguments gives us an int, then p must be an array of pointers to functions that take two arguments and return int. If you saw the expression '(*p[i])(x,y)' in some code, you'd have no difficulty figuring out that p must be an array of function pointers. So you needn't have any difficulty when reading the declaration either.
One slightly confusing thing here is that the nesting of the expression syntax is the opposite of the nesting of the type. The expression nests as call(dereference(index(p))), whereas the type nests as array(pointer(function(int))). This makes sense once you understand that the expression is to be interpreted just as a normal expression. The first thing you do with an array of something is index into it. So indexing is going to be the most deeply nested part of the expression, even though the outermost layer of the type is 'array of ___'.
One additional source of confusion is the '*' operator and the need for additional parentheses in function pointer declarations. In C, function pointers dereference to themselves, so 'p()' and '(*p)()' are equivalent if p is a function pointer. However, in a type declaration you need something to distinguish a function pointer from a function, so the '*' has to be present. Why can't we just write 'int *p[4](int x, int y)' in the example above? Because of how the operator precedence rules work. That expression is equivalent to '*(p[4](x,y))', so it would declare an array of functions returning pointers to integers. (You can't declare an array of functions in C, so that's invalid.)
Ad-hoc rules for interpreting C declarations miss the genius of their underlying concept. You already know all the syntax you need to understand a C type declaration! It's just C expression syntax.
Aurelius108 17 days ago [-]
Never thought of it that way, that's really helpful. Thinking of the right hand side as an expression makes way more sense than spiral and other explanations I've seen.
immibis 18 days ago [-]
The actual rule is "declaration follows use".
int p[5] means the type of p[5] is int (but you still have to remember valid elements are 0-4).
void (*signal(void(*)(int)))(int) means the type of (*signal(something that is a void(*)(int)))(42) is void. And void (*p)(int) means the type of (*p)(42) is void.
If you can remember the precedence of these operators, you automatically remember the precedence of their "type operators" as well.
nialv7 18 days ago [-]
yeah, this is so simple, i don't understand why people keep inventing more complex and worse rules to understand cdecl...
pwdisswordfishz 18 days ago [-]
Eh, this misconception again.
userbinator 18 days ago [-]
Over 30 years and it still won't die.
Sadly the link to Linus Torvalds' explanation of why it doesn't work has died, but the Archive remembers: https://web.archive.org/web/20141218085356/https://plus.goog...
In that thread, replying to a suggestion to use typedefs, Kroah-Hartman states:
> Heh, no, typedefs don't help anything, there's a reason we don't use them in the Linux kernel whenever possible.
This surprises me, can someone explain that remark and elaborate on the reason it alludes to?
gregkh 18 days ago [-]
They hide structures very easily, allowing programmers to accidentally put them on the stack or use them as parameters in functions where they shouldn't be doing so. By forcing "struct" on the name, it makes it more obvious as to what you are doing.
pwdisswordfishz 18 days ago [-]
That's not a reason not to use typedefs at all.
casenmgreen 18 days ago [-]
Typedefs are useful only if you use functions for all operations on the data type.
Otherwise, if you actually use the variable directly, you are requiring the user to remember the underlying type of the typedef.
Larger program, hundred plus typedefs, unbearable burden on reader.
sylware 18 days ago [-]
People should stick to the specifications...
That said, that makes me think of perl5. I thought the perl5 coders were doing some kind of sick competition: who is going to write the most implicit code, where to read the code you would need a complete/perfect/permanent understanding of the full perl5 syntax parser to understand what some code is actually doing. I hate perl5 for that, but ultra-complex-syntax computer languages like C++ and similar (Java, Rust, etc.) are worse. In advanced OOP, you have no idea what's going on if you don't embrace the full OOP model of a complex program, not to mention it exponentially increases the syntax complexity, which is a liability for getting a sane spectrum of alternative "real-life" compilers, since those become insane to implement correctly.
If there is implicitness in a computer language, it must be very little and very simple, and it should be avoided as much as possible. Does that mean more verbose code? Well, most of the time yes, and this is usually much better in the long run. For instance in C, I try to avoid complex expressions (often I fail, because I am too used to some operators like '++' and '--'); many operators should not be around, not being pertinent enough (like a ? b : c), only increasing compiler complexity.
the_gipsy 18 days ago [-]
In example #2 it skips arbitrary "tokens": the first right parenthesis is visited, but its matching left parenthesis is skipped.
The Right-Left Rule is quoted less frequently on HN but it's a correct algorithm for deciphering C types: http://cseweb.ucsd.edu/~ricko/rt_lt.rule.html
IMO the Go syntax is a vast improvement as it's much simpler and avoids the clockwise/spiral issue: https://appliedgo.com/blog/go-declaration-syntax
int long long unsigned number_of_days; - read it right to left, an unsigned long long int
float *fraction; - read it right to left, "*" reads as "pointer to", so pointer to float
[1] http://unixwiz.net/techtips/reading-cdecl.html