I kind of empathize with the author raging at "just copy C++ bro" proposals because at $TWO_JOBS_AGO I had to deal with an "Architecture Team" full of Very Senior(tm) people who would show up uninvited and give advice like "did you know you can pee and poo at the same time?"
Of course, but if you bothered at all to understand the constraints, you would have seen it is not actually that simple in our case.
And my project was several orders of magnitude simpler than the C standard.
wiseowise 262 days ago [-]
> I kind of empathize with the author raging at "just copy C++ bro" proposals because at $TWO_JOBS_AGO I had to deal with an "Architecture Team" full of Very Senior(tm) people who would show up uninvited and give advice like "did you know you can pee and poo at the same time?"
My sides. This is the most hilarious and accurate summary of every org where I’ve worked.
Thank you.
ryandrake 262 days ago [-]
+1 It's amazing how we've all encountered these guys at least once in our careers (and often more, at many companies). A lot of times they are "founding engineers" from a decade ago who were employee number 2 and 3 or something, and once knew quite a bit about the codebase. They are too high-paid to code anymore, but they still fly in, dispense their "wisdom" all over the junior folks, then flap their wings and disappear for a few months. Seagull management[1] without the role/responsibility of people management.
> Very Senior(tm) people who would show up uninvited and give advice like "did you know you can pee and poo at the same time?"
That gave me a good 5 minutes of chuckling and smiling. Thank you.
autoexecbat 262 days ago [-]
Plenty of people struggle with the simultaneous action there, generally doing one after the other within the same sitting.
lpapez 258 days ago [-]
Absolutely.
Good luck explaining that to the A-Team.
keyle 262 days ago [-]
"Architects"! I always smile at the term. I'm as much a software "engineer" as they're "architects".
At some point we had to wear deodorant and a collared shirt, boom we became engineers.
supahfly_remix 262 days ago [-]
Developers will become engineers as soon as the legal liability for mistakes falls on them.
AtlasBarfed 262 days ago [-]
I have good news, AI will be the cover for all mistakes now, since it can be fingerpointed/scapegoated as the source of all code.
supahfly_remix 262 days ago [-]
I know you're joking, but I'm talking about legal responsibility like this: https://insideevs.com/news/575160/mercedes-accepts-legal-res.... In this case, Mercedes, not the developer, is assuming liability, but that responsibility will trickle down.
tut-urut-utut 261 days ago [-]
Why do you think it’s fair for responsibility to trickle down, while profit never goes to engineers?
_aavaa_ 262 days ago [-]
I doubt it; that decision wouldn't be consistent with other rulings with regards to engineering work. Depending on the jurisdiction, the engineer (an actual licensed engineer) stamps their seal of approval on the work and takes responsibility. Even if the software they used had a bug in it that caused it to produce wrong answers, they are still responsible.
TeMPOraL 262 days ago [-]
That is yet to be determined. There is some pressure in the other direction. I know of one large corporation, doing a lot of work around the world of the kind where programming errors could destroy property or kill people, that has strict policies wrt. AI-generated code. This includes an obligation to clearly mark code that was written by AI, as well as restrictions on when it's allowed. This is driven not just by potential IP issues, but also by security and export control.
(Yes, in a large enough corp, export control is a source of a surprisingly large amount of extra work...)
pjmlp 261 days ago [-]
Some of us are actually Professional Engineers, not a random title that someone decides to call themselves.
shrimp_emoji 262 days ago [-]
Collars?! Never.
phendrenad2 263 days ago [-]
I feel like this is an archetype. Show up out of nowhere, half-understand the problem, make a brain-dead suggestion, and then as soon as you point out the problems with that approach, they're suddenly too busy with other things to respond.
meindnoch 262 days ago [-]
Architect: you know, I just feel like there should be a way to solve this in a proper way
Engineer (thinking): (No, you idiot, there isn't, because it's broken! I told you, all options have been tried, and this was the least painful way of doing it. Yes, it's not the ideal solution, but there's no other way, unless the upstream vendor decides to fix the issue on their end!)
Engineer: thanks, I'll look into it :)
xmodem 262 days ago [-]
I had one where the architecture team implemented the brain-dead solution, advised leadership of other teams that they should adopt the brain-dead solution, and that my team would be supporting the brain-dead solution, without consulting us, and made me into the bad guy when I pointed out that my team did not support the brain-dead solution.
We ended up supporting the brain-dead solution, but that team has since experienced 100% turnover.
rowanG077 262 days ago [-]
How dare they try to help the more junior engineers. What horrible people.
In all seriousness, just accept their advice and see it for what it is: someone trying to help you with a limited view of the scope. As long as they don't impose their view, I think your take is extremely bad.
chipdart 262 days ago [-]
> (...) I had to deal with an "Architecture Team" full of Very Senior(tm) people who would show up uninvited and give advice like "did you know you can pee and poo at the same time?"
It reads like you had experts giving you advice on how to improve things, and instead not only did you ignore their advice but you went to the extent of mindlessly disparaging their help.
lpapez 262 days ago [-]
Nah, the other commenter described the situation exactly right - after dropping the comments, the "A-Team" disappeared for a few months and never revisited our responses. It really feels like an archetype common at many companies.
They were doing it just to boost their egos, and most of the teams in the company learned to ignore them. When the company ownership changed, the "A-Team" was the first on the chopping block, because the new owners correctly saw that their high status was simply inertia from being the first devs at the company, and that they were not fulfilling any meaningful role in the present.
jandrewrogers 262 days ago [-]
They accurately describe a particular type of person/role that exists at many large enterprises. These "architects" notionally have a lot of authority, appointed by other not very technical people, but are so divorced from the realities of the engineering execution that anything they tell you is mostly useless. In my experience it tends to be a refuge for people that aren't very strong technically but who enjoy making slide decks for management.
geraldwhen 262 days ago [-]
I’ve yet to meet an architect I would hire to build literally anything.
I’ve met dozens that don’t know their head from their ass. And always, always when you describe the problem constraints, they mumble and disappear.
eru 262 days ago [-]
My first boss had the title 'architect', but he was actually very competent, and very regularly got his hands dirty coding.
(But at the time, I basically joined what was still essentially a startup just after they had been acquired by a larger company. I think the titles like 'architect' might have come from the larger company, but the competence came from them still being the same people as at the startup.)
eropple 262 days ago [-]
I'm currently at a very large company, and architects are, in many lines of business, the only technical folks directly employed by the company. Which means a product's quality hinges pretty directly on whether your architect is somebody technical who can help solve problems both at the implementation level and before they get to the implementation level (which I certainly try to be, when not triple-booked on meetings trying to keep everything else on the rails) or the Dilbert version.
We do exist, I promise. ;) But in my case at least, the Eye of Sauron can only keep so many things in sight at a time...
VBprogrammer 262 days ago [-]
I suspect if you don't recognise this scenario you may be standing too close to the mirror.
wiseowise 262 days ago [-]
Spotted one of the architecture team.
slaymaker1907 263 days ago [-]
I'd argue it barely works in C++ as well. I've seen so many poorly implemented classes that violate the very complicated 3/5/0 rule. It's much easier to do RAII correctly in Rust since people aren't constantly working with raw pointers and since objects which are moved somewhere else are not dropped like they are in C++.
One variant that I think might work even better than RAII or defer in a lot of languages is having a thread-local "context" which you attach all cleanup actions to. It even works in C; you just define cleanup as a list of

    typedef void (*cleanup_function)(void* context);

entries which are saved into a thread-local list. Unlike RAII, you don't need to create a custom type for every cleanup action, and unlike the call-with pattern from functional programming, the lifetimes of these lists can be non-hierarchical.
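A minimal sketch of what I mean (names and the fixed capacity are mine; error handling kept simple):

    #include <stddef.h>

    typedef void (*cleanup_function)(void* context);

    typedef struct {
        cleanup_function fn;
        void* context;
    } cleanup_entry;

    enum { MAX_CLEANUPS = 64 };

    /* The thread-local "context": a LIFO list of pending cleanup actions. */
    /* ('thread_local' is spelled _Thread_local in C11.)                   */
    static thread_local cleanup_entry cleanups[MAX_CLEANUPS];
    static thread_local int cleanup_count = 0;

    /* Register an action; returns 0 on success, -1 if the list is full. */
    int cleanup_push(cleanup_function fn, void* context) {
        if (cleanup_count >= MAX_CLEANUPS) return -1;
        cleanups[cleanup_count].fn = fn;
        cleanups[cleanup_count].context = context;
        cleanup_count++;
        return 0;
    }

    /* Run every registered action, most recent first, then reset the list. */
    void cleanup_run_all(void) {
        while (cleanup_count > 0) {
            cleanup_count--;
            cleanups[cleanup_count].fn(cleanups[cleanup_count].context);
        }
    }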
However, I'm still glad to see defer being considered for C. It's a lot better than using goto for cleanup.
vlovich123 263 days ago [-]
> that violate the very complicated 3/5/0 rule
Is it actually complicated? There’s only the rule of 0 - either your class isn’t managing resources directly & has none of the 5 default methods defined explicitly (destructor, copy constructor/assignment, move constructor/assignment), or it manages one and exactly one resource and defines all 5. Following that simple rule gives you exception safety & perfect RAII behavior. Of all the things in C++, it seemed like the most straightforward rule to follow mechanically.
BTW, the rule of 3 is from pre-C++11 - the addition of move construction/move assignment makes it the rule of 5, which basically says that if you define any of those default ones you must define all of them. But the rule of 0 is far stronger in that it gives you prescriptive mechanical rules to follow for resource management.
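A minimal sketch of both rules side by side (my example; error handling omitted): a type that owns exactly one resource defines all 5, and anything composed out of such types defines none.

    #include <cstddef>
    #include <cstdlib>
    #include <cstring>
    #include <utility>

    // Rule of 5: this class manages exactly one resource (a malloc'd block),
    // so it defines all five special member functions.
    class buffer {
        void* data_ = nullptr;
        std::size_t size_ = 0;
    public:
        buffer() = default;
        explicit buffer(std::size_t n) : data_(std::malloc(n)), size_(n) {}
        ~buffer() { std::free(data_); }
        buffer(const buffer& o) : data_(std::malloc(o.size_)), size_(o.size_) {
            std::memcpy(data_, o.data_, size_);
        }
        buffer& operator=(const buffer& o) {
            buffer tmp(o);  // copy-and-swap gives the strong exception guarantee
            swap(tmp);
            return *this;
        }
        buffer(buffer&& o) noexcept
            : data_(std::exchange(o.data_, nullptr)),
              size_(std::exchange(o.size_, 0)) {}
        buffer& operator=(buffer&& o) noexcept { swap(o); return *this; }
        void swap(buffer& o) noexcept {
            std::swap(data_, o.data_);
            std::swap(size_, o.size_);
        }
    };

    // Rule of 0: this type owns resources only through members that already
    // manage themselves, so it declares none of the five.
    struct record {
        buffer payload;
        int id = 0;
    };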
It’s much easier to do RAII correctly in Rust because of the ecosystem of the language + certain language features that make it more ergonomic (e.g. Borrow/AsRef/Deref) + some ownership guarantees around moves unless you make the type trivially copyable which won’t be the case when you own a resource.
chipdart 262 days ago [-]
> Is it actually complicated?
It is. There is no point in arguing otherwise.
To understand the problem, you need to understand why it is also a solution to much bigger problems.
C++ started as C with classes, and by design aimed at being perfectly compatible with C. But you want to improve developer experience, and bring to the table major architectural traits such as RAII. This in turn meant you add support for custom constructors, and customize how your instances are copied and destroyed. But you also want to be able to have everything just work out of the box without forcing developers to write boilerplate code. So you come up with the concept of special member functions which are automatically added by the compiler if they are trivial. However, forcing that upon every single situation can cause problems, so you have to come up with a strategy that suits all use cases and prevents serious bugs.
Consequently, you add a bunch of rules which boil down to: a) if the class/struct is trivial then the compiler simply adds trivial definitions of all special member functions so that you don't have to, but b) once you define any of those special member functions yourself, the compiler steps back and lets you do all the work.
Then C++ introduced move semantics. This refreshes the same problem as before. You need to retain compatibility with C, and you need to avoid boilerplate code, and on top of that you need to support all cases that originated the need for C++'s special member functions. But now you need to support move constructors and move assignment operators. Again, it's fine if the compiler adds those automatically if it's a trivial class/struct, but if the class has custom constructors and destructors then surely you also need to handle moves in a special way, so the compiler steps back and lets you do all the work. On top of that, you add the fact that if you need custom code to copy your objects around, surely you need custom code to move them too, and thus the compiler steps back to let you do all the work.
On top of this, there are also some specific combinations of custom constructors/destructors/copy constructors/copy assignment operators which let the compiler define move constructors/move assignment operators.
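For example (a tiny sketch, names made up): a user-declared destructor alone is enough to suppress the implicit move operations, and the copies silently take over.

    #include <utility>

    struct plain { int x; };  // nothing declared: copies AND moves are implicit

    struct with_dtor {        // user-declared destructor: the implicit move ctor
        ~with_dtor() {}       // and move assignment are no longer declared;
    };                        // copies are still generated (a deprecated fallback)

    int main() {
        plain p{42};
        (void)p;
        with_dtor a;
        with_dtor b = std::move(a);  // compiles, but calls the COPY constructor
        (void)b;
        return 0;
    }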
It all makes absolute sense if you are mindful of the design requirements. But if you just start to onboard onto C++ and barely know what a copy constructor is, all these aspects are arcane and sadistic. If you declare nothing then your class instances are copied and moved automatically, but once you add a constructor everything suddenly blows up and your code doesn't even compile anymore. You spot a bug where an instance of a child class isn't being destroyed properly, and once you add a virtual destructor you suddenly have an unrelated function call throwing compiler errors. You add a snazzy copy constructor that's very performant, and your performance tests suddenly blow up because of the performance hit of suddenly having to copy all instances instead of the compiler simply moving them. How do you sort out this nonsense?
The rule of 5 is a nice rule of thumb that gives developers a simple mental model of what they need to do to avoid a long list of issues, but you still have no control over what you're doing. Things work, but work by sheer coincidence.
rcxdude 262 days ago [-]
The need to define all 5 has basically nothing to do with C++'s heritage. If you allow those operations to be defined, they all must be defined when you define one of them.
There is a neater design in rust with its own tradeoffs: destructors are the only special function, move is always possible and has a fixed approach, copying is instead .clone(), assignment is always just a move, and constructors are just a convention with static methods, optionally with a Default trait. But that does constrain you: especially move being fixed to a specific definition means there's a lot you can't model well (self-referential structures), and that's a core part of why rust can have a neater model. And it still has the distinction you are complaining about with Copy, where 'trivial' structures can be copied implicitly but lose that as soon as they contain anything with a destructor or non-trivial .clone().
And in C++ it's pretty easy to avoid this mess in most cases: I rarely ever fully define all 5. If I have a custom constructor and destructor I just delete the other cases and use a wrapper class which handles those semantics for me.
chipdart 262 days ago [-]
> The need to define all 5 has basically nothing to do with C++'s heritage. If you allow those operations to be defined, they all must be defined when you define one of them.
I'm sorry, that is not true at all.
Nothing forces you to add implementations, at least not for all cases. That's only a simplistic rule of thumb that helps developers not well versed in the rules of special member functions (i.e., most) to get stuff to work by coincidence. You only need to add, say, a custom move constructor when you need it and when the C++ rules state the compiler should not generate one for you. There's even a popular table from a presentation at ACCU 2014 stating exactly under which conditions you need to fill in your custom definition.
You are also wrong when you assert this has nothing to do with C++'s heritage. It's the root cause of each and every single little detail. Special member functions were added with traits and tradeoffs for compatibility and ease of use, and with move semantics the committee had to revisit everything over again but with an additional layer of requirements. The rules involving default move constructors and move assignment operators are famously nuanced and even arbitrary. There is no way around it.
> There is a neater design in rust (...)
What Rust does and does not do is irrelevant. Rust was a greenfield project that had no requirement to respect any sort of backward compatibility and stability. If there is any remotely relevant comparison it would be Objective-C, which also took a minimalist approach based on custom factory methods and initializers that rely on conventions, and it is a big boilerplate mess.
cozzyd 262 days ago [-]
It would be more user-friendly if non-defined members of the 5 were automatically deleted, IMO.
vlovich123 262 days ago [-]
> It is. There is no point in arguing otherwise.
Well, I don’t know how to respond to this. I clarified what the rules actually are (< 1 paragraph) and following them blindly leads to correct results. You’ve brought in a whole bunch of nonsense about why C++ has become complex as a language - it’s not wrong but I’m failing to connect the dots as to how the rule of 0 itself is hard to follow or complex. I’m kind of taking as a given that whoever is writing the code is mildly familiar enough with C++ to understand RAII & is trying to apply it correctly.
> The rule of 5 is a nice rule of thumb to allow developers to have a simple mental model over what they need to do to avoid a long list of issues, but you still have no control over what you’re doing. Things work, but work by sheer coincidence.
First, as I’ve said multiple times, it’s the rule of 0. That’s the rule to follow to get correct composition of resource ownership & it’s super simple. As for not having control, I really fail to see how that is - C++ famously gives you too much control and that’s the problem. As for things working by sheer coincidence, that’s like your opinion. To me “coincidence” wouldn’t explain how many lines of C++ code are running in production.
Look, I think C++ has a lot of warts which is why I prefer Rust these days. But the rule of 0 is not where I’d say C++’s complexity lies - if you think that is the case, I’d recommend you use another language because if you can’t grok the rule of 0, the other footguns that lie in wait will blow you away to smithereens.
rcxdude 262 days ago [-]
In addition, it's actually pretty easy in most cases where you do want a non-trivial constructor and destructor to just delete the other 3, and wrap it in unique_ptr or similar to manage the hard parts. I think I've defined all 5 approximately once, and mostly for the fun of it in a side project.
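Something like this sketch (a FILE* picked just as an example resource):

    #include <cstdio>
    #include <memory>

    // The destructor logic is the only custom part; unique_ptr supplies
    // correct (move-only) copy/move/destruction semantics for free.
    struct file_closer {
        void operator()(std::FILE* f) const {
            if (f) std::fclose(f);
        }
    };
    using file_handle = std::unique_ptr<std::FILE, file_closer>;

    file_handle open_file(const char* path) {
        return file_handle(std::fopen(path, "rb"));
    }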
bruce343434 262 days ago [-]
> nonsense ... not wrong
So it's not nonsense?
I think GP clearly laid out the base principles that lead to emergent complexity. GP calls this "coincidence" to convey the feeling of lots of complexity just narrowly avoiding catastrophe, in a process that is hard to grok for someone getting into C++. GP also gave some scenarios in which the rule of 0 no longer applies and you now simply have to follow some other rule. "Just follow the rule" is not very intuitive advice. The rule may be simple to follow, but the foundations on which it rests are pretty complicated, which makes the entire rule complicated in my worldview and also that of GP. In your view, the rule is easy to follow and therefore simple. Let's agree to disagree on that. Again, being told "you need to just follow this arbitrary rule to fix all these sudden compiler errors" doesn't inspire confidence in one's code, hence (I think) the usage of "coincidence". If I were using such a language, I'd certainly feel a bit nervous and unsure.
astrobe_ 262 days ago [-]
> GP calls this "coincidence" to convey the feeling of lots of complexity just narrowly avoiding catastrophe in a process that is hard to grok for someone getting into C++
I think that's what they said themselves:
>> It all makes absolute sense if you are mindful of the design requirements. But if you just start to onboard onto C++ and barely know what a copy constructor is, all these aspects are arcane and sadistic
IMO not knowing why something works (in any language) is an unpleasant feeling. Then if you have the chance you can look under the hood, read things - it's exactly why I'm reading this thread - and little by little get a better understanding. That's called gaining experience.
> Again, being told "you need to just follow this arbitrary rule to fix all these sudden compiler errors" doesn't inspire confidence in ones code, hence (I think) the usage of "coincidence"
That's exactly what other languages like Haskell or Rust are praised for. Why does C++ receive a different treatment when it tries to do the same thing instead of crashing on you at runtime, for once?
marcosdumay 262 days ago [-]
> That's exactly what other languages like Haskell or Rust are praised for.
Making a trivial change and suddenly having entire new classes of bugs all over your code is not an aspect that receives any praise. People using those two languages work hard on avoiding that situation, and it clearly feels like a failure when it happens.
The part about pointing problems at compile time so the developer will know it sooner is great. And I imagine is the part you are talking about. But the GP was talking about the other part of the issue.
tuyiown 262 days ago [-]
> Things work, but work by sheer coincidence
I wouldn't be so dramatic. Houses of cards don't stay put by coincidence!
d0mine 263 days ago [-]
Arena can be used to allocate many times but deallocate exactly once. In Zig:
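    // (The snippet was lost in formatting; this is a reconstruction of the
    // usual std.heap.ArenaAllocator pattern, inside a function that can fail.)
    const std = @import("std");

    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit(); // one deallocation for everything below
    const allocator = arena.allocator();

    const a = try allocator.alloc(u8, 100);
    const b = try allocator.alloc(u8, 200);
    _ = a;
    _ = b;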
I’m a fan of Zig, but I just want to point out that creating dedicated allocators for managing specific regions/chunks of memory or memory within specific application scopes (i.e., arenas) is just another memory allocation strategy rather than the ultimate solution to memory management issues. It comes with its own trade-offs and depends entirely on your use case. Also, it’s great that Zig has this battery included in its standard library, but arenas aren’t unique to Zig nor are they difficult to implement in any language that allows manual memory management. I’m just pointing this out because I keep seeing folks highlight this as a key feature of Zig over C.
OskarS 262 days ago [-]
You can do it in C for sure, but "culturally" in C, there's a stateless global allocator called "malloc", which is not the case in Zig. For instance, if you have a library libsomething in C, it will at most (probably) have something like this:
#ifndef LIB_MALLOC
#define LIB_MALLOC malloc
#endif
if it allows you to customize allocation strategy at all, which is not a given.
But this only allows you at compile time to provide your own stateless global allocator. This is very different in Zig, which has a very strong culture of "if something needs to allocate memory, you pass it a stateful, dynamically dispatched allocator as an argument". You COULD do that in C, but virtually nobody does.
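Roughly what the Zig convention would look like transplanted to C (a sketch; all names invented):

    #include <stdlib.h>

    typedef struct allocator allocator;
    struct allocator {
        void* (*alloc)(allocator* self, size_t size);
        void  (*release)(allocator* self, void* ptr);
        void* state; /* arena, pool, stats... whatever the implementation needs */
    };

    /* A trivial malloc-backed implementation of the interface. */
    static void* heap_alloc(allocator* self, size_t size) { (void)self; return malloc(size); }
    static void  heap_release(allocator* self, void* ptr) { (void)self; free(ptr); }
    static allocator heap_allocator = { heap_alloc, heap_release, NULL };

    /* Library code then takes the allocator explicitly instead of calling
       malloc, e.g.: hashmap* hashmap_create(allocator* a, size_t capacity); */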
samatman 262 days ago [-]
It's 100% a key feature of Zig. Culturally, if it allocates, then it takes an allocator as an argument. C simply doesn't work that way. You could write C that way, but people don't.
I've written reasonable amounts of both, and it's just different. For instance, in Zig, you can create a HashMap using a FixedBufferAllocator, which is a region of memory (which can be stack allocated) dressed up as an allocator. You can also pass it an arena and free all at once, or any other allocator in the standard library, or implemented by you, or anyone else. Show me a C library with a HashMap which can do all three of these things. Everything which allocates takes an allocator, third-party libraries respect this convention or will quickly get an issue or PR either requesting or implementing this convention.
Ultimate solution? No, but also, sort of. The ability to idiomatically build a fine-grained memory policy is a large portion of what makes Zig so pleasant to use.
anymouse123456 262 days ago [-]
This. I've been loving Zig for some years now, but still write a lot of embedded C at work.
I've started to use simple memory arenas in C and it just feels so damn _nice_.
There's basically a shared lifetime for most of my transient allocations, which are nicely bounded in time by a "frame" of execution. Malloc/Free felt like a crazy amount of work, whereas an arena_reset(&ctx) just moves a pointer back to the first entry.
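For reference, the whole mechanism is tiny (a sketch; alignment and overflow checks omitted):

    #include <stddef.h>

    typedef struct {
        unsigned char* base; /* backing buffer, e.g. a static array */
        size_t capacity;
        size_t used;
    } arena;

    static void* arena_alloc(arena* a, size_t size) {
        if (size > a->capacity - a->used) return NULL; /* out of space */
        void* p = a->base + a->used;
        a->used += size;
        return p;
    }

    /* "Freeing" every transient allocation is one assignment. */
    static void arena_reset(arena* a) { a->used = 0; }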
Another person pointed out that arenas are not destructors, and this is a great point to make. If you're dealing with external resources, moving an arena index back to the beginning does not help - at all.
sixthDot 262 days ago [-]
Allocation is not construction, and deallocation is not destruction. The two steps are often executed sequentially, but if you think they are the same you'll end up with leaks, e.g. at the level of the operating system (e.g. GDI handles). What I mean is that arena allocators are not as simple as you claim; it depends on what they allocate. The more common reason arena allocators are praised is cache locality.
bluGill 263 days ago [-]
I've never seen someone get the rule of 5 wrong, but the rule of 3 was a reaction to 10 years of hindsight revealing that the default is wrong. Congratulations to Rust for looking to see what was done wrong by their predecessors. You can't really fault someone for making a mistake when nobody at the time knew it was a mistake.
pavlov 263 days ago [-]
> 'a thread local "context" which you attach all cleanup actions to'
Like the autorelease pool found in Objective-C of yore? I always liked that solution and sometimes implemented it in plain C too.
Incidentally I use it as 4/6/0 by including the default ctor in the set.
uecker 262 days ago [-]
I was initially interested in defer in C (I am co-author of some earlier proposal), but after actually studying its impact on code examples I was entirely unimpressed by the actual improvement compared to goto-style cleanup. A lot of people seem to like it though, and JeanHeyd's version seems quite good, but I'm personally not terribly convinced by this feature anymore.
chipdart 262 days ago [-]
> I'd argue it barely works in C++ as well. I've seen so many poorly implemented classes that violate the very complicated 3/5/0 rule.
I'm afraid you're complaining about entirely unrelated things.
It's one thing to claim that C++ structs have this or that trait. It's an entirely different thing to try to pin bugs and developer mistakes on how a language is designed.
Gibbon1 263 days ago [-]
My small-brained comment is that people use heap allocation when they should be using arena allocation. And heap allocation shouldn't return a pointer; it should return a handle.
adrianN 263 days ago [-]
Yeah, the example of "what if you make a copy" breaks in C++ in exactly the same way if you're not careful.
gary_0 263 days ago [-]
In other words, C structs and C++ structs are not the same thing (although C++ can usually handle C structs too). C structs are Plain Old Data. C++ structs are "objects" and there are pages and pages of rules in the C++ Standard about what that means, and there's no way the C Standard can incorporate all that. And you can't drag any C++ struct/class features into C without dragging in all those rules with them.
chipdart 263 days ago [-]
> In other words, C structs and C++ structs are not the same thing (although C++ can usually handle C structs too). C structs are Plain Old Data. C++ structs are "objects" and there are pages and pages of rules in the C++ Standard about what that means, and there's no way the C Standard can incorporate all that.
I think this glosses over what structs actually are in C++, and unwittingly portrays them as something different.
Structs in C++ are definitely exactly like structs in C. Or they can be, if that's what you're aiming for. If you include a C header file that defines a struct in a C++ program, build it, and use instances of that struct to pass to C code, everything just works.
The detail you need to be mindful of is that C structs support a subset of all the features supported by C++ classes, and once you start to use those features C++ also allows implementations to forego some constraints.
If you expect to use a struct in C++ but define it in a way that includes features not supported in C, then you can't pin that on the language.
Using C-like structs is a very common use case, to the point that the standard explicitly defines the concept of standard layout and builds upon that to specify the concept of a standard layout type. A struct/class that is a standard layout type, which means it's a POD type, corresponds exactly with C structs. They are explicitly defined in terms of retaining interoperability with other languages.
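For instance, both properties can be checked at compile time (C++17 spelling), which is a cheap way to guard structs that cross the C boundary:

    #include <type_traits>

    struct c_compatible {  // shaped like a C struct on purpose
        int id;
        double value;
        char name[16];
    };

    // "POD" in the old sense meant trivial AND standard-layout:
    static_assert(std::is_standard_layout_v<c_compatible>);
    static_assert(std::is_trivial_v<c_compatible>);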
rramadass 262 days ago [-]
Exactly. This is one of the basic things (i.e. PODs) you learn in C++, so I am surprised the GP didn't know of it. I myself have written systems with C structs/C APIs and wrapped the same C structs (by deriving) in C++ classes (being careful with any introduced vptr/vtable) to happily provide/extend C code/libraries via C++ APIs.
gary_0 262 days ago [-]
Of course I know about the POD idiom. You and chipdart are misunderstanding me because there's a tension between how the C++ Standard defines things and how C++ gets used in real life. Because strictly speaking, the Standard doesn't really have a concept of POD[0]. What it does have is "trivial" classes and the concept of object lifetime. For instance, if your class/struct isn't trivially_copyable and you memcpy it like a C struct, you're in Undefined Behavior country. If your class/struct is such that you must observe C++'s lifetime rules, but you are writing its fields by casting a char pointer to some bytes, that's UB.
But yes, if you make extra sure (under threat of footgun) that your struct only has simple types in it and doesn't use virtual or define any ctors/dtors or use protected/private or use inheritance and all of its members follow those rules etc etc, maybe you can treat it like a C struct. But the C++ Standard is telling a different story.
Keep in mind, I'm not blaming you for ignoring all these complications if at the end of the day the compiler seems to give you the behavior you expect. But the fun of C++ is that it's kind of two programming languages in one: the language the Standard defines, and the language the typical programmer thinks it is.
[0] There was std::is_pod, but it was deprecated because it doesn't reflect how the Standard actually defines things. A bit of a cruel joke, dangling that in front of us and then yanking it away.
rramadass 262 days ago [-]
POD is not an idiom. It was an actual specification (in a sense) for interop between C++ and C. Only in the later standards (maybe starting at C++14?) did the committee refine it further as "POD = Trivial + Standard_Layout" but that is just a redefinition without any fundamental change in semantics. So you can happily write C++ code with just your understanding of POD from C++98 in practice and everything will work fine.
gary_0 262 days ago [-]
They started changing the definitions in C++11 to support move semantics. I don't remember much about C++98; that was decades ago. If that's what the Standard said back then I'll take your word for it, but I wasn't talking about historical C++ Standards.
Keep in mind, my original comment was pretty much just drawing a line through TFA, which also argues that you can't cleanly map C++ object concepts onto C structs. C++ has some backwards compatibility with C obviously but nowadays it's a totally separate language with an independent standards body (for better or worse). Specifying "do what C does" might have flown in 1998 but that changed a long time ago.
rramadass 262 days ago [-]
I am generally not a fan of the standards committee nor what it is trying to do with the language. The word "Object" used in C++ land has a different meaning than the same word used in C land since there is an "Object Model" in C++ while there is none in C. Hence trying to map C++ object concepts onto C does not even make sense in the general case. But because of C++'s evolution having started as "C with classes" there is some mapping at the set-of-bits level which is where the POD (with all its limitations) comes in.
I am fully with Stroustrup in arguing that C++ should strive for as much compatibility with C as possible in the spirit of the original (see ref. at https://en.wikipedia.org/wiki/Compatibility_of_C_and_C%2B%2B...). But sadly the rest of standards committee don't seem to want this which i believe is a huge mistake. On the other side, the C standards committee should be very careful what inspiration they take from C++ in the evolution of the language since it was designed as a "minimal" language which was one of the main factors in its success. Whether people call it "primitive", "well behind other languages" etc. does not matter. You definitely don't want C turning into C++-lite. Hence IMO the conclusions stated in the last few paragraphs of the submitted article are quite right.
gary_0 262 days ago [-]
In my experience the number of C++ developers with nice things to say about the Committee is... very small.
In a way, the whole C++ endeavor was doomed from the start. C was old and pragmatic and vague, a "portable assembly", and it was a shaky foundation to build C++ on top of. When the Standard tried to tighten things up, it just got more lopsided, full of hacks to fix hacks. But the alternate universe where C++ had a more pragmatic, laissez-faire design going forward probably isn't any better; maybe the "standard" would have become "do whatever GCC does"--or in the Darkest Timeline, "do whatever MSVC does".
I disagree that C++ "respecting its C roots" is viable. The C++11 and later Standards were trying to make the best of a bad situation, and that required leaving C behind because the C way of doing things doesn't fit with a higher-level language like contemporary C++. Especially when the language has multiple implementations that need to compile the same code the same way. The "C with classes" days are long over for most of us who have to use libraries expecting std::vector, smart pointers, and exception handling. We live in mortal fear of compiler writers smiting us for innocent things like punning through a union.
> You definitely don't want C turning into C++-lite
I agree. Trying to quickly hack classes or templates or whatever back on top of C would just start the whole C++ nightmare over again.
rramadass 262 days ago [-]
> In a way, the whole C++ endeavor was doomed from the start ... I disagree that C++ "respecting its C roots" is viable.
Hey! Them's fighting words! :-) "C++ as a better C" (which is what it started as) was/is/always will be needed and necessary. It gave you the best of both the low-level and high-level worlds, with full control and just enough complexity. Instead of implementing structs full of function pointers to design dynamic dispatch object models, you just had the compiler do that for you while still retaining full control over other aspects. I still have some manuals that came with SCO Unix, one of which was on the then-newfangled C++ language. It had one chapter by Stroustrup himself (his original paper, probably) on the C++ object model, showing how vptrs/vtables are implemented and thinking it neat that the compiler did it for you. Also, templates were just glorified macros then, with none of the shenanigans that you see today. Hence moving from C to C++ was easy, and its usage and popularity exploded. But with the infusion of lots of people into C++ land, people who were not aware of the original vision/design/compatibility goal of the language started asking for the inclusion of more and more OO and modern language features. The result? The standards committee reinventing the language from C++11 onwards (and changing it every freaking 3 years) and alienating the old C++ folks who made it popular in the first place. No doubt there are some benefits, like increased design space and modern programming techniques, but I am not sure whether the increased complexity makes it all worth it. For me the sweet spot is still C++98 with the addition of the STL and some simple generic programming techniques.
celrod 262 days ago [-]
> We live in mortal fear of compiler writers smiting us for innocent things like punning through a union.
C++20 introduced `std::bit_cast`, so I appreciate alias analysis getting all the help it can.
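A quick sketch of the intended replacement for union punning (assuming the usual 32-bit float):

    #include <bit>
    #include <cstdint>

    int main() {
        float f = 1.0f;
        // Well-defined bit-level reinterpretation (C++20); no union-punning UB:
        auto bits = std::bit_cast<std::uint32_t>(f);
        return bits == 0x3f800000u ? 0 : 1;
    }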
rkangel 262 days ago [-]
> Using C-like structs is a very common use case
Not true. Using C structs themselves in C++ is very common - when you include the C header file, the relevant declarations are wrapped in "extern "C" {}" which gives structs C semantics. You can do this because C++ is backwards compatible with C.
Most of the time when you use a struct in C++ you're just ignoring most of the capabilities of objects (which is fine!). If you declare a struct in C++, you're getting an object. The only difference between the struct and class keywords in C++ is the default privacy of the members.
gpderetta 262 days ago [-]
A C++ structure has exactly the same semantics as the equivalent C structure, where such an equivalent exists. Extern "C" only affects the linkage of functions. It has no effect on structure definitions.
OskarS 262 days ago [-]
It absolutely does not. A C++ structure is a much richer object: it can have custom copying, moving and assignment behaviours, it can have vtables, it has RAII, etc. That's the whole point about the "object model" in the article: C++ has it, C does not.
What I think you're trying to say is "a POD structure with no custom behavior is essentially identical in C and C++". That is mostly true, though if the struct contains a union, C++ has stricter UB rules (there might be other differences as well, but that's the one I can think of at the moment).
gpderetta 262 days ago [-]
What I'm saying is that extern "C" has no effect on structure compatibility.
schmidt_fifty 262 days ago [-]
The detail missing from this explanation is that structs and classes are the same thing with different default visibility. I found this enormously confusing when learning the language, and I think it was a major mistake. My assumption was that a struct was exactly the same as a c struct, and the "new" functionality was all a part of the classes.
Still. There's always extern "C".
stonemetal12 263 days ago [-]
Yep. The only difference between struct and class in C++ is that class defaults to private while struct defaults to public.
Using structs like they are C structs vs. using classes as objects is 100% cultural, not a part of the language.
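That is, these two definitions are interchangeable:

    struct S1 {         int x; };  // struct: members public by default
    class  S2 { public: int x; };  // class: private by default, so it's spelled out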
chipdart 262 days ago [-]
> 100% of the using structs like they are C structs vs using class as objects is cultural not a part of the language.
I think this take is completely wrong. There is nothing cultural about it. C++ was created as a strict superset of C, and thus from the inception it supported all features made available in C. This design goal remains true up to this day, and only started to diverge relatively recently when C was updated to include features that were not supported (yet) by C++.
When someone declares a plain old struct in C++, they are declaring a struct that is perfectly compatible and interoperable with C. This is by design. From the inception.
steveklabnik 262 days ago [-]
> This design goal remains true up to this day, and only started to diverge relatively recently when C was updated to include features that were not supported (yet) by C++.
It is true that both sides agree that compatibility is an important goal, but it's only a goal, not something that's 100% the case.
mianos 262 days ago [-]
What if, after 20 years of C++, you spend 10 years doing python, only to go back to C++ and realise that all this private/protected stuff is a crock and most of the time you are doing real work you just use struct and start typing your C++, virtual functions, constructors, destructors etc?
Just asking, for a friend.
josefx 262 days ago [-]
> you spend 10 years doing python, only to go back to C++ and realise that all this private/protected stuff is a crock
Just a friendly reminder that two leading underscores won't protect your member functions in C++. Even if people insist that those are totally not supposed to be private in Python.
nhatbui 262 days ago [-]
I think OP meant discarding public/private constructs entirely, no protection, like in python.
gpderetta 262 days ago [-]
Except python started mangling double underscore in a futile attempt to implement private members/methods.
mianos 262 days ago [-]
The underscore prefix is more about communication. It's not a bad convention as it makes you feel a bit dirty when you are using them outside a class, but, do what you want, we are consenting adults.
Whenever I say "I'm no longer attached to all that private stuff", people always reply, "wait until you work on a large code base". I work on a million line+ code base. Whatever.
This argument aside, I'm not a total philistine. RAII is awesome, but C++ is full to the brim with crusty stuff kept for compatibility. I always feel there is a better language in there trying to come out.
gpderetta 262 days ago [-]
Python will literally mangle the names of double underscore members by prefixing them with the class name, to make it harder to access from the outside, so it is not just about communication.
These days I'm for minimalism, most of my structs are aggregates of public members, but sometimes you really want to make sure to maintain the invariant that your array pointer and your size field are in sync.
Doxin 262 days ago [-]
Using double underscore is advised against, and the name mangling is largely considered a mis-feature these days. Most style guides will tell you to use a single underscore to mark something as not for public consumption.
Of course neither double nor single underscore will stop anyone who wants to touch your privates badly enough. Which is big part of the python philosophy: You're not stopped from doing inadvisable things. Instead there's a strong culture around writing "pythonic" code, which largely avoids these pitfalls.
eru 262 days ago [-]
And neither does C++'s 'private' stop any other code from messing with your data, either, if they want to do that badly enough.
Doxin 261 days ago [-]
I'm not super familiar with C++, but I imagine you'd need some chicanery to access privates, while in python you can just use them by name.
eru 260 days ago [-]
Well, you can always cast and access stuff by memory address.
mianos 258 days ago [-]
I can't really upvote this without breaking the rules about obscenities. But I'll give it a :)
In Python, if any of this gives you any trouble you can just replace the stuff in the class dict with your own functions. You don't even need to cast.
gary_0 263 days ago [-]
Any similarity of keyword naming between C and C++ is purely coincidental. :P
C++ is somewhat unique in that it started out as a few extra features on top of C before gradually splitting off and mutating into a totally separate programming language.
pjmlp 261 days ago [-]
"Unique" in a world where Objective-C, Objective-C++, Groovy, and TypeScript exist.
gary_0 261 days ago [-]
What I meant wasn't that it was a language that was compatible with an earlier language. Groovy just compiles to the JVM; lots of things do. TypeScript is just JavaScript with type safety; Python did that too. Objective-C was just NeXT attempting to make the ugliest-looking programming language possible and they succeeded immediately.
But Cfront was released circa 1983, and you basically just wrote C, but it added a bit of new syntax that generated extra C behind the scenes. Object-oriented programming was still fetal in 1983! It didn't get really hyped until the mid-90's. So C++ kind of mutated for decades as this gross appendage on C until it became this whole separate blob that ate half of programming. It was 15 years later when the C++98 "standard" started trying to rein in Dr. Stroustrup's monster.
Then in 2005 we threw away all our textbooks that were like "Look! `Apple` derives from `Fruit`! `Car` derives from `Engine`! This is going to change the world!" because adding object-orientedness to everything became uncool when our bosses became fans of Java. But by this point the C++ blob had taken on a life of its own...
So yeah. Very few programming languages have a story as long and insane as C++.
pjmlp 261 days ago [-]
Objective-C was originally a macro processor just like CFront on top of C.
Objective-C++ likewise on top of CFront.
Until like with CFront, they became selfhosted compilers.
Groovy code is Java code, regardless of targeting the JVM, the same syntax is supported and extended with dynamic capabilities.
Object Pascal was created for the Lisa project, exactly in 1983.
Tom Love and Brad Cox created Objective-C in 1984.
kazinator 263 days ago [-]
Support for POD (plain old data) structs in C++ is definitely part of the language.
SAHChandler 263 days ago [-]
The public vs. private aspect also affects inheritance. structs publicly inherit from base types by default, classes privately inherit from base types.
guillaumec 262 days ago [-]
I notice more and more pushes to 'improve' C and turn it into something it should not become. I feel like the C++ community gave up on C++ because of the growing complexity and so turned to C with the hope of adding to it the good parts of C++ without the ugliness. But this is of course hopeless: every added feature will create new issues that will be solved with new features until the language becomes too complex for anyone to fully understand.
fch42 262 days ago [-]
The part in it that I don't understand is ...
Again, "traditionally", one could (ab)use C++ as "C with extras". And it wasn't uncommon, especially in resource-constrained use cases, to do just that: C++ without the STL or templates, or even C++ without new/delete.
This "is not C++", agreed. Would a subset be enough for "using it like C-with-RAII"?
Given the details and pitfalls the original author lists, I suspect not. It's not just C programmers who "do strange things" and make odd choices; the language itself "lends itself to that". I've (had to) write code that sometimes-alloca'ed, sometimes-malloc'ed the same thing and then "tagged" it to indicate whether it needed free() or "just" the implied drop. Another rather common antipattern is "generic embedded payloads" - the struct definition ending in "char data[1]", just to be padded out by whatever creates it to whatever size (never mind type) of data.
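That pattern, for anyone who hasn't run into it (names invented; C99 would use a flexible array member `char data[]` instead):

    #include <stdlib.h>

    struct message {
        size_t len;
        char data[1]; /* padded out by the over-allocation below */
    };

    struct message* make_message(size_t len) {
        struct message* m = (struct message*)malloc(sizeof(struct message) + len);
        if (m) m->len = len;
        return m;
    }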
Can you write _new_ C code that "does RAII" ? Probably. Just rewrite it in rust, or zig :-)
Can you somehow transmogrify the language, compiler, and standard lib so that you can recompile existing C code, if not to "just get RAII" then at least to get meaningful compiler errors/warnings that tell you how to change it? I won't put money on that.
actionfromafar 262 days ago [-]
New/delete were never that great to begin with and have now fallen out of style. Also, C++ is quite useful and powerful even without STL.
The STL is pretty dispensable in my experience, even for people doing full-blown modern C++, and C++20 has made that particularly obvious. The most useful feature somewhat unique to C++ is the extensive metaprogramming facility, which only recently became non-arcane.
bluetomcat 262 days ago [-]
> Can you write _new_ C code that "does RAII" ? Probably.
You can do "manual" goto-based RAII in C, and it has been done for decades. The end of your function needs to have a cascading layer of labels, undoing what has been done before:
    if (!(x = create_x())) {
        goto cleanup;
    }
    if (!(y = create_y())) {
        goto cleanup_x;
    }
    if (!(z = create_z())) {
        goto cleanup_y;
    }

    do_something(x, y, z);

    cleanup_z:
        destroy_z(z);
    cleanup_y:
        destroy_y(y);
    cleanup_x:
        destroy_x(x);
    cleanup:
        return;
It just takes more discipline and is more error-prone maintenance-wise.
rcxdude 262 days ago [-]
That's not RAII, that's 'defer'. Defer and context managers are both implementations of a subset of the kind of functionality you can get with RAII (the two missing parts are 1) allowing you to place an RAII object in part of a larger structure and have confidence it will actually be constructed and destructed correctly, and 2) allowing the representation of lifetimes which are more complex than just 'in this scope', via moves and copies).
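Point 1 is the big one; a sketch (my example, not from the article):

    #include <memory>
    #include <string>
    #include <vector>

    // Each member cleans up after itself, so `session` needs no destructor at
    // all, no matter where a session lives or for how long. A scope-bound
    // `defer` can't express this: the cleanup belongs to the object, not to
    // any particular scope.
    struct session {
        std::string user;
        std::unique_ptr<int[]> scratch{new int[256]};
        std::vector<std::string> log;
    };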
gpderetta 262 days ago [-]
Except this misses the point of RAII.
raydev 262 days ago [-]
> until the language becomes too complex for anyone to fully understand
Like C, with its many hidden behaviors?
uecker 262 days ago [-]
What hidden behaviors? The only hidden behaviors I can think of which are somewhat problematic in C are implicit value-changing conversions. But one can instruct compilers to diagnose those.
tempodox 262 days ago [-]
Not hidden, the C standard spells it out. And implementation-defined behavior can be observed.
raydev 262 days ago [-]
> Not hidden, the C standard spells it out
I would argue that if it needs to be spelled out in a separate document from the code you're reading, then it's hidden.
tempodox 257 days ago [-]
You must be joking. Like, you only use languages that don't require you to learn anything.
dgellow 262 days ago [-]
Could you be more specific? What improvements do you mean?
It’s not clear if you’re talking about defer or RAII
jokoon 263 days ago [-]
I wish there was some way that you could configure a C++ compiler to just disable certain features of the language, or enforce good practices.
But that's already what linters/static analyzers are doing? But then, why not integrate those tools directly in a C++ compiler instead?
With cpp2/cppfront, Herb Sutter is already building some sort of a "sane" subset of the C++ language, maybe because you cannot achieve good practices without having a new syntax.
C++ seems to have the same problem of javascript: it has annoying "don't-do-that" use cases, although it seems insanely more complicated to teach good C++ practices.
Of course, this requires buying into a set of tooling and learning a lot of specific idioms. I can't say I've used it, but from reading the docs it seems sound enough.
fweimer 263 days ago [-]
You can write a compiler plugin that rejects constructs you don't like. Even GCC doesn't immediately lower the more complex (more controversial) C++ constructs. There's an existing system headers mechanism, so it's probably not that hard to skip this kind of feature restrictions for the standard library headers (where the banned constructs might be used to implement something that looks completely different at the surface).
pjmlp 262 days ago [-]
That is what static analysers are for.
The issue is developers that think they are useless tools.
mst 262 days ago [-]
Usefully albeit depressingly, these days you can often get significantly easier buy-in if you call it a linter instead.
humanrebar 262 days ago [-]
Or, taken to an extreme, it's not hard to compose clang-query matchers that find any arbitrary syntax. You could ban 'int' and pointers in your project if you wanted!
pjmlp 262 days ago [-]
Or the way I like it, configure Sonar to break pull requests when devs ignore the rules that are supposed to be followed.
pornel 263 days ago [-]
> “just ban simple automatic storage duration structure copying” is a terrible usability and horrific ergonomics decision to make
This sounds like a great idea to me! Rust disables implicit copying for structs with destructors, and together with move-by-default, it works really well. Unlike POD structs, you don't need to heap-allocate them to ensure their uniqueness. Unlike copy constructors, you don't need to worry about implicit copies. Unlike C++ moves, there's no moved-from junk value left behind.
tialaramex 262 days ago [-]
> Rust disables implicit copying for structs with destructors
"Disabling" is maybe not the right way to think about it. Rust only has "implicit copying" for Copy types, so you have to at the very least #[derive(Copy,Clone)] to get this, it's true that you can't (and therefore neither can a derive macro) impl Copy on types which implement Drop and that's on purpose but you're making a concrete decision here - the answer Rust knows is never correct is something you'd have to ask for specifically, so when you ask it can say "No" and explain why.
Lots of similar behaviour in C++ is silent. Why isn't my Doodad behaving the way I expected? I didn't need to ask for it to have the behaviour I expected but the compiler concludes it can't have that behaviour, so, it doesn't, and there's nowhere for a diagnostic which says "Um, no a Doodad doesn't work like that, and here's why!"
Diagnostics are hard and C++ under-values the importance of good diagnostics. Rust recently landed work so libraries can provide improved diagnostics when you try to call them with inappropriate parameters. For example now if you try to collect() an iterator into a slice, the compiler notices that slice doesn't implement FromIterator and it asks FromIterator to explain why this can't work, whereupon FromIterator notices you were trying to use a slice and emits a diagnostic for this particular situation - if you'd tried to collect into an array it explains how you'd actually do that, since it's tricky - the slice is impossible since it's not an owning type, you need to collect into a container.
meinersbur 262 days ago [-]
IMHO it would still be a useful feature, just one that is not strictly needed. You are not able to pass around such structs, but neither is that possible with the suggested `defer` statement. The only advantage of `defer` is that the destructor code is inline, rather than in a separate destructor function.
But you could gain reusability of headers to be also used in C++, not needing to reinvent the wheel with new issues (e.g. variable lifetime), and a whole lot of existing experience with RAII.
einpoklum 263 days ago [-]
A C++ code design note:
The initial example in the article is anti-idiomatic, because it imbues the larger class with a RAII-ness that can be limited to just one element of it.
It's only the c member that really requires any special attention, in this particular case. So, there should be something like a `class void_buffer` which is a RAII class, and then:
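    // (Reconstruction; the original snippet was lost in formatting.)
    #include <cstddef>
    #include <cstdlib>
    #include <utility>

    // All of the RAII lives in one small, reusable class...
    class void_buffer {
        void* p_ = nullptr;
    public:
        void_buffer() = default;
        explicit void_buffer(std::size_t n) : p_(std::malloc(n)) {}
        ~void_buffer() { std::free(p_); }
        void_buffer(const void_buffer&) = delete;  // or implement a deep copy
        void_buffer& operator=(const void_buffer&) = delete;
        void_buffer(void_buffer&& o) noexcept : p_(std::exchange(o.p_, nullptr)) {}
        void_buffer& operator=(void_buffer&& o) noexcept {
            std::swap(p_, o.p_);
            return *this;
        }
        void* get() const { return p_; }
    };

    // ...and the outer type goes back to being a plain struct:
    struct ObjectType {
        int a;
        float b;
        void_buffer c;
    };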
and now instead of a complicated bespoke class we have the simplest of structs; the only complexity is in void_buffer.
HelloNurse 262 days ago [-]
But compared to the example code in the article this void_buffer would be significantly more difficult to write and/or use, more verbose and less readable due to fragmentation into multiple classes, and (in view of future evolutions of the code) also less general.
einpoklum 261 days ago [-]
> would be significantly more difficult to write and/or use
1. It would be no more difficult to write and use than the larger class. After all, you can use the larger class as a void_buffer with some dummy extra fields.
2. You can put the class in a detail_ sub-namespace, or make it an inner class of ObjectType, and then people will avoid using it in other, general contexts.
nottorp 262 days ago [-]
Yes, that's a property of C++. It tempts you to hide complexity under several layers of classes so you can have ... more classes, i guess? And job security, because it makes the code much harder to follow for whoever didn't write it.
einpoklum 261 days ago [-]
"More classes" is not something detrimental in any way, in itself. You can have "more classes" while also having:
* Less code overall
* More reuse of classes as versatile/simple components, as opposed to a zoo of bespoke classes
* Classes which are simpler to understand and with more predictable behavior
This is true in the example above: With the corrected code, it's enough that I tell you "ObjectType is a simple struct; and one of its members is a buffer of untyped data". I don't have to show you the class definition; you know enough to understand what's going on. And you can use your void_buffer elsewhere.
daemin 262 days ago [-]
In this case it's not complexity but a useful abstraction.
It abstracts the void_buffer into its own type with proper correct functions for creating, (maybe copying), moving, and destructing the buffer. With that you get a simple type that you can use elsewhere without needing to remember that you need to free() the buffer manually before the end of the scope, or needing to remember how to correctly copy or move the buffer elsewhere.
Measter 263 days ago [-]
Maybe I just suck at reading, but I'm not sure I get the argument for why function overloading and constructors are required for RAII. Is it some interaction with C and C++'s object models that I clearly didn't understand?
defen 263 days ago [-]
Me attempting to summarize the article:
There are 2 ways to get C++-style RAII into C. The first way is to wholesale import the C++ object system into C (which means name mangling, all the different flavors of constructors, destructors, etc). Conceptually this would work, but it's never going to happen, because implementing that would be literally more work than an entire conforming C99 compiler.
The second way is to just use some special function attributes to signify that a function runs when an object is created on the stack / popped off the stack. This won't work either because the C++ object system also solves lots of other problems that this simpler system just ignores (such as, what happens when you copy an object that has a constructor function).
jacinabox 261 days ago [-]
The C language has rules around 'effective type' which determine what object type a block of memory can have. The C++ object model does basically the same thing AND additionally requires that a constructor be called on an object before it is properly regarded as being of that type. In my opinion, the reason the C++ standard cares about object lifetime is that C++ structs can have reference members, which must be initialized in any instance of that struct type. In contrast, it would be compatible with what C has of an object model to just tell language users: "If an object is in static or automatic storage the constructor is called automatically, but if an object is in heap storage it's up to you to call the constructor yourself."
amateur C++ coder
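In C++ terms, that split looks roughly like this (a generic sketch of the standard placement-new mechanism, not code from the article):

    #include <cstdlib>
    #include <new>

    struct T { int x = 0; };

    void heap_storage() {
        // Heap storage: the user explicitly begins and ends the lifetime.
        void* mem = std::malloc(sizeof(T));
        if (!mem) return;
        T* t = new (mem) T();  // placement new: explicit constructor call
        t->~T();               // explicit destructor call
        std::free(mem);
    }

    void automatic_storage() {
        T t;  // constructed automatically here...
    }         // ...and destroyed automatically here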
Measter 263 days ago [-]
Ah, I missed the "copy wholesale" aspect.
When I started reading it, the first thing that came to my mind was the issue with copying the structs. The article started looking at the issue, but didn't really follow through on the changes needed to make it work: you end up needing to track which instance is responsible for the resources, and to provide a way to transfer that responsibility (a.k.a. ownership and move semantics).
masklinn 262 days ago [-]
Nah the first half of the essay is basically irrelevant, you need to start below that, and what I consider the meat of the issue is the “copy” section about two thirds down.
eschneider 263 days ago [-]
It would seem that if you want C with RAII, you...use C++ and limit the features you use. QED.
01100011 263 days ago [-]
This is surprisingly common. C++ is huge and filled with many features that are only understood by a small subset of folks and so many teams have restricted coding standards that define which features can be used and how.
manuel_w 263 days ago [-]
Sounds reasonable. In a project I used to work on, I disabled the stdlib, exceptions, and RTTI. Not sure what else to disable to essentially have C with different syntax.
legobmw99 263 days ago [-]
extern “C” on everything, if you care about linking compatibility, gets you another chunk of the way there by also disabling overloading etc.
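For instance (illustrative declarations, not from the article):

    extern "C" {
        // C linkage: unmangled names, so no overloading inside this block;
        // a C compiler sees these declarations exactly the same way.
        void frob_init(void);
        int frob_read(void *buf, unsigned len);
    }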
nicebyte 263 days ago [-]
I respect Jean-Heyd very much, but I'm unconvinced by this article.
First, the whole name mangling discussion is completely irrelevant to the issue and can be omitted. Second, one could tack on both copy and move constructors on to C in order to solve the double-free issue, in much the same way regular ctors are tacked on in the email proposal. In fact, I would argue that it is _necessary_ because A in RAII stands for Acquisition not Allocation. "Acquisition" implies ownership, which can be transferred or shared, so your copies and moves _have_ to have a special meaning to them. The fact that the proposal is bad or incomplete does not mean that it is "impossible" to have RAII in C. I don't claim that it _is_, but reading this did not reveal to me anything fundamental that would preclude RAII, only that all the preceding RAII proposals have been sloppy.
scott_s 263 days ago [-]
I found the arguments compelling. The discussion on "Effective types" and C not having a proper concept of objects is key.
Another way to think about it: even if you had defined constructors and destructors for a struct, you have not solved when to call them. C++'s answer to that question is its sophisticated object model. C does not have one, and in order to answer that question, it must. It's worth noting that RAII was not a feature that was intentionally created in C++. Rather, astute early C++ developers realized it was a useful idiom made possible by C++'s object model.
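For example (a generic C++ illustration of "when", not from the article), the object model fixes the exact points where these calls happen, on every exit path:

    #include <cstdio>

    struct Guard {
        Guard()  { std::puts("acquire"); }
        ~Guard() { std::puts("release"); }
    };

    void f(bool fail) {
        Guard g;           // constructed here (automatic storage)
        if (fail) return;  // destructor still runs on this early return
        std::puts("work");
    }                      // ...and on normal exit; multiple objects are
                           // destroyed in reverse construction order

A C proposal has to answer all of those same "when" questions before constructors and destructors mean anything.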
sapling-ginger 262 days ago [-]
You say "just add copy and move contructors", but that requires function overloading, which is exactly why he spent a third of the article ranting about name mangling. The point is that there is a tangled network of interdependent features that make C++ work, and you can't ""just"" take a small piece to put into C without dragging a whole bunch of other stuff along.
nicebyte 261 days ago [-]
No it does not. You can absolutely add copy and move ctors without function overloading.
indigoabstract 262 days ago [-]
Yes, it meanders too much to get to the point. Which is that RAII doesn't work in C because, unlike C++, which has a comprehensive type system mandated by a standard, a C program doesn't "know" at runtime that a struct is composed of other (typed) fields, so it can't do a proper deep field copy (or destruction). And implementing that type system in C doesn't seem feasible for practical and political reasons.
I think the actual question should be "can C get automatic memory management like in C++ without having the equivalent of C++'s type system"?
Though I can't put my finger on it, my intuition says it can, if the interested people are willing to look deep enough.
orf 262 days ago [-]
> a C program doesn't "know" at runtime that a struct is composed of other (typed) fields so it can do a proper deep field copy (or destruction).
This doesn’t make sense: you don’t need runtime introspection to do this?
indigoabstract 262 days ago [-]
In C++, when you copy a struct instance to another instance, the runtime knows if any fields (to whatever depth) have manually defined assignment or move operators and will call them in the proper order. So it's a deep copy. The same information is used for calling any field constructors and destructors that are user defined.
Introspection (reflection) would go even further and provide at runtime all the information that you have at compile time about an object. But that's not required for assignment and destruction operations to work.
C doesn't have any of that, so a struct copy is just a shallow copy, a bit by bit copy of the entire struct contents. Which works pretty well, except for pointers/references.
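Which is exactly where the double-free problem comes from; e.g. (a minimal sketch):

    #include <stdlib.h>

    struct buf { void *p; size_t n; };

    void demo(void) {
        struct buf a = { malloc(16), 16 };
        struct buf b = a;  // shallow copy: b.p and a.p now alias
        free(a.p);
        free(b.p);         // double free; no code ran on copy to fix ownership
    }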
bregma 262 days ago [-]
No. Well, yes, in that if the type of an object is dynamic, it's possible that certain functions are resolved at runtime usually through a "virtual table". The static type of an object is only known at compile time, and all that the virtual dispatch does is an indirection through the virtual table to the static constructor or destructor as required, and the static special functions always know how to construct, copy, or destroy any subobjects.
So, no, runtime introspection is not needed, but runtime dispatch may be needed.
iainmerrick 262 days ago [-]
This is obviously a bit of a rant, and intended as such, but it’s really well thought through and well-argued too.
I haven’t seen this distinction laid out so clearly before:
> Every other language worth being so much as spit on either employs deep garbage collection (Go, D, Java, Lua, C#, etc.) or automatic reference counting (Objective-C, Objective-C++, Swift, etc.), uses RAII (Rust with Drop, C++, etc.), or does absolutely nothing while saying to Go Fuck Yourself™ and kicking the developer in the shins for good measure (C, etc.).
GC, ARC, RAII or GTFO, those are the options. That’s right!
I always come away from these discussions with more respect for Objective-C -- such a powerful yet simple language. I suppose Swift is the successor but it feels very different.
Although, Obj-C only really came into its own once it finally gained automatic reference counting, after briefly flirting with GC. At that point it was already being displaced by younger and more fashionable languages.
OskarS 262 days ago [-]
I would say there are two more esoteric options: "no memory allocation at all outside of the program stack" (like... i dunno... lambda calculus?) and fancy-pants computer sciency things like linear types. No sane language does either of those, though.
iainmerrick 262 days ago [-]
Good points! I tend to think of linear types as a more general kind of RAII -- like there's a spectrum that goes RAII -> borrow checker -> linear types -- but maybe it does warrant its own category.
jay-barronville 263 days ago [-]
C is the ultimate WYSIWYG language (provided you understand the semantics of your target architecture and assuming a non-buggy compiler). The language is relatively simple. The standard is accessible. I’d like it to remain that way. I don’t need C to adopt any other “modern” language features.
C11 provided a few worthwhile improvements (i.e., a proper memory model, alignment specification, standardized anonymous structures/unions), but so many of the other additions, suggestions, and proposals I’ve seen will just ruin the minimal nature of C. In C++, a simple statement like `a = b++;` can mean multiple constructors being called, hidden allocations, unexpected exceptions, unclear object hierarchies, an overloaded `++`, an overloaded `=`, etc. Every time I wish I had some C++ feature in C, I just think about the cognitive overhead it’d bring with it, slap myself a couple times, and go back to loving simple ole C.
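To make that concrete (a hypothetical type, purely illustrative):

    #include <cstdint>
    #include <vector>

    struct BigInt {                        // made-up example type
        std::vector<std::uint32_t> limbs;  // owns a heap allocation
        BigInt operator++(int) {           // overloaded postfix ++
            BigInt old = *this;            // copy ctor: allocates, may throw
            // ... increment *this ...
            return old;                    // move on return
        }
    };

    void assign(BigInt& a, BigInt& b) {
        a = b++;  // an overloaded ++, a copy, a move assignment,
    }             // and a possible std::bad_alloc, in one statement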
Please don’t ruin C.
hgs3 263 days ago [-]
> Please don’t ruin C.
Exactly this. C++ folks should not approach C like a "C++ lite". I appreciate the author's candid take on the subject.
As for defer, there is some existing precedent like GCC and Clang's __attribute__((cleanup)), but - at least for me - a simple "goto cleanup;" is usually sufficient. If I understand N3199 [1] correctly, which is the author's proposal for introducing defer in C, then "defer" would be entirely a compile-time construct. Essentially just a code transformation to inject the necessary cleanup at the right spots. If you're going to introduce defer to C then that does seem like the "best" approach IMO.
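For reference, minimal sketches of both patterns (names are illustrative; the attribute is a GCC/Clang extension, not standard C):

    #include <stdlib.h>

    static void free_charp(char **p) { free(*p); }

    int with_cleanup(void) {
        // GCC/Clang: free_charp(&buf) runs when buf goes out of scope.
        __attribute__((cleanup(free_charp))) char *buf = (char *)malloc(64);
        if (!buf) return -1;
        /* ... use buf ... */
        return 0;  // buf is freed automatically on every path
    }

    int with_goto(size_t n) {
        int rc = -1;
        char *a = (char *)malloc(n);
        char *b = (char *)malloc(n);
        if (!a || !b) goto cleanup;
        /* ... use a and b ... */
        rc = 0;
    cleanup:
        free(b);  // free(NULL) is a no-op, so partial failure is fine
        free(a);
        return rc;
    }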
We C++ devs have moved away from C decades ago, frankly don't even think of it any more, and will never go back. It's a relic of its time, like DOS, Amiga, etc. RAII is a big feature we can no longer live without. The type system and overloading are fantastic. And std::vector is a magnificent feature. A language without these features is a relic for us C++ devs.
And yes, I also agree that C++ has WTF insanity, like the 17 or so initialisation quirks, exceptions in general (primarily there to address failures in constructors; surely there must be a better way, and OOM / bad_alloc is a relic from the past), and unspecified sizes for the default built-in types (that's C heritage).
uecker 262 days ago [-]
I moved from C++ back to C and found that I am much more productive not worrying about a lot of things. But it takes a while to figure out how to do things in C because almost nothing comes out-of-the-box.
atn34 263 days ago [-]
> provided you understand the semantics of your target architecture
Unless you're writing inline assembly or intrinsics or something like that, the semantics of your target architecture are quite irrelevant. If you're reasoning about the target architecture semantics that's a pretty good indication that what you're writing is undefined behavior. Reasoning about performance characteristics of your target architecture is definitely ok though.
chowells 263 days ago [-]
And presuming you avoid 100% of undefined behavior, which I've never seen a non-trivial C program succeed at. C is way too complicated in the real world. You don't want C, you want a language that actually gives defined semantics to all combinations of language constructs.
fooker 263 days ago [-]
>you want a language that actually gives defined semantics to all combinations of language constructs
No, this is wrong. It's a common misconception though. You would only want that in a hypothetical world where all computers are exactly the same.
Undefined and implementation defined behavior is what allows us to have performance at all. Here are some simple examples.
Suppose we want to make division by zero and null pointer dereference defined. Now every time you write a/b or *x, the compiler will be forced to emit an extra branching check before this operation.
Something much more common: addition. What about signed overflow? Do you want the compiler to emit an overflow check in advance? Similar reasoning applies to shift instructions.
UB in the language specification allows compilers to optimize based on the assumption that the programs you write won't have undefined behavior. If compilers are not able to do this, it becomes impossible to implement most optimizations we rely on. It's a very core feature of modern language specifications, not an oversight you can fix by thinking about it for 10 minutes.
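Two standard illustrations of that assumption at work (textbook examples, not from this thread):

    int read_and_check(int *p) {
        int v = *p;            // UB if p is null, so the compiler may assume
        if (p == 0) return 0;  // p is non-null and delete this whole branch
        return v;
    }

    int always_one(int x) {
        return x + 1 > x;  // signed overflow is UB, so this may be folded
    }                      // to "return 1" with no check emitted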
dooglius 262 days ago [-]
> Now every time you write a/b or *x, the compiler will be forced to emit an extra branching check before this operation.
This is wrong, because you would define them to have the behavior that the architecture in question does, so no changes would be needed. For integer division this would mean entering an implementation-defined exceptional state that does not by default continue execution (on Linux, SIGFPE with the optional ability to handle that signal). For dereferencing a pointer, it should have the same semantics as a load/store to any other address--if something is there it works normally, if the memory is unmapped e.g. for typical Linux x86 programs you get SIGSEGV (just as you would for accessing any other unmapped address).
fooker 262 days ago [-]
Okay, you get half of the story.
Suppose now, there are two architectures with slightly differing behavior.
Can the compiler still optimize signed x + 1 > x to true?
jay-barronville 263 days ago [-]
> Suppose we want to make division by zero and null pointer dereference defined.
A good example is WebAssembly*—address 0x00000000 is a perfectly fine and well-defined address in linear memory. In practice though, most code you’ll come across targeting WebAssembly treats it as if dereferencing it is undefined behavior.
* Of course WebAssembly is a compiler target rather than a language, but it serves as a good example of the point you’re making.
bigstrat2003 263 days ago [-]
> UB in the language specification allows compilers to optimize based on the assumption that the programs you write won't have undefined behavior.
Given that has proven to be a completely false assumption, I don't think there's a justification for compilers continuing to make it. Whatever performance gains they are making are simply not worth the unreliability they are courting.
fooker 263 days ago [-]
> Given that has proven to be a completely false assumption
This part is correct. The problem is in how to deal with this. If you want the compiler to correctly deal with code having undefined behavior, often the only possibility is to assume that all code has undefined behavior. That means, almost every operation gets a runtime branch. That is completely incompatible with how modern hardware works.
The rest is wrong, but again, this is a common misconception. Language designers and compiler writers are not idiots, contrary to popular belief. UB as a concept exists for a reason. It's not for marginal performance boosts, it is to enable any compiler based transformation, and a notion of portability.
grumpyprole 263 days ago [-]
I'm sorry I still don't buy it. Can you please show me a use case where ignoring null pointer or overflow checks makes your product non-viable or uncompetitive?
Some of these checks could be removed by languages with better compilers and likely more restrictions. That is the better approach. As a user, I don't want to run code that is potentially unsafe and/or insecure.
daemin 262 days ago [-]
So the simplest argument against giving a language-specified meaning to dereferencing a null pointer is that it requires putting in checks everywhere to detect the condition and then do something when the pointer is null. And what should the null-pointer case do, then? Something like emit an exception, or send a signal, or call std::terminate to exit the process?
I know that languages like Java have a NullPointerException which they can throw and handle for situations like this, but they're also built on a highly specified virtual machine architecture that is consistent across hardware platforms. This also does not guarantee that your program is safe from crashing when this exception gets thrown, as you have to handle it somewhere. For something as general as this it will probably be in the Main function, so you might as well let it go unhandled as there's not that much you can do at that point.
For a language like C++ it is simpler, easier, and I would argue more correct, to just let the hardware handle the situation, which in this case would trigger a memory error of trying to access invalid memory. As the real issue is probably somewhere else in the code which isn't being handled correctly and the bad data is flowing through to the place where it accesses the null pointer and the program crashes.
To add to that in a lot of cases the program isn't crashing while trying to access address 0, it's crashing trying to access address 200, or 1000, or something like that, and putting in simplistic checks isn't going to catch those. You could argue that the check should guard against accessing the lowest 1k of memory, but then when do you stop, at 64k? Then you have an issue with programs that must fit within 1k of memory.
Leaving it unspecified is the better choice.
fooker 263 days ago [-]
It's not about ignoring null pointer or overflow checks, it's about having to necessarily insert these checks everywhere.
grumpyprole 263 days ago [-]
We should build compilers that insert these checks for us (if they cannot statically determine them unnecessary). The ability to omit these checks doesn't IMHO justify undefined behaviour.
fooker 263 days ago [-]
Well, good news is that you have optional modes in most compilers that do this.
You would not want to force these by default; nobody wants it. You cannot statically determine them unnecessary for the vast majority of code, even stuff as simple as `print(read(a) + read(b))`.
dralley 263 days ago [-]
And yet somehow languages such as Rust, which have no UB (in the safe subset), manage to be within 5% of C and often faster, in both real-world codebases and microbenchmarks.
fooker 262 days ago [-]
It’s just a change in jargon for ‘marketing’ reasons.
For example: Rust will silently wrap signed integers in release mode even when it’s considered a bug and crashes in debug mode.
rcxdude 262 days ago [-]
That is pretty much the only example where there's a compromise between performance and correctness as a difference between release and debug mode, and note that it's a) not undefined behaviour and b) does not violate any of rust's safety guarantees.
Every other example you mention is done by rust in release mode and the performance impact is minimal, so I would say it's a good counterexample to your claims that defining these things would hamstring performance (signed integer overflow especially is an obvious no-brainer for defining. Note that doesn't necessarily mean overflow checks! Even just defining the result precisely would remove a lot of footguns).
Slyfox33 262 days ago [-]
Signed overflow is not UB in rust. That's not the same thing at all.
fooker 262 days ago [-]
It’s not.
You have missed my point.
samatman 263 days ago [-]
Zig, a language which is explicitly aimed at the same domain as C, has an improved semantics for all of these things.
If a pointer can be null, it must be an optional pointer, and you must in fact check before you dereference it. This is what you want. Is it ok to write a program which segfaults at random because you didn't check for a pointer which can be null? Of course not. If you don't null-check the return value of e.g. malloc, your program is invalid.
But the benefit is in the other direction. Careful C checks for null before using a pointer, and keeping track of whether null has been checked is a manual process. This results in redundant null checks if you can't statically prove (by staring at the code and thinking very hard) that it isn't null. So in practice you're likely to have a combination of not checking and getting burned, and checking a pointer which was already checked. To do otherwise you have to understand the complete call graph, this is infeasible.
Zig doesn't do any of this. If it's a pointer, you can safely dereference it. If it's an optional pointer, you must check, and then: it's a pointer. Safe to pass down the call stack and freely use. If you want C behavior you can always YOLO and just say `yoloptr.?.*`.
Overflow addition and divide by zero are safety checked undefined behavior, a critical concept in the specification. They will panic with a stack trace in debug and ReleaseSafe mode, and blow demons out of your nose in ReleaseFast and ReleaseSmall modes. There's also +% for guaranteed wraparound twos-complement overflow, and +| for saturating addition. Also `@addWithOverflow` if your jam is checking the overflow bit. Unwrapping an optional without checking it is also safety-checked UB: if you were wrong about the assumption that the payload carries a value, you'll get a panic and stack trace on the line where you did `yolo.?`.
Shift operations require that the right hand side of the shift be an integer type just wide enough to index the bits of the left hand side, i.e. log2(bit width) bits. Zig allows integers of any width, so for a: u64, calling a << b requires that b be a u6 or smaller. Which is fine: if you know values will be within 0..63, you declare them u6, and if you want to shift by a byte's value, you truncate it: you were going to mask it anyway, right? Zig simply refuses to let you forget this. Addition of two u6 is just as fast as addition of the underlying bytes because of, you got it, safety-checked undefined behavior. In release mode it will just do what the chip does.
There's a common theme here: some things require undefined behavior for performance. Zig does what it can to crash your program if that behavior is exhibited while you're developing it. Other things require that you take some well-defined actions or you'll get UB: Zig tracks those in the type system.
You'll note that undefined behavior is very much a part of the Zig specification, for the same reasons as in C. But that's not a great excuse to make staying within the boundaries of defined behavior as pointlessly difficult as it is in C.
fooker 263 days ago [-]
Yes, you can surely improve things from C. C is not a benchmark for anything other than footguns per line of code.
The debug modes you mention are also available in various forms in C and C++ compilers. For example ASan and UBSan in clang will do exactly what you have described. The question is, then whether these belong in the language specification or left to individual tools.
pjmlp 262 days ago [-]
As proven multiple times throughout the computing history, individual tools are optional, and as such used less often than they actually should be.
Language specification is unavoidable when using said language.
fooker 262 days ago [-]
Have you wondered why Rust or Python do not have a specification?
For a bunch of languages outside the C-centric world, specifications don't exist.
Documentation and specification are not the same things.
The intuitive distinction is that the latter is for compiler/library developers, and the former is for users.
A specification can not leave any room for ambiguity or anything up to interpretation. If it does (and this happens), it is treated as a bug to be fixed.
lstodd 262 days ago [-]
mwahahaha. as if there is some divine "language specification" which all compilers adhere to on pain of eternal damnation.
no such thing ever existed.
pjmlp 262 days ago [-]
Given that one can write Fortran in any language, maybe you're right.
rcxdude 262 days ago [-]
it's not just in debug modes. It should be the standard in release mode as well (IMO the distinction shouldn't exist for most projects anyway). ASan and UBSan are explicitly not designed for that.
samatman 262 days ago [-]
Worth noting that Zig has ReleaseSafe, which safety-checks undefined behavior while applying any optimizations it can given that restriction.
The more interesting part is that the mode can be individually modified on a per-block basis with the @setRuntimeSafety builtin, so it's practical to identify the performance-critical parts of the program and turn off safety checks only for them. Or the opposite: identify tricky code which is doing something complex, and turn on runtime safety there, regardless of the build status.
That's why this sort of thing should be part of the specification. @setRuntimeSafety would be meaningless without the concept of safety-checked undefined behavior.
I would say that making optionals and fat pointers (slices) a part of the type system is possibly more important, but it all combines to give a fighting chance of getting user-controlled resource management correct.
Given the topic of the Fine Article, it's worth briefly noting that `defer` and `errdefer` are keywords in Zig. Both the test allocator, and the GeneralPurposeAllocator in safe mode, will panic if you leak memory by forgetting to use these, or rather, forget to free allocations generally. My impression is that the only major category of memory bugs these tools won't catch in development is double-free, and that's being worked on.
fooker 262 days ago [-]
Well, give it a try.
If you can make it work in a way that has acceptable performance characteristics, every systems language will adopt your technique overnight.
rcxdude 262 days ago [-]
I use rust, which already does this.
fooker 262 days ago [-]
Signed overflow is officially a 'bug' in rust, it traps in debug mode but silently follows LLVM/platform behavior in release mode.
Huh, doesn't that sound familiar?
steveklabnik 262 days ago [-]
> silently follows LLVM/platform behavior
This is not the case. It's two's complement overflow.
Also, since we're being pedantic here: it's not actually about "debug mode" or "release mode", it is tied to a flag, and compilers must have that flag on in debug mode. This gives the ability to move release mode to also produce the flag in the future, if it's decided that the overhead is worth it. We'll see if it ever is.
> Huh, doesn't that sound familiar?
Nope, it is completely different from undefined behavior, which gives the compiler license to do anything it wants. These are well defined semantics, the polar opposite of UB.
fooker 260 days ago [-]
>This is not the case. It's two's complement overflow.
Okay, here is an example showing that rust follows LLVM behavior when the optimizer is turned on. LLVM addition produces poison when signed wrap happens. I'm a little bit puzzled about the vehement responses in the comments wow. I have worked on several compilers (including a few patches to Rust), and this is all common knowledge.
> nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs.
Note that Rust produces `add`. The C++ produces `add nsw`. No poison in Rust, poison in C++.
Here is an example of these differences producing different results, due to the differences in behavior:
https://godbolt.org/z/Gaonnc985
This is because in Rust, the wrapping behavior means that this will always be true, but in C++, because it is UB, the compiler assumes it will always be false.
> I'm a little bit puzzled about the vehement responses in the comments wow.
You are claiming that Rust has semantics that it was very, very deliberately designed to not have.
samatman 261 days ago [-]
Rust includes a great deal of undefined behavior, unlocked with the trustme keyword. Ahem, sorry, unsafe. If only...
So if we're going to be pedantic, it's safe Rust which has defined semantics for basically everything. A considerable accomplishment, to be sure.
steveklabnik 261 days ago [-]
While this is true, we’re talking about integer overflow. That’s part of safe Rust. So it’s not really germane to this conversation.
pjmlp 262 days ago [-]
Even languages like Modula-2 and Ada, among others, had better semantics than C, but they didn't come for free alongside UNIX.
rperez333 263 days ago [-]
I know nothing about Zig, but this is pretty interesting and looks well designed. Linus was recently very mad when someone suggested a new semantics for overflow:
——
I'm still entirely unconvinced. The thing is, wrap-around is not only well-defined, it's common, and EXPECTED.
> The thing is, wrap-around is not only well-defined, it's common, and EXPECTED.
No, it's really not. Do this experiment: for the next ten thousand lines of code you write, every time you do an integer arithmetic operation, ask yourself if the code would be correct if it wrapped around. I would be shocked if the answer was "yes" as much as 1% of the time.
(The most recent arithmetic expression I wrote was summing up statistics counters. Wraparound is most definitely not correct in that scenario! Actually, I suspect saturation behavior would be more often correct than wraparound behavior.)
This is a case where I think Linus is 100% wrong. Integer overflow is frequently a problem, and demanding the compiler only check for it in cases where it's wrong amounts to demanding the compiler read the programmer's mind (which goes about as well as you'd expect). Taint tracking is also not a viable solution, as anyone who has implemented taint tracking for overflow checks is well aware.
cozzyd 262 days ago [-]
It depends heavily on context.
For the kernel, which deals with a lot of device drivers, ring buffers, and hashes, wraparound is often what you want. The same is likely to be true for things like microcontroller firmware and such.
In data analysis or monte carlo simulations, it's very rarely what you want, indeed.
There are definitely cases where wraparound behavior is correct. There are also cases where a hard error on overflow isn't desirable (say, statistics counters), but it's still hard to call wraparound the correct behavior there (e.g., saturation would probably work better for statistics than wraparound). There are also cases where you could probably prove that overflow can't happen. But if you made the default behavior a squawk that wraparound occurred, and instead made developers annotate all the cases where wraparound was desired to silence the squawk, even in the entire Linux kernel I'd suspect you'd end up with fewer than 1000 places.
This is sort of the point of the exercise--wraparound behavior is often what you want when you think about overflow, but you actually spend so much of your time not thinking about it that you miss how frequently wraparound behavior isn't what you wanted.
cozzyd 262 days ago [-]
I think wraparound generally is better for statistics counters like the ones in the linked code, since often you want to check the number of packets/errors per some time interval, which you can do with overflow (as long as the time interval isn't so long that you overflow within a period) but not with saturation.
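(That relies on the standard trick with wrapping unsigned counters, for reference:)

    #include <stdint.h>

    // With well-defined mod-2^32 wraparound, the per-interval delta is
    // still correct if the counter wrapped once between two samples.
    uint32_t packets_this_interval(uint32_t now, uint32_t before) {
        return now - before;
    }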
samatman 262 days ago [-]
I think it's critical that we do annotate it as a special multiply.
If wraparound is ok for that particular multiplication, tell the compiler that. As a sibling comment says, this is seldom the case, but it does happen, in particular, expecting byte addition or multiplication to wrap around can be useful.
The actual expectation of the vast majority of arithmetic in a computer program is that the result will be correct in the ordinary schoolyard sense. While developing that program, it should absolutely panic if that isn't the case. "Well defined" doesn't mean correct.
I don't understand your objection to spelling that as `val *% GOLDEN_RATIO_32`. When someone sees that (especially you, later, coming back to your own code) it clearly indicates that wrapping is expected, or at least allowed. That's good.
bregma 262 days ago [-]
Unsigned integer overflow is not undefined in C or C++. You can rely on how it works.
Signed integer overflow, on the other hand, is undefined. The compiler is allowd to assume it never happens and can re-arrange or eliminate code as it sees fit under that assumption.
How many lines will this code print?
    for (int i = INT_MAX - 1; i > 0; ++i) printf("I'm in danger!\n");
(Two, if overflow simply wrapped; but because signed overflow is UB, the compiler may assume i > 0 stays true and compile this into an infinite loop.)
kimixa 263 days ago [-]
I feel the meme of "Undefined Behavior" has been massively exaggerated on the internet - the vast majority of examples are extreme toy examples using the weirdest contrived constructs, or things that are expected to fault, where you're already relying on platform-specific knowledge of what that fault looks like (e.g. expecting a segmentation fault). It's treated as a Scary Boogeyman That Will Kill You, not something that can be understood, managed, and avoided if necessary.
And even then there are tools to help define much of that - if you want well defined wrapped signed integers, great. If you want to trap on overflow, there's an option for that. Lots of compiler warnings and other static analysis tools (that would just be default-rejected by the compiler today if it didn't have historical baggage, but they exist and can be enabled to do that rejection).
Yes, there's many issues with the ecosystem (and tooling - those options above should be default IMHO), but massively overstating them won't actually help anyone make better software.
And other languages often have similar amounts of "undefined behavior" - they just don't document it as such, relying on a single implementation being "Defined Correct" and hoping the details aren't actually being relied on if anything changes. Just like C, only undocumented.
adrianN 263 days ago [-]
I don't feel like the cause of most (all?) memory safety bugs has been "massively exaggerated".
kimixa 263 days ago [-]
If you removed every case of "Undefined Behavior" from the C spec, you'd still have memory safety bugs. Because they're orthogonal (though may be coupled if they come from the same core logic error).
This is what I mean by it becoming a "meme" - things like "Undefined Behavior" or "Memory Safety" have become a discussion-ending "Objective Badness", hiding the real intent - "Languages I Do Not Like" (or, most often, languages that are a poor fit for the actual job I'm trying to do. Which is fine, but that shouldn't mean rejecting that those jobs actually exist).
But they mean real things that we can improve in terms of software quality, and safety - but that's rarely the intended result when those terms are now brought up. And many things we can do right now with existing systems to improve things, to not throw away huge amounts of already well-tested code. To do a staged improvement, and not let "perfect" be the enemy of better.
adrianN 262 days ago [-]
I suppose there are ways to make the undefined behavior defined that preserve memory unsafety, so you’re technically correct. In practice one would probably require safe crashes for OOB access etc.
actionfromafar 262 days ago [-]
I can give an example of how to remove all undefined behaviour and preserve memory unsafety. First, we decide that all compilers compile to a fixed instruction set running on a CPU with a fixed memory model. Just pick one of the existing ones, like a 68000 or an 80486DX. Then, we decide that all uninitialized memory is actually 0, always, from the operating system and the allocator. That should go pretty far, or am I missing something?
throwaway2037 262 days ago [-]
> You don't want C, you want a language that actually gives defined semantics to all combinations of language constructs.
So, Zig?
sixfiveotwo 262 days ago [-]
Well, perhaps a subset of it, since it also introduces concepts that do not exist in C (eg. exceptions).
samatman 262 days ago [-]
Zig does not have exceptions, what it has is error sets. It uses the words try and catch, which does cause confusion, but the semantics and implementation are completely different.
If a function has an error type (indicated by a ! in the return type), you have a few options. You can use `result = try foo();`, which will propagate the error out of the function (which now must have ! in its signature). Or you can use `result = foo() catch default;` or `result = foo() catch unreachable;`. The former substitutes a default value, the latter is undefined behavior if there's an error (panic, in debug and ReleaseSafe modes).
Or, just `result = foo();` gives `result` an error-union type, of the intended result or the error. To do anything useful with that you have to unwrap it with an if statement.
It's a different, simpler mechanism, with much less impact on performance, and (my opinion) more likely to end up with correct code. If you want to propagate errors the way exceptions do, every function call needs a `try` and every return value needs a ! in the return type. Sometimes that's what you need, but normally error propagation is shallow, and ends at the first call which can plausibly do anything about the error.
sixfiveotwo 261 days ago [-]
Thank you for your input, I stand corrected. So as I understand it, it works somewhat like the result type of rust (or ocaml), or the haskell either type, but instead of being parameterized, it is extensible, isn't it?
samatman 261 days ago [-]
More like that, yes. Rust has two general-purpose mechanisms, generics and enums, which are combined to handle Optional and Result types. Zig special-cases optional types with `?type` (that is literally the type which can be a type or null), and special-cases errors with `!`. Particularly with errors, I find this more ergonomic, and easier to use. Exceptions were right about one thing: it does often make sense to handle errors a couple call frames up the stack, and Zig make that easy, but without the two awful things about exceptions: low-performance try blocks, and never quite knowing if something you call will throw one.
It also has tagged unions as a general mechanism for returning one of several enumerated values, while requiring the caller to exhaustively switch on all the possibilities to use the value. And it has comptime generics ^_^. But it doesn't use them to implement optionals or errors.
einpoklum 263 days ago [-]
You don't necessarily want that. Forcing language-defined semantics on everything costs performance. Sorry, it just does; we can't have it all. So you can sacrifice performance for well-definedness, or you can choose not to - and the choice depends on the language _design goals_. As the design goals differ, so do the combinations of choices made for syntax and semantics.
bigstrat2003 263 days ago [-]
I think pretty much any amount of performance is worth sacrificing in order to get rid of the gnarly things UB can cause. Correctness is the first and most important thing in programming, because if you can't be certain it works then it's not very useful.
einpoklum 261 days ago [-]
It may be worth it _for you_. It is not worth it _for others_.
Correctness can be established well enough - even if guaranteed automatically - in a language with UB.
planede 262 days ago [-]
How do you define a buffer overflow?
gpderetta 263 days ago [-]
> a simple statement like `a = b++;` can mean multiple constructors being called, hidden allocations, unexpected exceptions [...]
The difference is that in C++ it's expected that you'll overload operators, provide implicit conversions and throw exceptions. Of course you can write terrible code in C but it is not common accepted practice to hide a longjmp in a macro disguised as an identifier.
pjmlp 262 days ago [-]
Indeed, you hide longjmp in a #define macro instead, with a do-while block trick.
jay-barronville 263 days ago [-]
The funny thing is, examples of macro craziness only strengthen my point, because C++ inherits all of that in addition to its hidden behaviors and magical semantics. It’s rare to find serious C code doing a lot of crazy things behind macros. In my experience, the few exceptions I can think of include the GMP library and data structure-related code trying to emulate generics (mostly hash tables).
gpderetta 263 days ago [-]
Yes, C++ is a larger language for sure. But because it has better abstraction facilities, macro hackery is less common.
gpderetta 263 days ago [-]
pthread_cleanup_{push,pop}
jay-barronville 263 days ago [-]
Haha. You can’t be serious—what’s the likelihood of running into C code like this in anything remotely serious (compared to the millions upon millions of lines of innocent-looking C++ code that does like a dozen different things under the hood)?
gpderetta 263 days ago [-]
No true Scotsman.
I assume you haven't looked at the expansion of errno lately?
that's a deliberately unfair comparison. operator overloading, constructors, assignments, etc. happen "under-the-hood" in c++ and are standard language features.
whereas you can see the user-defined macro definition of "b" at the top of the file. you can't blame the c language for someone choosing to write something like that. sure it's possible, but its your choice and responsibility if you do stupid things like this example.
gpderetta 262 days ago [-]
Macros are also standard C features, and good luck figuring out that an identifier is a macro without IDE help when the definition is buried in some header.
fargle 262 days ago [-]
what you say is partially true (you can also of course use -E to check macros) but:
- macros are also standard C++ features too, so this point doesn't differentiate between those languages
- i'm failing to adequately communicate my point. there's a fundamental difference practically and philosophically between macro stupidity and C++ doing things under-the-hood. of course a user (you, a co-developer, a library author you trusted) can do all sorts of stupid things. but it's visible and it's written in the target language - not hard-coded in the compiler.
yes - sure, good luck finding the land-mine "b" macro if it was well buried. but you can find it and when you do find it, you can see what it was doing. you can #undef it. you can write your own version that isn't screwed up, etc.
you can do none of those things for operations in c++ that occur automatically - you can't even see them except in assembly.
gpderetta 262 days ago [-]
> there's a fundamental difference practically and philosophically between macro stupidity and C++ doing things under-the-hood. of course a user (you, a co-developer, a library author you trusted) can do all sorts of stupid things. but it's visible and it's written in the target language - not hard-coded in the compiler
I specifically reject this. Constructors, exceptions, and so on are as similarly visible at the source level as macro definitions.
And thanks to macros, signal handling, setjmp, instrumentation, hardening, dynamic .so resolution, compilers replacing what look like primitive accesses with library functions, any naïve read of C code, is, well, naïve.
I'm not claiming C++ superiority here [1], I'm trying to dispel the notion that C is qualitatively different from C++ form a WYSIWYG point of view, both theoretically and in practice.
[1] Although, as I mentioned elsewhere, other C++ features mean that macros see less use.
fargle 262 days ago [-]
to be clear, i'm neither defending nor bashing either language. i use and like both as appropriate. and it's fine to disagree, btw. please do not read "good" or "bad" into my attempt to describe either.
but i will also emphatically reject your position: "Constructors, exceptions, and so on are as similarly visible at the source level as macro definitions"
no they are not. you can certainly see what the macro is doing - you see its definition, not just its existence. whereas in c++ you have to trust that language/compiler to:
- build a vtable (what exactly does this look like?)
- make copy ctors
- do exception handling.
- etc.
none of these are explicit. all of them are closed and opaque. you can't change their definition, nor add on to it.
at issue at hand is both "magic" and openness. c gives relatively few building blocks. they are simple (at least in concept). user libraries construct (or attempt to construct) more complex idioms using these building blocks. conversely c++ bakes complex features right into the language.
as you note, there are definitely forces that work against the naïve original nature of c. macros, setjmp, signal handling, instrumentation, hardening, .so resolution, compilers replacing primitive accesses, etc. but all of those apply equally to c and c++. they are also more an effect of the ABI and the platform/OS than of either language. in short, those are complaints and complexities due to UNIX, POSIX, and other similarly derived systems, not c or c++ the language itself.
c has relatively few abstractions: macros, functions, structured control flow, expressions, type definitions. all of these could be transformed into machine code by hand, for example in a toy implementation. sure a "good" compiler and optimizer will then mangle that into something potentially unrecognizable, but it will still nearly always work the way that the naïve understanding would. that's why when compilers do "weird" things with UB, it gets people riled up. it's NOT what we expect from c.
c++ on the other hand has, in the language itself, many more abstractions and they are all more complex. you aren't anywhere near the machine anymore and you must trust the language definition to understand what the end effect will be. how it accomplishes that? not your problem. this makes it squarely a high-level language, no different than java or python in that facet.
i explicitly reject your position that "that C is qualitatively [not] different from C++ from a WYSIWYG point of view, [either] theoretically [or] in practice."
to me, it absolutely is. it represents a lower-level interface with the system and machine. c is somewhere between a high-level assembler and a mid-level language. c++ is a truly high-level language. yes, compilers and os's come around and make things a little more interesting than the naïve view of c in rare cases. but c++? everything is complex - there is not even a workable illusion of simplicity. to me this is unfortunate because c++ is still burdened by visible verbosity, complexities, land-mines, and limitations due to the fact that it is probably not quite high-level enough.
this is all very long winded. you and many other readers might think i'm wrong. the reason i'm responding is not to be argumentative, but because it's by no means a "settled" question and there are certainly also plenty of people who see it a very different way. which i think is fine.
theeandthy 263 days ago [-]
Agreed 100%. C is what it is and that’s a good thing.
However, if I were to request a feature to the core language it would be: NAMESPACES. This would clean up the code significantly without introducing confusing code paradigms.
hgs3 263 days ago [-]
Namespaces are nice, but to my knowledge require name mangling which isn't a thing in C. I'm curious what you mean by "clean up the code significantly" and "confusing code paradigms" because in C you typically prefix your functions to prevent name collisions which isn't confusing or too noisy in my subjective opinion.
pjmlp 262 days ago [-]
Name mangling is an implementation detail to fit into UNIX linker design space, not the same approach as other compiled languages with modules, with their own linker.
gpderetta 262 days ago [-]
Also name mangling (which in this case would simply be appending the namespace name to the identifier) would be trivially implementable in C.
In fact on some targets the assembler name of identifiers doesn't always match the C name already.
Although, as someone who almost always explicitly qualifies names, typing foo_bar is not very different from foo::bar; the only minor advantages are that you do not have to use foo:: inside the implementation of foo itself, and the ability to use aliases.
planede 262 days ago [-]
> which in this case would simply be appending the namespace name to the identifier
surely not. How do you differentiate these two functions?
    void fooN(void);
    namespace N { void foo(void); }
gpderetta 262 days ago [-]
[I meant to write prepend, but that doesn't change the argument]
You would mangle it as something like foo$N depending on the platform.
theeandthy 263 days ago [-]
Yeah you’re right. I guess folks who want C++ stuff should just use C++…
I guess I should have reworded. I don’t expect that feature in C, but if I were to reinvent C today I would keep it the same but add namespace and mangling.
Adding an explicit prefix to every function call is a lot of boilerplate when it's all added up.
riku_iki 262 days ago [-]
> a simple statement like `a = b++;` can mean multiple constructors being called, hidden allocations, unexpected exceptions, unclear object hierarchies, an overloaded `++`, an overloaded `=`, etc.
It just means that if you need that logic, in C you would write lots of verbose, less safe code.
tsegratis 263 days ago [-]
wishlist
1) labels as values in standard
2) control over memory position offsets, without linker script
other than that a few more compiler implementations offering things like checked array bounds, and a focus on correctness rather than accepting the occasional compiler bug
the rough edges like switch fallthrough are rough, but easy to work around. They don't need fixing (-pedantic fixes it already, etc)
maybe more control over assembly generation, such as exposing compilation at runtime; but that is into the wishful end of wishlists
pjmlp 262 days ago [-]
Only if you mean C as defined by K&R C, and its original use when porting UNIX.
throwawaymaths 262 days ago [-]
How about instead of RAII built into the compiler, you define destructor functions for each data type you care about, and have a sidecar or compiler plugin or, hell, even a linter check that those destructors have been called when a variable of that type goes out of scope?
If you miss a destructor call, without configuring the addon with "yes I really meant that", the addon halts the compilation at best, or returns nonzero for CI at worst.
paulddraper 262 days ago [-]
And then what if you combined the compiler+linter together.
throwawaymaths 262 days ago [-]
Sure, as long as you don't put it in the type system.
uecker 262 days ago [-]
This would also be my preferred solution.
potbelly83 263 days ago [-]
I'm confused: why are they trying to implement name mangling in C? Are they trying to use the C++ compiler to implement the RAII assembly code and then link that back into C? Wouldn't a smarter approach be to do a C version of what C++ does?
nicebyte 263 days ago [-]
they're not. they're saying "if we had constructors in c, we'd need a mechanism to allow multiple constructors for the same type". in c++ function overloading and mangling are used to get that, but it's far from the only way something like that could be achieved. imo that whole part could be removed, it's like a little distraction and doesn't really have anything to do with the core of their argument.
potbelly83 263 days ago [-]
thanks! appreciate the reply
jay-barronville 263 days ago [-]
I thought the author covered that pretty well. How would you make sure that function calls and object lifetimes are managed correctly/deterministically while also remaining compatible with existing C++ code and compilers without having to have `extern "C"` everywhere?
Edit: I just reread this comment and realized the beginning of it could come across as a bit condescending even though that wasn’t at all my intention. I’d edit it out, but I don’t like doing that, so my apologies if it did come across that way!
Xeamek 263 days ago [-]
Is RAII even wanted?
I mean, the name, 'Resource acquisition is initialization', talks about the initialization part.
But while not super versed in cpp, it looks like what everybody wants is actually the de-initialization part, which doesn't seem to be inherent to RAII, no?
It's a bit confusing to have a 'thing' mention one mechanism in its name, but actually being valuable by ensuring some other mechanism
susam 263 days ago [-]
> It's a bit confusing to have a 'thing' mention one mechanism in its name, but actually being valuable by ensuring some other mechanism.
Indeed! When I was first learning C++, I found the term "RAII" quite confusing too. However, after years of experience with this term, associating "RAII" with its intended meaning has become second nature.
Having said that, there is at least one way to make better sense of "RAII" and that is considering the fact that in RAII, holding a resource is a class invariant. The resource is acquired during construction (initialisation) and released during destruction (which happens automatically when the object of the class goes out of scope). Throughout the object's lifetime, from construction to destruction, maintaining possession of the acquired resource is an invariant condition.
Although it sounds simple in principle, this can get complicated pretty quickly, especially in the implementation of the copy assignment operator, where we may need to carefully delete an existing resource before copying the new resource received by the operator. Problems like this led to formulating more techniques for carefully managing the resources while satisfying the class invariant. One such technique is the copy-and-swap idiom.
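(A minimal sketch of that idiom, with an illustrative buffer class:)

    #include <algorithm>
    #include <cstddef>
    #include <utility>

    class buffer {
    public:
        explicit buffer(std::size_t n) : data_(new char[n]), size_(n) {}
        buffer(const buffer& o) : data_(new char[o.size_]), size_(o.size_) {
            std::copy(o.data_, o.data_ + size_, data_);
        }
        ~buffer() { delete[] data_; }

        friend void swap(buffer& a, buffer& b) noexcept {
            std::swap(a.data_, b.data_);
            std::swap(a.size_, b.size_);
        }

        // Copy-and-swap: the by-value parameter is already the copy; the
        // swap installs the new resource and lets the old one be released
        // by o's destructor, keeping the invariant even if copying threw.
        buffer& operator=(buffer o) noexcept {
            swap(*this, o);
            return *this;
        }

    private:
        char* data_;
        std::size_t size_;
    };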
None of this is meant to justify the somewhat arbitrary term though. In fact, there are at least two better alternative names for RAII: Scope-Based Resource Management (SBRM) and Constructor Acquires, Destructor Releases (CADR).
mjevans 262 days ago [-]
CADR is far clearer. Just like re-ordering it as Resource Initialization Is Acquisition.
legobmw99 263 days ago [-]
I feel like most RAII fans will openly admit that it’s the worst name in the world for such an idea. The idea is that any time you acquire a resource, you should initialize an object with said resource. But I guess “resource allocation should always lead to object initialization” is too long.
The reason to do this is precisely so that the resource can be cleaned up at destruction of the object. So even if you had an acronym like RASALTOI, it would still probably be misleading
NekkoDroid 263 days ago [-]
RAII is the worst name they could have selected, and C++ devs openly admit that. It's more SBRM: Scope-Based Resource Management.
rqtwteye 262 days ago [-]
I always find it interesting to see calls to malloc and free in C++ code. I don't recall using malloc/free in many years in C++. It's always new/delete.
ezoe 262 days ago [-]
I really don't understand why the author is so mad at these armchair professionals who think they know better.
You are proposing to change the C language. The risk is great that even the smallest change will break existing code. If you can't convince all of the stakeholders, it's better not to change it. Keep the status quo.
wizzwizz4 262 days ago [-]
You underestimate the sheer diversity of existing C compilers. A specification change is not a significant risk to existing code, compared to even clang v.s. gcc compatibility issues.
wrs 263 days ago [-]
>The utterly pervasive and constant feeling that a lot of people – way too many people – are really trying to invent these things from first principles and pretend like they were the first people to ever conceive of these ideas… it feels pretty miserable, all things considered. Going through life evaluating effectively no prior art in other languages, domains, C codebases as they exist today, just… anything.
Oh man, I hear ya. And in a lot more domains than computer language design. Is it inexperience? Impatience? The tendency for search results to be filled with low-quality and high-recency content? The prioritization of hot-take blog posts and Reddit comments over books?
astral303 262 days ago [-]
Sad that C is still being utilized with a serious face. If you can't be bothered to develop in C++ and only pay for what you use, RAII is like your last problem.
tester756 261 days ago [-]
>There is no wibbly-wobbly semantics like .NET IL finalizers
there is a dedicated mechanism to achieve RAII-likeness in .NET: the try-finally construct
neonsunset 261 days ago [-]
It's funny this is mentioned. Misconceptions about .NET are unfortunate but unsurprising.
There is no such thing as IL finalizers. There are object finalizers which are highly discouraged to be used on their own.
Their most frequent application is as a safety measure for objects implementing IDisposable, where not calling Dispose could lead to a memory leak or another form of resource starvation that must be prevented.
For example, a file handle is IDisposable, so it is naturally disposed through a using statement. But should a user make a mistake in a scenario where that handle has a non-trivial lifecycle, then once the object is no longer referenced, its finalizer will be called during one of the Gen2 GCs by a finalizer thread, preventing the file handle from leaking even though its freeing is now non-deterministic:
// Disposed upon exiting the scope
using var okay = File.OpenHandle("file1");
// IDEs will complain if you do this, but if you insist,
// the implementation will prevent you from shooting your
// foot off even if it'll hurt until next Gen2 GC
var leaked = File.OpenHandle("file2");
If you're reading these comments (and maybe contributing to them) about various arcane details of C++ and the differences with C, there's one thing you're not doing ...
... actually writing code that gets the job done ... in C++.
julian_t 262 days ago [-]
Is there a law, like Betteridge's Law, that says "the answer to any question that says 'why not just' is 'it isn't as simple as that'"?
wakawaka28 263 days ago [-]
Because you can just use actual C++ lol...
lionkor 262 days ago [-]
C has scopes. Add destructors. That's pretty much all you need to get most of the benefits of RAII.
You can add `defer` instead, but regardless, this has nothing to do with C++. You can implement safety features without having to copy arguably the worst language in the world, C++. I like C++, I wrote many larger projects in it, but it sucks to the very core. Just add RAII to C.
leduyquang753 262 days ago [-]
Did you even read the article? The part where it talks about problems with destructors is about halfway through.
masklinn 262 days ago [-]
TBF the essay is rather strangely structured: the first two-thirds of it, covering constructors and overloading, have only ancillary relevance to the actual problem; after all, Rust has neither and does RAII just fine (though it does have name mangling).
The author even acknowledges halfway through that it’s basically a strawman:
> It’s not a bad argument; after all, the entire above argument hinges on the idea of stealing from C++ entirely and copying their semantics bit-for-bit.
To me, only after that does it engage with the underlying concept in a way which is engaging and convincing. But you’ve had to trawl through 2500 words to get to that point.
lionkor 262 days ago [-]
They assume C++-like destructors. Other languages, like Zig, do a good job with syntax like `defer`
masklinn 262 days ago [-]
They don’t “assume” C++-like destructors; they’re the primary author of the N3199 “defer” proposal for C.
This is a response to people contacting / criticising them, asking for destructors instead of defer.
jcranmer 262 days ago [-]
The entire point of the blog post (written by the author of the C defer proposal) is to motivate why C should have defer. It is an attempt to summarize one of the most common criticisms of the proposal.
yason 262 days ago [-]
RAII is just automation and semantic sugar for something like this (or the equivalent set of goto labels that do the freeing at the end of the function):
    {
        void *buffer = malloc(SIZE_MAX);
        if (buffer) {
            if (!do_stuff(buffer)) {
                free(buffer);
                return;
            }
            more_stuff(buffer);
            free(buffer);
        }
        return;
    }
If you wanted something like that in C, it doesn't need to emulate C++-style RAII with classes and strongly typed constructors. It could look something like this, for example, where you just define pairs of allocator and free functions:
    allocdef void *autobuffer(malloc, free);
    ...
    {
        autobuffer buffer(SIZE_MAX);
        if (buffer) {
            if (!do_stuff(buffer)) {
                return; /* buffer freed automatically on scope exit */
            }
            more_stuff(buffer);
        }
        return;
    }
The implementation would effectively be a Lisp style macro expansion encoded in the C compiler (or preprocessor) that would just basically write out the equivalent of the first listing above.
Mesopropithecus 262 days ago [-]
In the second example, buffer is still a pointer? If so, when does free run, and who decides that? When buffer goes out of scope, could do_stuff store the pointer some place else?
I find this an interesting thought experiment, basically types that you'd opt in to RAII. Just have a feeling that you'll need to define some notion of ownership to make it work.
(Yes, in a large enough corp, export control is a source of a surprisingly large amount of extra work...)
Engineer (thinking): (No, you idiot, there isn't, because it's broken! I told you, all options have been tried, and this was the least painful way of doing it. Yes, it's not the ideal solution, but there's no other way, unless the upstream vendor decides to fix the issue on their end!)
Engineer: thanks, I'll look into it :)
We ended up supporting the brain-dead solution, but that team has now experienced 100% turn over since then.
In all seriousness, just accept their advice and see it for what it is: someone trying to help with a limited view of the scope. As long as they don't impose their view, I think your take is extremely bad.
It reads like you had experts giving you advice on how to improve things, and instead not only did you ignore their advice, but you went to the extent of mindlessly disparaging their help.
They were doing it just to boost their egos, and most of the teams in the company learned to ignore them. When the company ownership changed, the "A-Team" was the first on the chopping block, because the new owners correctly saw that their high status was simply due to the inertia of being the first devs at the company, and that they were not fulfilling any meaningful role in the present.
I’ve met dozens that don’t know their head from their ass. And always, always when you describe the problem constraints, they mumble and disappear.
(But at the time, I basically joined what was still essentially a startup just after they had been acquired by a larger company. I think the titles like 'architect' might have come from the larger company, but the competence came from them still being the same people as at the startup.)
We do exist, I promise. ;) But in my case at least, the Eye of Sauron can only keep so many things in sight at a time...
One variant that I think might work even better than RAII or defer in a lot of languages is having a thread local "context" which you attach all cleanup actions to. It even works in C, you just define cleanup as a list of
However, I'm still glad to see defer being considered for C. It's a lot better than using goto for cleanup.

Is it actually complicated? There's only the rule of 0 - either your class isn't managing resources directly & has none of the 5 default methods defined explicitly (destructor, copy constructor/assignment, move constructor/assignment), or it manages exactly 1 resource and defines all 5. Following that simple rule gives you exception safety & perfect RAII behavior. Of all the things in C++, it seemed like the most straightforward rule to follow mechanically.
BTW, the rule of 3 is from pre-C++11 - the addition of move construction/move assignment makes it the rule of 5, which basically says that if you define any of those default ones you must define all of them. But the rule of 0 is far stronger in that it gives you prescriptive mechanical rules to follow for resource management.
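A minimal sketch of that split, with illustrative names (not from any particular codebase): one small class owns exactly one resource and defines all five special members; everything else defines none of them and composes for free.

    #include <cstdio>
    #include <utility>

    // Rule of 5: the one class that owns a resource defines all five.
    class file_handle {
        std::FILE *f_ = nullptr;
    public:
        explicit file_handle(const char *path) : f_(std::fopen(path, "rb")) {}
        ~file_handle() { if (f_) std::fclose(f_); }
        file_handle(const file_handle &) = delete;            // or deep-copy
        file_handle &operator=(const file_handle &) = delete;
        file_handle(file_handle &&o) noexcept : f_(std::exchange(o.f_, nullptr)) {}
        file_handle &operator=(file_handle &&o) noexcept {
            if (this != &o) {
                if (f_) std::fclose(f_);
                f_ = std::exchange(o.f_, nullptr);
            }
            return *this;
        }
    };

    // Rule of 0: types that merely use resources declare none of the five;
    // the compiler-generated members destroy/move each field correctly.
    struct parser {
        file_handle input;
        int line = 0;
    };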
It’s much easier to do RAII correctly in Rust because of the ecosystem of the language + certain language features that make it more ergonomic (e.g. Borrow/AsRef/Deref) + some ownership guarantees around moves unless you make the type trivially copyable which won’t be the case when you own a resource.
It is. There is no point in arguing otherwise.
To understand the problem, you need to understand why it is also a solution to much bigger problems.
C++ started as C with classes, and by design aimed at being perfectly compatible with C. But you want to improve developer experience, and bring to the table major architectural traits such as RAII. This in turn meant you add support for custom constructors, and customize how your instances are copied and destroyed. But you also want to be able to have everything just work out of the box without forcing developers to write boilerplate code. So you come up with the concept of special member functions which are automatically added by the compiler if they are trivial. However, forcing that upon every single situation can cause problems, so you have to come up with a strategy that suits all use cases and prevents serious bugs.
Consequently, you add a bunch of rules which boil down to this: if the class/struct is trivial, then compilers simply add trivial definitions of all special member functions so that you don't have to; but once you define any of those special member functions yourself, the compiler steps back and lets you do all the work.
Then C++ introduced move semantics. This refreshes the same problem as before. You need to retain compatibility with C, and you need to avoid boilerplate code, and on top of that you need to support all cases that originated the need for C++'s special member functions. But now you need to support move constructors and move assignment operators. Again, it's fine if the compiler adds those automatically if it's a trivial class/struct, but if the class has custom constructors and destructors then surely you also need to handle moves in a special way, so the compiler steps back and lets you do all the work. On top of that, you add the fact that if you need custom code to copy your objects around, surely you need custom code to move them too, and thus the compiler steps back to let you do all the work.
On top of this, there are also some specific combinations of custom constructors/destructors/copy constructors/copy assignment operators which let the compiler define move constructors/move assignment operators.
It all makes absolute sense if you are mindful of the design requirements. But if you are just starting to onboard onto C++ and barely know what a copy constructor is, all these aspects are arcane and sadistic. If you declare nothing, then your class instances are copied and moved automatically, but once you add a constructor, everything suddenly blows up and your code doesn't even compile anymore. You spot a bug where an instance of a child class isn't being destroyed properly, and once you add a virtual destructor, you suddenly have an unrelated function call throwing compiler errors. You add a snazzy copy constructor that's very performant, and your performance tests suddenly blow up because of the cost of suddenly having to copy all instances instead of the compiler simply moving them. How do you sort out this nonsense?
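A tiny sketch of one of those surprises (illustrative names):

    #include <string>
    #include <vector>

    struct widget {
        std::string name;
        // Uncommenting this user-declared destructor suppresses the
        // implicit move constructor/assignment: the compiler silently
        // falls back to copying `name` everywhere it used to move it.
        // ~widget() {}
    };

    std::vector<widget> grow(std::vector<widget> v) {
        v.push_back(widget{"hello"});  // reallocation: moves vs. copies
        return v;
    }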
The rule of 5 is a nice rule of thumb to allow developers to have a simple mental model over what they need to do to avoid a long list of issues, but you still have no control over what you're doing. Things work, but work by sheer coincidence.
There is a neater design in rust with its own tradeoffs: destructors are the only special function, move is always possible and has a fixed approach, copying is instead .clone(), assignment is always just a move, and constructors are just a convention with static methods, optionally with a Default trait. But that does constrain you: especially move being fixed to a specific definition means there's a lot you can't model well (self-referential structures), and that's a core part of why rust can have a neater model. And it still has the distinction you are complaining about with Copy, where 'trivial' structures can be copied implicitly but lose that as soon as they contain anything with a destructor or non-trivial .clone().
And in C++ it's pretty easy to avoid this mess in most cases: I rarely ever fully define all 5. If I have a custom constructor and destructor I just delete the other cases and use a wrapper class which handles those semantics for me.
I'm sorry, that is not true at all.
Nothing forces you to add implementations, at least not for all cases. That's only a simplistic rule of thumb that helps developers not well versed in the rules of special member functions (i.e., most) get stuff to work by coincidence. You only need to add, say, a custom move constructor when you need it and when the C++ rules state the compiler should not generate one for you. There's even a popular table from an ACCU 2014 presentation stating exactly in which conditions you need to fill in your custom definition.
https://i.sstatic.net/b2VBV.png
You are also wrong when you assert this has nothing to do with C++'s heritage. It's the root cause of each and every single little detail. Special member functions were added with traits and tradeoffs for compatibility and ease of use, and with move semantics the committee had to revisit everything over again but with an additional layer of requirements. The rules involving default move constructors and move assignment operators are famously nuanced and even arbitrary. There is no way around it.
> There is a neater design in rust (...)
What Rust does and does not do is irrelevant. Rust was a greenfield project that had no requirement to respect any sort of backward compatibility and stability. If there is any remotely relevant comparison, it would be Objective-C, which also took a minimalist approach based on custom factory methods and initializers that rely on conventions, and it is a big boilerplate mess.
Well, I don’t know how to respond to this. I clarified what the rules actually are (< 1 paragraph) and following them blindly leads to correct results. You’ve brought in a whole bunch of nonsense about why C++ has become complex as a language - it’s not wrong but I’m failing to connect the dots as to how the rule of 0 itself is hard to follow or complex. I’m kind of taking as a given that whoever is writing the code is mildly familiar enough with C++ to understand RAII & is trying to apply it correctly.
> The rule of 5 is a nice rule of thumb to allow developers to have a simple mental model over what they need to do to avoid a long list of issues, but you still have no control over what you’re doing. Things work, but work by sheer coincidence.
First, as I’ve said multiple times, it’s the rule of 0. That’s the rule to follow to get correct composition of resource ownership & it’s super simple. As for not having control, I really fail to see how that is - C++ famously gives you too much control and that’s the problem. As for things working by sheer coincidence, that’s like your opinion. To me “coincidence” wouldn’t explain how many lines of C++ code are running in production.
Look, I think C++ has a lot of warts which is why I prefer Rust these days. But the rule of 0 is not where I’d say C++’s complexity lies - if you think that is the case, I’d recommend you use another language because if you can’t grok the rule of 0, the other footguns that lie in wait will blow you away to smithereens.
So it's not nonsense?
I think GP clearly laid out the base principles that lead to emergent complexity. GP calls this "coincidence" to convey the feeling of lots of complexity just narrowly avoiding catastrophe, in a process that is hard to grok for someone getting into C++. GP also gave some scenarios in which the rule of 0 no longer applies and you now simply have to follow some other rule. "Just follow the rule" is not very intuitive advice. The rule may be simple to follow, but the foundations on which it rests are pretty complicated, which makes the entire rule complicated in my worldview and also in that of GP. In your view, the rule is easy to follow, therefore simple. Let's agree to disagree on that. Again, being told "you need to just follow this arbitrary rule to fix all these sudden compiler errors" doesn't inspire confidence in one's code, hence (I think) the usage of "coincidence". If I were using such a language, I'd certainly feel a bit nervous and unsure.
I think that's what they said themselves:
>> It all makes absolutely sense if you are mindful of the design requirements. But if you just start to onboard onto C++ and barely know what a copy constructors is, all these aspects are arcane and sadistic
IMO not knowing why something works (in any language) is an unpleasant feeling. Then if you have the chance you can look under the hood, read things - it's exactly why I'm reading this thread - and little by little get a better understanding. That's called gaining experience.
> Again, being told "you need to just follow this arbitrary rule to fix all these sudden compiler errors" doesn't inspire confidence in ones code, hence (I think) the usage of "coincidence"
That's exactly what other languages like Haskell or Rust are praised for. Why does C++ receive a different treatment when it tries to do the same thing instead of crashing on you at runtime, for once?
Making a trivial change and suddenly having entire new classes of bugs all over your code is not an aspect that receives any praise. People using those two languages work hard on avoiding that situation, and it clearly feels like a failure when it happens.
The part about pointing problems at compile time so the developer will know it sooner is great. And I imagine is the part you are talking about. But the GP was talking about the other part of the issue.
I wouldn't be so dramatic. A house of cards doesn't stay put by coincidence!
But this only allows you at compile time to provide your own stateless global allocator. This is very different in Zig, which has a very strong culture of "if something needs to allocate memory, you pass it a stateful, dynamically dispatched allocator as an argument". You COULD do that in C, but virtually nobody does.
I've written reasonable amounts of both, and it's just different. For instance, in Zig, you can create a HashMap using a FixedBufferAllocator, which is a region of memory (which can be stack allocated) dressed up as an allocator. You can also pass it an arena and free all at once, or any other allocator in the standard library, or implemented by you, or anyone else. Show me a C library with a HashMap which can do all three of these things. Everything which allocates takes an allocator, third-party libraries respect this convention or will quickly get an issue or PR either requesting or implementing this convention.
Ultimate solution? No, but also, sort of. The ability to idiomatically build a fine-grained memory policy is a large portion of what makes Zig so pleasant to use.
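For comparison, C++17's std::pmr machinery is the closest standard analogue to that Zig convention: containers carry a stateful, dynamically dispatched allocator, so the same map type can run off a stack buffer, an arena, or any custom memory_resource. A rough sketch:

    #include <memory_resource>
    #include <unordered_map>

    int main() {
        // A stack buffer dressed up as an allocator, much like Zig's
        // FixedBufferAllocator; allocation fails rather than falling back.
        char stack_buf[4096];
        std::pmr::monotonic_buffer_resource fixed(
            stack_buf, sizeof(stack_buf), std::pmr::null_memory_resource());
        std::pmr::unordered_map<int, int> on_stack(&fixed);
        on_stack[1] = 2;

        // An arena: everything it handed out is released at once when
        // the resource is destroyed.
        std::pmr::monotonic_buffer_resource arena;
        std::pmr::unordered_map<int, int> in_arena(&arena);
        in_arena[3] = 4;
    }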
I've started to use simple memory arenas in C and it just feels so damn _nice_.
There's basically a shared lifetime for most of my transient allocations, which are nicely bounded in time by a "frame" of execution. Malloc/Free felt like a crazy amount of work, whereas an arena_reset(&ctx) just moves a pointer back to the first entry.
Another person pointed out that arenas are not destructors, and this is a great point to make. If you're dealing with external resources, moving an arena index back to the beginning does not help - at all.
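The core machinery is tiny, which is part of the appeal; a sketch of a bump arena (illustrative; real ones grow and chain blocks):

    #include <cstddef>

    struct arena {
        unsigned char *base;   /* backing block, owned elsewhere */
        std::size_t cap;
        std::size_t used;
    };

    void *arena_alloc(arena *a, std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        std::size_t off = (a->used + (align - 1)) & ~(align - 1);
        if (off > a->cap || n > a->cap - off)
            return nullptr;    /* out of space */
        a->used = off + n;
        return a->base + off;
    }

    /* "Freeing" every transient allocation is a single store. */
    void arena_reset(arena *a) { a->used = 0; }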
Like the autorelease pool found in Objective-C of yore? I always liked that solution and sometimes implemented in plain C too.
It is not very complicated at all; just a discipline to follow (or not if you know what you are doing) once learnt - https://en.cppreference.com/w/cpp/language/rule_of_three
Incidentally, I use it as 4/6/0 by including the default ctor in the set.
I'm afraid you're complaining about entirely unrelated things.
It's one thing to claim that C++ structs have this or that trait. It's a entirely different thing to try to pin bugs and developer mistakes on how a language is designed.
I think this glances over what structs actually are in C++, and unwittingly portrays them as something different.
Structs in C++ are exactly like structs in C. Or they can be, if that's what you're aiming for. If you include a C header file that defines a struct in a C++ program, you build it, and you use instances of that struct to pass to C code, everything just works.
The detail you need to be mindful of is that C structs support a subset of all the features supported by C++ classes, and once you start to use those features C++ also allows implementations to forego some constraints.
If you expect to use a struct in C++ but still define it in a way that you make it include features that are not supported in C then you can't pin that on the language.
https://learn.microsoft.com/en-us/cpp/cpp/trivial-standard-l...
Using C-like structs is a very common use case, to the point that the standard explicitly defines the concept of standard layout and builds upon it to specify standard-layout types. A struct/class that is trivial and has standard layout (what used to be called a POD type) corresponds exactly with a C struct. These are explicitly defined in terms of retaining interoperability with other languages.
But yes, if you make extra sure (under threat of footgun) that your struct only has simple types in it and doesn't use virtual or define any ctors/dtors or use protected/private or use inheritance and all of its members follow those rules etc etc, maybe you can treat it like a C struct. But the C++ Standard is telling a different story.
Keep in mind, I'm not blaming you for ignoring all these complications if at the end of the day the compiler seems to give you the behavior you expect. But the fun of C++ is that it's kind of two programming languages in one: the language the Standard defines, and the language the typical programmer thinks it is.
[0] There was std::is_pod, but it was deprecated because it doesn't reflect how the Standard actually defines things. A bit of a cruel joke, dangling that in front of us and then yanking it away.
References:
1) Trivial, standard-layout, POD, and literal types - https://learn.microsoft.com/en-us/cpp/cpp/trivial-standard-l...
2) No more plain old data - https://mariusbancila.ro/blog/2020/08/10/no-more-plain-old-d...
Keep in mind, my original comment was pretty much just drawing a line through TFA, which also argues that you can't cleanly map C++ object concepts onto C structs. C++ has some backwards compatibility with C obviously but nowadays it's a totally separate language with an independent standards body (for better or worse). Specifying "do what C does" might have flown in 1998 but that changed a long time ago.
I am fully with Stroustrup in arguing that C++ should strive for as much compatibility with C as possible, in the spirit of the original (see ref. at https://en.wikipedia.org/wiki/Compatibility_of_C_and_C%2B%2B...). But sadly the rest of the standards committee don't seem to want this, which I believe is a huge mistake. On the other side, the C standards committee should be very careful what inspiration they take from C++ in the evolution of the language, since it was designed as a "minimal" language, which was one of the main factors in its success. Whether people call it "primitive", "well behind other languages", etc. does not matter. You definitely don't want C turning into C++-lite. Hence IMO the conclusions stated in the last few paragraphs of the submitted article are quite right.
In a way, the whole C++ endeavor was doomed from the start. C was old and pragmatic and vague, a "portable assembly", and it was a shaky foundation to build C++ on top of. When the Standard tried to tighten things up, it just got more lopsided, full of hacks to fix hacks. But the alternate universe where C++ had a more pragmatic, laissez-faire design going forward probably isn't any better; maybe the "standard" would have become "do whatever GCC does"--or in the Darkest Timeline, "do whatever MSVC does".
I disagree that C++ "respecting its C roots" is viable. The C++11 and later Standards were trying to make the best of a bad situation, and that required leaving C behind because the C way of doing things doesn't fit with a higher-level language like contemporary C++. Especially when the language has multiple implementations that need to compile the same code the same way. The "C with classes" days are long over for most of us who have to use libraries expecting std::vector, smart pointers, and exception handling. We live in mortal fear of compiler writers smiting us for innocent things like punning through a union.
> You definitely don't want C turning into C++-lite
I agree. Trying to quickly hack classes or templates or whatever back on top of C would just start the whole C++ nightmare over again.
Hey! Them's fighting words! :-) "C++ as a better C" (which is what it started as) was/is/always will be needed and necessary. It gave you the best of both low-level and high-level worlds, with full control and just enough complexity. Instead of implementing structs full of function pointers to design dynamic-dispatch object models, you just had the compiler do that for you while still retaining full control over other aspects. I still have some manuals that came with SCO Unix, one of which was on the then-newfangled C++ language. It had one chapter by Stroustrup himself (his original paper, probably) on the C++ object model, showing how vptrs/vtables are implemented; I thought it neat that the compiler did it for you. Also, templates were just glorified macros then, with none of the shenanigans that you see today. Hence moving from C to C++ was easy, and its usage and popularity exploded. But with the infusion of lots of people into C++ land, people who were not aware of the original vision/design/compatibility goal of the language started asking for the inclusion of more and more OO and modern language features. The result? The standards committee reinventing the language from C++11 onwards (and changing every freaking 3 years) and alienating the old C++ folks who made it popular in the first place. No doubt there are some benefits, like increased design space and modern programming techniques, but I am not sure whether the increased complexity makes it all worth it. For me, the sweet spot is still C++98 with the addition of the STL and some simple generic programming techniques.
C++20 introduced `std::bit_cast`, so I appreciate alias analysis getting all the help it can.
Not true. Using C structs themselves in C++ is very common - when you include the C header file, the relevant declarations are wrapped in extern "C" {}, which gives them C linkage. You can do this because C++ is backwards compatible with C.
Most of the time when you use a struct in C++ you're just ignoring most of the capabilities of objects (which is fine!). If you declare a struct in C++, you're getting an object. The only difference between the struct and class keywords in C++ is the default privacy of the members.
What I think you're trying to say is "a POD structure with no custom behavior is essentially identical in C and C++". That is mostly true, though if the struct contains a union, C++ has stricter UB rules (there might be other differences as well, but that's the one I can think of at the moment).
Still, there's always extern "C".
Using structs as if they were C structs vs. using classes as objects is 100% cultural, not a part of the language.
I think this take is completely wrong. There is nothing cultural about it. C++ was created as a strict superset of C, and thus from the inception it supported all features made available in C. This design goal remains true up to this day, and only started to diverge relatively recently when C was updated to include features that were not supported (yet) by C++.
When someone declares a plain old struct in C++, they are declaring a struct that is perfectly compatible and interoperable with C. This is by design. From the inception.
This is not really the case. See https://en.wikipedia.org/wiki/Compatibility_of_C_and_C%2B%2B for a non-exhaustive list.
It is true that both sides agree that compatibility is an important goal, but it's only a goal, not something that's 100% the case.
Just asking, for a friend.
Just a friendly reminder that two leading underscores won't protect your member functions in C++. Even if people insist that those are totally not supposed to be private in Python.
Whenever I say "I'm no longer attached to all that private stuff", people always reply, "wait until you work on a large code base". I work on a million line+ code base. Whatever.
This argument aside, I'm not a total philistine. RAII is awesome, but C++ is full to the brim with crusty stuff kept for compatibility. I always feel there is a better language trying to get out.
These days I'm for minimalism, most of my structs are aggregates of public members, but sometimes you really want to make sure to maintain the invariant that your array pointer and your size field are in sync.
Of course neither double nor single underscore will stop anyone who wants to touch your privates badly enough. Which is big part of the python philosophy: You're not stopped from doing inadvisable things. Instead there's a strong culture around writing "pythonic" code, which largely avoids these pitfalls.
In Python, if any of this gives you any trouble, you can just replace the stuff in the class dict with your own functions. You don't even need to cast.
C++ is somewhat unique in that it started out as a few extra features on top of C before gradually splitting off and mutating into a totally separate programming language.
But Cfront was released circa 1983, and you basically just wrote C, but it added a bit of new syntax that generated extra C behind the scenes. Object-oriented programming was still fetal in 1983! It didn't get really hyped until the mid-90's. So C++ kind of mutated for decades as this gross appendage on C until it became this whole separate blob that ate half of programming. It was 15 years later when the C++98 "standard" started trying to rein in Dr. Stroustrup's monster.
Then in 2005 we threw away all our textbooks that were like "Look! `Apple` derives from `Fruit`! `Car` derives from `Engine`! This is going to change the world!" because adding object-orientedness to everything became uncool when our bosses became fans of Java. But by this point the C++ blob had taken on a life of its own...
So yeah. Very few programming languages have a story as long and insane as C++.
Objective-C++ likewise on top of CFront.
Until, like with CFront, they became self-hosted compilers.
Groovy code is Java code, regardless of targeting the JVM, the same syntax is supported and extended with dynamic capabilities.
Object Pascal was created for Lisa project, exactly in 1983.
Tom Love and Brad Cox created Objective-C in 1984.
Again, "traditionally", one could (ab)use C++ as "C with extras". And it wasn't uncommon, especially in resource constraint usecases, to do just that. C++ without STL or templates, or even C++ without new/delete.
This "is not C++", agree. Would a subset be enough for "using it like C-with-RAII" ?
Given the details and pitfalls the original author lists, I suspect not. It's not just C programmers who "do strange things" and make odd choices. The language itself though "lends itself to that". I've (had to) write code that sometimes-alloca'ed sometimes-malloc'ed the same thing and then "tagged" it to indicate whether it needed free() or "just" the implied drop. Another rather common antipattern is "generic embedded payloads" - the struct definition ending "char data[1]" just to be padded out by whatever creates it to whatever size (nevermind type) of that data.
Can you write _new_ C code that "does RAII"? Probably. Just rewrite it in Rust, or Zig :-) Can you somehow transmogrify the language, compiler, and standard lib so that you can recompile existing C code, if not to "just get RAII" then at least to get meaningful compiler errors/warnings that tell you how to change it? I won't put money on that.
https://www.youtube.com/watch?v=rX0ItVEVjHc
A classic which touches on such stuff.
You can do "manual" goto-based RAII in C, and it has been done for decades. The end of your function needs to have a cascading layer of labels, undoing what has been done before:
It just takes more discipline and is more error-prone maintenance-wise.

Like C, with its many hidden behaviors?
I would argue that if it needs to be spelled out in a separate document from the code you're reading, then it's hidden.
It’s not clear if you’re talking about defer or RAII
But that's already what linters/static analyzers are doing? But then, why not integrate those tools directly in a C++ compiler instead?
With cpp2/cppfront, Herb Sutter is already building some sort of a "sane" subset of the C++ language, maybe because you cannot achieve good practices without having a new syntax.
C++ seems to have the same problem of javascript: it has annoying "don't-do-that" use cases, although it seems insanely more complicated to teach good C++ practices.
Of course, this requires buying into a set of tooling and learning a lot of specific idioms. I can't say I've used it, but from reading the docs it seems sound enough.
The issue is developers who think those tools are useless.
This sounds like a great idea to me! Rust disables implicit copying for structs with destructors, and together with move-by-default, it works really well. Unlike PoD structs, you don't need to heap allocate them to ensure their uniqueness. Unlike copy constructors, you don't need to worry about implicit copies. Unlike C++ move, there's no moved-from junk value left behind.
"Disabling" is maybe not the right way to think about it. Rust only has "implicit copying" for Copy types, so you have to at the very least #[derive(Copy,Clone)] to get this, it's true that you can't (and therefore neither can a derive macro) impl Copy on types which implement Drop and that's on purpose but you're making a concrete decision here - the answer Rust knows is never correct is something you'd have to ask for specifically, so when you ask it can say "No" and explain why.
Lots of similar behaviour in C++ is silent. Why isn't my Doodad behaving the way I expected? I didn't need to ask for it to have the behaviour I expected but the compiler concludes it can't have that behaviour, so, it doesn't, and there's nowhere for a diagnostic which says "Um, no a Doodad doesn't work like that, and here's why!"
Diagnostics are hard and C++ under-values the importance of good diagnostics. Rust recently landed work so libraries can provide improved diagnostics when you try to call them with inappropriate parameters. For example now if you try to collect() an iterator into a slice, the compiler notices that slice doesn't implement FromIterator and it asks FromIterator to explain why this can't work, whereupon FromIterator notices you were trying to use a slice and emits a diagnostic for this particular situation - if you'd tried to collect into an array it explains how you'd actually do that, since it's tricky - the slice is impossible since it's not an owning type, you need to collect into a container.
But you could gain reusability of headers so they can also be used in C++, avoid reinventing the wheel with new issues (e.g. variable lifetime), and benefit from a whole lot of existing experience with RAII.
The initial example in the article is anti-idiomatic, because it imbues the larger class with a RAIIness which can be limited to just one element of it:
It's only the c member that really requires any special attention, in this particular case. So there should be something like a `class void_buffer`, which is a RAII class (and let's just not sully the set of constructors while we're at it); and now, instead of a complicated bespoke class, we have the simplest of structs, and the only complexity is in void_buffer (see the sketch below).

1. It would be no more difficult to write and use than the larger class. After all, you can use the larger class as a void_buffer with some dummy extra fields.
2. You can put the class in a detail_ sub-namespace, or make it an inner class of ObjectType, and then people will avoid using it in other, general contexts.
* Less code overall
* More reuse of classes as versatile/simple components, as opposed to a zoo of bespoke classes
* Classes which are simpler to understand and with more predictable behavior
This is true in the example above: With the corrected code, it's enough that I tell you "ObjectType is a simple struct; and one of its members is a buffer of untyped data". I don't have to show you the class definition; you know enough to understand what's going on. And you can use your void_buffer elsewhere.
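A sketch of what that factoring might look like (illustrative, move-only for simplicity; not the commenter's exact code):

    #include <cstddef>
    #include <cstdlib>
    #include <utility>

    // The single RAII piece: owns a malloc'd buffer and nothing else.
    class void_buffer {
        void *p_ = nullptr;
    public:
        void_buffer() = default;
        explicit void_buffer(std::size_t n) : p_(std::malloc(n)) {}
        ~void_buffer() { std::free(p_); }
        void_buffer(void_buffer &&o) noexcept : p_(std::exchange(o.p_, nullptr)) {}
        void_buffer &operator=(void_buffer &&o) noexcept {
            std::swap(p_, o.p_);   // old buffer freed when o is destroyed
            return *this;
        }
        void *get() const { return p_; }
    };

    // The enclosing type goes back to being the simplest of structs.
    struct ObjectType {
        int a;
        float b;
        void_buffer c;   // the only member needing special attention
    };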
It abstracts the void_buffer into its own type with proper, correct functions for creating, (maybe) copying, moving, and destroying the buffer. With that you get a simple type that you can use elsewhere without needing to remember that you need to free() the buffer manually before the end of the scope, or needing to remember how to correctly copy or move the buffer elsewhere.
There are 2 ways to get C++-style RAII into C. The first way is to wholesale import the C++ object system into C (which means name mangling, all the different flavors of constructors, destructors, etc). Conceptually this would work, but it's never going to happen, because implementing that would be literally more work than an entire conforming C99 compiler.
The second way is to just use some special function attributes to signify that a function runs when an object is created on the stack / popped off the stack. This won't work either because the C++ object system also solves lots of other problems that this simpler system just ignores (such as, what happens when you copy an object that has a constructor function).
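GCC and Clang's `__attribute__((cleanup))` is essentially an existing prototype of that second way, and it illustrates both the appeal and the limits: the callback runs at scope exit, but copies and ownership transfers are simply not modeled. A sketch:

    #include <stdio.h>
    #include <stdlib.h>

    static void free_ptr(void *p) {
        free(*(void **)p);   /* called with a pointer to the variable */
    }

    void demo(void) {
        /* freed automatically on every path out of this scope */
        __attribute__((cleanup(free_ptr))) char *buf = (char *)malloc(64);
        if (!buf)
            return;
        snprintf(buf, 64, "hello");
        puts(buf);
    }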
amateur C++ coder
When I started reading it, the first thing that came to my mind was the issue with copying the structs. The article started looking at the issue, but didn't really follow further with the changes needed to make it work, which is that you start needing to introduce tracking which instance is responsible for the resources and providing a way to transfer that responsibility (a.k.a. ownership and move semantics).
Another way to think about it: even if you had defined constructors and destructors for a struct, you have not solved when to call them. C++'s answer to that question is its sophisticated object model. C does not have one, and in order to answer that question, it must. It's worth noting that RAII was not a feature that was intentionally created in C++. Rather, astute early C++ developers realized it was a useful idiom made possible by C++'s object model.
I think the actual question should be "can C get automatic memory management like in C++ without having the equivalent of C++'s type system"?
Though I can't put my finger on it, my intuition says it can, if the interested people are willing to look deep enough.
This doesn’t make sense: you don’t need runtime introspection to do this?
Introspection (reflection) would go even further and provide at runtime all the information that you have at compile time about an object. But that's not required for assignment and destruction operations to work.
C doesn't have any of that, so a struct copy is just a shallow copy, a bit by bit copy of the entire struct contents. Which works pretty well, except for pointers/references.
So, no, runtime introspection is not needed, but runtime dispatch may be needed.
I haven’t seen this distinction laid out so clearly before:
> Every other language worth being so much as spit on either employs deep garbage collection (Go, D, Java, Lua, C#, etc.) or automatic reference counting (Objective-C, Objective-C++, Swift, etc.), uses RAII (Rust with Drop, C++, etc.), or does absolutely nothing while saying to Go Fuck Yourself™ and kicking the developer in the shins for good measure (C, etc.).
GC, ARC, RAII or GTFO, those are the options. That’s right!
I always come away from these discussions with more respect for Objective-C -- such a powerful yet simple language. I suppose Swift is the successor but it feels very different.
Although, Obj-C only really came into its own once it finally gained automatic reference counting, after briefly flirting with GC. At that point it was already being displaced by younger and more fashionable languages.
C11 provided a few worthwhile improvements (i.e., a proper memory model, alignment specification, standardized anonymous structures/unions), but so many of the other additions, suggestions, and proposals I’ve seen will just ruin the minimal nature of C. In C++, a simple statement like `a = b++;` can mean multiple constructors being called, hidden allocations, unexpected exceptions, unclear object hierarchies, an overloaded `++`, an overloaded `=`, etc. Every time I wish I had some C++ feature in C, I just think about the cognitive overhead it’d bring with it, slap myself a couple times, and go back to loving simple ole C.
Please don’t ruin C.
Exactly this. C++ folks should not approach C like a "C++ lite". I appreciate the author's candid take on the subject.
As for defer, there is some existing precedent like GCC and Clang's __attribute__((cleanup)), but - at least for me - a simple "goto cleanup;" is usually sufficient. If I understand N3199 [1] correctly, which is the author's proposal for introducing defer in C, then "defer" would be entirely a compile-time construct: essentially just a code transformation to inject the necessary cleanup at the right spots (sketched below). If you're going to introduce defer to C, then that does seem like the "best" approach IMO.
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3199.htm
And yes, I also agree that C++ has WTF insanity, like 17 or so initialisation quirks, exceptions in general (primarily to address failures in constructors, surely there must be a better way; also OOM / bad_alloc is a relic from the past), and unspecified sizes for default built-in types (that's C heritage).
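A sketch of that injection, with a hypothetical rendering of the syntax and a stand-in work() function (see N3199 for the actual grammar):

    /* What you would write: */
    int f(void) {
        char *buf = (char *)malloc(64);
        if (!buf) return -1;
        defer { free(buf); }            /* registered for scope exit */
        if (work(buf) < 0) return -1;   /* covered */
        return 0;                       /* covered */
    }

    /* Roughly what the compiler injects: */
    int f_expanded(void) {
        char *buf = (char *)malloc(64);
        if (!buf) return -1;
        if (work(buf) < 0) { free(buf); return -1; }
        free(buf);
        return 0;
    }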
Unless you're writing inline assembly or intrinsics or something like that, the semantics of your target architecture are quite irrelevant. If you're reasoning about the target architecture semantics that's a pretty good indication that what you're writing is undefined behavior. Reasoning about performance characteristics of your target architecture is definitely ok though.
No, this is wrong. It's a common misconception though. You would only want that in a hypothetical world where all computers are exactly the same.
Undefined and implementation defined behavior is what allows us to have performance at all. Here are some simple examples.
Suppose we want to make division by zero and null pointer dereference defined. Now every time you write a/b or *x, the compiler will be forced to emit an extra branching check before this operation.
Something much more common: addition. What about signed overflow? Do you want the compiler to emit an overflow check in advance? Similar reasoning applies to shift instructions.
UB in the language specification allows compilers to optimize based on the assumption that the programs you write won't have undefined behavior. If compilers are not able to do this, it becomes impossible to implement most optimizations we rely on. It's a very core feature of modern language specifications, not an oversight you can fix by thinking about it for 10 minutes.
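A small, concrete instance of that license (a sketch):

    // Signed overflow is UB, so the compiler may assume x + 1 never
    // wraps and fold this whole function to `return true;`.
    bool always_true(int x) { return x + 1 > x; }

    // With defined wrapping the compare has to survive: when x is the
    // maximum int, the wrapped x + 1 is negative and the result is false.
    bool sometimes_false(int x) {
        unsigned u = (unsigned)x + 1u;  // unsigned wraparound is defined
        return (int)u > x;              // modular in C++20; impl-defined before
    }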
This is wrong, because you would define them to have the behavior that the architecture in question does, so no changes would be needed. For integer division this would mean entering an implementation-defined exceptional state that does not by default continue execution (on Linux, SIGFPE with the optional ability to handle that signal). For dereferencing a pointer, it should have the same semantics as a load/store to any other address--if something is there it works normally, if the memory is unmapped e.g. for typical Linux x86 programs you get SIGSEGV (just as you would for accessing any other unmapped address).
Suppose now, there are two architectures with slightly differing behavior.
Can the compiler still optimize signed x + 1 > x to true?
A good example is WebAssembly*—address 0x00000000 is a perfectly fine and well-defined address in linear memory. In practice though, most code you’ll come across targeting WebAssembly treats it as if dereferencing it is undefined behavior.
* Of course WebAssembly is a compiler target rather than a language, but it serves as a good example of the point you’re making.
Given that has proven to be a completely false assumption, I don't think there's a justification for compilers continuing to make it. Whatever performance gains they are making are simply not worth the unreliability they are courting.
This part is correct. The problem is in how to deal with this. If you want the compiler to correctly deal with code having undefined behavior, often the only possibility is to assume that all code has undefined behavior. That means, almost every operation gets a runtime branch. That is completely incompatible with how modern hardware works.
The rest is wrong, but again, this is a common misconception. Language designers and compiler writers are not idiots, contrary to popular belief. UB as a concept exists for a reason. It's not for marginal performance boosts, it is to enable any compiler based transformation, and a notion of portability.
Some of these checks could be removed by languages with better compilers and likely more restrictions. That is the better approach. As a user, I don't want to run code that is potentially unsafe and/or insecure.
I know that languages like Java have a NullPointerException which they can throw and handle for situations like this, but they're also built on a highly specified virtual machine architecture that is consistent across hardware platforms. This also does not guarantee that your program is safe from crashing when this exception gets thrown, as you have to handle it somewhere. For something as general as this it will probably be in the Main function, so you might as well let it go unhandled as there's not that much you can do at that point.
For a language like C++ it is simpler, easier, and I would argue more correct, to just let the hardware handle the situation, which in this case would trigger a memory error of trying to access invalid memory. As the real issue is probably somewhere else in the code which isn't being handled correctly and the bad data is flowing through to the place where it accesses the null pointer and the program crashes.
To add to that in a lot of cases the program isn't crashing while trying to access address 0, it's crashing trying to access address 200, or 1000, or something like that, and putting in simplistic checks isn't going to catch those. You could argue that the check should guard against accessing the lowest 1k of memory, but then when do you stop, at 64k? Then you have an issue with programs that must fit within 1k of memory.
Leaving it unspecified is the better choice.
You would not want to force these checks by default; nobody wants that. And you cannot statically determine them to be unnecessary for the vast majority of code, even for something as simple as `print(read(a) + read(b))`.
For example: Rust will silently wrap signed integers in release mode even when it’s considered a bug and crashes in debug mode.
Every other example you mention is done by Rust in release mode, and the performance impact is minimal, so it's a good counterexample to your claim that defining these things would hamstring performance. (Signed integer overflow especially is an obvious no-brainer for defining. Note that doesn't necessarily mean overflow checks! Even just defining the result precisely would remove a lot of footguns.)
You have missed my point.
If a pointer can be null, it must be an optional pointer, and you must in fact check before you dereference it. This is what you want. Is it ok to write a program which segfaults at random because you didn't check for a pointer which can be null? Of course not. If you don't null-check the return value of e.g. malloc, your program is invalid.
But the benefit is in the other direction. Careful C checks for null before using a pointer, and keeping track of whether null has been checked is a manual process. This results in redundant null checks if you can't statically prove (by staring at the code and thinking very hard) that it isn't null. So in practice you're likely to have a combination of not checking and getting burned, and checking a pointer which was already checked. To do otherwise you have to understand the complete call graph, this is infeasible.
Zig doesn't do any of this. If it's a pointer, you can safely dereference it. If it's an optional pointer, you must check, and then: it's a pointer. Safe to pass down the call stack and freely use. If you want C behavior you can always YOLO and just say `yoloptr.?.*`.
Overflow addition and divide by zero are safety checked undefined behavior, a critical concept in the specification. They will panic with a stack trace in debug and ReleaseSafe mode, and blow demons out of your nose in ReleaseFast and ReleaseSmall modes. There's also +% for guaranteed wraparound twos-complement overflow, and +| for saturating addition. Also `@addWithOverflow` if your jam is checking the overflow bit. Unwrapping an optional without checking it is also safety-checked UB: if you were wrong about the assumption that the payload carries a value, you'll get a panic and stack trace on the line where you did `yolo.?`.
Shift operations require that the right-hand side of the shift be an integer type of log2(bit width of the left-hand side) bits. Zig allows integers of any width, so for a: u64, calling a << b requires that b be a u6 or smaller. Which is fine: if you know values will be within 0..63, you declare them u6, and if you want to shift on a byte, you truncate it: you were going to mask it anyway, right? Zig simply refuses to let you forget this. Addition of two u6 is just as fast as addition of the underlying bytes because of, you got it, safety-checked undefined behavior. In release mode it will just do what the chip does.
There's a common theme here: some things require undefined behavior for performance. Zig does what it can to crash your program if that behavior is exhibited while you're developing it. Other things require that you take some well-defined actions or you'll get UB: Zig tracks those in the type system.
You'll note that undefined behavior is very much a part of the Zig specification, for the same reasons as in C. But that's not a great excuse to make staying within the boundaries of defined behavior as pointlessly difficult as it is in C.
The debug modes you mention are also available in various forms in C and C++ compilers. For example, ASan and UBSan in clang will do exactly what you have described. The question, then, is whether these belong in the language specification or are left to individual tools.
Language specification is unavoidable when using said language.
For a bunch of languages outside the C-centric world, specifications don't exist.
https://docs.python.org/3/reference/index.html
https://docs.python.org/3/library/index.html
https://doc.rust-lang.org/reference/index.html
https://doc.rust-lang.org/std/index.html
https://ferrous-systems.com/blog/ferrocene-language-specific...
The intuitive distinction is that the second one is for compiler/library developers, and the former is for users.
A specification can not leave any room for ambiguity or anything up to interpretation. If it does (and this happens), it is treated as a bug to be fixed.
no such thing ever existed.
The more interesting part is that the mode can be individually modified on a per-block basis with the @setRuntimeSafety builtin, so it's practical to identify the performance-critical parts of the program and turn off safety checks only for them. Or the opposite: identify tricky code which is doing something complex, and turn on runtime safety there, regardless of the build status.
That's why this sort of thing should be part of the specification. @setRuntimeSafety would be meaningless without the concept of safety-checked undefined behavior.
I would say that making optionals and fat pointers (slices) a part of the type system is possibly more important, but it all combines to give a fighting chance of getting user-controlled resource management correct.
Given the topic of the Fine Article, it's worth briefly noting that `defer` and `errdefer` are keywords in Zig. Both the test allocator, and the GeneralPurposeAllocator in safe mode, will panic if you leak memory by forgetting to use these, or rather, forget to free allocations generally. My impression is that the only major category of memory bugs these tools won't catch in development is double-free, and that's being worked on.
If you can make it work in a way that has acceptable performance characteristics, every systems language will adopt your technique overnight.
Huh, doesn't that sound familiar?
This is not the case. It's two's complement overflow.
Also, since we're being pedantic here: it's not actually about "debug mode" or "release mode", it is tied to a flag, and compilers must have that flag on in debug mode. This gives the ability to move release mode to also produce the flag in the future, if it's decided that the overhead is worth it. We'll see if it ever is.
> Huh, doesn't that sound familiar?
Nope, it is completely different from undefined behavior, which gives the compiler license to do anything it wants. These are well defined semantics, the polar opposite of UB.
Okay, here is an example showing that rust follows LLVM behavior when the optimizer is turned on. LLVM addition produces poison when signed wrap happens. I'm a little bit puzzled about the vehement responses in the comments wow. I have worked on several compilers (including a few patches to Rust), and this is all common knowledge.
https://godbolt.org/z/r6WTxGjrb
The C++ output:
> LLVM addition produces poison when signed wrap happens.
https://llvm.org/docs/LangRef.html#add-instruction
> nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs.
Note that Rust produces `add`. The C++ produces `add nsw`. No poison in Rust, poison in C++.
Here is an example of these differences producing different results, due to the differences in behavior: https://godbolt.org/z/Gaonnc985
Rust:
C++:

This is because in Rust, the wrapping behavior means that this will always be true, but in C++, because it is UB, the compiler assumes it will always be false.

> I'm a little bit puzzled about the vehement responses in the comments wow.
You are claiming that Rust has semantics that it was very, very deliberately designed to not have.
So if we're going to be pedantic, it's safe Rust which has defined semantics for basically everything. A considerable accomplishment, to be sure.
I'm still entirely unconvinced.
The thing is, wrap-around is not only well-defined, it's common, and EXPECTED.
Example:
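(A sketch modeled on the kernel's hash_32(); not the verbatim snippet from the thread:)

    /* The multiply is *supposed* to wrap; only the mixed high bits are kept. */
    #define GOLDEN_RATIO_32 0x61C88647u

    static inline unsigned int hash_32(unsigned int val, unsigned int bits)
    {
        return (val * GOLDEN_RATIO_32) >> (32 - bits);
    }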
and dammit, I absolutely DO NOT THINK we should annotate this as some kind of "special multiply".

Full thread: https://lore.kernel.org/lkml/CAHk-=wi5YPwWA8f5RAf_Hi8iL0NhGJ...
No, it's really not. Do this experiment: for the next ten thousand lines of code you write, every time you do an integer arithmetic operation, ask yourself if the code would be correct if it wrapped around. I would be shocked if the answer was "yes" as much as 1% of the time.
(The most recent arithmetic expression I wrote was summing up statistics counters. Wraparound is most definitely not correct in that scenario! Actually, I suspect saturation behavior would be more often correct than wraparound behavior.)
This is a case where I think Linus is 100% wrong. Integer overflow is frequently a problem, and demanding the compiler only check for it in cases where it's wrong amounts to demanding the compiler read the programmer's mind (which goes about as well as you'd expect). Taint tracking is also not a viable solution, as anyone who has implemented taint tracking for overflow checks is well aware.
For the kernel, which deals with a lot of device drivers, ring buffers, and hashes, wraparound is often what you want. The same is likely to be true for things like microcontroller firmware and such.
In data analysis or monte carlo simulations, it's very rarely what you want, indeed.
For example, I opened up https://elixir.bootlin.com/linux/latest/source/drivers/firew... as a random source file in the Linux kernel, and I didn't see a single line where wraparound would be correct behavior.
There are definitely cases where wraparound behavior is correct. There are also cases where hard errors on overflow aren't desirable (say, statistics counters), but it's still hard to call wraparound the correct behavior (e.g., saturation would probably work better for statistics than wraparound). There are also cases where you could probably prove that overflow can't happen. But if you made the default behavior a squawk that wraparound occurred, and instead made developers annotate all the cases where that was desirable to silence the squawk, even in the entire Linux kernel, I suspect you'd end up with fewer than 1000 places.
This is sort of the point of the exercise--wraparound behavior is often what you want when you think about overflow, but you actually spend so much of your time not thinking about it that you miss how frequently wraparound behavior isn't what you wanted.
If wraparound is ok for that particular multiplication, tell the compiler that. As a sibling comment says, this is seldom the case, but it does happen, in particular, expecting byte addition or multiplication to wrap around can be useful.
The actual expectation of the vast majority of arithmetic in a computer program is that the result will be correct in the ordinary schoolyard sense. While developing that program, it should absolutely panic if that isn't the case. "Well defined" doesn't mean correct.
I don't understand what your objection to spelling that as `val *% GOLDEN_RATIO_32` is. When someone sees that (especially you, later, coming back to your own code) it clearly indicates that wrapping is expected, or at least allowed. That's good.
Signed integer overflow, on the other hand, is undefined. The compiler is allowed to assume it never happens and can re-arrange or eliminate code as it sees fit under that assumption.
How many lines will this code print?
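(The code sample was lost here; a standard example of the kind used to make this point, assuming that's what was shown:)

    #include <stdio.h>

    int main(void) {
        /* With wrapping semantics (e.g. -fwrapv) this prints 9 lines and
           stops once i wraps to INT_MIN. Because signed overflow is UB,
           an optimizer may instead assume i >= 0 always holds and emit
           an infinite loop. */
        for (int i = 2147483639; i >= 0; i++)
            printf("%d\n", i);
        return 0;
    }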
And even then there are tools to help define much of that - if you want well-defined wrapping signed integers, there's a flag for that (-fwrapv in GCC and Clang). If you want to trap on overflow, there's an option for that too (-ftrapv). Plus lots of compiler warnings and other static analysis checks that would be default-rejected by the compiler today if it weren't for historical baggage - they exist and can be enabled to do that rejection.
Yes, there are many issues with the ecosystem (and tooling; those options above should be default IMHO), but massively overstating them won't actually help anyone make better software.
And other languages often have similar amounts of "undefined behavior" - they just don't document it as such, relying on a single implementation being "Defined Correct", and hoping those behaviors aren't actually being relied on if anything changes. Just like C, only undocumented.
This is what I mean by it becoming a "meme" - things like "Undefined Behavior" or "Memory Safety" have become a discussion-ending "Objective Badness", hiding the real intent: "Languages I Do Not Like" (or, most often, languages that are a poor fit for the actual job I'm trying to do - which is fine, but that doesn't mean those jobs don't exist).
But those terms mean real things that we can use to improve software quality and safety - though that's rarely the intended result when they're brought up these days. And there are many things we can do right now, with existing systems, to improve matters without throwing away huge amounts of already well-tested code: staged improvement, not letting "perfect" be the enemy of better.
If a function has an error type (indicated by a ! in the return type), you have a few options. You can use `result = try foo();`, which will propagate the error out of the function (which now must have ! in its signature). Or you can use `result = foo() catch default;` or `result = foo() catch unreachable;`. The former substitutes a default value, the latter is undefined behavior if there's an error (panic, in debug and ReleaseSafe modes).
Or, just `result = foo();` gives `result` an error-union type, of the intended result or the error. To do anything useful with that you have to unwrap it with an if statement.
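A minimal sketch of those options (using `std.fmt.parseInt` as the fallible callee; exact stdlib signatures vary by Zig version):

    const std = @import("std");

    fn parsePort(s: []const u8) !u16 {
        return std.fmt.parseInt(u16, s, 10);
    }

    pub fn main() !void {
        const a = try parsePort("8080");              // propagate the error to our caller
        const b = parsePort("oops") catch 80;         // substitute a default value
        const c = parsePort("443") catch unreachable; // assert "this cannot fail"

        // Or keep the error union and unwrap it with an if:
        if (parsePort("123")) |port| {
            std.debug.print("{} {} {} {}\n", .{ a, b, c, port });
        } else |err| {
            std.debug.print("parse failed: {}\n", .{err});
        }
    }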
It's a different, simpler mechanism, with much less impact on performance, and (my opinion) more likely to end up with correct code. If you want to propagate errors the way exceptions do, every function call needs a `try` and every return value needs a ! in the return type. Sometimes that's what you need, but normally error propagation is shallow, and ends at the first call which can plausibly do anything about the error.
It also has tagged unions as a general mechanism for returning one of several enumerated values, while requiring the caller to exhaustively switch on all the possibilities to use the value. And it has comptime generics ^_^. But it doesn't use them to implement optionals or errors.
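A sketch of what that looks like (a made-up Shape union, just to show the exhaustiveness requirement):

    const Shape = union(enum) {
        circle: f64, // radius
        square: f64, // side

        fn area(self: Shape) f64 {
            // the switch must cover every variant, or it won't compile
            return switch (self) {
                .circle => |r| 3.141592653589793 * r * r,
                .square => |s| s * s,
            };
        }
    };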
Correctness can be established well enough - even if not guaranteed automatically - in a language with UB.
Yes, nothing like that is possible in C
https://godbolt.org/z/Ge4EqzznT
I assume you haven't looked at the expansion of errno lately?
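(For anyone who hasn't: on glibc, `errno` is itself a macro that expands to a call through a thread-local lookup, roughly:)

    /* glibc's <errno.h>, approximately: */
    extern int *__errno_location(void);
    #define errno (*__errno_location())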
edit: also
https://github.com/KxSystems/kdb/blob/master/c/c/k.h
whereas you can see the user-defined macro definition of "b" at the top of the file. you can't blame the c language for someone choosing to write something like that. sure it's possible, but it's your choice and responsibility if you do stupid things like this example.
- macros are also a standard C++ feature, so this point doesn't differentiate between the two languages
- i'm failing to adequately communicate my point. there's a fundamental difference practically and philosophically between macro stupidity and C++ doing things under-the-hood. of course a user (you, a co-developer, a library author you trusted) can do all sorts of stupid things. but it's visible and it's written in the target language - not hard-coded in the compiler.
yes - sure, good luck finding the land-mine "b" macro if it was well buried. but you can find it and when you do find it, you can see what it was doing. you can #undef it. you can write your own version that isn't screwed up, etc.
you can do none of those things for operations in c++ that occur automatically - you can't even see them except in assembly.
I specifically reject this. Constructors, exceptions, and so on are as similarly visible at the source level as macro definitions.
And thanks to macros, signal handling, setjmp, instrumentation, hardening, dynamic .so resolution, compilers replacing what look like primitive accesses with library functions, any naïve read of C code, is, well, naïve.
I'm not claiming C++ superiority here [1], I'm trying to dispel the notion that C is qualitatively different from C++ from a WYSIWYG point of view, both theoretically and in practice.
[1] although, as I mentioned elsewhere, other C++ features mean that macros see less use.
but i will also emphatically reject your position: "Constructors, exceptions, and so on are as similarly visible at the source level as macro definitions"
no they are not. you can certainly see what the macro is doing - you see its definition, not just its existence. whereas in c++ you have to trust that language/compiler to:
- build a vtable (what exactly does this look like?)
- make copy ctors
- do exception handling.
- etc.
none of these are explicit. all of them are closed and opaque. you can't change their definition, nor add on to it.
the issue at hand is both "magic" and openness. c gives relatively few building blocks. they are simple (at least in concept). user libraries construct (or attempt to construct) more complex idioms using these building blocks. conversely, c++ bakes complex features right into the language.
as you note, there are definitely forces that work against the naïve original nature of c: macros, setjmp, signal handling, instrumentation, hardening, .so resolution, compilers replacing primitive accesses, etc. but all of those apply equally to c and c++. they are also more an effect of the ABI and the platform/OS than of either language. in short, those are complaints and complexities due to UNIX, POSIX, and other similar derived systems, not c or c++ the language itself.
c has relatively few abstractions: macros, functions, structured control flow, expressions, type definitions. all of these could be transformed into machine code by hand, for example in a toy implementation. sure a "good" compiler and optimizer will then mangle that into something potentially unrecognizable, but it will still nearly always work the way that the naïve understanding would. that's why when compilers do "weird" things with UB, it gets people riled up. it's NOT what we expect from c.
c++ on the other hand has, in the language itself, many more abstractions and they are all more complex. you aren't anywhere near the machine anymore and you must trust the language definition to understand what the end effect will be. how it accomplishes that? not your problem. this makes it squarely a high-level language, no different than java or python in that facet.
i explicitly reject your position that "that C is qualitatively [not] different from C++ from a WYSIWYG point of view, [either] theoretically [or] in practice."
to me, it absolutely is. it represents a lower-level interface with the system and machine. c is somewhere between a high-level assembler and a mid-level language. c++ is a truly high-level language. yes, compilers and os's come around and make things a little more interesting than the naïve view of c in rare cases. but c++? everything is complex - there is not even a workable illusion of simplicity. to me this is unfortunate because c++ is still burdened by visible verbosity, complexities, land-mines, and limitations due to the fact that it is probably not quite high-level enough.
this is all very long winded. you and many other readers might think i'm wrong. the reason i'm responding is not to be argumentative, but because it's by no means a "settled" question and there are certainly plenty of people that see it a very different way. which i think is fine.
However, if I were to request a feature to the core language it would be: NAMESPACES. This would clean up the code significantly without introducing confusing code paradigms.
In fact on some targets the assembler name of identifiers doesn't always match the C name already.
Although, as someone who almost always explicitly qualifies names, typing foo_bar is not very different from foo::bar; the only minor advantages are that you do not have to use foo:: inside the implementation of foo itself, and the ability to use aliases.
surely not. How do you differentiate these two functions?
You would mangle it as something like foo$N depending on the platform.
I guess I should have reworded. I don't expect that feature in C, but if I were to reinvent C today I would keep it the same but add namespaces and mangling.
Adding an explicit prefix to every function call is a lot of boilerplate when it's all added up.
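A quick sketch of the wish (the `namespace` block below is invented syntax, kept in a comment so the file is still valid C; the mangled name is just one plausible scheme):

    typedef struct mylib_buffer mylib_buffer;

    /* today: the prefix is the namespace, written out every time */
    int mylib_buffer_init(mylib_buffer *b);
    int mylib_buffer_append(mylib_buffer *b, const char *s);

    /* the wish: the same API, with the compiler mangling
       mylib::buffer_init to something like mylib$buffer_init:

    namespace mylib {
        int buffer_init(buffer *b);
        int buffer_append(buffer *b, const char *s);
    }
    */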
it just means that if you need that logic, in C you would write lots of verbose, less safe code.
1) labels as values in standard
2) control over memory position offsets, without linker script
other than that a few more compiler implementations offering things like checked array bounds, and a focus on correctness rather than accepting the occasional compiler bug
the rough edges like switch fallthrough are real, but easy to work around. They don't need fixing (-Wimplicit-fallthrough already flags it, etc)
maybe more control over assembly generation, such as exposing compilation at runtime; but that is at the wishful end of wishlists
If you miss a destructor event, without configuring the addon to say "yes, I really meant that", the addon halts the compilation at best, or returns nonzero for CI at worst.
Edit: I just reread this comment and realized the beginning of it could come across as a bit condescending even though that wasn’t at all my intention. I’d edit it out, but I don’t like doing that, so my apologies if it did come across that way!
It's a bit confusing to have a 'thing' mention one mechanism in its name while actually being valuable for ensuring some other mechanism.
Indeed! When I was first learning C++, I found the term "RAII" quite confusing too. However, after years of experience with this term, associating "RAII" with its intended meaning has become second nature.
Having said that, there is at least one way to make better sense of "RAII" and that is considering the fact that in RAII, holding a resource is a class invariant. The resource is acquired during construction (initialisation) and released during destruction (which happens automatically when the object of the class goes out of scope). Throughout the object's lifetime, from construction to destruction, maintaining possession of the acquired resource is an invariant condition.
Although it sounds simple in principle, this can get complicated pretty quickly, especially in the implementation of the copy assignment operator, where we may need to carefully delete an existing resource before copying the new resource received by the operator. Problems like this led to formulating more techniques for carefully managing resources while satisfying the class invariant. One such technique is the copy-and-swap idiom.
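A sketch of that idiom (an illustrative class, not anything from the thread):

    #include <algorithm>
    #include <cstddef>

    // The invariant "this object owns its buffer" holds from the end of
    // every constructor to the start of the destructor.
    class Buffer {
        std::size_t size_ = 0;
        int *data_ = nullptr;
    public:
        explicit Buffer(std::size_t n) : size_(n), data_(new int[n]{}) {}
        Buffer(const Buffer &other)
            : size_(other.size_), data_(new int[other.size_]) {
            std::copy(other.data_, other.data_ + size_, data_);
        }
        friend void swap(Buffer &a, Buffer &b) noexcept {
            std::swap(a.size_, b.size_);
            std::swap(a.data_, b.data_);
        }
        // Taking the argument by value makes the copy; swapping hands our
        // old resource to the temporary, which releases it on destruction.
        Buffer &operator=(Buffer other) noexcept {
            swap(*this, other);
            return *this;
        }
        ~Buffer() { delete[] data_; }
    };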
None of this is meant to justify the somewhat arbitrary term though. In fact, there are at least two better alternative names for RAII: Scope-Based Resource Management (SBRM) and Constructor Acquires, Destructor Releases (CADR).
The reason to do this is precisely so that the resource can be cleaned up at destruction of the object. So even if you had an acronym like RASALTOI, it would still probably be misleading
You are proposing to change the C language. The risk is great that even the smallest change will break existing code. If you can't convince all of the stakeholders, it's better not to change it. Keep the status quo.
Oh man, I hear ya. And in a lot more domains than computer language design. Is it inexperience? Impatience? The tendency for search results to be filled with low-quality and high-recency content? The prioritization of hot-take blog posts and Reddit comments over books?
there is a dedicated mechanism to achieve RAII-likeness in .NET: the try-finally construct
There is no such thing as IL finalizers. There are object finalizers, whose use on their own is highly discouraged.
Their most frequent application is as a safety measure for objects implementing IDisposable, where not calling Dispose could lead to a memory leak or some other form of resource starvation that must be prevented.
For example, a file handle is IDisposable, so it is naturally disposed of through a using statement; but should a user make a mistake in a scenario where that handle has a non-trivial lifecycle, then once the object is no longer referenced, its finalizer will be called during one of the Gen2 GCs by the finalizer thread, preventing the file handle leak even though its freeing is now non-deterministic:
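A sketch of that scenario (C#, since that's the runtime under discussion; forcing the GC here is purely to demonstrate the otherwise non-deterministic cleanup):

    using System;
    using System.IO;

    class Demo
    {
        static void Main()
        {
            // The intended, deterministic path: Dispose runs at the end of the block.
            using (var f = new FileStream("log.txt", FileMode.Create))
            {
                f.WriteByte(42);
            }

            // The mistake path: no using/Dispose. The OS handle is still
            // reclaimed eventually, because SafeFileHandle has a finalizer
            // that the finalizer thread runs once the object is unreachable.
            var leaked = new FileStream("leak.txt", FileMode.Create);
            leaked = null;
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }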
https://dlang.org/spec/betterc.html
... actually writing code that gets the job done ... in C++.
You can add `defer` instead, but regardless, this has nothing to do with C++. You can implement safety features without having to copy the arguably worst language in the world, C++. I like C++, I wrote many larger projects in it, but it sucks to the very core. Just add RAII to C.
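For what it's worth, GCC and Clang already offer an RAII-ish taste of this in C via the cleanup attribute (a sketch; `__attribute__((cleanup))` is a nonstandard extension, and `free_ptr` is just a name made up here):

    #include <stdio.h>
    #include <stdlib.h>

    static void free_ptr(char **p) { free(*p); }

    int main(void) {
        /* free_ptr runs automatically when buf goes out of scope,
           on every exit path, much like a destructor or defer. */
        __attribute__((cleanup(free_ptr))) char *buf = malloc(64);
        if (!buf) return 1;
        snprintf(buf, 64, "hello");
        puts(buf);
        return 0;
    }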
The author even acknowledges halfway through that it’s basically a strawman:
> It’s not a bad argument; after all, the entire above argument hinges on the idea of stealing from C++ entirely and copying their semantics bit-for-bit.
To me, only after that does it engage with the underlying concept in a way which is engaging and convincing. But you’ve had to trawl through 2500 words to get to that point.
This is a response to people contacting/criticising them, asking for destructors instead of defer.
I find this an interesting thought experiment: basically, types that you'd opt in to RAII. I just have a feeling that you'll need to define some notion of ownership to make it work.