> A portion of this code implemented a SMTP client.
If I wanted to root cause this, the real problem is right there. Implementing protocols correctly is hard and bugs like in the post are common. A properly implemented SMTP client library, like one you would pull off the shelf, would accept text and encode it properly per the SMTP protocol, regardless of where the periods were in the input. The templating layer shouldn't be worrying about SMTP.
TeMPOraL 149 days ago [-]
The real problem isn't the protocol, but the cowboy approach to interacting with it. It's not hard to "accept text and encode it properly per the SMTP protocol", you just need to realize you need to do it in the first place.
There is a multitude of classes of errors and security vulnerabilities, including "SQL injection", XSS, and similar, that are all caused by the same mistake as this missing-period case[0]: gluing strings together. For example, with SQL queries, the operation of binding values to a query template should happen in "SQL space", not in untyped string space. "SELECT * FROM foo WHERE foo.bar = " + $userData; is doing the dumb thing and writing directly to SQL's serialized format. In correct code (and correct thinking), the "SELECT * FROM..." bit is not a string, it just looks like one. Same with HTML templating[1] - work with the document tree instead of its string representation, and you'll avoid dumb vulnerabilities.
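A minimal sketch of doing the binding in "SQL space", using Python's sqlite3 placeholders (the table and values here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (bar TEXT)")
conn.execute("INSERT INTO foo (bar) VALUES (?)", ("it's fine",))

# Hostile input stays inert: the driver binds it as a value,
# so the quote can never terminate the string literal.
user_data = "x' OR '1'='1"
rows = conn.execute("SELECT * FROM foo WHERE foo.bar = ?", (user_data,)).fetchall()
assert rows == []  # matched nothing; the payload was treated as data
```

The `?` placeholder keeps the query template and the untyped user string in separate channels all the way down to the driver.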
So, if you want to avoid missing dots in your e-mails, don't inject unstructured text into the middle of the SMTP pipeline. Respect the abstraction level at which you work.
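For reference, the escaping the buggy code missed is tiny once you know it exists; a sketch of RFC 5321's dot-stuffing, simplified to '\n' line endings (real SMTP uses CRLF):

```python
def dot_stuff(body: str) -> str:
    """Prefix an extra '.' to every line that starts with '.',
    so a lone '.' line in the body can't end the DATA section."""
    return "\n".join(
        "." + line if line.startswith(".") else line
        for line in body.split("\n")
    )

# A message containing a bare '.' line survives intact:
assert dot_stuff("paragraph one\n.\nparagraph two") == "paragraph one\n..\nparagraph two"
```

The receiver strips one leading dot from each such line, undoing the escape.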
See also: langsec.
--
[0] - And therefore should be considered as a single class of errors, IMO.
[1] - Templating systems themselves are thus a mistake belonging to this class, too - they're all about gluing string representations together, where the correct way is to work at the level of language/data structures represented by the text.
mananaysiempre 149 days ago [-]
For what it’s worth, some markup-first template systems have tried to respect the target format’s structure—Genshi[1] and TAL[2] come to mind, and of course XSLT (see also SLAX[3]). I said “markup-first” so that the whole question isn’t trivialized by JSX.
> [1] - Templating systems themselves are thus a mistake belonging to this class
This is not universally true.
JavaScript has an amazing feature called tagged template literals, which lets you tag a template string with a function that handles the literal parts and the interpolated values separately. This lets the tag function treat the literals as trusted, developer-written HTML or SQL, and the interpolations as untrusted, user-provided values.
Lit's HTML template system[1] uses this to basically eliminate XSS (there are some HTML features like "javascript: " attributes that require special handling).
ex:
html`<h1>Hello, ${name}</h1>`
If `name` is a user-provided string, it can never insert a <script> or <img> tag, etc., because it's escaped.
There are similar tags for SQL, GraphQL, etc. Java previewed a similar String Templates feature in 21.
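The mechanism is easy to sketch; here is a rough Python analogue of such a tag function (Lit and the real tags are JavaScript, so this only illustrates the idea: literal parts are trusted, interpolated values are escaped):

```python
from html import escape

def html(parts, *values):
    """Interleave trusted literal parts with escaped values,
    mimicking a JS tagged template: html`<h1>Hello, ${name}</h1>`."""
    out = [parts[0]]
    for value, literal in zip(values, parts[1:]):
        out.append(escape(str(value)))  # untrusted value: always escaped
        out.append(literal)             # trusted developer-written markup
    return "".join(out)

name = "<script>alert(1)</script>"
assert html(["<h1>Hello, ", "</h1>"], name) == "<h1>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</h1>"
```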
> If `name` is a user-provided string, it can never insert a <script> or <img> tag, etc., because it's escaped.
Be careful with that "never". A curious and persistent person might discover a bug in the implementation, leading to something like the Log4Shell issue.
VBprogrammer 149 days ago [-]
Not sure why you are being downvoted here. It's a fair point: properly escaping your data is only one part of the overall security picture, and you should also be strictly validating data at the inputs to your system.
spankalee 149 days ago [-]
Luckily, for Lit specifically, the "escaping" is done by the browser by setting textContent, so the string literally never passes through the HTML parser. Any string is valid text content, and if you found a bug that permitted unsafe text to be parsed as HTML somehow, it would be a browser bug and a very, very serious one.
But it'd be similar with other template systems. If the interpolation should allow any string, there's really no validation to be done.
TeMPOraL 148 days ago [-]
That's exactly the kind of hack that worries me. Your example is still (seemingly[0]) gluing text at the serialized level, ignoring the actual structure of the HTML language. ${name} should never be able to insert any text that would end up being interpreted as markup. Not only when some code decides it's not user-provided; it's not even possible to make that test 100% accurate, and it doesn't protect you from mistakes in "trusted" strings (like a totally trusted `name` having a stray '>' in it).
The bulletproof way of doing this is working at the level of abstraction of your target language. With HTML, that would be a tree structure. For example, if your HTML generation looks more like:
["H1", "Hello, " + name]
and that is passed to code that actually builds up the tree and then serializes it down to HTML, then there is no way `name` could ever break the structure or inject anything.
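A minimal sketch of that tree-building approach in Python (the serializer is simplified for illustration):

```python
from html import escape

def render(node):
    """Serialize a [tag, *children] tree to HTML. Strings are leaf
    text nodes and always escaped, so no string child can ever be
    parsed as markup."""
    if isinstance(node, str):
        return escape(node)
    tag, *children = node
    tag = tag.lower()
    return f"<{tag}>" + "".join(render(c) for c in children) + f"</{tag}>"

name = "world <script>alert(1)</script>"
assert render(["H1", "Hello, " + name]) == "<h1>Hello, world &lt;script&gt;alert(1)&lt;/script&gt;</h1>"
```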
--
[0] - I skimmed the docs of Lit, it seems there are restrictions on where interpolation can be placed, but I don't think they're actually building up the tree expressed by the static parts.
spankalee 148 days ago [-]
Your assumptions here are very, very wrong. Calling it a hack is only telling on yourself, honestly.
Lit is not working at the serialized level, at all. It parses the templates independently of any values, and the values are inserted into the already-parsed tree structure. There is literally no way for values to be parsed as HTML.
throw10920 149 days ago [-]
The real problem is that SMTP is a "plain-text" protocol that includes in-band signaling. It literally happened because SMTP defined "a line that only contains a single period in it" as a control sequence and not a literal line that only contains a single period in it.
SMTP is an example of an unnecessarily complex design, and the implementation bugs reflect it. SMTP shouldn't be hard for someone to correctly implement by themselves (even though I agree that people shouldn't be re-inventing the wheel).
pja 149 days ago [-]
At some point, all protocols include in-band signalling somewhere: You have to put packets on the line, and those packets are ultimately just a stream of anonymous bytes.
If it wasn’t a period, it would be something else & you’d have to handle that instead.
throw10920 149 days ago [-]
> all protocols include in-band signalling somewhere
That's an incredibly reductionistic view of the world that's utterly useless for anything (including actually engineering systems) except pedantry. It's obvious that the level at which you include control information is meaningful and significantly affects the design of the protocol, as we see in the submission. Directly embedding the control information into the message body does not lead to a design that is easy to implement.
> If it wasn’t a period, it would be something else & you’d have to handle that instead.
Yes, and there are many other design choices that'd be significantly easier to handle.
TeMPOraL 149 days ago [-]
No, it's not reductionist and pedantic. It's a reminder that there is no magic. Building an abstraction layer that separates control and data doesn't win you anything if, like the people in the article, you then forget it's a thing and write directly to the level below it.
throw10920 149 days ago [-]
> No, it's not reductionist and pedantic.
It's very reductionistic, because it intentionally ignores meaningful detail, and it's pedantic because it's making a meaningless distinction.
> It's a reminder that there is no magic.
This is irrelevant. Nobody is claiming that there's any magic. I'm pointing out the true fact that details about the abstraction layers matter.
In this case, the abstraction layer was poorly-designed.
Good abstraction layer: length prefix, or JSON encoding.
Bad abstraction layer: "the body of the email is mostly plain text, except when there's a line that only contains a single period".
There are very, very few problems to which the latter is a good solution. It is a bad engineering decision, and it also obfuscates the fact that there even is an abstraction layer unless you carefully read the spec.
-------------
In fact, the underlying problem goes deeper than that - the design of SMTP is intrinsically flawed because it's a text-based ad-hoc protocol that has in-band signaling.
There are very few good reasons to use a text-based data interchange format. One of them is to make the format self-documenting, such that people can easily read and write it without consulting the spec.
If the spec is complex enough that you get these ridiculous footguns, then it shouldn't be text-based in the first place. Instead, it should be binary - then you have to either read the spec or use someone else's implementation.
Failing that, use a standardized structured format like XML or JSON.
But there's no excuse for the brain-dead approach that SMTP took. They didn't even use length prefixing.
kjellsbells 148 days ago [-]
I don't disagree with your criticisms of SMTP, but reading those early RFCs (e.g. RFC 772) is a reminder of what a wildly different place the Internet was back then, and in that light, I feel it only fair to grant some grace.
MTP had one concern which was to get mail over to a host that stood a better chance of delivering it, where the total host pool was maybe a hundred nodes?
I speculate that Postel and Sluizer were aware of alternatives and rejected them in favor of things that were easily implemented on highly diverse, low powered hardware. Not everyone had IBM-grade budgets after all.
Alternative implementations of mail that did follow the kinds of precepts that you suggest existed at one time. X.400 is the obvious example. If I recall correctly, it did have rigorous protocol spec definitions, message length tags for every entity sent on the wire, bounds and limits on each PDU, the whole hog. It was also crushed by SMTP, and this was in the era when you needed to understand sendmail and its notoriously arcane config to do anything. So sometimes the technically worse solution just wins, and we are stuck with it.
smallnamespace 148 days ago [-]
> or JSON encoding
JSON needs to escape backslashes; SMTP needs to escape a newline followed by a period. If you've already accepted doing escaping, what's the issue?
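A quick illustration of that symmetry with Python's json module (the values are made up):

```python
import json

risky = "a\\b and a lone\n.\nline"
encoded = json.dumps(risky)

# JSON's in-band escaping kicks in for backslashes and newlines...
assert "\\\\" in encoded and "\\n" in encoded
# ...and decoding round-trips the original exactly, dots and all.
assert json.loads(encoded) == risky
```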
nottorp 149 days ago [-]
Why not protobufs inside protobufs then?
TeMPOraL 148 days ago [-]
> Good abstraction layer: length prefix, or JSON encoding.
> Bad abstraction layer: (...)
In this context, it shouldn't matter. Sure, "mostly plaintext except some characters in some special positions..." is considered bad in modern engineering practice, but it's not fundamentally different or more difficult than printf and family. You wouldn't start calling printf without at least skimming the docs for the format string language, would you?
> It is a bad engineering decision, and it also obfuscates the fact that there even is an abstraction layer unless you carefully read the spec.
There's the rub: you should have read the spec. You should always read the spec, at least if you're doing something serious like production-grade software. With a binary or JSON-based protocol, you wouldn't look at a few messages and assume you understand the encoding. I suppose we can blame SMTP for a design that didn't account for human nature: it looks simple enough to fool people into thinking they don't need to read the manual.
> There are very few good reasons to use a text-based data interchange format.
If you mean text without obvious and well-defined structure, then I completely agree.
> One of them is to make the format self-documenting, such that people can easily read and write it without consulting the spec.
"Self-documenting" is IMHO a fundamentally flawed idea, and expecting people to read and write code/markup without consulting the spec is a fool's errand.
> it should be binary - then you have to either read the spec or use someone else's implementation.
That's mitigating (and promoting) bad engineering practice with protocol design; see above. I'm not a fan of this, nor the more general attitude of making tools "intuitive". I'd rather promote the practice of reading the goddamn manual.
> But there's no excuse for the brain-dead approach that SMTP took. They didn't even use length prefixing,
The protocol predates both JSON and XML by about two decades. It was created in times when C was roaming the world; length prefixing was unpopular then, and only recently seems to be back in vogue.
pja 149 days ago [-]
> No, it's not reductionist and pedantic. It's a reminder that there is no magic.
Exactly! This is an even better phrasing of my point.
BoppreH 149 days ago [-]
That's not a very useful definition of "in-band signaling". For me, the main difference is an out-of-band protocol that says:
"The first two bytes represent the string length, in big-endian, followed by that many bytes representing the string text."
and an in-band signalling protocol:
"The string is ended by a period and a newline."
In the second one, you're indicating the end of the string from within the string. It looks simpler, but that's where accidents happen. Now you have to guarantee that the text never contains that control sequence, and you need an escaping method to represent the control sequence as part of the text.
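A sketch of the first style in Python (two-byte big-endian length, as in the quoted rule; the helper names are my own):

```python
import struct

def frame(payload: bytes) -> bytes:
    """Length-prefix framing: 2-byte big-endian length, then the
    raw payload. No byte value is special, so nothing needs escaping."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 2-byte length prefix")
    return struct.pack(">H", len(payload)) + payload

def unframe(buf: bytes) -> bytes:
    (length,) = struct.unpack_from(">H", buf)
    return buf[2:2 + length]

msg = b"line one\r\n.\r\nline two"  # a bare '.' line needs no special care
assert unframe(frame(msg)) == msg
assert frame(b"hi") == b"\x00\x02hi"
```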
akvadrako 149 days ago [-]
That isn't true at all. In most binary protocols, you put the size of the message in the header. Then any sequence of bytes is allowed to follow. There are no escape sequences.
pja 149 days ago [-]
That’s still in-band signalling: The metadata is in the same channel as the data.
treflop 149 days ago [-]
From my experience, almost no protocols have in-band signaling. No protocol I’ve ever built has in-band signaling because it’s nuts.
You always know what the next byte means because either you did a prefixed length, your protocol has stringent escaping rules, or you chose an obvious and consistent terminator like null.
yencabulator 149 days ago [-]
SMTP has stringent escaping rules. The authors of the code in the article were incompetent.
pja 149 days ago [-]
If your metadata about the data is in the same channel as the data then you’re doing in-band signalling.
treflop 149 days ago [-]
Not in modern times.
The terms harken back to the days of circuit-switched networks, but now that we have heavily transitioned to packets, bands are an artificial construct on top of packets and applying the term isn't very clear cut.
The main property of in-band data in the circuit-switched network days was that you could inject commands into your data stream. If we apply that criterion to a modern protocol, even if you mix metadata and data in the same "band," if your data can never be interpreted as commands then "out of band" makes an apt description.
That's only true if you're not breaking the protocol abstraction layer. There is no "out-of-band" once you serialize your messages. If you start injecting random bytes into the data stream on the wire, you can absolutely start introducing commands, or confuse the receiver where the next piece of metadata/control is.
In this case, somewhere the protocol abstraction layer got broken, and the message text ended up being treated as already serialized. It's not a problem with the protocol per se, but with bad implementation of its API (or no implementation at all, just printf-ing into the wire format).
treflop 148 days ago [-]
Injecting random data into any protocol will break it.
When we’re talking about whether someone can inject data into the link, we’re talking about the end user and not the software. If we’re talking protocol design, then you wouldn’t want regular data to be able to inject commands by simply existing.
TeMPOraL 148 days ago [-]
> Injecting random data into any protocol will break it.
It shouldn't, unless you're bypassing the actual protocol serialization layer (or hitting a bug in the implementation), which is what happened here. Protocol design can't address the case of users just writing out some bytes and declaring it a valid protocol message.
treflop 148 days ago [-]
Sure but I’m not replying to the thread.
I’m replying to a post where someone said most protocols have in-band signaling and therefore this problem is unavoidable.
eviks 149 days ago [-]
Why didn't they use actual control sequences instead of text chars, ASCII had plenty of those?
inejge 149 days ago [-]
When your debugger is telnet circa 1980, non-printable characters are a liability. (You pay for it later, when people who've never seen "...end with <CRLF>.<CRLF>" botch the implementation.)
Animats 149 days ago [-]
SMTP is that way because running the entire contents of a message through some escape character processing was expensive back then, in the era of 0.25 MIPS machines.
throw10920 148 days ago [-]
Thank you for weighing in, John Nagle!
I understand that constraint, and it seems reasonable - but in that case, why not use a length prefix? That should be even more efficient than having to scan for a line containing a single period and nothing else.
Animats 148 days ago [-]
Because many paths of that era were not binary-transparent. There was CR-LF to LF conversion, the possibility of ASCII/EBCDIC translation, and other transformations.
We still have the mess that is the required and standardized behavior of HTML5 parsers faced with bad data.
throw10920 136 days ago [-]
That's genuinely horrifying, but it explains why SMTP made a decision that I would otherwise categorize as "insane" - it was forced to by constraints of the time. Thank you for explaining!
davedx 149 days ago [-]
Honestly this part with the periods, while unusual, isn’t really complex. The two rules regarding the periods were like a small paragraph of text. I agree with sibling comment from Temporal, this is purely a “skill issue”, not a protocol issue
xg15 149 days ago [-]
Sounds like a great idea, until you find that your SMTP library pulls in 5 other libraries as its own dependencies, and those each pull in 3 transitive dependencies of their own, one being some kitchen-sink/toolbox project where only 1% of its code is actually relevant to the dependant and the rest is dead weight, but which pulls in 20 more dependencies for functions that are literally never called in your project. Before you know it, your codebase bloats up by several MB and you get CVE warnings for libraries that you didn't even know existed, let alone that you're using them.
andruby 149 days ago [-]
I agree with this principle in general.
But for SMTP libraries, that's often part of stdlib (Ruby, Python, PHP, ...).
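Python's version of that, for instance: the stdlib client applies the SMTP transparency rules itself (the host, addresses, and subject below are illustrative, and the function isn't invoked here since it needs a reachable MTA):

```python
import smtplib
from email.message import EmailMessage

def send_report(body: str, sender: str, recipient: str, host: str = "localhost") -> None:
    """Build a message and hand it to smtplib, which does the
    dot-stuffing and CRLF framing; a lone '.' line in `body` is safe."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "cron report"
    msg.set_content(body)
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```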
Gigachad 149 days ago [-]
This doesn’t seem like a real issue. Servers can handle a several megabyte executable, and CVE warnings for libraries you aren’t using can be ignored.
markisus 149 days ago [-]
In the hypothetical above, you won’t have any way to know which libraries are actually being used unless you read through the source code. Many libraries will transitively include protobuf, but most functions will not call protobuf.
bronson 149 days ago [-]
Agreed. Even if you establish that it's not being used today, that doesn't mean that it will continue to be unused after the next few commits land.
And, even though you might not see a way to call into the unused code, an attacker might find a way (XZ Utils).
That's why it's a best practice to specify protocols at a very high level (e.g. using cap'n'proto) instead of expecting every random sleep-deprived SDE2 to correctly implement a network exchange in terms of read() and write().
blueflow 149 days ago [-]
That's why you have to read the specs of the protocol you want to implement. It's a matter of engineering rigor. Brute-forcing until "it works" doesn't cut it.
vlovich123 149 days ago [-]
I think you're missing the fact that experience has taught us repeatedly that separating the protocol definition from the wire format is a good idea. But sure, feel free to ignore the many lessons and blame it on individuals, as if anything could be implemented free from human laziness, error, and economic demands. That attitude is, ironically, itself a lack of engineering rigour: I was always taught to assume that humans will be lazy, corrupt, and make mistakes, and that the economics of making things as cheap as possible are a real part of engineering, not something to handwave away as "they're not doing real engineering".
robryk 149 days ago [-]
To steelman the GP's POV: there are other parts of solutions to problems where similar levels of rigour are required and cannot be filled in by using a preexisting library (state machines for distributed business logic come to mind as an example). Eliminating the need for that here doesn't help that much in general, and might even make things worse, because it gives people less experience with tasks demanding rigour before they tackle ones that are both subtler and harder.
vlovich123 149 days ago [-]
Learning to blindly follow a spec for the purposes of parsing the SMTP wire protocol doesn't give you any extra ability to follow state-machine or distributed-business-logic specs better. It just adds to the overall opportunities for you to make a mistake. This also ignores the fact that the SMTP spec is split across multiple RFCs with no single normative version, which further reduces the chance that you implement it correctly in the first place.
Engineers get better faster because they leverage better tools and build tools to overcome their own shortcomings and leverage their strengths, not by constantly being beat into shape by unforgiving systems.
To be fair, what you and OP said is not an uncommon mentality. It's even shared in a way by Torvalds:
> [easier to do development with a debugger] And quite frankly, I don't care. I don't think kernel development should be "easy". I do not condone single-stepping through code to find the bug.
> I do not think that extra visibility into the system is necessarily a good thing.
> Quite frankly, I'd rather weed out the people who don't start being careful early rather than late. That sounds callous, and by God, it _is_ callous. But it's not the kind of "if you can't stand the heat, get out the kitchen" kind of remark that some people take it for. No, it's something much deeper: I'd rather not work with people who aren't careful. It's darwinism in software development. It's a cold, callous argument that says that there are two kinds of people, and I'd rather not work with the second kind. Live with it.
He has similar views about unit tests btw.
I personally would prefer to work with people who are smart, understand systems, and let machines take care of subtle details, rather than needing to be 100% careful at all times. No one writing an SMTP parser is at Torvalds's level.
I'm not arguing that this excuses you from being careful or failing to understand things. I'm saying that defensively covering your flank against common classes of mistakes leads to better software than the alternative.
im3w1l 149 days ago [-]
> This also ignores the fact that SMTP specs is split across multiple RFCs with no single normative version which further complicates the probability that you implement the spec correctly in the first place.
This is a point I agree with. The fact that I so rarely see it mentioned that standards are split across multiple RFCs makes me suspect that people don't mention it because they don't know, because they never read the RFCs in the first place and instead try to follow the implementation of some existing program.
256_ 149 days ago [-]
It can get tedious and annoying, but I don't think it actually affects the likelihood that you'll implement something wrong. The RFCs link to each other when needed. Also groups of RFCs often get combined and edited into a single revised RFC for simplicity.
This makes me wonder: How could the IETF's approach to standardisation be improved? I'm not sure how to fix this problem without overhauling everything.
blueflow 149 days ago [-]
Research is also a real part of engineering. One should not omit it and end up with an SMTP implementation like the one in the article.
ninkendo 149 days ago [-]
Yeah someone should get in their Time Machine and tell those idiot SMTP RFC authors in the 1980s that they should have used a wire format that wouldn’t be invented for another 30 years.
Gigachad 149 days ago [-]
You don’t need to understand SMTP to send email. You just use a library that implements it and lets you just pass in a html doc.
Joker_vD 149 days ago [-]
Reading is for nerds. Real programmers™ write. And they write code, for the machine, not docs or specs or any other silly stuff intended for interhuman communication.
bregma 149 days ago [-]
Ah, yes, the Agile manifesto.
aftoprokrustes 149 days ago [-]
I will not comment on the technical part, as others have already done so better than I could, but this reminded me of an anecdote about the importance of such trivial things as a period at the end of a sentence:
In Germany, where I work, it is usual at the end of employment to ask for a letter of recommendation ("Zeugnis") that lists the tasks performed, and how good the employee was. It is an important document, as it will typically be required when applying for jobs. Obviously, no employee would accept a document explicitly stating "this guy is a lazy bastard, do not hire him", so there is a "Zeugnissprache", a "secret code" to disguise this information as praise. One part of this code is that a missing period in the last sentence means "please ignore everything said here, this guy is horrible".
How do I know? I let a lawyer check my Zeugnis after my last employment, and (I assume out of lack of care, as all my performance reviews were positive) the last sentence was missing the period.
Biganon 149 days ago [-]
Secret codes in recommendation letters are an urban legend. HR people have no incentive to create a secret code for themselves and their potential rivals, let alone teach it to new HR people while also keeping it secret.
This legend comes from the fact that HR people cannot be too explicit about the fact that you've been a pain in the ass (you could probably sue if it's too transparent), so if they have nothing positive to say they will commend your punctuality or something equally mundane. It's not secret codes, it's like... "bless their heart", but in HR talk. Plausible deniability if you want to sue, I guess. "But it's a good thing, your honor! They were always on time!"
aftoprokrustes 140 days ago [-]
As other commenters said, what you describe could actually be considered a secret code.
But in the specific german case, the code is not even that secret. This is a formal document with a very specific structure, and very standardized phrases. There is even specific software to generate the text out of performance ratings. Basically something like this:
- John was overall engaged: he is a lazy bastard
- John was engaged: he is OK
- John was very engaged: he is good
- John was always very and thoroughly engaged: he is very good
boxed 149 days ago [-]
So.. secret codes are a myth, and here are some examples of secret codes?
KwanEsq 149 days ago [-]
"Damning with faint praise" is hardly a secret code.
Biganon 149 days ago [-]
Please refrain from willfully picking the naive interpretation when you've understood my point perfectly well; it's against the rules of this website.
...sigh:
Secret codes as in "watermark-level omission of characters" are a myth. Lingo and jargon do however exist, and convey meaning in a particularly subtle way. They are shared and taught by culture, not by a secret handbook passed down from generation to generation. See also dogwhistling.
The goal is to protect the issuer, not to selflessly inform the recipient.
HankB99 149 days ago [-]
This reminds me of the joke whose punch line is
> You will be lucky to have this person work for you.
angst_ridden 149 days ago [-]
"I cannot recommend X too highly. X always served as an example to their colleagues. The quality of X's code was unequalled in our department, and X's work always merited special attention." (etc)
doktrin 149 days ago [-]
> I cannot recommend X too highly
This isn't a veiled statement. It's outright dunking on the applicant.
thedanbob 149 days ago [-]
It can be interpreted both ways: "I cannot recommend X too highly (because they suck)" vs "I cannot recommend X too highly (because whatever praise I give will be inadequate)"
doktrin 148 days ago [-]
If praise is the intent, it would be phrased as "... cannot recommend X highly enough"
MiguelX413 147 days ago [-]
I disagree.
doktrin 147 days ago [-]
Not every opinion is equally valid
boxed 148 days ago [-]
My takeaway from this is that any positive thing written in a letter of recommendation can be read as sarcasm by an English speaker.
I think there's some deeper issue with the language/culture here.
doktrin 148 days ago [-]
Every language has turns of phrase that are not necessarily intuitive to non-native speakers.
FabHK 148 days ago [-]
I think the line is
> you would be lucky to get this employee to work for you!
"In politics, a dog whistle is the use of coded or suggestive language..."
CRConrad 148 days ago [-]
You skipped over the fact that "bless their heart" itself is (or at least used to be, before it became too well-known to really be a “secret” any more) precisely such a secret code. (Like, probably, most “HR talk”.)
lisper 149 days ago [-]
> One part of this code is that a missing period in the last sentence means "please ignore everything said here, this guy is horrible".
Gee, what could possibly go wrong?
mwigdahl 149 days ago [-]
Did you check your performance reviews to see if they were missing periods also?
pests 149 days ago [-]
Why would they need to keep it secret if its a review and internal-only?
kazinator 149 days ago [-]
Why would a cron job that sends e-mails need to implement its own SMTP client???
You just use the mail program from mailutils or whatever.
Just from a point of view of deliverability, developing bare bones SMTP interaction over a socket is a nonstarter. You can't just connect to random mail exchange hosts directly and send mail these days. A solution has to be capable of connecting to a specific SMTP forwarding host (e.g. provided by your ISP). For that, you need to implement connections over TLS, with authentication and all.
Also, a slightly ironic thing is that cron already knows how to send mail. The output of a cron job is mailed to the owner. Some crons let that mail address be overridden with a MAILTO variable in the crontab or some such thing.
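For example, in Vixie-cron-style crontabs (the address and script path here are illustrative):

```
MAILTO=ops@example.com
0 2 * * * /usr/local/bin/nightly-report.sh
```

Anything the job writes to stdout/stderr then gets mailed to that address by cron itself, via the local MTA.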
partdavid 149 days ago [-]
There's actually a lot of things that embed a SMTP client for sending mail that should use a host MTA. The reason is that the user who actually wants to use "the thing" is often just about able to enter an SMTP server, but there's no way "the thing" can trust that a properly-configured sender MTA like sendmail is configured. Companies have huge fleets of servers that can't send system mail, and it's often not really under the control of the author of your frobulator program or the user, either. Mail itself is a specialist configuration, often requiring prerequisites in DNS, crypto stuff, policies, etc. It's "too much" for just being able to have your wallet or whatever send you mail; "contact your system administrator" is just often not a realistic option, in both large and small scales.
It's not a good reason, no--definitely not--but it's a real reason.
leni536 149 days ago [-]
Use an SMTP library then.
partdavid 149 days ago [-]
Agreed, for sure.
kazinator 149 days ago [-]
The SMTP client needs to be configured. E.g. if there is a particular SMTP forwarding host that must be used, you have to somehow get the "mail" utility to use it, and you must somehow get any given SMTP library to use it as well.
bastawhiz 149 days ago [-]
There's no universe where I'd trust that there's already some binary on the system to send mail for me, but I definitely wouldn't roll my own. For any reasonable language, there's a multitude of SMTP client libraries available if there's not already one in the stdlib.
erhaetherth 149 days ago [-]
I'd give up instantly and outsource it to a SaaS. I'm no expert on email, but I know enough to know it's a PITA and managing delivery is a thing: you have to make sure your DNS is configured right or the receiver will just reject it, there are reputations to consider, you probably want to throttle how fast you send... ugh.
kazinator 149 days ago [-]
They are relying on a cron program being there, so ...
You can require it and it gets provisioned.
ummonk 149 days ago [-]
Just another example of Zawinski's law, no?
onlyrealcuzzo 149 days ago [-]
> Zawinski's Law captures common market pressure on software solutions, stating that “every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.”
sjf 149 days ago [-]
The modern version would seem to be every platform expands until it includes instant messaging.
easyThrowaway 149 days ago [-]
I believe that's outdated already, the next...uh, expansion layer would be some sort of generative AI features.
In other words, every other platform expands until it can summarize emails.
tantalor 149 days ago [-]
IM is just fancy email.
teddyh 149 days ago [-]
I see two huge bad habits here. The first is the obvious one, as pointed out by many commenters here: Don’t implement standards haphazardly, if you even should do so yourself. Either give the implementations the necessary care and attention, or use a pre-made library.
But the other thing is: Don’t vendor your dependencies. Those libraries you use need to be updated regularly and timely, and absolutely not “only as necessary”. If updates lag behind or are avoided entirely, bugs like this can be huge problems even when the upstream code has been fixed, for people who thought that they should update only when they, themselves, see a problem or need.
otterley 149 days ago [-]
> Don’t vendor your dependencies.
The alternative seems worse: your own application's stability is now at risk against upstream changes that could break your code. Sure, you might not get a fix immediately, but I'd rather know I'm making a change because I need a fix than introducing instability and additional risk that I don't want to subject myself to. "If it ain't broke, don't fix it."
skybrian 149 days ago [-]
I like Go's approach, which ensures that all upgrades happen when you choose to do some upgrades.
(For apps, a lock file will do it too.)
cpeterso 149 days ago [-]
Taking dependency updating to the extreme, some Google projects adopt a "live at head" philosophy, where their projects depend on their dependencies' top-of-tree main commits, not a release branch:
You can choose to either live at the slightly-bleeding edge (as determined by “stable” releases, etc), or to live on the edge of end-of-life, as discussed here: <https://news.ycombinator.com/item?id=21785399>
(And surely you should have tests to verify all your own functionality after upgrading a dependency?)
otterley 147 days ago [-]
That "should" is load-bearing. Unfortunately, thorough automated testing isn't frequently done by application teams, and even less automated testing of dependencies is done by them. Most developers assume, for better or worse, that testing of dependencies is the responsibility of their respective authors.
the_real_tjaart 149 days ago [-]
> Don’t implement standards haphazardly, if you even should do so yourself. Either give the implementations the necessary care and attention, or use a pre-made library.
I agree 100%.
ryandrake 149 days ago [-]
Famous last words at >50% of the companies I've worked: "Just implement as much of standard X as you need to ship an MVP of feature Y!"
rrr_oh_man 149 days ago [-]
Can you share a story?
maxbond 149 days ago [-]
I had a very difficult to track down bug that ended up losing us a very big fish client, that came down to parsing a binary file with an ad-hoc parser that looked for the index of a header. They didn't realize that after the header was some metadata, so a small amount of metadata was interpreted as data. I fixed it by writing a proper ad-hoc parser that actually worked on a header-by-header level. But the damage was done and we had looked like buffoons to the client.
Was very hard for the team. There was a shouting match over it, some hard feelings. The code was written in the spirit GP alludes to by an enthusiastic executive who wanted to help lighten the load. I should've rejected the PR but was intimidated to reject the exec's code (not an engineering reason for an engineering decision!). The exec was a good data scientist but not as strong a coder as me, and parsing binary files is one of my specialties.
Friends don't let friends parse using "indexOf()".
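The failure mode is easy to reproduce with a toy format (the layout here is entirely hypothetical: a 3-byte magic, a 2-byte metadata length, then metadata, then payload):

```python
import struct

# Hypothetical file: magic "HDR", big-endian u16 metadata length, metadata, payload
blob = b"HDR" + struct.pack(">H", 4) + b"META" + b"PAYLOAD"

# indexOf-style parsing: treat everything after the magic as payload.
# The 2-byte length field and the metadata silently leak into the "data".
naive = blob[blob.index(b"HDR") + 3:]

# Structured parsing: honor the declared metadata length before reading data.
(meta_len,) = struct.unpack_from(">H", blob, 3)
payload = blob[3 + 2 + meta_len:]

print(naive)    # b'\x00\x04METAPAYLOAD' -- metadata misread as data
print(payload)  # b'PAYLOAD'
```

With real formats the corruption is subtler than this, which is exactly why it can survive review and testing for so long.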
ryandrake 149 days ago [-]
As the sibling comment already mentioned, file format reading is a big one. Oh, a customer needs us to ingest CSV data. Let's just roll our own CSV parser, the file format is simple, right? What could go wrong? Same for graphics formats. We need to read PNG files, so let's just figure out the format and implement just enough to read the handful of files we have on hand. Even though libpng exists.
Another, more domain specific one: We have to talk to a GPS receiver that connects via RS-232 and outputs NMEA formatted data. This device, like all such devices, outputs a small subset of the standard. So parse just enough of it for that one device we have to get working, and ship it. Then someone attaches a different device that outputs a different subset of the standard and the software fails (sometimes gracefully, sometimes not).
kruador 149 days ago [-]
That reminds me of the Windows Mobile 6.5 device with built-in GPS receiver, where on the 31st day of the month, the API would actually output the date as the 0th day of the following month. This wouldn't have been a problem, except that we were using that to set the system clock. The devices would reset their clock if allowed to go completely flat, but they didn't have reliable NTP support either. I believe the OS would just ignore our attempt to set the date to an invalid value, leaving it at the system default date.
It was a struggle to get the vendor to even acknowledge the problem, since it would only happen on 7 days out of the year, usually about 60 days apart. (July/August and December/January are the only pairs of consecutive months that both have 31 days.) It did eventually get fixed.
I seem to recall writing a workaround, but then it being stalled by our customer's Change Control Board.
userbinator 149 days ago [-]
I suspect a lot of people are no longer being taught these fundamental protocols by manual interaction with a terminal, which is what SMTP seems to have been originally designed for. As someone who actually used it that way for a nontrivial amount of time, I have the "single dot on a line" message terminator permanently etched into my memory.
Relatedly, escaping somehow seems to be a foreign concept to a lot of programmers. One large group would never look at the above situation and ask themselves "but what if I want to send an email with a line containing a single dot?", while another large group finds it perfectly logical and easy to understand.
edanm 149 days ago [-]
> I suspect a lot of people are no longer being taught these fundamental protocols by manual interaction with a terminal,
I'd be surprised if any significant portion of software developers learned anything like that in the last 30 years, at least.
Maxion 149 days ago [-]
I mean, if we did that for everything in webdev, it'd take 30 years of training before you'd get to do your first PR!
This reminds me of an experience debugging a network protocol implementation - specifically AppleTalk NBP, for other ancient people. I had coded everything, but my packets were rejected (aka silently dropped) when real (aka Apple implementation) packets were not. I had a copy of the good and bad packets on the screen of my computer and had gone over them byte by byte to find the problem. And there was none. From start to finish they were exactly the same, with correct checksums etc. It was time to go home and I decided just to print the stupid things to look at later.
As soon as I printed them, the error was clear. My version ran to two pages and the good implementation one page. I had not been careful to clear the buffer before sending the data (mbufs don't you know).
This still cracks me up.
the_real_tjaart 149 days ago [-]
Thanks for sharing!
kazinator 149 days ago [-]
But the line "We are happy to welcome you to our family." is not anywhere near the line limit. There is something else going on here, like perhaps the whole thing actually being an HTML MIME attachment? In which case it is like
... lots of text ... <br>We are happy to welcome you to our family.<br>
or whatever. But if you blindly split HTML into lines, it will break tags.
the_real_tjaart 149 days ago [-]
This was just an example, sorry for not providing a real example that had the exact character count.
canucker2016 149 days ago [-]
and probably the vast majority of the people receiving those broken HTML emails will realize there's something wrong with the email's formatting, chalk the problem up to a company that can't be bothered to write a legible email, lower the company's competence rating, and go on to the next email.
the company is just blissfully unaware of its minor problem.
dylan604 149 days ago [-]
rarely do I hold the sins of an incompetent marketing department against the company itself. otherwise, there'd be no company left deemed as competent because all marketing departments are incompetent.
Aloisius 149 days ago [-]
Quoted-printable encoding with soft breaks would allow blind splitting of lines without breaking HTML tags.
While quoted-printable is supposed to have a max line limit of 78 characters including CRLF rather than 1000, email clients tend to be permissive.
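Python's stdlib quopri module is a handy way to see the idea: long lines get wrapped with `=` soft breaks that decoders remove, so the split points never survive into the rendered HTML (a minimal sketch, not a full MIME pipeline):

```python
import quopri

# One long "line" of HTML, with no safe place to insert a hard break
html = b"<p>" + b"x" * 200 + b"</p>"

encoded = quopri.encodestring(html)
print(encoded.decode())  # wrapped into short lines, each split ending in '='

# Decoding removes the soft breaks, restoring the original single line
assert quopri.decodestring(encoded) == html
```

The SMTP line-length limit is satisfied on the wire, while the decoded body is byte-for-byte what the sender composed.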
jeffbee 149 days ago [-]
If a person has never heard of dot stuffing they're never going to believe what other horrors lie within the email space. Header folding, quoting in the local part, ipv6 literals, etc.
blueflow 149 days ago [-]
What's the issue with IPv6 literals?
jeffbee 149 days ago [-]
Everything that is normally annoying with IPv6 literals, plus the fact that it's encapsulated as [IPV6:<the address>] and the address could take a dumb form such as ::1.2.3.4. But I mentioned it because it might initially seem to a neophyte in this field that the thing to the right of the rightmost @ sign is a name you can pass to your resolver. It might not be.
kazinator 149 days ago [-]
The IPv6 notation claims ":" for separating groups of digits. But that's already an established notation for port numbers: 10.1.2.3:456. Oops!
The square brackets allow us to stick a port number on it: [ffff::0123:4567]:6301
jeffbee 149 days ago [-]
Yes, but the RFCs permit 6v4 addresses such as ff:ee::aa:1.2.3.4
jcranmer 149 days ago [-]
Note that IPv4 address literals are encoded in square brackets (e.g., user@[127.0.0.1]), so it's really all IP address literals that could be a problem rather than specifically IPv6 literals.
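A sketch of why naive resolution fails: per the SMTP grammar, the domain part may be a bracketed address literal rather than a hostname (the helper name below is mine):

```python
def classify_domain(addr: str):
    """Split off the domain after the rightmost @ and spot address literals."""
    domain = addr.rsplit("@", 1)[1]
    if domain.startswith("[") and domain.endswith("]"):
        # Address literal: not something you can hand to a DNS resolver
        return ("literal", domain[1:-1])
    return ("hostname", domain)

print(classify_domain("user@example.com"))       # ('hostname', 'example.com')
print(classify_domain("user@[127.0.0.1]"))       # ('literal', '127.0.0.1')
print(classify_domain("user@[IPv6:::1.2.3.4]"))  # ('literal', 'IPv6:::1.2.3.4')
```

A real parser also has to cope with quoting in the local part, so even "find the rightmost @" is only an approximation.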
irrational 149 days ago [-]
> Seeing that the SMTP client code was borrowed from a previous project we thought it good to let our other teams know about this bug in case they needed to patch it as well. They thanked us and we called it a day.
As soon as I read the above, I knew the below would be the result.
> It seems one of our other teams haven't gotten around to patching this bug in their code.
carimura 149 days ago [-]
Good story. Reading this reminds me of all the "curious cases of....." that I've solved (or in some cases not solved) over my career and how that feeling of triumph is so deeply tied to why I got into computers in the first place. The pure joy of unraveling the mysteries of engineering... like forgetting a semi-colon somewhere in a 50k LOC Perl backend.
the_real_tjaart 149 days ago [-]
> how that feeling of triumph is so deeply tied to why I got into computers in the first place
I share your sentiment, thank you for reading.
eschneider 149 days ago [-]
I guessed the cause of the problem (well, "leading period") from the description, but that's because I've experienced a lot of pain in my life...
linsomniac 149 days ago [-]
Show of hands: Who knew exactly where this was going as soon as "SMTP" was mentioned?
barryrandall 149 days ago [-]
I first encountered this when I was writing a WAP[1] mail client in ColdFusion[2]. I must have read the entire SMTP spec 20 times before I spotted my mistake.
As soon as "missing period" was mentioned I suspected it would be about SMTP.
jcpham2 149 days ago [-]
As soon as I started reading, one of the first things that came to mind was the termination period in the SMTP server spec. If you've ever had to troubleshoot an SMTP transport issue, typing SMTP commands by hand is a familiar thing. Cool read.
jayceedenton 149 days ago [-]
> A portion of this code implemented a SMTP client.
What the...
davidmurdoch 149 days ago [-]
Loved reading this. This is now one of my new favorite bug hunt stories!
rrr_oh_man 149 days ago [-]
I found it quite mild compared to the 500 mile email
You disagree that I loved the article? Or that it's one of my new favorites?
(I'm only messing with you as you criticized the author's use of language then replied to me with imprecision.)
tuck1s 149 days ago [-]
The need for dot-stuffing is a side-effect of email's use of the period as part of the termination sequence CR LF . CR LF.
Mishandling of that sequence has led to other recently discussed bugs. https://smtpsmuggling.com/
layer8 149 days ago [-]
SMTP = Sporadically Missing Trailing Period
;-)
256_ 149 days ago [-]
/SMTP/SMLP/
SMLP = Sporadically Missing Leading Period
(-;
readyman 149 days ago [-]
The title sounds like a pregnancy scare
blueflow 149 days ago [-]
You can avoid this mess altogether by using the quoted-printable content encoding when generating emails.
unilynx 149 days ago [-]
How so? Quoted printable doesn't require the dot to be encoded.
Maybe you're thinking of base64 encoding?
blueflow 149 days ago [-]
It doesn't require it, but it comfortably allows you to.
unilynx 149 days ago [-]
Agreed.
But this is SMTP; I have no doubt there are gateways out there that will re-encode the mail and put the dot back in the wrong place, e.g. in the name of wrapping all URLs behind a phishing-warning page.
opello 149 days ago [-]
I think the point here is that any subsequent implementation handling the message may well not be susceptible to the problem thus avoiding the erroneous behavior.
int_19h 149 days ago [-]
It can handle it fine in that sense, but still unquote it before sending it over to the next node. Which might have the bug.
camel_gopher 149 days ago [-]
I have two kids. The case is not curious at all.
pyuser583 149 days ago [-]
For female readers, the title might have a very different meaning.
bhaney 149 days ago [-]
This made me realize I have the opposite problem. Now I have to go update the toy SMTP server that I ended up implementing in a perl script so it handles SMTP clients double-dotting a line.
256_ 149 days ago [-]
Sometimes I feel like I'm wasting my time by obsessively reading RFCs and specifications for the things I use. This made me feel better about that. And also much more smug.
Other than the obvious moral that protocols should be implemented properly, the moral of the story is that all abstractions are leaky, and it will always be useful to understand the lower levels.
renewiltord 149 days ago [-]
Suspected it was the 990 char limit per line but this is another one. I assume this is an old system? Despite many claims about “best practices”, there were definitely past platforms where these didn’t exist and a minimal implementation was safer just because you knew the scope of error.
Of course if it was modern, different question.
sethammons 149 days ago [-]
I recognized the problem on sight; we solved this exact issue, but since it was MTA software we knew about periods being special. Unfortunately, people are often solving problems that many others have already solved. Maybe AI will allow the lines to be connected, or solve for known edge cases like a dot in the SMTP data.
genewitch 149 days ago [-]
you know, now that you mention this, i think i have heard of this exact problem before, perhaps from the late 90s; the "full stop on a line by itself" itched a bit, probably for that reason as well.
croes 149 days ago [-]
>the first character is a period and there are other characters on the line, the first character is deleted.
Why is it implemented that way?
If a single period means end of mail then more than a period means it's mail data.
Why delete the period in the first place?
Couldn't they store one byte to check the next?
wruza 149 days ago [-]
Because you have to send a line consisting of a single period somehow. On the wire vs. decoded:
.      ends body
..     one period  <--
...    two periods
.A     A
..A    .A
A      A
Anyway, that's stupid and only helps if you compose your email right in a TCP session.
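A minimal sketch of that transparency rule (RFC 5321 section 4.5.2) in both directions; the bug in the article is a sender that skips the first function (function names are mine):

```python
def dot_stuff(body: str) -> str:
    """Sender side: prefix an extra '.' to any line that begins with one."""
    return "\r\n".join(
        "." + line if line.startswith(".") else line
        for line in body.split("\r\n")
    )

def dot_unstuff(wire: str) -> str:
    """Receiver side: strip one leading '.' from any line that has one."""
    return "\r\n".join(
        line[1:] if line.startswith(".") else line
        for line in wire.split("\r\n")
    )

body = "total: $27\r\n.00\r\n.\r\n..done"
assert dot_unstuff(dot_stuff(body)) == body  # round-trips cleanly
assert dot_unstuff(body) != body             # unstuffing without stuffing eats dots
```

Note the lone "." line in the sample body: stuffing turns it into "..", which is also what keeps it from terminating the DATA phase early.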
jiveturkey 149 days ago [-]
> This meant some customers received emails informing them their new premium was now $2700 instead of $27.00.
there's a secondary issue here, why in the world would you auto split a monetary value across a numeric decimal indicator? why would you split lines at all for this use case?
kr0bat 149 days ago [-]
As mentioned, the SMTP protocol only allows for 1000 bytes of data per line. The author also mentions that they are sending html emails, which ignore line breaks.
So a message intended to be sent by an SMTP client:
DATA
Hello customer,<br>[978 characters] 27.00
Was erroneously formatted into:
DATA
Hello customer,<br>[978 characters] 27
.00
.
The period after 27 will be removed. And this is how the html will be rendered.
Hello customer,
[Lots of text] 2700
jiveturkey 149 days ago [-]
but html does not ignore line breaks. when part of body text, a run of whitespace (including newline) becomes a single whitespace when rendered.
so splitting 27.00 on the . becomes 27 00, because the CRLF is significant to the client.
you would want to split at whitespace, not at any other character -- unless you had a 999+ character run of non-whitespace, of course.
perhaps the author didn't know, didn't realize, or thought it insignificant to his point that in addition there was a quoted-printable encoding, in which case i believe the trailing/mandatory CRLF can be made non-significant for client rendering. personally i still would have split on actual whitespace. (well, i wouldn't have written an smtp client in the first place.)
f33d5173 149 days ago [-]
Hmmmm, html doesn't ignore line breaks, it just treats them as any other whitespace, where a consecutive sequence is folded into a single space. 27 00 would still be quite confusing, of course
tingletech 149 days ago [-]
I think the space character in the comment above is representing a new line on the wire.
kr0bat 149 days ago [-]
Yep, I didn't add enough newlines. Fixed
pwg 149 days ago [-]
From TFA it is stated that they were doing the split of lines because of the "1000 octet" maximum line length requirement of the SMTP protocol.
And, they also state that the period disappeared because it was placed at the start of the next line when the split occurred.
From which one can deduce that they were doing the most basic "split" possible, splitting at the exact 1000 octet point, i.e. something like:
if (length(line) > 1000) then
    line1 = string_range(line, 0, 999)
    line2 = string_range(line, 1000, end)
fi
And if the period in 27.00 ended up exactly at offset 1000 in "line" then it got 'split' into line 2 as the first character of line2.
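A sketch of a less naive splitter: break at the last whitespace within the limit so tokens like "27.00" stay intact, hard-splitting only as a last resort (a limit of 998 leaves room for the CRLF; dot-stuffing is still needed afterwards):

```python
def split_long_line(line: str, limit: int = 998):
    """Split at whitespace where possible, hard-split only as a last resort."""
    chunks = []
    while len(line) > limit:
        cut = line.rfind(" ", 1, limit + 1)
        if cut == -1:
            cut = limit  # no whitespace available: hard split
        chunks.append(line[:cut])
        line = line[cut:].lstrip(" ")
    chunks.append(line)
    return chunks

long_line = "word " * 250 + "$27.00"  # 1256 characters, too long for one SMTP line
chunks = split_long_line(long_line)
assert all(len(c) <= 998 for c in chunks)
assert " ".join(chunks).split() == long_line.split()  # no token was cut in half
```

Since the split point is a space, the inserted CRLF renders as the whitespace that was already there, instead of landing in the middle of "$27.00".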
Aloisius 149 days ago [-]
It would split at 998 characters, since each line must end with CRLF.
HeatrayEnjoyer 149 days ago [-]
I do not understand what you are asking. "$27.00" is a standard expense format.
hunter2_ 149 days ago [-]
I think GP was using the phrase "auto split ... across [character]" in reference to characters that can cause line breaks for "word wrap" purposes in page layouts. For example, a normal space is a character that causes line breaks, but a non-breaking space (nbsp) is not. A hyphen, a tab, a zero-width space (zwsp), and several other characters are also generally used for line breaking. I think GP is saying that the decimal indicator -- the fourth character of "$27.00" -- should not be used for breaking. I think GP assumes that the problematic line breaking in TFA is akin to the type of "word wrap" page layout logic I've just explained; in reality the line breaking in TFA has nothing to do with that, it's simply breaking at 1000 octets (probably for reasons of buffer size, certainly not page layout) regardless of what character is in that position, so this whole thing is moot. GP needs to RTFA!
kccqzy 149 days ago [-]
From the article:
> The maximum total length of a text line including the <CRLF> is 1000
Aloisius 149 days ago [-]
If the period was the 999th character in the line, it would split it to the next line since the maximum line length in SMTP is 1000 characters including CRLF.
867-5309 149 days ago [-]
>Every time an employee of our client needed to send out a document via email or needed to print a document that needed to be sent out by the postal services to the customer the employee would have to replace all the placeholders within the document
ironically titled..
readthenotes1 149 days ago [-]
All this about a missing period and nothing about the consistently missing commas. In fact, I would say leaving off punctuation on one-line paragraphs would be more consistent.
davidwritesbugs 149 days ago [-]
As I'm implementing an NNTP server based on RFC specs, I knew instantly what was happening here without reading TFA. Dot stuffing, yea, 80s protocols baby.
marcosdumay 149 days ago [-]
So... an SMTP client misses basic SMTP functionality that fits in the 20-line summary.
Good luck having it handle any of the SMTP craziness that isn't on the short introduction to the protocol.
banish-m4 149 days ago [-]
When you roll your own library duplicating something that likely already exists, you own it all. And that doesn't include all of the vendor-specific nonconforming edge-cases.
marcosdumay 149 days ago [-]
I guess when you roll your own library, you get an obligation to read the introduction of the 10 pages long standard.
Or the usage example.
m3kw9 149 days ago [-]
For a second I thought this was the health forum
mnw21cam 149 days ago [-]
Mmm. The story is about a missing full stop.
wkat4242 149 days ago [-]
Yeah I was kinda expecting a picture of a pregnancy test :)
ijuelz 149 days ago [-]
The night is dark and full of errors.
EVa5I7bHFq9mnYK 149 days ago [-]
Given that the words Night and Club appear in the picture, the title looked mildly intriguing ..
bruce343434 149 days ago [-]
As someone who configures email servers for a living (among other things): email needs to be replaced, frankly. This is a stupid protocol with even stupider file formats. What is the reason for such a hard coded line limit? It's just a stream of bytes...
Not to mention all the weird bandaids on top of bandaids to try to get sender verification and tamper proof emails working. That alongside the complete lack of end to end encryption.
It's just an incredibly unpleasant tech stack from top to bottom, through and through. The amount of moving parts/pieces of running software needing to cooperate just right to even function as a simple outgoing-only mail server is too damn high.
vardump 149 days ago [-]
You’re not wrong, but how would you suggest to accomplish this? Replacing whole email infrastructure seems nearly impossible.
bruce343434 149 days ago [-]
My hope is that people move away from email as they get tired by it, and towards more convenient instant messaging platforms which pretty much get the feature set right (attachments, encryption, blocking/spam provisions, provenance, "stories" and special group chat modes for broadcasting) and for which open protocols arise ((are being forced by the eu)): https://www.theverge.com/2024/2/6/24063705/whatsapp-interope...
In the future, I hope that email will be like fax, or at least treated like http in comparison to instant messaging (https).
Learner100 149 days ago [-]
- The SMTP client they implemented could insert a newline such that a line was comprised of only a single period.
- The SMTP client spec says that an additional period would be added here.
- The SMTP server spec says that it would remove this additional period, bringing us back to one period.
I don’t get how this led to there being no period at all. Am I missing something?
roer 149 days ago [-]
The spec says that an additional period should be added on the client, but their implementation did not do that.
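The interaction, in miniature (the server behavior is per RFC 5321; the "client" here has the article's bug and skips stuffing; names are mine):

```python
def server_unstuff(lines):
    # A conforming server strips one leading dot from every line of DATA
    return [l[1:] if l.startswith(".") else l for l in lines]

# Buggy client: split a long line at the 1000-octet boundary, no dot-stuffing
sent = ["your new premium is now $27", ".00"]
received = server_unstuff(sent)
print(received)  # ['your new premium is now $27', '00'] -- the decimal point is gone
```

A correct client would have sent "..00", which the server would have restored to ".00".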
Learner100 149 days ago [-]
Ah, thanks!
edweis 149 days ago [-]
I like the simplicity of the email membership price increase.
bufordtwain 149 days ago [-]
What about the missing comma at the end of the first line? :)
the_real_tjaart 149 days ago [-]
Whoops, :)
cat_plus_plus 148 days ago [-]
Congratulations on the upcoming new addition to your family!
xmjw 149 days ago [-]
Holy shit. This is one of the 2 bugs in my career I never solved. (That I knew about, and lost sleep over, etc…)
BrandonMarc 148 days ago [-]
This reads like a story from the daily wtf
nytesky 149 days ago [-]
Wasn’t this an episode of Silicon Valley?
billy99k 149 days ago [-]
[flagged]
vsuperpower2020 149 days ago [-]
I'm sorry to hear that but this is not really the time or the place for personal stories.
[1] https://genshi.edgewall.org/wiki/Documentation/xml-templates...
[2] https://zope.readthedocs.io/en/latest/zopebook/AppendixC.htm...
[3] https://juniper.github.io/libslax/slax-manual.html
This is not universally true.
JavaScript has an amazing feature called tagged template literals which let you tag a string with interpolations with a function that handles the literal and interpolation parts separately. This lets the tag function handle the literals as trusted developer written HTML or SQL, and the interpolations as untrusted user-provided values.
Lit's HTML template system[1] uses this to basically eliminate XSS (there are some HTML features like "javascript: " attributes that require special handling).
If `name` is a user-provided string, it can never insert a <script> or <img> tag, etc., because it's escaped. There are similar tags for SQL, GraphQL, etc. Java added a similar String Templates feature in 21.
[1]: https://lit.dev/docs/templates/overview/
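Tagged templates are a JavaScript feature, but the mechanics are easy to mimic in other languages: literal fragments are developer-authored and trusted, while interpolated values are escaped. A toy Python analogue (not Lit's actual implementation):

```python
from html import escape

def html_tag(literals, *values):
    """Toy analogue of a JS tagged template: literals pass through, values are escaped."""
    out = [literals[0]]
    for value, literal in zip(values, literals[1:]):
        out.append(escape(str(value)))
        out.append(literal)
    return "".join(out)

name = "<script>alert(1)</script>"
rendered = html_tag(["<p>Hello, ", "!</p>"], name)
print(rendered)  # <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>
```

The key property is that the trusted/untrusted split is made by the language syntax, not by the developer remembering to call an escape function.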
Be careful with that "never". A curious and persistent person might discover a bug in the implementation, leading to something like the Log4Shell issue.
But it'd be similar with other template systems. If the interpolation should allow any string, there's really no validation to be done.
The bulletproof way of doing this is working at the level of abstraction of your target language. With HTML, that would be a tree structure. For example, if your HTML generation looks more like:
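(a stand-in sketch using Python's stdlib ElementTree, rather than whatever API the author had in mind)

```python
import xml.etree.ElementTree as ET

p = ET.Element("p")
p.text = "Hello, "
b = ET.SubElement(p, "b")
name = "<script>alert(1)</script>"  # untrusted user value
b.text = name                       # set as a text node, never parsed as markup

# Serialization escapes the text node; the value cannot become structure
print(ET.tostring(p, encoding="unicode"))
# <p>Hello, <b>&lt;script&gt;alert(1)&lt;/script&gt;</b></p>
```
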
and that is passed to code that actually builds up the tree and then serializes it down to HTML, then there is no way `name` could ever break the structure or inject anything.
--
[0] - I skimmed the docs of Lit, it seems there are restrictions on where interpolation can be placed, but I don't think they're actually building up the tree expressed by the static parts.
Lit is not working at the serialized level, at all. It parses the templates independently of any values, and the values are inserted into the already-parsed tree structure. There is literally no way for values to be parsed as HTML.
SMTP is an example of an unnecessarily complex design, and the implementation bugs reflect it. SMTP shouldn't be hard for someone to correctly implement by themselves (even though I agree that people shouldn't be re-inventing the wheel).
If it wasn’t a period, it would be something else & you’d have to handle that instead.
That's an incredibly reductionistic view of the world that's utterly useless for anything (including actually engineering systems) except pedantry. It's obvious that the level at which you include control information is meaningful and significantly affects the design of the protocol, as we see in the submission. Directly embedding the control information into the message body does not lead to a design that is easy to implement.
> If it wasn’t a period, it would be something else & you’d have to handle that instead.
Yes, and there are many other design choices that'd be significantly easier to handle.
It's very reductionistic, because it intentionally ignores meaningful detail, and it's pedantic because it's making a meaningless distinction.
> It's a reminder that there is no magic.
This is irrelevant. Nobody is claiming that there's any magic. I'm pointing out the true fact that details about the abstraction layers matter.
In this case, the abstraction layer was poorly-designed.
Good abstraction layer: length prefix, or JSON encoding.
Bad abstraction layer: "the body of the email is mostly plain text, except when there's a line that only contains a single period".
There are very, very few problems to which the latter is a good solution. It is a bad engineering decision, and it also obfuscates the fact that there even is an abstraction layer unless you carefully read the spec.
-------------
In fact, the underlying problem goes deeper than that - the design of SMTP is intrinsically flawed because it's a text-based ad-hoc protocol that has in-band signaling.
There are very few good reasons to use a text-based data interchange format. One of them is to make the format self-documenting, such that people can easily read and write it without consulting the spec.
If the spec is complex enough that you get these ridiculous footguns, then it shouldn't be text-based in the first place. Instead, it should be binary - then you have to either read the spec or use someone else's implementation.
Failing that, use a standardized structured format like XML or JSON.
But there's no excuse for the brain-dead approach that SMTP took. They didn't even use length prefixing.
MTP had one concern, which was to get mail over to a host that stood a better chance of delivering it, back when the total host pool was maybe a hundred nodes.
I speculate that Postel and Sluizer were aware of alternatives and rejected them in favor of things that were easily implemented on highly diverse, low powered hardware. Not everyone had IBM-grade budgets after all.
Alternative implementations of mail that did follow the kinds of precepts that you suggest existed at one time. X.400 is the obvious example. If I recall correctly, it did have rigorous protocol spec definitions, message length tags for every entity sent on the wire, bounds and limits on each PDU, the whole hog. It was also crushed by SMTP, and this was in the era when you needed to understand sendmail and its notoriously arcane config to do anything. So sometimes the technically worse solution just wins, and we are stuck with it.
JSON needs to escape backslashes, SMTP needs to escape a newline followed by a period. If you've already accepted doing escaping, what's the issue?
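Both layers are doing the same kind of job. A quick sketch with Python's stdlib json (the SMTP side here is a hand-rolled illustration of dot-stuffing, not a library call):

```python
import json

# JSON: backslashes (and quotes) get escaped by the serializer
assert json.dumps("C:\\temp") == '"C:\\\\temp"'

# SMTP: a line starting with a period gets an extra period prepended
# by the client; the receiving server strips it back off
def dot_stuff(line: str) -> str:
    return "." + line if line.startswith(".") else line

assert dot_stuff(".00") == "..00"
assert dot_stuff("27.00") == "27.00"  # dots elsewhere are untouched
```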
> Bad abstraction layer: (...)
In this context, it shouldn't matter. Sure, "mostly plaintext except some characters in some special positions..." is considered bad in modern engineering practice; however, it's not fundamentally different or more difficult than printf and family. You wouldn't start calling printf without at least skimming the docs for the format string language, would you?
> It is a bad engineering decision, and it also obfuscates the fact that there even is an abstraction layer unless you carefully read the spec.
There's the rub: you should have read the spec. You should always read the spec, at least if you're doing something serious like production-grade software. With a binary or JSON-based protocol, you wouldn't look at a few messages and assume you understand the encoding. I suppose we can blame SMTP for a design that didn't account for human nature: it looks simple enough to fool people into thinking they don't need to read the manual.
> There are very few good reasons to use a text-based data interchange format.
If you mean text without obvious and well-defined structure, then I completely agree.
> One of them is to make the format self-documenting, such that people can easily read and write it without consulting the spec.
"Self-documenting" is IMHO a fundamentally flawed idea, and expecting people to read and write code/markup without consulting the spec is a fool's errand.
> it should be binary - then you have to either read the spec or use someone else's implementation.
That's mitigating (and promoting) bad engineering practice with protocol design; see above. I'm not a fan of this, nor the more general attitude of making tools "intuitive". I'd rather promote the practice of reading the goddamn manual.
> But there's no excuse for the brain-dead approach that SMTP took. They didn't even use length prefixing,
The protocol predates both JSON and XML by decades. It was created in times when C was roaming the world; length prefixing became unpopular then, and only recently seems to be back en vogue.
Exactly! This is an even better phrasing of my point. Compare a length-prefixed protocol:
"The first two bytes represent the string length, in big-endian, followed by that many bytes representing the string text."
and an in-band signalling protocol:
"The string is ended by a period and a newline."
In the second one, you're indicating the end of the string from within the string. It looks simpler, but that's where accidents happen. Now you have to guarantee that the text never contains that control sequence, and you need an escaping method to represent the control sequence as part of the text.
You always know what the next byte means because either you did a prefixed length, your protocol has stringent escaping rules, or you chose an obvious and consistent terminator like null.
The terms hark back to the days of circuit-switched networks, but now that we have largely transitioned to packets, bands are an artificial construct on top of packets, and applying the term isn't very clear-cut.
The main property of in-band data in the circuit-switched days was that you could inject commands into your data stream. If we apply that criterion to a modern protocol, then even if you mix metadata and data in the same "band", as long as your data can never be interpreted as commands, "out of band" makes an apt description.
See https://en.m.wikipedia.org/wiki/Out-of-band_data
In this case, somewhere the protocol abstraction layer got broken, and the message text ended up being treated as already serialized. It's not a problem with the protocol per se, but with bad implementation of its API (or no implementation at all, just printf-ing into the wire format).
When we’re talking about whether someone can inject data into the link, we’re talking about the end user and not the software. If we’re talking protocol design, then you wouldn’t want regular data to be able to inject commands by simply existing.
It shouldn't, unless you're bypassing the actual protocol serialization layer (or hitting a bug in the implementation). Which is what's the case here. Protocol design can't address the case of users just writing out some bytes and declaring it's a valid protocol message.
I’m replying to a post where someone said most protocols have in-band signaling and therefore this problem is unavoidable.
I understand that constraint, and it seems reasonable - but in that case, why not use a length prefix? That should be even more efficient than having to scan for a line containing a single period and nothing else.
Hence such horrors as MIME delimiters:
We still have the mess that is the required and standardized behavior of HTML5 parsers faced with bad data. But for SMTP libraries, that's often part of the stdlib (Ruby, Python, PHP, ...).
And, even though you might not see a way to call into the unused code, an attacker might find a way (XZ Utils).
That's why it's a best practice to specify protocols at a very high level (e.g. using Cap'n Proto) instead of expecting every random sleep-deprived SDE2 to correctly implement a network exchange in terms of read() and write().
Engineers get better faster because they leverage better tools and build tools to overcome their own shortcomings and leverage their strengths, not by constantly being beat into shape by unforgiving systems.
To be fair, what you and OP said is not an uncommon mentality. It's even shared in a way by Torvalds:
> [easier to do development with a debugger] And quite frankly, I don't care. I don't think kernel development should be "easy". I do not condone single-stepping through code to find the bug. I do not think that extra visibility into the system is necessarily a good thing.
> Quite frankly, I'd rather weed out the people who don't start being careful early rather than late. That sounds callous, and by God, it _is_ callous. But it's not the kind of "if you can't stand the heat, get out of the kitchen" kind of remark that some people take it for. No, it's something much more deeper: I'd rather not work with people who aren't careful. It's darwinism in software development. It's a cold, callous argument that says that there are two kinds of people, and I'd rather not work with the second kind. Live with it.
He has similar views about unit tests btw.
I personally would prefer to work with people who are smart, understand systems, and have machines take care of subtle details, rather than needing to be 100% careful at all times. No one writing an SMTP parser is at Torvalds's level.
I'm not arguing that this excuses you from being careful or failing to understand things. I'm saying that defensively covering your flank against common classes of mistakes leads to better software than the alternative.
This is a point I agree with, and the fact that I see it mentioned so rarely (that standards are split across multiple RFCs) makes me suspect people don't mention it because they don't know: they never read the RFCs in the first place, and instead try to follow the implementation of some existing program.
This makes me wonder: How could the IETF's approach to standardisation be improved? I'm not sure how to fix this problem without overhauling everything.
In Germany, where I work, it is usual at the end of employment to ask for a letter of recommendation ("Zeugnis") that lists the tasks performed and how good the employee was. It is an important document, as it will typically be required when applying for jobs. Obviously, no employee would accept a document explicitly stating "this guy is a lazy bastard, do not hire him", so there is a "Zeugnissprache", a "secret code" to disguise this information as praise. One part of this code is that a missing period in the last sentence means "please ignore everything said here, this guy is horrible".
How do I know? I let a lawyer check my Zeugnis after my last employment, and (I assume out of lack of care, as all my performance reviews were positive) the last sentence was missing the period.
This legend comes from the fact that HR people cannot be too explicit about you having been a pain in the ass (you could probably sue if it's too transparent), so if they have nothing positive to say they will commend your punctuality or something equally mundane. It's not secret codes, it's like... "bless their heart", but in HR talk. Plausible deniability if you want to sue, I guess. "But it's a good thing, your honor! They were always on time!"
But in the specific german case, the code is not even that secret. This is a formal document with a very specific structure, and very standardized phrases. There is even specific software to generate the text out of performance ratings. Basically something like this:
- John was overall engaged: he is a lazy bastard
- John was engaged: he is OK
- John was very engaged: he is good
- John was always very and thoroughly engaged: he is very good
...sigh:
Secret codes as in "watermark-level omission of characters" are a myth. Lingo and jargon do however exist, and convey meaning in a particularly subtle way. They are shared and taught by culture, not by a secret handbook passed down from generation to generation. See also dogwhistling.
The goal is to protect the issuer, not to selflessly inform the recipient.
> You will be lucky to have this person work for you.
This isn't a veiled statement. It's outright dunking on the applicant.
I think there's some deeper issue with the language/culture here.
> you would be lucky to get this employee to work for you!
"In politics, a dog whistle is the use of coded or suggestive language..."
Gee, what could possibly go wrong?
You just use the mail program from mailutils or whatever.
Just from a point of view of deliverability, developing bare bones SMTP interaction over a socket is a nonstarter. You can't just connect to random mail exchange hosts directly and send mail these days. A solution has to be capable of connecting to a specific SMTP forwarding host (e.g. provided by your ISP). For that, you need to implement connections over TLS, with authentication and all.
Also, a slightly ironic thing is that cron already knows how to send mail. The output of a cron job is mailed to the owner. Some crons let that mail address be overridden with a MAILTO variable in the crontab or some such thing.
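For example, with Vixie/ISC-style cron a crontab can look like this (address and script path are placeholders):

```
# mail all job output to this address instead of the crontab owner
MAILTO=ops@example.com
0 3 * * * /usr/local/bin/nightly-backup.sh
```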
It's not a good reason, no--definitely not--but it's a real reason.
You can require it and it gets provisioned.
In other words, every other platform expands until it can summarize emails.
But the other thing is: Don’t vendor your dependencies. Those libraries you use need to be updated regularly and timely, and absolutely not “only as necessary”. If updates lag behind or are avoided entirely, bugs like this can be huge problems even when the upstream code has been fixed, for people who thought that they should update only when they, themselves, see a problem or need.
The alternative seems worse: your own application's stability is now at risk against upstream changes that could break your code. Sure, you might not get a fix immediately, but I'd rather know I'm making a change because I need a fix than introducing instability and additional risk that I don't want to subject myself to. "If it ain't broke, don't fix it."
(For apps, a lock file will do it too.)
https://chromium.googlesource.com/chromium/src/+/HEAD/third_...
(And surely you should have tests to verify all your own functionality after upgrading a dependency?)
I agree 100%.
Was very hard for the team. There was a shouting match over it, some hard feelings. The code was written in the spirit GP alludes to by an enthusiastic executive who wanted to help lighten the load. I should've rejected the PR but was intimidated to reject the exec's code (not an engineering reason for an engineering decision!). The exec was a good data scientist but not as strong a coder as me, and parsing binary files is one of my specialties.
Friends don't let friends parse using "indexOf()".
Another, more domain specific one: We have to talk to a GPS receiver that connects via RS-232 and outputs NMEA formatted data. This device, like all such devices, outputs a small subset of the standard. So parse just enough of it for that one device we have to get working, and ship it. Then someone attaches a different device that outputs a different subset of the standard and the software fails (sometimes gracefully, sometimes not).
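Even a "parse just enough" NMEA reader should at least verify the checksum before trusting a sentence. A sketch, assuming nothing about the field layout (which varies by device) and checking only the framing:

```python
from functools import reduce

def nmea_checksum(payload: str) -> int:
    # NMEA checksum: XOR of every character between '$' and '*'
    return reduce(lambda acc, ch: acc ^ ord(ch), payload, 0)

def parse_sentence(sentence: str):
    # Return the comma-separated fields, or None if framing/checksum fails
    if not sentence.startswith("$") or "*" not in sentence:
        return None
    payload, _, checksum = sentence[1:].partition("*")
    if nmea_checksum(payload) != int(checksum, 16):
        return None
    return payload.split(",")

body = "GPGLL,4916.45,N,12311.12,W"
good = "$" + body + "*" + format(nmea_checksum(body), "02X")
assert parse_sentence(good)[0] == "GPGLL"
assert parse_sentence("$GPGLL,4916.45,N*00") is None  # corrupted checksum
```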
It was a struggle to get the vendor to even acknowledge the problem, since it would only happen on 7 days out of the year, and usually 60 days apart. (There are only two consecutive months with 31 days, July and August.) It did eventually get fixed.
I seem to recall writing a workaround, but then it being stalled by our customer's Change Control Board.
Relatedly, escaping somehow seems to be a foreign concept for a lot of programmers, who wouldn't ever see the above situation and ask themselves "but what if I want to send an email with a line containing a single dot?" Yet another large group of them finds it perfectly logical and easy to understand.
I'd be surprised if any significant portion of software developers learned anything like that in the last 30 years, at least.
SMTP https://www.rfc-editor.org/rfc/rfc5321#section-4.5.2
Also in POP3 https://www.rfc-editor.org/rfc/rfc1939#page-8
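CPython's smtplib implements this transparency step (RFC 5321 section 4.5.2) in a small helper called quotedata. It's an internal, undocumented function, so treat this as an illustration rather than a supported API:

```python
import smtplib

# quotedata() normalizes line endings to CRLF and doubles any leading
# period, which is exactly the RFC 5321 transparency procedure
assert smtplib.quotedata("hello\n.\nworld") == "hello\r\n..\r\nworld"
assert smtplib.quotedata(".00") == "..00"
```

smtplib applies this automatically inside its data() method, which is why "just use the stdlib client" avoids the bug from the post.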
As soon as I printed them, the error was clear. My version ran to two pages and the good implementation one page. I had not been careful to clear the buffer before sending the data (mbufs don't you know).
This still cracks me up.
The company is just blindly unaware of their minor problem.
While quoted-printable is supposed to have a max line limit of 78 characters including CRLF rather than 1000, email clients tend to be permissive.
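Python's quopri module shows the soft-line-break behavior: encoded lines are capped at 76 characters before the line ending, with a trailing "=" marking each break, and decoding reverses it losslessly.

```python
import quopri

encoded = quopri.encodestring(b"x" * 200)
# every physical line fits the 76-character limit
assert all(len(line) <= 76 for line in encoded.split(b"\n"))
# soft line breaks end in '='
assert b"=\n" in encoded
# decoding removes the soft breaks and restores the original bytes
assert quopri.decodestring(encoded) == b"x" * 200
```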
The square brackets allow us to stick a port number on it: [ffff::0123:4567]:6301
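The stdlib URL parser understands this bracket notation; a quick check using the address from the comment above:

```python
from urllib.parse import urlsplit

u = urlsplit("//[ffff::0123:4567]:6301")
assert u.hostname == "ffff::0123:4567"  # brackets stripped from the host
assert u.port == 6301                   # port parsed past the closing bracket
```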
As soon as I read the above, I knew the below would be the result.
> It seems one of our other teams haven't gotten around to patching this bug in their code.
I share your sentiment, thank you for reading.
[1] https://en.wikipedia.org/wiki/Wireless_Application_Protocol
[2] https://en.wikipedia.org/wiki/ColdFusion_Markup_Language
What the...
https://www.ibiblio.org/harris/500milemail.html
(I'm only messing with you as you criticized the author's use of language then replied to me with imprecision.)
;-)
(-;
Maybe you're thinking of base64 encoding?
But this is SMTP; I have no doubt there are gateways out there that will re-encode the mail and put the dot back in the wrong place, e.g. in the name of wrapping all URLs behind a phishing-warning page.
Other than the obvious moral that protocols should be implemented properly, the moral of the story is that all abstractions are leaky, and it will always be useful to understand the lower levels.
Of course, if it were modern, that would be a different question.
Why is it implemented that way?
If a line with a single period means end of mail, then a line with more than a period means it's mail data. Why delete the period in the first place? Couldn't they store one byte to check the next?
There's a secondary issue here: why in the world would you auto-split a monetary value across a decimal separator? Why would you split lines at all for this use case?
So a message intended to be sent by an SMTP client:
DATA
Hello customer,<br>[978 characters] 27.00
Was erroneously formatted into:
DATA
Hello customer,<br>[978 characters] 27
.00
.
The period after 27 will be removed. And this is how the HTML will be rendered.
Hello customer,
[Lots of text] 2700
so splitting 27.00 on the . becomes 27 00, because the CRLF is significant to the client.
you would want to split at whitespace, not at any other character -- unless you had a 999+ string of non-whitespace of course.
perhaps the author didn't know or didn't realize or thought it insignificant to his point that in addition there was a quoted-printable encoding, in which case i believe the trailing/mandatory CRLF can be made non significant for client rendering. personally i still would have split on actual whitespace. (well, i wouldn't have written an smtp client in the first place.)
And, they also state that the period disappeared because it was placed at the start of the next line when the split occurred.
From which one can deduce that they were doing the most basic "split" possible, splitting at the exact 1000-octet point.
And if the period in 27.00 ended up exactly at offset 1000 in "line", then it got 'split' into line 2 as its first character.
> The maximum total length of a text line including the <CRLF> is 1000
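Putting the pieces together, the whole failure can be reproduced in a few lines. This is a hypothetical reconstruction of the naive split, not the actual code from the post:

```python
def naive_split(body: str, limit: int = 998) -> list[str]:
    # buggy: splits at fixed offsets with no regard for content
    return [body[i:i + limit] for i in range(0, len(body), limit)]

def server_side(lines: list[str]) -> list[str]:
    # receiving server undoes dot-stuffing: strip one leading period per line
    return [line[1:] if line.startswith(".") else line for line in lines]

body = "x" * 995 + " 27.00"   # the '.' lands exactly at offset 998
lines = naive_split(body)
assert lines[1] == ".00"      # the split put the period at the start of a line
rendered = "".join(server_side(lines))  # the HTML client joins lines back up
assert rendered.endswith(" 2700")       # the period is gone
```

Because the split happened after serialization, the stuffing step that should have doubled that leading period never ran, and the server's unstuffing ate a period that was part of the data.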
ironically titled..
Good luck having it handle any of the SMTP craziness that isn't covered in the short introduction to the protocol.
Or the usage example.
Not to mention all the weird bandaids on top of bandaids to try to get sender verification and tamper proof emails working. That alongside the complete lack of end to end encryption.
It's just an incredibly unpleasant tech stack from top to bottom, through and through. The amount of moving parts/pieces of running software needing to cooperate just right to even function as a simple outgoing-only mail server is too damn high.
In the future, I hope that email will be like fax, or at least treated like http in comparison to instant messaging (https).
- The SMTP client spec says that an additional period would be added here.
- The SMTP server spec says that it would remove this additional period, bringing us back to one period.
I don’t get how this led to there being no period at all. Am I missing something?