As someone who spent most of a career in process automation, I've decided doing it well is mostly about state limitation.
Exceptions or edge cases add additional states.
To fight the explosion of state count (and the intermediate states those generate), you have a couple powerful tools:
1. Identifying and routing out divergent items (aka ensuring items get more similar as they progress through automation)
2. Reunifying divergent paths, instead of building branches
Well-designed automation should look like a funnel, rather than a subway map.
If you want to go back and automate a class of work that's being routed out, write a new automation flow explicitly targeting it. Don't try to kludge it into some giant spaghetti monolith that can handle everything.
PS: This also has the side effect of simplifying and concluding discussions about "What should we do in this circumstance?" with other stakeholders. Which for more complex multi-type cases can be never-ending.
PPS: And for god's sake, never target automation of 100% of incoming workload. Ever. Iteratively approach it, but accept reaching it may be impossible.
carlmr 42 days ago [-]
I also like the way the Toyota production system puts it with Jidoka / automate with a human touch. [1]
1. Only automate the steps you know how to execute manually very well. This is a prerequisite. This kind of goes in the direction of not automating everything but looking at the parts you can automate well first.
2. Enable human intervention in the automation. Enable humans to stop the line when something is wrong. Then you figure out what's wrong and improve your automation. This implies the iterative approach.
[1] https://global.toyota/en/company/vision-and-philosophy/produ...
Or to put it more concisely: trust but verify the ‘manual process’, including any verbal or written explanations of the process.
Only 100% trust the automation after every possible step, procedure, and word has been 100% verified. (Which is to say almost never…)
bloopernova 42 days ago [-]
> Well-designed automation should look like a funnel, rather than a subway map.
Do you have any examples of this? Like an example project?
> never target automation of 100% of incoming workload. Ever.
A new product owner came in last year and wanted to "automate everything". They wanted a web page where builds, branches, commits, and more were all on one page that showed exactly which commit was deployed where, by whom, etc etc. They wanted this extravaganza for a 2-person application that was in maintenance with no new features.
They're also the kind of person who consults chatgpt or copilot on a subject while you're explaining that subject to them, to check that the LLM agrees with what you are saying. They'll even challenge people to prove some copilot output is incorrect. It seems to me that they consider LLMs more reliable than people.
bee_rider 42 days ago [-]
> They're also the kind of person who consults chatgpt or copilot on a subject while you're explaining that subject to them, to check that the LLM agrees with what you are saying. They'll even challenge people to prove some copilot output is incorrect. It seems to me that they consider LLMs more reliable than people.
Dear lord, these tools have just come out, how can they already have invented a new type of asshole?
whstl 41 days ago [-]
We had a product manager that made requirements based mostly on ChatGPT.
It would output completely nonsensical stuff like QR-Code formats that don't exist, or asking to connect to hallucinated APIs.
It was often caught by lead devs quite quickly: the documentation wasn't a link or a PDF but rather some block of text.
But in the cases it wasn't, it was super costly: some developer would spend hours trying to make the API work to no avail, or, in the case of the QR code, it would reach QA which would be puzzled about how to test it.
So yes there is a new type of asshole.
johnm 42 days ago [-]
No, those are the same that have been around forever. They just have a new tool to "justify" their crappy behavior.
mitjam 41 days ago [-]
I experienced this, as well. It’s a whole new level of „I know enough to be dangerous“.
whstl 41 days ago [-]
This is as fun as the business or product person that "knows how to code".
bee_rider 41 days ago [-]
Hah, thanks to LLMs we’ve drastically reduced the barrier to entry, in terms of knowing enough to be dangerous. Hopefully there’s a corresponding reduction in the level of knowledge required to be useful…
SatvikBeri 41 days ago [-]
> Do you have any examples of this? Like an example project?
Not the OP, but I worked on loans once. The application originally required tax returns to check income. Then we added the option to upload recent bank statements instead of tax returns. But there were a lot of places where the assumption of tax returns was hard-coded, so to save time the developers basically added an entirely separate code path for bank statements. Do this n times, and you have 2^n code paths.
When we eventually refactored and fixed it, we replaced the hardcoded references to tax returns with a concept of estimated income. Then we were able to reduce the branch to simply saying "if tax returns are present, estimated income = get_estimated_income_from_tax_returns(), otherwise estimated income = get_estimated_income_from_bank_statement()".
That's the core idea – collapse a branch as soon as possible.
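A minimal sketch of the shape of that refactor, with invented names and a made-up affordability rule (the real system was of course far bigger):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Application:
    tax_returns: Optional[dict] = None
    bank_statements: Optional[list] = None
    requested_monthly_payment: float = 0.0

def income_from_tax_returns(returns: dict) -> float:
    # Stub: the real logic parsed reported annual income.
    return returns["reported_income"] / 12

def income_from_bank_statements(statements: list) -> float:
    # Stub: the real logic averaged recent deposits.
    return sum(s["deposits"] for s in statements) / len(statements)

def estimated_monthly_income(app: Application) -> float:
    """The single, early branch: document type collapses into one concept."""
    if app.tax_returns is not None:
        return income_from_tax_returns(app.tax_returns)
    return income_from_bank_statements(app.bank_statements)

def affordable(app: Application) -> bool:
    # Downstream code sees only "estimated income", never document type.
    return estimated_monthly_income(app) * 0.35 >= app.requested_monthly_payment
```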
chrsig 42 days ago [-]
> A new product owner came in last year and wanted to "automate everything". They wanted a web page where builds, branches, commits, and more were all on one page that showed exactly which commit was deployed where, by whom, etc etc. They wanted this extravaganza for a 2-person application that was in maintenance with no new features.
you know, this seems very reasonable until that last sentence.
pc86 41 days ago [-]
I would argue that if you are non-technical enough that you need this type of data on a web page for you, you probably don't actually need this data and I'd be wary that you're just looking for a way to find out who broke something so you can be shitty to them.
If you really want to know this type of commit-level data you can get it from git pretty easily, even if you're not particularly good with git but can search half-decently. If you don't have the skills to use git, it's extremely unlikely that knowing what the current branch and commit status of the repository is will meaningfully help you do your job.
Aeolun 41 days ago [-]
I want this information and I can easily pull it out of git. I still want the webpage too because I don’t want to take 15 manual steps and open twenty different Github pages every time I want to find out.
ykonstant 42 days ago [-]
Why a web page and not directly the git log? If style is necessary, reformat the log data with some fancy ASCII art?
teqsun 42 days ago [-]
I'm assuming the PO isn't technical so git-log, git-blame, etc. are over their head.
Which itself begs the question of why they'd need this level of detail on the codebase.
marcosdumay 42 days ago [-]
It's hard to put your current ops configuration inside the git log. If you found some way to do that that fits well in the philosophy of a stream of immutable changes, I'm interested in reading your ideas.
andiveloper 42 days ago [-]
We are using git tags on the commit to figure out what is currently deployed where, e.g. "dev-stage", "test-stage" etc.
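A rough sketch of how such a convention can be scripted; the tag names and the force-move convention are assumptions about one possible setup, not a standard:

```python
import subprocess

# Assumed convention: one moving tag per stage, e.g. "dev-stage", "test-stage".
def mark_deployed(commit: str, stage: str) -> None:
    subprocess.run(["git", "tag", "-f", stage, commit], check=True)
    subprocess.run(["git", "push", "-f", "origin", stage], check=True)

def deployed_commit(stage: str) -> str:
    out = subprocess.run(["git", "rev-list", "-n", "1", stage],
                         check=True, capture_output=True, text=True)
    return out.stdout.strip()

# Usage: mark_deployed("a1b2c3d", "test-stage"); deployed_commit("test-stage")
```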
chrsig 42 days ago [-]
the "which ones are deployed where" bit is nice. if you're managing a lot of repos and deployments, yeah, that kind of thing can get really messy.
i don't care how it's presented, via webpage or cli tool or whatever -- just saying that when you are working at larger scale, those are very reasonable things to want to see in one spot at a glance.
the need dissipates as you scale down.
stackskipton 42 days ago [-]
Sure, but that's best handled by the application reporting it via the monitoring system. For example, at my company, we embed the git commit, version, and branch of the last merge to main in container environment variables. Prometheus then exposes those as labels, so we can just look them up any time it comes up. If we wanted to build a Grafana dashboard, that could be done easily as well.
I'm sure most monitoring systems have some way of loading that into their system.
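For illustration, a minimal sketch of that pattern using the Python prometheus_client library; the environment variable names here are invented:

```python
import os
import time
from prometheus_client import Info, start_http_server

# Assumed env vars, baked in at image build time,
# e.g. BUILD_COMMIT=$(git rev-parse HEAD).
build_info = Info("app_build", "Build metadata for the running container")
build_info.info({
    "commit": os.environ.get("BUILD_COMMIT", "unknown"),
    "branch": os.environ.get("BUILD_BRANCH", "unknown"),
    "version": os.environ.get("BUILD_VERSION", "unknown"),
})

if __name__ == "__main__":
    # Exposes app_build_info{branch=...,commit=...,version=...} on :8000/metrics
    start_http_server(8000)
    while True:
        time.sleep(60)
```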
chrsig 41 days ago [-]
Sure, and the commenter's PO didn't specify how to get him a webpage. All very reasonable, see?
Smeevy 42 days ago [-]
Oh my goodness, I wouldn't last a day with someone who did that. That sort of casual disrespect while you're talking to someone is wholly unacceptable behavior.
There's only one person you work with that's like that, right? Right?
bryanrasmussen 42 days ago [-]
this reminds me of the last manager I had who used to be a developer. I think he was probably a pretty good developer - but not as good as he thought he was, because he thought he was good enough that he could tell you how things worked, and why you were wrong, without any knowledge of the code base or experience actually using the various 3rd party services being integrated.
I tried to develop the skill of nodding and then doing it correctly later, but it would always come to a head: a ticket I had written would get assigned to another dev, and I had to explain to them why it was the way I had specified. Then he would correct me, and I would say yes, part of what you say is correct and needs to be considered (as I said, I think he was a good developer at one time), but not all of it. He would insist he was correct, and I had to go talk it over later with the dev as to why it worked the way I specified.
bornfreddy 41 days ago [-]
That sounds awful. The manager is clearly not doing their job well, so avoiding their opinion and covering their mistakes is imho counterproductive. Instead, I would let my reservations be known and then proceed exactly as they suggested. If it works, great, I have learned something. If not, let's scrap this and do it right, this time with the manager's knowledge.
But in the end, if you can't work with the manager, it's time to head to greener pastures.
_rm 41 days ago [-]
The problem with this head-nod approach is it won't lead to reining such types in.
Only their boss can rein them in, and so you have to use techniques to shine a light on them to their superiors.
Think "I want you to record your order" from HBO's Chernobyl, but more surreptitious.
supriyo-biswas 41 days ago [-]
One other way of reining in such a manager is to bring someone into the meeting who the manager trusts and who you too have good rapport with, and have them say the same points that you would have made otherwise.
Effectively a form of trust and reputation arbitrage, but it was effective for dealing with a particularly difficult manager who didn’t accept certain things about the design of an API, and yet when the other guy told him the same things, he just asked a few mild follow ups and accepted what I was telling him all along.
_rm 40 days ago [-]
Yeah absolutely, I've used this for improvement suggestions too. Decide what you want to do, and then find the most big name source who's said basically the same thing, and then quote them pretending you're just relaying their message.
bloopernova 42 days ago [-]
So far they are the only one.
They're definitely "leadership material"!
knicholes 42 days ago [-]
... but were they right?
pc86 41 days ago [-]
Being right is not license to be an insufferable asshole to the people you work with.
shakna 42 days ago [-]
> They wanted a web page where builds, branches, commits, and more were all on one page that showed exactly which commit was deployed where, by whom, etc etc.
Sounds like they would have been a fan of ungit [0], which I have seen used for that kind of flow overview, though it has looked more impressive than actually proved helpful in my experience.
[0] https://github.com/FredrikNoren/ungit
Not GP, but at my old job making games we had a truly horrendous asset import pipeline. A big chunk of it was a human going through a menial 30 minute process that was identical for each set of assets. I took on the task of automating it.
I made a CLI application that took a folder path, and then did the asset import for you. It was structured into "layers" basically, where the first layer would make sure all the files were present and correct any file names, the next layer would ensure that all of the textures were the right size, etc. etc.
This funneled the state of "any possible file tree" to "100% verified valid set of game assets", hence the funnel approach.
It didn't accept 100% of """valid""" inputs, but adding cases to handle ones I'd missed was pretty easy because the control flow was very straightforward. (Lots of quotes on "valid" because what I thought should be acceptable vs. what the people making the assets thought should be acceptable were very different.)
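The skeleton was roughly the following; the file names and checks are illustrative rather than the real pipeline's:

```python
import sys
from pathlib import Path

# Each "layer" either fixes the tree in place or rejects it with a reason,
# so inputs only ever get more uniform as they move down the funnel.

def normalize_names(root: Path) -> list[str]:
    for p in root.iterdir():
        lower = p.with_name(p.name.lower())
        if p != lower:
            p.rename(lower)  # correct what can safely be corrected
    return []

def check_required_files(root: Path) -> list[str]:
    required = {"model.fbx", "albedo.png"}  # illustrative file set
    missing = required - {p.name for p in root.iterdir()}
    return [f"missing file: {m}" for m in sorted(missing)]

def check_texture_sizes(root: Path) -> list[str]:
    # Stub: the real layer opened each texture and verified its dimensions.
    return []

LAYERS = [normalize_names, check_required_files, check_texture_sizes]

def import_assets(root: Path) -> bool:
    for layer in LAYERS:
        errors = layer(root)
        if errors:  # divergent input: route it out rather than branch
            print(f"{layer.__name__}: " + "; ".join(errors), file=sys.stderr)
            return False
    return True  # "any possible file tree" -> verified asset set
```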
initplus 41 days ago [-]
One example is that instead of adding support for edge case mutations/changes late in a process, it's sometimes better to force those records to be thrown away and reset with a new record from the start of the process. You avoid chasing down the flow-on effects of late, unexpected changes in different parts of the application.
To give a contrived/trivial example, imagine a TLS handshake. Rather than building support to allow hosts to retry with a different cert, it's better to fail the connection and let the client start from scratch. Same principle can be applied to more complex process automation tasks in business. Imagine a leave tracking system. It might be better to not support changing dates of an existing leave application, and instead supporting cancel & re-apply. Best part is that the user facing part of both versions can be exactly the same.
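A toy sketch of the leave-tracking version: the user still sees a 'change dates' action, but internally it reuses the cancel and apply paths instead of mutating the record:

```python
from dataclasses import dataclass, field
from datetime import date
from itertools import count

_ids = count(1)

@dataclass
class LeaveRequest:
    start: date
    end: date
    id: int = field(default_factory=lambda: next(_ids))
    status: str = "pending"  # pending -> approved/cancelled; never edited

class LeaveBook:
    def __init__(self) -> None:
        self.requests: dict[int, LeaveRequest] = {}

    def apply(self, start: date, end: date) -> LeaveRequest:
        req = LeaveRequest(start, end)
        self.requests[req.id] = req
        return req

    def cancel(self, req_id: int) -> None:
        self.requests[req_id].status = "cancelled"

    def change_dates(self, req_id: int, start: date, end: date) -> LeaveRequest:
        # The user-facing "edit" is not a mutation: it is cancel + re-apply,
        # so every record flows through the one well-understood path.
        self.cancel(req_id)
        return self.apply(start, end)
```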
jimkoen 41 days ago [-]
> wanted to "automate everything".
With all due respect, this is preached by pretty much every book you read on cloud administration. I'd argue that if the process is decent enough, it'll work with major cloud providers, because their APIs are rich enough to enable this already.
The thing with most automation tools, though, is a) they're abysmal for most of the workflows preached (thinking of Ansible, and I'm shuddering) and b) to reach the degree of automation described in most literature, you need the APIs of $MAJOR_CLOUD_PROVIDER.
internet101010 42 days ago [-]
An example of this would be filtering on different attributes of data. You have a complex decision tree interface, but ultimately the output is constructed in a single form that can be passed along to a single class/function.
airbreather 41 days ago [-]
They are also the sort of person who thinks the problem can be defined by thinking about and describing wanted behaviour alone.
Aeolun 41 days ago [-]
My boss does this once in a while. While I understand the impulse, it always makes me feel a bit redundant when they’ll ask me to explain why what ChatGPT spits out won’t work for our situation.
Isn’t that the whole point of hiring experts?! So that you can ask them, instead of the computer for advice?
perrygeo 42 days ago [-]
> never target automation of 100% of incoming workload... Iteratively approach it.
This is critical. So many people jump from "it works" to "let's automate it entirely" without understanding all the nuances, all the various conditions and intermediate states that exist in the real world. The result is a brittle system that is unreliable and requires constant manual intervention.
A better approach is to semi-automate things first. Write scripts with manual QA checkpoints and playbooks that get executed directly by devs. Do the thing yourself until it's so boring it hurts. Then automate it.
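In practice that middle stage can be as simple as a script that pauses for a human between automated steps; the script names here are placeholders:

```python
import subprocess
import sys

def checkpoint(prompt: str) -> None:
    # Manual QA gate: a human must explicitly approve before we continue.
    if input(f"{prompt} Continue? [y/N] ").strip().lower() != "y":
        sys.exit("Aborted at manual checkpoint.")

def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["./build.sh"])                        # automated
checkpoint("Build artifacts look sane?")   # manual
run(["./deploy.sh", "--stage", "canary"])  # automated
checkpoint("Canary metrics healthy?")      # manual
run(["./deploy.sh", "--stage", "prod"])    # only after a human signs off
```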
screye 42 days ago [-]
This is genius. I skimmed it the first time, and it took me a good 30 minutes to appreciate the wide applicability of your insight.
A whole family of problems across domains: supply chains, manufacturing lines, AI pipelines, resilient software, physical security, etc. come down to effective state limitation.
In my day job, the vast bulk of my arguments with PMs come down to a lack of planning allocation for post-launch code cleanup. I haven't been able to find a succinct articulation for the utility of 'code cleanup' to a wider audience. 'State-limitation' fits nicely. Exactly the concept I was looking for.
It draws from the better known 'less-is-more' adage, but adds a lot of implicit detail to the generic (almost cliche) adage.
_rm 41 days ago [-]
To be honest, for most low- or moderately-performing organisations, the best technique is not to talk about it and just do it.
So long as it's done silently, blended in with other things, and cloaked under clever wording (e.g. "this blocks that other thing you want" rather than "this will improve the codebase"), things will go quite well.
As soon as you speak to them as you would another engineer, you provide them material to use against you, preventing you from taking proper action.
screye 40 days ago [-]
That only works if the whole team coordinates.
If one person writes broken code in half the time, while you take twice as long cleaning up the mess, then you're going to be perceived as ineffective.
achillesheels 40 days ago [-]
“A whole family of problems across domains…come down to effective state limitation.”
A fancy way of saying, “simplicity is the mark of truth.”
Or
“The fewer mechanical points of failure, the better.”
I concur. Eliminate design risk.
HatchedLake721 42 days ago [-]
With your experience, anything you'd recommend to read in the process automation space?
(I'm a founder of an automation SaaS where we've made "human interface" one of the core features of the product)
trod123 40 days ago [-]
I'm not aware of any books that have covered this appropriately.
The biggest part of automation, in my experience, is boiling the inputs down to a 'unique' state that the automation can then take as input and run on.
For computation to do work, it requires a property of consistency, and computers can only operate accurately and do work when properties of determinism are met. Also, as the OP mentions, state can explode, leading to spaghetti, which is why he mentions a sieve-like approach based on similarity.
Some problem spaces can be fundamentally inconsistent, such as with some approximations (common methods used for such), which fall back to what amounts to guesses, heuristics, and checks in terms of exception handling. There are also problem scopes that cannot be fully characterized, so no amount of exception handling will resolve the entire scope, which is why you need fallbacks in a resilient design.
If inputs cannot be controlled and uniquely differentiated, the automation fails in brittle ways, especially with regards to external change.
The main interface (with regards to your core features) would be language, or communication. There exist words right now that can have contradictory and different meanings, where the same word may mean the opposite depending on context, and this is not a general consensus but an individual one (where the individual may be misusing it).
That breaks the 1:1 mapping required for determinism. AI weights mimicking neurons have a narrow approximation that may work under a narrow set of circumstances, but no computer today can differentiate when the inputs are the same but have two or more different states mixed in (many people forget that absence of a state is a state too) and then decompose them. Abstract decomposition seems to be something only humans are good at, and I'm glad this is the case, otherwise none of us would have jobs.
seanlinehan 42 days ago [-]
Spot on.
I was an early eng and first VP of Product at Flexport. Global logistics is inherently complicated and involves coordinating many disparate parties. To complete any step in the workflow, you're generally taking in input data from a bunch of different companies, each of which have varying formats and quality of data. A very challenging context if your goal is process automation.
The only way to make progress was exactly the way you described. At each step of the workflow, you need to design at least 2 potential resolution pathways:
1. Automated
2. Manual
For the manual case, you have to actually build the interfaces for an operator to do the manual work and encode the results of their work as either:
1. Input into the automated step
2. Or, in the same format as the output of the automated case
In either case, this is precisely aligned with your "reunifying divergent paths" framing.
In the automated case, you actually may wind up with N different automation pathways for each workflow step. For an example at Flexport: if we needed to ingest some information from an ocean carrier, we often had to build custom processors for each of the big carriers. And if the volume with a given trading partner didn't justify that investment, then it went to the manual case.
From the software engineering framing, it's not that different from building a micro-services architecture. You encapsulate complexity and expose standard inputs and outputs. This avoids creating an incomprehensible mess and also allows the work to be subdivided for individual teams to solve.
All that said – doing this in practice at a scaling organization is tough. The micro-services framing is hard to explain to people who haven't internalized the message.
But yeah, 100% automation is a wild-goose chase. Maybe you eventually get it, maybe not. But you have to start with the assumption that you won't, or you never will.
initplus 41 days ago [-]
Sounds like a really interesting problem space. I'm curious if you have any comments about how you approached dealing with inconsistencies between information sources? System A says X, system B says Y. I suppose best approach is again just to bail out to manual resolution?
seanlinehan 39 days ago [-]
In the early days, we bailed out to manual resolution. In the later days, we had enough disparate data sources that we built oracles to choose which of the conflicting data was most likely to be correct.
For example, we integrated with a data source that used OCR to scan container numbers as they passed through various waypoints while they were on trains. The tech wasn't perfect. We frequently got reports from the rail data source that a train was, for example, passing through the middle of the country when we knew with 100% certainty that the container was currently in the middle of the Pacific Ocean on a boat. That spurious data could be safely thrown out on logical grounds. Other cases were not as straightforward!
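A toy version of that kind of logical filter; the dates and record shapes are invented, and the real oracles weighed many more signals:

```python
from datetime import datetime

def plausible(ping: dict, voyage: dict) -> bool:
    """Reject a rail OCR sighting while the container is known to be at sea."""
    at_sea = voyage["departed"] <= ping["time"] <= voyage["arrived"]
    return not (at_sea and ping["source"] == "rail_ocr")

voyage = {"departed": datetime(2019, 3, 1), "arrived": datetime(2019, 3, 20)}
pings = [
    {"source": "rail_ocr", "time": datetime(2019, 3, 10), "where": "Kansas"},
    {"source": "rail_ocr", "time": datetime(2019, 4, 2), "where": "Chicago"},
]

trusted = [p for p in pings if plausible(p, voyage)]  # the Kansas ping is dropped
```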
davedx 42 days ago [-]
This has a corollary in software and refactoring: don’t refactor duplicated code too early; keep it separate until it’s very clear the refactoring is correct
chrsig 42 days ago [-]
I wouldn't consider my career being in process automation, but I feel like you just described my approach to managing new product development on top of existing machinery while ensuring no breakage of what exists.
Not directly relevant to the post, but seems like a good place to share.
My team and I once took on a very tricky automation project. At the time we had a complex software deployment done about once per month that involved a team of about a dozen people showing up at 4am to do it while traffic was low.
The deployment involved many manual steps and coordination of everybody involved. The person leading each deployment followed the documented list of steps and got each person to do their bit at the right time; people to run database migrations, people to install RPMs on particular servers, people to test and verify functionality. Mistakes and missed steps were not uncommon.
The very first thing we did was take the documentation and write a Jenkins job to post each step into a Slack channel specifically for coordinating the deployments. Someone clicked "go" and each step was posted as a message in that channel with a 'done' button to be clicked when that step was done. Clicking the button caused the next step to be posted.
The next release we did used that instead of one person reading the steps out of confluence. Everyone involved in the release could always see what step we were at, and when it was their turn to do their bit. This helped ensure no steps were ever missed too.
Over the following months we chipped away at that job a bit at a time. We'd pick a step in the process and automate just that step, starting with the low-hanging fruit first. The Slack message for that step went from "click to confirm you've done it" to "click to do it", with the result posted once it was done; followed by the next step to perform.
It was a long process, but it allowed the rest of the business (and us!) to gradually gain confidence in the automation, and lowered the risk of the project dramatically. Once several steps had been automated and battle-tested we removed the 'click to do' bits in between and the whole release became a couple of clicks followed by the odd bit of manual QA.
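Stripped of Slack and Jenkins, the core pattern is just a list of steps where each step is either a manual confirmation or a callable, and steps migrate from the first kind to the second over time (a sketch, with invented steps):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    description: str
    action: Optional[Callable[[], None]] = None  # None => still purely manual

def release(steps: list[Step]) -> None:
    for i, step in enumerate(steps, 1):
        print(f"Step {i}: {step.description}")
        if step.action is None:
            input("  done? press Enter to continue ")
        else:
            input("  press Enter to run this step automatically ")
            step.action()

release([
    Step("Run database migrations on the primary"),  # still manual
    Step("Install RPMs on app servers", lambda: print("  ...installed")),
    Step("Smoke-test the checkout flow"),            # still manual
])
```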
What is the point of defining a Python class with a single `run` method, and then running with `Class.run()`, instead of just defining a `function` and running with `function()`?
GarnetFloride 42 days ago [-]
Checklist automation is a very powerful tool. Making a script that just calls the checklist that you walk through is the first step, because you have to debug the checklist. It's really hard to automate something if you don't really know what it is you are doing, and steps are always missed.
Once you debug the checklist you can start automating the steps and then you find the things that are easy for humans to do but hard for computers. That's the fun part.
In one case I worked on, they automated the sales funnel, but then customers started asking for refunds and complaining online. Turns out the automation didn't work.
I got a team together to get the customers happy and then did a Stalin's Postman test where we walked through the whole process. All but one of the steps failed.
Now that we knew what was failing, we could start the process for fixing it.
pchristensen 41 days ago [-]
What's a "Stalin's Postman test"? I couldn't find anything for that phrase.
sebastialonso 40 days ago [-]
I can't stop thinking parent comment is ChatGPT output
Jimmc414 39 days ago [-]
We should delve into that.
hinkley 41 days ago [-]
Hybrid automation is remarkably effective. I learned about this idea around five or six years ago and have been employing it since.
Step 3: Go to this URL and do a thing, then click Y.
Do some stuff, do some more stuff
Step 8: Go to this URL and undo the thing you did in Step 3, then click Y.
deely3 42 days ago [-]
That's the ideal-world solution. Love it!
How many resources does maintaining it require?
troupe 42 days ago [-]
The Do Nothing Scripting [1] is an approach that I feel recognizes the difficulty of automation by starting out with a script that just tells you what to do like a checklist. Once you've used it to fully document all the exceptions, etc. you can then start building the automation.
An approach like this seems to give more weight to the fact that just figuring out a way to document exactly what needs to be done is often the hardest part, and if you get that right before you start writing automation code, it might make the automation much more efficient.
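For reference, a do-nothing script in that style is little more than prompts wrapped in a fixed sequence; the steps below are invented for illustration:

```python
# Every step is still manual, but the script now owns the order and
# ensures nothing is skipped. Steps get automated one at a time later.
class AskUsername:
    def run(self) -> None:
        input("Ask the requester for their username, then press Enter.")

class GenerateKey:
    def run(self) -> None:
        input("Run 'ssh-keygen -t ed25519' as that user, then press Enter.")

class RecordKey:
    def run(self) -> None:
        input("Paste the public key into the inventory doc, then press Enter.")

if __name__ == "__main__":
    for step in [AskUsername(), GenerateKey(), RecordKey()]:
        step.run()
    print("Done. Automate the most painful step next.")
```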
This reminds me of Seeing Like a State[0] (I know, it's not new... but I only got around to reading it earlier this year.)
Automating a process makes it more standardized and legible, but takes away a lot of the nuance and of the resilience that human institutions tend to bake in. Do it too little, you get chaos; do it too much, you're destroying the very thing you were trying to nurture.
It's certainly déformation professionnelle, but the parallel with software is eerily relevant.
One thing on my list for CRUD software is an "I know what I'm doing" button that bypasses validation. Usually it's behind manager or admin approval.
It’s pretty tricky to get right and is very case by case in what it specifically means. But it’s been critical to handling edge cases.
For example, in 99% of cases, you can't progress to stage X without doing Y. But in this case, "I know what I'm doing".
Though this puts a lot of pressure on downstream processes to handle missing data. Fortunately that hasn't been too much of an issue, because validation grows and changes over time and must naturally deal with records that were created under prior validation, when their state was originally valid.
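A minimal sketch of that escape hatch, with invented field names: the bypass exists, but it always costs an approver, a reason, and an audit row:

```python
import datetime

AUDIT_LOG: list[dict] = []  # in reality an append-only table, not a list

class ValidationError(Exception):
    pass

def advance_to_stage_x(record: dict, *, override_by: str | None = None,
                       reason: str | None = None) -> None:
    if not record.get("step_y_done"):
        if override_by is None:
            raise ValidationError("Stage X requires step Y")
        # Override allowed, but never silently: record who, when, and why.
        AUDIT_LOG.append({
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "record": record["id"],
            "approved_by": override_by,
            "reason": reason or "(none given)",
        })
    record["stage"] = "X"
```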
duckmysick 42 days ago [-]
The "tricky to get right" part is so true.
One time I was involved in automating a business process. Nothing super complicated - scraping data from a portal, filling forms, some data wrangling. It had to handle a few edge cases and involve a human in the loop. I tried to include as much data validation as possible, but this being a business in the real world, there were of course some exceptions. I implemented a Superuser Override, which was basically a button that launched a popup with another form where the app users requested an approval from their manager to accept the unvalidated data. It was red and scary and it had Bootstrap .danger all over, because it was a cool thing to do back then. Don't judge, ok?
Things worked as they should have. The automation saved tons of time, even if some fields were missing. Occasionally, there was a Superuser Override request which sent an email to the manager overseeing the process. Actually, things worked too well - the volume of processed data increased and so did the number of the Override requests. The manager - busy as they usually are - started approving all the requests. Manually at first and then they ran a macro that did it for them.
Eventually the app users got wind of it and they started using the Superuser Override liberally. Before that, they had shown more restraint.
I wish there was some clever punch line to this story but I don't know what happened after.
If I had to do it again, I'd be more aggressive with validations, make the approval process more like a code review, involve the most experienced app users as the Superuser, and have audit logs. Oh and Tailwind instead of Bootstrap.
veggieroll 41 days ago [-]
Yeah. My first choice for approval is admins because some managers are better than others on this for sure. But that really is only possible because I build for a fairly small user base.
Audit logging is definitely another must for me. I literally won't make an app without it. The app I'm working on now had a case where a PM promised up and down that edit history wasn't needed for a particular section, and lo-and-behold 2 years later it turns out that actually we do need the history.
christkv 42 days ago [-]
This is so important. You don't want to stop flows from happening because of a bug or bad config. Just keep track of the override: who did it, when, and if possible a quick blurb. Then ensure those exceptions are caught and fed back to dev.
0cf8612b2e1e 42 days ago [-]
I like this idea, but seems like a bug factory. Going to break lots of downstream assumptions and create duplicate workflows for these (hopefully) rare events.
leetrout 42 days ago [-]
Related - I highly recommend reading Lisanne Bainbridge's paper "Ironies of Automation" which points out why we need to keep human factors in mind when designing automation.
It doesn’t have to be so hard, though. One thing you can do is learn to just say “no”. Such as no to cash on delivery.
Being flexible is good, but it comes at a cost. When that cost is too high, don’t do it. Realize that the customer who wants that workflow probably isn’t going to be your make-or-break moment. And in fact, they might have flexibility of their own. For example, you don’t accept Amex but they have a backup credit card or cash. It might be annoying to them if they are fussy, but it’s normal enough that the consequences are minimal. And yes, you may occasionally get a customer who doesn’t have that flexibility, but you shouldn’t be pinning your business on rare events. Figure out what’s most common among your target customers and support a few simple workflows. Say no to everything else until you have the resources to do it properly.
Another thing you can do is have a hybrid low tech/high tech solution. Automate structured inputs with software. Write down the rest as notes in a logbook. Over time you will probably see patterns in the logbook that you can automate.
Lastly, remember what Morpheus says in The Matrix, “Some [rules] can be bent. Others can be broken.” For example, you could simply pay the bill on behalf of the customer who wants cash on delivery. Now you assume some personal risk but the computer system doesn’t have to support their workflow. Is it worth it? Maybe, maybe not, but it’s a choice you can make.
akira2501 42 days ago [-]
> When that cost is too high, don’t do it.
Or just recognize that your software ideology is incorrect for the problem at hand. This guy wanted to write a single piece of software that did everything the business needed. That was clearly a mistake and no surprise that his software ended up very tightly coupled to the business itself.
This was an error in interface and component design. Most likely caused by starting to write software before fully understanding the job roles and people who would be using it.
abakker 41 days ago [-]
I think it's OK to make that mistake because "starting" is a more positive force than "starting over" is a negative one. If the abstraction or assumptions were wrong, you've still built some kind of lossy approximation that is highly useful. It's just important to recognize when the prototype needs to be thrown away or retired.
The corollary of this, though, is that "there's nothing more permanent than a temporary fix". So balancing these ideals is the work of engineering and management.
the_sleaze_ 42 days ago [-]
This is of course the right answer, reality is often easier to change than software when the goal is to keep reality and software in sync.
It's so much harder to say no than it sounds, though - sales is saying yes to everything they can, the board is pressuring for more sales, the users are taking every possible opportunity to shift blame onto the software, and engineering is buckling under the churn of changing business goals every quarter.
SoftTalker 42 days ago [-]
Depends on the customer too though. Exceptions are always made for big/valuable customers, even if they are a PITA.
If you have a long term big customer who has always been able to do COD and you suddenly pull that option with the explanation "it's easier for us" that's not going to go over well. Now you might lose them and they will tell their network "Wow, Foo Corp used to be good now they are just being unreasonable."
CPLX 41 days ago [-]
I mean it depends. Everything depends.
Are you selling to restaurants? To grocery stores in a large city's Chinatown? And so on.
akira2501 42 days ago [-]
> My favorite example of the latter is how the arrival of IBM computing in the 60s and 70s totally changed the definition of accounting, inventory control, and business operations. Every process that was "computerized" ultimately looked nothing at all like what was going on under those green eyeshades in accounting.
How did you arrive at that conclusion? Most of those applications were written by the businesses themselves. IBM provided you a programming environment not a set of predefined business software. Which is why COBOL jobs still exist.
What changed with the mainframe was that instead of having a bunch of disparate and disconnected processes you now had a centralized set of dependent processes and a database that provided a single source of truth.
Businesses were going to want this capability regardless of how limited humans with green eyeshades were previously.
> Much of the early internet (and still most bank and insurance) look like HTML front ends to mainframe 3270 screens.
Well, precisely, those are custom applications. If they weren't we wouldn't have this issue. You talk about automation but you seem to have not noticed that mainframes have CICS, why it exists, and why businesses still use it.
The "old school" ways are actually still exceptionally powerful.
kmoser 41 days ago [-]
Also:
> As it turns out the entire flow at the post office (or DMV or tax office) is about exception handling. No amount of software is going to get you out of there because it is piecing together a bunch of inputs and outputs that are outside the bounds of a system.
Not sure how the author arrived at this conclusion, either. Both entities (USPS and DMV) have elaborate rules for how business gets done. The USPS in particular has rules that cover shipping everything imaginable, often via different methods. The outputs are quite simple: USPS ships your package; DMV provides your license or vehicle registration.
yen223 42 days ago [-]
We don't automate things because it's easy. We automate things because we thought it would be easy
supportengineer 42 days ago [-]
I see it as trading one class of problems for another set of problems.
bob1029 42 days ago [-]
Much of the pain can be reduced by considering if you should bother automating the thing in the first place.
Some kinds of automation are absolutely essential. Aligning wafer stages during photolithography, shipping logistics, high frequency trading, etc. The business absolutely wouldn't work without it.
The other kinds of automation are more questionable. Developing E2E CI/CD automation for an internal tool that is redeployed once every quarter might be a more difficult sell to management. For these cases, even if the rate of manual process invocation is somewhat high (i.e. frustrating for some employees), the customer can't see it. They won't pay one additional cent for this.
There is also this entire meta time wasting game where the mere debate about "should we automate" takes up so much resources you could have manually invoked the thing more times than it would ever be feasibly invoked automatically during its projected lifetime. Alternatively, you could have just let your developers build the damn thing to get it out of their systems.
jimkoen 41 days ago [-]
> The other kinds of automation are more questionable. Developing E2E CI/CD automation for an internal tool that is redeployed once every quarter might be a more difficult sell to management. For these cases, even if the rate of manual process invocation is somewhat high (i.e. frustrating for some employees), the customer can't see it. They won't pay one additional cent for this.
I've generally had the experience that it's the exact opposite. Best practices dictate that all processes should be automated to the limit.
j45 42 days ago [-]
It's important to systemize (get it working manually) before automating.
Premature automation can create more complexity in the process, as edge cases usually get added in or need routing.
If you're building process management and automation from scratch as a way to learn, it's also a red flag for technical debt waiting to happen.
from-nibly 42 days ago [-]
Always reconsider the process itself before you automate it. Human based processes are different than computer based ones. Trying to directly emulate human actions is what can make automation fragile.
bjornsing 42 days ago [-]
> Then one day the business owner (my father) said “hey this customer is going to come by and pick up the order so don’t print a shipping label.” Yikes had to figure that out. Then one day another customer said “hey can we pay cash on delivery?” C.O.D. used to be a thing. Then I had to add that.
What happens if you just say no? I have a feeling a lot of complexity stems from the fact that these exceptions are easy to handle when you’re doing things manually, so it’s hard to say no. But if you think about the business case I would not be surprised if the correct answer is no more often than not.
marcosdumay 42 days ago [-]
> What happens if you just say no?
Do you want to say no?
That's not an easy question to answer. You should think about it. Is it worth alienating some customers in order to operate at a larger scale?
Naively, this looks like an automatic "no", but it's not.
immibis 42 days ago [-]
Not in these cases. It doesn't make sense to deny taking money from a customer unless you're near capacity and can replace the customer.
bjornsing 42 days ago [-]
I’d say that depends. If it will cost you more money to handle a specific exception than what you can generate in gross profit from those sales, then you should just say no.
gcanyon 41 days ago [-]
Only tangentially related, but I like the quote so I'll share it: a boss of mine once said, "You can't software your way out of a process problem."
taeric 42 days ago [-]
This is demonstrably correct, to me, but I'm curious what makes this unique to software?
Hardware automation has had obvious boons to places it works. And often we will force the places we want it to be to change to accommodate the automation. Agriculture is an easy example of what I mean here. Take a look at some fields that are optimized for giant machinery. Very impressive and can manage a lot. My favorite recently was seeing the "kickout" effort to prevent bad fruit from entering a hopper.
To that end, is the main issue with software automation that it thinks it can be a total solution and often tries to start by not imposing changes on what is being automated?
ninalanyon 41 days ago [-]
The biggest problems with any automation are describing what the current process really is and discovering which parts are essential and which are simply accidental. This is true whether one is creating a purely mechanical system or a purely software system
The part that is unique to software is that companies often expect people whose only expertise is in software to do both of these tasks when the second often requires deep domain knowledge. When one mechanises something in hardware it is generally taken for granted that domain experts will be central to the effort but when the result is principally software, domain experts are often left out of the process.
prometheus76 42 days ago [-]
This is an excellent and insightful article, but it feels like it was speech-to-text and the author didn't take the time to clean it up before posting it. It's a little distracting.
teddyh 42 days ago [-]
No kidding. “ate ice create 14 hours ago”. I’m guessing they meant “ice cream”, but it’s very hard to push yourself through reading sloppy writing like this. The author should have taken their own advice and not automated speech-to-text, spellchecking, and editing.
jakub_g 41 days ago [-]
The author is probably a rather busy person (https://en.m.wikipedia.org/wiki/Steven_Sinofsky) and I appreciate they took time to share their thoughts, even with typos. Yeah I also got caught with the one above, but didn't spot any other.
uoaei 42 days ago [-]
The thing about software is that it can only ever define its own process. "Software as spec". The goal of automation is to find processes whose architecture and dynamics are already amenable to being represented in software.
Public reaction to automation has been mixed for obvious reasons, but also because when software is applied to structure- and/or determinism-resistant systems, it fails to capture some essential components, which ends up degrading the services that system can provide.
ozim 42 days ago [-]
That is something that comes from experience - automate the happy flow so you can handle all normal cases super fast, and handle edge cases manually.
The biggest issue I have always seen is "sorry, we cannot help you with that because our system says so", which should never be the case. There should always be a way around the system. Then of course the way around the system needs approval from some higher-up anyway.
svieira 41 days ago [-]
> AI as a tool is regulated just as a PC or calculator or PDR is regulated — the regulation is not the tool but the person using the tool.
Another gem from the article that I wanted to surface. It's yet-another-take on the general sentiment here, but it's very succinct. Automation is good, but automation is a tool that serves people (and not in a Soylent Green kind of way).
hinkley 41 days ago [-]
I don't set out to do automation but the automation eventually finds me.
I start making a tool for myself to track a tedious or error prone process. Even after you've gotten the wiki to have the correct steps in the correct order, you can still run into human error. I use it and work on it until I stop burning my fingers. Then I share it with a few people who complain about the same thing, and see how they burn their fingers.
Then I roll it out to a wider team, wait for more problems, then suggest it for general use. Usually by this point we are seeing fewer problems with the tools than with humans doing things manually, so we escalate to teasing people to use the tool so they don't end up with an Incident. Then it becomes mandatory and anyone who still refuses to use it is now risking PIP-level trouble if they don't join the party.
People who try to skip steps make not only a lot of problems for themselves but also for anyone else who suggests automation tools. Running toward the finish line makes everything worse.
omarhaneef 42 days ago [-]
One of the best pieces on here recently (and didn’t realize it was by Sinofsky!)
Anyway, one other insight I would add is that the issues tend to come up at the interface between the systems. Your automation has to get some input and send some output and both those are pain points. That’s why we sometimes prefer imperfect monolith software.
pwojnaro 41 days ago [-]
This becomes even harder if your execution environment is non-deterministic by definition.
Imagine test automation on physical devices (ex-founder of a mobile app testing SaaS company here), where system-level events that can't be programmatically detected or dismissed can delay, interrupt, or alter the outcome of your tests.
Dealing with exceptions to make execution outcomes more stable seems like an obvious choice, but what if it's not feasible due to the complexity of the environment? We found that accepting variance and extrapolating test results from multiple executions in the same (or identical) environment was the only way forward.
airbreather 41 days ago [-]
Medical diagnosis is not a form of state-based behaviour, therefore it can't be "automated" in the strict sense.
Medical diagnosis is interpretation: you can't predict all possible states involved in the diagnosis and outcome, so it is inherently unsuitable to be treated in such a way.
Midwits treat these as gospel. But when you automate something, you make it 10x more efficient, and this opens up the possibility of using the automation 10x more often. Adherents to xkcd/1205 are like "don't waste your time automating the full build process, we only do a full build a few times at the end of a release."
bokohut 42 days ago [-]
Automation is about saving one's time, and time is something that no amount of money can directly buy more of. One can, however, invest time and money to indirectly "buy" time back through the reduction of mundane tasks that many humans have come to accept as normal.
Software-only solutions aside, my own personal and professional experience has involved building several automation sensor networks, and one of the first I built, over 20 years ago, was within my own house. In 2004 my master electrician father and I wired my entire house with both high voltage (220V/110V) and low voltage (24V/12V) as part of my plan to build a smart home. My automation planning choices then were focused on exactly this: saving me sub-seconds of time here, there, and everywhere. As a former electrician I was acutely aware of light switch placement in a room and the time it takes to turn a switch on and off each and every time, and I asked myself whether my creative awareness could design out the need to turn on a light switch when entering a room. I did just that, using an automation sensor network I built that interacts with source code I wrote, which manipulates other devices based on the logic of sensor states and any other sensors or inputs I deemed relevant to the objective. In the last 20 years I have rarely turned on a light switch: lights come on by "magic" when anyone enters a room, and this "magic" is real-world human time saved for the individual entering said room. I encourage those intrigued by these words to do some of your own math on the time you spend where you reside turning light switches on and off.
There will be those here who see it as foolish to automate anything (any longshoremen present?), and those who see only some benefits to it; but having lived with it for 20 years, I will never live without it again.
Stay Healthy!
moandcompany 41 days ago [-]
The second order costs of automation should consider time spent diagnosing and repairing said automation, and consequences of automation failure.
For example, should an automation fail, will manual processes still work? Does execution of a manual correction, or repair, require the knowledge or skills of a particular person or persons that may not be available, etc, or can it be done by the normal persons that the automation is intended to serve/benefit?
In the Smart Home context, when the automation or other cleverness fails, will ____ still be operable normally by a regular person?
airbreather 41 days ago [-]
And keeping the system working when originally specified parts are no longer available.
akhileshwar09 41 days ago [-]
Well, we can't really say automation makes mistakes these days, because the current machines and prototypes are so much more complex than before and do accurate jobs.
Exceptions or edge cases add additional states.
To fight the explosion of state count (and the intermediate states those generate), you have a couple powerful tools:
Well-designed automation should look like a funnel, rather than a subway map.If you want to go back and automate a class of work that's being routed out, write a new automation flow explicitly targeting it. Don't try and kludge into into some giant spaghetti monolith that can handle everything.
PS: This also has the side effect of simplifying and concluding discussions about "What should we do in this circumstance?" with other stakeholders. Which for more complex multi-type cases can be never-ending.
PPS: And for god's sake, never target automation of 100% of incoming workload. Ever. Iteratively approach it, but accept reaching it may be impossible.
1. Only automate the steps you know how to execute manually very well. This is a prerequisite. This kind of goes in the direction of not automating everything but looking at the parts you can automate well first.
2. Enable human intervention in the automation. Enable humans to stop the line when something is wrong. Then you figure out what's wrong and improve your automation. This implies the interative approach.
[1] https://global.toyota/en/company/vision-and-philosophy/produ...
Only 100% trust the automation after every possible step, procedure, and word has been 100% verified. (Which is to say almost never…)
Do you have any examples of this? Like an example project?
> never target automation of 100% of incoming workload. Ever.
A new product owner came in last year and wanted to "automate everything". They wanted a web page where builds, branches, commits, and more were all on one page that showed exactly which commit was deployed where, by whom, etc etc. They wanted this extravaganza for a 2-person application that was in maintenance with no new features.
They're also the kind of person who consults chatgpt or copilot on a subject while you're explaining that subject to them, to check that the LLM agrees with what you are saying. They'll even challenge people to prove some copilot output is incorrect. It seems to me that they consider LLMs more reliable than people.
Dear lord, these tools have just come out, how can they already have invented a new type of asshole?
It would output completely nonsensical stuff like QR-Code formats that don't exist, or asking to connect to hallucinated APIs.
It was often caught by lead devs quite quickly: the documentation wasn't a link or a PDF but rather some block of text.
But in the cases it wasn't, it was super costly: some developer would spend hours trying to make the API work to no avail, or, in the case of the QR code, it would reach QA which would be puzzled about how to test it.
So yes there is a new type of asshole.
Not the OP, but I worked on loans once. The application originally required tax returns to check income. Then we added the option to upload recent bank statements instead of tax returns. But there were a lot of places where the assumption of tax returns was hard-coded, so to save time the developers basically added an entirely separate code path for bank statements. Do this n times, and you have 2^n code paths.
When we eventually refactored and fixed it, we replaced the hardcoded references to tax returns with a concept of estimated income. Then we were able to reduce the branch to simply saying "if tax returns are present, estimated income = get_estimated_income_from_tax_returns(), otherwise estimated income = get_estimated_income_from_bank_statement()".
That's the core idea – collapse a branch as soon as possible.
you know, this seems very reasonable until that last sentence.
If you really want to know this type of commit-level data you can get it from git pretty easily, even if you're not particularly good with git but can search half-decently. If you don't have the skills to use git, it's extremely unlikely that knowing what the current branch and commit status of the repository is will meaningfully help you do your job.
Which itself begs why they'd need this level of detail on the codebase.
i don't care how it's presented, via webpage or cli tool or whatever -- just saying that when you are working at larger scale, those are very reasonable things to want to see in one spot at a glance.
the need dissipates as you scale down.
I'm sure most monitoring systems have some way to loading that into their system.
There's only one person you work with that's like that, right? Right?
I tried to develop the skill of nodding and then doing it correctly later, but it would always hit because there was a ticket I had written getting assigned to another dev and I had to explain to them why it was the way I had specified, and then he would correct me, and I would say yes part of what you say is correct and needs to be considered (as I said, I think he was a good developer at one time) but not all of it and he would insist he was correct and I had to go talk it over later with the dev as to why it worked the way I specified.
But in the end, if you can't work with the manager, it's time to head to greener pastures.
Only their boss can reign them in, and so you have to use techniques to shine a light on them to their superiors.
Think "I want you to record your order" from HBO's Chernobyl, but more surreptitious.
Effectively a form of trust and reputation arbitrage, but it was effective for dealing with a particularly difficult manager who didn’t accept certain things about the design of an API, and yet when the other guy told him the same things, he just asked a few mild follow ups and accepted what I was telling him all along.
They're definitely "leadership material"!
Sounds like they would have been a fan of ungit [0], which I have seen used for that kind of flow overview, though it has looked more impressive than actually proved helpful in my experience.
[0] https://github.com/FredrikNoren/ungit
Not GP, but at my old job making games we had a truly horrendous asset import pipeline. A big chunk of it was a human going through a menial 30 minute process that was identical for each set of assets. I took on the task of automating it.
I made a CLI application that took a folder path, and then did the asset import for you. It was structured into "layers" basically, where the first layer would make sure all the files were present and correct any file names, the next layer would ensure that all of the textures were the right size, etc. etc.
This funneled the state of "any possible file tree" to "100% verified valid set of game assets", hence the funnel approach.
It didn't accept 100% of """valid""" inputs, but adding cases to handle ones I'd missed was pretty easy because the control flow was very straightforward. (lots of quotes on "valid" because what I thought should be acceptable v.s. what the people making the assets thought should be acceptable were very different)
To give a contrived/trivial example, imagine a TLS handshake. Rather than building support to allow hosts to retry with a different cert, it's better to fail the connection and let the client start from scratch. Same principle can be applied to more complex process automation tasks in business. Imagine a leave tracking system. It might be better to not support changing dates of an existing leave application, and instead supporting cancel & re-apply. Best part is that the user facing part of both versions can be exactly the same.
With all due respect, this is preached by pretty much every book you read on cloud administration. I'd argue that if the process is decent enough, it'll work with major cloud providers, because their APIs are rich enough to enable this already.
The thing with most automation tools, though, is that a) they're abysmal for most of the workflows preached (thinking of Ansible, and I'm shuddering), and b) to reach the degree of automation described in most literature, you need the APIs of $MAJOR_CLOUD_PROVIDER.
Isn’t that the whole point of hiring experts?! So that you can ask them, instead of the computer for advice?
This is critical. So many people jump from "it works" to "let's automate it entirely" without understanding all the nuances, all the various conditions and intermediate states that exist in the real world. The result is a brittle system that is unreliable and requires constant manual intervention.
A better approach is to semi-automate things first. Write scripts with manual QA checkpoints and playbooks that get executed directly by devs. Do the thing yourself until it's so boring it hurts. Then automate it.
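One minimal shape semi-automation can take; the deployment steps here are print stand-ins for the real thing:

    def checkpoint(prompt):
        # Manual QA gate between automated steps; aborting is always an option.
        if input(prompt + " [y/N] ").strip().lower() != "y":
            raise SystemExit("aborted at manual checkpoint")

    def build():
        print("building...")            # stand-in for the real step

    def push_to_staging():
        print("pushing to staging...")

    def push_to_production():
        print("pushing to production...")

    def deploy():
        build()
        checkpoint("Build output looks sane?")
        push_to_staging()
        checkpoint("Has a human verified staging?")
        push_to_production()

    if __name__ == "__main__":
        deploy()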
A whole family of problems across domains (supply chains, manufacturing lines, AI pipelines, resilient software, physical security, etc.) comes down to effective state limitation.
In my day job, the vast bulk of my arguments with PMs come down to a lack of planning allocation for post-launch code cleanup. I haven't been able to find a succinct articulation for the utility of 'code cleanup' to a wider audience. 'State-limitation' fits nicely. Exactly the concept I was looking for.
It draws from the better-known 'less is more' adage, but adds a lot of implicit detail that the generic (almost cliché) version lacks.
So long as it's done silently, blended in with other things, and cloaked under clever wording (e.g. "this blocks that other thing you want" rather than "this will improve the codebase"), things will go quite well.
As soon as you speak to them as you would another engineer, you provide them material to use against you in prevention of you taking proper action.
If one person writes broken code in half the time, while you take twice as long cleaning up the mess... then you're going to be perceived as ineffective.
A fancy way of saying, “simplicity is the mark of truth.”
Or
“The fewer mechanical points of failure, the better.”
I concur. Eliminate design risk.
(I'm a founder of an automation SaaS where we've made "human interface" one of the core features of the product)
The biggest part of automation, in my experience, is boiling the inputs down to a 'unique' state that the automation can then take as input and run on.
For computation to do work, it requires consistency: computers can only operate accurately when determinism holds. Also, as the OP mentions, state can explode into spaghetti, which is why he suggests a sieve-like approach based on similarity.
Some problem spaces are fundamentally inconsistent, such as those handled by approximation methods, where exception handling falls back to what amounts to guesses, heuristics, and checks. Some problem scopes can't be fully characterized at all, so no amount of exception handling will cover the entire scope; that's why a resilient design needs fallbacks.
If inputs cannot be controlled and uniquely differentiated, the automation fails in brittle ways, especially with regards to external change.
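Concretely, "boiling inputs down to a unique state" usually means a canonicalization step at the front of the pipeline. A toy sketch, with invented field names and aliases:

    COUNTRY_ALIASES = {"UK": "GB", "USA": "US", "U.S.": "US"}

    def canonicalize(record):
        # Collapse the many raw shapes an input arrives in into one
        # canonical state, so downstream automation sees a single form.
        name = record.get("name", "").strip().lower()
        country = record.get("country", "").strip().upper()
        return {"name": name, "country": COUNTRY_ALIASES.get(country, country)}

    assert canonicalize({"name": " Ada ", "country": "uk"}) == \
           canonicalize({"name": "ADA", "country": "GB"})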
The main interface (with regard to your core features) would be language, or communication. There exist words right now that carry contradictory meanings, where the same word may mean the opposite depending on context; and which meaning is in play is not a general consensus but an individual one (and the individual may be misusing it).
That breaks the 1:1 mapping determinism requires. AI weights mimicking neurons offer a narrow approximation that may work under a narrow set of circumstances, but no computer today can recognize when the inputs look the same yet have two or more different states mixed in (many people forget that the absence of a state is a state too) and then decompose them. Abstract decomposition seems to be something only humans are good at, and I'm glad that's the case; otherwise none of us would have jobs.
I was an early eng and first VP of Product at Flexport. Global logistics is inherently complicated and involves coordinating many disparate parties. To complete any step in the workflow, you're generally taking in input data from a bunch of different companies, each of which have varying formats and quality of data. A very challenging context if your goal is process automation.
The only way to make progress was exactly the way you described. At each step of the workflow, you need to design at least 2 potential resolution pathways:
1. Automated
2. Manual
For the manual case, you have to actually build the interfaces for an operator to do the manual work and encode the results of their work as either:
1. Input into the automated step
2. Or, in the same format as the output of the automated case
In either case, this is precisely aligned with your "reunifying divergent paths" framing.
In the automated case, you actually may wind up with N different automation pathways for each workflow step. For an example at Flexport: if we needed to ingest some information from an ocean carrier, we often had to build custom processors for each of the big carriers. And if the volume with a given trading partner didn't justify that investment, then it went to the manual case.
From the software engineering framing, it's not that different from building a micro-services architecture. You encapsulate complexity and expose standard inputs and outputs. This avoids creating an incomprehensible mess and also allows the work to be subdivided for individual teams to solve.
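In code, that pattern is roughly a dispatcher over per-partner processors, with a manual queue as the catch-all. A sketch with all names invented (this isn't Flexport's actual code):

    def parse_carrier_a_feed(raw):
        # Stand-in: each processor hides one partner's format quirks.
        return {"container": raw["cntr"], "status": raw["st"]}

    def parse_carrier_b_feed(raw):
        return {"container": raw["containerNo"], "status": raw["statusCode"]}

    CARRIER_PROCESSORS = {
        "carrier_a": parse_carrier_a_feed,
        "carrier_b": parse_carrier_b_feed,
    }

    def ingest_event(carrier, raw, manual_queue):
        processor = CARRIER_PROCESSORS.get(carrier)
        if processor is None:
            # Low-volume partner: route to a human whose UI emits the same
            # normalized shape as the automated path (reunifying the paths).
            manual_queue.append((carrier, raw))
            return None
        return processor(raw)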
All that said – doing this in practice at a scaling organization is tough. The micro-services framing is hard to explain to people who haven't internalized the message.
But yeah, 100% automation is a wild-goose chase. Maybe you eventually get there, maybe not. But you have to start with the assumption that you won't, or you never will.
For example, we integrated with a data source that used OCR to scan container numbers as they passed through various waypoints while they were on trains. The tech wasn't perfect. We frequently got reports from the rail data source that a train was, for example, passing through the middle of the country when we knew with 100% certainty that it was currently in the middle of the Pacific Ocean on a boat. That spurious data could be safely thrown out on logical grounds. Other cases were not as straightforward!
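The "thrown out on logical grounds" part can be as blunt as a plausibility check against the shipment's known mode of transport. A sketch with invented field names:

    def plausible(sighting, shipment):
        # A rail OCR ping for a container we know is mid-ocean is noise.
        if sighting["source"] == "rail_ocr" and shipment["current_mode"] == "ocean":
            return False
        return True

    def clean_sightings(sightings, shipment):
        return [s for s in sightings if plausible(s, shipment)]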
My team and I once took on a very tricky automation project. At the time we had a complex software deployment done about once per month that involved a team of about a dozen people showing up at 4am to do it while traffic was low.
The deployment involved many manual steps and coordination of everybody involved. The person leading each deployment followed the documented list of steps and got each person to do their bit at the right time; people to run database migrations, people to install RPMs on particular servers, people to test and verify functionality. Mistakes and missed steps were not uncommon.
The very first thing we did was take the documentation and write a Jenkins job to post each step into a Slack channel specifically for coordinating the deployments. Someone clicked "go" and each step was posted as a message in that channel with a 'done' button to be clicked when that step was done. Clicking the button caused the next step to be posted.
The next release we did used that instead of one person reading the steps out of Confluence. Everyone involved in the release could always see what step we were at and when it was their turn to do their bit. It also helped ensure no steps were ever missed.
Over the following months we chipped away at that job a bit at a time. We'd pick a step in the process and automate just that step, starting with the low-hanging fruit first. The Slack message for that step went from "click to confirm you've done it" to "click to do it", with the result posted once it was done; followed by the next step to perform.
It was a long process, but it allowed the rest of the business (and us!) to gradually gain confidence in the automation, and lowered the risk of the project dramatically. Once several steps had been automated and battle-tested we removed the 'click to do' bits in between and the whole release became a couple of clicks followed by the odd bit of manual QA.
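The skeleton of that evolution fits in a few lines. A sketch, not the actual Jenkins job; post and wait_for_click stand in for whatever chat integration is in use:

    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class Step:
        text: str
        run: Optional[Callable[[], str]] = None  # None => still a manual step

    def run_release(steps, post, wait_for_click):
        # Automating a step = giving it a run function; the loop never changes.
        for step in steps:
            if step.run is None:
                post(f"NEXT: {step.text} -- click Done when finished")
                wait_for_click()
            else:
                post(f"NEXT: {step.text} -- click Go to run it")
                wait_for_click()
                post(f"RESULT: {step.run()}")

    # Terminal dry run: run_release(steps, post=print, wait_for_click=input)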
Was it hard to achieve buy-in from all parties? I'd guess that would be the hardest part: getting everyone to join in on working on the automation.
> > https://blog.danslimmon.com/2019/07/15/do-nothing-scripting-...
What is the point of defining a Python class with a single `run` method, and then running with `Class.run()`, instead of just defining a `function` and running with `function()`?
Step 3: Go to this URL and do a thing, then click Y.
Do some stuff, do some more stuff
Step 8: Go to this URL and undo the thing you did in Step 3, then click Y.
An approach like this gives weight to the fact that just figuring out how to document exactly what needs to be done is often the hardest part; if you get that right before you start writing automation code, it can make the automation much more efficient.
[1] https://blog.danslimmon.com/2019/07/15/do-nothing-scripting-...
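For what it's worth, plain functions work just as well as one-method classes here; the class only pays off once steps carry state. A do-nothing-style sketch in that spirit (the steps themselves are invented):

    def create_account(ctx):
        print(f"1. Create an account for {ctx['user']} in the admin portal.")
        input("   ...press Enter when done.")

    def send_welcome_email(ctx):
        print(f"2. Email {ctx['user']} the welcome doc.")
        input("   ...press Enter when done.")

    STEPS = [create_account, send_welcome_email]

    def main():
        ctx = {"user": "new.hire@example.com"}
        for step in STEPS:
            step(ctx)

    if __name__ == "__main__":
        main()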
Automating a process makes it more standardized and legible, but takes away a lot of the nuance and of the resilience that human institutions tend to bake in. Do it too little, you get chaos; do it too much, you're destroying the very thing you were trying to nurture.
It's certainly déformation professionnelle, but the parallel with software is eerily relevant.
--- [0] https://en.wikipedia.org/wiki/Seeing_Like_a_State
It’s pretty tricky to get right and is very case by case in what it specifically means. But it’s been critical to handling edge cases.
For example, in 99% of cases, you can’t progress to stage X without doing Y. But in this case, “I know what I’m doing.”
Though this puts a lot of pressure on downstream processes to handle missing data. Fortunately that hasn’t been much of an issue, because validation grows and changes over time and must naturally deal with records created under prior rules, back when their state was valid.
One time I was involved in automating a business process. Nothing super complicated: scraping data from a portal, filling forms, some data wrangling. It had to handle a few edge cases and involve a human in the loop. I tried to include as much data validation as possible, but this being a business in the real world, there were of course some exceptions. I implemented a Superuser Override, which was basically a button that launched a popup with another form where the app users requested approval from their manager to accept the unvalidated data. It was red and scary and it had Bootstrap .danger all over, because that was a cool thing to do back then. Don't judge, ok?
Things worked as they should have. The automation saved tons of time, even if some fields were missing. Occasionally, there was a Superuser Override request which sent an email to the manager overseeing the process. Actually, things worked too well - the volume of processed data increased and so did the number of the Override requests. The manager - busy as they usually are - started approving all the requests. Manually at first and then they ran a macro that did it for them.
Eventually the app users got wind of it and started using the Superuser Override liberally. Before that, they had shown more restraint.
I wish there was some clever punch line to this story but I don't know what happened after.
If I had to do it again, I'd be more aggressive with validations, make the approval process more like a code review, involve the most experienced app users as the Superuser, and have audit logs. Oh and Tailwind instead of Bootstrap.
Audit logging is definitely another must for me. I literally won't make an app without it. The app I'm working on now had a case where a PM promised up and down that edit history wasn't needed for a particular section, and lo-and-behold 2 years later it turns out that actually we do need the history.
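A minimal shape for that kind of append-only audit trail (names invented, a sketch only):

    import datetime
    import json

    def log_override(audit_path, record_id, requested_by, approved_by, reason):
        # Append-only: history survives whatever the validation rules become.
        entry = {
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "record_id": record_id,
            "requested_by": requested_by,
            "approved_by": approved_by,
            "reason": reason,
        }
        with open(audit_path, "a") as f:
            f.write(json.dumps(entry) + "\n")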
http://www.complexcognition.co.uk/2021/06/ironies-of-automat...
Being flexible is good, but it comes at a cost. When that cost is too high, don’t do it. Realize that the customer who wants that workflow probably isn’t going to be your make-or-break moment. And in fact, they might have flexibility of their own. For example, you don’t accept Amex but they have a backup credit card or cash. It might be annoying to them if they are fussy, but it’s normal enough that the consequences are minimal. And yes, you may occasionally get a customer who doesn’t have that flexibility, but you shouldn’t be pinning your business on rare events. Figure out what’s most common among your target customers and support a few simple workflows. Say no to everything else until you have the resources to do it properly.
Another thing you can do is have a hybrid low tech/high tech solution. Automate structured inputs with software. Write down the rest as notes in a logbook. Over time you will probably see patterns in the logbook that you can automate.
Lastly, remember what Morpheus says in The Matrix, “Some [rules] can be bent. Others can be broken.” For example, you could simply pay the bill on behalf of the customer who wants cash on delivery. Now you assume some personal risk but the computer system doesn’t have to support their workflow. Is it worth it? Maybe, maybe not, but it’s a choice you can make.
Or just recognize that your software ideology is incorrect for the problem at hand. This guy wanted to write a single piece of software that did everything the business needed. That was clearly a mistake and no surprise that his software ended up very tightly coupled to the business itself.
This was an error in interface and component design. Most likely caused by starting to write software before fully understanding the job roles and people who would be using it.
The corollary of this, though, is that "there's nothing more permanent than a temporary fix". So balancing these ideals is the work of engineering and management.
Saying no is so much harder than it sounds, though -- sales is saying yes to everything they can, the board is pressuring for more sales, the users are taking every possible opportunity to shift blame onto the software, and engineering is buckling under the churn of changing business goals every quarter.
If you have a long term big customer who has always been able to do COD and you suddenly pull that option with the explanation "it's easier for us" that's not going to go over well. Now you might lose them and they will tell their network "Wow, Foo Corp used to be good now they are just being unreasonable."
Are you selling to restaurants? To grocery stores in a large city's Chinatown? And so on.
How did you arrive at that conclusion? Most of those applications were written by the businesses themselves. IBM provided you a programming environment, not a set of predefined business software. Which is why COBOL jobs still exist.
What changed with the mainframe was that instead of having a bunch of disparate and disconnected processes you now had a centralized set of dependent processes and a database that provided a single source of truth.
Businesses were going to want this capability regardless of how limited humans with green eyeshades were previously.
> Much of the early internet (and still most bank and insurance) look like HTML front ends to mainframe 3270 screens.
Well, precisely: those are custom applications. If they weren't, we wouldn't have this issue. You talk about automation, but you seem not to have noticed that mainframes have CICS, why it exists, and why businesses still use it.
The "old school" ways are actually still exceptionally powerful.
> As it turns out the entire flow at the post office (or DMV or tax office) is about exception handling. No amount of software is going to get you out of there because it is piecing together a bunch of inputs and outputs that are outside the bounds of a system.
Not sure how the author arrived at this conclusion, either. Both entities (USPS and DMV) have elaborate rules for how business gets done. The USPS in particular has rules that cover shipping everything imaginable, often via different methods. The outputs are quite simple: USPS ships your package; DMV provides your license or vehicle registration.
Some kinds of automation are absolutely essential. Aligning wafer stages during photolithography, shipping logistics, high frequency trading, etc. The business absolutely wouldn't work without it.
The other kinds of automation are more questionable. Developing E2E CI/CD automation for an internal tool that is redeployed once every quarter might be a more difficult sell to management. For these cases, even if the rate of manual process invocation is somewhat high (i.e. frustrating for some employees), the customer can't see it. They won't pay one additional cent for this.
There is also this entire meta time-wasting game where the mere debate about "should we automate?" takes up so many resources that you could have manually invoked the thing more times than it would ever feasibly be invoked during its projected lifetime. Alternatively, you could have just let your developers build the damn thing to get it out of their systems.
I've generally had the experience that it's the exact opposite. Best practices dictate that all processes should be automated to the limit.
Premature automation can add complexity to the process, as edge cases usually get bolted on or need routing around.
If you're building process management and automation from scratch as a way to learn, it's also a red flag for technical debt in the making.
What happens if you just say no? I have a feeling a lot of complexity stems from the fact that these exceptions are easy to handle when you’re doing things manually, so it’s hard to say no. But if you think about the business case I would not be surprised if the correct answer is no more often than not.
Do you want to say no?
That's not an easy question to answer. You should think about it. Is it worth alienating some customers to handle them in a larger scale?
Naively, this looks like an automatic "no", but it's not.
Hardware automation has brought obvious boons to the places it works. And often we force the places where we want it to change to accommodate the automation. Agriculture is an easy example of what I mean here. Take a look at some fields that are optimized for giant machinery. Very impressive, and they can manage a lot. My favorite recently was seeing the "kickout" effort to prevent bad fruit from entering a hopper.
To that end, is the main issue with software automation that it thinks it can be a total solution and often tries to start by not imposing changes on what is being automated?
The part that is unique to software is that companies often expect people whose only expertise is in software to do both of these tasks when the second often requires deep domain knowledge. When one mechanises something in hardware it is generally taken for granted that domain experts will be central to the effort but when the result is principally software, domain experts are often left out of the process.
Public reaction to automation has been mixed for obvious reasons, but also because when software is applied to structure- and/or determinism-resistant systems, it fails to capture some essential components, which ends up degrading the services those systems can provide.
The biggest issue I have always seen is “sorry, we cannot help you with that because our system says so,” which should never be the case. There should always be a way around the system. Of course, the way around the system then needs approval from some higher-up anyway.
Another gem from the article that I wanted to surface. It's yet-another-take on the general sentiment here, but it's very succinct. Automation is good, but automation is a tool that serves people (and not in a Soylent Green kind of way).
I start making a tool for myself to track a tedious or error prone process. Even after you've gotten the wiki to have the correct steps in the correct order, you can still run into human error. I use it and work on it until I stop burning my fingers. Then I share it with a few people who complain about the same thing, and see how they burn their fingers.
Then I roll it out to a wider team, wait for more problems, then suggest it for general use. Usually by this point we are seeing fewer problems with the tools than with humans doing things manually, so we escalate to teasing people to use the tool so they don't end up with an Incident. Then it becomes mandatory and anyone who still refuses to use it is now risking PIP-level trouble if they don't join the party.
People who try to skip steps make not only a lot of problems for themselves but also for anyone else who suggests automation tools. Running toward the finish line makes everything worse.
Anyway, one other insight I would add is that the issues tend to come up at the interface between the systems. Your automation has to get some input and send some output and both those are pain points. That’s why we sometimes prefer imperfect monolith software.
Imagine test automation on physical devices (ex-founder of a mobile app testing SaaS company here) where system-level events that can't be programmatically detected or dismissed can delay, interrupt, or alter the outcome of your tests.
Dealing with exceptions to make execution outcomes more stable seems like an obvious choice, but what if it's not feasible due to the complexity of the environment? We found that accepting variance and extrapolating test results from multiple executions in the same (or identical) environment was the only way forward.
Medical diagnosis is interpretation: you can't predict all the possible states involved in the diagnosis and outcome, so it is inherently unsuitable to be treated this way.
and
https://xkcd.com/974/
Stay Healthy!
For example, should an automation fail, will manual processes still work? Does execution of a manual correction, or repair, require the knowledge or skills of a particular person or persons that may not be available, etc, or can it be done by the normal persons that the automation is intended to serve/benefit?
In the Smart Home context, when the automation or other cleverness fails, will ____ still be operable normally by a regular person?