Subhead is "Sandboxing Perl with WebAssembly - Part 2."
The subhead sounds weird, but part 1 makes more sense and is pretty interesting. Perl has many modules to deal with file formats nobody has used since Perl's prime. It isn't totally clear to me if the goal is to compile the Perl interpreter into WASM or interpreter + modules. In either case the goal is to re-use the original tools within new tooling.
I’m building a new startup and file metadata plays an important role. There are thousands of file formats, each format may have dozens of versions, and each stores metadata differently.
Our use-case also needs metadata to be present when a file is uploaded - extracting the data on our servers means we add considerable overhead to upload post-processing & we lose data that is useful to customers.
So we need to extract metadata client-side and staple it to the upload. Herein begins a journey of self-inflicted pain and suffering.
Is there anything else in the same class as ExifTool - super valuable but the only implementation is Perl?
tyingq 31 days ago [-]
Not sure if you would say these are still the only implementations, but autoconf, Bugzilla and SpamAssassin were all at least once thought of that way.
Would this require using the WASIX libc? I didn’t spend much time on it, but when I attempted to use it the build started failing in mysterious ways - happy to open an issue on the relevant repo with a reproduction / notes on what I think are some bugs.
aghilmort 30 days ago [-]
very cool on wasmer; does it support python + local ai / gpu use cases, ffmpeg, etc. have some customer use cases & really don’t want to go docker etc? helpful if can be included in electron & browser extensions or other supercontainers
benatkin 31 days ago [-]
That's really impressive. Like the author I am aware of the advantages/drawbacks of emscripten and wasi-sdk. The author did well to look extensively into both of them. Here's the repo. https://github.com/uswriting/zeroperl
He obviously meant my perlcc, it is maintained for compatible perl versions and it works and is used in industry.
andrew_rfc 30 days ago [-]
You cannot today build anything on the Blead of Perl with perlcc.
Support has languished because the language changed fast enough that it never kept up. It may be used in the industry, but for a modern Perl application (or anything past 5.10) it won’t work. Hence why it was removed.
rurban 31 days ago [-]
This guy is just crazy. https://github.com/libexif/libexif exists and is much easier to use than compiling perl to wasm, just to run the overly slow exiftool.
andrew_rfc 30 days ago [-]
I’m not sure where you got the idea it’s slow, but it executes within 30% of native speed. Which means extracting the metadata from a 30 GB pro video in my test takes 300-400ms.
Speaking of - libexif doesn’t support a lot of file formats, whereas ExifTool does.
If I always opt to choose the easiest path then I’m setting a precedent for myself that I will compromise on all my goals for half-finished solutions that take me further from the initial idea.
transpute 31 days ago [-]
There are security benefits to WASM sandboxing of client-side parsing of untrusted image files, to avoid scenarios like https://imagetragick.com
Gunax 31 days ago [-]
I do not understand this at all, but it looks cool.
hobs 31 days ago [-]
Honestly it seems like porting ExifTool would actually be easier than this nightmare.
andrew_rfc 31 days ago [-]
Author here: it wouldn't have been. Sure, it would be nice to have a native version, but that is a lifelong maintenance burden, and ExifTool is already the best at what it does and stays up-to-date with file format changes.
By completing this work, I can use ExifTool in any environment now - and more broadly, there is now a portable, embeddable, and sandboxed version of Perl that others can use in their projects. I can think of a lot of use cases, and helping other developers makes any hardship I endured worth it.
boris 30 days ago [-]
> and more broadly, there is now a portable, embeddable, and sandboxed version of Perl that others can use in their projects. I can think of a lot of use cases, and helping other developers makes any hardship I endured worth it.
Yes, exactly, and thank you for that!
I don't know of any other general-purpose scripting language that can be run in WebAssembly. This could make Perl an interesting choice for writing sandboxed build system rules.
jauntywundrkind 31 days ago [-]
This is a degree of thinking that feels rank common in the world. When you read a complex blog post with sophisticated approach, there's often a "wouldn't it be easy to just ____."
Maybe, yes, perhaps! But sometimes the problem at hand - the problem specified in the post - isn't the entire desire or objective. "Actually be easier" abounds, but sometimes our scope & intent in the long run builds on the problems at hand.
Porting ExifTool may be easier (but then if you want to maintain it, that's a drain for life). But having perl that you can now run anywhere might be something this author sees other use for. Getting good at wasm or exhibiting their excellent systems skills might have been side objectives.
Whether we just do things to get results at hand, or whether we invest ourselves broadly to build a better world is a constant struggle for many engineers. This shows up in the comments time and time again as "would actually be easier". I'm sorry for hitting hard on this specific comment, but there's a Two Cultures problem here, and one culture trivializes the other endlessly in the comments. It's hard for me to state why trying hard & caring & doing extra matters, but I think that breed of people are the ones that I look up to, that make all the difference to me. And I really wish there was a good defense or rallying cry, something we could say when we get the inevitable "would actually be easier" that can capture the enthusiasm for setting ourselves up & building broadly towards our better worlds.
transpute 31 days ago [-]
> And I really wish there was a good defense or rallying cry, something we could say when we get the inevitable "would actually be easier" that can capture the enthusiasm for setting ourselves up & building broadly towards our better worlds.
“I must study Politicks and War that my sons may have liberty to study Mathematicks and Philosophy. My sons ought to study Mathematicks and Philosophy, Geography, natural History, Naval Architecture, navigation, Commerce and Agriculture, in order to give their Children a right to study Painting, Poetry, Musick, Architecture, Statuary, Tapestry and Porcelaine.”
—John Adams, 1780
“Hard times create strong men, strong men create good times, good times create weak men, and weak men create hard times.”
—G. Michael Hopf, 2016
Is there a programming abstraction of this circle of life?
hobs 31 days ago [-]
If it makes you feel good go for it, seems like a huge PITA for me to figure out what files do, but sometimes the journey is a lot more fun than the destination.
jauntywundrkind 31 days ago [-]
I see it less as the fun of the journey, and more as the chance to become better, and make better tools for yourself going forwards. Advancing your position is more tempting than just doing the task.
It's not always apparent how the side-quests will help us. But man, I can think of a half-dozen upsides of the work here. You now have a portable Perl you can use anywhere. You don't have to maintain & keep a fork of ExifTool up to date. You've learned a ton about wasm file system interactions which could be useful for any portable systems. There's so many incidentals that people just don't see when they blaze through to "but it could be easier."
aghilmort 30 days ago [-]
yes. this. same. au revoir to the yes and’ers & yes but’ers.
plus love how you worked in side quest angle — more often than not the payoff can be quite high!
cthor 31 days ago [-]
ExifTool is merely a bunch of Perl code, which can't be parsed nor compiled to a program, i.e. computer instructions. There is nothing to port.
The only other option is a rewrite. Does that really sound easier?
tantalor 31 days ago [-]
Comment obviously meant "port" as in "to another programming language".
> only other option is a rewrite
If you mean by hand, not necessarily, you could use a "code converter" tool to do the rewrite automatically. But it may be impractical; perl is a notoriously difficult language to work with.
Perl is pretty close to Ruby and Python; those may be the most obvious candidates.
cthor 31 days ago [-]
There are no programs that will fully transpile Perl to not-Perl, because that would require parsing Perl.
A code converter that kindof-sortof-but-not-really transpiles (like Python's 2to3) would require a lot of debugging of ugly machine generated source code. Who wants to do that? Would probably be even harder than a full rewrite. (How many Python libraries were actually ported using 2to3?) Similar for using any translation an LLM might spit out.
So yes, by hand. And it's not obvious if a full rewrite of a complex library would be easier than porting its runtime to WASM, especially when said runtime has been ported to countless other systems already.
ab5tract 30 days ago [-]
English is a notoriously difficult language to work with. Should we be translating all of our English to Esperanto?
hobs 30 days ago [-]
It really depends on how ExifTool (which again, I started from a point of ignorance so idk) decides to do the work in the first place.
Does it have a bunch of magic functions that decide random things about bits? Does it have some sort of internal metadata file format identification lookup? No idea! But in some of those cases rewriting it is a lot less insane than others.
Get in loser. We're rewinding the stack - https://news.ycombinator.com/item?id=43014070
Readers may want to look at both articles of course!
I've wanted to use wazero to run my Exiftool [1] for quite a while. Just as I use wazero to sandbox dcraw [2].
But WASI Perl never materialized.
This may just be what I'm missing.
[1]: https://github.com/ncruces/go-exiftool
[2]: https://pkg.go.dev/github.com/ncruces/rethinkraw@v0.10.7/pkg...
ExifTool is written in Perl.
https://andrews.substack.com/p/zeroperl-sandboxed-perl-with-...
As increasingly is the case, a good starting point is the CI workflow: https://github.com/uswriting/zeroperl/blob/main/.github/work...
A thrown exception handled with a try/catch block any other name...
0 - https://en.wikipedia.org/wiki/A_rose_by_any_other_name_would...