Project author here -- happy to elaborate on anything; this is a continuous WIP project. The biggest insight has been the limitations of vision models in spatial awareness -- see https://github.com/awwaiid/ghostwriter/blob/main/evaluation_... for some sketchy examples of my rudimentary eval.
Next top things:
* Continue to build/extract into a yaml+shellscript agentic framework/tool
* Continue exploring pre-segmenting or other methods of spatial awareness
* Write a resvg backend that sends actual pen-strokes instead of lots of dots
loxias 35 days ago [-]
Wow! This is really cool! Really really cool! I imagine some sort of use where it's even more collaborative and not just "unadorned turn-by-turn".
For example, maybe I'm taking notes involving words, simple math, and a diagram. Underline a key phrase and "the device" expands on the phrase in the margin. Maybe the device is diagramming, and I interrupt and correct it, crossing out some parts, and it understands and alters.
Sorry, I know this is vague, and I don't know precisely what I mean, but I do think that the combination of text (via some sort of handwriting recognition), stroke gestures, and a small iconography language, all enabled by LLMs, probably opens up all sorts of new user-interaction paradigms that I (and others) might be too set in our ways to think of immediately.
I think there's a "mother of all demos" moment potentially coming soon with stuff like this, but I am NOT a UX designer and can't quite imagine it clearly enough. Maybe you can.
awwaiid 35 days ago [-]
Yes! I have flashbacks to productive times standing in front of a whiteboard, alone or with others, doodling out thoughts and annotating them. When working with others I can usually talk to them, so we are also discussing as we are drawing and annotating. But also I've handed diagrams / equations to someone and then later they hand me back an annotated version -- that's interesting too.
rybosome 35 days ago [-]
This is a really cool effect. How do you envision this being used?
Thinking about it as a product, I’d want a way to easily slip in and out of “LLM please respond” so it wasn’t constantly trying to write back the moment I stopped the stylus - maybe I’d want a while to sketch and think, then restart a conversation. Or maybe I’d want certain pages to be LLM-enabled, and others not.
Does it require any sort of jailbreak to get SSH access to the device?
awwaiid 35 days ago [-]
The reMarkable comes with root-ssh out of the box, so installation here is scp'ing a Rust-built binary over, then ssh'ing in and running it. I haven't wrapped it in a startup-on-boot service yet.
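In practice the whole loop is roughly three commands. This is a sketch, not a documented procedure: the cross-compile target is my best guess for the rM2's 32-bit ARM chip, and 10.11.99.1 is the device's usual USB-network address.

  # build for the device, copy it over, run it (assumes the armv7 toolchain is installed)
  cargo build --release --target armv7-unknown-linux-gnueabihf
  scp target/armv7-unknown-linux-gnueabihf/release/ghostwriter root@10.11.99.1:
  ssh root@10.11.99.1 ./ghostwriter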
It is triggered right now by finger-tapping in the upper-right corner, so you can ask it to respond to the current contents of the screen on-demand. I think it would be cool to have another out-of-band communication, like voice, but this device has no microphone.
Also right now it is one-shot, but on my long long TODO list is a second trigger that would _continue_ a back-and-forth, multi-screenshot (even multi-page) conversation.
rybosome 35 days ago [-]
Ah great, I will definitely give this a try later then, thanks!
I’m curious if this is becoming something that you are using in your own day-to-day, or if your focus right now is on building it?
The context for my question is just a general interest in the transition to AI-enabled workflows. I know that I could be much more productive if I figured out how to integrate AI assistance into my workflows better.
awwaiid 35 days ago [-]
Only building so far.
The one use-case that is _close_ to ready-for-useful: I often take business meeting notes, and in these notes I often write a T in a circle to indicate a TODO item. I am going to add a bit of config in there, basically "If you see a circle-T, then go add that to my todo list if it isn't already there. If you see a crossed-out circle-T, then go mark it as done on the todo list".
I got slightly distracted implementing this, working instead toward a pluggable "if you see X call X.sh" interface. Almost there though :)
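As a sketch of where that's headed (the handler name, argument convention, and todo-file location are all assumptions, not the actual interface yet):

  #!/bin/sh
  # circle-t.sh -- hypothetical handler for the "if you see X, call X.sh" idea.
  # $1 is "add" or "done"; $2 is the TODO text recognized on the screen.
  TODO="$HOME/todo.txt"
  case "$1" in
    add)  grep -qxF "$2" "$TODO" 2>/dev/null || echo "$2" >> "$TODO" ;;
    done) sed -i "s/^$2\$/DONE: $2/" "$TODO" ;;  # naive match; real code would escape $2
  esac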
0xferruccio 35 days ago [-]
This is so cool! I love to see people hacking together apps for the reMarkable tablet
I made a little app for reMarkable too and I shared it here some time back: https://digest.ferrucc.io/
edit: found the official developer website https://developer.remarkable.com/documentation
https://github.com/erikbrinkman/rmapi-js
It's so great seeing these; it always makes me want to play with developing apps for the reMarkable 2. Do you have any sources you can recommend? Thank you!
That’s awesome! Love seeing the reMarkable get more functionality through creative hacks. Just checked out your app—what was the biggest challenge you faced while developing for the reMarkable?
0xferruccio 35 days ago [-]
I think the thing I really didn't like was the lack of an OAuth-like flow with fine-grained permissions.
Basically, authentication with devices is "all-access" or "no-access". I would've liked it if a "write-only" or "add-only" API permission scope existed.
pieterhg 35 days ago [-]
Blocked for AI reply @dang
defrost 35 days ago [-]
Good catch, the last few pages of comment history are inhumanly insincere.
https://news.ycombinator.com/threads?id=memorydial
" @dang " isn't a thing, he doesn't watch for it - take credit and email him direct.
kordlessagain 35 days ago [-]
Do you have proof this is true?
awwaiid 35 days ago [-]
I might be biased because memorydial was complimentary to me ... but they SEEM like a human! Also I'm not all that opposed to robot participation in the scheme of things. Especially if they are nice to me or give good ideas :)
memorydial 34 days ago [-]
Ha, thanks for having my back! I genuinely love your project. I have been toying with getting either a Boox or a reMarkable for ages.
defrost 34 days ago [-]
Well you're human, you took the bait :-)
FWiW I mostly read HN at its deadest time (I'm GMT+8 local time) and I see a lot of mechanical-turk comments, especially from new (green coloured) accounts.
I always look for a response (eg: yours) before flagging them as spam bots...
memorydial 34 days ago [-]
Ha I guess when I stay up very late -8 overlaps with +8!
defrost 35 days ago [-]
He has commented on this.
Retrieval is tricky as Algolia doesn't index '@' symbols:
https://hn.algolia.com/?query=%40dang%20by%3Adang&sort=byDat...
Most people don't use an em-dash correctly, as distinct from a hyphen. That jumps out at me. :)
perihelions 35 days ago [-]
This is awkward—I use em-dashes all the time on HN! I'm not an LLM (as far as I know); I just like to write neatly when I'm able to, and it's very low friction when you're familiar with your keyboard compose sequences[0]. It's a trivial four keypresses:
  AltR(hold) - - -
(The discoverability of these functions is way too low on GNOME/Linux; I really dislike the direction of modern UX, with its fake simplicity and infantilization of users. Way more people would be using —'s and friends if they were easily discoverable and prominently hinted in their UX. "It's documented in x.org man pages" is an unacceptable state of affairs for a core GUI workflow.)
[0] https://news.ycombinator.com/item?id=35118338#35118598 (On "Punctuation Matters: How to use the en dash, em dash and hyphen" (2023); 356 comments)
I never knew about the em-dash thing; I was just using an AI writing assistant to help fix my shitty grammar and formatting. I think in the future I'll stick with bad formatting.
memorydial 34 days ago [-]
no, just l–AI–zy copy-pasta. your book looks great! putting on your chat with lex now.
memorydial 34 days ago [-]
no, just lazily and stupidly used an AI writing assistant
kordlessagain 32 days ago [-]
Me too! :)
vendiddy 35 days ago [-]
I wish the remarkable tablets weren't so locked down.
It's one of my favorite pieces of hardware and wish there were more apps for it.
thrtythreeforty 35 days ago [-]
Locked down? You can get a shell by ssh'ing to it. Call me when an iPad lets you do that...
freedomben 35 days ago [-]
I agree, I definitely wouldn't call them "locked down." I do however think they could do a lot more to make it usable/hackable. This slightly undermines their cloud-service ambitions, but I think the hackability is what makes the reMarkable so ... well ... remarkable. Certainly that's why I bought one!
owulveryck 35 days ago [-]
Awesome.
I've wanted to implement this for months. You did a really good job.
awwaiid 35 days ago [-]
Thank you! Still a WIP, but a very fun learning/inspiration project. Got a bit of Rust jammed in there, a bit of device-constraint dancing, a bit of multi-LLM API normalization, a bit of spatial-vision LLM education, etc.
owulveryck 35 days ago [-]
At some point I wanted to turn goMarkableStream into an MCP server (Model Context Protocol).
I could get the screen, but without a “hack” I couldn’t write the response back.
awwaiid 35 days ago [-]
The trick here is to inject events as if they came from the user. The virtual keyboard works really reliably; you can see it over at https://github.com/awwaiid/ghostwriter/blob/main/src/keyboar... . It is the equivalent of plugging in the reMarkable type-folio.
The main limitation is that the reMarkable drawing app is very, very minimal: it doesn't let you place text in arbitrary screen locations, and is instead sort of a weird overlay text area spanning the entire screen.
rpicard 35 days ago [-]
This is so cool. I’m going to try it this weekend.
I’ve been playing with the idea of auto-creating tasks when I write todos by emailing the PDF off the device and sending it to an LLM.
This just opened up a whole realm of better ways to accomplish that goal in realtime.
r2_pilot 35 days ago [-]
This worked pretty well when I did a proof of concept with Claude and the rMPP a couple of months ago. It even handled scheduling fuzzy times ("I want to do this sometime but I don't have any real time I want to do it; pick a time that doesn't conflict with my actually-scheduled tasks"), all with minimal prompting. I just didn't have a decent workflow and did exactly what you considered: emailed the PDF. I should probably revisit this, but I haven't had the inclination since I just ignored the tasks anyway lol
rpicard 35 days ago [-]
Ha, automating the doing of the task is the next step.
It's a Rust binary, so it should be easy to install. In theory :)
rpicard 35 days ago [-]
Will do! My wife and I love Harry Potter so I’m motivated to show her my investment in the tablet actually got me Tom Riddle’s diary.
I don’t use discord much but I’ll find you somewhere around here!
awwaiid 35 days ago [-]
I'm at awwaiid@gmail.com and probably other places :)
"proof" to partner of tablet investment value based on interactive fiction conversation == excellent strategy and nothing could go wrong
t0bia_s 35 days ago [-]
How about this on Android-driven Onyx Boox e-readers? Would it be possible?
awwaiid 35 days ago [-]
The reMarkable's limitations meant that I take a screenshot and then inject input events to interact with the proprietary drawing app. Cross-app screenshots with the right permission are probably possible on Android; I'm not sure about injecting the drawing events.
The other way to go would be to make a specific app. I just picked up an Apple Pencil and am thinking of porting the concepts to a web app which so far works surprisingly well ... but for a real solution it'd be better for this Agent to interact with existing apps.
memorydial 35 days ago [-]
This is a brilliant use case—handwriting input combined with LLMs makes for a much more natural workflow. I wonder how well it handles messy handwriting and if fine-tuning on personal notes would improve recognition over time.
r2_pilot 35 days ago [-]
I did this a few months ago with the reMarkable Paper Pro and Claude. It worked quite well even though my handwriting is pretty terrible, and I even had a clunky workflow where I could just write down stuff I wanted to do, and roughly (or specifically) when I wanted to do it, and it was able to generate an iCal file I could load into my calendar.
awwaiid 35 days ago [-]
Generally if I can read my handwriting then it can! It has no issues with that. Really the problem is more in spatial awareness -- it can't reliably draw an X in a box, let alone play tic-tac-toe or dots-and-boxes.
vessenes 35 days ago [-]
Love this! There are some vector diffusion models out there; why not use tool calling to outsource to one of those if the model decides to draw something? Then it could specify coordinate range and the prompt.
awwaiid 35 days ago [-]
Two reasons. One, because I haven't gotten to it yet. Two... er no just the one reason! Do you have a particular one, ideally with a hosted API, that you recommend?
I’ve been working on a different angle - in-place updating of PDFs on the reMarkable - so it’s cool to see what you’re working on. Thanks for sharing it.
xtiansimon 35 days ago [-]
For PDF paper readers, is the reMarkable’s 11” size sufficient? I have the Sony DPT 2nd version at 13”, and it’s a perfect viewing experience. But projects like this keep drawing me to the reMarkable product.
pilotneko 35 days ago [-]
I have used the Remarkable 2 for papers, but it is slightly too small to read text comfortably. I’m also an active reader, so I miss the color highlighting. Annotations are excellent. For now, I’m sticking to reviewing papers in the Zotero application on my iPad.
abawany 35 days ago [-]
I got the reMarkable Paper Pro recently and as a result was able to move on from my Sony DPT-S1 and reMarkable 2. The latter was nice for its hackability, but the Pro's screen size, color functionality, and overall form factor have made it a great replacement.
kordlessagain 35 days ago [-]
It’s barely usable for PDFs
freedomben 35 days ago [-]
Depends mostly on the font size in the PDF. For dense PDFs I agree, it's barely usable. For most PDFs though I'd call it "acceptable." If you have control over the font size (such as when you're converting some source material to PDF) you can make it an excellent reading experience IMHO.
xtiansimon 31 days ago [-]
So close. The advertised diagonal screen for the reMarkable Pro is 11.8". The DPT-RP1 is advertised as 13.3" (my unit measures 13.125"). Hopefully in the future reMarkable will make a full-size unit. As phone, tablet, laptop, and monitor sales indicate, a larger screen is an important buying factor.
3abiton 35 days ago [-]
I own a Boox tablet (a full-fledged Android tablet with an e-ink screen), and this sort of thing would be perfect for it. I wonder if in 5 years the mobile hardware will support something like this locally!
complex1314 35 days ago [-]
Really cool. Would this run on the reMarkable Paper Pro too?
awwaiid 35 days ago [-]
Buy me one and I'll find out! hahahaha
But also -- the main thing that might be different is the screenshot algorithm. I'm over on the reMarkable discord; if you want to take up a bit of Rust and give it a go then I'd be happy to (slowly/async) help!
complex1314 35 days ago [-]
:) Thanks! Been looking into learning Rust recently, so I will keep that in mind if I get it off the ground.
awwaiid 35 days ago [-]
Initially most of the Rust was written by Copilot or Sourcegraph's Cody; then I learned more and more Rust as I disagreed with the code-helper's taste and organization. Though I have a solid foundation in other programming languages, which accelerates the process ... it's still a weird way to learn a language, one that I'm getting used to and kinda like.
That said, I based the memory capture on https://github.com/cloudsftp/reSnap/tree/latest which is a shell script that slurps the framebuffer out of the drawing app's process memory via device files. If you can find something like that which works on the rMPP then I can blindly slap it in there and we can see what happens!
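The core of that trick fits in a few lines. This is paraphrased from memory, so treat the maps parsing and the geometry constants as assumptions; reSnap has the real logic:

  pid=$(pidof xochitl)  # the stock drawing app holds the framebuffer in its memory
  # the mapping just after the /dev/fb0 entry points at the in-memory framebuffer
  addr=$(grep -A1 '/dev/fb0' /proc/$pid/maps | tail -n1 | cut -d- -f1)
  # rM2 is 1872x1404; bytes-per-pixel assumed. bs=1 is slow but portable.
  dd if=/proc/$pid/mem bs=1 skip=$((0x$addr)) count=$((1872 * 1404 * 2)) > screen.raw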
chrismorgan 35 days ago [-]
> Things that worked at least once:
I like it.
awwaiid 35 days ago [-]
Top quality modern AI Eval!!!
seethedeaduu 35 days ago [-]
Kinda unrelated, but should I go for a Kobo or the reMarkable? I mostly want to read papers and maybe take notes. How do they compare in terms of hackability and freedom?
newman314 35 days ago [-]
I wonder if this can be abstracted to accept interaction from a Daylight too.
cancelself 35 days ago [-]
@apple.com add to iPadOS Notes?
ghfhghg 35 days ago [-]
[flagged]
tony_francis 35 days ago [-]
Harry Potter Half-Blood Prince vibes. Interesting just how much the medium changes the feeling of interacting with a chat model.
GeoAtreides 35 days ago [-]
erm, you mean harry potter tom riddle's horcrux diary, sure
you know, the diary that wrote back to you and possessed your soul? that cursed diary?
guax 35 days ago [-]
I wonder if it's better than the current version, where my soul gets possessed by YouTube Shorts for 40 minutes.
s2l 35 days ago [-]
Now if only the LLM response font were some handwritten style.
awwaiid 35 days ago [-]
This uses LLM Tools to pick between outputting an SVG or plugging in a virtual keyboard to type. The keyboard is much more reliable, and that's what you see in the screenshot.
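For the curious, the shape of that choice against an OpenAI-style chat API looks roughly like this; it's a sketch, and the tool names and schemas here are illustrative rather than exactly what ghostwriter registers:

  # Sketch: replace BASE64_SCREENSHOT with the actual encoded screen capture.
  curl -s https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "gpt-4o",
      "messages": [{"role": "user", "content": [
        {"type": "text", "text": "Respond to the handwriting on this screen."},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,BASE64_SCREENSHOT"}}
      ]}],
      "tools": [
        {"type": "function", "function": {
          "name": "type_text",
          "description": "Type a reply with the virtual keyboard (most reliable)",
          "parameters": {"type": "object",
            "properties": {"text": {"type": "string"}}, "required": ["text"]}}},
        {"type": "function", "function": {
          "name": "draw_svg",
          "description": "Draw a reply as an SVG rendered onto the e-ink screen",
          "parameters": {"type": "object",
            "properties": {"svg": {"type": "string"}}, "required": ["svg"]}}}
      ],
      "tool_choice": "required"
    }'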
If nothing else it could use an SVG font that has handwriting; you'd need to bundle that for rendering via resvg or use some other technique.
But if I ever make a pen backend for resvg then it would be even cooler: you would be able to see it trace out the letters.
satvikpendem 35 days ago [-]
That's definitely pretty easy to achieve; just change the font settings to use a particular handwritten-style font [0].
[0] https://fonts.google.com/?categoryFilters=Calligraphy:%2FScr...
That would be next-level immersion! You could probably achieve this by rendering the LLM’s response using a handwritten font—maybe even train a model on your own handwriting to make it feel truly personal.
dharma1 35 days ago [-]
Script fonts don’t really look like handwriting - too regular.
But one of the early deep learning papers from Alex Graves does this really well with LSTMs - https://arxiv.org/abs/1308.0850
Implementation - https://www.calligrapher.ai/
Actually if you figure that out please post it here!! I'd love to see that!
memorydial 35 days ago [-]
Exactly! There’s something about handwriting that makes it feel more personal—like scribbling notes in the margins of a spellbook. The shift from typing to pen input definitely changes the vibe of interacting with AI.
hexomancer 35 days ago [-]
That's beside the point, but you are probably referring to Harry Potter and the Chamber of Secrets, not the Half-Blood Prince.
8bithero 35 days ago [-]
Not to distract from the project, but if anyone is interested in e-ink tablets with LLMs, the ViWoods tablet might be of interest to you.
edit: https://viwoods.com/ (based in Hong Kong)
edit 2: It's a blatant copy of the Remarkable 2 for sure :/ LLM integration is interesting --> Remarkable, are you listening?
Ensign35 35 days ago [-]
Is this a Remarkable rebrand? Even the UI looks the same!