Interesting and much needed--I absolutely refuse to go to Twitter links anymore; why bother, they don't usually work unless you're logged in. So, I clicked some of the links and, at this time, they don't load in a reasonable amount of time, I assume due to the Hug of Death or whatever it is we call that here.
This is the issue with Nitter instances; they seem too ephemeral for reliable use.
Akronymus 19 days ago [-]
nitter.poast.org is pretty reliable for me.
vednig 19 days ago [-]
I think nitter.poast.org is geo-restricted; I'm getting a 403 error.
KomoD 18 days ago [-]
They're just very strict with IP blocking and ratelimits. Pretty sure VPNs and such are blocked too.
vednig 19 days ago [-]
Xcancel is also a great service, it's a complete X replacement.
jimmySixDOF 19 days ago [-]
I had the same experience and suggest a loading progress bar of some sort. I also didn't see any of the Selenium process on their GitHub, so I can't see exactly how the tweets are processed for plain-text cleaning. The Twitter/X API is priced so far out of the market that these kinds of workarounds are necessary inconveniences, so thanks to the dev for putting this together!
The backend server's source code is now located at https://github.com/vednig/x-api. Currently the script waits for the complete page to load and then extracts the info to send out a response.
Since each new request takes a lot of time and processing power, it was only logical to store responses and cache them for later reuse.
There's also a bit of a workaround required to get videos from X.com: they only load their direct URL when focused, so parsing the HTML couldn't work, and I had to log network traffic to make sure I get their HLS links too.
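For anyone curious what the network-logging step could look like, here is a minimal sketch, not the code from the x-api repo. It assumes Chrome's performance log is enabled in Selenium and only shows the filtering part: pulling `.m3u8` request URLs out of the log entries. The function name `extract_hls_urls` is made up for illustration.

```python
import json


def extract_hls_urls(performance_entries):
    """Return unique .m3u8 request URLs found in Chrome performance-log entries.

    Assumption: each entry is a dict whose "message" field is a JSON string
    wrapping a DevTools event, which is the shape Selenium returns from
    driver.get_log("performance") when performance logging is enabled.
    """
    urls = []
    for entry in performance_entries:
        try:
            event = json.loads(entry["message"])["message"]
        except (KeyError, TypeError, ValueError):
            continue  # skip entries that are not well-formed DevTools events
        if event.get("method") != "Network.requestWillBeSent":
            continue
        url = event.get("params", {}).get("request", {}).get("url", "")
        # Check the part before the query string so "?tag=..." does not hide the extension.
        if ".m3u8" in url.split("?", 1)[0] and url not in urls:
            urls.append(url)
    return urls


# Hypothetical wiring (assumed, not executed here):
#   options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
#   driver.get(thread_url)
#   hls_links = extract_hls_urls(driver.get_log("performance"))
```

Keeping the filter separate from the browser session means it can be checked against recorded log entries without launching Chrome.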
vednig 16 days ago [-]
BTW, New Update Alert: fixed this with 4 lines of Python code
uniquetwid = []
finalresponse = []
for i in response:
    # Keep only the first occurrence of each tweet_id
    if i['tweet_id'] not in uniquetwid:
        uniquetwid.append(i['tweet_id'])
        finalresponse.append(i)
        print(i)
Enjoy reading threads without repeats now!
vednig 19 days ago [-]
The repetition comes from the m3u8 requests I'm logging for videos, which require loading the page twice. It can be avoided with a code update.
And since responses are heavily cached per unique URL, the duplication persists. I'm thinking of adding a purge-cache option too, but I need to work out how to implement it in a way that also prevents abuse.
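One possible shape for a per-URL cache with an abuse-resistant purge, as a rough sketch only. It assumes an in-memory dict and a per-URL cooldown between purges; the class name `ThreadCache`, the `purge_cooldown` parameter, and the `fetch` callback are all hypothetical and not taken from the actual backend.

```python
import time


class ThreadCache:
    """In-memory cache keyed by request URL, with a rate-limited purge."""

    def __init__(self, purge_cooldown=3600.0, clock=time.monotonic):
        self._entries = {}      # url -> cached response
        self._last_purge = {}   # url -> clock value at the last accepted purge
        self._cooldown = purge_cooldown
        self._clock = clock

    def get_or_fetch(self, url, fetch):
        # Only call the expensive fetch (e.g. the Selenium scrape) on a cache miss.
        if url not in self._entries:
            self._entries[url] = fetch(url)
        return self._entries[url]

    def purge(self, url):
        """Drop the cached entry unless this URL was purged too recently."""
        now = self._clock()
        last = self._last_purge.get(url)
        if last is not None and now - last < self._cooldown:
            return False  # refuse: repeated purges would force repeated scrapes
        self._last_purge[url] = now
        self._entries.pop(url, None)
        return True
```

The cooldown is per URL, so a single thread cannot be forced through the slow scrape more than once per window, while other threads stay unaffected.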
matcha-video 19 days ago [-]
Very cool! I'm very impressed that Twitter hasn't detected your Selenium usage; it seems like they're really trying to push people towards their paid API.
I made a similar tool, except it's a browser extension that lets you download threads as markdown (then it's up to you to dump them into a DB or train a model on them or whatever). I'd love to get your feedback:
https://chromewebstore.google.com/detail/thread-to-markdown/... (Firefox version also available)
Selenium usage looks similar to the browsing patterns of a real person. (I hope they don't ban my account if they see this.)
Your extension somehow crashed on my system while downloading a thread (I couldn't see the download button). I was able to get one .md file downloaded, though, and I'd say it does its job very nicely.
Feature Request: Make the download button compatible with Dark Mode (currently it stands out too much).
matcha-video 19 days ago [-]
You're a true user! I just put up a small change that should improve reliability, let me address dark mode as well.
May I ask what you use the markdown for after it's downloaded? Would you modify the output at all to make it better fit your use case?
vednig 19 days ago [-]
I'd use it somewhere like GitHub or Dev.to where MD is supported, though I'm opening it in VS Code, which doesn't have direct MD support. Or I might just archive it.
Okay I figured out why it wasn't loading on every tweet page. v0.3.6 is much more reliable now.
matcha-video 19 days ago [-]
Dark mode is now supported as of v0.3.4 :)
Thanks for your feedback
toomuchtodo 19 days ago [-]
Great work! Can I chip in to support archiving unrolled threads to the Internet Archive and archive.today (it’s a trivial HTTP request to do so)? I asked threadreaderapp.com for this and got radio silence.
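To illustrate the "trivial HTTP request" part, a small sketch for the Internet Archive side only. It assumes the public Save Page Now endpoint, which accepts the target URL appended to `https://web.archive.org/save/`; the function names and the User-Agent string are made up, and archive.today is left out because its submission interface is not something I can vouch for.

```python
import urllib.request

SAVE_PAGE_NOW = "https://web.archive.org/save/"


def save_page_now_url(page_url):
    """Build the Save Page Now URL that asks the Wayback Machine to capture page_url."""
    return SAVE_PAGE_NOW + page_url


def request_snapshot(page_url, timeout=30):
    """Send the capture request and return the HTTP status code.

    Assumption: an unauthenticated GET is enough for occasional use.
    The service may throttle or reject frequent requests.
    """
    req = urllib.request.Request(
        save_page_now_url(page_url),
        headers={"User-Agent": "thread-archiver-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `request_snapshot` once per unrolled thread, right after the first successful scrape, would keep the extra load on both sides to a single request per thread.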
vednig 19 days ago [-]
Sure, the service is free for all (it's heavily cached, so it shouldn't be a problem), the code is open source, and if it helps that would be great.
Here are a few things to consider:
- The first unique request for a thread on X might take time (the time grows in direct proportion to the number of comments on a post)
- If the Internet Archive and Archive.today have a time limit for crawl requests, this may affect them (first request only)
- Links to profiles in Unlace are currently not 100% accurate
I'm currently looking for ways to resolve these issues on the server. Meanwhile, feel free to use it as you like; there's no T&C or Privacy Policy for the service because there's nearly zero data collection and infinite use cases to explore (Star Trek reference).
Thanks for sharing! Let me ask one quick question -- how do you host/deploy the Python backend (x-api)?
vednig 19 days ago [-]
Since the app uses Selenium, the minimum requirement to run it is 1 GB of RAM on any OS. The free tiers of hosting services don't provide that, and the few that do are limited by build times or shared CPUs.
So it was only logical to go through the pain/pleasure of setting up deployment on a VM.
The Python backend is deployed on an Azure VM (with 4 GB of RAM). Since it's built on FastAPI, its ASGI support helps keep the server running over long intervals.
It uses CF Tunnels to prod to reduce the latency caused by the server's location.
This one loaded: https://unlace.app/thread/ianwcrosby/status/1872724231999381..., but oddly it's duplicating the thread.
Replace "x.com" with "xcancel.com" (or some other still-working Nitter instance) and you're set. https://xcancel.com/ianwcrosby/status/1872724231999381790
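If you want to do that swap in a script rather than by hand, something like this should work. It is only a sketch: the function name `to_mirror` is made up, and treating twitter.com hosts the same as x.com is my assumption, not something stated above.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts to rewrite. Including the twitter.com variants is an assumption.
X_HOSTS = {"x.com", "www.x.com", "twitter.com", "www.twitter.com"}


def to_mirror(url, mirror_host="xcancel.com"):
    """Swap the host of an X/Twitter URL for a mirror, keeping path and query."""
    parts = urlsplit(url)
    if parts.hostname in X_HOSTS:
        parts = parts._replace(netloc=mirror_host)
    return urlunsplit(parts)


print(to_mirror("https://x.com/ianwcrosby/status/1872724231999381790"))
# prints "https://xcancel.com/ianwcrosby/status/1872724231999381790"
```

Matching on the parsed hostname instead of doing a plain string replace avoids rewriting URLs that merely contain "x.com" somewhere in their path.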
Edit: Example output using tweet thread mentioned elsewhere in this post https://sharetext.io/c04b7c82
For example, I downloaded this tweet, which can give you an idea of potential use cases: https://x.com/ThePrimeagen/status/1873778477448782035
Doesn't load with Javascript disabled
https://unlace.app/thread/ianwcrosby/status/1872724231999381...
Does load with Javascript disabled.
Why one yes, the other no?
Both use nearly the same logic. You can view them in the repo
1. https://github.com/vednig/unlaceapp/blob/main/app/page.tsx
2. https://github.com/vednig/unlaceapp/blob/main/app/thread/%5B...
You can see uvicorn docs to learn more https://www.uvicorn.org/