This is a good idea, but I hope you've got some secret training data that isn't available on the open web. I've been able to stump ChatGPT with simple "gotcha" National Electrical Code questions that a foreman wouldn't have a problem answering (e.g., sizing a breaker for a heater under different conditions). There are far fewer subreddits and forums dedicated to trade specialists, and as a community they're more hostile to DIY-ers and will tell you to "get someone licensed." They're also not the types to write detailed reports and case studies on what they did.
It's not that trades are super complicated compared to other fields like web development; it's that there's no GitHub, no shared source among all pros for "here's what I did and how I got it to work." Without a good Stack Overflow, how does the AI judge the quality of workmanship in photos?
You're absolutely right, btw, about Google Drives and OneDrives and hundreds of photos and all that. My experience is in dealing with general contractors on smaller jobs, not supers on mega projects, but they have similar issues: lots of sloppy back-and-forth, poor tracking of change orders, etc.
What I'm trying to say, since I sort of rambled there, is that while processing and sorting and making punch lists is a good idea, I have doubts about AI's current ability to accurately spot code issues (as in building code, which, unlike JavaScript, varies by zip code). Does the AI know that you don't have enough clearance at X, or does that have to go into the recording?
arvindveluvali 35 days ago [-]
Great point! We're really relying on the superintendent's expertise, transcribing/compiling what they're saying rather than flagging code violations or other notables ourselves. We think analysis should be (for now, at least) the job of the highly trained and experienced superintendent, and our job is to take care of the transcription and admin that isn't really a good use of their time.
insane_dreamer 34 days ago [-]
> our job is to take care of the transcription and admin that isn't really a good use of their time
that's the correct focus, IMO; let the experts be experts rather than pretend that LLMs are all-knowing
nicely done
codpiece 34 days ago [-]
Great answer, and good proper use of the benefits of LLM. Let the LLM do the grunt work and let the expert human be the expert. Best of luck to you!
deepGem 34 days ago [-]
I'm trying out a handyman copilot for small repairs, and these folks have similar vibes. I think job protection is their No. 1 priority. The field is rife with regulations as well: some jobs need licensed professionals while others don't, and it varies state by state. It's a regulatory minefield from what I've seen, and perhaps rightfully so: it's your home, and if something goes wrong, a lot is at stake.
It is almost always impossible to get someone to repair right away. The supply is nowhere near demand, so it is a problem worth solving IMO.
rm_-rf_slash 35 days ago [-]
Looks neat! I don't work in construction, but I know folks in civil engineering. Are there applications of Fresco you could see in that domain?
arvindveluvali 35 days ago [-]
Absolutely. There are a ton of industries where people conduct physical site inspections and turn those into structured documents; as in construction, those take a long time to make! We've actually had some inbound from civil engineers, and if we can be useful to folks in your network, we'd love to connect with them.
wallawe 34 days ago [-]
Solid idea, and best of luck.
If I could make one recommendation: hire a UX/UI designer ASAP. The less technical the audience, the more intuitive and easy to navigate the UI needs to be.
Our company focuses on home service businesses, and they get roadblocked super easily. I think you'll be glad you did it earlier rather than later; otherwise, the UX debt will pile up and it will be quite a project a year down the line.
arvindveluvali 33 days ago [-]
Great advice—there’s definitely a lot of UI work we want to do :)
justinzhou13 35 days ago [-]
This is super cool and there’s a ton of other industries where this is sorely needed!
Closi 35 days ago [-]
FYI - this could be really useful in logistics operations and production too! (Which is my background, though I suspect the price point is unfortunately much too high for that application.)
arvindveluvali 35 days ago [-]
Thanks for the flag! Absolutely, there are many verticals where we think Fresco can be useful. Would love to hear your thoughts on price point.
Closi 34 days ago [-]
I don't have firm thoughts on the price point, but two examples of real-world use cases would be:
* There is a food production company where the QAs do a monthly walkaround. It takes approx. 2-3 hours to type up notes afterward. I'm in the UK, and QAs are paid approx. £32k, so 3 hours of their time is worth more like £50.
* Lots of logistics companies do daily walks with shift/team leaders. While these aren't usually typed up, it would be great to document them as actions and a task list to complete. The alternative to the software would be getting a team leader to write up notes after the walk, which would take maybe 30 minutes. A team leader might be on £28k p.a., so it's cheaper to have them do it than to buy software at $12k p.a.
The cost of the software would need to be a fraction (e.g. 10%) of what it is at the moment though for these sorts of use-cases to pay off.
Maybe a more generic version of the software not targeted at the construction niche could be something like £49 per month per user? Sounds more like the sort of level I would expect.
But I'm thinking that $1k is like way way way out of the reasonable range of my use case, and this is so different to your current business model I imagine it's irreconcilable.
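To make that concrete with rough numbers (the working-weeks and hours figures below are my guesses, not real data):

```python
# Back-of-envelope: annual value of automating a team leader's daily walk write-up
salary = 28_000                      # GBP per year, from the example above
hours_per_year = 47 * 40             # ~47 working weeks of 40 hours (a guess)
hourly_rate = salary / hours_per_year
minutes_saved_per_day = 30
working_days = 47 * 5                # ~235 working days per year
annual_value = hourly_rate * (minutes_saved_per_day / 60) * working_days
print(round(annual_value))           # 1750, i.e. ~£1.75k vs a $12k licence
```

So even with generous assumptions the saving is an order of magnitude below the current price, which is why I think the non-construction version would need to land nearer the £49/user/month mark.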
arvindveluvali 33 days ago [-]
Makes sense! We’re pretty focused on the construction vertical at this point but when we do expand into others I imagine we’ll be a bit creative with the pricing/features.
kyleli626 35 days ago [-]
really cool application of LLMs to a big problem - nice work.
0_____0 34 days ago [-]
I'm in the middle of a renovation project (not as a professional developer, just a random dipshit who wanted to make a multifam building a little nicer and bit off way too much).
Anyway, I've been running around compiling and recompiling photos and punchlists, and my reaction was "Coool!"
I'm not your target audience but I have to imagine the people that are would get utility out of this.
arvindveluvali 33 days ago [-]
Awesome! Happy to get you demo access if you want to use Fresco to make those punch lists, just shoot me a message at arvind@fresco-ai.com
StephenSmith 35 days ago [-]
We make an AI camera for residential home builders. I'd love to chat to see if there's any synergy here.
bedrockwireless.com
Ping me, stephen [at] bedrockwireless.com
bambax 35 days ago [-]
With all due respect and while wishing you best of luck, it's always a bit worrisome when generative AI is used in the real world with real consequences...
In my experience, what LLMs, even some of the most advanced ones (o1, Gemini 1.5) are really good at is rationalization after the fact: explaining why they were right, even when presented with direct evidence to the contrary.
I just ran an experiment trying to get various models to put footnote references into the OCR of a text, based on the content of the footnotes. I tested 120+ different models via OpenRouter; they all failed, but the "best" ones failed in a very bizarre and, I think, dangerous way: they made up some text to better fit the footnote references! And then they lied about it, saying in a "summary" paragraph that no text had been changed, and/or that they had indeed been able to place all the references.
So I guess my question is: how do you detect and flag hallucinations?
arvindveluvali 35 days ago [-]
This is a really good point, but we don't think hallucinations pose a significant risk to us. You can think of Fresco like a really good scribe; we're not generating new information, just consolidating the information that the superintendent has already verbally flagged as important.
mayank 35 days ago [-]
This seems odd. If your scribe can lie in complex and sometimes hard to detect ways, how do you not see some form of risk? What happens when (not if) your scribe misses something and real world damages ensue as a result? Are you expecting your users to cross check every report? And if so, what’s the benefit of your product?
arvindveluvali 35 days ago [-]
We rely on multimodal input: the voiceover from the superintendent, as well as the video input. The two essentially cross check one another, so we think the likelihood of lies or hallucinations is incredibly low.
Superintendents usually still check and, if needed, edit/enrich Fresco’s notes. Editing is way faster/easier than generating notes net new, so even in the extreme scenario where a supe needs to edit every single note, they’re still saving ~90% of the time it’d otherwise have taken to generate those notes and compile them into the right format.
This is the wrong response. It doesn't matter whether you've asked it to summarize or to produce new information, hallucinations are always a question of when, not if. LLMs don't have a "summarize mode", their mode of operation is always the same.
A better response would have been "we run all responses through a second agent who validates that no content was added that wasn't in the original source". To say that you simply don't believe hallucinations apply to you tells me that you haven't spent enough time with this technology to be selling something to safety-critical industries.
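For what it's worth, even a crude extractive check catches the worst cases. A minimal sketch (illustrative Python; the function name, threshold, and exact-match strategy are my assumptions, not anything Fresco actually runs):

```python
import difflib

def flag_unsupported(summary_sentences, source_text, threshold=0.8):
    """Flag summary sentences with no close verbatim run in the source.

    `threshold` is the fraction of a sentence that must appear as a
    single contiguous match in the source to count as supported.
    """
    flagged = []
    src = source_text.lower()
    for sent in summary_sentences:
        s = sent.lower()
        matcher = difflib.SequenceMatcher(None, s, src)
        longest = matcher.find_longest_match(0, len(s), 0, len(src))
        coverage = longest.size / max(len(s), 1)
        if coverage < threshold:
            flagged.append(sent)  # send to a human, not into the report
    return flagged
```

A production version would use fuzzy or semantic matching rather than longest-substring, but the principle is the same: everything in the output must be traceable to the transcript, and anything that isn't gets flagged for review instead of shipped.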
joe_the_user 34 days ago [-]
"Concerns about medical note-taking tool raised after researcher discovers it invents things no one said..."
It has to be the same as all AI: you need someone thorough to check what it did.
LLM generated code needs to be read line by line. It is still useful to do that with code because reading is faster than googling then typing.
You can't detect hallucinations in general.
bambax 35 days ago [-]
A (costly) way is to compare responses from different models, as they don't hallucinate in exactly the same way.
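Sketched crudely (illustrative Python; treating each model's output as a list of findings, and the two-vote cutoff, are my assumptions):

```python
from collections import Counter

def consensus_items(model_outputs, min_votes=2):
    """Split findings into those enough models agree on and those needing review.

    `model_outputs` is a list of finding-lists, one per model. Each finding
    counts once per model, so a model repeating itself gets no extra vote.
    """
    votes = Counter(item for output in model_outputs for item in set(output))
    agreed = [item for item, n in votes.items() if n >= min_votes]
    disputed = [item for item, n in votes.items() if n < min_votes]
    return agreed, disputed
```

Since models tend not to hallucinate the same detail in the same way, anything only one model produced is a decent candidate for human review.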
fakedang 35 days ago [-]
Honestly this is a very nitpicky argument. The issue for site contractors is not with manually checking each entry to ensure it's correct or not. It's writing the stuff down in the first place.
I'm exploring a similar but unrelated use case for generative AI, and in discovery interviews, what I learnt was that site contractors and engineers don't request or expect 100% accuracy; they leave adequate room for doubt. For them, the pain is the hours and hours of manually writing a TON of paperwork, which in some industries amounts to months of work written by some of the poorest communicators on the planet. Because these tasks consume so much time, they forgo the correct methodology, and some even fill reports with random bullshit just so the project moves forward; in most cases, this writing is done for liability reasons, as mentioned above, rather than for the purposes of someone actually reading it. If the writing part were handled for many of these guys, most wouldn't have a problem with the reading and correcting part.
bambax 34 days ago [-]
It's unclear how filling reports with "random bullshit" will protect anyone from liability... It seems you're saying that the current situation is so bad that anything different would be an improvement, and less-random bs is better than outright bs.
I'm sorry if my comment came across as nitpicky; it's just that every time I try to do some actual work with LLMs (that's not pure creativity, where hallucination is a feature) it never follows prompts exactly, and goes fast off the rails. In the context of construction work, that sounded dangerous. But happy to be proved wrong.
fakedang 34 days ago [-]
> It's unclear how filling reports with "random bullshit" will protect anyone from liability... It seems you're saying that the current situation is so bad that anything different would be an improvement, and less-random bs is better than outright bs.
Exactly. Oftentimes reports are filled with nonsensical documentation that is only discovered during litigation, after a disaster has already happened. For example, a real safety report at a chemicals facility stated that under high valve pressure "many bad things will happen." Not joking; quoted verbatim.
Most companies' legal teams would love their engineers to write proper documents, and most engineers would love not to spend time on documentation. GenAI can fill that gap by at least giving a baseline starting point, which can then be edited in a fraction of the time it would take to write from scratch.
arvindveluvali 35 days ago [-]
Totally agree! That's what we've observed, as well.
erispoe 35 days ago [-]
The process you described is very far from how companies who productize LLMs use them.