I think that most people are underestimating Nvidia's strategy.
Their bet is that AI will unlock robotics, and they don't want to be simply compute providers: they want to innovate across the whole chain, software, hardware, services, everything.
Their position is quite unique as their R&D is basically financed by their future competitors, they are making bank while going where the puck will be.
nthingtohide 17 days ago [-]
I think Nvidia should try to create a compute analogue of wifi routers, where computation is offloaded to smart-home GPU servers. This strategy would cement their future for eternity.
david-gpu 17 days ago [-]
Something in that spirit is already making progress: compute attached to cell phone towers, called "edge computing".
Because you are pooling compute resources across many more users, you get a better amortized cost than if you had a workstation in every home that sat idle 99% of the time. Of course, latency and bandwidth to the cell tower are worse than over wifi, but better than if the compute is done on a remote server.
I have no idea of which model will succeed for which use cases, but the idea is sound.
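To put rough numbers on the amortization argument above, here is a minimal sketch. Every figure in it (hardware prices, service life, the 50% utilisation of the shared server) is a made-up assumption; only the "idle 99% of the time" figure comes from the comment.

```python
# Illustrative only: prices, lifetime and edge utilisation are assumptions.

def cost_per_busy_hour(hardware_cost: float, lifetime_hours: float, utilisation: float) -> float:
    """Hardware cost spread over the hours the machine is actually doing work."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return hardware_cost / (lifetime_hours * utilisation)


LIFETIME_HOURS = 3 * 365 * 24  # assume a three-year service life

# A home workstation that is idle 99% of the time (1% utilisation).
home = cost_per_busy_hour(2_000, LIFETIME_HOURS, 0.01)

# A hypothetical shared edge server: 10x the price, kept 50% busy by pooled users.
edge = cost_per_busy_hour(20_000, LIFETIME_HOURS, 0.50)

print(f"home workstation: ${home:.2f} per busy hour")
print(f"shared edge box:  ${edge:.2f} per busy hour")
```

The comparison ignores power, networking and operations costs, so treat it as the shape of the argument rather than a forecast.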
nthingtohide 17 days ago [-]
Smart mirror, smart appliances, smart robots (of various kinds), all could benefit from such a central compute server which can work without internet. All appliances will have voice interface, and will require heavy intelligence so it is better if the model is downloaded to a central place at home. This model also means all home appliances can be equally smart with no restriction on form factor of the appliances themselves. Nvidia could model itself around current mobile companies selling OS, models, everything in a nice compute-rack package.
giancarlostoro 17 days ago [-]
Not sure how well this would scale if every user needs to compute all at once. All it takes is some new meme app that requires heavy computation, and then everyone downloads it.
david-gpu 16 days ago [-]
It has the same scalability issues and the same scalability solutions as cell towers themselves, doesn't it? Densely populated areas have more cell towers and thus smaller cells.
Again: which computation paradigm works best depends on the use case. It is not an all or nothing situation.
dartos 17 days ago [-]
Why sell to consumers when you can sell to large corporations at 10x the price?
talldayo 17 days ago [-]
Because consumers will pay ~3-5x MSRP if the hype is big enough.
jszymborski 16 days ago [-]
Corps aren't impervious to hype either.
Teever 17 days ago [-]
I think there's a definite itch to scratch with this kind of stuff, and you can see hobbyists already tinkering on the edges with things like Home Assistant and the homelab community.
As much as there are market drivers that make the cloud attractive to businesses, and legitimate reasons why cloud solutions are better than the alternatives, there is a real desire among a growing number of people for a solution that they can tinker with but that also has a polished UI, so they don't have to tinker with it when they don't want to.
penjelly 16 days ago [-]
funny I landed on this idea myself but ended up thinking it had no value
detourdog 17 days ago [-]
The hard part to pull off with this strategy is that a truly widespread "robot" platform, I think, depends on the commodification of the IP driving the platform.
I think there are many early innovators that fail in later stage growth because of this issue.
amelius 17 days ago [-]
> they don't want to be simply compute providers
I'm still surprised they did not create an App Store for AI. Basically lock everything down and make developers pay a % of their revenue, Apple style.
alephnerd 17 days ago [-]
> Basically lock everything down and make developers pay a % of their revenue, Apple style
At enterprise scale, locked down marketplaces don't work. They act as a forcing factor for larger organizations to build in house because no one wants vendor lock-in or to lose money via an arbitrage.
This is a major reason why you'll see large deals pushing for enhanced customization options or API parity, as larger customers have the ability to push back against vendor lock-in.
Furthermore, a relatively open market (eg. NGC) acts as a loss-leader by allowing a community to develop using a corporate standard, thus allowing you to build stickiness without directly impacting a customer's bottom-line
Fundamentally, a company driven by Enterprise revenue (eg. Nvidia) will have a different marketplace structure from a B2C product such as Apple's App Store where purchasers have little power.
amelius 16 days ago [-]
> At enterprise scale, locked down marketplaces don't work. They act as a forcing factor for larger organizations to build in house because no one wants vendor lock-in or to lose money via an arbitrage.
But you can say exactly the same thing about large companies publishing (consumer apps) in the App Store. Why would they want vendor lock-in?
akutlay 17 days ago [-]
I think AWS showed us that it could work for enterprises too.
alephnerd 17 days ago [-]
Listing fees for AWS Marketplace are marginal compared to the overall margins of Enterprise SaaS, where 90% is the expected target margin, which is why 80% discounts are fairly common in enterprise sales.
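A quick sketch of why those two numbers fit together. It takes the 90% margin and 80% discount from the comment at face value; the 3% marketplace fee in the second call is a purely hypothetical figure, not an actual AWS rate.

```python
def margin_after_discount(list_margin: float, discount: float, fee: float = 0.0) -> float:
    """Gross margin left after discounting the list price.

    All arguments are fractions (0.90 means 90%). `fee` is a hypothetical
    marketplace cut taken from the discounted sale price.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    cost = 1.0 - list_margin   # cost of goods, per unit of list price
    price = 1.0 - discount     # what the customer actually pays
    return (price * (1.0 - fee) - cost) / price


print(f"no fee:  {margin_after_discount(0.90, 0.80):.0%}")
print(f"3% fee:  {margin_after_discount(0.90, 0.80, fee=0.03):.0%}")
```

Under these assumptions the vendor still keeps roughly half of the discounted price as margin, and a small listing fee barely moves that.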
More tactically, excessive charging on marketplace pushes vendors away from selling on AWS Marketplace and makes them develop alternative deployment methods, which reduce the stickiness of AWS, as hyperscalers are commodified nowadays.
Motorola learned that the hard way 40 years ago when pushing excessively restrictive OEM and Partnership rules compared to IBM.
AWS is only as strong as its Partnership ecosystem, as companies that are purchasing tend to use 80-90 different apps along with their cloud.
Basically, Enterprise Sales shows the hallmarks of a Stag Hunt Game, so a mutually beneficial pricing strategy amongst vendors (AWS, AWS Partners such as Nvidia, MSPs) is ideal.
amelius 17 days ago [-]
> At enterprise scale, locked down marketplaces don't work.
If this was true, we'd have more mobile computing platforms. Large enterprises publish in Apple's AppStore.
alephnerd 17 days ago [-]
> Large enterprises publish in Apple's AppStore.
Purchasers from Apple's App Store are primarily individual consumers. It is a B2C play.
Monetizing an Nvidia marketplace such as NGC would be foolhardy as the primary users/"purchasers" are organizations with budgets and procurement power. It is an Enterprise B2B play.
In enterprise sales, the power differential between (mid- and upper-market) customers and vendors is in the customer's favor, as they have significant buying power and thus a higher user acquisition cost. The upside is revenue is much higher, margins are better, and you can differentiate on product as commodification is difficult.
This is less so in consumer facing sales as customers have significantly weaker buying power, but conversely have a much lower user acquisition cost at scale. Hence, a growth-based GTM approach is critical, as you need customers in aggregate to truly unlock revenue at scale.
amelius 16 days ago [-]
They could still segment the market into large enterprises and small businesses.
Also, I don't believe the balance of power is tipping towards business customers, as nvidia is basically the only relevant player.
talldayo 17 days ago [-]
> If this was true, we'd have more mobile computing platforms.
If you think smartphone customers and server customers are evaluating hardware based on the same criteria, then why isn't Apple the leading datacenter hardware OEM?
> Large enterprises publish in Apple's AppStore.
Yeah? Where's my Pro Tools download on the App Store? Where's my Cinema4D download? Can I get Bitwig Studio from there? Hell, is iTerm2 or Hammerspoon even available there?
Large enterprises very explicitly don't publish on the macOS App Store because it is a purely raw deal. If you're developing a cross-platform app (which most large enterprises do), then you've already solved all the problems the App Store offers to help with. It's a burdensome tax for anyone that's not a helpless indie, and even the indies lack the negotiating power that makes the App Store profitable for certain enterprises.
amelius 16 days ago [-]
> If you think smartphone customers and server customers are evaluating hardware based on the same criteria, then why isn't Apple the leading datacenter hardware OEM?
Because Apple doesn't want to be in the B2B space.
talldayo 16 days ago [-]
Let it never be said they didn't learn their lesson from XServe, eh?
detourdog 17 days ago [-]
Apple works with consumers who need that simplification. The developers need the market. "AI" in its current form isn't really a consumer product to be sold in that way. Consumers aren't purchasing Nvidia products to improve their lives; developers are.
QuadmasterXLII 16 days ago [-]
I mean, you’re almost describing Google’s TPU sales approach, and we can all see how that goes
whatever1 17 days ago [-]
What is the new breakthrough in robotics that is gpu driven ? There are subsets of the overall problem that can be solved by a gpu (eg object detection) but the whole planning and control algo scheme seems to be more or less the same as it has been for the past decades. These typically involve non-convex optimization so not much gpu benefit.
michaelt 17 days ago [-]
Two decades ago, I was trying to use classical machine vision to tell the difference between cut and uncut grass, to guide a self-driving lawnmower.
I concluded that it couldn't be done with classical machine vision, and that this "neural network" nonsense wasn't going to catch on. Very slow, computationally inefficient, full of weirdos making grandiose claims about "artificial intelligence" without the results to back it up, and they couldn't even explain how their own stuff worked.
These days - you want to find the boundary between cut and uncut grass, even though lighting levels can change and cloud cover can change and shadows can change and reflections can change and there's loads of types of grass and grass looks different depending on the angle you look from? Just label some data and chuck a neural network at it, no problemo.
blagie 17 days ago [-]
> These days - you want to find the boundary between cut and uncut grass, even though lighting levels can change and cloud cover can change and shadows can change and reflections can change and there's loads of types of grass and grass looks different depending on the angle you look from? Just label some data and chuck a neural network at it, no problemo.
If only.
Having been faced with the same problem in the real world:
1) There isn't a data bank of millions of images of cut / uncut grass
2) If there were, there's always the possibility of sample bias. E.g. all the cut photos happen to have been taken early in the day and the uncut ones late in the day, and we get a "time-of-day" detector. Sample bias is oddly common in vision data sets, and machine learning can pick up on very complex sample bias
3) With something like a lawnmower, you don't want it to kill people or run over flowerbeds. There can be actual damages. It's helpful to be able to understand and validate things.
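The time-of-day trap in point 2 can be checked for before any training happens. A minimal sketch, assuming each photo comes with a capture hour in its metadata (the dataset below is made up): if a "classifier" that never looks at the image can predict the label from the hour alone, the dataset is biased.

```python
from collections import Counter, defaultdict


def accuracy_from_metadata_alone(samples):
    """Accuracy of guessing the majority label for each metadata value.

    `samples` is a list of (metadata_value, label) pairs. A score near 1.0
    means the label is predictable without looking at the image at all,
    which is a red flag for sample bias.
    """
    by_value = defaultdict(Counter)
    for value, label in samples:
        by_value[value][label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(samples)


# Made-up datasets of (capture hour, label) pairs.
biased = [(8, "cut")] * 50 + [(18, "uncut")] * 50
balanced = (
    [(8, "cut")] * 25 + [(8, "uncut")] * 25
    + [(18, "cut")] * 25 + [(18, "uncut")] * 25
)

print("biased:  ", accuracy_from_metadata_alone(biased))
print("balanced:", accuracy_from_metadata_alone(balanced))
```

This only catches bias in metadata you thought to record; it says nothing about confounds hidden in the pixels themselves.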
Most machine vision algorithms I actually used in projects (small n) made zero use of neural networks and relied 100% on classical algorithms I understand.
Right now, the best analogy to NLP is BERT. At that point, neural techniques were helpful for some tasks, and achieved stochastically interesting performance, but were well below the level of general uses, and 95% of what I wanted to do used classical NLP. IF I had a large data set AND could do transfer training from BERT AND didn't need things to work 100% of the time, BERT was great.
Systems like DALL-e and the reverse are moving us in the right direction. Once we're at GPT / Claude / etc.-level performance, life will be different, and there's a light at the end of the tunnel. For now, though, the ML machine is still a pretty limited way to go.
Think of it this way. What's cheaper:
1) A consulting project for a human expert in machine vision (tens or hundreds of thousands of dollars)
2) Hiring cheap contractors to build out a massive dataset of photos of grass (millions of dollars)
plaidfuji 17 days ago [-]
I don’t think people fully appreciate yet how much of LLMs’ value comes from their underlying dataset (i.e. the entire internet - probably .. quadrillions..? of tokens of text) rather than the model + compute itself.
If you’re trying to predict something within the manifold of data on the internet (which is incredibly vast, but not infinite), you will do very well with today’s LLMs. Building an internet-scale dataset for another problem domain is a monumental task, still with significant uncertainty about “how much is enough”.
People have been searching for the right analogy for “what type of company is OpenAI most like?” I’ll suggest they’re like an oil company, but without the right to own oil fields. The internet is the field, the model is the refining process (which mostly yields the same output but with some variations - not dissimilar from petroleum products).. and the process / model is a significant asset. And today, Nvidia is the only manufacturer of refining equipment.
ramblenode 17 days ago [-]
This is an interesting analogy. Of course oil extraction and refining are very complex, but most of the value in that industry is simply the oil.
If you take the analogy further, while oil was necessary to jumpstart the petrochemical industry, biofuels and synthetic oil could potentially replace the natural stuff while keeping the rest of the value chain intact (maybe not economical, but you get the idea). Is there a post-web source of data for LLMs once the well has been poisoned by bots? Maybe interactive chats?
michaelt 17 days ago [-]
> If only.
I will admit that "no problemo" made it sound easier than it actually is. But in the past I considered it literally impossible whereas these days I'm confident it is possible, using well known techniques.
> There isn't a data bank of millions of images of cut / uncut grass
True - but in my case I literally already had a robot lawnmower equipped with a camera. I could have captured a hundred thousand images pretty quickly if I'd known it was worth the effort.
> With something like a lawnmower, you don't want it to kill people or run over flowerbeds.
I agree - at the time I was actually exploring a hybrid approach which would have used landmarks for navigation when close enough to detect the landmarks precisely, and cut/uncut boundary detection for operating in the middle of large expanses of grass, where the landmarks are all distant. And a map for things like flowerbeds, and a LIDAR for obstacle tracking and safety.
So the scope of what I was aiming for was literally cut/uncut grass detection, not safety-of-life human detection :)
blagie 17 days ago [-]
Out of curiosity: Why would you need cut/uncut grass detection? If you have all the other stuff in place, what's the incremental value-add? It seems like you should be able to cut on a regular schedule, or if you really want to be fancy, predict how much grass has grown since you last cut it from things like the weather.
michaelt 17 days ago [-]
I wanted to steer the mower along the cut/uncut grass boundary, just like a human operator does. Image segmentation into cut/uncut grass would be the input to a steering control feedback loop - much like lane-following cruise control.
I hoped by doing so I could produce respectable results without the need to spend $$$$$ on a dual-frequency RTK GPS & IMU system.
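A minimal sketch of the loop described above: segmentation mask in, steering command out. The mask layout (cut grass on the left of the frame, uncut on the right), the gain, and the sign convention (positive means steer right) are all assumptions for illustration, not what the original mower did.

```python
def boundary_offset(mask):
    """Mean position of the cut/uncut boundary relative to the image centre.

    `mask` is a list of rows, each a list of 0 (cut) / 1 (uncut) pixels.
    Assumes cut grass is on the left and uncut grass on the right.
    Returns pixels; positive means the boundary is right of centre.
    """
    width = len(mask[0])
    boundaries = []
    for row in mask:
        # Column of the first uncut pixel; if the row is all cut,
        # treat the boundary as sitting at the right edge.
        boundaries.append(row.index(1) if 1 in row else width)
    return sum(boundaries) / len(boundaries) - width / 2


def steering_command(mask, gain=0.05, limit=1.0):
    """Proportional controller: steer toward the boundary, clamped to [-limit, limit]."""
    command = gain * boundary_offset(mask)
    return max(-limit, min(limit, command))


# Toy 4x8 frame with the boundary two pixels right of centre.
frame = [[0] * 6 + [1] * 2 for _ in range(4)]
print(steering_command(frame))
```

A real system would smooth the boundary estimate over several frames and add derivative or integral terms, as lane-following controllers typically do.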
nickserv 15 days ago [-]
Out of curiosity, since you already have lidar on the machine, why not use it to detect grass height?
krisoft 17 days ago [-]
> What's cheaper
If you don’t have the second how can you trust the first? Without the dataset to test on your human experts will deliver you slop and be confident about it. And you will only realise the many ways their hand finessed algorithms fail once you are trying to field the algorithm.
> With something like a lawnmower, you don't want it to kill people or run over flowerbeds.
Best to not mix concerns though. Not killing people with an automatic lawnmower is about the right mechanical design, appropriately selected slow speed, and bumper sensors. None of this is an AI problem. We don’t have to throw out good engineering practices just because the product uses AI somewhere. It is not an all or nothing thing.
The flowerbed avoidance question might or might not be an AI problem depending on design decisions.
> Hiring cheap contractors to build out a massive dataset of photos of grass (millions of dollars)
I think that you are overestimating the effort here. The database doesn’t have to be so huge. Transfer learning and similar techniques have reduced the data requirements by a lot. If all you want is a grass height detector, you can place stationary cameras in your garden, collect a bunch of data and automatically label it based on when you mowed the grass. That will obviously only generalise to your garden, but if this is only a hobby project maybe that is all you want? If this is a product you intend to sell to the general public then of course you need access to a lot of different gardens to test it on. But that is just the nature of product testing anyway.
michaelt 17 days ago [-]
> If you don’t have the second how can you trust the first? Without the dataset to test on your human experts will deliver you slop and be confident about it.
1. Test datasets can be a lot smaller than training datasets.
2. For tasks like image segmentation, having a human look at a candidate segmentation and give it a thumbs up or a thumbs down is much faster than having them draw out the segments themselves.
3. If labelling needs 20k images segmented at 1 minute per image but testing only needs 2k segmentation results checked at 5 seconds per image, you can just do the latter yourself in a few hours, no outsourcing required.
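Working through the numbers in point 3 (the image counts and per-image times are the ones given above; nothing else is assumed):

```python
def hours(items: int, seconds_per_item: float) -> float:
    """Total effort in hours for a batch of items."""
    return items * seconds_per_item / 3600


labelling = hours(20_000, 60)  # drawing segments by hand, 1 minute each
checking = hours(2_000, 5)     # thumbs up / thumbs down, 5 seconds each

print(f"labelling: {labelling:.0f} hours")
print(f"checking:  {checking:.1f} hours")
print(f"ratio:     {labelling / checking:.0f}x")
```

So the labelling job is weeks of full-time work, while the checking job fits in an afternoon.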
blagie 17 days ago [-]
> If you don’t have the second how can you trust the first? Without the dataset to test on your human experts will deliver you slop and be confident about it.
One of the key things is that if you don't understand how things work, your test dataset needs to be the world. A classical system can be analyzed, and you can pick a test dataset which maximally stresses it. You can also engineer environments where you know it will work, and 9 times out of 10, part of the use of classical machine vision in safety-critical systems is to understand the environments it works in, and to only use it in such environments.
Examples:
- Placing the trackball sensor inside of the mouse (or the analogue for a larger machine) allows the lighting and everything else to be 100% controlled
- If it's not 100% controlled, in an industrial environment, you can still have well-understood boundaries.
You test beyond those bounds, and you understand that it works there, and by interpolation, it's robust within the bounds. You can also analyze things like error margin since you know if an edge detection is near the threshold or has a lot of leeway around it.
One of the differences with neural networks is that you don't understand the failure modes, so it's hard to know the axes to test on. Some innocuous change in the background might throw it completely. You don't have really meaningful, robust measures of confidence, so you don't know if some minor change somewhere won't throw things. That means your test set needs to be many orders of magnitude bigger.
For nitpickers: You can do sensitivity analysis, look at how strongly things activate, or a dozen other things, but the keywords there were "robust" and "meaningful."
jvanderbot 16 days ago [-]
Not that you're wrong, but when faced with a similar problem, I got a lot of mileage out of telling an intern to try a network trained to detect the boundary between the ground-based-potentially-tall features and not the feature (e.g., background and sky), and measuring the height from a low camera. Voila, tall areas and not-tall areas.
justmarc 17 days ago [-]
Funnily, now with the advent of GPS+RTK lawnmower robots, fancy AI is not even needed anymore. They follow very exact, pre-determined patterns and paths, and do a great job.
michaelt 17 days ago [-]
Yeah GPS+RTK was what I went with in the end.
Didn't work as well as I'd hoped back in those days though, as you could lose carrier lock if you got too close to trees (or indeed buildings), and our target market was golf courses which tend to have a lot of trees. And in those days a dual-frequency RTK+IMU setup was $20k or more, which is expensive for a lawnmower.
justmarc 16 days ago [-]
No tool is perfect for every job. That said, the positioning of the RTK unit is crucial. Possibly look for a mower which can work with multiple RTK units, or reposition your existing one for better coverage.
I find that even though signals get significantly weaker under trees, mine still works wonderfully in a complex large garden scenario. It will depend on your exact unit/model, as well as their firmware and how it chooses to deal with these scenarios.
madaxe_again 17 days ago [-]
That, and you can now train the perfect lawnmower in an entirely virtual environment before dropping it into a physical body. You do your standard GAN thing, have a network that is dedicated to creating the gnarliest lawn mowing problems possible, bang through a few thousand generations of your models, and then hone the best of the best. There are some really astonishing examples that have been published this last year or so - like learning to control a hand from absolute first principles, and perfecting it.
This is all pretty much automated by nvidia’s toolkits, and you can do it cheaply on rented hardware before dropping your pretrained model into cheap kit - what a time to be alive.
blagie 17 days ago [-]
FYI: A comment like this one is more helpful with links. There's one below with a few. If you happen to read this, feel free to respond, or to hit "edit" and add them.
fakedang 16 days ago [-]
What is classical machine vision? For an image recognition problem, wouldn't you use a convolutional neural network? (And I might be a bit outdated here, considering that CNNs were used for image recognition a decade ago).
krasin 17 days ago [-]
> What is the new breakthrough in robotics that is gpu driven ? There are subsets of the overall problem that can be solved by a gpu (eg object detection) but the whole planning and control algo scheme seems to be more or less the same as it has been for the past decades. These typically involve non-convex optimization so not much gpu benefit.
In the past two years two very important developments appeared around imitation learning and LLMs. Some starting points for this rabbit hole:
We've been here many times before. Imitation learning doesn't generalise and that makes it useless in practice.
Aloha is a great example of that. It's great for demos, like the one where their robot "cooked" (not really) one shrimp, but if you wanted to deploy it to real peoples' houses you'd have to train it for every task in every house over a few hours at a time. And "a task" is still at the level of "cook (not really) one shrimp". You want to cook (not really) noodles? It's a new task and you have to train it all over again from scratch. You want it to fold your laundry? OK but you need to train it on each piece of laundry you want it to fold, separately. You want it to put away the dishes? Without exaggeration you'd have to train it to handle each dish separately. You want it to pick up the dishes from the kitchen? Train for that. You want it to pick up the dishes from the living room? Train for that. And so on.
It sucks so much with miserable disappointment that it could bring on a new AI winter on its own, if Google was dumb enough to try and make it into a product and market it to people.
Robot maids and robot butlers are a long way away. Yeah but you can cook one shrimp (not really) with a few hours of teleoperation training in your kitchen only. Oh wow. We could never cook (not really) one shrimp before. I mean we could but this uses RL and so it's just one step from AGI.
It's nonsense on stilts.
krasin 16 days ago [-]
I generally agree with your analysis of the current state of art but strongly disagree with the overall conclusion of where it leads us.
I believe it will take on the order of 100M hours of training data of doing tasks in the real world (so, not just Youtube videos), and much larger models than we have now, to make general-purpose robotics work, but I also believe that this will happen.
I've saved your comment to my favorites and hope to revisit it in 10 years.
YeGoblynQueenne 16 days ago [-]
Thanks, that'll be interesting :)
pseudosudoer 17 days ago [-]
There are search spaces that are quite large that are used in optimal control. GPUs can be used to drastically accelerate finding a solution.
As an example, imagine you are given a height map, a 2D discrete search space overlaid on the height map, 4 legs, and robot dynamics for every configuration of the legs in their constrained workspace. Find the optimal toe placement of the 4 legs. Although a GPU isn't designed exactly to deal with this sort of problem, if it's framed as a reduction problem it still significantly outperforms a multi-core CPU.
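A sketch of what "framed as a reduction problem" can look like: score every candidate cell independently (the part that maps onto one GPU thread per cell), then take a min-reduction over the scores. This runs serially in plain Python purely to show the structure, and the cost function is a made-up stand-in for real robot dynamics.

```python
def placement_cost(height_map, cell, nominal):
    """Hypothetical cost: distance from the leg's nominal foothold plus a
    penalty for rough terrain around the candidate cell."""
    r, c = cell
    nr, nc = nominal
    distance = abs(r - nr) + abs(c - nc)
    neighbours = [
        height_map[rr][cc]
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        if 0 <= rr < len(height_map) and 0 <= cc < len(height_map[0])
    ]
    roughness = max((abs(height_map[r][c] - h) for h in neighbours), default=0.0)
    return distance + 10.0 * roughness


def best_toe_placement(height_map, candidates, nominal):
    # "Map" step: each candidate is scored independently of the others.
    scored = [(placement_cost(height_map, cell, nominal), cell) for cell in candidates]
    # "Reduce" step: a min-reduction over (cost, cell) pairs.
    return min(scored)


# Toy 3x3 height map with a bump under the nominal foothold.
height_map = [[0, 0, 0],
              [0, 1, 0],
              [0, 0, 0]]
cells = [(r, c) for r in range(3) for c in range(3)]
print(best_toe_placement(height_map, cells, nominal=(1, 1)))
```

For four legs the same map-and-reduce would run once per leg, or over joint configurations if the legs' costs are coupled, which is where the search space gets large.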
whatever1 17 days ago [-]
These are sparse systems, and factorization is not a strength of GPU architectures. Typically, adding more CPU cores is a better investment than trying to parallelize it on a GPU. Nvidia has been trying for some time to make progress with cuSPARSE etc., although not much has been achieved in the space.
Maybe they will try a completely different approach with reinforcement learning and a ton of parallel simulations?
XenophileJKO 17 days ago [-]
I think it is what nobody has answered yet.. virtualized training/testing. I watched a presentation by their research team. This is a HUGE force multiplier. Don't underestimate how much this changes robotic foundational model training.
It turns out you can take a vision language foundational model that has a broad understanding of visual and textual knowledge and fine tune it to output robot actions given a sequence of images and previous actions.
This approach beats all previous methods by a wide margin and transfers across tasks.
emn13 17 days ago [-]
The article lists 2: firstly, simply that ML models are now feasible at a scale they weren't only a few years ago. Secondly, compute power has improved enough that it can now simulate more realistic environments, which enables sim-based (pre)training to work better. That second one is potentially particularly alluring to nvidia given how it plays on two of their unique strengths - AI and graphics.
mattlondon 17 days ago [-]
> What is the new breakthrough in robotics that is gpu driven ? There are subsets of the overall problem that can be solved by a gpu (eg object detection) but the whole planning and control algo scheme seems to be more or less the same as it has been for the past decades
I think the "object detection" goes quite far beyond the classic "object detection" bounding boxes etc we're used to seeing. So not just a pair of x,y coords for the bounding box for e.g. a mug of coffee in the robot's field of view, but what is the orientation of the mug? where is the handle? If the handle is obscured, can we infer where it might be based on what we understand for what a mug typically looks like and plan our gripper motion towards it (and at 120hz etc)? Is it a solid mug, or a paper cup (affects grip strength/pressure)? Etc etc. Then there is the whole thing about visually showing the robot once what you are doing, and it automatically "programs" itself to repeat the tasks in a generalised way etc. Then you could probably spawn 100 startups just on hooking up an LLM to tell a robot what to do in a residential setting (make me a coffee, clear up the kitchen, take out the trash etc)
This has all been possible before of course, but could it be done "on device" in a power efficient way? I am guessing they are hoping to sell a billion or two chips + boards to be built directly into things to do so so that your next robotic vacuum or lawn mower or whatever will be able to respond to you yelling at it and not mangle your pets/small children in the process.
I eagerly await the day when I have a plug and play robot platform that can tell the difference between my young children and a fox, and attack the fox shitting/shredding something small and fluffy in the garden but ignore the kids
amelius 17 days ago [-]
The fun thing with DL is that you don't have to optimize stuff with complicated math. You just train it, and it will generate solutions. Maybe not the perfect solutions, but don't let perfect be the enemy of good.
mmmore 17 days ago [-]
My understanding is that it's in vogue to use deep learning for complex control problems, and the results are fairly impressive. The idea is to train robotic motion end to end with RL. Not an expert so I don't know the strengths and weaknesses versus classical approaches.
Unstructured Sensory input driven by these large neural networks, if I had to guess.
To be able to visually determine weight, texture, and how durable something is can be done with those systems so long as we have a training set.
jvanderbot 16 days ago [-]
I've been doing this for 10+ years, and have seen GPU-based calculations slowly eat away at the following problems:
* Mapping. Nowadays generating a dense grid of costs can be done insanely fast on GPU. There's just no excuse to not use a GPU on every robot so it can build a fast map, unless you move at snail speed.
* Computer Vision. Classical depth mapping is best done on a GPU. Classical computer vision object detection has fallen away to the rise in ML-based CV for segmentation. Some (IMHO) overzealous practitioners are trying to eat away at estimation and tracking, which IMHO will recede a little since there was nothing wrong with the estimators (just Bayesian stats) to begin with, it was always the measurements. Still, for detection (and sometimes association), ML on GPU is the way to go and that will very likely not change. It has gotten so good that you can get away without using other sensors and just deploying a vision system (though I don't recommend it, but this is what Tesla does). This is an obvious case for one (or one more) GPU on every robot.
* Planning - End to end planning is eating traditional planning now, similar to CV. There are some areas where this is an obvious win (e.g., complex manipulation tasks), and some areas where some overzealous overreach is happening (e.g., simpler planning tasks like routing). But ML on GPUs is here to stay for all planning tasks, especially when estimating costs from complex data, even if a classical planner uses those costs. And I'd be remiss if I didn't mention policy-based planning, which does a huge amount of training to generate essentially a fast lookup table for actions. Deployment of these types of planners often requires a very good estimator to determine what state you are in - and this is a great area for ML, mapping real world messy data to a clean state lookup. I think this can typically be done without a GPU, due to training prior to deployment, but if you have a GPU already (see prior two), you will find this is a good use of it.
* Low-level planning / Controls - Shares a small overlap with above, but mostly concerned with fast responses to transient data and stabilizing the system. I've heard, but not seen directly, that learned policies are coming into vogue here. But regardless, it is a common thread that a network can assist with estimating costs and states to allow a traditional controls system to operate more reliably. I doubt this will necessitate a GPU, but like above, will gladly use it if required and available.
To add to this, consider that we're generally not talking about discrete, gaming-type GPUs, we're talking about purpose built robotics-targeted embedded systems that speak native CUDA. The Jetson family, in particular.
imtringued 17 days ago [-]
Actually, it is quadratic programming that is big in robotics. QP is powerful enough that you can formulate your task, but also fast enough that you can run it in the control loop in real time.
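To make the QP point concrete, here is a minimal sketch of a box-constrained quadratic program solved by projected gradient descent. It is an illustration only: the function name `solve_box_qp`, the matrices, and the bounds are all made up for this example, and a real-time controller would normally use a dedicated QP solver rather than this loop.

```python
def solve_box_qp(P, q, lo, hi, iters=200):
    """Minimise 0.5*x'Px + q'x subject to lo <= x <= hi.

    Illustrative projected-gradient solver; assumes P is symmetric
    positive definite. Not a production-grade QP solver.
    """
    n = len(q)
    # The largest absolute row sum bounds the largest eigenvalue of a
    # symmetric matrix (Gershgorin), so 1/L is a safe step size.
    L = max(sum(abs(v) for v in row) for row in P)
    step = 1.0 / L
    # Start from the point in the box closest to the origin.
    x = [min(max(0.0, lo[i]), hi[i]) for i in range(n)]
    for _ in range(iters):
        # Gradient of the quadratic objective: P x + q.
        grad = [sum(P[i][j] * x[j] for j in range(n)) + q[i] for i in range(n)]
        # Gradient step, then projection back onto the box constraints.
        x = [min(max(x[i] - step * grad[i], lo[i]), hi[i]) for i in range(n)]
    return x


# Hypothetical two-actuator problem with commands limited to [0, 2].
P = [[2.0, 0.5], [0.5, 1.0]]
q = [-2.0, -8.0]
x = solve_box_qp(P, q, lo=[0.0, 0.0], hi=[2.0, 2.0])
```

The appeal for control loops is that the cost and the limits (torque, velocity, and so on) go into one small convex problem that can be re-solved every cycle.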
iancmceachern 17 days ago [-]
It's the edge computing.
Similar to autonomous vehicles, doing complex multi sensor things very quickly.
Surgical robotics is a great example, lots of cool use cases coming out in that field.
ram_rattle 17 days ago [-]
I personally think that, apart from GPUs and compute for intelligence, we still have a lot of things to crack before meaningful robotics can take off: better batteries, better and more affordable sensors, microelectronics, etc. I'm pretty sure we will get there, but I don't think one company can do it.
01100011 17 days ago [-]
Better battery isn't really an issue for factories. Same with sensors if you're saving the cost of employing a human, especially for dangerous work.
michaelt 17 days ago [-]
True - and of course factories don't mind if a robot costs $40,000 if the payback time is right.
But factory robots haven't propelled Kuka, Fanuc, ABB, UR, Staubli and peers to anything like the levels of success nvidia is already at. A market big enough to accommodate several profitable companies with market caps in the tens of billions might not drive much growth for a company with a trillion-dollar market cap.
nvidia has several irons in the fire here. Industrial robot? Self-driving car? Creepy humanoid robots? Experimental academic robots? Whatever your needs are, nvidia is ready with a GPU, some software, and some tutorials on the basics.
Onavo 17 days ago [-]
> But factory robots haven't propelled Kuka, Fanuc, ABB, UR, Staubli and peers to anything like the levels of success nvidia is already at. A market big enough to accommodate several profitable companies with market caps in the tens of billions might not drive much growth for a company with a trillion-dollar market cap.
That's because the past year of robotics advancements (e.g. https://www.physicalintelligence.company/blog/pi0, https://arxiv.org/abs/2412.13196) has been driven by advances in machine learning and multimodal foundation models. There has been very little change in the actual electronics and mechanical engineering of robotics. So it's no surprise that the traditional hardware leaders like Kuka and ABB are not seeing massive gains so far. I suspect they might get the Tesla treatment soon when the Chinese competitors like unitree start muscling into the humanoid robotics space.
Robotics advancements are now AI driven and software defined. It turned out that adding a camera and tying a big foundation model to a traditional robot is all you need. Wall-E is now experiencing the ImageNet moment.
michaelt 17 days ago [-]
> There has been very little change in the actual electronics and mechanical engineering of robotics. So it's no surprise that the traditional hardware leaders like Kuka and ABB are not seeing massive gains so far.
Perhaps I wasn't explicit enough about the argument I was trying to make.
Revenue in business is about selling price multiplied by sales volumes, and I'm not sure factory robot sales volumes are big enough to 'drive future growth' for nvidia.
According to [1] there were 553,000 robots installed in factories in 2023. Even if every single one of those half a million robots needed a $2000 GPU that's only $1.1 billion in revenue. Meanwhile nvidia had revenue of 26 billion in 2023, and 61 billion in 2024.
Many of those robots will be doing basic, routine things that don't need complex vision systems. And 54% of those half a million robot arms were sold in China - sanctions [2] mean nvidia can't export even the 4090 to China, let alone anything more expensive. Machine vision models are considered 'huge' if they reach half a gigabyte - industrial robots might not need the huge GPUs that LLMs call for.
So it's not clear nvidia can increase the price per GPU to compensate for the limited sales volumes.
If nvidia wants robotics to 'drive future growth' they need a bigger market than just factory automation.
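The back-of-the-envelope arithmetic in the comment above can be written out as a short sketch. The figures are the ones quoted in the comment (553,000 installations, a $2,000 GPU per robot, 54% sold in China, and revenue of 26 and 61 billion); the variable names are only illustrative, and the China exclusion is a simplifying assumption rather than a statement about what can actually be exported.

```python
# Figures quoted in the comment; the $2,000 per-robot GPU price is hypothetical.
robots_installed_2023 = 553_000
gpu_price_usd = 2_000

# Upper bound: every installed robot carries one GPU.
robot_gpu_revenue = robots_installed_2023 * gpu_price_usd

# Simplifying assumption: the 54% installed in China is excluded entirely.
robots_outside_china = robots_installed_2023 * 46 // 100
revenue_outside_china = robots_outside_china * gpu_price_usd

# Revenue figures as quoted in the comment, in USD.
nvidia_revenue = {"2023": 26e9, "2024": 61e9}
share = {year: robot_gpu_revenue / rev for year, rev in nvidia_revenue.items()}

print(f"All robots: ${robot_gpu_revenue / 1e9:.2f}B")
print(f"Outside China: ${revenue_outside_china / 1e9:.2f}B")
for year, frac in share.items():
    print(f"Share of {year} revenue: {frac:.1%}")
```

Even the generous upper bound comes out to a low single-digit percentage of the quoted revenue, which is the point the comment is making.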
You are forgetting that the "traditional" factory robots are the way they are because of software limitations. Now that the foundation models have mostly solved basic robotic limitations, there's going to be a lot more automation (and job layoffs). Your traditional factory robotics are dumb and mostly static. They are mostly robotic arms or other type of conveyor belt centric automation. The new generation of VLM enabled ones offers near-human levels of flexibility. Actual android type robotics will massively increase demand for GPUs, and this is not even accounting for non-heavy industry use cases in the service industry e.g. cleaning toilets, folding clothing at a hotel. They are already being done by telepresence, full AI automation is just the next step. Here's an example from a quick google:
Factories don’t mind if the robot costs $4,000,000 or even $40,000,000. I really don’t think people understand how much industrial robots from the likes of KUKA cost…
michaelt 17 days ago [-]
I agree that you can get to some big cost figures if you're talking about a full work cell with multiple robots, conveyors, end effectors, fancy sensors, high-tech safety systems, and staff costs.
But if you're just buying the arm itself? There are quality robot arms, like the €38,928 UR10e [1], that are within reach of SMEs. No multi-million-dollar budget required.
It seems such costs would become prohibitive quite quickly? Stuff with moving parts breaks, and I'd expect ongoing maintenance costs to be proportional to the unit cost. Pair in the fact that most factories run on thin margins but massive volume, and it would seem cost is very much an issue.
rafaelmn 17 days ago [-]
I think it's more about how much of that 40m$ is nvidia and how many units can you deploy ?
blueboo 17 days ago [-]
It’s hard to say when we’re still looking for a first real household robot. But a car-priced (60k?) housekeeper bot will be very popular.
And those duties can be achieved with today’s mechanics — they just need good control, which is now seeing ferocious progress
EarthIsHome 17 days ago [-]
ChatGPT, LLMs, generative AI, and other hyped use cases have been the driving force for Nvidia: they injected huge sums of money into its R&D, which also stimulated the economy as developers ran to build build build in order to keep up with the demand for datacenters, which in turn required more infrastructure building to satiate the thirst and power needs of datacenters, etc. Before ChatGPT, I recall the hype was blockchain, crypto, and NFTs; and maybe before that, it was "big data."
As the LLM, generative AI, etc. bubble begins to deflate due to investors and companies finding it hard to make profits from those AI usecases, Nvidia needs to pivot. This article indicates that Nvidia is hedging on robotics as the next driving force that will continue to sustain the massive interest in their products. Personally, I don't see how robotics can maintain that same driving force for their products, and investors will find it hard to squeeze profit out of it, and they'll be back to searching for another hype. It's like Nvidia is trying to create a market to justify their products and continued development, similar to what Meta has tried, to spectacular failure, with the Metaverse for their virtual products.
After the frenzy that sustained these compute products transitioned from big data, to crypto, and now, to AI, I'm curious what the next jump will be; I don't think the "physical AI" space of robotics can sustain Nvidia in the way that they're hoping.
infecto 16 days ago [-]
The part that is hard for me to parse is that there is hype, but there is also a significant amount of value being extracted by using LLMs and other products coming from this new wave. Every time I read opinions like yours it’s hard to make sense of them, because there is value in the tooling that exists. It cannot be applied to everything and anything, but it does exist.
gmays 16 days ago [-]
Comparing AI to crypto doesn't really work due to the utility of AI. If you believe that there haven't been meaningful use cases from the recent generative AI surge, then you might be out of touch.
On the investment side, it's hard to say that since ROIC is still generally up and to the right. As long as that continues, so will investment.
The biggest gap I see is expected if you look at past trends like mobile and the internet: In the first wave of new tech there's a lot of trying to do the old things in the new way, which often fails or gives incremental improvements at best.
This is why the 'new' companies seem to be doing the best. I've been shocked at so many new AI startups generating millions in revenue so quickly (billions with OpenAI, but that's a special case). It's because they're not shackled to past products, business models, etc.
However, there are plenty of enterprise companies trying to integrate AI into existing workflows and failing miserably. Just like when they tried to retrofit factories with electricity. It's not just plug and play in most cases, you need new workflows, etc. That will take years and there will be plenty more failures.
The level of investment is staggering though, and might we see a crash at some point? Maybe, but likely not for a while since there's still so much white space. The hardest thing with new technologies like this is not to confuse the limits of our imagination with the limits of reality (and that goes both ways).
I have worked with Jetson Orin platform, and honestly Nvidia has something that is really easy to work with there. The Jetsons are basically a full GPU (plus some stuff) at very low power. If I were tasked with building a robot it would likely be the first place I look.
leetrout 17 days ago [-]
They are OK. If you need advanced vision - yes, because CUDA.
But off the shelf mini PCs are much more user friendly for existing software IME.
Thankfully, with ARM being so widespread and continuing to grow, this won't matter as much.
blihp 17 days ago [-]
Maybe you've had a different experience with GPU drivers on ARM for Linux than most of the rest of us? (i.e. it's the fact that nVidia actually has Linux support on ARM that is the real appeal)
talldayo 17 days ago [-]
> But off the shelf mini PCs are much more user friendly for existing software IME.
I'd love you to point me in the direction of an off-the-shelf mini PC that has 64gb of addressable memory and CUDA support.
adrian_b 16 days ago [-]
Off-the-shelf mini-PCs with 64 GB of addressable memory and reasonably powerful integrated GPUs, i.e. faster than the smaller Ampere GPUs of the cheaper NVIDIA Orin models, are plenty.
On the other hand, if you force the CUDA support condition and any automatic translation of CUDA programs is not accepted as good enough, then this mandates the use of a discrete NVIDIA GPU, which can be provided only by a mini-ITX mini-PC.
There are mini-ITX boards with laptop Ryzen 7940HX or 7945HX CPUs, at prices between $400 and $550. To such a board you must add 64 GB of DRAM, e.g. @ $175, and a GPU, e.g. a RTX 4060 at slightly more than $300.
Without a discrete GPU, a case for a mini-ITX motherboard has a volume of only 2.5 liter. With a discrete GPU like RTX 4060, the volume of the case must increase to 5 liter (for cases with PCIe extenders, which allow a smaller volume than typical mini-ITX cases).
So your CUDA condition still allows what can be considered an off-the-shelf mini-PC, but mandating CUDA raises the volume from the 0.5 L of a NUC-like mini-PC to 5 L and the price is also raised 2 or 3 times.
This is, of course, unless you choose an Orin for CUDA support, but that will not give you 64 GB of DRAM, because NVIDIA has never provided enough memory in any of their products unless you are willing to pay a huge premium.
leetrout 17 days ago [-]
Now if we could get a robotics platform like ROS that actually cares about modern dev patterns and practices, from devs slapping keyboards through production deployment, with decent smoke tests, easy versioned artifacts and no need to understand linux packaging details...
Coming from web / app dev this was my very least favorite part of working on the software side of robotics with ROS.
alephnerd 17 days ago [-]
> Coming from web / app dev this was my very least favorite part of working on the software side of robotics with ROS
To be brutally honest, you aren't the primary persona in the robotics space.
If you have limited resources (as any organization does), the PM for DevEx will target customers with the best "bang-for-buck" from a developer effort to revenue standpoint.
Most purchasers and users in the robotics and hardware space tend to be experienced players in the hardware, aerospace, and MechE world, which has different patterns and priorities from a purely software world.
If there is a case to be made that there is a significant untapped market, it makes sense for someone like you to go it on your own and create an alternate offering via your own startup.
veunes 19 days ago [-]
Robotics has long been an area of promise but (I think) limited returns
Animats 17 days ago [-]
Yes. I've known people with robotics startups, and have visited some of them. They're all gone now. But that was all prior to about 2015.
Robots are a branch of industrial manufacturing machinery. That is not, historically, a high-margin business. It also demands high reliability and long machine life.
Interestingly, there's a trend towards renting robots by the working hour. It's a service - the robot company comes in, sets up robot workers, services them as needed, and monitors them remotely. The robot company gets paid for each operating hour. Pricing is somewhat below what humans cost.[1]
Having been involved in similar financial arrangements in software automation, years ago, it makes sense.
The end user usually doesn't have the expertise to even maintain the systems, nor does it make sense for them to do it in-house.
Charging per item of work (operating hour or thing processed) allows use of consultants but keeps incentives aligned between all parties (maximize uptime/productivity).
rapsey 17 days ago [-]
> Yes. I've known people with robotics startups, and have visited some of them. They're all gone now. But that was all prior to about 2015.
Lots of dotcom busts in the late 90s were concepts that worked 10-15 years later. We just did not have broadband and smartphones. Battery and AI tech is quite likely to be the missing piece robotics lacked in the past.
alephnerd 17 days ago [-]
> Battery and AI tech is quite likely to be the missing piece robotics lacked in the past.
Cheap semiconductors as well.
Fabricating a chip on a 28nm and 48nm process is extremely commodified nowadays. These are the same processes used to fabricate an Nvidia Tesla or an i7 or Xeon barely a decade ago, so the raw compute power available at extremely commodified prices is insane.
Just about every regional power has the ability to fabricate an Intel i7 or Nvidia Tesla equivalent nowadays.
And most regional powers have 3-7 year plans to build domestic 14nm fabrication capacity as well now. A number of firms like Taiwan's PSMC have made a killing selling the end-to-end IP and workflow for fabrication.
ksec 17 days ago [-]
That is interesting. I assume robots here means something close to humanoid for rent, and be programmed to take some or most of the human's job and not robots in terms of industrial manufacturing machinery?
e_y_ 17 days ago [-]
They're industrial robot arms, not humanoids, although the concept of android workers getting paid an hourly fee or "wage" (going to their masters, an android rental corporation) would be fascinating.
bregma 17 days ago [-]
Even just a robot arm with appropriate sensors and a hand attachment could replace human employees in the world's oldest profession. Consider what drove the video industry if you're looking to invest.
Animats 17 days ago [-]
Robot, as used here, is, in more modern forms, one or more arms with a vision system. Or some kind of mobile base for moving things around.
petra 17 days ago [-]
Given that robot-as-a-service removes the biggest barriers for companies not buying robots, why aren't we seeing a huge growth in "employed" robots?
Animats 16 days ago [-]
Because the setup is still too complicated.
This will probably take off once Amazon finally gets robots that can do unboxing, picking, and boxing. They've been trying for years to get that to work. Amazon already has robots doing most of the lifting and carrying, but people still handle each item.
People have been trying to do bin picking fulfillment with robots since the 1980s. Swisslog, Brightpick, and Universal Robotics have all demoed this, but so far it's not working well enough to take over. It's getting close, though.
contingencies 17 days ago [-]
Robotics is everywhere but you don't see it. The joke is we call it something else when it works. Large corporations with successful margin-supporting automation systems have every intent and reason to keep them secret. See for example ASML.
JFingleton 17 days ago [-]
Here in the UK there's been a boom in robot delivery over the years:
Care to elaborate? I feel the real power of AI will be unlocked when AI can sense and interact with the world.
malux85 17 days ago [-]
Yeah but the limiting factor for ages has been software and batteries, both of which have been improving a lot in the last 5-8 years.
peppertree 17 days ago [-]
After using FSD 13 for 2 weeks I'm convinced we are close to solving self driving. Too bad everyone lost interest and now robotics is the hot new thing.
bobsomers 17 days ago [-]
As someone who worked in V&V for AV systems for a decade, it’s exactly the kind of thinking displayed here that has held back real assessment of AV safety for years.
There is absolutely no meaningful signal about a system’s safety that can be derived from one person using a system for two weeks.
At best it can only demonstrate that a system is wildly unsafe.
There is a very large chasm of 9s between one person being able to detect an unsafe system in two weeks of use and actually having a truly safe system.
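A rough way to see the size of that chasm is the standard zero-failure confidence bound (the "rule of three"): after n failure-free trials, the 95% upper bound on the failure probability is about 3/n. The sketch below treats each mile as an independent trial, which is a strong simplification, and both the 560-mile figure (two weeks at a hypothetical 40 miles per day) and the one-in-a-million target are assumptions chosen purely for illustration.

```python
import math


def upper_bound_failure_rate(failure_free_miles, confidence=0.95):
    """Largest per-mile failure probability still consistent with
    observing zero failures over the given miles, at this confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / failure_free_miles)


def miles_needed(target_rate, confidence=0.95):
    """Failure-free miles required before a per-mile failure probability
    below target_rate can be claimed at this confidence."""
    return math.log(1.0 - confidence) / math.log1p(-target_rate)


# Hypothetical: two weeks of personal use at about 40 miles per day.
personal_miles = 14 * 40
bound = upper_bound_failure_rate(personal_miles)

# Hypothetical safety target: fewer than one failure per million miles.
required = miles_needed(1e-6)

print(f"After {personal_miles} clean miles, the rate could still be as high as "
      f"1 failure per {1 / bound:.0f} miles")
print(f"Miles needed for the 1-in-a-million claim: {required:,.0f}")
```

Under these assumptions a clean two-week trial only rules out failure rates worse than roughly one per couple of hundred miles, while the hypothetical target needs on the order of millions of failure-free miles.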
n144q 17 days ago [-]
And it only takes a (near) accident in 5 more minutes' driving to completely negate that.
Your observation from this short time window isn't enough to prove the usefulness of something as serious as life and death.
sangeeth96 17 days ago [-]
I’m not sure if you’re generalizing to a specific region in your assessment but regardless, I doubt this is anywhere close to a solved problem given the crashes/incidents (so far) still associated with the tech and the dependencies IIRC on street signs and other markers.
re: region, I’d like to see it take on more challenging conditions, like in India for example where things are chaotic even for human drivers. I doubt that it’ll survive over here.
krisoft 17 days ago [-]
> Too bad everyone lost interest and now robotics is the hot new thing.
Self driving is robotics. Simple as that.
amelius 17 days ago [-]
Quite simple robotics actually. Especially if you use Lidar. Basically IF (object present) THEN (do not go there) style of simplicity. Of course in reality there are lots of cases to consider, but each one of these cases is not rocket science.
Building a robot that can cook or fold a t-shirt, for example, is much harder.
cbsks 17 days ago [-]
Note that Nvidia is also working on self driving. The Jetson robotics platform is based on the same SoC as the DRIVE platform, but is a separate product.
mhh__ 17 days ago [-]
Although the idea of self driving is obviously cool I think it's good that robotics take priority (if such a thing is possible) e.g. think of it like the invention of the washing machine as a liberating force on the world.
myvoiceismypass 17 days ago [-]
Have you been a passenger in a Waymo? My only ride felt safer than every uber / Lyft driver I have ever had pretty much, so wondering how it compares to a beta thing you have to be able to take over in an instant.
hackcasual 17 days ago [-]
Last time I was in SF I took 3 waymo rides and attempted a fourth. The attempted one was cancelled after 15 minutes of waiting for it being 2 minutes away. As best as I can tell, the waymo was stuck at an intersection where power had been lost and didn't understand it needed to treat it like a 4 way stop.
2 rides went fine though neither was particularly challenging. The third though the car decided to head down a narrow side street where a pickup in front was partially blocking the road making a dropoff. There was enough space to just squeeze by and it was clear the truck expected the car to. A few cars turned in behind the waymo, effectively trapping it in as it didn't know how to proceed. The dropoff eventually completed and it was able to pull forward
seydor 17 days ago [-]
Cars are robots without arms
mdorazio 17 days ago [-]
Waymo already solved self driving years ago. Tesla still has a long way to go.
e_y_ 16 days ago [-]
Waymo is pretty good (but not perfect) as far as safety, but there are too many ways it can get stuck, including vandalism from humans like "coning". And if a significant number of them are on the road, it could gum up traffic when that happens.
I still think it'll do well because even if you need to hire 1 person to remotely monitor every 10 cars (I doubt Waymo has anywhere near that many support staff) it's still better than having to pay 10 drivers who may or may not actually be good at driving. But to really take over they'll need to be much more independent.
ksec 17 days ago [-]
> In February 2024, a driverless Waymo robotaxi struck a cyclist in San Francisco. Later that same month, Waymo issued recalls for 444 of its vehicles after two hit the same truck being towed on a highway
I am not entirely sure that is solved. And certainly not years ago. And it is only close in the US, where the training data comes from. That doesn't mean it could be used in Japan (where they are doing testing now), driving on the other side of the road with a very different driving culture and traffic.
szvsw 17 days ago [-]
Citing a specific (tragic) incident isn’t really great evidence in re: safety. You have to normalize by something like accidents/mile driven and compare to comparable services (taxi/uber etc) - having said that I couldn’t quickly find any sources either positive or negative on those stats (besides Waymo PR docs) so I’m not saying you’re necessarily wrong. just wanted to point out the obvious flaw with citing anecdotal evidence for something like this.
You could easily use the same logic to say humans haven’t solved driving yet either!
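The normalization described above is simple to sketch. Every number below is hypothetical and exists only to show why raw incident counts mislead; none of it is real data for Waymo, Uber, or anyone else, and the fleet labels are made up for the example.

```python
def incidents_per_million_miles(incidents, miles):
    """Normalise a raw incident count by exposure (miles driven)."""
    return incidents / miles * 1_000_000


# (incident count, miles driven) - hypothetical figures for illustration only.
fleets = {
    "robotaxi (hypothetical)": (3, 7_000_000),
    "human ride-hail (hypothetical)": (40, 50_000_000),
}

rates = {
    name: incidents_per_million_miles(count, miles)
    for name, (count, miles) in fleets.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} incidents per million miles")
```

A single cited incident corresponds to the raw count only; without the mileage denominator, and ideally a confidence interval around the rate, it says little about relative safety.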
vasco 17 days ago [-]
A big part of me believes the only extra safety they give is they drive much slower. This in itself might be the solution for human deaths on the road.
Dig1t 17 days ago [-]
Crashes per mile is multiple times lower than the human rate for both Waymo and Tesla. If your definition of solved is that there will be 0 collisions ever then the problem will never be solved. But if we have a system that is much better at driving than most humans, I think that qualifies it as good enough to start using.
echelon 17 days ago [-]
Is there anyone even close to Waymo in this game? Is Waymo going to own the entire market?
05 17 days ago [-]
Baidu Apollo. They also have a commercial fleet without in-car safety drivers (they use remote operators for real time monitoring, though, so hard to say how hands-off it really is)
IncreasePosts 17 days ago [-]
Why would waymo own the entire market? Sure, they might be the first ones there, but every year recreating what is "good enough" should be cheaper and cheaper.
krupan 17 days ago [-]
Cruise is close to Waymo, but nobody is willing to invest in Cruise anymore
AlotOfReading 17 days ago [-]
Cruise doesn't exist anymore (or rather shortly won't). The teams are being folded into GM to work on other things.
rapsey 17 days ago [-]
Waymo works in US grid cities on highly modified cars. I know people love hating Musk, but it is still very much up in the air if Waymo will be a better solution than what Tesla or Wayve is doing.
creer 17 days ago [-]
What does "grid" have to do with anything at this point? Mapping was done a million years ago and try and see if "grid" helps you understand the lane and traffic light system in San Francisco (which tourists need to figure out in real time - they are hard enough on the locals.)
rapsey 17 days ago [-]
Nice easy intersections. Wide two way streets. Put a waymo in Rome and I will be impressed.
sbuttgereit 17 days ago [-]
I ride Waymos in San Francisco that traverse longer twisting "two-way" roads in the San Francisco hills (look at the neighborhoods around Mount Davidson). In these cases, the road, while two way, more often than not only has space which allows a single car to pass at a time; the rest of the space is taken by cars parked on either side of the roadway. The Waymo cars, at least during my rides, handled these situations well.
While it's not Rome, the operating areas for Waymo, at least in San Francisco, are not all grids of modern wide streets either.
travisporter 15 days ago [-]
That’s not gonna impress me either. There were zoox cars on Lombard st in sf I think. Windy streets are not the challenge. Putting your money where your mouth is - that’s the challenge.
creer 15 days ago [-]
That seems to be the thing now: get some deployed mass out there. And at this point it seems that each new city is still a significant investment (well within Waymo's means, but still).
I'm still puzzled about why Waymo insists on not having any remote driving, or any remote advising of cars on where/how to get themselves out of a situation. Yes, that would cost a little more - but this is early stages, so it's just a little more money at this point (lol) - in exchange for avoiding embarrassing PR bullshit about cars self-honking at each other, or rides stuck in an infinite hesitation loop, or not knowing what to do when there is a traffic cone on the hood. I haven't seen any convincing arguments for not having that. Anyone heard a legitimately good tech or liability reason? I doubt I would have missed it but...
doublepg23 17 days ago [-]
Isn't that the difference though? I've never even seen a Waymo and I've been successfully driven by Teslas many times.
lm28469 17 days ago [-]
In a very small subset of cities, road conditions, weather condition, &c. Basically US grid cities with 300 days of sun per year
karlgkk 17 days ago [-]
That’s not the limiting factor, fwiw. It’s an operations problem for them at this point. Freeways are a big contentious point as well.
lm28469 15 days ago [-]
It 100% is a limiting factor, weather conditions and crazy roads/poor infra will definitely impact self driving cars, idk how it could not be a factor... Go drive in eastern Europe after a snow storm, you miss one snow covered sign and end up on the wrong side of a high speed road, &c.
It's like learning to code in JS on a 2024 MacBook pro and thinking you can "just" transfer your skills to cobol on 1970s hardware because both are "programming"
karlgkk 12 days ago [-]
Extreme weather is a problem, as is snow. No doubt.
I’m simply talking about “300 days of sun” as being the limiting factor. You extrapolated the rest.
akutlay 17 days ago [-]
I just finished reading Daron Acemoglu and Simon Johnson’s book “Power and Progress”, where they talk about how the leaders in the technology space are (unfortunately) able to set the direction of the technology according to their goals, not humanity’s goals. This is an excellent example of such power. NVIDIA wants to expand its business and pushes the industry to use more and more AI, which highly depends on their cards. Now all the VCs put billions of dollars towards this goal, thousands of PhDs spend all their time on it, and companies change the direction of their business to catch the AI hype. Not necessarily because we decided this is the best for humanity, just because it’s the best for NVIDIA.
scottLobster 17 days ago [-]
Sounds like NVIDIA doesn't know what the hell the future is going to look like but hopes it's something to do with robotics, and is taking some of the boatloads of money from the past few years to build out product lines for every conceivable robotics need. Good for them I guess.
The article references a "ChatGPT moment" for physical robotics, but honestly I think the ChatGPT moment has kind of come and gone, and the world still runs largely as it ever did. Probably not the best analogy, unless they're just talking about buckets of VC money flowing into the space to fund lots of bad ideas, which would be good for NVIDIA financially.
As an admitted non-expert in this field, I guess the one thing that really annoys me about articles like this is the lack of a concrete vision. It's like Boston Dynamics and their dancing robots, which while impressive, haven't really amounted to much outside of the lab. The last thing I remember reading was a military prototype to carry stuff for infantry that ended up being turned down because it was too loud.
The article even confirms this general perspective, ending with: “As of right now, we don’t have very effective tools for verifying the safety and reliability properties of machine learning systems, especially in robotics. This is a major open scientific question in the field,” said Rosen.
So whatever robot you're developing is incredibly complex, to be trusted with heavy machinery or around consumers directly, while being neither verifiably safe nor reliable.
Sorry, but almost everything in this article sounds like a projection of AI-hype onto physical robotics, with all the veracity of "this is good for Bitcoin". Sounds like NVIDIA is doing right by its shareholders though.
dankobgd 17 days ago [-]
took them 16 years to fix night light bug in the driver but yeah robots are the future
asadalt 17 days ago [-]
if they really want to bet on robotics, I want them to release a $10 variant of jetson board.
jononor 17 days ago [-]
For 10 USD you get an ESP32S3 board, which can do basic computer vision tasks. For example using OpenMV or emlearn-micropython. For 15-20 USD you can get a board that includes an OV2640 camera. Examples would be XIAO ESP32S3 Sense, LilyGo T Camera S3 or "ESP32-S3-CAM" board from misc manufacturers.
asadalt 16 days ago [-]
yes that’s what i am working on these days but there is a need for a generally available neural chip (see google’s coral as one attempt). in my tests, esp32s3 is very very slow for any model with conv2d involved.
i just want a tiiiny gpu for $10 so i can run smaller models at higher speed than possible with xtensa/rp2040 having limited simd support etc.
jononor 16 days ago [-]
Are you utilizing the SIMD and acceleration instructions in the S3? What kind of performance are you seeing?
Neural accelerators are coming into MCUs. The just released STM32N6 is probably among the best. Alif with the U55/U85 has been out for a little while. Maxim MAX78000 has a CNN accelerator out for a couple of years. More will come in the next few years - though not from Nvidia any time soon.
abswest 12 days ago [-]
I'd love to hear more about your experience with Coral. Sounds like that'd be a good fit for a tiny GPU to run models with conv2d?
adrian_b 17 days ago [-]
A few weeks ago they reduced the price of the Orin Nano development kit from $500 to $250, while also raising a few of the performance limits that cripple it in comparison with the more expensive Orin models.
Previously it was far too overpriced for most uses (except for someone developing a certified automotive device), but at the new price and performance it has become competitive with the existing alternatives in the same $150 to $300 price range, which are based on Intel, AMD, MediaTek, Qualcomm or Rockchip CPUs.
bradfa 17 days ago [-]
When they reduced the price of the dev kit, they priced it below the low volume sales price of the cheapest Orin Nano 4GB module. Presumably the module prices go down when you buy in bulk but for small volumes it was (is still?) cheaper to buy the dev kit and throw away the carrier than to just buy the module. Granted the dev kits went out of stock pretty quick.
bfrog 17 days ago [-]
What breakthrough are people expecting in robotics I wonder?
cess11 17 days ago [-]
Since cops, guards and military officers are itching to get autonomous guns it's probably a reasonable move. The genocide of Palestinians has shown that human-operated gun drones aren't distant enough; the operators cost a lot in psych treatment and personnel churn.
Qiu_Zhanxuan 17 days ago [-]
Unironically, I think this is one of the "main benefits" that the disconnected people in places of power seem to covet. They won't have a human operator who does their dirty jobs and potentially leaks the truth out of guilt.
ANewFormation 17 days ago [-]
I'm bearish on 'ai war' but this sounds like a huge positive if it comes to fruition because war would become more about two sides trying to kill each other's leaders, instead of these leaders sending masses of doe eyed young people off to die for them.
If politicians had real skin in the game there'd be far less war.
cess11 17 days ago [-]
That's already being done. Saddam Hussein was quickly killed, and what followed? Same goes for Gaddafi. Israel has killed a lot of leaders and is clearly not satisfied with having done that.
What they want is border and population control that involves very few ordinary citizens, in large part in expectation of something like hundreds of millions of climate refugees. After having spent a couple of years killing and maiming poor people with almost nowhere to go you tend to need quite a bit of medical care and usually join the anti-war movement regardless if you got a college degree out of it or not.
I find it likely we'll see gun mounted robodog patrols along occidental borders within ten years from now, after having tested it on populations elsewhere.
Again: which computation paradigm works best depends on the use case. It is not an all or nothing situation.
As much as there are market drivers to make the cloud attractive to businesses and legitimate reasons why cloud solutions are better than alternatives there is a real desire in a growing number of people to have a solution that they can tinker with but that also has a polished UI so they don't have to tinker with it when they don't want to.
I think there are many early innovators that fail in later stage growth because of this issue.
I'm still surprised they did not create an App Store for AI. Basically lock everything down and make developers pay a % of their revenue, Apple style.
At enterprise scale, locked down marketplaces don't work. They act as a forcing factor for larger organizations to build in house because no one wants vendor lock-in or to lose money via an arbitrage.
This is a major reason why you'll see large deals pushing for enhanced customization options or API parity, as larger customers have the ability to push back against vendor lock-in.
Furthermore, a relatively open market (eg. NGC) acts as a loss-leader by allowing a community to develop using a corporate standard, thus allowing you to build stickiness without directly impacting a customer's bottom-line.
Fundamentally, a company driven by Enterprise revenue (eg. Nvidia) will have a different marketplace structure from a B2C product such as Apple's App Store where purchasers have little power.
But you can say exactly the same thing about large companies publishing (consumer apps) in the App Store. Why would they want vendor lock-in?
More tactically, excessive charging on marketplace pushes vendors away from selling on AWS Marketplace and makes them develop alternative deployment methods, which reduce the stickiness of AWS, as hyperscalers are commodified nowadays.
Motorola learned that the hard way 40 years ago when pushing excessively restrictive OEM and Partnership rules compared to IBM.
AWS is only as strong as its Partnership ecosystem, as companies that are purchasing tend to use 80-90 different apps along with their cloud.
Basically, Enterprise Sales shows hallmarks of a Stag Hunt Game, so a mutually beneficial pricing strategy amongst vendors (AWS, AWS Partners such as Nvidia, MSP) is ideal.
If this was true, we'd have more mobile computing platforms. Large enterprises publish in Apple's AppStore.
Purchasers from Apple's App Store are primarily individual consumers. It is a B2C play.
Monetizing an Nvidia marketplace such as NGC would be foolhardy as the primary users/"purchasers" are organizations with budgets and procurement power. It is an Enterprise B2B play.
In enterprise sales, the power differential between (mid- and upper-market) customers and vendors is in the customer's favor, as they have significant buying power and thus a higher user acquisition cost. The upside is revenue is much higher, margins are better, and you can differentiate on product as commodification is difficult.
This is less so in consumer facing sales as customers have significantly weaker buying power, but conversely have a much lower user acquisition cost at scale. Hence, a growth-based GTM approach is critical, as you need customers in aggregate to truly unlock revenue at scale.
Also, I don't believe the balance of power is tipping towards business customers, as nvidia is basically the only relevant player.
If you think smartphone customers and server customers are evaluating hardware based on the same criteria, then why isn't Apple the leading datacenter hardware OEM?
> Large enterprises publish in Apple's AppStore.
Yeah? Where's my Pro Tools download on the App Store? Where's my Cinema4D download? Can I get Bitwig Studio from there? Hell, is iTerm2 or Hammerspoon even available there?
Large enterprises very explicitly don't publish on the MacOS App Store because it is a purely raw deal. If you're developing a cross-platform app (which most large enterprises do), then you've already solved all the problems the App Store offers to help with. It's a burdensome tax for anyone that's not a helpless indie, and even the indies lack the negotiating power that makes the App Store profitable for certain enterprises.
Because Apple doesn't want to be in the B2B space.
I concluded that it couldn't be done with classical machine vision, and that this "neural network" nonsense wasn't going to catch on. Very slow, computationally inefficient, full of weirdos making grandiose claims about "artificial intelligence" without the results to back it up, and they couldn't even explain how their own stuff worked.
These days - you want to find the boundary between cut and uncut grass, even though lighting levels can change and cloud cover can change and shadows can change and reflections can change and there's loads of types of grass and grass looks different depending on the angle you look from? Just label some data and chuck a neural network at it, no problemo.
If only.
Having been faced with the same problem in the real world:
1) There isn't a data bank of millions of images of cut / uncut grass
2) If there were, there's always the possibility of sample bias. E.g. all the cut photos happen to have been taken early in the day and all the uncut ones late in the day, and we get a "time-of-day" detector. Sample bias is oddly common in vision data sets, and machine learning can pick up on very complex sample bias
3) With something like a lawnmower, you don't want it to kill people or run over flowerbeds. There can be actual damages. It's helpful to be able to understand and validate things.
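One cheap check for the sample bias described in point 2 is to compare capture times across labels before training. A minimal sketch, assuming each sample carries a label and the hour of day it was taken (the `samples` format and the function name are illustrative, not from any particular dataset or library):

```python
from collections import defaultdict
from statistics import mean

def mean_hour_by_label(samples):
    """Return the mean capture hour for each label.

    samples: iterable of (label, hour_of_day) pairs (hypothetical format).
    A large gap between labels hints that a model could learn
    "time of day" instead of "cut vs uncut".
    """
    hours = defaultdict(list)
    for label, hour in samples:
        hours[label].append(hour)
    return {label: mean(values) for label, values in hours.items()}

samples = [("cut", 8), ("cut", 9), ("uncut", 17), ("uncut", 18)]
summary = mean_hour_by_label(samples)
print(summary)
```

This only catches one obvious confound. It says nothing about the subtler biases the comment warns about.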
Most machine vision algorithms I actually used in projects (small n) made zero use of neural networks, and 100% use of classical algorithms I understand.
Right now, the best analogy to NLP is BERT. At that point, neural techniques were helpful for some tasks, and achieved stochastically interesting performance, but were well below the level of general uses, and 95% of what I wanted to do used classical NLP. IF I had a large data set AND could do transfer training from BERT AND didn't need things to work 100% of the time, BERT was great.
Systems like DALL-E and the reverse are moving us in the right direction. Once we're at GPT / Claude / etc.-level performance, life will be different, and there's a light at the end of the tunnel. For now, though, the ML machine is still a pretty limited way to go.
Think of it this way. What's cheaper:
1) A consulting project for a human expert in machine vision (tens or hundreds of thousands of dollars)
2) Hiring cheap contractors to build out a massive dataset of photos of grass (millions of dollars)
If you’re trying to predict something within the manifold of data on the internet (which is incredibly vast, but not infinite), you will do very well with today’s LLMs. Building an internet-scale dataset for another problem domain is a monumental task, still with significant uncertainty about “how much is enough”.
People have been searching for the right analogy for “what type of company is Open AI most like?” I’ll suggest they’re like an oil company, but without the right to own oil fields. The internet is the field, the model is the refining process (which mostly yields the same output but with some variations - not dissimilar from petroleum products), and the process / model is a significant asset. And today, Nvidia is the only manufacturer of refining equipment.
If you take the analogy further, while oil was necessary to jumpstart the petrochemical industry, biofuels and synthetic oil could potentially replace the natural stuff while keeping the rest of the value chain intact (maybe not economical, but you get the idea). Is there a post-web source of data for LLMs once the well has been poisoned by bots? Maybe interactive chats?
I will admit that "no problemo" made it sound easier than it actually is. But in the past I considered it literally impossible whereas these days I'm confident it is possible, using well known techniques.
> There isn't a data bank of millions of images of cut / uncut grass
True - but in my case I literally already had a robot lawnmower equipped with a camera. I could have captured a hundred thousand images pretty quickly if I'd known it was worth the effort.
> With something like a lawnmower, you don't want it to kill people or run over flowerbeds.
I agree - at the time I was actually exploring a hybrid approach which would have used landmarks for navigation when close enough to detect the landmarks precisely, and cut/uncut boundary detection for operating in the middle of large expanses of grass, where the landmarks are all distant. And a map for things like flowerbeds, and a LIDAR for obstacle tracking and safety.
So the scope of what I was aiming for was literally cut/uncut grass detection, not safety-of-life human detection :)
I hoped by doing so I could produce respectable results without the need to spend $$$$$ on a dual-frequency RTK GPS & IMU system.
If you don’t have the second how can you trust the first? Without the dataset to test on your human experts will deliver you slop and be confident about it. And you will only realise the many ways their hand finessed algorithms fail once you are trying to field the algorithm.
> With something like a lawnmower, you don't want it to kill people or run over flowerbeds.
Best to not mix concerns though. Not killing people with an automatic lawnmower is about the right mechanical design, appropriately selected slow speed, and bumper sensors. None of this is an AI problem. We don’t have to throw out good engineering practices just because the product uses AI somewhere. It is not an all or nothing thing.
The flowerbed avoidance question might or might not be an AI problem depending on design decisions.
> Hiring cheap contractors to build out a massive dataset of photos of grass (millions of dollars)
I think that you are overestimating the effort here. The database doesn’t have to be so huge. Transfer learning and similar techniques have reduced the data requirements by a lot. If all you want is a grass height detector you can place stationary cameras in your garden, collect a bunch of data and automatically label them based on when you mowed the grass. That will obviously only generalise to your garden, but if this is only a hobby project maybe that is all you want? If this is a product you intend to sell to the general public then of course you need access to a lot of different gardens to test it on. But that is just the nature of product testing anyway.
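The automatic labelling idea could look roughly like this. It is a sketch under assumptions: the two-day `cut_window` (how long grass still counts as "cut" after mowing) and the function name are made up for illustration, and a real setup would need to tune that window per season.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def label_by_mow_time(image_times, mow_times, cut_window=timedelta(days=2)):
    """Label each image 'cut' if taken within cut_window after a mowing,
    'uncut' otherwise. cut_window is an assumed regrowth threshold."""
    mows = sorted(mow_times)
    labels = []
    for t in image_times:
        i = bisect_right(mows, t)  # number of mowings at or before t
        if i > 0 and t - mows[i - 1] <= cut_window:
            labels.append("cut")
        else:
            labels.append("uncut")
    return labels

mows = [datetime(2024, 6, 1, 10)]
images = [
    datetime(2024, 5, 31, 12),  # before any mowing
    datetime(2024, 6, 1, 12),   # two hours after mowing
    datetime(2024, 6, 5, 12),   # four days after mowing
]
labels = label_by_mow_time(images, mows)
print(labels)
```

Because every label comes from the mowing schedule, any regularity in that schedule (always mowing on Saturday mornings, say) ends up baked into the dataset.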
1. Test datasets can be a lot smaller than training datasets.
2. For tasks like image segmentation, having a human look at a candidate segmentation and give it a thumbs up or a thumbs down is much faster than having them draw out the segments themselves.
3. If labelling needs 20k images segmented at 1 minute per image but testing only needs 2k segmentation results checked at 5 seconds per image, you can just do the latter yourself in a few hours, no outsourcing required.
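Working through the numbers in point 3 (the image counts and per-image times are the ones quoted above; the helper function is just for illustration):

```python
def hours(num_images, seconds_per_image):
    """Total human time in hours for a pass over num_images."""
    return num_images * seconds_per_image / 3600

labelling_hours = hours(20_000, 60)  # drawing segments by hand, 1 minute each
checking_hours = hours(2_000, 5)     # thumbs up/down, 5 seconds each

print(f"labelling: {labelling_hours:.1f} h, checking: {checking_hours:.2f} h")
print(f"ratio: {labelling_hours / checking_hours:.0f}x")
```

So checking comes to under three hours, against roughly 330 hours for full labelling.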
One of the key things is that if you don't understand how things work, your test dataset needs to be the world. A classical system can be analyzed, and you can pick a test dataset which maximally stresses it. You can also engineer environments where you know it will work, and 9 times out of 10, part of the use of classical machine vision in safety-critical systems is to understand the environments it works in, and to only use it in such environments.
Examples:
- Placing the trackball sensor inside of the mouse (or the analogue for a larger machine) allows the lighting and everything else to be 100% controlled
- If it's not 100% controlled, in an industrial environment, you can still have well-understood boundaries.
You test beyond those bounds, and you understand that it works there, and by interpolation, it's robust within the bounds. You can also analyze things like error margin since you know if an edge detection is near the threshold or has a lot of leeway around it.
One of the differences with neural networks is that you don't understand the failure modes, so it's hard to know the axes to test on. Some innocuous change in the background might throw it completely. You don't have really meaningful, robust measures of confidence, so you don't know if some minor change somewhere won't throw things. That means your test set needs to be many orders of magnitude bigger.
For nitpickers: You can do sensitivity analysis, look at how strongly things activate, or a dozen other things, but the keywords there were "robust" and "meaningful."
Didn't work as well as I'd hoped back in those days though, as you could lose carrier lock if you got too close to trees (or indeed buildings), and our target market was golf courses which tend to have a lot of trees. And in those days a dual-frequency RTK+IMU setup was $20k or more, which is expensive for a lawnmower.
I find that even though signals get significantly weaker under trees, mine still works wonderfully in a complex large garden scenario. It will depend on your exact unit/model, as well as their firmware and how it chooses to deal with these scenarios.
This is all pretty much automated by nvidia’s toolkits, and you can do it cheaply on rented hardware before dropping your pretrained model into cheap kit - what a time to be alive.
In the past two years two very important developments appeared around imitation learning and LLMs. Some starting points for this rabbit hole:
1. HuggingFace LeRobot: https://github.com/huggingface/lerobot
2. ALOHA: https://aloha-2.github.io/
3. https://robotics-transformer2.github.io/
4. https://www.1x.tech/discover/1x-world-model
Aloha is a great example of that. It's great for demos, like the one where their robot "cooked" (not really) one shrimp, but if you wanted to deploy it to real peoples' houses you'd have to train it for every task in every house over a few hours at a time. And "a task" is still at the level of "cook (not really) one shrimp". You want to cook (not really) noodles? It's a new task and you have to train it all over again from scratch. You want it to fold your laundry? OK but you need to train it on each piece of laundry you want it to fold, separately. You want it to put away the dishes? Without exaggeration you'd have to train it to handle each dish separately. You want it to pick up the dishes from the kitchen? Train for that. You want it to pick up the dishes from the living room? Train for that. And so on.
It sucks so much with miserable disappointment that it could bring on a new AI winter on its own, if Google was dumb enough to try and make it into a product and market it to people.
Robot maids and robot butlers are a long way away. Yeah but you can cook one shrimp (not really) with a few hours of teleoperation training in your kitchen only. Oh wow. We could never cook (not really) one shrimp before. I mean we could but this uses RL and so it's just one step from AGI.
It's nonsense on stilts.
I believe it will take on the order of 100M hours of training data of doing tasks in real world (so, not just Youtube videos), and much larger models than we have now to make general-purpose robotics working, but I also believe that this will happen.
I've saved your comment to my favorites and hope to revisit it in 10 years.
As an example, imagine you are given a height map, a 2D discrete search space overlaid on the height map, 4 legs, and robot dynamics for every configuration of the legs in their constrained workspace. Find the optimal toe placement of the 4 legs. Although a GPU isn't designed exactly to deal with this sort of problem, if it's framed as a reduction problem it still significantly outperforms a multi core CPU.
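To make "framed as a reduction problem" concrete, here is a deliberately simplified CPU sketch in plain Python. Everything about the cost function is an assumption for illustration (weights, the square workspace, treating each leg independently); the comment above describes full robot dynamics over joint leg configurations, which this does not model. The point is only the shape of the computation: score every candidate cell, then take a min, which is the step a GPU would run as a parallel reduction.

```python
from itertools import product

def best_foothold(height_map, nominal, reach,
                  nominal_height=0.0, w_height=1.0, w_dist=0.1):
    """Pick the lowest-cost cell within `reach` cells of `nominal`.

    Cost (hypothetical): height deviation plus Manhattan distance
    from the nominal foothold, weighted by w_height and w_dist.
    """
    rows, cols = len(height_map), len(height_map[0])
    r0, c0 = nominal

    def cost(cell):
        r, c = cell
        height_term = abs(height_map[r][c] - nominal_height)
        dist_term = abs(r - r0) + abs(c - c0)
        return w_height * height_term + w_dist * dist_term

    candidates = (
        (r, c)
        for r, c in product(range(rows), range(cols))
        if abs(r - r0) <= reach and abs(c - c0) <= reach  # square workspace
    )
    return min(candidates, key=cost)  # the reduction step

height_map = [
    [0.5, 0.5, 0.5],
    [0.5, 0.3, 0.0],
    [0.5, 0.5, 0.5],
]
toe = best_foothold(height_map, nominal=(1, 1), reach=1)
print(toe)
```

With four legs you would call this once per leg with its own nominal position, or widen the candidate set to joint configurations if the legs cannot be scored independently.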
Maybe they try a completely different approach with reinforcement learning and a ton of parallel simulations?
https://arxiv.org/abs/2406.09246
It turns out you can take a vision language foundational model that has a broad understanding of visual and textual knowledge and fine tune it to output robot actions given a sequence of images and previous actions.
This approach beats all previous methods by a wide margin and transfers across tasks.
I think the "object detection" goes quite far beyond the classic "object detection" bounding boxes etc we're used to seeing. So not just a pair of x,y coords for the bounding box for e.g. a mug of coffee in the robot's field of view, but what is the orientation of the mug? where is the handle? If the handle is obscured, can we infer where it might be based on what we understand for what a mug typically looks like and plan our gripper motion towards it (and at 120hz etc)? Is it a solid mug, or a paper cup (affects grip strength/pressure)? Etc etc. Then there is the whole thing about visually showing the robot once what you are doing, and it automatically "programs" itself to repeat the tasks in a generalised way etc. Then you could probably spawn 100 startups just on hooking up an LLM to tell a robot what to do in a residential setting (make me a coffee, clear up the kitchen, take out the trash etc)
This has all been possible before of course, but could it be done "on device" in a power efficient way? I am guessing they are hoping to sell a billion or two chips + boards to be built directly into things to do so, so that your next robotic vacuum or lawn mower or whatever will be able to respond to you yelling at it and not mangle your pets/small children in the process.
I eagerly await the day when I have a plug and play robot platform that can tell the difference between my young children and a fox, and attack the fox shitting/shredding something small and fluffy in the garden but ignore the kids
https://blogs.nvidia.com/blog/eureka-robotics-research/
https://arxiv.org/abs/2108.10470
To be able to visually determine weight, texture, and how durable something is can be done with those systems so long as we have a training set.
* Mapping. Nowadays generating a dense grid of costs can be done insanely fast on GPU. There's just no excuse to not use a GPU on every robot so it can build a fast map, unless you move at snail speed.
* Computer Vision. Classical depth mapping is best done on a GPU. Classical computer vision object detection has fallen away to the rise in ML-based CV for segmentation. Some (IMHO) overzealous practitioners are trying to eat away at estimation and tracking, which IMHO will recede a little since there was nothing wrong with the estimators (just Bayesian stats) to begin with, it was always the measurements. Still, for detection (and sometimes association), ML on GPU is the way to go and that will very likely not change. It has gotten so good that you can get away without using other sensors and just deploying a vision system (though I don't recommend it, but this is what Tesla does). This is an obvious case for one (or one more) GPU on every robot.
* Planning - End to end planning is eating traditional planning now, similar to CV. There are some areas where this is an obvious win (e.g., complex manipulation tasks), and some areas where some overzealous overreach is happening (e.g., simpler planning tasks like routing). But ML on GPUs is here to stay for all planning tasks, especially when estimating costs from complex data, even if a classical planner uses those costs. And I'd be remiss if I didn't mention policy-based planning, which does a huge amount of training to generate essentially a fast lookup table for actions. Deployment of these types of planners often requires a very good estimator to determine what state you are in - and this is a great area for ML, mapping real world messy data to a clean state lookup. I think this can typically be done without a GPU, due to training prior to deployment, but if you have a GPU already (see prior two), you will find this is a good use of it.
* Low-level planning / Controls - Shares a small overlap with above, but mostly concerned with fast responses to transient data and stabilizing the system. I've heard, but not seen directly, that learned policies are coming into vogue here. But regardless, it is a common thread that a network can assist with estimating costs and states to allow a traditional controls system to operate more reliably. I doubt this will necessitate a GPU, but like above, will gladly use it if required and available.
To add to this, consider that we're generally not talking about discrete, gaming-type GPUs, we're talking about purpose built robotics-targeted embedded systems that speak native CUDA. The Jetson family, in particular.
Similar to autonomous vehicles, doing complex multi sensor things very quickly.
Surgical robotics is a great example, lots of cool use cases coming out in that field.
But factory robots haven't propelled Kuka, Fanuc, ABB, UR, Staubli and peers to anything like the levels of success nvidia is already at. A market big enough to accommodate several profitable companies with market caps in the tens of billions might not drive much growth for a company with a trillion-dollar market cap.
nvidia has several irons in the fire here. Industrial robot? Self-driving car? Creepy humanoid robots? Experimental academic robots? Whatever your needs are, nvidia is ready with a GPU, some software, and some tutorials on the basics.
That's because the past year of robotics advancements (e.g. https://www.physicalintelligence.company/blog/pi0, https://arxiv.org/abs/2412.13196) has been driven by advances in machine learning and multimodal foundation models. There has been very little change in the actual electronics and mechanical engineering of robotics. So it's no surprise that the traditional hardware leaders like Kuka and ABB are not seeing massive gains so far. I suspect they might get the Tesla treatment soon when the Chinese competitors like unitree start muscling into the humanoid robotics space.
Robotics advancements are now AI driven and software defined. It turned out that adding a camera and tying a big foundation model to a traditional robot is all you need. Wall-E is now experiencing the ImageNet moment.
Perhaps I wasn't explicit enough about the argument I was trying to make.
Revenue in business is about selling price multiplied by sales volumes, and I'm not sure factory robot sales volumes are big enough to 'drive future growth' for nvidia.
According to [1] there were 553,000 robots installed in factories in 2023. Even if every single one of those half a million robots needed a $2000 GPU that's only $1.1 billion in revenue. Meanwhile nvidia had revenue of 26 billion in 2023, and 61 billion in 2024.
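Spelling out that back-of-the-envelope estimate (all inputs are the figures quoted above; the $2000 per-robot GPU price is the comment's own hypothetical):

```python
robots_installed_2023 = 553_000
gpu_price_usd = 2_000  # hypothetical price per robot, from the comment

robot_gpu_revenue = robots_installed_2023 * gpu_price_usd

# Nvidia revenue figures as quoted in the comment
share_2023 = robot_gpu_revenue / 26e9
share_2024 = robot_gpu_revenue / 61e9

print(f"robot GPU revenue: ${robot_gpu_revenue / 1e9:.2f}B")
print(f"share of 2023 revenue: {share_2023:.1%}")
print(f"share of 2024 revenue: {share_2024:.1%}")
```

Even in this best case, factory robots come to a low single-digit percentage of the quoted revenue.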
Many of those robots will be doing basic, routine things that don't need complex vision systems. And 54% of those half a million robot arms were sold in China - sanctions [2] mean nvidia can't export even the 4090 to China, let alone anything more expensive. Machine vision models are considered 'huge' if they reach half a gigabyte - industrial robots might not need the huge GPUs that LLMs call for.
So it's not clear nvidia can increase the price per GPU to compensate for the limited sales volumes.
If nvidia wants robotics to 'drive future growth' they need a bigger market than just factory automation.
[1] https://ifr.org/img/worldrobotics/2023_WR_extended_version.p...
[2] https://www.theregister.com/2023/10/19/china_biden_ai/
https://www.reddit.com/r/interestingasfuck/comments/1h1i1z1/...
But if you're just buying the arm itself? There are quality robot arms, like the €38,928 UR10e [1], that are within reach of SMEs. No multi-million-dollar budget required.
[1] https://shop.wiredworkers.io/en_GB/shop/universal-robots-ur1...
And those duties can be achieved with today’s mechanics — they just need good control, which is now seeing ferocious progress
As the LLM, generative AI, etc. bubble begins to deflate due to investors and companies finding it hard to make profits from those AI usecases, Nvidia needs to pivot. This article indicates that Nvidia is hedging on robotics as the next driving force that will continue to sustain the massive interest in their products. Personally, I don't see how robotics can maintain that same driving force for their products, and investors will find it hard to squeeze profit out of it, and they'll be back to searching for another hype. It's like Nvidia is trying to create a market to justify their products and continued development, similar to what Meta has tried, to spectacular failure, with the Metaverse for their virtual products.
After the frenzy that sustained these compute products transitioned from big data, to crypto, and now, to AI, I'm curious what the next jump will be; I don't think the "physical AI" space of robotics can sustain Nvidia in the way that they're hoping.
On the investment side, it's hard to say that since ROIC is still generally up and to the right. As long as that continues, so will investment.
The biggest gap I see is expected if you look at past trends like mobile and the internet: In the first wave of new tech there's a lot of trying to do the old things in the new way, which often fails or gives incremental improvements at best.
This is why the 'new' companies seem to be doing the best. I've been shocked at so many new AI startups generating millions in revenue so quickly (billions with OpenAI, but that's a special case). It's because they're not shackled to past products, business models, etc.
However, there are plenty of enterprise companies trying to integrate AI into existing workflows and failing miserably. Just like when they tried to retrofit factories with electricity. It's not just plug and play in most cases, you need new workflows, etc. That will take years and there will be plenty more failures.
The level of investment is staggering though, and might we see a crash at some point? Maybe, but likely not for a while since there's still so much white space. The hardest thing with new technologies like this is not to confuse the limits of our imagination with the limits of reality (and that goes both ways).
But off the shelf mini PCs are much more user friendly for existing software IME.
Thankfully, with ARM being so widespread and continuing to grow, this won't matter as much.
I'd love you to point me in the direction of an off-the-shelf mini PC that has 64gb of addressable memory and CUDA support.
On the other hand, if you force the CUDA support condition and any automatic translation of CUDA programs is not accepted as good enough, then this mandates the use of a discrete NVIDIA GPU, which can be provided only by a mini-ITX mini-PC.
There are mini-ITX boards with laptop Ryzen 7940HX or 7945HX CPUs, at prices between $400 and $550. To such a board you must add 64 GB of DRAM, e.g. @ $175, and a GPU, e.g. a RTX 4060 at slightly more than $300.
Without a discrete GPU, a case for a mini-ITX motherboard has a volume of only 2.5 liter. With a discrete GPU like RTX 4060, the volume of the case must increase to 5 liter (for cases with PCIe extenders, which allow a smaller volume than typical mini-ITX cases).
So your CUDA condition still allows what can be considered an off-the-shelf mini-PC, but mandating CUDA raises the volume from the 0.5 L of a NUC-like mini-PC to 5 L and the price is also raised 2 or 3 times.
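Adding up the component prices quoted above for the CUDA-capable mini-ITX build (a rough sketch: the GPU is taken at a flat $300 although the comment says "slightly more", and case, PSU and storage are left out because no prices were given for them):

```python
board_low_usd, board_high_usd = 400, 550  # mini-ITX board with laptop Ryzen CPU
dram_64gb_usd = 175
gpu_rtx4060_usd = 300  # "slightly more than $300", so treat as a floor

total_low = board_low_usd + dram_64gb_usd + gpu_rtx4060_usd
total_high = board_high_usd + dram_64gb_usd + gpu_rtx4060_usd

print(f"core components: ${total_low} to ${total_high}")
```

Both totals are lower bounds on what a complete machine would cost.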
This of course unless you choose an Orin for CUDA support, but that will not give you 64 GB of DRAM, because NVIDIA has never provided enough memory in any of their products, unless you are willing to pay a huge premium.
Coming from web / app dev this was my very least favorite part of working on the software side of robotics with ROS.
To be brutally honest, you aren't the primary persona in the robotics space.
If you have limited resources (as any organization does), the PM for DevEx will target customers with the best "bang-for-buck" from a developer effort to revenue standpoint.
Most purchasers and users in the robotics and hardware space tend to be experienced players in the hardware, aerospace, and MechE world, which has different patterns and priorities from a purely software world.
If there is a case to be made that there is a significant untapped market, it makes sense for someone like you to go it on your own and create an alternate offering via your own startup.
Robots are a branch of industrial manufacturing machinery. That is not, historically, a high-margin business. It also demands high reliability and long machine life.
Interestingly, there's a trend towards renting robots by the working hour. It's a service - the robot company comes in, sets up robot workers, services them as needed, and monitors them remotely. The robot company gets paid for each operating hour. Pricing is somewhat below what humans cost.[1]
[1] https://bernardmarr.com/robots-as-a-service-a-technology-tre...
The end user usually doesn't have the expertise to even maintain the systems, nor does it make sense for them to do it in-house.
Charging per item of work (operating hour or thing processed) allows use of consultants but keeps incentives aligned between all parties (maximize uptime/productivity).
Lots of dotcom busts in the late 90s were concepts that worked 10-15 years later. We just did not have broadband and smartphones. Battery and AI tech is quite likely to be the missing piece robotics lacked in the past.
Cheap semiconductors as well.
Fabricating a chip on a 28nm or 48nm process is extremely commodified nowadays. These are the same processes used to fabricate an Nvidia Tesla, an i7, or a Xeon barely a decade ago, so the raw compute power available at commodity prices is insane.
Just about every regional power has the ability to fabricate an Intel i7 or Nvidia Tesla equivalent nowadays.
And most regional powers have 3-7 year plans to build domestic 14nm fabrication capacity as well now. A number of firms like Taiwan's PSMC have made a killing selling the end-to-end IP and workflow for fabrication.
This will probably take off once Amazon finally gets robots that can do unboxing, picking, and boxing. They've been trying for years to get that to work. Amazon already has robots doing most of the lifting and carrying, but people still handle each item.
People have been trying to do bin picking fulfillment with robots since the 1980s. Swisslog, Brightpick, and Universal Robotics have all demoed this, but so far it's not working well enough to take over. It's getting close, though.
https://www.starship.xyz/
"Slow but steady" I would call it.
There is absolutely no meaningful signal about a system’s safety that can be derived from one person using a system for two weeks.
At best it can only demonstrate that a system is wildly unsafe.
There is a very large chasm of 9s between one person being able to detect an unsafe system in two weeks of use and actually having a truly safe system.
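To put rough numbers on that chasm, here is a minimal sketch using the standard zero-failure binomial bound (the exact form of the "rule of three"). It assumes every ride or mile is an independent trial with the same failure probability, which real driving certainly is not. The ride count and the target failure rate are hypothetical values picked only for illustration.

```python
import math


def failure_rate_upper_bound(clean_trials: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the per-trial failure probability
    after observing `clean_trials` trials with zero failures.

    Assumes independent, identically distributed trials (a simplification).
    """
    if clean_trials <= 0:
        raise ValueError("clean_trials must be positive")
    alpha = 1.0 - confidence
    # Solve (1 - p)^n = alpha for p: the largest failure rate that would
    # still produce n clean trials with probability alpha.
    return 1.0 - alpha ** (1.0 / clean_trials)


def clean_trials_needed(target_rate: float, confidence: float = 0.95) -> int:
    """Number of consecutive failure-free trials needed before the upper
    bound on the failure probability drops to `target_rate`."""
    if not 0.0 < target_rate < 1.0:
        raise ValueError("target_rate must be between 0 and 1")
    alpha = 1.0 - confidence
    # log1p keeps precision when target_rate is tiny.
    return math.ceil(math.log(alpha) / math.log1p(-target_rate))


if __name__ == "__main__":
    HYPOTHETICAL_RIDES = 28           # assumption: two rides a day for two weeks
    HYPOTHETICAL_TARGET_RATE = 1e-8   # assumption: one serious failure per 100M miles

    bound = failure_rate_upper_bound(HYPOTHETICAL_RIDES)
    needed = clean_trials_needed(HYPOTHETICAL_TARGET_RATE)
    print(f"{HYPOTHETICAL_RIDES} clean rides: failure rate could still be up to {bound:.1%} per ride")
    print(f"Clean miles needed to support a {HYPOTHETICAL_TARGET_RATE:.0e} per-mile rate: {needed:,}")
```

The point is only the order of magnitude: a few dozen clean rides cannot rule out a failure rate of several percent per ride, while supporting a very small per-mile rate takes hundreds of millions of clean miles under these assumptions.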
Your observation over such a short time window isn't enough to prove the usefulness of something where the stakes are life and death.
re: region, I’d like to see it take on more challenging conditions, like in India, for example, where things are chaotic even for human drivers. I doubt that it’ll survive over here.
Self driving is robotics. Simple as that.
Building a robot that can cook or fold a t-shirt, for example, is much harder.
2 rides went fine, though neither was particularly challenging. On the third, though, the car decided to head down a narrow side street where a pickup in front was partially blocking the road while making a dropoff. There was enough space to just squeeze by, and it was clear the truck expected the car to. A few cars turned in behind the Waymo, effectively trapping it in, as it didn't know how to proceed. The dropoff eventually completed and it was able to pull forward.
I still think it'll do well because even if you need to hire 1 person to remotely monitor every 10 cars (I doubt Waymo has anywhere near that many support staff) it's still better than having to pay 10 drivers who may or may not actually be good at driving. But to really take over they'll need to be much more independent.
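A back-of-the-envelope sketch of that labor arithmetic, for what it's worth. The hourly wage is a made-up placeholder, and the sketch assumes a remote monitor costs the same per hour as a driver, which may not hold. Only the 1-to-10 ratio comes from the comment above.

```python
def labor_cost_per_car_hour(hourly_wage: float, cars_per_person: float) -> float:
    """Labor cost attributable to one car for one operating hour."""
    if cars_per_person <= 0:
        raise ValueError("cars_per_person must be positive")
    return hourly_wage / cars_per_person


def labor_savings_fraction(cars_per_monitor: float) -> float:
    """Fraction of per-car labor cost saved versus one driver per car,
    assuming monitors and drivers earn the same hourly wage."""
    if cars_per_monitor <= 0:
        raise ValueError("cars_per_monitor must be positive")
    return 1.0 - 1.0 / cars_per_monitor


if __name__ == "__main__":
    HYPOTHETICAL_WAGE = 30.0  # assumption: placeholder hourly wage, not a real figure

    driver_cost = labor_cost_per_car_hour(HYPOTHETICAL_WAGE, cars_per_person=1)
    monitor_cost = labor_cost_per_car_hour(HYPOTHETICAL_WAGE, cars_per_person=10)
    print(f"Driver per car-hour:  {driver_cost:.2f}")
    print(f"Monitor per car-hour: {monitor_cost:.2f}")
    print(f"Labor saved at 1 monitor per 10 cars: {labor_savings_fraction(10):.0%}")
```

Note the savings fraction does not depend on the wage at all under the equal-wage assumption, only on how many cars one person can watch. It also ignores vehicle, sensor, and maintenance costs entirely.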
I am not entirely sure that is solved, and it certainly wasn't years ago. It is only close in the US, where the training data comes from. That doesn't mean it could be used in Japan (where they are testing now), driving on the other side of the road with a very different culture and traffic.
You could easily use the same logic to say humans haven’t solved driving yet either!
While it's not Rome, the operating areas for Waymo, at least in San Francisco, are not all grids of modern wide streets either.
I'm still puzzled about why Waymo insists on not having any remote driving or any remote advising of cars on where/how to get themselves out of a situation. Yes that would cost a little more - but this is early stages so it's just a little more money at this point (lol) - in exchange for avoiding embarrassing PR bullshit about cars self-honking at each other or rides stuck in an infinite hesitation loop or not knowing what to do when there is a traffic cone on the hood. I haven't seen any convincing arguments for not having that. Anyone heard a legitimately good tech or liability reason? I doubt I would have missed it but...
It's like learning to code in JS on a 2024 MacBook Pro and thinking you can "just" transfer your skills to COBOL on 1970s hardware because both are "programming".
I’m simply talking about “300 days of sun” as being the limiting factor. You extrapolated the rest.
The article references a "ChatGPT moment" for physical robotics, but honestly I think the ChatGPT moment has kind of come and gone, and the world still runs largely as it ever did. Probably not the best analogy, unless they're just talking about buckets of VC money flowing into the space to fund lots of bad ideas, which would be good for NVIDIA financially.
As an admitted non-expert in this field, I guess the one thing that really annoys me about articles like this is the lack of a concrete vision. It's like Boston Dynamics and their dancing robots, which while impressive, haven't really amounted to much outside of the lab. The last thing I remember reading was a military prototype to carry stuff for infantry that ended up being turned down because it was too loud.
The article even confirms this general perspective, ending with: “As of right now, we don’t have very effective tools for verifying the safety and reliability properties of machine learning systems, especially in robotics. This is a major open scientific question in the field,” said Rosen.
So whatever robot you're developing is incredibly complex, to be trusted with heavy machinery or around consumers directly, while being neither verifiably safe nor reliable.
Sorry, but almost everything in this article sounds like a projection of AI-hype onto physical robotics, with all the veracity of "this is good for Bitcoin". Sounds like NVIDIA is doing right by its shareholders though.
i just want a tiiiny gpu for $10 so i can run smaller models at higher speed than possible with xtensa/rp2040 having limited simd support etc.
Neural accelerators are coming into MCUs. The just-released STM32N6 is probably among the best. Alif with the U55/U85 has been out for a little while. The Maxim MAX78000 has had a CNN accelerator out for a couple of years. More will come in the next few years - though not from Nvidia any time soon.
Previously it was far too overpriced for most uses (except for someone developing a certified automotive device), but at the new price and performance it has become competitive with the existing alternatives in the same $150 to $300 price range, which are based on Intel, AMD, MediaTek, Qualcomm or Rockchip CPUs.
If politicians had real skin in the game there'd be far less war.
What they want is border and population control that involves very few ordinary citizens, in large part in expectation of something like hundreds of millions of climate refugees. After having spent a couple of years killing and maiming poor people with almost nowhere to go, you tend to need quite a bit of medical care and usually join the anti-war movement, regardless of whether you got a college degree out of it or not.
I find it likely we'll see gun-mounted robodog patrols along occidental borders within ten years, after they have been tested on populations elsewhere.