> Unfortunately, today’s midrange cards like the RTX 4060 and RX 7600 only come with 8 GB of VRAM
Just a nit: one step up (RX 7600 XT) comes with 16GB memory, although in clamshell configuration. With the B580 falling in between the 7600 and 7600 XT in terms of pricing, it seems a bit unfair to only compare it with the former.
- RX 7600 (8GB) ~€300
- RTX 4060 (8GB) ~€310
- Intel B580 (12GB) ~€330
- RX 7600 XT (16GB) ~€350
- RTX 4060 Ti (8GB) ~€420
- RTX 4060 Ti (16GB) ~€580*
*Apparently this card is both rare and a bad value proposition, so it is hard to find
qball 31 days ago [-]
All RTX xx60 cards are really bad value propositions, though (especially in comparison to the xx80 series cards).
If the 4060 was the 3080-for-400USD that everyone actually wants, that'd be a different story. Fortunately, its nonexistence is a major contributor to why the B580 can even be a viable GPU for Intel to produce in the first place.
jorvi 31 days ago [-]
Not all of them. The 3060 Ti was great because it was actually built on the same underlying chip as the 3070 and 3070 Ti. Which ironically made those less valuable.
But the release of those cards was during Covid pricing weirdness times. I scored a 3070 Ti at €650, whilst the 3060 Ti's that I actually wanted were being sold for €700+. Viva la Discord bots.
glenneroo 31 days ago [-]
I believe 3060 Ti's were in higher demand because they were a great value for shitcoin mining, especially after tuning (e.g. undervolting).
RachelF 31 days ago [-]
AMD and Intel copy Nvidia's VRAM lead.
The GTX 1060 came with 6GB of VRAM. Four generations later, the 5060 comes with only 2GB more.
I suspect NVidia does not want consumer cards to eat into those lucrative data centre profits?
The cost of 1GB of VRAM is $2.30, see https://www.dramexchange.com
I don’t think it’s accurate to say they’re copying NVidia’s lead. On the mid range it’s been segregated on memory and bus width for a very long time. Your 1060 is a good example actually. The standard GDDR5 versions have a reduced die with six memory controllers vs eight on the 1070 and 1080. The 1060 GDDR5X version was a cut-down version of the same die as the 1080, with two memory controllers turned off. The odd sizes of 3 and 6 gigs of memory are due to the way they segmented their chips to have a 192bit bus on the 1060 vs the 256bit bus on the top end. The 5GB version is further chopped down to 160bit.
Those parts competed with the RX480 with 8GB of memory so NVidia was behind AMD at that price point.
AMD had not been competing with the *80/Ti cards at this point for a few generations and stuck with that strategy through today though the results have gotten better SKU to SKU.
And you’re quite right they don’t want these chips in the data center and at some point they didn’t really want these cards competing in games with the top end when placed in SLI (when that was a thing) as they removed the connector from the mid range.
est 30 days ago [-]
the VRAM chips are cheap, but interconnecting them at high speed isn't
michaelt 30 days ago [-]
If you want to double the memory and double the total memory bandwidth, sure. That'd need twice as many data lines, or the same lines at twice the speed.
But if you just want to double the memory without increasing the total memory bandwidth, isn't it a good deal simpler? What's 1 more bit on the address bus for a 256 bit bus?
kbolino 30 days ago [-]
The GPU already has DMA to system RAM. If you're going to make the VRAM as slow as system RAM, then a UMA makes more sense than throwing more memory chips on the GPU.
1W6MIC49CYX9GAP 30 days ago [-]
Why would you slow down VRAM?
kbolino 30 days ago [-]
Good point. I misunderstood the situation. I figured doubling the VRAM size at the same bus width would halve the bandwidth.
Instead, it appears entirely possible to double VRAM size (starting from current amounts) while keeping the bus width and bandwidth the same (cf. 4060 Ti 8GB vs. 4060 Ti 16GB). And, since that bandwidth is already much higher than system RAM (e.g. 128-bit GDDR6 at 288 GB/s vs DDR5 at 32-64 GB/s), it seems very useful to do so, though I'd imagine games wouldn't benefit as much as compute would.
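A rough back-of-the-envelope behind those figures (my sketch, assuming 18 Gbps GDDR6 pins and dual-channel DDR5-4800; these are theoretical peaks, and the 32-64 GB/s range above is closer to what systems actually sustain):

```python
# Peak-bandwidth arithmetic (theoretical maxima; assumed transfer rates,
# sustained bandwidth lands lower in practice).

def peak_gb_per_s(bus_width_bits: int, transfers_per_s: float) -> float:
    # bandwidth = bus width (bits) * transfer rate / 8 bits per byte
    return bus_width_bits * transfers_per_s / 8 / 1e9

print(peak_gb_per_s(128, 18e9))      # 288.0 -> 128-bit GDDR6 at 18 Gbps
print(peak_gb_per_s(2 * 64, 4.8e9))  # 76.8  -> dual-channel DDR5-4800, best case
```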
jorvi 30 days ago [-]
Actually, it's compute workloads that love bandwidth, they just have hard thresholds on how much memory they need.
You can see this with overclocking VRAM. Greatly benefits mining, slightly or even negatively benefits gaming workloads.
This extends to system RAM too, most applications will see more benefit from better access times rather than higher MT/s.
immibis 30 days ago [-]
But having the VRAM allows you to run the model on the GPU at all, doesn't it? A card with 48GB can run twice as much model as a card with 24GB, even though it takes twice as long. Nobody is expecting to run twice as much model in the same time just by increasing the VRAM.
Without the extra VRAM, it takes hundreds of times divided by batch size longer due to swapping, or tens of times longer consistently if you run the rest of the model on the CPU.
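For a sense of scale, a weights-only sizing sketch (my numbers, not from the thread; it ignores KV cache, activations and runtime overhead, which add a real margin on top):

```python
# Weights-only VRAM estimate: parameters * bits per parameter / 8.
def weight_gb(params_billions: float, bits_per_param: int) -> float:
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for params in (7, 13, 34, 70):
    print(f"{params:>3}B params: fp16 ~{weight_gb(params, 16):5.1f} GB, "
          f"4-bit ~{weight_gb(params, 4):5.1f} GB")
# A 24 GB card holds the weights of roughly a 12B fp16 model or a ~45B 4-bit
# model; doubling VRAM roughly doubles what stays resident on the card.
```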
clamchowder 31 days ago [-]
(author here) When I checked the 7600 XT was much more expensive.
Right now it's still $360 on eBay, vs the B580's $250 MSRP, though yeah I guess it's hard to find the B580 in stock
jorvi 31 days ago [-]
Yeah I guess regional availability really works into it.. bummer
I wonder if the B580 will drop to MSRP at all, or if retailers will just keep it slotted into the greater GPU line-up the way it is now and pocket the extra money.
And that is also one of the most popular cards on prebuilt systems. Just search through Amazon listings and see which card shows up all the damn time.
kbolino 30 days ago [-]
Prebuilders get priority access and volume discounts, so while it may not be a good value to buy individually, that doesn't necessarily apply to buying it in bulk.
hassleblad23 31 days ago [-]
> Intel takes advantage of this by launching the Arc B580 at $250, undercutting both competitors while offering 12 GB of VRAM.
Not sure where you got that 350 EUR number for B580?
xmodem 31 days ago [-]
330 EUR is roughly reflective of the street price of the B580 in Europe.
For example:
https://www.mindfactory.de/product_info.php/12GB-ASRock-Inte... (~327 EUR)
https://www.overclockers.co.uk/sparkle-intel-arc-b580-guardi... (~330 EUR)
https://www.inet.se/produkt/5414587/acer-arc-b580-12gb-nitro... (~336 EUR)
Can confirm, bought mine for about 350 EUR in Latvia from a store that's known to add a bit of markup on things.
Though the market is particularly bad here, because an RTX 3060 12 GB (not Ti) costs between 310 - 370 EUR and an RX 7600 XT is between 370 - 420 EUR.
Either way, I'm happy that these cards exist because Battlemage is noticeably better than Alchemist in my experience (previously had an A580, now it's my current backup instead of the old RX 570 or RX 580) and it's good to have entry/mid level cards.
AnotherGoodName 31 days ago [-]
On Newegg the cheapest in stock is USD$370 as an example. This is consistent for Intel cards unfortunately.
The reviews will say "decent value at RRP" but Intel cards never ever sell anywhere near RRP meaning that when it comes down to it you're much better off not going Intel.
I feel like reviews should all acknowledge this fact by now. "Would be decent value at RRP but not recommended since Intel cards are always 50% over RRP".
muststopmyths 31 days ago [-]
I bought the Asus version of the B580 at MSRP of $280 on launch day.
Central Computers in the SF Bay Area keeps them at MSRP. They may not be in stock online, but the stores frequently have stock, especially San Mateo.
Not useful to people outside the area, but then Microcenter also sells at MSRP. So there are non-scalping stores out there.
The trick is to jump on the stock when it arrives.
compsciphd 30 days ago [-]
newegg doesn't have any stock of B580s at the moment, you're looking at 3rd party sellers who are raising prices (and hence why they have stock).
https://www.newegg.com/p/pl?d=b580&N=8000 to see sold by newegg stock.
I have an RX 7600 XT that I purchased to run Ollama LLMs. Something just to screw around with.
Works fine with their ollama:rocm docker image on Fedora using podman. No complaints.
Did some gaming, too, just to see how well that works. A few steam games.
mrbonner 31 days ago [-]
Let me know where you could find 4060Ti 16GB for under $1000 USD
hedgehog 31 days ago [-]
What's annoying is they were under $500 just a few months ago.
donflamenco 31 days ago [-]
Bestbuy has the PNY 4060 Ti 16GB in stock right now for $450.
qingcharles 30 days ago [-]
Not any longer...!
This card still seems like a bad proposition. It's roughly similar performance to the 11GB 2080 Ti for double the price. You'd have to really want that extra 5GB.
donflamenco 30 days ago [-]
I still see it in stock for pick up in Bay Area and in Seattle. I tried a Montana zip code and it showed it as available also.
https://www.bestbuy.com/site/pny-nvidia-geforce-rtx-4060-ti-...
Most people who want the 4060 Ti 16GB want it because of the 16GB, for running LLMs. So yes, they really want that extra 5GB.
I'm actually tempted, but I don't know if I should go for a Mac Studio M1 Max 64GB for $1350 (ebay) or build a PC around a GPU. I think the Mac makes a lot of sense.
buck746 24 days ago [-]
I have an M2 Max with 64GB of RAM. It handles everything I throw at it. Runs the 30ish gigabyte deepseek model fine. I will admit for gaming I pretty much just stick to Cyberpunk 2077, Minecraft, Stray, Viscera Cleanup Simulator and old games with open source engine options. I'm happy I can play Cyberpunk with my screen on full brightness using 30w, compared to my Xeon windows machine taking 250w for lower frame rates.
netbioserror 31 days ago [-]
A lot of commentators have pointed out that Intel is reaching nowhere near the performance/mm2 of Nvidia or AMD designs, though contrary to what I thought that might imply, it seems that power consumption is very much under control on Battlemage. So it seems the primary trade-off here is on the die cost.
Can anyone explain what might be going on here, especially as it relates to power consumption? I thought (bigger die ^ bigger wires -> more current -> higher consumption).
kimixa 31 days ago [-]
Increasing clocks tends to have a greater-than-linear cost on power, as you need transistors to switch quicker so often need a higher voltage, which causes more leakage and other losses on top of the switching cost itself (that all turn into heat). Higher clock targets also have a cost for the design itself, often needing more transistors for things like extra redrivers to ensure you get fast switching speed, or even things like more pipeline stages. Plus not all area is "transistors" - it's often easier to place related units that need a lot of interconnectivity with shorter interconnects if an adjacent, less interconnected unit isn't also trying to be packed into much of the same space, routing on modern chips is really difficult (and a place where companies can really differentiate by investing more).
For tasks that tend to scale well with increased die area, which is often the case for GPUs as they're already focused on massively parallel tasks so laying down more parallel units is a realistic option, running a larger die at lower clocks is often notably more efficient in terms of performance per unit power.
For GPUs generally that's just part of the pricing and cost balance, a larger lower clocked die would be more efficient, but would that really sell for as much as the same die clocked even higher to get peak results?
netbioserror 31 days ago [-]
>For tasks that tend to scale well with increased die area, which is often the case for GPUs as they're already focused on massively parallel tasks so laying down more parallel units is a realistic option, running a larger die at lower clocks is often notably more efficient in terms of performance per unit power.
I should've considered this, I have an RTX A5000. It's a gigantic GA102 die (3090, 3080) that's underclocked to 230W, putting it at roughly 3070 throughput. That's ~15% less performance than a 3090 for a ~35% power reduction. Absolutely nonlinear savings there. Though some of that may have to do with power savings using GDDR6 over GDDR6X.
(I should mention that relative performance estimates are all over the place, by some metrics the A5000 is ~3070, by others it's ~3080.)
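A quick sanity check on the "nonlinear savings" point, taking the rough ~15%/~35% figures above as given (my arithmetic, assumed board powers of ~230W vs ~350W):

```python
perf_ratio = 0.85    # A5000 throughput relative to a 3090 (rough figure above)
power_ratio = 0.65   # A5000 board power relative to a 3090 (~230W vs ~350W)

print(perf_ratio / power_ratio)  # ~1.31 -> roughly 30% better performance per watt
```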
bgnn 31 days ago [-]
Yeah the power consumption scales, to first order, with Vdd^2 (square of power supply voltage) but performance scales with Vdd. Though you cannot simply reduce the Vdd and clock rate and do more pipelining etc to gain back the performance. If you are willing to back off on performance a bit you can gain hugely on power. Plus thermal management of it is more manageable.
cubefox 31 days ago [-]
> Increasing clocks tends to have a greater-than-linear cost on power
Old source, but this says the power cost of increasing the clock frequency is cubic: https://physics.stackexchange.com/questions/34766/how-does-p...
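A minimal sketch of where a roughly cubic relationship comes from, under the usual first-order CMOS switching-power model (assuming the required supply voltage scales roughly linearly with the target clock, which is only an approximation):

```python
# First-order dynamic power: P ~ alpha * C * Vdd^2 * f.
def dynamic_power(alpha: float, c_eff: float, vdd: float, freq: float) -> float:
    return alpha * c_eff * vdd**2 * freq

base    = dynamic_power(0.2, 1.0, vdd=1.00, freq=2.0)   # arbitrary baseline units
boosted = dynamic_power(0.2, 1.0, vdd=1.20, freq=2.4)   # +20% clock needs ~+20% Vdd

print(boosted / base)  # ~1.73x power for a 1.2x clock bump (1.2^3), before leakage
```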
They are holding back the higher vram models of this card. GPU makers always do some nerfing of their cards in the same product line. Often times there’s no good reason for this other than they found specs that they can market and sell simply by moving voltages around.
Anyway, expecting good earnings throughout the year as they use Battlemage sales to hide the larger concerns about standing up their foundry (great earnings for the initial 12gb cards, and so on for the inevitable 16/24gb cards).
tonetegeatinst 31 days ago [-]
It mainly seems to boil down to design choice and process technology.
They might be targeting a lower power density per square mm compared to AMD or Nvidia, focusing more on lower power levels.
Instruction set architecture and layout of the chips and PCB also factor into this as well.
elric 31 days ago [-]
I couldn't find any information regarding power consumption in the article. I'd love to upgrade my aging gaming rig, but all modern AMD/Nvidia graphics cards consume significantly more power than my current card.
MisterTea 31 days ago [-]
> I thought (bigger die ^ bigger wires -> more current -> higher consumption).
I am not a semi expert, but a bigger die doesn't mean bigger wires if you are referring to cross-section; if anything the wires would be thinner, meaning less current. Power is consumed pushing and pulling electrons from the transistor gates, which are all of the FET type (field effect transistor). The gate is a capacitor that needs to be charged to open the gate and allow current to flow through the transistor; discharging the gate closes it. That current draw then gets multiplied by a few billion gates, so you can see where the load comes from.
williamDafoe 31 days ago [-]
Actually the wires don't scale down like the transistors do. I remember in graduate school taking VLSI circuit complexity theory and the conclusion was that for two-dimensional circuits the wires will end Moore's Law. However I've seen articles about backside power delivery and they are already using seven+ layers, so the wires are going through three dimensions now. Copper interconnects were a one-time bonus in the late 90s, and after that wires just don't scale down - signal delay would go up too fast. Imagine taking a city with all the streets and houses where the houses now become the size of dog houses, but you can't shrink the streets; they have to stay the same size to carry signals quickly!
gruez 31 days ago [-]
>I thought (bigger die ^ bigger wires -> more current -> higher consumption).
All things being equal, a bigger die would result in more power consumption, but the factor you're not considering is the voltage/frequency curve. As you increase the frequency, you also need to up the voltage. However, as you increase voltage, there's diminishing returns to how much you can increase the frequency, so you end up massively increasing power consumption to get minor performance gains.
wmf 31 days ago [-]
If it's a similar number of transistors on a larger die then I can believe the power consumption is good. Less dense layout probably requires less design effort and may reduce hotspots.
If Intel is getting similar performance from more transistors that could be caused by extra control logic from a 16-wide core instead of 32.
p1necone 31 days ago [-]
> performance/mm2
This strikes me as not a particularly useful metric, or at least one only indirectly related to the stuff that actually matters.
Performance/watt and performance/cost are the only metrics that really matter both to consumer and producer - performance/die size is only used as a metric because die size generally correlates to both of those. But comparing it between different manufacturers and different fabs strikes me as a mistake (although maybe it's just necessary because identifying actual manufacturing costs isn't possible?).
myrmidon 31 days ago [-]
Loosely related question:
What prevents manufacturers from taking some existing mid/toprange consumer GPU design, and just slapping like 256GB VRAM onto it? (enabling consumers to run big-LLM inference locally).
Would that be useless for some reason? What am I missing?
elabajaba 31 days ago [-]
The amount of memory you can put on a GPU is mainly constrained by the GPU's memory bus width (which is both expensive and power hungry to expand) and the available GDDR chips (generally require 32bits of the bus per chip). We've been using 16Gbit (2GB) chips for awhile, and they're just starting to roll out 24Gbit (3GB) GDDR7 modules, but they're expensive and in limited supply. You also have to account for VRAM being somewhat power hungry (~1.5-2.5w per module under load).
Once you've filled all the slots your only real option is to do a clamshell setup that will double the VRAM capacity by putting chips on the back of the PCB in the same spot as the ones on the front (for timing reasons the traces all have to be the same length). Clamshell designs then need to figure out how to cool those chips on the back (~1.5-2.5w per module depending on speed and if it's GDDR6/6X/7, meaning you could have up to 40w on the back).
Some basic math puts us at 16 modules for a 512 bit bus (only the 5090, have to go back a decade+ to get the last 512bit bus GPU), 12 with 384bit (4090, 7900xtx), or 8 with 256bit (5080, 4080, 7800xt).
A clamshell 5090 with 2GB modules has a max limit of 64GB, or 96GB with (currently expensive and limited) 3GB modules (you'll be able to buy this at some point as the RTX 6000 Blackwell at stupid prices).
HBM can get you higher amounts, but it's extremely expensive to buy (you're competing against H100s, MI300Xs, etc), supply limited (AI hardware companies are buying all of it and want even more), requires a different memory controller (meaning you'll still have to partially redesign the GPU), and requires expensive packaging to assemble it.
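The capacity ceilings above fall out of a simple formula. A sketch of that module arithmetic (one GDDR module per 32 bits of bus, optionally doubled by a clamshell layout; the 2GB/3GB densities are the ones quoted in the comment):

```python
def max_vram_gb(bus_width_bits: int, gb_per_module: int, clamshell: bool) -> int:
    modules = bus_width_bits // 32   # one GDDR module per 32-bit slice of the bus
    if clamshell:
        modules *= 2                 # mirror a second module on the back of the PCB
    return modules * gb_per_module

print(max_vram_gb(512, 2, clamshell=True))   # 64 -> 5090-class bus, 2GB modules
print(max_vram_gb(512, 3, clamshell=True))   # 96 -> same bus with 3GB GDDR7 modules
print(max_vram_gb(128, 2, clamshell=True))   # 16 -> how the 4060 Ti 16GB is built
```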
lostmsu 31 days ago [-]
What of previous generations of HBM? Older consumer AMD GPUs (Vega) and Titan V had HBM2. According to https://en.wikipedia.org/wiki/Radeon_RX_Vega_series#Radeon_V... you could get 16GB with 1TB/s for $700 at release. It is no longer used in data centers. I'd gladly pay $2800 for 48GB with 4TB/s.
Tuna-Fish 30 days ago [-]
Previous generation of HBM is not any cheaper than the current ones, and it is no longer in production, the lines having shifted to the new stuff.
IanCutress 30 days ago [-]
HBM2 is still in volume production. New products coming out with it on the ASIC side. Gaudi 3 uses HBM2e.
mppm 30 days ago [-]
Interesting. So a 32-chip GDDR6 clamshell design could pack 64GB VRAM with about 2TB/s on a 1024bit bus, consuming around 100W for the memory subsystem? With current chip prices [1], this would cost just about $200 (!) for the memory chips, apparently. So theoretically, it should be possible to build fairly powerful AI accelerators in the 300W and < $1000 range. If one wanted to, that is :)
1. https://dramexchange.com/
Hardware-wise, instead of putting the chips on the PCB surface, one would mount a 16-gonal arrangement of perpendicular daughterboards, each containing 2-16 GDDR chips where there would normally be one, with external liquid cooling, power delivery and PCIe control connection.
Then each of the daughterboards would feature a multiplexer with a dual-ported SRAM containing a table where for each memory page it would store the chip number to map it to and it would use it to route requests from the GPU, using the second port to change the mapping from the extra PCIe interface.
API-wise, for each resource you would have N overlays and would have a new operation allowing to switch the resource overlay (which would require a custom driver that properly invalidates caches).
This would depend on the GPU supporting the much higher latency of this setup and providing good enough support for cache flushing and invalidation, as well as deterministic mapping from physical addresses to chip addresses, and the ability to manufacture all this in a reasonably affordable fashion.
Tuna-Fish 30 days ago [-]
Not at GDDR speeds.
GPUs use special DRAM that has much higher bandwidth than the DRAM that's used with CPUs. The main reason they can achieve this higher bandwidth at low cost is that the connection between the GPU and the DRAM chip is point-to-point, very short, and very clean. Today, even clamshell memory configuration is not supported by plugging two memory chips into the same bus, it's supported by having the interface in the GDDR chips internally split into two halves, and each chip can either serve requests using both halves at the same time, or using only one half over twice the time.
You are definitely not passing that link through some kind of daughterboard connector, or a flex cable.
nenaoki 30 days ago [-]
>A clamshell 5090 with 2GB modules has a max limit of 64GB
How does "clamshelling" get around the 32-bits per module requirement? Do the two 2GB modules act as one 4GB module when clamshelled?
m4rtink 30 days ago [-]
So I guess we just wait for HBM to get cheaper and better, which should not take too long, given how much money is being pumped into it?
reginald78 31 days ago [-]
You'd need memory chips with double the memory capacity to slap the extra vram in, at least without altering the memory bus width. And indeed, some third party modded entries like that seem to have shown up:
https://www.tomshardware.com/pc-components/gpus/nvidia-gamin...
As far as official products, I think the real reason another commentator mentioned is that they don't want to cannibalize their more powerful card sales. I know I'd be interested in a lower powered card with a lot of vram just to get my foot in the door; that is why I bought an RTX 3060 12GB, which is unimpressive for gaming but actually had the second most vram available in that generation. Nvidia seems to have noticed this mistake and later released a crappier 8GB version to replace it.
I think if the market reacted to create a product like this to compete with nvidia they'd pretty quickly release something to fit the need, but as it is they don't have to.
SunlitCat 31 days ago [-]
The 3060 with 12GB was an outlier for its time of release because the crypto (currency) hype was raging at that moment and scalpers, miners and everyone in between were buying graphic cards left and right! Hard times were these! D:
Animats 31 days ago [-]
There are companies in China doing that, recycling older NVidia GPUs.[1]
[1] https://www.reddit.com/r/hardware/comments/182nmmy/special_c...
You can actually get GPUs from the Chinese markets (e.g., AliExpress) that have had their VRAM upgraded. Someone out there is doing aftermarket VRAM upgrades on cards to make them more usable for GPGPU tasks.
Which also answers your question: The manufacturers aren't doing it because they're assholes.
nenaoki 30 days ago [-]
These are a bit mythical, finding one for sale is no small feat.
I guess adding memory to some cards is a matter of completely reworking the PCB, not just swapping DRAM chips. From what I can find it has been done, both chip swaps and PCB reworks, it's just not easy to buy.
Software support is of course another consideration.
ksec 31 days ago [-]
Bandwidth. GDDR and HBM, both used by GPUs depending on usage, are high bandwidth but low capacity, comparatively speaking. Modern GPUs try to put more VRAM on by adding more memory channels, up to 512bit, but that requires more die space and is hence expensive.
We will need a new memory design for both GDDR and HBM. And I won't be surprised if they are working on it already. But hardware takes time so it will be a few more years down the road.
patmorgan23 31 days ago [-]
Because then they couldn't sell you the $10k enterprise GPU
RachelF 31 days ago [-]
True, but it is mostly profit - GDDR6 sells for $2.30 a gigabyte [1]
[1] https://www.dramexchange.com
That's for 8Gbit chips, which are more or less unusable in modern products. 16Gbit chips are at ~$8, or $4 per GB.
Culonavirus 30 days ago [-]
10? Try 30+ ...
The_Colonel 31 days ago [-]
> enabling consumers to run big-LLM inference locally
A non-technical reason is that the market of people wanting to run their personal LLMs at home is very small.
numpad0 30 days ago [-]
Not sure where I read this and am paraphrasing a lot, but: there's a point where `RAM bandwidth < processor speed` becomes `true`, and the processor becomes architecturally data starved.
As in, a 32bit CPU that runs at 1 giga instruction/second, with a 16 Gbps memory bus, could get up to 0.5 instruction per clock, and that's not very useful. For this reason there can't be an absolute potato with gigantic RAM.
How gigantic is not useful, idk.
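Working through the parent's example (just restating the arithmetic, not a claim about any real CPU):

```python
instr_bits = 32        # one 32-bit instruction per issue
target_ips = 1e9       # the core wants 1 giga-instruction/second
bus_bits_per_s = 16e9  # the memory bus delivers 16 Gbit/s

fetch_bits_needed = instr_bits * target_ips  # 32 Gbit/s just for instruction fetch
print(bus_bits_per_s / fetch_bits_needed)    # 0.5 -> at most half an instruction per clock
```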
fulafel 31 days ago [-]
Seems some years away to get that into consumer price range.
newsclues 30 days ago [-]
NVidia sells memory and GPUs as bundles to board partners.
If you harm their profit good luck continuing to have access to GPU chips. It’s a cartel.
singhrac 31 days ago [-]
There's some rumors of an Arc Pro, which would be a B580 in clamshell configuration with 24 GB of VRAM (which iiuc would be the same memory bandwidth unfortunately). Unless the price is absurd it would be the cheapest dollar/VRAM card at 24 GB.
mrandish 31 days ago [-]
That would be nice for AI-curious users and hobby experimenters, however gamers won't find near-term value in VRAM beyond 16GB. My concern is that due to Intel's severe financial challenges, their CEO du jour will end up killing off the discrete GPU business.
Back when they (finally) got into dGPUs seriously, Intel (and everyone else) said it would take many years and the patience to tolerate break-even products and losses while coming up the learning curve. Currently, it looks pretty much impossible to sustain ongoing profitability in low-end GPUs. Given gamers' current performance expectations vs the manufacturing costs to hit those targets, mid-range GPUs ($500-$750) seem like the minimum to have broad enough appeal to be sustainably profitable. Unfortunately, Intel is still probably years away from a competitive mid-range product. Sadly, the market has evolved weirdly such that there's now a minimum performance threshold preventing scaling price/performance down linearly below $300. The problem for Intel is they waited too long to enter the dGPU race, so now this profit gap coincides with no longer having the excess CPU profits to go in the red for years. Instead they squandered billions doing stupid stuff like buying McAfee.
treve 31 days ago [-]
I wonder if these GPUs are good options for Linux rigs and if first-party drivers are made.
mtlmtlmtlmtl 31 days ago [-]
Been running Linux on the A770 for about 2 years now. Very happy with the driver situation. Was a bit rough very early on, but it's nice and stable now. Recommend at least Linux 6.4, but preferably newer. I use a rolling release distro(Artix) to get up to date kernels.
ML stuff can be a pain sometimes because support in pytorch and various other libraries is not as prioritised as CUDA. But I've been able to get llama.cpp working via ollama, which has experimental intel gpu support. Worked fine when I tested it, though I haven't actually used it very much, so don't quote me on it.
For image gen, your best bet is to use sdnext(https://github.com/vladmandic/sdnext), which supports Intel on linux officially, and will automagically install the right pytorch version, and do a bunch of trickery to get libraries that insist on CUDA to work in many of the cases. Though some things are still unsupported due to various libraries still not supporting intel on Linux. Some types of quantization are unavailable for instance. But at least if you have the A770, quantisation for image gen is not as important due to plentyful VRAM, unless you're trying to use the flux models.
immibis 30 days ago [-]
I also have an A770. Don't use it for AI, but it runs fine for general 3D use (which mostly means either Minecraft, other similarly casual games, or demoscene shaders). I'm pretty sure I'm not utilizing it fully most of the time.
My main complaint is that the fan control just doesn't work. They stay at low speed or off no matter how hot the card gets, until it shuts down due to overheating. Apparently there's a firmware update to fix this, but you need Windows to flash it. You can zip-tie a spare fan somewhere pointing at the card...
Secondary complaint is that it's somehow not compatible with Linux's early boot console, so there's no graphical output until the driver is loaded. You'd better have ssh enabled while setting it up.
It's also incompatible with MBR/BIOS boot since it doesn't include an option ROM or whatever is needed to make that work - so I switched to UEFI (which I thought I was already using).
When I ran a shader "competition" some people's code with undefined behaviour ran differently on my GPU than theirs. That's unavoidable regardless of brand and not an Intel thing at all.
bradfa 31 days ago [-]
Yes, first party drivers are made. Upstream Linux and mesa project should have good support in their latest releases. If you're running a non-bleeding edge distro, you may need to wait or do a little leg work to get the newer versions of things, but this is not unusual for new hardware.
In fact, Intel has been a stellar contributor to the Linux kernel and associated projects, compared to all other vendors. They usually have launch day Linux support provided that you are running a bleeding edge Linux kernel.
baq 31 days ago [-]
Of all the god awful Linux GPU drivers Intel's are the least awful IME. Unless you're talking purely compute, then nvidia, have fun matching those cuda versions though...
dralley 31 days ago [-]
AMD's Linux drivers are pretty good. I get better performance playing games through Proton on Linux than I do playing the same games on Windows, despite whatever overhead the translation adds.
The only really annoying bug I've run into is the one where the system locks up if you go to sleep with more used swap space than free memory, but that one just got fixed.
ZeWaka 31 days ago [-]
I use an Alchemist series A380 on my nix media server, but it's absolutely fantastic for video encoding.
VTimofeenko 31 days ago [-]
Same; recently built SFF with low profile A310. Goes through video streams like hot knife through butter.
Do you have your config posted somewhere? I'd be interested to compare notes
Got it. I went native; NixOS wiki has an example of an overlay
bee_rider 31 days ago [-]
I have always associated Intel iGPUs with good drivers but people seem to often complain about their Linux dGPU drivers in these threads. I hope it is just an issue of them trying to break into a new field, rather than a slipping of their GPU drivers in general…
jorvi 31 days ago [-]
Intel switched over to a new driver for dGPUs and any iGPU newer than Skylake(?).
The newest beta-ish driver is Xe, the main driver is Intel HD, and the old driver is i915.
People complaining experienced the teething issues of early Xe builds.
sirn 31 days ago [-]
i915 is still the main kernel mode driver on Linux for every Intel GPU up to Alchemist. xe kmd is used by Battlemage by default (as of 6.12).
There's a Mesa DRI driver, called i965 (originally made for Broadwater chipset, thus the 965 numbering), which has since been replaced by either:
- Crocus for anything up to Broadwell (Gen 8)
- Iris for anything from Broadwell and newer
Then there's a Video Acceleration driver, which is (also) called i965. I think this is what you're referring to. There are:
- i965 (aka Intel VAAPI Driver), which supports anything from Westmere (Gen 5) to Coffee Lake (Gen 9.5)
- iHD (aka Intel Media Driver), is a newer one, which supports anything from Broadwell (Gen 8)
- libvpl, an even newer one, which supports anything from Tiger Lake (Gen 12) and up
Battlemage users had to use libvpl until recently because Media Driver 2024Q4 with BMG support was only released 2 weeks ago. Using libvpl with ffmpeg may require rebuilding ffmpeg, as some distros don't have it enabled (due to a conflict with the legacy Intel Media SDK, so you have to choose).
I have B580 for my Linux machine (6.12), and xe seems pretty stable/performant so far.
nullify88 31 days ago [-]
I am always confused about which drivers need installing to fully enable all hardware acceleration features on Broadwell. Also, not all distros maintain the drivers equally, resulting in mismatches between the VAAPI driver and the other drivers.
sirn 30 days ago [-]
My rough mental Intel driver matrix is something like this (might not be entirely correct):
Usually, KMD/DRI/Vulkan should work as-is if you use a reasonably recent kernel and mesa, but video acceleration sure is a bit of a mess.
elabajaba 31 days ago [-]
Intel GPU drivers have always been terrible. There's so many features that are just broken if you try to actually use them, on top of just generally being extremely slow.
Hell, the B580 is CPU bottlenecked on everything that isn't a 7800x3d or 9800x3d which is insane for a low-midrange GPU.
everfrustrated 31 days ago [-]
Intel also have up-streamed their video encoding acceleration support into software like ffmpeg.
Intel Arc gpus also support hardware video encoding for the AV1 codec which even the just released Nvidia 50 series still doesn't support.
lostmsu 31 days ago [-]
This is wrong. AV1 encoding is supported since Nvidia 40 series.
jcarrano 31 days ago [-]
Last year I was doing a livestream for a band. The NVidia encoder on my friend's computer (running Windows) just wouldn't work. We tried in vain to install drivers and random stuff from Nvidia. I pulled out my own machine with Linux and Intel iGPU and not only did it work flawlessly, but it did so on battery and with charge to spare.
On the other hand, I have to keep the driver for the secondary GPU (also intel) blacklisted because last time I tried to use it it was constantly drawing power.
Double the memory for double the price and I would buy one in a heartbeat.
kevincox 30 days ago [-]
There are tons of products that I would buy if I could double a single spec for the same price.
talldayo 31 days ago [-]
If your application is video transcoding or AI inference, you could probably buy two and use them in a multi-GPU configuration.
glitchc 29 days ago [-]
Hate futzing around with multi-GPU configurations. It's always a bit of a mess from a driver perspective, not to mention all the extra power connectors needed, even though this card only requires two.
MezzoDelCammin 30 days ago [-]
This would have been a great card for a homelab if only they haven't decided to move away from SR-IOV in their consumer GPUs.
AFAIK it used to be possible to get some SR-IOV working on the previous Alchemists (with some flashing), but Battlemage seems like a proof of Intel abandoning the virtualization/GPU splitting in the consumer space altogether.
taurknaut 31 days ago [-]
I don't really care about how it performs so long as it's better than a CPU. I just want to target the GPU myself and remove the vendor from the software equation. Nvidia has taught me there isn't any value that can't be destroyed with sufficiently bad drivers.
I really don’t see the argument for patents. It just slows down healthy competition in Western countries while China disregards them and surges ahead. How can we expect to compete when they don’t play by the same rules?
joelthelion 31 days ago [-]
That's cool and all but can you use it for deep learning?
coderenegade 31 days ago [-]
You can. You need a recent Linux kernel, but pytorch now officially supports Intel's extensions (xpu). These are actually a decent consumer proposition because the bottleneck for most people training models on their own hardware is VRAM. These have substantially more VRAM than anything in their price bracket, and are priced competitively enough that you could buy two and have a pretty solid training setup. Or one for training and one for inference.
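For anyone curious what that looks like in practice, a minimal smoke test (a sketch assuming a recent PyTorch build with the built-in XPU backend and Intel's compute runtime installed; the exact API surface varies by version):

```python
import torch

# torch.xpu mirrors the torch.cuda device API on builds with Intel GPU support.
if hasattr(torch, "xpu") and torch.xpu.is_available():
    device = torch.device("xpu")
    x = torch.randn(4096, 4096, device=device)
    y = x @ x                      # run a matmul on the Arc card
    print("ok:", y.mean().item())
else:
    print("No XPU device visible; check kernel, compute runtime, and PyTorch build")
```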
joelthelion 30 days ago [-]
Very interesting, thank you!
SG- 31 days ago [-]
it's a nice technical article but the charts are just terrible and seem blurry even when zoomed in.
clamchowder 31 days ago [-]
Yea Wordpress was a terrible platform and Substack is also a terrible platform. I don't know why every platform wants to take a simple uploaded PNG and apply TAA to it. And don't get me started on how Substack has no native table support, when HTML had it since prehistoric times.
If I had more time I'd roll my own site with basic HTML/CSS. It's not even hard, just time consuming.
dark__paladin 31 days ago [-]
TAA is temporal anti-aliasing, correct? There is no time dimension here, isn't it just compression + bilinear filtering?
clamchowder 31 days ago [-]
It was a joke about blurriness. To extend the joke, be glad it doesn't flicker and shimmer.
But yes, platforms usually apply compression in terrible ways, and it's especially noticeable coming from text and straight line stuff like graphs
dark__paladin 31 days ago [-]
Thanks for clarifying, went right over my head!
singhrac 31 days ago [-]
Ghost as an alternative? They’ll let you sign up paying subscribers as well.
stoatstudios 31 days ago [-]
Is nobody going to talk about how the architecture is called "Battlemage?" Is that just normal to GPU enthusiasts?
reginald78 31 days ago [-]
The generations are all fantasy type names in alphabetical order. The first was Alchemist (and the cards were things like A310) and the next is Celestial. Actually when I think about product names for GPUs and CPUs these seem above average in clarity and only slightly dorkier than average. I'm sure they'll get more confusing and nonsensical with time as that seems to be a constant of the universe.
spiffytech 31 days ago [-]
Dorky, alphabetical codenames are a big step up from a bunch of lakes in no obvious order.
PaulHoule 31 days ago [-]
Yeah, with the way Intel has been struggling I thought they should get it out of their system and name one of their chips "Shit Creek."
ReptileMan 31 days ago [-]
It has been 20 years since Prescott, but the name is still suitable.
Workaccount2 31 days ago [-]
Can't wait for Dungeon architecture.
CodesInChaos 31 days ago [-]
Dragon and Druid sound like viable options.
meragrin_ 31 days ago [-]
Dungeon architecture? What's that?
sevg 31 days ago [-]
Looks to have been a joke about the alphabetical naming: Alchemist, Battlemage, Celestial .. Dungeon
(There’s no name decided yet for the fourth in the series.)
pocak 31 days ago [-]
There is, it's Druid. Intel announced the first four codenames in 2021.
> [...] first generation, based on the Xe HPG microarchitecture, codenamed Alchemist (formerly known as DG2). Intel also revealed the code names of future generations under the Arc brand: Battlemage, Celestial and Druid.
C should have been Cleric, and I don't know about E (Eldritch Knight?!), but if F ain't Fighter I'm going to be disappointed.
MrDrMcCoy 30 days ago [-]
E could be evoker, enchanter, or exorcist. F could be Firedancer, but will probably be fighter :)
ZeWaka 31 days ago [-]
It's their 2nd generation, the 'B' series. The previous was their 'A' / Alchemist.
> According to Intel, the brand is named after the concept of story arcs found in video games. Each generation of Arc is named after character classes sorted by each letter of the Latin alphabet in ascending order.
(https://en.wikipedia.org/wiki/Intel_Arc)
tdb7893 31 days ago [-]
It's dorky but there isn't much else to say about it. Personal GPU enthusiasts are almost always video game enthusiasts so it's not really a particularly weird name in context.
babypuncher 31 days ago [-]
It's just the code name for this generation of their GPU architecture, not the name for its instruction set. Intel's are all fantasy themed. Nvidia names theirs after famous scientists and mathematicians (Alan Turing, Ada Lovelace, David Blackwell)
dark-star 31 days ago [-]
A well-known commercial storage vendor gives their system releases codenames from beer brands. We had Becks, Guinness, Longboard, Voodoo Ranger, and many others. Presumably what the devs drank during that release cycle, or something ;-)
It's fun for the developers and the end-users alike... So no, it's not limited to GPU enthusiasts at all. Everyone likes codenames :)
I mean, living people seems like a dick move in general for codenames.
wincy 31 days ago [-]
That’s why we make sure our codenames are sensible things like Jimmy Carter and James Earl Jones
We were actually told to change our internal names for our servers after someone named an AWS instance “stupid” and I rolled my eyes so hard, one dev ruined the fun for everyone.
monocasa 31 days ago [-]
I mean, sure, for a lot of the same reasons you can't file a defamation claim in defense of someone who's dead. The idea of them is in the public domain in a lot of ways.
So sure, pour one out to whoever's funeral is on the grocery store tabloids that week with your codenames.
high_na_euv 31 days ago [-]
Cool name, easy to remember, aint it?
faefox 31 days ago [-]
It sounds cool and has actual personality. What would you prefer, Intel Vision Pro Max? :)
baq 31 days ago [-]
A codename as good as any. Nvidia has Tesla, Turing etc.
userbinator 31 days ago [-]
It's very much normal "gamer" aesthetic.
ein0p 31 days ago [-]
Why not go out on a limb and produce a 64GB compute-optimized card with 1TB/sec memory bandwidth, Intel? What do you have to lose at this point?
williamDafoe 31 days ago [-]
[flagged]
keyringlight 31 days ago [-]
The other major issue with regards pricing is that intel need to pay one way or another to get market penetration, if no one buys their cards at all and they don't establish a beachhead then it's even more wasted money.
As I see it AMD get _potentially_ squeezed between intel and nvidia. Nvidia's majority marketshare seems pretty secure for the foreseeable future, intel undercutting AMD plus their connections to prebuilt system manufacturers would likely grab them a few more nibbles into AMD territory. If intel release a competent B770 versus AMD products priced a few hundred dollars more, even if Arc isn't as mature I'm not sure they have solid answers for why someone should buy Radeon.
In my view AMD's issue is that they don't have any vision for what their GPUs can offer besides a slightly better version of the previous generation, it appears back in 2018 that the RTX offering must have blindsided them, and years later they're not giving us any alternative vision for what comes next for graphics to make Radeon desirable besides catching up to nvidia (who I imagine will have something new to move the goalposts if anyone gets close), and this is an AMD that is currently well resourced from Zen
adgjlsfhk1 31 days ago [-]
I think this is a bad take because it assumes that NVidia is making rapid price/performance improvements in the consumer space. The RTX 4060 is roughly equivalent to a 2080 (similar performance and ram and transistors). Intel isn't making much margin, but from what I've seen they're probably roughly breaking even, not taking a huge loss.
Also, a ton of the work for Intel is in drivers which are (as the A770 showed) very improvable after launch. Based on the hardware, it seems very possible that B580 could get an extra 10% (especially in 1080p) which would bring it clear above the 4060ti in perf.
wirybeige 31 days ago [-]
Strange to point out those comparisons but not the actual transistor difference between the two.
B580 only has 19.6B transistors while the RTX 4070 has 35.8B transistors. So the RTX 4070 has nearly double (1.82x) the transistors of B580.
The RTX 4060 ti has 22.9B and the RTX 4060 has 18.9B transistors
throwawaythekey 31 days ago [-]
Would the difference in density be more likely due to a difference in design philosophy or the intel design team being less expert?
As a customer do intel pay for mm2 or for transistors?
Forgive me if you are not the right person for these questions.
wirybeige 31 days ago [-]
Hard to say why the density is that different, if those transistor numbers are accurate. A less dense design would allow for higher clocking, & while the clocks are fairly high, they aren't that far out there, but that's one factor (I'd hope they wouldn't trade half the area for a few extra MHz, when a gpu w/ 2x the tr will just be better).
It could also be in addition that the # of transistors that each company provides is different as they may count them differently (but I'm not convinced of this).
Customers pay by the wafer, so mm^2; though tr cost is a function of that so :3 .
wqaatwt 31 days ago [-]
> 4060 performance
That’s really not true though. It’s closer to 4060 Ti and somewhat ahead/behind depending on specific game.
ksec 31 days ago [-]
> I am not too impressed with the "chips and cheese ant's view article" as they don't uncover the reason why performance is SO PATHETIC!
Performance on GPUs has always been about drivers. Chips and Cheese is only here to show the uArch behind it. This isn't even new, as we should have learned all about it during the Voodoo 3Dfx era. And 9 years have passed since a (now retired) Intel engineer said that they would be competing against Nvidia by 2020 if not 2021. We are now in 2025 and they are not even close. But somehow Raja Koduri was supposed to save them and is now gone.
rincebrain 31 days ago [-]
Intel seems to have deep-seated issues with their PR department writing checks their engineers can't pay out on time for.
Not that Intel engineers are bad - on the contrary. But as you pointed out, they've been promising they'd be further than they are now for over 5 years now, and even 10+ years ago when I was working in HPC systems, they kept promising things you should build your systems on that would be "in the next gen" that were not, in fact, there.
It seems much like the Bioware Problem(tm) where Bioware got very comfortable promising the moon in 12 months and assuming 6 months of crunch would Magically produce a good outcome, and then discovered that Results May Vary.
Just a nit: one step up (RX 7600 XT) comes with 16GB memory, although in clamshell configuration. With the B580 falling inbetween the 7600 and 7600 XT in terms of pricing, it seems a bit unfair to only compare it with the former.
- RX 7600 (8GB) ~€300
- RTX 4060 (8GB) ~€310
- Intel B580 (12GB) ~€330
- RX 7600 XT (16GB) ~€350
- RTX 4060 Ti (8GB) ~€420
- RTX 4060 Ti (16GB) ~€580*
*Apparently this card is really rare plus a bad value proposition, so it is hard to find
If the 4060 was the 3080-for-400USD that everyone actually wants, that'd be a different story. Fortunately, its nonexistence is a major contributor to why the B580 can even be a viable GPU for Intel to produce in the first place.
But the release of those cards was during Covid pricing weirdness times. I scored a 3070 Ti at €650, whilst the 3060 Ti's that I actually wanted were being sold for €700+. Viva la Discord bots.
The RTX 1060 came with 6GB of VRAM. Four generations later, the 5060 comes with only 2GB more.
I suspect NVidia does not want consumer cards to eat into those lucrative data centre profits?
The cost of 1GB of VRAM is $2.30 see https://www.dramexchange.com
Those parts competed with the RX480 with 8GB of memory so NVidia was behind AMD at that price point.
AMD had not been competing with the *80/Ti cards at this point for a few generations and stuck with that strategy through today though the results have gotten better SKU to SKU.
And you’re quite right they don’t want these chips in the data center and at some point they didn’t really want these cards competing in games with the top end when placed in SLI (when that was a thing) as they removed the connector from the mid range.
But if you just want to double the memory without increasing the total memory bandwidth, isn't it a good deal simpler? What's 1 more bit on the address bus for a 256 bit bus?
Instead, it appears entirely possible to double VRAM size (starting from current amounts) while keeping the bus width and bandwidth the same (cf. 4060 Ti 8GB vs. 4060 Ti 16GB). And, since that bandwidth is already much higher than system RAM (e.g. 128-bit GDDR6 at 288 GB/s vs DDR5 at 32-64 GB/s), it seems very useful to do so, though I'd imagine games wouldn't benefit as much as compute would.
You can see this with overclocking VRAM. Greatly benefits mining, slightly or even negatively benefits gaming workloads.
This extends to system RAM too, most applications will see more benefit from better access times rather than higher MT/s.
Without the extra VRAM, it takes hundreds of times divided by batch size longer due to swapping, or tens of times longer consistently if you run the rest of the model on the CPU.
I wonder if the B580 will drop to MSRP at all, or if retailers will just keep it slotted into the greater GPU line-up the way it is now and pocket the extra money.
Not sure where you got that 350 EUR number for B580?
For example:
https://www.mindfactory.de/product_info.php/12GB-ASRock-Inte... (~327 EUR)
https://www.overclockers.co.uk/sparkle-intel-arc-b580-guardi... (~330 EUR)
https://www.inet.se/produkt/5414587/acer-arc-b580-12gb-nitro... (~336 EUR)
Though the market is particularly bad here, because an RTX 3060 12 GB (not Ti) costs between 310 - 370 EUR and an RX 7600 XT is between 370 - 420 EUR.
Either way, I'm happy that these cards exist because Battlemage is noticeably better than Alchemist in my experience (previously had an A580, now it's my current backup instead of the old RX 570 or RX 580) and it's good to have entry/mid level cards.
The reviews will say "decent value at RRP" but Intel cards never ever sell anywhere near RRP meaning that when it comes down to it you're much better off not going Intel.
I feel like reviews should all acknowledge this fact by now. "Would be decent value at RRP but not recommended since Intel cards are always %50 over RRP".
Central Computers in the SF Bay Area keeps them at MSRP. They may not be in stock online, but the stores frequently have stock, especially San Mateo.
Not useful to people outside the area, but then Microcenter also sells at MSRP. So there are non-scalping stores out there.
The trick is to jump on the stock when it arrives.
https://www.newegg.com/p/pl?d=b580&N=8000 to see sold by newegg stock.
Works fine with their ollama:rocm docker image on Fedora using podman. No complaints.
Did some gaming, too, just to see how well that works. A few steam games.
This card still seems like a bad proposition. It's roughly similar performance to the 11GB 2080 Ti for double the price. You'd have to really want that extra 5GB.
https://www.bestbuy.com/site/pny-nvidia-geforce-rtx-4060-ti-...
Most people who want the 4060 Ti 16GB is because they want the 16GB for running LLMs. So yes, they really want that extra 5GB.
I'm actually tempted, but I don't know if I should go for a Mac Studio M1 Max 64GB for $1350 (ebay) or build a PC around a GPU. I think the Mac makes a lot of sense.
Can anyone explain what might be going on here, especially as it relates to power consumption? I thought (bigger die ^ bigger wires -> more current -> higher consumption).
For tasks that tend to scale well with increased die area, which is often the case for GPUs as they're already focused on massively parallel tasks so laying down more parallel units is a realistic option, running a larger die at lower clocks is often notably more efficient in terms of performance per unit power.
For GPUs generally that's just part of the pricing and cost balance, a larger lower clocked die would be more efficient, but would that really sell for as much as the same die clocked even higher to get peak results?
I should've considered this, I have an RTX A5000. It's a gigantic GA102 die (3090, 3080) that's underclocked to 230W, putting it at roughly 3070 throughput. That's ~15% less performance than a 3090 for a ~35% power reduction. Absolutely nonlinear savings there. Though some of that may have to do with power savings using GDDR6 over GDDR6X.
(I should mention that relative performance estimates are all over the place, by some metrics the A5000 is ~3070, by others it's ~3080.)
Old source, but this says the power cost of increasing the clock frequency is cubic: https://physics.stackexchange.com/questions/34766/how-does-p...
Anyway, expecting good earnings throughout the year as they use Battlemage sales to hide the larger concerns about standing up their foundry (great earnings for the initial 12gb cards, and so on for the inevitable 16/24gb cards).
They might be targeting a lower power density per squad mm than compared to amd or nvidia, focusing more on lower power levels.
Instruction set architecture and layout of the chips and PCB also factor into this as well.
I am not a semi expert but bigger die doesn't mean bigger wires if you are referring to cross-section, the wires would be thinner meaning less current. Power is consumed pushing and pulling electrons from the transistor gates which are all of the FET type, field effect transistor. The gate is a capacitor that needs to be charged to open the gate to allow current to flow through the transistor. discharging the gate closes it. That current draw then gets multiplied by a few billion gates so you can see where the load comes from.
All things being equal, a bigger die would result in more power consumption, but the factor you're not considering is the voltage/frequency curve. As you increase the frequency, you also need to up the voltage. However, as you increase voltage, there's diminishing returns to how much you can increase the frequency, so you end up massively increasing power consumption to get minor performance gains.
If Intel is getting similar performance from more transistors that could be caused by extra control logic from a 16-wide core instead of 32.
This strikes me as not a particularly useful metric, or at least one only indirectly related to the stuff that actually matters.
Performance/watt and performance/cost are the only metrics that really matter both to consumer and producer - performance/die size is only used as a metric because die size generally correlates to both of those. But comparing it between different manufacturers and different fabs strikes me as a mistake (although maybe it's just necessary because identifying actual manufacturing costs isn't possible?).
What prevents manufacturers from taking some existing mid/toprange consumer GPU design, and just slapping like 256GB VRAM onto it? (enabling consumers to run big-LLM inference locally).
Would that be useless for some reason? What am I missing?
Once you've filled all the slots your only real option is to do a clamshell setup that will double the VRAM capacity by putting chips on the back of the PCB in the same spot as the ones on the front (for timing reasons the traces all have to be the same length). Clamshell designs then need to figure out how to cool those chips on the back (~1.5-2.5w per module depending on speed and if it's GDDR6/6X/7, meaning you could have up to 40w on the back).
Some basic math puts us at 16 modules for a 512 bit bus (only the 5090, have to go back a decade+ to get the last 512bit bus GPU), 12 with 384bit (4090, 7900xtx), or 8 with 256bit (5080, 4080, 7800xt).
A clamshell 5090 with 2GB modules has a max limit of 64GB, or 96GB with (currently expensive and limited) 3GB modules (you'll be able to buy this at some point as the RTX 6000 Blackwell at stupid prices).
HBM can get you higher amounts, but it's extremely expensive to buy (you're competing against H100s, MI300Xs, etc), supply limited (AI hardware companies are buying all of it and want even more), requires a different memory controller (meaning you'll still have to partially redesign the GPU), and requires expensive packaging to assemble it.
1. https://dramexchange.com/
Hardware-wise instead of putting the chips on the PCB surface one would mount an 16-gonal arrangement of perpendicular daughterboards, each containing 2-16 GDDR chips where there would be normally one, with external liquid cooling, power delivery and PCIe control connection.
Then each of the daughterboards would feature a multiplexer with a dual-ported SRAM containing a table where for each memory page it would store the chip number to map it to and it would use it to route requests from the GPU, using the second port to change the mapping from the extra PCIe interface.
API-wise, for each resource you would have N overlays and would have a new operation allowing to switch the resource overlay (which would require a custom driver that properly invalidates caches).
This would depend on the GPU supporting the much higher latency of this setup and providing good enough support for cache flushing and invalidation, as well as deterministic mapping from physical addresses to chip addresses, and the ability to manufacture all this in a reasonably affordable fashion.
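A toy software model of that remapping table, purely to illustrate what the dual-ported SRAM would be doing; the page size, chip count and function names here are all invented:

    # Hypothetical per-page remap table for the daughterboard idea above (not real hardware).
    PAGE_SIZE = 64 * 1024     # invented page granularity
    CHIPS_PER_SLOT = 8        # e.g. 8 GDDR chips stacked where one would normally sit

    page_table = {}           # stands in for the dual-ported SRAM

    def route(gpu_address):
        # Port 1: route a GPU request to whichever chip the table currently points at.
        page = gpu_address // PAGE_SIZE
        chip = page_table.get(page, 0)
        return chip, gpu_address % PAGE_SIZE   # (chip select, offset within the page)

    def remap(page, chip):
        # Port 2: the host changes the mapping over the side PCIe interface.
        assert 0 <= chip < CHIPS_PER_SLOT
        page_table[page] = chip

    remap(3, 5)
    print(route(3 * PAGE_SIZE + 100))  # (5, 100)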
GPUs use special DRAM that has much higher bandwidth than the DRAM that's used with CPUs. The main reason they can achieve this higher bandwidth at low cost is that the connection between the GPU and the DRAM chip is point-to-point, very short, and very clean. Today, even clamshell memory configuration is not supported by plugging two memory chips into the same bus, it's supported by having the interface in the GDDR chips internally split into two halves, and each chip can either serve requests using both halves at the same time, or using only one half over twice the time.
You are definitely not passing that link through some kind of daughterboard connector, or a flex cable.
How does "clamshelling" get around the 32-bits per module requirement? Do the two 2GB modules act as one 4GB module when clamshelled?
As far as official products go, I think the real reason is the one another commenter mentioned: they don't want to cannibalize sales of their more powerful cards. I know I'd be interested in a lower-powered card with a lot of VRAM just to get my foot in the door; that's why I bought an RTX 3060 12GB, which is unimpressive for gaming but actually had the second-most VRAM available in that generation. Nvidia seem to have noticed this mistake and later released a crappier 8GB version to replace it.
I think if the market demanded a product like this to compete with Nvidia, they'd pretty quickly release something to fit the need, but as it is they don't have to.
[1] https://www.reddit.com/r/hardware/comments/182nmmy/special_c...
Which also answers your question: The manufacturers aren't doing it because they're assholes.
I guess adding memory to some cards is a matter of completely reworking the PCB, not just swapping DRAM chips. From what I can find it has been done, both chip swaps and PCB reworks; it's just not easy to buy.
Software support is of course another consideration.
We will need a new memory design for both GDDR and HBM, and I won't be surprised if they are working on it already. But hardware takes time, so it will be a few more years down the road.
[1] https://www.dramexchange.com
A non-technical reason is that the market of people wanting to run their personal LLMs at home is very small.
As in, a 32-bit CPU that runs at 1 giga-instruction/second but sits on a 16 Gbps memory bus can only fetch 0.5 billion 32-bit words per second, so it tops out around 0.5 instructions per clock if every instruction touches memory, and that's not very useful. For this reason you can't pair an absolute potato with gigantic RAM.
Exactly where "gigantic" stops being useful, idk.
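The arithmetic behind that example, assuming each instruction needs one 32-bit word from memory:

    bus_gbits_per_s = 16      # memory bandwidth
    word_bits = 32            # one 32-bit operand per instruction, as assumed above
    clock_ginstr_per_s = 1    # the core could retire 1 giga-instruction/s if fed

    fed_ginstr_per_s = bus_gbits_per_s / word_bits     # 0.5
    ipc = fed_ginstr_per_s / clock_ginstr_per_s        # 0.5 instructions per clock
    print(ipc)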
If you harm their profit, good luck continuing to have access to GPU chips. It's a cartel.
Back when they (finally) got into dGPUs seriously, Intel (and everyone else) said it would take many years and the patience to tolerate break-even products and losses while coming up the learning curve. Currently, it looks pretty much impossible to sustain ongoing profitability in low-end GPUs. Given gamers' current performance expectations vs the manufacturing costs to hit those targets, mid-range GPUs ($500-$750) seem like the minimum to have broad enough appeal to be sustainably profitable. Unfortunately, Intel is still probably years away from a competitive mid-range product. Sadly, the market has evolved weirdly such that there's now a minimum performance threshold preventing price/performance from scaling down linearly below $300. The problem for Intel is they waited too long to enter the dGPU race, so now this profit gap coincides with no longer having the excess CPU profits to go in the red for years. Instead they squandered billions doing stupid stuff like buying McAfee.
ML stuff can be a pain sometimes because support in PyTorch and various other libraries isn't prioritised as highly as CUDA. But I've been able to get llama.cpp working via ollama, which has experimental Intel GPU support. It worked fine when I tested it, though I haven't actually used it very much, so don't quote me on it.
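For anyone curious, a minimal smoke test, assuming a PyTorch build with the Intel XPU backend (recent PyTorch releases, or the intel_extension_for_pytorch route); exact setup steps vary by distro and driver version:

    # Minimal smoke test; assumes a PyTorch build where the Intel XPU backend is available.
    import torch

    device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
    x = torch.randn(2048, 2048, device=device)
    y = x @ x                  # the matmul runs on the Arc card if XPU was picked up
    print(device, y.shape)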
For image gen, your best bet is to use sdnext (https://github.com/vladmandic/sdnext), which supports Intel on Linux officially, will automagically install the right PyTorch version, and does a bunch of trickery to get libraries that insist on CUDA working in many cases. Some things are still unsupported due to various libraries still not supporting Intel on Linux; some types of quantization are unavailable, for instance. But at least if you have the A770, quantization for image gen is not as important due to plentiful VRAM, unless you're trying to use the Flux models.
My main complaint is that the fan control just doesn't work. The fans stay at low speed or off no matter how hot the card gets, until it shuts down due to overheating. Apparently there's a firmware update to fix this, but you need Windows to flash it. You can zip-tie a spare fan somewhere pointing at the card...
Secondary complaint is that it's somehow not compatible with Linux's early boot console, so there's no graphical output until the driver is loaded. You'd better have ssh enabled while setting it up.
It's also incompatible with MBR/BIOS boot since it doesn't include an option ROM or whatever is needed to make that work - so I switched to UEFI (which I thought I was already using).
When I ran a shader "competition", some people's code with undefined behaviour ran differently on my GPU than on theirs. That's unavoidable regardless of brand and not an Intel thing at all.
If you're running Ubuntu, Intel has some exact steps you can follow: https://dgpu-docs.intel.com/driver/client/overview.html
The only really annoying bug I've run into is the one where the system locks up if you go to sleep with more used swap space than free memory, but that one just got fixed.
Do you have your config posted somewhere? I'd be interested to compare notes
The newest beta-ish driver is Xe, the main driver is Intel HD, and the old driver is i915.
People complaining experienced the teething issues of early Xe builds.
There's a Mesa DRI driver, called i965 (originally made for Broadwater chipset, thus the 965 numbering), which has since been replaced by either:
- Crocus for anything older than Broadwell (Gen 8)
- Iris for anything from Broadwell and newer
Then there's a Video Acceleration driver, which is (also) called i965. I think this is what you're referring to. There are:
- i965 (aka Intel VAAPI Driver), which supports anything from Westmere (Gen 5) to Coffee Lake (Gen 9.5)
- iHD (aka Intel Media Driver), is a newer one, which supports anything from Broadwell (Gen 8)
- libvpl, an even newer one, which supports anything from Tiger Lake (Gen 12) and up
Battlemage users had to use libvpl until recently, because Media Driver 2024Q4 with BMG support was only released two weeks ago. Using libvpl with ffmpeg may require rebuilding ffmpeg, as some distros don't have it enabled (due to a conflict with the legacy Intel Media SDK, so you have to choose one).
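If you're not sure which VA-API driver you're actually getting, a quick probe (assumes vainfo from libva-utils is installed; iHD and i965 are the driver names listed above):

    # Probe which VA-API drivers load; iHD = Intel Media Driver, i965 = legacy VAAPI driver.
    import os
    import subprocess

    for driver in ("iHD", "i965"):
        env = dict(os.environ, LIBVA_DRIVER_NAME=driver)
        result = subprocess.run(["vainfo"], env=env, capture_output=True, text=True)
        print(driver, "ok" if result.returncode == 0 else "failed")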
I have B580 for my Linux machine (6.12), and xe seems pretty stable/performant so far.
Hell, the B580 is CPU bottlenecked on everything that isn't a 7800x3d or 9800x3d which is insane for a low-midrange GPU.
Intel Arc GPUs also support hardware video encoding for the AV1 codec, something Nvidia only added with the 40 series.
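As a hedged example, AV1 hardware encoding through VA-API looks something like this (the av1_vaapi encoder and the render node path depend on your ffmpeg build and system):

    # Hardware AV1 encode on an Arc card via VA-API; paths and encoder name may differ.
    import subprocess

    subprocess.run([
        "ffmpeg",
        "-vaapi_device", "/dev/dri/renderD128",   # the Arc card's render node
        "-i", "input.mp4",
        "-vf", "format=nv12,hwupload",            # upload frames to the GPU
        "-c:v", "av1_vaapi",                      # VA-API AV1 encoder
        "output.mkv",
    ], check=True)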
On the other hand, I have to keep the driver for the secondary GPU (also Intel) blacklisted, because last time I tried to use it, it was constantly drawing power.
Whoops - included the wrong link! https://www.phoronix.com/review/intel-arc-b580-graphics-linu...
AFAIK it used to be possible to get some SR-IOV working on the previous Alchemist cards (with some flashing), but Battlemage seems like proof that Intel is abandoning virtualization/GPU splitting in the consumer space altogether.
If I had more time I'd roll my own site with basic HTML/CSS. It's not even hard, just time consuming.
But yes, platforms usually apply compression in terrible ways, and it's especially noticeable on text and straight-line stuff like graphs.
(There’s no name decided yet for the fourth in the series.)
> [...] first generation, based on the Xe HPG microarchitecture, codenamed Alchemist (formerly known as DG2). Intel also revealed the code names of future generations under the Arc brand: Battlemage, Celestial and Druid.
https://www.intel.com/content/www/us/en/newsroom/news/introd...
> According to Intel, the brand is named after the concept of story arcs found in video games. Each generation of Arc is named after character classes sorted by each letter of the Latin alphabet in ascending order. (https://en.wikipedia.org/wiki/Intel_Arc)
It's fun for the developers and the end-users alike... So no, it's not limited to GPU enthusiasts at all. Everyone likes codenames :)
Except butt-headed astronomers
We were actually told to change our internal names for our servers after someone named an AWS instance "stupid". I rolled my eyes so hard; one dev ruined the fun for everyone.
So sure, pour one out to whoever's funeral is on the grocery store tabloids that week with your codenames.
As I see it, AMD gets _potentially_ squeezed between Intel and Nvidia. Nvidia's majority market share seems pretty secure for the foreseeable future, and Intel undercutting AMD, plus their connections to prebuilt-system manufacturers, would likely grab them a few more nibbles of AMD's territory. If Intel releases a competent B770 against AMD products priced a few hundred dollars more, then even if Arc isn't as mature, I'm not sure AMD has solid answers for why someone should buy Radeon.
In my view, AMD's issue is that they don't have any vision for what their GPUs can offer besides a slightly better version of the previous generation. The RTX launch back in 2018 appears to have blindsided them, and years later they still aren't giving us any alternative vision for what comes next in graphics to make Radeon desirable, beyond catching up to Nvidia (who I imagine will have something new to move the goalposts if anyone gets close). And this is an AMD that is currently well resourced from Zen.
Also, a ton of the work for Intel is in drivers, which are (as the A770 showed) very improvable after launch. Based on the hardware, it seems very possible that the B580 could gain an extra 10% (especially at 1080p), which would put it clearly above the 4060 Ti in performance.
The B580 only has 19.6B transistors while the RTX 4070 has 35.8B, so the RTX 4070 has nearly double (about 1.8x) the transistors of the B580.
The RTX 4060 Ti has 22.9B and the RTX 4060 has 18.9B transistors.
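Putting those counts side by side (same numbers as quoted above):

    # Transistor counts in billions, as quoted above.
    counts = {"B580": 19.6, "RTX 4060": 18.9, "RTX 4060 Ti": 22.9, "RTX 4070": 35.8}
    b580 = counts["B580"]
    for name, n in counts.items():
        print(f"{name}: {n}B ({n / b580:.2f}x the B580)")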
As a customer, does Intel pay per mm^2 or per transistor?
Forgive me if you are not the right person for these questions.
It could also be that, in addition, the transistor counts each company provides are counted differently (but I'm not convinced of this).
Customers pay by the wafer, so per mm^2; though transistor cost is a function of that, so :3 .
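To make the "you pay per mm^2" point concrete, a rough die-cost sketch using the usual dies-per-wafer approximation; the wafer price, die area and yield below are placeholders, not real figures:

    import math

    # Approximate usable dies on a 300mm wafer (standard edge-loss approximation).
    def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
        r = wafer_diameter_mm / 2
        return int(math.pi * r ** 2 / die_area_mm2
                   - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

    def cost_per_die(wafer_price_usd, die_area_mm2, yield_fraction=0.8):
        return wafer_price_usd / (dies_per_wafer(die_area_mm2) * yield_fraction)

    # Placeholder inputs: $12,000 wafer, 270 mm^2 die, 80% yield.
    print(round(cost_per_die(12000, 270), 2))   # roughly $68 per good die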
That’s really not true though. It’s closer to the 4060 Ti, somewhat ahead or behind depending on the specific game.
Performance on GPUs has always been about drivers; Chips and Cheese is only here to show the uArch behind it. This isn't even new; we should have learned all about it during the 3dfx Voodoo era. And 9 years have passed since a (now retired) Intel engineer said they would be competing against Nvidia by 2020, if not 2021. We are now in 2025 and they are not even close. But somehow Raja Koduri was supposed to save them, and now he's gone.
Not that Intel engineers are bad - on the contrary. But as you pointed out, they've been promising they'd be further along than they are now for over 5 years, and even 10+ years ago when I was working on HPC systems, they kept promising things you should build your systems around that would be "in the next gen" and that were not, in fact, there.
It seems much like the Bioware Problem(tm) where Bioware got very comfortable promising the moon in 12 months and assuming 6 months of crunch would Magically produce a good outcome, and then discovered that Results May Vary.