NHacker Next

▲Wave Network: An Ultra-Small Language Model (arxiv.org)

25 points by PaulHoule 13 days ago | 5 comments

starlite-5008 12 days ago

[dead]

jerpint 13 days ago

> In summary, we used a 2.4-million-parameter small language model to achieve accuracy comparable to a 100-million-parameter BERT model in text classification.
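For scale, the two parameter counts in that quote differ by a factor of roughly 40:

$$\frac{100 \times 10^{6}}{2.4 \times 10^{6}} \approx 41.7$$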

Neat, but the question is how well the scaling laws hold up.

PaulHoule 13 days ago

Doesn't have to.

I use models like the 100M-parameter BERT model for text classification and they work great. I get a 0.78 AUC with one model; TikTok gets about 0.82 for a similar problem, and I'm sure they spent at least 500x what I spent on mine. I could 10x my parameters and get a 0.79 AUC, but I don't know if I'd feel the difference. (I got about 0.71 AUC with bag of words + logistic regression, and I perceive a big difference between the output of the SBERT model and that.)
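Since the comparison above is all in AUC terms, here is a minimal sketch of what that number measures: the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one (ties count half). This is a plain-Python, quadratic-time version for illustration only, and the name `roc_auc` is made up here; in practice you'd reach for a library routine such as scikit-learn's `roc_auc_score`.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth labels
    scores: iterable of model scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Read this way, going from 0.71 to 0.78 means the classifier puts a positive above a negative about 78% of the time instead of 71%.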

My current model can do a complete training cycle, which involves training about 20 models and picking the best, in about 3 minutes. The process is highly reliable and can run unattended every day; I could run it every hour if I wanted. I worked on another classifier based on fine-tuning a larger model, and it took about 30 minutes to train just one model and was not reliable at all.
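The cycle described above (train about 20 candidates, keep the best) is ordinary model selection. Below is a minimal sketch, assuming the candidates differ by a single hyperparameter such as a regularization strength and are scored on held-out data; the comment doesn't say what actually varies between the 20 models, and `train_and_pick_best`, `toy_train`, and `toy_eval` are hypothetical names, not the commenter's code.

```python
import math


def train_and_pick_best(train_fn, eval_fn, configs):
    """Train one model per config, score each on held-out data, keep the best.

    train_fn(config) -> model
    eval_fn(model)   -> score where higher is better (e.g. validation AUC)
    Ties keep the earliest config.
    """
    best_score, best_model, best_config = float("-inf"), None, None
    for config in configs:
        model = train_fn(config)
        score = eval_fn(model)
        if score > best_score:
            best_score, best_model, best_config = score, model, config
    return best_model, best_config, best_score


# Toy stand-ins (hypothetical): a "model" is just its regularization strength C,
# and the fake held-out score peaks at C = 1.0. Swap in real training/eval code.
configs = [10 ** (k / 4 - 2) for k in range(20)]  # 20 values from 0.01 to ~562


def toy_train(c):
    return {"C": c}


def toy_eval(model):
    return -abs(math.log10(model["C"]))


model, config, score = train_and_pick_best(toy_train, toy_eval, configs)
print(config)  # 1.0
```

Keeping training and evaluation behind two plain functions is what makes a loop like this easy to run unattended on a schedule.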

If you can get 50x the speed of the BERT model with 1/50 the resources, that's a big boon that makes text classification more accessible; the only excuse people have now is that it is too hard to make a training set.
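For comparison, the timings in the paragraph above already put the two setups roughly 200x apart per model, assuming the 3-minute cycle is split evenly across its 20 models (the comment doesn't break it down):

$$\frac{3\ \text{min}}{20\ \text{models}} = \frac{180\ \text{s}}{20} = 9\ \text{s per model}, \qquad \frac{30\ \text{min}}{9\ \text{s}} = \frac{1800\ \text{s}}{9\ \text{s}} = 200$$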

jerpint 13 days ago

Somewhat agreed for text classification use cases, but for anything requiring more language understanding, good scaling is a desirable property.

froonly 13 days ago

Is there a GitHub repo for this?