Reasoning models are a new class of large language models (LLMs) designed to tackle highly complex tasks by employing chain-of-thought (CoT) reasoning, with the tradeoff of taking longer to respond. DeepSeek R1 is a recently released frontier “reasoning” model which has been distilled into highly...
I’m not sure if I get the distilling part right.
Because I can run the downloaded model locally anyway. With distilling, I want to train another model by, well, distilling an existing model.
But looking at the demo video, this happens way too fast to actually be what I think it should be doing.
Am I wrong or is this presented wrong?
I’m not sure I understand the question. The demo video is showing the model answering questions, i.e. a demonstration of how fast it can generate tokens. The distilling part is what makes the “faster” happen. As I understand it, this works by trimming out content from the base model.
The thing is, the distilling itself should take longer, not just result in being immediately able to answer questions faster.
At least, if I’ve understood that term correctly
Distillation happens separately from question answering - it’s like rebuilding a car engine for better performance, you’re not driving at 100km/h while you’re rebuilding the engine.
It seems like you might already know that though from your other posts, so I fear I’m misunderstanding you. Could you explain what you mean by
“the distilling itself should take longer, not just result in being immediately able to answer questions faster”
? Distilling doesn’t just make it faster at generating tokens (though it definitely should, running on the same hardware, because the model is smaller and has less data to sift through in writing its answer). I guess my last answer was a bit narrow - in addition to making it faster, the main reason for distillation is to improve performance in a specific task. As an example, a distilled model might produce more detailed answers with fewer hallucinations than the undistilled version of the same model.
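To make the “separate step” point concrete, here’s a rough sketch of what a single distillation training step looks like. This is a generic, hypothetical example assuming PyTorch; teacher, student, batch, and optimizer are placeholder names and have nothing to do with how the DeepSeek team actually produced their distilled models:

```python
# Sketch of one knowledge-distillation training step (hypothetical; assumes PyTorch).
# "teacher" is the big model, "student" is the smaller one being trained.
import torch
import torch.nn.functional as F

def distill_step(teacher, student, batch, optimizer, T=2.0):
    teacher.eval()
    with torch.no_grad():                    # the teacher only supplies targets
        teacher_logits = teacher(batch)
    student_logits = student(batch)

    # The student is trained to match the teacher's full output distribution
    # (softened by temperature T), not just a single "correct" token.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Running that step over an entire training corpus for many passes is why distillation is slow and expensive - and it happens once, offline, before anyone ever chats with the resulting smaller model.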
I was mostly interested in the distilling part, whereas in the video they just pressed a button and directly afterwards were talking with an LLM.
I’m really not an expert, but distilling is usually a time-consuming task to get “knowledge” from a larger network into a smaller one, so we can have kinda the same results without the bulk.
But in the video I just don’t see that happening, when it is a “how to distill” video
To be honest, I’m really naive here and maybe I’m wrong, but that just isn’t how I understood distilling
Oh, right, there’s the issue. It’s not a “how to distill” video. The description under the video player explains what’s actually going on: “Demo showcasing DeepSeek R1 Qwen 1.5 Q4 K M model running on an AMD Ryzen™ HX 370 series processor in real time”
The team releasing this already did the distillation for us; what follows the video are instructions on how to run these new distilled models on your AMD system, not how to distill the models yourself.
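For what it’s worth, running one of those already-distilled models is just ordinary inference at that point. A minimal sketch, assuming the llama-cpp-python package and a hypothetical local filename for the Q4_K_M GGUF file (AMD’s actual instructions may point you at a different tool):

```python
# Minimal local-inference sketch (hypothetical; assumes llama-cpp-python is
# installed and the quantized GGUF file has already been downloaded).
from llama_cpp import Llama

llm = Llama(
    model_path="./DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf",  # hypothetical path
    n_ctx=2048,  # context window size
)

out = llm("Why does the sky appear blue?", max_tokens=256)
print(out["choices"][0]["text"])
```

No distillation happens here - the model was distilled before release, so this is just fast inference on a small model.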
Ok, well, that’s quite anticlimactic…
Ok, maybe the performance of running models locally is still nice on their chips
Thanks for clarifying, their title had me hoping for something else