DeepSeek R1 has changed the game for open-source reasoning models, but how does the Distill Qwen version actually perform when you cut the cord and run it 100% offline? In this video, we use a custom Python script to benchmark the inference speed of DeepSeek R1 Distill Qwen on local hardware.
We’re skipping the web UI and going straight to the source. Using a Python script (leveraging the llama-cpp-python bindings to run local GGUF files), we measure exactly how many Tokens Per Second (TPS) these models can output. We test everything from the lightning-fast 8B model to the more robust 14B and 32B versions to see where the "sweet spot" is for consumer-grade CPUs and GPUs.
Key Highlights:
✅ DeepSeek R1 Distill Qwen Overview: Why these models are unique.
✅ Offline Setup: Running the model without an internet connection or API keys.
✅ The Python Script: A walkthrough of the code used to calculate inference speed and latency.
✅ Hardware Stress Test: How [Your CPU/GPU] handles the reasoning "thinking" phase vs. the final output.
✅ Optimization Tips: Using 4-bit and 8-bit quantization to boost your TPS.
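To make the quantization trade-off concrete, here is a rough sizing heuristic (my own back-of-the-envelope numbers, not from the video): with GGUF models the quantization level is baked into the file itself, so "switching to 4-bit" means loading a different file, and you can estimate whether a variant fits in VRAM from its approximate bits per weight.

```python
def max_quant_for_vram(vram_gb: float, n_params_b: float) -> str:
    """Pick the largest common GGUF quantization whose weights roughly fit
    in the given VRAM budget. Bits-per-weight figures are approximate:
    Q8_0 ~ 8.5, Q4_K_M ~ 4.8, Q2_K ~ 2.6 (assumption, not exact)."""
    for name, bits_per_weight in [("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
        # n_params_b is in billions, so this estimate comes out in GB.
        est_gb = n_params_b * bits_per_weight / 8
        if est_gb <= vram_gb:
            return name
    return "CPU offload required"

print(max_quant_for_vram(24, 14))  # a 14B model in 24 GB of VRAM
print(max_quant_for_vram(12, 32))  # a 32B model in 12 GB of VRAM
```

This ignores KV-cache and runtime overhead, so treat it as a lower bound on what you actually need.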
Which version of DeepSeek R1 are you running? Let me know your hardware specs and your average Tokens Per Second in the comments! If you want to see more local AI benchmarks, make sure to SUBSCRIBE and hit the like button.
import time

from llama_cpp import Llama

# Model paths (update these to wherever your GGUF files live)
models = {
    "8B (Fastest)": r"C:\Users\kanis\Downloads\models\DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",
    "14B (Balanced)": r"C:\Users\kanis\Downloads\models\DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf",
    "32B (Expert 0)": r"C:\Users\kanis\Downloads\models\DeepSeek-R1-Distill-Qwen-32B-Q2_K.gguf",
    "32B (Expert 1)": r"C:\Users\kanis\Downloads\models\DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf",
}

test_prompt = "Explain Fourier Transform in two sentences."

def run_test(name, path):
    print(f"\n--- Testing {name} ---")
    try:
        # n_gpu_layers=-1 offloads every layer to the GPU; 0 keeps everything
        # on the CPU. Here we offload 24 layers as a middle ground.
        llm = Llama(model_path=path, n_gpu_layers=24, verbose=False)
        start = time.time()
        output = llm(test_prompt, max_tokens=128)
        end = time.time()
        tokens = output['usage']['completion_tokens']
        tps = tokens / (end - start)
        print(f"Result: {output['choices'][0]['text'].strip()[:128]}...")
        print(f"Speed: {tps:.2f} tokens/sec")
    except Exception as e:
        print(f"Error loading {name}: {e}")

for name, path in models.items():
    run_test(name, path)
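One refinement for the "thinking phase vs. final output" comparison: the R1 distill models wrap their chain of thought in <think>...</think> tags, so you can split the generated text on the closing tag and count or time the two phases separately. A minimal sketch (the helper name is mine):

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split an R1-style completion into (thinking, answer).

    DeepSeek R1 distills emit their reasoning inside <think>...</think>;
    everything after the closing tag is the user-facing answer."""
    marker = "</think>"
    if marker in text:
        thinking, answer = text.split(marker, 1)
        return thinking.replace("<think>", "").strip(), answer.strip()
    # No tag found: treat the whole completion as the answer.
    return "", text.strip()

thinking, answer = split_reasoning("<think>recall the definition...</think> It decomposes a signal.")
print(len(thinking), "chars of thinking;", len(answer), "chars of answer")
```

Dividing each part's token count by its share of the wall-clock time would give you a per-phase TPS instead of one blended number.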
#DeepSeekR1 #DeepSeek #Qwen #LocalLLM #PythonCoding #AI #MachineLearning #TechBenchmark #OfflineAI #LLM