Tsinghua University’s KTransformers enables full-powered DeepSeek-R1 on a low-cost graphics card

KTransformers frees large AI models from their dependence on expensive cloud servers.

The KVCache.AI team from Tsinghua University, in partnership with APPROACHING.AI, announced a major update to the KTransformers open-source project last week, local media outlet National Business Daily reported on Saturday. With a single NVIDIA RTX 4090D with 24GB of VRAM, users can now run the full-powered 671B-parameter DeepSeek-R1 and V3 models locally. Pre-processing (prefill) speeds can reach up to 286 tokens per second, while generation speeds peak at 14 tokens per second.
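
For a rough sense of what those throughput figures mean in practice, here is a back-of-the-envelope estimate in Python built only from the peak speeds quoted above; actual latency will vary with prompt length, context, and hardware.

    # Rough end-to-end latency estimate from the reported peak speeds.
    PREFILL_TOKS_PER_S = 286  # reported pre-processing (prefill) speed
    DECODE_TOKS_PER_S = 14    # reported peak generation speed

    def estimate_latency_s(prompt_tokens: int, output_tokens: int) -> float:
        """Prefill the prompt, then decode the answer token by token."""
        return (prompt_tokens / PREFILL_TOKS_PER_S
                + output_tokens / DECODE_TOKS_PER_S)

    # Example: a 2,000-token prompt with a 500-token answer.
    print(f"{estimate_latency_s(2000, 500):.1f} s")  # ~42.7 s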

Why it matters: Currently, users access DeepSeek-R1 mainly through cloud services or local deployment, but the official servers often suffer from downtime, and personal deployments usually run a distilled version with 90% fewer parameters. Running the full version of DeepSeek-R1 on standard hardware has been a major challenge for most users, and even developers find the cost of renting servers a heavy burden. The KTransformers open-source project offers an affordable solution to this problem.

Details: KTransformers removes the need for large AI models to rely on expensive cloud servers, according to the National Business Daily report.

  • A user analyzed the solution’s costs and found that the full DeepSeek-R1 model could be run locally for under RMB 70,000 ($9,650), over 95% cheaper than NVIDIA A100/H100 server setups, which can cost up to RMB 2 million ($280,000).
  • KTransformers optimizes the deployment of large language models (LLMs) on local machines to overcome resource limitations. The framework leverages techniques including heterogeneous computing, advanced quantization, and sparse attention mechanisms to improve computational efficiency while supporting long-context sequences (see the sketch after this list).
  • However, KTransformers’ inference speed cannot match that of high-end servers, and the setup serves only a single user at a time, whereas servers can meet the demands of dozens of users simultaneously, the report noted.
  • Currently, the solution also depends on Intel’s AMX instruction set, which CPUs from other vendors do not yet support. In addition, it is designed primarily for DeepSeek’s MoE models; applying it to other mainstream models may not deliver optimal performance.
  • To run the KTransformers setup, Chinese media outlet IThome listed the following prerequisites: an Intel Xeon Gold 6454S CPU (2 NUMA nodes), 1TB of standard DDR5-4800 server memory, an NVIDIA RTX 4090D GPU with 24GB of VRAM, and CUDA version 12.1 or higher.
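
The heterogeneous-computing idea mentioned above can be illustrated with a short sketch: the dense, latency-critical attention layers stay on the GPU, while the sparsely activated MoE expert weights sit in CPU RAM and run on CPU kernels (where Intel’s AMX instructions do the heavy lifting). The PyTorch class below is a minimal conceptual sketch, not the actual KTransformers API; every name in it is hypothetical.

    import torch

    # Conceptual sketch of a heterogeneous MoE layer: attention on the GPU,
    # expert MLPs in CPU RAM. Only per-token activations cross the PCIe bus;
    # the hundreds of gigabytes of expert weights never need to fit in VRAM.
    # Hypothetical illustration only, not the real KTransformers API.
    class HybridMoELayer(torch.nn.Module):
        def __init__(self, attn, router, experts):
            super().__init__()
            self.attn = attn.to("cuda")        # dense and latency-critical -> GPU
            self.router = router.to("cuda")    # tiny gating network -> GPU
            self.experts = experts.to("cpu")   # huge but sparsely used -> CPU RAM

        def forward(self, x):                  # x: (tokens, hidden), on the GPU
            x = self.attn(x)
            chosen = self.router(x).argmax(dim=-1)   # pick one expert per token
            x_cpu = x.to("cpu")                      # ship activations, not weights
            out = torch.stack([self.experts[int(e)](x_cpu[i])
                               for i, e in enumerate(chosen)])
            return out.to("cuda")                    # back to GPU for the next layer

The point this illustrates is that an MoE model touches only a small fraction of its weights per token, so the expensive part can live outside the 24GB of VRAM without a proportional loss in speed.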

Context: On Jan. 20, the release of DeepSeek-R1 made headlines around the world and led many to suggest that the AI industry had entered a new phase, one where competition is more global, open-source models thrive, and cost efficiency is a major factor in the development and deployment of AI systems.

  • The published API (Application Programming Interface) pricing for DeepSeek-R1 is as follows: RMB 1 ($0.14) per million input tokens (cache hit), RMB 4 ($0.55) per million input tokens (cache miss), and RMB 16 ($2.21) per million output tokens. This is roughly 1/30th of the operational cost of OpenAI’s GPT-4.
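
Those rates translate directly into per-request costs; here is a small Python helper using the figures quoted above (the token counts in the example are arbitrary):

    # Published DeepSeek-R1 API prices, in RMB per million tokens.
    PRICE_INPUT_CACHE_HIT = 1.0
    PRICE_INPUT_CACHE_MISS = 4.0
    PRICE_OUTPUT = 16.0

    def request_cost_rmb(hit_tokens: int, miss_tokens: int, out_tokens: int) -> float:
        """Cost in RMB of a single request, given its token counts."""
        return (hit_tokens * PRICE_INPUT_CACHE_HIT
                + miss_tokens * PRICE_INPUT_CACHE_MISS
                + out_tokens * PRICE_OUTPUT) / 1_000_000

    # Example: 100k cached input tokens, 50k uncached, 20k output tokens.
    print(f"RMB {request_cost_rmb(100_000, 50_000, 20_000):.2f}")  # RMB 0.62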
