The original post: /r/datacenter by /u/Status-Hearing-4084 on 2025-02-21 19:35:46.

Hey r/LocalLLaMA

I wanted to share our results running the full-size deepseek-ai/DeepSeek-R1-Distill-Llama-70B on consumer hardware. This model distills DeepSeek-R1's reasoning into a Llama 70B base, so it retains much of the original R1's strong performance while being far lighter to serve.

https://x.com/tensorblock_aoi/status/1893021600548827305

TL;DR: Got DeepSeek 70B running on repurposed crypto mining rigs (RTX 3080s), matching A100 performance at roughly 1/3 the cost.

We successfully ran the full-size DeepSeek 70B model on three 8x RTX 3080 rigs, reaching 25 tokens/s using 3-way pipeline parallelism across the rigs and 8-way tensor parallelism within each rig.
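For anyone curious how that 3-way pipeline x 8-way tensor split maps onto an off-the-shelf engine, here's a minimal sketch using vLLM's offline API. The engine choice and arguments are illustrative only (we're not describing our exact stack here), and it assumes a vLLM build with pipeline-parallel support plus a multi-node runtime such as Ray so the three rigs can see each other:

```python
# Illustrative sketch: one way to express a 3-stage pipeline x 8-way tensor-parallel
# layout (3 rigs x 8 GPUs each) with vLLM. Not a description of our actual stack.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    tensor_parallel_size=8,    # shard each transformer layer across the 8 GPUs in a rig
    pipeline_parallel_size=3,  # split the layer stack across the 3 rigs
    dtype="float16",
)

outputs = llm.generate(
    ["Explain the difference between tensor and pipeline parallelism."],
    SamplingParams(max_tokens=256, temperature=0.6),
)
print(outputs[0].outputs[0].text)
```

A multi-rig deployment also needs the distributed runtime configured on every machine before launch; that setup is omitted above.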

Each rig has 8x 10GB consumer GPUs in a typical crypto mining configuration, with the 8-way tensor parallelism running over the PCIe interconnect. Together the three rigs deliver performance comparable to three A100 80GB GPUs, at roughly $18k versus ~$54k for the equivalent datacenter hardware.
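Rough back-of-the-envelope numbers behind that comparison (FP16 weights only; KV cache and activations add overhead on top, and prices are the approximate figures quoted above):

```python
# Back-of-the-envelope VRAM and cost math. Assumes FP16 weights (2 bytes/param);
# KV cache and activation memory are ignored, so real headroom is smaller.
params_b      = 70                    # model size in billions of parameters
weights_gb    = params_b * 2          # ~140 GB of FP16 weights

consumer_gpus = 3 * 8                 # three rigs x eight RTX 3080s
consumer_vram = consumer_gpus * 10    # 240 GB total
consumer_cost = 18_000                # ~$18k for all three rigs

a100_vram     = 3 * 80                # three A100 80GB cards = 240 GB
a100_cost     = 54_000                # ~$54k

print(f"weights: ~{weights_gb} GB | consumer VRAM: {consumer_vram} GB | A100 VRAM: {a100_vram} GB")
print(f"consumer cost is {consumer_cost / a100_cost:.2f}x the datacenter price")
```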

Our next phase focuses on improving throughput with a 2-way pipeline, 16-way tensor parallelism layout, and we're exploring AMD 7900 XTX cards for their 24GB of VRAM.
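As a rough sanity check on why the different splits matter, here's the idealized per-GPU weight footprint under each layout (FP16 weights only, assuming shards divide evenly, which real deployments don't quite achieve):

```python
# Idealized per-GPU weight footprint for each parallelism layout (FP16 weights only).
# Real shards are uneven and KV cache / activations consume additional VRAM.
weights_gb = 70 * 2  # ~140 GB of FP16 weights

def per_gpu_gb(pipeline_stages: int, tensor_ranks: int) -> float:
    return weights_gb / (pipeline_stages * tensor_ranks)

print(f"3-way PP x 8-way TP (24 GPUs, 10 GB each): ~{per_gpu_gb(3, 8):.1f} GB/GPU")
print(f"2-way PP x 16-way TP (32 ranks):           ~{per_gpu_gb(2, 16):.1f} GB/GPU")
print(f"on a 24 GB 7900 XTX that leaves ~{24 - per_gpu_gb(2, 16):.1f} GB/GPU for KV cache")
```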

This implementation validates the feasibility of repurposing consumer GPU clusters for distributed AI inference at datacenter scale.

https://reddit.com/link/1iuzj74/video/wkmexog4pjke1/player

Edit: Thanks for all the interest! We're working on documentation and will share more implementation details soon. Yes, we plan to open-source it once it's properly tested.

What’s your take on the most cost-effective consumer GPU setup that can match datacenter performance (A100/H100) for LLM inference? Especially interested in performance/$ comparisons.