MagicPIG: LSH Sampling for Efficient LLM Generation

This repo explores the possibility of a GPU-CPU LLM serving system powered by LSH sampling. Three models are currently supported: llama3-8b-chat-128k, llama3-70b-chat-128k, and mistral-7b-chat-512k. A minimal sketch of the general LSH-sampling idea is given below.
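
For intuition, here is a minimal, self-contained sketch of SimHash-style LSH sampling over attention keys: hash keys with random hyperplanes, count bucket collisions with the query, and attend only over a sampled subset. The function names, parameters, and collision-counting heuristic are illustrative assumptions, not MagicPIG's actual implementation or API.

```python
import numpy as np

def build_hash_tables(keys, num_tables=8, bits_per_table=4, seed=0):
    """Hash each key vector into `num_tables` SimHash buckets.

    keys: (n, d) array of key vectors.
    Returns the random hyperplanes and the per-table bucket code of every key.
    """
    rng = np.random.default_rng(seed)
    n, d = keys.shape
    # One set of random hyperplanes per table: (num_tables, bits_per_table, d).
    planes = rng.standard_normal((num_tables, bits_per_table, d))
    # Sign pattern of the projections -> integer bucket code per (table, key).
    bits = (np.einsum("tbd,nd->tnb", planes, keys) > 0).astype(np.int64)
    codes = (bits << np.arange(bits_per_table)).sum(axis=-1)  # (num_tables, n)
    return planes, codes

def lsh_sample_attention(query, keys, values, planes, key_codes, budget=32, seed=0):
    """Approximate softmax attention using only keys that collide with the query.

    A key is a candidate if it shares the query's bucket in at least one table;
    at most `budget` keys are kept, sampled in proportion to their collision
    count (a rough proxy for query-key similarity).
    """
    rng = np.random.default_rng(seed)
    num_tables, bits_per_table, d = planes.shape
    q_bits = (np.einsum("tbd,d->tb", planes, query) > 0).astype(np.int64)
    q_codes = (q_bits << np.arange(bits_per_table)).sum(axis=-1)   # (num_tables,)
    # For every key, count in how many tables it lands in the query's bucket.
    collisions = (key_codes == q_codes[:, None]).sum(axis=0)       # (n,)
    candidates = np.flatnonzero(collisions > 0)
    if candidates.size == 0:                        # fall back to dense attention
        candidates = np.arange(keys.shape[0])
    if candidates.size > budget:                    # sample by collision frequency
        probs = collisions[candidates] / collisions[candidates].sum()
        candidates = rng.choice(candidates, size=budget, replace=False, p=probs)
    scores = keys[candidates] @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values[candidates]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.standard_normal((4096, 64))
    values = rng.standard_normal((4096, 64))
    query = rng.standard_normal(64)
    planes, codes = build_hash_tables(keys)
    out = lsh_sample_attention(query, keys, values, planes, codes, budget=64)
    print(out.shape)  # (64,)
```

In a GPU-CPU split of the kind the repo describes, the cheap hashing and sampling step can run on the CPU over the full KV cache, while the GPU only computes attention over the small sampled subset.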

https://github.com/Infini-AI-Lab/MagicPIG
