Recommended hardware for consuming local LLMs:
1. ~$1,000 laptop/desktop with a discrete GPU and 8 GB of VRAM
2. $1,000+ Apple silicon (M1/M2/M3/M4) laptops/desktops with a minimum of 16 GB of RAM
3. Under $1,000 Qualcomm Snapdragon X Windows devices with a minimum of 16 GB of RAM

Mainstream Copilot+ PC-branded devices with a minimum of 16 GB of RAM and an iGPU (AMD Ryzen 8000 or Ryzen AI 300, Intel Core Ultra Series 2) can be used for inference with original LLM models up to 8 GB in size. They will provide 6+ tokens/s, which is at the lower end of real-time usability. However, if a RAG (Retrieval-Augmented Generation) approach is used to combine your proprietary data with the LLM, then a device with a discrete GPU, a Snapdragon X CPU, or an Apple M-series chip is preferred.

Recommended software: LM Studio and AnythingLLM, both of which can be set to server mode and consumed from a web browser within your local network, without exposing your proprietary data outside your comfort zone (a minimal code sketch of this pattern follows at the end of this section). Note that NPUs are currently not used for model inference (except in AnythingLLM, with specially prepared models for Snapdragon X Elite laptops).

Please contact us for consulting on everything LLM: implementation and usage scenarios! We specialize in integrating custom client data through RAG and in connecting to and consuming models locally. We will review your requirements and suggest a suitable scenario: a RAG approach will satisfy most small businesses and can be applied within a day, while model fine-tuning is an option for larger entities and specific use cases.
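To make the server-mode and RAG ideas above concrete, here is a minimal Python sketch, not a definitive implementation. It assumes LM Studio (or any OpenAI-compatible local server) is running in server mode on its default port 1234, uses a placeholder model name, and replaces real vector search with a naive keyword-overlap retrieval purely for illustration; tools such as AnythingLLM handle the retrieval and prompt augmentation for you. Because both the documents and the model stay on the local network, nothing proprietary leaves your machines.

```python
import requests

# Assumptions: LM Studio (or another OpenAI-compatible local server) is running
# in server mode at its default address http://localhost:1234; the model name
# below is a placeholder for whatever model you have loaded locally.
BASE_URL = "http://localhost:1234/v1"
MODEL = "local-model"  # hypothetical identifier

# Your proprietary documents stay on the local machine / network.
DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are Monday to Friday, 9:00 to 17:00 CET.",
    "Enterprise customers get a dedicated account manager.",
]

def retrieve(question: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval, purely illustrative.
    Real RAG stacks (e.g. AnythingLLM) use embedding-based vector search."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def ask(question: str) -> str:
    """Augment the prompt with retrieved context and query the local server."""
    context = "\n".join(retrieve(question, DOCUMENTS))
    response = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": MODEL,
            "messages": [
                {"role": "system", "content": "Answer using only the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
            "temperature": 0.2,
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("How long do customers have to return a product?"))
```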