Llama-3.2-3B-Instruct (GGUF / Apple Silicon)

Process

by mm-team · 5/18/2026

Llama 3.2 3B Instruct Q4_K_M via llama-server on Apple Silicon (M1/M2/M3). Metal GPU acceleration. OpenAI-compatible API on port 8080.

↓ 9 downloads ♥ 3 likes ⚡ 6 boots macos

#apple-silicon #gguf #llama #llama.cpp

config.json

{
  "server_type": "process",
  "hardware_type": "macos",
  "script": "scripts/process/macos/setup.sh",
  "env": {
    "MODEL_PATH": "models/llama-3.2-3b-instruct-q4_k_m.gguf",
    "HOST": "127.0.0.1",
    "PORT": "8080",
    "GPU_LAYERS": "99",
    "CTX_SIZE": "4096"
  },
  "health_check": "http://127.0.0.1:8080/health",
  "startup_timeout_secs": 45
}