llm-top

llm-top is a top-style terminal dashboard for monitoring LLM inference workloads running on NVIDIA DGX Spark (GB10). It gives real-time visibility into GPU utilization, memory, processes, containers, and model health in a single live-updating view.

Overview

If you’re running vLLM, SGLang, NIM, or other LLM inference servers on a DGX Spark, llm-top gives you an at-a-glance view of:

GPU stats: SM utilization, memory bandwidth, temperature, power, clock, and memory usage
Host stats: CPU, RAM, core count
GPU processes: PID, name, memory usage, and type (compute/graphics)
Containers: CPU%, memory, network I/O, block I/O, PID counts
Model servers: Port, health, request counts, KV cache usage, RPS, token throughput
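
As a rough illustration of how the model-server figures above (request counts, KV cache usage, RPS) could be derived, the sketch below parses Prometheus-style text metrics and turns a request counter into a per-second rate. This is not taken from llm-top's source: the function names and the `demo_*` metric names are hypothetical, and it assumes the server exposes metrics in the Prometheus text format without trailing timestamps.

```python
from typing import Dict

# Hypothetical sample payload; real servers use their own metric names.
SAMPLE_METRICS = """\
# HELP demo_requests_total Total requests served (hypothetical name).
# TYPE demo_requests_total counter
demo_requests_total{model="a b"} 150
# TYPE demo_kv_cache_usage_ratio gauge
demo_kv_cache_usage_ratio 0.42
"""


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Map each sample line (name plus labels) to its numeric value."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no trailing timestamp, so the value is the last field.
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # Ignore lines that do not end in a number.
    return samples


def rate_per_second(prev: float, curr: float, interval_s: float) -> float:
    """Per-second rate of a counter between two polls.

    Returns 0.0 if the interval is invalid or the counter went backwards
    (for example, after a server restart).
    """
    if interval_s <= 0 or curr < prev:
        return 0.0
    return (curr - prev) / interval_s


if __name__ == "__main__":
    current = parse_prometheus_text(SAMPLE_METRICS)
    # Pretend the previous poll, 2 seconds earlier, saw 120 requests.
    rps = rate_per_second(120.0, current['demo_requests_total{model="a b"}'], 2.0)
    print(f"RPS: {rps:.1f}, KV cache: {current['demo_kv_cache_usage_ratio']:.0%}")
```

A dashboard would call something like this once per refresh interval, keeping the previous counter value for each server so that rates reflect only the most recent window.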

Tech Stack

Language: Python
Target: NVIDIA DGX Spark (GB10)

Links

GitHub Repository
License: Apache 2.0
