Skip to main content
  1. Projects/

llm-top

Table of Contents

llm-top is a top-style terminal dashboard for monitoring LLM inference workloads running on NVIDIA DGX Spark (GB10). Get real-time visibility into GPU utilization, memory, processes, containers, and model health — all in one live-updating view.

Overview
#

If you’re running vLLM, SGLang, NIM, or other LLM inference servers on a DGX Spark, llm-top gives you an at-a-glance view of:

  • GPU stats: SM utilization, memory bandwidth, temperature, power, clock, and memory usage
  • Host stats: CPU, RAM, core count
  • GPU processes: PID, name, memory usage, and type (compute/graphics)
  • Containers: CPU%, memory, network I/O, block I/O, PID counts
  • Model servers: Port, health, request counts, KV cache usage, RPS, token throughput

Tech Stack
#

  • Language: Python
  • Target: NVIDIA DGX Spark (GB10)

Links#