
Writing Custom Health Checks (PowerShell)

A health check is the sensor at the heart of PC Health: a small measurement taken on a Windows device that produces one number. K12Panel ships with built-in checks, but the real power is that you can measure anything a short PowerShell command can produce — a service's state, free memory, certificate expiry, backup age — and treat it exactly like a built-in.

This article explains the check contract, how thresholds work, and gives ready-to-use sample scripts. For the surrounding workflow, see Using PC Health in K12Panel. Custom checks are authored by users with the Architect role.


The check contract

Every check script obeys one simple rule:

Compute one number and output it — or throw if you can't measure it.

That's the whole contract. You don't format anything, assemble JSON, or manage timeouts — K12Panel wraps your script in a harness that captures your number, attributes it to the right check, enforces a timeout, and reports it. You just return the value.

  • Output exactly one number. The harness reads your script's final output as the measured value (a whole number or a decimal). Keep the script quiet otherwise.
  • Throw when you genuinely can't measure. If the thing you're measuring is missing or errors, throw — K12Panel records the check as Unknown rather than inventing a value. Unknown is a real state (it means "no reliable reading"), not a failure.
  • Read, don't change. A check should only measure. Never modify the device from a check — that's what a remediation is for.
  • Be quick. Checks run on every device on a schedule; keep them lightweight and let the timeout (which you set) bound them.
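You can try a script against this contract on your own machine before it ever reaches a device. The function below is a rough local approximation, not K12Panel's harness. `Invoke-CheckLocally` is a hypothetical name, and its rules are assumptions drawn from the contract above: any error makes the result Unknown, the script must emit exactly one value, and that value must parse as a number (using the invariant culture).

```powershell
# Hypothetical local approximation of the check contract. K12Panel's real harness
# is not shown in this article; the name and parsing rules here are assumptions.
function Invoke-CheckLocally {
    param(
        [Parameter(Mandatory)] [scriptblock] $Check
    )

    # Assumption: treat non-terminating errors as failures too, so a cmdlet that
    # only writes an error cannot slip a half-measured value through.
    $ErrorActionPreference = 'Stop'

    try {
        # Only the success stream is captured; Write-Host/Write-Verbose output is not.
        $output = @(& $Check)
    }
    catch {
        return [pscustomobject]@{ Status = 'Unknown'; Value = $null; Reason = $_.Exception.Message }
    }

    if ($output.Count -ne 1) {
        return [pscustomobject]@{ Status = 'Unknown'; Value = $null; Reason = "expected one value, got $($output.Count)" }
    }

    # Parse with the invariant culture so '12.5' reads the same on every machine.
    $number  = 0.0
    $styles  = [System.Globalization.NumberStyles]::Float
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    if (-not [double]::TryParse([string]$output[0], $styles, $culture, [ref]$number)) {
        return [pscustomobject]@{ Status = 'Unknown'; Value = $null; Reason = "not a number: '$($output[0])'" }
    }

    [pscustomobject]@{ Status = 'Measured'; Value = $number; Reason = $null }
}
```

For example, on a Windows machine, `Invoke-CheckLocally -Check { ((Get-Date) - (Get-CimInstance Win32_OperatingSystem).LastBootUpTime).TotalDays }` should come back as Measured. A script that prints an intermediate value before its result comes back as Unknown, which is the mistake this sketch is meant to catch.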

Whether a number is "good" or "bad" is not decided in the check — it's decided by the bands in a policy. This keeps a single check reusable across strict and relaxed policies.


Thresholds: bands, severity, and hysteresis

In a policy, each enabled check has bands — an ordered list of rules that turn the measured number into a status. Bands are JSON, evaluated top to bottom; the first rule that matches wins, so put the most severe rule first.

```json
[
  {"op": "<", "value": 5,  "severity": "critical", "clear_value": 8},
  {"op": "<", "value": 15, "severity": "warning",  "clear_value": 18}
]
```

Read that as: "below 5% → Critical; otherwise below 15% → Warning; otherwise Healthy."

  • op: the comparison, one of <, <=, >, >=, or ==.
  • value: the threshold the measurement is compared against.
  • severity: warning or critical.
  • clear_value (optional): hysteresis. Once an issue is raised, the value must cross back past clear_value (not merely back over value) before the issue clears. This stops a metric hovering near the threshold from flapping between alert and clear. In the example, a disk that drops under 15% must climb back above 18% to clear the warning.
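The first-match rule and clear_value can be sketched as a small PowerShell function. This only illustrates the behavior described above; it is not K12Panel's evaluator. `Get-BandStatus` and `Test-BandOp` are hypothetical names, and two details are assumptions: a band counts as already raised when the previous status was at least as severe as that band, and a value landing exactly on clear_value counts as cleared.

```powershell
# Hypothetical sketch of first-match band evaluation with clear_value hysteresis.
# Not K12Panel's evaluator; the "raised" rule and boundary handling are assumptions.

function Test-BandOp {
    param([double] $Measured, [string] $Op, [double] $Threshold)
    switch ($Op) {
        '<'     { return $Measured -lt $Threshold }
        '<='    { return $Measured -le $Threshold }
        '>'     { return $Measured -gt $Threshold }
        '>='    { return $Measured -ge $Threshold }
        '=='    { return $Measured -eq $Threshold }
        default { throw "Unsupported op '$Op'" }
    }
}

function Get-BandStatus {
    param(
        [double] $Measured,
        [string] $BandsJson,
        [ValidateSet('healthy', 'warning', 'critical')]
        [string] $PreviousStatus = 'healthy'
    )
    $rank = @{ healthy = 0; warning = 1; critical = 2 }

    # Bands are evaluated top to bottom; the first match wins.
    foreach ($band in ($BandsJson | ConvertFrom-Json)) {
        # Assumption: a band is already raised if the last status was at least this severe.
        $raised = $rank[$PreviousStatus] -ge $rank[$band.severity]

        # While raised, compare against clear_value (if set) instead of value,
        # so the metric must recover past the safer point before the band clears.
        $threshold = $band.value
        if ($raised -and $null -ne $band.clear_value) { $threshold = $band.clear_value }

        if (Test-BandOp -Measured $Measured -Op $band.op -Threshold $threshold) {
            return $band.severity
        }
    }
    return 'healthy'
}
```

With the example bands and a previous status of warning, a reading of 16 stays warning because it has not yet passed 18, while a reading of 19 clears to healthy. With no previous issue, the same reading of 16 is simply healthy.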

When you author a check you can also set default bands, so anyone binding the check to a policy starts from sensible thresholds.


Direction, units, and display

A few properties control how a check reads in the UI:

  • Value unit: a label such as %, days, or count, shown next to the number.
  • Direction: whether higher or lower is healthier (for example, disk free is "higher is better" and uptime in days is "lower is better"). This drives gauge coloring.
  • Display mode: Gauge, Stoplight, or Do not display, plus a gauge min/max range. This controls how, and whether, the check appears on the device Health tab.

Built-in checks

Out of the box you get checks such as Disk free (system drive), Uptime since reboot, and Pending-reboot age, plus a server-derived Agent offline check. Built-ins are locked (you can't edit them), but you bind them to policies and set their thresholds just like your own.


Authoring a custom check

Go to PC Health → Checks → New Custom Check and fill in:

  • Name and Slug — the slug auto-generates from the name; it's the stable identifier.
  • Value unit, Direction, Display mode, gauge min/max.
  • Timeout — how long the measurement may take before it's reported as Unknown.
  • PowerShell — your measurement, in the built-in editor.
  • Default bands — optional starting thresholds (JSON).

Then use Test on a device to run the check once on a machine of your choice. The raw output (your number, or an error) appears in that device's command history, so you can confirm the check behaves as expected.

Editing note: changing a check's script or bands bumps its version and clears its live state across devices, so stale readings can't linger. The check re-measures on the next check-in.


Sample checks

Each of these outputs a single number. Suggested bands are included — adjust to taste.

Days since last reboot (unit: days, lower is better)

```powershell
((Get-Date) - (Get-CimInstance Win32_OperatingSystem).LastBootUpTime).TotalDays
```

Suggested bands: [{"op":">","value":30,"severity":"critical"},{"op":">","value":14,"severity":"warning"}]

Free space on the system drive (unit: %, higher is better)

```powershell
$d = Get-PSDrive -Name $env:SystemDrive.TrimEnd(':')
[math]::Round($d.Free / ($d.Free + $d.Used) * 100, 2)
```

Suggested bands: [{"op":"<","value":5,"severity":"critical","clear_value":8},{"op":"<","value":15,"severity":"warning","clear_value":18}]

Free physical memory (unit: %, higher is better)

```powershell
$os = Get-CimInstance Win32_OperatingSystem
[math]::Round($os.FreePhysicalMemory / $os.TotalVisibleMemorySize * 100, 1)
```

Suggested bands: [{"op":"<","value":5,"severity":"warning"}]

Is a critical service running? (unit: count, 1 = running, higher is better)

```powershell
if ((Get-Service -Name 'Spooler' -ErrorAction Stop).Status -eq 'Running') { 1 } else { 0 }
```

Suggested bands: [{"op":"<","value":1,"severity":"critical"}] — a 0 means the service is stopped. (Pair this one with a Restart Service remediation; see Writing Custom Remediations.)

Is a reboot pending? (unit: count, 0 = no, lower is better)

```powershell
$pending = (Test-Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending') -or
           (Test-Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired')
if ($pending) { 1 } else { 0 }
```

Suggested bands: [{"op":">","value":0,"severity":"warning"}]

Count of stopped automatic-start services (unit: count, lower is better)

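```powershell
# @() wraps the result in an array so Count is correct even when zero or one service matches
@(Get-CimInstance Win32_Service -Filter "StartMode='Auto' AND State!='Running'").Count
```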

Suggested bands: [{"op":">","value":0,"severity":"warning"}]

Days since last successful backup (unit: days, lower is better — demonstrates throw)

```powershell
$marker = 'C:\Backups\last_success.txt'
if (-not (Test-Path $marker)) { throw 'backup success marker not found' }
((Get-Date) - (Get-Item $marker).LastWriteTime).TotalDays
```

Suggested bands: [{"op":">","value":2,"severity":"critical"},{"op":">","value":1,"severity":"warning"}]. If the marker is missing, the script throws and the check reads Unknown, which is itself a signal worth surfacing.


Best practices

  • Return one number, nothing else. If you must compute intermediate values, don't print them — only the final number should reach output.
  • Throw instead of guessing. If you can't get a trustworthy reading, throw. A wrong number is worse than an honest Unknown.
  • Keep it read-only and fast. Checks run fleet-wide on a schedule. Avoid slow queries, network calls, and anything that changes the device.
  • Set a realistic timeout. Long enough for a healthy device, short enough that a hung measurement is reported as Unknown promptly.
  • Test on a device before binding it broadly. Use Test on a device and confirm the raw output is the single number you expect.
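To pick a realistic timeout, it can help to time a script locally first. The sketch below uses the standard Start-Job, Wait-Job -Timeout, Receive-Job, and Remove-Job cmdlets. `Measure-CheckRuntime` is a hypothetical helper name, and it only times your script on the machine you run it on, which may be faster or slower than a given device in your fleet.

```powershell
# Hypothetical helper for choosing a check's Timeout: runs the script in a
# background job and reports how long it took, or that it exceeded the limit.
# It does not reproduce K12Panel's harness; it only times the script locally.
function Measure-CheckRuntime {
    param(
        [Parameter(Mandatory)] [scriptblock] $Check,
        [Parameter(Mandatory)] [int] $TimeoutSeconds
    )

    $job   = Start-Job -ScriptBlock $Check
    $timer = [System.Diagnostics.Stopwatch]::StartNew()
    try {
        # Wait-Job returns the job if it finished (or failed) in time, and nothing otherwise.
        $finished = Wait-Job -Job $job -Timeout $TimeoutSeconds
        $timer.Stop()
        if ($null -eq $finished) {
            return [pscustomobject]@{ TimedOut = $true; Seconds = $timer.Elapsed.TotalSeconds; Output = @() }
        }
        $output = @(Receive-Job -Job $job -ErrorAction SilentlyContinue)
        [pscustomobject]@{ TimedOut = $false; Seconds = $timer.Elapsed.TotalSeconds; Output = $output }
    }
    finally {
        # Stop (if still running) and discard the job either way.
        Remove-Job -Job $job -Force
    }
}
```

Because Start-Job launches a separate PowerShell process, the reported seconds include that process's startup time, so treat the figure as an upper bound for the script itself. A script block passed this way cannot see the caller's variables unless it uses the `$using:` scope modifier.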

Frequently Asked Questions

My check shows "Unknown" — what does that mean? It means K12Panel didn't get a reliable number — usually because your script threw, timed out, or produced something that wasn't a single number. Unknown is intentional; it's the honest state for "couldn't measure." Test the script on a device to see its raw output.

Can a check return a decimal, or only whole numbers? Either. 12, 12.5, and 0 are all valid. Percentages and "days since" values are commonly decimals.

How do I make a yes/no check? Return 1 for one state and 0 for the other, then band on it — for example [{"op":"<","value":1,"severity":"critical"}] treats 0 as critical. The service-running and reboot-pending samples above use this pattern.

Where do thresholds live — in the check or the policy? Thresholds (bands) live in the policy, per check. The check only measures. You can set default bands on the check so policies start from a sensible baseline, but each policy can override them — which is how the same check can be strict for servers and relaxed for laptops.

What is clear_value for? Hysteresis. Without it, a value sitting right on a threshold flaps between alert and clear. clear_value requires the metric to recover past a second, safer point before the issue clears.

Why did my check's history disappear after I edited it? Editing a check's script or bands clears its live state on purpose, so old readings taken under different logic don't linger. It re-measures on the next check-in.

Can a check fix a problem it finds? No — checks only measure. To act on what a check finds, wire a remediation to it in the policy editor. See Writing Custom Remediations.