Writing Custom Health Checks (PowerShell)

This article explains the check contract, how thresholds work, and gives ready-to-use sample scripts. For the surrounding workflow, see Using PC Health in K12Panel. Custom checks are authored by users with the Architect role.

The check contract

Every check script obeys one simple rule:

Compute one number and output it — or throw if you can't measure it.

That's the whole contract. You don't format anything, assemble JSON, or manage timeouts — K12Panel wraps your script in a harness that captures your number, attributes it to the right check, enforces a timeout, and reports it. You just return the value.

Output exactly one number. The harness reads your script's final output as the measured value (a whole number or a decimal). Keep the script quiet otherwise.
Throw when you genuinely can't measure. If the thing you're measuring is missing or errors, throw — K12Panel records the check as Unknown rather than inventing a value. Unknown is a real state (it means "no reliable reading"), not a failure.
Read, don't change. A check should only measure. Never modify the device from a check — that's what a remediation is for.
Be quick. Checks run on every device on a schedule; keep them lightweight and let the timeout (which you set) bound them.
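
The four rules above fit in a few lines. Here is a minimal sketch, assuming you want to track how many files sit in a queue folder. The function name Measure-FolderFileCount and the example path are hypothetical, not part of K12Panel.

```powershell
# Sketch only: Measure-FolderFileCount and the example path are hypothetical.
function Measure-FolderFileCount {
    param([Parameter(Mandatory)][string]$Path)

    # Throw when the thing being measured is missing, so the check reads Unknown.
    if (-not (Test-Path -LiteralPath $Path -PathType Container)) {
        throw "cannot measure: folder '$Path' not found"
    }

    # Read-only measurement. @() guarantees an array, so .Count is always a number.
    @(Get-ChildItem -LiteralPath $Path -File -ErrorAction Stop).Count
}

# In a real check, the final line is the call that emits the one number:
# Measure-FolderFileCount -Path 'C:\ProgramData\ExampleApp\Queue'
```

The function emits nothing except the count, never writes to the device, and throws rather than returning a made-up 0 when the folder is absent.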

Whether a number is "good" or "bad" is not decided in the check — it's decided by the bands in a policy. This keeps a single check reusable across strict and relaxed policies.

Thresholds: bands, severity, and hysteresis

In a policy, each enabled check has bands — an ordered list of rules that turn the measured number into a status. Bands are JSON, evaluated top to bottom; the first rule that matches wins, so put the most severe rule first.

[
  {"op": "<", "value": 5,  "severity": "critical", "clear_value": 8},
  {"op": "<", "value": 15, "severity": "warning",  "clear_value": 18}
]

For a disk-free percentage, read that as: "below 5% → Critical; otherwise below 15% → Warning; otherwise Healthy."
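
To see why order matters, it can help to model first-match evaluation yourself. The sketch below is not K12Panel's evaluator. Get-BandSeverity is a hypothetical helper that applies the "top to bottom, first match wins" rule to the example bands and ignores clear_value.

```powershell
# Sketch only: Get-BandSeverity is a hypothetical helper, not a K12Panel API.
function Get-BandSeverity {
    param(
        [Parameter(Mandatory)][double]$Value,
        [Parameter(Mandatory)][object[]]$Bands
    )
    foreach ($band in $Bands) {
        $hit = switch ([string]$band.op) {
            '<'  { $Value -lt $band.value }
            '<=' { $Value -le $band.value }
            '>'  { $Value -gt $band.value }
            '>=' { $Value -ge $band.value }
            '==' { $Value -eq $band.value }
            default { throw "unsupported op '$($band.op)'" }
        }
        # First matching rule wins; later rules are never consulted.
        if ($hit) { return $band.severity }
    }
    'healthy'   # no rule matched
}

$bands = '[{"op":"<","value":5,"severity":"critical","clear_value":8},{"op":"<","value":15,"severity":"warning","clear_value":18}]' | ConvertFrom-Json

Get-BandSeverity -Value 3  -Bands $bands   # critical
Get-BandSeverity -Value 12 -Bands $bands   # warning
Get-BandSeverity -Value 40 -Bands $bands   # healthy
```

If the two rules were swapped, a reading of 3 would match "< 15" first and be reported as a warning, which is why the most severe rule goes first.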

op — the comparison: <, <=, >, >=, or ==.
value — the threshold to compare the measurement against.
severity — warning or critical.
clear_value (optional) — hysteresis. Once an issue is raised, the value must cross back past clear_value (not just back over value) before it clears. This stops a metric hovering on the line from flapping between alert and clear. In the example, a disk that drops under 15% must climb back above 18% to clear the warning.
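
The effect of clear_value is easiest to see on a series of readings. The following is a sketch for a single band, not K12Panel's implementation. Test-BandRaised is a hypothetical helper, and treating a reading exactly equal to clear_value as "not yet cleared" is an assumption based on the "climb back above 18%" wording.

```powershell
# Sketch only: Test-BandRaised is a hypothetical helper, not a K12Panel API.
function Test-BandRaised {
    param(
        [Parameter(Mandatory)][double]$Value,
        [Parameter(Mandatory)]$Band,
        [bool]$WasRaised = $false
    )
    $op = [string]$Band.op
    $trips = switch ($op) {
        '<'  { $Value -lt $Band.value }
        '<=' { $Value -le $Band.value }
        '>'  { $Value -gt $Band.value }
        '>=' { $Value -ge $Band.value }
        '==' { $Value -eq $Band.value }
        default { throw "unsupported op '$op'" }
    }
    if ($trips) { return $true }

    # Not tripping now. Without a prior issue or a clear_value, the band is clear.
    if ((-not $WasRaised) -or ($null -eq $Band.clear_value)) { return $false }

    # Already raised: stay raised until the reading moves past clear_value
    # (assumption: a reading equal to clear_value has not yet cleared).
    if ($op.StartsWith('<')) { return ($Value -le $Band.clear_value) }
    if ($op.StartsWith('>')) { return ($Value -ge $Band.clear_value) }
    return $false
}

$band = [pscustomobject]@{ op = '<'; value = 15; severity = 'warning'; clear_value = 18 }
$raised = $false
foreach ($reading in 20, 14, 16, 18, 19) {
    $raised = Test-BandRaised -Value $reading -Band $band -WasRaised $raised
    '{0,3} -> raised: {1}' -f $reading, $raised
}
# 20 -> False, 14 -> True, 16 -> True, 18 -> True, 19 -> False
```

Without clear_value, the readings 16 and 18 would clear the warning immediately, and a disk hovering around 15% would alternate between alert and clear on every check-in.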

When you author a check you can also set default bands, so anyone binding the check to a policy starts from sensible thresholds.

Direction, units, and display

A few properties control how a check reads in the UI:

Value unit — a label like %, days, or count, shown next to the number.
Direction — whether higher or lower is healthier (e.g., disk-free is "higher is better"; uptime-days is "lower is better"). This drives gauge coloring.
Display mode — Gauge, Stoplight, or Do not display, plus a gauge min/max range. This controls how (and whether) the check appears on the device Health tab.

Built-in checks

K12Panel ships a catalog of ready-to-use checks. Built-ins are locked (you can't edit their scripts), but you bind them to policies and set their thresholds exactly like your own. All are Windows-only and, apart from the server-derived Agent offline check, run as part of the agent's normal check-in.

Storage & disk

Disk free (system drive) — percent free on the system drive (lower is worse).
Data drive free space — lowest free % across internal data drives (D:, E:, …); reads 100 on single-drive machines so they stay quiet.
Predictive disk health (SMART) — 1 if every physical disk is healthy, 0 if any reports Warning/Unhealthy. Early warning of a failing drive.

Windows servicing

Uptime since reboot — days since the last boot (higher is worse — "state rot").
Pending-reboot age — days a reboot has been pending (0 = none).
Windows Update age — days since the most recent update installed.

Security & compliance

Windows Firewall on (all profiles) — 1 if Domain, Private, and Public are all enabled.
System drive BitLocker on — 1 if the system drive is encrypted (Unknown on Windows Home, which has no BitLocker).

Hardware & time

Battery health (capacity vs design) — full-charge capacity as a % of the original design capacity (100 = new; reads 100 on desktops with no battery). A replacement-planning signal.
System clock drift — absolute clock offset in seconds vs an external time source. Works on non-domain and Entra-ID-joined devices; large drift breaks HTTPS/cert validation, Google/Microsoft sign-in, and MFA.
Print Spooler running — 1 if the Print Spooler service is running.

Delivery

Agent offline (server-derived) — hours since a device last checked in. K12Panel computes this itself, so it still works when the agent is dead.

Most are notify-only, but several pair with a built-in remediation (see Writing Custom Remediations) — for example Windows Update age with Reset Windows Update cache, Print Spooler running with Restart Print Spooler, System clock drift with Resync system time, and Windows Firewall on with Enable Windows Firewall.

Check templates

Some checks are the same measurement pointed at a different target — "is this service running," "is this host reachable." Rather than copy-paste a script per target, K12Panel provides templates: a check with a placeholder you fill in to stamp out a concrete, editable check.

The built-in Service running template takes a service name and produces an "is that service running" check (1/0). Go to PC Health → Checks, find it in the Templates section, click Use template, enter the service (for example wuauserv or W32Time) and a name, and optionally tick the box to also stamp the matching Restart service remediation in the same step. The result is an ordinary custom check you can edit, band, and bind like any other. Using a template requires the Architect role.

Authoring a custom check

Go to PC Health → Checks → New Custom Check and fill in:

Name and Slug — the slug auto-generates from the name; it's the stable identifier.
Value unit, Direction, Display mode, gauge min/max.
Timeout — how long the measurement may take before it's reported as Unknown.
PowerShell — your measurement, in the built-in editor.
Default bands — optional starting thresholds (JSON).

Then Test on a device to run it once on a machine of your choice; the raw output (your number, or an error) appears in that device's command history so you can confirm it behaves.
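
Before you reach for Test on a device, you can pre-flight a script in a local PowerShell session. The sketch below is not K12Panel's harness. Test-CheckOutput is a hypothetical helper that applies the rules stated in this article (exactly one output item, it must be a number, a throw means Unknown) and does not model the timeout.

```powershell
# Sketch only: Test-CheckOutput is a hypothetical local helper, not the K12Panel harness.
function Test-CheckOutput {
    param([Parameter(Mandatory)][scriptblock]$Script)

    try {
        $out = @(& $Script)          # collect everything the script emits
    } catch {
        return [pscustomobject]@{ Status = 'Unknown'; Value = $null; Reason = "threw: $($_.Exception.Message)" }
    }

    if ($out.Count -ne 1) {
        return [pscustomobject]@{ Status = 'Unknown'; Value = $null; Reason = "expected exactly 1 output item, got $($out.Count)" }
    }

    # [string] casts are culture-invariant in PowerShell, so 12.5 becomes "12.5".
    $text = [string]$out[0]
    $number = 0.0
    $ok = [double]::TryParse($text, [System.Globalization.NumberStyles]::Float,
                             [cultureinfo]::InvariantCulture, [ref]$number)
    if (-not $ok) {
        return [pscustomobject]@{ Status = 'Unknown'; Value = $null; Reason = "not a number: '$text'" }
    }

    [pscustomobject]@{ Status = 'Ok'; Value = $number; Reason = '' }
}

Test-CheckOutput { 12.5 }                      # Status Ok, Value 12.5
Test-CheckOutput { 'step 1'; 12.5 }            # Status Unknown (2 output items)
Test-CheckOutput { throw 'sensor missing' }    # Status Unknown (threw)
```

A script that passes locally can still behave differently under the agent's account, so treat this as a first filter and keep Test on a device as the real confirmation.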

Editing note: changing a check's script or bands bumps its version and clears its live state across devices, so stale readings can't linger. The check re-measures on the next check-in.

Sample checks

Each of these outputs a single number. Suggested bands are included — adjust to taste.

Days since last reboot (unit: days, lower is better)

((Get-Date) - (Get-CimInstance Win32_OperatingSystem).LastBootUpTime).TotalDays

Suggested bands: [{"op":">","value":30,"severity":"critical"},{"op":">","value":14,"severity":"warning"}]

Free space on the system drive (unit: %, higher is better)

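# Strip the colon from "C:" to get the PSDrive name, then compute percent free.
$d = Get-PSDrive -Name ($env:SystemDrive.TrimEnd(':')) -ErrorAction Stop
[math]::Round($d.Free / ($d.Free + $d.Used) * 100, 2)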

Suggested bands: [{"op":"<","value":5,"severity":"critical","clear_value":8},{"op":"<","value":15,"severity":"warning","clear_value":18}]

Free physical memory (unit: %, higher is better)

$os = Get-CimInstance Win32_OperatingSystem
[math]::Round($os.FreePhysicalMemory / $os.TotalVisibleMemorySize * 100, 1)

Suggested bands: [{"op":"<","value":5,"severity":"warning"}]

Is a critical service running? (unit: count, 1 = running, higher is better)

if ((Get-Service -Name 'Spooler' -ErrorAction Stop).Status -eq 'Running') { 1 } else { 0 }

Suggested bands: [{"op":"<","value":1,"severity":"critical"}] — a 0 means the service is stopped. (Pair this one with a Restart Service remediation; see Writing Custom Remediations.) You usually don't need to write this by hand — the built-in Service running template stamps exactly this check for any service you name; see Check templates above.

Is a reboot pending? (unit: count, 0 = no, lower is better)

$pending = (Test-Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending') -or
           (Test-Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired')
if ($pending) { 1 } else { 0 }

Suggested bands: [{"op":">","value":0,"severity":"warning"}]

Count of stopped automatic-start services (unit: count, lower is better)

# @() forces an array, so .Count is a number even when zero or one service matches.
@(Get-CimInstance Win32_Service -Filter "StartMode='Auto' AND State!='Running'").Count

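Suggested bands: [{"op":">","value":0,"severity":"warning"}]. Some automatic services (delayed-start and trigger-start ones, for example) stop on their own when idle, so expect a nonzero baseline on some machines and raise the threshold to suit your fleet.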

Days since last successful backup (unit: days, lower is better — demonstrates throw)

$marker = 'C:\Backups\last_success.txt'
if (-not (Test-Path $marker)) { throw 'backup success marker not found' }
((Get-Date) - (Get-Item $marker).LastWriteTime).TotalDays

Suggested bands: [{"op":">","value":2,"severity":"critical"},{"op":">","value":1,"severity":"warning"}]

If the marker is missing, the script throws and the check reads Unknown, which is itself a signal worth surfacing.

Best practices

Return one number, nothing else. If you must compute intermediate values, don't print them — only the final number should reach output.
Throw instead of guessing. If you can't get a trustworthy reading, throw. A wrong number is worse than an honest Unknown.
Keep it read-only and fast. Checks run fleet-wide on a schedule. Avoid slow queries, network calls, and anything that changes the device.
Set a realistic timeout. Long enough for a healthy device, short enough that a hung measurement is reported as Unknown promptly.
Test on a device before binding it broadly. Use Test on a device and confirm the raw output is the single number you expect.
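
The most common way to break "one number, nothing else" is a method or cmdlet whose return value leaks into output. Here is a small illustration using ArrayList.Add, which returns the index of the added item. The two script blocks are examples only.

```powershell
# Noisy: each Add() call returns an index, and that index becomes script output.
$noisy = {
    $list = New-Object System.Collections.ArrayList
    $list.Add('a')
    $list.Add('b')
    $list.Count
}

# Quiet: discard the unwanted return values so only the final count is emitted.
$quiet = {
    $list = New-Object System.Collections.ArrayList
    $null = $list.Add('a')      # assign to $null ...
    [void]$list.Add('b')        # ... or cast to [void]
    $list.Count
}

@(& $noisy).Count   # 3 items reach output: 0, 1, 2
@(& $quiet).Count   # 1 item reaches output: 2
```

Piping to Out-Null works as well. Whichever form you choose, the final line of the script should be the only expression left unassigned.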

Frequently Asked Questions

My check shows "Unknown" — what does that mean? It means K12Panel didn't get a reliable number — usually because your script threw, timed out, or produced something that wasn't a single number. Unknown is intentional; it's the honest state for "couldn't measure." Test the script on a device to see its raw output.

Can a check return a decimal, or only whole numbers? Either. 12, 12.5, and 0 are all valid. Percentages and "days since" values are commonly decimals.

How do I make a yes/no check? Return 1 for one state and 0 for the other, then band on it — for example [{"op":"<","value":1,"severity":"critical"}] treats 0 as critical. The service-running and reboot-pending samples above use this pattern.

Where do thresholds live — in the check or the policy? Thresholds (bands) live in the policy, per check. The check only measures. You can set default bands on the check so policies start from a sensible baseline, but each policy can override them — which is how the same check can be strict for servers and relaxed for laptops.

What is clear_value for? Hysteresis. Without it, a value sitting right on a threshold flaps between alert and clear. clear_value requires the metric to recover past a second, safer point before the issue clears.

Why did my check's history disappear after I edited it? Editing a check's script or bands clears its live state on purpose, so old readings taken under different logic don't linger. It re-measures on the next check-in.

Can a check fix a problem it finds? No — checks only measure. To act on what a check finds, wire a remediation to it in the policy editor. See Writing Custom Remediations.