Writing Custom Remediations (Self-Healing)

This article covers the remediation contract, how fixes are verified, and gives ready-to-use sample scripts. For the concepts, see About PC Health and Self-Healing; for the workflow and how to wire a fix to a check, see Using PC Health in K12Panel. Custom remediations are authored by users with the Architect role.

How a remediation mirrors a check

It helps to hold both ideas side by side:

A check is a sensor: compute one number, or throw if you can't measure.
A remediation is an actuator: perform one fix, or throw if it failed.

Both are short PowerShell scripts. The server orchestrates them: it decides when to run a fix, runs it, and then re-runs the check to decide whether the fix actually worked.

The remediation contract

Every script remediation obeys this rule:

Do the fix and return normally to signal success — or throw to signal failure. Never call exit.

The details:

Completing without a terminating error = success. If your script finishes normally, K12Panel treats the action as provisionally successful.
throw (or any terminating error) = failure. K12Panel runs your script with $ErrorActionPreference = 'Stop', so most cmdlet errors become terminating errors: they fail the run on their own unless you handle them with try/catch. To signal a definite failure yourself, throw.
Do not use exit. K12Panel wraps your script in a harness that reports the outcome; an exit would end the process before that report is emitted. Use throw for failure and simply return for success.
Be idempotent. A fix may be retried, so running it twice must not make things worse. Deleting temp files twice is harmless — design for that.
Stay within the timeout. You set a timeout; a hung fix is bounded so it can't wedge the device.

You never format a result or manage the success signal yourself — the harness does that. You just perform the action.
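
To make the contract concrete, here is a minimal sketch of how a wrapper could turn "return normally or throw" into a reported outcome. It is not K12Panel's actual harness: the function name Invoke-RemediationSketch and the shape of the result object are assumptions made purely for illustration.

```powershell
# Hypothetical wrapper, for illustration only. K12Panel's real harness is not shown in this article.
function Invoke-RemediationSketch {
    param([Parameter(Mandatory)][scriptblock]$Fix)

    # Mirrors the documented setting: cmdlet errors become terminating errors.
    $ErrorActionPreference = 'Stop'
    try {
        & $Fix | Out-Null          # output from the fix is ignored; only completion matters
        [pscustomobject]@{ Result = 'Success'; Message = '' }
    }
    catch {
        [pscustomobject]@{ Result = 'Failed'; Message = $_.Exception.Message }
    }
}

# A fix that returns normally is reported as a success.
$ok = Invoke-RemediationSketch -Fix { $null = 1 + 1 }

# A fix that throws is reported as a failure, with the thrown text as the message.
$bad = Invoke-RemediationSketch -Fix { throw 'service did not start' }
```

In a wrapper shaped like this, an exit inside the fix would end the PowerShell process before either result object is produced, which is the situation the contract is designed to avoid.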

Success is decided by the check, not the fix

This is the most important idea in K12Panel self-healing. A remediation returning "success" only means the script ran cleanly — not that the problem is gone. A disk-cleanup script can delete everything it can find and still not cross your free-space threshold.

So after a remediation runs, K12Panel re-runs the failing check and uses that result as the verdict:

Check now passes → the issue resolves (quietly).
Check still fails → recorded as Still failing; K12Panel retries within your limits and, when it runs out, raises the alert for a human.

You'll see this in the Runs log as two columns: Result (the fix's own report) and Verified (the check's verdict). A row reading Success / Still failing is the system correctly refusing to declare victory.
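
The two columns can be pictured with a small simulation. This is a sketch under stated assumptions, not K12Panel's implementation: the function name, the "Healed" label, and the "pass when the value is at or above the threshold" rule are all hypothetical.

```powershell
# Hypothetical sketch of the two-column verdict. Names, labels, and the
# "pass when value >= threshold" rule are assumptions, not K12Panel internals.
function Get-RunVerdictSketch {
    param(
        [Parameter(Mandatory)][scriptblock]$Fix,
        [Parameter(Mandatory)][scriptblock]$Check,   # returns one number, or throws
        [Parameter(Mandatory)][double]$Threshold
    )

    # Result: did the fix script itself complete without throwing?
    $result = 'Success'
    try { & $Fix | Out-Null } catch { $result = 'Failed' }

    # Verified: re-run the check, whatever the fix reported about itself.
    $verified = 'Still failing'
    try {
        if ([double](& $Check) -ge $Threshold) { $verified = 'Healed' }
    }
    catch { $verified = 'Still failing' }   # assumption: a check that cannot measure is not a pass

    [pscustomobject]@{ Result = $result; Verified = $verified }
}

# Simulated device: 5 GB free, the cleanup frees only 2 GB, the threshold is 10 GB.
$state = @{ FreeGB = 5 }
$row = Get-RunVerdictSketch -Fix { $state.FreeGB += 2 } -Check { $state.FreeGB } -Threshold 10
# $row.Result is 'Success' and $row.Verified is 'Still failing'
```

The fix did everything it could and still fell short, so the row reads Success / Still failing and the issue stays open.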

Optional self-verify: your script may confirm its own work (restart a service, then throw if it isn't running) for a faster, stronger signal. It's never required — the check re-run is authoritative — but it makes the Runs log more informative.
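
If you do self-verify, polling for the expected state is usually more reliable than a single fixed sleep. The helper below is a hypothetical sketch (Wait-Until is not a K12Panel or built-in PowerShell command); it throws if the condition is not met in time, which the contract treats as a failure.

```powershell
# Hypothetical helper (not part of K12Panel): poll a condition until it is true or time runs out.
function Wait-Until {
    param(
        [Parameter(Mandatory)][scriptblock]$Condition,
        [int]$TimeoutSeconds = 15,
        [int]$PollMilliseconds = 500,
        [string]$FailureMessage = 'condition not met before timeout'
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        if ((Get-Date) -ge $deadline) { throw $FailureMessage }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
}

# Usage inside a remediation (Windows; the service name is only an example):
# Restart-Service -Name 'Spooler' -Force
# Wait-Until -Condition { (Get-Service -Name 'Spooler').Status -eq 'Running' } `
#            -FailureMessage 'Spooler is not running after restart'
```

Keep the polling timeout comfortably below the remediation's own timeout so the script fails with a clear message of its own.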

Action types

A remediation's action can be:

Run script — your PowerShell, following the contract above. This is what you author.
Run built-in command — a native K12Panel command such as Reboot Device or Restart Agent. No script; K12Panel dispatches the command.
Apply modifier — reserved for a future release.

Disruptive vs. non-disruptive

Flag a remediation disruptive if it interrupts the user (a reboot, an agent restart). Disruptive fixes only run automatically inside the policy's maintenance window; non-disruptive fixes (like clearing temp files) run whenever they're needed. Set this honestly — it's the switch that keeps automation from rebooting a device mid-class.
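
The effect of the flag can be sketched as a small decision function. Everything here is an assumption for illustration: the function name, the parameter names, and the idea that a window is expressed as a start and end time of day are not taken from K12Panel.

```powershell
# Hypothetical sketch: how a window such as 22:00 to 05:00 could gate disruptive fixes.
# Parameter names and the window format are assumptions, not K12Panel settings.
function Test-AutoRunAllowedSketch {
    param(
        [Parameter(Mandatory)][bool]$Disruptive,
        $WindowStart,                       # e.g. '22:00'; leave empty for a blank window
        $WindowEnd,                         # e.g. '05:00'
        [datetime]$Now = (Get-Date)
    )

    if (-not $Disruptive) { return $true }                       # non-disruptive: any time
    if (-not $WindowStart -or -not $WindowEnd) { return $false } # blank window: never automatic

    $start = [timespan]$WindowStart
    $end   = [timespan]$WindowEnd
    $t     = $Now.TimeOfDay

    if ($start -le $end) {
        return ($t -ge $start -and $t -lt $end)   # window within one day
    }
    return ($t -ge $start -or $t -lt $end)        # window that crosses midnight
}

$duringClass = [datetime]'2024-03-04T10:30:00'
$lateEvening = [datetime]'2024-03-04T23:15:00'

$rebootInClass = Test-AutoRunAllowedSketch -Disruptive $true -WindowStart '22:00' -WindowEnd '05:00' -Now $duringClass   # False
$rebootAtNight = Test-AutoRunAllowedSketch -Disruptive $true -WindowStart '22:00' -WindowEnd '05:00' -Now $lateEvening   # True
$cleanInClass  = Test-AutoRunAllowedSketch -Disruptive $false -Now $duringClass                                          # True
```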

How a fix runs (the loop)

Once you've wired a remediation to a check in the policy editor (with a mode, tries, and cooldown — see Using PC Health in K12Panel):

A check breaches its threshold and raises an issue.
K12Panel's gate decides whether to act — respecting Suggested vs. Automatic, attempt limits, cooldowns, maintenance windows, mute state, and the circuit breaker.
If it acts, the fix is dispatched to the device and the alert is suppressed while the fix is actively running.
When the fix returns, K12Panel re-runs just that check to verify.
Healed → resolved. Still failing → retry (within limits) or, once exhausted, alert a human.

Because verification re-runs only the relevant check rather than a full device scan, it's lightweight even across a large fleet.
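
The gate in step 2 can be read as a series of short-circuit tests. The sketch below is one plausible ordering, not K12Panel's actual logic; the function name, parameter names, default values, and return strings are all hypothetical.

```powershell
# Hypothetical gate, for illustration. Parameter names, defaults, and the order of the tests are assumptions.
function Get-GateDecisionSketch {
    param(
        [ValidateSet('Suggested', 'Automatic')][string]$Mode = 'Suggested',
        [int]$AttemptsSoFar = 0,
        [int]$MaxTries = 3,
        [double]$MinutesSinceLastAttempt = [double]::PositiveInfinity,
        [double]$CooldownMinutes = 30,
        [bool]$WindowAllows = $true,      # false for a disruptive fix outside the maintenance window
        [bool]$Muted = $false,
        [bool]$Halted = $false            # true when the circuit breaker has halted the remediation
    )

    if ($Halted) { return 'Skip: halted by circuit breaker' }
    if ($Muted)  { return 'Skip: muted' }
    if ($Mode -ne 'Automatic') { return 'Skip: Suggested, waiting for a manual Run' }
    if ($AttemptsSoFar -ge $MaxTries) { return 'Alert: tries exhausted' }
    if ($MinutesSinceLastAttempt -lt $CooldownMinutes) { return 'Skip: cooling down' }
    if (-not $WindowAllows) { return 'Skip: outside maintenance window' }
    return 'Run'
}

$first     = Get-GateDecisionSketch -Mode Automatic                                              # 'Run'
$tooSoon   = Get-GateDecisionSketch -Mode Automatic -AttemptsSoFar 1 -MinutesSinceLastAttempt 5  # 'Skip: cooling down'
$exhausted = Get-GateDecisionSketch -Mode Automatic -AttemptsSoFar 3                             # 'Alert: tries exhausted'
```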

The fleet circuit breaker

If an automatic remediation starts failing across many devices, K12Panel halts it automatically and raises a Remediation halted alert — so one bad script can't run fleet-wide. Halted remediations appear at the top of the Remediations tab; after you fix and test the script, an Architect can Re-enable it.
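
One way to picture a fleet breaker is as a ratio test over recent runs. This article does not state K12Panel's real thresholds or algorithm, so the minimum device count, the failure ratio, and the function name below are illustrative assumptions only.

```powershell
# Hypothetical breaker test. The thresholds and the input shape are assumptions, not K12Panel values.
function Test-BreakerTrippedSketch {
    param(
        [Parameter(Mandatory)][object[]]$Runs,   # objects with Device and Verified properties
        [int]$MinDevices = 5,                    # do not judge on a handful of devices
        [double]$FailureRatio = 0.5              # trip when at least half the devices are still failing
    )

    $devices = @($Runs | Group-Object -Property Device)
    if ($devices.Count -lt $MinDevices) { return $false }

    $failing = @($devices | Where-Object { $_.Group.Verified -contains 'Still failing' })
    return (($failing.Count / $devices.Count) -ge $FailureRatio)
}

# Six simulated devices, four of which are still failing after the fix.
$runs = 1..6 | ForEach-Object {
    [pscustomobject]@{
        Device   = "PC-$_"
        Verified = $(if ($_ -le 4) { 'Still failing' } else { 'Healed' })
    }
}
$tripped = Test-BreakerTrippedSketch -Runs $runs   # True: 4 of 6 devices is above the 0.5 ratio
```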

Built-in remediations

You start with a catalog of ready-to-use fixes. All are Windows-only and delivered on the agent's normal check-in.

Non-disruptive — run whenever they're needed:

Disk Cleanup — clears user and Windows temp folders, the Windows Update download cache, and the recycle bin.
Reset Windows Update cache — stops Windows Update, clears the download cache so stuck or partial updates re-fetch, and restarts it (no reboot). Pairs with Windows Update age.
Restart Print Spooler + clear queue — restarts the spooler and clears stuck print jobs, self-verifying it comes back running. Pairs with Print Spooler running.
Resync system time — ensures the Windows Time service is running and forces a resync against its source. Pairs with System clock drift.
Enable Windows Firewall (all profiles) — turns the firewall back on for Domain, Public, and Private and self-verifies. Pairs with Windows Firewall on.

Disruptive — auto-run only inside the policy's maintenance window:

Reboot Device — reboots the machine.
Restart Agent — restarts the K12Panel agent service.

Remediation templates

Like checks, some fixes are one action pointed at a different target. K12Panel provides templates for these: a remediation with a placeholder you fill in to stamp out a concrete, editable fix.

The built-in Restart a service template takes a service name and produces a "restart that service and verify it came back running" remediation. Find it in the Templates section of PC Health → Remediations, click Use template, and enter the service and a name. It's the natural partner to the Service running check template — when you stamp that check, you can create the matching restart in the same step. Using a template requires the Architect role.
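
Conceptually, stamping a template is a placeholder substitution. The sketch below assumes a {{ServiceName}} placeholder and a hypothetical New-RemediationFromTemplateSketch function; K12Panel's own template format is not documented here and may differ.

```powershell
# Hypothetical template text and placeholder syntax ({{ServiceName}}); K12Panel's own format may differ.
$template = @'
Restart-Service -Name '{{ServiceName}}' -Force -ErrorAction Stop
Start-Sleep -Seconds 3
if ((Get-Service -Name '{{ServiceName}}').Status -ne 'Running') {
    throw '{{ServiceName}} is not running after restart'
}
'@

function New-RemediationFromTemplateSketch {
    param(
        [Parameter(Mandatory)][string]$Template,
        # Assumption: restrict the name to simple characters so it cannot break out of the quotes.
        [Parameter(Mandatory)][ValidatePattern('^[A-Za-z0-9_.\-]+$')][string]$ServiceName
    )
    # String.Replace is a literal replacement, so the name is never treated as a pattern.
    $Template.Replace('{{ServiceName}}', $ServiceName)
}

$script = New-RemediationFromTemplateSketch -Template $template -ServiceName 'Spooler'
```

The result is ordinary script text, which is why a stamped remediation stays editable afterwards.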

Authoring a custom remediation

Go to PC Health → Remediations → New Custom Remediation and set:

Name and Slug (slug auto-generates).
Action type — usually Run script.
Disruptive — check it only if the fix interrupts the user.
Timeout — how long the fix may take.
PowerShell — your action, in the built-in editor.

Then Test on a device to dispatch it once to a machine of your choice and confirm it behaves before wiring it to a policy. Finally, open the relevant policy, find the check you want to heal, and set the remediation, mode, and limits on that check's row.

Sample remediations

Each follows the contract: return normally for success, throw for failure, never exit. Pair each with a check that measures the same condition so verification is meaningful. A few of these now ship as built-ins (spooler restart, Windows Update cache reset, service restart via the Restart a service template) — reach for those first; the scripts below are here to teach the pattern and to adapt for cases K12Panel doesn't cover.

Restart a service (with self-verify) — pairs with the "is a critical service running?" check.

$service = 'Spooler'          # change to the service this remediation restarts
Restart-Service -Name $service -Force -ErrorAction Stop
Start-Sleep -Seconds 3
if ((Get-Service -Name $service).Status -ne 'Running') {
    throw "$service is not running after restart"
}
# returns normally -> success; the service check re-verifies

Clear an application cache (best-effort, non-disruptive) — pairs with a disk-free or folder-size check.

# best-effort: files that are locked or already gone are skipped instead of failing the run
Remove-Item -Path 'C:\ProgramData\YourApp\Cache\*' -Recurse -Force -ErrorAction SilentlyContinue
# completing = success; the disk/folder check decides if it helped

Restart the print spooler and clear the stuck queue — pairs with a spooler-running check.

Stop-Service -Name Spooler -Force -ErrorAction Stop
Remove-Item -Path "$env:SystemRoot\System32\spool\PRINTERS\*" -Force -ErrorAction SilentlyContinue
Start-Service -Name Spooler -ErrorAction Stop
Start-Sleep -Seconds 2
if ((Get-Service -Name Spooler).Status -ne 'Running') { throw 'spooler not running after restart' }

Reset the Windows Update download cache. Pairs with an update-related check such as Windows Update age.

Stop-Service -Name wuauserv -Force -ErrorAction Stop
Remove-Item -Path "$env:SystemRoot\SoftwareDistribution\Download\*" -Recurse -Force -ErrorAction SilentlyContinue
Start-Service -Name wuauserv -ErrorAction Stop

Stop a runaway process — pairs with a per-process memory/CPU check.

$name = 'HungApp'             # change to the process this remediation stops
Get-Process -Name $name -ErrorAction SilentlyContinue | Stop-Process -Force
# completing = success; the process check re-verifies the condition cleared

Best practices

Return for success, throw for failure, never exit. This is the contract the harness depends on.
Be idempotent. Assume the fix may run more than once. Never let a second run cause harm.
Keep it targeted. Fix the one condition; don't bundle unrelated actions into a single remediation.
Pair every remediation with a check. Verification re-runs the check — a fix with nothing to verify against can't be confirmed and will just alert.
Mark disruptive fixes disruptive. That way they respect the maintenance window and never surprise a user mid-work.
Start Suggested. Wire a new fix as Suggested first, watch it in the Runs log, and only promote it to Automatic once you trust it.
Test on a device first. Confirm the script does what you expect on one machine before it touches the fleet.
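
As an illustration of the idempotent, targeted style these practices describe, here is a sketch of a cleanup helper that can safely run any number of times. The function name Remove-StaleFiles, the seven-day default, and the example path are hypothetical.

```powershell
# Hypothetical helper: delete files older than a cutoff. Safe to run repeatedly.
function Remove-StaleFiles {
    param(
        [Parameter(Mandatory)][string]$Path,
        [int]$OlderThanDays = 7
    )

    # A missing folder means there is nothing to fix, which is not a failure.
    if (-not (Test-Path -LiteralPath $Path -PathType Container)) { return }

    $cutoff = (Get-Date).AddDays(-$OlderThanDays)
    Get-ChildItem -LiteralPath $Path -File -Recurse -Force -ErrorAction SilentlyContinue |
        Where-Object { $_.LastWriteTime -lt $cutoff } |
        Remove-Item -Force -ErrorAction SilentlyContinue
    # Locked files are skipped on purpose; the paired disk or folder-size check decides if this was enough.
}

# Usage inside a remediation (the path is only an example):
# Remove-StaleFiles -Path 'C:\ProgramData\YourApp\Cache' -OlderThanDays 14
```

A second run finds nothing older than the cutoff and does nothing, so a retry cannot make things worse.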

Frequently Asked Questions

My remediation shows "Success" but the issue didn't clear. That's expected and correct. "Success" means the script ran cleanly; the Verified column shows the check's verdict. "Success / Still failing" means the fix executed but didn't resolve the condition (for example, cleanup couldn't free enough disk). K12Panel keeps the issue open and retries within your limits rather than falsely declaring it fixed.

Why can't I use exit 0 to signal success? Because K12Panel wraps your script in a harness that reports the outcome after your code runs. exit would terminate before that report is produced. Return normally for success and throw for failure instead — the harness handles the rest.

How does K12Panel know my fix failed? Two ways: your script throws (or a cmdlet errors, since it runs under $ErrorActionPreference = 'Stop'), which marks the attempt a failure; and, independently, the check is re-run to verify. Even a fix that reports success is judged "still failing" if the check disagrees.
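
The effect of $ErrorActionPreference = 'Stop' is easy to see by running the same failing cmdlet under both settings. This is a standalone demonstration of standard PowerShell behaviour; the missing path is generated only for the example.

```powershell
# The same failing cmdlet call under two preference settings.
$previous = $ErrorActionPreference
$missing  = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())

# 'Continue' (the PowerShell default): the error is written and the script carries on.
$ErrorActionPreference = 'Continue'
$reachedEnd = $false
Get-Item -LiteralPath $missing 2>$null
$reachedEnd = $true            # reached: the error was non-terminating

# 'Stop' (how this article says remediations run): the same error terminates, so it can be caught.
$ErrorActionPreference = 'Stop'
$caught = $false
try { Get-Item -LiteralPath $missing }
catch { $caught = $true }      # reached: the error became terminating

$ErrorActionPreference = $previous
```

One caveat: an external program that returns a non-zero exit code is generally not turned into an error by this setting. If your fix calls an .exe, check $LASTEXITCODE afterwards and throw yourself when it indicates failure.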

Will an automatic fix keep retrying forever? No. Each check's row in the policy sets a maximum number of tries and a cooldown between them. Once tries are exhausted, K12Panel stops and raises the alert for a person. And if a fix is failing across many devices, the circuit breaker halts it fleet-wide.

Will a reboot fix run during class? Only if you let it. Reboots are disruptive and run automatically only inside the policy's maintenance window. Set that to after hours, or leave it blank to prevent automatic reboots entirely (you can still run one by hand).

Do I need to deploy the script to devices or update the agent? No. Remediations ride the agent's normal check-in. Create or edit one and it's available on the next check-in — nothing to install, no agent update.

A remediation got "halted." What do I do? It failed across enough devices that the circuit breaker stopped it to protect your fleet. Open the Remediations tab, edit the script, and test it on one device until it behaves as expected. An Architect can then Re-enable it.

Can a remediation run without a person clicking anything? Yes — set its mode to Automatic on the check's row in the policy editor. It then runs unattended within all the guardrails (attempt limits, cooldowns, maintenance window, circuit breaker). Until you do that, it stays Suggested and only runs when someone clicks Run.