
Writing Custom Remediations (Self-Healing)

A remediation is the actuator in K12Panel's self-healing loop: an action that attempts to fix a condition a check has flagged — free up disk space, restart a stopped service, reboot a stuck device. K12Panel ships with built-in remediations, and you can write your own as short PowerShell scripts, exactly the way you write checks.

This article covers the remediation contract, how fixes are verified, and gives ready-to-use sample scripts. For the concepts, see About PC Health and Self-Healing; for the workflow and how to wire a fix to a check, see Using PC Health in K12Panel. Custom remediations are authored by users with the Architect role.


How a remediation mirrors a check

It helps to hold both ideas side by side:

  • A check is a sensor: compute one number, or throw if you can't measure.
  • A remediation is an actuator: perform one fix, or throw if it failed.

Both are short PowerShell scripts. The server orchestrates them: it decides when to run a fix, runs it, and then re-runs the check to decide whether the fix actually worked.


The remediation contract

Every script remediation obeys this rule:

Do the fix and return normally to signal success — or throw to signal failure. Never call exit.

The details:

  • Completing without a terminating error = success. If your script finishes normally, K12Panel treats the action as provisionally successful.
  • throw (or any terminating error) = failure. K12Panel runs your script with $ErrorActionPreference = 'Stop', so most cmdlet errors are already promoted to terminating errors and count as failures. To signal a failure yourself, throw.
  • Do not use exit. K12Panel wraps your script in a harness that reports the outcome; an exit would end the process before that report is emitted. Use throw for failure and simply return for success.
  • Be idempotent. A fix may be retried, so running it twice must not make things worse. Deleting temp files twice is harmless — design for that.
  • Stay within the timeout. You set a timeout; a hung fix is bounded so it can't wedge the device.

You never format a result or manage the success signal yourself — the harness does that. You just perform the action.
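To make the contract concrete, here is a minimal sketch of what a harness of this kind might do. It is an illustration only, not K12Panel's actual harness; the function name Invoke-RemediationSketch and the Result/Message fields are hypothetical. It runs the action under $ErrorActionPreference = 'Stop' and turns the outcome into a record: normal completion becomes Success, and any terminating error, whether from throw or from a cmdlet error promoted by 'Stop', becomes Failed. An exit inside the action would end the script before either record is produced, which is why the contract forbids it.

```powershell
# Hypothetical harness sketch for illustration only; not K12Panel's implementation.
# Runs a remediation action and converts its outcome into a result record.
function Invoke-RemediationSketch {
    param(
        [Parameter(Mandatory)]
        [scriptblock]$Action
    )

    # Same preference the contract describes: cmdlet errors become terminating.
    $ErrorActionPreference = 'Stop'

    try {
        & $Action | Out-Null   # output is ignored; only completion matters
        [pscustomobject]@{ Result = 'Success'; Message = $null }
    }
    catch {
        # Reached by throw, or by a cmdlet error promoted by 'Stop'.
        [pscustomobject]@{ Result = 'Failed'; Message = $_.Exception.Message }
    }
}
```

For example, Invoke-RemediationSketch { throw 'service still stopped' } yields Result = Failed with that message, while an action that simply finishes yields Success.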


Success is decided by the check, not the fix

This is the most important idea in K12Panel self-healing. A remediation returning "success" only means the script ran cleanly — not that the problem is gone. A disk-cleanup script can delete everything it can find and still not cross your free-space threshold.

So after a remediation runs, K12Panel re-runs the failing check and uses that result as the verdict:

  • Check now passes → the issue resolves (quietly).
  • Check still fails → recorded as Still failing; K12Panel retries within your limits and, when it runs out, raises the alert for a human.

You'll see this in the Runs log as two columns: Result (the fix's own report) and Verified (the check's verdict). A row reading Success / Still failing is the system correctly refusing to declare victory.
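The fix-then-verify rule can be sketched the same way. This is a hypothetical illustration, not K12Panel's code: Invoke-FixAndVerify and its parameters are invented here, and the sketch assumes a check whose value passes when it is at or above a minimum (as with free disk space). The point is the shape: the fix's own outcome fills Result, and only the re-run check fills Verified. For simplicity the sketch re-runs the check after every attempt, including a failed one.

```powershell
# Hypothetical fix-then-verify sketch; names and the pass rule are assumptions.
function Invoke-FixAndVerify {
    param(
        [Parameter(Mandatory)][scriptblock]$Fix,    # the remediation action
        [Parameter(Mandatory)][scriptblock]$Check,  # returns one number
        [double]$MinimumPassing                     # check passes when value >= this
    )
    $ErrorActionPreference = 'Stop'

    # Result: what the fix itself reports (ran cleanly, or threw).
    try   { & $Fix | Out-Null; $result = 'Success' }
    catch { $result = 'Failed' }

    # Verified: the re-run check is the verdict, regardless of Result.
    $value = [double](& $Check)
    $verified = if ($value -ge $MinimumPassing) { 'Healed' } else { 'Still failing' }

    [pscustomobject]@{ Result = $result; Verified = $verified; CheckValue = $value }
}
```

A cleanup that frees 1 GB on a device with 3 GB free against a 10 GB minimum comes back as Success / Still failing, the same pattern described above.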

Optional self-verify: your script may confirm its own work (restart a service, then throw if it isn't running) for a faster, stronger signal. It's never required — the check re-run is authoritative — but it makes the Runs log more informative.
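For the self-verify step, a short polling loop is often more dependable than a fixed Start-Sleep, because a service can take a variable time to come up. A minimal sketch, assuming a hypothetical helper named Wait-Until that you paste into your own script (it is not a built-in cmdlet or a K12Panel function): it re-evaluates a condition until it holds or the deadline passes, and throws on timeout so the attempt is recorded as a failure. Keep its timeout comfortably below the remediation's own timeout.

```powershell
# Hypothetical helper, not part of K12Panel or PowerShell.
# Polls $Condition until it returns true; throws if the deadline passes first.
function Wait-Until {
    param(
        [Parameter(Mandatory)][scriptblock]$Condition,
        [int]$TimeoutSeconds = 30,
        [int]$IntervalMilliseconds = 500
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (& $Condition) { return }              # condition met: return normally
        Start-Sleep -Milliseconds $IntervalMilliseconds
    }
    throw "Condition not met within $TimeoutSeconds seconds: $Condition"
}

# Example use inside a Windows remediation:
# Restart-Service -Name Spooler -Force
# Wait-Until -Condition { (Get-Service -Name Spooler).Status -eq 'Running' } -TimeoutSeconds 20
```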


Action types

A remediation's action can be:

  • Run script — your PowerShell, following the contract above. This is what you author.
  • Run built-in command — a native K12Panel command such as Reboot Device or Restart Agent. No script; K12Panel dispatches the command.
  • Apply modifier (reserved for a future release).

Disruptive vs. non-disruptive

Flag a remediation disruptive if it interrupts the user (a reboot, an agent restart). Disruptive fixes only run automatically inside the policy's maintenance window; non-disruptive fixes (like clearing temp files) run whenever they're needed. Set this honestly — it's the switch that keeps automation from rebooting a device mid-class.
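A sketch of that rule under stated assumptions: the function name Test-MayRunNow is hypothetical, the window is expressed as local start and end times of day, and a window such as 22:00 to 05:00 is allowed to cross midnight. A blank window (both ends $null) blocks automatic disruptive fixes, matching the behavior described in the FAQ; K12Panel's own window handling may differ in detail.

```powershell
# Hypothetical sketch of the disruptive-fix rule; not K12Panel's code.
function Test-MayRunNow {
    param(
        [bool]$Disruptive,
        [datetime]$Now,
        [Nullable[timespan]]$WindowStart,   # $null = no maintenance window set
        [Nullable[timespan]]$WindowEnd
    )
    # Non-disruptive fixes run whenever they're needed.
    if (-not $Disruptive) { return $true }

    # No window configured: never run a disruptive fix automatically.
    if ($null -eq $WindowStart -or $null -eq $WindowEnd) { return $false }

    $t = $Now.TimeOfDay
    if ($WindowStart -le $WindowEnd) {
        # Same-day window, e.g. 01:00-05:00
        return ($t -ge $WindowStart -and $t -lt $WindowEnd)
    }
    # Window wraps past midnight, e.g. 22:00-05:00
    return ($t -ge $WindowStart -or $t -lt $WindowEnd)
}
```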


How a fix runs (the loop)

Once you've wired a remediation to a check in the policy editor (with a mode, tries, and cooldown — see Using PC Health in K12Panel):

  1. A check breaches its threshold and raises an issue.
  2. K12Panel's gate decides whether to act — respecting Suggested vs. Automatic, attempt limits, cooldowns, maintenance windows, mute state, and the circuit breaker.
  3. If it acts, the fix is dispatched to the device and the alert is suppressed while the fix is actively running.
  4. When the fix returns, K12Panel re-runs just that check to verify.
  5. Healed → resolved. Still failing → retry (within limits) or, once exhausted, alert a human.

Because verification re-runs only the relevant check rather than a full device scan, it's lightweight even across a large fleet.
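The gate in step 2 can be sketched as a single decision function. Everything here is illustrative: Get-GateDecision, its parameter names, the returned strings, and the order in which conditions are tested are assumptions, not K12Panel's implementation. InWindowOrSafe stands for the outcome of the disruptive/maintenance-window rule.

```powershell
# Hypothetical gate sketch; parameter names and check order are assumptions.
function Get-GateDecision {
    param(
        [ValidateSet('Suggested', 'Automatic')][string]$Mode,
        [int]$Attempts,                  # tries already made for this issue
        [int]$MaxTries,
        [Nullable[datetime]]$LastAttempt,
        [timespan]$Cooldown,
        [datetime]$Now,
        [bool]$Muted,
        [bool]$Halted,                   # circuit breaker tripped for this remediation
        [bool]$InWindowOrSafe            # non-disruptive, or inside the maintenance window
    )
    if ($Halted)                 { return 'Skip: halted by circuit breaker' }
    if ($Muted)                  { return 'Skip: muted' }
    if ($Mode -ne 'Automatic')   { return 'Suggest: wait for someone to click Run' }
    if ($Attempts -ge $MaxTries) { return 'Alert: tries exhausted' }
    if ($null -ne $LastAttempt -and $Now -lt $LastAttempt + $Cooldown) {
        return 'Skip: cooling down'
    }
    if (-not $InWindowOrSafe)    { return 'Skip: outside maintenance window' }
    return 'Run'
}
```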


The fleet circuit breaker

If an automatic remediation starts failing across many devices, K12Panel halts it automatically and raises a Remediation halted alert — so one bad script can't run fleet-wide. Halted remediations appear at the top of the Remediations tab; after you fix and test the script, an Architect can Re-enable it.
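A rough sketch of such a breaker, assuming a list of recent runs that each carry a device name and a Healed flag. The name Test-ShouldHalt and both thresholds (at least 5 failing devices and a failure rate of 50 percent or more) are invented for illustration; K12Panel's actual trip condition may differ. Counting distinct devices rather than raw runs keeps one device that retries repeatedly from tripping the breaker on its own.

```powershell
# Hypothetical circuit-breaker sketch; thresholds are illustrative only.
function Test-ShouldHalt {
    param(
        [object[]]$RecentRuns,          # items with .Device and .Healed
        [int]$MinFailures = 5,
        [double]$MaxFailureRate = 0.5
    )
    if (-not $RecentRuns) { return $false }

    # One entry per device; a device counts as failing if any recent run didn't heal.
    $byDevice = @($RecentRuns | Group-Object -Property Device)
    $failedDevices = @($byDevice | Where-Object { $_.Group.Healed -contains $false }).Count
    $rate = $failedDevices / $byDevice.Count

    return ($failedDevices -ge $MinFailures -and $rate -ge $MaxFailureRate)
}
```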


Built-in remediations

K12Panel includes three ready-to-use remediations:

  • Disk Cleanup (non-disruptive) — clears user and Windows temp folders, the Windows Update download cache, and the recycle bin.
  • Reboot Device (disruptive) — reboots the machine.
  • Restart Agent (disruptive) — restarts the K12Panel agent service.

Authoring a custom remediation

Go to PC Health → Remediations → New Custom Remediation and set:

  • Name and Slug (slug auto-generates).
  • Action type — usually Run script.
  • Disruptive — check it only if the fix interrupts the user.
  • Timeout — how long the fix may take.
  • PowerShell — your action, in the built-in editor.

Then Test on a device to dispatch it once to a machine of your choice and confirm it behaves before wiring it to a policy. Finally, open the relevant policy, find the check you want to heal, and set the remediation, mode, and limits on that check's row.


Sample remediations

Each follows the contract: return normally for success, throw for failure, never exit. Pair each with a check that measures the same condition so verification is meaningful.

Restart a service (with self-verify) — pairs with the "is a critical service running?" check.

$service = 'Spooler'          # change to the service this remediation restarts
Restart-Service -Name $service -Force -ErrorAction Stop
Start-Sleep -Seconds 3
if ((Get-Service -Name $service).Status -ne 'Running') {
    throw "$service is not running after restart"
}
# returns normally -> success; the service check re-verifies

Clear an application cache (best-effort, non-disruptive) — pairs with a disk-free or folder-size check.

$ErrorActionPreference = 'SilentlyContinue'
Remove-Item -Path 'C:\ProgramData\YourApp\Cache\*' -Recurse -Force
# ignores its own errors, so it always completes; the disk/folder check decides if it helped

Restart the print spooler and clear the stuck queue — pairs with a spooler-running check.

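# stop the spooler first so the queued job files are not locked
Stop-Service -Name Spooler -Force -ErrorAction Stop
# delete the stuck jobs; a file that can't be removed is skipped
Remove-Item -Path "$env:SystemRoot\System32\spool\PRINTERS\*" -Force -ErrorAction SilentlyContinue
Start-Service -Name Spooler -ErrorAction Stop
Start-Sleep -Seconds 2
# self-verify: throw if the spooler didn't come back
if ((Get-Service -Name Spooler).Status -ne 'Running') {
    throw 'spooler not running after restart'
}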

Reset the Windows Update download cache — pairs with a "reboot pending" or update-related check.

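# stop Windows Update so the download cache is not in use
Stop-Service -Name wuauserv -Force -ErrorAction Stop
# delete cached downloads; files that are still locked are skipped
Remove-Item -Path "$env:SystemRoot\SoftwareDistribution\Download\*" -Recurse -Force -ErrorAction SilentlyContinue
# a failure to restart the service throws and marks the attempt failed
Start-Service -Name wuauserv -ErrorAction Stop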

Stop a runaway process — pairs with a per-process memory/CPU check.

$name = 'HungApp'             # change to the process this remediation stops
Get-Process -Name $name -ErrorAction SilentlyContinue | Stop-Process -Force
# completing = success; the process check re-verifies the condition cleared

Best practices

  • Return for success, throw for failure, never exit. This is the contract the harness depends on.
  • Be idempotent. Assume the fix may run more than once. Never let a second run cause harm.
  • Keep it targeted. Fix the one condition; don't bundle unrelated actions into a single remediation.
  • Pair every remediation with a check. Verification re-runs the check — a fix with nothing to verify against can't be confirmed and will just alert.
  • Mark disruptive fixes disruptive. That way they respect the maintenance window and never surprise a user mid-work.
  • Start Suggested. Wire a new fix as Suggested first, watch it in the Runs log, and only promote it to Automatic once you trust it.
  • Test on a device first. Confirm the script does what you expect on one machine before it touches the fleet.
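A minimal sketch of the idempotent pattern, using a hypothetical folder named K12ExampleCache under the system temp path: check the current state first, act only on what is there, and treat "nothing to do" as success. The function wrapper is only there so the behavior is easy to exercise twice; in a real remediation the body can sit directly in the script.

```powershell
# Idempotent cleanup sketch; 'K12ExampleCache' is a hypothetical folder.
$cache = Join-Path ([System.IO.Path]::GetTempPath()) 'K12ExampleCache'

function Clear-ExampleCache {
    param([Parameter(Mandatory)][string]$Path)

    # A missing folder is not an error: there is simply nothing to clean.
    if (-not (Test-Path -LiteralPath $Path)) { return }

    # On an already-empty folder this pipeline has no input, so a repeat run is a no-op.
    Get-ChildItem -LiteralPath $Path -Force | Remove-Item -Recurse -Force
}
```

Running Clear-ExampleCache -Path $cache a second time finds nothing to remove and still completes normally, which the contract reads as success.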

Frequently Asked Questions

My remediation shows "Success" but the issue didn't clear. That's expected and correct. "Success" means the script ran cleanly; the Verified column shows the check's verdict. "Success / Still failing" means the fix executed but didn't resolve the condition (for example, cleanup couldn't free enough disk space). K12Panel keeps the issue open and retries within your limits rather than falsely declaring it fixed.

Why can't I use exit 0 to signal success? Because K12Panel wraps your script in a harness that reports the outcome after your code runs. exit would terminate before that report is produced. Return normally for success and throw for failure instead — the harness handles the rest.

How does K12Panel know my fix failed? Two ways: your script throws (or a cmdlet errors, since it runs under $ErrorActionPreference = 'Stop'), which marks the attempt a failure; and, independently, the check is re-run to verify. Even a fix that reports success is judged "still failing" if the check disagrees.

Will an automatic fix keep retrying forever? No. Each rule sets a maximum number of tries and a cooldown between them. Once tries are exhausted, K12Panel stops and raises the alert for a person. And if a fix is failing across many devices, the circuit breaker halts it fleet-wide.

Will a reboot fix run during class? Only if you let it. Reboots are disruptive and run automatically only inside the policy's maintenance window. Set that to after hours, or leave it blank to prevent automatic reboots entirely (you can still run one by hand).

Do I need to deploy the script to devices or update the agent? No. Remediations ride the agent's normal check-in. Create or edit one and it's available on the next check-in — nothing to install, no agent update.

A remediation got "halted." What do I do? It failed across enough devices that the circuit breaker stopped it to protect your fleet. Open the Remediations tab, edit and test the script on one device to find the problem, then an Architect can Re-enable it.

Can a remediation run without a person clicking anything? Yes — set its mode to Automatic on the check's row in the policy editor. It then runs unattended within all the guardrails (attempt limits, cooldowns, maintenance window, circuit breaker). Until you do that, it stays Suggested and only runs when someone clicks Run.