Friday, April 4, 2025

Back to Basics: Updated USMT Wrapper Script

I recently transitioned to a new role where one of my responsibilities is modernizing EUC functions. One piece of low-hanging fruit I noticed was how Hardware Refresh is handled, specifically moving user data. Due to the type of work performed, we cannot use current technologies such as OneDrive to simplify this process even further. Previously, the process involved manually copying user data to a USB stick and then transferring it to the new hardware, a method that's outdated and painfully slow. Enter USMT.

Over a decade ago, I created a USMT shell wrapper, so I updated that wrapper. Why not rewrite it in PowerShell, you might ask? Speed and simplicity. It was much quicker to update the existing script than to build a new one from scratch in PowerShell. Moreover, this presents a learning opportunity for some of my staff, as I plan to assign them the task of converting the script to PowerShell once we've agreed on the feature set.

What’s Changed?

Because the script is over a decade old, I made several updates and enhancements to meet the initial need. Since Microsoft has not updated the USMT MIG files (USMT is going away, after all), I am leveraging EhlerTech's MIG files, which are outstanding. This update focused on USMT version 10.0.22621.1 from ADK 11 version 10.1.22621, as newer ADKs are a little buggy. Other versions should work fine, since it's more about the Windows version. Here’s a breakdown of the major updates:

  • Removed XP detection.
  • Forced AMD64 for data capture (this can be toggled back if needed).
  • Added Execution Escalation detection.
  • Enabled the ability to exclude IT support staff escalated accounts from user captures.
  • Expanded store file detection from just the (destination) C drive to multiple drives, supporting USB storage, which can now be faster than gigabit networking.
  • Implemented detection for Windows 10 or 11 as the source, ensuring the correct MIG file is used during capture.
  • Introduced a switch to target a specific sAMAccountName for data capture.

What’s Next?

While these updates mark significant progress, there are additional improvements I'd like to make in the future, particularly once we transition to PowerShell. Some ideas include:

  • Supporting ARM64 captures and destinations if needed.
  • Enhancing USMT store file detection to ignore mapped drives, etc.
  • Adding UNC error handling.
  • Making DOMAIN a variable for easier sharing.
  • Adding OneDrive detection.

You can find this script and the older USMT files here on GitHub.


Saturday, February 15, 2025

Replace failed ZFS mirror drive in OPNSense

Welcome Back!

It's been a while since I last shared anything. I recently changed jobs and have been busy with that endeavor, but I hope to share more insights from this journey soon.

Encountering SMART Errors after OPNSense Upgrade

After upgrading one of my OPNSense instances, I noticed errors on one of the drives, ada1, during the restart. On further investigation, I found SMART errors. They were not enough to trigger a SMART failure, and even manual short tests returned clean results, but they were still concerning. Here's what I found when running smartctl -a:

 ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
...  
180 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       4185
...
195 Hardware_ECC_Recovered  0x0032   100   099   000    Old_age   Always       -       1676229109
...
SMART Error Log Version: 1
ATA Error Count: 4216 (device log contains only the most recent five errors)
...	
Error 4216 occurred at disk power-on lifetime: 32126 hours (1338 days + 14 hours)
  When the command that caused the error occurred, the device was in an unknown state.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 00 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 00 00      00:00:09.590  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:00:09.260  IDENTIFY DEVICE
  f5 00 00 00 00 00 00 00      00:00:09.250  SECURITY FREEZE LOCK
  ec 00 00 00 00 00 00 00      00:00:09.250  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:00:09.250  IDENTIFY DEVICE
...
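
The attribute table above can be reduced to the counters worth watching. What follows is a minimal sketch rather than anything from my actual tooling: the helper name flag_smart_attrs and the default attribute IDs (180 and 195, the two shown above) are assumptions for illustration. It reads an attribute table in smartctl -A format on standard input and prints any watched attribute whose raw value is non-zero.

```shell
# Hypothetical helper: print watched SMART attributes with a non-zero raw value.
# Input: attribute table in `smartctl -A` format on stdin.
# $1: space-separated attribute IDs to watch (default "180 195" is an assumption).
flag_smart_attrs() {
    awk -v ids="${1:-180 195}" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) watch[a[i]] = 1 }
        # Attribute rows start with a numeric ID; the raw value is field 10.
        $1 ~ /^[0-9]+$/ && ($1 in watch) && ($10 + 0 > 0) { print $2 "=" $10 }
    '
}
```

On the firewall it could be called as: smartctl -A /dev/ada1 | flag_smart_attrs. A non-zero raw value is only a prompt to look closer, since the meaning of raw values varies by drive vendor.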

This drive is on its way out, so I need to replace it. This particular system is a retasked $30 Barracuda Load Balancer 340 that I upgraded with a new processor and memory. It works great for the use case. Unfortunately, it uses commodity hardware (an MSI customized mainboard), and its manual does not state that it supports hotplug, so I had to bring it down to swap the drive.

Log into the console via your preferred method (I'm using SSH). The first task is to identify the failing drive and remove it from the ZFS pool.

root@OPNsense:~ # zpool status

pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:00:15 with 0 errors on Wed Feb  5 01:31:15 2025
config:
        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p4  ONLINE       0     0     0
            ada1p4  ONLINE       0     0     0
errors: No known data errors 

Since ada1 is failing and its ZFS partition is ada1p4, we will detach that partition from the mirror.

 root@OPNsense:~ # zpool detach zroot ada1p4
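
Before powering down, it is worth confirming that the partition really left the mirror. The sketch below is an illustration only, and pool_has_device is a hypothetical helper name: it reads zpool status text on standard input and succeeds only if the named device appears in the config table.

```shell
# Hypothetical helper: exit 0 if device $1 is listed in the `zpool status`
# output supplied on stdin, exit 1 otherwise.
pool_has_device() {
    awk -v dev="$1" '
        # Config rows carry the device or vdev name in the first field.
        $1 == dev { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: zpool status zroot | pool_has_device ada1p4 || echo "ada1p4 is no longer in the pool".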

Replacing the Drive

As this system does not support hotplug, I shut it down and swapped the defective drive for a known good one. The replacement should be the same size or larger and, ideally, clean, without any partitions on it. In this case it is the second drive and OPNSense boots back up from ada0, so that is not super important. Your results may vary. Once the system is back up, log back into the console via your preferred method.

Verifying the Partitions

First, we need to verify the partitions, because copying the wrong ones could lead to trouble! If your new drive already has a partition table, this will show it.

root@OPNsense:~ # gpart show

=>       40  500118112  ada0  GPT  (238G)
         40     532480     1  efi  (260M)
     532520       1024     2  freebsd-boot  (512K)
     533544        984        - free -  (492K)
     534528   16777216     3  freebsd-swap  (8.0G)
   17311744  482805760     4  freebsd-zfs  (230G)
  500117504        648        - free -  (324K)

We have four partitions and the partition table to clone to the new disk. We use dd on partitions 1 and 2, while partitions 3 (swap) and 4 (ZFS) are handled by swapon and zpool, respectively. Next, we need to turn off swap. Since both swap partitions are listed in /etc/fstab, swapoff reports an error for the one on the now-missing disk.

root@OPNsense:~ # swapoff -a

swapoff: removing /dev/ada0p3 as swap device
swapoff: /dev/ada1p3: No such file or directory
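
The error is harmless, but the swap devices that swapoff -a walks can be listed ahead of time. This is a minimal sketch, assuming the standard fstab column layout (device, mount point, type, and so on). The helper name list_swap_devices is hypothetical, and it takes the fstab path as an argument so it can be tried against a copy first.

```shell
# Hypothetical helper: print the device column of every swap entry
# in the fstab file given as $1 (defaults to /etc/fstab).
list_swap_devices() {
    awk '
        /^[ \t]*#/   { next }          # skip comment lines
        $3 == "swap" { print $1 }      # third column is the filesystem type
    ' "${1:-/etc/fstab}"
}
```

Any device it prints that does not exist under /dev is one that swapoff -a or swapon -a will complain about.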

Cloning Partitions

Now comes the potentially dangerous part, so be VERY careful here. The source drive is ada0, and the new drive is ada1. First, we clone the partition table from ada0 to ada1. The -F flag destroys any existing partition table on ada1 before restoring.

root@OPNsense:~ # gpart backup ada0 | gpart restore -F ada1
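
A quick way to confirm the restore is to compare the two layouts. The following is a sketch under assumptions, not part of the original procedure: same_layout is a hypothetical helper, and it expects two files that each hold the saved output of gpart backup for one disk.

```shell
# Hypothetical helper: succeed if two saved `gpart backup` listings describe
# the same partition layout, ignoring differences in column spacing.
same_layout() {
    # Reassigning $1 makes awk rebuild each line with single spaces.
    a=$(awk '{ $1 = $1; print }' "$1") || return 2
    b=$(awk '{ $1 = $1; print }' "$2") || return 2
    [ "$a" = "$b" ]
}
```

For example, after gpart backup ada0 > /tmp/ada0.gpt and gpart backup ada1 > /tmp/ada1.gpt, running same_layout /tmp/ada0.gpt /tmp/ada1.gpt && echo "layouts match" reports whether the clone took.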

Next we clone partition 1:

root@OPNsense:~ # dd if=/dev/ada0p1 of=/dev/ada1p1
532480+0 records in
532480+0 records out
272629760 bytes transferred in 22.115694 secs (12327434 bytes/sec)

Then partition 2:

root@OPNsense:~ # dd if=/dev/ada0p2 of=/dev/ada1p2
1024+0 records in
1024+0 records out
524288 bytes transferred in 0.054477 secs (9623964 bytes/sec)
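
Since these two partitions hold the boot code, a byte-for-byte comparison after the copy is cheap insurance. This is a minimal sketch with a hypothetical helper name, verify_clone. It relies only on cmp, and it assumes source and destination are the same size, which holds here because the partition table was cloned.

```shell
# Hypothetical helper: report whether source $1 and clone $2 are byte-identical.
# cmp -s is silent and signals the result through its exit status only.
verify_clone() {
    if cmp -s "$1" "$2"; then
        echo "OK: $2 matches $1"
    else
        echo "MISMATCH: $2 differs from $1" >&2
        return 1
    fi
}
```

It could be run as verify_clone /dev/ada0p1 /dev/ada1p1 and again for partition 2. As an aside, the record counts above reflect dd's default 512-byte block size. A larger one (bs=1m on FreeBSD's dd) generally shortens the copy.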

For the ZFS mirror, we use the zpool tool to attach ada1p4 to the zroot pool as a mirror of the existing ada0p4 shown by zpool status above.

 root@OPNsense:~ # zpool attach zroot ada0p4 ada1p4

You can verify it’s back to an expected state via zpool status:

 root@OPNsense:~ # zpool status
  pool: zroot
 state: ONLINE
  scan: resilvered 2.29G in 00:00:10 with 0 errors on Sat Feb 15 10:14:39 2025
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p4  ONLINE       0     0     0
            ada1p4  ONLINE       0     0     0

errors: No known data errors
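
This pool resilvered in ten seconds, but a larger one can take much longer, and it is safer to wait for it to finish before doing anything else to the disks. The sketch below assumes the scan line reads "resilver in progress" while the rebuild is running. The helper name resilver_done is hypothetical.

```shell
# Hypothetical helper: succeed once the `zpool status` text on stdin
# no longer reports a resilver in progress.
resilver_done() {
    ! grep -q 'resilver in progress'
}
```

A simple polling loop would then be: until zpool status zroot | resilver_done; do sleep 30; done.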

Finally, turn swap back on, which will take care of the third partition.

 root@OPNsense:~ # swapon -a
swapon: adding /dev/ada0p3 as swap device
swapon: adding /dev/ada1p3 as swap device

At this point, it would be a good idea to go to the GUI, navigate to System: Settings: Cron, and verify that the SMART tasks are configured the way you want, along with any ZFS tasks. I only have a monthly scrub due to enabling autotrim per my config articles.