F5-221 - High Resource Utilization, System Not Accessible After ~15 Minutes
Posted: 03 Mar 2023, 03:42
Hi, I've been having some pretty significant issues with my new F5-221 NAS, which I upgraded to 10 GB of memory. I configured the system with 3x16 TB drives in RAID 5 with BTRFS, originally running a version of TM 5.0. I enabled the snapshot feature and left the options at their defaults, figuring I would configure them later if they caused issues.
After a few days of usage, the system started to act very sluggish, with regularly high power utilization, eventually becoming unusable. The system would no longer respond to connection requests, and the TOS web interface showed only a white screen. It would respond to requests either in Windows SMB or through SSH, and the power button was no longer sending any commands to the machine. I had no choice but to pull the power. I tried again, with the same results. High utilization from btrfs-cleaner was all that specifically stood out, but some process not shown in the process list was also trying to use a significant amount of memory. I tried shutting down and rebooting from the TOS web interface, but the system wouldn't respond to this.
After 15-20 minutes, the machine would no longer be accessible from the web interface or load the TOS system, so options could not be configured and saved. (I'm not sure if the options were ever actually saved after this state was entered.) I left the system running for a few days with no change, and eventually plugged in a monitor to find that the system had entered a kernel panic state and was deadlocked on memory, trying to allocate 33 GB. The system would also not respond to shutdown and reboot commands entered directly via the console.
I sat on this for a few days, and yesterday I tried another approach: I pulled the 3x16 TB drives and installed the latest TOS 5.1 on 2x4 TB drives, configured as ext4 in RAID 1. I configured the system, disabling snapshots entirely, and it ran smoothly. I then shut down the system, inserted the 3x16 TB drives, and booted it again. The system did boot into the latest 5.1, but the desktop icons did not appear, so I could not check on the processes from the web interface. When I ran top over a direct connection, I found the same btrfs-cleaner process using high amounts of CPU, but it was not the process using the memory. After ~15 minutes the system was no longer accessible, although drive activity continued. I tried manually killing btrfs-cleaner, but that didn't seem to affect the system. I left it running for ~18 hours and found that when it froze it was trying to allocate 99 GB of memory somewhere.
Since I can't access the TOS Control Panel but can keep an SSH session open for a few minutes, what are my options for making sure my BTRFS drives are safe? Alternatively, how can I configure the system on the 2x4 TB drives so that plugging in the BTRFS drives won't lock it up?