BTRFS processes take up 100% CPU resources then make logging back in impossible

yerc1
Posts: 85
Joined: 30 Oct 2020, 15:50


Post by yerc1 »

TL;DR: the issue occurs when the Snapshot app is installed, whether it is enabled or not. It may be due to a change introduced in TOS update 4.2.13.

Or you can see the long story below.

My TNAS has always had Text Editor, PDF Reader, Snapshot, and Docker (to run a Syncthing container) installed. In addition to being my central storage unit, the TNAS also functions as a Samba server and a Plex Media Server.

It has been upgraded through every TOS release from 4.1.27 up to 4.2.12 with NO major issue.

The issue described in the subject line started with the TOS 4.2.13 update and continued in 4.2.14.

While I was still able to log in, I could see the processes btrfs-transacti and btrfs-cleaner alternately taking 100% of CPU resources. I would leave the TNAS in that state one morning and find it in the same state the next morning. Some time after first observing the issue I would lose the ability to log in. I would then keep checking the TNAS throughout the day to see whether the processes had completed, but to no avail.
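
For anyone who wants to check for the same symptom from a shell rather than the web UI, here is a minimal sketch. It assumes SSH access to the NAS and a Python 3 interpreter on it, neither of which is confirmed for TOS in this thread. It reads the standard Linux /proc/<pid>/stat files and adds up the CPU ticks used by kernel threads whose names start with "btrfs-". The function names are made up for illustration.

```python
import os


def parse_stat(stat_line):
    """Return (comm, cpu_ticks) from the contents of one /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' instead of on whitespace.
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, i.e. indexes 11 and 12 here.
    utime, stime = int(fields[11]), int(fields[12])
    return comm, utime + stime


def btrfs_cpu_ticks(proc_root="/proc"):
    """Sum CPU ticks per btrfs kernel thread name (e.g. btrfs-cleaner)."""
    totals = {}
    if not os.path.isdir(proc_root):
        return totals
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                comm, ticks = parse_stat(fh.read())
        except (OSError, IndexError, ValueError):
            continue  # the process exited or the line was unreadable
        if comm.startswith("btrfs-"):
            totals[comm] = totals.get(comm, 0) + ticks
    return totals


if __name__ == "__main__":
    for name, ticks in sorted(btrfs_cpu_ticks().items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {ticks}")
```

Running it twice a few seconds apart and comparing the totals shows which thread is actually accumulating CPU time. The name btrfs-transacti appears truncated because the kernel limits thread names to 15 characters.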

Shutting down via the power button and turning the device back on some hours later did not help.

Re-installing TOS has been the only way for me to log back in.

The same issue would then return and force another TOS re-install: three times so far between update 4.2.13 and TOS Patch 1.10.

I have now removed all snapshot files and uninstalled the Snapshot app.
I'll be back with an update on whether this resolves the issue.
F2-221 is my first NAS, bought in October 2020
titanrx8
Posts: 222
Joined: 17 Jul 2020, 06:17

Re: BTRFS processes take up 100% CPU resources then make logging back in impossible

Post by titanrx8 »

My system bricked back in February. It was an F2-221, on TOS 4.2.07 I believe, with RAID 0 + BTRFS + Snapshot. I was experiencing TNASDBD "storms" where the process would take 100% CPU for days on end, allowing no access or any other function whatsoever. Ultimately I had to pull the power, since even the power button was non-functional. When I tried to power up again, the TNAS PC app couldn't find the system, nor was it visible to the router. It didn't matter whether disks were installed or not. I connected an HDMI monitor and a USB keyboard and could see that the firmware died upon starting the boot sequence. The boot loader was corrupted.

Thankfully I had a recent backup, because my disks were also unreadable on a second system. I have since rebuilt everything but now use ext4 + RAID 0. No more BTRFS or Snapshot. Instead I am using rsync to back up my production data to a second TNAS, and I plan to rsync to a NAS based on a different Linux distro as soon as I build one.
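
As a rough illustration of the rsync approach (not the exact setup described above), here is a small sketch that builds an rsync command and defaults to a dry run. The paths and the host name backup-nas are hypothetical placeholders. The only rsync options used are -a (archive mode) and --dry-run.

```python
import subprocess
import sys


def build_rsync_command(source, destination, dry_run=True):
    """Build an rsync argument list; dry_run only reports what would be copied."""
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps permissions and times)
    if dry_run:
        cmd.append("--dry-run")
    cmd += [source, destination]
    return cmd


if __name__ == "__main__":
    # Hypothetical source path and backup host, for illustration only.
    cmd = build_rsync_command(
        "/Volume1/production/",
        "backup-nas:/Volume1/production-copy/",
        dry_run="--run" not in sys.argv,
    )
    print(" ".join(cmd))
    if "--run" in sys.argv:
        # Only transfers data when --run is passed explicitly.
        subprocess.run(cmd, check=True)
```

Starting with a dry run makes it easier to confirm the source and destination are the right way round before any data is copied.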

Snapshots are nice but not worth losing the data they're supposed to protect.
yerc1
Posts: 85
Joined: 30 Oct 2020, 15:50

Re: BTRFS processes take up 100% CPU resources then make logging back in impossible

Post by yerc1 »

The issue appears to have gone away (fingers crossed).

I say this after monitoring the unit from the TOS re-install yesterday morning until today, with a scheduled power-off at night and a scheduled power-on the following day. I can confirm the unit actually powered off and then on as scheduled.

After this last TOS re-install, I ran "Filesystem defragment" in Control Panel.
Whether this is the missing key I have no idea, but I don't believe it is, as the process completed in a matter of seconds (I take this to mean there was not much to defragment).

It would be good to hear what TM (@TMroy @TMSupport) have to say about this.

Better still would be a recommendation on whether or not to change the filesystem from btrfs to ext4.
By "recommendation" I mean one backed up by hard data, for example "because TOS was engineered this way, users should be using x filesystem".
See https://btrfs.wiki.kernel.org/index.php/Gotchas.