[Accepted to Roadmap] Ransomware & Malicious Activity Detection

Problem/Justification

Ransomware remains one of the biggest security threats facing organisations, with significant time and money being spent on detection and response. While TrueNAS is excellent when it comes to recovery through snapshots and replication, there is currently nothing built in that helps identify suspicious file activity before the damage has occurred.

In most ransomware incidents the issue is only discovered after a large number of files have already been encrypted, deleted, or modified. At that point recovery relies entirely on snapshots and backups. While this generally works, it often results in downtime and a disruptive recovery process for users and services.

Given that TrueNAS already has good visibility of datasets and snapshots it feels like there is an opportunity to use this information more proactively rather than only after an incident has taken place.

Impact

Large-scale file activity such as mass encryption, deletions, or renames can go unnoticed until users start reporting missing or unreadable data. Ransomware can modify very large numbers of files in a short period of time, often before anyone has a chance to intervene. Although recovery is usually possible, it can still be time-consuming and disruptive to normal business operations.

There is also an increasing expectation that storage platforms contribute to overall security, not just data protection and recovery. Without some form of early detection TrueNAS remains largely reactive when dealing with ransomware-style incidents.

User Story

I would love to see TrueNAS detect abnormal or suspicious file system activity such as rapid bulk file changes, deletions, or file extension changes so that potential ransomware or malicious behaviour can be identified early and the amount of damage limited.

Proposed Capability

TrueNAS could harness tools like ZFS diff or similar snapshot comparison techniques to identify sudden spikes in file changes, high-volume deletions or overwrites, and unusual file extension changes such as .docx files becoming .locked or .encrypted. This could be combined with configurable thresholds on a per-dataset basis, for example triggering when a certain percentage of files change within a defined time window.
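As a rough illustration of what such a check might look like, here is a small sketch. The helper names and thresholds are hypothetical; the only real interface assumed is the tab-separated output of `zfs diff -H`, where `M`, `+`, `-`, and `R` mark modified, created, removed, and renamed files (an `R` line is assumed to carry the old and new paths as separate fields).

```python
# Hypothetical sketch of the proposed detection logic; not TrueNAS code.
SUSPICIOUS_EXTENSIONS = {".locked", ".encrypted", ".crypt"}  # illustrative list

def classify_diff(diff_text):
    """Count change types in assumed `zfs diff -H` output."""
    counts = {"modified": 0, "created": 0, "removed": 0, "renamed": 0,
              "suspicious_renames": 0}
    for line in diff_text.splitlines():
        fields = line.split("\t")
        kind = fields[0] if fields else ""
        if kind == "M":
            counts["modified"] += 1
        elif kind == "+":
            counts["created"] += 1
        elif kind == "-":
            counts["removed"] += 1
        elif kind == "R":
            counts["renamed"] += 1
            # Flag renames that land on known ransomware-style suffixes.
            if len(fields) >= 3 and any(fields[2].endswith(ext)
                                        for ext in SUSPICIOUS_EXTENSIONS):
                counts["suspicious_renames"] += 1
    return counts

def is_suspicious(counts, total_files, pct_threshold=5.0):
    """Trigger if changes exceed a per-dataset percentage threshold,
    or if any rename targets a suspicious extension."""
    changed = counts["modified"] + counts["removed"] + counts["renamed"]
    if total_files == 0:
        return False
    pct = (changed / total_files) * 100
    return pct >= pct_threshold or counts["suspicious_renames"] > 0
```

The percentage threshold here stands in for the per-dataset, per-time-window configuration described above; a real implementation would also need to know the dataset's total file count and the window over which the diff was taken.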

Alerts could be generated using existing TrueNAS mechanisms such as email. Optionally, automated response actions could be supported, including temporarily disabling or restricting client access over SMB or NFS and/or forcing a dataset into a read-only state. Administrators would also benefit from visibility into which client or user triggered the activity.

Value to TrueNAS

This capability would improve TrueNAS's overall security posture without attempting to replace endpoint protection. It would add an additional layer of defence by reducing time to detection, make TrueNAS more attractive to security-conscious organisations, and align well with the current focus on ransomware resilience and incident response.

5 Likes

Enterprise customers already have departments dedicated to cyber security.

As a homelabber I just wouldn’t be interested as I feel snapshots are good enough for me & that this would be bloat.

1 Like

This is true; however, these teams' tools often don't integrate that tightly with existing infrastructure, meaning that outside of alerting they are fairly limited in what they can do. Trying to bolt different technologies together will often result in a suboptimal outcome.

Agreed, this is perhaps a little OTT for the average homelabber and is aimed much more at business. As for 'bloat', well, you could say that about a lot of the TrueNAS features you don't use. I would guess that the average TrueNAS user uses less than 10% of its functionality, but this should not cause any detriment to the end user.

This definitely has the vibe of "An average user has no need to move the taskbar off the bottom of the screen so that bloat was removed." :smiley:

Being more on-topic, I think it would be more useful to have some sort of diff view report accessible so that tinkerers could build something that facilitates the goal of this concept.

At one point I enabled Samba audits for unlinks so I could watch the logs for when and who was doing evil, but that was manual and error prone (and I have forgotten how to turn it off again, since what I thought disabled it didn't, and I keep forgetting to sleuth it out again; the syslog spam is manageable otherwise).

As far as real-time detection is concerned, I'm not sure how that would be implemented outside of an audit log watcher. Is there a way to make a would-be snapshot for comparison (i.e. "these are the changed blocks that would be captured if a snapshot were taken")?

1 Like

Yes, using ZFS diff to compare a recent snapshot with the live dataset, and setting parameters that would trigger alerts.

As a homelabber I would actually appreciate this feature. Bloatware? It is only bloat if you do not find it valuable. If this were implemented correctly, where you could disable it if desired, I would think most, if not all, people would use it. Configurability, I think, is a key factor. I voted for this as I feel businesses would benefit from it, I would benefit from it, and in turn TrueNAS (the company) would benefit from it. A significant feature.

This is one of the better feature requests I’ve seen in quite a while.

4 Likes

Well, can't always be right I guess. I think I was arguing against it because I imagined it as some sort of built-in antivirus, but I'm getting the feeling now that I'm way off.

Guys, mind breaking it down for me further?

I took what @Johnny_Fartpants wrote at face value. If a large number of files start changing all at once, over a specified time period, then do something about it. And I'm sure there are some specific or common naming conventions, but detecting those should be trivial.

As for how this feature would stop or block the changes, or alert the admins, that I think could be interesting to solve. I’m not a SYSADMIN or anything like that, but I did sleep at a Howard Johnsons last night… :laughing:

But seriously, I can see several options but it will depend on how TrueNAS would apply a change like this. I’d like to see a few options:

  1. The ability to select automatic termination of the offending connection.
  2. An instant email to a list of admins.
  3. The ability to set thresholds, such as how many files can be changed through a connection over a specified time. I see this as an attack scenario.
  4. A threshold on the total number of files changed from a single connection/user in 24 hours (or a user-selected value). I see this as one user clicking the wrong link and finding some ransomware.
  5. Some sort of verifiable authorization to override the termination of an offending link, if going this route.
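A minimal sketch of how the per-connection thresholds in options 3 and 4 could feed the responses in options 1 and 2. Everything here (class name, limits, return values) is illustrative; none of it is an existing TrueNAS API.

```python
# Hypothetical per-connection rate tracking; limits are illustrative only.
from collections import deque
import time

class ConnectionWatcher:
    """Track file-change events per client and decide when to react."""

    def __init__(self, burst_limit=200, burst_window=900,
                 daily_limit=5000, daily_window=86400):
        # e.g. 200 changes in 15 minutes, or 5000 changes in 24 hours
        self.burst_limit, self.burst_window = burst_limit, burst_window
        self.daily_limit, self.daily_window = daily_limit, daily_window
        self.events = {}  # client id -> deque of event timestamps

    def record_change(self, client, now=None):
        """Record one file change; return 'ok', 'alert', or 'terminate'."""
        now = time.time() if now is None else now
        q = self.events.setdefault(client, deque())
        q.append(now)
        while q and now - q[0] > self.daily_window:
            q.popleft()                      # keep only the last 24 h
        recent = sum(1 for t in q if now - t <= self.burst_window)
        if recent >= self.burst_limit:
            return "terminate"               # option 1: drop the connection
        if len(q) >= self.daily_limit:
            return "alert"                   # option 2: email the admins
        return "ok"
```

The burst limit models the fast attack scenario and the daily limit the slow "user clicked the wrong link" scenario; both would presumably be configurable per dataset or per share.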

There are quite a few ways I could see implementing it from a user perspective. For a home user, I'd like the feature to send me an email asking for an authorization code to continue. I'm not sure what other verifiable options there would be. When I'm logged in via SSH, I would like a simple authorization code prompt. Preferably not the same as the user password.

Maybe @Johnny_Fartpants has a different idea, but anything that would help protect against a ransomware attack would be good. Even if it just lets the admins know, they could at least jump in, stop it, and restore a backup. As a home user, I'd prefer it asking me for an authorization code to continue.

Would this impact "moving" files to another directory (without modifying the original file, where the directory is fully accessible)? Hopefully not.

My imagination can get wild, but even a real basic implementation just sounds like a plus. As to how it would impact the system in general, hopefully it would not be scanning each file, more like a virus scanner does; that is way too much overhead, and I don't think it is the goal here.

@Johnny_Fartpants can expound on what he desired, I don’t want to speak for him.

3 Likes

Thanks @joeschmuck you’ve highlighted this well.

Imagine a client machine has been infected with ransomware. This client has access to your NAS. It quietly starts encrypting all data on your client machine including your network shares (your NAS). By the time the sysadmin is alerted to this a lot of damage has occurred.

Now rolling back a snapshot, as we all know, is easy, but deciding which one is a bit trickier. Nevertheless, once this is achieved you are still left with an infected client that has access to your NAS, so it starts encrypting all over again. Wouldn't it be nice to know which client/user is causing the issue? Wouldn't it be nice if this client's access were automatically disabled when the alert was triggered?

That's the idea.

Hope this helps.

3 Likes

We can use research that has already been done to identify some good thresholds/triggers.

These thresholds could be adjusted or even removed at a per-dataset level. This would allow users to tailor alerts for each of their workloads. Perhaps on an archive dataset you want to be alerted as soon as there is any write activity, while on a very write-intensive workload you might need the thresholds increased or even disabled.

Perhaps, dare I suggest, it could even use AI to apply some logic? Thresholds could be adjusted based on common write patterns per dataset.

1 Like

Alright, this sounds much less intrusive than what I had imagined, and I can see that maybe I was a bit negative.

I wonder if a honeypot file could be a lazy, lightweight way of setting up alerting: if a file like walletkeys.txt gets any kind of change or modification, then the alert goes out, for example…
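A honeypot check along those lines really could be tiny. A minimal sketch, assuming nothing about TrueNAS internals (the walletkeys.txt name is just the example above; the bait contents and function names are made up):

```python
# Hypothetical canary-file check: plant a bait file, remember its hash,
# and treat any later change, rename, or deletion as a trip.
import hashlib
from pathlib import Path

def plant_canary(path):
    """Create the bait file and return its SHA-256 for later comparison."""
    p = Path(path)
    p.write_bytes(b"If this file changes, assume ransomware activity.\n")
    return hashlib.sha256(p.read_bytes()).hexdigest()

def canary_tripped(path, baseline_hash):
    """True if the bait file was deleted, renamed, or modified."""
    p = Path(path)
    if not p.exists():
        return True  # deletion or rename counts as tripped
    return hashlib.sha256(p.read_bytes()).hexdigest() != baseline_hash
```

A scheduled task (or an audit-log watcher) would call `canary_tripped` periodically and fire the alert on the first `True`. The obvious caveat is that ransomware has to touch the bait file before, or at least not long after, the real data.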

Ah, I’m not up to date on new things, I’ll have to google that a bit. I had always pondered why it seemed that wasn’t a thing but if it is now that would make me quite excited lol.

Informative video. And so much of it sounds like common sense.

I see a couple of challenges re: implementation here.

Let's say you have a NAS that is operating normally and suddenly there is a massive change in the number of files getting changed, and they're likely getting changed methodically, i.e. one directory after another.

This kind of delta may be somewhat easy to code for: if normal monthly activity is y, alert the sysadmin if that level is reached in a week.

On the other hand, for paying enterprise customers, the workload may be so constant that the multiple-of-y approach will be harder to implement. The encryption will be happening in the noise of everyday activity, and snapshots getting bigger won't tell the story either, or at least only marginally.

Without deep file inspection, it might also be difficult to determine if a file went from just compressed data to encrypted data.
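That compressed-vs-encrypted point is worth making concrete. The standard trick is Shannon entropy, but well-compressed data scores almost as high as ciphertext, which is exactly why entropy alone can't separate the two. A small, self-contained illustration (nothing here is TrueNAS code; the sample data is synthetic):

```python
# Shannon entropy in bits per byte: 0.0 for a constant stream,
# approaching 8.0 for uniform random bytes.
import math
import os
import random
import zlib
from collections import Counter

def shannon_entropy(data):
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

rng = random.Random(0)
# Text-like bytes: a uniform draw from a small alphabet, so it compresses.
plain = bytes(rng.choice(b"etaoin shrdlu") for _ in range(20000))
compressed = zlib.compress(plain, 9)   # high entropy, yet perfectly benign
ciphertext = os.urandom(4096)          # stand-in for encrypted output

# plain scores low; compressed and ciphertext both crowd the top of the
# scale, so entropy cannot separate "user zipped a folder" from
# "ransomware encrypted a folder".
```

This is the core of the detection problem: any entropy-based heuristic will raise false positives on archives, media files, and already-compressed formats unless it is combined with behavioural signals like the rename/delete rates discussed earlier.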

So I’m not sure how one would go about this proactively without having a very careful think re: what the detection mechanism would be. And that detection algorithm cannot rely just on file names or bulk activity, as there may be bulk operations that are perfectly legit.

Remember, this has to be a feature that paying enterprise customers find useful or it’s unlikely to get funded.

Hit the nail on the head right there. The most common access pattern is a streaming read and write followed by a delete, or perhaps a read-write in place, and there's no guarantee the file names will be preserved at all.

Come to think of it, is there even a way to determine IOP patterns at the file level beyond the services that have audit capability to log it at a detailed enough level?

I have a feeling that at the current tech level the best we'll be able to do is something akin to OneDrive's "Hey, you deleted a bunch of things, just to let you know they're still there in the recycle bin" popup.

Yes, the execution will be the tricky bit. At a very basic level I was thinking renames and/or deletions would be the most obvious red flag. You could therefore have a snapshot schedule that runs every 15 minutes and use ZFS diff to compare the new snapshot to the previous one. Somewhere between 100 and 500 renames and/or deletes within that 15-minute window would be considered highly suspicious and would trigger an alert, or even an action like making the dataset read-only and/or killing the client's access to the NAS. Obviously these parameters could be tweaked per dataset so you could tailor it to your environment.
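That 15-minute snapshot-and-diff loop might be wired up roughly like this. The `zfs snapshot` and `zfs diff -H` commands are real ZFS CLI verbs, but the function, the injectable command runner, and the threshold handling are hypothetical and untested against TrueNAS itself:

```python
# Hypothetical scheduled check: snapshot, diff against the previous
# snapshot, and count renames (R) and deletions (-) in the window.
import subprocess

def check_dataset(dataset, prev_snap, new_snap, threshold=100, run_cmd=None):
    """Return 'suspicious' if renames/deletes since prev_snap hit threshold."""
    # run_cmd is injectable so the logic can be tested without a real pool.
    run = run_cmd or (lambda args: subprocess.run(
        args, capture_output=True, text=True, check=True).stdout)
    run(["zfs", "snapshot", f"{dataset}@{new_snap}"])
    diff = run(["zfs", "diff", "-H",
                f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"])
    hits = sum(1 for line in diff.splitlines()
               if line.split("\t")[0] in ("R", "-"))
    return "suspicious" if hits >= threshold else "ok"
```

A scheduler would call this every 15 minutes per dataset, and a "suspicious" result would feed whatever alert or lockdown action is configured; the 100 to 500 range above maps directly onto the `threshold` parameter.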

1 Like

:100: and in the current climate where companies are spending a fortune on cybersecurity I can’t see that being an issue.

I wish I shared your enthusiasm. Many companies seem to be outsourcing increasingly as CIOs get rewarded for paring down company spend, not increasing compute performance, reducing latency, etc.

The OneDrive mention two posts up is a perfect example of that. Depending on the size of the pipe and the size of the files you work on, its performance can only be described as patience-building.

Yet the folk that implemented it keep getting prizes, likely sponsored by the very industry that benefits from such outsourcing.

The companies that do value performance, still install servers, etc. may be interested but Legal will have a cow re: any representations you make re: detecting stuff before it causes a problem.

Your best bet may be pattern recognition, ie any user suddenly behaving very differently, across all times of day, etc. causing alarm. That also limits the overhead that such auditing would entail.

Yes, I have seen this firsthand. I've also started to see the chaos this causes, both financially to the organisation and, perhaps worse, the impact on the users of the service. On a positive note, I do believe some realisation is slowly starting to kick in, and I honestly believe we are on the cusp of a renaissance in the benefits of local IT support.

1 Like

I hope you're right. The challenge will be to prove to the C-suite that having control over one's electronic destiny is better than abdicating same and just dealing with the loss of productivity as users wait and wait for their files to save, load, etc. That proposition is further weakened in the context of more and more remote work where "local network low-latency for anything corporate" pretty much stops being a factor altogether…

Even when companies reach the scale where maintaining local infrastructure / fat office pipes, etc. is a fiscal rounding error, I'd wager that CIOs like the convenience of outsourcing not only from a cost perspective (because it's usually cheaper and scales as needed) but also because it gives them plausible deniability should something go wrong: "hey, it's that guy's fault".

And that’s the thing - the impact on productivity will be use-case specific. Companies that do a lot of work with big datasets will likely stick to keeping stuff on premise longer, i.e. think local storage for CAT scans while the patient is in treatment, for example. But even there, the incentive for the CIO is to push the cloud as much as possible in order to shrink their department footprint.

Some people also love the idea that their data is always accessible in the cloud, across multiple devices. They’re used to it from their home devices (iCloud, OneDrive, etc.) and are also used to waiting… the impact on productivity is ignored in many service-related industries where Time & Materials contracts mask such inefficiency other than when contracts go up for bid.