
Add Volatility2's linux_recover_filesystem capabilities to linux.pagecache plugin #1468

Open · Abyss-W4tcher opened this issue Dec 26, 2024 · 11 comments · May be fixed by #1613
Abyss-W4tcher (Contributor) commented Dec 26, 2024

It would be awesome to integrate capabilities from https://github.com/volatilityfoundation/volatility/blob/master/volatility/plugins/linux/recover_filesystem.py into the pagecache extraction and parsing system.

The following user parameters could be added as well:

  • An option to replicate file attributes to the exported files or not
  • A general file size limit/truncation setting, which either skips the output of oversized files or appends a .trunc suffix to the affected files on disk

Ideas:

  • Think of a way to handle symlinks, or leave them as is (e.g. /forensics/vol3_output/dumped_fs/symlink_to_etc_passwd points to the host's /etc/passwd instead of /forensics/vol3_output/dumped_fs/etc/passwd); a re-rooting sketch follows below
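For the symlink case, a minimal sketch of the re-rooting idea (the function name and paths are illustrative, not part of any existing plugin):

import os

def reroot_symlink(dump_root: str, link_target: str) -> str:
    # Absolute targets are re-rooted under the dump directory so the link
    # stays inside the extracted tree instead of pointing at the host.
    if os.path.isabs(link_target):
        return os.path.join(dump_root, link_target.lstrip("/"))
    # Relative targets already resolve inside the tree and can stay as-is.
    return link_target

# Example: /etc/passwd -> /forensics/vol3_output/dumped_fs/etc/passwd
print(reroot_symlink("/forensics/vol3_output/dumped_fs", "/etc/passwd"))
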
atcuno (Contributor) commented Dec 27, 2024

Attributes are really hard for a couple of reasons:

  • They do not translate well between file systems. For example, for an analyst on Linux dumping NTFS files, the plugin would miss so many of the attributes that the results wouldn't make sense and would essentially be inaccurate.

  • We don't want to create privileged files on the analyst's system, such as setuid/setgid files from memory.

  • Not all filesystems keep timestamps at the same precision, so you either lose precision (when the memory sample has more precision than the analyst's filesystem) or produce wrong results (when data with more precision is written to a filesystem with less precision, since the less precise bits are set to 0).

The best option would be to write a metadata file with all the permissions that the plugin knows how to parse.
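As an illustration of that idea, a minimal sketch of such a metadata sidecar; the column names and sample row are assumptions, not the plugin's actual output:

import csv

# Hypothetical rows: (path, mode, uid, gid, atime, mtime, ctime) as recovered
# from the memory sample; nothing here is applied to the files written on disk.
records = [
    ("/etc/shadow", "0o640", 0, 42, "2024-12-01T10:00:00", "2024-11-30T09:12:44", "2024-11-30T09:12:44"),
]

with open("pagecache_metadata.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["path", "mode", "uid", "gid", "atime", "mtime", "ctime"])
    writer.writerows(records)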

Abyss-W4tcher (Contributor, Author) commented Jan 2, 2025

These are valid points. The metadata file could simply be the plugin's stdout, allowing users to save it as CSV and explore/filter it through sqlitebrowser, for example.

Files should be assigned read permissions only (analysts should copy files before modifying them if needed), with no SUID or execute bits, as you pointed out. Timestamps should also be set to the time of extraction, leaving the "metadata file" as the single source of truth for this and preventing smeared timestamps from ending up on disk.
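A minimal sketch of that extraction policy (the helper name is hypothetical, not existing plugin code):

import os
import stat

def write_extracted_file(out_path: str, data: bytes) -> None:
    # Write the recovered content, then drop to read-only: no SUID/SGID,
    # no execute bits. mtime/atime are simply left at the extraction time.
    with open(out_path, "wb") as f:
        f.write(data)
    os.chmod(out_path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # 0o444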

I will work on it if no one else is currently available. Please feel free to share ideas to enhance its relevance :)

atcuno (Contributor) commented Jan 2, 2025

Another thing to consider is that you will definitely run into file path length restrictions when trying to preserve directory structure. This plagues basically all forensics/IR software. For example, on Windows the full file path is limited by default to roughly 255-260 characters (MAX_PATH).

So then you have the situation where the analyst does:

-o /mnt/usb/evidence/case12345

pointing to their NTFS-formatted external drive, or it can be someone running Vol3 from Windows and saving somewhere under their Users folder. All the characters in the output path count towards the 255 limit. Your plugin then tries to extract using the full path from the sample, which gets appended to the -o option, and now you are over the 255 limit and open() throws a backtrace. Different tools then hack up different ways to truncate the paths in the output, but it is rarely pretty.

pagecache.files already supports timelines for mactime/body file integration, but this misses all the other useful metadata you mentioned like permissions, symlink targets, users, and groups. I do think some type of plugin to produce this extra information would be useful, but I think the mass extraction part is bound to be very painful if trying to preserve directory structure.

We could do a mix where people run this new plugin to get the full directory structure listed, along with the detailed metadata of every file, and then explore it with grep, a sqlite explorer, or anything else. Then, for files of interest, they use the current pagecache.files to extract them. This removes the directory structure pain points while still giving analysts the detailed metadata view.

Thoughts on the dual approach?

I am definitely open to all suggestions and not trying to shut the idea down - just trying to avoid a bunch of inevitable headaches for you.

Abyss-W4tcher (Contributor, Author) commented Jan 3, 2025

I think the dual approach loses the initial benefit of not having to manually extract everything (which makes file exploration tedious). Instead of dealing with the host filesystem, I thought of approaching the directory problem with an ISO. There is a Python library out there named pycdlib that looks promising, as it is maintained, supports multiple ISO formats, and handles symlinks!

I made a small initial PoC:

try:
    from cStringIO import StringIO as BytesIO
except ImportError:
    from io import BytesIO

# pip3 install pycdlib
import pycdlib

# https://clalancette.github.io/pycdlib/example-creating-udf-iso.html
iso = pycdlib.PyCdlib()
iso.new(udf='2.60')

PATH_MAX = 4096
NAME_MAX = 255

# Directory and file to add
dir_path = ""

# TEST MAX LENGTH
# Build a deep "/<254*A>/<254*B>/<254*A>" etc. directory path approaching PATH_MAX
for i in range(0, PATH_MAX//(NAME_MAX+1)):
    dir_path += "/" + chr(65 + i%2)*(NAME_MAX-1)

# Create (sub-)directories iteratively, as it is not supported by default in pycdlib
path_parts = dir_path.strip('/').split('/')
current_path = ''
for part in path_parts:
    current_path += '/' + part
    iso.add_directory(udf_path=current_path)

# Add a file at the deepest level
foo_content = b'foo\n'
file_path = current_path + '/testfile.txt'
iso.add_fp(BytesIO(foo_content), len(foo_content), udf_path=file_path)

# Write and close the ISO
iso.write('new.iso')
iso.close()

This will abort if a file name exceeds the 255-character limit (it is handled by default), and it successfully writes an ISO on my local disk which can be mounted and explored read-only (no attribute problems, and it prevents manipulation mistakes). Basically, we are creating a "Python-represented filesystem" that is centralized into a single file and immediately portable.

What do you think about it @atcuno (@gcmoreira maybe, as you worked on the pagecache)?

Of course, we would have to do some sanity checks, but everything else would be handled by the software opening the ISO, which should be able to deal with edge cases?

atcuno (Contributor) commented Jan 3, 2025

So what happens with file paths over 255? Does it just not add them, or does it work with them?

Abyss-W4tcher (Contributor, Author) commented Jan 3, 2025

The Python module raises an exception (the file is not added to the ISO), so you have to catch it and decide what to do (skip it with a debug message, for example).

A file path over 255 mostly indicates memory smear, as it is a limitation on ext2/3/4.
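A hedged sketch of that catch-and-skip handling; it assumes catching pycdlib's base exception class is acceptable here (a real plugin might catch a more specific subclass):

import pycdlib
from pycdlib import pycdlibexception

def safe_add_directory(iso: pycdlib.PyCdlib, udf_path: str) -> bool:
    # Try to add the directory; skip entries the ISO layer refuses
    # (e.g. name parts longer than 255 characters) instead of aborting.
    try:
        iso.add_directory(udf_path=udf_path)
        return True
    except pycdlibexception.PyCdlibException as exc:
        print(f"Skipping {udf_path!r}: {exc}")  # a real plugin would emit a debug log instead
        return False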

atcuno (Contributor) commented Jan 3, 2025

Ah I see, so the 255 limit is solely within the ISO, and the ISO file's own path on disk doesn't take up any characters towards the 255?

Abyss-W4tcher (Contributor, Author) commented Jan 3, 2025

Yes, the module will refuse to create a file in the ISO with a name longer than 255 characters. This only applies to individual name parts, as /<254_chars>/<254_chars> is absolutely valid.

gcmoreira (Contributor) commented

We could generate a tarball using the Python standard tarfile module, which I think supports everything you mentioned: permissions, links, etc. It supports raw output as well as gzip, bzip2, and xz compression.
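A minimal sketch of that approach, building the archive entirely in memory; the paths and metadata values are illustrative, not recovered from a real sample:

import io
import tarfile
import time

with tarfile.open("dumped_fs.tar.gz", "w:gz") as tar:
    # Regular file whose permissions and timestamps are carried in the tar
    # header, not applied to the analyst's filesystem until extraction.
    data = b"root:x:0:0:root:/root:/bin/bash\n"
    info = tarfile.TarInfo(name="etc/passwd")
    info.size = len(data)
    info.mode = 0o644
    info.uid, info.gid = 0, 0
    info.mtime = int(time.time())
    tar.addfile(info, io.BytesIO(data))

    # Symbolic link preserved as a link entry inside the archive.
    link = tarfile.TarInfo(name="etc/localtime")
    link.type = tarfile.SYMTYPE
    link.linkname = "../usr/share/zoneinfo/UTC"
    tar.addfile(link)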

Abyss-W4tcher (Contributor, Author) commented Jan 4, 2025

Good point, and it doesn't require additional dependencies. I found out that:

  • paths longer than 4096 characters and filenames longer than 255 characters can be inserted into a tarball with tarfile (see the quick check sketched below)
  • tar -xvf and WinRAR will crash on paths longer than 4096 and filenames longer than 255 (this does not account for the extraction directory path prefix length; it is inherent to tar itself)
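A quick check of the first point, assuming the PAX format (which stores long names in extended headers); the member name is synthetic:

import io
import tarfile

long_name = "A" * 300 + "/" + "B" * 300  # both the name part and the full path exceed 255
data = b"test\n"

with tarfile.open("long_paths.tar", "w", format=tarfile.PAX_FORMAT) as tar:
    info = tarfile.TarInfo(name=long_name + "/testfile.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# tarfile itself can list and read the member back...
with tarfile.open("long_paths.tar") as tar:
    print(tar.getnames()[0][:80], "...")
# ...but extracting it with external tools may still fail on the host's limits.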

ISO mounting inherits its path length limits from the host system, and from early tests it won't block the entire archive content because of one "problematic" element. The filesystem is simply mounted to a drive letter and that's it (plus robust read-only access). I was also able to mount an ISO with very long paths on Windows (disregarding the Windows limits) in a case where a tar archive wouldn't extract.

Of course, we could restrict paths to a certain length, but couldn't that be leveraged to prevent certain files from being extracted, by crafting path lengths greater than 4096?

Abyss-W4tcher (Contributor, Author) commented Jan 5, 2025

Got a first plugin PoC, which dumps DIR, REG and LNK (with proper path patching) into an ISO.

I'd be happy to hear if anyone has a solution to circumvent the tar/WinRAR path length limitations. A tar extraction might work on one system but not on another, which I think is not quite reliable.

The major downside of the ISO approach is the external dependency needed; apart from this, it is in my opinion a good candidate to solve the problem raised by atcuno. In fact, it seems the host filesystem restrictions don't apply, as we aren't technically writing the files (it is a virtual filesystem representation).

edit: A tool named ratarmount allows mounting a tarball onto a directory, bypassing all the potential length restrictions! We could link to it in the plugin output when users encounter path length extraction issues. In the end, there is no need for an external dependency at all :) (at least while Volatility3 runs).

Abyss-W4tcher linked a pull request Feb 10, 2025 that will close this issue