I gave it a try, and it behaved very well, until I deduped 7 TB on a 32 GB system, and the DDT ate all the system RAM :-(
Hello, If I compare it with commercial dedupe appliances, then the dedupe ratio is close to 100% for the second full backup. For the initial backup, it's around 22GB vs. 19GB. That difference is okay I would say. The commercial appliance is around 15% better. That is fair for me. The test machine that I'm backing up is a Windows Domain Controller that idles (lab environment). This is the space usage after one backup
This is the space usage after two backups
When I then copy one of the backup files (one backup file = one machine in my case), the dedupe ratio goes up
The DDT has entries, so I think it should do something
Am I doing something wrong, or was my expectation too high? Best regards,
Fast Dedup Review Guide
Hello, dear reviewer, and welcome to the “Fast Dedup” project, brought to you by Klara and iXsystems.
This discussion stands as an overview of the entire project and as a kind of guide for reviewers. We hope it’s useful!
Overview
“Fast Dedup” is an umbrella project for a significant upgrade of the original OpenZFS block deduplication system. It’s composed of multiple logical changes:
- Cleanup and documentation of the existing dedup code, creating a good base to build from (Fast Dedup: Cleanup and documentation ahead of integrating Fast Dedup #15887)
- ZAP shrinking: allows ZAPs of all kinds (dedup and others) to reclaim some of their space after a large number of entries are deleted (Fast Dedup: ZAP Shrinking #15888)
- Dedup quota: allows the operator to set a maximum size on dedup tables, which when reached will stop creating new entries, converting dedup writes for new blocks into regular writes (Fast Dedup: Dedup Quota #15889)
- Dedup prefetch: adds a new `zpool prefetch` command that loads dedup tables into the ARC, improving performance from cold (Fast Dedup: DDT Prefetch #15890)
- Table container format: allows a dedup table to be composed of multiple different kinds of objects (rather than simply ZAPs), and be self-configuring, allowing some kinds of new features to be added in the future without needing to discard existing dedup tables (Fast Dedup: Introduce the FDT on-disk format and feature flag #15892)
- “Flat” entry format: a new smaller data format for in-memory and on-disk table entries (Fast Dedup: “flat” DDT entry format #15893)
- FDT storage class: ensures the fast dedup data is able to use the dedup vdev class (Fast Dedup: dnode: allow storage class to be overridden by object type #15894)
- Dedup log: adds a journal to a dedup table, allowing fast updates and vastly reducing IO and memory overhead of the overall dedup system (Fast Dedup: FDT-log feature #15895)
- Dedup prune: adds the ability to remove older entries from the UNIQUE dedup table to allow continued use of dedup under the dedup quota feature (Fast Dedup: prune unique entries #16277)
These features are designed to peacefully co-exist with the original dedup system. A pool with an existing dedup table will continue to work exactly the same as it always has with a Fast Dedup-capable build of OpenZFS. If the `fast_dedup` feature is enabled on such a pool, new dedup tables will be created with all Fast Dedup features available, but the old ones will continue to work as they always have.
Trying it
Warning
Do not use this code on a production pool. The on-disk format changes are not yet finalized or stable, and are not compatible with stable OpenZFS releases, and the compatibility code for traditional dedup tables may not be stable.
The whole combined FDT code is on the `fdt-rel` branch of the `KlaraSystems/zfs` repository.
Once built and running, enable the `fast_dedup` feature to use it.
Using it should be exactly the same: enable the `dedup=` option on a new dataset, and you get transparent block-level deduplication as before. It should just be more efficient.
Standard dedup-related inspection tools like `zpool status -D...` and `zdb -D...` should work the same as before; they just show more kinds of dedup objects, and different sizings.
New tools are available to invoke the prefetch and prune features:
zpool prefetch -t ddt <pool>
zpool ddtprune <pool>
These are documented in `zpool-prefetch(8)` and `zpool-ddtprune(8)`.
There are some new pool properties:
dedupcached
dedup_table_quota
dedup_table_size
These are documented in `zpoolprops(7)`.
There’s a collection of new kstats in the pool kstats, e.g. `/proc/spl/kstat/zfs/tank/ddt_stats_sha256`.
There’s also a collection of new tuneables:
dmu_ddt_copies
zfs_dedup_log_flush_rounds_max
zfs_dedup_log_flush_min_time_ms
zfs_dedup_log_flush_entries_min
zfs_dedup_log_flush_flow_rate_txgs
zfs_dedup_log_txg_max
zfs_dedup_log_mem_max
zfs_dedup_log_mem_max_percent
zfs_zap_shrink_enabled
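To compare tuneable values before and after a test run, a small helper can read them in one go. This is only a sketch: it assumes a Linux system, where OpenZFS module parameters are normally exposed as files under `/sys/module/zfs/parameters`, and the function name `read_tuneables` is made up for this example.

```python
from pathlib import Path

# Assumption: the usual Linux location for OpenZFS module parameters.
DEFAULT_PARAM_DIR = "/sys/module/zfs/parameters"


def read_tuneables(names, param_dir=DEFAULT_PARAM_DIR):
    """Return {name: value-as-string}, or None for any tuneable that
    cannot be read (for example, when the running build lacks it)."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(param_dir) / name).read_text().strip()
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    wanted = ["zfs_dedup_log_txg_max", "zfs_dedup_log_mem_max"]
    for name, value in read_tuneables(wanted).items():
        print(f"{name} = {value}")
```

Missing files are reported as `None` instead of raising, so the same script can be run against builds with and without the Fast Dedup changes.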
The tuneables are documented in `zfs(4)`.
Review guide
All these changes are interconnected but not all are directly related. This would make reviewing them as a single mass extremely difficult, which in turn increases the likelihood that the changes will either be waved through with bugs, or languish in the issue tracker forever.
To make review easier, we’ve tried to layer the patch stack into a logical series of changes, each building on the previous ones. The intent is that they can be reviewed in order, with the reviewer’s understanding of the changes and the system as a whole growing with each commit.
Our intent is that as the earlier PRs are reviewed and updated based on review feedback, the later ones will be rebased and pushed to match, and the `fdt-rel` combined branch updated too. Some of the earlier PRs that do not affect the on-disk format could be merged as they are approved, while the later ones we expect will be approved and “locked in place”, and once everything above them is approved, the whole lot can be merged.
(Unfortunately, GitHub can’t easily handle a stack of PRs, only showing the changes between each one, so the later ones all show the commits for the earlier ones.)
It’s worth noting that at the time of writing, ZTS coverage is still limited. There has been testing of course, both for performance and for function, but not enough to cover everything. We are fully expecting and intending that more tests will be created before these PRs are merged, and that work will happen within the scope of each PR.
PR list in review order:
#15887 dedup: cleanup and document
This is a collection of cleanups, refactors and documentation of the existing dedup system. There should be no functional changes here at all, and we expect that this PR could be merged almost immediately without controversy.
#15888 zap: add shrinking support
This is a standalone PR that allows ZAPs to be shrunk, by collapsing empty sibling leaf blocks. It could provide a nice space improvement for high-churn ZAPs (e.g. ZPL directories), and is a prerequisite for the quota and prune features, as there’s no point pruning entries if we can’t reclaim the space they would use.
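The idea of collapsing empty sibling leaves can be illustrated with a toy model. This is not the OpenZFS code: it assumes leaves are identified by a `(prefix_len, prefix)` pair, that two leaves are siblings when they share a prefix length and differ only in the lowest prefix bit, and that collapsing may cascade upward. The function name `shrink` is made up for this sketch.

```python
def shrink(leaves):
    """Collapse pairs of empty sibling leaves until none remain.

    `leaves` maps (prefix_len, prefix) -> dict of entries. A pair of empty
    siblings is replaced by one empty leaf with a prefix one bit shorter.
    """
    changed = True
    while changed:
        changed = False
        for (plen, prefix), entries in list(leaves.items()):
            sibling = (plen, prefix ^ 1)
            if plen == 0 or entries:
                continue  # root leaf, or leaf still in use
            if sibling not in leaves or leaves[sibling]:
                continue  # sibling missing (split deeper) or still in use
            del leaves[(plen, prefix)]
            del leaves[sibling]
            leaves[(plen - 1, prefix >> 1)] = {}
            changed = True
            break  # the dict changed, so restart the scan
    return leaves


if __name__ == "__main__":
    # Four leaves at depth 2; only the leaf with prefix 0b10 holds an entry.
    leaves = {(2, 0b00): {}, (2, 0b01): {}, (2, 0b10): {"k": 1}, (2, 0b11): {}}
    print(sorted(shrink(leaves)))  # prints [(1, 0), (2, 2), (2, 3)]
```

In the example, the two empty leaves on the left merge into one, while the empty leaf with prefix `0b11` stays because its sibling is still in use.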
#15889 dedup: quota
This allows a quota to be set for the on-disk dedup table, with dedup effectively “disabled” for new entries once the quota is reached. This is the first time it’s ever been possible for a block created in a dedup-enabled dataset to not be deduplicated and not have a `D` bit set, and internally, the first time `ddt_lookup()` has ever been able to return NULL, so it represents something of a departure and is important to understand.
This is positioned in the stack before the on-disk format changes because it does not require a format change, and can work just fine on traditional dedup tables.
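The behaviour change can be sketched as a toy model. This is an illustration, not the real code path: the quota here is counted in entries rather than on-disk bytes, and `ToyDDT` and `write_block` are hypothetical names. The point is only that a lookup may now come back empty, and the caller falls back to a regular write.

```python
import hashlib


class ToyDDT:
    """Toy dedup table: checksum -> reference count."""

    def __init__(self, quota_entries):
        # Assumption: quota expressed as an entry count, for simplicity.
        self.quota_entries = quota_entries
        self.entries = {}

    def lookup(self, checksum):
        """Return the entry key, creating it if allowed; None when the
        table is at quota and the checksum has no existing entry."""
        if checksum in self.entries:
            return checksum
        if len(self.entries) >= self.quota_entries:
            return None  # analogous to ddt_lookup() returning NULL
        self.entries[checksum] = 0
        return checksum


def write_block(ddt, data):
    """Classify a write as 'dedup-new', 'dedup-hit' or 'regular'."""
    key = ddt.lookup(hashlib.sha256(data).hexdigest())
    if key is None:
        return "regular"  # ordinary block, no D bit, no table entry
    ddt.entries[key] += 1
    return "dedup-new" if ddt.entries[key] == 1 else "dedup-hit"


if __name__ == "__main__":
    ddt = ToyDDT(quota_entries=1)
    print(write_block(ddt, b"a"))  # dedup-new
    print(write_block(ddt, b"a"))  # dedup-hit
    print(write_block(ddt, b"b"))  # regular
```

Blocks that already have an entry keep deduplicating after the quota is reached; only blocks that would need a new entry are written as regular blocks.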
#15890 dedup: prefetch
This invokes the regular DMU prefetch code to get all dedup tables into the ARC, to try to reduce the time after importing the pool during which performance suffers because most of the dedup tables are not in memory.
#15892 dedup: add fast_dedup feature and support for traditional and new on-disk formats
This adds the core of the `fast_dedup` feature itself: the “container” object for the table, which includes its config, and any objects that form the table as a whole. It’s designed to be extendable separately from the pool feature flag, that is, new FDT “subfeatures” could be added to an individual DDT without needing to break backward compatibility for all DDTs in the pool.
Reviewing this here means understanding the basic structure that the “flat entry” and “log” PRs fit into, since they each add a Fast Dedup “subfeature”.
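One way to picture per-table subfeatures is the toy model below. Everything in it is hypothetical: the class, the field names, the string subfeature names and the rule that a table with an unknown subfeature is simply skipped are assumptions made for illustration, not the on-disk format.

```python
from dataclasses import dataclass, field

# Subfeatures this (imaginary) build understands.
SUPPORTED = frozenset({"flat", "log"})


@dataclass
class ToyContainer:
    """Toy stand-in for one dedup table's container object."""
    subfeatures: set = field(default_factory=set)  # per-table config
    objects: dict = field(default_factory=dict)    # objects forming the table


def openable_tables(pool_tables, supported=SUPPORTED):
    """Return the names of tables whose subfeatures are all understood."""
    return sorted(
        name
        for name, table in pool_tables.items()
        if table.subfeatures <= supported
    )


if __name__ == "__main__":
    pool = {
        "sha256": ToyContainer(subfeatures={"flat", "log"}),
        "blake3": ToyContainer(subfeatures={"flat", "some_future_thing"}),
        "skein": ToyContainer(),  # no subfeatures enabled
    }
    print(openable_tables(pool))  # prints ['sha256', 'skein']
```

Because the configuration lives with each table, an unknown subfeature in one table says nothing about the others in the same pool.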
#15893 dedup: “flat” entry format
This adds the “flat phys” subfeature, which reduces the in-memory and on-disk size of an individual entry by reducing the number of blocks stored in a single dedup entry from 4 to 1, and by removing and reorganising things that aren’t needed in every entry.
This is where you will see the gymnastics required to retain compatibility with traditional dedup tables.
A significant chunk of this is in the IO pipeline in `zio_ddt_write()`, which is a quite involved rewrite to allow a single dedup entry to be “extended” with new DVAs.
#15894 dnode: allow storage class to be overridden by object type
This is a small standalone PR that provides a mechanism needed by the log feature. It’s included as a separate PR because the method may require a specific discussion.
#15895 dedup: log feature
This adds the “log” subfeature, which is a fast append-only on-disk object intended to buffer changes to the dedup ZAPs to allow them to be updated in batches, over multiple transactions, without competing with true user IO. This makes the feature very involved, mostly in the flushing machinery. Hopefully by now it will be clear how it fits with everything else.
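The buffering idea can be shown with a toy model. It is a sketch under assumptions, not the real flushing machinery: it only models the in-memory view of pending changes, uses a fixed batch size per transaction group (`flush_per_txg`, a made-up parameter), and treats a reference count of zero as a deletion.

```python
from collections import OrderedDict


class ToyDedupLog:
    """Toy model: changes go to a log first, then reach the table in batches."""

    def __init__(self, flush_per_txg):
        self.table = {}           # stands in for the dedup ZAPs
        self.log = OrderedDict()  # pending changes, oldest first
        self.flush_per_txg = flush_per_txg  # hypothetical fixed batch size

    def update(self, checksum, refcount):
        """Record a change in the log only; the table is left untouched."""
        self.log.pop(checksum, None)  # a newer change supersedes an older one
        self.log[checksum] = refcount

    def lookup(self, checksum):
        """The log holds the newest state, so check it before the table."""
        if checksum in self.log:
            return self.log[checksum]
        return self.table.get(checksum)

    def sync_txg(self):
        """Apply at most flush_per_txg of the oldest logged changes."""
        flushed = 0
        while self.log and flushed < self.flush_per_txg:
            checksum, refcount = self.log.popitem(last=False)
            if refcount == 0:
                self.table.pop(checksum, None)
            else:
                self.table[checksum] = refcount
            flushed += 1
        return flushed


if __name__ == "__main__":
    ddt = ToyDedupLog(flush_per_txg=2)
    for checksum, refcount in [("a", 1), ("b", 1), ("c", 1), ("a", 2)]:
        ddt.update(checksum, refcount)
    print(ddt.sync_txg(), ddt.table)  # prints 2 {'b': 1, 'c': 1}
    print(ddt.sync_txg(), ddt.table)  # prints 1 {'b': 1, 'c': 1, 'a': 2}
```

Four updates cost three table writes spread over two transaction groups, and lookups stay correct in between because the log is consulted first.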
#16277 dedup: prune
This adds a facility to remove unused unique entries from the dedup table, shrinking it down to make it more efficient to update. It is positioned last in the patch stack because it requires an on-disk format change to the dedup entry, subtly changes the meaning of the `D` block pointer flag, and requires some delicate interactions with the dedup log.
Discussion
General discussion about the OpenZFS dedup feature as a whole and feedback on the above can be included on this issue below. Specific feedback and review of the individual PRs should go to those PRs, to make sure we don’t miss anything.