Question: Are there hidden costs to enabling dedup on a rarely-used dataset? #10676
-
I understand (and have verified with testing) that using dedup is typically a bad idea except in specific cases. I have a case for an archival dataset that's rarely written or read (a big metadata crawl and ~120MB of writes daily in a burst); I believe there's a fair amount of redundancy across files in the dataset and I don't care hugely about read/write performance. I'd like to enable dedup on this dataset, but I was wondering: is there any resource usage overhead added for the non-dedup datasets in the same pool? Like, anything persistent in memory or extra indirection that affects those non-dedup datasets whose performance I do care about? The net has conflicting opinions, as usual. Cheers.
Replies: 5 comments
-
Dedup will affect only the datasets where it is (or has ever been) activated on disk.
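For what it's worth, dedup is an ordinary per-dataset property, so you can turn it on for just the archival dataset and leave everything else in the pool alone. A minimal sketch, with `tank/archive` standing in for your actual pool/dataset names:

```sh
# Enable dedup only on the archival dataset (names are hypothetical).
zfs set dedup=on tank/archive

# Confirm it's on for that dataset and still off everywhere else in the pool.
zfs get -r dedup tank
```

Only blocks written to `tank/archive` while the property is on will get DDT entries; writes to the other datasets are untouched.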
-
The dedup table is pool-wide, so any destroy operation on the pool has to traverse it, as I understand it.
-
The DDT is pool-wide, but it has to be consulted only when a block that is marked as deduplicated is freed.
-
And how does it know? By traversing the DDT.
-
No. The block pointer contains a flag indicating whether the target block is deduplicated.
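If you want to poke at this yourself, zdb can dump individual objects with full block pointer detail; deduplicated blocks should be marked as such in the block pointer dump (non-dedup blocks print as unique, if I remember the output right). Dataset name and object ID below are just placeholders:

```sh
# Dump object 42 of the dataset, including its block pointers.
zdb -ddddd tank/archive 42
```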
-
File deletes and snapshot destroys might force DDT reads; this will reduce the overall available bandwidth of the pool and can increase I/O latency for non-dedup filesystems. The DDT will also occupy part of the memory available to the ARC, which can impact caching efficiency for other datasets.
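If you do enable it and want to keep an eye on the cost, the DDT's footprint (entry count, on-disk size, in-core size) is easy to check; the pool name below is hypothetical:

```sh
# One-line dedup summary: number of DDT entries and their on-disk / in-core size.
zpool status -D tank

# Full DDT histograms, if you want more detail.
zdb -DD tank
```

At roughly 120MB/day of new data and the default 128K recordsize, that's on the order of a thousand new DDT entries per day, so for a workload like yours the table (and whatever slice of ARC it occupies) should stay tiny.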