Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Brick inode table size #4339

Open
wants to merge 3 commits into
base: devel
Choose a base branch
from

Conversation

jkroonza
Copy link
Contributor

No description provided.

@gluster-ant
Copy link
Collaborator

Can one of the admins verify this patch?

2 similar comments
@gluster-ant
Copy link
Collaborator

Can one of the admins verify this patch?

@gluster-ant
Copy link
Collaborator

Can one of the admins verify this patch?

@jkroonza jkroonza mentioned this pull request Apr 18, 2024
@gluster-ant
Copy link
Collaborator

CLANG-FORMAT FAILURE:
Before merging the patch, this diff needs to be considered for passing clang-format

index 3a85d7233..129af5eee 100644
--- a/libglusterfs/src/xlator.c
+++ b/libglusterfs/src/xlator.c
@@ -484,7 +484,7 @@ xlator_set_inode_table_size(xlator_t *this, void *data)
             goto out;
         }
         inode_table_size = *(int *)data;
-        //inode_table_set_table_size(this->itable, inode_table_size);
+        // inode_table_set_table_size(this->itable, inode_table_size);
         /* can we replace the table ? */
         /* alternatively, create a new table and migrate all entries? */
     }
diff --git a/xlators/protocol/server/src/server.c b/xlators/protocol/server/src/server.c
index d91f94749..41c491c8a 100644
--- a/xlators/protocol/server/src/server.c
+++ b/xlators/protocol/server/src/server.c
@@ -306,8 +306,8 @@ get_auth_types(dict_t *this, char *key, data_t *value, void *data)
             ret = dict_set_dynptr(auth_dict, tmp, NULL, 0);
             if (ret < 0) {
                 gf_msg_debug("server", 0,
-                            "failed to "
-                            "dict_set_dynptr");
+                             "failed to "
+                             "dict_set_dynptr");
             }
         }
     }
@@ -1486,9 +1486,9 @@ server_process_event_upcall(xlator_t *this, void *data)
             if (!client || strcmp(client->client_uid, client_uid))
                 continue;
 
-           /* Avoid upcall notification to client if disconnect is in
-              progress
-            */
+            /* Avoid upcall notification to client if disconnect is in
+               progress
+             */
             if (GF_ATOMIC_GET(xprt->disconnect_progress))
                 continue;
 

libglusterfs/src/xlator.c Outdated Show resolved Hide resolved
1m is the lower workable limit for us currently, and this is set to
grow, so pre-emptively permit for pushing this limit higher.
Sacrificing RAM to avoid disk IO for us is a no-brainer.

Signed-off-by: Jaco Kroon <[email protected]>
@jkroonza jkroonza force-pushed the brick-inode-table-size branch from 61164ca to 03f7428 Compare April 18, 2024 21:58
@@ -1512,6 +1512,9 @@ struct volopt_map_entry glusterd_volopt_map[] = {
{.key = "network.inode-lru-limit",
.voltype = "protocol/server",
.op_version = 1},
{.key = "network.inode-table-size",
.voltype = "protocol/server",
.op_version = GD_OP_VERSION_11_0}, /* is this right? */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ other .op_versions here is 1, but this option will only be available from glusterfs 11.2 (or patched 11.1s).

@gluster-ant
Copy link
Collaborator

CLANG-FORMAT FAILURE:
Before merging the patch, this diff needs to be considered for passing clang-format

index 3a85d7233..129af5eee 100644
--- a/libglusterfs/src/xlator.c
+++ b/libglusterfs/src/xlator.c
@@ -484,7 +484,7 @@ xlator_set_inode_table_size(xlator_t *this, void *data)
             goto out;
         }
         inode_table_size = *(int *)data;
-        //inode_table_set_table_size(this->itable, inode_table_size);
+        // inode_table_set_table_size(this->itable, inode_table_size);
         /* can we replace the table ? */
         /* alternatively, create a new table and migrate all entries? */
     }
diff --git a/xlators/protocol/server/src/server.c b/xlators/protocol/server/src/server.c
index d91f94749..41c491c8a 100644
--- a/xlators/protocol/server/src/server.c
+++ b/xlators/protocol/server/src/server.c
@@ -306,8 +306,8 @@ get_auth_types(dict_t *this, char *key, data_t *value, void *data)
             ret = dict_set_dynptr(auth_dict, tmp, NULL, 0);
             if (ret < 0) {
                 gf_msg_debug("server", 0,
-                            "failed to "
-                            "dict_set_dynptr");
+                             "failed to "
+                             "dict_set_dynptr");
             }
         }
     }
@@ -1486,9 +1486,9 @@ server_process_event_upcall(xlator_t *this, void *data)
             if (!client || strcmp(client->client_uid, client_uid))
                 continue;
 
-           /* Avoid upcall notification to client if disconnect is in
-              progress
-            */
+            /* Avoid upcall notification to client if disconnect is in
+               progress
+             */
             if (GF_ATOMIC_GET(xprt->disconnect_progress))
                 continue;
 

…h table.

This creates a new table and destroys the old.

Signed-off-by: Jaco Kroon <[email protected]>
@jkroonza jkroonza force-pushed the brick-inode-table-size branch from 03f7428 to f64b7ef Compare April 19, 2024 09:26
@gluster-ant
Copy link
Collaborator

CLANG-FORMAT FAILURE:
Before merging the patch, this diff needs to be considered for passing clang-format

index 84b0ce925..6c35a89e2 100644
--- a/libglusterfs/src/xlator.c
+++ b/libglusterfs/src/xlator.c
@@ -486,9 +486,10 @@ xlator_set_inode_table_size(xlator_t *this, void *data)
         inode_table_size = *(int *)data;
 
         if (inode_table_size != this->itable->inode_hashsize) {
-            inode_table_t* tmp = inode_table_with_invalidator(this->itable->lru_limit,
-                    this, this->itable->invalidator_fn, this->itable->invalidator_xl,
-                    this->itable->dentry_hashsize, inode_table_size);
+            inode_table_t *tmp = inode_table_with_invalidator(
+                this->itable->lru_limit, this, this->itable->invalidator_fn,
+                this->itable->invalidator_xl, this->itable->dentry_hashsize,
+                inode_table_size);
             if (tmp->inode_hashsize != this->itable->inode_hashsize) {
                 inode_table_destroy(this->itable);
                 this->itable = tmp;
diff --git a/xlators/protocol/server/src/server.c b/xlators/protocol/server/src/server.c
index d91f94749..41c491c8a 100644
--- a/xlators/protocol/server/src/server.c
+++ b/xlators/protocol/server/src/server.c
@@ -306,8 +306,8 @@ get_auth_types(dict_t *this, char *key, data_t *value, void *data)
             ret = dict_set_dynptr(auth_dict, tmp, NULL, 0);
             if (ret < 0) {
                 gf_msg_debug("server", 0,
-                            "failed to "
-                            "dict_set_dynptr");
+                             "failed to "
+                             "dict_set_dynptr");
             }
         }
     }
@@ -1486,9 +1486,9 @@ server_process_event_upcall(xlator_t *this, void *data)
             if (!client || strcmp(client->client_uid, client_uid))
                 continue;
 
-           /* Avoid upcall notification to client if disconnect is in
-              progress
-            */
+            /* Avoid upcall notification to client if disconnect is in
+               progress
+             */
             if (GF_ATOMIC_GET(xprt->disconnect_progress))
                 continue;
 

.default_value = "65536",
.description = "Specifies the size of the inode hash table, "
"must be a power of two.",
.op_version = {GD_OP_VERSION_10_0}, /* if possible */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably wrong. Should be 11_0.

This can be used to determine if dentry_hashsize is correctly set or
not.

Signed-off-by: Jaco Kroon <[email protected]>
@jkroonza
Copy link
Contributor Author

@mykaul @mohit84

Whilst this isn't ready yet, and has only been compile-tested so far, I would really appreciate some feedback as to whether I'm on the correct track here or not. I kept it to a commit per "idea" as per #4335 discussion.

The outstanding bit is "Ability to set the dentry_hashsize (network.dentry-hash-size ?)." - which I think will take a re-hash approach rather than the same just dump everything approach I'm currently taking elsewhere.

@gluster-ant
Copy link
Collaborator

CLANG-FORMAT FAILURE:
Before merging the patch, this diff needs to be considered for passing clang-format

index e5ca927be..e90af425f 100644
--- a/libglusterfs/src/inode.c
+++ b/libglusterfs/src/inode.c
@@ -2419,8 +2419,8 @@ unlock:
     return;
 }
 
-static
-size_t __itable_count_dentry_hash_entries(inode_table_t *itable)
+static size_t
+__itable_count_dentry_hash_entries(inode_table_t *itable)
 {
     size_t ret = 0;
     int i;
@@ -2451,7 +2451,8 @@ inode_table_dump(inode_table_t *itable, char *prefix)
     gf_proc_dump_build_key(key, prefix, "dentry_hashsize");
     gf_proc_dump_write(key, "%" GF_PRI_SIZET, itable->dentry_hashsize);
     gf_proc_dump_build_key(key, prefix, "dentry_hashentries");
-    gf_proc_dump_write(key, "%" GF_PRI_SIZET, __itable_count_dentry_hash_entries(itable));
+    gf_proc_dump_write(key, "%" GF_PRI_SIZET,
+                       __itable_count_dentry_hash_entries(itable));
     gf_proc_dump_build_key(key, prefix, "inode_hashsize");
     gf_proc_dump_write(key, "%" GF_PRI_SIZET, itable->inode_hashsize);
     gf_proc_dump_build_key(key, prefix, "name");
diff --git a/libglusterfs/src/xlator.c b/libglusterfs/src/xlator.c
index 84b0ce925..6c35a89e2 100644
--- a/libglusterfs/src/xlator.c
+++ b/libglusterfs/src/xlator.c
@@ -486,9 +486,10 @@ xlator_set_inode_table_size(xlator_t *this, void *data)
         inode_table_size = *(int *)data;
 
         if (inode_table_size != this->itable->inode_hashsize) {
-            inode_table_t* tmp = inode_table_with_invalidator(this->itable->lru_limit,
-                    this, this->itable->invalidator_fn, this->itable->invalidator_xl,
-                    this->itable->dentry_hashsize, inode_table_size);
+            inode_table_t *tmp = inode_table_with_invalidator(
+                this->itable->lru_limit, this, this->itable->invalidator_fn,
+                this->itable->invalidator_xl, this->itable->dentry_hashsize,
+                inode_table_size);
             if (tmp->inode_hashsize != this->itable->inode_hashsize) {
                 inode_table_destroy(this->itable);
                 this->itable = tmp;
diff --git a/xlators/protocol/server/src/server.c b/xlators/protocol/server/src/server.c
index d91f94749..41c491c8a 100644
--- a/xlators/protocol/server/src/server.c
+++ b/xlators/protocol/server/src/server.c
@@ -306,8 +306,8 @@ get_auth_types(dict_t *this, char *key, data_t *value, void *data)
             ret = dict_set_dynptr(auth_dict, tmp, NULL, 0);
             if (ret < 0) {
                 gf_msg_debug("server", 0,
-                            "failed to "
-                            "dict_set_dynptr");
+                             "failed to "
+                             "dict_set_dynptr");
             }
         }
     }
@@ -1486,9 +1486,9 @@ server_process_event_upcall(xlator_t *this, void *data)
             if (!client || strcmp(client->client_uid, client_uid))
                 continue;
 
-           /* Avoid upcall notification to client if disconnect is in
-              progress
-            */
+            /* Avoid upcall notification to client if disconnect is in
+               progress
+             */
             if (GF_ATOMIC_GET(xprt->disconnect_progress))
                 continue;
 

.default_value = "16384",
.description = "Specifies the limit on the number of inodes "
"in the lru list of the inode cache.",
.op_version = {1},
.flags = OPT_FLAG_SETTABLE | OPT_FLAG_DOC},
{.key = {"inode-table-size"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to declare a new option inode-table-size, i think inode-table-size is already present, We can pass as an argument inode_table_size to the process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you pass that to a brick?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the time being you can pass as an argument to a brick process as like kill the running brick process and start a brick process with same argument and add inode-table-size something like "--inode-table-size=65536" and later you can change glusterd to pass the argument during start of a brick process instead of creating a new option .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So either modify glusterd or add an option? Isn't then just adding an option simpler because that's how most of the rest of the system is managed anyway?

Please note, just trying to understand the logic here.

If adding options to glusterd is the way to go - is there an example mechanism I can follow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So either modify glusterd or add an option? Isn't then just adding an option simpler because that's how most of the rest of the system is managed anyway?

Please note, just trying to understand the logic here.

If adding options to glusterd is the way to go - is there an example mechanism I can follow?

I still believe the best way is to use an existing option(inode-table-size) instead of creating a new argument.To
configure the same via glusterd you can declare an option at glusterd.c option table something like
working-directory and save the configured value in glusterd_conf and access the same value via glusterd_conf_t during brick start in glusterd_volume_start_glusterfs and pass as an argument.

@xhernandez @aravindavk Can you please share your view about the same.

the xlator will have its itable pointer set). If so, then
set the lru limit for the itable.
*/
xlator_foreach(this, xlator_set_inode_table_size, &inode_table_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to call xlator_foreach here to configure the inode_table_size, you can pass the value(inode_table_size, dentry_hash_size)during call inode_table_new in the function server_setvolume.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the example from network.inode-lru-limit, will make the required update here, thanks for pointing this out. So just save this into the conf variable then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants