Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add server-count support with random selection algorithm #63

Merged
merged 9 commits into from
Sep 3, 2024

Conversation

glebkin
Copy link
Contributor

@glebkin glebkin commented Jul 23, 2024

The Fanout plugin currently sends DNS requests to all servers specified in the configuration. However, this may be unnecessary in some cases. For example, if we have multiple instances of DNS servers scaled horizontally, each with the same copy of DNS records, it would be pointless to send requests to each server individually. Instead, we can choose to send requests to only some of the servers based on their load-factor. To address this, I have made some improvements to the plugin:

  1. Added two new params - server-count and load-factor:
  • server-count is the number of DNS servers to be requested. Equals to the number of specified IPs by default. If this parameter is lower than the number of specified IP addresses, servers are randomly selected based on the load-factor parameter.
  • load-factor - the probability of selecting a server. This is specified in the order of the list of IP addresses and takes values between 1 and 100. By default, all servers have an equal probability of 100.
    Example of such configuration:
example.org {
    fanout . 127.0.0.1:9005 127.0.0.1:9006 127.0.0.1:9007 127.0.0.1:9008 {
        server-count 2
        load-factor 100 50 70 100
    }
}

In this example, for each DNS request to example.org, only two of the specified servers will be selected. The servers are selected randomly based on the load-factor parameter of each server.
2. Added the internal/selector package with two implementations for generic values selection

  • Simple selector - useful when the server-count equals the number of available IPs. In this case, we don't need to spend time randomly selecting servers, so we simply return all of them, one by one.
  • WeightedRand selector which selects servers randomly based on their load-factor
  1. Added runWorkers function for better readability
  2. Added responseCh close and context check to get rid of potential goroutine leaks
  3. Updated go version to 1.22 to support math/rand/v2 and generics
  4. Updated CI/CD + linter to use new versions of Go + fixed some linter issues

@glebkin
Copy link
Contributor Author

glebkin commented Jul 24, 2024

Hi, @denis-tingaikin
Could you please provide updates regarding this PR? Is it possible to merge it or some additional work needs to be done?
One thing that comes to mind right now is maybe it would be best to use the same version of Go that CoreDNS currently supports, which is 1.21.

@glebkin
Copy link
Contributor Author

glebkin commented Jul 26, 2024

Sorry, got some troubles with DCO, reopened this PR)
Go version is 1.21 now.

Copy link
Member

@denis-tingaikin denis-tingaikin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea to get the policy for the fanout plugin. Many thanks for updating everything to the latest versions. Please feel free to open a separate PR with uplift; it will be merged instantly.

Just curious. Why just the load balancer server could not be used for the servers with the same records?

I found a few corner cases that at this moment are looking like blockers:

  1. Servers with the same records can be selected during the request.
  2. Server availability is not managed.

But if these problems are fine for your use-cases, we can merge this if a policy block is implemented.

setup.go Outdated Show resolved Hide resolved
setup_test.go Outdated Show resolved Hide resolved
internal/selector/simple.go Outdated Show resolved Hide resolved
fanout.go Outdated Show resolved Hide resolved
README.md Outdated
Comment on lines 27 to 28
* `server-count` is the number of DNS servers to be requested. Equals to the number of specified IPs by default. If this parameter is lower than the number of specified IP addresses, servers are randomly selected based on the `load-factor` parameter.
* `load-factor` - the probability of selecting a server. This is specified in the order of the list of IP addresses and takes values between 1 and 100. By default, all servers have an equal probability of 100.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think if we define a policy field to be consistent with the forward plugin?

fanout  addr1, addr2, adddr3 {
  policy weighted-rand{
    server-count 2
    load-factor 50  50 50
  }
}

Copy link
Contributor Author

@glebkin glebkin Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.
Now it's possible to specify policy for fanout:

weighted-random:

example.org {
    fanout . 127.0.0.1:9005 127.0.0.1:9006 127.0.0.1:9007
    policy weighted-random {
      server-count 2
      load-factor 50 70 100
    }
}

sequential (default):

example.org {
    fanout . 127.0.0.1:9005 127.0.0.1:9006 127.0.0.1:9007
    policy sequential
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

policy is going to be a separate Caddyfile block that may confuse Coredns users. 

What if we make it flat?

example.org {
    fanout . 127.0.0.1:9005 127.0.0.1:9006 127.0.0.1:9007 {
     policy: weighted-random 
     weighted-random-server-count: 2
      weighted-random-load-factor 50 70 100
    }
}

Copy link
Contributor Author

@glebkin glebkin Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Example:

example.org {
    fanout . 127.0.0.1:9005 127.0.0.1:9006 127.0.0.1:9007 {
      policy weighted-random
      weighted-random-server-count 2
      weighted-random-load-factor 50 70 100
    }
}

fanout.go Outdated Show resolved Hide resolved
fanout.go Outdated Show resolved Hide resolved
* Set Go version to 1.21

---------

Signed-off-by: Gleb Kogtev <[email protected]>
- update error message in setup.go
- add empty case in setup_test.go
- make variables names more consistent
- update copyright

Signed-off-by: Gleb Kogtev <[email protected]>
@glebkin
Copy link
Contributor Author

glebkin commented Aug 20, 2024

I really like the idea to get the policy for the fanout plugin. Many thanks for updating everything to the latest versions. Please feel free to open a separate PR with uplift; it will be merged instantly.

Just curious. Why just the load balancer server could not be used for the servers with the same records?

I found a few corner cases that at this moment are looking like blockers:

  1. Servers with the same records can be selected during the request.
  2. Server availability is not managed.

But if these problems are fine for your use-cases, we can merge this if a policy block is implemented.

Well, we expect this plugin to be further improved. It's just a first step.
I'm sure, that we might want to add some sort of self-management of load-factor on CoreDNS side based on responses.
For now its okay for us to have this values externally managed

Load balancer is additional component, which might be a bottleneck in high-load scenarios

- rename simple selector to sequential
- update docs + add examples

Signed-off-by: Gleb Kogtev <[email protected]>
@glebkin
Copy link
Contributor Author

glebkin commented Sep 2, 2024

@denis-tingaikin kindly ask you to take a look into this PR one more time

@denis-tingaikin
Copy link
Member

denis-tingaikin commented Sep 2, 2024

@glebkin Looks good, only one comment and failed linter are remaining ;)

* make policy configuration flat
* update docs
* fix linter issues

---------

Signed-off-by: Gleb Kogtev <[email protected]>
@glebkin
Copy link
Contributor Author

glebkin commented Sep 2, 2024

@glebkin Looks good, only one comment and failed linter are remaining ;)

Done. Please take a look.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@denis-tingaikin
Copy link
Member

I've found a few confusing things in the README.md file; could you check if the configuration that I suggest is correct?

glebkin and others added 2 commits September 2, 2024 17:05
Co-authored-by: Denis Tingaikin <[email protected]>
Signed-off-by: Gleb Kogtev <[email protected]>
Co-authored-by: Denis Tingaikin <[email protected]>
Signed-off-by: Gleb Kogtev <[email protected]>
@glebkin
Copy link
Contributor Author

glebkin commented Sep 2, 2024

I've found a few confusing things in the README.md file; could you check if the configuration that I suggest is correct?

Yep, thanks for help, fixed these typos

Honestly, I didn't notice that I printed configuration wrong here, but actually it was like this:

example.org {
    fanout . 127.0.0.1:9005 127.0.0.1:9006 127.0.0.1:9007 {
      policy weighted-random {
        server-count 2
        load-factor 50 70 100
      }
    }
}

So I had nested block with policy configuration.
Should I rollback it to this config with nested blocks?

Revert flat policy configuration params.
Fix README.md typos with configuration examples
---------

Signed-off-by: Gleb Kogtev <[email protected]>
@glebkin
Copy link
Contributor Author

glebkin commented Sep 2, 2024

@denis-tingaikin I decided to rollback my latest commit with flat policy configuration, because I think

this configuration is better:

example.org {
    fanout . 127.0.0.1:9005 127.0.0.1:9006 127.0.0.1:9007 {
      policy weighted-random {
        server-count 2
        load-factor 50 70 100
      }
    }
}

than this:

example.org {
    fanout . 127.0.0.1:9005 127.0.0.1:9006 127.0.0.1:9007 {
      policy weighted-random
      weighted-random-server-count 2
      weighted-random-load-factor 50 70 100
    }
}

Also I've update README.md examples to match actual setup settings.
Please let me know if you have concerns about this approach and prefer the second approach with flat fanout configuration

@denis-tingaikin
Copy link
Member

Since we're trying to use the syntax that, as you found, is not allowed by Caddyfile, I am going to ask CoreDNS contact to be sure that this is OK.

@denis-tingaikin
Copy link
Member

@glebkin I just got a response that it's not recommended. Following the stict Caddyfile format is the best option. So I suggest returning to the flat configuration.

This reverts commit 48958d5.
---------

Signed-off-by: Gleb Kogtev <[email protected]>
@glebkin
Copy link
Contributor Author

glebkin commented Sep 3, 2024

@glebkin I just got a response that it's not recommended. Following the stict Caddyfile format is the best option. So I suggest returning to the flat configuration.

Sad, but okay. I've reverted it.

Copy link
Member

@denis-tingaikin denis-tingaikin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@glebkin Many thanks!

@denis-tingaikin denis-tingaikin merged commit 402e0dc into networkservicemesh:main Sep 3, 2024
13 checks passed
@glebkin
Copy link
Contributor Author

glebkin commented Sep 3, 2024

@denis-tingaikin is it possible to create new release?

@denis-tingaikin
Copy link
Member

Sure done: https://github.com/networkservicemesh/fanout/releases/tag/v1.11.3

tag is v1.11.3 indicates which coredns we use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants