
Support for TCP Searches #192

Open · wants to merge 83 commits into master
Conversation

@sveseli (Contributor) commented Jan 24, 2024

This PR has been modified from the original (support for direct TCP connections) to include support for TCP searches.

Server side changes:

  • Servers can now respond to TCP searches on the server port

Client side changes:

  • By default, only UDP channel searches are performed, as before
  • Setting the EPICS_PVA_NAME_SERVERS variable results in TCP search queries to the listed HOST:PORT addresses, in addition to the regular UDP search

Here are some examples of how things work. In terminal 1, we run the test server using non-default ports:

daq-dss02> EPICS_PVA_SERVER_PORT=11111 EPICS_PVA_BROADCAST_PORT=22222 ./modules/pvAccess/testApp/O.linux-x86_64/testServer
pvAccess Server v7.1.8-SNAPSHOT
Active configuration (w/ defaults)
EPICS_PVAS_INTF_ADDR_LIST = 0.0.0.0:11111
EPICS_PVAS_BEACON_ADDR_LIST = 
EPICS_PVAS_AUTO_BEACON_ADDR_LIST = YES
EPICS_PVAS_BEACON_PERIOD = 15
EPICS_PVAS_BROADCAST_PORT = 22222
EPICS_PVAS_SERVER_PORT = 11111
EPICS_PVAS_PROVIDER_NAMES = local
...

In terminal 2, on a second machine, we run the client. If we do not specify anything, the client fails to connect:

daq-qss21> pvget testCounter
Timeout
testCounter

If we specify the broadcast port, the channel connects using UDP discovery:

daq-qss21> EPICS_PVA_BROADCAST_PORT=22222 pvget testCounter
testCounter 2024-03-22 14:01:33.332  27 
daq-qss21> EPICS_PVA_BROADCAST_PORT=22222 pvget -d testCounter
2024-03-22T14:01:41.838 Configured PVA address list: 
2024-03-22T14:01:41.838 Configured server port: 5075
2024-03-22T14:01:41.838 Configured name server address list: 
2024-03-22T14:01:41.838 Configured broadcast port: 22222
...
2024-03-22T14:01:41.839 Server address decoded as 10.6.13.8:11111, transport type is udp
...
2024-03-22T14:01:41.841 Connected to PVA server: 10.6.13.8:11111.
2024-03-22T14:01:41.841 Unregistering search instance: testCounter
testCounter 2024-03-22 14:01:41.333  35 
...

If we specify the EPICS_PVA_NAME_SERVERS variable, the TCP search results in a channel connection:

daq-qss21> EPICS_PVA_NAME_SERVERS=daq-dss02:11111 pvget testCounter
testCounter 2024-03-22 14:14:16.483  790 
daq-qss21> EPICS_PVA_NAME_SERVERS=daq-dss02:11111 pvget -d testCounter
2024-03-22T14:14:23.016 Configured PVA address list: 
2024-03-22T14:14:23.016 Configured server port: 5075
2024-03-22T14:14:23.016 Configured name server address list: daq-dss02:11111
2024-03-22T14:14:23.016 Configured broadcast port: 5076
2024-03-22T14:14:23.018 Creating datagram socket from: 0.0.0.0:53776.
...
2024-03-22T14:14:23.019 Getting name server transport for address 10.6.13.8:11111
...
2024-03-22T14:14:23.021 Searching for channel: testCounter
2024-03-22T14:14:23.021 Server address decoded as 10.6.13.8:11111, transport type is tcp
...
2024-03-22T14:14:23.021 Connecting to PVA server: 10.6.13.8:11111.
...
testCounter 2024-03-22 14:14:22.485  796 
...

…ible (e.g., if the connection has to go through firewalls); this can be done in those cases where server address is simply known, using combination of EPICS_PVA_ADDR_LIST and EPICS_PVA_SERVER_PORT variables

…T environment variable on both server and client side
@sveseli changed the title from "Draft: Direct PVA Channel Connections" to "Direct PVA Channel Connections" on Jan 25, 2024
@kasemir (Collaborator) left a comment

I can't comment on the PVA C++ specifics, I'm simply not familiar with it.
But as for configuring direct TCP connections to avoid the UDP searches:
PVXS and the new Java implementation use EPICS_PVA_NAME_SERVERS to set a list of TCP addresses. The "name server" here is a bit misleading: you could of course list an actual name server IP, but to the client it just means that it sends a search request via TCP. The message handling code on the server is really the same, no matter whether a message is received via UDP or TCP.
In CA, I think it was similar: EPICS_CA_NAME_SERVERS lists TCP addresses for searches.

So with EPICS_PVA_NAME_SERVERS="IP1 IP2 IP3" the client will send searches to those 3 IP addresses. A name server might reply with a search response that lists the actual IOC, but if the address is already an IOC, that IOC can return a search response with zero for the IP address to indicate: Continue on this TCP connection for the data exchange.
In more detail, EPICS_PVA_NAME_SERVERS may contain "IP:port" to list a TCP address with port. See https://github.com/epics-base/pvAccessCPP/wiki/Protocol-Messages#cmd_search-0x03 for details.
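
To make the zero-address convention concrete, here is a minimal C++ sketch (hypothetical types, not the actual pvAccessCPP API) of how a client might pick the data-server address when a search response arrives over a name-server TCP connection:

    #include <string>

    // Hypothetical types for illustration only; the real pvAccessCPP
    // structures differ.
    struct SearchResponse {
        std::string serverAddress;  // address field from the response payload
        unsigned short serverPort;
    };

    // Pick the address for the data connection when the search response
    // arrived over an existing name-server TCP connection.
    std::string dataServerAddress(const SearchResponse& resp,
                                  const std::string& tcpPeerAddress)
    {
        // An all-zero address means: the responder is itself the data
        // server, so continue on the TCP connection the search used.
        if (resp.serverAddress == "0.0.0.0")
            return tcpPeerAddress;
        // Otherwise the responder acted as a name server and the response
        // points at the actual IOC.
        return resp.serverAddress;
    }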

@kasemir (Collaborator) commented Jan 25, 2024

So with PVXS and the new java lib, this is how you would configure TCP-only searches:

EPICS_PVA_NAME_SERVERS="IP1:PORT1 IP2:PORT2"
EPICS_PVA_AUTO_ADDR_LIST=NO
EPICS_PVA_ADDR_LIST=""

If you only set EPICS_PVA_NAME_SERVERS, the client will contact those TCP addresses and still send UDP searches.

@anjohnson (Member)

@kasemir is there an existing PVA name server?

@sveseli is looking to write one based on pvaPy if there isn't one already out there, but we'd prefer to use something that already exists.

@mdavidsaver (Member)

I feel bound to point out that PVXS already has a more complete "direct connection" (aka. bypassing the search phase entirely). One usage is in pvxlist to connect to the un-searchable server PV. Unlike pvlist in pvAccessCPP, pvxlist is implemented using public API.

Also, fair warning. As I recall pvAccessCPP does not correctly handle reconnection of "direct" channels. So imo. it is not suitable for use with subscriptions specifically and may cause problems with long running clients generally.

@sveseli (Contributor, Author) commented Jan 30, 2024

From what I have seen, before this PR a direct connection to channels was not possible at all, as the code relied entirely on UDP searches; if those did not result in channel discovery, no connection would be made. In my tests, this PR handles monitor re-connections correctly, as long as the server comes up on the same machine and same port, regardless of whether UDP search is available or not.

@mdavidsaver (Member) left a comment

Please add some test coverage.

@kasemir (Collaborator) commented Jan 30, 2024

is there an existing PVA name server?

The Java PVA lib contains a command line demo, https://github.com/ControlSystemStudio/phoebus/blob/master/core/pva/src/test/java/org/epics/pva/server/SearchMonitorDemo.java

It takes a list of PVs and their IP:port on the command line and then replies to searches for those PV names with the corresponding IP:port. That's of course only usable as a demo for one or two PVs, but it allows testing EPICS_PVA_NAME_SERVERS, so clients go to the name server via TCP, get the IP:port of the IOC, and then connect via TCP to the IOC.

The plan was somewhat like this:

  • Extend the channel finder's IOC tool (reccaster?) to provide the TCP port. I think that's been completed.
  • Channel finder will thus not only know the PV names but also the TCP IP and port.
  • Next, build a name server that starts like that demo, but instead of taking info for one or two PVs from the command line it queries the channel finder.

@anjohnson (Member)

Thanks Kay. Given that the PVA protocol has the ability for a name-server to query the servers for their complete list of PV names, I don't think an oracle for PV names would be essential, although it might be good for performance reasons.

There are two modes that I could see it using, maybe both at once: when the name-server sees a search request for a PV name that it doesn't recognize, it would request that name through its client-side API, and would handle any response by asking the server that answered for all of its PVs. Alternatively (or additionally) it could monitor for beacons from new servers coming up, and proactively ask them for their names. The first mode would be needed for it to work with dynamic servers which could add new PVs at runtime; if it sees a response from a known server, it would refresh the list of names for that server.

I'm not quite sure how to handle dynamic servers that can drop PV names, somehow the name-server needs to know that's happened. I suspect the pvlist request can't be issued as a monitor, although that would be really helpful.

It's important for the name-server to cache requested PV names that it can't resolve, so that it doesn't forward every random request (the CA name-server has such a cache). Supporting a configurable list of regexps that match names to be forwarded and/or not forwarded might be a reasonable alternative to the Channel Finder to help with that. A sketch of such a cache follows.
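
The caching and filtering idea might look something like this sketch (illustrative only; the class name, its shape, and the TTL are assumptions, not an existing pvAccessCPP or cfNameserver class):

    #include <chrono>
    #include <map>
    #include <regex>
    #include <string>
    #include <vector>

    // Illustrative only: negative cache plus forward-filter of the kind
    // described above.
    class SearchFilter {
        std::map<std::string, std::chrono::steady_clock::time_point> negativeCache;
        std::vector<std::regex> forwardPatterns; // names matching these may be forwarded
        std::chrono::seconds negativeTtl{60};    // assumed TTL for "not found" entries
    public:
        explicit SearchFilter(const std::vector<std::string>& patterns) {
            for (const auto& p : patterns)
                forwardPatterns.emplace_back(p);
        }
        bool shouldForward(const std::string& pvName) {
            auto now = std::chrono::steady_clock::now();
            auto it = negativeCache.find(pvName);
            if (it != negativeCache.end() && now - it->second < negativeTtl)
                return false; // recently searched and not found: suppress
            for (const auto& re : forwardPatterns)
                if (std::regex_match(pvName, re))
                    return true;
            return false;
        }
        void recordMiss(const std::string& pvName) {
            negativeCache[pvName] = std::chrono::steady_clock::now();
        }
    };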

@anjohnson (Member)

@mdavidsaver Unfortunately APS would need several FTEs of effort to convert all our DAQ software to the PVXS API, and we also rely on the plugin abilities currently unique to the pvAccess and pvDatabase libraries to be able to handle the high data rates coming from those DAQ systems. The data distributor plugin lets us accept updates from one data stream and fan them out to multiple clients to be processed in parallel. This PR provides the ability to connect through firewalls.

I do agree with the need for tests.

@kasemir (Collaborator) commented Jan 30, 2024

Andrew, you're correct: a name server might either rely on some type of name database (Channel Finder, Oracle, ...), or it might build that database itself. It could issue "pvlist" requests, or operate similarly to the gateway by sending its own search request and then memorizing the reply.

@shroffk those are options we could consider if we ever get back to working on a name server.

@shroffk commented Jan 30, 2024

We could consider a combination of those actions

I have a very basic name server which uses ChannelFinder populated with recsync and the PVA port.
https://github.com/ChannelFinder/cfNameserver

I had imagined that the fallback mechanism for the name server, if it fails to find the name resolution in CF, would be to do its own name-resolution search.

@shroffk closed this Apr 15, 2024
@shroffk reopened this Apr 15, 2024
@shroffk commented Apr 15, 2024

Sorry about the accidental close


@anjohnson (Member)

Core Group: MAD will review this and provide feedback before the next meeting.


@mdavidsaver (Member)

With @sveseli's recent pruning, I think I see what is going on wrt. name server handling. A couple of questions to check my understanding.

Why are connections to name servers treated differently to other connections?

Is each client context limited to connecting to one name server at a time?

wrt. releaseNameServerTransport(). This looks like manual ref. counting. Can it be avoided?

wrt. the design around getNameServerSearchTransport(). Can you avoid looping for blocking connect() from the search timer callback?

eg. doing so stalls searching while name server(s) are offline.

$ time ./bin/linux-x86_64/pvget foo
Timeout
foo 
real    0m5.029s
$ time EPICS_PVA_NAME_SERVERS=10.127.127.2 ./bin/linux-x86_64/pvget foo
Timeout
foo 
real    0m9.227s
$ time EPICS_PVA_NAME_SERVERS="10.127.127.2 10.127.127.3" ./bin/linux-x86_64/pvget foo
Timeout
foo 
real    0m12.285s

Compare with:

$ time EPICS_PVA_NAME_SERVERS="10.127.127.2 10.127.127.3" ./bin/linux-x86_64/pvxget foo
2024-05-19T14:17:47.878231057 ERR pvxs.tcp.io connection to Server 10.127.127.3:5075 closed with socket error 113 : No route to host
2024-05-19T14:17:47.878889508 ERR pvxs.tcp.io connection to Server 10.127.127.2:5075 closed with socket error 113 : No route to host
Timeout with 1 outstanding

real    0m5.031s

… use timers; if multiple name servers are configured, connections to all of them are obtained at the same time

@sveseli (Contributor, Author) commented May 29, 2024

The blocking call for name server connections has been removed. Name server connections now use a separate set of timers, so that they can be established at the same time. In the examples below I used non-existent hosts:

$ time pvget foo
Timeout
foo 
real	0m5.021s
user	0m0.011s
sys	0m0.015s

$ time EPICS_PVA_NAME_SERVERS="192.168.0.112:11111" pvget foo
Timeout
foo 
real	0m5.128s
user	0m0.010s
sys	0m0.023s

$ time EPICS_PVA_NAME_SERVERS="192.168.0.112:11111 192.168.0.113:22222" pvget foo
Timeout
foo 
real	0m5.113s
user	0m0.000s
sys	0m0.021s

The client can now establish connections to multiple name servers at the same time, and they are released as soon as they are no longer needed or useful (hence the methods that release those connections). Name server connections reuse existing classes (e.g. TransportRegistry, BlockingTCPConnector, etc.) as much as possible.


@sveseli (Contributor, Author) commented Jun 10, 2024

@mdavidsaver Is there anything else you would like to see done or modified before this PR can be merged?

@sveseli (Contributor, Author) commented Jul 12, 2024

@mdavidsaver @anjohnson Is this PR okay to be merged, or would you like to see some other changes? I would like to make sure that this goes into the next release, if that would be possible.

@mdavidsaver (Member)

... Is this PR okay to be merged ...

No. There are several aspects of this proposed design which I do not like, e.g. blocking connect() not on a dedicated thread. Pointing these out piecemeal does not seem to be moving us towards a resolution, but I have not found the time to write out an alternate design.

@sveseli (Contributor, Author) commented Jul 12, 2024

... Is this PR okay to be merged ...

No. There are several aspects of this proposed design which I do not like, e.g. blocking connect() not on a dedicated thread. Pointing these out piecemeal does not seem to be moving us towards a resolution, but I have not found the time to write out an alternate design.

The most recent commits (two months ago) addressed the issue of blocking connect you brought up. The connections are done via a separate set of timers. If there are other issues that need to be resolved, I would be happy to do this.

@mdavidsaver (Member)

The most recent commits (two months ago) addressed the issue of blocking connect you brought up. ...

Then what is going on with this wait(0.1)?

    // Wait to get transport if we do not have it
    m_nsTransportEvent.wait(MAX_NS_TRANSPORT_WAIT_TIME);

As for the rest. I have a limited amount of time which I can spend on EPICS community work, and many projects which ask for my attention. I would have an easier time reviewing your proposed changes if you would say more about what you are trying to achieve, and how you intend to achieve it. Correctly inferring intent from code is both difficult and error prone.

... released as soon as they are no longer needed ...

eg. Do you intend that an idle client will not maintain name server connections? Why not maintain NS connections continually? Are you making a trade-off of resource use vs. (re)connect latency?

eg. Why are connections to name servers treated differently from other PVA servers? Do you see this as a "need", a convenience, ...?

@sveseli (Contributor, Author) commented Jul 12, 2024

The most recent commits (two months ago) addressed the issue of blocking connect you brought up. ...

Then what is going on with this wait(0.1)?

    // Wait to get transport if we do not have it
    m_nsTransportEvent.wait(MAX_NS_TRANSPORT_WAIT_TIME);

If you look closely, the earlier code had an actual connect() in this thread. Now a name server connect timer is scheduled that does the actual connect and notifies the corresponding event once the connection is established. I used a short wait on this event just to see if the connection can be established quickly, which greatly improves search performance when the name server responds, and does not hurt much when it does not, since we have no channel connection anyway. I could certainly make sure that this wait happens only when the name server connection timer starts, rather than on every name server search attempt, but I suspect this would have very little effect in terms of channel timeouts, etc. (see my examples of what happens when trying to connect to channels using non-existent machines).
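
The pattern can be illustrated with standard C++ primitives; this is a simplified stand-in for the actual pvAccessCPP timer and event classes, with the 0.1 s value assumed to match the wait(0.1) quoted above:

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    std::mutex nsMutex;
    std::condition_variable nsTransportReady;
    bool haveNsTransport = false;

    // Runs from the (separate) connect timer: performs the blocking
    // connect() and signals that the transport is available.
    void nsConnectTimerCallback() {
        // ... blocking connect() to the name server happens here ...
        {
            std::lock_guard<std::mutex> lk(nsMutex);
            haveNsTransport = true;
        }
        nsTransportReady.notify_all();
    }

    // Called from the search path: wait only briefly for the transport,
    // so an unresponsive name server cannot stall searching.
    bool waitForNsTransport() {
        const auto MAX_NS_TRANSPORT_WAIT_TIME = std::chrono::milliseconds(100);
        std::unique_lock<std::mutex> lk(nsMutex);
        return nsTransportReady.wait_for(lk, MAX_NS_TRANSPORT_WAIT_TIME,
                                         [] { return haveNsTransport; });
    }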

As for the rest. I have a limited amount of time which I can spend on EPICS community work, and many projects which ask for my attention. I would have an easier time reviewing your proposed changes if you would say more about what you are trying to achieve, and how you intend to achieve it. Correctly inferring intent from code is both difficult and error prone.

... released as soon as they are no longer needed ...

eg. Do you intend that an idle client will not maintain name server connections? Why not maintain NS connections continually? Are you making a trade-off of resource use vs. (re)connect latency?

This code is intended to work for the general case, where the name server connection is different from PVA server connections. Rather than have every client maintain name server connections (and thus keep those TCP sockets alive), I chose to close those connections as soon as I did not need them, in order to conserve resources. There is obviously a trade-off here between resources and re-connection latency. One can perhaps argue one way or another as to which is more important, but given that a name server can have hundreds or even thousands of client connections at any given time, I decided to prioritize resources rather than worry about reconnecting a bit faster every once in a while.

eg. Why are connections to name servers treated differently from other PVA servers? Do you see this as a "need", a convenience, ...?

This stems partly from the fact that, in general, connections to name servers are different from connections to PVA servers, in the sense that we use them only in the discovery phase and also (for the reasons mentioned above) I chose to close them as soon as possible. Another reason is to minimize the amount of new code and the impact of the required changes on the existing code, making sure that no matter what happens with the name server connections, they do not affect client behavior or existing PVA server connections.

I do appreciate the feedback/suggestions and I am willing to make any changes that are feasible and do not require a major rewrite of the existing code.


@anjohnson (Member)

Meeting:
MD: All servers must support "_server" for RPC to get the name list.
SV: This PR now just adds support for PVA searches over TCP, it doesn't implement a name-server. That will come in a separate PR (that was removed from this PR earlier).
Agreed: MD will see implementation of a future name server before SV distributes it.
MD: QOS_REPLY_REQUIRED doesn't need to be set; a searching application cancelling the search is the only thing that can cause the number of searches to go to 0. This should reduce traffic. Be careful to avoid stalling the search process if the name-server dies.
SV: Client only sends 3 searches to a connected name-server before giving up and trying the next name-server in its list.
ANJ: "The nameserver" — we want to be able to query multiple name-servers at the same time, to reduce search latency.
SV: Want to save resources by not keeping sockets open when we don't have any names we haven't found.
MD: Having multiple open sockets is likely to need less resources than multiple threads.
ANJ: Okay with closing NS sockets when all searches done, but wait for some time period after the "last name found" for the client to ask for more channels.
MD: Define test cases to prove that this feature is doing what is intended. Measure that UDP searches and TCP searches don't take significantly different periods of time to complete. How long does it take to search for 10,000 PVs over both transports? Use the client API to measure those, so not a micro-benchmark.

AIs on SV:

  • Client searches should be sent to multiple NSs in parallel.
  • Wait for a short time after searching before closing the TCP socket to the NS, in case the client is inefficient and does (search; wait; get; wait) multiple times.
  • Measure how long connections, and later reconnections, of 10,000 channels take through both transports. Not a unit-test; report the results in a comment here (may prompt further discussion), can use pvaPy. Specifically: connect to 5 PVs, wait for all to connect and for the NS to disconnect, then connect to 10,000 PVs and measure how long it takes (see the timing sketch below). Check separately against both 1 and 2 name-servers, with some channels in both. Test for (everything UDP NS; everything TCP NS; split between 2 UDP NSs; split between 2 TCP NSs; split between 1 UDP and 1 TCP NS).
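
A rough skeleton of the requested measurement might look like this; connect_channels() and wait_until_all_connected() are hypothetical stubs standing in for whichever client API (e.g. pvaPy) is actually used, and only the timing structure is the point:

    #include <chrono>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Hypothetical stubs; replace with real client-API calls when
    // running the actual measurement.
    void connect_channels(const std::vector<std::string>&) { /* issue connects */ }
    void wait_until_all_connected(const std::vector<std::string>&) { /* block */ }

    // Time how long a batch of channels takes to connect.
    double timeConnect(const std::vector<std::string>& names) {
        auto start = std::chrono::steady_clock::now();
        connect_channels(names);
        wait_until_all_connected(names);
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        return elapsed.count();
    }

    int main() {
        std::vector<std::string> warmup, bulk;
        for (int i = 0; i < 5; ++i)
            warmup.push_back("warmup" + std::to_string(i));
        for (int i = 0; i < 10000; ++i)
            bulk.push_back("test" + std::to_string(i));

        // Connect 5 PVs first; wait for the NS connection to drop before
        // the bulk measurement, per the action item above.
        std::printf("warm-up (5 PVs): %.3f s\n", timeConnect(warmup));
        std::printf("bulk (10,000 PVs): %.3f s\n", timeConnect(bulk));
        return 0;
    }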
