Jerky moves - proposed improvements of the 500Hz control loop for position control #153
Update: here are some videos that show the behaviour we observe:
I have also tried to disable oversampling (so the interpolation was calculated every 8ms) and the results were not nice. The move got totally out of control (it was supposed to finish just above the table and it shot up extremely fast, ending with a protective stop): https://youtu.be/58iITB4IzNk . So I understand that oversampling is needed; now I would love to understand what's going on.
Some background info on the various motion primitives in URScript: UR10 Performance Analysis (tech report by @ThomasTimm and his group).
First off, I don't have access to a UR with the newest firmware, so I can't test this issue - if anybody feels like sponsoring one and thus helping me maintain the driver, please reach out to me. In general, to answer your questions:
2 and 3) The reason for doing the interpolation on the host instead of on the robot (in which case you could, as you argue, just as well use movej, as that probably does the same interpolation) is to be able to handle an abort action or a new trajectory action gracefully. It is also a matter of reusing code for maintenance. The current servoj loop is also used for the ros_control-based position controller and, as mentioned above, is the only way to stop the robot neatly on an abort_action with firmware v1.8 (and maybe some of the 3.x versions). Thus the current code would still have to be included and maintained. Why so many users experience problems with jerky movement on 3.4, I don't know (as I said, I don't have access to a newer robot). My recommendation would be to analyse the network traffic both on the ROS computer and on the controller side (it's an ubuntu machine; just log in with root/easybot and install wireshark) and verify that new poses are sent from the ROS computer and arrive at the controller at a steady 500 Hz.
@ThomasTimm wrote:
just to clarify: which latency are you referring to here? (potential) network latency, controller/motion-execution latency, or a combination?
It is the delay from when the position is sent from the computer until the robot is at that position. In http://ieeexplore.ieee.org/abstract/document/7424304/ I call this the actuation delay.
Thanks a lot for the explanation @ThomasTimm and the analysis doc link @gavanderhoorn. I had already found the performance report before and it's been very useful for understanding some of the limitations. I also understand that there have been a number of changes to the UR firmware since it was written (including some changes to the API - the servoj_gain and lookahead parameters in servoj, for example), so I read it with some reservation that it might be out of date. I indeed initially misunderstood how the oversampling works, but since then I have realised that it works exactly as you explained (I like the "inverse oversampling" name :D ). Thanks for that explanation! It reassures me that I understand how it works and can reason about potential solutions better. That's exactly what I was hoping for when I opened the ticket. I also did experiments with different oversampling rates, and indeed around the 4ms mark the moves become smoother (and above 4ms it gets really bad) - so your explanation matches the observations perfectly. It's great to know the reasoning behind those design decisions - the need to handle stopping and updating the goal mid-flight is really important! Many thanks for that! Regarding our setup: we are controlling the robot from quite powerful laptops (8-core Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz), with lots of RAM and fast SSDs. I also used a direct cable connection and a static IP configuration to the UR10 PC during my tests. Also - unlike for others - the low-latency kernel unfortunately did not help at all. The important part - we run all of the ROS code (including the driver) in a Docker container, which might be another layer of problems, although we use --network=host, which in essence means that we are using the network at native speed/latency without any containerisation penalties.
I know you exclude running the driver in a virtual machine, but Docker containerisation is quite a different beast. And we have friends reporting the very same problems without Docker, so it is unlikely that Docker introduces the problem (but I am going to check it soon). I have a list of things to test next: running it without Docker for sure, and running the driver on a separate machine (actually we even thought about running it on the UR PC rather than on a separate machine to get it even closer to the "metal"). Checking the stability of the TCP/IP packet frequency is a good idea and I am going to take a close look at that. Another check will indeed be to see if we can use other commands (including changing the goal and stopping in the middle). I am also planning to plot actual/target positions and see how they follow each other (with a slight modification of the driver, as seen in one of the older issues). It might take some time before we can conclude all the tests (I will continue updating this issue with my findings). However, I am starting to think that the problem is simply too much reliance on TCP/IP low latency in this case. The main problem I see is that there is no synchronisation of time between the host and the UR PC. Calculating positions on the host is done at a "well known" time from the start of the move, but the actual servoj command for a particular position might execute with a pretty much arbitrary delay after it was calculated (up to 20ms, it seems, from your performance analysis). Inverse oversampling of course helps, but if you imagine, for example, that there are just 4 positions queued somewhere on the way every now and then - it's easy to understand where the jerky moves could originate from in this architecture. I think that even some smart UDP communication might give better results (sending position + timestamp and only saving the value if timestamp > current).
Unfortunately, the way TCP/IP works for small packets/messages, even if you have one packet to retransmit and you have already sent several following packets, the whole undelivered set of packets will be buffered, and there are no guarantees on how fast the lost packet is retransmitted - so it's really easy to get a number of subsequent messages delayed by a single collision on the wire at the Ethernet level. Using UDP you can avoid this queuing problem easily. But we have no UDP in URScript :(. When you look in detail at how TCP/IP works, you have to realise how many layers you have to pass through on both the host and the UR PC and the physical link (from the high-level protocol, to low-level drivers, and even potential collisions/retransmissions in the Ethernet layer). Other user-space/kernel/hardware interruptions might get in the way at any time, and there is no guarantee of low latency for single packets/messages, nor good control over it. There might be a number of components/layers (both in the host and the UR PC/firmware) that influence it. We have no way to control the options when we open the reverse connection in the URScript, and we do not know what kind of TCP/IP options are used. There is even a note in the official URScript docs: "Note: The used network setup influences the performance of client/server communication. For instance, TCP/IP communication is buffered by the underlying network interfaces." We do not even know what kind of delays might be introduced on the UR PC side of the TCP/IP protocol, even if we use a low-latency kernel on the host. We don't have Gigabit Ethernet on the UR (it's 100Mb!). They keep adding new features that might break old behaviours and assumptions. I can, for example, imagine that the RTDE interface introduced in 3.3 has much higher priority than other connections (especially ones custom-opened from URScript). And we have to remember that the very same TCP/IP stack and wire is at the same time used to continuously send back joint states/robot information at 125Hz.
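The UDP idea above can be sketched as a "latest-wins" receiver (a hypothetical illustration, not driver code): each datagram carries a position plus a timestamp, and the receiver keeps a sample only if it is newer than what it already has, so delayed or reordered packets can never pile up into a backlog.

```python
def make_latest_wins_receiver():
    """Keep only the newest timestamped position; stale datagrams are dropped."""
    state = {"ts": -1.0, "pos": None}

    def on_packet(ts, pos):
        # UDP may drop, duplicate or reorder packets; accept only newer samples
        if ts > state["ts"]:
            state["ts"] = ts
            state["pos"] = pos

    return state, on_packet

state, on_packet = make_latest_wins_receiver()
on_packet(0.004, [0.10] * 6)
on_packet(0.002, [0.05] * 6)  # late, out-of-order packet: simply ignored
```

Unlike TCP, nothing blocks behind a lost packet here; a lost sample is just superseded by the next one.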
I can easily imagine occasional network hiccups. The random hiccups make it rather unsuitable for the "semi-production" setting we plan to use the UR for. We need a bit more predictability. I am happy to hear that there are no fundamental limitations of URScript that made you choose this architecture over running it all in URScript. And I understand why you chose that way when you did. I really think, however, that in our case the stability and predictability of moves might benefit from interpolating the position (or even using movej, if we can make it stop nicely) in URScript. I can also very easily imagine how we could provide the protection you talk about. We could simply send an occasional (much less frequent) heartbeat from the host and check periodically in the URScript control loop whether it was sent in one of the last few iterations - the leaky-bucket algorithm, one of my favourites, would fit nicely here. That would greatly reduce the frequency of interpolation calculation (always and only when needed - just after the previous servoj command finished). If we have a good source of time in the URScript, it would be pretty much always accurate at every loop with virtually no delays; also, the TCP/IP communication overhead would be much, much lighter. I'd imagine using a new URScript for it and keeping the old one for ros_control - it would be fairly straightforward to have two different implementations of TrajectoryFollower.cpp (building on top of, and following similar patterns as, the nice refactor by @Zagitta). It is around 200 lines of code in total, so I think it's rather doable and maintainable to keep both of them. Do you think it might make sense overall to try it out? Any other thoughts on its feasibility?
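The heartbeat check described above could look something like this leaky-bucket sketch (a pure illustration; the class name and constants are made up, and a real version would live inside the URScript control loop):

```python
class HeartbeatWatchdog:
    """Leaky bucket: drains one token per control iteration and refills on
    every heartbeat from the host; an empty bucket means the host went silent
    and the robot should be stopped safely."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.level = capacity

    def on_heartbeat(self):
        self.level = self.capacity   # host is alive: refill the bucket

    def tick(self):
        # call once per control-loop iteration (e.g. every 8 ms, or every
        # 10th iteration with a larger capacity)
        if self.level > 0:
            self.level -= 1
        return self.level > 0        # False -> connection considered lost

wd = HeartbeatWatchdog(capacity=3)
wd.tick()
wd.on_heartbeat()                    # heartbeat arrived in time: refilled
```

The capacity sets how many missed iterations are tolerated before the robot stops.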
I slept over it and, as often happens, I have an idea how I could simplify it, dropping the assumption about having a good source of time in URSim and not having to move the interpolation to URSim. I think we can change the code just slightly and get much better behaviour, much less TCP/IP overhead, and less jerkiness. The rough idea:
I am not 100% sure how threading behaves in the URSim, but from what I understand the control thread and servoj command will get high (near-realtime) priority and each loop is almost guaranteed to execute in exactly INTERVAL_TIME (checking the heartbeat will be a little more complex than the pseudo-code, but it will be a simple +/- leaky-bucket calculation on heartbeats - I can fine-tune it to be checked every 10th iteration or so). There might be occasional delays, but they should not be frequent. Even if there is an interrupt of any sort, or another thread takes over for a while between servoj commands, it will simply mean slowing down that particular segment of the move a bit, but it will never lead to dangerous over-shooting - we will never tell the robot to move to positions too far from the last position - it will always be in pretty much the expected position from the previous servoj command, and we will always incrementally do one INTERVAL_TIME segment at a time. In case of any delays, the end result will be that the trajectory execution might take a little longer than originally planned and the robot will occasionally slow down, but it will always go through all the interpolated positions (which we want) and it will never overshoot (this design guarantees there are no sudden catch-ups no matter what delays we have - those are what we want to avoid). I think this gives it very good characteristics for what we want to achieve. What do you think @ThomasTimm? Any watchouts/concerns with this design that you can see?
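The "never catch up" property can be illustrated with a small Python sketch of the intended loop (`servoj` here is just a stand-in callback, and the linear interpolation is a simplification): the loop advances the interpolation time by exactly one nominal step per iteration and deliberately ignores wall-clock time, so a delayed iteration stretches the motion instead of producing a jump.

```python
def interpolate(waypoints, t):
    """Linear interpolation over (time, position) pairs sorted by time."""
    for (t0, p0), (t1, p1) in zip(waypoints, waypoints[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return p0 + a * (p1 - p0)
    return waypoints[-1][1]

def follow(waypoints, servoj, interval=0.008):
    end = waypoints[-1][0]
    steps = int(round(end / interval))
    for i in range(steps + 1):
        # advance by exactly one INTERVAL_TIME per iteration: even if this
        # iteration is delayed in real time, the next target stays adjacent
        # to the previous one, so a sudden catch-up is impossible by design
        servoj(interpolate(waypoints, i * interval))

sent = []
follow([(0.0, 0.0), (0.08, 1.0)], sent.append)
```

Every commanded position differs from the previous one by at most one interpolation step, which is exactly the guarantee argued for above.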
@potiuk if you're planning to embed the trajectory directly in the uploaded urscript, I believe that's one of the approaches Thomas has already tried, and if I remember correctly it became infeasible rather fast as the number of positions increased, due to some startup and parsing penalty.
Thanks @Zagitta - that's the kind of comment that I love and that makes my life much easier - it will help me avoid some dead-ends when I implement it :). BTW, I really like the refactor. It's super easy to read and modify. One of the other pull requests that I will make shortly will add the capability of controlling the RG2 Gripper (which integrates nicely with UR robots) via the ur_modern_driver. I based it on https://github.com/sharathrjtr/ur10_rg2_ros (they used an old version of the original ur_modern_driver as a base) - I added a depth-compensation option that we need. I found that modifying your refactored code will be so much nicer and easier to maintain, so we are going to port it following the patterns you introduced.
There is no doubt that your computer is powerful enough to run the driver (as stated previously, it can run just fine on a RPi). But apparently it lacks the power to do all the other stuff you want, seeing that the jerkiness gets worse when you start doing all the other things. I have no experience with Docker, so I can't say how that does or doesn't influence network performance. You are right that the core problem is relying too much on TCP/IP low latency; that is a fundamental drawback of this approach. Control should always be done as close to the hardware as possible, and not via the network. Unfortunately, a lot of users won't install ROS (or anything else) on their brand-new robot for fear of bricking it; we learned that with the C-API approach. So from a ROS-Industrial point of view, we have to have a solution that doesn't require installation on the robot. If you, in your controlled environment, have the option of installing ROS on the controller and supplying that to your customers, I would certainly recommend that. What you must remember is that the "official" ROS driver, targeted at a wide and varied audience, has to be very general and able to handle all likely and unlikely use cases if possible. If, for your application, using movej would be sufficient, you might want to consider just sending the urscript via the topic implemented for doing so. That way you would have a workable and stable solution running by the end of today. But where's the fun in that? ;P As Zagitta says, sending the entire trajectory pre-interpolated is just not feasible for the general use case. You (or at least I) just know that at some time, somewhere, somebody will try to send a trajectory several minutes in length. Each minute's worth of trajectory is 125 * 60 * 6 = 45000 joint values in clear text, plus overhead like parentheses and commas.
That would take quite some time to transfer and for the robot to process (I urge you to try it out: make a small python script that generates the urscript for a 5-minute-long trajectory and uploads it to the controller), making the user think the robot doesn't respond. I guess the stability could be increased by implementing some sort of intelligent buffer in the urscript code and then transmitting, say, 40ms of trajectory every 2ms (or maybe 8ms?), continuously updating the buffer. We have previously discussed the idea of including control of the RG2 gripper (or any other gripper for that matter), but as it doesn't have anything to do with the robot, except having a matching plug, such functionality should be kept in a separate package.
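The "intelligent buffer" idea could be sketched like this (hypothetical host-side bookkeeping, with the numbers mirroring the 40 ms / 8 ms example above): the host keeps the controller's queue topped up to a fixed depth, while the control loop consumes one setpoint per cycle.

```python
from collections import deque

class SetpointBuffer:
    """Keep roughly one 'horizon' worth of setpoints queued on the controller;
    each transmission tops the buffer up instead of streaming one point per
    control cycle."""

    def __init__(self, horizon_s=0.040, step_s=0.008):
        self.target_depth = round(horizon_s / step_s)  # setpoints to keep queued
        self.queue = deque()

    def points_needed(self):
        # how many setpoints the next transmission should contain
        return max(0, self.target_depth - len(self.queue))

    def top_up(self, points):
        self.queue.extend(points)

    def pop(self):
        # consumed by the control loop, one setpoint per cycle
        return self.queue.popleft() if self.queue else None
```

A short network hiccup then only shrinks the buffer for a moment instead of immediately starving the control loop.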
First: 👍 for (re)starting the discussion about driver design decisions @potiuk. Much appreciated. @ThomasTimm: thanks for the insights.
This definitely gets a 👍 from me: let's maintain separation of concerns as much as possible.
It's a nice idea to get closer to the controller, but all 'external' interfaces that exist use TCP/IP. Even if a process runs on the controller itself, it's limited to TCP/IP. AFAIK, everything communicates with the
You could argue that if the choice is made to essentially upload the entire trajectory as a URScript that we could let the robot interpolate, which would mean back to However, we would be giving up control over the execution of the trajectory in a way which would probably make it deviate in ways that would be undesirable.
Another reason why I didn't use the
Thanks for the comments @gavanderhoorn @ThomasTimm -> I think I am getting a full picture now. I understand the problem with long moves and sending full trajectories. You as package maintainers must take into account all the different scenarios.
Yeah, I see it being a problem. Our case is a bit simpler. We split the moves into a few (4) separate shorter moves, and no single move should take more than 6 seconds. We are planning a number of optimisations, like planning the next move in parallel with executing the previously planned one (in order to avoid the robot pausing between moves). This means that we are talking about 125 * 6 * 6 = 4.5K positions - much more feasible to send in one go. Of course, I will first work on a solution that is good for us, but I would love to be able to contribute it back via a pull request eventually, so that others with similar cases could benefit. I think what I could do is have the "pre-planned" trajectory execution implemented as an experimental feature, enabled with a flag AND only enabled for short moves (limited by planned execution time). If the planned execution time is longer than the limit, I would fall back to the original method. As a next step (but that would be a follow-up) I could work on splitting longer moves into sub-segments of up to, for example, 6 seconds each. If we can make segments of around 6 seconds, I don't think we will be back to the low-latency problem. It would actually be much easier to optimise - because we could send the second segment during execution of the first one and swap between the current/next buffers stored in the URScript. I think we could then entirely avoid the low-latency problem this way. Sending bigger blocks of data over TCP/IP, rather than small messages, is exactly what TCP/IP was designed for, and I am 100% sure we can receive the whole next segment during the time the previous segment is being executed. It is quite a bit more complex to implement, but the implementation can be staged as I described - first simple short moves only, and then segmentation once we have it proven and tested (and maybe merged :D). I will make some tests on how much data we can transmit, etc.
I perfectly understand your point. And now that I think about it, it indeed makes little sense to make it part of the driver. Closing and opening the gripper is essentially executing a custom RG2 URScript command, so instead of adding RG2_Gripper I would rather re-add the
Yeah, where would the fun be :P, indeed. I don't think the changes I am proposing will take long to implement (especially the first part, for short moves only). I will definitely try simpler
I think the primary reason the
Thanks @Zagitta - I will add the urscript pull request shortly. Update: I made tests with and without Docker (with the TCP-optimised driver) and there is no noticeable difference, as I suspected. Some random moves still slow down in the middle regardless of whether Docker is used or not. There seems to be a slight difference (improvement) with the realtime kernel and no Docker, but it's hard to quantify, and even with that the moves are quite far from the smoothness we would expect. I will post further findings here.
@potiuk wrote:
I didn't expect Docker to make any difference, but perhaps you're being bitten by moveit/moveit#416 (check also the OMPL issue on BitBucket). |
Well, that is interesting, Gijs.
@ThomasTimm wrote:
or use an Indigo version of MoveIt and see whether that improves things. The change that seems to impact this the most is only 5 lines or so, but an Indigo MoveIt is probably easiest. @potiuk: there are Docker images for MoveIt available which should make this not too hard. See Using Docker Containers with MoveIt!. |
Interesting threads indeed, @gavanderhoorn and @ThomasTimm. Thanks! I will check it before attempting the rewrite. Indeed, trying Indigo and our MoveIt configuration might be simplest. I was also planning to take a look at the plots and make sure those trajectories are OK. I will do that before I attempt the almost-no-TCP/IP rewrite.
Short update on the planned improvement work (after I check whether the MoveIt trajectory is good). I have got quite good hands-on experience with the URScript/robot interface and I understand the limitations of URScript better. It seems that due to the URScript API limitations (send/receive methods) it will indeed be difficult (maybe impossible) to send even the 4.5K data points to the URScript. @ThomasTimm - now I understand what you meant :D. I will be in contact with UR engineers shortly and will raise some questions about it; maybe there are some ways around it. I have quite some experience doing similar work in the telecommunications world (I used to program telecom switches that had even more limitations on messages sent/data stored internally). This means that I will have to fall back to the earlier idea of sending the coarse trajectory from MoveIt point-by-point to the URScript, instead of interpolating the whole trajectory upfront. I will keep three trajectory points in the URScript - previous target, current target and next target. This way I will avoid lags in communication: I will send the subsequent point while the urscript is busy iterating through the interpolations of the previous step. The interpolation will then be done in the URScript. This should make it possible to have even very long trajectories working smoothly and without lags. We have some 40 points on average for 4-second moves generated by MoveIt, so it seems that we can get down to around 10Hz communication frequency - this should be more than enough to get rid of all the communication lags. I have become quite familiar with URScript now, and I am able to do efficient 2-way communication urscript <-> ROS node (I have a working ROS actionlib interface to control the RG2 gripper using this approach already), so I don't think it will be too complex. It should then allow for much less frequent communication and better control while the trajectory is executed (no need for a separate heartbeat).
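The three-point scheme described above could look like this sketch (illustrative only; `request_next` is a made-up callback standing in for the reverse-connection message that asks the driver for one more point):

```python
class WaypointWindow:
    """Holds previous, current and next coarse trajectory points, so the next
    point is fetched while the current segment is still being interpolated."""

    def __init__(self, request_next):
        self.request_next = request_next   # asks the host for one more point
        self.prev = None
        self.cur = None
        self.nxt = None

    def on_point_received(self, point):
        if self.cur is None:
            self.cur = point               # first point of the trajectory
        else:
            self.nxt = point               # prefetched point for the next segment

    def advance_segment(self):
        # current segment finished: shift the window and prefetch one more
        # point, so communication overlaps with interpolation of this segment
        self.prev, self.cur, self.nxt = self.cur, self.nxt, None
        self.request_next()
```

Because the next point is requested at the start of each segment, the controller never has to wait for the network at a segment boundary (as long as the host answers within one segment's duration).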
@potiuk wrote:
just a quick comment: I would really recommend you check the output of MoveIt before starting to change the driver. You'll need to do it anyway and it shouldn't take that much time.
I certainly plan to do it (and in this sequence). This week I am busy with other parts (like achieving a milestone we planned - for this, the refactored driver + TCP/IP options + my RG2 Gripper node should be quite enough). The speed improvements will start to be important after we achieve this milestone - then I will start plotting and analysing the problematic trajectories and making sure that the problems are not caused by MoveIt planning. This will be the basis for our decision on where to focus our effort first.
There are some fundamental limitations with the response time of position control. You can crank the control frequency up as high as you please, but I think you will still see a faster response with velocity control at a lower control frequency. See http://proceedings.asmedigitalcollection.asme.org/proceeding.aspx?articleid=2482045 I know it sounds crazy, but that's what a Bode plot shows (at least for the compliant robot I was working on, but I would guess for other cases as well).
“Although they are mathematically equivalent, velocity based and position based Not to say, there couldn't be another issue causing this... |
Other benefits of velocity control, i.e. sending speed commands to the joints: more robust to control-signal delay and probably more energy efficient.
Further corroborated on pg. 12 of this pdf that @gavanderhoorn linked earlier. Notice that speedj has no downsides, while the other command types all have one issue or another.
Ok. Good ideas. Short update on where I am with the issue:
Clicked close by mistake. :). Continuing ...
What Jacob suggested instead (and that was actually a great idea) is to use the - relatively new - RTDE interface for communication with the driver. It looks like the RTDE client has the possibility of setting/updating up to 64 ints, 64 bools and 24 floats that can be read by the urscript (and the urscript can send similar data back). This interface has much higher priority than a TCP connection opened by the urscript and is pretty much guaranteed to be real-time. Moreover, the urscript can read/write those much faster (they are accessible via fast registers rather than having to be parsed via receive_ commands). That sounds very promising, and I might also be able to try this one in my experiments (especially since in the future we will need much finer-grained control of the robot and a near-realtime feedback loop).
Cool. Well, the new jog_arm package is a pretty good example of sending speedj commands to a UR robot, and it seems buttery smooth at 100Hz to my eye. It won't be up to the task if you're doing high-precision machining or something like that, though. It's mostly intended for human teleoperation.
Yep. Thanks for the link! That's the doubt we have with velocity control - that in the end we will have to use position control anyway (at least at the very end of the move) to bring the arm to the desired precise position. And switching speedj -> servoj commands mid-flight is not going to be smooth at all. We discussed this with Jacob today as well, and he confirmed position is probably the best way to go for us. A big part of our discussion was about calibration and how we can bring the individual calibration of each robot into the URDF/kinematic model that we use outside of the robot (MoveIt). Precision (both accuracy and repeatability) is a must for us.
I got some interesting charts to share. I actually managed to record both a pretty good, smooth move with the real robot and one that had exactly the dangerous catch-up that is our problem. The "catch-up" was so abrupt that it ended up with the robot in protective mode. It was even so abrupt that vibrations caused our light camera stand to shake a bit (you can see it in the video that I recorded). It occurred when the ur_driver laptop was under quite some stress (reading and visualising in RVIZ the point cloud from an RGBD camera + recording the video itself). But regardless of that, I think it shows quite clearly the mechanism of what's going on. You need to zoom in on the images quite a lot to see it better (they are big), but they should be quite self-explanatory. I also re-wrote (well, mostly copy-pasted) the interpolation code to python, and the interpolation is also visualised there (green) - this way I am sure that the interpolation works as expected (and it makes it easy to port to URScript). The problem looks indeed suspiciously like catching up when some of the trajectory points are queued somewhere on the way and delivered almost instantly to the URScript. Corresponding video here: https://youtu.be/qc_NM32XLvI (see the "abrupt catch-up" around 00:13s) Corresponding video: https://youtu.be/-FqekvoSe7M - it still has some slow-downs, but we are less concerned about those at this stage (the catch-up is what bothers us most). They indeed look like they are induced by the MoveIt plan rather than by the driver/communication. There is one suspicious thing about those charts - it looks like all the joint positions/velocities are planned in almost perfect "sync" - different ranges, but for different joints the shapes are pretty much the same (though you can see small differences).
I will look a bit more into whether it's a bug in the visualisation code, but we found it plausible that in an empty space with no collisions the MoveIt planner will do exactly this and move the joints in sync using very similar-looking paths. I saw similar patterns when I was looking at rqt_graph charts, so I am quite certain it is OK. We will analyse it further, but maybe @gavanderhoorn @ThomasTimm you can take a look and have some additional observations?
I have some good results with the reimplemented logic. Just in time to leave for Xmas :). This is the first time those moves were controlled by my modified control loop on the robot itself (URScript). We only send the MoveIt trajectory to the UR controller (at a few Hz) and then the URScript does all the interpolation internally (at 125 Hz). Here is the video: https://youtu.be/PPeJRgBuPSs It seems to work very smoothly (though the MoveIt-generated trajectories might still be improved). I still need to run some full load tests to see how load on the laptop impacts it. I checked that it even works over WiFi (!) - it behaves as I intended. Right now, when there is a network delay over WiFi, it will stop pretty much immediately rather than "catch up". This will be improved in the near future: I will make it slow down instead - and continue when the messages arrive. That might require a bit more fine-tuning, but it should be quite possible. What's more - it even uses the python client that I use for testing, rather than the modified C++ modern driver, so I am pretty sure I am on a good track. Overall - it looks really promising that we can solve the jerky behaviour (so that in the worst case the robot slows down rather than dangerously catching up). I will continue with it after Xmas, and hopefully you will see another pull request coming to the refactored version of the modern driver.
BTW, Merry Xmas and Happy New Year to everyone reading this ;)
Some further progress: I now have a rock-solid implementation of it working and tested. What I have so far:
Pull request follows shortly.
I just created the promised pull request (to the refactored version) - with a very comprehensive description: results of my testing, videos, charts, and so on - so I won't repeat it here. Simply head over to Zagitta#9 I'd love it if people who had similar problems check it out, build it and see if it solves the problem for them as well. Note that you need to enable it via "use_safe_trajectory_follower" and without "use_ros_control". You can check which driver is used by looking at the logs in Polyscope.
@gavanderhoorn. One more comment. I had a chance to test planning with Indigo. Quick tests showed us that the robot seems to generate somewhat better trajectories for some moves (without full quantitative measurements). But for some other moves the trajectories are not very smooth, and we got an example of it very quickly. Example here:
@potiuk To introduce myself, I am a researcher at DFKI, Bremen, using Universal Robots for manipulation research. We use the RoCK framework (https://www.rock-robotics.org) for controlling our robots. For our recently purchased UR5 (FW 3.4), we have adapted the ur_modern_driver for controlling the robot via an Orocos component. We have been facing the same jerky-motion problem, though we did not face it with the earlier UR10 with FW 3.2; hence I suspect this is the same problem. We are using an interpolator at 4ms to send trajectories to the UR driver, and then we use the speedj command to send the command to the robot. I am attaching the command and feedback joint positions of the first 5 joints. There was no motion in the wrist3 joint. I saw that you implemented changes, including using a ROS trajectory follower. However, since we are not using ROS, can you please help me understand the changes you have made so that I can reimplement them in RoCK?
@rohitmenon86 -> In essence, what I've done is move the interpolation loop to URScript and lower the ur driver <-> urscript communication frequency by two orders of magnitude. Rather than sending a TCP/IP message every 4ms, I am sending only the coarse trajectory points (as calculated by MoveIt) - which is at most a few per second rather than 500/second at a 4ms interval. I implemented a communication scheme with separate threads for sending and receiving data. The sending thread notifies the connected driver when it should send the next trajectory point, and the receiving thread receives those points when the driver sends them. The communication happens over a reverse TCP connection opened back to the driver by the URScript. The main "work" is done in a third thread - the controlling thread - which is then not interrupted by sending/receiving; it exchanges data with the sending/receiving threads via global variables and performs the interpolation (by default every 8ms / 125Hz), calculating positions based on the positions and velocities of the coarse trajectory points calculated by MoveIt. An important part of the solution is that I do not use the "real time" from the start of the move. I assume that every interpolation step takes exactly 0.008s and that this is the time that passed - so the interpolation calculation is independent of any interruptions/delays. This might make the move last longer, but without the "jerky" catch-ups. One more comment - you can simplify the threading model by using the RTDE data exchange (using the built-in registers). We did not want to change ur_modern_driver to use RTDE, but we are working on a simplified RTDE driver (to be open-sourced) for our internal purposes.
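Interpolating between coarse points using both their positions and velocities, as described above, is essentially cubic Hermite interpolation. A single-joint sketch (illustrative; the actual urscript implementation may differ in its details):

```python
def hermite(p0, v0, p1, v1, T, t):
    """Cubic Hermite segment: position/velocity (p0, v0) at t=0 and (p1, v1)
    at t=T; the curve matches both endpoint positions and velocities."""
    s = t / T                       # normalised time in [0, 1]
    h00 = 2*s**3 - 3*s**2 + 1       # Hermite basis polynomials
    h10 = s**3 - 2*s**2 + s
    h01 = -2*s**3 + 3*s**2
    h11 = s**3 - s**2
    return h00*p0 + h10*T*v0 + h01*p1 + h11*T*v1
```

One such segment can be evaluated every 8 ms between each pair of coarse MoveIt points; because velocities match at the joins, the stitched curve stays smooth across segment boundaries.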
@potiuk Thank you for the quick reply. One more question: we have our own whole-body control with collision-avoidance controllers, which uses a velocity interpolator. Hence we prefer to send finely grained commands at 4/8 ms intervals rather than coarse trajectory points. Will the refactored driver with its three-thread scheme perform better for streaming speedj commands? Have you done any such experiments? (Asking to get an idea before I develop an Orocos component for this.) |
I do not know the details of your solution, but the 'jerky move' problem was not caused by using any specific command; it was caused by the unreliable TCP stack combined with the fact that the original design required this connection to be reliable at a 500 Hz communication frequency. I only looked very closely at the position-based interface built into ur_modern_driver - I have not checked the other types of control. So if you are sending something to the robot over the TCP stack very frequently (125 Hz or more) and rely on good timing and low latency, you might have similar problems. But a number of factors could influence it, such as the detailed design of the communication between the driver and the URScript. Do you have any design doc describing the communication interface? I think you'd have to assess yourselves (following the line of thought explained above) whether your problems could have the same causes. |
@rohitmenon86 wrote:
there is almost no streaming any more in the variant implemented by @potiuk. Or: the time resolution (and thus spatial resolution, via velocity) of the stream is very low, as sparse trajectories are interpolated on the controller itself. I don't have any numbers, but I would expect that if you (@rohitmenon86) are trying to close a position control loop over a velocity control interface from outside the controller, sending sparse trajectories and having the controller interpolate them is not what you'd want. |
With the merge of #120 (which included the work by @potiuk) both approaches (ie: interpolation on the ROS side as well as the low-bandwidth trajectory follower) have been integrated in the driver. Thanks for the analysis, suggested changes and the contribution @potiuk 👍 And thanks to all the commenters on this thread for providing input. |
Here is an outline of a proposal, and a request for comments, on the - apparently common - "jerky" moves problem that we have when trying to control a UR robot with position control via ur_modern_driver.
Like many others, when we try to use direct joint position control in ur_modern_driver with recent UR firmware (3.4.5), we have a problem with "jerky" moves of the arm. These seem to be related to the way the 500 Hz (4 × 125 Hz) control loop is implemented in the driver, and while there are some ways this can be optimized (TCP/IP socket options, a low-latency kernel on the host), the problem still persists, and whenever the host PC gets busy doing other things (for example reading camera streams) it is magnified. Also, using a low-latency kernel on the host might not always be feasible or development-friendly. We are using the refactored version of the driver with the TCP_QUICKACK changes pulled in (Zagitta#4), and I asked about the problem on ROS Answers - https://answers.ros.org/question/276854/jerky-movements-of-ur10-robot-with-ur_modern_driver-and-moveit/ - and got some comments pointing to the refactored branch and TCP problems. Despite applying those solutions, we still experience some jerkiness (much less than before).
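For reference, the socket options mentioned above can be applied as in this Python sketch (the option names are standard; `TCP_NODELAY` disables Nagle's algorithm so small setpoint packets go out immediately, and the Linux-specific `TCP_QUICKACK` disables delayed ACKs - note the kernel may reset the latter, so a real driver has to re-apply it after every receive):

```python
import socket

def tune_realtime_socket(sock):
    """Tune a TCP socket for small, latency-sensitive messages."""
    # Disable Nagle's algorithm: send small packets immediately instead of
    # coalescing them while waiting for ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # TCP_QUICKACK is Linux-only; the kernel may clear it again, so real
    # drivers re-apply it after every recv().
    if hasattr(socket, "TCP_QUICKACK"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tune_realtime_socket(sock)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```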
So I took a close look at the underlying code, and I think the current design for position tracking is a bit flawed and can probably be improved with relatively small effort. The company I work for (still in stealth mode - http://nomagic.ai) would love to invest some of our engineering time to improve it, test it with our UR10 and contribute the solution back, but before we spend more time on it we would like to verify some of our hypotheses with the people involved, so that we know the ideas we have are plausible. We do not have the full context of the original implementation and maybe we misinterpret what's going on, so maybe you can help. @ThomasTimm @Zagitta @gavanderhoorn - I'd really appreciate it if you could comment on this. I just need help understanding the limitations/constraints - we will take it from there and do the implementation/testing ourselves.
Here are a few hypotheses/ideas that we have - please let me know if I got them wrong/right :).
1) 4 x Oversampling
Currently the ur_modern_driver interpolation loop performs cubic interpolation based on trajectory positions and velocities (essentially a cubic Bezier/Hermite curve calculation) at 0.008/4 = 0.002 s intervals from the start of the move, and sends the results over a reverse TCP/IP connection (port 50001) opened by the URScript program. There is no feedback information, so ur_modern_driver does not account for the actual robot joint positions; it assumes the robot is following the trajectory closely. The updates are then applied with the servoj() command by the URScript. That's the main reason for the fast "catching up" when some of those updates get delayed.
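The interpolation described here is a cubic Hermite segment: given positions and velocities at two consecutive trajectory points, fit q(t) = a + b·t + c·t² + d·t³ so that position and velocity match at both ends. A minimal single-joint sketch (an illustration of the math, not the driver's actual code):

```python
def hermite_coeffs(q0, v0, q1, v1, T):
    """Coefficients of the cubic q(t) = a + b*t + c*t**2 + d*t**3 on [0, T]
    satisfying q(0)=q0, q'(0)=v0, q(T)=q1, q'(T)=v1."""
    a = q0
    b = v0
    c = (3 * (q1 - q0) - (2 * v0 + v1) * T) / T**2
    d = (2 * (q0 - q1) + (v0 + v1) * T) / T**3
    return a, b, c, d

def eval_cubic(coeffs, t):
    a, b, c, d = coeffs
    return a + b * t + c * t * t + d * t * t * t

# One joint moving from 0.0 to 1.0 rad in 0.5 s, starting and ending at rest.
coeffs = hermite_coeffs(0.0, 0.0, 1.0, 0.0, 0.5)
```

The driver evaluates such a segment every 0.002 s per joint and ships each evaluated point over TCP; the point of this issue is that the evaluation itself is cheap enough to run on the controller instead.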
Question: do we really need oversampling here, and why was it needed in the first place? The URScript documentation says we should control the robot at 125 Hz (i.e. tell the robot what to do every 0.008 s). We seem to be trying to do it every 0.002 s, and the documentation from UR does not say what happens in that case. In fact, looking at the docs, it appears the servoj command blocks for those 0.008 s and takes the latest value written to the cmd_servo_q variable. If I understand it correctly, the host part of the driver calculates positions as fast as possible, and the URScript uses the latest value set every 0.008 s (or as often as it wakes up). Do I understand correctly? What could go wrong if we decrease the oversampling? I am going to try it out, but I would like to understand the rationale behind the hardcoded "4" oversampling ratio.
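If that reading is right, the effect of the 4x oversampling can be shown with a toy sketch (an assumption about servoj's latch-the-latest-value semantics, not confirmed behavior): the host writes a setpoint every 2 ms into a shared variable, the 125 Hz loop only reads the newest one each wakeup, and three out of every four setpoints are silently discarded.

```python
def consumed_setpoints(written, oversampling=4):
    """Model a 125 Hz loop reading a shared slot that a 500 Hz producer
    overwrites: each wakeup latches only the newest of the last
    `oversampling` values; the earlier ones are never used."""
    return written[oversampling - 1::oversampling]

writes = list(range(20))           # 20 setpoints written at 2 ms intervals
used = consumed_setpoints(writes)  # values the 125 Hz loop actually applies
```

Here `used` comes out to every fourth value, i.e. 75% of the computed (and transmitted) setpoints never influence the robot under this model.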
2) Why do we use TCP/IP communication and host interpolation?
I looked at the code of both parts - the host part and the URScript part - and it struck me that the host part does very little work (just a very tight loop of interpolation calculations based on the MoveIt-provided trajectory), while the communication overhead between the two is huge (500 Hz). If I understand it correctly (see point 1), most of the positions calculated are discarded and overwritten by subsequent ones. The question then is: why do we use the tight 500 Hz host <> UR loop at all? It seems perfectly reasonable (and fairly straightforward) to implement the whole loop fully in URScript rather than relying on the host and TCP/IP. That should completely eliminate the packet-delay problem.
There are a few reasons I can think of why it might be difficult/impossible:
If none of the above is blocking, I'd love to implement an improved URScript with the complete control loop running inside a thread in URScript. Do you think there is something on the list (or outside of it) that blocks us from doing it this way?
3) Different commands (than servoj) to control robot
There are a number of commands other than servoj in URScript that we could use to control the robot, notably movej/movel/movep/movec (with blend). The motion cannot be adjusted between trajectory points, of course, and there are potentially non-smooth joint movements at the trajectory points, but maybe if we use them, the overall impression will be smooth. Unlike the interpolation we use now, they do not take velocity into account, but maybe it will be "good enough".
Can you think of other drawbacks of using move* series of commands instead of servoj? Do you think move* commands are feasible?
Looking forward to your comments! It would be great if we could contribute to improve the driver.