<!DOCTYPE html>
<html>
<head>
<title>Ch. 5 - Bin Picking</title>
<meta name="Ch. 5 - Bin Picking" content="text/html; charset=utf-8;" />
<link rel="canonical" href="http://manipulation.csail.mit.edu/clutter.html" />
<script src="https://hypothes.is/embed.js" async></script>
<script type="text/javascript" src="chapters.js"></script>
<script type="text/javascript" src="htmlbook/book.js"></script>
<script src="htmlbook/mathjax-config.js" defer></script>
<script type="text/javascript" id="MathJax-script" defer
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js">
</script>
<script>window.MathJax || document.write('<script type="text/javascript" src="htmlbook/MathJax/es5/tex-chtml.js" defer><\/script>')</script>
<link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
<script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
<script>hljs.initHighlightingOnLoad();</script>
<link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
</head>
<body onload="loadChapter('manipulation');">
<div data-type="titlepage" pdf="no">
<header>
<h1><a href="index.html" style="text-decoration:none;">Robotic Manipulation</a></h1>
<p data-type="subtitle">Perception, Planning, and Control</p>
<p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
<p style="font-size: 14px; text-align: right;">
© Russ Tedrake, 2020-2022<br/>
Last modified <span id="last_modified"></span>.<br/>
<script>
var d = new Date(document.lastModified);
document.getElementById("last_modified").innerHTML = d.getFullYear() + "-" + (d.getMonth()+1) + "-" + d.getDate();</script>
<a href="misc.html">How to cite these notes, use annotations, and give feedback.</a><br/>
</p>
</header>
</div>
<p pdf="no"><b>Note:</b> These are working notes used for <a
href="http://manipulation.csail.mit.edu/Fall2022/">a course being taught
at MIT</a>. They will be updated throughout the Fall 2022 semester. <!-- <a
href="https://www.youtube.com/channel/UChfUOAhz7ynELF-s_1LPpWg">Lecture videos are available on YouTube</a>.--></p>
<table style="width:100%;" pdf="no"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=pose.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=segmentation.html>Next Chapter</a></td>
</tr></table>
<script type="text/javascript">document.write(notebook_header('clutter'))
</script>
<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter style="counter-reset: chapter 4"><h1>Bin Picking</h1>
<todo>Consider moving / teaching this BEFORE geometric pose estimation. It could actually be nice to introduce cameras / point clouds with this material instead of with pose estimation.</todo>
<p>Our initial study of geometric perception gave us some powerful tools, but
also revealed some major limitations. In the next chapter, we will begin
applying techniques from deep learning to perception. Spoiler alert: those
methods are going to be insatiable in their hunger for data. So before we get
there, I'd like to take a brief excursion into a nice subproblem that might
help us feed that need.</p>
<p>In this chapter we'll consider the simplest version of the bin picking
problem: the robot has a bin full of random objects and simply needs to move
those objects from one bin to the other. We'll be agnostic about what those
objects are and about where they end up in the other bin, but we would like
our solution to achieve a reasonable level of performance for a very wide
variety of objects. This turns out to be a pretty convenient way to create a
training ground for robot learning -- we can set the robot up to move objects
back and forth between bins all day and night, and intermittently add and
remove objects from the bin to improve diversity. Of course, it is even
easier in simulation!</p>
<p>Bin picking has potentially very important applications in industries such
as logistics, and there are significantly more refined versions of this
problem. For example, we might need to pick only objects from a specific
class, and/or place the objects in a known position (e.g. for "packing"). But
let's start with the basic case.</p>
<section><h1>Generating random cluttered scenes</h1>
<p>If our goal is to test a diversity of bin picking situations, then the
first task is to figure out how to generate diverse simulations. How
should we populate the bin full of objects? So far we've set up each
simulation by carefully setting the initial positions (in the Context) for
each of the objects, but that approach won't scale.</p>
<p>So far we've used robot description files (e.g., URDF, SDF, MJCF, or
Model Directives) that we parse into the MultibodyPlant; those were sufficient
for setting up the bins, cameras, and the robot, which can all be welded in
fixed positions. But generating distributions over objects and object
initial conditions is more advanced, and we need an algorithm for that.</p>
<subsection><h1>Falling things</h1>
<p>In the real world, we would probably just dump the random objects into
the bin. That's a decent strategy for simulation, too. We can roughly
expect our simulation to faithfully implement multibody physics as long
as our initial conditions (and time step) are reasonable; the physics
isn't well defined if we initialize the Context with multiple objects
occupying the same physical space. The simplest and most common way to
avoid this is to generate a random number of objects in random poses,
with their vertical positions staggered so that they trivially start out
of penetration.</p>
<p>If you look for them, you can find animations of large numbers of
falling objects in the demo reels for most advanced multibody simulators.
(These demos are actually a bit of a gimmick; the simulations may look
realistic to us, but they are often not very physically accurate.) For
our purposes the falling dynamics themselves are not the focus. We just
want the state of the objects once they are done falling as initial
conditions for our manipulation system.</p>
<todo>maybe cite the falling things paper, but make it clear that the idea
is not new?</todo>
<example><h1>Piles of foam bricks in 2D</h1>
<p>Here is the 2D case. I've added many instances of our favorite red
foam brick to the plant. Note that it's possible to write highly
optimized 2D simulators; that's not what I've done here. Rather, I've
added a planar joint connecting each brick to the world, and run our
full 3D simulator. The planar joint has three degrees of freedom. I've
oriented them here to be $x$, $z$, and $\theta$ to constrain the
objects to the $xz$ plane.</p>
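<p>Roughly, the planar-joint trick looks like this (a sketch, assuming a
<code>MultibodyPlant</code> named <code>plant</code> that has not yet been
finalized, and a list <code>bricks</code> of the brick bodies; the variable
names are mine):</p>
<pre><code class="language-python">
import numpy as np
from pydrake.math import RigidTransform, RotationMatrix
from pydrake.multibody.tree import FixedOffsetFrame, PlanarJoint

# Rotating the joint frame 90 degrees about world x makes the joint's two
# translations act along world x and z, and its rotation act about world y.
frame = plant.AddFrame(FixedOffsetFrame(
    "planar_joint_frame", plant.world_frame(),
    RigidTransform(RotationMatrix.MakeXRotation(np.pi / 2))))
for i, brick in enumerate(bricks):
    plant.AddJoint(PlanarJoint(
        f"brick_planar_{i}", frame, brick.body_frame(), damping=[0, 0, 0]))
</code></pre>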
<p>I've set the initial positions for each object in the Context to be
uniformly distributed over the horizontal position, uniformly rotated,
and staggered every 0.1m in their initial vertical position. We only
have to simulate for a little more than a second for the bricks to come
to rest and give us our intended "initial conditions".</p>
<script>document.write(notebook_link('clutter', deepnote, "", "falling_things"))</script>
<figure>
<iframe style="border:0;height:300px;width:540px;" src="data/falling_bricks_2d.html?height=240px" pdf="no"></iframe>
<p pdf="only"><a href="data/falling_bricks_2d.html">Click here for the animation.</a></p>
</figure>
</example>
<p>It's not really any different to do this with any random objects --
here is what it looks like when I run the same code, but replace the
brick with a random draw from a few objects from the YCB dataset
<elib>Calli17</elib>. It somehow amuses me that we can see the <a
href="https://en.wikipedia.org/wiki/Central_limit_theorem">central limit
theorem</a> hard at work, even when our objects are slightly
ridiculous.</p>
<figure>
<img style="width:70%" src="data/ycb_planar_clutter.png"/>
</figure>
<example><h1>Filling bins with clutter</h1>
<p>The same technique also works in 3D. Setting uniformly random
orientations in 3D requires a little more thought, but Drake supplies
the method <code>UniformlyRandomRotationMatrix</code> (and also one for
quaternions and roll-pitch-yaw) to do that work for us.</p>
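<p>A minimal sketch of the scatter procedure (assuming a finalized
<code>plant</code> and its <code>plant_context</code>; the bin bounds and
loop structure are mine):</p>
<pre><code class="language-python">
import numpy as np
from pydrake.common import RandomGenerator
from pydrake.math import RigidTransform, UniformlyRandomRotationMatrix

generator = RandomGenerator(42)  # fixed seed for reproducibility
z = 0.3
for body_index in plant.GetFloatingBaseBodies():
    body = plant.get_body(body_index)
    X_WB = RigidTransform(
        UniformlyRandomRotationMatrix(generator),
        [np.random.uniform(-0.15, 0.15),  # x, within the bin footprint
         np.random.uniform(-0.2, 0.2),    # y
         z])                              # staggered to avoid penetration
    plant.SetFreeBodyPose(plant_context, body, X_WB)
    z += 0.1  # stagger the heights every 0.1 m
</code></pre>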
<script>document.write(notebook_link('clutter'))</script>
<figure>
<img style="width:70%" src="data/ycb_clutter.png"/>
</figure>
</example>
<p>Please appreciate that bins are a particularly simple case for
generating random scenarios. If you wanted to generate random kitchen
environments, for example, then you won't be as happy with a solution
that drops refrigerators, toasters, and stools from uniformly random
i.i.d. poses. In those cases, authoring reasonable distributions gets
much more interesting; we will revisit the topic of generative scene
models <elib>Izatt20</elib> later in the notes.</p>
</subsection>
<subsection><h1>Static equilibrium with frictional contact</h1>
<p>Even in the case of bins, we should try to think critically about
whether dropping objects from a height is really the best solution. Given
our discussion in the last chapter about writing optimizations with
non-penetration constraints, I hope you are already asking yourself: why
not use those constraints again here? Let's explore that idea a bit
further.</p>
<p>I won't dive into a full discussion of multibody dynamics nor
multibody simulation, though I do have more notes <a
href="http://underactuated.mit.edu/multibody.html">available here</a>.
What is important to understand here is that the equations of motion of
our multibody system are described by differential equations of the form:
$$M(q)\dot{v} + C(q,v)v = \tau_g(q) + \sum_i J^T_i(q)f^{c_i}.$$ We
already understand the generalized positions, $q$, and velocities, $v$.
The left side of the equation is just a generalization of "mass times
acceleration", with the mass matrix, $M$, and the Coriolis terms $C$. The
right hand side is the sum of the (generalized) forces, with $\tau_g(q)$
capturing the terms due to gravity, and $f^{c_i}$ is the <a
href="https://drake.mit.edu/doxygen_cxx/group__multibody__spatial__vectors.html">spatial
force</a> due to the $i$th contact. $J_i(q)$ is the $i$th "contact
Jacobian" -- it is the Jacobian that maps from the generalized velocities
to the spatial velocity of the $i$th contact frame.</p>
<p>Our interest here is in finding (stable) steady-state solutions to
these differential equations that can serve as good initial conditions
for our simulation. At steady-state we have $v=\dot{v}=0$, and
conveniently all of the terms on the left-hand side of the equations are
zero. This leaves us with just the force-balance equations $$\tau_g(q) =
- \sum_i J^T_i(q) f^{c_i}.$$ But we still need to understand where the
contact forces come from.</p>
<subsubsection><h1>Collision geometry</h1>
<p>Geometry engines for robotics, like <code>SceneGraph</code> in
<drake></drake>, distinguish between a few different <a
href="https://drake.mit.edu/doxygen_cxx/group__geometry__roles.html">roles</a>
that geometry can play in a simulation. In <a
href="http://sdformat.org/spec">robot description files</a>, we
distinguish between <a
href="http://sdformat.org/spec?ver=1.8&elem=link">visual</a>
and <a
href="http://sdformat.org/spec?ver=1.8&elem=collision">collision</a>
geometries. In particular, every rigid body in the simulation can have
multiple
<i>collision</i>
geometries associated with it (playing the "proximity" role). Collision
geometries are often much simpler than the visual geometries we use for
illustration and simulating perception -- sometimes they are just a
low-polygon-count version of the visual mesh and sometimes we actually
use much simpler geometry (like boxes, spheres, and cylinders). These
simpler geometries make the physics engine faster and more robust.</p>
<p>For subtle reasons we will explore below, in addition to simplifying
the geometry, we sometimes over-parameterize the collision geometry in
order to make the numerics more robust. For example, when we simulate
the red brick, we actually use <i>nine</i> collision geometries for
the one body.</p>
<figure><img style="width:50%"
src="data/foam_brick_contact_geometry.png"/><figcaption>The (exaggerated)
contact geometry used for robust simulation of boxes. We add contact
"points" (epsilon radius spheres) to each corner, and have a slightly
inset box for the remaining contacts. <a
href="data/foam_brick_contact_geometry.html">Here</a> is the interactive
version.</figcaption></figure>
<example><h1>Collision geometry inspector</h1>
<p>Drake's tutorial on "authoring a multibody simulation" has a very
useful <code>model_inspector</code> tool that publishes both the
illustration and collision geometry roles to Meshcat. This is very
useful for designing new models, but also for understanding the contact
geometry of existing models.</p>
<p><a
href="https://deepnote.com/workspace/Drake-0b3b2c53-a7ad-441b-80f8-bf8350752305/project/Tutorials-2b4fc509-aef2-417d-a40d-6071dfed9199/%2Fauthoring_multibody_simulation.ipynb#00005-d93fafb7-f99e-4989-a817-49ab668f44e0"
style="background:none; border:none;" target="${project}"> <img
src="https://deepnote.com/buttons/launch-in-deepnote-white.svg"></a></p>
<p>You might want to copy that snippet of code into your own notebook
to use it with your own model files. (We'll be making this much easier
<a
href="https://github.com/RobotLocomotion/drake/issues/17689">soon</a>).</p>
</example>
<p>SceneGraph also implements the concept of a <a
href="https://drake.mit.edu/doxygen_cxx/classdrake_1_1geometry_1_1_collision_filter_declaration.html">collision
filter</a>. It can be important to specify that, for instance, the iiwa
geometry at link 5 cannot collide with the geometry in links 4 or 6.
Specifying that some collisions should be ignored not only speeds up the
computation, but it also facilitates the use of simplified collision
geometry for links. It would be extremely hard to approximate links 4 and
5 accurately with spheres and cylinders if I had to make sure that those
spheres and cylinders did not overlap in any feasible joint angle. The
default collision filter settings should work fine for most applications,
but you can tweak them if you like.</p>
<p>So where do the contact forces, $f^{c_i}$, come from? There is
potentially an equal and opposite contact force for every <i>pair</i> of
collision geometries that are not filtered out by the collision filter.
In <code>SceneGraph</code>, the <a
href="https://drake.mit.edu/doxygen_cxx/classdrake_1_1geometry_1_1_scene_graph_inspector.html#a79b84ada25718e2a38fd2d13adb4d839"><code>GetCollisionCandidates</code></a>
method returns them all. We'll take to calling the two bodies in a collision pair "body A" and "body B".</p>
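<p>A sketch of querying those candidates (the model path here is just an
example; substitute your own model file):</p>
<pre><code class="language-python">
from pydrake.common import FindResourceOrThrow
from pydrake.multibody.parsing import Parser
from pydrake.multibody.plant import AddMultibodyPlantSceneGraph
from pydrake.systems.framework import DiagramBuilder

builder = DiagramBuilder()
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=0.001)
Parser(plant).AddModelFromFile(FindResourceOrThrow(
    "drake/examples/manipulation_station/models/061_foam_brick.sdf"))
plant.Finalize()

# Every unfiltered pair of collision geometries is a candidate.
inspector = scene_graph.model_inspector()
for id_A, id_B in inspector.GetCollisionCandidates():
    print(inspector.GetName(id_A), "with", inspector.GetName(id_B))
</code></pre>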
</subsubsection>
<subsubsection><h1>The Contact Frame</h1>
<p>We still need to decide the magnitude and direction of these spatial
forces, and to do this we need to specify the
<i>contact frame</i> in which the spatial force is to be applied. For
instance, we <a
href="https://drake.mit.edu/doxygen_cxx/classdrake_1_1multibody_1_1_point_pair_contact_info.html">might
use</a> $C_B$ to denote a contact frame on body $B$, with the forces
applied at the origin of the frame.</p>
<p>Most simulators summarize the contact between two bodies as a
translational force (i.e., zero torque) at one or more contact points. Our
convention will be to align the positive $z$ axis with the "contact
normal", with positive forces resisting penetration. Defining this normal
can be deceptively complicated. For instance, what is the normal at the
corner of a box? Taking the normal as a (<a
href="https://en.wikipedia.org/wiki/Subderivative">sub-</a>)gradient of
the signed-distance function for the collision geometry provides a
reliable definition that will extend to the distance between two bodies
and into generalized coordinates. The $x$ and $y$ axes of the contact
frame are any orthogonal basis for the tangential coordinates. You can
find additional figures and explanations <a
href="https://underactuated.csail.mit.edu/multibody.html#contact">here</a>.</p>
<example><h1>Brick on a half-plane</h1>
<p>Let's work through these details on a simple example -- our foam
brick sitting on the ground. The ground is welded to the world, so has
no degrees of freedom; we can ignore the forces applied to the ground
and focus on the forces applied to the brick.</p>
<p>Where are the contact forces and contact frames? If we used only
box-on-box geometry, then the locations of the contact forces are
ambiguous during this "face-on-face" contact; this is even more true in
3D (where three forces would be sufficient). But by adding extra
contact spheres to the corners of our box, we are telling the physics
engine that we specifically want contact forces at each of the corners
(and all four of the corners in 3D). I've labeled the frames $C_1$ and
$C_2$ in the sketch below.</p>
<figure>
<img src="data/brick_on_half_plane.jpg" width=50%/>
</figure>
<p>In order to achieve static equilibrium, we require that all of the
forces and torques on the brick be in balance. This is a simple
manifestation of the equations $$\tau_g(q) = - \sum_i J^T_i(q)
f^{c_i},$$ because the configuration $q$ is the $x,z, \theta$ position
and orientation of the brick; we have three equations and four
unknowns: \begin{gather*} 0 = \sum_i f^{c_i}_{C_{i,x}} \\ -mg = -\sum_i
f^{c_i}_{C_{i,z}} \\ 0 = \sum_i \left[ \left[ p^{C_i} - p^{B_{cm}}
\right] \times f^{c_i} \right]_{W_y}.\end{gather*} The last equation
represents the torque balance, taken around the center of mass of the
brick, which I've called $p^{B_{cm}}$. Torque about $\theta$ corresponds
to the $y$ component of the cross product. The torque balance ensures
that $f^{c_1}_{C_{1,z}} = f^{c_2}_{C_{2,z}}$ assuming the center of
mass is in the middle of the box; force balance in $z$ sets them equal
to $\frac{mg}{2}$. We haven't said anything about the horizontal
forces yet ($0$ is a reasonable solution here). Let's develop that
next.</p>
</example>
</subsubsection>
<subsubsection><h1>The (Coulomb) Friction Cone</h1>
<p>Now the rules governing contact forces can begin to take shape. First
and foremost, we demand that there is no force at a distance. Using
$\phi_i(q)$ to denote the distance between two bodies in configuration
$q$, we have $$\phi_i(q) > 0 \Rightarrow f^{c_i} = 0.$$ Second, we demand
that the normal force only resists penetration; bodies are never pulled
into contact: $$f^{c_i}_{C_z} \ge 0.$$ In <i>rigid</i> contact models, we
solve for the smallest normal force that enforces the non-penetration
constraint (this is known as the principle of least constraint). In
<i>soft</i> contact models, we define the force to be a function of the
penetration depth and velocity.</p> <todo>Citations</todo>
<todo>Need figures here</todo>
<p>Forces in the tangential directions are due to friction. The most
commonly used model of friction is Coulomb friction, which states that
$$|f^{c_i}_{C_{x,y}}|_2 \le \mu f^{c_i}_{C_z},$$ with $\mu$ a
non-negative scalar <i>coefficient of friction</i>. Typically we define
both a $\mu_{static}$, which is applied when the tangential velocity is
zero, and $\mu_{dynamic}$, applied when the tangential velocity is
non-zero. In the Coulomb friction model, the tangential contact force is
the force within this friction cone which produces maximum dissipation.
</p>
<p>Taken together, the geometry of these constraints forms a <a
href="https://en.wikipedia.org/wiki/Convex_cone">cone</a> of admissable
contact forces. It is famously known as the "friction cone", and we will
refer to it often.</p>
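<p>The cone membership condition is simple enough to state in a few lines
of code (a toy check; the helper function here is mine, not Drake's):</p>
<pre><code class="language-python">
import numpy as np

def in_friction_cone(f_C, mu):
    """f_C is a force expressed in the contact frame C (z = normal)."""
    f_x, f_y, f_z = f_C
    return f_z >= 0 and np.hypot(f_x, f_y) <= mu * f_z

print(in_friction_cone([0.3, 0.0, 1.0], mu=0.5))  # True: inside the cone
print(in_friction_cone([0.6, 0.0, 1.0], mu=0.5))  # False: would slide
</code></pre>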
<p>It's a bit strange for us to write that the forces are in some set.
Surely the world will pick just one force to apply? It can't apply all
forces in a set. The friction cone specifies the range of possible
forces; under the Coulomb friction model we say that the one force that
will be picked is the force inside this set that will successfully resist
relative motion in the contact x-y frame. If no force inside the friction
cone can completely resist the motion, then we have sliding, and we say
that the force that the world will apply is the force inside the friction
cone of <i>maximal dissipation</i>. For the conic friction cone, this
will be pointed in the direction opposite of the sliding velocity. So
even though the world will indeed "pick" one force from the friction cone
to apply, it can still be meaningful to reason about the set of possible
forces that could be applied because those denote the set of possible
opposing forces that friction can perfectly resist. For instance, a brick
under gravity will not move if we can exactly oppose the force of gravity
with a force inside the friction cone.</p>
<example><h1>Brick on an inclined half-plane</h1>
<p>If we take our last example, but tilt the table to an angle relative
to gravity, then the horizontal forces start becoming important.
Before going through the equations, let's check your intuition. Will
the magnitude of the forces on the two corners stay the same in this
configuration? Or will there be more contact force on the lower
corner?</p>
<figure>
<img src="data/brick_incline.jpg" width=50%/>
</figure>
<p>In the illustration above, you'll notice that the contact frames
have rotated so that the $z$ axis is aligned with the contact normals.
I've sketched two possible friction cones (dark green and lighter
green), corresponding to two different coefficients of friction. We
can tell immediately by inspection that the smaller value of $\mu$
(corresponding to the darker green) cannot produce contact forces that
will completely counteract gravity (the friction cone does not contain
the vertical line). In this case, the box will slide and no static
equilibrium exists in this configuration.</p>
<p>If we increase the coefficient of (static) friction to the range
corresponding to the lighter green, then we can find contact forces
that produce an equilibrium. Indeed, for this problem, we need some
amount of friction to even have an equilibrium (we'll explore this in
the <a href="#static_equilibrium">exercises</a>). We also need for the
vertical projection of the center of mass onto the ramp to land between
the two contact points, otherwise the brick will tumble over the bottom
edge. We can see this by writing our same force/torque balance
equations. We can write them in body frame, B, assuming the center of
mass in the center of the brick and the brick has length $l$ and height
$h$:\begin{gather*} f^{c_1}_{B_x} + f^{c_2}_{B_x} = - mg\sin\gamma \\
f^{c_1}_{B_z} + f^{c_2}_{B_z} = mg\cos\gamma \\ -h f^{c_1}_{B_x} + l
f^{c_1}_{B_z} = h f^{c_2}_{B_x} + l f^{c_2}_{B_z} \\ f^{c_1}_{B_z} \ge
0, \quad f^{c_2}_{B_z} \ge 0 \\ |f^{c_1}_{B_x}| \le \mu f^{c_1}_{B_z},
\quad |f^{c_2}_{B_x}| \le \mu f^{c_2}_{B_z} \end{gather*} </p>
<p>So, are the magnitudes of the contact forces the same or different?
Substituting the first equation into the third reveals $$f^{c_2}_{B_z}
= f^{c_1}_{B_z} + \frac{mgh}{l}\sin\gamma.$$</p>
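<p>Since all of the constraints above are linear in the contact forces,
checking whether an equilibrium exists for a given $\mu$ and $\gamma$ is
just a linear feasibility problem. Here is a sketch (the parameter values
are mine):</p>
<pre><code class="language-python">
import numpy as np
from pydrake.solvers import MathematicalProgram, Solve

def equilibrium_exists(mu, gamma, m=1.0, g=9.81, l=0.1, h=0.05):
    prog = MathematicalProgram()
    f1 = prog.NewContinuousVariables(2, "f1")  # [f_Bx, f_Bz] at corner 1
    f2 = prog.NewContinuousVariables(2, "f2")  # [f_Bx, f_Bz] at corner 2
    prog.AddLinearConstraint(f1[0] + f2[0] == -m * g * np.sin(gamma))
    prog.AddLinearConstraint(f1[1] + f2[1] == m * g * np.cos(gamma))
    prog.AddLinearConstraint(
        -h * f1[0] + l * f1[1] == h * f2[0] + l * f2[1])
    for f in (f1, f2):
        prog.AddLinearConstraint(f[1] >= 0)          # normals only push
        prog.AddLinearConstraint(f[0] <= mu * f[1])  # friction cone
        prog.AddLinearConstraint(-f[0] <= mu * f[1])
    return Solve(prog).is_success()

print(equilibrium_exists(mu=0.1, gamma=0.3))  # False: the brick slides
print(equilibrium_exists(mu=1.0, gamma=0.3))  # True
</code></pre>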
</example>
</subsubsection>
<subsubsection><h1>Static equilibrium as an optimization problem</h1>
<p>Rather than dropping objects from a random height, perhaps we can
initialize our simulations using optimization to find the initial
conditions that are already in static equilibrium. In <drake></drake>,
the
<a
href="https://drake.mit.edu/doxygen_cxx/classdrake_1_1multibody_1_1_static_equilibrium_problem.html"><code>StaticEquilbriumProblem</code></a>
collects all of the constraints we enumerated above into an optimization
problem: \begin{align*} \find_q \quad \subjto \quad& \tau_g(q) = - \sum_i
J^T_i(q) f^{c_i} & \\ & f^{c_i}_{C_z} \ge 0 & \forall i, \\ &
|f^{c_i}_{C_{x,y}}|_2 \le \mu f^{c_i}_{C_z} & \forall i, \\ & \phi_i(q)
\ge 0 & \forall i, \\ & \phi_i(q) = 0 \textbf{ or } f^{c_i}_{C_z} = 0
&\forall i, \\ & \text{joint limits}.\end{align*} This is a nonlinear
optimization problem: it includes the nonconvex non-penetration
constraints we discussed in the last chapter. The second-to-last
constraint (a logical or) is particularly interesting; constraints of
the form $x \ge 0, y \ge 0, x =0 \textbf{ or } y = 0$ are known as
complementarity constraints, and are often written as $x \ge 0, y \ge 0,
xy = 0$. We can make the problem easier for the nonlinear optimization
solver by relaxing the equality to $0 \le \phi_i(q) f^{c_i}_{C_z} \le
\text{tol}$, which provides a proper gradient for the optimizer to follow
at the cost of allowing some force at a distance.</p>
<p>It's easy to add additional costs and constraints; for initial
conditions we might use an objective that keeps $q$ close to an initial
configuration-space sample.</p>
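<p>A minimal sketch of using it (hedged: this assumes a <code>diagram</code>
containing a <code>MultibodyPlant</code> named "plant" with the free objects
already added, and an initial guess <code>q0</code>, e.g. the staggered
random poses from above; check the current API before relying on the
details):</p>
<pre><code class="language-python">
from pydrake.multibody.inverse_kinematics import StaticEquilibriumProblem
from pydrake.solvers import Solve

# The problem is defined over AutoDiffXd so the solver can get gradients.
diagram_ad = diagram.ToAutoDiffXd()
plant_ad = diagram_ad.GetSubsystemByName("plant")
context_ad = diagram_ad.CreateDefaultContext()
plant_context_ad = plant_ad.GetMyContextFromRoot(context_ad)

problem = StaticEquilibriumProblem(plant_ad, plant_context_ad, set())
problem.get_mutable_prog().SetInitialGuess(problem.q_vars(), q0)
result = Solve(problem.prog())
q_equilibrium = result.GetSolution(problem.q_vars())
</code></pre>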
<example><h1>Tall Towers</h1></example>
<p>So how well does it work? </p> <todo>finish this...</todo>
</subsubsection>
</subsection>
</section>
<section><h1>A few of the nuances of simulating contact</h1>
<example><h1>Contact force inspector</h1>
<p>I've created a simple GUI that allows you to pick any two primitive
geometry types and inspect the contact information that is computed when
those objects are put into penetration. There is a lot of information
displayed there! Take a minute to make sure you understand the colors,
and the information provided in the textbox display.
</p>
<script>document.write(notebook_link('clutter', deepnote, '', 'contact_inspector'))</script>
</example>
<p>When I play with the GUI above, I feel almost overwhelmed. First,
overwhelmed by the sheer number of cases that we have to get right in the
code; it's unfortunately extremely common for open-source tools to have
bugs in here. But second, overwhelmed by the sense that we are asking the
wrong question. Asking to summarize the forces between two bodies in deep
penetration with a single Cartesian force applied at a point is fraught
with peril. As you move the objects, you will find many discontinuities;
this is a common reason why you sometimes see rigid body simulations
"explode". It might seem natural to try to use multiple contact points
instead of a single contact point to summarize the forces, and some
simulators do, but with these approaches it is very hard to write an
algorithm that depends only on the current configuration of the geometry
and applies forces consistently from one time step to the next.</p>
<p>I currently feel that the only fundamental way to get around these
numerical issues is to start reasoning about the entire volume of
penetration. The Drake developers have recently proposed a version of
this that we believe we can make computationally tractable enough to be
viable for real-time simulation <elib>Elandt19</elib>, and are working
hard to bring this into full service in Drake. You will find many
references to "hydroelastic" contact throughout the code. We hope to turn
it on by default soon.</p>
<todo>Add the hydroelastic skittles video here. https://youtu.be/5-k2Pc6OmzE</todo>
<p>In the meantime, we can make simulations robust by carefully curating the contact geometry...</p>
</section>
<section><h1>Model-based grasp selection</h1>
<p>What makes a good grasp? This topic has been studied extensively for
decades in robotics, with an original focus on thinking of a (potentially
dexterous) hand interacting with a known object.
<elib>Prattichizzo08</elib> is an excellent survey of that literature; I
will summarize just a few of the key ideas here. To do it, let's start by
extending our spatial algebra to include forces.</p>
<subsection id="spatial_force"><h1>Spatial force</h1>
<p>In our discussion of contact forces above, we started by thinking of a
force as a three-element vector (with components for $x$, $y$, and $z$)
applied to a rigid body at some point. More generally, we will define a
six-component vector for <a
href="https://drake.mit.edu/doxygen_cxx/group__multibody__spatial__vectors.html"><i>spatial
force</i></a>: \begin{equation}F^{B_p}_{{\text{name}},C} =
\begin{bmatrix} \tau^{B_p}_{\text{name},C} \\ f^{B_p}_{\text{name},C}
\end{bmatrix} \quad \text{ or } \quad
\left[F^{B_p}_{\text{name}}\right]_C = \begin{bmatrix}
\left[\tau^{B_p}_{\text{name}}\right]_C \\
\left[f^{B_p}_{\text{name}}\right]_C \end{bmatrix}.\end{equation}
$F^{B_p}_{\text{name},C}$ is the named spatial force applied to a point,
or frame origin, $B_p$, expressed in frame $C$. The form with the
parentheses is preferred in Drake, but is a bit verbose for my taste here
in the notes. The name is optional, and the expressed-in frame, if
unspecified, is the world frame. For forces in particular, it is
recommended that we include the body, $B$, that the force is being
applied to in the symbol for the point $B_p$, especially since we will
often have equal and opposite forces. In code, we write
<code>Fname_Bp_C</code>.</p>
<p>Like spatial velocity, spatial forces have a rotational component and
a translational component; $\tau^{B_p}_C \in \Re^3$ is the <i>torque</i>
(on body $B$ applied at point $p$ expressed in frame $C$), and $f^{B_p}_C
\in \Re^3$ is the translational or Cartesian force. A spatial force is
also commonly referred to as a <a
href="https://en.wikipedia.org/wiki/Screw_theory#Wrench">wrench</a>. If
you find it strange to think of forces as having a rotational component,
then think of it this way: the world might only impart Cartesian forces
at the points of contact, but we want to summarize the combined effect of
many Cartesian forces applied to different points into one common frame.
To do this, we represent the equivalent effect of each Cartesian force at
the point of application as a force + torque applied at a different point
on the body.</p>
<p>Spatial forces fit neatly into our spatial algebra:
<ul>
<li>Spatial forces add when they are applied to the same body in the
same frame, e.g.: \begin{equation}F^{B_p}_{\text{total},C}
= \sum_i F^{B_p}_{i,C} .\end{equation}</li>
<li>Shifting a spatial force from one application point, $B_p$, to
another point, $B_q$, uses the cross product: \begin{equation} f^{B_q}_C
= f^{B_p}_C, \qquad \tau^{B_q}_C = \tau^{B_p}_C + {}^{B_q}p^{B_p}_C
\times f^{B_p}_C.\label{eq:spatial_force_shift}\end{equation}</li>
<li>As with all spatial vectors, rotations can be used to change
between the "expressed-in" frames: \begin{equation} f^{B_p}_D = {}^DR^C
f^{B_p}_C, \qquad \tau^{B_p}_D = {}^DR^C
\tau^{B_p}_C.\end{equation}</li>
</ul>
</p>
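<p>Drake's <code>SpatialForce</code> implements this algebra directly; here
is a quick numeric check of the shift rule (the frame names are mine): a
pure 1 N force in $+z$ applied at $B_p$, shifted to a point $B_q$ offset
0.1 m in $+x$, picks up a torque about $+y$.</p>
<pre><code class="language-python">
import numpy as np
from pydrake.multibody.math import SpatialForce

F_Bp = SpatialForce(tau=np.zeros(3), f=[0.0, 0.0, 1.0])
p_BpBq = [0.1, 0.0, 0.0]     # from Bp to Bq, expressed in the same frame
F_Bq = F_Bp.Shift(p_BpBq)
print(F_Bq.rotational())     # [0.0, 0.1, 0.0]
print(F_Bq.translational())  # [0.0, 0.0, 1.0], the force is unchanged
</code></pre>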
<p>Now we are starting to have the vocabulary to provide one answer to
the question: what makes a good grasp? If the goal of a grasp is to
stabilize an object in the hand, then a good grasp can be one that is
able to resist disturbances described as an "adversarial" wrench applied
to the body.</p>
</subsection>
<subsection><h1>The contact wrench cone</h1>
<p>Above, we introduced the friction cone as the range of possible forces
that friction is able to produce in order to resist motion. For the
purposes of grasp planning, by applying the additive inverse to the
forces in the friction cone, we can obtain all of the "adversarial"
forces that can be resisted at the point of contact. And to understand
the total ability of all contact forces (applied at multiple points of
contact) to resist motion of the body, we want to somehow add all of
these forces together. Fortunately, the spatial algebra for spatial
forces can be readily extended from operating on spatial force vectors to
operating on entire sets of spatial forces.</p>
<p>Because our sets of interest here are convex cones, I will use the
relatively standard choice of $\mathcal{K}$ for the six-dimensional
<i>wrench cone</i>. Specifically, we have
$\mathcal{K}^{B_p}_{\text{name}, C}$ for the cone corresponding to
potential spatial forces for $F^{B_p}_{\text{name}, C}$. For instance,
our Coulomb friction cone for the point contact model (which, as we've
defined it, has no torsional friction) in the contact frame could be:
\begin{equation}\mathcal{K}^C_C = \left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \\
f^{C}_{C_x} \\ f^C_{C_y} \\ f^C_{C_z} \end{bmatrix} : \sqrt{
\left(f^{C}_{C_x}\right)^2 + \left(f^{C}_{C_y}\right)^2 } \le \mu
f^C_{C_z} \right\}.\end{equation}</p>
<p>The spatial algebra for spatial forces can be applied directly to the
wrench cones:
<ul>
<li>For addition of wrenches applied at the same point and expressed
in the same frame, the interpretation we seek is that the cone formed
by the sum of wrench cones describes the set of wrenches that could
be obtained by choosing one element from each of the individual cones
and summing them up. This set operator is the <a
href="https://wiki.algo.is/Minkowski%20sum">Minkowski sum</a>, which
we denote with $\oplus$, e.g.:
\begin{equation}\mathcal{K}^{B_p}_{\text{total},C}=
\mathcal{K}^{B_p}_{0,C} \oplus \mathcal{K}^{B_p}_{1,C} \oplus
\cdots\end{equation}</li>
<li>Shifting a wrench cone from one application frame, $B_p$, to
another frame, $B_q$, is a linear operation on the cones; to
emphasize that I will write Eq $\ref{eq:spatial_force_shift}$ in
matrix form: \begin{equation} \mathcal{K}^{B_q}_C = \begin{bmatrix}
I_{3 \times 3} & \left[ {}^{B_q}p^{B_p}_C \right]_\times \\ 0_{3
\times 3} & I_{3 \times 3} \end{bmatrix}
\mathcal{K}^{B_p}_C,\end{equation} where the notation $[p]_{\times}$
is the skew-symmetric matrix corresponding to the cross product.</li>
<li>Rotations can be used to change between the "expressed-in"
frames: \begin{equation} \mathcal{K}^{B_p}_D = \begin{bmatrix} {}^DR^C
& 0_{3 \times 3} \\ 0_{3 \times 3} & {}^DR^C \end{bmatrix}
\mathcal{K}^{B_p}_C.\end{equation}</li>
</ul>
</p>
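<p>Because shifting is linear, shifting an entire cone reduces to shifting
its generators. A small numpy sketch of the shift matrix above (the sampled
rays and the offset are mine):</p>
<pre><code class="language-python">
import numpy as np

def skew(p):
    """The matrix [p]_x such that skew(p) @ v == np.cross(p, v)."""
    return np.array([[0, -p[2], p[1]],
                     [p[2], 0, -p[0]],
                     [-p[1], p[0], 0]])

def wrench_shift_matrix(p_BqBp):
    """The 6 x 6 map that shifts a wrench [tau; f] from point Bp to Bq."""
    return np.block([[np.eye(3), skew(p_BqBp)],
                     [np.zeros((3, 3)), np.eye(3)]])

# A few sampled rays from a friction cone at Bp (columns are wrenches):
rays_Bp = np.array([[0, 0, 0, 0],       # tau_x
                    [0, 0, 0, 0],       # tau_y
                    [0, 0, 0, 0],       # tau_z
                    [0.5, -0.5, 0, 0],  # f_x
                    [0, 0, 0.5, -0.5],  # f_y
                    [1, 1, 1, 1]])      # f_z, with mu = 0.5
rays_Bq = wrench_shift_matrix([0.0, 0.0, -0.1]) @ rays_Bp
</code></pre>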
<example><h1>A contact wrench cone visualization</h1>
<p>I've made a simple interactive visualization for you to play with to
help your intuition about these wrench cones. I have a box that is fixed
in space and only subjected to contact forces (no gravity is
illustrated here); I've drawn the friction cone at the point of contact
and at the center of the body.</p>
<p>There is one major caveat: the wrench cone lives in $\Re^6$, but I
can only draw cones in $\Re^3$. So I've made the (standard) choice to
draw the projection of the 6d cone into 3d space with two cones: one
for the translational components (in green for the contact point and
again in red for the body frame) and another for the rotational
components (in blue for the body frame). This can be slightly
misleading because one cannot actually select independently from both
sets.</p>
<script>document.write(notebook_link('clutter', deepnote, '', 'contact_wrench'))</script>
<p>Here is the contact wrench cone for a single contact point, visualized for two different contact locations:</p>
<figure>
<img src="figures/contact_wrench_1.png" width="40%">
<img src="figures/contact_wrench_2.png" width="40%">
</figure>
<p>I hope that you immediately notice that the rotational component of
the wrench cone is low dimensional -- due to the cross product, all
vectors in that space must be orthogonal to the vector ${}^Bp^C$. Of
course it's way better to run the notebook yourself and get the 3d
interactive visualization.</p>
<p>Here is the contact wrench cone for two contact points that are
directly opposite from each other:</p>
<figure>
<img src="figures/contact_wrench_3.png" width="50%">
</figure>
<p>Notice that while each of the rotational cones is low dimensional,
they span different spaces. Together (as understood by the Minkowski
sum of these two sets) they can resist all pure torques applied at the
body frame. This intuition is largely correct, but this is also where
the projection of the 6d cone onto two 3d cones becomes a bit
misleading: there are some wrenches that cannot be resisted by this
grasp. Specifically, if I were to visualize the wrench cone at the
point directly between the two contact points, we would see that the
wrench cones <i>do not include</i> torques applied directly along the
axis between the two contacts. The two contact points alone, without
torsional friction, are unable to resist torques about that axis.</p>
<p>In practice, the gripper model that we use in our simulation
workflow has multiple contact points placed along the round surface of
the gripper; even though each of them alone has no torsional friction,
the net wrench from these multiple points of contact provides some
amount of wrench in the axis between the fingers. The exact amount one
gets will be proportional to how hard the gripper is squeezing the
object.</p>
</example>
<p>Now we can compute the cone of possible wrenches that any set of
contacts can apply on a body -- the contact wrench cone -- by putting all
of the individual contact wrench cones into a common frame and summing
them together. A classic metric for grasping would say that a good grasp
is one where the contact wrench cone is large (can resist many
adversarial wrench disturbances). If the contact wrench cone is all of
$\Re^6$, then we say the contacts have achieved <i>force
closure</i> <elib>Prattichizzo08</elib>.</p>
<p>It's worth mentioning that the elegant (linear) spatial algebra of the wrench cones also makes these quantities very suitable for use in optimization (e.g. <elib>Dai16</elib>).</p>
</subsection>
<subsection><h1>Colinear antipodal grasps</h1>
<p>The beauty of this wrench analysis originally inspired a very
model-based analysis of grasping, where one could try to optimize the
contact locations in order to maximize the contact wrench cone. But our
goals for this chapter are to assume very little about the object that we
are grasping, so we'll (for now) avoid optimizing over the surface of an
object model for the best grasp location. Nevertheless, our model-based
grasp analysis gives us a few very good heuristics for grasping even
unknown objects.</p>
<p>In particular, a good heuristic for a two fingered gripper to have a
large contact wrench cone is to find colinear "antipodal" grasp points.
Antipodal here means that the normal vectors of the contact (the $z$ axis
of the contact frames) are pointing in exactly opposite directions. And
"colinear" means that they are on the same line -- the line between the
two contact points. As you can see in the two-fingered contact wrench
visualization above, this is a reasonably strong heuristic for having a
large total contact wrench cone. As we will see next, we can apply this
heuristic even without knowing much of anything about the objects.</p>
</subsection>
</section>
<section><h1>Grasp selection from point clouds</h1>
<p>Rather than looking into the bin of objects, trying to recognize
specific objects and estimate their pose, the newer approach to grasp
selection that has proven incredibly useful for bin picking is to just look
for graspable areas directly on the (unsegmented) point cloud. Some of the
most visible proponents of this approach are
<elib>tenPas17+Mahler17+Zeng18</elib>. If you look at those references,
you'll see that they all use learning to somehow decide where in the point
cloud to grasp. And I <b>do</b> think learning has a lot to offer in terms
of choosing good grasps -- it's a nice and natural learning formulation and
there is significant information besides just what appears in the
immediate sensor returns that can contribute to deciding where to grasp. But
often you will see that underlying those learning approaches is an approach
to selecting grasps based on geometry alone, if only to produce the original
training data. I also think that the community doesn't regularly
acknowledge just how well the purely geometric approaches still work. I'd
like to discuss those first.</p>
<subsection><h1>Point cloud pre-processing</h1>
<p>To get a good view into the bin, we're going to set up multiple RGB-D
cameras. I've used three per bin in all of my examples here. And those
cameras don't only see the objects in the bin; they also see the bins, the
other cameras, and anything else in the viewable workspace. So we have a
little work to do to merge the point clouds from multiple cameras into a
single point cloud that only includes the area of interest.</p>
<p>First, we can <i>crop</i> the point cloud to discard any points that
are from outside the area of interest (which we'll define as an
axis-aligned bounding box immediately above the known location of the
bin).</p>
<p>As we will discuss in some detail below, many of our grasp selection
strategies will benefit from <i>estimating the "normals"</i> of the point
cloud (a unit vector that estimates the normal direction relative to the
surface of the underlying geometry). It is actually better to estimate
the normals on the individual point clouds, making use of the camera
location that generated those points, than to estimate the normal after
the point cloud has been merged.</p>
<p>For sensors mounted on the real world, <i>merging point clouds</i>
requires high-quality camera calibration and must deal with the messy
depth returns. All of the tools from the last chapter are relevant, as
the task of merging the point clouds is another instance of the
point-cloud-registration problem. For the perfect depth measurements we
can get out of simulation, given known camera locations, we can skip this
step and simply concatenate the list of points in the point clouds
together.</p>
<p>Finally, the resulting raw point clouds likely include many more points
than we actually need for our grasp planning. One of the standard
approaches for <i>down-sampling</i> the point clouds is using a <a
href="https://en.wikipedia.org/wiki/Voxel">voxel</a> grid -- regularly
sized cubes tiled in 3D. We then summarize the original point cloud with
exactly one point per voxel (see, for instance <a
href="http://www.open3d.org/docs/latest/tutorial/Advanced/voxelization.html">Open3D's
note on voxelization</a>). Since point clouds typically only occupy a
small percentage of the voxels in 3D space, we use sparse data structures
to store the voxel grid. In noisy point clouds, this voxelization step is
also a useful form of filtering.</p>
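<p>A sketch of this pipeline using Drake's <code>PointCloud</code> API
(assuming <code>clouds</code> is a list of point clouds, one per camera,
already expressed in the world frame, and <code>X_WCs</code> are the
corresponding camera poses; the crop box and tuning values are mine):</p>
<pre><code class="language-python">
from pydrake.perception import Concatenate

processed = []
for cloud, X_WC in zip(clouds, X_WCs):
    # Keep only the points immediately above the bin (example bounds).
    cropped = cloud.Crop([-0.3, -0.3, 0.0], [0.3, 0.3, 0.3])
    cropped.EstimateNormals(radius=0.1, num_closest=30)
    # Use the known camera origin to orient the normals consistently.
    cropped.FlipNormalsTowardPoint(X_WC.translation())
    processed.append(cropped)

merged = Concatenate(processed)
down_sampled = merged.VoxelizedDownSample(voxel_size=0.005)
</code></pre>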
<example><h1>Mustard bottle point clouds</h1>
<figure><img style="width:60%"
src="data/mustard_normals.png"></figure>
<script>document.write(notebook_link('clutter', deepnote, '', 'point_cloud_processing'))</script>
<p>I've produced a scene with three cameras looking at our favorite YCB
mustard bottle. I've taken the individual point clouds (already
converted to the world frame by the
<code>DepthImageToPointCloud</code> system), cropped the point clouds
(to get rid of the geometry from the other cameras), estimated their
normals, merged the point clouds, then down-sampled the point clouds.
The order is important!</p>
<p>I've pushed all of the point clouds to meshcat, but with many of them
set to not be visible by default. Use the drop-down menu to turn them
on and off, and make sure you understand basically what is happening on
each of the steps. For this one, I can also give you the <a
href="data/mustard_bottle_point_clouds.html">meshcat output
directly</a>, if you don't want to run the code.</p>
</example>
</subsection>
<subsection><h1>Estimating normals and local curvature</h1>
<p>The grasp selection strategy that we will develop below will be based on the local geometry (normal direction and curvature) of the scene. Understanding how to estimate those quantities from point clouds is an excellent exercise in point cloud processing, and is representative of other similar point cloud algorithms.</p>
<p>Let's think about the problem of fitting a plane, in a least-squares
sense, to a set of points<elib>Shakarji98</elib>. We can describe a plane
in 3D with a position $p$ and a unit length normal vector, $n$. The
distance between a point $p^i$ and a plane is simply given by the
magnitude of the inner product, $\left| (p^i - p)^T n \right|.$ So our
least-squares optimization becomes $$\min_{p, n} \quad \sum_{i=1}^N
\left| (p^i - p)^T n \right|^2, \quad \subjto \quad |n| = 1. $$ Taking
the gradient of the Lagrangian with respect to $p$ and setting it equal
to zero gives us that $$p^* = \frac{1}{N} \sum_{i=1}^N p^i.$$ Inserting
this back into the objective, we can write the problem as $$\min_n n^T W
n, \quad \subjto \quad |n|=1, \quad \text{where } W = \left[ \sum_i (p^i
- p^*) (p^i - p^*)^T \right].$$ Geometrically, this objective is a
quadratic bowl centered at the origin, with a unit circle constraint. So
the optimal solution is given by the (unit-length) eigenvector
corresponding to the <i>smallest</i> eigenvalue of the data matrix, $W$.
And for any optimal $n$, the "flipped" normal $-n$ is also optimal. We
can pick arbitrarily for now, and then flip the normals in a
post-processing step (to make sure that the normals all point towards the
camera origin).</p>
<p>What is really interesting is that the second and third
eigenvalues/eigenvectors also tell us something about the local geometry.
Because $W$ is symmetric, it has orthogonal eigenvectors, and these
eigenvectors form a (local) basis for the point cloud. The eigenvector
with the smallest eigenvalue points along the normal, and the one with
the <i>largest</i> eigenvalue
corresponds to the direction of least curvature (the squared dot product
with this vector is the largest). This information can be very useful for
finding and planning grasps. <elib>tenPas17</elib> and others before them
use this as a primary heuristic in generating candidate grasps.</p>
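<p>The whole estimator fits in a few lines of numpy (a sketch;
<code>neighbors</code> is an $N \times 3$ array of a query point's nearest
neighbors):</p>
<pre><code class="language-python">
import numpy as np

def plane_and_curvature_frame(neighbors):
    p_star = neighbors.mean(axis=0)
    centered = neighbors - p_star
    W = centered.T @ centered
    # For a symmetric matrix, eigh returns eigenvalues in ascending order.
    _, V = np.linalg.eigh(W)
    normal = V[:, 0]           # smallest eigenvalue: the surface normal
    least_curvature = V[:, 2]  # largest eigenvalue: least curvature
    return p_star, normal, least_curvature
</code></pre>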
<p>In order to approximate the local curvature of a mesh represented by a
point cloud, we can use our fast nearest neighbor queries to find a
handful of local points, and use this plane fitting algorithm on just
those points. When doing normal estimation directly on a depth image,
people often forgo the nearest-neighbor query entirely, simply using the
approximation that neighboring points in pixel coordinates are often
nearest neighbors in the point cloud. We can repeat that entire
procedure for every point in the point cloud.</p>
<p>I remember when working with point clouds started to become a bigger
part of my life, I thought that surely doing anything moderately
computational like this on every point in some dense point cloud would be
incompatible with online perception. But I was wrong! Even years ago,
operations like this one were often used inside real-time perception
loops. (And they pale in comparison to the number of <a
href="https://en.wikipedia.org/wiki/FLOPS">FLOPs</a> we spend these days
evaluating large neural networks).</p>
<example id="normal_estimation"><h1>Normals and local curvature of the mustard bottle.</h1>
<figure><img style="width:60%"
src="data/mustard_surface_estimation.png"></figure>
<script>document.write(notebook_link('clutter', deepnote, '', 'normal_estimation'))</script>
<p>I've coded up the basic least-squares surface estimation algorithm,
with the query point in green, the nearest neighbors in blue, and the
local least squares estimation drawn with our RGB$\Leftrightarrow$XYZ
frame graphic. You should definitely slide it around and see if you can
understand how the axes line up with the normal and local curvature.</p>
</example>
<p>You might wonder where you can read more about algorithms of this type.
I don't have a great reference for you. But Radu Rusu was the main author
of the point cloud library<elib>Rusu11</elib>, and his thesis has a lot of
nice summaries of the point cloud algorithms of
2010<elib>Rusu10</elib>.</p>
</subsection>
<subsection id="grasp_candidates"><h1>Evaluating a candidate grasp</h1>
<p>Now that we have processed our point cloud, we have everything we need
to start planning grasps. I'm going to break that discussion down into
two steps. In this section we'll come up with a cost function that scores
grasp candidates. In the following section, we'll discuss some very
simple ideas for trying to find grasp candidates that have a low
cost.</p>
<p>Following our discussion of "model-based" grasp selection above, once
we pick up an object -- or whatever happens to be between our fingers
when we squeeze -- then we will expect the contact forces between our
fingers to have to resist at least the <i>gravitational wrench</i> (the
spatial force due to gravity) of the object. The closing force provided
by our gripper is in the gripper's $x$-axis, but if we want to be able to
pick up the object without it slipping from our hands, then we need
forces inside the friction cones of our contacts to be able to resist the
gravitational wrench. Since we don't know what that wrench will be (and
are somewhat constrained by the geometry of our fingers), a reasonable
strategy is to look for colinear antipodal points on the surface of the
point cloud which also align with the $x$-axis of the gripper. In a real
point cloud, we are unlikely to find perfect antipodal pairs, but finding
areas with normals pointing in nearly opposite directions is a good
strategy for grasping!</p>
<example><h1>Scoring grasp candidates</h1>
<p>In practice, the contact between our fingers and the object(s) will
be better described by a <i>patch contact</i> than by a point contact
(due to the deflection of the rubber fingertips and any deflection of
the objects being picked). So it makes sense to look for patches of
points with agreeable normals. There are many ways one could write
this; I've done it here by transforming the processed point cloud of
the scene into the candidate frame of the gripper, and cropped away all
of the points except the ones that are inside a bounding box between
the finger tips (I've marked them in red in MeshCat). The first term
in my grasping cost function is just reward for all of the points in
the point cloud, based on how aligned their normal is to the $x$-axis
of the gripper: $$\text{cost} = -\sum_i (n^i_{G_x})^2,$$ where
$n^i_{G_x}$ is the $x$ component of the $i$th point in the cropped
point cloud expressed in the gripper frame.</p>
<figure><img style="width:80%"
src="data/grasp_candidate_evaluation.png"></figure>
<p>There are other considerations for what might make a good grasp, too.
For our kinematically limited robot reaching into a bin, we might favor
grasps that put the hand in a favorable orientation for the arm. In the
grasp metric I've implemented in the code, I added a cost for the hand
deviating from vertical. I can reward the dot product of the world $-z$
vector, $[0, 0, -1]$, with the gripper-frame $y$-axis rotated into the
world frame: $$\text{cost} \mathrel{{+}{=}} -\alpha
\begin{bmatrix} 0 & 0 &-1\end{bmatrix}R^G \begin{bmatrix}0 \\ 1 \\
0\end{bmatrix} = \alpha R^G_{3,2},$$ where $\alpha$ is a relative cost
weight, and $R^G_{3,2}$ is the scalar element of the rotation matrix in
the third row and second column.</p>
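<p>In code, those two terms amount to just a couple of lines (a sketch with
my own variable names; <code>normals_G</code> is a $3 \times N$ array of
normals for the cropped points between the fingertips, expressed in the
gripper frame, and <code>R_WG</code> is the candidate hand orientation as a
pydrake <code>RotationMatrix</code>):</p>
<pre><code class="language-python">
import numpy as np

def grasp_cost(normals_G, R_WG, alpha=20.0):
    cost = -np.sum(normals_G[0, :] ** 2)  # reward x-aligned normals
    cost += alpha * R_WG.matrix()[2, 1]   # penalize non-vertical hands
    return cost
</code></pre>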
<p>Finally, we need to consider collisions between the candidate grasp
and both the bins and with the point cloud. I simply return infinite
cost when the gripper is in collision. I've implemented all of those
terms in the notebook, and given you sliders to move the hand around
and see how the cost changes.</p>
<script>document.write(notebook_link('clutter', deepnote, '', 'grasp_selection'))</script>
</example>
</subsection>
<subsection><h1>Generating grasp candidates</h1>
<p>We've defined a cost function that, given a point cloud from the scene
and a model of the environment (e.g. the location of the bins), can score
a candidate grasp pose, $X^G$. So now we would like to solve the
optimization problem: find $X^G$ that minimizes the cost subject to the
collision constraints.</p>
<p>Unfortunately, this is an extremely difficult optimization problem,
with a highly nonconvex objective and constraints. Moreover, the cost
terms corresponding to the antipodal points is zero for most $X^G$ --
since most random $X^G$ will not have any points between the fingers. As
a result, instead of using the typical mathematical programming solvers,
most approaches in the literature resort to a randomized sampling-based