(** Formal Reasoning About Programs <http://adam.chlipala.net/frap/>
* Supplementary Coq material: dependent inductive types
* Author: Adam Chlipala
* License: https://creativecommons.org/licenses/by-nc-nd/4.0/
* Much of the material comes from CPDT <http://adam.chlipala.net/cpdt/> by the same author. *)
Require Import FrapWithoutSets SubsetTypes.
Set Implicit Arguments.
Set Asymmetric Patterns.
(* Subset types and their relatives help us integrate verification with
* programming. Though they reorganize the certified programmer's workflow,
* they tend not to have deep effects on proofs. We write largely the same
* proofs as we would for classical verification, with some of the structure
* moved into the programs themselves. It turns out that, when we use dependent
* types to their full potential, we warp the development and proving process
* even more than that, picking up "free theorems" to the extent that often a
* certified program is hardly more complex than its uncertified counterpart in
* Haskell or ML.
*
* In particular, we have only seen the tip of the iceberg that is Coq's
* inductive definition mechanism. *)
(** * Length-Indexed Lists *)
(* Many introductions to dependent types start out by showing how to use them to
* eliminate array bounds checks. When the type of an array tells you how many
* elements it has, your compiler can detect out-of-bounds dereferences
* statically. Since we are working in a pure functional language, the next
* best thing is length-indexed lists, which the following code defines. *)
Section ilist.
Variable A : Set.
(* Note how now we are sure to write out the type of each constructor in full,
* instead of using the shorthand notation we favored previously. The reason
* is that now the index to the inductive type [ilist] depends on details of a
* constructor's arguments. We are also using [Set], the type containing the
* normal types of programming. *)
Inductive ilist : nat -> Set :=
| Nil : ilist O
| Cons : forall n, A -> ilist n -> ilist (S n).
(* We see that, within its section, [ilist] is given type [nat -> Set].
* Previously, every inductive type we have seen has either had plain [Set] as
* its type or has been a predicate with some type ending in [Prop]. The full
* generality of inductive definitions lets us integrate the expressivity of
* predicates directly into our normal programming.
*
* The [nat] argument to [ilist] tells us the length of the list. The types
* of [ilist]'s constructors tell us that a [Nil] list has length [O] and that
* a [Cons] list has length one greater than the length of its tail. We may
* apply [ilist] to any natural number, even natural numbers that are only
* known at runtime. It is this breaking of the _phase distinction_ that
* characterizes [ilist] as _dependently typed_.
*
* In expositions of list types, we usually see the length function defined
* first, but here that would not be a very productive function to code.
* Instead, let us implement list concatenation. *)
Fixpoint app n1 (ls1 : ilist n1) n2 (ls2 : ilist n2) : ilist (n1 + n2) :=
match ls1 with
| Nil => ls2
| Cons _ x ls1' => Cons x (app ls1' ls2)
end.
(* Past Coq versions signalled an error for this definition. The code is
* still invalid within Coq's core language, but current Coq versions
* automatically add annotations to the original program, producing a valid
* core program. These are the annotations on [match] discriminees that we
* began to study with subset types. We can rewrite [app] to give the
* annotations explicitly. *)
Fixpoint app' n1 (ls1 : ilist n1) n2 (ls2 : ilist n2) : ilist (n1 + n2) :=
match ls1 in (ilist n1) return (ilist (n1 + n2)) with
| Nil => ls2
| Cons _ x ls1' => Cons x (app' ls1' ls2)
end.
(* Using [return] alone allowed us to express a dependency of the [match]
* result type on the _value_ of the discriminee. What [in] adds to our
* arsenal is a way of expressing a dependency on the _type_ of the
* discriminee. Specifically, the [n1] in the [in] clause above is a
* _binding occurrence_ whose scope is the [return] clause.
*
* We may use [in] clauses only to bind names for the arguments of an
* inductive type family. That is, each [in] clause must be an inductive type
* family name applied to a sequence of underscores and variable names of the
* proper length. The positions for _parameters_ to the type family must all
* be underscores. Parameters are those arguments declared with section
* variables or with entries to the left of the first colon in an inductive
* definition. They cannot vary depending on which constructor was used to
* build the discriminee, so Coq prohibits pointless matches on them. It is
* those arguments defined in the type to the right of the colon that we may
* name with [in] clauses.
*
* Here's a useful function with a surprisingly subtle type, where the return
* type depends on the _value_ of the argument. *)
Fixpoint inject (ls : list A) : ilist (length ls) :=
match ls with
| nil => Nil
| h :: t => Cons h (inject t)
end.
(* We can define an inverse conversion and prove that it really is an
* inverse. *)
Fixpoint unject n (ls : ilist n) : list A :=
match ls with
| Nil => nil
| Cons _ h t => h :: unject t
end.
Theorem inject_inverse : forall ls, unject (inject ls) = ls.
Proof.
induct ls; simplify; equality.
Qed.
(* Now let us attempt a function that is surprisingly tricky to write. In ML,
* the list head function raises an exception when passed an empty list. With
* length-indexed lists, we can rule out such invalid calls statically, and
* here is a first attempt at doing so. We write [_] for a term that we wish
* Coq would fill in for us, but we'll have no such luck. *)
Fail Definition hd n (ls : ilist (S n)) : A :=
match ls with
| Nil => _
| Cons _ h _ => h
end.
(* It is not clear what to write for the [Nil] case, so we are stuck before we
* even turn our function over to the type checker. We could try omitting the
* [Nil] case. *)
Fail Fail Definition hd n (ls : ilist (S n)) : A :=
match ls with
| Cons _ h _ => h
end.
(* Actually, these days, Coq is smart enough to make that definition work!
* However, it will be educational to look at how Coq elaborates this code
* into its core language, where, unlike in ML, all pattern matching must be
* _exhaustive_. We might try using an [in] clause somehow. *)
Fail Fail Definition hd n (ls : ilist (S n)) : A :=
match ls in (ilist (S n)) with
| Cons _ h _ => h
end.
(* Due to some relatively new heuristics, Coq does accept this code, but in
* general it is not legal to write arbitrary patterns for the arguments of
* inductive types in [in] clauses. Only variables are permitted there, in
* Coq's core language. A completely general mechanism could only be
* supported with a solution to the problem of higher-order unification, which
* is undecidable.
*
* Our final, working attempt at [hd] uses an auxiliary function and a
* surprising [return] annotation. *)
Definition hd' n (ls : ilist n) :=
match ls in (ilist n) return (match n with O => unit | S _ => A end) with
| Nil => tt
| Cons _ h _ => h
end.
Check hd'.
Definition hd n (ls : ilist (S n)) : A := hd' ls.
(* We annotate our main [match] with a type that is itself a [match]. We
* write that the function [hd'] has result type [unit] when the list is empty
* and result type [A], the carried type, in all other cases. In the definition
* of [hd], we just call [hd']. Because the index of [ls] is known to be
* nonzero, the type checker reduces the [match] in the type of [hd'] to [A]. *)
(* In fact, when we "got lucky" earlier with Coq accepting simpler
* definitions, under the hood it was desugaring _almost_ to this one. *)
Definition easy_hd n (ls : ilist (S n)) : A :=
match ls with
| Cons _ h _ => h
end.
Print easy_hd.
End ilist.
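(* A brief usage sketch, not part of the original development. With the section
* closed, [A] becomes an ordinary argument: [Nil] takes the element type
* explicitly, while [Cons] infers it. The names ending in [_sketch] are
* illustrative assumptions, and [bool] is an arbitrary choice of element
* type. *)
Example app_sketch :
  unject (app (Cons true (Nil bool)) (Cons false (Nil bool))) = true :: false :: nil.
Proof.
  reflexivity.
Qed.

Example hd_sketch : hd (Cons true (Cons false (Nil bool))) = true.
Proof.
  reflexivity.
Qed.

(* The length index rules out taking the head of an empty list at
* type-checking time, so the following command is expected to be rejected. *)
Fail Check hd (Nil bool).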
(** * The One Rule of Dependent Pattern Matching in Coq *)
(* The rest of this chapter will demonstrate a few other elegant applications of
* dependent types in Coq. Readers encountering such ideas for the first time
* often feel overwhelmed, concluding that there is some magic at work whereby
* Coq sometimes solves the halting problem for the programmer and sometimes
* does not, applying automated program understanding in a way far beyond what
* is found in conventional languages. The point of this section is to cut off
* that sort of thinking right now! Dependent type-checking in Coq follows just
* a few algorithmic rules, with just one for _dependent pattern matching_ of
* the kind we met in the previous section.
*
* A dependent pattern match is a [match] expression where the type of the
* overall [match] is a function of the value and/or the type of the
* _discriminee_, the value being matched on. In other words, the [match] type
* _depends_ on the discriminee.
*
* When exactly will Coq accept a dependent pattern match as well-typed? Some
* other dependently typed languages employ fancy decision procedures to
* determine when programs satisfy their very expressive types. The situation
* in Coq is just the opposite. Only very straightforward symbolic rules are
* applied. Such a design choice has its drawbacks, as it forces programmers to
* do more work to convince the type checker of program validity. However, the
* great advantage of a simple type checking algorithm is that its action on
* _invalid_ programs is easier to understand!
*
* We come now to the one rule of dependent pattern matching in Coq. A general
* dependent pattern match assumes this form (with unnecessary parentheses
* included to make the syntax easier to parse):
[[
match E as y in (T x1 ... xn) return U with
| C z1 ... zm => B
| ...
end
]]
* The discriminee is a term [E], a value in some inductive type family [T],
* which takes [n] arguments. An [as] clause binds the name [y] to refer to the
* discriminee [E]. An [in] clause binds an explicit name [xi] for the [i]th
* argument passed to [T] in the type of [E].
*
* We bind these new variables [y] and [xi] so that they may be referred to in
* [U], a type given in the [return] clause. The overall type of the [match]
* will be [U], with [E] substituted for [y], and with each [xi] substituted by
* the actual argument appearing in that position within [E]'s type.
*
* In general, each case of a [match] may have a pattern built up in several
* layers from the constructors of various inductive type families. To keep
* this exposition simple, we will focus on patterns that are just single
* applications of inductive type constructors to lists of variables. Coq
* actually compiles the more general kind of pattern matching into this more
* restricted kind automatically, so understanding the typing of [match]
* requires understanding the typing of [match]es lowered to match one
* constructor at a time.
*
* The last piece of the typing rule tells how to type-check a [match] case. A
* generic constructor application [C z1 ... zm] has some type [T x1' ... xn'],
* an application of the type family used in [E]'s type, probably with
* occurrences of the [zi] variables. From here, a simple recipe determines
* what type we will require for the case body [B]. The type of [B] should be
* [U] with the following two substitutions applied: we replace [y] (the [as]
* clause variable) with [C z1 ... zm], and we replace each [xi] (the [in]
* clause variables) with [xi']. In other words, we specialize the result type
* based on what we learn from which pattern has matched the discriminee.
*
* This is an exhaustive description of the ways to specify how to take
* advantage of which pattern has matched! No other mechanisms come into play.
* For instance, there is no way to specify that the types of certain free
* variables should be refined based on which pattern has matched.
*
* A few details have been omitted above. Inductive type families may have both
* _parameters_ and regular arguments. Within an [in] clause, a parameter
* position must have the wildcard [_] written, instead of a variable. (In
* general, Coq uses wildcard [_]'s either to indicate pattern variables that
* will not be mentioned again or to indicate positions where we would like type
* inference to infer the appropriate terms.) Furthermore, recent Coq versions
* are adding more and more heuristics to infer dependent [match] annotations in
* certain conditions. The general annotation-inference problem is undecidable,
* so there will always be serious limitations on how much work these heuristics
* can do. When in doubt about why a particular dependent [match] is failing to
* type-check, add an explicit [return] annotation! At that point, the
* mechanical rule sketched in this section will provide a complete account of
* "what the type checker is thinking." Be sure to avoid the common pitfall of
* writing a [return] annotation that does not mention any variables bound by
* [in] or [as]; such a [match] will never refine typing requirements based on
* which pattern has matched. (One simple exception to this rule is that, when
* the discriminee is a variable, that same variable may be treated as if it
* were repeated as an [as] clause.) *)
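(* To make the rule concrete, here is a sketch of a tail function written with
* an explicit annotation, in the style of [hd'] above. The name [tl_sketch] is
* a hypothetical one chosen for illustration. Here [U] is the inner [match] on
* [n']. For the whole expression, [n'] is replaced by [S n], so the type
* reduces to [ilist A n]. In the [Nil] case, [n'] is replaced by [O], so the
* body must have type [unit]. In the [Cons] case, [n'] is replaced by the
* successor of the tail's length, so the body must have the tail's type. *)
Definition tl_sketch A n (ls : ilist A (S n)) : ilist A n :=
  match ls in (ilist _ n') return (match n' with O => unit | S m => ilist A m end) with
  | Nil => tt
  | Cons _ _ t => t
  end.

Example tl_sketch_example :
  tl_sketch (Cons true (Cons false (Nil bool))) = Cons false (Nil bool).
Proof.
  reflexivity.
Qed.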
(** * A Tagless Interpreter *)
(* A favorite example for motivating the power of functional programming is
* implementation of a simple expression language interpreter. In ML and
* Haskell, such interpreters are often implemented using an algebraic datatype
* of values, where at many points it is checked that a value was built with the
* right constructor of the value type. With dependent types, we can implement a
* _tagless_ interpreter that both removes this source of runtime inefficiency
* and gives us more confidence that our implementation is correct. *)
Inductive type : Set :=
| Nat : type
| Bool : type
| Prod : type -> type -> type.
Inductive exp : type -> Set :=
| NConst : nat -> exp Nat
| Plus : exp Nat -> exp Nat -> exp Nat
| Eq : exp Nat -> exp Nat -> exp Bool
| BConst : bool -> exp Bool
| And : exp Bool -> exp Bool -> exp Bool
| If : forall t, exp Bool -> exp t -> exp t -> exp t
| Pair : forall t1 t2, exp t1 -> exp t2 -> exp (Prod t1 t2)
| Fst : forall t1 t2, exp (Prod t1 t2) -> exp t1
| Snd : forall t1 t2, exp (Prod t1 t2) -> exp t2.
(* We have a standard algebraic datatype [type], defining a type language of
* naturals, Booleans, and product (pair) types. Then we have the indexed
* inductive type [exp], where the argument to [exp] tells us the encoded type
* of an expression. In effect, we are defining the typing rules for
* expressions simultaneously with the syntax.
*
* We can give types and expressions semantics in a new style, based critically
* on the chance for _type-level computation_. *)
Fixpoint typeDenote (t : type) : Set :=
match t with
| Nat => nat
| Bool => bool
| Prod t1 t2 => typeDenote t1 * typeDenote t2
end%type.
(* The [typeDenote] function compiles types of our object language into "native"
* Coq types. It is deceptively easy to implement. The only new thing we see
* is the [%type] annotation, which tells Coq to parse the [match] expression
* using the notations associated with types. Without this annotation, the [*]
* would be interpreted as multiplication on naturals, rather than as the
* product type constructor. The token [%type] is one example of an identifier
* bound to a _notation scope delimiter_.
*
* We can define a function [expDenote] that is typed in terms of
* [typeDenote]. *)
Fixpoint expDenote t (e : exp t) : typeDenote t :=
match e with
| NConst n => n
| Plus e1 e2 => expDenote e1 + expDenote e2
| Eq e1 e2 => if expDenote e1 ==n expDenote e2 then true else false
| BConst b => b
| And e1 e2 => expDenote e1 && expDenote e2
| If _ e' e1 e2 => if expDenote e' then expDenote e1 else expDenote e2
| Pair _ _ e1 e2 => (expDenote e1, expDenote e2)
| Fst _ _ e' => fst (expDenote e')
| Snd _ _ e' => snd (expDenote e')
end.
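(* A small evaluation sketch, not from the original text. The example names are
* hypothetical. The test [true && false] is false, so the interpreter takes
* the second branch of the [If] and returns a native Coq pair. *)
Example expDenote_sketch :
  expDenote (If (And (BConst true) (BConst false))
                (Pair (NConst 1) (BConst true))
                (Pair (Plus (NConst 2) (NConst 3)) (BConst false)))
  = (5, false).
Proof.
  reflexivity.
Qed.

(* Ill-typed object-language terms are not even expressible, so this command is
* expected to be rejected. *)
Fail Check Plus (NConst 1) (BConst true).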
(* Despite the fancy type, the function definition is routine. In fact, it is
* less complicated than what we would write in ML or Haskell 98, since we do
* not need to worry about pushing final values in and out of an algebraic
* datatype. The only unusual thing is the use of an expression of the form
* [if E then true else false] in the [Eq] case. Remember that [==n] has
* a rich dependent type, rather than a simple Boolean type. Coq's native [if]
* is overloaded to work on a test of any two-constructor type, so we can use
* [if] to build a simple Boolean from the [sumbool] that [==n] returns.
*
* We can implement our old favorite, a constant-folding function, and prove it
* correct. It will be useful to write a function [pairOut] that checks if an
* [exp] of [Prod] type is a pair, returning its two components if so.
* Unsurprisingly, a first attempt leads to a type error. *)
Fail Definition pairOut t1 t2 (e : exp (Prod t1 t2)) : option (exp t1 * exp t2) :=
match e in (exp (Prod t1 t2)) return option (exp t1 * exp t2) with
| Pair _ _ e1 e2 => Some (e1, e2)
| _ => None
end.
(* We run again into the problem of not being able to specify non-variable
* arguments in [in] clauses (and this time Coq's avant-garde heuristics don't
* save us). The problem would just be hopeless without a use of an [in]
* clause, though, since the result type of the [match] depends on an argument
* to [exp]. Our solution will be to use a more general type, as we did for
* [hd]. First, we define a type-valued function to use in assigning a type to
* [pairOut]. *)
Definition pairOutType (t : type) := option (match t with
| Prod t1 t2 => exp t1 * exp t2
| _ => unit
end).
(* When passed a type that is a product, [pairOutType] returns our final desired
* type. On any other input type, [pairOutType] returns the harmless
* [option unit], since we do not care about extracting components of non-pairs.
* Now [pairOut] is easy to write. *)
Definition pairOut t (e : exp t) :=
match e in (exp t) return (pairOutType t) with
| Pair _ _ e1 e2 => Some (e1, e2)
| _ => None
end.
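(* Two quick checks of [pairOut], given as a sketch with hypothetical example
* names. On a literal pair we get both components back. On a non-pair the
* result type is the harmless [option unit], and the value is [None]. *)
Example pairOut_pair_sketch :
  pairOut (Pair (NConst 1) (BConst true)) = Some (NConst 1, BConst true).
Proof.
  reflexivity.
Qed.

Example pairOut_other_sketch : pairOut (NConst 1) = None.
Proof.
  reflexivity.
Qed.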
(* With [pairOut] available, we can write [cfold] in a straightforward way.
* The only real surprise is that Coq verifies that this code has such an
* expressive type, given the small annotation burden. *)
Fixpoint cfold t (e : exp t) : exp t :=
  match e with
  | NConst n => NConst n
  | Plus e1 e2 =>
    let e1' := cfold e1 in
    let e2' := cfold e2 in
    match e1', e2' with
    | NConst n1, NConst n2 => NConst (n1 + n2)
    | _, _ => Plus e1' e2'
    end
  | Eq e1 e2 =>
    let e1' := cfold e1 in
    let e2' := cfold e2 in
    match e1', e2' with
    | NConst n1, NConst n2 => BConst (if eq_nat_dec n1 n2 then true else false)
    | _, _ => Eq e1' e2'
    end
  | BConst b => BConst b
  | And e1 e2 =>
    let e1' := cfold e1 in
    let e2' := cfold e2 in
    match e1', e2' with
    | BConst b1, BConst b2 => BConst (b1 && b2)
    | _, _ => And e1' e2'
    end
  | If _ e e1 e2 =>
    let e' := cfold e in
    match e' with
    | BConst true => cfold e1
    | BConst false => cfold e2
    | _ => If e' (cfold e1) (cfold e2)
    end
  | Pair _ _ e1 e2 => Pair (cfold e1) (cfold e2)
  | Fst _ _ e =>
    let e' := cfold e in
    match pairOut e' with
    | Some p => fst p
    | None => Fst e'
    end
  | Snd _ _ e =>
    let e' := cfold e in
    match pairOut e' with
    | Some p => snd p
    | None => Snd e'
    end
  end.
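(* A sketch of [cfold] at work on closed terms, with hypothetical example names.
* In the first example the [Pair] is folded first, [pairOut] then exposes its
* components, and the projection disappears. In the second, the constant test
* selects a branch. *)
Example cfold_fst_sketch :
  cfold (Fst (Pair (Plus (NConst 1) (NConst 2)) (BConst true))) = NConst 3.
Proof.
  reflexivity.
Qed.

Example cfold_if_sketch :
  cfold (If (BConst true) (NConst 1) (NConst 2)) = NConst 1.
Proof.
  reflexivity.
Qed.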
(* The correctness theorem for [cfold] turns out to be easy to prove, once we
* get over one serious hurdle. *)
Theorem cfold_correct : forall t (e : exp t), expDenote e = expDenote (cfold e).
Proof.
induct e; simplify; try equality.
(* We would like to do a case analysis on [cfold e1], and we attempt to do so
* in the way that has worked so far. *)
Fail cases (cfold e1).
(* A nasty error message greets us! The book's [cases] tactic could be
* extended to handle this case, but we don't generally need to do case
* analysis on dependently typed values, outside the one excursion of this
* "bonus" source file. Still, the book defines a tactic [dep_cases] that
* mostly appeals to the built-in tactic [dependent destruction]. *)
dep_cases (cfold e1).
(* Incidentally, general and fully precise case analysis on dependently typed
* variables is undecidable, as witnessed by a simple reduction from the
* known-undecidable problem of higher-order unification, which has come up a
* few times already. The tactic [dep_cases] makes a best effort to handle
* some common cases.
*
* This successfully breaks the subgoal into 5 new subgoals, one for each
* constructor of [exp] that could produce an [exp Nat]. Note that
* [dep_cases] is successful in ruling out the other cases automatically, in
* effect automating some of the work that we have done manually in
* implementing functions like [hd] and [pairOut].
*
* This is the only new trick we need to learn to complete the proof. We can
* back up and give a short, automated proof. *)
Restart.
  induct e; simplify;
    repeat (match goal with
            | [ |- context[match cfold ?E with NConst _ => _ | _ => _ end] ] =>
              dep_cases (cfold E)
            | [ |- context[match pairOut (cfold ?E) with Some _ => _
                                                    | None => _ end] ] =>
              dep_cases (cfold E)
            | [ |- context[if ?E then _ else _] ] => cases E
            | [ H : _ = _ |- _ ] => rewrite H
            end; simplify); try equality.
Qed.
(** * Interlude: The Convoy Pattern *)
(* Here are some examples, contemplation of which may provoke enlightenment.
* The idea behind them is discussed later. *)
Fail Definition firstElements n A B (ls1 : ilist A n) (ls2 : ilist B n) : option (A * B) :=
  match ls1 with
  | Cons _ v1 _ =>
    Some (v1,
          match ls2 in ilist _ N return match N with O => unit | S _ => B end with
          | Cons _ v2 _ => v2
          | Nil => tt
          end)
  | Nil => None
  end.

Definition firstElements n A B (ls1 : ilist A n) (ls2 : ilist B n) : option (A * B) :=
  match ls1 in ilist _ N return ilist B N -> option (A * B) with
  | Cons _ v1 _ => fun ls2 =>
    Some (v1,
          match ls2 in ilist _ N return match N with O => unit | S _ => B end with
          | Cons _ v2 _ => v2
          | Nil => tt
          end)
  | Nil => fun _ => None
  end ls2.
(* Note the use of a [struct] annotation to tell Coq which argument should
* decrease across recursive calls. It's an artificial choice here, since
* usually those annotations are inferred. Here we are making an effort to
* demonstrate a decently common problem! *)
Fail Fixpoint zip n A B (ls1 : ilist A n) (ls2 : ilist B n) {struct ls1} : ilist (A * B) n :=
  match ls1 in ilist _ N return ilist B N -> ilist (A * B) N with
  | Cons _ v1 ls1' =>
    fun ls2 =>
      match ls2 in ilist _ N return match N with
                                    | O => unit
                                    | S N' => ilist A N' -> ilist (A * B) N
                                    end with
      | Cons _ v2 ls2' => fun ls1' => Cons (v1, v2) (zip ls1' ls2')
      | Nil => tt
      end ls1'
  | Nil => fun _ => Nil _
  end ls2.
Fixpoint zip n A B (ls1 : ilist A n) (ls2 : ilist B n) {struct ls1} : ilist (A * B) n :=
  match ls1 in ilist _ N return ilist B N -> ilist (A * B) N with
  | Cons _ v1 ls1' =>
    fun ls2 =>
      match ls2 in ilist _ N return match N with
                                    | O => unit
                                    | S N' => (ilist B N' -> ilist (A * B) N') -> ilist (A * B) N
                                    end with
      | Cons _ v2 ls2' => fun zip_ls1' => Cons (v1, v2) (zip_ls1' ls2')
      | Nil => tt
      end (zip ls1')
  | Nil => fun _ => Nil _
  end ls2.
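(* A usage sketch for the two definitions above, with hypothetical example
* names and arbitrarily chosen element types. Both functions insist that their
* arguments share one length index, so the final command, which mixes lengths
* 1 and 0, is expected to be rejected by the type checker. *)
Example firstElements_sketch :
  firstElements (Cons 1 (Nil nat)) (Cons true (Nil bool)) = Some (1, true).
Proof.
  reflexivity.
Qed.

Example zip_sketch :
  unject (zip (Cons 1 (Cons 2 (Nil nat))) (Cons true (Cons false (Nil bool))))
  = (1, true) :: (2, false) :: nil.
Proof.
  reflexivity.
Qed.

Fail Check zip (Cons 1 (Nil nat)) (Nil bool).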
(** * Dependently Typed Red-Black Trees *)
(* Red-black trees are a favorite purely functional data structure with an
* interesting invariant. We can use dependent types to guarantee that
* operations on red-black trees preserve the invariant. For simplicity, we
* specialize our red-black trees to represent sets of [nat]s. *)
Inductive color : Set := Red | Black.
Inductive rbtree : color -> nat -> Set :=
| Leaf : rbtree Black 0
| RedNode : forall n, rbtree Black n -> nat -> rbtree Black n -> rbtree Red n
| BlackNode : forall c1 c2 n, rbtree c1 n -> nat -> rbtree c2 n -> rbtree Black (S n).
(* A value of type [rbtree c d] is a red-black tree whose root has color [c] and
* that has black depth [d]. The latter property means that there are exactly
* [d] black-colored nodes on any path from the root to a leaf. *)
(* At first, it can be unclear that this choice of type indices tracks any
* useful property. To convince ourselves, we will prove that every red-black
* tree is balanced. We will phrase our theorem in terms of a depth-calculating
* function that ignores the extra information in the types. It will be useful
* to parameterize this function over a combining operation, so that we can
* reuse the same code to calculate the minimum or maximum height among all
* paths from root to leaf. *)
Section depth.
Variable f : nat -> nat -> nat.
Fixpoint depth c n (t : rbtree c n) : nat :=
match t with
| Leaf => 0
| RedNode _ t1 _ t2 => S (f (depth t1) (depth t2))
| BlackNode _ _ _ t1 _ t2 => S (f (depth t1) (depth t2))
end.
End depth.
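(* A concrete sketch, not from the original text: a two-node tree with black
* depth 1. The names ending in [_sketch] are hypothetical. Its shortest
* root-to-leaf path passes 1 node and its longest passes 2. *)
Definition tree_sketch : rbtree Black 1 := BlackNode (RedNode Leaf 1 Leaf) 2 Leaf.

Example depth_min_sketch : depth min tree_sketch = 1.
Proof.
  reflexivity.
Qed.

Example depth_max_sketch : depth max tree_sketch = 2.
Proof.
  reflexivity.
Qed.

(* The indices also enforce color alternation: a red node with a red child is
* expected to be rejected by the type checker. *)
Fail Check RedNode (RedNode Leaf 1 Leaf) 2 Leaf.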
(* Our proof of balanced-ness decomposes naturally into a lower bound and an
* upper bound. We prove the lower bound first. Unsurprisingly, a tree's black
* depth provides such a bound on the minimum path length. *)
Theorem depth_min : forall c n (t : rbtree c n), depth min t >= n.
Proof.
induction t; simplify; linear_arithmetic.
Qed.
(* There is an analogous upper-bound theorem based on black depth.
* Unfortunately, a symmetric proof script does not suffice to establish it. *)
Theorem depth_max : forall c n (t : rbtree c n), depth max t <= 2 * n + 1.
Proof.
induction t; simplify; try linear_arithmetic.
(* In the remaining goal, we see that [IHt1] is _almost_ the fact we need, but
* it is not quite strong enough. We will need to strengthen our induction
* hypothesis to get the proof to go through. *)
Abort.
(* In particular, we prove a lemma that provides a stronger upper bound for
* trees with black root nodes. We got stuck above in a case about a red root
* node. Since red nodes have only black children, our IH strengthening will
* enable us to finish the proof. *)
Lemma depth_max' : forall c n (t : rbtree c n), match c with
| Red => depth max t <= 2 * n + 1
| Black => depth max t <= 2 * n
end.
Proof.
induction t; simplify;
repeat match goal with
| [ _ : context[match ?C with Red => _ | Black => _ end] |- _ ] =>
cases C
end; linear_arithmetic.
Qed.
(* The original theorem follows easily from the lemma. *)
Theorem depth_max : forall c n (t : rbtree c n), depth max t <= 2 * n + 1.
Proof.
simplify.
pose proof (depth_max' t).
cases c; simplify; linear_arithmetic.
Qed.
(* The final balance theorem establishes that the minimum and maximum path
* lengths of any tree are within a factor of two of each other. *)
Theorem balanced : forall c n (t : rbtree c n), 2 * depth min t + 1 >= depth max t.
Proof.
simplify.
pose proof (depth_min t).
pose proof (depth_max t).
linear_arithmetic.
Qed.
(* Now we are ready to implement an example operation on our trees, insertion.
* Insertion can be thought of as breaking the tree invariants locally but then
* rebalancing. In particular, in intermediate states we find red nodes that
* may have red children. The type [rtree] captures the idea of such a node,
* continuing to track black depth as a type index. *)
Inductive rtree : nat -> Set :=
| RedNode' : forall c1 c2 n, rbtree c1 n -> nat -> rbtree c2 n -> rtree n.
(* Before starting to define [insert], we define predicates capturing when a
* data value is in the set represented by a normal or possibly invalid tree. *)
Section present.
Variable x : nat.
Fixpoint present c n (t : rbtree c n) : Prop :=
match t with
| Leaf => False
| RedNode _ a y b => present a \/ x = y \/ present b
| BlackNode _ _ _ a y b => present a \/ x = y \/ present b
end.
Definition rpresent n (t : rtree n) : Prop :=
match t with
| RedNode' _ _ _ a y b => present a \/ x = y \/ present b
end.
End present.
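(* A tiny membership sketch with a hypothetical example name. After the section
* closes, [present] takes the sought key as its first argument. Unfolding the
* definition on a one-node tree leaves a disjunction whose middle disjunct
* holds by reflexivity. *)
Example present_sketch : present 1 (RedNode Leaf 1 Leaf).
Proof.
  simpl.
  right.
  left.
  reflexivity.
Qed.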
(* Insertion relies on two balancing operations. It will be useful to give types
* to these operations using a relative of the subset types from SubsetTypes.
* While subset types let us pair a value with a proof about that value, here we
* want to pair a value with another non-proof dependently typed value. The
* [sigT] type fills this role. *)
Locate "{ _ : _ & _ }".
Print sigT.
(* It will be helpful to define a concise notation for the constructor of
* [sigT]. *)
Notation "{< x >}" := (existT _ _ x).
(* Each balance function is used to construct a new tree whose keys include the
* keys of two input trees, as well as a new key. One of the two input trees
* may violate the red-black alternation invariant (that is, it has an [rtree]
* type), while the other tree is known to be valid. Crucially, the two input
* trees have the same black depth.
*
* A balance operation may return a tree whose root is of either color. Thus,
* we use a [sigT] type to package the result tree with the color of its root.
* Here is the definition of the first balance operation, which applies when the
* possibly invalid [rtree] belongs to the left of the valid [rbtree].
*
* A quick word of encouragement: After writing this code, even I do not
* understand the precise details of how balancing works! I consulted Chris
* Okasaki's paper "Red-Black Trees in a Functional Setting" and transcribed the
* code to use dependent types. Luckily, the details are not so important here;
* types alone will tell us that insertion preserves balanced-ness, and we will
* prove that insertion produces trees containing the right keys. *)
Definition balance1 n (a : rtree n) (data : nat) c2 :=
  match a in rtree n return rbtree c2 n
                            -> { c : color & rbtree c (S n) } with
    | RedNode' _ c0 _ t1 y t2 =>
      match t1 in rbtree c n return rbtree c0 n -> rbtree c2 n
                                    -> { c : color & rbtree c (S n) } with
        | RedNode _ a x b => fun c d =>
          {<RedNode (BlackNode a x b) y (BlackNode c data d)>}
        | t1' => fun t2 =>
          match t2 in rbtree c n return rbtree Black n -> rbtree c2 n
                                        -> { c : color & rbtree c (S n) } with
            | RedNode _ b x c => fun a d =>
              {<RedNode (BlackNode a y b) x (BlackNode c data d)>}
            | b => fun a t => {<BlackNode (RedNode a y b) data t>}
          end t1'
      end t2
  end.
(* We apply a trick that I call the _convoy pattern_. Recall that [match]
* annotations only make it possible to describe a dependence of a [match]
* _result type_ on the discriminee. There is no automatic refinement of the
* types of free variables. However, it is possible to effect such a refinement
* by finding a way to encode free variable type dependencies in the [match]
* result type, so that a [return] clause can express the connection.
*
* In particular, we can extend the [match] to return _functions over the free
* variables whose types we want to refine_. In the case of [balance1], we only
* find ourselves wanting to refine the type of one tree variable at a time. We
* match on one subtree of a node, and we want the type of the other subtree to
* be refined based on what we learn. We indicate this with a [return] clause
* starting like [rbtree _ n -> ...], where [n] is bound in an [in] pattern.
* Such a [match] expression is applied immediately to the "old version" of the
* variable to be refined, and the type checker is happy.
*
* Here is the symmetric function [balance2], for cases where the possibly
* invalid tree appears on the right rather than on the left. *)
Definition balance2 n (a : rtree n) (data : nat) c2 :=
  match a in rtree n return rbtree c2 n -> { c : color & rbtree c (S n) } with
    | RedNode' _ c0 _ t1 z t2 =>
      match t1 in rbtree c n return rbtree c0 n -> rbtree c2 n
                                    -> { c : color & rbtree c (S n) } with
        | RedNode _ b y c => fun d a =>
          {<RedNode (BlackNode a data b) y (BlackNode c z d)>}
        | t1' => fun t2 =>
          match t2 in rbtree c n return rbtree Black n -> rbtree c2 n
                                        -> { c : color & rbtree c (S n) } with
            | RedNode _ c z' d => fun b a =>
              {<RedNode (BlackNode a data b) z (BlackNode c z' d)>}
            | b => fun a t => {<BlackNode t data (RedNode a z b)>}
          end t1'
      end t2
  end.
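(* The convoy pattern can also be seen in miniature.  The following sketch is
 * our own illustration (the name [redden] is hypothetical): matching on the
 * color [c] alone would not refine the types of [a] and [b], so the [return]
 * clause makes the [match] produce a function over both trees, which we apply
 * to them immediately.  In the [Black] branch, the rebound trees then have
 * type [rbtree Black n], as [RedNode] requires. *)
Definition redden c n (a : rbtree c n) (y : nat) (b : rbtree c n)
  : option (rbtree Red n) :=
  match c return rbtree c n -> rbtree c n -> option (rbtree Red n) with
    | Black => fun a' b' => Some (RedNode a' y b')
    | Red => fun _ _ => None
  end a b.
Compute redden Leaf 7 Leaf.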
(* Now we are almost ready to get down to the business of writing an [insert]
* function. First, we enter a section that declares a variable [x], for the
* key we want to insert. *)
Section insert.
Variable x : nat.
(* Most of the work of insertion is done by a helper function [ins], whose
* return types are expressed using a type-level function [insResult]. *)
Definition insResult c n :=
  match c with
    | Red => rtree n
    | Black => { c' : color & rbtree c' n }
  end.
(* That is, when we insert into a tree with root color [c] and black depth
* [n], the variety of tree we get out depends on [c]. If we started with a red root,
* then we get back a possibly invalid tree of depth [n]. If we started with
* a black root, we get back a valid tree of depth [n] with a root node of an
* arbitrary color.
*
* Here is the definition of [ins]. Again, we do not want to dwell on the
* functional details. *)
Fixpoint ins c n (t : rbtree c n) : insResult c n :=
  match t with
    | Leaf => {< RedNode Leaf x Leaf >}
    | RedNode _ a y b =>
      if le_lt_dec x y
        then RedNode' (projT2 (ins a)) y b
        else RedNode' a y (projT2 (ins b))
    | BlackNode c1 c2 _ a y b =>
      if le_lt_dec x y
        then
          match c1 return insResult c1 _ -> _ with
            | Red => fun ins_a => balance1 ins_a y b
            | _ => fun ins_a => {< BlackNode (projT2 ins_a) y b >}
          end (ins a)
        else
          match c2 return insResult c2 _ -> _ with
            | Red => fun ins_b => balance2 ins_b y a
            | _ => fun ins_b => {< BlackNode a y (projT2 ins_b) >}
          end (ins b)
  end.
(* The one new trick is a variation of the convoy pattern. In each of the
* last two pattern matches, we want to take advantage of the typing
* connection between the trees [a] and [b]. We might naively apply the
* convoy pattern directly on [a] in the first [match] and on [b] in the
* second. This satisfies the type checker per se, but it does not satisfy
* the termination checker. Inside each [match], we would be calling [ins]
* recursively on a locally bound variable. The termination checker is not
* smart enough to trace the dataflow into that variable, so the checker does
* not know that this recursive argument is smaller than the original
* argument. We make this fact clearer by applying the convoy pattern on _the
* result of a recursive call_, rather than just on that call's argument.
*
* Finally, we are in the home stretch of our effort to define [insert]. We
* just need a few more definitions of non-recursive functions. First, we
* need to give the final characterization of [insert]'s return type.
* Inserting into a red-rooted tree gives a black-rooted tree where black
* depth has increased, and inserting into a black-rooted tree gives a tree
* where black depth has stayed the same and where the root is an arbitrary
* color. *)
Definition insertResult c n :=
  match c with
    | Red => rbtree Black (S n)
    | Black => { c' : color & rbtree c' n }
  end.
(* A simple clean-up procedure translates [insResult]s into
* [insertResult]s. *)
Definition makeRbtree {c n} : insResult c n -> insertResult c n :=
  match c with
    | Red => fun r =>
      match r with
        | RedNode' _ _ _ a x b => BlackNode a x b
      end
    | Black => fun r => r
  end.
(* Finally, we define [insert] as a simple composition of [ins] and
* [makeRbtree]. *)
Definition insert c n (t : rbtree c n) : insertResult c n :=
  makeRbtree (ins t).
(* As we noted earlier, the type of [insert] guarantees that it outputs
* balanced trees whose depths have not increased too much. We also want to
* know that [insert] operates correctly on trees interpreted as finite sets,
* so we finish this section with a proof of that fact. *)
Section present.
Variable z : nat.
(* The variable [z] stands for an arbitrary key. We will reason about [z]'s
* presence in particular trees. As usual, outside the section the theorems
* we prove will quantify over all possible keys, giving us the facts we wanted.
*
* We start by proving the correctness of the balance operations. It is
* useful to define a custom tactic [present_balance] that encapsulates the
* reasoning common to the two proofs. *)
Ltac present_balance :=
  simplify;
  repeat (match goal with
          | [ _ : context[match ?T with Leaf => _ | _ => _ end] |- _ ] =>
            dep_cases T
          | [ |- context[match ?T with Leaf => _ | _ => _ end] ] => dep_cases T
          end; simplify); propositional.
(* The balance correctness theorems are simple first-order logic
* equivalences, where we use the function [projT2] to project the payload
* of a [sigT] value. *)
Lemma present_balance1 : forall n (a : rtree n) (y : nat) c2 (b : rbtree c2 n),
  present z (projT2 (balance1 a y b))
  <-> rpresent z a \/ z = y \/ present z b.
Proof.
  simplify; cases a; present_balance.
Qed.
Lemma present_balance2 : forall n (a : rtree n) (y : nat) c2 (b : rbtree c2 n),
  present z (projT2 (balance2 a y b))
  <-> rpresent z a \/ z = y \/ present z b.
Proof.
  simplify; cases a; present_balance.
Qed.
(* To state the theorem for [ins], it is useful to define a new type-level
* function, since [ins] returns different result types based on the type
* indices passed to it. Recall that [x] is the section variable standing
* for the key we are inserting. *)
Definition present_insResult c n :=
  match c return (rbtree c n -> insResult c n -> Prop) with
    | Red => fun t r => rpresent z r <-> z = x \/ present z t
    | Black => fun t r => present z (projT2 r) <-> z = x \/ present z t
  end.
(* Now the statement and proof of the [ins] correctness theorem are
* straightforward, if verbose. We proceed by induction on the structure of
* a tree, followed by finding case-analysis opportunities on expressions we
* see being analyzed in [if] or [match] expressions. After that, we
* pattern-match to find opportunities to use the theorems we proved about
* balancing. *)
Theorem present_ins : forall c n (t : rbtree c n),
  present_insResult t (ins t).
Proof.
  induct t; simplify;
    repeat (match goal with
            | [ _ : context[if ?E then _ else _] |- _ ] => cases E
            | [ |- context[if ?E then _ else _] ] => cases E
            | [ _ : context[match ?C with Red => _ | Black => _ end]
                |- _ ] => cases C
            end; simplify);
    try match goal with
        | [ _ : context[balance1 ?A ?B ?C] |- _ ] =>
          pose proof (present_balance1 A B C)
        end;
    try match goal with
        | [ _ : context[balance2 ?A ?B ?C] |- _ ] =>
          pose proof (present_balance2 A B C)
        end;
    try match goal with
        | [ |- context[balance1 ?A ?B ?C] ] =>
          pose proof (present_balance1 A B C)
        end;
    try match goal with
        | [ |- context[balance2 ?A ?B ?C] ] =>
          pose proof (present_balance2 A B C)
        end;
    simplify; propositional.
Qed.
(* The hard work is done. The most readable way to state correctness of
* [insert] involves splitting the property into two color-specific
* theorems. We write a tactic to encapsulate the reasoning steps that work
* to establish both facts. *)
Ltac present_insert :=
  unfold insert; intros n t;
  pose proof (present_ins t); simplify;
  cases (ins t); propositional.
Theorem present_insert_Red : forall n (t : rbtree Red n),
  present z (insert t)
  <-> (z = x \/ present z t).
Proof.
  present_insert.
Qed.
Theorem present_insert_Black : forall n (t : rbtree Black n),
  present z (projT2 (insert t))
  <-> (z = x \/ present z t).
Proof.
  present_insert.
Qed.
End present.
End insert.
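(* To see [insert] in action, here is a sketch with arbitrary keys.  After the
 * section closes, the key to insert becomes the first explicit argument.
 * Inserting 3 into [Leaf] should yield a red singleton packaged with its
 * color; inserting 1 into that red-rooted tree then goes through the [Red]
 * case of [insertResult], producing a black-rooted tree of black depth one.
 * How far the second term reduces depends on [le_lt_dec] being transparent,
 * which we assume here.  The final [Check] shows the correctness theorem now
 * quantified over both the inserted key and the queried key. *)
Compute insert 3 Leaf.
Compute insert 1 (projT2 (insert 3 Leaf)).
Check present_insert_Black.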
(* We can generate executable OCaml code with the command
* [Recursive Extraction insert], which also automatically outputs the OCaml
* versions of all of [insert]'s dependencies. In our previous extractions, we
* wound up with clean OCaml code. Here, we find uses of <<Obj.magic>>, OCaml's
* unsafe cast operator for tweaking the apparent type of an expression in an
* arbitrary way. Casts appear for this example because the return type of
* [insert] depends on the _value_ of the function's argument, a pattern that
* OCaml cannot handle. Since Coq's type system is much more expressive than
* OCaml's, such casts are unavoidable in general. Since the OCaml type-checker
* is no longer checking full safety of programs, we must rely on Coq's
* extractor to use casts only in provably safe ways. *)
Recursive Extraction insert.
(** * A Certified Regular Expression Matcher *)
(* Another interesting example is regular expressions with dependent types that
* express which predicates over strings particular regexps implement. We can
* then assign a dependent type to a regular expression matching function,
* guaranteeing that it always decides the string property that we expect it to
* decide.
*
* Before defining the syntax of expressions, it is helpful to define an
* inductive type capturing the meaning of the Kleene star. That is, a string
* [s] matches regular expression [star e] if and only if [s] can be decomposed
* into a sequence of substrings that all match [e]. We use Coq's string
* support, which comes through a combination of the [String] library and some
* parsing notations built into Coq. Operators like [++] and functions like
* [length] that we know from lists are defined again for strings. Notation
* scopes help us control which versions we want to use in particular
* contexts. *)