<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="author" content="Mikio Hirabayashi" />
<meta name="keywords" content="QDBM, DBM, database, hash, B+ tree" />
<meta name="description" content="fundamental specifications of QDBM" />
<link rel="contents" href="./" />
<link rel="alternate" href="spex-ja.html" hreflang="ja" title="the Japanese version" />
<link rev="made" href="mailto:[email protected]" />
<title>Specifications of QDBM Version 1</title>
<style type="text/css">html { margin: 0em 0em; padding: 0em 0em; background: #eeeeee none; }
body { margin: 2em 2em; padding: 0em 0em;
background: #eeeeee none; color: #111111;
font-style: normal; font-weight: normal; }
h1 { margin-top: 1.8em; margin-bottom: 1.3em; font-weight: bold; }
h2 { margin-top: 1.8em; margin-bottom: 1.1em; font-weight: bold;
border-left: solid 0.6em #445555; border-bottom: solid 1pt #bbbbbb;
padding: 0.5em 0.5em; width: 60%; }
h3 { margin-top: 1.8em; margin-bottom: 0.8em; font-weight: bold; }
hr { margin-top: 2.5em; margin-bottom: 1.5em; height: 1pt;
color: #999999; background-color: #999999; border: none; }
div.note,div.navi { text-align: right; }
div.logo { text-align: center; margin: 3em 0em; }
div.logo img { border: inset 2pt #ccccdd; }
p { margin: 0.8em 0em; line-height: 140%; }
p,dd { text-indent: 0.8em; }
div,pre { margin-left: 1.7em; margin-right: 1.7em; }
pre { background-color: #ddddee; padding: 0.2em; border: 1pt solid #bbbbcc; font-size: smaller; }
kbd { color: #111111; font-style: normal; font-weight: bold; }
a { color: #0022aa; text-decoration: none; }
a:hover,a:focus { color: #0033ee; text-decoration: underline; }
a.head { color: #111111; text-decoration: none; }
table { padding: 1pt 2pt 1pt 2pt; border: none; margin-left: 1.7em; border-collapse: collapse; }
th { padding: 1pt 4pt 1pt 4pt; border-style: none;
text-align: left; vertical-align: bottom; }
td { padding: 1pt 4pt 1pt 4pt; border: 1pt solid #333333;
text-align: left; vertical-align: top; }
ul,ol,dl { line-height: 140%; }
dt { margin-left: 1.2em; }
dd { margin-left: 2.0em; }
ul.lines { list-style-type: none; }
@media print {
html,body { margin: 0em 0em; background-color: #ffffff; color: #000000; }
h1 { padding: 8em 0em 0.5em 0em; text-align: center; }
h2 { page-break-before: always; }
div.note { text-align: center; }
div.navi,div.logo { display: none }
hr { display: none; }
pre { margin: 0.8em 0.8em; background-color: #ffffff;
border: 1pt solid #aaaaaa; font-size: smaller; }
a,kbd { color: #000000; text-decoration: none; }
h1,h2,h3 { font-family: sans-serif; }
p,div,li,dt,dd { font-family: serif; }
pre,kbd { font-family: monospace; }
dd { font-size: smaller; }
}
</style>
</head>
<body>
<h1>Fundamental Specifications of QDBM Version 1</h1>
<div class="note">Copyright (C) 2000-2007 Mikio Hirabayashi</div>
<div class="note">Last Update: Thu, 26 Oct 2006 15:00:20 +0900</div>
<div class="navi">[<a href="spex-ja.html" hreflang="ja">Japanese</a>] [<a href="http://qdbm.sourceforge.net/">Home</a>]</div>
<hr />
<h2>Table of Contents</h2>
<ol>
<li><a href="#overview">Overview</a></li>
<li><a href="#features">Features</a></li>
<li><a href="#installation">Installation</a></li>
<li><a href="#depotapi">Depot: Basic API</a></li>
<li><a href="#depotcli">Commands for Depot</a></li>
<li><a href="#curiaapi">Curia: Extended API</a></li>
<li><a href="#curiacli">Commands for Curia</a></li>
<li><a href="#relicapi">Relic: NDBM-compatible API</a></li>
<li><a href="#reliccli">Commands for Relic</a></li>
<li><a href="#hovelapi">Hovel: GDBM-compatible API</a></li>
<li><a href="#hovelcli">Commands for Hovel</a></li>
<li><a href="#cabinapi">Cabin: Utility API</a></li>
<li><a href="#cabincli">Commands for Cabin</a></li>
<li><a href="#villaapi">Villa: Advanced API</a></li>
<li><a href="#villacli">Commands for Villa</a></li>
<li><a href="#odeumapi">Odeum: Inverted API</a></li>
<li><a href="#odeumcli">Commands for Odeum</a></li>
<li><a href="#fileformat">File Format</a></li>
<li><a href="#porting">Porting</a></li>
<li><a href="#bugs">Bugs</a></li>
<li><a href="#faq">Frequently Asked Questions</a></li>
<li><a href="#copying">Copying</a></li>
</ol>
<hr />
<h2><a name="overview" id="overview" class="head">Overview</a></h2>
<p>QDBM is a library of routines for managing a database. The database is a simple data file containing records, each of which is a pair of a key and a value. Every key and value is a sequence of bytes of variable length. Both binary data and character strings can be used as keys and values. There is no concept of data tables or data types. Records are organized in a hash table or a B+ tree.</p>
<p>In a hash table database, each key must be unique, so it is impossible to store two or more records with the same key. The following access methods are provided: storing a record with a key and a value, deleting a record by key, and retrieving a record by key. Moreover, traversal access to every key is provided, although the order is arbitrary. These access methods are similar to those of the DBM library (and its followers, NDBM and GDBM) defined in the UNIX standard. QDBM is an alternative to DBM that offers higher performance.</p>
<p>In a B+ tree database, records with duplicate keys can be stored. Access methods for storing, deleting, and retrieving are provided, as with the hash table database. Records are kept in the order defined by a comparison function assigned by the user. Each record can be accessed with a cursor in ascending or descending order. This mechanism makes forward matching search for strings and range search for integers possible. Moreover, transactions are available in the B+ tree database.</p>
<p>QDBM is written in C and provides APIs for C, C++, Java, Perl, and Ruby. QDBM is available on platforms that have a POSIX-conforming API. QDBM is free software licensed under the GNU Lesser General Public License.</p>
<hr />
<h2><a name="features" id="features" class="head">Features</a></h2>
<h3>Effective Implementation of Hash Database</h3>
<p>QDBM was developed with reference to GDBM, with three goals: higher processing speed, smaller database files, and a simpler API. They have been achieved. Moreover, as with GDBM, QDBM removes three restrictions of traditional DBM: a process can handle only one database, the size of a key and a value is bounded, and a database file is sparse.</p>
<p>QDBM uses a hash algorithm to retrieve records. If the bucket array has a sufficient number of elements, the time complexity of retrieval is `O(1)'. That is, the time required to retrieve a record is constant, regardless of the scale of the database. The same holds for storing and deleting. Collisions of hash values are managed by separate chaining. The data structure of each chain is a binary search tree. Even if the bucket array has unusually few elements, the time complexity of retrieval is `O(log n)'.</p>
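<p>The following is a minimal sketch in C of a bucket array whose chains are binary search trees, as described above. It is an in-memory illustration only, not QDBM's implementation: the names `Table', `Node', `table_new', `table_put', and `table_get', as well as the hash function, are assumptions made for this example, and freeing of memory is omitted for brevity.</p>

```c
#include <stdlib.h>
#include <string.h>

/* One record in a chain; colliding records form a binary search tree. */
typedef struct Node {
    char *key;
    char *value;
    struct Node *left;
    struct Node *right;
} Node;

/* The bucket array: each element is the root of one chain (NULL if empty). */
typedef struct {
    Node **buckets;
    size_t bnum;
} Table;

/* Illustrative string hash (an assumption, not the function QDBM uses). */
static size_t hash_key(const char *key, size_t bnum) {
    size_t h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h % bnum;
}

static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}

/* Create a table with bnum buckets (bnum must be greater than 0). */
Table *table_new(size_t bnum) {
    Table *t;
    if (bnum == 0) return NULL;
    t = malloc(sizeof(*t));
    if (!t) return NULL;
    t->buckets = calloc(bnum, sizeof(*t->buckets));
    if (!t->buckets) { free(t); return NULL; }
    t->bnum = bnum;
    return t;
}

/* Store a record, overwriting the value if the key exists. Returns 1 on success. */
int table_put(Table *t, const char *key, const char *value) {
    Node **slot = &t->buckets[hash_key(key, t->bnum)];
    Node *n;
    while (*slot) {
        int cmp = strcmp(key, (*slot)->key);
        if (cmp == 0) {
            char *v = copy_string(value);
            if (!v) return 0;
            free((*slot)->value);
            (*slot)->value = v;
            return 1;
        }
        /* Descend left or right by key order instead of scanning a list. */
        slot = cmp < 0 ? &(*slot)->left : &(*slot)->right;
    }
    n = calloc(1, sizeof(*n));
    if (!n) return 0;
    n->key = copy_string(key);
    n->value = copy_string(value);
    if (!n->key || !n->value) { free(n->key); free(n->value); free(n); return 0; }
    *slot = n;
    return 1;
}

/* Look up a key; returns the stored value or NULL if absent. */
const char *table_get(const Table *t, const char *key) {
    const Node *n = t->buckets[hash_key(key, t->bnum)];
    while (n) {
        int cmp = strcmp(key, n->key);
        if (cmp == 0) return n->value;
        n = cmp < 0 ? n->left : n->right;
    }
    return NULL;
}
```

<p>With `table_new(1)' every key falls into the same chain, which shows the degenerate case: lookups then descend one tree by key comparison. Note that a binary search tree gives `O(log n)' only while it stays reasonably balanced, and this sketch does no rebalancing.</p>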
<p>QDBM improves retrieval speed by loading the whole bucket array into RAM. If the bucket array is in RAM, the region of a target record can be accessed by about one path of file operations. A bucket array saved in a file is not read into RAM with the `read' call but mapped directly into RAM with the `mmap' call. Therefore, the preparation time for connecting to a database is very short, and two or more processes can share the same memory map.</p>
<p>If the number of elements of the bucket array is about half the number of records stored in the database, the probability of collision of hash values is about 56.7%, although it depends on the characteristics of the input (36.8% if the numbers are equal, 21.3% if the bucket array is twice as large, 11.5% if four times, 6.0% if eight times). In such a case, a record can be retrieved by two or fewer paths of file operations. Taking this as a performance index, handling a database containing one million records requires a bucket array with half a million elements. The size of each element is 4 bytes. That is, if 2M bytes of RAM are available, a database containing one million records can be handled.</p>
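<p>The percentages above can be reproduced, to within about 0.1 percentage point, by a simple model that the text does not state explicitly: if keys are hashed uniformly, the expected fraction of records that land in an already occupied bucket is approximately 1 - (1 - e^(-a)) / a, where a is the number of records divided by the number of buckets. The C sketch below evaluates this model and the memory taken by the bucket array. The function names are assumptions made for this example.</p>

```c
#include <stddef.h>

/* e^(-x) for small non-negative x via its Taylor series, so that the sketch
   builds without linking the math library; exp(-x) from <math.h> is the
   usual choice. */
static double exp_neg(double x) {
    double term = 1.0, sum = 1.0;
    int k;
    for (k = 1; k <= 40; k++) {
        term *= -x / k;
        sum += term;
    }
    return sum;
}

/* Expected fraction of records that share a bucket with an earlier record,
   assuming a uniform hash: 1 - (1 - e^(-a)) / a, where a = rnum / bnum. */
double collision_fraction(double rnum, double bnum) {
    double a = rnum / bnum;
    return 1.0 - (1.0 - exp_neg(a)) / a;
}

/* Memory taken by the bucket array at 4 bytes per element. */
unsigned long bucket_array_bytes(unsigned long bnum) {
    return bnum * 4UL;
}
```

<p>For example, `collision_fraction(1000000, 500000)' evaluates to about 0.568, and `bucket_array_bytes(500000)' gives 2000000 bytes, which matches the 2M bytes quoted above.</p>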
<p>QDBM provides two modes to connect to a database: `reader' and `writer'. A reader can perform retrieving but neither storing nor deleting. A writer can perform all access methods. Exclusion control between processes is performed by file locking when connecting to a database. While a writer is connected to a database, neither readers nor writers can connect. While a reader is connected to a database, other readers can connect, but writers cannot. This mechanism guarantees data consistency under simultaneous connections in a multitasking environment.</p>
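<p>The text above says only that exclusion is performed by file locking. As a sketch, assuming POSIX `fcntl' record locks are used for this purpose, a shared lock for a reader and an exclusive lock for a writer could be taken as follows. The function name `lock_database' is hypothetical.</p>

```c
#include <fcntl.h>
#include <unistd.h>

/* Lock a whole file: exclusive for a writer, shared for a reader.
   If nonblock is nonzero the call fails at once instead of waiting.
   Returns 0 on success, -1 on failure (errno is set by fcntl). */
int lock_database(int fd, int writer, int nonblock) {
    struct flock lk;
    lk.l_type = writer ? F_WRLCK : F_RDLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;   /* a length of 0 means "through the end of the file" */
    lk.l_pid = 0;
    return fcntl(fd, nonblock ? F_SETLK : F_SETLKW, &lk);
}
```

<p>Any number of processes can hold a shared lock at once, while an exclusive lock is granted only when no other process holds a lock, which gives the reader and writer rules described above. Locks taken with `fcntl' belong to the process, so they arbitrate between processes rather than between handles inside one process.</p>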
<p>Traditional DBM provides two modes of storing operations: `insert' and `replace'. When a key matches an existing record, the insert mode keeps the existing value, while the replace mode replaces it with the specified value. In addition to these two modes, QDBM provides a `concatenate' mode, in which the specified value is appended to the end of the existing value and stored. This feature is useful when adding an element to a value treated as an array. Moreover, although DBM can fetch a value only by reading the whole region of a record, QDBM can also fetch part of the region of a value. This feature is also useful when a value is treated as an array.</p>
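<p>The three storing modes and the partial fetch can be modeled on a single in-memory value, as in the C sketch below. This is a simplified illustration of the semantics only, not QDBM code: the type `Value', the enumeration `StoreMode', and the functions `store_value' and `fetch_part' are hypothetical names.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for the three storing modes described above. */
typedef enum { MODE_INSERT, MODE_REPLACE, MODE_CONCAT } StoreMode;

/* One in-memory record value; a zero-initialized Value is an absent record. */
typedef struct {
    char *buf;
    size_t size;
    int exists;
} Value;

/* Apply a store to one record slot. Returns 1 if the slot was written,
   0 if the existing value was kept (insert mode) or allocation failed. */
int store_value(Value *v, const char *data, size_t size, StoreMode mode) {
    char *nbuf;
    if (v->exists && mode == MODE_INSERT) return 0;
    if (v->exists && mode == MODE_CONCAT) {
        /* Append the new bytes after the existing ones. */
        nbuf = realloc(v->buf, v->size + size + 1);
        if (!nbuf) return 0;
        memcpy(nbuf + v->size, data, size);
        v->buf = nbuf;
        v->size += size;
        return 1;
    }
    /* Absent record, or replace mode: the new bytes become the value. */
    nbuf = malloc(size + 1);   /* +1 avoids a zero-size allocation */
    if (!nbuf) return 0;
    memcpy(nbuf, data, size);
    free(v->buf);
    v->buf = nbuf;
    v->size = size;
    v->exists = 1;
    return 1;
}

/* Copy up to max bytes of the value, starting at offset start, into out.
   Returns the number of bytes copied, or -1 if the record is absent or
   start lies beyond the end of the value. */
long fetch_part(const Value *v, size_t start, size_t max, char *out) {
    size_t n;
    if (!v->exists || start > v->size) return -1;
    n = v->size - start;
    if (n > max) n = max;
    memcpy(out, v->buf + start, n);
    return (long)n;
}
```

<p>Treating the value as an array of fixed-size elements, the concatenate mode appends one element without rewriting the others, and `fetch_part' reads element i by passing `start' as i times the element size.</p>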
<p>Generally speaking, as updates accumulate, available regions become fragmented and the size of a database grows rapidly. QDBM deals with this problem by coalescing and reusing dispensable regions, and by providing a database optimization feature. When a record is overwritten with a value larger than the existing one, the region must be moved to another position in the file. Because the time complexity of this operation depends on the size of the record's region, extending values successively is inefficient. QDBM deals with this problem by alignment: if the increment fits in the padding, the region does not need to be moved.</p>
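<p>The effect of alignment can be illustrated with a small C sketch. It assumes a simplified model in which the region reserved for a record is rounded up to a multiple of the alignment; the function names are hypothetical and the actual layout of a QDBM record is not shown here.</p>

```c
#include <stddef.h>

/* Size of the region reserved for a record of `size' bytes when regions are
   rounded up to a multiple of `align' (align must be greater than 0). */
size_t padded_size(size_t size, size_t align) {
    return (size + align - 1) / align * align;
}

/* Nonzero if a record that currently holds old_size bytes can grow to
   new_size bytes inside the region it already occupies. */
int fits_in_place(size_t old_size, size_t new_size, size_t align) {
    return new_size <= padded_size(old_size, align);
}
```

<p>For example, with an alignment of 32, a record of 100 bytes occupies a region of 128 bytes, so it can grow to 120 bytes in place, while growing to 129 bytes requires moving the region.</p>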
<p>On many file systems, it is impossible to handle a file larger than 2GB. To deal with this problem, QDBM provides a directory database containing multiple database files. With this feature, it is possible in theory to handle a database whose total size is up to 1TB. Moreover, because the database files can be placed on multiple disks, the speed of updating operations can be improved as with RAID-0 (striping). The database files can also be deployed on multiple file servers using NFS and so on.</p>
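<p>The text does not say how a record is assigned to one of the database files. One simple possibility, shown here purely as an assumption, is to hash the key and take the remainder by the number of files. The function name `division_for_key' and the hash are hypothetical.</p>

```c
#include <stddef.h>

/* Pick which of `dnum' database files holds a key by hashing the key bytes
   (dnum must be greater than 0). The FNV-1a style hash is illustrative; it
   only needs to be stable and to spread keys evenly over the files. */
size_t division_for_key(const char *kbuf, size_t ksiz, size_t dnum) {
    size_t h = 2166136261u;
    size_t i;
    for (i = 0; i < ksiz; i++) {
        h ^= (unsigned char)kbuf[i];
        h *= 16777619u;
    }
    return h % dnum;
}
```

<p>Because the same key always maps to the same file, a lookup touches only one file, and updates to different keys tend to be spread across the files, which is what allows the striping effect mentioned above.</p>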
<h3>Useful Implementation of B+ Tree Database</h3>
<p>Although a B+ tree database is slower than a hash database, it provides ordered access to records. The order can be assigned by users. Records of a B+ tree are sorted and arranged in logical pages. A sparse index organized as a B tree, that is, a multiway balanced tree, is maintained for the pages. Thus, the time complexity of retrieval and so on is `O(log n)'. A cursor is provided to access records in order. The cursor can jump to a position specified by a key and can step forward or backward from the current position. Because the pages are arranged as a doubly linked list, the time complexity of stepping the cursor is `O(1)'.</p>
<p>The B+ tree database is implemented on top of the hash database described above. Because each page of the B+ tree is stored as a record of the hash database, the B+ tree database inherits the storage management efficiency of the hash database. Because the header of each record is smaller and the alignment of each page is adjusted according to the page size, in most cases the database file is about half the size of that of a hash database. Although many pages must be operated on to update a B+ tree, QDBM expedites the process by caching pages and reducing file operations. In most cases, because the whole sparse index is cached in memory, a record can be retrieved by one path of file operations or fewer.</p>
<p>The B+ tree database features a transaction mechanism. It is possible to commit a series of operations between the beginning and the end of a transaction as a unit, or to abort the transaction and roll back to the state before it began. Even if the application process crashes during a transaction, the database file is not broken.</p>
<p>If QDBM was built with ZLIB, LZO, or BZIP2 enabled (each a lossless data-compression library), the content of each page of the B+ tree is compressed before being stored in the file. Because the records in a page have similar patterns, high compression efficiency is expected from the Lempel-Ziv algorithm and the like. When handling text data, the size of a database is reduced to about 25%. If the database is large and disk I/O is the bottleneck, enabling compression improves the processing speed to a large extent.</p>
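<p>The jump and step operations of a cursor can be modeled on a sorted array of keys, which stands in here for the ordered records of a B+ tree. The C sketch below is an illustration under that assumption, with hypothetical function names: the jump is a binary search, and a forward matching (prefix) search is a jump followed by forward steps.</p>

```c
#include <stddef.h>
#include <string.h>

/* Jump: index of the first key that is not less than `key' (binary search).
   `keys' must be sorted in strcmp order; returns n if there is no such key. */
size_t cursor_jump(const char *const *keys, size_t n, const char *key) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(keys[mid], key) < 0) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Forward matching search: jump to the prefix, then step forward while the
   keys still begin with it. Returns the number of matching keys and stores
   the index of the first candidate in *first. */
size_t prefix_search(const char *const *keys, size_t n,
                     const char *prefix, size_t *first) {
    size_t plen = strlen(prefix);
    size_t i = cursor_jump(keys, n, prefix);
    size_t count = 0;
    *first = i;
    while (i < n && strncmp(keys[i], prefix, plen) == 0) {
        count++;
        i++;   /* each step is a constant-time move to the next key */
    }
    return count;
}
```

<p>A range search works the same way: jump to the lower bound and step forward until a key exceeds the upper bound.</p>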
<h3>Simple but Various Interfaces</h3>
<p>QDBM provides very simple APIs. You can perform database I/O in the same manner as ordinary file I/O with the `FILE' pointer defined in ANSI C. In the basic API, a database is stored as one file. In the extended API, a database is stored as several files in one directory. Because the two APIs are very similar to each other, porting an application from one to the other is easy.</p>
<p>APIs that are compatible with NDBM and GDBM are also provided. As there are many applications using NDBM or GDBM, it is easy to port them to QDBM. In most cases, porting requires only replacing the header inclusion (#include) and recompiling. However, QDBM cannot handle database files made by the original NDBM or GDBM.</p>
<p>In order to handle records in memory easily, the utility API is provided. It implements memory allocating functions, sorting functions, extensible datum, array list, hash map, and so on. Using them, you can handle records in C as easily as in scripting languages such as Perl or Ruby.</p>
<p>B+ tree database is used with the advanced API. The advanced API is implemented using the basic API and the utility API. Because the advanced API is also similar to the basic API and the extended API, it is easy to learn how to use it.</p>
<p>In order to handle an inverted index, which is used by full-text search systems, the inverted API is provided. When an inverted index of documents is easy to handle, an application can focus on text processing and natural language processing. Because this API does not depend on character codes or languages, it is possible to implement a full-text search system that can respond to various requests from users.</p>
<p>Along with the APIs for C, QDBM provides APIs for C++, Java, Perl, and Ruby. The APIs for C are of seven kinds: the basic API, the extended API, the NDBM-compatible API, the GDBM-compatible API, the utility API, the advanced API, and the inverted API. Command line interfaces corresponding to each API are also provided. They are useful for prototyping, testing, debugging, and so on. The C++ API encapsulates the database handling functions of the basic API, the extended API, and the advanced API with the class mechanism of C++. The Java API has native methods calling the basic API, the extended API, and the advanced API through the Java Native Interface. The Perl API has methods calling the basic API, the extended API, and the advanced API through the XS language. The Ruby API has methods calling the basic API, the extended API, and the advanced API as modules of Ruby. Moreover, CGI scripts for administration of databases, file uploading, and full-text search are provided.</p>
<h3>Wide Portability</h3>
<p>QDBM is implemented in the syntax of ANSI C (C89) and uses only APIs defined in ANSI C or POSIX. Thus, QDBM works on most UNIX and compatible OSs. For the C API, operation has been checked on at least the following platforms.</p>
<ul>
<li>Linux (2.2, 2.4, 2.6) (IA32, IA64, AMD64, PA-RISC, Alpha, PowerPC, M68000, ARM)</li>
<li>FreeBSD (4.9, 5.0, 5.1, 5.2, 5.3) (IA32, IA64, SPARC, Alpha)</li>
<li>NetBSD (1.6) (IA32)</li>
<li>OpenBSD (3.4) (IA32)</li>
<li>SunOS (5.6, 5.7, 5.8, 5.9, 5.10) (IA32, SPARC)</li>
<li>HP-UX (11.11, 11.23) (IA64, PA-RISC)</li>
<li>AIX (5.2) (POWER)</li>
<li>Windows (2000, XP) (IA32, IA64, AMD64) (Cygwin, MinGW, Visual C++)</li>
<li>Mac OS X (10.2, 10.3, 10.4) (IA32, PowerPC)</li>
<li>Tru64 (5.1) (Alpha)</li>
<li>RISC OS (5.03) (ARM)</li>
</ul>
<p>Although a database file created by QDBM depends on the byte order of the processor, utilities are provided to dump data in a format that is independent of byte order.</p>
<hr />
<h2><a name="installation" id="installation" class="head">Installation</a></h2>
<h3>Preparation</h3>
<p>To install QDBM from a source package, GCC 2.8 or later and `make' are required.</p>
<p>After extracting the QDBM archive file, change the current working directory to the generated directory and perform the installation.</p>
<h3>Usual Steps</h3>
<p>Follow the procedures below on Linux, BSD, or SunOS.</p>
<p>Run the configuration script.</p>
<pre>./configure
</pre>
<p>Build programs.</p>
<pre>make
</pre>
<p>Perform self-diagnostic test.</p>
<pre>make check
</pre>
<p>Install programs. This operation must be carried out by the root user.</p>
<pre>make install
</pre>
<h3>Using GNU Libtool</h3>
<p>If the above steps do not work, try the following steps. This method requires GNU Libtool 1.5 or later.</p>
<p>Run the configuration script.</p>
<pre>./configure
</pre>
<p>Build programs.</p>
<pre>make -f LTmakefile
</pre>
<p>Perform self-diagnostic test.</p>
<pre>make -f LTmakefile check
</pre>
<p>Install programs. This operation must be carried out by the root user.</p>
<pre>make -f LTmakefile install
</pre>
<h3>Result</h3>
<p>When the installation finishes, the following files will have been installed. In addition, manuals will be installed under `/usr/local/man/man1' and `/usr/local/man/man3', and other documents under `/usr/local/share/qdbm'. A configuration file for `pkg-config' will be installed under `/usr/local/lib/pkgconfig'.</p>
<pre>/usr/local/include/depot.h
/usr/local/include/curia.h
/usr/local/include/relic.h
/usr/local/include/hovel.h
/usr/local/include/cabin.h
/usr/local/include/villa.h
/usr/local/include/vista.h
/usr/local/include/odeum.h
/usr/local/lib/libqdbm.a
/usr/local/lib/libqdbm.so.14.13.0
/usr/local/lib/libqdbm.so.14
/usr/local/lib/libqdbm.so
/usr/local/bin/dpmgr
/usr/local/bin/dptest
/usr/local/bin/dptsv
/usr/local/bin/crmgr
/usr/local/bin/crtest
/usr/local/bin/crtsv
/usr/local/bin/rlmgr
/usr/local/bin/rltest
/usr/local/bin/hvmgr
/usr/local/bin/hvtest
/usr/local/bin/cbtest
/usr/local/bin/cbcodec
/usr/local/bin/vlmgr
/usr/local/bin/vltest
/usr/local/bin/vltsv
/usr/local/bin/odmgr
/usr/local/bin/odtest
/usr/local/bin/odidx
/usr/local/bin/qmttest
</pre>
<p>When you run a program linked dynamically to `libqdbm.so', the library search path should include `/usr/local/lib'. You can set the library search path with the environment variable `LD_LIBRARY_PATH'.</p>
<p>To uninstall QDBM, execute the following command after `./configure'. This operation must be carried out by the root user.</p>
<pre>make uninstall
</pre>
<p>If an old version of QDBM is installed on your system, uninstall it before installation of a new one.</p>
<p>APIs other than the C API, and the CGI scripts, are not installed by default. Refer to `plus/xspex.html' to know how to install the C++ API. Refer to `java/jspex.html' to know how to install the Java API. Refer to `perl/plspex.html' to know how to install the Perl API. Refer to `ruby/rbspex.html' to know how to install the Ruby API. Refer to `cgi/cgispex.html' to know how to install the CGI scripts.</p>
<p>To install QDBM from a binary package such as RPM, refer to the manual of the package manager. For example, if you use RPM, execute a command like the following as the root user.</p>
<pre>rpm -ivh qdbm-1.x.x-x.i386.rpm
</pre>
<h3>For Windows</h3>
<p>On Windows (Cygwin), you should follow the procedures below for installation.</p>
<p>Run the configuration script.</p>
<pre>./configure
</pre>
<p>Build programs.</p>
<pre>make win
</pre>
<p>Perform self-diagnostic test.</p>
<pre>make check-win
</pre>
<p>Install programs. Likewise, run `make uninstall-win' to uninstall them.</p>
<pre>make install-win
</pre>
<p>On Windows, the import library `libqdbm.dll.a' is created as well as the static library `libqdbm.a', and the dynamic linking library `qdbm.dll' is created instead of such shared libraries as `libqdbm.so'. `qdbm.dll' is installed into `/usr/local/bin'.</p>
<p>In order to build QDBM using MinGW on Cygwin, you should perform `make mingw' instead of `make win'. With the UNIX emulation layer of Cygwin, generated programs depend on `cygwin1.dll' and therefore come under the GNU GPL. This problem is solved by linking them to the Win32 native DLL with MinGW.</p>
<p>In order to build QDBM using Visual C++, you should edit `VCmakefile' and set the search paths for libraries and headers. Then run `nmake /f VCmakefile'. Applications linking to `qdbm.dll' should link to `msvcrt.dll' by the `/MD' or `/MDd' option of the compiler. Refer to `VCmakefile' for detailed configuration.</p>
<h3>For Mac OS X</h3>
<p>On Mac OS X (Darwin), you should follow the procedures below for installation.</p>
<p>Run the configuration script.</p>
<pre>./configure
</pre>
<p>Build programs.</p>
<pre>make mac
</pre>
<p>Perform self-diagnostic test.</p>
<pre>make check-mac
</pre>
<p>Install programs. Likewise, run `make uninstall-mac' to uninstall them.</p>
<pre>make install-mac
</pre>
<p>On Mac OS X, `libqdbm.dylib' and so on are created instead of `libqdbm.so' and so on. You can set the library search path with the environment variable `DYLD_LIBRARY_PATH'.</p>
<h3>For HP-UX</h3>
<p>On HP-UX, you should follow the procedures below for installation.</p>
<p>Run the configuration script.</p>
<pre>./configure
</pre>
<p>Build programs.</p>
<pre>make hpux
</pre>
<p>Perform self-diagnostic test.</p>
<pre>make check-hpux
</pre>
<p>Install programs. Likewise, run `make uninstall-hpux' to uninstall them.</p>
<pre>make install-hpux
</pre>
<p>On HP-UX, `libqdbm.sl' is created instead of `libqdbm.so' and so on. You can set the library search path with the environment variable `SHLIB_PATH'.</p>
<h3>For RISC OS</h3>
<p>On RISC OS, you should follow the procedures below for installation.</p>
<p>Build programs. As `cc' is used for compilation by default, if you want to use `gcc', add the argument `CC=gcc'.</p>
<pre>make -f RISCmakefile
</pre>
<p>When the build finishes, the library file `libqdbm' and commands such as `dpmgr' are generated. Because no installation procedure is defined, copy them manually to install them. Likewise, header files such as `depot.h' should be installed manually.</p>
<h3>Detailed Configuration</h3>
<p>You can configure building processes by the following optional arguments of `./configure'.</p>
<ul class="lines">
<li><kbd>--enable-debug</kbd> : build for debugging. Enable debugging symbols, do not perform optimization, and perform static linking.</li>
<li><kbd>--enable-devel</kbd> : build for development. Enable debugging symbols, perform optimization, and perform dynamic linking.</li>
<li><kbd>--enable-stable</kbd> : build for stable release. Perform conservative optimization, and perform dynamic linking.</li>
<li><kbd>--enable-pthread</kbd> : enable POSIX thread support and treat global variables as thread-specific data.</li>
<li><kbd>--disable-lock</kbd> : build for environments without file locking.</li>
<li><kbd>--disable-mmap</kbd> : build for environments without memory mapping.</li>
<li><kbd>--enable-zlib</kbd> : enable ZLIB compression for B+ tree and inverted index.</li>
<li><kbd>--enable-lzo</kbd> : enable LZO compression for B+ tree and inverted index.</li>
<li><kbd>--enable-bzip</kbd> : enable BZIP2 compression for B+ tree and inverted index.</li>
<li><kbd>--enable-iconv</kbd> : enable ICONV utilities for conversion of character encodings.</li>
</ul>
<p>Usually, QDBM and its applications can be built without any dependency on non-standard libraries except for `libqdbm.*'. However, they depend on `libpthread.*' if POSIX thread is enabled, and they depend on `libz.*' if ZLIB is enabled, and they depend on `liblzo2.*' if LZO is enabled, and they depend on `libbz2.*' if BZIP2 is enabled, and they depend on `libiconv.*' if ICONV is enabled.</p>
<p>Because the license of LZO is the GNU GPL, note that applications linking to `liblzo2.*' must comply with the terms of the GNU GPL.</p>
<hr />
<h2><a name="depotapi" id="depotapi" class="head">Depot: Basic API</a></h2>
<h3>Overview</h3>
<p>Depot is the basic API of QDBM. Almost all features for managing a database provided by QDBM are implemented by Depot. The other APIs are no more than wrappers of Depot. Depot is the fastest of all the APIs of QDBM.</p>
<p>In order to use Depot, you should include `depot.h' and `stdlib.h' in the source files. Usually, the following description will be near the beginning of a source file.</p>
<dl>
<dt><kbd>#include &lt;depot.h&gt;</kbd></dt>
<dt><kbd>#include &lt;stdlib.h&gt;</kbd></dt>
</dl>
<p>A pointer to `DEPOT' is used as a database handle, much as some file I/O routines of `stdio.h' use a pointer to `FILE'. A database handle is opened with the function `dpopen' and closed with `dpclose'. You should not refer directly to any member of the handle. If a fatal error occurs in a database, any access method via the handle except `dpclose' will not work and will return an error status. Although a process is allowed to use multiple database handles at the same time, multiple handles of the same database file should not be used at the same time.</p>
<h3>API</h3>
<p>The external variable `dpversion' is the string containing the version information.</p>
<dl>
<dt><kbd>extern const char *dpversion;</kbd></dt>
</dl>
<p>The external variable `dpecode' is assigned the code of the most recent error. Refer to `depot.h' for details of the error codes.</p>
<dl>
<dt><kbd>extern int dpecode;</kbd></dt>
<dd>The initial value of this variable is `DP_ENOERR'. The other values are `DP_EFATAL', `DP_EMODE', `DP_EBROKEN', `DP_EKEEP', `DP_ENOITEM', `DP_EALLOC', `DP_EMAP', `DP_EOPEN', `DP_ECLOSE', `DP_ETRUNC', `DP_ESYNC', `DP_ESTAT', `DP_ESEEK', `DP_EREAD', `DP_EWRITE', `DP_ELOCK', `DP_EUNLINK', `DP_EMKDIR', `DP_ERMDIR', and `DP_EMISC'.</dd>
</dl>
<p>The function `dperrmsg' is used in order to get a message string corresponding to an error code.</p>
<dl>
<dt><kbd>const char *dperrmsg(int <var>ecode</var>);</kbd></dt>
<dd>`ecode' specifies an error code. The return value is the message string of the error code. The region of the return value is not writable.</dd>
</dl>
<p>The function `dpopen' is used in order to get a database handle.</p>
<dl>
<dt><kbd>DEPOT *dpopen(const char *<var>name</var>, int <var>omode</var>, int <var>bnum</var>);</kbd></dt>
<dd>`name' specifies the name of a database file. `omode' specifies the connection mode: `DP_OWRITER' as a writer, `DP_OREADER' as a reader. If the mode is `DP_OWRITER', the following may be added by bitwise or: `DP_OCREAT', which creates a new database if none exists, and `DP_OTRUNC', which creates a new database even if one exists. Either `DP_OREADER' or `DP_OWRITER' may be combined by bitwise or with `DP_ONOLCK', which opens the database file without file locking, or `DP_OLCKNB', which performs locking without blocking. `DP_OCREAT' may be combined by bitwise or with `DP_OSPARSE', which creates the database file as a sparse file. `bnum' specifies the number of elements of the bucket array. If it is not more than 0, the default value is used. The size of the bucket array is determined when the database is created and cannot be changed except by optimization of the database. The suggested size of the bucket array is about 0.5 to 4 times the number of records to be stored. The return value is the database handle, or `NULL' if the call is not successful. While connected as a writer, the handle holds an exclusive lock on the database file. While connected as a reader, it holds a shared lock on the database file. The thread blocks until the lock is acquired. If `DP_ONOLCK' is used, the application is responsible for exclusion control.</dd>
</dl>
<p>The function `dpclose' is used in order to close a database handle.</p>
<dl>
<dt><kbd>int dpclose(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. If successful, the return value is true, else, it is false. Because the region of a closed handle is released, the handle can no longer be used. Updates to a database are assured to be written when the handle is closed. If a writer opens a database but does not close it properly, the database will be broken.</dd>
</dl>
<p>The function `dpput' is used in order to store a record.</p>
<dl>
<dt><kbd>int dpput(DEPOT *<var>depot</var>, const char *<var>kbuf</var>, int <var>ksiz</var>, const char *<var>vbuf</var>, int <var>vsiz</var>, int <var>dmode</var>);</kbd></dt>
<dd>`depot' specifies a database handle connected as a writer. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `vbuf' specifies the pointer to the region of a value. `vsiz' specifies the size of the region of the value. If it is negative, the size is assigned with `strlen(vbuf)'. `dmode' specifies behavior when the key overlaps, by the following values: `DP_DOVER', which means the specified value overwrites the existing one, `DP_DKEEP', which means the existing value is kept, `DP_DCAT', which means the specified value is concatenated at the end of the existing value. If successful, the return value is true, else, it is false.</dd>
</dl>
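<p>The three storing modes can be pictured with a small model that works on plain C strings instead of a database. This is only a sketch: `merge_value' and the `MODE_*' constants are hypothetical names standing in for `DP_DOVER', `DP_DKEEP' and `DP_DCAT', and the model shows just the value a record would hold afterwards, not the return value of `dpput'.</p>

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical stand-ins for DP_DOVER, DP_DKEEP and DP_DCAT */
enum { MODE_OVER, MODE_KEEP, MODE_CAT };

/* Model of the value held by a record after storing `incoming' under a key
   whose current value is `existing'. The result is allocated with `malloc'
   and should be released with `free'. Returns NULL on allocation failure
   or when the mode is unknown. */
char *merge_value(const char *existing, const char *incoming, int mode){
  size_t elen = strlen(existing);
  size_t ilen = strlen(incoming);
  char *out;
  switch(mode){
  case MODE_OVER:                       /* the new value replaces the old one */
    if(!(out = malloc(ilen + 1))) return NULL;
    memcpy(out, incoming, ilen + 1);
    return out;
  case MODE_KEEP:                       /* the old value stays as it is */
    if(!(out = malloc(elen + 1))) return NULL;
    memcpy(out, existing, elen + 1);
    return out;
  case MODE_CAT:                        /* the new value is appended to the old one */
    if(!(out = malloc(elen + ilen + 1))) return NULL;
    memcpy(out, existing, elen);
    memcpy(out + elen, incoming, ilen + 1);
    return out;
  default:
    return NULL;
  }
}
```

<p>For instance, with an existing value `abc' and a new value `de', the model yields `de' for overwriting, `abc' for keeping, and `abcde' for concatenation.</p>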
<p>The function `dpout' is used in order to delete a record.</p>
<dl>
<dt><kbd>int dpout(DEPOT *<var>depot</var>, const char *<var>kbuf</var>, int <var>ksiz</var>);</kbd></dt>
<dd>`depot' specifies a database handle connected as a writer. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. If successful, the return value is true, else, it is false. False is returned when no record corresponds to the specified key.</dd>
</dl>
<p>The function `dpget' is used in order to retrieve a record.</p>
<dl>
<dt><kbd>char *dpget(DEPOT *<var>depot</var>, const char *<var>kbuf</var>, int <var>ksiz</var>, int <var>start</var>, int <var>max</var>, int *<var>sp</var>);</kbd></dt>
<dd>`depot' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `start' specifies the offset address of the beginning of the region of the value to be read. `max' specifies the max size to be read. If it is negative, the size to read is unlimited. `sp' specifies the pointer to a variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the value of the corresponding record, else, it is `NULL'. `NULL' is returned when no record corresponds to the specified key or the size of the value of the corresponding record is less than `start'. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd>
</dl>
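<p>As a rough illustration of how `start' and `max' select a part of a value, the following sketch models the size rule described above without calling Depot. The function name `window_size' is hypothetical and not part of the API. It only mirrors the documented behavior that retrieval fails when the value is shorter than `start' and that a negative `max' means no limit. Rejecting a negative `start' is an assumption of this sketch.</p>

```c
/* Hypothetical helper, not part of Depot: model how many bytes a read
   with `start' and `max' would cover for a value of `vsiz' bytes.
   Returns -1 where the description above says the retrieval fails.
   Assumption: a negative `start' is treated as a failure as well. */
int window_size(int vsiz, int start, int max){
  int rest;
  if(vsiz < 0 || start < 0 || start > vsiz) return -1;
  rest = vsiz - start;                  /* bytes available from the offset */
  if(max < 0 || max > rest) return rest;
  return max;
}
```

<p>With a value of 13 bytes, `start' of 10 and `max' of 5, only the last 3 bytes are covered, while `start' of 14 fails.</p>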
<p>The function `dpgetwb' is used in order to retrieve a record and write the value into a buffer.</p>
<dl>
<dt><kbd>int dpgetwb(DEPOT *<var>depot</var>, const char *<var>kbuf</var>, int <var>ksiz</var>, int <var>start</var>, int <var>max</var>, char *<var>vbuf</var>);</kbd></dt>
<dd>`depot' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `start' specifies the offset address of the beginning of the region of the value to be read. `max' specifies the max size to be read. It should be equal to or less than the size of the writing buffer. `vbuf' specifies the pointer to a buffer into which the value of the corresponding record is written. If successful, the return value is the size of the written data, else, it is -1. -1 is returned when no record corresponds to the specified key or the size of the value of the corresponding record is less than `start'. Note that no additional zero code is appended at the end of the region of the writing buffer.</dd>
</dl>
<p>The function `dpvsiz' is used in order to get the size of the value of a record.</p>
<dl>
<dt><kbd>int dpvsiz(DEPOT *<var>depot</var>, const char *<var>kbuf</var>, int <var>ksiz</var>);</kbd></dt>
<dd>`depot' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. If successful, the return value is the size of the value of the corresponding record, else, it is -1. Because this function does not read the entity of a record, it is faster than `dpget'.</dd>
</dl>
<p>The function `dpiterinit' is used in order to initialize the iterator of a database handle.</p>
<dl>
<dt><kbd>int dpiterinit(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. If successful, the return value is true, else, it is false. The iterator is used in order to access the key of every record stored in a database.</dd>
</dl>
<p>The function `dpiternext' is used in order to get the next key of the iterator.</p>
<dl>
<dt><kbd>char *dpiternext(DEPOT *<var>depot</var>, int *<var>sp</var>);</kbd></dt>
<dd>`depot' specifies a database handle. `sp' specifies the pointer to a variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the next key, else, it is `NULL'. `NULL' is returned when no record is left to be fetched from the iterator. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use. It is possible to access every record by calling this function repeatedly. However, this is not assured if the database is updated during the iteration. Besides, the order of this traversal is arbitrary, so it is not assured that the order of traversal matches the order of storing.</dd>
</dl>
<p>The function `dpsetalign' is used in order to set alignment of a database handle.</p>
<dl>
<dt><kbd>int dpsetalign(DEPOT *<var>depot</var>, int <var>align</var>);</kbd></dt>
<dd>`depot' specifies a database handle connected as a writer. `align' specifies the size of alignment. If successful, the return value is true, else, it is false. If alignment is set for a database, the efficiency of overwriting values is improved. The suggested size of alignment is the average size of the values of the records to be stored. If alignment is positive, padding whose size is a multiple of the alignment is placed. If alignment is negative, where `vsiz' is the size of a value, the size of padding is calculated with `(vsiz / pow(2, abs(align) - 1))'. Because the alignment setting is not saved in a database, you should specify the alignment every time you open a database.</dd>
</dl>
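<p>The padding formula for a negative alignment can be worked through with a small helper. This is a sketch under stated assumptions, not code from Depot: the name `negative_align_padding' is hypothetical, and the quotient of `(vsiz / pow(2, abs(align) - 1))' is assumed to be truncated to an integer.</p>

```c
/* Hypothetical helper, not part of Depot: padding size for a negative
   alignment, following `(vsiz / pow(2, abs(align) - 1))'.
   Assumption: the quotient is truncated to an integer.
   Returns -1 for arguments outside the case modeled here. */
int negative_align_padding(int vsiz, int align){
  int shift;
  if(vsiz < 0 || align >= 0) return -1; /* only the negative case is modeled */
  if(align < -31) return 0;             /* divisor exceeds any 32-bit value */
  shift = -align - 1;                   /* abs(align) - 1, between 0 and 30 */
  return vsiz >> shift;                 /* same as vsiz / 2^shift for vsiz >= 0 */
}
```

<p>Under these assumptions, a value of 1024 bytes gets 1024 bytes of padding with `align' of -1, 512 bytes with -2, and 256 bytes with -3, so a more negative alignment reserves proportionally less room for growth.</p>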
<p>The function `dpsetfbpsiz' is used in order to set the size of the free block pool of a database handle.</p>
<dl>
<dt><kbd>int dpsetfbpsiz(DEPOT *<var>depot</var>, int <var>size</var>);</kbd></dt>
<dd>`depot' specifies a database handle connected as a writer. `size' specifies the size of the free block pool of a database. If successful, the return value is true, else, it is false. The default size of the free block pool is 16. If the size is greater, the space efficiency of overwriting values is improved at the cost of time efficiency.</dd>
</dl>
<p>The function `dpsync' is used in order to synchronize updating contents with the file and the device.</p>
<dl>
<dt><kbd>int dpsync(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle connected as a writer. If successful, the return value is true, else, it is false. This function is useful when another process uses the connected database file.</dd>
</dl>
<p>The function `dpoptimize' is used in order to optimize a database.</p>
<dl>
<dt><kbd>int dpoptimize(DEPOT *<var>depot</var>, int <var>bnum</var>);</kbd></dt>
<dd>`depot' specifies a database handle connected as a writer. `bnum' specifies the number of the elements of the bucket array. If it is not more than 0, the default value is specified. If successful, the return value is true, else, it is false. In an alternating succession of deleting and storing with overwrite or concatenate, dispensable regions accumulate. This function is useful to do away with them.</dd>
</dl>
<p>The function `dpname' is used in order to get the name of a database.</p>
<dl>
<dt><kbd>char *dpname(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. If successful, the return value is the pointer to the region of the name of the database, else, it is `NULL'. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.</dd>
</dl>
<p>The function `dpfsiz' is used in order to get the size of a database file.</p>
<dl>
<dt><kbd>int dpfsiz(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. If successful, the return value is the size of the database file, else, it is -1.</dd>
</dl>
<p>The function `dpbnum' is used in order to get the number of the elements of the bucket array.</p>
<dl>
<dt><kbd>int dpbnum(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. If successful, the return value is the number of the elements of the bucket array, else, it is -1.</dd>
</dl>
<p>The function `dpbusenum' is used in order to get the number of the used elements of the bucket array.</p>
<dl>
<dt><kbd>int dpbusenum(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. If successful, the return value is the number of the used elements of the bucket array, else, it is -1. This function is inefficient because it accesses all elements of the bucket array.</dd>
</dl>
<p>The function `dprnum' is used in order to get the number of the records stored in a database.</p>
<dl>
<dt><kbd>int dprnum(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. If successful, the return value is the number of the records stored in the database, else, it is -1.</dd>
</dl>
<p>The function `dpwritable' is used in order to check whether a database handle is a writer or not.</p>
<dl>
<dt><kbd>int dpwritable(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. The return value is true if the handle is a writer, false if not.</dd>
</dl>
<p>The function `dpfatalerror' is used in order to check whether a database has a fatal error or not.</p>
<dl>
<dt><kbd>int dpfatalerror(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. The return value is true if the database has a fatal error, false if not.</dd>
</dl>
<p>The function `dpinode' is used in order to get the inode number of a database file.</p>
<dl>
<dt><kbd>int dpinode(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. The return value is the inode number of the database file.</dd>
</dl>
<p>The function `dpmtime' is used in order to get the last modified time of a database.</p>
<dl>
<dt><kbd>time_t dpmtime(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. The return value is the last modified time of the database.</dd>
</dl>
<p>The function `dpfdesc' is used in order to get the file descriptor of a database file.</p>
<dl>
<dt><kbd>int dpfdesc(DEPOT *<var>depot</var>);</kbd></dt>
<dd>`depot' specifies a database handle. The return value is the file descriptor of the database file. Handling the file descriptor of a database file directly is not suggested.</dd>
</dl>
<p>The function `dpremove' is used in order to remove a database file.</p>
<dl>
<dt><kbd>int dpremove(const char *<var>name</var>);</kbd></dt>
<dd>`name' specifies the name of a database file. If successful, the return value is true, else, it is false.</dd>
</dl>
<p>The function `dprepair' is used in order to repair a broken database file.</p>
<dl>
<dt><kbd>int dprepair(const char *<var>name</var>);</kbd></dt>
<dd>`name' specifies the name of a database file. If successful, the return value is true, else, it is false. There is no guarantee that all records in a repaired database file correspond to the original or expected state.</dd>
</dl>
<p>The function `dpexportdb' is used in order to dump all records as endian independent data.</p>
<dl>
<dt><kbd>int dpexportdb(DEPOT *<var>depot</var>, const char *<var>name</var>);</kbd></dt>
<dd>`depot' specifies a database handle. `name' specifies the name of an output file. If successful, the return value is true, else, it is false.</dd>
</dl>
<p>The function `dpimportdb' is used in order to load all records from endian independent data.</p>
<dl>
<dt><kbd>int dpimportdb(DEPOT *<var>depot</var>, const char *<var>name</var>);</kbd></dt>
<dd>`depot' specifies a database handle connected as a writer. The database of the handle must be empty. `name' specifies the name of an input file. If successful, the return value is true, else, it is false.</dd>
</dl>
<p>The function `dpsnaffle' is used in order to retrieve a record directly from a database file.</p>
<dl>
<dt><kbd>char *dpsnaffle(const char *<var>name</var>, const char *<var>kbuf</var>, int <var>ksiz</var>, int *<var>sp</var>);</kbd></dt>
<dd>`name' specifies the name of a database file. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `sp' specifies the pointer to a variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the value of the corresponding record, else, it is `NULL'. `NULL' is returned when no record corresponds to the specified key. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use. Although this function can be used even while the database file is locked by another process, it is not assured that recent updates are reflected.</dd>
</dl>
<p>The function `dpinnerhash' is a hash function used inside Depot.</p>
<dl>
<dt><kbd>int dpinnerhash(const char *<var>kbuf</var>, int <var>ksiz</var>);</kbd></dt>
<dd>`kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. The return value is the hash value of 31 bits length computed from the key. This function is useful when an application calculates the state of the inside bucket array.</dd>
</dl>
<p>The function `dpouterhash' is a hash function which is independent from the hash functions used inside Depot.</p>
<dl>
<dt><kbd>int dpouterhash(const char *<var>kbuf</var>, int <var>ksiz</var>);</kbd></dt>
<dd>`kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. The return value is the hash value of 31 bits length computed from the key. This function is useful when an application uses its own hash algorithm outside Depot.</dd>
</dl>
<p>The function `dpprimenum' is used in order to get a natural prime number not less than a number.</p>
<dl>
<dt><kbd>int dpprimenum(int <var>num</var>);</kbd></dt>
<dd>`num' specifies a natural number. The return value is a natural prime number not less than the specified number. This function is useful when an application determines the size of a bucket array of its own hash algorithm.</dd>
</dl>
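<p>To make the contract of `dpprimenum' concrete, the following sketch finds the smallest prime not less than a number by trial division. It is a hypothetical re-implementation for illustration only. The names `is_prime' and `next_prime' are not part of Depot, and the real function may well use a different method.</p>

```c
#include <limits.h>

/* Hypothetical helper: trial division up to the square root of `n'. */
static int is_prime(int n){
  int i;
  if(n < 2) return 0;
  for(i = 2; i <= n / i; i++){          /* i <= n / i avoids overflow of i * i */
    if(n % i == 0) return 0;
  }
  return 1;
}

/* Hypothetical counterpart of `dpprimenum': the smallest prime not less
   than `num', or -1 if no such prime fits in an int. */
int next_prime(int num){
  int n = num < 2 ? 2 : num;
  for(;;){
    if(is_prime(n)) return n;
    if(n == INT_MAX) return -1;
    n++;
  }
}
```

<p>An application that expects about 100 records and wants a prime bucket count for its own hash table would get 101 from this sketch.</p>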
<h3>Examples</h3>
<p>The following example stores and retrieves a phone number, using the name as the key.</p>
<pre>#include &lt;depot.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;stdio.h&gt;
#define NAME "mikio"
#define NUMBER "000-1234-5678"
#define DBNAME "book"
int main(int argc, char **argv){
  DEPOT *depot;
  char *val;
  /* open the database */
  if(!(depot = dpopen(DBNAME, DP_OWRITER | DP_OCREAT, -1))){
    fprintf(stderr, "dpopen: %s\n", dperrmsg(dpecode));
    return 1;
  }
  /* store the record */
  if(!dpput(depot, NAME, -1, NUMBER, -1, DP_DOVER)){
    fprintf(stderr, "dpput: %s\n", dperrmsg(dpecode));
  }
  /* retrieve the record */
  if(!(val = dpget(depot, NAME, -1, 0, -1, NULL))){
    fprintf(stderr, "dpget: %s\n", dperrmsg(dpecode));
  } else {
    printf("Name: %s\n", NAME);
    printf("Number: %s\n", val);
    free(val);
  }
  /* close the database */
  if(!dpclose(depot)){
    fprintf(stderr, "dpclose: %s\n", dperrmsg(dpecode));
    return 1;
  }
  return 0;
}
</pre>
<p>The following example shows all records of the database.</p>
<pre>#include &lt;depot.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;stdio.h&gt;
#define DBNAME "book"
int main(int argc, char **argv){
  DEPOT *depot;
  char *key, *val;
  /* open the database */
  if(!(depot = dpopen(DBNAME, DP_OREADER, -1))){
    fprintf(stderr, "dpopen: %s\n", dperrmsg(dpecode));
    return 1;
  }
  /* initialize the iterator */
  if(!dpiterinit(depot)){
    fprintf(stderr, "dpiterinit: %s\n", dperrmsg(dpecode));
  }
  /* scan with the iterator */
  while((key = dpiternext(depot, NULL)) != NULL){
    if(!(val = dpget(depot, key, -1, 0, -1, NULL))){
      fprintf(stderr, "dpget: %s\n", dperrmsg(dpecode));
      free(key);
      break;
    }
    printf("%s: %s\n", key, val);
    free(val);
    free(key);
  }
  /* close the database */
  if(!dpclose(depot)){
    fprintf(stderr, "dpclose: %s\n", dperrmsg(dpecode));
    return 1;
  }
  return 0;
}
</pre>
<h3>Notes</h3>
<p>For building a program using Depot, the program should be linked with a library file `libqdbm.a' or `libqdbm.so'. For example, the following command is executed to build `sample' from `sample.c'.</p>
<pre>gcc -I/usr/local/include -o sample sample.c -L/usr/local/lib -lqdbm
</pre>
<p>If QDBM was built with POSIX thread enabled, the global variable `dpecode' is treated as thread specific data, and functions of Depot are reentrant. In that case, they are thread-safe as long as a handle is not accessed by threads at the same time, on the assumption that `errno', `malloc', and so on are thread-safe.</p>
<hr />
<h2><a name="depotcli" id="depotcli" class="head">Commands for Depot</a></h2>
<p>Depot has the following command line interfaces.</p>
<p>The command `dpmgr' is a utility for debugging Depot and its applications. It features editing and checking of a database. It can be used for database applications with shell scripts. This command is used in the following format. `name' specifies a database name. `key' specifies the key of a record. `val' specifies the value of a record.</p>
<dl>
<dt><kbd>dpmgr create [-s] [-bnum <var>num</var>] <var>name</var></kbd></dt>
<dd>Create a database file.</dd>
<dt><kbd>dpmgr put [-kx|-ki] [-vx|-vi|-vf] [-keep|-cat] [-na] <var>name</var> <var>key</var> <var>val</var></kbd></dt>
<dd>Store a record with a key and a value.</dd>
<dt><kbd>dpmgr out [-kx|-ki] <var>name</var> <var>key</var></kbd></dt>
<dd>Delete a record with a key.</dd>
<dt><kbd>dpmgr get [-nl] [-kx|-ki] [-start <var>num</var>] [-max <var>num</var>] [-ox] [-n] <var>name</var> <var>key</var></kbd></dt>
<dd>Retrieve a record with a key and output it to the standard output.</dd>
<dt><kbd>dpmgr list [-nl] [-k|-v] [-ox] <var>name</var></kbd></dt>
<dd>List all keys and values delimited with tab and line-feed to the standard output.</dd>
<dt><kbd>dpmgr optimize [-bnum <var>num</var>] [-na] <var>name</var></kbd></dt>
<dd>Optimize a database.</dd>
<dt><kbd>dpmgr inform [-nl] <var>name</var></kbd></dt>
<dd>Output miscellaneous information to the standard output.</dd>
<dt><kbd>dpmgr remove <var>name</var></kbd></dt>
<dd>Remove a database file.</dd>
<dt><kbd>dpmgr repair <var>name</var></kbd></dt>
<dd>Repair a broken database file.</dd>
<dt><kbd>dpmgr exportdb <var>name</var> <var>file</var></kbd></dt>
<dd>Dump all records as endian independent data.</dd>
<dt><kbd>dpmgr importdb [-bnum <var>num</var>] <var>name</var> <var>file</var></kbd></dt>
<dd>Load all records from endian independent data.</dd>
<dt><kbd>dpmgr snaffle [-kx|-ki] [-ox] [-n] <var>name</var> <var>key</var></kbd></dt>
<dd>Retrieve a record from a locked database with a key and output it to the standard output.</dd>
<dt><kbd>dpmgr version</kbd></dt>
<dd>Output version information of QDBM to the standard output.</dd>
</dl>
<p>Options feature the following.</p>
<ul class="lines">
<li><kbd>-s</kbd> : make the file sparse.</li>
<li><kbd>-bnum <var>num</var></kbd> : specify the number of the elements of the bucket array.</li>
<li><kbd>-kx</kbd> : treat `key' as a binary expression of hexadecimal notation.</li>
<li><kbd>-ki</kbd> : treat `key' as an integer expression of decimal notation.</li>
<li><kbd>-vx</kbd> : treat `val' as a binary expression of hexadecimal notation.</li>
<li><kbd>-vi</kbd> : treat `val' as an integer expression of decimal notation.</li>
<li><kbd>-vf</kbd> : read the value from a file specified with `val'.</li>
<li><kbd>-keep</kbd> : specify the storing mode for `DP_DKEEP'.</li>
<li><kbd>-cat</kbd> : specify the storing mode for `DP_DCAT'.</li>
<li><kbd>-na</kbd> : do not set alignment.</li>
<li><kbd>-nl</kbd> : open the database without file locking.</li>
<li><kbd>-start</kbd> : specify the beginning offset of a value to fetch.</li>
<li><kbd>-max</kbd> : specify the max size of a value to fetch.</li>
<li><kbd>-ox</kbd> : treat the output as a binary expression of hexadecimal notation.</li>
<li><kbd>-n</kbd> : do not output the trailing newline.</li>
<li><kbd>-k</kbd> : output keys only.</li>
<li><kbd>-v</kbd> : output values only.</li>
</ul>
<p>This command returns 0 on success and a non-zero value on failure. The environment variable `QDBMDBGFD' specifies the file descriptor to which the history of updates to the variable `dpecode' is output.</p>
<p>The command `dptest' is a utility for facility tests and performance tests. It is used to check a database generated by the command or to measure the execution time of the command. This command is used in the following format. `name' specifies a database name. `rnum' specifies the number of the records. `bnum' specifies the number of the elements of the bucket array. `pnum' specifies the number of patterns of the keys. `align' specifies the basic size of alignment. `fbpsiz' specifies the size of the free block pool.</p>
<dl>
<dt><kbd>dptest write [-s] <var>name</var> <var>rnum</var> <var>bnum</var></kbd></dt>
<dd>Store records with keys of 8 bytes. They change as `00000001', `00000002'...</dd>
<dt><kbd>dptest read [-wb] <var>name</var></kbd></dt>
<dd>Retrieve all records of the database above.</dd>
<dt><kbd>dptest rcat [-c] <var>name</var> <var>rnum</var> <var>bnum</var> <var>pnum</var> <var>align</var> <var>fbpsiz</var></kbd></dt>
<dd>Store records with partway duplicated keys using concatenate mode.</dd>
<dt><kbd>dptest combo <var>name</var></kbd></dt>
<dd>Perform combination test of various operations.</dd>
<dt><kbd>dptest wicked [-c] <var>name</var> <var>rnum</var></kbd></dt>
<dd>Perform updating operations selected at random.</dd>
</dl>
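<p>The 8-byte keys used by `dptest write' can be pictured as zero-padded decimal sequence numbers. The following sketch shows one way to produce such keys. The helper `make_test_key' is hypothetical, and the use of the `%08d' format is an assumption based on the key pattern shown above, not a statement about how `dptest' is implemented.</p>

```c
#include <stdio.h>

/* Hypothetical helper, not part of QDBM: format the sequence number `i'
   as an 8-byte decimal key such as "00000001".
   `buf' must have room for at least 9 bytes (8 digits and a zero code).
   Returns the key length, or -1 if `i' does not fit in 8 digits. */
int make_test_key(char *buf, int i){
  if(i < 0 || i > 99999999) return -1;
  return sprintf(buf, "%08d", i);
}
```

<p>Because every key has the same length, a test that stores `rnum' records this way exercises the hash function with keys that differ only in their trailing digits.</p>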
<p>Options feature the following.</p>
<ul class="lines">
<li><kbd>-s</kbd> : make the file sparse.</li>
<li><kbd>-wb</kbd> : use the function `dpgetwb' instead of the function `dpget'.</li>
<li><kbd>-c</kbd> : perform comparison test with map of Cabin.</li>
</ul>
<p>This command returns 0 on success and a non-zero value on failure. The environment variable `QDBMDBGFD' specifies the file descriptor to which the history of updates to the variable `dpecode' is output.</p>
<p>The command `dptsv' features mutual conversion between a database of Depot and a TSV text. This command is useful for exchanging data with another version of QDBM or another DBM, or between systems which have different byte orders. This command is used in the following format. `name' specifies a database name. The subcommand `import' reads TSV data from the standard input. If a key overlaps, the latter is adopted. `-bnum' specifies the number of the elements of the bucket array. The subcommand `export' writes TSV data to the standard output.</p>
<dl>
<dt><kbd>dptsv import [-bnum <var>num</var>] [-bin] <var>name</var></kbd></dt>
<dd>Create a database from TSV.</dd>
<dt><kbd>dptsv export [-bin] <var>name</var></kbd></dt>
<dd>Write TSV data of a database.</dd>
</dl>
<p>Options feature the following.</p>
<ul class="lines">
<li><kbd>-bnum <var>num</var></kbd> : specify the number of the elements of the bucket array.</li>
<li><kbd>-bin</kbd> : treat records as Base64 format.</li>
</ul>
<p>This command returns 0 on success and a non-zero value on failure.</p>
<p>Commands of Depot realize a simple database system. For example, to make a database for searching `/etc/passwd' by a user name, perform the following command.</p>
<pre>cat /etc/passwd | tr ':' '\t' | dptsv import casket
</pre>
<p>Thus, to retrieve the information of a user `mikio', perform the following command.</p>
<pre>dpmgr get casket mikio
</pre>
<p>It is easy to implement functions equivalent to these commands by using the API of Depot.</p>
<hr />
<h2><a name="curiaapi" id="curiaapi" class="head">Curia: Extended API</a></h2>
<h3>Overview</h3>
<p>Curia is the extended API of QDBM. It provides routines for managing multiple database files in a directory. By dividing a database into two or more files, it works around the restriction of some file systems that limit the size of each file. If the database files are deployed on multiple devices, scalability is improved.</p>
<p>While Depot creates a database with a file name, Curia creates a database with a directory name. A database file named `depot' is placed in the specified directory. It keeps the attributes of the database but not the entities of the records. In addition, as many subdirectories as the number of divisions of the database are created, each named with 4 digits, and a database file is placed in each subdirectory. The entities of the records are stored in these database files. For example, if the database directory is named `casket' and the number of divisions is 3, then `casket/depot', `casket/0001/depot', `casket/0002/depot' and `casket/0003/depot' are created. No error occurs even if a directory of the same name already exists when creating a database. So, if the subdirectories already exist and devices are mounted on them, the database files are deployed on multiple devices.</p>
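<p>The directory layout described above can be sketched with a helper that builds the path of each database file. The function `division_path' is hypothetical and not part of Curia. It only reproduces the naming shown in the `casket' example, treating index 0 as the attribute file directly under the database directory.</p>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Curia: write the path of the database
   file of division `idx' (1-based) into `buf'. Index 0 stands for the
   attribute file directly under the database directory `name'.
   Returns 0 on success, -1 if `idx' is out of range or `buf' is too small. */
int division_path(char *buf, size_t size, const char *name, int idx){
  if(idx < 0 || idx > 9999) return -1;  /* the subdirectory name has 4 digits */
  /* "/0000/depot" is the longest suffix; sizeof counts its zero code too */
  if(strlen(name) + sizeof("/0000/depot") > size) return -1;
  if(idx == 0){
    sprintf(buf, "%s/depot", name);
  } else {
    sprintf(buf, "%s/%04d/depot", name, idx);
  }
  return 0;
}
```

<p>Calling the helper with `casket' and the indexes 0 to 3 gives the four paths listed in the example above.</p>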
<p>Curia also manages large objects. While usual records are stored in the database files, each large object is stored in an individual file. Because the files of large objects are distributed among directories named with hash values, the access speed is fairly robust, although it is slower than that of usual records. Large and rarely accessed data should be set apart as large objects. Doing so improves the access speed of usual records. The directory hierarchies of large objects are placed in the directory named `lob' in the subdirectories of the database. Because the key spaces of usual records and large objects are separate, operations on one do not affect the other.</p>
<p>In order to use Curia, you should include `depot.h', `curia.h' and `stdlib.h' in the source files. Usually, the following description will be near the beginning of a source file.</p>
<dl>
<dt><kbd>#include &lt;depot.h&gt;</kbd></dt>
<dt><kbd>#include &lt;curia.h&gt;</kbd></dt>
<dt><kbd>#include &lt;stdlib.h&gt;</kbd></dt>
</dl>
<p>A pointer to `CURIA' is used as a database handle, much as some file I/O routines of `stdio.h' use a pointer to `FILE'. A database handle is opened with the function `cropen' and closed with `crclose'. You should not refer directly to any member of the handle. If a fatal error occurs in a database, any access method via the handle except `crclose' will not work and will return an error status. Although a process is allowed to use multiple database handles at the same time, it should not use more than one handle of the same database directory.</p>
<p>Curia also assigns the error code to the external variable `dpecode'. The function `dperrmsg' is used in order to get the message of the error code.</p>
<h3>API</h3>
<p>The function `cropen' is used in order to get a database handle.</p>
<dl>
<dt><kbd>CURIA *cropen(const char *<var>name</var>, int <var>omode</var>, int <var>bnum</var>, int <var>dnum</var>);</kbd></dt>
<dd>`name' specifies the name of a database directory. `omode' specifies the connection mode: `CR_OWRITER' as a writer, `CR_OREADER' as a reader. If the mode is `CR_OWRITER', the following may be added by bitwise or: `CR_OCREAT', which creates a new database if none exists, and `CR_OTRUNC', which creates a new database even if one already exists. Either `CR_OREADER' or `CR_OWRITER' may be combined by bitwise or with `CR_ONOLCK', which opens the database directory without file locking, or `CR_OLCKNB', which performs locking without blocking. `CR_OCREAT' may be combined by bitwise or with `CR_OSPARSE', which creates the database files as sparse files. `bnum' specifies the number of elements of each bucket array. If it is not more than 0, the default value is used. The size of each bucket array is determined when the database is created and cannot be changed except by optimizing the database. The suggested size of each bucket array is about 0.5 to 4 times the number of all records to be stored. `dnum' specifies the number of divisions of the database. If it is not more than 0, the default value is used. The number of divisions cannot be changed from the initial value. The max number of divisions is 512. The return value is the database handle, or `NULL' if it is not successful. While connecting as a writer, an exclusive lock is applied to the database directory. While connecting as a reader, a shared lock is applied to the database directory. The thread blocks until the lock is acquired. If `CR_ONOLCK' is used, the application is responsible for exclusion control.</dd>
</dl>
<p>The function `crclose' is used in order to close a database handle.</p>
<dl>
<dt><kbd>int crclose(CURIA *<var>curia</var>);</kbd></dt>
<dd>`curia' specifies a database handle. If successful, the return value is true, else, it is false. Because the region of a closed handle is released, the handle can no longer be used. Updates to a database are assured to be written when the handle is closed. If a writer opens a database but does not close it properly, the database will be broken.</dd>
</dl>
<p>The function `crput' is used in order to store a record.</p>
<dl>
<dt><kbd>int crput(CURIA *<var>curia</var>, const char *<var>kbuf</var>, int <var>ksiz</var>, const char *<var>vbuf</var>, int <var>vsiz</var>, int <var>dmode</var>);</kbd></dt>
<dd>`curia' specifies a database handle connected as a writer. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `vbuf' specifies the pointer to the region of a value. `vsiz' specifies the size of the region of the value. If it is negative, the size is assigned with `strlen(vbuf)'. `dmode' specifies behavior when the key overlaps, by the following values: `CR_DOVER', which means the specified value overwrites the existing one, `CR_DKEEP', which means the existing value is kept, `CR_DCAT', which means the specified value is concatenated at the end of the existing value. If successful, the return value is true, else, it is false.</dd>
</dl>
<p>The function `crout' is used in order to delete a record.</p>
<dl>
<dt><kbd>int crout(CURIA *<var>curia</var>, const char *<var>kbuf</var>, int <var>ksiz</var>);</kbd></dt>
<dd>`curia' specifies a database handle connected as a writer. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. If successful, the return value is true, else, it is false. False is returned when no record corresponds to the specified key.</dd>
</dl>
<p>The function `crget' is used in order to retrieve a record.</p>
<dl>
<dt><kbd>char *crget(CURIA *<var>curia</var>, const char *<var>kbuf</var>, int <var>ksiz</var>, int <var>start</var>, int <var>max</var>, int *<var>sp</var>);</kbd></dt>
<dd>`curia' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `start' specifies the offset of the beginning of the region of the value to be read. `max' specifies the maximum size to be read. If it is negative, the size to read is unlimited. `sp' specifies the pointer to a variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the value of the corresponding record, else, it is `NULL'. `NULL' is returned when no record corresponds to the specified key or the size of the value of the corresponding record is less than `start'. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call when it is no longer in use.</dd>
</dl>
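<p>The interaction of `start' and `max' can be modeled with a few lines of arithmetic. The function `crget_result_size' below is a hypothetical model written only from the description above, not code from the library. It assumes `start' is not negative, and it returns -1 in the case where `crget' would return `NULL' because the value is shorter than `start'.</p>

```c
/* Hypothetical model of the size that `crget' would report through `sp',
   derived only from the description above. Assumes start >= 0.
   Returns -1 where `crget' would return NULL because the value is
   shorter than `start'. */
int crget_result_size(int vsiz, int start, int max){
  int remaining;
  if(start < 0 || vsiz < start) return -1;
  remaining = vsiz - start;            /* bytes available from the offset */
  if(max < 0 || max > remaining) return remaining;  /* unlimited or clipped */
  return max;
}
```

<p>Under this model, a 10-byte value read with `start' 4 and a negative `max' yields 6 bytes, and `start' 10 yields an empty result rather than a failure.</p>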
<p>The function `crgetwb' is used in order to retrieve a record and write the value into a buffer.</p>
<dl>
<dt><kbd>int crgetwb(CURIA *<var>curia</var>, const char *<var>kbuf</var>, int <var>ksiz</var>, int <var>start</var>, int <var>max</var>, char *<var>vbuf</var>);</kbd></dt>
<dd>`curia' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `start' specifies the offset of the beginning of the region of the value to be read. `max' specifies the maximum size to be read. It should be equal to or less than the size of the writing buffer. `vbuf' specifies the pointer to a buffer into which the value of the corresponding record is written. If successful, the return value is the size of the written data, else, it is -1. -1 is returned when no record corresponds to the specified key or the size of the value of the corresponding record is less than `start'. Note that no additional zero code is appended at the end of the region of the writing buffer.</dd>
</dl>
<p>The function `crvsiz' is used in order to get the size of the value of a record.</p>
<dl>
<dt><kbd>int crvsiz(CURIA *<var>curia</var>, const char *<var>kbuf</var>, int <var>ksiz</var>);</kbd></dt>
<dd>`curia' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. If successful, the return value is the size of the value of the corresponding record, else, it is -1. Because this function does not read the body of a record, it is faster than `crget'.</dd>
</dl>
<p>The function `criterinit' is used in order to initialize the iterator of a database handle.</p>
<dl>
<dt><kbd>int criterinit(CURIA *<var>curia</var>);</kbd></dt>
<dd>`curia' specifies a database handle. If successful, the return value is true, else, it is false. The iterator is used in order to access the key of every record stored in a database.</dd>
</dl>
<p>The function `criternext' is used in order to get the next key of the iterator.</p>
<dl>
<dt><kbd>char *criternext(CURIA *<var>curia</var>, int *<var>sp</var>);</kbd></dt>
<dd>`curia' specifies a database handle. `sp' specifies the pointer to a variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the next key, else, it is `NULL'. `NULL' is returned when no record remains to be fetched from the iterator. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call when it is no longer in use. It is possible to access every record by calling this function repeatedly. However, this is not guaranteed if the database is updated during the iteration. Besides, the order of this traversal is arbitrary, so it is not guaranteed that the order of traversal matches the order in which the records were stored.</dd>
</dl>
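<p>The functions described so far can be combined as in the following minimal sketch. It assumes that QDBM is installed and the program is linked against it, that `cropen' takes its arguments in the order `name', `omode', `bnum', `dnum', and that a directory named `casket' (a hypothetical name) may be created in the current directory. Error handling is reduced to short messages.</p>

```c
#include <curia.h>
#include <stdio.h>
#include <stdlib.h>

#define DBNAME "casket"  /* hypothetical database directory name */

int main(void){
  CURIA *curia;
  char *key, *val;

  /* open as a writer and create the database if it does not exist;
     non-positive `bnum' and `dnum' request the default values */
  if(!(curia = cropen(DBNAME, CR_OWRITER | CR_OCREAT, -1, -1))){
    fprintf(stderr, "cropen failed\n");
    return 1;
  }

  /* negative sizes make `crput' measure the buffers with strlen */
  if(!crput(curia, "apple", -1, "red", -1, CR_DOVER) ||
     !crput(curia, "lemon", -1, "yellow", -1, CR_DOVER)){
    fprintf(stderr, "crput failed\n");
  }

  /* visit every key; the order of traversal is arbitrary */
  if(!criterinit(curia)){
    fprintf(stderr, "criterinit failed\n");
  }
  while((key = criternext(curia, NULL)) != NULL){
    /* read the whole value: offset 0 and no size limit */
    if((val = crget(curia, key, -1, 0, -1, NULL)) != NULL){
      printf("%s: %s\n", key, val);
      free(val);  /* the value was allocated with malloc */
    }
    free(key);    /* so was the key */
  }

  /* closing the handle is what guarantees the updates are written */
  if(!crclose(curia)){
    fprintf(stderr, "crclose failed\n");
    return 1;
  }
  return 0;
}
```

<p>The database is not updated inside the loop, because the traversal is not guaranteed to be complete if records are stored or deleted during the iteration.</p>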
<p>The function `crsetalign' is used in order to set alignment of a database handle.</p>
<dl>
<dt><kbd>int crsetalign(CURIA *<var>curia</var>, int <var>align</var>);</kbd></dt>
<dd>`curia' specifies a database handle connected as a writer. `align' specifies the size of alignment. If successful, the return value is true, else, it is false. If alignment is set for a database, the efficiency of overwriting values is improved. The suggested size of alignment is the average size of the values of the records to be stored. If alignment is positive, padding whose size is a multiple of the alignment is placed. If alignment is negative, with `vsiz' being the size of a value, the size of padding is calculated with `(vsiz / pow(2, abs(align) - 1))'. Because the alignment setting is not saved in a database, you should specify alignment every time you open a database.</dd>
</dl>
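<p>The padding rule for a negative alignment can be worked through with integer arithmetic. The function `padding_for_negative_align' below is a hypothetical helper, not part of the Curia API. It evaluates only the documented expression `(vsiz / pow(2, abs(align) - 1))', truncated to an integer, and does not model any other adjustment the library may apply internally.</p>

```c
/* Hypothetical helper, not part of the Curia API. It evaluates the
   documented expression vsiz / pow(2, abs(align) - 1) for a negative
   alignment, truncating the result to an integer. */
int padding_for_negative_align(int vsiz, int align){
  int shift;
  if(vsiz <= 0 || align >= 0) return 0;  /* only the negative rule is modeled */
  if(align < -31) return 0;              /* divisor would exceed any int value */
  shift = -align - 1;                    /* abs(align) - 1, between 0 and 30 */
  return (int)(vsiz / (1LL << shift));   /* same as dividing by pow(2, shift) */
}
```

<p>With a value of 100 bytes, an alignment of -1 gives 100 bytes of padding, -2 gives 50, and -3 gives 25, so each step further from zero halves the reserved space.</p>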
<p>The function `crsetfbpsiz' is used in order to set the size of the free block pool of a database handle.</p>
<dl>
<dt><kbd>int crsetfbpsiz(CURIA *<var>curia</var>, int <var>size</var>);</kbd></dt>
<dd>`curia' specifies a database handle connected as a writer. `size' specifies the size of the free block pool of a database. If successful, the return value is true, else, it is false. The default size of the free block pool is 16. A larger size improves the space efficiency of overwriting values at the cost of time efficiency.</dd>
</dl>
<p>The function `crsync' is used in order to synchronize updated contents with the files and the devices.</p>
<dl>
<dt><kbd>int crsync(CURIA *<var>curia</var>);</kbd></dt>
<dd>`curia' specifies a database handle connected as a writer. If successful, the return value is true, else, it is false. This function is useful when another process uses the connected database directory.</dd>
</dl>
<p>The function `croptimize' is used in order to optimize a database.</p>
<dl>
<dt><kbd>int croptimize(CURIA *<var>curia</var>, int <var>bnum</var>);</kbd></dt>
<dd>`curia' specifies a database handle connected as a writer. `bnum' specifies the number of the elements of each bucket array. If it is not more than 0, the default value is used. If successful, the return value is true, else, it is false. When deleting alternates repeatedly with storing in overwrite or concatenate mode, dispensable regions accumulate. This function is useful to do away with them.</dd>
</dl>
<p>The function `crname' is used in order to get the name of a database.</p>
<dl>
<dt><kbd>char *crname(CURIA *<var>curia</var>);</kbd></dt>
<dd>`curia' specifies a database handle. If successful, the return value is the pointer to the region of the name of the database, else, it is `NULL'. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call when it is no longer in use.</dd>
</dl>