Each data stream is a sequence of octets or bytes. The octets encode a sequence of characters according to the UTF-8 character encoding as described in §10.2 of ISO/IEC 10646:2020 and in RFC 3629.
:::note Previous versions allowed multiple character encodings, defaulting to ANSEL. Version 7.0 uses only the UTF-8 character encoding. :::
A file containing a FamilySearch GEDCOM data stream should use the filename extension .ged.
The first character in each data stream should be U+FEFF, the byte-order mark. If present, this initial character has no meaning within this specification but serves to indicate to other systems that the file uses the UTF-8 character encoding.
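As an illustration only, a reader might decode a data stream and discard the optional byte-order mark as sketched below. The function name `decode_stream` is hypothetical and not part of this specification.

```python
def decode_stream(data: bytes) -> str:
    """Decode a data stream as UTF-8 and drop a leading U+FEFF, if any."""
    # Strict decoding raises UnicodeDecodeError on octets that are not valid UTF-8.
    text = data.decode("utf-8")
    # The byte-order mark carries no meaning here, so it is simply removed.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text
```

Python's built-in `utf-8-sig` codec performs the same two steps in one call; the explicit form is shown only to make the handling of U+FEFF visible.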
Certain characters must not appear anywhere within a data stream:
- The C0 control characters other than tab and line endings (U+0000--U+001F except U+0009, U+000A and U+000D)
- The DEL character (U+007F)
- Surrogates (U+D800--U+DFFF)
- Invalid code points (U+FFFE and U+FFFF)
Implementations should be aware that bytes per character and characters per glyph are both variable when using UTF-8. Use of Unicode-aware processing and display libraries is recommended.
Character-level grammars are specified in this document using Augmented Backus-Naur Form (ABNF) as defined in STD 68 and modified in RFC 7405. We use the term "production" to refer to an ABNF rule, supported by any other rules it references.
:::note The following is a brief summary of the parts of ABNF, as defined by STD 68 and RFC 7405, that are used in this document:
- A rule consists of a rulename, an equals sign =, and 1 or more alternative matches.
- Alternatives are separated by slashes /.
- The first line of a rule must not be indented; the second and subsequent lines of a rule must be indented.
- Comments are introduced with a semicolon ;.
- Unicode codepoints are given in hexadecimal preceded by %x. Ranges of allowed codepoints are given with a hyphen -.
- Double quotes delimit literal strings. Literal strings are case-insensitive unless they are preceded by %s.
- Parentheses () group elements. Brackets [] mark optional content. Preceding a group or element by * means any number may be included. Preceding a group or element by 1* means 1 or more may be included.
:::
The banned characters can be expressed in ABNF as production banned:
banned = %x00-08 / %x0B-0C / %x0E-1F ; C0 other than LF CR and Tab
       / %x7F                        ; DEL
       / %x80-9F                     ; C1
       / %xD800-DFFF                 ; Surrogates
       / %xFFFE-FFFF                 ; invalid
; All other rules assume the absence of any banned characters
All other ABNF expressions in this document assume the absence of any characters matching production banned.
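A minimal sketch of a check for banned characters follows. It mirrors production banned as printed above, which also lists the C1 range %x80-9F; the names `BANNED` and `find_banned` are hypothetical.

```python
import re

# Character class transcribed from production `banned` as printed above.
BANNED = re.compile(
    r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F\x80-\x9F\uD800-\uDFFF\uFFFE\uFFFF]"
)

def find_banned(text: str) -> list:
    """Return (offset, code point) pairs for every banned character in text."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in BANNED.finditer(text)]
```

An empty result means the text contains none of the characters matched by the production; tab, line feed, and carriage return are deliberately absent from the class.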
This document additionally makes use of the following named character sets in ABNF:
digit = %x30-39 ; 0 through 9
nonzero = %x31-39 ; 1 through 9
ucletter = %x41-5A ; A through Z
underscore = %x5F ; _
atsign = %x40 ; @
A structure consists of a structure type, an optional payload, and a collection of substructures. The payload is a value expressed as a string using 1 of several data types, as described in Chapter 2.
Every structure is either a record, meaning it is not contained in any other structure's collection of substructures, or it is a substructure of exactly 1 other structure. The other structure is called its superstructure. Each substructure either refines the meaning of its superstructure, provides metadata about its superstructure, or introduces new data that is closely related to its superstructure.
Each structure type is identified by a URI and defines several properties of any structure with that type, including
- The meaning of structures of this type.
- The payload type of the structure's payload, which shall be one of
  - no payload, or
  - a pointer to a record with a specific structure type, or
  - a data type; if an enumeration or list of enumerations, also a set of permitted enumeration values.
- Which structure types may appear as substructures of the structure and with what cardinality they may appear. Cardinality is specified by two flags:
  - whether a substructure of this type is required or not; and
  - whether multiple substructures of this type are permitted or not.
The collection of substructures is partially ordered. Substructures with the same structure type are in a fixed order, but substructures with different structure types may be reordered. The order of substructures of a single type indicates user preference, with the first substructure being the most-preferred value, unless a different meaning is explicitly indicated in the structure's definition.
A structure must have either a non-empty payload or at least 1 substructure. Empty payloads and missing payloads are considered equivalent. The remainder of this document uses "payload" as shorthand for "non-empty payload".
:::note
Unlike structures, pseudo-structures needn't have either payloads or substructures. TRLR never has either, and CONT doesn't when payloads contain empty lines.
:::
A structure is a representation of data about its subject. Examples include the entity, event, claim, or activity that the structure describes.
Datasets also contain 3 types of pseudo-structures:
- The header resembles a record, comes first in each document, and contains metadata about the entire document in its substructures. See The Header for more.
- The trailer resembles a record, comes last in each document, and cannot contain substructures.
- A line continuation resembles a substructure, comes before any other substructures, is used to encode multi-line payloads, and cannot contain substructures.
Previous versions limited the number of characters that could appear in a structure, record, and payload. Those restrictions were removed in 7.0.
A line is a string representation of (part of) a structure.
A line consists of a level, optional cross-reference identifier, tag, optional line value, and line terminator.
It matches the production Line:
Line = Level D [Xref D] Tag [D LineVal] EOL
Level = "0" / nonzero *digit
D = %x20 ; space
Xref = atsign 1*tagchar atsign ; but not "@VOID@"
Tag = stdTag / extTag
LineVal = pointer / lineStr
EOL = %x0D [%x0A] / %x0A ; CR-LF, CR, or LF
stdTag = ucletter *tagchar
extTag = underscore 1*tagchar
tagchar = ucletter / digit / underscore
pointer = voidPtr / Xref
voidPtr = %s"@VOID@"
nonAt = %x09 / %x20-3F / %x41-10FFFF ; non-EOL, non-@
nonEOL = %x09 / %x20-10FFFF ; non-EOL
lineStr = (nonAt / atsign atsign) *nonEOL ; leading @ doubled
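The productions above can be transcribed into a regular expression, as in the following sketch. It assumes the line terminator has already been removed and that banned characters were rejected earlier; the names `LINE`, `Line`, and `parse_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

XREF = r"@[A-Z0-9_]+@"
LINE = re.compile(
    r"(?P<level>0|[1-9][0-9]*) "               # Level D
    rf"(?:(?P<xref>{XREF}) )?"                 # [Xref D]
    r"(?P<tag>[A-Z][A-Z0-9_]*|_[A-Z0-9_]+)"    # stdTag / extTag
    r"(?: (?P<value>[^\r\n]+))?"               # [D LineVal]
)

class Line(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]

def parse_line(text: str) -> Line:
    """Parse one line whose terminator has already been removed."""
    m = LINE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a valid line: {text!r}")
    xref, value = m["xref"], m["value"]
    if xref == "@VOID@":
        raise ValueError("@VOID@ cannot be a cross-reference identifier")
    # A value with a single leading "@" must be a pointer; "@@" starts a lineStr.
    if value is not None and value.startswith("@") and not value.startswith("@@"):
        if re.fullmatch(XREF, value) is None:
            raise ValueError(f"line value is neither pointer nor lineStr: {value!r}")
    return Line(int(m["level"]), xref, m["tag"], value)
```

Because the delimiter is exactly one space, `parse_line("1 NOTE  x")` yields the value `" x"`: the second space belongs to the line value. A tag followed by a space and nothing else, such as `"1 MARR "`, is rejected.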
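The level matches production Level and is used to encode substructure relationships.
Any line with level x > 0 encodes a substructure of the structure encoded by the nearest preceding line with level x − 1.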
:::note Previous versions allowed spaces and blank lines to precede the level of a line. That permission was removed from 7.0 to simplify parsing. :::
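Reading the level as a nesting depth, where each line attaches to the nearest preceding line whose level is one less, a parser can rebuild the hierarchy with a stack. The sketch below takes already-parsed (level, tag, value) tuples; `Node` and `build_forest` are hypothetical names, and line continuations are treated like any other line here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    tag: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

def build_forest(lines):
    """Turn (level, tag, value) tuples into a list of level-0 nodes."""
    roots = []
    stack = []  # stack[i] is the most recent node seen at level i
    for level, tag, value in lines:
        if level > len(stack):
            raise ValueError(f"level {level} has no preceding line with level {level - 1}")
        node = Node(tag, value)
        del stack[level:]  # nodes at this level or deeper can no longer be parents
        if level == 0:
            roots.append(node)
        else:
            stack[level - 1].children.append(node)
        stack.append(node)
    return roots
```

A level that jumps by more than one, such as a level-2 line directly after a level-0 line, has no possible parent and is reported as an error.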
The cross-reference identifier matches production Xref (but not voidPtr) and indicates that this is a structure to which pointer-type payloads may point.
Each cross-reference identifier must be unique within a given data document.
Cross-reference identifiers are not retained between data streams. They should not be made visible to the user, so that users do not refer to these transient identifiers within notes or other durable data.
Each record to which other structures point must have a cross-reference identifier. A record to which no structures point may have a cross-reference identifier, but does not need to have one. A substructure or pseudo-structure must not have a cross-reference identifier.
The tag matches production Tag and encodes the structure's type.
Tags that match the production stdTag are defined in this document.
Tags that match extTag are defined according to [Extensions].
The same tag may be used to represent multiple structure types. The structure type of each structure is identified by its tag and the type of its superstructure. The mapping between (superstructure type, tag) pairs and structure types is given elsewhere in this document (for standard structure types and tags) or the [schema] and extension authors' documentation (for extension structure types and tags).
:::example
The tag ADOP is used in this document to represent two structure types.
Which one is meant can be identified by the superstructure type as follows:
Superstructure type | Structure type identified by tag ADOP |
---|---|
g7:record-INDI | g7:ADOP |
g7:ADOP-FAMC | g7:FAMC-ADOP |
An extension-defined substructure could also be used to place either of these structure types in extension superstructures.
The ADOP tag is also used in the set of enumerated values permitted by the g7:DATA-EVEN, g7:SOUR-EVEN, and g7:NO structure types.
:::
The line value matches production LineVal and encodes the structure's payload.
Line value content is sufficient to distinguish between pointers and line strings.
Pointers are encoded as the cross-reference identifier of the pointed-to structure.
Each non-pointer payload may be encoded in 1 or more line strings (line continuations encode multi-line payloads in several line strings).
The exact encoding of non-pointer payloads is dependent on the data type of the payload, as determined by the structure type.
The data type of non-pointer payloads cannot be fully determined by line value content alone.
Note that production LineVal does not match the empty string.
Because empty payloads and missing payloads are considered equivalent,
both a structure with no payload
and a structure with the empty string as its payload
are encoded with no LineVal and no space after the Tag.
:::example
The payload of a MARR structure has type [Y|<NULL>], which is optional but if present cannot be the empty string.
The payload of an EVEN structure has type Text, which is not optional but can be the empty string.
The Line encoding a no-payload MARR is "1 MARR"
and the Line encoding an empty-payload EVEN is "1 EVEN";
both Lines have no LineVal and no trailing space.
:::
If a line value matches production Xref, the same value must occur as the cross-reference identifier of a structure within the document.
The special voidPtr production is provided to encode null pointers.
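The rules on cross-reference identifiers and pointers lend themselves to a simple consistency check. The sketch below works on already-parsed (level, xref, tag, value) tuples with raw line values; `check_pointers` is a hypothetical name, and the check does not distinguish records from level-0 pseudo-structures.

```python
import re

XREF = re.compile(r"@[A-Z0-9_]+@")

def check_pointers(lines):
    """Return a list of problems found in (level, xref, tag, value) tuples."""
    defined, pointers, problems = set(), [], []
    for level, xref, tag, value in lines:
        if xref is not None:
            if level != 0:
                problems.append(f"{xref}: only records may have a cross-reference identifier")
            if xref in defined:
                problems.append(f"{xref}: duplicate cross-reference identifier")
            defined.add(xref)
        # A raw line value matching Xref is a pointer; @VOID@ points nowhere.
        if value is not None and XREF.fullmatch(value) and value != "@VOID@":
            pointers.append(value)
    # Targets are checked last because a pointer may precede the record it names.
    problems += [f"{p}: pointer target not found" for p in pointers if p not in defined]
    return problems
```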
If the first character of the string stored in a line string is U+0040 (@), the line string must escape that character by doubling that @.
:::note
Previous versions required doubling all @ in a line value, but such doubling was not widely implemented in practice.
@ is only doubled in this version if it is the first character of a line string.
:::
:::example
A structure with tag NOTE, level 1, and a 2-line payload where the first line is "me@example.com is my email" and the second line is "@me and @I are my social media handles" would be encoded as
1 NOTE me@example.com is my email
2 CONT @@me and @I are my social media handles
:::
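The leading-@ rule can be sketched as a pair of helper functions. The names `encode_line_str` and `decode_line_str` are hypothetical, and the decoder assumes its input is already known to be a line string rather than a pointer.

```python
def encode_line_str(text: str) -> str:
    """Double a leading '@'; every other '@' is left alone."""
    return "@" + text if text.startswith("@") else text

def decode_line_str(line_str: str) -> str:
    """Undo the doubling of a leading '@'."""
    return line_str[1:] if line_str.startswith("@@") else line_str
```

Only the first character is examined, so the `@` in the middle of an email address or before a later word passes through unchanged in both directions.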
:::note
Line values that match neither Xref nor lineStr are prohibited. They have been used in previous versions (for example, a line value beginning @#D was a date in versions 4.0 through 5.5.1) and may be used again in a future version if an appropriate need arises.
:::
The components of a line are each separated by a single delimiter matching production D. A delimiter is always a single space character (U+0020). Using multiple delimiters between components of a line is prohibited. Thus if the tag is followed by 2 spaces, the first space is a delimiter and the second space is part of the line value.
All characters in a payload must be preserved in the corresponding line value, including preserving any leading or trailing spaces.
Each line is ended by a line terminator matching production EOL. A line terminator may be a carriage return U+000D, line feed U+000A, or a carriage return followed by a line feed. The same line terminator should be used on every line of a given document.
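A sketch of splitting decoded text into lines on exactly the three terminators permitted by production EOL follows; `split_lines` is a hypothetical name. Python's `str.splitlines()` is avoided here because it also breaks on other characters, such as U+2028, that this production does not treat as terminators.

```python
import re

EOL = re.compile(r"\r\n|\r|\n")  # CR-LF is tried first so it counts as one terminator

def split_lines(text: str) -> list:
    """Split text into lines, accepting CR-LF, CR, or LF as the terminator."""
    parts = EOL.split(text)
    # Every line ends with a terminator, so the final piece is normally empty.
    if parts and parts[-1] == "":
        parts.pop()
    return parts
```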
Line values cannot contain internal line terminators, but some payloads can.
If a payload contains a line terminator, the payload is split on the line terminators into several payloads.
The first of these split payloads is encoded as the line value of the structure's line,
and each subsequent split payload is encoded as the line value of a line continuation pseudo-structure placed immediately following, and with one greater level than, the structure's line.
The tag of a line continuation pseudo-structure is CONT.
The order of the line continuation pseudo-structures matches the order of the lines of text in the payload.
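The splitting procedure just described can be sketched as follows. The function `encode_payload` and its helper are hypothetical names; the sketch handles string payloads only and also applies the leading-@ doubling to each resulting line string.

```python
import re

EOL = re.compile(r"\r\n|\r|\n")

def _line(level: int, tag: str, text: str) -> str:
    # Double a leading "@", and omit the delimiter when there is no line value.
    if text.startswith("@"):
        text = "@" + text
    return f"{level} {tag} {text}" if text else f"{level} {tag}"

def encode_payload(level: int, tag: str, payload: str) -> list:
    """Encode a possibly multi-line payload as one line plus CONT lines."""
    first, *rest = EOL.split(payload)
    return [_line(level, tag, first)] + [_line(level + 1, "CONT", t) for t in rest]
```

For instance, `encode_payload(1, "NOTE", "a\n\nb")` returns `["1 NOTE a", "2 CONT", "2 CONT b"]`; the blank middle line becomes a CONT with no line value and no trailing space.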
:::note
Versions prior to 7.0 had another CONT-like tag, CONC, which split line values without introducing a line break.
CONC does not appear in version 7.
To support multi-version GEDCOM parsers, the CONC tag is reserved and will not appear as the tag of a structure type.
:::
Line continuation pseudo-structures are not considered to be structures.
While they match production Line and their level and position make them appear to be substructures of the structure, they are actually a continuation of the encoding of the structure's payload and are not part of a structure's collection of substructures.
They must appear immediately following the line whose payload they are encoding and before any other line.
Because line terminators in payloads are encoded using line continuations, it is not possible to distinguish between U+000D and U+000A in payloads.
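When reading, line continuations can be folded back into the payload they continue. The sketch below operates on (level, tag, value) tuples whose values have already had any leading-@ doubling undone; `merge_continuations` is a hypothetical name, and joining with U+000A is a choice of this sketch, since the encoding does not record which terminator the payload originally contained.

```python
def merge_continuations(lines):
    """Fold CONT lines into the payload of the line they continue."""
    merged = []
    for level, tag, value in lines:
        if tag != "CONT":
            merged.append((level, tag, value))
            continue
        # After folding, the last kept line is the one being continued, so a
        # valid CONT is always exactly one level deeper than that line.
        if not merged or merged[-1][0] != level - 1:
            raise ValueError("CONT must immediately follow the line it continues")
        prev_level, prev_tag, prev_value = merged[-1]
        merged[-1] = (prev_level, prev_tag, (prev_value or "") + "\n" + (value or ""))
    return merged
```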
:::note
Previous versions limited the number of characters that could appear in a tag, cross-reference identifier, and line-value.
Those restrictions were removed in version 7.0.
The CONC pseudo-structure, which allowed line values to have a shorter length restriction than payloads, was also removed.
:::
:::example The following are examples of valid but unrelated lines:
- level 0, cross-reference identifier @I1234@, tag INDI, no line value.
  0 @I1234@ INDI
- level 1, no cross-reference identifier, tag CHIL, pointer line value pointing to the structure with cross-reference identifier "@I1234@".
  1 CHIL @I1234@
- level 1, no cross-reference identifier, tag NOTE, and line value + continuation pseudo-structure to encode a 4-line payload string: "This is a note field that", "spans four lines.", "", and "(the third line was blank)". Note that leading and trailing spaces are preserved.
  1 NOTE This is a note field that
  2 CONT spans four lines.
  2 CONT
  2 CONT (the third line was blank)
:::
Every dataset must begin with a header pseudo-structure and end with a trailer pseudo-structure.
The trailer pseudo-structure has level 0, tag TRLR, and no line value or substructures.
The trailer has no semantic meaning; it is present only to mark the end of the dataset.
The header pseudo-structure has level 0, tag HEAD, and no line value.
The substructures of the header pseudo-structure provide metadata about the entire dataset.
Some of those substructures are defined here;
others are defined in Chapter 3 or by extensions.
Every header must contain a substructure with a known tag that identifies the specification to which the dataset complies.
For FamilySearch GEDCOM 7.0, this is the GEDC structure described in Chapter 3.
A header should contain an extension schema structure with tag SCHMA as described in [Extensions].
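The header and trailer requirements can be checked mechanically, as in the following sketch over (level, xref, tag, value) tuples. The name `check_envelope` is hypothetical, and only the GEDC requirement for version 7.0 is tested; the contents of GEDC are not examined.

```python
def check_envelope(lines):
    """Return problems with the header and trailer of a parsed dataset."""
    problems = []
    if not lines or lines[0] != (0, None, "HEAD", None):
        problems.append("dataset must begin with '0 HEAD'")
    if not lines or lines[-1] != (0, None, "TRLR", None):
        problems.append("dataset must end with '0 TRLR'")
    # Collect the tags of the header's direct substructures.
    head_tags = []
    for level, _xref, tag, _value in lines[1:]:
        if level == 0:
            break  # the next level-0 line ends the header
        if level == 1:
            head_tags.append(tag)
    if "GEDC" not in head_tags:
        problems.append("header has no GEDC substructure")
    return problems
```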
A standard structure is a structure whose type, tag, meaning, superstructure, and cardinality within the superstructure are described in this document. This includes records such as INDI and substructures such as INDI.NAME.
The recommended way to go beyond the set of standard structure types in this specification or to expand their usage is to submit a feature request on the FamilySearch GEDCOM development page so that the ramifications of the proposed addition and its interplay with other proposals may be discussed and the addition may be included in a subsequent version of this specification.
This specification also provides multiple ways for extension authors to go beyond the specification without submitting a feature request, which are described in the remainder of this section.
Extensions can introduce new structure types, new enumeration values, new calendars with their associated months, and new data types. They can also extend existing structures with new permitted substructure types and extend existing enumeration-type payloads with new permitted values. Extensions cannot change existing meanings, cardinalities, or calendars.
A tagged extension structure is a structure whose tag matches production extTag. Tagged extension structures may appear as records or substructures of any other structure. Their meaning is defined by their tag, as is discussed more fully in the section [Extension Tags].
Any substructure of a tagged extension structure that uses a tag matching stdTag is an extension-defined substructure.
Substructures of an extension-defined substructure that uses a tag matching stdTag are also extension-defined substructures.
The meaning and use of each extension-defined substructure is defined by the tagged extension structure it occurs within, not by its tag alone nor by this specification.
:::example In the following
0 @P1@ _LOC
1 NAME Βυζάντιον
2 DATE FROM 667 BCE TO 324
1 _POP 15149358
2 DATE 31 DEC 2020
0 @I1@ INDI
1 BIRT
2 _LOC @P1@
- Both uses of _LOC are tagged extension structures, as is _POP.
- _LOC.NAME and _LOC.NAME.DATE are both extension-defined substructures. Their meaning is defined by the specification defining _LOC.
- _POP.DATE is an extension-defined substructure. Its meaning is defined by the specification defining _POP.
- Even though both DATEs appear to have g7:type-DATE payloads, we can't know that is the intended data type without consulting the defining specifications of _LOC and _POP, respectively. The first might be a g7:type-DATE#period and the second a g7:type-DATE#exact, for example.
:::
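The distinction between tagged extension structures and extension-defined substructures depends only on the tags along the path from the record down to the structure. A sketch, with the hypothetical name `classify` and the path given as a list of tags:

```python
import re

EXT_TAG = re.compile(r"_[A-Z0-9_]+")  # production extTag

def classify(path):
    """Classify the last structure on a path of tags, record first."""
    *ancestors, tag = path
    if EXT_TAG.fullmatch(tag):
        return "tagged extension structure"
    # A stdTag anywhere beneath a tagged extension structure is extension-defined.
    if any(EXT_TAG.fullmatch(a) for a in ancestors):
        return "extension-defined substructure"
    return "standard tag"
```

The result "standard tag" says nothing about whether the structure is actually permitted in that position; deciding that requires the structure definitions in this document.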
If an extension-defined substructure has a tag that is also used by one or more standard structures, its meaning and payload type should match at least one of those standard structure types.
:::example
An extension-defined substructure with tag "DATE" should provide a date or date period relevant to its superstructure, as do all DATE-tagged structures in this specification. Extensions should not use "DATE" to tag a structure describing anything else (even something that might reasonably be abbreviated "date", such as someone an individual dated).
:::
As a special case, a tagged extension structure can be defined to have a standard structure type. These are called relocated standard structures and can only appear with superstructures that are not documented as a superstructure of that structure type in this specification. The extension-defined substructures of a relocated standard structure are the substructure types documented in this specification for that structure type, including usual limitations on cardinality, payloads, substructures, etc.
:::example
Suppose _DATE is defined to mean a g7:DATE (using a documented extension tag). Then in the following
0 @I1@ INDI
1 NAME John /Doe/
2 _DATE FROM 6 APR 1917 TO 11 NOV 1918
3 PHRASE During America's involvement in the Great War
1 BIRT
2 PLAC Queens, New York, New York, USA
- _DATE is a relocated standard structure with type g7:DATE, with the usual payload type and meaning of a g7:DATE.
- PHRASE is the structure type expected with that tag as a substructure of g7:DATE: namely, g7:PHRASE.
- _DATE cannot be used as a substructure of BIRT because BIRT has a documented g7:DATE substructure with tag DATE.
- BIRT cannot be used as a substructure of _DATE or _DATE.PHRASE because neither structure type has a documented substructure with tag BIRT.
:::
All other non-standard structures are prohibited. Examples of prohibited structures include, but are not limited to,
- a record or substructure of a standard structure using a tag matching production stdTag that is not defined in this document;
- any substructure with cardinality {0:1} appearing more than once;
- a standard substructure appearing as a record or vice-versa;
- a standard structure whose payload does not match the requirements of this document.
:::note In some cases, an extension may need to allow multiple structures where this document allows only 1. The recommended way to do this is to create an extension tag and URI and serve a page describing how the semantics of the structure have been extended to allow multiple instances.
:::example
Suppose I have multiple sources that give different ages of the wife at a wedding; however, this specification allows only 1 MARR.WIFE.AGE. An extension could not include multiple MARR.WIFE nor MARR.WIFE.AGE, but could define a new extension _AGE, give it a URL, and provide the following definition of this extension structure type at that URL:
Alternate age: an age attested by some source, but not accepted by the researcher as the actual age of the individual. If the age is accepted by the researcher, the standard tag AGE should be used instead.
This alternate age extension structure could be used as follows:
1 MARR
2 WIFE
3 AGE 27y
3 _AGE 22y
::: :::
Enumerated values may be extended with new values that match production extTag.
Enumerations may not use standard values from other enumeration sets.
:::example
The following is not allowed because PARENT is defined as a value for ROLE, not for RESN:
0 @BAD@ INDI
1 RESN PARENT
1 NOTE The above enumeration value is not allowed
:::
Dates may be extended provided they use a calendar that matches production extTag.
Dates with extension calendars may also use extension months and epochs.
Each use of the extTag production is called an extension tag,
including when used as a tag, calendar, month, epoch, or enumerated value.
Each extTag is either a documented extension tag or an undocumented extension tag.
It is recommended that documented extension tags be used instead of undocumented extension tags wherever possible.
A documented extension tag is a tag that is mapped to a URI using the schema structure.
The schema structure is a substructure of the header with tag SCHMA.
It should appear within the document before any extension tags.
The schema's substructures are tag definitions.
A tag definition is a structure with tag TAG.
Its payload is an extension tag, a space, and a URI
and defines that extension tag to be an abbreviation for that URI within the current document.
:::example The following header
0 HEAD
1 SCHMA
2 TAG _SKYPEID http://xmlns.com/foaf/0.1/skypeID
2 TAG _MEMBER http://xmlns.com/foaf/0.1/member
defines the following tags
Tag | Means |
---|---|
_SKYPEID | http://xmlns.com/foaf/0.1/skypeID |
_MEMBER | http://xmlns.com/foaf/0.1/member |
Note that at the time of writing, the FOAF URIs used in this example are not URLs. :::
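Reading tag definitions amounts to splitting each TAG payload at its first space. The sketch below, with the hypothetical name `parse_schema`, collects the URIs for each tag in a list and does not validate the URI itself.

```python
import re

EXT_TAG = re.compile(r"_[A-Z0-9_]+")  # production extTag

def parse_schema(tag_payloads):
    """Map each extension tag to the URIs given for it in TAG payloads."""
    schema = {}
    for payload in tag_payloads:
        # The payload is an extension tag, a single space, and a URI.
        tag, sep, uri = payload.partition(" ")
        if not sep or not uri or EXT_TAG.fullmatch(tag) is None:
            raise ValueError(f"not a tag definition: {payload!r}")
        schema.setdefault(tag, []).append(uri)
    return schema
```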
The meaning of a documented extension tag is identified by its superstructure type and its URI, not its tag. As such each documented extension tag needs its own URI: it is its URI, not its tag, that defines its meaning. Documented extension tags can be changed freely by modifying the schema, though it is recommended that documented extension tags not be changed. However, a tag change may be necessary if a product picks the same tags for URIs that another product uses for different URIs. A given schema should map only one tag to each URI.
:::example The following 2 document fragments are semantically equivalent and a system importing one may export it as the other without change of meaning.
0 HEAD
1 SCHMA
2 TAG _SKYPEID http://xmlns.com/foaf/0.1/skypeID
0 @I0@ INDI
1 _SKYPEID example.person
0 HEAD
1 SCHMA
2 TAG _SI http://xmlns.com/foaf/0.1/skypeID
0 @I0@ INDI
1 _SI example.person
:::
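One way to see this equivalence is to replace each documented extension tag with its URI before comparing. The sketch below uses the hypothetical name `expand`, works on (level, tag, value) tuples, and assumes the schema maps each tag to a single URI.

```python
def expand(lines, schema):
    """Replace documented extension tags with the URIs they abbreviate."""
    # Tags that are not in the schema (standard or undocumented) are kept as-is.
    return [(level, schema.get(tag, tag), value) for level, tag, value in lines]

first = expand(
    [(0, "INDI", None), (1, "_SKYPEID", "example.person")],
    {"_SKYPEID": "http://xmlns.com/foaf/0.1/skypeID"},
)
second = expand(
    [(0, "INDI", None), (1, "_SI", "example.person")],
    {"_SI": "http://xmlns.com/foaf/0.1/skypeID"},
)
same_meaning = first == second  # the fragments differ only in the tag chosen
```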
It is recommended that the URIs used for documented extension tags be URLs that can be used to access documentation for the meaning of the tag.
:::note The W3C has an interest group note that discusses several ways of achieving this URI/URL mapping, including how a single webpage can describe multiple tags using either HTTP redirects (which requires some server setup) or what they call "Hash URIs" (which require no setup).
That interest group note also explains why it might be desirable to have separate URIs for a concept and the document describing that concept. Because of the structure of the schema, that separation is less important for FamilySearch GEDCOM 7 than it is for the semantic web, but it remains good advice where feasible. :::
The schema structure may contain the same tag more than once with different URIs. Reusing tags in this way must not be done unless the concepts identified by those URIs cannot appear in the same place in a dataset, and should not be done unless the URIs identify closely related concepts.
:::example Consider three extensions:
- https://example.com/LocationRecord, a record that describes a location.
- https://example.com/LocationPointer, a substructure of most events that points to a https://example.com/LocationRecord.
- https://example.com/inLocoParentis, a substructure of some events indicating a non-parent entity that filled the legal role of a parent for that event.
Given this, we have the following:
- https://example.com/LocationPointer and https://example.com/inLocoParentis must not be given the same tag because they can appear in the same place in a dataset.
- https://example.com/LocationRecord and https://example.com/inLocoParentis may be given the same tag, but should not be given the same tag because they identify unrelated concepts.
- https://example.com/LocationRecord and https://example.com/LocationPointer may be given the same tag.
One way to satisfy these constraints and recommendations is with the following schema:
1 SCHMA
2 TAG _LOC https://example.com/LocationRecord
2 TAG _LOC https://example.com/LocationPointer
2 TAG _ILP https://example.com/inLocoParentis
:::
An extension tag that is not given a URI in the schema structure is called an undocumented extension tag. The meaning of an undocumented extension tag is identified by its superstructure type and its tag.
- It is recommended that applications not use undocumented extension tags.
- It is required that each tag definition's extension tag be unique within the document.
- It is recommended that each documented extension tag's URI be unique within the document.
- It is recommended that extension creators use URLs as their URIs and serve a page describing the meaning of an extension at its URL.
Future versions may include additional recommendations relating to documentation, machine-readable documentation, or embedded metadata about extensions within the schema.
Standard structures take priority over extensions. Data contained in extension tags will not be interpreted by other systems correctly unless the other system supports that particular extension. In particular, those supporting extensions should keep in mind the following:
- If a standard structure is present that contradicts an extension that is present, the standard structure has priority and the extension should be updated to align with it.
  For example, if a document has an extension _ISODATE in ISO 8601 format that disagrees with a DATE in the DateValue format, the DATE shall be taken as more correct and the _ISODATE updated to reflect that.
- If a standard structure can be extracted as a subset of the semantics of an extension, the standard tag must be generated along with the extension and kept in sync with it by systems understanding the extension.
  For example, if a document has an extension _LOC providing a detailed hierarchical place representation with historical names, boundaries, and the like, it must also generate the corresponding PLAC structures with the subset of that information which PLAC can represent.
- If an extension can be extracted as a subset of the semantics of a standard structure, or if the extension and standard structure only sometimes align, then the standard structure should be included if and only if the semantics align in this case.
  For example, if a document has an extension _PARTNER that generalizes HUSB and WIFE and some ASSO ROLEs, then it should pair the extension with those standard structures if and only if it knows which one applies.
  Similarly, if a document has an extension _HOUSEHOLD that is the same as FAM in some situations but not in others, then it should keep the _HOUSEHOLD and FAM in sync if and only if they align.
- Six standard structure types are exceptions to these rules: NOTE, SNOTE, INDI.EVEN, FAM.EVEN, INDI.FACT, and FAM.FACT. Each of these allows human-readable text to describe information that cannot be captured in more-specific structures. As such, all other structures express information that could be described using 1 or more of those structure types. Extensions do not need to duplicate their information using any of those structures.
  For example, if a document has an extension _MEMBER that indicates membership in clubs, boards, and other groups, it is not required to duplicate that information in an INDI.FACT because INDI.FACT is 1 of the 6 special structure types listed above.
  By contrast, if a document has an extension _WEIGHT that describes the weight of a person, it must duplicate that information in an INDI.DSCR because INDI.DSCR is not 1 of the 6 generic structure types.
There may be situations where data needs to be removed from a dataset, such as when a user requests its deletion or marks it as confidential and not for export.
In general, removed data should result in removed structures.
Pointers to a removed structure should be replaced with voidPtrs.
If removal of a structure makes the superstructure invalid because the superstructure required the substructure, the structure should instead be retained and have its payload changed to a voidPtr if a pointer, or to a data type-appropriate empty value if a non-pointer.
If removing a structure leaves its superstructure with no payload and no substructures, the superstructure should also be removed.
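The simplest of these cases, removing whole records and voiding the pointers to them, can be sketched as below. The name `remove_records` is hypothetical; the input is a list of (level, xref, tag, value) tuples with raw line values, and the sketch does not handle required substructures or superstructures left empty by the removal.

```python
def remove_records(lines, removed):
    """Drop the records whose xref is in `removed` and void pointers to them."""
    kept, skipping = [], False
    for level, xref, tag, value in lines:
        if level == 0:
            # A level-0 line starts a new record; decide whether to drop it.
            skipping = xref in removed
        if skipping:
            continue
        # A raw line value equal to a removed xref is a pointer to that record.
        if value in removed:
            value = "@VOID@"
        kept.append((level, xref, tag, value))
    return kept
```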
A structure can also be removed if it provides no new information. For example,
0 @I1@ INDI
1 NAME John /Doe/
1 NAME John /Doe/
1 FAMC @F1@
1 FAMC @F1@
0 @F1@ FAM
1 CHIL @I1@
1 CHIL @I1@
provides no information beyond the simpler form:
0 @I1@ INDI
1 NAME John /Doe/