Which characters should be only Unicoded #523
-
In which range are the characters that should be Unicoded? Best regards,
Replies: 4 comments 2 replies
-
GEDCOM 7 files are encoded in UTF-8 in their entirety. UTF-8 encodes code points 0 through 127 as a single byte with that code point as its numerical value, while other code points are encoded as multi-byte sequences. UTF-8 is one of the most widely supported character encodings in the world: it is likely that there is library support for it in whatever language you are using (or a setting for it in your editor if you are typing GEDCOM by hand). I've written multiple GEDCOM files and GEDCOM-handling applications without doing anything more to handle UTF-8 than selecting it as the encoding of the file.
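As a rough illustration of "selecting UTF-8 as the encoding of the file", here is a minimal Python sketch. The helper names `utf8_bytes`, `write_gedcom`, and `read_gedcom` are made up for this example, and the sample lines are only a tiny illustrative fragment, not a complete or validated GEDCOM 7 file.

```python
def utf8_bytes(text: str) -> bytes:
    """Return the UTF-8 bytes for a string (hypothetical helper)."""
    return text.encode("utf-8")


def write_gedcom(path: str, lines: list[str]) -> None:
    """Write lines to a file, letting the library do the UTF-8 encoding."""
    # Passing encoding="utf-8" is the only Unicode-specific step.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")


def read_gedcom(path: str) -> list[str]:
    """Read the lines back, again only naming the encoding."""
    with open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


# Illustrative fragment only: the name contains a literal é, typed as-is.
SAMPLE_LINES = [
    "0 HEAD",
    "1 GEDC",
    "2 VERS 7.0",
    "0 @I1@ INDI",
    "1 NAME Allebé",
    "0 TRLR",
]

# "A" is code point 65, so it is the single byte 0x41.
# "é" is code point U+00E9, so it becomes the two bytes 0xC3 0xA9.
print(utf8_bytes("A"), utf8_bytes("é"))
```

Calling `write_gedcom("sample.ged", SAMPLE_LINES)` and then `read_gedcom("sample.ged")` should return the same lines, with the é intact and no manual escaping anywhere.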
-
Am I right to say that U+00xx can be replaced by its regular character?
-
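Hi Marianne. I just looked at your site, and I think you are misunderstanding Unicode.
You have entered things such as Alleb%U+00E9 into your data, i.e. the literal characters U, +, 0, 0, E, 9.
U+00E9 is just an unambiguous, human-friendly way to refer to the Unicode character é.
You should always be using the actual characters in your data, such as é.
No encoding is necessary. These will "just work".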
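For data that already contains the textual notation, something like the following Python sketch could turn it back into the actual characters. The function name `restore_literal_characters` is hypothetical, and it assumes the notation is always `U+` followed by exactly four hexadecimal digits, as in the example above. It leaves everything else alone, including the `%` that appears in that example, since its role is not stated here.

```python
import re

# Assumption: the notation is "U+" followed by exactly four hex digits.
_NOTATION = re.compile(r"U\+([0-9A-Fa-f]{4})")


def restore_literal_characters(text: str) -> str:
    """Replace each textual 'U+XXXX' with the character it refers to."""

    def _swap(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # Surrogate code points are not characters and cannot be
        # written as UTF-8, so leave that text unchanged.
        if 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return _NOTATION.sub(_swap, text)


print(restore_literal_characters("AllebU+00E9"))  # Allebé
```

Note that a blind search-and-replace like this would also rewrite any place where the text `U+00E9` was genuinely intended, so it is worth reviewing the changes before saving them.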
-
Hi Greg,
You are right.
Last year, when I started with GEDCOM 7, I changed some characters to the U+ notation. I have now changed most of them back to the literal characters.
When I check my GEDCOM file with Chronoplex's GEDCOM Validator, I see many good old 5.5.1 tags (which have equivalent 7.0 tags). Apart from these "errors", a few other errors related to GEDCOM 7 remain… 😊
Best regards,
Marianne van Harten