Which characters should be only Unicoded #523

mariannevanharten · 2024-07-27T17:57:29Z

mariannevanharten
Jul 27, 2024

In which range are the characters that should be written in Unicode notation?
In other words, for which range may I not use the regular character?

Best regards,
Marianne van Harten


tychonievich · 2024-07-27T18:14:34Z

tychonievich
Jul 27, 2024
Maintainer

GEDCOM 7 files are encoded in UTF-8 in their entirety. UTF-8 encodes code points 0 through 127 as a single byte with that code point as its numerical value, while other code points are encoded as multi-byte sequences.
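A minimal Python sketch of the single-byte versus multi-byte behaviour described above. The helper name `utf8_bytes` and the sample characters are chosen here for illustration only; nothing in it is specific to GEDCOM.

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


# Illustrative samples: two code points at or below U+007F, then larger ones.
samples = ["A", "\u007f", "\u0080", "é", "ë", "€", "😊"]

for ch in samples:
    encoded = utf8_bytes(ch)
    # Print the code point, the byte count, and the bytes in hex.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

Code points up to U+007F come out as one byte equal to the code point itself; é (U+00E9) comes out as the two bytes `c3 a9`.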

UTF-8 is one of the most widely supported character encodings in the world: it is likely that there is library support for it in whatever language you are using (or a setting for it in your editor if you are typing GEDCOM by hand). I've written multiple GEDCOM files and GEDCOM-handling applications without doing anything more to handle UTF-8 than selecting it as the encoding of the file.
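As a rough illustration of "selecting it as the encoding of the file", here is a Python sketch that writes and re-reads a tiny GEDCOM-like file. The function names, the sample record, and the name "Allebé" are assumptions made for the example, not taken from any real data set or library.

```python
import tempfile
from pathlib import Path

# Hypothetical sample content; the é is typed as the actual character.
SAMPLE = (
    "0 HEAD\n"
    "1 GEDC\n"
    "2 VERS 7.0\n"
    "0 @I1@ INDI\n"
    "1 NAME Allebé /van Harten/\n"
    "0 TRLR\n"
)


def write_gedcom(path: Path, text: str) -> None:
    # The only UTF-8-specific step is naming the encoding when opening the file.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)


def read_gedcom(path: Path) -> str:
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sample.ged"
    write_gedcom(target, SAMPLE)
    # The text survives the round trip unchanged.
    print(read_gedcom(target) == SAMPLE)
```

No per-character handling appears anywhere in the sketch; the `encoding="utf-8"` argument does all the work.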

0 replies

mariannevanharten · 2024-07-27T20:24:38Z

mariannevanharten
Jul 27, 2024
Author

Am I right to say that U+00xx can be replaced by its regular character?

2 replies

tychonievich Jul 27, 2024
Maintainer

I'm not sure what you mean by "regular character." If you mean something like C's char type (a single 8-bit byte) then that only works for U+0000 through U+007F, not for U+0080 or larger code points.

mariannevanharten Jul 28, 2024
Author

I think that you already helped me.
I want to know in which range I can use characters such as ë, á, etc. and when I SHOULD use Unicode.

fisharebest · 2024-07-29T15:57:27Z

fisharebest
Jul 29, 2024

Hi Marianne. I just looked at your site, and I think you are misunderstanding Unicode.

You have entered things such as Alleb%U+00E9 into your data, i.e. the literal characters U, +, 0, 0, E, 9.

U+00E9 is just an unambiguous, human-friendly way to refer to the Unicode character é.

You should always be using the actual characters in your data, such as é.

No special escaping or notation is necessary. As long as the file is saved as UTF-8, these will "just work".
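If data already contains the literal notation, something along these lines could turn it back into real characters. This is only a sketch: the function name `restore_characters` is hypothetical, and it assumes the notation always uses exactly four hex digits and that the preceding `%` (as in Alleb%U+00E9) is part of the marker and should be dropped.

```python
import re

# Assumption: an optional "%" followed by "U+" and exactly four hex digits.
_NOTATION = re.compile(r"%?U\+([0-9A-Fa-f]{4})")


def restore_characters(text: str) -> str:
    """Replace literal 'U+XXXX' notation with the character it names."""

    def _substitute(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # Surrogate code points are not characters; leave them untouched.
        if 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return _NOTATION.sub(_substitute, text)


print(restore_characters("Alleb%U+00E9"))  # prints "Allebé"
```

Text that legitimately mentions a code point, such as a note explaining what U+00E9 means, would also be rewritten, so a pass like this is best reviewed by hand before saving.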

0 replies

mariannevanharten · 2024-07-30T18:02:31Z

mariannevanharten
Jul 30, 2024
Author

Hi Greg,

You are right. Last year, when I started with GEDCOM 7, I changed some characters to the Unicode notation. Now I've changed most of them back to the literal characters.

When checking my GEDCOM file with Chronoplex's GEDCOM Validator, I see many good old 5.5.1 tags (which have 7.0 equivalents). Apart from these "errors", a few other errors related to GEDCOM 7 remain… 😊

Kind regards,
Marianne van Harten

0 replies
