From 3900bf2c9e7ab15ae396365d2a4faaa167cf8e3d Mon Sep 17 00:00:00 2001
From: "Tineke Kosmis (Granny10)" <33637841+mother10@users.noreply.github.com>
Date: Fri, 15 Aug 2025 14:35:35 +0200
Subject: [PATCH 01/11] Draft v8.0 TEMPLATE, STICKY, FLEX, ASSET, GROUP, PROOF, alongside V7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This PR models new structures for GEDCOM V8, to try and take it to a higher level.
It is based on TEMPLATEs and STICKYs. They describe the evidence in documents, on gravestones, in a census, and in any other evidence "entity" that can hold genealogical information.
It has 3 huge enumeration tables, so it becomes a bit "data driven".

It is structured alongside GEDCOM 7.x (for now) to help users and developers get used to this new system. But before implementation the group might decide to disconnect it from V7.x entirely and start in a new direction. In that case, changes have to be applied to make this draft work.

While designing, time and effort have been spent to make it extendable, reliable, easily changeable, logical, solid, and easy for users and developers.
It needs polishing, refinements, and help from the group to find loose ends.

It has a huge extra file with examples. They are not there for fun (although some have some fun inside); they are there as a testbed. I used it heavily to make sure things would fit.
But I now want to show it all; I can hardly wait.
---
 .../gedcom-1-hierarchical-container-format.md |   35 +
 specification/gedcom-2-data-types.md          |  180 +
 .../gedcom-3-structures-1-organization.md     | 1619 ++++-
 .../gedcom-3-structures-3-meaning.md          |   73 +
 ...dcom-3-structures-4-enumerations-entity.md |  336 +
 ...dcom-3-structures-4-enumerations-events.md |  442 ++
 ...edcom-3-structures-4-enumerations-roles.md |  451 ++
 .../gedcom-3-structures-4-enumerations.md     |  270 +
 .../gedcom-6-appendix-examples-sticky.md      | 6130 +++++++++++++++++
 9 files changed, 9502 insertions(+), 34 deletions(-)
 create mode 100644 specification/gedcom-3-structures-4-enumerations-entity.md
 create mode 100644 specification/gedcom-3-structures-4-enumerations-events.md
 create mode 100644 specification/gedcom-3-structures-4-enumerations-roles.md
 create mode 100644 specification/gedcom-6-appendix-examples-sticky.md

diff --git a/specification/gedcom-1-hierarchical-container-format.md b/specification/gedcom-1-hierarchical-container-format.md
index ab1ef74b..175eb631 100644
--- a/specification/gedcom-1-hierarchical-container-format.md
+++ b/specification/gedcom-1-hierarchical-container-format.md
@@ -286,6 +286,41 @@ The following are examples of valid but unrelated lines:
 ````
 :::
 
+## Comments – Inline and standalone comments
+
+GEDCOM 8 supports comments using the `/* ... */` syntax, inspired by common programming languages.
+A comment may appear either:
+
+- **On its own line**, e.g.: `/* This is a comment */`
+- **At the end of a GEDCOM line**, preceded by **at least 1 space**, e.g.: `0 STICKY @ST0140@ /* PERSON, GROOM */`
+
+Rules:
+- Comments must begin with `/*` and end with `*/`, on the same line.
+- Comments may not be nested.
+- Comments are ignored by parsers and have no effect on validation.
+- Comments are intended for human readers and may be used freely to annotate records.
+- Inline comments (i.e., following a GEDCOM line) must be preceded by **at least one space**.
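The comment rules listed above can be sketched as a small helper. This is a minimal sketch, not part of the draft: the function name `split_comment` is hypothetical, and it assumes that the first `/*` on a line always starts a comment (the draft does not say how a literal `/*` inside a payload would be handled).

```python
from typing import Optional, Tuple


def split_comment(line: str) -> Tuple[str, Optional[str]]:
    """Split one line into (content, comment); comment is None if absent.

    Assumption (not stated in the draft): the first "/*" starts the comment.
    """
    text = line.rstrip("\r\n")
    start = text.find("/*")
    if start == -1:
        return text, None  # no comment on this line
    content, rest = text[:start], text[start + 2:].rstrip()
    # Rule: a comment must be closed on the same line.
    if not rest.endswith("*/"):
        raise ValueError("comment must end with */ on the same line")
    body = rest[:-2]
    # Rule: comments may not be nested.
    if "/*" in body or "*/" in body:
        raise ValueError("comments may not be nested")
    # Rule: an inline comment needs at least one space before it.
    if content.strip() and not content.endswith(" "):
        raise ValueError("inline comment must be preceded by a space")
    return content.rstrip(), body.strip()
```

Raising an error for the illegal forms is a design choice of this sketch; a tolerant reader could instead keep such lines unchanged and report a warning.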
+
+Many of the comments in this specification could be autogenerated by software, because they very often contain:
+- name of a person
+- event type
+- roles of a person
+
+So it is not always necessary for users to write them by hand.
+
+:::example
+
+````gedcom
+2 STICKY @ST0140@ /* PERSON, GROOM */
+/* Here we have a comment */
+````
+Illegal use:
+````gedcom
+2 STICKY @ST0140@/* PERSON, GROOM */
+/* Here we have a /*comment*/ */
+````
+:::
+
 ## The Header and Trailer {#the-header}
 
 Every dataset must begin with a header pseudo-structure and end with a trailer pseudo-structure.
diff --git a/specification/gedcom-2-data-types.md b/specification/gedcom-2-data-types.md
index be8641a5..01103fa1 100644
--- a/specification/gedcom-2-data-types.md
+++ b/specification/gedcom-2-data-types.md
@@ -167,6 +167,141 @@ The URI for the `DateExact` data type is `g7:type-Date#exact`.
 
 The URI for the `DatePeriod` data type is `g7:type-Date#period`.
 
+## Boolean
+
+A `Boolean` is a binary truth value that indicates a positive or negative condition, choice, or state.
+
+GEDCOM 8 SHALL use the values `TRUE` and `FALSE`, in all uppercase, consistent with other enumerated values in the specification.
+These values MUST appear in uppercase and without quotation marks.
+
+```abnf
+Boolean = "TRUE" / "FALSE"
+```
+
+**Examples**:
+- `TRUE` → value or condition applies
+- `FALSE` → value or condition does not apply
+
+This data type may be used in `FLEX.CONTENTS`, or any other structure where an on/off or yes/no condition is required.
+
+**Note**: Values such as `YES`, `NO`, `1`, or `0` are NOT valid Boolean representations in GEDCOM 8.
+
+The URI for the Boolean data type is: `xsd:boolean`
+
+## Decimal
+
+A decimal is a non-empty sequence of ASCII decimal digits that may include an ASCII period (".") to indicate the decimal point. It represents a rational number in base-10 and can have any number of digits after the decimal point, within the overall digit limit given below. Leading zeros have no semantic meaning and should be omitted unless immediately preceding the decimal point.
+
+```abnf
+Decimal = 1*DIGIT [ "." 1*DIGIT ]
+```
+The number of digits before the decimal point plus the number of digits after it must not exceed 18 in total.
+Negative Decimal numbers are not supported by this specification.
+
+The URI for the `Decimal` data type is `xsd:decimal`.
+
+## HexaDecimal
+
+A `HexaDecimal` is a non-empty sequence of ASCII hexadecimal digits, representing an unsigned integer in base-16.
+
+Characters MUST be chosen from `0–9` and uppercase letters `A–F`.
+Letters **must be uppercase** in GEDCOM files to ensure consistency and compatibility across systems.
+
+Leading zeros are permitted but have no semantic meaning.
+Negative values are not supported.
+
+This type may be used for scalable, compact identifiers where decimal values become inefficient or overly long.
+
+```abnf
+Hexadecimal = 1*( %x30-39 / %x41-46 ) ; 0–9 or A–F (uppercase only)
+```
+:::example
+**Examples**:
+- `1`
+- `7F`
+- `00A3`
+- `00F4240` (decimal 1,000,000)
+- `2DC6C0` (decimal 3,000,000)
+:::
+
+The URI for the `HexaDecimal` data type is:
+`xsd:hexBinary` (loosely aligned; GEDCOM format is numeric-only, without spacing or byte-grouping).
+
+## HexID
+
+A `HexID` is a cross-reference identifier used to uniquely identify a record within a GEDCOM file using a hexadecimal format.
+This format can be used when the quantity of records (such as STICKYs) may exceed the limitations of base-10 padded numbering.
+
+It is composed of:
+- A record-type prefix (e.g., `ST`, `I`, `R`)
+- A required separator digit `0` (zero)
+- A `Hexadecimal` number that can grow in length as needed
+
+The entire identifier is enclosed in `@` symbols, consistent with GEDCOM cross-reference ID syntax.
+
+```abnf
+HexID = "@" Prefix "0" Hexadecimal "@"
+Prefix = %x41-5A / 2(%x41-5A) ; one or two uppercase letters A–Z
+```
+:::example
+**Examples**:
+- `@ST00001@`
+- `@ST003E8@`
+- `@ST0F4240@` (decimal 1,000,000)
+- `@R02DC6C0@` (REPO example, decimal 3,000,000)
+:::
+
+- The `Hexadecimal` portion MUST conform to the rules in the `Hexadecimal` data type definition.
+- The record-type prefix MUST consist of exactly one or two uppercase ASCII letters (A–Z).
+This includes standard GEDCOM record types (S, I, F, etc.) and new custom types such as `ST` for `STICKY` or `SP` for `SPLAC`.
+
+The URI for the `HexID` type is: `g8:HexID`
+_This format is GEDCOM-specific and has no standard XML Schema equivalent._
+
+## PercentLiteral
+
+A percent is a rational number expressed in base-10 with an optional decimal point, representing a proportion out of 100. It includes the percent symbol (`%`). Values should be between `0` and `100`, inclusive. Precision may extend to decimal places, but the total number of digits before and after the decimal point **must not exceed 6 digits**.
+
+```abnf
+Percent = ( "0" / %x31-39 *DIGIT ) [ "." 1*DIGIT ] "%"
+```
+- Leading zeros have no semantic meaning and should be avoided unless immediately preceding a decimal point (e.g., 0.5% is valid).
+- Values over 100% are not permitted.
+- Negative values are not permitted.
+
+Examples:
+Valid: `25%, 0.5%, 99.99%, 100%`
+Invalid: `-5%, 105%, 001.00%, 12.0`
+
+### Genealogical use cases:
+
+- Degree of confidence in a transcription: 95%
+- Estimated certainty of match: 87.5%
+- Percentage of form completion: 100%
+
+The URI for the Percent data type is: `g8:type-PercentLiteral`
+
+## PercentRatioLiteral
+
+This data type expresses ratios or comparisons using a percentage beyond 100%. It is suitable for contexts where the value exceeds a full unit, typically for historical, economic, or biological scaling.
+Negative values are not permitted.
+
+```abnf
+PercentRatioLiteral = 1*DIGIT [ "." 1*DIGIT ] "%"
+```
+:::example
+Valid: `110%, 150.75%, 220.5%`
+Invalid: `-10%, 2, 001.0%`
+:::
+
+### Genealogical use cases:
+
+- Increase in estate value compared to previous decade: 175%
+- Year-on-year increase in property tax: 122.5%
+- Dowry size compared to average: 300%
+
+The URI for the PercentRatio data type is: `g8:type-PercentRatioLiteral`
+
 ## Time
 
 Time is represented on a 24-hour clock (for example, 23:00 rather than 11:00 PM).
@@ -370,3 +505,48 @@ Version 7.0 only supports the following URLs:
 Additional URLs may be supported in future versions of this specification.
 
 The URI for the `FilePath` data type is `g7:type-FilePath`.
+
+### TupleList Data Type
+
+The `TupleList` data type represents a structured, multi-line encoding of simple two-column tabular data.
+It is designed for storing genetic data such as DNA marker/allele pairs in a readable and consistent format.
+
+The data type `TupleList` represents a list of structured value pairs, each encoded on a **separate GEDCOM CONT line**, in tuple format:
+
+```
+(rs123456, A)
+(rs789101, C)
+(rs999999, GG)
+```
+
+Each line must contain exactly one tuple:
+- A **label** (typically a marker ID like `rs123456`)
+- A **value** (typically an allele value like `A`, `T`, `GG`, etc.)
+
+These are wrapped in parentheses `()` and separated by a **comma** `,`.
+A single optional space **may appear after the comma**.
+
+```abnf
+TupleList = Tuple *( Newline Tuple )
+Tuple = "(" Label "," [ SP ] Value ")"
+Label = 1*VCHAR ; Must not contain comma, newline, or parenthesis
+Value = 1*VCHAR ; Must not contain comma, newline, or parenthesis
+Newline = CRLF / LF / CR
+SP = %x20 ; ASCII space
+```
+
+All tuples must follow this structure.
+The format does **not** allow blank lines, comments, or nested groups.
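The `TupleList` grammar above can be checked with a short parser. This is a sketch under stated assumptions, not a reference implementation: the name `parse_tuple_list` is hypothetical, the input is assumed to be the payload after `CONT` lines have been joined (with no trailing newline), and labels and values are limited to visible ASCII minus comma and parentheses, which is how the comments in the ABNF are read here.

```python
import re
from typing import List, Tuple

# VCHAR (%x21-7E) minus "(", ")" and ",": an assumed reading of the ABNF comments.
_ATOM = r"[\x21-\x27\x2A\x2B\x2D-\x7E]+"
# "(" Label "," [ SP ] Value ")" with at most one space after the comma.
_TUPLE = re.compile(r"\((" + _ATOM + r"), ?(" + _ATOM + r")\)")
# Newline = CRLF / LF / CR
_NEWLINE = re.compile(r"\r\n|\n|\r")


def parse_tuple_list(payload: str) -> List[Tuple[str, str]]:
    """Parse a TupleList payload into (label, value) pairs.

    Raises ValueError on blank lines or lines that are not exactly one tuple.
    """
    pairs = []
    for number, line in enumerate(_NEWLINE.split(payload), start=1):
        match = _TUPLE.fullmatch(line)
        if match is None:
            raise ValueError(f"line {number}: not a valid tuple: {line!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs
```

Because every line must match the tuple pattern in full, blank lines, comments, and nested groups are all rejected by the same check.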
+
+The URI for this data type is:
+`g8:type-TupleList`
+
+> $\color{Coral}\large{\textsf{✨ Forward Compatibility Note:}}$
+> Future versions of GEDCOM may allow compact representations such as:
+> - Semicolon-separated tuples on a single line
+> `[(rs123456, A); (rs789101, C)]`
+> - Tab-delimited formats resembling CSV or raw DNA exports
+> This v8 format ensures maximum compatibility and readability for current software, while keeping open the door for compressed imports/exports.
+
+> 📎 **See Also:** The structure and usage of this data type are demonstrated in the `ARRAY_STRUCTURE` element.
+
diff --git a/specification/gedcom-3-structures-1-organization.md b/specification/gedcom-3-structures-1-organization.md
index a153daae..36fdad46 100644
--- a/specification/gedcom-3-structures-1-organization.md
+++ b/specification/gedcom-3-structures-1-organization.md
@@ -110,22 +110,39 @@ with any `RECORD`s in between.
 ```gedstruct
 [
+n <> {1:1}
+|
 n <> {1:1}
 |
+n <> {1:1}
+|
 n <> {1:1}
 |
 n <> {1:1}
 |
+n <> {1:1}
+|
 n <> {1:1}
 |
 n <> {1:1}
 |
 n <> {1:1}
 |
+n <> {1:1}
+|
 n <> {1:1}
+|
+n <> {1:1}
 ]
 ```
+***
+$\color{Coral}\large{\textbf{GEDCOM 8:}}$
+
+>- **This draft PR is written so that the new structures can sit *alongside* GEDCOM 7.x. For that reason, this GEDCOM 8 specification adds lines to existing GEDCOM 7 parts.**
+>- **If the GEDCOM group decides to adopt this PR as a fresh start, without the old GEDCOM 7.x constructions, a few parts of this draft have to be changed to remove some of the structures and records of the current GEDCOM 7.x.**
+***
+
 #### `HEADER` :=
 
 ```gedstruct
@@ -134,28 +151,30 @@ n HEAD {1:1} g7:HEAD
 +2 VERS {1:1} g7:GEDC-VERS
 +1 SCHMA {0:1} g7:SCHMA
 +2 TAG {0:M} g7:TAG
- +1 SOUR {0:1} g7:HEAD-SOUR
- +2 VERS {0:1} g7:VERS
- +2 NAME {0:1} g7:NAME
- +2 CORP {0:1} g7:CORP
+ +1 SOUR {1:1} g7:HEAD-SOUR
+ +2 VERS {1:1} g7:VERS
+ +2 NAME {1:1} g7:NAME
+ +2 CORP {1:1} g7:CORP
 +3 <> {0:1}
 +3 PHON {0:M} g7:PHON
- +3 EMAIL {0:M} g7:EMAIL + +3 EMAIL {1:M} g7:EMAIL +3 FAX {0:M} g7:FAX - +3 WWW {0:M} g7:WWW + +3 WWW {1:M} g7:WWW +2 DATA {0:1} g7:HEAD-SOUR-DATA +3 DATE {0:1} g7:DATE-exact +4 TIME