13 changes: 8 additions & 5 deletions .github/copilot-instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,8 @@ make
If the build completes but the `extracted-files/tags/` directory is empty, run the URI extraction manually:
```bash
cd build
python3 uri-def.py ../specification/gedcom*.md ../extracted-files/tags
python3 extract-yaml.py --spec=../specification/ --dest=../extracted-files/
python3 yaml-to-tsv.py --dest=../extracted-files/ ../extracted-files/tags
```

This command generates:
Expand Down Expand Up @@ -94,7 +95,8 @@ mkdir -p ../extracted-files/tags
make

# If tags directory is empty, run URI extraction manually
python3 uri-def.py ../specification/gedcom*.md ../extracted-files/tags
python3 extract-yaml.py --spec=../specification/ --dest=../extracted-files/
python3 yaml-to-tsv.py --dest=../extracted-files/ ../extracted-files/tags

# Verify generated files exist
ls -la ../specification/gedcom.html ../specification/gedcom.pdf
Expand All @@ -117,7 +119,7 @@ The repository has automated workflows that run on pushes and pull requests:
- Creates PRs with updated extracted files if changes detected
- Uses commands:
- `python3 extract-grammars.py ../specification/gedcom*.md ../extracted-files/`
- `python3 uri-def.py ../specification/gedcom*.md ../extracted-files/tags`
- `python3 extract-yaml.py --spec=../specification/ --dest=../extracted-files/`

## Repository Structure

Expand All @@ -144,7 +146,8 @@ The repository has automated workflows that run on pushes and pull requests:
- `hyperlink.py` - Adds hyperlinks to markdown
- `hyperlink-code.py` - Adds hyperlinks to code blocks in HTML
- `extract-grammars.py` - Extracts ABNF and structure grammars
- `uri-def.py` - Extracts tag definitions and generates YAML files
- `extract-yaml.py` - Extracts tag definitions and generates YAML files
- `yaml-to-tsv.py` - Extracts TSV files from YAML files
- `push_to_gedcomio.py` - Uploads to gedcom.io (requires special access)

## Common Development Tasks
Expand Down Expand Up @@ -195,4 +198,4 @@ If build fails:
The build process emits CSS-related warnings from weasyprint - these are normal and documented. Only stop the build for actual errors, not warnings.

### File Publishing
Publishing to gedcom.io requires access to the separate GEDCOM.io repository and is not part of normal development workflows.
Publishing to gedcom.io requires access to the separate GEDCOM.io repository and is not part of normal development workflows.
2 changes: 1 addition & 1 deletion .github/workflows/generate-files.yml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ jobs:

steps:
- name: Check out GEDCOM
uses: actions/checkout@v6
uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0

- name: Get the branch name
id: extract_branch
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/propagate-main-to-v7.1.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ jobs:

steps:
- name: Check out GEDCOM
uses: actions/checkout@v6
uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0

- name: Set git config
env:
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/validate-yaml.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ jobs:

steps:
- name: Checkout GEDCOM
uses: actions/checkout@v6
uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0

- name: Validate YAML
run: yamllint .
2 changes: 1 addition & 1 deletion LICENSE
Original file line number Diff line number Diff line change
Expand Up @@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright 1984-2025 Intellectual Reserve, Inc. All rights reserved. A service provided by The Church of Jesus Christ of Latter-day Saints.
Copyright 1984-2026 Intellectual Reserve, Inc. All rights reserved. A service provided by The Church of Jesus Christ of Latter-day Saints.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Expand Down
2 changes: 1 addition & 1 deletion NOTICE
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
NOTICE:

This work comprises, is based on, or is derived from the FAMILYSEARCH GEDCOM™
Specification, © 1984-2025 Intellectual Reserve, Inc. All rights reserved.
Specification, © 1984-2026 Intellectual Reserve, Inc. All rights reserved.

"FAMILYSEARCH GEDCOM™" and "FAMILYSEARCH®" are trademarks of Intellectual
Reserve, Inc. and may not be used except as allowed by the Apache 2.0 license
Expand Down
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,13 +15,13 @@ If you are looking for FamilySearch's GEDCOM 5.5.1 Java parser, which previously
- `specification/gedcom-`number`-`title`.md` files are the source documents used to define the FamilySearch GEDCOM specification. It is written in pandoc-flavor markdown and is intended to be more easily written than read. It is split into several files (ordered by the integer in their names) to facilitate comparing files.
- In a local check-out, this is also where the build scripts place rendered files `gedcom.html` and `gedcom.pdf`; see [releases](releases/latest) for a pre-rendered copy of these.
- [`specification/terms/`](specification/terms/)
- YAML files to be served in the <https://gedcom.io/terms/v7/> namespace, augmenting those automatically extracted from the specification itself by [`build/uri-def.py`](build/uri-def.py).
- YAML files to be served in the <https://gedcom.io/terms/v7/> namespace, augmenting those automatically extracted from the specification itself by [`build/extract-yaml.py`](build/extract-yaml.py).
- [`build/`](build/) contains files needed to render the specification
- See [`build/README.md`](build/) for more
- [`extracted-files/`](extracted-files/) contains digested information automatically extracted from the specification. All files in this directory are automatically generated by scripts in the [`build/`](build/) directory.
- [`extracted-files/grammar.abnf`](extracted-files/grammar.abnf) contains all the character-level ABNF for parsing lines and datatypes.
- [`extracted-files/grammar.gedstruct`](extracted-files/grammar.gedstruct) contains a custom structure organization metasyntax.
- various `.tsv` files to assist automated validation of files, including:
- various `.tsv` files to assist automated validation of files, extracted from the YAML files by [`build/yaml-to-tsv.py`](build/yaml-to-tsv.py), including:
- [`extracted-files/cardinalities.tsv`](extracted-files/cardinalities.tsv) with columns "superstructure type ID, substructure type ID, cardinality marker"
- [`extracted-files/enumerations.tsv`](extracted-files/enumerations.tsv) with columns "superstructure type ID, enumeration string, enumeration ID"
- [`extracted-files/payloads.tsv`](extracted-files/payloads.tsv) with columns "structure type ID, payload type"
Expand Down
1 change: 1 addition & 0 deletions build/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ $(HTML_FILE): hyperlink-code.py GEDCOM-tmp.html
python3 hyperlink-code.py GEDCOM-tmp.html $(HTML_FILE)

$(TAGDEFS): $(MD_FILES) $(TERMS_FILES) $(EXTDIR)grammar.gedstruct extract-yaml.py
mkdir -p $(TAGDEFS)
python3 extract-yaml.py --spec=$(SPECDIR) --dest=$(EXTDIR)
rsync -au $(TERMS_FILES) $(EXTDIR)tags
python3 yaml-to-tsv.py --dest=$(EXTDIR) $(TAGDEFS)
Expand Down
8 changes: 4 additions & 4 deletions build/extract-yaml.py
Original file line number Diff line number Diff line change
Expand Up @@ -157,12 +157,12 @@ def type_specific(self) -> list[str]:
if val is None: ans.append(key+': null')
elif val == [] or isinstance(val, bool): ans.append(key+': '+str(val).lower())
elif isinstance(val, str):
assert '"' not in val and '\n' not in val, f"Simplified serialization failed for {uri}'s {key}"
assert '"' not in val and '\n' not in val, f"Simplified serialization failed for {self.uri}'s {key}"
ans.append(key+': "'+val+'"')
else:
entry = key+':'
for v in (sorted(val) if key != 'months' else val):
assert '"' not in v and '\n' not in v, f"Simplified serialization failed for {uri}'s {key}"
assert '"' not in v and '\n' not in v, f"Simplified serialization failed for {self.uri}'s {key}"
entry += '\n - "'+v+'"'
ans.append(entry)

Expand Down Expand Up @@ -392,7 +392,7 @@ def do_pfx(uri:str) -> str:
if not uri.startswith('https://gedcom.io'): continue # not ours to define
if uri not in data: data[uri] = Concept('data type', uri)
data[uri].set('label', header)
if re.search(f'^{typename.replace(':','-')} +=', section, flags=re.M):
if re.search(f"^{typename.replace(':','-')} +=", section, flags=re.M):
data[uri].set('abnf_production', typename.replace(':','-'))
data[uri].spec.append(section)

Expand Down Expand Up @@ -436,7 +436,7 @@ def do_pfx(uri:str) -> str:

# step 1: read the files
src_gedstruct = open(Path(args.dest, 'grammar.gedstruct')).read()
src_markdown = '\n\n'.join(open(s).read().replace('\xA0',' ') for s in args.spec.glob('gedcom*.md'))
src_markdown = '\n\n'.join(open(s).read().replace('\xA0',' ') for s in sorted(args.spec.glob('gedcom*.md')))

# step 2: find all tables and convert them to {section header: [{column header: column value}]}
tables = all_tables(src_markdown)
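The `sorted(...)` added above matters because `Path.glob` yields entries in whatever order the filesystem returns them, which is not guaranteed to be alphabetical. A minimal sketch of the same idea, using a temporary directory and a hypothetical helper name `read_spec`:

```python
import tempfile
from pathlib import Path

def read_spec(spec_dir: Path) -> str:
    """Concatenate gedcom*.md files in name order (hypothetical helper)."""
    parts = [
        p.read_text(encoding="utf-8").replace("\xA0", " ")
        for p in sorted(spec_dir.glob("gedcom*.md"))  # deterministic order
    ]
    return "\n\n".join(parts)

with tempfile.TemporaryDirectory() as tmp:
    spec = Path(tmp)
    # Create the files out of order on purpose.
    (spec / "gedcom-1-b.md").write_text("second", encoding="utf-8")
    (spec / "gedcom-0-a.md").write_text("first", encoding="utf-8")
    combined = read_spec(spec)

print(combined)
```

Because the specification files are numbered in their names, sorting by name reproduces the intended document order regardless of platform.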
Expand Down
1 change: 0 additions & 1 deletion build/hyperlink-code.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,6 @@ def anchorify(m):
return full

doc = re.sub(r'<code>(g7:[^<]*)</code></h', r'<code class="uri">\1</code></h', doc)
doc = re.sub(r'<code>(g7.1:[^<]*)</code></h', r'<code class="uri">\1</code></h', doc)

Review comment (Collaborator): This line should not be deleted


chunks = re.split(r'(<pre[^>]*ged(?:struct|com)[^>]*>.*?</pre>)', doc, flags=re.DOTALL)

Expand Down
6 changes: 1 addition & 5 deletions build/hyperlink.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,10 +20,6 @@ def slugify(bit):
si = bit.rfind('`g7:')+4
ei = bit.find('`', si)
slug = bit[si:ei].replace('#','-')
elif '`g7.1:' in bit:
si = bit.rfind('`g7.1:')+6
ei = bit.find('`', si)
slug = bit[si:ei].replace('#','-')
Review comment (Collaborator) on lines -23 to -26: These lines should not be deleted

elif '`' in bit:
bit = re.search('`[A-Z0-9_`.]+`', bit)
slug = bit.group(0).replace('`','').replace('.','-')
Expand Down Expand Up @@ -91,7 +87,7 @@ def abnf(m):
slug = table_tags[m.group(1)]
return linkify(m.group(0), slug)
return m.group(0)
uried = re.sub(r'(?<![\[.`])`g7(?:\.1)?:[-A-Z0-9a-z`._#]+`', repl, line)

Review comment (Collaborator): This change should not be deleted

uried = re.sub(r'(?<![\[.`])`g7:[-A-Z0-9a-z`._#]+`', repl, line)
if istable: return uried
tagged = re.sub(r'(?<![\[.`])`[A-Z0-9`._#]+`', repl, uried)
abnfed = re.sub(r'(?<![\[.`])`([A-Za-z0-9]+)`', abnf, tagged)
Expand Down
11 changes: 9 additions & 2 deletions build/yaml-to-tsv.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,9 +40,10 @@
rows[(sup, uri)] = card
for sub,card in obj['substructures'].items():
if rows.get((uri, sub),card) != card:
raise Error(f"{uri} and {sub} disagree about their mutual cardinality")
raise Exception(f"{uri} and {sub} disagree about their mutual cardinality")
rows[(uri, sub)] = card
with open(Path(args.dest, "cardinalities.tsv"), 'w') as dst:
print('superstructure\tstructure\tcardinality', file=dst)
for row in sorted([k+(v,) for k,v in rows.items()]):
print('\t'.join(row), file=dst)

Expand All @@ -52,10 +53,11 @@
if obj['type'] == 'enumeration set':
for u in obj['enumeration values']:
rows.add((uri,u))
if 'vaue of' in obj:
if 'value of' in obj:
for u in obj['value of']:
rows.add((u,uri))
with open(Path(args.dest, "enumerationsets.tsv"), 'w') as dst:
print('set\tvalue', file=dst)
for row in sorted(rows):
print('\t'.join(row), file=dst)

Expand All @@ -66,6 +68,7 @@
if obj['type'] == 'structure' and 'enumeration set' in obj:
rows.add((uri, obj['enumeration set']))
with open(Path(args.dest, "enumerations.tsv"), 'w') as dst:
print('structure\tset', file=dst)
for row in sorted(rows):
print('\t'.join(row), file=dst)

Expand All @@ -76,6 +79,7 @@
if obj['type'] == 'structure':
rows.add((uri, obj['payload'] or ''))
with open(Path(args.dest, "payloads.tsv"), 'w') as dst:
print('structure\tpayload', file=dst)
for row in sorted(rows):
print('\t'.join(row), file=dst)

Expand All @@ -89,6 +93,8 @@
rows.add((sup, obj['standard tag'], uri))
for tag in obj.get('extension tags',[]):
rows.add((sup, tag, uri))
for tag in obj.get('nonconformant tags',[]):
rows.add((sup, tag, uri))
if len(obj['superstructures']) == 0:
if 'standard tag' in obj:
rows.add(('', obj['standard tag'], uri))
Expand All @@ -100,5 +106,6 @@
for tag in data[sub].get('extension tags',[]):
rows.add((uri, tag, sub))
with open(Path(args.dest, "substructures.tsv"), 'w') as dst:
print('superstructure\ttag\tstructure', file=dst)
for row in sorted(rows):
print('\t'.join(row), file=dst)
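With header rows now written to each TSV file, consumers can address columns by name instead of position. A minimal sketch, assuming the `cardinalities.tsv` header shown in the diff above; the sample row and its URIs are invented for illustration:

```python
import csv
import io

# Invented sample in the shape written by yaml-to-tsv.py after this change.
sample = (
    "superstructure\tstructure\tcardinality\n"
    "https://example.com/SUP\thttps://example.com/SUB\t{0:1}\n"
)

def load_rows(stream):
    """Read a header-bearing TSV stream into a list of dicts."""
    # QUOTE_NONE because the files are written with plain print(), not csv.
    return list(csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE))

rows = load_rows(io.StringIO(sample))
for row in rows:
    print(row["superstructure"], row["structure"], row["cardinality"])
```

Tools that previously read these files positionally would need to skip the first line.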
47 changes: 47 additions & 0 deletions changelog.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,50 @@
# Version 7.0.18

- Fixed typo in the ABNF for the Longitude data type introduced in 7.0.17. Because ABNF is machine-readable, this typo caused some applications using 7.0.17 to fail to parse valid data. 7.0.17 should not be used by applications utilizing the ABNF in the specification in their tooling.

# Version 7.0.17

- Add URI, Latitude, Longitude, and Tag definition data types.

Previously the formats permitted for these were specified in plain text with the corresponding structure types.
Those definitions have been moved to the data types section to better match how other data types are defined in the specification.

- Clarify the deprecation of older extensions that use non-underscore tags.

These violated the standard in both 7.0 and 5.5.1, but exist in the wild and there was unclear text "deprecating" them when they were never supported to begin with. That has been changed to be more clear about when it is an extension-defined substructure and when it violates the specification.

- Clarify how file paths encode non-ASCII characters.

- Clarify rules for pointer-based cycles:

- A cycle asserting someone is their own ancestor (such as being both the `CHIL` and `FAMS` of the same person) is unlikely to be correct, but is permitted by GEDCOM.

- A self-referential `ALIA` (`INDI`.`ALIA` pointing to the `INDI`) is meaningless and prohibited.

- A `SOUR`-`OBJE` cycle (the source of an image is the image itself) is meaningless and prohibited.

- Clarify that extension media types for notes (such as `text/markdown` that several applications are known to employ) do not require extension tags, being covered by the existing standard.

- Clarify the wording of the `ELECTRONIC` enumerated value.

- Clarify the wording of the `AGE` structure generally and `HUSB`.`AGE` and `WIFE`.`AGE` in particular.

- Add example of `PHRASE` used with a non-`OTHER` enumeration value.

- Update UUID definition from RFC 4122 to RFC 9562.

- Remove redundant and confusing references to RFC 3986, which were subsumed by existing references to WHATWG URL.

- Note that `FILE` payloads and GEDZIP file paths follow distinct standards, with the former using percent-escaping but the latter not.

- Note that GEDZIP inherits from zip the ability to have multiple levels of compression, with some suggestions on performance implications of the chosen compression level.

- Note that GEDZIP inherits from zip the ability to encrypt file contents, but not file names or sizes.

- Note how `ALIA` is known to be used by existing applications and users.

- Various typo corrections.
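As an illustration of the self-referential `ALIA` rule in the 7.0.17 notes above, a validator might flag it along the following lines. This is a rough sketch over simplified input (one structure per line, no `CONT` handling, no leading whitespace), and `self_referential_alia` is a hypothetical name, not part of any GEDCOM tooling.

```python
def self_referential_alia(lines):
    """Return xrefs of INDI records with a level-1 ALIA pointing at themselves."""
    offenders = []
    current = None  # xref of the INDI record being scanned, if any
    for line in lines:
        parts = line.split(" ", 2)
        if parts[0] == "0":
            # Record lines look like "0 @I1@ INDI"; anything else resets state.
            is_indi = len(parts) == 3 and parts[2] == "INDI"
            current = parts[1] if is_indi else None
        elif (parts[0] == "1" and current is not None
              and len(parts) == 3 and parts[1] == "ALIA"
              and parts[2] == current):
            offenders.append(current)
    return offenders

sample = [
    "0 HEAD",
    "0 @I1@ INDI",
    "1 ALIA @I1@",   # points at its own record: prohibited
    "0 @I2@ INDI",
    "1 ALIA @I1@",   # points at a different record: allowed
    "0 TRLR",
]
flagged = self_referential_alia(sample)
print(flagged)
```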

# Version 7.0.16

- Recommend that `ASSO` not be used to replicate other standard structures.
Expand Down
7 changes: 3 additions & 4 deletions specification/gedcom-0-introduction.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
---
title: The FamilySearch GEDCOM Specification
subtitle: 7.0.16
subtitle: 7.0.18
email: GEDCOM@FamilySearch.org
copyright: |
:::{style="page-break-after: always;page-break-before: always;"}
Copyright 1984–2025 Intellectual Reserve, Inc. All rights reserved. A service provided by The Church of Jesus Christ of Latter-day Saints.
Copyright 1984–2026 Intellectual Reserve, Inc. All rights reserved. A service provided by The Church of Jesus Christ of Latter-day Saints.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Expand All @@ -20,7 +20,7 @@ copyright: |

> NOTICE:
>
> This work comprises, is based on, or is derived from the FAMILYSEARCH GEDCOM™ Specification, © 1984-2025 Intellectual Reserve, Inc. All rights reserved.
> This work comprises, is based on, or is derived from the FAMILYSEARCH GEDCOM™ Specification, © 1984-2026 Intellectual Reserve, Inc. All rights reserved.
>
> "FAMILYSEARCH GEDCOM™" and "FAMILYSEARCH®" are trademarks of Intellectual Reserve, Inc. and may not be used except as allowed by the Apache 2.0 license that governs this work or as expressly authorized in writing and in advance by Intellectual Reserve, Inc.
:::
Expand Down Expand Up @@ -147,7 +147,6 @@ is shorthand for a URI beginning with the corresponding URI prefix
| Short Prefix | URI Prefix |
|:-------------|:------------------------------------|
| `g7` | `https://gedcom.io/terms/v7/` |
| `g7.1` | `https://gedcom.io/terms/v7.1/` |
| `xsd` | `http://www.w3.org/2001/XMLSchema#` |
| `dcat` | `http://www.w3.org/ns/dcat#` |

Expand Down
12 changes: 7 additions & 5 deletions specification/gedcom-1-hierarchical-container-format.md
Original file line number Diff line number Diff line change
Expand Up @@ -176,9 +176,9 @@ The tag `ADOP` is used in this document to represent two structure types.
Which one is meant can be identified by the superstructure type as follows:

| Superstructure type | Structure type identified by tag `ADOP` |
|---------------------|-----------------------------------------|
| `g7.1:record-INDI` | `g7:ADOP` |
| `g7:ADOP-FAMC` | `g7:FAMC-ADOP` |
Review comment (Collaborator) on lines -180 to -181: These changes should not be deleted

|------------------|------------------|
| `g7:record-INDI` | `g7:ADOP` |
| `g7:ADOP-FAMC` | `g7:FAMC-ADOP` |

An [extension-defined substructure](#extensions) could also be used to place either of these structure types in extension superstructures.

Expand Down Expand Up @@ -319,7 +319,7 @@ Extensions cannot change existing meanings, cardinalities, or calendars.
A **tagged extension structure** is a structure whose tag matches production `extTag`. Tagged extension structures may appear as records or substructures of any other structure. Their meaning is defined by their tag, as is discussed more fully in the section [Extension Tags].

Any substructure of a tagged extension structure that uses a tag matching `stdTag` is an **extension-defined substructure**.
Substructures of an extension-defined substructure that uses a tag matching `stdTag` are also extension-defined substructures, but this specification deprecates using a `stdTag` with a definition that does not match any standard type with that tag.
Substructures of an extension-defined substructure that uses a tag matching `stdTag` are also extension-defined substructures.
The meaning and use of each extension-defined substructure is defined by the tagged extension structure it occurs within, not by its tag alone nor by this specification.
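To make the `stdTag` and `extTag` distinction concrete, the sketch below classifies tags with regular expressions. It assumes the productions have roughly the form `stdTag = ucletter *tagchar` and `extTag = "_" 1*tagchar`, with `tagchar` being an uppercase letter, digit, or underscore; check the specification's ABNF before relying on these patterns.

```python
import re

# Assumed shapes of the ABNF productions; verify against grammar.abnf.
STD_TAG = re.compile(r"[A-Z][A-Z0-9_]*")
EXT_TAG = re.compile(r"_[A-Z0-9_]+")

def classify(tag: str) -> str:
    """Return 'extTag', 'stdTag', or 'invalid' for a candidate tag."""
    if EXT_TAG.fullmatch(tag):
        return "extTag"
    if STD_TAG.fullmatch(tag):
        return "stdTag"
    return "invalid"

for tag in ["_LOC", "DATE", "date"]:
    print(tag, classify(tag))
```

Under these assumptions, a `DATE` nested inside `_LOC` is tagged with a `stdTag` yet is still an extension-defined substructure, because its meaning comes from `_LOC`.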

:::example
Expand All @@ -343,7 +343,9 @@ deprecated.
- Even though both `DATE`s appear to have `g7:type-DATE` payloads, we can't know that is the intended data type without consulting the defining specifications of `_LOC` and `_POP`, respectively. The first might be a `g7:type-DATE#period` and the second a `g7:type-DATE#exact`, for example.
:::

If an extension-defined substructure has a tag that is also used by one or more standard structures, its meaning and payload type should match at least one of those standard structure types.
Extension-defined substructures should match the structure type, payload, and substructure collection of at least one
standard type with the same tag, though it can add more substructures to the substructure collection.
This specification deprecates using a `stdTag` with a definition that does not match any standard type with that tag.

:::example
An extension-defined substructure with tag "`DATE`" should provide a date or date period relevant to its superstructure, as do all `DATE`-tagged structures in this specification. Extensions should not use "`DATE`" to tag a structure describing anything else (even something that might reasonably be abbreviated "date", such as someone an individual dated).
Expand Down