Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed empty ADDR in Gedcom export #1619

Conversation

JeroenVanOort
Copy link

The Gedcom specs don't allow ADDR to be empty. Gramps however, does create ADDR lines, even if there is no street. I fixed that.

@ennoborg
Copy link
Contributor

I don't like this, because the information is still redundant, and there is no logical reason to include an ADDR with a PLAC, ever, even if the GEDCOM standard suggests otherwise.

ADDR records should only be exported when a user specifies a mailing address, and that is not part of our location tree. Synthesizing addresses is time and culture dependent, and should therefore be avoided.

@JeroenVanOort
Copy link
Author

I'm not sure I fully understand. So you are saying Gramps should never export ADDR lines?

I'm not an expert on the possibilities and limitations of Gramps and therefore wouldn't like to make a call on that. Also, I doubt if it would make what I'm trying to do in the PR have less of a value; my changes would at least make more of the exports Gramps makes compliant with GEDCOM specs. If there is a reason to make broader changes, I would consider that a separate topic.

But maybe I'm just not getting what you're trying to say.

@ennoborg
Copy link
Contributor

ennoborg commented Nov 17, 2023

Well, sort of. ADDR lines should be exported when an object has an address. I can specify an address for myself, in the address tab, and that shows a form with street, city, state, zip code, country, telephone number, etc. I can also do that for an archive location, where I see the exact same form.

An event location is not an address, but a single line of information that can and should be exported as a PLAC line. And in modern Gramps, that line is composed from the elements of the place hierarchy, so if there is a street address in there, like there is in my latest residence event, it will appear at the start of that line, where it will be followed by village, province, and country.

The code that I removed says that an optional ADDR structure can be written to the event detail, but most programs don't do that, and it is completely redundant, because it takes information that was already used to build the PLAC line. It also confused the software of Genealogie Online, and the code is also very bad, partly because it's way too American. An example is, that it looks for a city, so it won't export part of my address when I live in a village. And it will also not export a state when I live in a province.

Because of this, I am inclined to say that it's brain dead, and should be completely removed, also because it's redundant. It creates extra work for our friend Bob Coret, and it is useless, because it doesn't add information.

I know programs where it is actually used to add information, and one of those is RootsMagic. That program has a place database that lives outside your tree, and if you use that, the PLAC field will be filled with a hierarchy that starts with your city, town, or other locality. And if you want to add your street address, you are advised to enter that into the Place Details field, that will be exported as a single ADDR line. And when you do that, the ADDR part really adds value. Gramps' fictional ADDR doesn't do that, which is why I think that it should be removed, like I did in this commit:

93120fc

@JeroenVanOort
Copy link
Author

Ok, but how do we get either of our changes merged? Is master the correct target branch for is PR? And shouldn't your changes be in a PR too? I've never seen a commit like yours before: not on a branch, possibly on a fork but even then it's unusual that the commit is visible on the main project.

@ennoborg
Copy link
Contributor

It's been a long time since one of my PR's was accepted, but master is the right place. And I think that my commit was orphaned, because it was somewhere on my 5.1 branch, which I later merged with 5.1 code from upstream, after a pull rebase failed. There are some other changes on my 5.1 branch that didn't make it, like a hack that I made to accept our dd-mm-yyyy style for dates.

It might have been better if I put my changes on feature branches, so that they can be better integrated with the main repo, but I probably was either too lazy, or not git smart enough for that, and there are changes that I want, regardless of what the community thinks of them. And then, things can become a bit messy.

@Nick-Hall
Copy link
Member

Ok, but how do we get either of our changes merged?

A PR is the correct way to submit a change. Large changes should be discussed on the mailing list first, but that doesn't apply in this case.

The Gedcom specs don't allow ADDR to be empty.

The tag isn't actually empty because we use the CONT tag to continue the line. For example, "London, England" would be exported as:

ADDR
  CONT London
  CONT England
CITY London
CTRY England

It would probably be neater to export it as:

ADDR London
  CONT England
CITY London
CTRY England

I think that the reasoning to include both PLAC and ADDR structures was to make the export readable by as many programs as possible. I'm not a Gedcom expert though.

@ennoborg
Copy link
Contributor

ennoborg commented Nov 18, 2023

I think that the reasoning to include both PLAC and ADDR structures was to make the export readable by as many programs as possible. I'm not a Gedcom expert though.

OK, I can understand the idea, but the code that we have does more bad than good, and Jeroen is the 2nd person who approached the Genealogie Online help desk for this, because it leads to duplicate information after import, which is also incomplete, because the ADDR exporter only exports the location elements that are defined as street, city, state, and country, so it ignores towns and villages, and provinces, and so forth. And I know that, when needed, the site owner puts a lot of effort into making sure that GEDCOM files exported by its users' programs are displayed in a proper fashion.

I can understand part of the reasoning when I read the GEDCOM 3 specification, released in Ocrober 1987. which describes the purpose of the ADDR structure as:

Where a person or has resided, does reside,
or receives mail. This includes the street, city,
state, zip code, and any other information needed
for identification. All of this information could be
placed on one line and separated by commas. or
separate lines if a CONTinuation arrangement is
desired.

The definition has expanded a bit, and now includes specific tags for all sorts of things, including phone, email, and www, and today it is even more focused on the idea of a contact address, which is proven by this:

The address structure should be formed as it would appear on a mailing label using the ADDR and
the CONT lines to form the address structure.

And with only the place information at our disposal, as it exists in our database now, there is no way to form an address as it would appear on a mailing label, because that appearance is culture dependent. And this is largely why the GEDCOM definition is still based on lines that are filled by the user, who is assumed to follow the cultural habits of the addressant.

There is a bit of a caveat here, because that same GEDCOM 3 definition says this about places:

Where an event occurred. This includes the city,
county, state, or any other information needed for
identification such as parish, ward, diocese, stake,
or other identifier or equivalent that may be found
in the source document. An address that might be
included in a source record should be identified
with the ADDRess tag.

And that could lead to the suggestion that when PLAC records are restricted to populated places, and looked up in a gazetteer like in RootsMagic, they won't include a street address, and that deficiency could then be repaired by attaching that street address to the head of the PLAC line, which is sort of what RootsMagic's GEDCOM export format expects.

This is not needed in Gramps however, because the street address is automatically included in the PLAC line if it's embedded in a proper hierarchy.

And in theory, the whole problem should be solved by the PLAC.FORM structure, which very few programs actually use.

Note that Gramps has no good support for the ADDR structure now, because what we have is already more structured than the standard can handle. And at the same time, it lacks the free format features that are needed to form an address as it would appear on a mailing label.

@JeroenVanOort
Copy link
Author

The Gedcom specs don't allow ADDR to be empty.

The tag isn't actually empty because we use the CONT tag to continue the line. For example, "London, England" would be exported as:

ADDR
  CONT London
  CONT England
CITY London
CTRY England

This isn't true, at least not on the master branch. Case and point: https://github.com/gramps-project/gramps/blob/master/data/tests/exp_sample_ged.ged#L629

According to GED-inline this is not allowed:

*** Line 629: Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>

Full report
Generated by                   Gramps
Submitted by                   Alex Roitman,,,
Encoding                       UTF-8
GEDCOM version in file         5.5.1
GEDCOM version assumed         5.5.1

Analysis time                  0 seconds to analyse the file (excluding upload time)
Speed                          792 records per second

Lines                    1485  Number of lines in the GEDCOM file
Records                   107  Number of records
Warnings                   47  Total number of warning messages
User-defined               10  Number of lines with user-defined tags

Individuals                52  Number of individuals in the GEDCOM file
Males                      28  Number of males
Females                    23  Number of females
Other                       1  

Families                   16  Number of families
Marriages                  14  Number of marriages
Places                     85  Number of places mentioned (not necessarily unique)
Source records              8  Number of source records

*** Line 78:      Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>
*** Line 134:     Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>
*** Line 230:     Invalid content for ROLE tag: 'The Event type Role' is not a valid <ROLE_IN_EVENT>
*** Line 629:     Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>
*** Line 651:     Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>
*** Line 743:     Invalid content for AGE tag: '23' is not a valid <AGE_AT_EVENT>
*** Line 872:     Tag EVEN has non empty content: A very bad day
*** Line 880:     Invalid content for STAT tag: 'INFANT' is not a valid <LDS_BAPTISM_DATE_STATUS>
*** Line 880:     Mandatory tag DATE not found under STAT
*** Line 887:     Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>
*** Line 887:     Tag CONT occurs too many times, maximum is 3 times
*** Line 926:     Tag PHON is not allowed under INDI
*** Line 927:     Tag PHON is not allowed under INDI
*** Line 928:     Tag EMAIL is not allowed under INDI
*** Line 929:     Tag FAX is not allowed under INDI
*** Line 930:     Tag WWW is not allowed under INDI
*** Line 958:     Invalid content for DATE tag: 'I think 1970 to 1971' is not a valid <DATE_VALUE>
*** Line 965:     Tag MAP is not allowed under PLAC
*** Line 968:     Tag ADDR is not allowed under SLGC
*** Line 978:     Tag FORM is not allowed under OBJE
*** Line 994:     Tag TIME is not allowed under DATE
*** Line 997:     Tag HUSB is not allowed under BIRT
*** Line 999:     Tag WIFE is not allowed under BIRT
*** Line 1002:    Invalid content for OCCU tag: '' missing value for <OCCUPATION>
*** Line 1013:    Invalid content for RESN tag: '' is not a valid <RESTRICTION_NOTICE>
*** Line 1023:    Tag MAP is not allowed under PLAC
*** Line 1026:    Tag ADDR is not allowed under BAPL
*** Line 1030:    Invalid content for STAT tag: 'QUALIFIED' is not a valid <LDS_BAPTISM_DATE_STATUS>
*** Line 1030:    Mandatory tag DATE not found under STAT
*** Line 1036:    Tag CONT occurs too many times, maximum is 3 times
*** Line 1048:    Tag FORM is not allowed under OBJE
*** Line 1050:    Mandatory tag FORM not found under FILE
*** Line 1087:    Invalid content for PEDI tag: 'stepchild' is not a valid <PEDIGREE_LINKAGE_TYPE>
*** Line 1098:    Invalid content for PEDI tag: 'Sponsored' is not a valid <PEDIGREE_LINKAGE_TYPE>
*** Line 1281:    Tag ADDR is not allowed under SLGS
*** Line 1283:    Invalid content for STAT tag: 'CLEARED' is not a valid <LDS_SPOUSE_SEALING_DATE_STATUS>
*** Line 1283:    Mandatory tag DATE not found under STAT
*** Line 1286:    Invalid content for AGE tag: '45' is not a valid <AGE_AT_EVENT>
*** Line 1288:    Invalid content for AGE tag: '15' is not a valid <AGE_AT_EVENT>
*** Line 1293:    Invalid content for EVEN tag: '' missing value for <EVENT_DESCRIPTOR>
*** Line 1296:    Tag RFN is not allowed under FAM
*** Line 1297:    Tag FACT is not allowed under FAM
*** Line 1299:    Tag FACT is not allowed under FAM
*** Line 1389:    Tag CONT occurs too many times, maximum is 3 times
*** Line 1421:    Mandatory tag NAME not found under REPO
*** Line 1422:    Mandatory tag NAME not found under REPO
*** Line 1478:    Invalid content for FILE tag: 'd:\users\prc\documents\gramps\data\tests\O0.jpg' is more than 30 characters, the maximum length for <MULTIMEDIA_FILE_REFERENCE>


Report generated on 20-11-2023 at 09:18 by GED-inline 3.1.2

@Nick-Hall Nick-Hall added the bug label Feb 6, 2024
@Nick-Hall
Copy link
Member

This PR fails to export any address which does not have a street. In such cases I think that we should export the ADDR structure as follows:

ADDR London
  CONT England
CITY London
CTRY England

There is duplication of the information, but the extra tags could be useful for some applications.

For places I tend to agree with Enno, but perhaps we should keep the ADDR tag to export a street only if one exists. This should keep compatibility with applications such as RootsMagic. We can probably remove all ADDR sub-tags for places and possibly use PLAC.FORM instead.

I'd also like to get the opinions of developers such as @prculley and @kulath.

@ennoborg
Copy link
Contributor

ennoborg commented Feb 7, 2024

From here, I think it's nice if we can process the RootsMagic ADDR on import, but exporting ADDR lines derived from location types is a big NO for me, because it's against the current GEDCOM standard, and it will confuse most other pograms, including the big sites like Ancestry, Geneanet, My Heritage, and our national Genealogie Online.

RM is the only program that I know that has a way with it.

@Nick-Hall
Copy link
Member

Replaced with PR #1705. I think that the new PR solves the issues raised here.

@Nick-Hall Nick-Hall closed this Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants