fastgedcom.parser

Functions to parse gedcom files into a Document.

On import, the module registers the ansel and gedcom codecs from the ansel Python library.

Module Contents

Classes

ParsingWarning

Base warning class.

LineParsingWarning

Warn about a line with a single word.

DuplicateXRefWarning

Warn about a cross-reference identifier that is defined twice.

LevelInconsistencyWarning

Warn about a line without a correct parent line.

LevelParsingWarning

Warn about an unparsable line level.

EmptyLineWarning

Warn about an empty line.

CharacterInsteadOfLineWarning

Warn about the presence of a 1-character-long line.

Functions

parse(→ tuple[fastgedcom.base.Document, ...)

Parse the text input to create the Document object.

guess_encoding(→ str | None)

Return the guessed encoding of the file. None if unknown.

strict_parse(→ fastgedcom.base.Document)

Open and parse the gedcom file.

Attributes

IS_ANSEL_INSTALLED

fastgedcom.parser.IS_ANSEL_INSTALLED = False

exception fastgedcom.parser.ParsingError

Bases: Exception

Error raised by strict_parse().

exception fastgedcom.parser.NothingParsed

Bases: ParsingError

Raised by strict_parse() when the resulting document is empty.

class fastgedcom.parser.ParsingWarning

Base warning class.

class fastgedcom.parser.LineParsingWarning

Bases: ParsingWarning

Warn about a line with a single word. There should be at least a line level and a tag.

line_number: int
line_content: str

class fastgedcom.parser.DuplicateXRefWarning

Bases: ParsingWarning

Warn about a cross-reference identifier that is defined twice.

xref: fastgedcom.base.XRef

class fastgedcom.parser.LevelInconsistencyWarning

Bases: ParsingWarning

Warn about a line without a correct parent line.

line_number: int

class fastgedcom.parser.LevelParsingWarning

Bases: ParsingWarning

Warn about an unparsable line level.

line_number: int

class fastgedcom.parser.EmptyLineWarning

Bases: ParsingWarning

Warn about an empty line.

line_number: int

class fastgedcom.parser.CharacterInsteadOfLineWarning

Bases: ParsingWarning

Warn about the presence of a 1-character-long line. This happens when the parsed object is an iterable of characters, whereas an iterable of lines is expected.

line_number: int
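
The difference between an iterable of characters and an iterable of lines can be seen with plain Python, independently of fastgedcom. The sample text below is made up for illustration only.

```python
import io

# Hypothetical three-line gedcom text, used only for illustration.
text = "0 HEAD\n1 CHAR UTF-8\n0 TRLR\n"

# Iterating over a str yields single characters: this is the situation
# that the warning describes.
first_items_from_str = list(text)[:3]

# These iterables yield whole lines instead.
lines_from_split = text.splitlines()
lines_from_stream = list(io.StringIO(text))  # keeps the trailing "\n", like a file object

print(first_items_from_str)  # ['0', ' ', 'H']
print(lines_from_split)      # ['0 HEAD', '1 CHAR UTF-8', '0 TRLR']
```

An open text file iterates line by line in the same way as the StringIO stream above.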

fastgedcom.parser.parse(lines: Iterable[str]) → tuple[fastgedcom.base.Document, list[ParsingWarning]]

Parse the text input to create the Document object.

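List of possible ParsingWarning:

- CharacterInsteadOfLineWarning
- EmptyLineWarning
- LevelParsingWarning
- LevelInconsistencyWarning
- LineParsingWarning
- DuplicateXRefWarning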

Only CharacterInsteadOfLineWarning stops the parsing. If other warnings occur, the parsing continues with the next line.
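
A minimal sketch of how the returned pair might be used, based only on the signature above. The helper names `count_warnings_by_type` and `parse_file` are hypothetical, and the default encoding "utf-8-sig" is an assumption about the input file, not something parse() requires.

```python
from collections import Counter
from typing import Iterable


def count_warnings_by_type(warnings: Iterable[object]) -> Counter:
    """Count warnings by class name, e.g. Counter({'EmptyLineWarning': 2})."""
    return Counter(type(warning).__name__ for warning in warnings)


def parse_file(path: str, encoding: str = "utf-8-sig"):
    """Parse a gedcom file and summarize the warnings (hypothetical helper)."""
    # Imported here so that the helper above stays usable without fastgedcom.
    from fastgedcom.parser import parse

    # An open text file is an iterable of lines, which is what parse() expects.
    with open(path, "r", encoding=encoding) as file:
        document, warnings = parse(file)
    return document, count_warnings_by_type(warnings)
```

Because parsing continues after most warnings, a summary by type like this can help decide whether the resulting document is trustworthy enough to use.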

fastgedcom.parser.guess_encoding(file: str | pathlib.Path) → str | None

Return the guessed encoding of the file. None if unknown.

A gedcom file should specify its encoding in the header under the tag CHAR.

However, the indications in that field are often misleading or incomplete. For example:

- ANSEL refers to the gedcom version of the ansel charset.
- The use of a BOM is recommended, but it is not stated in the field and not automatically handled by Python.
- UNICODE refers to UTF-16.
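
To illustrate the BOM point, here is a small sketch of BOM detection using only the standard library. It is not the implementation of guess_encoding(); the function name `sniff_bom` is hypothetical, and the codec names returned are the Python codecs that consume the corresponding BOM when decoding.

```python
import codecs
from typing import Optional

# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(path: str) -> Optional[str]:
    """Return a codec name if the file starts with a known BOM, else None."""
    with open(path, "rb") as file:
        head = file.read(4)  # the longest BOM is 4 bytes
    for bom, codec_name in _BOMS:
        if head.startswith(bom):
            return codec_name
    return None
```

A file without a BOM yields None here, in which case the CHAR field of the header remains the only hint.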

fastgedcom.parser.strict_parse(file: str | pathlib.Path) → fastgedcom.base.Document

Open and parse the gedcom file. Return the Document representing the gedcom file.

Raise ParsingError when an error occurs in the parsing process. Raise NothingParsed when the input is empty or isn’t gedcom.
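
A possible way to handle the two exceptions, sketched from the descriptions above. The wrapper name `load_gedcom` and the printed messages are hypothetical.

```python
def load_gedcom(path):
    """Return the parsed Document, or None if parsing failed (hypothetical helper)."""
    # Imported here so that the sketch can be read and defined without fastgedcom.
    from fastgedcom.parser import NothingParsed, ParsingError, strict_parse

    try:
        return strict_parse(path)
    except NothingParsed:
        # NothingParsed derives from ParsingError, so it must be caught first.
        print(f"{path}: empty input or not a gedcom file")
    except ParsingError as error:
        print(f"{path}: parsing failed: {error}")
    return None
```

Catching NothingParsed separately makes it possible to tell an empty or non-gedcom input apart from a file that is gedcom but malformed.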