fastgedcom.parser

Functions to parse gedcom files into Document.

On module import, register the ansel and gedcom codecs from the ansel Python library.

Module Contents

Classes

ParsingWarning

Base warning class.

LineParsingWarning

Warn about a line with a single word.

DuplicateXRefWarning

Warn about a cross-reference identifier that is defined twice.

LevelInconsistencyWarning

Warn about a line without a correct parent line.

LevelParsingWarning

Warn about an unparsable line level. Failed to parse it to an integer.

EmptyLineWarning

Warn about an empty line.

CharacterInsteadOfLineWarning

Warn about the presence of a 1-character-long line.

Functions

parse(→ tuple[fastgedcom.base.Document, ...)

Parse the text input to create a Document object.

guess_encoding(→ str | None)

Return the guessed encoding of the file. None if unknown.

strict_parse(→ fastgedcom.base.Document)

Open and parse the gedcom file.

Attributes

IS_ANSEL_INSTALLED

fastgedcom.parser.IS_ANSEL_INSTALLED = False[source]
class fastgedcom.parser.ParsingWarning[source]

Base warning class.

class fastgedcom.parser.LineParsingWarning[source]

Bases: ParsingWarning

Warn about a line with a single word. There should be at least a line level and a tag.

line_number: int[source]
line_content: str[source]
class fastgedcom.parser.DuplicateXRefWarning[source]

Bases: ParsingWarning

Warn about a cross-reference identifier that is defined twice.

xref: fastgedcom.base.XRef[source]
class fastgedcom.parser.LevelInconsistencyWarning[source]

Bases: ParsingWarning

Warn about a line without a correct parent line.

line_number: int[source]
line_content: str[source]
class fastgedcom.parser.LevelParsingWarning[source]

Bases: ParsingWarning

Warn about an unparsable line level. Failed to parse it to an integer.

line_number: int[source]
line_content: str[source]
class fastgedcom.parser.EmptyLineWarning[source]

Bases: ParsingWarning

Warn about an empty line.

line_number: int[source]
class fastgedcom.parser.CharacterInsteadOfLineWarning[source]

Bases: ParsingWarning

Warn about the presence of a 1-character-long line. This happens when the object parsed is an iterable of characters, whereas an iterable of lines is expected.

line_number: int[source]
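The situation described for CharacterInsteadOfLineWarning typically arises when a whole string is passed where lines are expected. The snippet below uses only the standard library to show the difference; the sample text and variable names are illustrative and not part of fastgedcom.

```python
# A small gedcom-like text held in a single string (illustrative data).
text = "0 HEAD\n1 CHAR UTF-8\n0 TRLR\n"

# Iterating over a str yields one character at a time,
# which is the input shape the warning is about.
as_characters = list(text)

# Splitting first yields one item per line, which is the expected shape.
as_lines = text.splitlines()

print(as_characters[:3])
print(as_lines)
```

Passing something like `as_lines` (or an open text file, which also iterates line by line) should avoid this warning.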
fastgedcom.parser.parse(lines: Iterable[str]) → tuple[fastgedcom.base.Document, list[ParsingWarning]][source]

Parse the text input to create a Document object.

When a malformed line is encountered, a warning is created and the parsing continues with the next line. Only CharacterInsteadOfLineWarning stops the parsing. For LevelInconsistencyWarning, the line is still inserted in the tree.

Return the Document and the list of ParsingWarning encountered.
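The warn-and-continue behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the fastgedcom implementation: `SketchWarning` and `sketch_parse` are hypothetical names, the result is a flat list rather than a Document, and the 1-based line numbering is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SketchWarning:
    """Hypothetical stand-in for a ParsingWarning subclass."""
    kind: str
    line_number: int


def sketch_parse(lines):
    """Return (records, warnings); malformed lines are skipped, not fatal."""
    records, warnings = [], []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            warnings.append(SketchWarning("empty line", number))
            continue
        parts = line.split(" ", 1)
        if len(parts) < 2:
            # A level and a tag are both required.
            warnings.append(SketchWarning("single word", number))
            continue
        try:
            level = int(parts[0])
        except ValueError:
            warnings.append(SketchWarning("unparsable level", number))
            continue
        records.append((level, parts[1]))
    return records, warnings


records, warnings = sketch_parse(["0 HEAD", "", "x NAME John", "TRLR", "0 TRLR"])
```

Here the three malformed lines each produce one warning, while the two valid lines are still collected. The caller receives both results and decides how strict to be.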

fastgedcom.parser.guess_encoding(file: str | pathlib.Path) → str | None[source]

Return the guessed encoding of the file. None if unknown.

A gedcom file should specify its encoding in the header under the tag CHAR.

However, the indications in that field are often misleading or incomplete. For example:

- ANSEL refers to the gedcom version of the ansel charset.
- The use of a BOM is recommended but not stated, and not automatically handled by Python.
- UNICODE refers to UTF-16.
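As a rough illustration of these pitfalls, the sketch below guesses a Python codec name from the first bytes of a file. It is not the logic of guess_encoding(): `sketch_guess_encoding` and `CHAR_TO_CODEC` are hypothetical names, the mapping is an assumption, and the "gedcom" codec name only resolves once the ansel library has registered it.

```python
import codecs
import re

# Hypothetical mapping from CHAR values to Python codec names (an assumption).
CHAR_TO_CODEC = {
    "ANSEL": "gedcom",    # gedcom flavour of ANSEL, provided by the ansel library
    "UNICODE": "utf-16",  # UNICODE in a gedcom header means UTF-16
    "UTF-8": "utf-8",
    "ASCII": "ascii",
}


def sketch_guess_encoding(head: bytes):
    """Guess a codec name from the first bytes of a gedcom file, else None."""
    # A BOM is checked first, since the CHAR value does not mention it.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM while decoding
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise look for a "1 CHAR <value>" line in the header.
    match = re.search(rb"^\s*1\s+CHAR\s+(\S+)", head, flags=re.MULTILINE)
    if match is None:
        return None
    value = match.group(1).decode("ascii", errors="replace").upper()
    return CHAR_TO_CODEC.get(value)


print(sketch_guess_encoding(b"0 HEAD\n1 CHAR ANSEL\n0 TRLR\n"))
```

The function works on bytes because the encoding is not yet known at this point; a caller would read the first few hundred bytes of the file in binary mode and pass them in.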

exception fastgedcom.parser.ParsingError[source]

Bases: Exception

Error raised by strict_parse().

exception fastgedcom.parser.NothingParsedError[source]

Bases: ParsingError

Raised by strict_parse() when the resulting document is empty.

exception fastgedcom.parser.MalformedError[source]

Bases: ParsingError

Raised by strict_parse() when there are warnings.

warnings: list[ParsingWarning][source]
fastgedcom.parser.strict_parse(file: str | pathlib.Path) → fastgedcom.base.Document[source]

Open and parse the gedcom file. Return the Document representing the gedcom file.

Raise NothingParsedError when the input is empty or isn’t gedcom. Raise MalformedError when the parsing process produces warnings.
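The relationship between the exceptions can be sketched without the library. The classes below are hypothetical stand-ins that mirror the documented hierarchy, and the order of the two checks is an assumption, chosen so that input that isn’t gedcom is reported as empty rather than malformed.

```python
class SketchParsingError(Exception):
    """Hypothetical stand-in for ParsingError."""


class SketchNothingParsedError(SketchParsingError):
    """Hypothetical stand-in for NothingParsedError."""


class SketchMalformedError(SketchParsingError):
    """Hypothetical stand-in for MalformedError; keeps the warnings."""

    def __init__(self, warnings):
        super().__init__(f"{len(warnings)} warning(s) during parsing")
        self.warnings = warnings


def sketch_strict(records, warnings):
    """Turn a lenient (records, warnings) result into exceptions."""
    if not records:
        raise SketchNothingParsedError("nothing was parsed")
    if warnings:
        raise SketchMalformedError(warnings)
    return records


def describe(records, warnings):
    """Show how a caller might tell the failure modes apart."""
    try:
        sketch_strict(records, warnings)
    except SketchNothingParsedError:
        return "empty"
    except SketchMalformedError as error:
        return f"malformed: {len(error.warnings)}"
    return "ok"
```

Because both specific errors derive from the same base class, a caller that does not care about the distinction can catch the base class alone.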