Knowledge Graph becomes the foundation for Total Airport Management: an analysis of how combining LLMs with Knowledge Engineering addresses the issues of data integration and traceability.
The problem does not manifest immediately, but it is structural: the airport operates as a set of isolated systems. Operational data is scattered across departments, terminology is not standardized, and processes are described in disparate documents. In such conditions, even basic synchronization among participants (pilots, dispatchers, ground services) becomes a source of errors. The issue is made more critical by data provenance requirements: every decision must be verifiable and linked to a source. This is where the Knowledge Graph serves as a unified semantic layer, but its construction traditionally relies on manual work and scales poorly.
The authors propose a compromise architecture: scaffolded symbolic fusion, in which a formal ontology (Knowledge Engineering) constrains the probabilistic behavior of the LLM. At its core is a predefined schema based on the NASA ATM ontology, which defines the classes and relationships. The LLM is used not as an autonomous generator but as a mechanism for extracting triplets (entity–relation–entity) strictly tied to this schema. For this purpose, LangExtract is employed: a library that combines few-shot prompting with a rigid output structure and mandatory linkage to the source text. An important detail is the dual mechanism: probabilistic generation plus deterministic verification through string matching (text.find, SequenceMatcher). This addresses a typical LLM problem: black-box outputs with no explainability.
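A minimal sketch of the schema-constraint idea: a candidate triplet is accepted only if its (head type, relation, tail type) signature is declared in the schema, and it must carry a link back to its source sentence. The class and relation names below are illustrative, not taken from the NASA ATM ontology or the paper.

```python
from dataclasses import dataclass

# Hypothetical mini-schema in the spirit of an ATM ontology
# (these signatures are invented for illustration).
SCHEMA = {
    ("Flight", "assignedTo", "Gate"),
    ("GroundService", "performs", "TurnaroundStep"),
    ("TurnaroundStep", "precedes", "TurnaroundStep"),
}

@dataclass(frozen=True)
class Triplet:
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str
    source_sentence: str  # mandatory linkage back to the text

def conforms(t: Triplet) -> bool:
    """Accept only triplets whose type signature is declared in the schema."""
    return (t.head_type, t.relation, t.tail_type) in SCHEMA
```

Anything the LLM emits outside these signatures is simply discarded, which is what keeps the generator from inventing new relation types.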
A key experiment concerns the context window. Two modes are tested: segment-level (page-level) and full-document (document-level). Industry expectations suggest that quality degrades with long contexts due to the "lost-in-the-middle" effect. The results, however, show the opposite. When processing a full document, Precision reaches 0.967, Recall 0.982, and F1 0.975, and the number of missed connections (FN) drops from 13 to 8. The reason is the nonlinear nature of the processes: in A-CDM, dependencies are often spread across distant parts of the text, and only a long context allows the causal chains to be reconstructed, whereas segmentation severs them. This is an important signal for systems with process logic: locally optimizing attention degrades global integrity.
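As a quick arithmetic check on the reported numbers, F1 is the harmonic mean of precision and recall; plugging in the rounded P and R reproduces the reported 0.975 to within rounding error.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported document-level scores: P = 0.967, R = 0.982, F1 = 0.975.
# 2 * 0.967 * 0.982 / (0.967 + 0.982) ≈ 0.974, i.e. the reported
# 0.975 was presumably computed from the unrounded P and R.
```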
An additional layer is traceability. Each extracted triplet is linked to a specific sentence in the source, and all extractions pass the text-conformity check. This is achieved through a hybrid approach: the LLM generates candidates, but a triplet is committed only after confirmation through string matching (exact or fuzzy). Interestingly, most errors (FP) arise in fuzzy matches, which is expected: the further from the literal text, the higher the risk of hallucination. Nevertheless, the system maintains strict verifiability, which is critical for safety-critical domains.
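The exact-plus-fuzzy confirmation step can be sketched with the stdlib tools the text names (text.find and SequenceMatcher); the sliding-window scan and the 0.85 threshold here are assumptions for illustration, not the authors' parameters.

```python
from difflib import SequenceMatcher

def verify_span(source_text: str, extracted: str, fuzzy_threshold: float = 0.85):
    """Deterministically check that an LLM-extracted span is grounded in the source.

    Returns ("exact", position) on a literal match, ("fuzzy", ratio) when a
    near-match clears the threshold, and (None, None) otherwise, in which
    case the candidate is rejected as a likely hallucination.
    """
    pos = source_text.find(extracted)
    if pos != -1:
        return ("exact", pos)
    # Fuzzy fallback: slide a window of the same length over the source
    # and keep the best similarity ratio (naive O(n*m) scan for clarity).
    best = 0.0
    for i in range(len(source_text) - len(extracted) + 1):
        window = source_text[i:i + len(extracted)]
        best = max(best, SequenceMatcher(None, window, extracted).ratio())
    if best >= fuzzy_threshold:
        return ("fuzzy", best)
    return (None, None)
```

Note how the threshold encodes the trade-off the text describes: lowering it recovers more paraphrased spans but admits more false positives.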
Practical application extends beyond the Knowledge Graph itself. The constructed graph is automatically transformed into swimlane diagrams, where each step of the process is linked to a specific stakeholder. The algorithm uses a modified topological traversal (BFS) to reconstruct the order of operations and distributes them across "lanes" of responsibility. This addresses a typical problem: the KG itself is machine-readable but hard for humans to read. Automatically generating visual artifacts bridges this gap and makes the data usable for operational analysis and training.
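A minimal sketch of the swimlane construction, assuming a Kahn-style BFS topological sort over "precedes" edges plus a simple step-to-stakeholder mapping (the step names and owners below are invented for illustration; the paper's actual traversal is only described as a modified BFS).

```python
from collections import defaultdict, deque

def swimlanes(steps, edges, owner):
    """Order process steps with Kahn's algorithm (BFS topological sort),
    then group them into per-stakeholder lanes as (position, step) pairs."""
    indegree = {s: 0 for s in steps}
    successors = defaultdict(list)
    for a, b in edges:          # edge (a, b) means "a precedes b"
        successors[a].append(b)
        indegree[b] += 1
    queue = deque(s for s in steps if indegree[s] == 0)
    order = []
    while queue:
        step = queue.popleft()
        order.append(step)
        for nxt in successors[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    lanes = defaultdict(list)
    for position, step in enumerate(order):
        lanes[owner[step]].append((position, step))
    return order, dict(lanes)
```

The global positions let a renderer align steps horizontally across lanes, so each stakeholder's row still reflects the overall process order.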
For the industry, this is a pragmatic path for integrating LLMs into strict domains. Fully automated systems without constraints do not yet provide the required reliability, but the combination of a formal ontology, managed prompts, and deterministic validation offers a balance between scalability and accuracy. The impact of long context also matters: if processes are nonlinear, document-level inference may be preferable despite the cost.
Limitations remain. The architecture requires an initial ontology and curated examples, and extracting knowledge from multimodal sources (video, telemetry) is not yet solved; the authors mark it as the next step. Even so, the approach shows that a Knowledge Graph can become not just a repository but an operational layer for TAM, provided that every connection is explainable and verifiable.
Information source
arXiv is the largest open preprint repository (since 1991, under the auspices of Cornell), where researchers quickly post working versions of papers; the materials are publicly accessible but do not undergo full peer review, so results should be considered preliminary and, where possible, checked against updated versions or peer‑reviewed journals. arxiv.org