6 changes: 2 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,5 @@
<p align="center">
<img alt="ArDoCo" src="assets/img/logo.png" height="210"/>
<img alt="ARDoCo" src="assets/img/logo.png" height="210"/>
</p>

# [ArDoCo - Architecture Documentation Consistency](https://github.com/ArDoCo)

In this research project, we aim to provide consistency analyses between different kind of documentation, namely formal models and informal (textual) documentation.
# [ARDoCo - Automating Requirements and Documentation Comprehension](https://github.com/ardoco)
17 changes: 17 additions & 0 deletions _approaches/arcotl.md
@@ -0,0 +1,17 @@
---
title: ArCoTL
description: ArCoTL – TLR between Software Architecture Models and Code.
permalink: /approaches/arcotl/
importance: 2
layout: approach
---

![ArCoTL Overview](/assets/img/approaches/icse24-transarc.svg){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

ArCoTL (Architecture–Code Trace Links) focuses on linking a given architecture model (SAM) to the source code.
It assumes that a formal model of the system's components and interfaces is available and aims to find the corresponding code.
ArCoTL transforms both the architecture model and the code into intermediate representations (e.g. simplified graphs) and then applies various heuristics to match elements.
These heuristics include standalone rules and dependent rules (which consider relationships) plus filters to refine the links.

- How it works: Starting from a SAM and the codebase, ArCoTL builds simplified model and code representations. It then uses text similarity, naming conventions, and dependency heuristics to propose links between each model component and code artifact.
- Effectiveness: ArCoTL turned out to be very effective on its own. In experiments, the model-to-code step (ArCoTL) achieved an average F1 of ~0.98.
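
The name-based part of such matching can be sketched as follows. This is only an illustration under stated assumptions, not ArCoTL's actual implementation: `normalize`, `propose_links`, the similarity threshold, and the example paths are all hypothetical, and the real approach additionally uses dependent rules and filters.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a name and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def propose_links(components, code_paths, threshold=0.8):
    """Link a component to each code path that has a similarly named segment.

    Hypothetical standalone heuristic; the threshold is an arbitrary choice.
    """
    links = set()
    for component in components:
        target = normalize(component)
        for path in code_paths:
            # Compare against every path segment (directories, file stem, extension).
            segments = [normalize(s) for s in re.split(r"[/.]", path) if s]
            best = max(
                (SequenceMatcher(None, target, s).ratio() for s in segments),
                default=0.0,
            )
            if best >= threshold:
                links.add((component, path))
    return links


# Made-up example data.
components = ["WebUI", "Cache"]
code_paths = [
    "src/webui/LoginPage.java",
    "src/cache/LruCache.java",
    "src/util/Strings.java",
]
links = propose_links(components, code_paths)
```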
18 changes: 18 additions & 0 deletions _approaches/ardocode.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
---
title: ArDoCode
description: ArDoCode – TLR between Software Architecture Documentation and Code.
permalink: /approaches/ardocode/
importance: 5
layout: approach
---

![ArDoCode Overview](/assets/img/approaches/icse24-ardocode.svg){:width="100%" style="border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

ArDoCode is a simpler variant of trace recovery that treats source code itself as the "model".
Instead of first building a formal model, ArDoCode directly matches architecture document content with code elements using the same heuristics designed for linking docs to models.
In practice, it extracts key terms from the documentation and tries to align them with names in the code (e.g. class or module names) as if the code were the model.

- Key idea: Apply the SWATTR approach without an explicit SAM by interpreting the codebase as a model. For example, if the doc mentions a component "WebUI" and there is a WebUI package in code, ArDoCode will link them.
- Effectiveness: Because it skips the formal modeling step, ArDoCode is easier to apply but less precise. In evaluations, ArDoCode achieved a weighted F1 of only ~0.62, substantially lower than the full TransArC method. It serves mainly as a baseline and demonstrates that without structured models, the TLR performance drops.
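
For readers unfamiliar with the metric, the sketch below shows one common way to compute F1 per project and a weighted average over projects. Weighting each project by its number of expected links is an assumption made here for illustration; the publication defines the exact weighting, and the counts are made up.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(counts):
    """Average per-project F1, weighted by expected links (tp + fn).

    The weighting scheme is an assumption for this sketch.
    """
    total = sum(tp + fn for tp, _, fn in counts)
    return sum(f1_score(tp, fp, fn) * (tp + fn) for tp, fp, fn in counts) / total


# Made-up (tp, fp, fn) counts for two projects.
counts = [(8, 2, 2), (1, 1, 1)]
macro = sum(f1_score(*c) for c in counts) / len(counts)
weighted = weighted_f1(counts)
```

With these counts, the larger project dominates the weighted average, so it ends up above the plain (macro) average.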

See our [ICSE 2024 publication page](/c/icse24) for details, links, and resources.
22 changes: 22 additions & 0 deletions _approaches/inconsistency-detection.md
@@ -0,0 +1,22 @@
---
title: Inconsistency Detection
description: Documentation-Model-Inconsistency-Analysis pipeline.
permalink: /approaches/inconsistency-detection/
importance: 8
layout: approach
---

![Approach Overview](/assets/img/approaches/icsa23-inconsistency.svg){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

The ArDoCo inconsistency detection approach uses trace link recovery to detect inconsistencies between natural-language architecture documentation and formal models.
It identifies two kinds of issues:

(a) Unmentioned Model Elements (UMEs): components or interfaces that appear in the model but are never described in the documentation;
(b) Missing Model Elements (MMEs): elements mentioned in the text that do not exist in the model.

The method runs a TLR procedure (namely SWATTR) and then flags any model element with no corresponding text link (a UME) or any sentence that refers to a non-modeled item (an MME).

- Detection strategy: Use the TLR results as a bridge. After linking as many sentences to model elements as possible, any "orphan" model nodes or text mentions indicate a consistency gap. For example, if the model has a "Cache" component with no sentence linked, that is a UME; if the doc talks about "Common" but the model lacks it, that is an MME.
- Results: The approach achieved an excellent F1 (0.81) for the underlying trace recovery. For inconsistency detection, it attained ~93% accuracy in identifying UMEs and ~75% for MMEs, significantly better than naive baselines. These results suggest that using trace links is a promising way to find documentation-model mismatches.
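
The detection strategy above boils down to two set differences over the recovered links. The following is a minimal sketch that assumes trace links are given as `(mention, model_element)` pairs; `find_inconsistencies` and the data are hypothetical and mirror the "Cache" and "Common" example.

```python
def find_inconsistencies(model_elements, text_mentions, trace_links):
    """Return (UMEs, MMEs) from (mention, model_element) trace links."""
    linked_elements = {element for _, element in trace_links}
    linked_mentions = {mention for mention, _ in trace_links}
    # Model elements without any linked text are unmentioned (UME).
    umes = set(model_elements) - linked_elements
    # Text mentions without any linked model element are missing from the model (MME).
    mmes = set(text_mentions) - linked_mentions
    return umes, mmes


# Made-up example data.
model_elements = {"WebUI", "Logic", "Cache"}
text_mentions = {"web ui", "logic", "Common"}
trace_links = {("web ui", "WebUI"), ("logic", "Logic")}

umes, mmes = find_inconsistencies(model_elements, text_mentions, trace_links)
```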

See our [ICSA 2023 publication page](/c/icsa23) for details, links, and resources.
19 changes: 19 additions & 0 deletions _approaches/lissa.md
@@ -0,0 +1,19 @@
---
title: LiSSA
description: LiSSA – LLM/RAG-based TLR.
permalink: /approaches/lissa/
importance: 6
layout: approach
---

![LiSSA Overview](/assets/img/approaches/icse25-lissa.svg){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

LiSSA (Linking Software System Artifacts) is a retrieval-augmented, LLM-based approach that aims to be generic across artifact types.
The key idea is to use a Large Language Model (LLM) together with information retrieval (IR) to find trace links.
For a given source artifact (e.g. a requirement or a sentence in documentation), LiSSA first uses IR techniques to retrieve a small set of potentially relevant target artifacts (code files, model elements, etc.).
It then queries the LLM with the retrieved context to generate or suggest the most likely trace link.

- Scope: LiSSA was tested on multiple tasks including requirements→code, documentation→code, and architecture-docs→models. The same RAG process is applied in each case, making it a one-size-fits-many solution.
- Effectiveness: In experiments, LiSSA significantly outperformed state-of-the-art tools on the code-centric tasks. For example, it showed much higher accuracy when linking requirements to code than prior methods.
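
The retrieve-then-ask flow can be sketched as below. Everything here is a stand-in chosen for illustration: retrieval uses plain Jaccard token overlap rather than whatever retrieval LiSSA actually uses, and `keyword_stub` replaces the LLM call with a trivial rule so that the example runs offline.

```python
def tokenize(text):
    """Split text into a set of lower-cased whitespace tokens."""
    return set(text.lower().split())


def retrieve(source, targets, k=2):
    """Rank targets by Jaccard overlap with the source and keep the top k."""
    src = tokenize(source)

    def score(item):
        tgt = tokenize(item[1])
        union = src | tgt
        return len(src & tgt) / len(union) if union else 0.0

    ranked = sorted(targets.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:k]]


def recover_links(source_id, source, targets, ask_llm, k=2):
    """Keep the retrieved candidates that the (stubbed) LLM confirms."""
    candidates = retrieve(source, targets, k)
    return {(source_id, c) for c in candidates if ask_llm(source, targets[c])}


def keyword_stub(source, target):
    """Placeholder for a real LLM call: accept if two or more tokens are shared."""
    return len(tokenize(source) & tokenize(target)) >= 2


# Made-up example data.
requirement = "the system shall encrypt stored passwords"
targets = {
    "PasswordStore.java": "class that can encrypt and save passwords",
    "ReportPrinter.java": "prints monthly reports",
    "Session.java": "the login session of a user",
}
links = recover_links("REQ-1", requirement, targets, keyword_stub)
```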

LiSSA is primarily described on our [ICSE 2025 publication page](/c/icse25) and is also related to our [REFSQ 2025 publication page](/c/refsq25). See these pages for details, links, and resources.
9 changes: 9 additions & 0 deletions _approaches/secdragon.md
@@ -0,0 +1,9 @@
---
title: SecDragon
description: SecDragon – TLR for Security Requirements.
permalink: /approaches/secdragon/
importance: 7
layout: page
---

🚧 This approach is not available yet.
20 changes: 20 additions & 0 deletions _approaches/swattr.md
@@ -0,0 +1,20 @@
---
title: SWATTR
description: SWATTR – TLR between Software Architecture Documentation and Software Architecture Models.
permalink: /approaches/swattr/
importance: 1
layout: approach
---

![SWATTR Overview](/assets/img/approaches/ecsa21-swattr.svg){:width="100%" style="border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

SWATTR (SoftWare Architecture TexT TRace link recovery) is an agent-based framework for linking textual architecture documentation (SAD) and formal models (SAM).
Rather than focusing on a single algorithm, SWATTR defines a pipeline with multiple stages where different "agents" can operate.
First it extracts and preprocesses text from the SAD and components from the architecture model.
Next, it uses NLP and heuristics to identify architecture elements (like component names) mentioned in the text.
Finally, it connects these identified text elements to model elements to form trace links.

- Pipeline stages: The framework is extensible, meaning you can plug in different strategies at each stage. For example, one agent might use term matching to find components in sentences, while another uses more advanced similarity measures. All results are aggregated to produce the final links.
- Results: SWATTR was evaluated on three case studies and achieved a weighted average F1-score of about 0.72 for trace recovery. This was a strong performance (outperforming simple baselines by ~0.24 F1) and demonstrated the benefit of the multi-stage approach.
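
The idea of pluggable agents whose results are aggregated can be sketched as follows. The two agents, the vote counting, and the `min_votes` parameter are hypothetical simplifications and do not reflect SWATTR's actual agents or aggregation.

```python
from collections import Counter


def exact_name_agent(sentence, component):
    """Vote for a link if the component name appears verbatim in the sentence."""
    return component.lower() in sentence.lower()


def token_overlap_agent(sentence, component):
    """Vote for a link if every word of the name occurs somewhere in the sentence."""
    words = component.lower().split()
    tokens = set(sentence.lower().replace(".", " ").replace(",", " ").split())
    return all(word in tokens for word in words)


def recover_links(sentences, components, agents, min_votes=1):
    """Aggregate agent votes into (sentence_number, component) links."""
    votes = Counter()
    for number, sentence in enumerate(sentences, start=1):
        for component in components:
            for agent in agents:
                if agent(sentence, component):
                    votes[(number, component)] += 1
    return {link for link, count in votes.items() if count >= min_votes}


# Made-up example data.
sentences = [
    "The Web UI forwards requests to the Logic.",
    "The manager handles storage of data.",
]
components = ["Web UI", "Logic", "Storage Manager"]
agents = [exact_name_agent, token_overlap_agent]

lenient = recover_links(sentences, components, agents, min_votes=1)
strict = recover_links(sentences, components, agents, min_votes=2)
```

Raising `min_votes` trades recall for precision: the second sentence is linked to "Storage Manager" only by the token-overlap agent, so the strict setting drops that link.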

See our [ECSA 2021 publication page](/c/ecsa21) for details, links, and resources.
19 changes: 19 additions & 0 deletions _approaches/transarc.md
@@ -0,0 +1,19 @@
---
title: TransArC
description: TransArC – TLR between Software Architecture Documentation, Models, and Code.
permalink: /approaches/transarc/
importance: 3
layout: approach
---

![TransArC Overview](/assets/img/approaches/icse24-transarc.svg){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

TransArC is a transitive trace link recovery approach that connects architecture documents to code via an intermediate architecture model.
It first uses an existing method (SWATTR) to connect the textual architecture documentation and component-based architecture model (SAM), then applies a new method (ArCoTL) to link the model elements to code.
In other words, TransArC builds a bridge: document ⟶ model ⟶ code.
This two-step strategy helps bridge the semantic gap between informal text and code.

- How it works: TransArC combines the two sets of trace links, produced by SWATTR and ArCoTL, to derive trace links transitively from documentation to code.
- Results: In experiments on five systems, TransArC achieved a high average F1 score (~0.82) for recovering documentation-to-code links, significantly outperforming baseline methods. This shows that combining the two specialized steps yields much more accurate links than simpler approaches.
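
The transitive combination is essentially a join of the two link sets on the shared model element. Below is a minimal sketch, assuming links are plain tuples; `compose` and the example data are hypothetical.

```python
def compose(doc_to_model, model_to_code):
    """Join (sentence, element) links with (element, code) links on the element."""
    code_by_element = {}
    for element, code in model_to_code:
        code_by_element.setdefault(element, set()).add(code)
    return {
        (sentence, code)
        for sentence, element in doc_to_model
        for code in code_by_element.get(element, ())
    }


# Made-up example data: sentence numbers, model elements, and code paths.
doc_to_model = {(1, "WebUI"), (2, "Logic"), (3, "Cache")}
model_to_code = {
    ("WebUI", "src/webui/LoginPage.java"),
    ("WebUI", "src/webui/Menu.java"),
    ("Logic", "src/logic/Core.java"),
}
doc_to_code = compose(doc_to_model, model_to_code)
```

Sentence 3 yields no documentation-to-code link in this example because "Cache" has no model-to-code link, which illustrates that errors or gaps in either step propagate to the combined result.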

See our [ICSE 2024 publication page](/c/icse24) for details, links, and resources.
20 changes: 20 additions & 0 deletions _approaches/transarcai.md
@@ -0,0 +1,20 @@
---
title: "TransArC-AI"
description: "TransArC-AI – LLM-based TLR between Software Architecture Documentation, Models, and Code."
permalink: /approaches/transarc-ai/
importance: 4
layout: approach
---

![TransArC-AI Overview](/assets/img/approaches/icsa25-transarc.svg){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

TransArC-AI extends the TransArC idea by using an LLM to generate a simple architecture model (SAM).
In this approach, instead of requiring a hand-made SAM, a large language model (such as GPT-4) is prompted to extract or invent the main component names from the SAD (and optionally from code).
These names serve as a minimal architecture model (i.e. a list of components).
Then, as in TransArC, these LLM-derived components are matched to code.
The goal is to bridge the SAD–code gap without manual modeling.

- How it works: Given the software architecture text and the codebase, the system asks the LLM to list likely component names. That list of names forms a "Simple Software Architecture Model" (SSAM). Finally, code elements with matching names or descriptions are linked to the documentation. This pipeline avoids needing an explicit UML model.
- Effectiveness: TransArC-AI achieved very competitive results. Using GPT-4o, it obtained a weighted F1 of about 0.86, nearly as good as the original TransArC with a hand-made model (F1 0.87). It also substantially outperformed the ArDoCode baseline (which scored ~0.62). This shows that LLMs can automatically infer the key architectural components.
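
The step from LLM output to a list of component names can be sketched as below. The prompt wording, the function names, and the hard-coded `answer` string are assumptions for illustration only; no real LLM is called, and the actual prompts are described in the publication.

```python
import re


def build_prompt(documentation):
    """Build a hypothetical prompt that asks for component names, one per line."""
    return (
        "List the names of the architecture components described in the "
        "following text, one per line.\n\n" + documentation
    )


def parse_component_names(llm_answer):
    """Turn a line-based answer into a de-duplicated, ordered list of names."""
    names = []
    for line in llm_answer.splitlines():
        # Strip list markers such as "- ", "* ", "1. ", or "2) ".
        name = re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", line).strip()
        if name and name not in names:
            names.append(name)
    return names


# Hard-coded stand-in for an LLM response (not real model output).
answer = "- WebUI\n- Logic\n2. Storage\n- Logic\n"
ssam = parse_component_names(answer)
```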

See our [ICSA 2025 publication page](/c/icsa25) for details, links, and resources.
11 changes: 11 additions & 0 deletions _approaches/tv.md
@@ -0,0 +1,11 @@
---
title: ARDoCo-TV
description: "Trace View: a viewer for trace links."
permalink: /approaches/tv/
importance: 9
layout: page
---

ARDoCo-TV is a tool for visualizing trace links between software artifacts, supporting the analysis and understanding of traceability in software projects.

See [ARDoCo TV](https://tv.ardoco.de) for more information.
6 changes: 4 additions & 2 deletions _pages/conferences/aire25.md → _conferences/aire25.md
@@ -9,11 +9,13 @@ authors:
- stefan_schwedt
- jan_keim
- tobias_hey
approaches:
- LiSSA
---

To be published at the [33rd International Requirements Engineering Conference Workshops (REW)](https://aire-ws.github.io/aire25/).

![Approach Overview](/assets/img/aire-approach.svg){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}
![AIRE25 Overview](/assets/img/approaches/aire25-aire.svg){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

## Abstract

@@ -39,4 +41,4 @@ Moreover, it provides insights into the performance of traditional IR techniques
## Links

- Paper on [KITopen](https://publikationen.bibliothek.kit.edu/1000183058)
- Replication Package on [Zenodo](https://doi.org/10.5281/zenodo.15837231) and the corresponding [GitHub repository](https://github.com/ArDoCo/Replication-Package-AIRE25_Beyond-Retrieval-Using-LLM-Ensembles-for-Candidate-Filtering-in-Req-TLR)
- Replication Package on [Zenodo](https://doi.org/10.5281/zenodo.15837231) and the corresponding [GitHub repository](https://github.com/ardoco/Replication-Package-AIRE25_Beyond-Retrieval-Using-LLM-Ensembles-for-Candidate-Filtering-in-Req-TLR)
6 changes: 5 additions & 1 deletion _pages/conferences/ecsa21.md → _conferences/ecsa21.md
@@ -11,10 +11,14 @@ authors:
- claudius_kocher
- janek_speit
- anne_koziolek
approaches:
- SWATTR
---

Published at the [15th European Conference on Software Architecture (ECSA 2021), September 13-17 2021](https://conf.researchr.org/home/ecsa-2021)

![SWATTR Overview](/assets/img/approaches/ecsa21-swattr.svg){:width="100%" style="border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

## Abstract

Software Architecture Documentation often consists of different artifacts.
@@ -32,5 +36,5 @@ Moreover, our approach outperforms the baseline approaches on non-weighted avera
## Links

- Paper on [Springer Link](https://doi.org/10.1007/978-3-030-86044-8_7) and on [KITopen](https://doi.org/10.5445/IR/1000138399)
- Replication Package on [Zenodo](https://doi.org/10.5281/zenodo.4730621) and the corresponding [GitHub repository](https://github.com/ArDoCo/SWATTR)
- Replication Package on [Zenodo](https://doi.org/10.5281/zenodo.4730621) and the corresponding [GitHub repository](https://github.com/ardoco/SWATTR)
- [Slides](/assets/pdf/presentation_21_ecsa_TLR.pdf)
Original file line number Diff line number Diff line change
@@ -8,9 +8,7 @@ authors:
- tobias_hey
---

<p align="center">
<img alt="ArDoCo" src="/assets/img/titleslide-fg-arch24.png" width="100%"/>
</p>
![FGARCH24 Titleslide](/assets/img/approaches/fgarch24-titleslide.png){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

Talk at the annual meeting of the GI special interest group "Architekturen" (Architectures) on October 24 and 25, 2024, in Paderborn.

7 changes: 5 additions & 2 deletions _pages/conferences/icsa23.md → _conferences/icsa23.md
@@ -9,13 +9,16 @@ authors:
- sophie_corallo
- dominik_fuchss
- anne_koziolek
approaches:
- SWATTR
- "Inconsistency Detection"
---

Published at the [20th IEEE International Conference on Software Architecture (ICSA 2023), March 13-17 2023](https://icsa-conferences.org/2023/).

Additional presentation at the [Software Engineering 2024 (SE24)](https://se2024.se.jku.at/), the symposium of the German Computer Science Society (Gesellschaft für Informatik (GI)) together with the Austrian Computer Society.

![Approach Overview](/assets/img/approach_overview_icsa23.svg){:width="100%"}
![Inconsistency Detection Overview](/assets/img/approaches/icsa23-inconsistency.svg){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

## Abstract

@@ -24,6 +27,6 @@ Documenting software architecture is important for a system’s success. Softwar
## Links

- Paper on [IEEE Xplore](https://doi.org/10.1109/ICSA56044.2023.00021) and on [KITopen](https://doi.org/10.5445/IR/1000158208)
- Replication Package on [Zenodo](https://doi.org/10.5281/zenodo.7555194) and the corresponding [GitHub repository](https://github.com/ArDoCo/DetectingInconsistenciesInSoftwareArchitectureDocumentationUsingTraceabilityLinkRecovery)
- Replication Package on [Zenodo](https://doi.org/10.5281/zenodo.7555194) and the corresponding [GitHub repository](https://github.com/ardoco/DetectingInconsistenciesInSoftwareArchitectureDocumentationUsingTraceabilityLinkRecovery)
- [Slides (ICSA23)](/assets/pdf/presentation_23_ICSA_InconsistencyDetection.pdf)
- [Slides (SE24)](/assets/pdf/presentation_24_SE_InconsistencyDetection.pdf)
7 changes: 5 additions & 2 deletions _pages/conferences/icsa25.md → _conferences/icsa25.md
@@ -10,11 +10,14 @@ authors:
- tobias_hey
- jan_keim
- anne_koziolek
approaches:
- TransArC-AI
- TransArC
---

Published at the [22nd IEEE International Conference on Software Architecture (ICSA 2025), March 31 - April 04 2025](https://conf.researchr.org/home/icsa-2025/).

![Approach Overview](/assets/img/icsa25-approach.svg){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}
![TransArC-AI Overview](/assets/img/approaches/icsa25-transarc.svg){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

## Abstract

@@ -34,5 +37,5 @@ In summary, our approach shows that LLMs can be used to make TLR between SAD and
## Links

- Paper on [KITopen](https://publikationen.bibliothek.kit.edu/1000179830)
- Replication Package on [Zenodo](https://doi.org/10.5281/zenodo.14506935) and the corresponding [GitHub repository](https://github.com/ArDoCo/ReplicationPackage-EnablingArchitectureTraceabilitybyLLM-basedArchitectureComponentNameExtraction)
- Replication Package on [Zenodo](https://doi.org/10.5281/zenodo.14506935) and the corresponding [GitHub repository](https://github.com/ardoco/ReplicationPackage-EnablingArchitectureTraceabilitybyLLM-basedArchitectureComponentNameExtraction)
- Slides as [pptx](/assets/pdf/presentation_icsa25.pptx) or [pdf](/assets/pdf/presentation_icsa25.pdf)
11 changes: 7 additions & 4 deletions _pages/conferences/icse24.md → _conferences/icse24.md
@@ -11,15 +11,18 @@ authors:
- tobias_hey
- tobias_telge
- anne_koziolek
approaches:
- TransArC
- ArCoTL
- SWATTR
- ArDoCode
---

Published at the [46th International Conference on Software Engineering (ICSE 2024), April 14-20 2024](https://conf.researchr.org/home/icse-2024).

Additional presentation at the [Software Engineering 2025 (SE25)](https://se2025.sdq.kastel.kit.edu/), the symposium of the German Computer Science Society (Gesellschaft für Informatik (GI)).

<p align="center">
<img src="/assets/img/approach_overview_icse24.svg" alt="Approach Overview"/>
</p>
![TransArC Overview](/assets/img/approaches/icse24-transarc.svg){:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"}

## Abstract

@@ -44,6 +47,6 @@ In future research, we will explore further possibilities for such transitive ap
## Links

- Paper (Open Access) on [ACM](https://doi.org/10.1145/3597503.3639130) or [KITopen](https://doi.org/10.5445/IR/1000165692)
- Replication Package on [Zenodo](https://doi.org/10.5281/zenodo.10411853) and the corresponding [GitHub repository](https://github.com/ArDoCo/Replication-Package-ICSE24_Recovering-Trace-Links-Between-Software-Documentation-And-Code)
- Replication Package on [Zenodo](https://doi.org/10.5281/zenodo.10411853) and the corresponding [GitHub repository](https://github.com/ardoco/Replication-Package-ICSE24_Recovering-Trace-Links-Between-Software-Documentation-And-Code)
- Slides as [pptx](/assets/pdf/presentation_icse24.pptx) or [pdf](/assets/pdf/presentation_icse24.pdf)
- [Slides (SE25)](/assets/pdf/presentation_25_SE_TransArC.pdf)