Lineage, Traceability, and Reproducibility as Reliability Requirements in Enterprise AI Systems
Abstract
Enterprises increasingly rely on artificial intelligence systems for key business and compliance decisions. Most of these systems are evaluated mainly on model accuracy and overlook reliability properties such as lineage, traceability, and reproducibility. In this paper, we frame these three properties as fundamental reliability requirements for enterprise AI. The study applied a before-and-after quantitative design to a real-world enterprise AI deployment over 12 months. After structured lineage tracking, version control, and reproducibility controls were introduced, lineage coverage rose to 0.91 and the reproducibility success rate rose to 92%. Incident investigation time fell by 66%, audit preparation by 62%, and compliance findings by 75%. A Monte Carlo simulation further indicated lower risk variability once lineage controls were in place. Together, these results indicate that integrating lineage, traceability, and reproducibility into AI platforms improves reliability, audit readiness, and trust in AI outputs.
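The abstract reports lineage coverage and reproducibility success as ratios without defining them. The Python sketch below shows one plausible way such metrics could be computed. The definitions are assumptions for illustration, not the study's instrumentation: an artifact counts as covered when its recorded lineage lists every required upstream parent, and a rerun counts as reproduced when its rounded outputs hash identically to the original run's. All function names and the example lineage graph are hypothetical.

```python
import hashlib
import json


def lineage_coverage(required: dict[str, set[str]], recorded: dict[str, set[str]]) -> float:
    """Fraction of artifacts whose recorded parents include every required parent.

    Assumed definition: an artifact is covered only if no required
    upstream dependency is missing from its lineage record.
    """
    if not required:
        return 0.0
    covered = sum(
        1 for name, parents in required.items()
        if parents <= recorded.get(name, set())  # subset check
    )
    return covered / len(required)


def output_digest(predictions: list[float], decimals: int = 6) -> str:
    """SHA-256 digest of model outputs, rounded to tolerate tiny float noise."""
    rounded = [round(p, decimals) for p in predictions]
    return hashlib.sha256(json.dumps(rounded).encode("utf-8")).hexdigest()


def reproducibility_rate(runs: list[tuple[list[float], list[float]]]) -> float:
    """Share of (original, rerun) output pairs whose digests match."""
    if not runs:
        return 0.0
    matches = sum(output_digest(orig) == output_digest(rerun) for orig, rerun in runs)
    return matches / len(runs)


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction in percent, e.g. for investigation or audit-preparation time."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Hypothetical lineage graph: artifact -> parents it must record.
    required = {
        "model_v3": {"features_v2", "train_split"},
        "features_v2": {"raw_events"},
        "train_split": {"raw_events"},
        "risk_report": {"model_v3"},
    }
    recorded = {
        "model_v3": {"features_v2", "train_split"},
        "features_v2": {"raw_events"},
        "train_split": {"raw_events"},
        # risk_report has no lineage record yet
    }
    print(lineage_coverage(required, recorded))  # 0.75
    runs = [([0.1, 0.9], [0.1, 0.9]), ([0.2, 0.8], [0.25, 0.75])]
    print(reproducibility_rate(runs))  # 0.5
    print(percent_reduction(20.0, 8.0))  # 60.0 (illustrative inputs)
```

Rounding before hashing is a simple tolerance mechanism but not a perfect one: two values on either side of a rounding boundary can still hash differently, so a production check might instead compare outputs element-wise within an explicit tolerance.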