Schema-Grounded Agentic AI for Regulatory-Compliant Data Access: Bridging Natural Language and Enterprise Data Governance in Financial Services

Sridhar Vadlapatla

Abstract

Academic text-to-SQL benchmarks are plentiful, yet a fundamental tension in enterprise data access through LLM agents remains unaddressed: the system must be powerful enough to synthesise complex query logic over proprietary schemas, and strict enough to satisfy the data governance, access control, and auditability requirements imposed by financial regulators. This paper addresses that tension through the design and implementation of MarketingMind, a schema-grounded multi-agent system built at a Fortune 500 financial services enterprise for marketing analysts who work with customer data in a Salesforce Marketing Cloud (SFMC) environment and Google BigQuery. Three design principles aligned with the governance model are introduced: Schema-as-Guardrail, which restricts generative output to the dynamically loaded enterprise schema context; Tool-Boundary Enforcement, which deterministically blocks all mutation operations at the execution level; and Least-Privilege Agent Isolation, which confines each specialist agent to its declared toolset. Evaluation shows 97.3% schema fidelity under grounded conditions, compared with 71.8% for the ungrounded baselines; 100% mutation rejection across 10,000 adversarial inputs; and 0% tool leakage across 5,000 multi-turn sessions. The system manages 30+ SFMC Data Extensions and a BigQuery subscriber preference centre with 25+ communication categories. The findings highlight the need to treat governance as an architectural primitive and to complement prompt instructions with deterministic enforcement mechanisms when deploying enterprise LLMs in compliance-sensitive areas.
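The three principles can be illustrated with a minimal Python sketch. It is not the MarketingMind implementation, which the abstract does not describe in detail. The schema contents, the tool names, the `Agent` class, and the functions `is_read_only`, `unknown_tables`, and `call_tool` are all hypothetical, and the keyword-based checks are deliberately conservative stand-ins for a real SQL parser.

```python
import re
from dataclasses import dataclass

# Hypothetical schema context; the table and column names are illustrative only.
SCHEMA = {
    "subscriber_preferences": {"subscriber_id", "category", "opted_in"},
}

# Keywords treated as mutations (an assumed, non-exhaustive list).
MUTATION_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "MERGE", "DROP",
    "ALTER", "CREATE", "TRUNCATE", "GRANT", "REVOKE",
}


def is_read_only(sql):
    """Tool-Boundary Enforcement: accept exactly one plain SELECT statement."""
    statements = [s for s in sql.split(";") if s.strip()]
    if len(statements) != 1:
        return False
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", statements[0].upper())
    if not tokens or tokens[0] != "SELECT":
        return False
    # Conservative: a mutation keyword anywhere, even in a literal, is rejected.
    return not any(token in MUTATION_KEYWORDS for token in tokens)


def unknown_tables(sql):
    """Schema-as-Guardrail: tables named after FROM/JOIN that SCHEMA lacks."""
    referenced = re.findall(
        r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", sql, flags=re.IGNORECASE
    )
    return {table for table in referenced if table.lower() not in SCHEMA}


@dataclass(frozen=True)
class Agent:
    """Least-Privilege Agent Isolation: an agent owns a fixed toolset."""
    name: str
    allowed_tools: frozenset


def call_tool(agent, tool, sql):
    """Run every deterministic check before a query would be executed."""
    if tool not in agent.allowed_tools:
        raise PermissionError(f"{agent.name} may not call {tool}")
    if not is_read_only(sql):
        raise PermissionError("mutation or multi-statement query rejected")
    missing = unknown_tables(sql)
    if missing:
        raise ValueError(f"tables not in schema: {sorted(missing)}")
    return f"{tool}: accepted"
```

In this sketch the checks run in code outside the model, so a prompt that persuades the model to emit a `DROP TABLE` statement or to call an undeclared tool still fails at the tool boundary. A production system would more plausibly rely on a full SQL parser and on database-level read-only credentials than on keyword matching.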


 

How to Cite

Schema-Grounded Agentic AI for Regulatory-Compliant Data Access: Bridging Natural Language and Enterprise Data Governance in Financial Services. (2024). International Journal of Research Publications in Engineering, Technology and Management (IJRPETM), 7(4), 10861-10868. https://doi.org/10.15662/IJRPETM.2024.0704006
