AI-Driven Audio & Speech Analytics as a Cloud BI Input Layer
Abstract
With the introduction of Artificial Intelligence (AI) into audio analytics and speech processing, cloud Business Intelligence (BI) has begun to deliver real-time business insight from unstructured voice data. This paper examines AI-based speech recognition, Natural Language Processing (NLP), sentiment analysis, and predictive analytics as intelligent input components of a cloud-based BI system. Corporations today hold a wealth of voice data, including customer service interactions, virtual team meetings, social data, and interactions with voice assistants, yet BI tools have traditionally been geared towards structured data. The paper proposes a framework, set in a cloud BI environment, that captures speech, processes it, and converts it into analyzable data streams.
The framework is a five-layer architecture comprising Audio Data Acquisition, Speech-to-Text, NLP & Sentiment Processing, Cloud Data Integration, and BI Visualization & Decision Support. It offers a scalable cloud architecture that supports automated report generation, customer behaviour monitoring, real-time fraud detection, and operational intelligence. The key advantages are better decision-making, an improved customer experience, faster customer service response, and less manual analysis. The paper also addresses the main challenges: data privacy, multilingual speech, and model bias and drift related to pronunciation.
The study suggests that moving towards a “Client” model, in which AI tools perform audio and speech analytics to deliver insights from the content of conversations, may improve the effectiveness of cloud BI systems. These synergies enable businesses to leverage data intelligence and move towards data-driven intelligent automation, which is likely to be a critical element of their success in a constantly evolving digital world.
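As a rough illustration of how the five layers named in the abstract could hand data to one another, the following minimal Python sketch wires them into one pipeline. It is not the authors' implementation. All names (`AudioClip`, `InsightRecord`, `run_pipeline`, `summarize_by_channel`, `fake_transcribe`) are hypothetical, the speech-to-text step is a stub that simply decodes bytes as text, and the sentiment step uses a tiny hand-written word list in place of a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical word lists standing in for a trained sentiment model.
POSITIVE = {"great", "thanks", "happy", "resolved"}
NEGATIVE = {"angry", "cancel", "refund", "broken"}


@dataclass
class AudioClip:
    """Layer 1 (Audio Data Acquisition): one captured recording."""
    call_id: str
    channel: str    # e.g. "support_call" or "meeting"
    samples: bytes  # raw payload; treated as opaque by the pipeline


@dataclass
class InsightRecord:
    """Layer 4 (Cloud Data Integration): a structured row for the BI store."""
    call_id: str
    channel: str
    transcript: str
    sentiment: float  # in [-1.0, 1.0]


def score_sentiment(text: str) -> float:
    """Layer 3 placeholder: lexicon-based score in [-1.0, 1.0]."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def run_pipeline(
    clips: List[AudioClip],
    transcribe: Callable[[AudioClip], str],
) -> List[InsightRecord]:
    """Layers 2 to 4: transcribe each clip, score it, emit structured records."""
    records = []
    for clip in clips:
        text = transcribe(clip)  # Layer 2 (Speech-to-Text), injected by the caller
        records.append(
            InsightRecord(clip.call_id, clip.channel, text, score_sentiment(text))
        )
    return records


def summarize_by_channel(records: List[InsightRecord]) -> Dict[str, float]:
    """Layer 5 placeholder: mean sentiment per channel, as a dashboard might show."""
    grouped: Dict[str, List[float]] = {}
    for record in records:
        grouped.setdefault(record.channel, []).append(record.sentiment)
    return {channel: sum(v) / len(v) for channel, v in grouped.items()}


def fake_transcribe(clip: AudioClip) -> str:
    """Stand-in for a real speech-to-text engine: decodes the bytes as UTF-8."""
    return clip.samples.decode("utf-8")
```

Passing a list of `AudioClip` objects and `fake_transcribe` to `run_pipeline`, then passing the result to `summarize_by_channel`, yields one aggregate score per channel. In a real deployment the `transcribe` argument would wrap an actual speech recognition service, and the records would be written to a cloud data store rather than kept in memory.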