What would happen if an artificial intelligence (AI) system designed to predict the future of the stock market, and trained during a period of economic stability, faced an impending crisis? If it were not trained to recognize the warning signals, it might interpret a slight increase in transactions as a sign of ongoing growth and mistakenly predict that stock prices will rise, with serious consequences for the market.
Similarly, an AI tool that analyzes financial-market sentiment from news and social media posts could, if inadequately trained, misinterpret expressions or contexts. The resulting analysis would not represent true market opinion, and investment decisions would end up being based on distorted information.
These examples show that as AI becomes more prevalent in the financial sector, it opens pathways for innovation and automation but also raises challenges, such as the so-called AI “hallucinations,” a term for situations in which AI models generate and disseminate false or misleading information.
In the fintech world, AI is here to stay: it was valued at $1.12 trillion in 2023 and, at its current growth rate, is projected to reach $4.37 trillion by 2027, according to estimates from Market.us. However, according to analyses from the startup Vectara, chatbot “hallucination” rates range from 3% to 27%, a problem for a financial sector where precise decisions are crucial.
Julián Colombo, CEO and founder of N5, explains that AI can present hallucinations, that is, errors or misinterpretations of data, which “leads to incorrect conclusions.” Julio Blanco, co-founder and CBO of Zentricx, clarifies that, in essence, “the result is an invention of the model and is not supported by real information.”
Large language models (LLMs) — as explained by Weslley Rosalem, AI lead at Red Hat — operate based on conditional probabilities learned from training data. “They generate the next word or token based on the probability distributions of those sequences. Hallucinations occur when the model produces outputs that are statistically plausible but do not correspond to factual reality. These models capture statistical relationships but do not have a true understanding of the content,” he clarifies.
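As an illustration of this probabilistic behavior, the toy sketch below (not a real LLM, with word probabilities invented for the example) shows how a model samples the next token purely from learned frequencies, with nothing in the process checking the result against factual reality.

```python
import random

# Hypothetical learned probabilities for the word that follows "revenue grew by".
# The numbers are invented for illustration only.
next_token_probs = {
    "10%": 0.45,
    "25%": 0.30,
    "300%": 0.15,   # statistically plausible as a token, factually wrong for most firms
    "a record": 0.10,
}

def sample_next_token(probs: dict) -> str:
    # Draw one token at random, weighted by the learned probabilities.
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print("revenue grew by", sample_next_token(next_token_probs))
```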
In Focus
In finance, these hallucinations can appear in several domains, such as credit analysis, where a model may assign a customer “a risk profile that does not reflect their true financial situation, potentially resulting in inappropriate credit-granting decisions,” points out Colombo.
Blanco adds that in customer service, query-answering assistants (used in place of “frequently asked questions” pages) could make incorrect recommendations about services or their costs, or hallucinate in ways that leave customer inquiries unresolved. Models can also hallucinate when generating financial reports, especially when performing complex estimation or trend-prediction calculations: “Rather than predicting, they would be guessing a future without any real foundation,” he notes.
In automated financial advisory, hallucinations may lead to recommendations of inappropriate investment strategies based on faulty data or algorithms. They can also cause problems in fraud detection and risk management: “Hallucinations can lead to false positives or negatives, compromising the effectiveness of identifying fraudulent activities or assessing risks,” indicates Rosalem.
In this sector, hallucinations can lead to significant financial losses, damage to the reputation of institutions, and customer dissatisfaction. “Moreover, decisions based on erroneous analysis can increase the risk of fraud or regulatory non-compliance, exposing companies to regulatory sanctions. It is crucial to implement validation and monitoring measures to ensure that AI systems operate with accuracy and transparency, thus minimizing associated risks,” emphasizes Colombo.
They can also lead to inefficient decision-making: “Hallucinations can compromise the quality of strategic decisions, affecting the institution’s competitiveness in the market,” adds the Red Hat specialist.
At Zentricx, they note that the main way to minimize hallucinations is to ensure that the information used is credible. “If the model is fed with false information, it learns to repeat the same falsehoods. We always recommend a data advisory project before developing a complex AI model.”
Regarding data quality, Blanco highlights that it is a “central” point for reducing hallucinations. “It is necessary to ensure that AI models are trained with diverse, balanced, and well-structured data. Stress tests on the AI model should also be conducted,” he emphasizes.
At Red Hat, they suggest that strategies like RAG (Retrieval-Augmented Generation) or RIG (Retrieval Interleaved Generation) minimize the effects of hallucinations in LLMs, as they “combine language models with information retrieval systems.” The LLM is fed with specific information retrieved from a database or relevant documents, allowing the model to generate more precise and up-to-date responses, reducing reliance solely on training data that may be outdated or incomplete.
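A minimal sketch of that retrieval pattern, under simplified assumptions, might look like the following; the documents, the keyword-based retriever, and the call_llm stub are all placeholders invented for the example, not the API of any specific framework.

```python
# Minimal RAG-style sketch: retrieve supporting text, then ground the prompt in it.
# The documents, retriever, and call_llm stub are illustrative placeholders.

DOCUMENTS = [
    "Policy note: personal loan rates are updated on the first business day of each month.",
    "Q2 report: the default rate on consumer credit fell to 2.1%.",
]

def retrieve(query: str, docs: list, k: int = 1) -> list:
    # Toy retriever: rank documents by how many words they share with the query.
    overlap = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; it simply echoes the grounded prompt.
    return f"[model answer based on]\n{prompt}"

def answer_with_rag(query: str) -> str:
    context = "\n".join(retrieve(query, DOCUMENTS))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer_with_rag("When are personal loan rates updated?"))
```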
Open-source methods and tools, such as TrustyAI and guardrails, can also be implemented with LLMs to mitigate hallucinations and improve reliability.
TrustyAI is a set of tools focused on the explainability and reliability of AI models, providing resources to interpret the decisions of models, identify biases, and monitor performance. “By applying TrustyAI to LLMs, it is possible to better understand how the model generates responses and identify potential hallucinations or incorrect information,” adds Rosalem.
Guardrails, in turn, are mechanisms that impose restrictions or checks on the outputs of AI models. They can be implemented to ensure that responses fall within a certain range, follow specific policies, or are factually correct.
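As a generic illustration of that idea (not TrustyAI’s actual API), a guardrail can be as simple as checking a numeric claim in the model’s answer against a source of truth before it reaches the customer; the product name and rate below are invented for the example.

```python
import re

# Assumed source-of-truth data, invented for the example.
OFFICIAL_RATES = {"personal_loan": 0.189}

def check_rate_claim(answer: str, product: str) -> str:
    # Look for a percentage claim in the model's answer.
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", answer)
    if not match:
        return answer  # no numeric claim to verify
    claimed = float(match.group(1)) / 100
    # Block the answer if the claimed rate does not match the official one.
    if abs(claimed - OFFICIAL_RATES[product]) > 1e-9:
        return "I cannot confirm that rate; please check the official pricing information."
    return answer

print(check_rate_claim("The personal loan rate is 22.5%.", "personal_loan"))
```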
Human Reviews
Colombo also notes the need to implement human reviews “for critical data and sensitive responses, which can increase accuracy, especially in areas such as risk and compliance.”
At N5, they developed the Fin Sky solution, which combines two approaches. First, they adopted a distributed model in which multiple AIs work together. Second, they implemented a feedback process that continuously validates the input, processing, and output of each user query.
“This has allowed us to reduce the hallucination rate to 0.3%, compared to the 3% to 27% rate observed in chatbots, according to data from the startup Vectara,” clarifies Colombo, adding that their AIs are trained exclusively on the institution’s own data, avoiding queries to random information from the internet, which further increases the accuracy of the responses. “This combination of methods ensures that the solutions are reliable and secure, considering a sector where information plays a critical role.”
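To make the general idea of validating input, processing, and output concrete, the sketch below is a generic illustration only; it does not describe Fin Sky’s actual implementation, and every function name in it is hypothetical.

```python
# Generic sketch of stage-by-stage validation around a model call.
# All names are hypothetical; this is not N5's Fin Sky implementation.

def validate_input(query: str) -> bool:
    # Reject empty queries or ones that request sensitive data.
    return bool(query.strip()) and "password" not in query.lower()

def model_call(query: str) -> str:
    # Stub so the sketch runs end to end; a real system would call its own model here.
    return f"[internal source] Response to: {query}"

def validate_output(answer: str, required_marker: str) -> bool:
    # Require the answer to be grounded in an approved internal source.
    return required_marker in answer

def answer_query(query: str) -> str:
    if not validate_input(query):
        return "The query could not be processed."
    answer = model_call(query)
    if not validate_output(answer, "[internal source]"):
        return "The answer could not be verified; escalating to a human agent."
    return answer

print(answer_query("What documents do I need to open an account?"))
```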