Misleading data poses a threat in sensitive areas such as investment management or credit granting
What would happen if an artificial intelligence (AI) system designed to predict the future of the stock market, and trained during a period of economic stability, faced an imminent crisis? If it were not trained to recognize such signals, it could interpret a small increase in transactions as a sign of continued growth. The model could then wrongly predict that stock prices will rise, with serious consequences for the market.
Similarly, if an AI tool that analyzes financial market sentiment from news and social media posts receives inadequate training, it may misinterpret expressions or contexts. The result is an analysis that does not reflect the market's true opinion, and investment decisions based on distorted information.
These examples show that, as AI becomes more present in the financial sector, it opens avenues for innovation and automation but also raises challenges such as so-called AI "hallucinations", a term for situations in which AI models generate and disseminate false or misleading information.
In the fintech world, AI is here to stay: the market was valued at $1.12 billion in 2023, and its rapid growth suggests it will reach $4.37 billion by 2027, according to Market.us estimates. However, according to an analysis by the startup Vectara, the "hallucination" rate of chatbots ranges from 3% to 27%, a problem for the financial sector, where accurate decisions are crucial.
Julián Colombo, CEO and founder of N5, explains that AI can present hallucinations, that is, errors or incorrect interpretations of data, which "leads to erroneous conclusions". Julio Blanco, co-founder and CBO of Zentricx, clarifies that, basically, "the result is an invention of the model and is not supported by real information".
Large language models (LLMs), explains Weslley Rosalem, senior AI lead at Red Hat, work on the basis of conditional probabilities learned from training data. "They generate the next word or token based on the probability distributions of those sequences. Hallucinations occur when the model produces results that are statistically plausible but do not correspond to factual reality. These models capture statistical relationships, but they do not have a true understanding of the content," he clarifies.
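To make that token-by-token idea concrete, the toy sketch below uses a hand-written table of hypothetical next-token probabilities (not taken from any real model). It shows how a continuation can be statistically likely given the training data while nothing in the sampling step checks it against reality.

```python
# Illustrative only: a toy "language model" as a table of conditional
# next-token probabilities. Real LLMs learn such distributions from vast
# training corpora; here the numbers are invented for demonstration.
import random

# Hypothetical learned distribution for the context "stock prices will",
# as a model trained mostly on stable-market data might encode it.
next_token_probs = {
    "rise": 0.55,
    "fall": 0.25,
    "stagnate": 0.20,
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Pick the next token according to the learned probabilities."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print("stock prices will", sample_next_token(next_token_probs))
# The output is fluent and plausible, but the sampling step never
# verifies it against current economic conditions.
```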
In the spotlight
In the financial sector, these hallucinations can occur in several areas, such as credit analysis, where a model can assign a customer “a risk profile that does not reflect their true financial situation, potentially resulting in inappropriate credit granting decisions,” says Colombo.
Blanco adds that, in customer service, query search engines (replacing "frequently asked questions") may make wrong recommendations about services or their costs. "In turn, they can be so off the mark that they don't solve customers' queries at all." The models can also hallucinate when generating financial reports or when performing complex calculations to estimate or forecast trends: "Rather than predicting, they would be guessing at a future with no real basis", he points out.
In automated financial advice, hallucinations can lead to inappropriate investment strategies being recommended on the basis of faulty data or algorithms. They can likewise cause problems in fraud detection and risk management. "Hallucinations can lead to false positives or negatives, compromising the effectiveness of identifying fraudulent activities or assessing risks," says Rosalem.
The nature of these hallucinations in the sector can lead to significant financial losses, reputational damage to institutions, and customer dissatisfaction. “In addition, decisions based on flawed analysis can increase the risk of fraud or regulatory non-compliance, exposing companies to regulatory sanctions. It is crucial to implement validation and monitoring measures to ensure that AI systems work accurately and transparently, thus minimizing the associated risks,” emphasizes Colombo.
Similarly, hallucinations can lead to inefficient decision-making. "Hallucinations can compromise the quality of strategic decisions, affecting the institution's competitiveness in the market," they add at Red Hat.
Minimize risk
At Zentricx, they comment that the main way to minimize hallucinations is to ensure that the information used is reliable. "If the model receives false information, it learns to repeat the same falsehoods. We always recommend a data consulting project before developing a complex AI model."
Blanco points out that data quality is "central" to reducing hallucinations. "It is necessary to ensure that AI models are trained with diverse, balanced and well-structured data, and also to perform stress tests on the AI model", he highlights.
At Red Hat, they suggest that strategies such as RAG (Retrieval-Augmented Generation) or RIG (Retrieval Interleaved Generation) minimize the effects of hallucinations in LLMs, since they "combine language models with information retrieval systems". The LLM is fed with specific information retrieved from relevant databases or documents, which allows it to generate more accurate and up-to-date responses and reduces exclusive reliance on training data, which may be outdated or incomplete.
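As a rough illustration of that retrieve-then-generate flow, the sketch below uses an in-memory document list, a naive keyword-overlap retriever, and a placeholder `call_llm` function standing in for whatever model the institution actually uses. Production RAG systems would use vector search and a real LLM endpoint; the documents and figures here are invented.

```python
# Minimal RAG sketch under simplifying assumptions (hypothetical data).

DOCUMENTS = [
    "Q3 2024 report: the default rate on consumer loans rose to 4.1%.",
    "Internal policy: credit limits above $50,000 require manual review.",
    "Fee schedule 2024: international transfers cost 0.8% per operation.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(docs,
                    key=lambda d: len(terms & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for the actual language model call."""
    return f"[model answer, grounded in the prompt below]\n{prompt}"

def answer_with_rag(query: str) -> str:
    # Feed the model only retrieved, institution-approved context.
    context = "\n".join(retrieve(query, DOCUMENTS))
    prompt = ("Answer using ONLY the context below. If the context is not "
              "enough, say you do not know.\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    return call_llm(prompt)

print(answer_with_rag("What does an international transfer cost?"))
```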
Open-source methods and tools like TrustyAI and guardrails can be implemented alongside LLMs to mitigate hallucinations and improve reliability.
TrustyAI is a suite of tools that targets the explainability and reliability of AI models, providing capabilities for interpreting model decisions, identifying biases, and monitoring performance. “By applying TrustyAI to LLMs, it is possible to better understand how the model generates responses and identify possible hallucinations or incorrect information,” says Rosalem.
Guardrails, on the other hand, are mechanisms that impose restrictions or checks on the outputs of AI models. They can be implemented to ensure that responses are within a certain scope, follow specific policies, or are factually correct.
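The sketch below gives a hedged sense of what such an output guardrail can look like for a financial chatbot: before a draft answer reaches the customer, it is checked against a couple of illustrative rules (no overpromising, every quoted figure must appear in the retrieved context). The rules, phrases, and thresholds are hypothetical, not those of any specific guardrail library.

```python
# Illustrative output guardrail: reject drafts that overpromise or cite
# figures not present in the source context, and escalate to a human.
import re

FORBIDDEN_PHRASES = ["guaranteed return", "risk-free", "cannot lose"]

def passes_guardrails(draft: str, context: str) -> tuple[bool, str]:
    """Return (ok, reason) for a draft answer given its source context."""
    lowered = draft.lower()
    for phrase in FORBIDDEN_PHRASES:
        if phrase in lowered:
            return False, f"forbidden promise: '{phrase}'"
    # Any number quoted to the customer must appear in the source context.
    for figure in re.findall(r"\d+(?:\.\d+)?%?", draft):
        if figure not in context:
            return False, f"ungrounded figure: {figure}"
    return True, "ok"

draft = "This fund has a guaranteed return of 12%."
ok, reason = passes_guardrails(
    draft, context="Fund fact sheet: past performance 7.2% annual.")
print(ok, reason)  # False -> the draft is blocked and escalated to a human
```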
Colombo adds that implementing human review "for critical data and sensitive responses can increase accuracy, especially in areas such as risk and compliance."
At N5, they have developed the Fin Sky solution, which combines two effective approaches. First, they adopted a distributed model that uses multiple AIs working together. Second, they implemented a feedback process that continuously validates the input, processing, and output of each user query. "This allowed us to reduce the hallucination rate to 0.3%, compared to the rate of 3% to 27% observed in chatbots, according to data from the startup Vectara", explains Colombo, who closes by stressing that its AIs are trained with data exclusive to the institution, avoiding queries to random information on the internet, which further increases the accuracy of the answers. "This combination of methods ensures that solutions are reliable and secure in a sector where information plays a critical role."
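The general idea of validating input, processing, and output for every query, with escalation to a human when a check fails, can be sketched generically as below. This is an illustration of the pattern only, not N5's actual implementation; the checks, names, and sources are hypothetical.

```python
# Generic input/output validation pipeline with human escalation
# (hypothetical checks; not any vendor's real implementation).
from typing import Callable

def validate_input(query: str) -> bool:
    """Reject empty or clearly out-of-scope queries before the model runs."""
    return bool(query.strip()) and "password" not in query.lower()

def validate_output(answer: str, sources: list[str]) -> bool:
    """Accept an answer only if it references at least one approved source."""
    return any(src in answer for src in sources)

def handle_query(query: str, generate: Callable[[str], str],
                 sources: list[str]) -> str:
    if not validate_input(query):
        return "Escalated to a human agent (input failed validation)."
    answer = generate(query)
    if not validate_output(answer, sources):
        return "Escalated to a human agent (answer failed validation)."
    return answer

# Example usage with a stand-in generator function.
print(handle_query(
    "What is the current mortgage rate?",
    generate=lambda q: "According to the Internal rate sheet 2024, it is 6.5%.",
    sources=["Internal rate sheet 2024"]))
```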