This blog post is also available in French below.
In the banking sector, integrating chatbots can present major technical challenges, particularly in terms of performance measurement and risk management.
On Thursday, October 2 at noon, Guillaume Delroeux, President of Prométhée Consultants, and Marc Queudot, Practice Lead in Language Science and Technology at CRIM, will share the results of a rigorous comparative study conducted on conversational assistants used by a range of major financial institutions in Canada.
Performance indicators: granularity and relevance
The first step in evaluating an assistant is to define robust metrics.
“In our analysis, we focused on simple answers. Is the answer to the user’s question correct?” explains Marc Queudot in an interview in French.
He distinguishes between metrics related to system performance and response accuracy, and those concerning user experience.
“In well-designed assistants, the experience is fluid and the support is much higher-you’re offered the information you need rather than just an answer close to your question. It’s subtle, but you can only really know by experiencing the bot.”
The technical analysis should also consider the assistant’s ability to manage multi-turn conversations rather than isolated question-answer pairs, as well as conversational context-two elements that complicate evaluation but are essential for advanced use cases.
Risk management: intrusion testing and monitoring
Chatbots can come with multiple risks: hallucinations, bias, critical errors and damage to a brand’s image.
Marc Queudot emphasizes the need for intrusion testing.
“The classic approach in security for assistants is called red teaming, which involves a team trying to break the chatbot.”
This type of test helps identify vulnerabilities, especially those that could expose the organization to regulatory risks.
He also highlights the importance of limiting transactional capabilities.
“Often, answering questions is enough for an assistant. You should think twice before adding features that increase risks.”

Ongoing improvement: automation and human oversight
Ongoing improvement relies on automated collection and analysis of conversations.
“As usage grows, you’ll gather data, detect conversations where performance is lacking and identify areas for improvement.”
Systems must group problematic cases to allow human experts to intervene on complex issues. This hybrid approach ensures rapid adaptation to new needs and effective correction of detected flaws.
You should think twice before adding features that increase risks.
CRIM: expertise in risk mitigation
As a non-profit, CRIM offers a methodology for mitigating technological risks, combining experimentation, measurement and continuous optimization.
“What we offer at CRIM is to carry out the technological risk mitigation and answer the question: can we develop a reliable, high-performing system and deploy it with minimal risk? That’s our core expertise and it enables organiszations to make informed investment decisions.”
Webinar: results and recommendations
On Thursday, October 2, at noon, the webinar will present the results of a comparative study on conversational assistants used by major Canadian financial institutions. Participants will benefit from the involvement of Prométhée’s team, which brings customer experience expertise to enhance the work’s impact.
“We’ll uncover the challenges faced by current systems. Then, we’ll discuss how to evaluate these systems to ensure we have the right vision and path for improvement.”
On the agenda for the session, to secure and maximize the value of banking chatbots:
- technologies used
- analysis methods
- identified risks
- technical recommendations
Evaluating and making reliable banking chatbots – Technical approaches and real-world challenges
In the banking sector, the integration of conversational assistants, or chatbots, raises major technical challenges, both in terms of performance measurement and risk management.
On Thursday, October 2 at noon, Guillaume Delroeux, President of Prométhée Consultants, and Marc Queudot, Practice Leader, Language Sciences and Technologies at CRIM, will share the results of a rigorous comparative study of conversational assistants used by a range of major financial institutions in Canada.
Performance indicators: granularity and relevance
The first step in evaluating an assistant is to define robust metrics.
“In our analysis, we focused on simple answers. Is the answer to the user’s question correct or not?” explains Marc Queudot.
It distinguishes between metrics relating to system performance and the accuracy of responses, and those relating to the user experience.
“In well-designed assistants, it’s fluid and there’s a much higher level of support, you’re offered the information you need rather than just the answer to a question close to yours. It’s subtle, but you can only really know it by experiencing the bot.”
Technical analysis must also take into account an assistant’s ability to manage a conversation that spans several successive exchanges, rather than a single question and isolated answer, or the conversational context, two elements that complicate evaluation but are essential for advanced uses.
Risk management: penetration testing and monitoring
The risks associated with assistants are manifold: hallucinations, biases, critical errors, and damage to brand image.
Marc Queudot insists on the need for penetration testing: “The great classic in the field of security, around assistants, is called red teaming, which involves outright a team of people trying to break the chatbot.”
This type of test helps identify weaknesses, particularly those that could expose the organization to regulatory risks.
He also stresses the importance of limiting chatbots’ transactional capabilities: “Answering questions is often enough for an assistant. You have to ask yourself twice before adding functionalities that would increase the risk.”
Continuous improvement: automation and human supervision
Continuous improvement is based on the automated collection and analysis of conversations.
“You’ll, as you use it, amass data, detect conversations where you’re not performing well enough and thus identify where you could improve.”
Systems need to group problem cases together, allowing human experts to intervene in complex cases. This hybrid approach guarantees rapid adaptation to new needs and effective correction of any faults detected.
You need to think twice before adding features that increase risk.
CRIM: expertise and technological breakthroughs
As an NPO, CRIM proposes a technological derisking methodology, combining experimentation, measurement and continuous optimization.
“What we propose to CRIM is to go and do the technological derisking and give the answer to: is it possible to develop a reliable, high-performance system, and deploy systems like that, at lower risk. That’s our core business, and it enables the organization to make informed investment decisions.
Webinar: results and recommendations
The webinar on Thursday, October 2 at noon, offered in English, will present the results of a comparative study of conversational assistants at major Canadian financial institutions. Participants will benefit from the participation of the Prométhée team, which brings customer experience expertise to this work to give it greater impact.
“We’ll get a sense of the challenges faced by the systems in place. And then we’ll discuss how we evaluate those systems to make sure we have the right vision and the right avenue to improve them.”
On the agenda, to make banking chatbots more reliable and maximize their value:
- technologies used
- analysis methods
- identified risks
- technical recommendations