{"id":27815,"date":"2025-09-16T10:16:20","date_gmt":"2025-09-16T14:16:20","guid":{"rendered":"https:\/\/www.crim.ca\/?p=27815"},"modified":"2026-04-17T10:41:08","modified_gmt":"2026-04-17T14:41:08","slug":"evaluating-and-securing-banking-chatbots-technical-approaches-and-real-world-challenges","status":"publish","type":"post","link":"https:\/\/www.crim.ca\/fr\/evaluating-and-securing-banking-chatbots-technical-approaches-and-real-world-challenges\/","title":{"rendered":"Evaluating and securing banking chatbots: technical approaches and real-world challenges"},"content":{"rendered":"<p><em>Ce billet de blogue est \u00e9galement disponible <a href=\"#francais\">en fran\u00e7ais<\/a> ci-dessous.<\/em><\/p>\n<p><strong>In the banking sector, integrating chatbots can present major technical challenges, particularly in terms of performance measurement and risk management.<\/strong><\/p>\n<p>On <a href=\"https:\/\/events.teams.microsoft.com\/event\/2b5fa4bc-d66d-46c5-bb75-4e16bab91db0@c0023636-b5f8-42f4-a5be-2bc466b64d81\">Thursday, October 2 at noon<\/a>, Guillaume Delroeux, President of <a href=\"https:\/\/www.prometheeconsultants.ca\/\">Prom\u00e9th\u00e9e Consultants<\/a>, and Marc Queudot, Practice Lead in Language Science and Technology at <a href=\"https:\/\/crim.ca\/\">CRIM<\/a>, will share the results of a rigorous comparative study conducted on conversational assistants used by a range of major financial institutions in Canada.<\/p>\n<h2>Performance indicators: granularity and relevance<\/h2>\n<p>The first step in evaluating an assistant is to define robust metrics.<\/p>\n<p>\u201cIn our analysis, we focused on simple answers. 
Is the answer to the user&#8217;s question correct?\u201d explains Marc Queudot in an interview in French.<\/p>\n<p>He distinguishes between metrics related to system performance and response accuracy, and those concerning user experience.<\/p>\n<p>\u201cIn well-designed assistants, the experience is fluid and the support is much higher\u2014you\u2019re offered the information you need rather than just an answer close to your question. It\u2019s subtle, but you can only really know by experiencing the bot.\u201d<\/p>\n<p>The technical analysis should also consider the assistant\u2019s ability to manage multi-turn conversations rather than isolated question-answer pairs, as well as conversational context\u2014two elements that complicate evaluation but are essential for advanced use cases.<\/p>\n<h2>Risk management: intrusion testing and monitoring<\/h2>\n<p>Chatbots can come with multiple risks: hallucinations, bias, critical errors and damage to a brand&#8217;s image.<\/p>\n<p>Marc Queudot emphasizes the need for intrusion testing.<\/p>\n<p>\u201cThe classic approach in security for assistants is called red teaming, which involves a team trying to break the chatbot.\u201d<\/p>\n<p>This type of test helps identify vulnerabilities, especially those that could expose the organization to regulatory risks.<\/p>\n<p>He also highlights the importance of limiting transactional capabilities.<\/p>\n<p>\u201cOften, answering questions is enough for an assistant. 
You should think twice before adding features that increase risks.\u201d<\/p>\n<figure id=\"attachment_27820\" aria-describedby=\"caption-attachment-27820\" style=\"width: 1920px\" class=\"wp-caption alignnone\"><img fetchpriority=\"high\" decoding=\"async\" class=\"wp-image-27820 size-full\" src=\"https:\/\/www.crim.ca\/wp-content\/uploads\/2025\/09\/marc-presente.jpg\" alt=\"Marc Queudot presenting a technical analysis on conversational AI while an audience attends a professional conference in an indoor event space.\" width=\"1920\" height=\"1080\" srcset=\"https:\/\/www.crim.ca\/wp-content\/uploads\/2025\/09\/marc-presente.jpg 1920w, https:\/\/www.crim.ca\/wp-content\/uploads\/2025\/09\/marc-presente-300x169.jpg 300w, https:\/\/www.crim.ca\/wp-content\/uploads\/2025\/09\/marc-presente-1024x576.jpg 1024w, https:\/\/www.crim.ca\/wp-content\/uploads\/2025\/09\/marc-presente-768x432.jpg 768w, https:\/\/www.crim.ca\/wp-content\/uploads\/2025\/09\/marc-presente-1536x864.jpg 1536w\" sizes=\"(max-width: 1920px) 100vw, 1920px\" \/><figcaption id=\"caption-attachment-27820\" class=\"wp-caption-text\">Marc Queudot, Practice Lead in Language Science and Technology at CRIM<\/figcaption><\/figure>\n<h2>Ongoing improvement: automation and human oversight<\/h2>\n<p>Ongoing improvement relies on automated collection and analysis of conversations.<\/p>\n<p>\u201cAs usage grows, you\u2019ll gather data, detect conversations where performance is lacking and identify areas for improvement.\u201d<\/p>\n<p>Systems must group problematic cases to allow human experts to intervene on complex issues. 
This hybrid approach ensures rapid adaptation to new needs and effective correction of detected flaws.<\/p>\n<blockquote><p>You should think twice before adding features that increase risks.<\/p><\/blockquote>\n<h2>CRIM: expertise in risk mitigation<\/h2>\n<p>As a non-profit, CRIM offers a methodology for mitigating technological risks, combining experimentation, measurement and continuous optimization.<\/p>\n<p>\u201cWhat we offer at CRIM is to carry out the technological risk mitigation and answer the question: can we develop a reliable, high-performing system and deploy it with minimal risk? That\u2019s our core expertise and it enables organizations to make informed investment decisions.\u201d<\/p>\n<h2><strong>Webinar: results and recommendations<\/strong><\/h2>\n<p>On Thursday, October 2, at noon, the webinar will present the results of a comparative study on conversational assistants used by major Canadian financial institutions. Participants will benefit from the involvement of Prom\u00e9th\u00e9e&#8217;s team, which brings customer experience expertise to enhance the work&#8217;s impact.<\/p>\n<p>\u201cWe\u2019ll uncover the challenges faced by current systems. 
Then, we\u2019ll discuss how to evaluate these systems to ensure we have the right vision and path for improvement.\u201d<\/p>\n<p>On the agenda for the session, to secure and maximize the value of banking chatbots:<\/p>\n<ul>\n<li>technologies used<\/li>\n<li>analysis methods<\/li>\n<li>risks identified<\/li>\n<li>technical recommendations<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/events.teams.microsoft.com\/event\/2b5fa4bc-d66d-46c5-bb75-4e16bab91db0@c0023636-b5f8-42f4-a5be-2bc466b64d81\"><button id=\"francais\">Registration is free<\/button><\/a><\/p>\n<p>&nbsp;<\/p>\n<hr style=\"width: 0%; margin: 0 auto;\" \/>\n<h2><\/h2>\n<h2>\u00c9valuer et fiabiliser les chatbots bancaires \u2013 Approches techniques et enjeux r\u00e9els<\/h2>\n<p>Dans le secteur bancaire, l\u2019int\u00e9gration des assistants conversationnels, ou chatbots, soul\u00e8ve des d\u00e9fis techniques majeurs, tant sur le plan de la mesure de performance que de la gestion des risques.<\/p>\n<p>Le <a href=\"https:\/\/events.teams.microsoft.com\/event\/2b5fa4bc-d66d-46c5-bb75-4e16bab91db0@c0023636-b5f8-42f4-a5be-2bc466b64d81\">jeudi 2 octobre \u00e0 midi<\/a>, Guillaume Delroeux, Pr\u00e9sident de <a href=\"https:\/\/www.prometheeconsultants.ca\/\">Prom\u00e9th\u00e9e Consultants<\/a> et Marc Queudot, Chef de pratique, Sciences et technologies du langage au <a href=\"https:\/\/crim.ca\/\">CRIM<\/a>, partageront les r\u00e9sultats d&#8217;une \u00e9tude comparative rigoureuse men\u00e9e sur les assistants conversationnels utilis\u00e9s par une panoplie de grandes institutions financi\u00e8res au Canada.<\/p>\n<h2>Indicateurs de performance : granularit\u00e9 et pertinence<\/h2>\n<p>La premi\u00e8re \u00e9tape pour \u00e9valuer un assistant consiste \u00e0 d\u00e9finir des m\u00e9triques robustes.<\/p>\n<p>\u00ab Dans notre analyse, nous nous sommes focalis\u00e9s sur les r\u00e9ponses simples. Est-ce que oui ou non la r\u00e9ponse \u00e0 la question de l\u2019utilisateur est correcte? 
\u00bb explique Marc Queudot.<\/p>\n<p>Il distingue les m\u00e9triques li\u00e9es \u00e0 la performance du syst\u00e8me et \u00e0 la justesse des r\u00e9ponses de celles autour de l\u2019exp\u00e9rience utilisateur.<\/p>\n<p>\u00ab Dans les assistants bien con\u00e7us, c\u2019est fluide et il y a une prise en charge bien plus \u00e9lev\u00e9e, on te propose l\u2019information dont tu as besoin plut\u00f4t que seulement la r\u00e9ponse \u00e0 une question proche de la tienne. C\u2019est subtil, mais on ne peut vraiment le savoir qu\u2019en faisant l\u2019exp\u00e9rience du bot. \u00bb<\/p>\n<p>L\u2019analyse technique doit \u00e9galement prendre en compte la capacit\u00e9 d\u2019un assistant \u00e0 g\u00e9rer une conversation qui s\u2019\u00e9tend sur plusieurs \u00e9changes successifs, plut\u00f4t qu\u2019une seule question et une r\u00e9ponse isol\u00e9es, ou le contexte conversationnel, deux \u00e9l\u00e9ments qui complexifient l\u2019\u00e9valuation mais sont essentiels pour des usages avanc\u00e9s.<\/p>\n<h2>Gestion des risques : tests d\u2019intrusion et monitoring<\/h2>\n<p>Les risques li\u00e9s aux assistants sont multiples : hallucinations, biais, erreurs critiques, et atteinte \u00e0 l\u2019image de marque.<\/p>\n<p>Marc Queudot insiste sur la n\u00e9cessit\u00e9 de tests d\u2019intrusion : \u00ab Le grand classique dans le domaine de la s\u00e9curit\u00e9, autour des assistants, \u00e7a s\u2019appelle le <em>red teaming<\/em>, qui implique carr\u00e9ment une \u00e9quipe de gens qui essayent de briser le chatbot.\u00bb<\/p>\n<p>Ce type de test permet d\u2019identifier les failles, notamment celles qui exposeraient l\u2019organisation \u00e0 des risques r\u00e9glementaires.<\/p>\n<p>Il souligne aussi l\u2019importance de limiter les capacit\u00e9s transactionnelles des chatbots : \u00ab R\u00e9pondre aux questions, c\u2019est souvent suffisant pour un assistant. 
Il faut se poser la question deux fois avant d\u2019ajouter des fonctionnalit\u00e9s qui augmenteraient le risque. \u00bb<\/p>\n<h2>Am\u00e9lioration continue : automatisation et supervision humaine<\/h2>\n<p>L\u2019am\u00e9lioration continue repose sur la collecte et l\u2019analyse automatis\u00e9e des conversations.<\/p>\n<p>\u00ab Tu vas, au fur et \u00e0 mesure de l\u2019utilisation, amasser des donn\u00e9es, d\u00e9tecter les conversations o\u00f9 tu ne performes pas assez bien et ainsi identifier l\u00e0 o\u00f9 tu pourrais t\u2019am\u00e9liorer. \u00bb<\/p>\n<p>Les syst\u00e8mes doivent regrouper les cas probl\u00e9matiques pour permettre aux experts humains d\u2019intervenir sur les cas complexes. Cette approche hybride garantit une adaptation rapide aux nouveaux besoins et une correction efficace des failles d\u00e9tect\u00e9es.<\/p>\n<blockquote><p>Il faut se poser la question deux fois avant d\u2019ajouter des fonctionnalit\u00e9s qui augmenteraient le risque.<\/p><\/blockquote>\n<h2>Le CRIM : expertise et d\u00e9risquage technologique<\/h2>\n<p>En tant qu\u2019OBNL, le CRIM propose une m\u00e9thodologie de d\u00e9risquage technologique, combinant exp\u00e9rimentation, mesure et optimisation continue.<\/p>\n<p>\u00ab Nous, ce qu\u2019on propose au CRIM, c\u2019est d\u2019aller faire le d\u00e9risquage technologique et donner la r\u00e9ponse \u00e0 : est-ce possible de d\u00e9velopper un syst\u00e8me fiable, performant, et d\u00e9ployer des syst\u00e8mes comme \u00e7a, \u00e0 moindre risque. C\u2019est notre c\u0153ur de m\u00e9tier et \u00e7a permet ensuite \u00e0 l\u2019organisation de prendre des d\u00e9cisions d\u2019investissement inform\u00e9es. 
\u00bb<\/p>\n<h2><strong>Webinaire : r\u00e9sultats et recommandations<\/strong><\/h2>\n<p>Le webinaire du jeudi 2 octobre \u00e0 midi, offert en anglais, pr\u00e9sentera les r\u00e9sultats d\u2019une \u00e9tude comparative sur les assistants conversationnels des grandes institutions financi\u00e8res canadiennes. Les participants profiteront de la participation de l\u2019\u00e9quipe de Prom\u00e9th\u00e9e qui am\u00e8ne l\u2019expertise de l\u2019exp\u00e9rience client \u00e0 ce travail pour lui donner plus d\u2019impact.<\/p>\n<p>\u00ab On va se rendre compte des d\u00e9fis auxquels font face les syst\u00e8mes en place. Et apr\u00e8s, on va discuter de comment on \u00e9value ces syst\u00e8mes pour s\u2019assurer qu\u2019on a la bonne vision et la bonne avenue pour les am\u00e9liorer. \u00bb<\/p>\n<p>Au programme, pour fiabiliser et maximiser la valeur des chatbots bancaires :<\/p>\n<ul>\n<li>technologies utilis\u00e9es<\/li>\n<li>m\u00e9thodes d\u2019analyse<\/li>\n<li>risques identifi\u00e9s<\/li>\n<li>recommandations techniques<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/events.teams.microsoft.com\/event\/2b5fa4bc-d66d-46c5-bb75-4e16bab91db0@c0023636-b5f8-42f4-a5be-2bc466b64d81\"><button>L&#8217;inscription est gratuite<\/button><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ce billet de blogue est \u00e9galement disponible en fran\u00e7ais ci-dessous. In the banking sector, integrating chatbots can present major technical challenges, particularly in terms of performance measurement and risk management. 
On Thursday, October 2 at noon, Guillaume Delroeux, President of Prom\u00e9th\u00e9e Consultants, and Marc Queudot, Practice Lead in Language Science and Technology at CRIM, will [&hellip;]<\/p>\n","protected":false},"author":409,"featured_media":27860,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[55],"tags":[753,743,742,752,744,745,751,747,746,750,749,748],"class_list":["post-27815","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-nouvelle","tag-assistants-conversationnels-en-entreprise","tag-assistants-conversationnels-financiers","tag-chatbots-bancaires","tag-derisquage-technologique","tag-evaluation-performance-chatbot","tag-gestion-des-risques-ia","tag-gouvernance-ia","tag-hallucinations-ia","tag-ia-conversationnelle-bancaire","tag-intelligence-artificielle-crim","tag-metriques-assistants-conversationnels","tag-red-teaming-chatbot"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.crim.ca\/fr\/wp-json\/wp\/v2\/posts\/27815","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.crim.ca\/fr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.crim.ca\/fr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.crim.ca\/fr\/wp-json\/wp\/v2\/users\/409"}],"replies":[{"embeddable":true,"href":"https:\/\/www.crim.ca\/fr\/wp-json\/wp\/v2\/comments?post=27815"}],"version-history":[{"count":27,"href":"https:\/\/www.crim.ca\/fr\/wp-json\/wp\/v2\/posts\/27815\/revisions"}],"predecessor-version":[{"id":29432,"href":"https:\/\/www.crim.ca\/fr\/wp-json\/wp\/v2\/posts\/27815\/revisions\/29432"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.crim.ca\/fr\/wp-json\/wp\/v2\/media\/27860"}],"wp:attachment":[{"href":"https:\/\/www.crim.ca\/fr\/wp-json\/wp\/v2\/media?parent=27815"}],"wp:term":[{"taxonomy":"category","embeddable"
:true,"href":"https:\/\/www.crim.ca\/fr\/wp-json\/wp\/v2\/categories?post=27815"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.crim.ca\/fr\/wp-json\/wp\/v2\/tags?post=27815"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}