2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]
In the current state of AI development, it is reasonable to expect that AI will continue to expand and be increasingly adopted across different fields, profoundly affecting nearly every aspect of human welfare and livelihood. However, many AI researchers and institutions agree that while AI has the potential to be extremely beneficial, it may also pose existential threats to humanity. It is therefore necessary to develop tools that open so-called black-box AI algorithms and increase their understandability and trustworthiness, in order to avoid conceivably harmful future scenarios.
The lack of interpretability of AI is a challenge to its own development: it is an obstacle comparable to those that triggered previous AI winters, such as hardware and technological constraints or public over-expectation. In other words, research in interpretability and model understanding, from both theoretical and pragmatic perspectives, will help avoid a third AI winter, which could be devastating for the current world economy.
Specifically, from the theoretical perspective, the subfields of local explainability and algorithmic fairness require improvements that enhance the quality of the explanations they produce. Local explainability refers to algorithms that attempt to extract useful explanations of a machine learning model's output for individual instances, while algorithmic fairness refers to the study of biases or fairness issues among different groups of people whenever the datasets describe humans. Improving explanation accuracy, explanation fidelity and explanation support for the observations of each dataset would increase the overall trustworthiness and understandability of the explanations. Explainability methods should also be applied to practical scenarios. In the area of autonomous driving, for example, providing confidence intervals on positioning estimates and positioning errors is important for vehicle operations, and machine learning models coupled with conformal prediction may provide a solution that focuses on the confidence of these estimates, prioritizing safety.
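As a rough illustration of that last point, the sketch below applies split (inductive) conformal prediction to a generic regression task to obtain a prediction interval at a chosen significance level. The synthetic data, the GradientBoostingRegressor model and all variable names are placeholder assumptions for illustration only, not the models, data or method variants used in the thesis.

```python
# Minimal sketch of split (inductive) conformal prediction for regression.
# Synthetic stand-in for a positioning-error target; not the thesis experiments.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.3, size=2000)  # placeholder target

# Split into a proper training set and a calibration set.
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

alpha = 0.1  # significance level -> 90% target coverage
residuals = np.abs(y_cal - model.predict(X_cal))      # nonconformity scores
k = int(np.ceil((1 - alpha) * (len(residuals) + 1)))  # conformal quantile rank
q = np.sort(residuals)[min(k, len(residuals)) - 1]    # interval half-width

x_new = rng.normal(size=(1, 4))
point = model.predict(x_new)[0]
print(f"prediction: {point:.3f}, interval: [{point - q:.3f}, {point + q:.3f}]")
```

Under the usual exchangeability assumption, intervals built this way cover the true value with probability at least 1 - alpha, which is the property that makes the approach attractive for safety-oriented settings such as vehicle positioning.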
This thesis contributes to research in the field of AI interpretability, focusing mainly on algorithms related to local explainability, algorithmic fairness and conformal prediction. Specifically, the thesis targets the improvement of counterfactual and local surrogate explanation algorithms. These explainability methods may also reveal the existence of biases, and the study of algorithmic fairness is therefore a relevant part of interpretability. The thesis addresses machine learning fairness assessment through local explainability methods, proposing two novel elements: a single bias detection measure based on accuracy and counterfactuals, and a counterfactual generation method for groups intended for bias detection and fair recommendations across groups. Finally, interpretability methods are ultimately meant to be implemented in real-world applications. This thesis presents an application of the conformal prediction framework to a regression problem related to autonomous vehicle localization systems, in which the framework outputs the predicted positioning error of a vehicle together with a confidence interval at a given significance level.
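To make the counterfactual ideas above more concrete, the following is a minimal sketch of a counterfactual explanation search for a binary classifier: starting from an instance, it greedily perturbs one feature at a time until the model's prediction flips. The greedy_counterfactual function, the step size and the scikit-learn model are illustrative assumptions, not the counterfactual or bias detection algorithms proposed in the thesis.

```python
# Minimal sketch of a greedy counterfactual search for a binary classifier.
# Illustrative only; not the methods developed in the thesis.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

def greedy_counterfactual(model, x, target_class, step=0.25, max_iter=200):
    """Perturb one feature at a time until the prediction flips to target_class."""
    cf = x.copy()
    for _ in range(max_iter):
        if model.predict(cf.reshape(1, -1))[0] == target_class:
            return cf
        best, best_p = None, -np.inf
        for j in range(len(cf)):
            for delta in (-step, step):
                cand = cf.copy()
                cand[j] += delta
                p = model.predict_proba(cand.reshape(1, -1))[0, target_class]
                if p > best_p:
                    best, best_p = cand, p
        cf = best
    return None  # no counterfactual found within the budget

x0 = X[0]
target = 1 - model.predict(x0.reshape(1, -1))[0]
cf = greedy_counterfactual(model, x0, target)
print("original instance:   ", x0)
print("counterfactual found:", cf)
```

Comparing the original instance with the returned counterfactual shows which feature changes would alter the model's decision; group-level variants of this idea are what allow counterfactuals to be used for bias detection across groups.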
Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2024. p. 72
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 24-013
Keywords
artificial intelligence, machine learning, interpretability, explainability, counterfactual, fairness
National Category
Computer Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-233360 (URN), 978-91-8014-929-7 (ISBN), 978-91-8014-930-3 (ISBN)
Public defence
2024-11-14, L30, NOD-huset, Borgarfjordsgatan 12, Kista, 09:00 (English)
Opponent
Supervisors
2024-10-22, 2024-09-10, 2024-10-08. Bibliographically approved