AI-Driven Multi-objective Decision-Making With Applications to IoT
Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. ORCID iD: 0000-0001-7612-4227
2026 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Artificial intelligence (AI) plays an increasingly central role in enabling autonomous decision-making in complex, uncertain environments. Many modern systems must optimise multiple, often conflicting objectives while operating under dynamic resource constraints and incomplete knowledge of system dynamics. Classical approaches such as dynamic programming, constrained stochastic optimisation, and static multi-objective scalarisation provide principled solutions when accurate models are available. However, in distributed and stochastic environments such as the Internet of Things (IoT), system dynamics are often unknown, non-stationary, and resource-limited, making purely model-based methods difficult to apply.

Reinforcement learning (RL) and online learning offer an alternative by enabling policy adaptation through interaction rather than relying on explicit system models. Within multi-objective settings, existing approaches often assume fixed scalarisation weights or externally specified preferences and typically focus on learning policies for given trade-offs. In dynamic IoT systems, however, both resource constraints and preference parameters may vary over time, requiring algorithms that can adapt efficiently without repeated retraining or centralised coordination.

This thesis investigates how AI-based methods, with a primary focus on reinforcement learning and complementary distributed learning techniques, can support adaptive multi-objective decision-making under communication constraints, explicit resource limitations, and dynamically changing trade-offs. The research is organised around three themes. First, communication-efficient distributed learning methods are developed to balance model accuracy and communication cost in federated learning through adaptive sparsification. Second, constrained bandit and reinforcement learning formulations are proposed to incorporate explicit and time-varying resource constraints while maintaining theoretical performance guarantees. Third, multi-objective reinforcement learning methods are designed to adapt routing decisions in distributed IoT systems under dynamically changing energy–reliability trade-offs without retraining.

Overall, the thesis demonstrates that integrating communication awareness, constraint handling, and preference adaptation directly into learning algorithms is essential for reliable AI-based decision-making in IoT environments. The results provide both algorithmic advances and a conceptual framework for designing autonomous systems that operate robustly under dynamic objectives and limited resources.

Abstract [sv] (translated from Swedish)

Artificial intelligence (AI) plays an increasingly important role in enabling autonomous decision-making in complex and uncertain environments. Many modern systems must optimise multiple, often conflicting objectives while operating under dynamic resource constraints and with incomplete knowledge of the system dynamics. Classical methods such as dynamic programming, constrained stochastic optimisation, and static multi-objective scalarisation offer principled solutions when accurate models are available. In distributed and stochastic environments such as the Internet of Things (IoT), however, the system dynamics are often unknown, non-stationary, and resource-limited, which makes purely model-based methods difficult to apply.

Reinforcement learning (RL) and online learning offer an alternative by enabling policy adaptation through interaction rather than through explicit system models. In multi-objective problems, existing methods often assume fixed scalarisation weights or externally specified preferences and focus mainly on learning policies for given trade-offs. In dynamic IoT systems, however, both resource constraints and preference parameters can change over time, which requires algorithms that can adapt efficiently without repeated retraining or centralised control.

This thesis investigates how AI-based methods, with a particular focus on reinforcement learning and complementary distributed learning methods, can support adaptive multi-objective decision-making under communication constraints, explicit resource requirements, and dynamically changing trade-offs. The research is organised around three main themes. First, communication-efficient distributed learning methods are developed that balance model accuracy and communication cost in federated learning through adaptive sparsification. Second, constrained bandit and reinforcement learning formulations are proposed to handle explicit and time-varying resource constraints with theoretical performance guarantees. Third, multi-objective reinforcement learning is developed to enable adaptive routing in distributed IoT systems under dynamically changing trade-offs between energy consumption and reliability, without the need for retraining.

In summary, the thesis shows that it is crucial to explicitly integrate communication awareness, constraint handling, and preference adaptation into learning algorithms in order to achieve reliable AI-based decision-making in IoT environments. The results contribute both algorithmic advances and a conceptual foundation for designing autonomous systems that can operate robustly under dynamic objectives and limited resources.

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2026, p. 116
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 26-005
Keywords [en]
Artificial Intelligence, Multi-objective, Internet of Things, Federated Learning, Reinforcement Learning
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-254165
ISBN: 978-91-8107-602-8 (print)
ISBN: 978-91-8107-603-5 (digital)
OAI: oai:DiVA.org:su-254165
DiVA, id: diva2:2052324
Public defence
2026-05-29, Small Auditorium NOD (Lilla hörsalen), level 2, Borgarfjordsgatan 12, Kista, 13:00 (English)
Available from: 2026-05-06 Created: 2026-04-13 Last updated: 2026-04-28 Bibliographically approved
List of papers
1. Energy-Efficient and Adaptive Gradient Sparsification for Federated Learning
2023 (English). In: IEEE International Conference on Communications (ICC), 2023, IEEE (Institute of Electrical and Electronics Engineers), 2023, p. 1256-1261. Conference paper, Published paper (Refereed)
Abstract [en]

Federated learning is an emerging machine-learning technique that trains an algorithm across multiple decentralized edge devices or clients holding local data samples. It involves training local models on local data and uploading model parameters to a server node at regular intervals to generate a global model, which is transmitted to all clients. However, edge nodes often have limited energy resources, so energy-efficient communication of model parameters is a bottleneck. We propose energy-adaptive model sparsification for federated learning. The central idea is to adapt the sparsification level at run time by optimizing the ratio between information content and energy cost. We illustrate the efficiency of the proposed algorithm by comparing its performance with three baseline schemes, and we validate it for two cost models. Simulation results show that the proposed algorithm requires exponentially less communication and energy than the three baseline schemes while achieving the best accuracy and fastest convergence.
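
As an illustration of the ratio-based idea described in this abstract, the sketch below picks how many gradient entries to keep by maximising retained squared magnitude per unit of energy cost. The squared-magnitude proxy for information content, the affine cost model, and the names `choose_sparsification_level`, `fixed_cost` and `per_entry_cost` are assumptions made for this example; they are not taken from the paper.

```python
def choose_sparsification_level(gradient, fixed_cost=1.0, per_entry_cost=0.1):
    """Return (k, kept_indices) maximising retained energy per unit cost.

    Assumed proxies (hypothetical, for illustration only):
      information(k) = sum of squares of the k largest-magnitude entries
      cost(k)        = fixed_cost + per_entry_cost * k
    """
    if not gradient:
        raise ValueError("gradient must not be empty")

    # Indices sorted by decreasing absolute value (top-k ordering).
    order = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)

    best_k, best_ratio, retained = 1, float("-inf"), 0.0
    for k, idx in enumerate(order, start=1):
        retained += gradient[idx] ** 2
        ratio = retained / (fixed_cost + per_entry_cost * k)
        if ratio > best_ratio:
            best_k, best_ratio = k, ratio

    return best_k, sorted(order[:best_k])


if __name__ == "__main__":
    grad = [0.0, 5.0, 0.1, -4.0, 0.05]
    print(choose_sparsification_level(grad, fixed_cost=1.0, per_entry_cost=0.5))
```

With a higher per-entry cost the chosen level drops, because each additional small entry adds cost faster than it adds retained magnitude.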

Place, publisher, year, edition, pages
IEEE (Institute of Electrical and Electronics Engineers), 2023
Series
IEEE International Conference on Communications, ISSN 1550-3607, E-ISSN 1938-1883
Keywords
Federated learning, Energy-efficiency, Adaptive communication, Gradient sparsification, IoT
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-223318 (URN)
10.1109/ICC45041.2023.10278999 (DOI)
001094862601059 ()
2-s2.0-85178304952 (Scopus ID)
978-1-5386-7463-5 (ISBN)
Conference
IEEE International Conference on Communications (ICC), 2023
Available from: 2023-10-25 Created: 2023-10-25 Last updated: 2026-04-13
2. Communication-Adaptive Gradient Sparsification for Federated Learning with Error Compensation
2025 (English). In: IEEE Internet of Things Journal, ISSN 2327-4662, Vol. 12, no. 2, p. 1137-1152. Article in journal (Other academic), Published
Abstract [en]

Federated learning has emerged as a popular distributed machine-learning paradigm. It involves many rounds of iterative communication between nodes to exchange model parameters. With the increasing complexity of ML tasks, the models can be large, with millions of parameters. Moreover, edge and IoT nodes often have limited energy resources and channel bandwidth. Communication cost is therefore a bottleneck in federated learning. This cost can be measured in terms of energy consumed, delay incurred, or amount of data communicated. We propose communication-cost-adaptive model sparsification for federated learning with error compensation. The central idea is to adapt the sparsification level at run time by optimizing the ratio between the impact of the communicated model parameters and the communication cost. We carry out a detailed convergence analysis to establish the theoretical foundations of the proposed algorithm. We conduct extensive experiments to train both convex and non-convex machine learning models on a standard dataset. We illustrate the efficiency of the proposed algorithm by comparing its performance with three baseline schemes. The performance of the proposed algorithm is validated for two communication models and three cost functions. Simulation results show that the proposed algorithm requires substantially less communication than the three baseline schemes while achieving the best accuracy and fastest convergence. The results are consistent for all the considered cost models, cost functions, and ML models. Thus, the proposed FL-CATE algorithm can substantially improve the communication efficiency of federated learning, irrespective of the ML tasks, costs, and communication models.
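
Error compensation in sparsified training is commonly realised as error feedback: entries dropped by sparsification are stored locally and added back before the next round. The sketch below shows that generic mechanism with a fixed top-k rule. The function name, the fixed `k`, and the plain-list representation are assumptions for illustration and do not reproduce FL-CATE.

```python
def sparsify_with_error_feedback(gradient, residual, k):
    """Top-k sparsification with error feedback (generic sketch).

    gradient: fresh local gradient for this round
    residual: entries withheld in earlier rounds (same length as gradient)
    k:        number of entries to transmit (hypothetical fixed level)
    Returns (sent, new_residual).
    """
    # Add back what was withheld previously before deciding what to send.
    corrected = [g + r for g, r in zip(gradient, residual)]

    # Keep the k entries with the largest magnitude.
    ranked = sorted(range(len(corrected)), key=lambda i: abs(corrected[i]), reverse=True)
    keep = set(ranked[:k])

    sent = [v if i in keep else 0.0 for i, v in enumerate(corrected)]
    # Whatever was not sent is carried over to the next round.
    new_residual = [c - s for c, s in zip(corrected, sent)]
    return sent, new_residual


if __name__ == "__main__":
    residual = [0.0, 0.0, 0.0]
    for grad in ([1.0, -3.0, 0.5], [1.0, 0.2, 0.5]):
        sent, residual = sparsify_with_error_feedback(grad, residual, k=1)
        print(sent, residual)
```

In the second round the first coordinate is transmitted even though its fresh gradient is small, because the withheld value from the first round has accumulated in the residual.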

Keywords
Federated learning, Communication efficiency, IoT, Gradient sparsification, Distributed learning
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-235702 (URN)
10.1109/JIOT.2024.3490855 (DOI)
001395714600019 ()
2-s2.0-85208723002 (Scopus ID)
Note

The article is available online in the early access area on IEEE Xplore. This article has been accepted for publication in a future issue of this journal, but has not been edited and content may change prior to final publication. It may be cited as an article in a future issue by its Digital Object Identifier.

Available from: 2024-11-19 Created: 2024-11-19 Last updated: 2026-04-13 Bibliographically approved
3. Adaptive Budgeted Multi-Armed Bandits for IoT with Dynamic Resource Constraints
2025 (English). In: GLOBECOM 2025 - 2025 IEEE Global Communications Conference, Piscataway: IEEE, 2025, p. 4535-4540. Conference paper, Published paper (Refereed)
Abstract [en]

Internet of Things (IoT) systems increasingly operate in environments where devices must respond in real time while managing fluctuating resource constraints, including energy and bandwidth. Yet, current approaches often fall short in addressing scenarios where operational constraints evolve over time. To address these limitations, we propose a novel Budgeted Multi-Armed Bandit framework tailored for IoT applications with dynamic operational limits. Our model introduces a decaying violation budget, which permits limited constraint violations early in the learning process and gradually enforces stricter compliance over time. We present the Budgeted Upper Confidence Bound (UCB) algorithm, which adaptively balances performance optimization and compliance with time-varying constraints. We provide theoretical guarantees showing that Budgeted UCB achieves sublinear regret and logarithmic constraint violations over the learning horizon. Extensive simulations in a wireless communication setting show that our approach achieves faster adaptation and better constraint satisfaction than standard online learning methods. These results highlight the framework’s potential for building adaptive, resource-aware IoT systems.
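
To make the idea of a decaying violation budget concrete, the sketch below shows one possible arm-selection rule: an arm is eligible while its estimated cost stays within the constraint plus a slack that shrinks over time, and the eligible arm with the highest upper confidence bound on reward is played. The decay schedule `initial_budget / t**decay`, the eligibility test, and all function and parameter names are assumptions for illustration; the abstract does not specify the actual Budgeted UCB rule.

```python
import math


def violation_budget(t, initial_budget=1.0, decay=0.5):
    """Slack allowed on the constraint at round t (assumed schedule, t >= 1)."""
    return initial_budget / (t ** decay)


def select_arm(t, counts, mean_reward, mean_cost, cost_limit):
    """Pick an arm at round t from running statistics (hypothetical rule)."""
    # Play every arm once before using confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    slack_limit = cost_limit + violation_budget(t)
    eligible = [a for a in range(len(counts)) if mean_cost[a] <= slack_limit]
    if not eligible:
        # Fall back to the cheapest arm when nothing fits the relaxed limit.
        return min(range(len(counts)), key=lambda a: mean_cost[a])

    def ucb(a):
        # Standard UCB1-style exploration bonus on the reward estimate.
        return mean_reward[a] + math.sqrt(2.0 * math.log(t) / counts[a])

    return max(eligible, key=ucb)


if __name__ == "__main__":
    rewards, costs = [0.9, 0.5], [0.9, 0.4]
    print(select_arm(4, [2, 2], rewards, costs, cost_limit=0.5))
    print(select_arm(100, [50, 50], rewards, costs, cost_limit=0.5))
```

Early on the high-reward but costly arm is still admissible; once the slack has decayed, only the arm that respects the cost limit remains eligible.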

Place, publisher, year, edition, pages
Piscataway: IEEE, 2025
Series
IEEE Conference on Global Communications (GLOBECOM), E-ISSN 2576-6813
Keywords
Online Learning, Multi-Armed Bandits, Upper Confidence Bound, Dynamic Constraints, Internet of Things
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-254019 (URN)
10.1109/GLOBECOM59602.2025.11432479 (DOI)
2-s2.0-105036346401 (Scopus ID)
979-8-3315-7781-0 (ISBN)
979-8-3315-7782-7 (ISBN)
Conference
2025 IEEE Global Communications Conference (GLOBECOM 2025), Taipei, Taiwan, 8-12 December, 2025
Projects
Digital Futures (project DEMOCRITUS) and the Swedish Research Council (Vetenskapsrådet), grant 2024-04058.
Available from: 2026-04-02 Created: 2026-04-02 Last updated: 2026-06-04 Bibliographically approved
4. Multiobjective and Constrained Reinforcement Learning for IoT
2024 (English). In: Learning techniques for Internet of Things / [ed] Praveen Kumar Donta; Abhishek Hazra; Lauri Lovén, Springer, 2024, p. 153-170. Chapter in book, part of anthology (Refereed)
Abstract [en]

IoT networks of the future will be characterized by autonomous decision-making by individual devices. Decision-making is done with the purpose of optimizing certain objectives. A multitude of mathematically oriented algorithms exist for solving optimization problems. However, optimization in IoT networks is challenging due to a number of uncertainties, complex network topologies, and rapid changes in the environment. This makes the data-driven and machine learning (ML) approaches more suitable for effectively handling IoT environments’ dynamic and intricate nature. However, supervised and unsupervised ML approaches depend on training data, which is not always available before training. In recent years, reinforcement learning (RL) has attracted considerable attention for solving optimization problems in IoT. This is because RL has the distinguishing feature of learning with experience while interacting with the environment without training data. A central challenge in decision-making in IoT networks is that most optimization problems consist of co-optimizing multiple conflicting objectives. With the development of multi-objective RL (MORL) approaches over the last two decades, there is great potential for utilizing them for future IoT networks. Most recently developed MORL approaches have not been applied in the IoT domain. In this chapter, we will discuss the need for efficient multi-objective optimization in IoT, the fundamentals of using RL for decision-making in IoT, an overview of existing MORL approaches, and, finally, the future scope and challenges associated with utilizing MORL for IoT.

Place, publisher, year, edition, pages
Springer, 2024
Keywords
Multiobjective Reinforcement Learning, Optimization, Internet of Things
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-227532 (URN)
10.1007/978-3-031-50514-0_8 (DOI)
2-s2.0-85206043207 (Scopus ID)
978-3-031-50513-3 (ISBN)
978-3-031-50514-0 (ISBN)
Available from: 2024-03-18 Created: 2024-03-18 Last updated: 2026-04-13 Bibliographically approved
5. Intelligent Processing of Data Streams on the Edge Using Reinforcement Learning
2023 (English). In: 2023 IEEE International Conference on Communications Workshops (ICC Workshops), IEEE (Institute of Electrical and Electronics Engineers), 2023. Conference paper, Published paper (Refereed)
Abstract [en]

A key challenge in many IoT applications is to ensure energy efficiency while processing large amounts of streaming data at the edge. Nodes often need to process time-sensitive data using limited computing and communication resources. To that end, we design a novel R-learning-based offloading framework, RLO, that allows edge nodes to learn energy-optimal decisions from experience when processing incoming data streams. In particular, when should the node process data locally? When should it transmit data to be processed by a fog node? And when should it store data for later processing? We validate our results on both real and simulated data streams. Simulation results show that RLO learns over time to achieve better overall rewards than three existing baseline schemes. Moreover, the proposed algorithm outperforms the existing baseline schemes when different priorities are set on the two objectives. We also illustrate how to adjust the priorities of the two objectives based on the application requirements and network constraints.
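
For readers unfamiliar with R-learning, the sketch below shows the textbook average-reward update applied to the three decisions named in the abstract. The state names, the reward value, the step sizes `alpha` and `beta`, and the function name are hypothetical; the abstract does not describe RLO's state space or reward design, so this is only the generic update rule, not the authors' implementation.

```python
ACTIONS = ("process_locally", "offload_to_fog", "store_for_later")


def r_learning_update(q, rho, state, action, reward, next_state,
                      alpha=0.1, beta=0.01):
    """One R-learning step; mutates q in place and returns the new rho.

    q:   dict mapping state -> dict mapping action -> relative value
    rho: running estimate of the average reward per step
    """
    best_next = max(q[next_state].values())

    # Relative-value update: the reward is measured against the average rho.
    q[state][action] += alpha * (reward - rho + best_next - q[state][action])

    # The average-reward estimate is only updated after greedy actions.
    best_here = max(q[state].values())
    if q[state][action] == best_here:
        rho += beta * (reward - rho + best_next - best_here)
    return rho


if __name__ == "__main__":
    # Hypothetical two-state backlog model of an edge node.
    q = {s: {a: 0.0 for a in ACTIONS} for s in ("low_backlog", "high_backlog")}
    rho = r_learning_update(q, 0.0, "low_backlog", "process_locally",
                            reward=1.0, next_state="high_backlog")
    print(q["low_backlog"], rho)
```

Unlike discounted Q-learning, the update has no discount factor; values are relative to the long-run average reward, which suits continuously running stream processing.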

Place, publisher, year, edition, pages
IEEE (Institute of Electrical and Electronics Engineers), 2023
Series
IEEE International Conference on Communications workshops, ISSN 2164-7038, E-ISSN 2694-2941
Keywords
Reinforcement learning, Data stream processing, Offloading, Edge, IoT
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-223319 (URN)
10.1109/ICCWorkshops57953.2023.10283692 (DOI)
001094861300207 ()
2-s2.0-85177821473 (Scopus ID)
979-8-3503-3308-4 (ISBN)
Conference
IEEE ICC 2023 Workshop on Scalable and Trustworthy AI for 6G Wireless Networks (6GSTRAIN)
Available from: 2023-10-25 Created: 2023-10-25 Last updated: 2026-04-13
6. Dynamic and Distributed Routing in IoT Networks based on Multi-Objective Q-Learning
2026 (English). In: IEEE Internet of Things Journal, ISSN 2327-4662. Article in journal (Refereed), Epub ahead of print
Abstract [en]

IoT networks often face conflicting routing goals such as maximizing packet delivery, minimizing delay, and conserving limited battery energy. These priorities can also change dynamically: for example, an emergency alert requires high reliability, while routine monitoring prioritizes energy efficiency to prolong network lifetime. Existing works, including many deep reinforcement learning approaches, are typically centralized and assume static objectives, making them slow to adapt when preferences shift. We propose a dynamic and fully distributed multi-objective Q-learning routing algorithm that learns multiple per-preference Q-tables in parallel and introduces a novel greedy interpolation policy to act near-optimally for unseen preferences. The algorithm learns to optimize for energy efficiency, packet delivery ratio, and the composite reward, adapting to changing trade-offs between these metrics without retraining or centralized control. A theoretical analysis further shows that the optimal value function is Lipschitz-continuous in the preference parameter, ensuring that the proposed greedy interpolation policy yields provably near-optimal behavior. Simulation results show that our approach adapts in real time to shifting priorities and achieves up to 80-90% lower energy consumption and up to 5× higher cumulative rewards and packet delivery compared to six baseline protocols, under dynamic and distributed settings. Sensitivity analysis across varying preference window lengths confirms that the proposed DPQ framework consistently achieves higher composite reward than all baseline methods, demonstrating robustness to changes in operating conditions.
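
One way to picture acting on an unseen preference from per-preference Q-tables is to blend the Q-values of the two nearest learned preferences and then act greedily. The sketch below does this with linear interpolation over a scalar preference in [0, 1]. The linear blend, the table layout, and the node and action names are assumptions for illustration; the abstract does not define the paper's greedy interpolation policy in detail.

```python
from bisect import bisect_left


def interpolated_q(q_tables, preference, state):
    """Blend Q-values for `state` between the two nearest learned preferences.

    q_tables: dict mapping preference weight -> {state: {action: value}}
    """
    grid = sorted(q_tables)
    # Clamp to the nearest table outside the learned range.
    if preference <= grid[0]:
        return dict(q_tables[grid[0]][state])
    if preference >= grid[-1]:
        return dict(q_tables[grid[-1]][state])

    hi = bisect_left(grid, preference)
    lo_w, hi_w = grid[hi - 1], grid[hi]
    mix = (preference - lo_w) / (hi_w - lo_w)
    lo_q, hi_q = q_tables[lo_w][state], q_tables[hi_w][state]
    return {a: (1.0 - mix) * lo_q[a] + mix * hi_q[a] for a in lo_q}


def greedy_next_hop(q_tables, preference, state):
    """Choose the action with the highest interpolated Q-value."""
    values = interpolated_q(q_tables, preference, state)
    return max(values, key=values.get)


if __name__ == "__main__":
    # Hypothetical tables: weight 0.0 favours energy, 1.0 favours reliability.
    tables = {
        0.0: {"node_a": {"via_b": 1.0, "via_c": 0.0}},
        1.0: {"node_a": {"via_b": 0.0, "via_c": 2.0}},
    }
    print(greedy_next_hop(tables, 0.25, "node_a"))
    print(greedy_next_hop(tables, 0.75, "node_a"))
```

Because no table is retrained when the preference changes, switching priorities costs only a lookup and a blend, which is consistent with the abstract's claim of adaptation without retraining.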

Keywords
Dynamic and Distributed Routing, Energy-efficiency, Internet of Things, Multiobjective, Q-Learning
Identifiers
urn:nbn:se:su:diva-253302 (URN)
10.1109/JIOT.2026.3666236 (DOI)
2-s2.0-105030720178 (Scopus ID)
Available from: 2026-03-12 Created: 2026-03-12 Last updated: 2026-04-13

Open Access in DiVA

AI-Driven Multi-objective Decision-Making With Applications to IoT (5153 kB)
File information
File: FULLTEXT03.pdf
File size: 5153 kB
Checksum SHA-512
3e2271edf6ff2d69cef16831e7a8a11dae1a903a727047b7b0b361065ff079b54c8f067b615a48d2aa50ccaf9f5e157a5f90177137ae601b6cf4c84d3b74fdac
Type: fulltext
MIME type: application/pdf

Person

Vaishnav, Shubham
