Publications (10 of 44)
Beikmohammadi, A., Khirirat, S., Richtárik, P. & Magnússon, S. (2025). Collaborative Value Function Estimation Under Model Mismatch: A Federated Temporal Difference Analysis. In: Rita P. Ribeiro; Bernhard Pfahringer; Nathalie Japkowicz; Pedro Larrañaga; Alípio M. Jorge; Carlos Soares; Pedro H. Abreu; João Gama (Ed.), Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2025, Porto, Portugal, September 15–19, 2025, Proceedings, Part VI. Paper presented at European Conference, ECML PKDD 2025, Porto, Portugal, September 15–19, 2025. (pp. 41-58). Springer
Collaborative Value Function Estimation Under Model Mismatch: A Federated Temporal Difference Analysis
2025 (English). In: Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2025, Porto, Portugal, September 15–19, 2025, Proceedings, Part VI / [ed] Rita P. Ribeiro; Bernhard Pfahringer; Nathalie Japkowicz; Pedro Larrañaga; Alípio M. Jorge; Carlos Soares; Pedro H. Abreu; João Gama, Springer, 2025, p. 41-58. Conference paper, Published paper (Refereed).
Abstract [en]

Federated reinforcement learning (FedRL) enables collaborative learning while preserving data privacy by preventing direct data exchange between agents. However, many existing FedRL algorithms assume that all agents operate in identical environments, which is often unrealistic. In real-world applications, such as multi-robot teams, crowdsourced systems, and large-scale sensor networks, each agent may experience slightly different transition dynamics, leading to inherent model mismatches. In this paper, we first establish linear convergence guarantees for single-agent temporal difference learning (TD(0)) in policy evaluation and demonstrate that under a perturbed environment, the agent suffers a systematic bias that prevents accurate estimation of the true value function. This result holds under both i.i.d. and Markovian sampling regimes. We then extend our analysis to the federated TD(0) (FedTD(0)) setting, where multiple agents, each interacting with its own perturbed environment, periodically share value estimates to collaboratively approximate the true value function of a common underlying model. Our theoretical results characterize the impact of model mismatch, network connectivity, and mixing behavior on the convergence of FedTD(0). Empirical experiments corroborate our theoretical findings, highlighting that even moderate levels of information sharing significantly mitigate environment-specific errors.
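To make the FedTD(0) scheme described above concrete, the following is a minimal Python sketch of periodic value-estimate averaging across agents whose transition kernels are slightly perturbed copies of a common model, assuming linear value-function approximation and synthetic sampling. The environment sizes, feature map, perturbation level, and step size are illustrative choices, not the paper's experimental setup.

```python
# Minimal sketch of federated TD(0) with periodic averaging under model mismatch.
# All sizes and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_states, dim = 4, 10, 5           # illustrative sizes
features = rng.normal(size=(n_states, dim))  # shared feature map phi(s)
gamma, alpha, local_steps, rounds = 0.9, 0.05, 20, 50

def perturbed_kernel(base, eps):
    """Each agent sees a slightly perturbed transition kernel (model mismatch)."""
    P = base + eps * rng.random(base.shape)
    return P / P.sum(axis=1, keepdims=True)

base_P = rng.random((n_states, n_states))
base_P /= base_P.sum(axis=1, keepdims=True)
rewards = rng.normal(size=n_states)
agent_P = [perturbed_kernel(base_P, eps=0.1) for _ in range(n_agents)]

theta = np.zeros(dim)                         # shared parameter after each round
for _ in range(rounds):
    local = []
    for P in agent_P:                         # each agent runs local TD(0) steps
        w, s = theta.copy(), int(rng.integers(n_states))
        for _ in range(local_steps):
            s_next = rng.choice(n_states, p=P[s])
            td_err = rewards[s] + gamma * features[s_next] @ w - features[s] @ w
            w += alpha * td_err * features[s]
            s = s_next
        local.append(w)
    theta = np.mean(local, axis=0)            # server averages value estimates
print("final parameter vector:", theta)
```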

Place, publisher, year, edition, pages
Springer, 2025
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 16018
Keywords
Federated Reinforcement Learning, Model Mismatch in Reinforcement Learning, Temporal Difference Learning, Policy Evaluation
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-248239 (URN); 10.1007/978-3-032-06106-5_3 (DOI); 2-s2.0-105020022174 (Scopus ID); 978-3-032-06106-5 (ISBN); 978-3-032-06105-8 (ISBN)
Conference
European Conference, ECML PKDD 2025, Porto, Portugal, September 15–19, 2025.
Available from: 2025-10-20. Created: 2025-10-20. Last updated: 2025-11-06. Bibliographically approved.
Vaishnav, S., Khirirat, S. & Magnússon, S. (2025). Communication-Adaptive Gradient Sparsification for Federated Learning with Error Compensation. IEEE Internet of Things Journal, 12(2), 1137-1152
Communication-Adaptive Gradient Sparsification for Federated Learning with Error Compensation
2025 (English). In: IEEE Internet of Things Journal, ISSN 2327-4662, Vol. 12, no 2, p. 1137-1152. Article in journal (Other academic), Published.
Abstract [en]

Federated learning has emerged as a popular distributed machine-learning paradigm. It involves many rounds of iterative communication between nodes to exchange model parameters. With the increasing complexity of ML tasks, the models can be large, having millions of parameters. Moreover, edge and IoT nodes often have limited energy resources and channel bandwidths. Thus, reducing the communication cost in federated learning is a bottleneck problem. This cost could be in terms of energy consumed, delay involved, or amount of data communicated. We propose a communication-cost-adaptive model sparsification scheme for federated learning with error compensation. The central idea is to adapt the sparsification level at run-time by optimizing the ratio between the impact of the communicated model parameters and the communication cost. We carry out a detailed convergence analysis to establish the theoretical foundations of the proposed algorithm. We conduct extensive experiments to train both convex and non-convex machine learning models on a standard dataset. We illustrate the efficiency of the proposed algorithm by comparing its performance with three baseline schemes. The performance of the proposed algorithm is validated for two communication models and three cost functions. Simulation results show that the proposed algorithm requires substantially less communication than the three baseline schemes while achieving the best accuracy and fastest convergence. The results are consistent across all the considered cost models, cost functions, and ML models. Thus, the proposed FL-CATE algorithm can substantially improve the communication efficiency of federated learning, irrespective of the ML tasks, costs, and communication models.
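As a rough illustration of the error-compensated, cost-adaptive sparsification idea, here is a small Python sketch: top-k sparsification with an error-feedback memory, where the sparsification level k is picked each round by a toy impact-versus-cost ratio. The choose_k rule, the linear cost model, and all constants are hypothetical stand-ins and should not be read as the FL-CATE algorithm itself.

```python
# Sketch of top-k gradient sparsification with error compensation and a toy
# rule that adapts k by comparing retained gradient mass to a linear cost model.
import numpy as np

def topk(v, k):
    """Keep the k largest-magnitude entries of v, zero elsewhere."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def choose_k(v, cost_per_coord=1.0):
    """Toy adaptive rule: maximize retained squared mass per unit of cost."""
    mags = np.sort(np.abs(v))[::-1] ** 2
    ratio = np.cumsum(mags) / (cost_per_coord * np.arange(1, v.size + 1))
    return int(np.argmax(ratio)) + 1

rng = np.random.default_rng(1)
dim, steps, lr = 100, 200, 0.1
x = rng.normal(size=dim)             # model parameters
error = np.zeros(dim)                # error-compensation memory

for _ in range(steps):
    grad = x + 0.01 * rng.normal(size=dim)    # noisy gradient of f(x)=||x||^2/2
    corrected = grad + error                   # add back what was dropped before
    k = choose_k(corrected)
    sent = topk(corrected, k)                  # this is what gets communicated
    error = corrected - sent                   # remember the residual
    x -= lr * sent
print("||x|| after training:", np.linalg.norm(x))
```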

Keywords
Federated learning, Communication efficiency, IoT, Gradient sparsification, Distributed learning
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-235702 (URN); 10.1109/JIOT.2024.3490855 (DOI); 001395714600019 (); 2-s2.0-85208723002 (Scopus ID)
Note

The article is available online in the early access area on IEEE Xplore. This article has been accepted for publication in a future issue of this journal, but has not been edited, and content may change prior to final publication. It may be cited as an article in a future issue by its Digital Object Identifier.

Available from: 2024-11-19. Created: 2024-11-19. Last updated: 2025-02-24. Bibliographically approved.
Wang, H., Huang, W., Magnússon, S., Lindgren, T., Chen, C., Wu, J. & Song, Y. (2025). Crowding distance and IGD-driven grey wolf reinforcement learning approach for multi-objective agile earth observation satellite scheduling. International Journal of Digital Earth, 18(1), Article ID 2458024.
Crowding distance and IGD-driven grey wolf reinforcement learning approach for multi-objective agile earth observation satellite scheduling
2025 (English). In: International Journal of Digital Earth, ISSN 1753-8947, E-ISSN 1753-8955, Vol. 18, no 1, article id 2458024. Article in journal (Refereed), Published.
Abstract [en]

With the rise of low-cost launches, miniaturized space technology, and commercialization, the cost of space missions has dropped, leading to a surge in flexible Earth observation satellites. This increased demand for complex and diverse imaging products requires addressing multi-objective optimization in practice. To this end, we propose a multi-objective agile Earth observation satellite scheduling problem (MOAEOSSP) model and introduce a reinforcement learning-based multi-objective grey wolf optimization (RLMOGWO) algorithm. It aims to maximize observation efficiency while minimizing energy consumption. During population initialization, the algorithm uses chaos mapping and opposition-based learning to enhance diversity and global search, reducing the risk of local optima. It integrates Q-learning into an improved multi-objective grey wolf optimization framework, designing state-action combinations that balance exploration and exploitation. Dynamic parameter adjustments guide position updates, boosting adaptability across different optimization stages. Moreover, the algorithm introduces a reward mechanism based on the crowding distance and inverted generational distance (IGD) to maintain Pareto front diversity and distribution, ensuring a strong multi-objective optimization performance. The experimental results show that the algorithm excels at solving the MOAEOSSP, outperforming competing algorithms across several metrics and demonstrating its effectiveness for complex optimization problems.
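The general control pattern of letting Q-learning pick the search strategy inside a metaheuristic can be sketched as follows. This toy is single-objective and uses a generic population update rather than the grey wolf operators or the crowding-distance/IGD reward described in the abstract; all functions and constants are illustrative.

```python
# Toy sketch: a Q-table over (run stage, strategy) chooses between a "global"
# and a "local" move each iteration, rewarded by fitness improvement.
import numpy as np

rng = np.random.default_rng(6)
dim, iters, pop = 10, 300, 20
alpha, gamma, eps = 0.3, 0.9, 0.2

def sphere(x):                      # toy objective to minimize
    return float(np.sum(x ** 2))

X = rng.uniform(-5, 5, size=(pop, dim))
fit = np.array([sphere(x) for x in X])
Q = np.zeros((3, 2))                # 3 stages of the run x 2 strategies

for t in range(iters):
    stage = min(2, 3 * t // iters)
    next_stage = min(2, 3 * (t + 1) // iters)
    a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[stage]))
    best = X[np.argmin(fit)]
    if a == 0:                      # "global" move: jump toward the best + big noise
        cand = best + rng.normal(scale=1.0, size=(pop, dim))
    else:                           # "local" move: small perturbation of each individual
        cand = X + rng.normal(scale=0.1, size=(pop, dim))
    cand_fit = np.array([sphere(x) for x in cand])
    improved = cand_fit < fit
    reward = float(np.mean(fit[improved] - cand_fit[improved])) if improved.any() else 0.0
    X[improved], fit[improved] = cand[improved], cand_fit[improved]
    Q[stage, a] += alpha * (reward + gamma * Q[next_stage].max() - Q[stage, a])

print("best fitness:", float(fit.min()), "learned Q-table:", Q)
```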

Keywords
Earth observation satellite, grey wolf algorithm, multi-objective optimization, Q-learning, reinforcement learning, scheduling
National Category
Computer Sciences
Identifiers
urn:nbn:se:su:diva-240188 (URN); 10.1080/17538947.2025.2458024 (DOI); 001410804300001 (); 2-s2.0-85216608663 (Scopus ID)
Available from: 2025-03-04. Created: 2025-03-04. Last updated: 2025-03-04. Bibliographically approved.
Beikmohammadi, A. & Magnússon, S. (2025). Human-inspired framework to accelerate reinforcement learning. Journal of Supercomputing, 81(12), Article ID 1239.
Human-inspired framework to accelerate reinforcement learning
2025 (English). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 81, no 12, article id 1239. Article in journal (Refereed), Published.
Abstract [en]

Reinforcement learning (RL) is crucial for data science decision-making but suffers from sample inefficiency, particularly in real-world scenarios with costly physical interactions. This paper introduces a novel human-inspired framework to enhance the RL algorithm’s sample efficiency. It achieves this by initially exposing the learning agent to simpler tasks that progressively increase in complexity, ultimately leading to the main task. This method requires no pre-training and involves learning simpler tasks for just one episode. The resulting knowledge can facilitate various transfer learning approaches, such as value and policy transfer, without increasing computational complexity. It can be applied across different goals, environments, and RL algorithms, including value-based, policy-based, tabular, and deep RL methods. Experimental evaluations demonstrate the framework’s effectiveness in enhancing sample efficiency, especially in challenging main tasks, as shown through both a simple random walk and more complex optimal control problems with constraints.
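A minimal sketch of the "progressively harder tasks, one episode each" idea, using tabular Q-learning on a hypothetical chain environment and transferring the Q-table from each easier goal to the next. The environment, goals, and hyperparameters are invented for illustration and are not the tasks evaluated in the paper.

```python
# Curriculum sketch: one Q-learning episode per simpler task, warm-starting the
# next task with the same Q-table (value transfer). Toy 1-D chain environment.
import numpy as np

def run_episode(Q, n_states, goal, alpha=0.5, gamma=0.95, eps=0.2,
                max_steps=20_000, rng=None):
    """One Q-learning episode on a 1-D chain; reaching `goal` ends the episode."""
    s = 0
    for _ in range(max_steps):
        if s == goal:
            break
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
        r = 1.0 if s_next == goal else -0.01
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q

rng = np.random.default_rng(2)
n_states = 30
Q = np.zeros((n_states, 2))
# Curriculum: easier goals first (one episode each), then the main task.
for goal in (5, 10, 20, n_states - 1):
    Q = run_episode(Q, n_states, goal, rng=rng)
print("greedy actions per state:", np.argmax(Q, axis=1))
```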

Keywords
Deep reinforcement learning, Exploration, Policy optimization, PPO, Sample efficiency
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:su:diva-246714 (URN); 10.1007/s11227-025-07737-2 (DOI); 001550362100002 (); 2-s2.0-105013224205 (Scopus ID)
Available from: 2025-09-11. Created: 2025-09-11. Last updated: 2025-09-11. Bibliographically approved.
Beikmohammadi, A., Khirirat, S. & Magnússon, S. (2025). On the Convergence of Federated Learning Algorithms Without Data Similarity. IEEE Transactions on Big Data, 11(2), 659-668
On the Convergence of Federated Learning Algorithms Without Data Similarity
2025 (English). In: IEEE Transactions on Big Data, E-ISSN 2332-7790, Vol. 11, no 2, p. 659-668. Article in journal (Refereed), Published.
Abstract [en]

Data similarity assumptions have traditionally been relied upon to understand the convergence behaviors of federated learning methods. Unfortunately, this approach often demands fine-tuning step sizes based on the level of data similarity. When data similarity is low, these small step sizes result in an unacceptably slow convergence speed for federated methods. In this paper, we present a novel and unified framework for analyzing the convergence of federated learning algorithms without the need for data similarity conditions. Our analysis centers on an inequality that captures the influence of step sizes on algorithmic convergence performance. By applying our theorems to well-known federated algorithms, we derive precise expressions for three widely used step size schedules: fixed, diminishing, and step-decay step sizes, which are independent of data similarity conditions. Finally, we conduct comprehensive evaluations of the performance of these federated learning algorithms, employing the proposed step size strategies to train deep neural network models on benchmark datasets under varying data similarity conditions. Our findings demonstrate significant improvements in convergence speed and overall performance, marking a substantial advancement in federated learning research.
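For reference, the three step-size schedules named in the abstract (fixed, diminishing, and step-decay) can be written out as below; the sketch applies them to plain gradient descent on a toy quadratic, and the constants are arbitrary rather than the values derived from the paper's analysis.

```python
# Sketch of fixed, diminishing, and step-decay step-size schedules, compared on
# a toy quadratic objective. Constants are illustrative only.
import numpy as np

def fixed(eta0):
    return lambda t: eta0

def diminishing(eta0):
    return lambda t: eta0 / np.sqrt(t + 1)

def step_decay(eta0, drop=0.5, every=100):
    return lambda t: eta0 * (drop ** (t // every))

def gd(schedule, steps=500, dim=10, seed=3):
    rng = np.random.default_rng(seed)
    target = rng.normal(size=dim)
    x = np.zeros(dim)
    for t in range(steps):
        grad = x - target                 # gradient of 0.5*||x - target||^2
        x -= schedule(t) * grad
    return np.linalg.norm(x - target)

for name, sched in [("fixed", fixed(0.1)),
                    ("diminishing", diminishing(0.5)),
                    ("step-decay", step_decay(0.5))]:
    print(f"{name:12s} final error: {gd(sched):.2e}")
```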

Keywords
Compression algorithms, federated learning, gradient methods, machine learning
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-232103 (URN); 10.1109/TBDATA.2024.3423693 (DOI); 001445067800004 (); 2-s2.0-105001081071 (Scopus ID)
Available from: 2024-07-24. Created: 2024-07-24. Last updated: 2025-04-04. Bibliographically approved.
Beikmohammadi, A., Khirirat, S. & Magnússon, S. (2025). Parallel Momentum Methods Under Biased Gradient Estimations. IEEE Transactions on Control of Network Systems, 12(2), 1721-1732
Parallel Momentum Methods Under Biased Gradient Estimations
2025 (English). In: IEEE Transactions on Control of Network Systems, E-ISSN 2325-5870, Vol. 12, no 2, p. 1721-1732. Article in journal (Refereed), Published.
Abstract [en]

Parallel stochastic gradient methods are gaining prominence in solving large-scale machine learning problems that involve data distributed across multiple nodes. However, obtaining unbiased stochastic gradients, which have been the focus of most theoretical research, is challenging in many distributed machine learning applications. The gradient estimations easily become biased, for example, when gradients are compressed or clipped, when data is shuffled, and in meta-learning and reinforcement learning. In this work, we establish worst-case bounds on parallel momentum methods under biased gradient estimation on both general non-convex and μ-PL non-convex problems. Our analysis covers general distributed optimization problems, and we work out the implications for special cases where gradient estimates are biased, i.e. in meta-learning and when the gradients are compressed or clipped. Our numerical experiments verify our theoretical findings and show faster convergence performance of momentum methods than traditional biased gradient descent.
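A small Python sketch of the setting: several workers compute gradients that are deliberately biased (here via norm clipping, one of the cases mentioned above), the server averages them, and a heavy-ball momentum step is applied. The quadratic objectives, clipping threshold, and momentum parameters are illustrative assumptions, not the paper's experimental configuration.

```python
# Sketch of a parallel (server-averaged) heavy-ball momentum step with biased
# per-worker gradients produced by norm clipping. Toy heterogeneous quadratics.
import numpy as np

def clip(g, c=0.5):
    """Norm clipping: a simple source of deterministic gradient bias."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)

rng = np.random.default_rng(4)
n_workers, dim, steps = 5, 20, 300
lr, beta = 0.05, 0.9
targets = [rng.normal(size=dim) for _ in range(n_workers)]  # heterogeneous data

x = np.zeros(dim)
m = np.zeros(dim)                                # momentum buffer
for _ in range(steps):
    # each worker computes a noisy local gradient and clips it (bias)
    grads = [clip((x - t) + 0.1 * rng.normal(size=dim)) for t in targets]
    g = np.mean(grads, axis=0)                   # server aggregates
    m = beta * m + g                             # heavy-ball momentum
    x -= lr * m
print("distance to average target:", np.linalg.norm(x - np.mean(targets, axis=0)))
```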

Keywords
Stochastic Gradient Descent, Parallel Momentum Methods, Biased Gradient Estimation, Compressed Gradients, Composite Gradients
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-237626 (URN); 10.1109/TCNS.2025.3527255 (DOI); 001512536600040 (); 2-s2.0-85214682075 (Scopus ID)
Available from: 2025-01-09. Created: 2025-01-09. Last updated: 2025-09-16. Bibliographically approved.
Kharazian, Z., Lindgren, T., Magnússon, S., Steinert, O. & Andersson Reyna, O. (2025). SCANIA Component X dataset: a real-world multivariate time series dataset for predictive maintenance. Scientific Data, 12, Article ID 493.
SCANIA Component X dataset: a real-world multivariate time series dataset for predictive maintenance
2025 (English). In: Scientific Data, E-ISSN 2052-4463, Vol. 12, article id 493. Article in journal (Refereed), Published.
Abstract [en]

Predicting failures and maintenance time in predictive maintenance is challenging due to the scarcity of comprehensive real-world datasets, and among those available, few are of time series format. This paper introduces a real-world, multivariate time series dataset collected exclusively from a single anonymized engine component (Component X) across a fleet of SCANIA trucks. The dataset includes operational data, repair records, and specifications related to Component X while maintaining confidentiality through anonymization. It is well-suited for a range of machine learning applications, including classification, regression, survival analysis, and anomaly detection, particularly in predictive maintenance scenarios. The dataset’s large population size, diverse features (in the form of histograms and numerical counters), and temporal information make it a unique resource in the field. The objective of releasing this dataset is to give a broad range of researchers the possibility of working with real-world data from an internationally well-known company and introduce a standard benchmark to the predictive maintenance field, fostering reproducible research.

National Category
Reliability and Maintenance
Identifiers
urn:nbn:se:su:diva-241823 (URN); 10.1038/s41597-025-04802-6 (DOI); 001451143800005 (); 2-s2.0-105000887799 (Scopus ID)
Available from: 2025-04-10. Created: 2025-04-10. Last updated: 2025-04-10. Bibliographically approved.
Wang, H., Huang, W., Magnússon, S., Lindgren, T., Wang, R. & Song, Y. (2024). A Strategy Fusion-Based Multiobjective Optimization Approach for Agile Earth Observation Satellite Scheduling Problem. IEEE Transactions on Geoscience and Remote Sensing, 62, 1-14, Article ID 5930214.
A Strategy Fusion-Based Multiobjective Optimization Approach for Agile Earth Observation Satellite Scheduling Problem
2024 (English). In: IEEE Transactions on Geoscience and Remote Sensing, ISSN 0196-2892, E-ISSN 1558-0644, Vol. 62, p. 1-14, article id 5930214. Article in journal (Refereed), Published.
Abstract [en]

Agile satellite imaging scheduling plays a vital role in improving emergency response, urban planning, national defense, and resource management. With the rise in the number of in-orbit satellites and observation windows, the need for diverse agile Earth observation satellite (AEOS) scheduling has surged. However, current research seldom addresses multiple optimization objectives, which are crucial in many engineering practices. This article tackles a multiobjective AEOS scheduling problem (MOAEOSSP) that aims to optimize total observation task profit, satellite energy consumption, and load balancing. To address this intricate problem, we propose a strategy-fused multiobjective dung beetle optimization (SFMODBO) algorithm. This novel algorithm harnesses the position update characteristics of various dung beetle populations and integrates multiple high-adaptability strategies. Consequently, it strikes a better balance between global search capability and local exploitation accuracy, making it more effective at exploring the solution space and avoiding local optima. The SFMODBO algorithm enhances global search capabilities through diverse strategies, ensuring thorough coverage of the search space. Simultaneously, it significantly improves local optimization precision by fine-tuning solutions in promising regions. This dual approach enables more robust and efficient problem-solving. Simulation experiments confirm the effectiveness and efficiency of the SFMODBO algorithm. Results indicate that it significantly outperforms competitors across multiple metrics, achieving superior scheduling schemes. In addition to these enhanced metrics, the proposed algorithm also exhibits advantages in computation time and resource utilization. This not only demonstrates the algorithm’s robustness but also underscores its efficiency and speed in solving the MOAEOSSP.

Keywords
Satellites, Optimization, Scheduling, Mathematical models, Earth, Processor scheduling, Heuristic algorithms, Search problems, Energy consumption, Computational modeling, Agile Earth observation satellite (AEOS), multiobjective dung beetle optimization (MODBO), remote sensing, satellite observation scheduling
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-237872 (URN); 10.1109/TGRS.2024.3472749 (DOI); 001338406700001 (); 2-s2.0-85206199686 (Scopus ID)
Available from: 2025-01-14. Created: 2025-01-14. Last updated: 2025-01-14. Bibliographically approved.
Beikmohammadi, A. & Magnússon, S. (2024). Accelerating actor-critic-based algorithms via pseudo-labels derived from prior knowledge. Information Sciences, 661, Article ID 120182.
Accelerating actor-critic-based algorithms via pseudo-labels derived from prior knowledge
2024 (English). In: Information Sciences, ISSN 0020-0255, E-ISSN 1872-6291, Vol. 661, article id 120182. Article in journal (Refereed), Published.
Abstract [en]

Despite the huge success of reinforcement learning (RL) in solving many difficult problems, its Achilles heel has always been sample inefficiency. On the other hand, in RL, taking advantage of prior knowledge, intentionally or unintentionally, has usually been avoided, so training an agent from scratch is common. This not only causes sample inefficiency but also endangers safety, especially during exploration. In this paper, we help the agent learn from the environment by using a pre-existing (but not necessarily exact or complete) solution for a task. Our proposed method can be integrated with any RL algorithm developed based on policy gradient and actor-critic methods. The results on five tasks of varying difficulty, using two well-known actor-critic methods (SAC and TD3) as the backbone of our proposed method, show that it greatly improves sample efficiency and final performance. These gains come with robustness to noisy environments at the cost of only a slight, negligible computational overhead.
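One plausible way such prior-knowledge guidance could be wired into an actor update is sketched below in PyTorch: the usual critic-driven actor objective is combined with a supervised term that pulls the policy toward "pseudo-label" actions from an approximate pre-existing controller. The prior_controller function, the squared-error guidance term, and the weight lam are hypothetical illustrations of the general idea, not the loss used in the paper.

```python
# Hypothetical sketch: actor update = critic-driven objective + pseudo-label pull.
import torch
import torch.nn as nn

obs_dim, act_dim, batch = 8, 2, 64
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

def prior_controller(obs):
    """Stand-in for a rough pre-existing solution (e.g., a hand-tuned rule)."""
    return torch.tanh(obs[:, :act_dim])

obs = torch.randn(batch, obs_dim)                # a sampled minibatch of states
actions = actor(obs)
q_loss = -critic(torch.cat([obs, actions], dim=1)).mean()  # usual actor objective
pseudo = prior_controller(obs)                             # pseudo-labels
guidance = ((actions - pseudo) ** 2).mean()                # supervised pull
lam = 0.1                                                  # illustrative weight
loss = q_loss + lam * guidance
opt.zero_grad()
loss.backward()
opt.step()
```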

Keywords
Reinforcement learning, Deep RL, Actor-critic methods, Policy optimization, Sample efficiency, Exploration
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-226601 (URN); 10.1016/j.ins.2024.120182 (DOI); 001173851600001 (); 2-s2.0-85183589956 (Scopus ID)
Available from: 2024-02-14. Created: 2024-02-14. Last updated: 2024-03-26. Bibliographically approved.
Beikmohammadi, A., Khirirat, S. & Magnússon, S. (2024). Compressed Federated Reinforcement Learning with a Generative Model. In: Albert Bifet; Jesse Davis; Tomas Krilavičius; Meelis Kull; Eirini Ntoutsi; Indrė Žliobaitė (Ed.), Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2024, Vilnius, Lithuania, September 9–13, 2024, Proceedings, Part IV. Paper presented at European Conference, ECML PKDD 2024, 9-13 September, 2024, Vilnius, Lithuania. (pp. 20-37). Springer
Compressed Federated Reinforcement Learning with a Generative Model
2024 (English). In: Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2024, Vilnius, Lithuania, September 9–13, 2024, Proceedings, Part IV / [ed] Albert Bifet; Jesse Davis; Tomas Krilavičius; Meelis Kull; Eirini Ntoutsi; Indrė Žliobaitė, Springer, 2024, p. 20-37. Conference paper, Published paper (Refereed).
Abstract [en]

Reinforcement learning has recently gained unprecedented popularity, yet it still grapples with sample inefficiency. Addressing this challenge, federated reinforcement learning (FedRL) has emerged, wherein agents collaboratively learn a single policy by aggregating local estimations. However, this aggregation step incurs significant communication costs. In this paper, we propose CompFedRL, a communication-efficient FedRL approach incorporating both periodic aggregation and (direct/error-feedback) compression mechanisms. Specifically, we consider compressed federated Q-learning with a generative model setup, where a central server learns an optimal Q-function by periodically aggregating compressed Q-estimates from local agents. For the first time, we characterize the impact of these two mechanisms (which have remained elusive) by providing a finite-time analysis of our algorithm, demonstrating strong convergence behaviors when utilizing either direct or error-feedback compression. Our bounds indicate improved solution accuracy concerning the number of agents and other federated hyperparameters while simultaneously reducing communication costs. To corroborate our theory, we also conduct in-depth numerical experiments to verify our findings, considering Top-K and Sparsified-K sparsification operators.
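A minimal sketch of compressed federated Q-learning with a generative model, assuming a toy tabular MDP: agents take local synchronous Q-learning steps, send top-k-compressed updates with an error-feedback memory, and the server averages them. The MDP, compression level, and schedules are illustrative, and the sketch is not the CompFedRL algorithm as analyzed in the paper.

```python
# Sketch: federated Q-learning with a generative model and top-k compressed,
# error-feedback updates averaged by a server. Toy MDP and constants only.
import numpy as np

rng = np.random.default_rng(5)
nS, nA, n_agents = 8, 3, 4
gamma, alpha, k, local_steps, rounds = 0.9, 0.1, 6, 10, 100

P = rng.random((nS, nA, nS)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((nS, nA))

def topk(v, k):
    """Keep the k largest-magnitude entries of v, zero elsewhere."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

Q = np.zeros((nS, nA))
err = [np.zeros(nS * nA) for _ in range(n_agents)]      # error-feedback memory
for _ in range(rounds):
    updates = []
    for a_id in range(n_agents):
        Q_loc = Q.copy()
        for _ in range(local_steps):
            for s in range(nS):
                for a in range(nA):                     # generative model: sample a
                    s_next = rng.choice(nS, p=P[s, a])  # next state for every (s, a)
                    target = R[s, a] + gamma * Q_loc[s_next].max()
                    Q_loc[s, a] += alpha * (target - Q_loc[s, a])
        delta = (Q_loc - Q).ravel() + err[a_id]         # compensate previous error
        sent = topk(delta, k)                           # compressed message
        err[a_id] = delta - sent
        updates.append(sent)
    Q += np.mean(updates, axis=0).reshape(nS, nA)       # server aggregation
print("greedy policy:", Q.argmax(axis=1))
```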

Place, publisher, year, edition, pages
Springer, 2024
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349
Keywords
Federated Reinforcement Learning, Communication Efficiency, Direct Compression, Error-feedback Compression
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-233206 (URN); 10.1007/978-3-031-70359-1_2 (DOI); 978-3-031-70359-1 (ISBN); 978-3-031-70358-4 (ISBN)
Conference
European Conference, ECML PKDD 2024, 9-13 September, 2024, Vilnius, Lithuania.
Available from: 2024-09-04. Created: 2024-09-04. Last updated: 2024-09-06. Bibliographically approved.