Z-Series: Mining and learning from complex sequential data
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
ORCID iD: 0000-0001-7920-7669
2023 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The amount and complexity of sequential data collected across various domains have grown rapidly, posing significant challenges for extracting useful knowledge from such data sources. The complexity arises from diverse sequence representations with varying granularities, such as multivariate time series, histogram snapshots, and heterogeneous health records, which often describe a single data instance with multiple sequences. Due to this complexity, the underlying temporal relations between sequences may not be clear and can change over time, making knowledge discovery even more challenging.

To address these challenges, this thesis proposes event intervals as a unified representation for complex sequential data. Event intervals capture the underlying temporal relations between sequences by comparing the relative locations of event intervals in both the time and value dimensions, making them suitable for describing diverse sequential data. The proposed artifacts aim to efficiently and effectively discover patterns of interest, transform sequential data in different application domains through temporal abstraction, and provide interpretable features for machine learning tasks without compromising performance. The effectiveness of the proposed artifacts is evaluated through empirical experiments and practical evaluations, which demonstrate their applicability and performance. 
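The abstract says event intervals capture temporal relations by comparing the relative locations of intervals. As a minimal sketch of the time dimension only, the Python function below classifies the relation between two intervals using the names of Allen's interval relations, a standard vocabulary for this task. The thesis's own relation set, its treatment of the value dimension and its tolerance handling may differ; `Interval`, `relation` and the `eps` tolerance are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical event-interval record: a label active from start to end."""
    label: str
    start: float
    end: float


def relation(a: Interval, b: Interval, eps: float = 0.0) -> str:
    """Classify how interval a relates to interval b using Allen-style names.

    Assumes a precedes b in (start, end) order, as in a sorted sequence;
    eps is an assumed tolerance for treating two endpoints as equal.
    """
    if (a.start, a.end) > (b.start, b.end):
        raise ValueError("a must precede b in (start, end) order")

    def same(x: float, y: float) -> bool:
        return abs(x - y) <= eps

    if same(a.start, b.start):
        if same(a.end, b.end):
            return "equals"
        return "starts" if a.end < b.end else "started-by"
    if same(a.end, b.start):
        return "meets"
    if a.end < b.start:
        return "before"
    if same(a.end, b.end):
        return "finished-by"
    return "contains" if a.end > b.end else "overlaps"


if __name__ == "__main__":
    x, y = Interval("A", 0, 4), Interval("B", 3, 5)
    print(relation(x, y))  # overlaps
```

Requiring `a` to come first keeps the case analysis short: relations such as "after" or "during" cannot occur when `a` precedes `b`.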

The thesis is structured into three parts. First, it introduces state-of-the-art frameworks for mining event interval sequences, covering frequent arrangement mining, classification, and clustering; their utility is demonstrated through comparative empirical evaluations against other frameworks. Second, the thesis applies temporal abstraction to complex sequential data in different application domains, showcasing its applicability through tasks such as disproportionality analysis and local grouping detection for time series. Lastly, event intervals are used as interpretable features for learning tasks, outperforming competing algorithms that use other feature representations. This part focuses on univariate and multivariate time series, with extensive experiments on publicly available benchmark datasets supported by statistical tests.

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2023, p. 102
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 23-009
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-222042
ISBN: 978-91-8014-508-4 (print)
ISBN: 978-91-8014-509-1 (electronic)
OAI: oai:DiVA.org:su-222042
DiVA, id: diva2:1803200
Public defence
2023-11-24, L30, NOD-huset, Borgarfjordsgatan 12, Kista, 13:00 (English)
Opponent
Supervisors
Available from: 2023-10-31; Created: 2023-10-07; Last updated: 2023-10-24; Bibliographically approved
List of papers
1. Z-Miner: an efficient method for mining frequent arrangements of event intervals
2020 (English) In: KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Association for Computing Machinery (ACM), 2020, p. 524-534. Conference paper, Published paper (Refereed)
Abstract [en]

Mining frequent patterns of event intervals from a large collection of interval sequences is a problem that appears in several application domains. In this paper, we propose Z-Miner, a novel algorithm for solving this problem that addresses the deficiencies of existing competitors by employing two novel data structures: Z-Table, a hierarchical hash-based data structure for time-efficient candidate generation and support counting, and Z-Arrangement, a data structure for efficient memory consumption. The proposed algorithm handles patterns with repeated event labels, supports gap and error tolerance constraints, and keeps track of the exact occurrences of the extracted frequent patterns. Our experimental evaluation on eight real-world and six synthetic datasets demonstrates the superiority of Z-Miner against four state-of-the-art competitors in terms of runtime efficiency and memory footprint.
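Z-Table is described here only as a hierarchical hash-based structure, so its layout is not reproduced. As a deliberately simplified illustration of hash-based support counting with occurrence tracking, the sketch below counts two-interval arrangements in a flat Python dictionary keyed by (label, relation, label). It omits Z-Miner's hierarchy, larger arrangements, gap and error tolerance constraints, and the Z-Arrangement memory layout; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, float, float]   # hypothetical form: (label, start, end)
Key = Tuple[str, str, str]            # (label_a, relation, label_b)
Occurrence = Tuple[int, int, int]     # (sequence id, i, j)


def relation(a: Interval, b: Interval) -> str:
    """Allen-style relation of a to b; assumes a precedes b in (start, end) order."""
    (_, sa, ea), (_, sb, eb) = a, b
    if sa == sb:
        return "equals" if ea == eb else "starts"
    if ea < sb:
        return "before"
    if ea == sb:
        return "meets"
    if ea == eb:
        return "finished-by"
    return "contains" if ea > eb else "overlaps"


def frequent_pairs(sequences: List[List[Interval]],
                   min_support: int) -> Dict[Key, Tuple[int, List[Occurrence]]]:
    """Return {(label_a, relation, label_b): (support, occurrences)} for frequent pairs.

    Support is the number of sequences containing the pair; i and j index
    intervals in each sequence after sorting by (start, end, label).
    """
    occurrences: Dict[Key, List[Occurrence]] = defaultdict(list)  # hash table of candidates
    for seq_id, seq in enumerate(sequences):
        ordered = sorted(seq, key=lambda iv: (iv[1], iv[2], iv[0]))
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                key = (ordered[i][0], relation(ordered[i], ordered[j]), ordered[j][0])
                occurrences[key].append((seq_id, i, j))
    frequent = {}
    for key, occ in occurrences.items():
        support = len({seq_id for seq_id, _, _ in occ})  # count sequences, not occurrences
        if support >= min_support:
            frequent[key] = (support, occ)
    return frequent
```

Because support is counted per sequence, an arrangement repeated within one sequence still adds 1 to its support, while every individual occurrence remains available in the occurrence list.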

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-189155 (URN)
10.1145/3394486.3403095 (DOI)
978-1-4503-7998-4 (ISBN)
Conference
ACM International Conference on Knowledge Discovery and Data Mining, Virtual conference, August 25, 2020
Available from: 2021-01-18; Created: 2021-01-18; Last updated: 2023-10-07; Bibliographically approved
2. Mining disproportional frequent arrangements of event intervals for investigating adverse drug events
2020 (English) In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), IEEE, 2020, p. 289-292. Conference paper, Published paper (Refereed)
Abstract [en]

Adverse drug events are pervasive and costly medical conditions, for which novel research approaches are needed to further investigate the nature of such events and ultimately achieve early detection and prevention. In this paper, we seek to characterize patients who experience an adverse drug event, represented as a case group, by contrasting them with similar control-group patients who do not experience such an event. To achieve this goal, we use an extensive electronic patient record database and apply a combination of frequent arrangement mining and disproportionality analysis. Our results identify how several adverse drug events are characterized with regard to frequent disproportional arrangements, and highlight how such arrangements can provide additional temporal information compared to similar approaches.
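The abstract does not specify the disproportionality measure. One simple form, assumed here, divides an arrangement's relative support in the case group by its relative support in the control group; the optional pseudo-count is likewise an assumption, used to avoid dividing by zero when an arrangement never appears among controls. Function names and the `threshold` default are hypothetical.

```python
import math
from typing import Dict, Hashable


def disproportionality(case_count: int, n_case: int,
                       control_count: int, n_control: int,
                       pseudo: float = 0.0) -> float:
    """(case_count / n_case) / (control_count / n_control), with an optional pseudo-count.

    Rearranged as a single division to limit floating-point error.
    Returns infinity when the control side is zero and no pseudo-count is used.
    """
    numerator = (case_count + pseudo) * n_control
    denominator = (control_count + pseudo) * n_case
    return math.inf if denominator == 0 else numerator / denominator


def disproportional_patterns(case_counts: Dict[Hashable, int],
                             control_counts: Dict[Hashable, int],
                             n_case: int, n_control: int,
                             threshold: float = 2.0,
                             pseudo: float = 0.0) -> Dict[Hashable, float]:
    """Keep patterns whose case/control ratio is at least the (hypothetical) threshold."""
    selected = {}
    for pattern, count in case_counts.items():
        ratio = disproportionality(count, n_case,
                                   control_counts.get(pattern, 0), n_control, pseudo)
        if ratio >= threshold:
            selected[pattern] = ratio
    return selected
```

For example, an arrangement found in 30 of 100 case patients and 10 of 200 control patients gets a ratio of (30/100) / (10/200) = 6.0.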

Place, publisher, year, edition, pages
IEEE, 2020
Series
IEEE International Symposium on Computer-Based Medical Systems, ISSN 2372-918X, E-ISSN 2372-9198
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-188953 (URN)
10.1109/CBMS49503.2020.00061 (DOI)
978-1-7281-9429-5 (ISBN)
978-1-7281-9430-1 (ISBN)
Conference
Computer-Based Medical Systems, Rochester, USA, 28-30 July, 2020
Available from: 2021-01-14; Created: 2021-01-14; Last updated: 2023-10-07; Bibliographically approved
3. Z-Embedding: A Spectral Representation of Event Intervals for Efficient Clustering and Classification
2021 (English) In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part I / [ed] Frank Hutter; Kristian Kersting; Jefrey Lijffijt; Isabel Valera, Springer Nature, 2021, p. 710-726. Conference paper, Published paper (Refereed)
Abstract [en]

Sequences of event intervals occur in several application domains, while their inherent complexity hinders scalable solutions to tasks such as clustering and classification. In this paper, we propose a novel spectral embedding representation of event interval sequences that relies on bipartite graphs. More concretely, each event interval sequence is represented by a bipartite graph by following three main steps: (1) creating a hash table that can quickly convert a collection of event interval sequences into a bipartite graph representation, (2) creating and regularizing a bi-adjacency matrix corresponding to the bipartite graph, (3) defining a spectral embedding mapping on the bi-adjacency matrix. In addition, we show that substantial improvements can be achieved with regard to classification performance through pruning parameters that capture the nature of the relations formed by the event intervals. We demonstrate through extensive experimental evaluation on five real-world datasets that our approach can obtain runtime speedups of up to two orders of magnitude compared to other state-of-the-art methods, with similar or better clustering and classification performance.
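The three steps above can be sketched with NumPy under explicit assumptions. The sketch assumes each sequence has already been reduced to a multiset of (label, relation, label) triples; step (1) is a dictionary mapping each triple to a column; step (2) assumes regularisation by adding a constant `tau` to every row and column degree (defaulting to the mean row degree, as in regularised spectral clustering) before symmetric normalisation; step (3) takes the top-k left singular vectors. The paper's pruning parameters and exact regularisation are not reproduced, and all names are hypothetical.

```python
import numpy as np


def bi_adjacency(sequences, vocabulary):
    """Rows are sequences, columns are (label, relation, label) triples; entries count occurrences."""
    column = {triple: j for j, triple in enumerate(vocabulary)}  # step 1: hash table triple -> column
    B = np.zeros((len(sequences), len(vocabulary)))
    for i, triples in enumerate(sequences):
        for triple in triples:
            B[i, column[triple]] += 1.0
    return B


def spectral_embedding(B, k, tau=None):
    """Embed each row (sequence) of B into k dimensions.

    Step 2 (assumed form): add tau to all degrees, then compute D_r^-1/2 B D_c^-1/2.
    Step 3: keep the k leading left singular vectors.
    """
    row_deg = B.sum(axis=1)
    col_deg = B.sum(axis=0)
    if tau is None:
        tau = row_deg.mean()
    M = B / np.sqrt(row_deg + tau)[:, None] / np.sqrt(col_deg + tau)[None, :]
    U, _, _ = np.linalg.svd(M, full_matrices=False)  # singular values are sorted in descending order
    return U[:, :k]
```

The rows of the returned matrix can then be passed to any standard clustering or classification method; the signs of individual columns are arbitrary, which does not affect distances between rows.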

Place, publisher, year, edition, pages
Springer Nature, 2021
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 12457
Keywords
event intervals, bipartite graph, spectral embedding, clustering, classification
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-200591 (URN)
10.1007/978-3-030-67658-2_41 (DOI)
978-3-030-67658-2 (ISBN)
978-3-030-67657-5 (ISBN)
Conference
European Conference, ECML PKDD 2020, September 14–18, 2020, Ghent, Belgium
Available from: 2022-01-08; Created: 2022-01-08; Last updated: 2023-10-07; Bibliographically approved
4. Z-Hist: A Temporal Abstraction of Multivariate Histogram Snapshots
2021 (English) In: Advances in Intelligent Data Analysis XIX: 19th International Symposium on Intelligent Data Analysis, IDA 2021, Porto, Portugal, April 26–28, 2021, Proceedings / [ed] Pedro Henriques Abreu; Pedro Pereira Rodrigues; Alberto Fernández; João Gama, Springer Nature, 2021, p. 376-388. Conference paper, Published paper (Refereed)
Abstract [en]

Multivariate histogram snapshots are complex data structures that frequently occur in predictive maintenance. Histogram snapshots store large amounts of data in devices with small memory capacity, though it remains a challenge to analyze them effectively. In this paper, we propose Z-Hist, a novel framework for representing and temporally abstracting histogram snapshots by converting them into a set of temporal intervals. This conversion enables the exploitation of frequent arrangement mining techniques for extracting disproportionally frequent patterns of such complex structures. Our experiments on a turbo failure dataset from a truck Original Equipment Manufacturer (OEM) demonstrate a promising use-case of Z-Hist. We also benchmark Z-Hist on six synthetic datasets for studying the relationship between distribution changes over time and disproportionality values.
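The abstract does not describe how Z-Hist turns snapshots into intervals. The sketch below shows one plausible temporal abstraction under explicit assumptions: snapshots hold cumulative bin counts (so consecutive differences give per-period histograms), each bin's share of a period is discretised into equal-width levels, and runs of equal level become intervals labelled like `bin0=L2`. The function name, label format and `n_levels` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[str, float, float]  # (label, start, end)


def snapshots_to_intervals(snapshots: Sequence[Sequence[float]],
                           times: Sequence[float],
                           n_levels: int = 3) -> List[Interval]:
    """Convert cumulative histogram snapshots into labelled event intervals.

    snapshots[t][b] is the (assumed cumulative) count of bin b at times[t];
    period k runs from times[k] to times[k + 1].
    """
    # 1. Per-period histograms from consecutive differences, normalised to shares.
    shares = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        total = sum(delta)
        shares.append([d / total if total else 0.0 for d in delta])

    intervals: List[Interval] = []
    for b in range(len(snapshots[0])):
        # 2. Equal-width levels 0 .. n_levels - 1 over the share range [0, 1].
        levels = [min(int(s[b] * n_levels), n_levels - 1) for s in shares]
        # 3. Merge each run of equal levels into a single interval.
        start = 0
        for k in range(1, len(levels) + 1):
            if k == len(levels) or levels[k] != levels[start]:
                intervals.append((f"bin{b}=L{levels[start]}", times[start], times[k]))
                start = k
    return sorted(intervals, key=lambda iv: (iv[1], iv[2], iv[0]))
```

The output is an ordinary event interval sequence, so it can be passed to any frequent arrangement miner that accepts (label, start, end) intervals.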

Place, publisher, year, edition, pages
Springer Nature, 2021
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 12695
Keywords
Multivariate histogram snapshots, Temporal abstraction, Disproportionality analysis
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-200592 (URN)
10.1007/978-3-030-74251-5_30 (DOI)
978-3-030-74251-5 (ISBN)
978-3-030-74250-8 (ISBN)
Conference
19th Symposium on Intelligent Data Analysis, April 26–28, 2021, Porto, Portugal
Available from: 2022-01-08; Created: 2022-01-08; Last updated: 2023-10-07; Bibliographically approved
5. Finding Local Groupings of Time Series
2023 (English) In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part VI / [ed] Massih-Reza Amini; Stéphane Canu; Asja Fischer; Tias Guns; Petra Kralj Novak; Grigorios Tsoumakas, Springer Nature, 2023, p. 70-86. Conference paper, Published paper (Refereed)
Abstract [en]

Collections of time series can be grouped over time both globally, over their whole time span, and locally, over several common time ranges, depending on the similarity patterns they share. In addition, local groupings can persist over time, defining associations of local groupings. In this paper, we introduce Z-Grouping, a novel framework for finding local groupings and their associations. Our solution converts time series to a set of event label channels by applying a temporal abstraction function and finds local groupings with maximized time span and number of member time series instances. A grouping-instance matrix structure is also exploited to detect associations of contiguous local groupings sharing common member instances. Finally, the validity of each local grouping is assessed against predefined global groupings. We demonstrate the ability of Z-Grouping to find local groupings without size constraints on time ranges on a synthetic dataset, three real-world datasets, and 128 UCR datasets, comparing against four competitors.
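As a simplified sketch of the grouping step only, assuming each time series has already been turned into a channel of event labels (one label per time point), the function below reports every contiguous time range over which the same set of instances shares a label. Unlike the framework described above, which maximises time span and membership, this sketch splits a range whenever any instance joins or leaves, and it neither detects associations nor validates groupings; the function name and the `min_members` and `min_length` parameters are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Grouping = Tuple[str, FrozenSet[int], int, int]  # (label, member instances, start, end exclusive)


def local_groupings(channels: Sequence[Sequence[str]],
                    min_members: int = 2,
                    min_length: int = 2) -> List[Grouping]:
    """channels[i][t] is the event label of instance i at time t."""
    n_time = len(channels[0])
    labels = sorted({label for row in channels for label in row})
    found: List[Grouping] = []
    for label in labels:
        # Which instances carry this label at each time point.
        members_at = [frozenset(i for i, row in enumerate(channels) if row[t] == label)
                      for t in range(n_time)]
        start = 0
        for t in range(1, n_time + 1):
            # Close the current run when the member set changes or the series ends.
            if t == n_time or members_at[t] != members_at[start]:
                members = members_at[start]
                if len(members) >= min_members and t - start >= min_length:
                    found.append((label, members, start, t))
                start = t
    return found
```

With channels `"aabbb"`, `"aabbc"` and `"ccbbb"`, instances 0 and 2 keep label `b` over time points 2 to 4, but the sketch reports only the three-member range 2 to 3 because instance 1 leaves at time 4; a span-maximising method would also consider the smaller group.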

Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349
Keywords
Local groupings, temporal abstractions, time series
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-221694 (URN)
10.1007/978-3-031-26422-1_5 (DOI)
2-s2.0-85150942906 (Scopus ID)
978-3-031-26422-1 (ISBN)
978-3-031-26421-4 (ISBN)
Conference
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 19-23 September, 2022, Grenoble, France.
Available from: 2023-09-27; Created: 2023-09-27; Last updated: 2023-10-07; Bibliographically approved
6. Z-Time: efficient and effective interpretable multivariate time series classification
2023 (English) In: Data Mining and Knowledge Discovery, ISSN 1384-5810, E-ISSN 1573-756X. Article in journal (Refereed), Epub ahead of print
Abstract [en]

Multivariate time series classification has become popular due to its prevalence in many real-world applications. However, most state-of-the-art methods focus on improving classification performance, and the best-performing models are typically opaque. Interpretable multivariate time series classifiers have recently been introduced, but none maintains sufficient efficiency and effectiveness together with interpretability. We introduce Z-Time, a novel algorithm for effective and efficient interpretable multivariate time series classification. Z-Time employs temporal abstraction and temporal relations of event intervals to create interpretable features across multiple time series dimensions. In our experimental evaluation on the UEA multivariate time series datasets, Z-Time achieves effectiveness comparable to state-of-the-art non-interpretable multivariate classifiers while being faster than all interpretable multivariate classifiers. We also demonstrate that Z-Time is more robust to missing values and inter-dimensional orders than its interpretable competitors.
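Z-Time's exact abstraction functions are not given in this abstract. The sketch below uses SAX (z-normalisation, piecewise aggregate approximation and Gaussian breakpoints), a well-known temporal abstraction, to turn each dimension of a multivariate series into labelled event intervals; whether Z-Time uses SAX, and with which parameters, is an assumption here. Labels such as `d1:c` name the dimension and the symbol, which is what keeps features built from them readable. Function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (label, start segment, end segment exclusive)


def paa(series: Sequence[float], n_segments: int) -> List[float]:
    """Piecewise aggregate approximation: the mean of each of n_segments roughly equal slices."""
    n = len(series)
    if n < n_segments:
        raise ValueError("series shorter than the number of segments")
    return [mean(series[i * n // n_segments:(i + 1) * n // n_segments]) for i in range(n_segments)]


def sax_symbols(series: Sequence[float], n_segments: int, alphabet: str = "abc") -> List[str]:
    """z-normalise, apply PAA, then map each segment mean to a symbol via Gaussian breakpoints."""
    mu, sd = mean(series), pstdev(series)
    z = [(x - mu) / sd if sd > 0 else 0.0 for x in series]
    breakpoints = [NormalDist().inv_cdf(k / len(alphabet)) for k in range(1, len(alphabet))]
    return [alphabet[sum(v > bp for bp in breakpoints)] for v in paa(z, n_segments)]


def to_event_intervals(multivariate: Sequence[Sequence[float]],
                       n_segments: int, alphabet: str = "abc") -> List[Interval]:
    """One interval per run of equal symbols in each dimension, labelled 'd<dim>:<symbol>'."""
    intervals: List[Interval] = []
    for dim, series in enumerate(multivariate):
        symbols = sax_symbols(series, n_segments, alphabet)
        start = 0
        for t in range(1, n_segments + 1):
            if t == n_segments or symbols[t] != symbols[start]:
                intervals.append((f"d{dim}:{symbols[start]}", start, t))
                start = t
    return sorted(intervals, key=lambda iv: (iv[1], iv[2], iv[0]))
```

Interpretable features could then be built by counting temporal relations between such labelled intervals, for example how often one labelled interval overlaps another; that feature step is not shown.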

Keywords
Multivariate time series, Temporal abstraction, Event interval sequences, Interpretable multivariate time series classification
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-221736 (URN)
10.1007/s10618-023-00969-x (DOI)
001062746800001
Available from: 2023-09-28; Created: 2023-09-28; Last updated: 2023-10-13

Open Access in DiVA

Z-Series: Mining and learning from complex sequential data (1957 kB)
File information
File name: FULLTEXT01.pdf
File size: 1957 kB
Checksum (SHA-512): 16b5ccf662184ac3d68914d9a745251cc7be1f09f0097a7800456f749fee58cc97641a7842b3b3ae2cc76e3b0b26cfc76bc75cff1a8286f7b89a0e47368c366c
Type: fulltext
MIME type: application/pdf

Authority records

Lee, Zed
