Exploring Natural Language Processing for Linking Digital Learning Materials: Towards Intelligent and Adaptive Learning Systems
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
ORCID iD: 0000-0002-7860-1784
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The digital transformation in education has created many opportunities, but it has also made it challenging to navigate the growing landscape of digital learning materials. The volume and diversity of learning resources make it difficult for both educators and learners to identify and utilize the most relevant resources for a specific learning context. There is therefore a critical demand for systems capable of effectively connecting different learning materials to support teaching and learning activities, and natural language processing can provide some of the essential building blocks for such educational content recommendation systems. Hence, this thesis explores the use of natural language processing techniques for automatically linking and recommending relevant learning resources in the form of textbook content, exercises and curriculum goals. A key question is how to represent diverse learning materials effectively; to that end, various language models are explored, and the obtained representations are used for measuring semantic textual similarity between learning materials. Learning materials can also be represented in terms of educational concepts, which is investigated in an ontology-based linking approach. To further enhance the representations and improve linking performance, different language models can be combined and augmented with external knowledge in the form of knowledge graphs and knowledge bases. Beyond approaches based on semantic textual similarity, prompting large language models is explored, and a method based on retrieval-augmented generation (RAG) is proposed to improve linking performance.

The thesis presents a systematic empirical evaluation of natural language processing techniques for representing and linking digital learning content, spanning different types of learning materials, use cases, and subjects. The results demonstrate the feasibility of unsupervised approaches based on semantic textual similarity of representations derived from pre-trained language models, and show that contextual embeddings outperform traditional text representation methods. Furthermore, zero-shot prompting of large language models can outperform methods based on semantic textual similarity when leveraging RAG to exploit an external knowledge base in the form of a digital textbook. The potential practical applications of the proposed approaches for automatic linking of digital learning materials pave the way for the development of intelligent and adaptive learning systems, including intelligent textbooks.
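The core linking mechanism described above can be illustrated with a minimal sketch: embed each learning material as a vector, then rank candidate materials by cosine similarity. The section names and toy vectors below are hypothetical stand-ins; in the thesis the embeddings come from pre-trained language models.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_sections(exercise_vec, section_vecs):
    """Rank textbook sections by semantic similarity to an exercise embedding."""
    scored = [(cosine(exercise_vec, v), sid) for sid, v in section_vecs.items()]
    return [sid for _, sid in sorted(scored, reverse=True)]

# Toy embeddings standing in for model-derived representations.
sections = {"photosynthesis": [0.9, 0.1, 0.0],
            "cell-division":  [0.1, 0.8, 0.2],
            "genetics":       [0.0, 0.3, 0.9]}
exercise = [0.85, 0.15, 0.05]
print(rank_sections(exercise, sections))  # most similar section first
```

The same ranking scheme underlies all the similarity-based linking experiments in the thesis; only the source of the vectors changes.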

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2024, p. 70
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 24-011
Keywords [en]
Natural Language Processing, Technology Enhanced Learning, Educational Content Recommendation, Intelligent Textbooks, Pre-Trained Language Models, Large Language Models, Semantic Textual Similarity, Knowledge Graphs
National Category
Computer and Information Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-232990
ISBN: 978-91-8014-927-3 (print)
ISBN: 978-91-8014-928-0 (electronic)
OAI: oai:DiVA.org:su-232990
DiVA id: diva2:1895798
Public defence
2024-10-22, Lilla hörsalen, NOD-huset, Borgarfjordsgatan 12, Kista, 13:00 (English)
Opponent
Supervisors
Available from: 2024-09-27. Created: 2024-09-06. Last updated: 2024-09-19. Bibliographically approved
List of papers
1. Automatic Educational Concept Extraction Using NLP
2022 (English) In: Methodologies and Intelligent Systems for Technology Enhanced Learning, 12th International Conference / [ed] Marco Temperini; Vittorio Scarano; Ivana Marenzi; Milos Kravcik; Elvira Popescu; Rosa Lanzillotti; Rosella Gennari; Fernando De la Prieta; Tania Di Mascio; Pierpaolo Vittorini, Springer Nature, 2022, p. 133-138. Conference paper, Published paper (Refereed)
Abstract [en]

Educational concepts are the core of teaching and learning. From the perspective of educational technology, concepts are essential meta-data: representative terms that can connect different learning materials and form the foundation for many downstream tasks. Some studies on automatic concept extraction have been conducted, but none have targeted the K-12 level or focused on the Swedish language. In this paper, we use a state-of-the-art Swedish BERT model to build an automatic concept extractor for the Biology subject, using fine-annotated digital textbook data that cover all content for K-12. The model achieves a recall of 72% and has the potential to be used in real-world settings for use cases that require high recall. In addition, we investigate how input data features influence model performance and provide guidance on how to effectively use text data to achieve optimal results when building a named entity recognition (NER) model.
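NER-based concept extraction ultimately reduces to decoding the model's per-token BIO tags into concept spans. A minimal sketch of that decoding step is shown below; the tag sequence is a hypothetical model output, not taken from the paper.

```python
def decode_bio(tokens, tags):
    """Collect spans tagged B-CONCEPT/I-CONCEPT into concept strings."""
    concepts, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-CONCEPT":
            if current:
                concepts.append(" ".join(current))
            current = [token]
        elif tag == "I-CONCEPT" and current:
            current.append(token)
        else:
            if current:
                concepts.append(" ".join(current))
            current = []
    if current:
        concepts.append(" ".join(current))
    return concepts

# Hypothetical tagging of a Swedish biology sentence.
tokens = ["Fotosyntesen", "sker", "i", "kloroplasterna", "hos", "gröna", "växter"]
tags   = ["B-CONCEPT",    "O",    "O", "B-CONCEPT",      "O",   "O",     "O"]
print(decode_bio(tokens, tags))  # ['Fotosyntesen', 'kloroplasterna']
```

In practice the tags would come from a fine-tuned Swedish BERT token classifier; the decoding logic is model-agnostic.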

Place, publisher, year, edition, pages
Springer Nature, 2022
Series
Lecture Notes in Networks and Systems, ISSN 2367-3370, E-ISSN 2367-3389 ; 580
Keywords
Concept extraction, NLP, BERT, Sequence model, NER
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-213067 (URN)
10.1007/978-3-031-20617-7_17 (DOI)
000921287500017
2-s2.0-85144211791 (Scopus ID)
978-3-031-20617-7 (ISBN)
978-3-031-20616-0 (ISBN)
Conference
MIS4TEL 2022, 12th International Conference on Methodologies and Intelligent Systems for Technology Enhanced Learning, L'Aquila (Italy) / Hybrid, 13-15 July, 2022
Available from: 2022-12-19. Created: 2022-12-19. Last updated: 2024-09-06. Bibliographically approved
2. Linking Swedish Learning Materials to Exercises through an AI-Enhanced Recommender System
2023 (English) In: Methodologies and Intelligent Systems for Technology Enhanced Learning, 13th International Conference / [ed] Marcelo Milrad, Nuno Otero, María Cruz Sánchez‑Gómez, Juan José Mena, Dalila Durães, Filippo Sciarrone, Claudio Alvarez-Gómez, Manuel Rodrigues, Pierpaolo Vittorini, Rosella Gennari, Tania Di Mascio, Marco Temperini, Fernando De la Prieta, Cham: Springer, 2023, p. 96-107. Conference paper, Published paper (Refereed)
Abstract [en]

As an integral part of AI-enhanced learning, a content recommender automatically filters and recommends relevant learning materials to the learner or the instructor in a learning system. It can effectively help instructors in pedagogical practices and support students in self-regulated learning. Content recommendation technologies and applications have been studied extensively; however, state-of-the-art technologies have not been adequately adapted to the education domain, and there is very limited research on how different models and solutions can be applied in the Swedish context and across multiple subjects. In this paper, we develop a text similarity-based content recommender system. Specifically, given a quiz, we automatically recommend supportive learning resources as a reference to the answer and link back to the textbook sections where the examined knowledge points reside. We present a generic method for Swedish educational content recommendation using the most representative models, evaluating and analyzing along multiple dimensions such as model type, pooling method and subject. The best results are obtained by Sentence-BERT (SBERT) with max paragraph-level pooling, outperforming traditional natural language processing (NLP) models and knowledge graph-based models, achieving on average 95% Recall@3 and 82% MRR, and performing particularly well on texts containing symbols, equations or calculations. This research provides empirical evidence and analysis that can be used as guidance when building a Swedish educational content recommender.
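Max paragraph-level pooling, the best-performing configuration in the abstract above, takes the element-wise maximum over the paragraph embeddings of a section to obtain one section-level vector. A minimal sketch with hypothetical 3-dimensional toy vectors (real SBERT embeddings have hundreds of dimensions):

```python
def max_pool(paragraph_vecs):
    """Element-wise max over the paragraph embeddings of one section."""
    return [max(dims) for dims in zip(*paragraph_vecs)]

# Hypothetical paragraph embeddings for one textbook section.
paragraphs = [[0.2, 0.7, 0.1],
              [0.6, 0.1, 0.3],
              [0.4, 0.5, 0.9]]
print(max_pool(paragraphs))  # [0.6, 0.7, 0.9]
```

The pooled vector is then compared against a quiz embedding with cosine similarity, as in the other similarity-based approaches in the thesis.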

Place, publisher, year, edition, pages
Cham: Springer, 2023
Series
Lecture Notes in Networks and Systems, ISSN 2367-3370, E-ISSN 2367-3389 ; 764
Keywords
AI-enhanced Learning, Educational Content Recommender, NLP, Text Similarity, Textual Semantic Search
National Category
Computer Sciences
Identifiers
urn:nbn:se:su:diva-223047 (URN)
10.1007/978-3-031-41226-4_10 (DOI)
2-s2.0-85172692344 (Scopus ID)
978-3-031-41225-7 (ISBN)
978-3-031-41226-4 (ISBN)
Conference
13th International Conference on Methodologies and Intelligent Systems for Technology Enhanced Learning (MIS4TEL 2023), Guimarães, Portugal, July 12-14, 2023
Available from: 2023-10-18. Created: 2023-10-18. Last updated: 2024-09-06. Bibliographically approved
3. Evaluating Embeddings from Pre-Trained Language Models and Knowledge Graphs for Educational Content Recommendation
2024 (English) In: Future Internet, E-ISSN 1999-5903, Vol. 16, no. 1, p. 1-21. Article in journal (Refereed), Published
Abstract [en]

Educational content recommendation is a cornerstone of AI-enhanced learning. In particular, to facilitate navigating the diverse learning resources available on learning platforms, methods are needed for automatically linking learning materials, e.g. in order to recommend textbook content based on exercises. Such methods are typically based on semantic textual similarity (STS) and the use of embeddings for text representation. However, it remains unclear what types of embeddings should be used for this task. In this study, we carry out an extensive empirical evaluation of embeddings derived from three different types of models: (i) static embeddings trained using a concept-based knowledge graph, (ii) contextual embeddings from a pre-trained language model, and (iii) contextual embeddings from a large language model (LLM). In addition to evaluating the models individually, various ensembles are explored based on different strategies for combining two models in an early vs. late fusion fashion. The evaluation is carried out using digital textbooks in Swedish for three different subjects and two types of exercises. The results show that using contextual embeddings from an LLM leads to superior performance compared to the other models, and that there is no significant improvement when combining these with static embeddings trained using a knowledge graph. When using embeddings derived from a smaller language model, however, it helps to combine them with knowledge graph embeddings. The performance of the best-performing model is high for both types of exercises, resulting in a mean Recall@3 of 0.96 and 0.95 and a mean MRR of 0.87 and 0.86 for quizzes and study questions, respectively, demonstrating the feasibility of using STS based on text embeddings for educational content recommendation. The ability to link digital learning materials in an unsupervised manner -- relying only on readily available pre-trained models -- facilitates the development of AI-enhanced learning.
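The early vs. late fusion strategies for combining two embedding models can be sketched as follows. Early fusion concatenates the two embeddings before computing one similarity score; late fusion computes one score per model and combines the scores. The toy vectors are hypothetical; in the paper the two sources are knowledge graph embeddings and language model embeddings.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def early_fusion_score(q1, d1, q2, d2):
    """Early fusion: concatenate the two embeddings, then compare once."""
    return cosine(q1 + q2, d1 + d2)

def late_fusion_score(q1, d1, q2, d2, w=0.5):
    """Late fusion: compare per model, then average the similarity scores."""
    return w * cosine(q1, d1) + (1 - w) * cosine(q2, d2)

# Hypothetical exercise (q) and textbook-section (d) embeddings
# from two different models; the dimensions need not match across models.
q_kg, d_kg = [0.1, 0.9], [0.2, 0.8]             # knowledge graph embeddings
q_lm, d_lm = [0.7, 0.2, 0.4], [0.6, 0.3, 0.5]   # language model embeddings
print(early_fusion_score(q_kg, d_kg, q_lm, d_lm))
print(late_fusion_score(q_kg, d_kg, q_lm, d_lm))
```

A common refinement, not shown here, is to L2-normalize each model's embeddings before early fusion so that neither model dominates the concatenated vector.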

Keywords
Educational Content Recommendation, AI-Enhanced Learning, Pre-Trained Language Models, Ensemble Embeddings, Knowledge Graph Embeddings, Text Similarity, Textual Semantic Search, Natural Language Processing
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-224920 (URN)
10.3390/fi16010012 (DOI)
001149119000001
2-s2.0-85183383290 (Scopus ID)
Available from: 2024-01-02 Created: 2024-01-02 Last updated: 2024-09-06
4. Supporting Teaching-to-the-Curriculum by Linking Diagnostic Tests to Curriculum Goals: Using Textbook Content as Context for Retrieval-Augmented Generation with Large Language Models
2024 (English) In: Artificial Intelligence in Education: 25th International Conference, AIED 2024, Recife, Brazil, July 8–12, 2024, Proceedings, Part I / [ed] Andrew M. Olney; Irene-Angelica Chounta; Zitao Liu; Olga C. Santos; Ig Ibert Bittencourt, Springer Nature, 2024, p. 118-132. Conference paper, Published paper (Refereed)
Abstract [en]

Using AI to automatically link exercises to curriculum goals can support many educational use cases and facilitate teaching-to-the-curriculum by ensuring that exercises adequately reflect and encompass the curriculum goals, ultimately enabling curriculum-based assessment. Here, we introduce this novel task and create a manually labeled dataset in which two types of diagnostic tests are linked to curriculum goals for Biology G7-9 in Sweden. We cast the problem both as an information retrieval task and as a multi-class text classification task, and explore unsupervised approaches to both, as labeled data for such tasks is typically scarce. For the information retrieval task, we employ the state-of-the-art embedding model ADA-002 for semantic textual similarity (STS), while for classification we prompt a large language model in the form of ChatGPT to classify diagnostic tests into curriculum goals. For both task formulations, we investigate different ways of using textbook content as a pivot and as additional context for linking diagnostic tests to curriculum goals. We show that a combination of the two approaches in a retrieval-augmented generation model, whereby STS is used for retrieving textbook content as context to ChatGPT, which then performs zero-shot classification, leads to the best classification accuracy (73.5%), outperforming both STS-based classification (67.5%) and LLM-based classification without context (71.5%). Finally, we showcase how the proposed method could be used in pedagogical practices.
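The two-stage RAG pipeline described above (STS retrieval of textbook context, then zero-shot classification by an LLM) can be sketched as follows. The retrieval and prompt assembly are shown concretely with hypothetical toy data; `call_llm` is a placeholder for the actual ChatGPT API call and simply returns a fixed answer here.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(test_vec, section_vecs, k=2):
    """STS step: top-k textbook sections most similar to the test item."""
    ranked = sorted(section_vecs,
                    key=lambda s: cosine(test_vec, section_vecs[s]),
                    reverse=True)
    return ranked[:k]

def build_prompt(test_text, context_sections, goals):
    """Zero-shot classification prompt with retrieved textbook context."""
    context = "\n".join(context_sections)
    goal_list = "\n".join(f"{i}. {g}" for i, g in enumerate(goals, 1))
    return (f"Textbook context:\n{context}\n\n"
            f"Diagnostic test item:\n{test_text}\n\n"
            f"Which curriculum goal does the item assess?\n{goal_list}")

def call_llm(prompt):
    # Placeholder for a zero-shot ChatGPT call; returns the goal number.
    return "1"

# Hypothetical textbook sections with toy embeddings, and curriculum goals.
sections = {"Photosynthesis converts light energy into chemical energy.": [0.9, 0.1],
            "Mitosis produces two genetically identical cells.": [0.1, 0.9]}
goals = ["Energy flows in ecosystems", "Cell division and heredity"]

top = retrieve([0.8, 0.2], sections, k=1)
prompt = build_prompt("Explain where plants get their energy.", top, goals)
print(call_llm(prompt))
```

The paper's finding is that supplying the retrieved context in the prompt, rather than prompting the LLM with the test item alone, is what lifts classification accuracy.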

Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 14829
Keywords
Teaching-to-the-Curriculum, Semantic Textual Similarity, Large Language Models, ChatGPT, Retrieval-Augmented Generation
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-232105 (URN)
10.1007/978-3-031-64302-6_9 (DOI)
978-3-031-64302-6 (ISBN)
978-3-031-64301-9 (ISBN)
Conference
Artificial Intelligence in Education. AIED 2024, Recife, Brazil, July 8–12, 2024.
Available from: 2024-07-24. Created: 2024-07-24. Last updated: 2024-09-06. Bibliographically approved

Open Access in DiVA

Exploring Natural Language Processing for Linking Digital Learning Materials (13569 kB)
File name: FULLTEXT01.pdf
File size: 13569 kB
Checksum (SHA-512): 06043e51dc4e41c2e6dc1b41cafac12cac1ac253eaef2b4cc45977d2cdbeb9307e4e1a5d45f8d02cad42cabdb01231cd6c32b0b12500857e189855092841b357
Type: fulltext
Mimetype: application/pdf

Authority records

Li, Xiu
