Natural Language Processing and Large Language Models for Research Data Exploration and Analysis

Schedule and location

Tuesday, March 11 - Thursday, March 13, 2025.

Aalto University, Espoo (visiting address: Otaniementie 14, Espoo).

Registration

Registration is open until March 4th.

Speakers

Associate professor Raghava Mukkamala, Copenhagen Business School (CBS), Denmark.

Assistant professor Sippo Rossi, Hanken School of Economics, Finland.

Course Introduction

Generative AI is a type of artificial intelligence that can create new content. As part of Generative AI, Large Language Models (LLMs) are deep learning models based on the transformer architecture (e.g., GPT-4, Generative Pre-trained Transformer 4). They are considered a significant breakthrough in Natural Language Processing (NLP) and AI and have shown substantial potential to transform organizations and society in several ways. An example of an LLM is ChatGPT, which has gained widespread attention for its language generation skills and has demonstrated strong capabilities across various domains and tasks, such as question answering and passing examinations (e.g., the Uniform Bar Exam), thereby even challenging our wisdom and cognition.

The primary purpose of this course is to provide knowledge and a deep understanding of the concepts, techniques, and methods that serve as a foundation for LLMs such as ChatGPT. Starting from basic NLP concepts, the course will delve into deep learning architectures for NLP and Generative AI and then into applications and data analysis using LLMs. Subsequently, the course will focus on the opportunities, challenges, and risks associated with Generative AI models such as ChatGPT and their implications for organizations and society.

The course is mainly designed for PhD students who want to apply NLP and LLM-based text analysis in their research. It also contains hands-on exercises on these topics using the Python programming language. Participants are expected to have a basic understanding of either the Python or the R programming language and some familiarity with running Python scripts in Jupyter Notebooks. The following is the course outline.

First, the course introduces some fundamental concepts of machine learning (ML) and NLP. It then focuses on using these techniques for data analysis with supervised and unsupervised approaches, such as text classification and topic modeling.
Second, it presents the high-level architectures of deep learning, generative models, and LLMs and elaborates on why LLMs such as ChatGPT have achieved such broad analytical capabilities.
Third, it provides a detailed account of these models' capabilities, their possible applications in various fields, and how they may impact society and organizations in the future.
Fourth, it presents how LLMs can be used for text analysis by examining techniques such as text summarization, text classification, and code generation.
Finally, it discusses these models' diverse societal impacts and challenges, especially in terms of inequity, misuse, and legal and ethical considerations.
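
To give a flavour of the supervised text classification mentioned in the outline, the following is a minimal sketch of a Naïve Bayes sentiment classifier in plain Python. It is an illustration only, not course material: the toy reviews, the function names (`tokenize`, `train_naive_bayes`, `predict`), and the use of add-one smoothing are assumptions made for this example, and the hands-on sessions may use other tools (the session plan mentions NLTK).

```python
import math
from collections import Counter, defaultdict

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [tok for tok in tokens if tok]


def train_naive_bayes(docs):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the text."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / n_docs)  # log prior
        total = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for this illustration.
TOY_REVIEWS = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("boring plot and terrible acting", "neg"),
    ("a dull, terrible film", "neg"),
]

model = train_naive_bayes(TOY_REVIEWS)
print(predict(model, "a great and moving story"))  # pos
print(predict(model, "terrible and boring"))       # neg
```

With only four training sentences the classifier is of course not reliable; the point is the shape of the workflow (tokenize, count, score) that the hands-on sessions build on with real corpora.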

Course Learning Outcomes

After completing this course, the participants should be able to:

Demonstrate a fundamental understanding of NLP and how it can be used for the analysis of text corpora.
Explain the fundamental principles of generative AI and LLMs and how they can be used for data analysis.
Compare various approaches to using Generative AI and LLMs, demonstrating their practical relevance through real-world applications and case studies.
Describe the key challenges and opportunities, including issues related to reliability, hallucination, and ethical considerations in using Generative AI and LLMs in various domains.

Pre-requisites

A basic understanding of either the Python or the R programming language and the ability to run Python scripts in Jupyter Notebooks.

Pedagogy

Face-to-face teaching

Session Plan

Sessions	Topic & Objective	Study Material
Day-01: Tuesday, 11-03-2025, Aalto University room Väre G202
10:00-12:00	Fundamentals of Machine Learning and Natural Language Processing (NLP): types of machine learning; performance measures; basic text processing and tokenization; word normalization and part-of-speech tagging	Slides, articles and other reading materials
Lunch Break
13:00-14:45	Supervised approaches for NLP: text classification, sentiment analysis, Naïve Bayes classifier	Slides, articles and other reading materials
15:00-17:00	Case Studies and Hands-on Session: analyzing text from discussion forums on Type-2 diabetes for domain-specific text classification; sentiment analysis and text classification for movie reviews using NLTK	Jupyter notebooks and Python scripts
Day-02: Wednesday, 12-03-2025, Aalto University room Väre Q203
09:00-11:30	Unsupervised and Deep Learning approaches for NLP: topic modeling; word vectors/word embeddings	Slides, articles and other reading materials
Lunch Break
12:30-14:30	Case Studies and Hands-on Session: analyzing newspaper headlines using topic modeling; building word embeddings for your own text corpus	Jupyter notebooks and Python scripts
14:45-17:00	Introduction to Generative AI and Large Language Models (LLMs): transformer architecture, attention mechanism, and generating text with transformers	Slides, articles and other reading materials
Day-03: Thursday, 13-03-2025, Aalto University room V301 BIZ Lounge & Terrace
08:30-10:00	Configuring and fine-tuning LLMs for specific applications, e.g., text classification and text summarization	Slides, articles and other reading materials
10:15-11:30	Case Studies and Hands-on Session: text summarization and text classification with LLMs, using Google Cloud and the Gemini LLM	Jupyter notebooks and Python scripts
Lunch Break
12:30-13:30	LLMs use cases, challenges, opportunities, and ethical considerations	Slides, articles and other reading materials
13:30-14:30	Wrap-up: discussion about exam projects; feedback and reflections on the course

Evaluation Criteria

Sr. No.	Component	Individual / Group	Weightage
1	Final project	Individual	100%
Total			100%

Credit points

Doctoral students participating in the seminar can obtain 3 credit points. This requires participating and completing the assignments.

Registration fee

This seminar is free of charge for staff of Inforte.fi member organizations and their PhD students. For others, the participation fee is 400 €. The participation fee includes access to the event and the event materials. Lunch and dinner are not included.