Natural Language Processing and Large Language Models for Research Data Exploration and Analysis
Schedule and location
Tuesday, March 11 - Thursday, March 13, 2025.
Aalto University, Espoo (visiting address: Otaniementie 14, Espoo).
Speakers
Associate professor Raghava Mukkamala, Copenhagen Business School (CBS), Denmark.
Assistant professor Sippo Rossi, Hanken School of Economics, Finland.
Course Introduction
Generative AI is a type of artificial intelligence that can create new content. Large Language Models (LLMs), a class of Generative AI, are deep learning models built on the transformer architecture (e.g., GPT-4, Generative Pre-trained Transformer 4). They are considered a significant breakthrough in Natural Language Processing (NLP) and AI and have shown substantial potential to transform organizations and society. One example is ChatGPT, a chatbot built on an LLM, which has recently gained widespread attention for its exceptional language generation skills and has demonstrated strong capabilities across domains and tasks such as question answering and passing examinations (e.g., the Uniform Bar Exam), thereby even challenging our wisdom and cognition.
The primary purpose of this course is to provide a deep understanding of the concepts, techniques, and methods that serve as the foundation for LLMs such as ChatGPT. Starting from basic NLP concepts, the course delves into deep learning architectures for NLP and Generative AI and then into applications and data analysis using LLMs. It then turns to the opportunities, challenges, and risks associated with Generative AI models such as ChatGPT and their implications for organizations and society.
The course is mainly designed for PhD students who want to use NLP, text analysis, and LLMs in their research. It also contains hands-on exercises on these topics using the Python programming language. Participants are expected to have a basic understanding of either the Python or the R programming language and some familiarity with running Python scripts in Jupyter Notebooks. The following is the course outline.
- First, the course introduces fundamental concepts of machine learning (ML) and NLP and shows how to apply them to data analysis with supervised and unsupervised approaches, such as text classification and topic modeling.
- Second, it presents the high-level architectures of deep learning, generative models, and LLMs and explains why LLMs like ChatGPT are capable of such a wide range of analytical tasks.
- Third, it gives a detailed account of these models' capabilities, their possible applications in various fields, and how they may impact society and organizations in the future.
- Fourth, it shows how LLMs can be used for text analysis by examining techniques such as text summarization, text classification, and code generation.
- Finally, it discusses these models' diverse societal impacts and challenges, especially in terms of inequity, misuse, and legal and ethical considerations.
Course Learning Outcomes
After completing this course, the participants should be able to:
- Demonstrate a fundamental understanding of NLP techniques and how they can be used for the analysis of text corpora.
- Explain the fundamental principles of Generative AI and LLMs and how they can be used for data analysis.
- Compare various approaches to using Generative AI and LLMs, demonstrating their practical relevance through real-world applications and case studies.
- Describe the key challenges and opportunities, including issues related to reliability, hallucination, and ethical considerations in using Generative AI and LLMs in various domains.
Pre-requisites
A basic understanding of either the Python or the R programming language and the ability to run Python scripts in Jupyter Notebooks.
Pedagogy
Face-to-face teaching
Session Plan
| Sessions | Topic & Objective | Study Material |
|---|---|---|
| Day-01: Tuesday, 11-03-2025, Aalto University room Väre G202 | | |
| 10:00-12:00 | Fundamentals of Machine Learning and Natural Language Processing (NLP) | Slides, articles and other reading materials |
| Lunch Break | | |
| 13:00-14:45 | Supervised approaches for NLP: text classification, sentiment analysis, Naïve Bayes classifier | Slides, articles and other reading materials |
| 15:00-17:00 | Case Studies and Hands-on Session | Jupyter notebooks and Python scripts |
| Day-02: Wednesday, 12-03-2025, Aalto University room Väre G202 | | |
| 09:00-11:30 | Unsupervised and Deep Learning approaches for NLP | Slides, articles and other reading materials |
| Lunch Break | | |
| 12:30-14:30 | Case Studies and Hands-on Session | Jupyter notebooks and Python scripts |
| 14:45-17:00 | Introduction to Generative AI and Large Language Models (LLMs): transformer architecture, attention mechanism, and generating text with transformers | Slides, articles and other reading materials |
| Day-03: Thursday, 13-03-2025, Aalto University room V301 BIZ Lounge & Terrace | | |
| 08:30-10:00 | Configuring and fine-tuning LLMs for specific applications, e.g., text classification and text summarization | Slides, articles and other reading materials |
| 10:15-11:30 | Case Studies and Hands-on Session: text summarization and text classification with LLMs, using Google Cloud and the Gemini LLM | Jupyter notebooks and Python scripts |
| Lunch Break | | |
| 12:30-13:30 | LLM use cases, challenges, opportunities, and ethical considerations | Slides, articles and other reading materials |
| 13:30-14:30 | Wrap-up: discussion about exam projects; feedback and reflections on the course | |
Evaluation Criteria
| Sr. No. | Component | Individual / Group | Weightage |
|---|---|---|---|
| 1 | Final project | Individual | 100% |
| Total | | | 100% |
Credit points
Doctoral students participating in the seminar can obtain 3 credit points. This requires attending the sessions and completing the assignments.
Registration fee
This seminar is free of charge for the staff of Inforte.fi member organizations and their PhD students. For others, the participation fee is 400 €. The participation fee includes access to the event and the event materials. Lunch and dinner are not included.