Natural Language Processing and Large Language Models for Research Data Exploration and Analysis

Schedule and location

Tuesday March 11 - Thursday March 13.

Aalto University, Espoo (visiting address: Otaniementie 14, Espoo).

Registration 

 Registration is open until March 4th.

Speakers

Associate professor Raghava Mukkamala, Copenhagen Business School (CBS), Denmark.

Assistant professor Sippo Rossi, Hanken School of Economics, Finland.

Course Introduction

Generative AI is a type of artificial intelligence that can generate new content. Large Language Models (LLMs), a class of generative AI, are deep learning models built on the transformer architecture (e.g., GPT-4, Generative Pre-trained Transformer 4). They are considered a significant breakthrough in Natural Language Processing (NLP) and AI and have shown substantial potential to transform organizations and society. A well-known example is ChatGPT, which has gained widespread attention for its language generation skills and has demonstrated strong capabilities across domains and tasks, such as question answering and passing examinations (e.g., the Uniform Bar Exam), thereby even challenging our notions of wisdom and cognition.

The primary purpose of this course is to provide knowledge and a deep understanding of the concepts, techniques, and methods that serve as a foundation for LLMs such as ChatGPT. Starting from basic NLP concepts, the course will delve into deep learning architectures for NLP and generative AI and then into applications and data analysis using LLMs. Subsequently, the course will focus on the opportunities, challenges, and risks associated with generative AI models such as ChatGPT and their implications for organizations and society.

The course is mainly designed for PhD students who want to use LLM-based NLP and text analysis in their research. It also contains hands-on exercises on these topics using the Python programming language. PhD students are expected to have a basic understanding of either the Python or the R programming language and some familiarity with running Python scripts in Jupyter Notebooks. The following is the course outline.

  1. The course starts with some fundamental concepts of machine learning (ML) and NLP. It then focuses on using these techniques for data analysis, using supervised and unsupervised approaches, such as text classification and topic modeling.
  2. Second, it presents the high-level architectures of deep learning, generative models, and LLMs and elaborates on why LLMs like ChatGPT exhibit such broad analytical capabilities.
  3. Third, it will provide a detailed account of these models' capabilities, possible applications to various fields, and how they will impact society and organizations in the future.
  4. Fourth, it will present how LLMs can be used for text analysis by examining some of the techniques, such as text summarization, text classification, and code generation.
  5. Finally, it will discuss these models' diverse societal impacts and challenges, especially in terms of inequity, misuse, and legal and ethical considerations.

Course Learning Outcomes

After completing this course, the participants should be able to:

  1. Demonstrate a fundamental understanding of NLP techniques and how they can be used for the analysis of text corpora.
  2. Explain the fundamental principles of generative AI and LLMs and how they can be used for data analysis.
  3. Compare various approaches to using Generative AI and LLMs, demonstrating their practical relevance through real-world applications and case studies.
  4. Describe the key challenges and opportunities, including issues related to reliability, hallucination, and ethical considerations in using Generative AI and LLMs in various domains.

Pre-requisites

A basic understanding of either the Python or the R programming language and the ability to run Python scripts in Jupyter Notebooks.

Pedagogy

Face-to-face teaching

Session Plan


Sessions | Topic & Objective | Study Material


Day-01: Tuesday, 11-03-2025, Aalto University room Väre G202

10:00-12:00

Fundamentals of Machine Learning and Natural Language Processing (NLP)

    • Types of Machine Learning
    • Performance Measure
    • Basic Text Processing and Tokenization
    • Word normalization and Parts-of-speech tagging

Slides, articles and other reading materials
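
As an illustration of the basic text processing and tokenization covered in this session, the following is a minimal sketch using only the Python standard library. The regular expression, the `tokenize` helper, and the sample sentence are assumptions made for this example; they are not taken from the course materials, and real tokenizers handle many more cases.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens.

    A token is a run of letters or digits, optionally followed by an
    apostrophe and more letters (so "isn't" stays one token).
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


# Hypothetical sample sentence, invented for this sketch.
sample = "The patients' forum isn't quiet: Patients discuss diet, diet, and exercise."

tokens = tokenize(sample)
counts = Counter(tokens)  # simple term frequencies after normalization

print(tokens)
print(counts.most_common(2))
```

Lowercasing is the only normalization step applied here; stemming, lemmatization, and part-of-speech tagging would typically be done with a library such as NLTK.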

Lunch Break

13:00-14:45

Supervised approaches for NLP: text classification, sentiment analysis, Naïve Bayes classifier

 

Slides, articles and other reading materials
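
To give a flavor of the supervised approach named in this session, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, written in plain Python. The function names, the four toy training reviews, and the choice to skip words not seen in training are assumptions made for this example, not the course's own implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-class document and word counts.

    labeled_docs is a list of (token_list, label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(tokens, model):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: fraction of training documents with this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # assumption: ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this sketch.
train = [
    ("a great and moving film".split(), "pos"),
    ("great acting and a great story".split(), "pos"),
    ("a dull and boring film".split(), "neg"),
    ("boring plot and weak acting".split(), "neg"),
]
model = train_naive_bayes(train)

print(predict("a great story".split(), model))
print(predict("dull and weak".split(), model))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once documents are longer than a few words.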

15:00-17:00

Case Studies and Hands-on Session:

    • Analyzing text from discussion forums on Type-2 diabetes for domain-specific text classification
    • Sentiment analysis and text classification for movie reviews using NLTK

Jupyter notebooks and Python scripts


Day-02: Wednesday, 12-03-2025, Aalto University room Väre G202

09:00-11:30

Unsupervised and Deep Learning approaches for NLP:

    • Topic modeling
    • Word Vectors/Word Embeddings

Slides, articles and other reading materials

Lunch Break

12:30-14:30

Case Studies and Hands-on Session:

    • Analyzing newspaper headlines using Topic modeling
    • Building word embeddings for your own text corpus

Jupyter notebooks and Python Scripts

14:45-17:00

Introduction to Generative AI and Large Language Models (LLMs): transformer architecture, attention mechanism, and generating text with transformers

Slides, articles and other reading materials
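
The attention mechanism mentioned in this session can be sketched in a few lines of plain Python. The example below computes scaled dot-product self-attention over three toy two-dimensional vectors. It is a simplified illustration under stated assumptions: the input vectors are invented, queries, keys, and values are all set to the same input, and the learned projection matrices and multiple heads of a real transformer are omitted.

```python
import math


def softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns (outputs, weights): one output vector and one row of
    attention weights per query.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy token vectors, invented for this sketch (self-attention: Q = K = V).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(x, x, x)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how strongly one token attends to every token in the sequence, including itself.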


Day-03: Thursday, 13-03-2025, Aalto University room V301 BIZ Lounge & Terrace

08:30-10:00

Configuring and fine-tuning LLMs for specific applications, e.g., text classification and text summarization.

Slides, articles and other reading materials

10:15-11:30

Case Studies and Hands-on Session:

Text summarization and text classification with LLMs, using Google Cloud and the Gemini LLM

Jupyter notebooks and Python Scripts

Lunch Break

12:30-13:30

LLMs use cases, challenges, opportunities, and ethical considerations

Slides, articles and other reading materials

13:30-14:30

Wrap-up: Discussion about exam projects!

Feedback and reflections on the course

 

Evaluation Criteria

Sr. No. | Component | Individual / Group | Weightage
1 | Final project | Individual | 100%
Total | | | 100%

Credit points

Doctoral students participating in the seminar can obtain 3 credit points. This requires participating and completing the assignments.

Registration fee

This seminar is free of charge for the staff of Inforte.fi member organizations and their PhD students. For others, the participation fee is 400 €. The participation fee includes access to the event and the event materials. Lunch and dinner are not included.