Data Analytics Workshop

Schedule and location

Wednesday 12th September - Friday 14th September

University of Jyväskylä, Jyväskylä, Agora building (visiting address: Mattilanniemi 2)

 

Registration 

Registration is open until 5th September

Speaker

Professor Robert J. Kauffman, Singapore Management University

Organizer

Professor Tuure Tuunanen, University of Jyväskylä, Finland

Overview

The goal of this Data Analytics Workshop is to introduce participants to the new paradigm of Computational Social Science in IS, related disciplines, and interdisciplinary applied contexts. Many IS researchers are finding that traditional empirical research designs are no longer enough. What is often taught in standard PhD methods seminars provides an excellent basis for research, but it does not give researchers the capabilities they need to compete effectively for empirical research publication in leading journals (MIS Quarterly, Information Systems Research, Journal of Management Information Systems, Marketing Science, and Organization Science, among others). They know that it is important to make a theoretical contribution in their work, but connecting their theory with appropriate empirical research designs to meet current expectations is not easy. So what are the bases for the new big data empirics? How can PhD students take advantage of the emerging abilities that IS researchers are beginning to demonstrate, combining machine learning, data mining, natural language processing and other methods from Computer Science with Statistics and Econometrics to achieve ‘fusion analytics’ insights in the vein of Computational Social Science?

 

In this Workshop, participants will be exposed to examples of contemporary empirical research that differs in its design, execution and data analytics from what has come before. We will explore the new philosophy of science for the ‘Age of the Internet’ and, now, truly big data analytics. We will go beyond past work that establishes empirical ‘associations’ between independent and dependent variables. We will also help you to develop an understanding of ‘causal inference’-based research designs beyond laboratory experiments with full controls and randomization. To do this, we will consider a variety of IS and technology research settings where rich empirical results have been obtained in leading business, consumer and social contexts. You will also be encouraged to think about and formulate your own empirical research project ideas in the context of the research methods innovations that are the focus of this Workshop.

Preparation and Assignments

 

  1. Research Topic. Select an empirical research article you are currently writing or that is at an early stage of project development. Make some notes (no slides) about its content that will enable you to give a short oral presentation to members of your discussion group. Identify issues that are appropriate for Computational Social Science methods-based empirical research inquiry, based on the Primary Readings.
  2. Primary Readings. There are a number of Primary Readings. Read as many of them as your time and interest allow, to learn about the related ideas.

    (i)     Preparation. Read the following 6 articles when you prepare for this class and do Assignment 1: Carley (2002); Kauffman and Wood (2009); Lazer et al. (2009); Lavelle et al. (2011); Chang et al. (2014); Kauffman et al. (2017).

    (ii)    Day 1. There are no new Primary Readings for Day 1; they are all pre-readings.

    (iii)  Day 2. These are the new Primary Readings for Day 2: Aral and Walker (2011); Bhattacharjee et al. (2007); Card and Krueger (1994); Dehejia and Wahba (2002); Kauffman et al. (2014); Shmueli and Koppius (2011).

    (iv)  Day 3. These are the new Primary Readings for Day 3: Zhang and Zhu (2011); Ren and Kauffman (2017, 2018); Hoang and Kauffman (2018).

  3. Assignment 1. Write the 1-pager to complete this pre-course assignment.
  4. Other Readings. The Instructor will talk about all of the readings in the Readings section during the 3 days of this Workshop. This way you’ll know about more content that is referred to or used for illustration, no matter how much time you have to read.

 

Assignments and Assessments

 

| # | Activity | Weight | Due |
|---|----------|--------|-----|
| 1 | Evaluate an Empirical Article | 15% | 2 weeks before class |
| 2 | Present your Data Analytics Methods Flowchart | 40% | Day 3, afternoon |
| 3 | Participate in In-Class Discussions | 20% | In the class |
| 4 | Essay on Data Analytics in IS research | 25% | 4 weeks after class |

 

Assignment 1. Evaluate an Empirical Article (1-page commentary)

  1. Select. Select an empirical research article from any IS journal or conference of your choice. It should not already use Computational Social Science or blended fusion analytics, in which Computer Science methods are combined with statistical or econometric explanations to achieve causal explanations of the findings.
  2. Write. Then, write a 1-page commentary that includes: (1) a brief summary of the empirical work; (2) the strengths of the article’s findings; (3) the weaknesses of the article’s findings; and (4) your comments on how the research can be improved by introducing the various ideas discussed in the Primary Readings you were asked to do as course preparation.
  3. Present. Be prepared to present your comments and critique, and to identify the various aspects of the guidance you offered as general themes and ideas that your classmates should take away.

 

Assignment 2. Present Your Data Analytics and Causal Inference Methods Flowchart

  1. Complete. In this Workshop, participants are asked to develop a Data Analytics and Causal Inference Methods Flowchart day by day. This should be completed over the 3 days of the course based on your own effort, and input that you receive from your classmates and the Instructor.
  2. Present. On Day 3, you will have ~10-15 minutes to present your final Data Analytics and Causal Inference Methods Flowchart on a single piece of A3 paper and discuss your proposed empirical research approach with the members of the class. The time you’ll have to present will be based on how many people attend this Workshop.


Assignment 3. Participate in In-Class Discussions (all 3 days of course)

 

  1. Contribute. The in-class discussions are intended to be open, supportive, interactive and a good base for trying out your ideas. All in-class contributions are appropriate. You are welcome to ask questions of your Instructor or Classmates, offer opinions, answer questions, help everyone to draw useful conclusions, and so on.
  2. Document. You’ll be asked to fill out a brief ‘Participation Slip’ on Days 1, 2 and 3. These slips will document your contributions and support the Instructor’s memory for this part of your evaluation, which will be based on the extent to which your contributions can be identified.

 

Assignment 4. Essay on Data Analytics in IS Research (after the course)

  1. Write. Develop an essay that discusses the current state of data analytics and empirical research in the IS discipline that touches on the following issues:
    1. How is contemporary research involving big data, natural experiments, machine-based methods and explanatory econometrics that aims to yield causal inferences different for the IS discipline from what has come before?
    2. What are some leading examples in the literature from this Workshop (or others you can find in ISR, MISQ or JMIS) that enable you to identify the key features of the new paradigm in recent research?
    3. What is your assessment of the current state of empirical research in IS in terms of the extent to which it is able to achieve causal explanations and inferences? What has been lacking? What developments involving the use of big data and current interdisciplinary methods have made the new paradigm possible?
  2. Submit. Submit a double-spaced essay in an 11- or 12-point font, with left justification only for readability.

 

Detailed Program

 

Day 1: A New Paradigm for IS Empirical Research

 

| Sessions | Activities | Primary Readings |
|----------|------------|------------------|
| 0900-1030 | Class Intro; Computational Social Science. Coverage of the main elements of the new paradigm; consideration of the changes in the philosophy for empirical research in the IS discipline; experimental and quasi-experimental approaches; and the range of types of big data for managerial insights. | Carley (2002); Kauffman and Wood (2009); Chang et al. (2014) |
| 1030-1100 | Coffee Break | |
| 1100-1230 | Combining Machine-Based Methods with Explanatory Statistics and Econometrics. High-level consideration of interdisciplinary Computer Science methods choices and blending with data analytics methods in Social Science and Management Science research. | Kauffman et al. (2017) |
| 1230-1330 | Lunch | |
| 1330-1500 | Presentation of the ‘Evaluation of Empirical Articles’ (Assignment 1). For group discussion and critique in class, to understand the content more fully. | See preparation instructions on assignment and readings |
| 1500-1530 | Coffee Break | |
| 1530-1700 | Introduction to the Data Analytics and Causal Inference Methods Flowchart. Make a high-level sketch of your flowchart with data collection methods and intended outcomes (Problem, Data Acquisition, Methods Proposed). | Examples presented to illustrate what to do |


Day 2: How to Implement Computational Social Science in Empirical Research

 

| Sessions | Activities | Primary Readings |
|----------|------------|------------------|
| 0900-1030 | Going Beyond Traditional Empirical Methods. Causal experiments in the field and with secondary data; massive data-based insight discovery for pricing; and randomization of research participants. | Aral and Walker (2011); Bhattacharjee et al. (2007) |
| 1030-1100 | Coffee Break | |
| 1100-1230 | Empirical Methods for Causal IS Research. Difference-in-differences, propensity score matching. | Card and Krueger (1994); Dehejia and Wahba (2002) |
| 1230-1330 | Lunch | |
| 1330-1500 | Retrospective vs. Predictive Analytics. CS researchers most often focus on prediction, while IS researchers focus on explanatory estimation of models; comparison of roles and prediction model building. | Shmueli and Koppius (2011) |
| 1500-1530 | Coffee Break | |
| 1530-1700 | Extending the Data Analytics and Causal Inference Methods Flowchart. Further develop your Flowchart so that it includes content about the research design for causality that you can apply (Causal Research Design, Statistical Testing, Planned Causal Inferences). | Examples presented to illustrate what to do |

 

Day 3: Data Analytics for Various Research Settings

 

| Sessions | Activities | Primary Readings |
|----------|------------|------------------|
| 0900-1030 | Learning from Causal Empirical Research Examples. Application of causal research designs; issues of identification, natural experiments, data censoring; research contexts include Wikipedia participation and digital entertainment. | Zhang and Zhu (2011); Hoang and Kauffman (2018) |
| 1030-1100 | Coffee Break | |
| 1100-1230 | Reflection: CS Methods to Blend with IS Research Methods. Innovations for music ranking prediction and impact of promotion releases for artists with fusion analytics; data mining and classification with support vector machine, bagging, random forest methods; survival, DiD, ordinal and polytomous ordinal regression models. How can the methods discussed improve IS empiricism for causal inferences? Is the ‘Age of the Internet’ becoming the ‘Era of Digital and Social Sensing’? | Ren and Kauffman (2017, 2018) |
| 1230-1330 | Lunch | |
| 1330-1500 | Presentation I: Analytics and Inference Methods | Student short talks |
| 1500-1530 | Coffee Break | |
| 1530-1700 | Presentation II: Analytics and Inference Methods | Student short talks |

 

Readings

  1. Aral, S., Walker, D. 2011. Identifying Social Influence in Networks Using Randomized Experiments. IEEE Intelligent Systems, 26(5), 91–96.
  2. Banker, R.D., Kauffman, R.J., Morey, R. 1990. Measuring Gains in Operational Productivity from Information Technology: A Study of the Positran Deployment at Hardee's Inc. Journal of Management Information Systems, 7(2), 29-54. (Day 2, optional)
  3. Bhattacharjee, S., Gopal, R.D., Lertwachara, K., Marsden, J.R., Telang, R. 2007. The Effect of Digital Sharing Technologies on Music Markets: A Survival Analysis of Albums on Ranking Charts. Management Science, 53(9), 1359–1374. (Day 2, optional)
  4. Card, D., Krueger, A.B. 1994. Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. American Economic Review, 84(4), 772–793.
  5. Carley, K.M. 2002. Computational Organization Science: A New Frontier. Proceedings of the National Academy of Sciences, 99 (Supplement 3), 7257–7262.
  6. Chang, M., Kauffman, R.J., Kwon, Y.O. 2014. Understanding the Paradigm Shift in Computational Social Science in the Presence of Big Data. Decision Support Systems, 63, 67-80.
  7. Dehejia, R.H., Wahba, S. 2002. Propensity Score-Matching Methods for Non-Experimental Causal Studies. Review of Economics and Statistics, 84(1), 151–161.
  8. Granados, N., Gupta, A., Kauffman, R.J. 2012. Online and Offline Demand and Price Elasticities: Evidence from the Air Travel Industry. Information Systems Research, 23(1), 164-181. (Day 3, optional)
  9. Hoang, A.P., Kauffman, R.J. 2018. Content Sampling, Household Informedness, and the Consumption of Digital Information Goods. Journal of Management Information Systems, 35(2), 575-609.
  10. Kauffman, R.J., Kim, K., Lee, S.Y.T., Hoang, A.P., Ren, J. 2017. Combining Machine-Based and Econometrics Methods for Policy Analytics Insights. Electronic Commerce Research and Applications, 25, 115-140.
  11. Kauffman, R.J., Techatassanasoontorn, A.A., Wang, B. 2012. Event History, Spatial Analysis and Count Data Methods for Empirical Research in IS. Information Technology and Management, 13(3), 115-147. (Day 2, optional)
  12. Kauffman, R.J., Wood, C.A. 2009. Revolutionary Research Strategies for E-Business: A Philosophy of Science View in the Age of the Internet. Chapter 2, in R.J. Kauffman and P.P. Tallon (eds.), Economics, Information Systems, and Electronic Commerce: Empirical Research. In the Advances in Management Information Systems Series, V. Zwass (ed.), M. E. Sharpe, Armonk, NY, 31-62.
  13. Kim, K., Lee, T.S.Y., Kauffman, R.J. 2016. Social Sentiment and Stock Trading via Mobile Phones. In Proceedings of the Americas Conference on Information Systems, Association for Information Systems, Atlanta, GA.
  14. Lavelle, S., Lesser, E., Shockley, R., Hopkins, M.S., Kruschwitz, N. 2011. Big Data, Analytics and the Path from Insights to Value. Sloan Management Review, 52(2), 21–32.
  15. Lazer, D., Pentland, A.S., Adamic, L., Aral, S., Barabasi, A.L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., Alstyne, M.V. 2009. Life in the Network: The Coming Age of Computational Social Science. Science, 323(5915), 721–723.
  16. Li, T., Kauffman, R.J., van Heck, E., Vervest, P., Dellaert, B. 2014. Consumer Informedness and Firm Strategy. Information Systems Research, 25(2), 345-363.
  17. Lim-Wavde, K., Kauffman, R.J., Dawson, G.S. 2017. Household Informedness and Policy Analytics for the Collection and Recycling of Household Hazardous Waste in California. Resources, Conservation and Recycling, 120, 88-107.
  18. Ren, J., Kauffman, R.J. 2017. Understanding Music Track Popularity in a Social Network. In Proceedings of the European Conference on Information Systems. Association for Information Systems, Atlanta, GA.
  19. Runkel, P.J., McGrath, J.E. 1972. Research on Human Behavior: A Systematic Guide to Methods. Holt, Rinehart and Winston, New York.
  20. Shmueli, G., Koppius, O. 2011. Predictive Analytics in IS Research. MIS Quarterly, 35(3), 553–572.
  21. Zhang, X., Zhu, F. 2011. Group Size and Incentives to Contribute: A Natural Experiment at Chinese Wikipedia. American Economic Review, 101(4), 1601–1615.

Credit points

Doctoral students participating in the seminar can obtain 3 credit points. This requires attending all three days and completing the assignments.

Registration fee

This seminar is free of charge for the staff of Inforte.fi member organizations and their PhD students. For others, the participation fee is 400 €. The participation fee includes access to the event and the event materials. Lunch and dinner are not included.