Software repositories, such as code repositories, bug repositories, software Q&A sites, and mobile app stores, contain valuable data about a software system and its history. Analysing this data allows software engineering researchers to understand empirically which methods software engineers actually use in practice. It also helps developers maintain complex software systems. As data science and artificial intelligence techniques such as machine learning and deep learning have become more widespread, they have found many applications in the analysis of software repositories and in software engineering more broadly. Examples include automated bug fixing, code summarization, and the analysis of user comments in app stores. The purpose of this course is to introduce students to the practical use of these techniques for solving software engineering problems.

Topics and Schedule

  • Introduction to data science and big data (1 session)
  • An introduction to the applications of data science in software engineering (1 session)
  • An introduction to related libraries in Python for data analysis (2 sessions)
  • An overview of machine learning techniques (3 sessions)
  • An introduction to neural networks and deep learning (3 sessions)
  • Natural Language Processing (2 sessions)
  • Applications of text analysis in software engineering (2 sessions)
  • Code review (1 session)
  • Fault localization (1 session)
  • Bug prediction (1 session)
  • IDE data analysis (2 sessions)
  • Code summarization (1 session)
  • Mining mobile app stores (2 sessions)
  • Energy analysis in mobile applications (2 sessions)

Grading

  • Two exams (Midterm and Final) – Comprising 55% of the total grade.
  • Paper presentation: Each student should present at least one paper from the most recent top-tier software engineering journals and conferences – Comprising 15% of the total grade.
  • One comprehensive course project: Project activities include defining and conceptualizing a research topic, surveying the literature, proposing a solution, evaluating the proposed solution, and presenting the results as a research paper. The project is carried out throughout the semester – Comprising 30% of the total grade.

Main References

  • Some survey papers from top-tier software engineering journals and conferences
  • Joel Grus, Data Science from Scratch, O’Reilly, 2019.
  • Christian Bird, Tim Menzies, and Thomas Zimmermann, The Art and Science of Analyzing Software Data, Morgan Kaufmann, 2015.
  • Tim Menzies, Laurie Williams, and Thomas Zimmermann, Perspectives on Data Science for Software Engineering, Morgan Kaufmann, First edition, 2016.