ISERN Annual Meeting

September 16 to 17

ISERN is a community that believes software engineering research needs to be performed in an experimental context.  An increasing number of software engineering research groups have made the paradigm shift to an experimental/empirical software engineering view.

The purpose of this network is to encourage and support the collaboration and exchange of results and personnel among these groups. Specific emphasis is placed on experimentation and empirical studies with development technologies in different environments; the repetition of experiments across settings; and the development and exchange of methods and tools for model building, experimentation, and assessment.

ISERN is open to other academic and industrial groups worldwide that are active in experimental software engineering research and are willing to adopt the experimental framework (see ISERN Manifesto).

ISERN annual meetings are open to ISERN members, candidates, and invited observers only.

ISERN meetings are not conference-style events with refereed papers and presentations. Instead, each meeting builds on previous ones, and each session is intended to foster collaboration, encourage discussion, and contribute to developing knowledge in experimental software engineering. This continuous knowledge building will be formalized in the ISERN Experience Factory.



Sunday, September 15
18.00 Registration opening
19:30 Welcome Reception


Monday, September 16
09:00-10:00 ISERN Welcome – News, Introductions, Observers and Candidates
10:00-10:30 Coffee-break
10:30-12:00 S1: How could ESE help to ensure trust (or other qualities) of AI and Data Science?

Chairs: A. Jedlitschka, M. Felderer, T. Baldassarre, B. Turhan

12:00-13:30 Lunch
13:30-15:00 S2: Diversity and ESE Research: Challenges and Opportunities

Chairs: C. Seaman, J. Carver, R. Prikladnicki, L. Williams

15:00-15:30 Coffee-break
15:30-17:00 S3: Research Quality in Empirical Software Engineering

Chair: J. Molléri

17:00-17:30 Summary & Wrapup
17:30-18:30 ISERN Steering Committee Meeting (by invitation)
19:30 ISERN Social Event


Tuesday, September 17
9:00-10:00 Open Space
10:00-10:30 Coffee-break (in the room)
10:30-12:00 S4: Investigating the Validity of Ground Truth in Empirical Software Engineering

Chairs: E. Tüzün, H. Erdoğmuş

12:00-13:30 Lunch
13:30-15:00 S5: Data Analysis in Software Engineering. How well are we performing?

Chairs: V. Lenarduzzi, D. Taibi, S. Vegas

15:00-15:30 Coffee-break
15:30-17:00 S6: Empirical Methods in Software Security

Chairs: L. Williams, J. Carver, M. Felderer

17:00-17:30 Summaries and ISERN Business – Closing


Detailed Information on the Thematic Sessions

S1: How could ESE help to ensure trust (or other qualities) in the era of AI and Data Science?

Organizers: Andreas Jedlitschka, Michael Felderer, Teresa Baldassarre, Burak Turhan

Artificial Intelligence (AI) and Data Science (DS) are currently among the hottest topics in the field. Even though neither is new, many organizations claim to provide AI/DS solutions. The question we aim to answer is whether, and if so how, ESE could help to provide evidence for and integrate with these emerging trends.


  • Identify and discuss the possibilities of ESE to support AI development and DS. Questions we want to answer:
  • Impact on SE practice
    • Is ESE too heavy weight to have an immediate and relevant impact on practice?
  • Impact on SE teaching
    • How should we update SE curriculum with the best of both worlds?
  • Impact on SE research
    • How should we position SE research in research projects, given the DS/AI hype?
    • Rigor/Philosophy of research: Start from question vs. start from data
    • Contribution to knowledge: What are some novel contributions of AI/DS studies?

Session structure:

  • 3-5 (very short) position statements of targeted speakers working in both areas to discuss pros and cons of both approaches
  • The main part of the session is dedicated to discussing the options ESE offers and their potential consequences, through a fishbowl panel guided by the discussion questions listed above

Expected session outcome and follow-up activity:

  • Capture the opinions of ISERN members and cross-reference them with other key views (e.g., the IASESE 2018 school, the CESSERIP 2019 keynote (Microsoft))
  • ISERN position statement: publish an opinion piece in IEEE SW, IST, JSS, ACM SIGSOFT Notes, or as an ESEM’20 vision paper?

S2: Diversity and ESE Research: Challenges and Opportunities

Organizers: Carolyn Seaman, Jeffrey Carver, Rafael Prikladnicki, Laurie Williams


  • Discuss empirical software engineering research and concerns related to diversity when conducting such research
  • Discuss challenges and opportunities for empirical research on diversity in Software Engineering

Session structure:

  • Preparation: everyone interested in this topic brings statistics on diversity in their countries (workforce, undergraduate and graduate programs, faculty in computing or SE, specific activities or projects under development), as well as examples of ESE studies in which diversity was or was not considered
  • Discussion in small groups
  • Each group presents its perspective
  • We vote for the three most important challenges and the three most important opportunities
  • We identify specific actions related to the challenges and opportunities.

Expected session outcome and follow-up activity:

  • Write a report/paper from the session
  • Plan for joint studies and stimulate collaboration on this topic

S3: Research Quality in Empirical Software Engineering

Organizer: Jefferson Seide Molléri


  • During the ISERN meeting 2018, we conducted a focus-group workshop on quality in empirical software engineering research. Some of the ISERN members helped us better understand what constitutes “research quality” based on a set of quality dimensions proposed in the conceptual model of Mårtensson et al.
  • Based on the results, we have revised a conceptual model of research quality to represent the needs of the SE domain better. Now, we aim to disseminate the results from our previous workshop to the ISERN community and, using the revised model resulting from our previous work, we aim to characterize some of the most influential instruments for quality assessment of different research methods (e.g., case study, experiment, and literature review).
  • By doing so, we expect to identify aspects that are common regardless of the methodology, as well as aspects that are particular to a specific research method. We also intend to verify whether all the quality dimensions are represented. If this is not the case, we would like to discuss how to address the gaps in order to operationalize more complete solutions for the quality appraisal of SE research.

Session structure:

  • Introduction: Present the revised model resulting from our previous work
  • Exercise: Characterize a set of instruments for assessing the quality of ESE research according to the revised model
  • Feedback: Report back the outcomes of the exercise and aggregate the data
  • Preliminary Results: Communicate preliminary findings regarding the different instruments and research methods
  • Joint discussion: Invite the participants to reason about the common and dissimilar aspects
  • Gap analysis: Identify and discuss the dimensions of quality that are not well-represented, and how to create instruments to assess them

Expected session outcome and follow-up activity

  • Our long-term goal is to aggregate the perceptions of the community about quality appraisal in research practice. This information is vital to design a shared instrument aligned to perceptions of the ESE community.
  • As a follow-up, we will aggregate the findings and report them back to the ISERN community for validation. We will also extend the characterization process to other quality-assessment instruments that we could not address in this workshop.

S4: Investigating the Validity of Ground Truth in Empirical Software Engineering

Organizers: Eray Tüzün and Hakan Erdoğmuş

Ground truth refers to the reference data with which machine learning applications are trained and evaluated. Ground truth is supposed to capture the ideal behavior of the system being modeled. Many empirical software engineering studies in the literature train and evaluate their models on datasets collected from real-life (e.g., open-source, industrial) projects. In a real-life setting, the objective function of a human decision maker for a given task (e.g., recommending a code reviewer) might be implicit (e.g., based on convenience) and differ from the ideal objective function required for an optimally functioning system (e.g., one based on technical competence).

We may characterize such instances (artificial decision-making systems that mimic the sub-optimal behaviour of imperfect real-life human systems) as suffering from a specific type of cognitive bias called attribute substitution. Attribute substitution can happen when a decision maker uses an easier-to-access attribute when faced with a complex decision whose best answer may not be readily accessible or may be somewhat inconvenient. The presence of attribute substitution in the data invariably leads to sub-optimal implementations. If the goal is to improve upon human-based decision systems, rather than simply mimicking them with their inherent imperfections, cognitive biases in the data need to be carefully evaluated and avoided.

The goal of the session is first to understand the typical cases in empirical software engineering in which the provided ground truth is non-problematic and those in which it is potentially problematic (subject to unidentified cognitive biases). Based on this understanding, our second goal is to offer alternative solutions to tackle the cases in which the provided ground truth is potentially problematic.


  • Create awareness of ground-truth validity in empirical software engineering studies.
  • Come up with an agreed-upon and generalized problem statement that captures the core issues.
  • Illustrate how the lack of awareness may negatively affect research outcomes through multiple examples.
  • Work with the participants to offer possible strategies that can alleviate the problem and improve research outcomes.
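
As a toy illustration of the attribute-substitution problem described above (hypothetical names and data, not drawn from any real study), the following sketch simulates code-review assignments that were historically made by convenience rather than competence. A model that simply mimics the recorded assignments looks perfect against the biased ground truth while rarely matching the competence-based ideal:

```python
import random

random.seed(0)

# Hypothetical setting: review assignments were historically made by
# convenience (whoever was available), while the ideal assignment would
# go to the most technically competent reviewer.
reviewers = ["ana", "bo", "chen"]
competence = {"ana": 0.9, "bo": 0.6, "chen": 0.3}

history = []  # pairs of (ideal reviewer, recorded reviewer)
for _ in range(1000):
    ideal = max(reviewers, key=lambda r: competence[r] + random.gauss(0, 0.1))
    recorded = random.choice(reviewers)  # convenience-driven assignment
    history.append((ideal, recorded))

# A "model" that perfectly mimics the recorded (biased) assignments.
predictions = [recorded for _ideal, recorded in history]

acc_vs_recorded = sum(p == r for p, (_i, r) in zip(predictions, history)) / len(history)
acc_vs_ideal = sum(p == i for p, (i, _r) in zip(predictions, history)) / len(history)

print(acc_vs_recorded)  # 1.0: looks perfect against the recorded ground truth
print(acc_vs_ideal)     # roughly one third: far from the ideal objective
```

The gap between the two accuracy numbers is exactly what a study evaluating only against the recorded labels would never see.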

Session Structure

The session will last 90 minutes, divided into the following parts:

  • Plenary Session: 15-minute introductory lecture introducing the problem with illustrative examples.
  • Breakout Sessions: 35 minutes of group (3-5 participants) activity on assessing the ground truth in various empirical software engineering settings and identifying strategies that alleviate cognitive biases.
  • Plenary Session: 25 minutes of reporting and discussion.
  • Plenary Session: 15 minutes of synthesis, action items, and wrap-up.

Expected session outcome and follow-up activity:

  • At ISERN:
    • An agreement on the nature of the problem, together with a better problem statement.
    • A summary of the breakout session results, listing representative problem instances and potential solutions
    • A list of action items.
  • Beyond ISERN:
    • Collaboration investigating the prevalence and impact of the problem in the empirical software engineering literature. 

S5: Data Analysis in Software Engineering. How well are we performing?

Organizers: Valentina Lenarduzzi, Davide Taibi, Sira Vegas


  • As highlighted by Vegas et al. and Reyes et al., many Software Engineering (SE) papers contain errors in the statistical techniques applied. This exposes the results of the experiments to the threat to statistical conclusion validity known as violated assumptions of statistical tests. The introduction of AI has exacerbated this issue. As Artificial Intelligence (AI) grows in popularity, also in the SE domain, several papers in the most important SE venues, and especially in empirical SE, adopt it heavily for different purposes. Researchers commonly apply AI techniques such as decision trees, ensemble techniques, and deep learning, often without considering basic statistical techniques such as linear or logistic regression. Additionally, the vast majority of AI techniques assume that data are independent, while most SE data are highly dependent. For example, the code committed in a specific commit depends on the code in the previous commits; the same holds for bugs, developers, and many other SE-related aspects. Only a few papers mention this issue in their threats to validity, and only a few adopt techniques suitable for dependent data (e.g., Markov chains).
  • In this session we aim to discuss the different statistical and machine learning techniques applied in empirical software engineering studies, so as to highlight their benefits and issues.
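
The independence problem above can be illustrated with a small simulation (a hypothetical sketch, not taken from the cited studies): when two samples are drawn from the same autocorrelated AR(1) process, a standard t-test, which assumes independent observations, rejects the true null hypothesis far more often than its nominal 5% level.

```python
import math
import random

random.seed(1)

def ar1(phi, size):
    """Generate an AR(1) series: x[t] = phi * x[t-1] + noise."""
    x = [random.gauss(0, 1)]
    for _ in range(size - 1):
        x.append(phi * x[-1] + random.gauss(0, 1))
    return x

def welch_t(a, b):
    """Welch's t statistic for two samples."""
    n, m = len(a), len(b)
    ma, mb = sum(a) / n, sum(b) / m
    va = sum((v - ma) ** 2 for v in a) / (n - 1)
    vb = sum((v - mb) ** 2 for v in b) / (m - 1)
    return (ma - mb) / math.sqrt(va / n + vb / m)

def false_positive_rate(phi, n=30, trials=2000, crit=2.0):
    """Fraction of t-tests with |t| > crit (~5% level for n=30) when both
    samples come from the same process, i.e. the true difference is zero."""
    rejections = sum(abs(welch_t(ar1(phi, n), ar1(phi, n))) > crit
                     for _ in range(trials))
    return rejections / trials

print(false_positive_rate(0.0))  # independent data: close to the nominal 0.05
print(false_positive_rate(0.8))  # autocorrelated data: far above 0.05
```

With strong positive autocorrelation, the sample variance badly underestimates the variance of the sample mean, so the test rejects a true null far too often, which is precisely the "violated assumptions" threat mentioned above.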

Session structure:

  • Workshop
  • Presentation by the organizers and open discussion between the participants.

Expected session outcome and follow-up activity:

  • Set of possible techniques applicable to different contexts
  • List of issues and benefits of the different techniques

S6: Empirical Methods in Software Security

Organizers: Laurie Williams, Jeffrey Carver, Michael Felderer


  • Evaluate the empirical rigor in top software security papers, such as the winning papers of the NSA Best Scientific Cybersecurity Paper Competition Award or those from top cybersecurity conferences.
  • Update a draft rubric for evaluating empirical software security papers.

Session structure:

  • Short presentation of baseline work led by Jeff Carver on evaluating papers at the ACM Computer and Communications Security (CCS) conference and the IEEE Symposium on Security and Privacy.
  • Introduction to a draft rubric for evaluating the empirical/scientific content of software security papers.
  • Break into groups of 2-3 people. Each group is provided a paper to evaluate using the rubric.
  • Group discussion on the experiences of using the rubric and how the paper fared against the rubric.
  • The group continues by marking suggestions on how the rubric can be enhanced.
  • Group discussion about the rubric.

Expected session outcome and follow-up activity:

  • Publication and dissemination of a rubric for empirical software security research.

Sponsored by