A Survey of Software Metric Use in Research Software Development

Abstract

Background
  • Complex software libraries and tools are essential for conducting research across various disciplines (science, engineering, business, humanities).

  • Ensuring software quality and reliability is crucial; untrustworthy results can mislead research conclusions.

Aims
  • This work aims to understand research software developers' use of traditional software engineering concepts, like metrics, to evaluate software quality and the software development process.

  • The study aims to identify relevant metrics for research software compared to traditional software engineering metrics.

Method
  • A survey of research software developers was conducted to assess their knowledge and usage of code and process metrics, along with the influence of demographics on these metrics.

Results
  • Participants: 129 respondents

  • Most respondents were aware of software metrics in general, but knowledge of specific metrics was limited.

  • The metrics most used concerned performance and testing; code complexity metrics received less attention, even though respondents reported complexity as a challenge.

Conclusions
  • Research software developers value metrics but face obstacles in their implementation. Further research is needed to evaluate metrics for continuous process improvement.

Index Terms

  • Survey, Software Metrics, Software Engineering, Research Software

I. INTRODUCTION

  • Researchers in diverse fields increasingly use software for research (termed research software).

  • Research software engineers (RSEs) design and implement software for research and seek recognition for their contributions.

  • Quality of research software impacts the reliability of research results.

  • Previous work demonstrates the need for software engineering (SE) practices to improve research software quality, including requirements, design, testing, and code complexity.

  • Metrics are essential to evaluate software reliability and quality over time.

A. Software Metrics Overview

  • Definition: A metric is a function that maps attributes of software or its development process to a value; a measurement is the act of applying a metric to obtain that value.

  • Important metric categories:

    • In-process (development process metrics)

    • Code-oriented (complexity metrics)

  • Research software projects, especially open-source ones, often already include practices such as version control and testing.
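As one illustration of a code-oriented metric, the sketch below approximates McCabe-style cyclomatic complexity for Python source using the standard `ast` module. The counting rule (1 plus one per branch point) is a common simplification; the survey itself does not prescribe any particular implementation.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity of Python source:
    1 (for the straight-line path) plus one per branch point."""
    branch_nodes = (ast.If, ast.For, ast.While, ast.IfExp,
                    ast.ExceptHandler, ast.And, ast.Or)
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, branch_nodes) for node in ast.walk(tree))

src = (
    "def f(x):\n"
    "    if x > 0:\n"
    "        for i in range(x):\n"
    "            x -= 1\n"
    "    return x\n"
)
print(cyclomatic_complexity(src))  # one `if` + one `for` -> 3
```

Tools such as linters compute this metric the same way in principle: by walking the syntax tree and counting decision points, rather than by parsing text line by line.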

II. RESEARCH QUESTIONS

  • The research seeks to answer:

    • RQ1: What is the level of metrics knowledge and use by research software developers?

    • RQ2: Which metrics are most commonly used?

    • RQ3: What is the relationship between knowledge of metrics and their perceived usefulness?

    • RQ4: Do developers perceive code complexity as a problem?

    • RQ5: Is there a relationship between complexity problems and the use of associated metrics?

Non-Traditional Metrics

Several non-traditional metrics relevant to research software were identified:

  1. Performance Metrics:

    • Examples: FLOPS (floating point operations/second), I/O operations, network throughput (MB/sec).

  2. Green Computing Metrics:

    • Focus on energy efficiency and carbon emissions.

  3. Correctness and Reproducibility:

    • Metrics for acceptance of results and error tolerance in modeling and simulation.

  4. Failure Rate Metrics: Frequency of software failures is critical (especially in large systems).

  5. Recognition Metrics:

    • Recognition through citations or project downloads is important to RSEs.
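Several of these metrics are straightforward to collect directly. The sketch below measures sequential write throughput in MB/sec, one of the performance metrics named above; the file path, transfer size, and chunk size are arbitrary choices for illustration, not values from the survey.

```python
import os
import tempfile
import time

def write_throughput_mb_per_sec(path: str, size_mb: int = 16, chunk_mb: int = 1) -> float:
    """Write size_mb of zeros in chunk_mb chunks, fsync, and return MB/sec."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # include the cost of reaching stable storage
    return size_mb / (time.perf_counter() - start)

with tempfile.TemporaryDirectory() as d:
    rate = write_throughput_mb_per_sec(os.path.join(d, "probe.bin"))
    print(f"sequential write: {rate:.1f} MB/sec")
```

The `fsync` call matters for the design: without it, the measurement captures only the speed of the operating system's page cache rather than the storage device.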

III. SURVEY DESIGN

  • A survey was developed to gather insights into how metrics affect research software projects:

    • Solicitation sent to high-performance computing and research software mailing lists.

    • Target Audience: Various domains of research software development.

Survey Questions
  • General Questions (GQ):

    • Project description, team size, project role, development stage, etc.

  • Metrics Questions (MQ): Knowledge, usefulness, specific metrics used.

  • Code Complexity Questions (CQ): Problems arising from complexity, frequency of use of complexity metrics.

IV. RESULTS

A. Demographics Analysis
  1. Project Description:

    • 79.8% of respondents focused on Scientific Computing Software.

  2. Project Size:

    • Most respondents were part of small teams; this could affect their metrics usage and knowledge.

  3. Project Role:

    • Predominantly technical roles (developers/architects), impacting perceived metrics importance.

  4. Project Development Stage:

    • The majority of projects were in the released phase, a stage at which established metrics programs would be expected.

B. Overall Analysis of Knowledge and Use
  • The majority reported low metrics knowledge (GQ1) and low perceived usefulness (GQ3).

  • Metrics Knowledge vs Usefulness Table: Evidence of a correlation between knowledge level and perceived usefulness (p < .01).

C. Knowledge of Specific Metrics
  • 89 unique metrics were identified and categorized into:

    • Code Metrics

    • Process Metrics

    • Testing Metrics

    • General Quality Metrics

    • Performance Metrics

    • Recognition Metrics

D. Productivity Evaluation Using Metrics
  • Metrics rarely used for individual/team evaluation.

E. Influence of Demographics on Metrics Knowledge and Use
  1. Project Size: Smaller teams reported less knowledge and perceived usefulness than larger teams (χ2, p-value < .001).

  2. Project Role: No significant difference in metrics knowledge or usefulness perception across roles.

  3. Project Stage: Researchers with unreleased software reported a more varied perception of metrics knowledge and usefulness (p-value < .01).
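The χ² tests reported above rest on a standard contingency-table computation, sketched below in plain Python. The 2×2 counts are invented for illustration (e.g., team size vs. metrics knowledge) and are not the survey's actual data; the statistic would normally be compared against a χ² distribution to obtain the p-values cited.

```python
def chi_square_statistic(table):
    """Pearson's chi-square statistic for an r x c contingency table
    of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = small/large team, cols = low/high metrics knowledge
observed = [[30, 10],
            [10, 30]]
print(chi_square_statistic(observed))  # 20.0
```

In practice a library routine such as `scipy.stats.chi2_contingency` would also return the p-value and degrees of freedom directly.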

F. Code Complexity Findings
  • Most respondents acknowledged code complexity as a problem, yet reported infrequent use and low perceived usefulness of the associated complexity metrics.

V. DISCUSSION

Insights by Research Questions
  • RQ1: The majority reported low to very low metrics knowledge overall, yet collectively named a large number of specific metrics.

  • RQ2: Performance and testing metrics most frequently recognized and utilized.

  • RQ3: A strong relationship exists between perceived usefulness and the likelihood of using metrics.

  • RQ4: Most agree code complexity is an issue needing attention.

  • RQ5: Low usage and perceived utility of complexity metrics despite reported complexity problems.

VI. THREATS TO VALIDITY

A. Internal Threats
  1. Survey design may introduce bias; questions were worded neutrally to mitigate this.

  2. There may be selection bias in who participated.

B. External Threats
  • Survey sample may not represent all research software developers due to targeted mailing lists.

C. Construct Threats
  • Possible misunderstanding of survey questions by respondents.

VII. CONCLUSIONS

  • The survey showed that research software developers have general knowledge of metrics but apply SE metrics only to a limited extent; code complexity remains poorly managed despite acknowledgment of its problems.

  • Further exploration of process metrics is needed to increase adoption of useful metrics.

ACKNOWLEDGMENTS

  • Recognition of survey respondents and support from NSF grants.

APPENDIX: SPECIFIC METRICS IDENTIFIED

High-Level Categories:
  1. Code Metrics

  2. General Quality Metrics

  3. Performance Metrics

  4. Process Metrics

  5. Recognition Metrics

  6. Testing Metrics