A Survey of Software Metric Use in Research Software Development

Abstract

Background
  • Complex software libraries and tools are essential for conducting research across various disciplines (science, engineering, business, humanities).

  • Ensuring software quality and reliability is crucial; untrustworthy results can mislead research conclusions.

Aims
  • This work aims to understand research software developers' use of traditional software engineering concepts, like metrics, to evaluate software quality and the software development process.

  • The study aims to identify relevant metrics for research software compared to traditional software engineering metrics.

Method
  • A survey of research software developers was conducted to assess their knowledge and usage of code and process metrics, along with the influence of demographics on these metrics.

Results
  • Participants: 129 respondents

  • Most respondents were aware of software metrics in general, but knowledge of specific metrics was limited.

  • The metrics most used concerned performance and testing; code complexity metrics received less attention, even though respondents reported complexity as a challenge.

Conclusions
  • Research software developers value metrics but face obstacles in their implementation. Further research is needed to evaluate metrics for continuous process improvement.

Index Terms

  • Survey, Software Metrics, Software Engineering, Research Software

I. INTRODUCTION

  • Researchers in diverse fields increasingly use software for research (termed research software).

  • Research software engineers (RSEs) design and implement software for research and seek recognition for their contributions.

  • Quality of research software impacts the reliability of research results.

  • Previous work demonstrates the need for software engineering (SE) practices to improve research software quality, including requirements, design, testing, and code complexity.

  • Metrics are essential to evaluate software reliability and quality over time.

A. Software Metrics Overview

  • Definition: A metric is a function that maps attributes of software or its development process to a value; a measurement is the act of applying a metric to obtain that value.

  • Important metric categories:

    • In-process (development process metrics)

    • Code-oriented (complexity metrics)

  • Research software projects, especially open-source ones, often already include practices such as version control and testing.
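As one illustration of a code-oriented metric, the sketch below approximates McCabe-style cyclomatic complexity for Python source using the standard `ast` module. The counting rule (1 plus one per branch point) is a common simplification; the survey itself does not prescribe any particular implementation.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity of Python source:
    1 (for the straight-line path) plus one per branch point."""
    branch_nodes = (ast.If, ast.For, ast.While, ast.IfExp,
                    ast.ExceptHandler, ast.And, ast.Or)
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, branch_nodes) for node in ast.walk(tree))

src = (
    "def f(x):\n"
    "    if x > 0:\n"
    "        for i in range(x):\n"
    "            x -= 1\n"
    "    return x\n"
)
print(cyclomatic_complexity(src))  # one `if` + one `for` -> 3
```

Tools such as linters compute this metric the same way in principle: by walking the syntax tree and counting decision points, rather than by parsing text line by line.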

II. RESEARCH QUESTIONS

  • The research seeks to answer:

    • RQ1: What is the level of metrics knowledge and use by research software developers?

    • RQ2: Which metrics are most commonly used?

    • RQ3: What is the relationship between knowledge of metrics and their perceived usefulness?

    • RQ4: Do developers perceive code complexity as a problem?

    • RQ5: Is there a relationship between complexity problems and the use of associated metrics?

Non-Traditional Metrics

Several non-traditional metrics relevant to research software were identified:

  1. Performance Metrics:

    • Examples: FLOPS (floating point operations/second), I/O operations, network throughput (MB/sec).

  2. Green Computing Metrics:

    • Focus on energy efficiency and carbon emissions.

  3. Correctness and Reproducibility:

    • Metrics for acceptance of results and error tolerance in modeling and simulation.

  4. Failure Rate Metrics: Frequency of software failures is critical (especially in large systems).

  5. Recognition Metrics:

    • Recognition through citations or project downloads is important to RSEs.
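Several of these metrics are straightforward to collect directly. The sketch below measures sequential write throughput in MB/sec, one of the performance metrics named above; the file path, transfer size, and chunk size are arbitrary choices for illustration, not values from the survey.

```python
import os
import tempfile
import time

def write_throughput_mb_per_sec(path: str, size_mb: int = 16, chunk_mb: int = 1) -> float:
    """Write size_mb of zeros in chunk_mb chunks, fsync, and return MB/sec."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # include the cost of reaching stable storage
    return size_mb / (time.perf_counter() - start)

with tempfile.TemporaryDirectory() as d:
    rate = write_throughput_mb_per_sec(os.path.join(d, "probe.bin"))
    print(f"sequential write: {rate:.1f} MB/sec")
```

The `fsync` call matters for the design: without it, the measurement captures only the speed of the operating system's page cache rather than the storage device.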

III. SURVEY DESIGN

  • A survey was developed to gather insights into how metrics affect research software projects:

    • Solicitation sent to high-performance computing and research software mailing lists.

    • Target Audience: Various domains of research software development.

Survey Questions
  • General Questions (GQ):

    • Project description, team size, project role, development stage, etc.

  • Metrics Questions (MQ): Knowledge, usefulness, specific metrics used.

  • Code Complexity Questions (CQ): Problems arising from complexity, frequency of use of complexity metrics.

IV. RESULTS

A. Demographics Analysis
  1. Project Description:

    • 79.8% of respondents focused on Scientific Computing Software.

  2. Project Size:

    • Most respondents were part of small teams; this could affect their metrics usage and knowledge.

  3. Project Role:

    • Predominantly technical roles (developers/architects), impacting perceived metrics importance.

  4. Project Development Stage:

    • The majority of projects were in the released phase, a stage at which established metrics programs would be expected.

B. Overall Analysis of Knowledge and Use
  • The majority reported low metrics knowledge (GQ1) and low perceived usefulness (GQ3).

  • Metrics Knowledge vs Usefulness Table: Evidence of a correlation between knowledge level and perceived usefulness (p < .01).

C. Knowledge of Specific Metrics
  • 89 unique metrics were identified and categorized into:

    • Code Metrics

    • Process Metrics

    • Testing Metrics

    • General Quality Metrics

    • Performance Metrics

    • Recognition Metrics

D. Productivity Evaluation Using Metrics
  • Metrics rarely used for individual/team evaluation.

E. Influence of Demographics on Metrics Knowledge and Use
  1. Project Size: Smaller teams reported less knowledge and perceived usefulness than larger teams (χ2, p-value < .001).

  2. Project Role: No significant difference in metrics knowledge or usefulness perception across roles.

  3. Project Stage: Researchers with unreleased software reported a more varied perception of metrics knowledge and usefulness (p-value < .01).
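The χ² tests reported above rest on a standard contingency-table computation, sketched below in plain Python. The 2×2 counts are invented for illustration (e.g., team size vs. metrics knowledge) and are not the survey's actual data; the statistic would normally be compared against a χ² distribution to obtain the p-values cited.

```python
def chi_square_statistic(table):
    """Pearson's chi-square statistic for an r x c contingency table
    of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = small/large team, cols = low/high metrics knowledge
observed = [[30, 10],
            [10, 30]]
print(chi_square_statistic(observed))  # 20.0
```

In practice a library routine such as `scipy.stats.chi2_contingency` would also return the p-value and degrees of freedom directly.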

F. Code Complexity Findings
  • Most respondents acknowledged code complexity as a problem, yet reported infrequent use and low perceived usefulness of the associated complexity metrics.

V. DISCUSSION

Insights by Research Questions
  • RQ1: The majority reported low to very low metrics knowledge overall, yet collectively named a large number of specific metrics.

  • RQ2: Performance and testing metrics most frequently recognized and utilized.

  • RQ3: A strong relationship exists between perceived usefulness and the likelihood of using metrics.

  • RQ4: Most agree code complexity is an issue needing attention.

  • RQ5: Low usage and perceived utility of complexity metrics despite reported complexity problems.

VI. THREATS TO VALIDITY

A. Internal Threats
  1. Survey design may introduce bias; questions were worded neutrally to mitigate this.

  2. There may be selection bias in who participated.

B. External Threats
  • Survey sample may not represent all research software developers due to targeted mailing lists.

C. Construct Threats
  • Possible misunderstanding of survey questions by respondents.

VII. CONCLUSIONS

  • The survey showed that research software developers have general knowledge of metrics but apply SE metrics only to a limited extent; code complexity remains poorly managed despite acknowledgment of its problems.

  • Further exploration of process metrics is needed to increase adoption of useful metrics.

ACKNOWLEDGMENTS

  • Recognition of survey respondents and support from NSF grants.

APPENDIX: SPECIFIC METRICS IDENTIFIED

High-Level Categories:
  1. Code Metrics

  2. General Quality Metrics

  3. Performance Metrics

  4. Process Metrics

  5. Recognition Metrics

  6. Testing Metrics