Data Integrity and Documentation in Research: Key Concepts and Best Practices


184 Terms

1
New cards

Research Data

The recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications.

2
New cards

Research Integrity

An umbrella term that covers the use of honest and verifiable methods in proposing, performing, and evaluating research; reporting research results with particular attention to adherence to rules, regulations, guidelines; and following commonly accepted professional codes or norms.

3
New cards

Reproducibility

Efforts and strategies that are generally concerned with establishing the credibility, reliability, and validity of scientific research.

4
New cards

Methods Reproducibility

The provision of enough detail about study procedures and data that the same procedures could, in theory or in actuality, be exactly repeated.

5
New cards

Results Reproducibility

Refers to obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible. Also called replicability.

6
New cards

Robustness

The stability of experimental conclusions to variations in either baseline assumptions or experimental procedures.

7
New cards

Generalizability

The persistence of an effect in settings different from and outside of an experimental framework.

8
New cards

Data Provenance

The documented trail that describes the origin of a piece of data as well as how it has been processed and transformed over time.

9
New cards

Data Management

The process of validating, organizing, protecting, maintaining, and processing scientific data to ensure the accessibility, reliability, and quality of the scientific data for its users.

10
New cards

Data Sharing

The act of making scientific data available for use by others (e.g., the larger research community, institutions, the broader public), for example, via an established repository.

11
New cards

Data Quality

A broad concept that refers to the degree to which a set of data is fit for its intended purpose. Related to data management, but also includes topics involving methodological rigor.

12
New cards

Data Usability

The ability to open, understand, make use of, and build upon a set of data. 'Reuse' encompasses many potential activities, including using a dataset for education and training (of both human researchers and algorithms), testing new hypotheses (which can involve combining multiple extant datasets), and more.

13
New cards

Disorganization

A way to lose data that refers to the lack of structured organization of data.

14
New cards

Missing documentation/metadata

A way to lose data that refers to the absence of necessary documentation that describes the data.

15
New cards

Failure of storage media

A way to lose data that occurs when the physical devices used to store data fail.

16
New cards

Obsolescence

A way to lose data in which the technology or file formats needed to read it become outdated, so the data can no longer be accessed.

17
New cards

Improper archiving

A way to lose data that occurs when data is not archived correctly, leading to potential loss.

18
New cards

Foundational to ensuring data integrity

The role of data management in maintaining the accuracy and consistency of data over its lifecycle.
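
One common way to confirm that a file has stayed consistent over time is to record a checksum when the file is saved and compare it later. The following is a minimal sketch, assuming SHA-256 from Python's standard library; the function names and the file name are illustrative and do not come from the source cards.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo_checksum():
    """Record a checksum, alter the file, and report whether the change shows."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "measurements.csv")  # hypothetical file name
        with open(path, "w") as handle:
            handle.write("id,value\n1,3.2\n")
        recorded = file_checksum(path)
        unchanged = file_checksum(path) == recorded
        with open(path, "a") as handle:
            handle.write("2,9.9\n")  # simulate an unintended change
        changed = file_checksum(path) != recorded
    return unchanged, changed
```

Storing the recorded digest alongside the data (for example, in the project documentation) lets anyone re-run the comparison later.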

19
New cards

Collaborative efforts in data management

The shared responsibility among all individuals working with research data to ensure it is properly managed.

20
New cards

Principal investigator's responsibility

The ultimate accountability for data management and sharing lies with the principal investigator.

21
New cards

Transparency in research

The principle that proper data management is essential for ensuring openness and reproducibility in the research process.

22
New cards

Requirement of proper data management

The necessity of effective data management practices to facilitate efficient, collaborative, and rigorous research.

23
New cards

Preventing information loss

The role of proper data management in ensuring that materials are not lost and research can proceed efficiently.

24
New cards

Motivations and Expectations

The reasons behind the efforts in data management and sharing.

25
New cards

Discussion on losing data

A conversation about the various ways in which data can be lost.

26
New cards

Practices and strategies in data management

The methods employed to support the quality and usability of research data over time.

27
New cards

Documentation in data management

The importance of maintaining records that describe the data for future reference and usability.

28
New cards

Storage in data management

The methods and practices for securely holding data to prevent loss.

29
New cards

Project Leadership

Empower others to manage research data well.

30
New cards

Setting policy

Setting data management policy for the project and responding to feedback on it.

31
New cards

Communicating practices

Communicating practices and procedures to project collaborators.

32
New cards

Monitoring and auditing

Monitoring and auditing data management practices.

33
New cards

Implementing standardized practices

Implementing standardized practices and procedures.

34
New cards

Providing feedback

Providing feedback to leadership.

35
New cards

Asking questions

Asking questions to clarify data management processes.

36
New cards

Project Team

Contribute to the broad practice of data management.

37
New cards

The Stanford Data Retention Policy

A policy outlining data retention requirements.

38
New cards

The NIH Data Management and Sharing Policy

A policy that governs data management and sharing practices.

39
New cards

Potential consequence of losing research data

A. The research findings based on the data may be called into question. B. Financial penalties from funding agencies and/or study sponsors. C. Research activities must immediately cease. D. All of the above.

40
New cards

FAIR Guiding Principles

Principles that guide data management practices.

41
New cards

Findable

Data should be easy to find by both humans and computers (e.g. use of standardized file names, persistent identifiers).

42
New cards

Accessible

There is a clearly defined method for accessing the data (e.g. use of standardized file organization, data storage and backups).

43
New cards

Interoperable

Data should be usable across a range of applications and workflows (e.g. use of standards, common file formats).

44
New cards

Reusable

Data should be saved, organized, and described with its future (re)use in mind, even if there are not plans to share it openly.

45
New cards

Mischievous Meatball

A scenario where Meatball the cat accidentally knocks your computer off a table, irrevocably damaging its hard drive.

46
New cards

Saving Data

Know where data can be saved.

47
New cards

Sensitive data

Data that must be protected against unauthorized access.

48
New cards

Policies and regulations

Require that appropriate administrative, physical and technical safeguards be taken to ensure the confidentiality, integrity and security of certain information (e.g. HIPAA's 'security rule').

49
New cards

3-2-1 rule

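Maintain multiple backups whenever possible: keep three copies of the data, on two different types of storage media, with one copy stored off-site.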
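
The counts in the rule's name can be checked mechanically. The following is a minimal sketch, assuming the common reading of the rule (at least 3 copies, on at least 2 types of media, with at least 1 copy off-site); the dictionary keys `media` and `offsite` and the example values are hypothetical.

```python
def satisfies_3_2_1(copies):
    """Check a list of data copies against the 3-2-1 rule.

    Each copy is a dict with hypothetical keys 'media' (e.g. 'laptop_ssd')
    and 'offsite' (True if stored at a different physical location).
    """
    enough_copies = len(copies) >= 3
    enough_media = len({copy["media"] for copy in copies}) >= 2
    has_offsite = any(copy["offsite"] for copy in copies)
    return enough_copies and enough_media and has_offsite


# Illustrative inventory of where one dataset is stored.
copies = [
    {"media": "laptop_ssd", "offsite": False},
    {"media": "external_hdd", "offsite": False},
    {"media": "cloud_storage", "offsite": True},
]
```

Here `satisfies_3_2_1(copies)` passes, while dropping the cloud copy would fail both the copy count and the off-site requirement.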

50
New cards

Working storage vs Long term storage

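They are not the same thing: working storage holds data that is in active use, while long-term storage preserves data once active work on it has ended.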

51
New cards

Excel limitations

Excel has several limitations: it can silently convert entries to dates, has compatibility issues between versions, lacks audit trails and straightforward version control, and keeps calculations largely invisible.

52
New cards

Staying Organized

Making finding things easy by keeping project-related files in project folders or directories that have a standard structure.
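
A standard structure is easiest to keep when it is created the same way for every project. The following is a minimal sketch using Python's `pathlib`; the folder names in `STANDARD_FOLDERS` and the function `create_project` are assumptions for illustration, not a layout prescribed by the source.

```python
import tempfile
from pathlib import Path

# Hypothetical standard layout; adapt the folder names to the project.
STANDARD_FOLDERS = ["data/raw", "data/processed", "docs", "code", "results"]


def create_project(root, name):
    """Create a project directory containing the standard subfolders."""
    project = Path(root) / name
    for folder in STANDARD_FOLDERS:
        (project / folder).mkdir(parents=True, exist_ok=True)
    return project


# Usage: build the structure inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(tmp, "example-project")
    created = sorted(p.name for p in project.iterdir())
```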

53
New cards

Standardized naming conventions

Maintain standardized naming conventions for both files and the contents within files (i.e., variable names).

54
New cards

Data dictionary or codebook

The document in which file and variable names, and what they mean, should be recorded.

55
New cards

ReadMe files

Files that provide details about the contents of a dataset or a collection of related files.

56
New cards

Dryad

An example of a data repository that requires a ReadMe to be uploaded with any shared data.

57
New cards

File organization description

A short description of how related files are organized (e.g. directory, subdirectory structures).

58
New cards

File contents description

A short description of what each file contains.

59
New cards

File relationships description

A short description of the relationships between different files (e.g. versions, linked files).

60
New cards

Plain text file

The recommended format for writing a ReadMe file (e.g. project-name_readme.txt).

61
New cards

Project name

The name of the project that should be included in the ReadMe.

62
New cards

Dataset authors

Individuals who created the dataset and should be credited in the ReadMe.

63
New cards

Data citation

Recommended citation format for the dataset included in the ReadMe.

64
New cards

License information

Details about the licensing of the dataset included in the ReadMe.

65
New cards

Applicable grant IDs

Grant identifiers that should be included in the ReadMe.

66
New cards

Citations

References to related papers, code, etc., that should be included in the ReadMe.
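
The ReadMe fields on the preceding cards (project name, dataset authors, data citation, license information, grant IDs, and citations) can be assembled into a plain-text file. The following is a minimal sketch; the function `build_readme`, the field labels, and every example value are placeholders invented for illustration.

```python
def build_readme(project_name, authors, citation, license_info, grant_ids, related):
    """Assemble plain-text ReadMe content from the listed fields."""
    lines = [
        f"Project name: {project_name}",
        "Dataset authors: " + "; ".join(authors),
        f"Data citation: {citation}",
        f"License: {license_info}",
        "Grant IDs: " + ", ".join(grant_ids),
        "Related papers and code:",
    ]
    lines += [f"  - {item}" for item in related]
    return "\n".join(lines) + "\n"


# All values below are placeholders, not real people, grants, or references.
readme_text = build_readme(
    project_name="Example Project",
    authors=["Author One", "Author Two"],
    citation="Author One; Author Two. Example Project dataset. [repository], [year].",
    license_info="[license name]",
    grant_ids=["[grant ID]"],
    related=["[related paper]", "[analysis code location]"],
)
```

Writing `readme_text` to a file such as `project-name_readme.txt` keeps it in the plain-text format recommended above.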

67
New cards

File naming convention

A structured way to name files, such as [Experiment Name]_[Your Name]_[Description]_[YYYY-MM-DD].
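
A naming convention is easier to follow when file names are generated rather than typed by hand. The following is a minimal sketch of the pattern above; the function `build_file_name`, the hyphen-based cleaning of each part, and the file extension argument are assumptions added for illustration.

```python
import re
from datetime import date


def build_file_name(experiment, person, description, when, extension):
    """Build [Experiment Name]_[Your Name]_[Description]_[YYYY-MM-DD].ext."""

    def clean(part):
        # Replace runs of characters other than letters and digits with a
        # hyphen, so underscores stay unambiguous as field separators.
        return re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-")

    parts = [clean(experiment), clean(person), clean(description), when.isoformat()]
    return "_".join(parts) + "." + extension


# Hypothetical example values.
name = build_file_name("Sleep Study", "J Doe", "raw EEG", date(2024, 3, 5), "csv")
# name == "Sleep-Study_J-Doe_raw-EEG_2024-03-05.csv"
```

Using the ISO date format (YYYY-MM-DD) means that sorting files by name also sorts them chronologically within each experiment.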

68
New cards

Data type

The form in which data is collected and/or organized (e.g. spreadsheets, images, etc.).

69
New cards

Data size

The magnitude of the data, such as the number of participants or approximate file size.

70
New cards

Data restrictions

Reasons for protecting data and/or limiting how it is shared (e.g. IP-related concerns).

71
New cards

Data sensitivity

The risks of disclosure and the policies, laws, and regulations that apply.

72
New cards

Identifying information

Information that can identify an individual, such as names and geographic subdivisions smaller than a state.

73
New cards

Personally identifiable information (PII)

Information that can be used to identify a person.

74
New cards

Private Information

Information that a person could reasonably expect would not be shared.

75
New cards

Confidentiality

The protection of private information from unauthorized access.

76
New cards

Vehicle identifiers and serial numbers

Identifiers related to vehicles, including license plate numbers.

77
New cards

Device identifiers and serial numbers

Identifiers related to devices, including serial numbers.

78
New cards

Web Universal Resource Locators (URLs)

Addresses used to access resources on the internet.

79
New cards

Internet Protocol (IP) address numbers

Numerical labels assigned to devices connected to a computer network.

80
New cards

Regulated Data

Information that is protected by local, national, or international statute or regulation mandating certain restrictions.

81
New cards

Biometric identifiers

Unique biological characteristics used for identification, including finger and voice prints.

82
New cards

Full face photographs

Images capturing the entire face of an individual.

83
New cards

Unique identifying number, characteristic, or code

Any other distinct identifier that can be used to recognize an individual.

84
New cards

Protected health information (PHI)

Information that can be linked to a particular person generated in the course of healthcare.

85
New cards

Data Management Plans (DMP)

Plans outlining how data will be managed and shared in research.

86
New cards

Data Types

A description of data that will be managed and shared.

87
New cards

Software

An outline of any specialized software needed to use the data.

88
New cards

Standards

The standards that will be applied to ensure usability/interoperability.

89
New cards

Preservation and sharing

How data will be made available to others, such as through a data repository.

90
New cards

Restrictions on sharing

Limits on sharing data, such as participant privacy.

91
New cards

Standard Operating Procedure (SOP)

A set of step-by-step instructions compiled by an organization to help workers carry out routine operations.

92
New cards

Data validation and quality assurance measures

Processes to ensure the accuracy and quality of data collected.
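
One simple validation measure is to check each recorded value against an explicit rule and list the failures for review. The following is a minimal sketch; the function `validate_rows`, the field names, and the rules themselves are hypothetical and would be replaced by a project's own criteria.

```python
def validate_rows(rows, rules):
    """Return (row_index, field, value) for every value that fails its rule.

    `rules` maps a field name to a function that returns True for valid values.
    """
    problems = []
    for index, row in enumerate(rows):
        for field, is_valid in rules.items():
            value = row.get(field)
            if not is_valid(value):
                problems.append((index, field, value))
    return problems


# Hypothetical rules for an illustrative dataset.
rules = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "group": lambda v: v in {"control", "treatment"},
}
rows = [
    {"age": 34, "group": "control"},
    {"age": 340, "group": "treatment"},  # likely a data-entry error
    {"age": 51, "group": "treatmnt"},    # misspelled category
]
problems = validate_rows(rows, rules)
```

The sketch only reports problems; deciding how to correct a flagged value is left to the researcher and should itself be documented.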

93
New cards

Data manipulations from raw to final data

Transformations applied to raw data to prepare it for analysis.

94
New cards

Documentation of process

Records of protocols, standard operating procedures, and methodologies.

95
New cards

Documentation of content

Records such as data dictionaries and codebooks that explain the data.

96
New cards

Field notebooks

Notebooks used for recording observations and data in the field.

97
New cards

Lab notebooks

Notebooks used for documenting experiments and research findings.

98
New cards

Codes for missing values

Designations used in datasets to indicate missing data points.
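
Declared missing-value codes can be converted to a single explicit marker when the data are read, so that a code such as -999 is never mistaken for a real measurement. The following is a minimal sketch using Python's standard `csv` module; the codes in `MISSING_CODES`, the function `read_with_missing`, and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical codes; a real dataset's codes belong in its data dictionary.
MISSING_CODES = {"", "NA", "-999"}


def read_with_missing(text, missing_codes=MISSING_CODES):
    """Parse CSV text, replacing declared missing-value codes with None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value in missing_codes else value) for key, value in row.items()}
        for row in reader
    ]


sample = "id,weight_kg\n1,70.5\n2,-999\n3,NA\n"
records = read_with_missing(sample)
```

After parsing, the second and third records carry `None` for `weight_kg`, while the first keeps its recorded value as text.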

99
New cards

Collection Confusion

Issues arising from inconsistent data collection methods by different researchers.

100
New cards

Good Clinical Practice (GCP) Guidelines

Guidelines that define standard operating procedures as detailed, written instructions to achieve uniformity of the performance of a specific function.