Data Stewardship II

Last updated 3:21 PM on 5/13/26

25 Terms

1

What are the 3 fundamental concepts covered in Data Stewardship II?

The FAIR principles, FAIR data, and FAIRification practices.

2

What are the learning goals of Data Stewardship II?

Increase awareness of the FAIR data principles, understand methods and processes to keep datasets comprehensible, know the key principles of data-level implementation of FAIR, and know the key principles of ALCOA++.

3

Why has data stewardship become increasingly important in the modern context?

Because of the exponential growth of data worldwide (IoT), new technologies (AI, edge computing), and the digitalization of society and the economy, which places data access and sharing at the core of innovation and public trust. This opens up huge possibilities for new data services and insights, but also huge risks: data could be used unethically, the digital divide could become an information divide, and privacy could be invaded.

4

What are the FAIR principles and when were they first published?

The FAIR principles were first published in 2016 (Wilkinson et al., Scientific Data). They contain guidelines for good data management practice that aim at making data Findable, Accessible, Interoperable and Reusable. "Data" refers to all kinds of digital objects produced in research: research data, code, software, presentations, etc.

5

What is the overall structure of the FAIR principles?

There are 4 foundational principles (F, A, I, R) and a total of 15 guiding principles that more explicitly and measurably describe how FAIRness can be achieved through technical implementation. Although the FAIR principles originate from the life sciences, they can be applied within all research disciplines.

6

What are the 2 key thoughts at the heart of the FAIR principles?

(1) Both humans AND machines are intended as consumers of data. This leads to the creation of the Internet of FAIR Data and Services, an ecosystem that responds quickly and adapts automatically.

(2) The FAIR principles apply to BOTH data AND metadata, which is why the principles use the term "(meta)data".

7

Why are research funders and publishers pushing for FAIR data?

Reusing existing datasets for new research purposes is becoming more common across all disciplines. Research funders and publishers are asking researchers to make datasets available to others. Research institutes are promoting transparency and accessibility of locally produced datasets. Datasets need to be FAIR to facilitate this sharing and reuse.

8

What does Findable mean in FAIR data?

Findable means the data can be discovered by both humans and machines — for instance by exposing meaningful machine-actionable metadata and keywords to search engines and research data catalogues. The data are referenced with unique and persistent identifiers (e.g. DOIs or Handles) and the metadata include the identifier of the data they describe. The key practice linked to Findable is Metadata.
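
As a minimal sketch of what this can mean in practice, the Python snippet below checks whether a metadata record carries a persistent identifier, a title and keywords. The field names (`identifier`, `title`, `keywords`) and the helper `missing_findability_fields` are assumptions made for illustration; they are not taken from the FAIR principles or from any particular metadata standard.

```python
# Hypothetical minimal set of fields a record needs to be discoverable.
REQUIRED_FIELDS = ("identifier", "title", "keywords")


def missing_findability_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "identifier": "https://doi.org/10.1038/sdata.2016.18",
    "title": "The FAIR Guiding Principles for scientific data management and stewardship",
    "keywords": [],  # empty on purpose, so the check reports it
}

print(missing_findability_fields(record))
```

A record that passes such a check is not automatically FAIR; the check only flags the most obvious gaps before the metadata are exposed to a catalogue or search engine.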

9

What does Accessible mean in FAIR data?

Accessible means the data are archived in long-term storage and can be made available using standard technical procedures. This does NOT mean data must be openly available for everyone — but information on how the data could be retrieved (or not) must be available. Even restricted data can be FAIR as long as the access conditions are clearly stated and machine-readable (e.g. via standard licences).

10

What does Interoperable mean in FAIR data?

Interoperable means the data can be exchanged and used across different applications and systems — also in the future — by using open file formats. It also means the data can be integrated with other datasets from the same or different research fields. This is made possible by using metadata standards, standard ontologies, controlled vocabularies, and meaningful links between data and related digital research objects. The key practice linked to Interoperable is Ontologies.

11

What does Reusable mean in FAIR data?

Reusable means the data are sufficiently described and documented so that they can be used again for new research purposes. This requires rich metadata, clear usage conditions (licenses), and provenance information. The key practice linked to Reusable is Licenses.

12

What are the FAIRification practices (How to FAIR)?

Five key practices:

Documentation: adds context and makes data easier to understand and reuse.

File Formats: determine how data can be used; must be chosen for collection, processing, archiving and preservation.

Access to Data: who you make data available to, how and under what conditions.

Persistent Identifiers: long-lasting references to reliably identify, verify and locate data.

Data Licenses: legal arrangement specifying what users can do with the data.

13

What are the two types of data documentation?

Data-level documentation: includes information about specific data files, such as data type, structure (questions, variables, concepts) and data processing procedures.

Project-level documentation: describes when, how and why data were generated and by whom, how data were processed, and what quality assurance measures were used.

14

What is a data map and what does it include in larger projects?

A data map is a documentation tool that describes how data flow through a project. For small projects, the entire code is stored. For larger projects, researchers prefer to describe the method, the model selection and the packages used (this is the exam-relevant answer).

15

What is a Persistent Identifier (PID) and what are the two most important examples?

A PID is a long-lasting reference to a digital resource that provides the information required to reliably identify, verify and locate research data. A PID may also be connected to metadata describing the resource. The two most important PIDs are: DOI (Digital Object Identifier) — mainly assigned to resources ready for public dissemination, and Handle — used to persistently identify other categories of digital resources (e.g. those created in labs) to make them referable by software and workflows.
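
To illustrate how a DOI works as a locator, the sketch below checks the general shape of a DOI and turns it into a resolver URL at https://doi.org/. The regular expression is a simplified assumption (prefix `10.`, a numeric registrant code, a slash, a non-empty suffix), not the official DOI syntax, and `doi_to_url` is a hypothetical helper. The example DOI is that of the Wilkinson et al. (2016) paper.

```python
import re

# Simplified pattern (an assumption): "10.", 4-9 digits, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI, or raise ValueError if it looks malformed."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return f"https://doi.org/{doi}"


print(doi_to_url("10.1038/sdata.2016.18"))
```

The identifier stays the same even if the publisher moves the landing page; only the target registered behind the resolver changes, which is what makes the reference persistent.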

16

What is a data license and why is it important for FAIR?

A data license is a legal arrangement between the creator of the data and the end-user specifying what users can do with the data. Licenses are essential for the Reusable dimension of FAIR — without a clear license, users cannot legally reuse data even if it is accessible. Examples include Creative Commons licenses (CC-BY, CC0) for data, and Open Source Initiative approved licenses for software.

17

What is the difference between proprietary and non-proprietary file formats?

Proprietary file formats (e.g. .nef from Nikon, .wma from Microsoft) are owned by a specific company and may not be accessible in the future. Non-proprietary formats (e.g. .txt, .csv) can be used with a variety of software and are preferred for long-term preservation. You may need to keep data files in multiple formats for different purposes: collection, processing/analysis, reuse and preservation.
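
As a small illustration of keeping a preservation copy in a non-proprietary format, the sketch below serialises a table to CSV text with Python's standard `csv` module. The column names and values are invented for the example, and `to_csv_text` is a hypothetical helper.

```python
import csv
import io

# Invented example rows; in practice these would come from the working data file.
rows = [
    {"sample_id": "S1", "temperature_c": 21.5},
    {"sample_id": "S2", "temperature_c": 19.0},
]


def to_csv_text(records: list) -> str:
    """Serialise a non-empty list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv_text(rows))
```

The resulting text can be opened with any editor or spreadsheet tool, which is the property that makes such formats suitable for long-term preservation.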

18

Why are metadata more important than data from a FAIR perspective?

Because metadata should always be openly available, and they link research data and publications in the Internet of FAIR Data and Services. Even if data access is restricted, metadata must remain accessible. The distinction between data and metadata is not ontological but grounded in use: one researcher's metadata can be another researcher's data. Data documentation is read by humans; metadata are primarily processed by machines.

19

What are the 3 types of metadata in FAIR?

Administrative metadata: data about a project/resource relevant for managing it (owner, PI, collaborators, funder, project period); assigned before data collection.

Descriptive or citation metadata: data that allow people to discover and identify a dataset (authors, title, abstract, keywords, persistent identifier, related publications).

Structural metadata: data about how a dataset came about and how it is internally structured (unit of analysis, collection method, sampling procedure, sample size, categories, variables).
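
The three types can be sketched as one machine-readable record. The Python example below groups them in a nested dictionary and serialises it to JSON; every key and value is a hypothetical placeholder, not a field prescribed by a metadata standard.

```python
import json

# All names and values below are invented placeholders for illustration.
metadata = {
    "administrative": {
        "owner": "Example Lab",
        "funder": "Example Funder",
        "project_period": "2024-2026",
    },
    "descriptive": {
        "title": "Example survey dataset",
        "authors": ["A. Researcher"],
        "keywords": ["survey", "example"],
    },
    "structural": {
        "unit_of_analysis": "individual",
        "collection_method": "online questionnaire",
        "sample_size": 250,
        "variables": ["age", "region"],
    },
}


def metadata_to_json(md: dict) -> str:
    """Serialise the record as JSON with sorted keys so the output is stable."""
    return json.dumps(md, indent=2, sort_keys=True)


print(metadata_to_json(metadata))
```

In a real project the keys would follow a metadata standard chosen for the discipline rather than ad hoc names like these.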

20

What is a metadata standard?

A metadata standard is a subject-specific guide to metadata. Metadata elements are grouped into sets designed for a specific purpose with standard names and definitions. Rules on what content must be included, what syntax to use, or a controlled vocabulary can be included. A starting point can be a taxonomy or an ontology.

21

What is the difference between a Taxonomy and an Ontology?

A taxonomy is a hierarchical classification system that organizes concepts into categories (e.g. Motor Vehicle → Passenger Car → Sedan). An ontology is a more complex knowledge representation system that defines not just categories but also relationships and properties between concepts, enabling machines to reason about data. Both can be used as starting points for metadata standards.
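
The difference can be sketched in a few lines of Python. The taxonomy reuses the Motor Vehicle example above as a simple child-to-parent mapping; the ontology is approximated as a list of subject-relation-object triples. The relations `has_part` and `powered_by`, the concepts `Engine` and `Fuel`, and both helper functions are hypothetical additions for illustration.

```python
# Taxonomy: each concept has exactly one broader category (child -> parent).
taxonomy = {
    "Sedan": "Passenger Car",
    "Passenger Car": "Motor Vehicle",
}


def ancestors(concept: str) -> list:
    """Walk up the hierarchy and return all broader categories, nearest first."""
    path = []
    while concept in taxonomy:
        concept = taxonomy[concept]
        path.append(concept)
    return path


# Ontology-style statements: (subject, relation, object) triples.
# Besides the hierarchy ("is_a"), other named relations can be expressed.
triples = [
    ("Sedan", "is_a", "Passenger Car"),
    ("Passenger Car", "is_a", "Motor Vehicle"),
    ("Motor Vehicle", "has_part", "Engine"),
    ("Engine", "powered_by", "Fuel"),
]


def objects_of(subject: str, relation: str) -> list:
    """Return every object linked to the subject by the given relation."""
    return [o for s, r, o in triples if s == subject and r == relation]


print(ancestors("Sedan"))
print(objects_of("Motor Vehicle", "has_part"))
```

The taxonomy can only answer "what is this a kind of?", while the triple list can also answer questions about parts and properties, which is the extra expressiveness that lets machines reason about data.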

22

What is ALCOA++ and what is its origin?

ALCOA++ is a widely recognized framework for ensuring data integrity in regulated industries. It originated with the FDA (Stan W. Woollen, 1990s) and was later expanded by the European Medicines Agency (EMA) to reinforce robust data management principles. It is particularly relevant in Life Sciences, pharma and clinical trials.

23

What do the 9 letters of ALCOA++ stand for?

A (Attributable): clearly traceable to the person or system who created or updated the data.

L (Legible): easily readable and permanently recorded.

C (Contemporaneous): documented at the time of the activity or event.

O (Original): first-capture or source data.

A (Accurate): correct, truthful and reliable.

C (Complete): all required information is recorded without omissions.

C (Consistent): harmonized and logically aligned with related data.

E (Enduring): permanently maintained and protected from loss or alteration.

A (Available): easily accessible whenever needed.
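
Some of these attributes map naturally onto how a record is stored. The Python sketch below shows one possible shape for an audit entry that is attributable (it names who recorded it), contemporaneous (it is timestamped when created) and protected against later reassignment. The class `AuditEntry`, the function `record` and all field names are hypothetical; this is an illustration, not a validated compliance implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditEntry:
    recorded_by: str       # Attributable: who or what created the entry
    recorded_at: datetime  # Contemporaneous: set at the moment of recording
    field_name: str
    value: str             # Original: the value as first captured


def record(user: str, field_name: str, value: str) -> AuditEntry:
    """Create a timestamped entry; refuse entries that nobody can be held to."""
    if not user:
        raise ValueError("an entry must be attributable to a user or system")
    return AuditEntry(user, datetime.now(timezone.utc), field_name, value)


entry = record("analyst_01", "ph", "7.2")
print(entry.recorded_by, entry.field_name, entry.value)
```

Corrections would then be added as new entries rather than by overwriting old ones, so the original value stays available.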

24

How do FAIR and ALCOA++ complement each other?

FAIR focuses on making research data findable, accessible, interoperable and reusable, primarily for sharing and discovery in the broader research community. ALCOA++ focuses on ensuring the integrity of data in regulated environments, primarily during data collection, processing and recording in clinical or industrial settings. Together they cover both the quality of data at creation (ALCOA++) and the accessibility and reusability of data after creation (FAIR).

25

What is the core message of Data Stewardship II?

"Keep the project clean through FAIR." Research data should be managed according to FAIR principles from the start of a project. Applying FAIR and ALCOA++ together helps ensure that data are not only trustworthy and of high integrity (ALCOA++) but also discoverable, accessible and reusable by others (FAIR), supporting open science, transparency, and reprod