Data collection considerations
1. Select the right data type 2. Determine the time frame for data collection 3. Decide how the data will be collected 4. Decide how much data to collect 5. Choose data sources 6. Decide what data to use
Population
All possible data values in a certain dataset
Sample
A part of the population that is representative of the population
First-party data
Data collected by an individual or group using their own resources
Second-party data
Data collected by a group directly from its audience and then sold
Third-party data
Data collected from outside sources who did not collect it directly
Primary data
Collected by a researcher from first-hand sources. Not the same as internal data, which is data stored inside an organization's own systems.
Secondary data
Gathered by other people or from other research. Not the same as external data, which is data stored outside an organization's own systems.
Quantitative data
Can be measured and counted using numbers (quantity, amount, range)
Qualitative data
Cannot be counted, measured or easily expressed in numbers (names, categories, descriptions)
Discrete data
Data that is counted and has a limited number of values
Continuous data
Data that is measured and can have any numeric value
Nominal data
A type of qualitative data that is categorized without a set order
Ordinal data
A type of qualitative data with a set order or scale
Structured data
Data organized in a certain format such as rows and columns
Unstructured data
Data that is not organized in an easily identifiable manner
Data model
A model used to organize data elements and describe how they relate to one another
Data elements
Pieces of information, such as people's names, account numbers, and addresses
Data modeling
The process of creating diagrams that visually represent how data is organized and structured
Levels of data modeling
1. Conceptual data modeling 2. Logical data modeling 3. Physical data modeling
Physical data model
Defines all entities and attributes used.
Entity Relationship Diagram (ERD)
Visual way to understand the relationship between entities in the data model.
Unified Modeling Language (UML)
Detailed diagrams that describe the structure of a system by showing the system's entities, attributes, operations, and their relationships.
Data type
A specific kind of data attribute that tells what kind of value the data is.
Text or string
A sequence of characters and punctuation that contains textual information.
Boolean
Data type with only two possible values: true or false.
Operator
A symbol that names the operation or calculation to be performed.
AND operator
Lets you stack conditions: a result is returned only if both of your conditions are met.
OR operator
Lets you move forward if either one of your two conditions is met.
NOT operator
Lets you filter by subtracting specific conditions from the results.
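A minimal Python sketch of how the AND, OR, and NOT operators filter records. The `shoes` list and its field names are made up for illustration and do not come from the original cards.

```python
# Hypothetical product records used only for illustration.
shoes = [
    {"name": "Trail Runner", "color": "grey", "price": 90},
    {"name": "City Walker", "color": "pink", "price": 60},
    {"name": "Court Classic", "color": "grey", "price": 45},
    {"name": "Beach Slide", "color": "blue", "price": 20},
]

# AND: both conditions must be true for a record to be kept.
grey_and_cheap = [s["name"] for s in shoes
                  if s["color"] == "grey" and s["price"] < 50]

# OR: a record is kept if at least one condition is true.
grey_or_pink = [s["name"] for s in shoes
                if s["color"] == "grey" or s["color"] == "pink"]

# NOT: records that match the condition are excluded from the results.
not_grey = [s["name"] for s in shoes if not s["color"] == "grey"]

print(grey_and_cheap)  # ['Court Classic']
print(grey_or_pink)    # ['Trail Runner', 'City Walker', 'Court Classic']
print(not_grey)        # ['City Walker', 'Beach Slide']
```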
Row
Record.
Column
Field.
Wide data
Data where each row contains multiple data points for the particular items identified in the columns.
Long data
Data where each row contains a single data point for a particular item.
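One way to picture the difference is to reshape a small wide table into long form. This is a sketch in plain Python; the `wide_rows` table, the `wide_to_long` helper, and its parameter names are hypothetical and chosen only for illustration.

```python
# Hypothetical wide table: one row per store, one column per month.
wide_rows = [
    {"store": "A", "jan": 10, "feb": 12},
    {"store": "B", "jan": 7, "feb": 9},
]

def wide_to_long(rows, id_column, variable_name, value_name):
    """Return one row per (id, variable) pair, each holding a single data point."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every long row instead
            long_rows.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return long_rows

long_rows = wide_to_long(wide_rows, "store", "month", "units_sold")
for row in long_rows:
    print(row)
```

The two wide rows become four long rows, one per store and month.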
Data transformation
The process of changing the data's format, structure, or values.
Data organization
Better organized data is easier to use.
Data compatibility
After transformation, different applications or systems can use the same data.
Data migration
Data with matching formats can be moved from one system to another.
Data merging
Data with the same organization can be merged together.
Data enhancement
Data can be displayed with more detailed fields.
Data comparison
After transformation, apples-to-apples comparisons of the data can be made.
Bias
A conscious or unconscious preference in favor of or against a person, group of people or thing.
Data bias
A type of error that systematically skews results in a certain direction.
Fairness
A quality of data analysis that does not create or reinforce bias.
Unbiased sampling
When a sample is representative of the population being measured. Random sampling is one way to achieve this.
Sampling bias
When a sample isn't representative of the population as a whole.
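A small sketch of simple random sampling, contrasted with a convenience sample that is likely to be biased. The population of customer IDs and the helper name `simple_random_sample` are assumptions made for illustration.

```python
import random

# Hypothetical population: 1,000 customer IDs.
population = list(range(1, 1001))

def simple_random_sample(items, size, seed=None):
    """Draw `size` distinct items, each with an equal chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(items, size)

sample = simple_random_sample(population, size=50, seed=42)

# For contrast: taking the first 50 IDs only ever covers the earliest
# customers, so it may not represent the population as a whole.
convenience_sample = population[:50]

print(len(sample), min(sample) >= 1, max(sample) <= 1000)
```

Random selection does not guarantee a representative sample on every draw, but it removes the systematic skew that a hand-picked or convenience sample can introduce.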
Observer bias
The tendency for different people to observe things differently.
Interpretation bias
The tendency to interpret ambiguous situations in a positive or negative way.
Confirmation bias
The tendency to search for or interpret information in a way that confirms pre-existing beliefs.
Good data
Reliable, accurate, complete, unbiased information.
Bad data
Inaccurate, incomplete or biased information.
Original data
Validated with original source.
Comprehensive data
Contains all critical information needed to answer the question.
Current
Up to date and relevant to the task at hand
Cited
Provides credibility
Not current
Out of date and irrelevant
Not cited
Lacks credibility
Ethics
Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues
Data ethics
Well-founded standards of right and wrong that indicate how data is collected, shared, and used
GDPR
General Data Protection Regulation of the European Union
Ownership
Individuals own the raw data they provide and they have control over its usage, how it's processed, and how it's shared
Transaction transparency
All data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data
Consent
An individual's right to know explicit details about how and why their data will be used before agreeing to provide it
Currency
Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions
Privacy
Preserving a data subject's information and activity any time a data transaction occurs
Openness
Free access, usage, and sharing of data
Data anonymization
The process of protecting people's private or sensitive data by eliminating that kind of information
Personally identifiable information (PII)
Information that can be used by itself or with other data to track down a person's identity
De-identification
A process used to wipe data clean of all personally identifying information
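A rough sketch of de-identifying a single record. The cards only say that identifying information is eliminated; the three techniques shown here (dropping, hashing, and masking fields) are common approaches chosen as an assumption, and the record, field names, and `de_identify` helper are hypothetical.

```python
import hashlib

# Hypothetical record and field names, for illustration only.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "zip": "90210",
    "purchase_total": 42.5,
}

DROP_FIELDS = {"name"}   # remove the value entirely
HASH_FIELDS = {"email"}  # replace the value with a fixed-length digest
MASK_FIELDS = {"zip"}    # hide part of the value

def de_identify(row, salt):
    """Return a copy of `row` with identifying fields dropped, hashed, or masked."""
    cleaned = {}
    for key, value in row.items():
        if key in DROP_FIELDS:
            continue
        if key in HASH_FIELDS:
            salted = (salt + str(value)).encode("utf-8")
            cleaned[key] = hashlib.sha256(salted).hexdigest()
        elif key in MASK_FIELDS:
            text = str(value)
            cleaned[key] = text[:2] + "*" * (len(text) - 2)
        else:
            cleaned[key] = value
    return cleaned

safe_record = de_identify(record, salt="example-salt")
print(safe_record["zip"])  # 90***
```

Hashing or masking alone is not a guarantee of anonymity, since remaining fields can sometimes be combined to re-identify a person. Treat this as a starting point rather than a complete method.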
Openness (or open data)
The aspect of data ethics that promotes the free access, usage, and sharing of data
Availability and access
Open data must be available as a whole, preferably by downloading over the Internet in a convenient and modifiable form
Reuse and redistribution
Open data must be provided under terms that allow reuse and redistribution including the ability to use it with other datasets
Universal participation
Everyone must be able to use, reuse, and redistribute the data without discrimination
Data interoperability
The ability of data systems and services to openly connect and share data
Database
A collection of data stored in a computer system
Metadata
Data about data that tells you where the data comes from, when and how it was created, and what it's all about
Relational database
A database that contains a series of related tables that can be connected to form relationships
Primary key
An identifier that references a column in which each value is unique (the unique identifier for each row in a table)
Foreign key
A field within a table that is a primary key in another table (how one table can be connected to another)
Unique data constraint
A primary key is used to ensure data in a specific column is unique
Record identification
A primary key uniquely identifies a record in a relational database table
Linking columns
A foreign key is a column or group of columns in a relational database table that provides a link between the data in two tables
Primary key limitation
Only one primary key is allowed in a table
Foreign key reference
A foreign key refers to the field in a table that's the primary key of another table
Null value restriction
A primary key cannot contain null or blank values
Foreign key allowance
More than one foreign key is allowed to exist in a table
Normalization
A process of organizing data in a relational database to eliminate data redundancy, increase data integrity, and reduce complexity
Composite key
A primary key constructed using multiple columns of a table
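The key rules above can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `customers`, `orders`, and `order_items` tables and the `try_insert` helper are hypothetical, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- primary key: one per table, values unique
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)  -- foreign key
);
CREATE TABLE order_items (
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    line_number INTEGER NOT NULL,
    product     TEXT NOT NULL,
    PRIMARY KEY (order_id, line_number)  -- composite key: two columns together
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1)")
conn.execute("INSERT INTO order_items VALUES (100, 1, 'notebook')")

def try_insert(sql):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_pk = try_insert("INSERT INTO customers VALUES (1, 'Grace')")        # repeated primary key
missing_parent = try_insert("INSERT INTO orders VALUES (101, 999)")           # no such customer
duplicate_composite = try_insert("INSERT INTO order_items VALUES (100, 1, 'pen')")
second_line = try_insert("INSERT INTO order_items VALUES (100, 2, 'pen')")    # new (order, line) pair

print(duplicate_pk, missing_parent, duplicate_composite, second_line)
# False False False True
```

Only the last insert succeeds: it is the only one that repeats no primary key, points at an existing parent row, and uses a new combination of the composite key columns.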
Descriptive metadata
Metadata that describes a piece of data and can be used to identify it at a later point in time (a book's ISBN, author and title)
Structural metadata
Metadata that indicates how a piece of data is organized and whether it is part of one, or more than one, data collection
Administrative metadata
Metadata that indicates the technical source of a digital asset (metadata in a digital photo)
Elements of metadata
File or document type; date, time, and creator; title and description; geolocation; tags and categories; modification history; access permissions
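Some of these elements can be read straight from the file system. The sketch below uses Python's standard library; the `basic_file_metadata` helper and the temporary `notes.txt` file are hypothetical, and elements such as creator, geolocation, or tags usually live inside the file format itself and are not covered here.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def basic_file_metadata(path):
    """Collect a few metadata elements the operating system tracks for a file."""
    p = Path(path)
    info = p.stat()
    return {
        "file_name": p.name,
        "file_type": p.suffix.lstrip(".") or "unknown",
        "size_bytes": info.st_size,
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "permissions": oct(info.st_mode & 0o777),
    }

# Create a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    sample_path = Path(folder) / "notes.txt"
    sample_path.write_text("hello", encoding="utf-8")
    metadata = basic_file_metadata(sample_path)

print(metadata["file_name"], metadata["file_type"], metadata["size_bytes"])
# notes.txt txt 5
```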
Reliability of metadata
Metadata helps data analysts confirm their data is reliable by making sure it is accurate, precise, relevant, and timely
Consistency in databases
When a database is consistent, it's easier to discover relationships between the data inside the database and data that exists elsewhere
Metadata repositories
Specialized databases specifically created to store and manage metadata, describing where the metadata came from and storing that data in an accessible form
Access to metadata
Provides data analysts with quick and easy access to the data
Data classification
Data analysts can categorize data when it follows a consistent format, which is beneficial in cleaning and processing data
Data storage
Consistent and uniform data can be efficiently stored in various data repositories, streamlining storage management tasks
Data access
Users, applications, and systems can efficiently locate and use data
Data governance
A process to ensure the formal management of a company's data assets.