Marketing Research Notes

Chapter 1 (Marketing, Marketing Strategy, and Marketing Research)

Marketing is defined as the activity, set of institutions, and processes for creating, communicating, delivering, and exchanging offerings that have value for customers.
The marketing process includes:
1. Situation Analysis
2. Strategy
3. Tactics
4. Objectives
Situation Analysis consists of the 3 C’s: Company, Customers, and Competitors, and sometimes also Collaborators and Context.
Tactics consist of the 4 P’s: Product, Price, Place, and Promotion.
Objectives include Margins, Return on Investment (ROI), and Customer Lifetime Value (CLV).

Marketing Strategy

Marketing Strategy is defined as a roadmap for achieving marketing goals by:
- Identifying the target audience
- Differentiating the brand through a unique value proposition
- Utilizing the marketing mix (4 P’s: product, price, place, and promotion) to effectively position offerings in the market.
Strategy: STP
1. Segmentation
2. Targeting
3. Positioning

Marketing Research

Marketing Research is the process of designing, gathering, analyzing, and reporting information that may be used to solve a specific marketing problem.
Reasons to learn Marketing Research (individual level):
- Career Growth Opportunities
- Competitive Edge
- Effective Communication
Reasons to learn Marketing Research (firm level):
- Improved Decision-Making
- Enhanced Customer Understanding
- Optimized Marketing Campaigns
- Increased ROI
- Competitive Advantage
Uses of Marketing Research are:
1. Identify marketing opportunities and problems
2. Generate, refine, and evaluate potential marketing actions
3. Monitor marketing performance

Chapter 2 (Marketing Research Process)

There are 3 preliminary steps during the marketing research process.
1. Research Purpose
2. Research Objective
3. Estimating the Value of Research Information
There is a five-step marketing research process.
1. Define Research Purpose and Objectives
2. Research Design
3. Data Collection
4. Data Analysis and Insight Development
5. Communicate the Insights
Defining the research purpose is the first critical step in the marketing research process. It is crucial that the purpose or problem is correctly defined; if it is not, all the steps afterwards will be a waste of time and resources. After the problem has been identified, the next step is to formulate the research objectives for that problem. A research objective specifies the information needed to solve the problem at hand.
A research design is the framework or approach used to meet the research objectives. The research design determines how the data is collected, analyzed, and interpreted. There are three main types of research design:
1. Exploratory Design
2. Descriptive Design
3. Causal Design
Data Collection involves two types of information: Primary Information and Secondary Information. Primary Information is collected specifically for the problem at hand; examples are surveys, focus groups, and observations. Secondary Information has already been collected and is available for external use; examples are government reports and industry publications.
Communicating the Insights is the final step of the marketing research process. It involves presenting your findings to your stakeholders in a clear and concise manner.
A problem is the gap between what was supposed to happen and what actually happened. To solve a problem, managers have to get to its root cause and find a strategic solution.
A marketing opportunity is a favorable situation in which a company can achieve growth or competitive advantage. An example would be discovering new uses for a product (e.g., baking soda).
Discovering the problem at hand and identifying marketing opportunities are the foundation of effective marketing research.
Research Purpose, Research Objectives, and Hypothesis
A research purpose in marketing research is the study’s main goal, guiding its objectives and focus. A research objective is a clear statement or question that specifies the information needed to solve a problem or seize an opportunity. A hypothesis is a testable statement that predicts the relationship between variables or outcomes based on assumptions or observations. Hypothesis testing is used in marketing research to focus research efforts, provide clarity, and support decision-making. Methods for hypothesis testing include experiments and focus groups.
A construct is an abstract concept that is not directly measurable but can be represented through measurable variables or attributes. Examples include Brand Loyalty, Customer Satisfaction, and Intention to Purchase.
Key characteristics of Constructs
1. Abstract in Nature
2. Multi-Dimensional
3. Validity and Reliability
4. Operationalization
Marketing Research is NOT the best solution when the information is already available, the timing is wrong, or the cost outweighs the value. Value of Research: if the cost of the research outweighs the value of the information it would provide, then marketing research is not needed.

Chapter 3 (Research Design)

Research design is defined as specifying the methods used to collect and analyze information for a research project. A research design addresses ethical issues early in the process and provides a structured framework for addressing problems systematically, which saves time and costs through preplanning.
Research Design is classified into three categories: Exploratory, Descriptive, and Causal. The choice of design depends largely on the objectives of the research.
The purpose of Exploratory Design is to gain background information and develop hypotheses. The following situations and characteristics indicate when to use an exploratory design:
1. When little is known about the problem or topic
2. To explore ideas or generate insights for further research
3. Informal
4. Flexible
5. Unstructured
Examples of exploratory design are interviews, case studies, secondary data, and focus groups.
Uses of Exploratory Research:
- Gain Background Information
- Define Terms (e.g., what is meant by “satisfaction with service quality”)
- Clarify Problems and Hypotheses
- Establish Research Priorities
The purpose of Descriptive Research Design is to describe the state of a phenomenon or variable. The following situations and characteristics indicate when to use a descriptive design:
1. When you need to quantify or describe specific characteristics of a population.
2. Structured
3. Pre-planned
4. Quantitative
Examples of descriptive research design are surveys, observations, and panel studies.
Uses of Descriptive Research:
- To describe the characteristics of certain groups. For example, how do the characteristics of online buyers differ from those of offline buyers?
- To estimate the proportion of people who behave a certain way. For example, what percentage of customers are responsive to marketing promotions?
The purpose of Causal Research Design is to test cause-and-effect relationships between variables. The following situations and characteristics indicate when to use a causal design:
1. When you want to determine how one variable influences another
2. Highly controlled
3. Experimental in nature
Experimental Design is a procedure for creating an experimental setting in which a change in a dependent variable can be attributed to a change in an independent variable.
Examples of causal relationships tested with an experimental design:
1. If advertising ($x$) increases, then sales ($y$) will also increase.
2. If price ($x$) increases, then customer purchases ($y$) will decrease.
Types of Experimental Designs:
1. Before-After Testing
2. A/B Testing
A/B testing simultaneously tests two or more versions of an independent variable. An example would be comparing two advertisements to see which one generates more clicks/engagement.
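The advertisement comparison above can be sketched in R. This is a minimal illustration, not a prescribed method: the click and impression counts are made-up numbers, and `prop.test()` from base R's stats package is just one reasonable way to compare two click-through rates.

```r
# Hypothetical click data for two advertisements (made-up numbers)
clicks      <- c(A = 120, B = 150)     # users who clicked each ad
impressions <- c(A = 2000, B = 2000)   # users who were shown each ad

# Click-through rate for each version
ctr <- clicks / impressions
print(ctr)

# Two-sample test of equal proportions (base R, stats package)
ab_test <- prop.test(x = clicks, n = impressions)
print(ab_test$p.value)

# Version with the higher observed click-through rate
best_ad <- names(which.max(ctr))
print(best_ad)
```

A small p-value would suggest that the difference in click-through rates is unlikely to be due to chance alone; the observed winner on its own does not establish that.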
An experiment is valid if it has both internal validity and external validity.
- Internal Validity is the extent to which the change in the dependent variable is actually due to the independent variable. Without internal validity, the results cannot be trusted.
- External Validity is the extent to which the results of the experiment carry over to the real world.
Test Marketing reduces risk by assessing product performance before a full-scale launch. The Detective Funnel is the combination of all three research designs.

Chapter 4 (Secondary Sources of Marketing Data)

There are two types of data, primary and secondary data.
- Primary data is collected directly by the researcher for a specific project and is used to address the research problem at hand. Examples are experimental research, surveys, and focus groups.
- Secondary data is collected by someone other than the user, for a purpose other than solving the problem at hand. Examples are the Census, transaction records, and government reports.
There are two types of Primary data: Observational data and Questionnaire data.
Types of secondary data:
- Internal Secondary data
- External Secondary data
Internal secondary data is collected within an organization. Examples are sales records, invoices, and customer complaints.
External secondary data is collected outside of an organization. Examples include government reports and market research studies.
Key sources of External Secondary data are:
- Published sources
- Official Statistics
- Data Aggregators
Advantages of Secondary Data
1. Cost-Effective (Cheaper than Primary Data)
2. Timesaving (Requires less time compared to Primary Data)
3. Wide Availability
Disadvantages of Secondary Data
1. Incompatible Reporting Units (Data may not match)
2. Mismatched Measurement Units (Differences in how data is measured)
3. Outdated Data
4. Lack of Credibility
Packaged Information is a type of External Secondary data where the collection process and/or the data itself are prepackaged for all users. Packaged information has two broad classes: Syndicated Data and Packaged Services.
Syndicated data is a data set that is shared with multiple subscribing companies.
Packaged services are predefined research processes that are applied to generate information for individual clients.
Advantages of Syndicated data:
1. High quality data
2. Cost-Effective
3. Quick Access
Disadvantages of Syndicated Data:
1. Lack of Customization (data is standardized)
2. Long-term contracts (minimum 3-year contract)
3. Easy Access to Information (competitors can obtain the same data)
Digital Tracking Data is another form of External Secondary Data. This type of data is generated from users’ online activities such as browsing websites, using apps, or interacting with ads.

Chapter 5 (Qualitative Research Techniques)

There are two main types of research methods, plus a third approach that combines them:
1. Qualitative Research – Explores emotions, opinions, and motivations through non-numerical data.
2. Quantitative Research – Focuses on numerical data and statistical analysis.
3. Mixed-Methods Research – Combines both Qualitative and Quantitative research.
Characteristics of Quantitative Research:
1. Uses surveys, polls, or experiments
2. Data is measurable and testable
Characteristics of Qualitative Research:
1. Open-ended questions and observations
2. Responses are unique and can be categorized as being negative, positive, or neutral
Qualitative Research: Observational Techniques
Observational Techniques are one form of Qualitative Research. They involve watching and recording people, objects, or activities in an organized way. Observation can be classified in 3 ways:
1. Direct vs. Indirect
2. Covert vs. Overt
3. Structured vs. Unstructured
Direct vs. Indirect Observation
- Direct Observation observes behavior as it happens in real time. An example would be watching how shoppers select fresh produce.
- Indirect Observation observes the effects or outcomes of past behaviors. An example would be Waste Management analyzing trash to study recycling habits that could impact the environment.
Covert vs. Overt Observation
- Covert Observation: customers are unaware that they are being observed. An example would be hidden cameras recording shoppers’ behavior. An advantage is that it captures authentic, unaltered behavior.
- Overt Observation: customers are aware that they are being observed. An example would be tracking TV viewership.
Structured vs. Unstructured Observations
- Structured Observation identifies predefined behaviors and records them using forms or checklists. An example would be monitoring how customers interact with products on display. An advantage is that it is efficient and focuses on specific behaviors.
- Unstructured Observation has no restrictions; all behaviors within the environment are recorded. An example would be watching children play with Legos. An advantage is that it is more flexible.
Advantages of Observational Techniques:
1. Natural Behavior (Subjects are unaware that they are being watched)
2. No Recall Errors (Real-Time observations)
3. Cost-Effective
4. Unique Insights
5. Complementary Methods
Disadvantages of Observational Techniques
1. Small Sample Size
2. Subjective Interpretation (Observer Bias can affect conclusions)
3. Lacks Depth (Cannot uncover motives, attitudes, and intentions)
4. Restricted Scope (Only suitable for behavior that occurs in public)
5. Observer Effect (Awareness of observation may alter behavior in overt studies)
Focus Groups
Key Components of Focus Groups are:
- Moderator skills (Excellent observation and communication skills)
- Focus Group Reports (Summarize participants’ insights into categories and themes)
- Use of results (Exploring needs, attitudes, and preferences)
Advantages of focus groups
1. Encourages open discussions
2. Provides rich qualitative data that reveals customers’ needs
Disadvantages of focus groups
1. Small sample sizes
2. Discussions can be influenced by dominant participants
3. Requires skilled moderators
Focus Groups should be used when the objective is to explore or describe, but not to predict. They are also useful for gaining consumer insights based on how consumers react to a specific product, and for generating ideas.
Focus groups SHOULD NOT be used to predict the outcome due to small samples and should also not be used for Quantitative Analysis because focus groups do not provide standardized, measurable data.
Ethnographic Research
Ethnographic Research is a qualitative research method that provides a descriptive study of a group’s behavior and characteristics. Its purpose is to gain a deep understanding of consumer behavior in a natural setting over a period of time.
Methods of ethnographic research:
1. Immersion (Researchers embed themselves in participants’ environments)
2. Participant Observation (Observing and interacting with participants in a natural setting)
3. In-Depth Interviews
4. Shopalongs (Accompanying shoppers to observe their behavior during shopping trips)
Advantages of Ethnographic Research
- Provides rich contextual insights into consumer behavior
- Captures real life interactions with products and services
- Helps to identify unmet needs
Disadvantages of ethnographic Research
- Time intensive and requires prolonged observation
- Relies on researcher’s interpretation, which can be biased
- Mobile ethnography may miss unconscious behaviors
Marketing Research Online Communities (MROC)
A Marketing Research Online Community (MROC) is a group of respondents who come together online to interact, provide opinions, and complete tasks for research purposes. Participants can interact by sharing photos, posts, and videos.
Key Benefits of MROC:
1. Cost effective and Flexible
2. Rich Data Collection (Captures multimedia responses like videos and posts)
3. Convenience for Participants
4. Targeted Insights
5. Global Reach
Challenges of MROC:
1. Engagement drop off (Participants may lose interest over time)
2. Representation Issues
3. Data Overload
4. Limited Moderator Control
There are additional qualitative techniques which are:
- Neuromarketing
- Projective Techniques/indirect approach
- In-Depth Interviews (IDI)
An In-Depth Interview (IDI) is a one-on-one interview that explores the thoughts and behaviors of the respondent. An advantage is that it generates rich, detailed responses. A challenge is that it requires skilled interviewers.
Projective Techniques (the indirect approach) are used when an individual is reluctant to share opinions openly in a formal discussion. Examples are Third-Person Techniques, Word Association, and Completion Tests.
Neuromarketing is the study of how the brain and body react to marketing, like ads or products. Key neuromarketing Techniques:
1. Neuroimaging (e.g., MRI)
2. Eye tracking
3. Facial Coding

Chapter 6 (Survey Data Collection Methods)

A survey is a method used to collect information from people by asking them structured verbal or written questions. Surveys normally have fixed and standardized questions, meaning everyone answers the same set of questions. Surveys are mainly descriptive, so they are used primarily for descriptive research.
Advantages of Surveys
- Standardization
- Efficiency (Quickly gather large data sets from broad samples)
- Easy Administration (It is simple for interviewers and respondents)
- Motivation Insights
- Quick Analysis
- Subgroup Insights
Disadvantages of surveys
- Validity loss (Fixed-response formats may fail to capture true beliefs)
- Limited Depth (May not capture complex opinions or feelings)
- Response Bias (Respondents may give socially desirable responses rather than honest ones)
- Sampling Issues
- Question Misinterpretation
Sources of Error in Survey Methods: There are four types of errors when conducting a survey:
1. Sampling Errors
2. Measurement Errors
3. Measurement Instrument Error
4. Processing Error
Sampling Errors happen when the sample used does not fully represent the population we want to study. There are three types of sampling error: Frame Error, Population Specification Error, and Selection Error.
- Frame Error occurs when the sampling frame is incomplete or inaccurate.
- Population Specification Error happens when the wrong group is chosen for a survey. An example would be including high school students in a survey about a liquor store.
- Selection Error happens when the sampling process isn’t done correctly.
Measurement Error occurs when the data that was collected doesn’t match what we need. There are two types of Measurement Error: Surrogate Information Error and Interviewer Error.
- Surrogate Information Error occurs when the researchers collect information that doesn’t solve the problem.
- Interviewer Error occurs when the interviewer influences the respondents’ answers, consciously or unconsciously. This could happen because of the interviewer’s age, gender, or facial expressions.
Measurement Instrument Error results from a poorly designed questionnaire. An example would be questions that are unclear or easy to misinterpret.
Processing Error happens when survey data is transferred incorrectly to a computer. An example would be scanning a document incorrectly.
Data Collection Methods: Interviewer and Computer Technology
- When there is no computer but there is an interviewer, the method is called a Person-Administered Survey. When there is a computer and an interviewer, the method is called a Computer-Assisted (Person-Administered) Survey.
- When there is no computer and no interviewer, the method is called a Self-Administered Survey. When there is a computer and no interviewer, the method is called a Computer-Administered Survey.
1. A Person-Administered Survey takes place in person or over the phone with no computer present. This method can take place at home (In-Home Interviews), at the mall (Mall-Intercept Interviews), in the office (In-Office Interviews), or over the phone (Telephone Interviews).
Advantages of Person-Administered Surveys:
  1. Feedback
  2. Quality Control
  3. Adaptability
Disadvantages of Person-Administered Surveys:
  1. Human Error
  2. Slow Speed
  3. High Cost
2. Computer-Assisted Surveys come in two types: CATI (computer-assisted telephone interview) and CAPI (computer-assisted personal interview).
Advantages of Computer-Assisted Surveys:
  1. Speed
  2. Error-Free Interviews (Computers ensure that there are no errors in question sequencing)
  3. Use of Images and Audiovisuals (e.g., showing a video of a new product during a survey for feedback)
Disadvantages of Computer-Assisted Surveys:
  1. High Setup Costs
  2. Technical Skill Requirement (Interviewers need training to operate the systems effectively)
3. A Self-Administered Survey is one that respondents complete on their own, without the aid of an interviewer or a computer system. Self-Administered Surveys can be conducted in different ways, such as Group Self-Administered Surveys, Drop-Off Surveys, and Mail Surveys.
  - Group Self-Administered Surveys are conducted in a group setting.
  - Drop-Off Surveys are delivered to the respondent for later completion and return.
  - Mail Surveys are sent via postal mail for respondents to complete and return.
Advantages of Self-Administered Surveys:
  1. Reduced Cost (No need for interviewers or computer systems), e.g., paper surveys
  2. Respondent Control (Respondents can complete the survey at their own pace)
  3. Reduced Interviewer Evaluation Apprehension (Ideal for sensitive topics), e.g., an anonymous health behavior survey
Disadvantages of Self-Administered Surveys:
  1. Respondent Control (Risk of incomplete responses, errors, or delayed return)
  2. Lack of Monitoring (No researcher available to clarify certain questions)
  3. High Questionnaire Requirements (Requires clear instructions about the survey), e.g., a poorly designed questionnaire may frustrate the respondent and reduce completion rates
4. A Computer-Administered Survey is one in which a computer asks the questions and records the respondents’ answers. This type of survey does not require an interviewer.
Ways of Conducting a Computer-Administered Survey:
  1. Online Surveys
  2. Interactive Voice Response (IVR), e.g., post-call customer satisfaction surveys (“press 5 for very satisfied”)
Advantages of Computer-Administered Surveys:
  1. User-Friendly (Easy to design and use)
  2. High Efficiency
  3. No Interviewer
Disadvantages of Computer-Administered Surveys:
  1. Requires Computer-Literate Respondents
  2. Respondent Misrepresentation

Chapter 7 (Survey Design/Attitude Measurement)

Measurement is the process of quantifying, that is, assigning numbers to, the properties of objects such as consumers, brands, stores, or advertisements. Key Characteristics of Standardized Measurements are Consistency and Uniformity.
Objects are the entities that are being studied during the research process.
Properties are the specific features of an object; examples include Brand Loyalty and Customer Satisfaction.
Objective vs. Subjective Properties in Measurement
Objective Properties are observable properties that are tangible and physically verifiable; examples are Age, Income, and Number of bottles purchased.
Subjective Properties are unobservable mental constructs such as attitudes or intentions; examples are Customer Satisfaction, Brand Loyalty, and Purchase Intent. A measurement scale for these would be rating a product from 1 to 5.
Types of Measurement Scale
There are four types of Measurement Scale:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
- A Nominal Scale is used to label or classify data into categories. Examples include:
  1. Demographic Data (Gender, Race, Religion)
  2. Behavioral Data (Brand Last Purchased)
  3. Other categories like Occupation or buyer/nonbuyer
  This type of data is only used for descriptive research.
- An Ordinal Scale allows respondents to rank-order a set of products or brands. For example, ranking different shoe brands from 1 to 4.
- An Interval Scale allows a respondent to rate a subjective property, such as a product’s features, on a scale with equal distances between points. An example would be rating Starbucks’ coffee taste on a scale of 1 to 5. NOTE: There is no true zero point in an interval scale.
- A Ratio Scale includes all the properties of nominal, ordinal, and interval scales, with the addition of a true zero point. An example would be “How many pairs of shoes do you have?”, where the answer choices start at 0 (0, 1, 2, and so on).
- The Interval Scale is commonly used in marketing research; it measures constructs like customer satisfaction, brand loyalty, and purchase intentions.
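The four scale types can be mirrored in R's data types. The sketch below uses made-up responses and hypothetical variable names; it only illustrates which operations are meaningful at each level, not a required way to store survey data.

```r
# Nominal: unordered categories (hypothetical "brand last purchased" answers)
brand <- factor(c("Nike", "Adidas", "Nike", "Puma"))
print(table(brand))                    # counting is the meaningful summary

# Ordinal: ordered categories (hypothetical brand rankings)
preference <- factor(c("2nd", "1st", "3rd", "1st"),
                     levels = c("1st", "2nd", "3rd"), ordered = TRUE)
print(preference[1] > preference[2])   # order comparisons are meaningful

# Interval: 1-5 ratings treated as equally spaced, with no true zero
taste_rating <- c(4, 5, 3, 4)
mean_rating <- mean(taste_rating)
print(mean_rating)

# Ratio: counts with a true zero, so ratios are meaningful
pairs_of_shoes <- c(0, 2, 4, 8)
shoe_ratio <- pairs_of_shoes[4] / pairs_of_shoes[3]   # "twice as many"
print(shoe_ratio)
```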
A Likert scale is a type of scale that measures a respondent’s level of agreement or disagreement with statements. It has properties of both ordinal and interval scales.
A Semantic Differential Scale is a type of scale that measures subjective perceptions like attitudes and emotions towards an object, concept or experience. Key features of Semantic Differential Scale are:
1. Bipolar Adjectives
2. Continuum of Intensity
3. Quantifiable results
4. Random flipping of adjective pairs (e.g., high price vs. low price for one item, low price vs. high price for another)
A Stapel Scale is a unipolar rating scale that measures respondents’ attitudes towards an object by using a single adjective and a numerical scale. An example of this is: “How would you rate the helpfulness of Chase Customer Service?”
+5  +4  +3  +2  +1  0  -1  -2  -3  -4  -5
The 0 represents Neutrality, if included.
A Slider Scale is an interactive graphical scale where respondents indicate their answers by dragging a slider to a specific value on a continuum. Advantages of Slider Scales are:
1. Engaging
2. Efficient
3. Mobile-Friendly
Disadvantages of Slider Scales:
1. Learning Curve (May confuse some respondents)
2. Bias Risk
3. Uncertain Quality
Issues with Interval Scales:
1. Neutral option response
2. Symmetric vs. Nonsymmetric
Symmetric Scales are balanced, with equal numbers of positive and negative points. Nonsymmetric Scales focus only on positive responses and omit negative responses.

Chapter 8 (Designing a Questionnaire)

A Questionnaire is a tool that is used to collect data by presenting standardized questions to respondents. There is a six-step process for designing a questionnaire:
1. Define Research Objectives
2. Develop Questions
3. Determine Question Flow (move from general to specific questions and end with sensitive topics)
4. Pretest the Questionnaire
5. Client Review and Approval
6. Launch Survey
The four Do’s of Question Wording:
1. Focused: Address a single issue or topic
2. Brief: Avoid unnecessary wording
3. Grammatically Simple: Use short sentences with one subject
4. Make your question Crystal Clear: Make sure your questions are understandable to the respondents.
The four Do Not’s of Question Wording:
1. Leading Questions (Don’t you worry when using your credit card online?)
2. Loaded Questions (Should people be allowed to protect themselves from harm by using a taser in self-defense?)
3. Double-Barreled Questions (Were you satisfied with the restaurant’s food and service?)
4. Overstated Questions (How much would you pay for sunglasses that protect against UV rays known to cause blindness?)
Question Flow for a Questionnaire
The introduction is a critical part of the questionnaire design because it sets the stage for the survey.
Five key functions of an Introduction:
1. Who is doing the survey?
2. What is the survey about?
3. How did you select me?
4. Motivate me to participate?
5. Am I qualified to take part?
  - Who is doing the survey? – The introduction should clearly state who is conducting the survey. This includes introducing the interviewer or any sponsoring team. In an undisguised survey, the sponsor is disclosed; in a disguised survey, the sponsor is withheld to avoid influencing responses.
  - What is the survey about? – The introduction should clearly state the general purpose of the survey in a simple way. Avoid lengthy wording.
  - How did you select me? – The introduction should explain why this respondent was chosen, for example, “You were selected at random.”
  - Motivate me to participate? – The introduction should politely ask for participation, for example, “Would you mind answering a few questions for me?” Offer an incentive if possible, such as monetary rewards, product samples, or discounts. Address privacy concerns, for example by guaranteeing anonymity and confidentiality.
  - Am I qualified to take part? – The introduction should contain screening questions to determine whether the respondent qualifies for the survey.

Chapter 10 (R Studio)

R Studio – An Integrated Development Environment (IDE) for R that provides a user-friendly interface.
R Studio has four main windows, each serving a unique purpose:
1. Script Files
2. Environment
3. Console
4. Misc
Script Files – This window saves your scripts, allows code and comments, and can have multiple files open at a time.
Environment – This window holds your objects and lets you review your command history.
Console/Command Line – This window can be used as a calculator; it does not save code, and it is where your output is displayed.
Misc – This window displays the files in the working directory, shows plots when they are produced, and helps with searching for files.
- Script Editor: Write and save R scripts. Great for longer code blocks.
- Workspace Environment: View saved data objects and command history.
- Console: Run R commands directly here. Results appear instantly.
- Files/Plots/Packages/Help: Access files, view plots, manage packages, and read documentation.
Key Features of the Script Editor
The Script Editor (top-left window in R Studio) is where you write and manage code you want to keep and refine.
Key Features of Script Editor:
- Syntax Highlighting: Easily Distinguish code elements.
- Code Completion: Speeds up coding by suggesting code options.
- Multiple-File Editing: Switch between open scripts effortlessly.
- Find/Replace: Quickly search and replace text in scripts.
Workspace Environments
Workspace Environment (top-right window) displays your current R working environment, including any user-defined objects.
Commands Used When Managing Objects:
- $ls()$ – Lists all objects in the workspace environment
- $rm(x)$ – Removes the object $x$ from the workspace environment
- $rm(list = ls())$ – Removes all objects from the workspace environment
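A short sketch of these commands in action. To avoid clearing a real workspace, it runs them against a separate, hypothetical environment called `demo_env` (an assumption made for this example); in the console you would normally call `ls()` and `rm()` without the `envir` argument.

```r
# Use a separate environment so the demo does not clear your real workspace
demo_env <- new.env()
assign("x", 1:5, envir = demo_env)
assign("y", "hello", envir = demo_env)
assign("z", c(a = 1, b = 2), envir = demo_env)

before <- ls(envir = demo_env)                      # list all objects
print(before)

rm("x", envir = demo_env)                           # remove a single object
after_one <- ls(envir = demo_env)
print(after_one)

rm(list = ls(envir = demo_env), envir = demo_env)   # remove all objects
after_all <- ls(envir = demo_env)
print(length(after_all))
```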
User-Defined Objects include:
- Vectors, Matrices, Data Frames, Lists, Functions
Miscellaneous Displays
The bottom-right window in R Studio has multiple useful tabs:
- Files: Shows available files in your working directory.
- Plots: Displays any plots or graphics generated by your code.
- Packages: Lists all installed packages and shows which ones are currently loaded.
- Help: Search for help topics or view help documentation for commands.

Chapter 11 (R Basics 2)

tidyverse – A collection of R packages designed for data science. It allows users to write simple, readable, and efficient code. It is essential for data-wrangling tasks throughout this course.
Core Tidyverse Packages
$library(tidyverse)$ loads the tidyverse, which includes:
- dplyr: Data Manipulation
- ggplot2: Visualization
- tidyr: Data tidying
- readr: Data Import
- tibble: Enhanced data frames
- forcats: Categorical variable handling
- stringr: String manipulation
- purrr: Functional Programming
- lubridate: Date/Time Management
$head(mpg)$ – Shows the first six rows of the mpg dataset ($head()$ returns the first six rows by default).
Pipe ( $%>%$ ) Operator
The “pipe” operator is provided by the tidyverse. It forwards a value, or the result of an expression, into the next expression. The pipe operator ( $%>%$ ) is read as “and then”.
Why Use $%>%$ ?
- Improves Readability: Code flows from top to bottom, like natural reading, which makes it easier to debug and modify.
- Reduces Complexity: Avoids deeply nested function calls.
- Increases Maintainability: Each step of the operation is clear and self-contained.
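To make the readability point concrete, here is a small comparison of nested calls and the equivalent piped version. The `ratings` vector is made-up data, and the example loads dplyr only because it makes `%>%` available; any package that provides the pipe would do.

```r
suppressPackageStartupMessages(library(dplyr))  # makes %>% available

ratings <- c(4, 5, 3, 4, 2)   # hypothetical customer ratings

# Nested calls: must be read from the inside out
nested <- round(sqrt(mean(ratings)), 2)

# Piped calls: read top to bottom, "and then"
piped <- ratings %>%
  mean() %>%
  sqrt() %>%
  round(2)

print(identical(nested, piped))
```

Both versions compute the same value; the piped form simply lists the steps in the order they happen.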
Transforming Data with dplyr
dplyr is part of the tidyverse. It is designed for tasks like manipulating, sorting, summarizing, and joining data frames. It also uses clear and easy-to-read syntax, which makes data transformation faster and less error prone.
$select()$ – The select function in the dplyr package reduces a data frame to only the variables needed for the current task.
$mutate()$ – The mutate function creates new variables (new columns) in existing data.
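A minimal sketch of `select()` and `mutate()` chained with the pipe. It uses the `mtcars` dataset that ships with base R rather than `mpg`, purely so that only dplyr needs to be loaded; the new column name `wt_lbs` is a made-up name for this illustration.

```r
suppressPackageStartupMessages(library(dplyr))

# mtcars ships with base R: mpg = miles per gallon, wt = weight in 1000 lbs
small_cars <- mtcars %>%
  select(mpg, cyl, wt) %>%      # keep only the three columns needed
  mutate(wt_lbs = wt * 1000)    # add a new column derived from wt

print(head(small_cars))
```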
Why Data Visualization?
- It helps to understand patterns and trends
- Detecting Outliers
- Simplifying Complexity
ggplot2 – A package used to construct charts and plots from data.
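As an illustration, a basic scatter plot built with ggplot2 might look like the sketch below. It uses the `mpg` dataset bundled with ggplot2; the choice of variables and axis labels is just one example.

```r
library(ggplot2)

# mpg is bundled with ggplot2: displ = engine size, hwy = highway miles per gallon
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

print(p)   # in R Studio, the plot appears in the Plots tab
```

Each `+` adds a layer or setting to the plot, so charts are built up step by step.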

Chapter 12&13 (Descriptive Analysis)

Descriptive Analysis – Provides a summary of data to create an overall picture (e.g., average customer ratings).
Coding Survey Responses
Before starting any statistical analysis, we must convert survey responses to numeric values; this process is called “Coding”. There are two types of questions that must be coded: Closed-Ended Questions and Open-Ended Questions.
- Closed-Ended Questions are easier to code; responses are predefined, making it straightforward to enter and analyze data.
  - Assign Numeric Values: Each response option is given a specific numeric code (e.g., 1 = yes, 2 = no).
- Open-Ended Questions are more complex to code; responses vary widely, creating a lengthy list of possible answers.
  - Qualitative Analysis: Coding requires categorizing or grouping similar responses, which is time-consuming and can introduce subjective interpretation.
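Both kinds of coding can be sketched in base R. The answers, the codebook, and the keyword rule below are all hypothetical; real open-ended coding usually needs human judgment rather than a single keyword match.

```r
# Closed-ended: hypothetical answers to a yes/no question
answers <- c("yes", "no", "yes", "yes", "no")

# Codebook: 1 = yes, 2 = no
codebook <- c(yes = 1, no = 2)
coded <- unname(codebook[answers])
print(coded)

# Open-ended: group free-text comments into themes with a simple keyword rule
comments <- c("Too expensive", "Great price", "Slow delivery")
theme <- ifelse(grepl("price|expensive", comments, ignore.case = TRUE),
                "price", "other")
print(theme)
```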
Purpose of Descriptive Analysis
1. Provides an overview of the data, helping to summarize large datasets.
2. Sets the foundation for deeper analysis and insight generation.
Two key types of measures describe the information obtained in a sample:
1. Measures of Central Tendency – these describe the “typical” respondent or response (e.g., mean, median, mode).
2. Measures of Variability – These describe how similar or different respondents or responses are to the “typical” ones (e.g., range, variance, standard deviation, Frequency/Percentage Distribution).
A frequency distribution is a table that shows how often each unique value appears within a data set.
A percentage distribution is derived by dividing each frequency by the total number of observations and then converting it to a percentage. This helps to express the relative proportion of each value in the data set.
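In R, both distributions can be produced with base functions. The ratings below are made-up values on a 1-5 scale.

```r
# Hypothetical customer ratings on a 1-5 scale
ratings <- c(5, 4, 4, 3, 5, 5, 2, 4, 5, 3)

# Frequency distribution: how often each unique value appears
freq <- table(ratings)
print(freq)

# Percentage distribution: each frequency divided by the total, times 100
pct <- round(100 * prop.table(freq), 1)
print(pct)
```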
Range – Identifies the distance between the lowest value (minimum) and the highest value (maximum) in an ordered set of values.
- $\text{Range} = \text{Maximum} - \text{Minimum}$
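The measures of central tendency and the range can be computed in base R as sketched below, again with made-up ratings. Note that R's built-in `mode()` reports an object's storage type rather than the statistical mode, so the most frequent value is taken from a frequency table here.

```r
# Hypothetical customer ratings on a 1-5 scale
ratings <- c(5, 4, 4, 3, 5, 5, 2, 4, 5, 3)

mean_value   <- mean(ratings)     # arithmetic average
median_value <- median(ratings)   # middle value of the sorted data

# Most frequent value, taken from the frequency table
mode_value <- as.numeric(names(which.max(table(ratings))))

# Range = Maximum - Minimum
range_value <- max(ratings) - min(ratings)

print(c(mean = mean_value, median = median_value,
        mode = mode_value, range = range_value))
```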
The Standard Deviation measures how much values vary around the mean.
- A low standard deviation means values are close to the mean, while a high standard deviation shows a greater spread.
- 68% - values fall within one standard