What does it mean that biology is now a “data science”?
Modern biology relies heavily on collecting, analyzing, and interpreting large datasets using computational tools
How are computers involved in modern biological and medical research?
Computers are used in every step, including experimental design, data collection, data analysis, visualization, and interpretation
What is a computer, according to the lecture definition?
A machine that stores information, manipulates information according to rules, and produces new information as output
What are the three basic steps everything a computer does reduces to?
Input, processing, and output
What are binary digits (bits)?
The smallest units of information in a computer, represented as 0 or 1
Why do computers use binary instead of continuous values?
Binary is robust to noise and easy to represent physically as on/off or high/low voltage states
What is the role of the central processing unit (CPU)?
The CPU executes instructions, including arithmetic, logical comparisons, and data movement
What does it mean that modern CPUs have multiple cores?
They can perform multiple operations in parallel, increasing computational speed
What is random access memory (RAM)?
Temporary, fast-access memory that stores active data, variables, and intermediate results
Why is RAM considered volatile?
Its contents are lost when the computer is powered off
What is long-term storage used for?
Storing data and programs permanently, even when the computer is powered off
How does storage differ from RAM?
Storage is much larger and non-volatile but significantly slower to access than RAM
What is the operating system (OS)?
Software that manages hardware resources, schedules programs, allocates memory, and handles files and errors
What is code in the context of computing?
A precise, human-readable set of instructions that tells a computer exactly what to do
What is R and how does it compare to other languages?
R is a programming language designed for statistics, data analysis, and visualization; it excels at reproducible analysis and scientific graphics, whereas Python is more general-purpose and point-and-click work in Excel is hard to reproduce
What is swirl in R?
An interactive R package used to learn R by typing responses directly into the console
Where is swirl used in RStudio?
Only in the Console pane, not the Script pane
What do file extensions define?
The type of data contained in a file and which programs can open it
What are the four fundamental data types in R?
Numeric, character, logical, and missing values (NA); strictly, NA is a special marker for missing data rather than a type of its own
What is an object (variable) in R?
A container that stores data and has a name, value, and data type
How are objects created in R?
Using the assignment operator <-
What does the class() function do?
It returns the class of an object, which for basic values is its data type (for example "numeric" or "character")
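Example (a minimal sketch of creating objects and checking them with class(); the object names and values are made up for illustration and are not from the lecture):

```r
# Create objects with the assignment operator <-
gene_count    <- 42        # numeric
gene_name     <- "BRCA1"   # character (note the quotes)
is_expressed  <- TRUE      # logical
missing_value <- NA        # missing value

# class() reports what kind of data each object holds
class(gene_count)     # "numeric"
class(gene_name)      # "character"
class(is_expressed)   # "logical"
class(missing_value)  # "logical" (a bare NA is stored as a logical value)
```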
What is a vector in R?
An object that stores a sequence of values of the same data type
How are vectors created in R?
Using the c() function
What is indexing in R?
Accessing elements of a vector using square brackets
What types of values can be used for indexing?
Numeric indices or logical values
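Example (a small sketch of building a vector with c() and indexing it both ways; the vector name and values are invented for illustration):

```r
# Build a numeric vector with c()
expression <- c(2.5, 0.1, 7.8, 3.3)

# Numeric indexing: pick elements by position (R counts from 1)
expression[2]         # 0.1
expression[c(1, 3)]   # 2.5 7.8

# Logical indexing: keep the positions marked TRUE
expression[c(TRUE, FALSE, TRUE, FALSE)]   # 2.5 7.8

# A comparison produces a logical vector, so it can be used as an index
expression[expression > 3]                # 7.8 3.3
```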
What is a function in R?
A small program that takes inputs called arguments and returns output
How can ranges of values be created in R?
Using the colon operator or the seq() function
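Example (a brief sketch of calling functions with arguments and creating ranges; the object names are made up for illustration):

```r
# The colon operator creates consecutive whole numbers
one_to_five <- 1:5                              # 1 2 3 4 5

# seq() is a function; from, to, and by are its arguments
quarters  <- seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
countdown <- seq(from = 10, to = 2, by = -4)    # 10 6 2

# mean() takes a vector as input and returns one number as output
mean(one_to_five)                               # 3
```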
What are error messages in R?
Messages produced when R cannot run a command due to syntax or logical problems
How do warnings differ from errors in R?
Warnings do not stop the program but indicate something unusual may have occurred
What does the error “object not found” mean?
R cannot find an object with that name in memory
What causes errors from using the wrong data type?
Providing a function with a type of data it cannot operate on
Why does forgetting quotes around strings cause an error?
R interprets unquoted text as an object name instead of character data
What causes missing or extra parentheses errors?
An unmatched opening or closing parenthesis in a command
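Example (a hedged sketch that triggers the warning and the common errors from the cards above; the object names are invented, and try() is used here only so the script keeps going after each error):

```r
# Warning: R finishes the command but flags the result
x <- as.numeric("abc")   # warning: NAs introduced by coercion
x                        # NA

# Error: R stops the command. Wrapping each call in try() prints the
# error message and lets the next line run.
try(undefined_object + 1)    # object 'undefined_object' not found
try("5" + 1)                 # wrong data type: non-numeric argument
try(print(hello))            # missing quotes: hello is read as an object name

# A missing closing parenthesis is a syntax error. parse() is used here
# to check the text of a command without running it.
try(parse(text = "mean(c(1, 2)"))   # unexpected end of input
```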
What are the two key functions for using packages in R?
install.packages() to install a package and library() to load it into memory
What are SNP tables used for in biology?
To organize genotype data with rows as samples and columns as genomic loci
What does geographic occurrence data represent in a matrix?
Locations arranged in rows and columns corresponding to geographic points or samples
What is gene expression data commonly stored as in R?
A matrix with rows as samples and columns as genes
Define a matrix in R.
A 2D array where all elements must be of the same data type
What does SNP stand for and mean?
Single Nucleotide Polymorphism; a position in the genome with variation among individuals
What is polymorphism in genetics?
The existence of multiple alleles or genetic variants within a population
What distinguishes gene expression matrices from other data?
They quantify gene activity levels across samples, arranged in matrix form
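Example (a minimal sketch of a gene expression matrix with rows as samples and columns as genes; the sample names, gene names, and numbers are all made up for illustration):

```r
# Toy expression matrix: 2 samples (rows) by 3 genes (columns)
expr <- matrix(
  c(5.1, 0.0, 2.3,
    4.8, 1.2, 2.9),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sample1", "sample2"),
                  c("geneA", "geneB", "geneC"))
)

dim(expr)                  # 2 3 (rows, then columns)
expr["sample2", "geneB"]   # 1.2

# All elements of a matrix share one data type: mixing numbers and
# text turns every element into character
mixed <- matrix(c(1, "A", 2, "G"), nrow = 2)
class(mixed[1, 1])         # "character"
```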
What is a data frame in R?
A 2D data structure that can hold different data types in each column
How do you create a data frame in R?
Using the data.frame() function with vectors as columns
What is the purpose of the $ operator in R?
To access a specific named column in a data frame
How do you index elements in matrices or data frames?
Using square brackets [row, column]
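Example (a small sketch of building a data frame and pulling values out of it; the column names and values are invented for illustration):

```r
# Each column is a vector; columns can have different data types
samples <- data.frame(
  id         = c("s1", "s2", "s3"),
  genotype   = c("AA", "AG", "GG"),
  expression = c(2.5, 7.8, 0.4),
  stringsAsFactors = FALSE   # keep text columns as character in older R versions
)

# $ pulls out one named column as a vector
samples$genotype           # "AA" "AG" "GG"

# [row, column] picks individual cells, whole rows, or whole columns
samples[2, "expression"]   # 7.8
samples[2, 3]              # 7.8 (same cell, by position)
samples[1, ]               # first row, all columns
samples[, 1]               # all rows of the first column
```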
What are logical operators used for in R?
To evaluate expressions and return TRUE or FALSE values
What is a logical expression?
A statement in R that evaluates to TRUE or FALSE
Name some common comparison operators in R.
== (equal), != (not equal), <, >, <=, >=
What does the logical AND operator (&) do?
Returns TRUE only if both sides of the expression are TRUE
What does the logical OR operator (|) do?
Returns TRUE if either or both sides of the expression are TRUE
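Example (a short sketch of the comparison operators combined with & and |; the vector x is made up for illustration):

```r
x <- c(1, 5, 10)

# Comparisons are applied to each element and return TRUE or FALSE
is_five  <- x == 5           # FALSE  TRUE FALSE
not_five <- x != 5           #  TRUE FALSE  TRUE

# & is TRUE only where both sides are TRUE
between <- x > 1 & x < 10    # FALSE  TRUE FALSE

# | is TRUE where either side (or both) is TRUE
extreme <- x < 2 | x > 9     #  TRUE FALSE  TRUE
```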
Why should you avoid recycling vectors in logical operations?
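Because R silently repeats the shorter vector to match the longer one, which can cause unintended comparisons; it warns only when the longer length is not a multiple of the shorter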
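Example (a minimal sketch of how recycling can go wrong in a comparison; the vector names are made up for illustration):

```r
long_vec  <- c(1, 2, 3, 4)
short_vec <- c(1, 2)

# short_vec is reused as 1 2 1 2, so positions 3 and 4 are compared
# against values that were probably never intended. No warning appears.
silent <- long_vec == short_vec    # TRUE TRUE FALSE FALSE

# When the lengths do not divide evenly, R still recycles but warns:
# "longer object length is not a multiple of shorter object length"
warned <- c(1, 2, 3) == short_vec  # TRUE TRUE FALSE
```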
How can logical expressions be useful when working with biological data?
For filtering datasets based on conditions, such as genotypes or expression thresholds
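Example (a hedged sketch of filtering by genotype and by an expression threshold; the data frame, its column names, and the cutoff of 5 are all invented for illustration):

```r
samples <- data.frame(
  id         = c("s1", "s2", "s3", "s4"),
  genotype   = c("AA", "AG", "GG", "GG"),
  expression = c(2.5, 7.8, 0.4, 9.1),
  stringsAsFactors = FALSE
)

# Keep rows where expression is above the threshold
high <- samples[samples$expression > 5, ]
high$id      # "s2" "s4"

# Combine conditions with & to keep only AG samples above the threshold
ag_high <- samples[samples$genotype == "AG" & samples$expression > 5, ]
ag_high$id   # "s2"
```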