Flashcards designed to cover key concepts related to information systems and data management based on the lecture notes.
Wisdom
Creatively assess knowledge to develop innovative policies and procedures.
Knowledge
Use of information to determine reasons for consistent downward trends in sales.
Information
The way data is portrayed, giving insight into the underlying trends.
Data
Raw, unprocessed facts and figures from which information is derived.
Bits/Bytes
Storage is measured in bytes, while speed or bandwidth is measured in bits. There are 8 bits in 1 byte, so 60 bytes equal 480 bits.
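A minimal Python sketch of the bit/byte conversion, plus a transfer-time calculation showing why the units matter when storage (bytes) meets bandwidth (bits per second). It assumes decimal units (1 MB = 1,000,000 bytes, 1 Mbps = 1,000,000 bits per second) and ignores protocol overhead; the function names are illustrative.

```python
BITS_PER_BYTE = 8

def bytes_to_bits(n_bytes: float) -> float:
    """Convert a storage size in bytes to bits."""
    return n_bytes * BITS_PER_BYTE

def bits_to_bytes(n_bits: float) -> float:
    """Convert a bit count (e.g., from a bandwidth figure) to bytes."""
    return n_bits / BITS_PER_BYTE

def transfer_seconds(file_bytes: float, bandwidth_bps: float) -> float:
    """Seconds to move a file: size is in bytes, link speed is in bits per second."""
    return bytes_to_bits(file_bytes) / bandwidth_bps

print(bytes_to_bits(60))   # 480
print(bits_to_bytes(480))  # 60.0
# A 100 MB file over a 100 Mbps link takes 8 seconds, not 1
print(transfer_seconds(100_000_000, 100_000_000))  # 8.0
```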
Big Data
Data characterized by high volume; an estimated 80-90% of it is unstructured, particularly in text formats.
Diminishing Data Value
The principle that the value of data diminishes rapidly over time, exemplified by the 90/90 Data Use Rule.
90/90 Data Use Rule
A rule stating that 90% of data captured is never used, and of the 10% that is used, 90% is used within 90 days of capture.
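A worked example of the 90/90 rule as a small Python function, assuming a hypothetical 1,000 GB of captured data. Integer percentages are used so the arithmetic stays exact.

```python
def ninety_ninety(captured: float) -> dict:
    """Split an amount of captured data according to the 90/90 Data Use Rule."""
    used = captured * 10 / 100             # only 10% is ever used
    never_used = captured - used           # the other 90% is never used
    used_within_90_days = used * 90 / 100  # 90% of the used data is used within 90 days
    return {"never_used": never_used, "used": used,
            "used_within_90_days": used_within_90_days}

# Hypothetical example: 1,000 GB captured
print(ninety_ninety(1000))
# {'never_used': 900.0, 'used': 100.0, 'used_within_90_days': 90.0}
```

In other words, only 9% of everything captured (90 GB of 1,000 GB) is used in the first 90 days.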
On-demand self-service
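Consumers can unilaterally provision computing capabilities, such as server time and network storage, as needed without requiring human interaction with the service provider.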
Broad network access
Capabilities available over the network, accessible through standard mechanisms.
Resource pooling
Computing resources are pooled to serve multiple consumers via a multi-tenant model.
Rapid elasticity
Capabilities can be quickly provisioned and released to scale according to demand.
Measured service
Cloud systems automatically control and optimize resource use by leveraging a metering capability, so resource usage can be monitored, controlled, and reported.
OLTP (Online Transaction Processing)
Processes each transaction as it occurs, offering immediate results.
Batch Processing
Collects all transactions for a set period, then processes and updates data.
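To contrast the two modes, here is a minimal Python sketch with a hypothetical inventory and made-up item names. OLTP applies each sale as it occurs, so stock is always current. Batch processing queues the sales and applies them in one end-of-period run, so the data is stale until then.

```python
# Hypothetical sales for one period: (item, change in stock)
sales = [("widget", -2), ("gadget", -1), ("widget", -3)]

# OLTP: apply each transaction the moment it arrives
oltp_stock = {"widget": 10, "gadget": 5}
for item, change in sales:
    oltp_stock[item] += change  # stock is current after every sale

# Batch: collect transactions during the period, process them all at the end
batch_stock = {"widget": 10, "gadget": 5}
pending = []
for txn in sales:
    pending.append(txn)         # nothing is updated yet
stale_view = dict(batch_stock)  # what a user would see mid-period
for item, change in pending:    # end-of-period batch run
    batch_stock[item] += change
pending.clear()

print(oltp_stock, batch_stock, stale_view)
# {'widget': 5, 'gadget': 4} {'widget': 5, 'gadget': 4} {'widget': 10, 'gadget': 5}
```

Both approaches reach the same final state; the difference is when the results become available.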
Transaction Processing Systems (TPS)
Information processing divided into distinct, indivisible operations called transactions. Used in functional business systems for areas like production/operations (tracking materials), marketing/sales (managing sales orders), human resources (processing payroll), and finance/accounting (processing credits/debits).
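The indivisibility of a transaction can be illustrated with Python's built-in `sqlite3` module, used here as a stand-in for a TPS database. The accounts table and the `transfer`/`balances` helpers are hypothetical. A credit and a debit either both take effect or are both rolled back.

```python
import sqlite3

# Hypothetical accounts table; the CHECK constraint forbids negative balances
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one indivisible transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK failed, so the credit above is undone too
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

ok = transfer(conn, "alice", "bob", 30)       # alice 70, bob 80
failed = transfer(conn, "bob", "alice", 500)  # bob would go negative: rolled back
print(ok, failed, balances(conn))  # True False {'alice': 70, 'bob': 80}
```

The failed transfer leaves no partial credit behind, which is exactly the "undividable operation" property a TPS relies on for payroll or credit/debit processing.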
Analytics Environments
Systems designed for analyzing historical data to make business decisions, such as analyzing product defect rates in production or identifying profitable customer segments in sales.
ETL (Extract, Transform, Load) Systems
Processes for extracting, cleaning, and loading data into a data warehouse so analysts can run reports from an analytics environment to make decisions. This involves: 1. Extract: pulling data from transaction systems. 2. Transform: cleaning the data and calculating derived values such as totals. 3. Load: inserting the data into the data warehouse.
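The three ETL steps can be sketched in Python as follows. This is a toy under stated assumptions: the raw rows are hypothetical, the transform step's choice of cleaning rules and per-region totals is illustrative, and an in-memory `sqlite3` database stands in for the data warehouse.

```python
import sqlite3

def extract():
    """Extract: pull raw rows from a (hypothetical) transaction system."""
    return [
        {"region": " north ", "amount": "120.50"},
        {"region": "SOUTH", "amount": "80"},
        {"region": "north", "amount": ""},  # incomplete record
        {"region": "South", "amount": "19.50"},
    ]

def transform(rows):
    """Transform: clean inconsistent values and calculate totals per region."""
    totals = {}
    for row in rows:
        if not row["amount"]:
            continue  # drop records with a missing amount
        region = row["region"].strip().lower()  # normalize spacing and case
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return totals

def load(totals, warehouse):
    """Load: insert the cleaned results into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    warehouse.executemany("INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", totals.items())
    warehouse.commit()

warehouse = sqlite3.connect(":memory:")  # stands in for the data warehouse
load(transform(extract()), warehouse)
print(warehouse.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall())
# [('north', 120.5), ('south', 99.5)]
```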
4 V's of Data Analytics
Refers to Veracity, Velocity, Volume, and Variety:
Variety: The analytical environment has expanded to include big data and unstructured sources beyond enterprise systems.
Volume: Involves the analysis of large volumes of structured and unstructured data.
Velocity: The speed of access to reports and how quickly the database can respond, capture data, and return an answer; this speed often separates effective from ineffective analytics.
Veracity: Validating data and extracting insights that managers and workers can trust are key factors for successful analytics, emphasizing accuracy.
Circuit Switching
An older communication technology, originating with telephone calls, that is inefficient for digital transmission. It provides constant/predictable connections and is best for voice calls.
Packet Switching
A more efficient method for transferring data or voice, where files are broken into sequentially numbered packets, routed individually to their destinations, and then reassembled into their proper sequences upon reception.
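A minimal Python sketch of the packet-switching idea: a message is broken into sequentially numbered packets, the packets arrive out of order (simulated with `random.shuffle` in place of independent routing), and the receiver reassembles them by sequence number. The packet size and helper names are illustrative assumptions.

```python
import random

def packetize(message: str, size: int):
    """Break a message into sequentially numbered packets of at most `size` characters."""
    return [(seq, message[i:i + size]) for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Restore the original order by sequence number and join the payloads."""
    return "".join(payload for _, payload in sorted(packets))

packets = packetize("Packets may take different routes.", size=5)
random.shuffle(packets)  # simulate packets arriving out of order via different routes
print(reassemble(packets))  # Packets may take different routes.
```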
Bandwidth/Broadband
The maximum number of bits per second that can be transmitted, used to measure network speed and capacity.
Quality of Service (QoS)
A set of technologies that manage network traffic to ensure the performance of critical applications. Key aspects include:
Latency-Sensitivity: Pertains to real-time data like voice and high-quality video that requires immediate delivery.
Prioritizing Traffic: Giving priority to data and applications that are time-delay sensitive (e.g., Zoom calls, stock trading apps).
Throttling Traffic: Holding back other types of traffic to give latency-sensitive applications priority (e.g., unlimited data plans throttled after a certain usage).
Traffic Shaping: The ability to prioritize and throttle network traffic to control bandwidth and streaming for various services (e.g., email or file servers).
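Prioritizing and throttling can be modeled with a priority queue. The sketch below is a simplified strict-priority shaper, not how any particular router implements QoS: the priority classes are hypothetical, and a per-tick `budget` caps how many packets go out, so lower-priority traffic is held back.

```python
import heapq
from itertools import count

# Hypothetical priority classes: lower number = sent first
PRIORITY = {"voice": 0, "video": 1, "trading": 1, "email": 2, "file_transfer": 3}

class TrafficShaper:
    def __init__(self):
        self._queue = []
        self._order = count()  # tie-breaker keeps FIFO order within a class

    def enqueue(self, kind: str, packet: str) -> None:
        heapq.heappush(self._queue, (PRIORITY[kind], next(self._order), kind, packet))

    def send(self, budget: int):
        """Send up to `budget` packets this tick; everything else waits (is throttled)."""
        sent = []
        while self._queue and len(sent) < budget:
            _, _, kind, packet = heapq.heappop(self._queue)
            sent.append((kind, packet))
        return sent

shaper = TrafficShaper()
for kind, pkt in [("email", "e1"), ("file_transfer", "f1"), ("voice", "v1"),
                  ("video", "vid1"), ("voice", "v2")]:
    shaper.enqueue(kind, pkt)

first = shaper.send(budget=3)  # latency-sensitive traffic goes first
rest = shaper.send(budget=3)   # email and file transfer wait for the next tick
print(first)  # [('voice', 'v1'), ('voice', 'v2'), ('video', 'vid1')]
print(rest)   # [('email', 'e1'), ('file_transfer', 'f1')]
```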
Centralized Database Architecture
All data resides in one location, typically a single server or site. Offers a single point of control, is fast locally but slow globally, simple to implement, low cost, and provides better data quality control and IT security.
Distributed Database Architecture
Data is split and stored across multiple locations or servers, ideal for Big Data and global systems like Facebook's user data. It is fast locally everywhere but complex to manage and expensive, allowing both local and remote access using client/server architecture to process requests.
Descriptive Analytics
Creates a summary of historical data, yielding useful information and preparing data for further analysis, describing movement within a given period.
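A minimal descriptive-analytics sketch using Python's `statistics` module on hypothetical monthly sales figures: it only summarizes what already happened, including the movement within the period.

```python
from statistics import mean, median

# Hypothetical monthly sales for one quarter
monthly_sales = {"Jan": 1200, "Feb": 1350, "Mar": 1100}

values = list(monthly_sales.values())
summary = {
    "total": sum(values),
    "mean": mean(values),
    "median": median(values),
    "change_first_to_last": values[-1] - values[0],  # movement within the period
}
print(summary)
```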
Predictive Analytics
Uses data analytics methods to model and make predictions, forecasting unknown events from existing data.
Prescriptive Analytics
Utilizes optimization technology and machine learning to find the best course of action among available choices, given specific parameters.
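A toy prescriptive example: given parameters (profit per unit, machine hours, capacity, demand), find the best production plan. Exhaustive search stands in for real optimization technology here (production tools typically use linear-programming solvers); all figures are hypothetical.

```python
from itertools import product

# Hypothetical parameters
PROFIT = {"A": 40, "B": 30}      # profit per unit
HOURS = {"A": 2, "B": 1}         # machine hours per unit
MAX_HOURS = 100                  # capacity constraint
MAX_DEMAND = {"A": 40, "B": 60}  # the market will not absorb more than this

def best_plan():
    """Try every feasible plan and return (profit, units of A, units of B) for the best one."""
    best = None
    for a, b in product(range(MAX_DEMAND["A"] + 1), range(MAX_DEMAND["B"] + 1)):
        if a * HOURS["A"] + b * HOURS["B"] > MAX_HOURS:
            continue  # violates the machine-hour constraint
        profit = a * PROFIT["A"] + b * PROFIT["B"]
        if best is None or profit > best[0]:
            best = (profit, a, b)
    return best

print(best_plan())  # (2600, 20, 60)
```

The recommended action (make 60 of B, then use the remaining hours on 20 of A) follows because B earns more profit per machine hour.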
Private Cloud
Cloud infrastructure used exclusively by a single organization, often chosen for high security needs.
Public Cloud
Cloud infrastructure provisioned for open use by the general public; it can be owned or managed by various organizations.
IaaS (Infrastructure as a Service)
Renting virtualized computing infrastructure, such as virtual computers and storage (e.g., EC2, S3), that powers an organization's systems. It includes management of computing resources, networking, and security at the infrastructure level.
DaaS (Data as a Service)
Data shared among clouds, systems, and applications regardless of the data source or storage location, often delivered via APIs (e.g., weather data APIs, stock price APIs, Google Maps data).
PaaS (Platform as a Service)
Provides a platform allowing customers to develop, run, and manage applications without the complexity of building and maintaining infrastructure.
SaaS (Software as a Service)
Delivers software applications over the internet on a subscription basis, managed by a third-party vendor.
Hybrid Cloud
A cloud computing environment that combines a public cloud and a private cloud, allowing data and applications to be shared between them.
Software Defined Data Center (SDDC)
An architecture that integrates various infrastructure silos to optimize resource use, balancing workloads and maximizing operational efficiency by dynamically distributing those workloads. The goal is to decrease costs and increase agility, policy compliance, and security.
Virtualization
Technology that allows running multiple 'virtual computers' on one physical machine, typically using a hypervisor. The virtualization layer tracks how virtual cloud infrastructure resources are used.
Virtual Machines
Software-based computers created with virtualization; each VM runs its own operating system and applications as if it were a separate physical machine. They enable rapid deployment and scaling, allowing a single physical server to host multiple VMs (e.g., VM1: Windows + Web Server, VM2: Linux + Database, VM3: Another OS + App).
Database Management System (DBMS)
Software designed to organize and administer databases.
Data Warehouse
A large, central repository of data collected from various sources within an organization, used for reporting and data analysis.
Data Lake
A vast pool of raw system data, the purpose for which is not yet defined, often used for Big Data analytics.
Bluetooth
Short-range wireless communication allowing device pairing, typically used for Personal Area Networks (PAN).
Wi-Fi
A standard way to wirelessly connect computing devices through routers, commonly used for internet access.
Near Field Communication (NFC)
Enables two devices within close proximity to establish a communication channel and transfer data using radio waves. It's considered more secure than some other wireless technologies (e.g., Apple Watch, digital tickets, public transit via phones).
Cybersecurity
The protection of systems, networks, and data against threats posed via the Internet. It encompasses understanding and mitigating various risks.
Vulnerability (Cybersecurity)
A gap in IT security defenses that can be exploited.
Incident (Cybersecurity)
An attempted or successful unauthorized access.
Data Breach (Cybersecurity)
A successful retrieval of sensitive information.
Intrusion Detection Systems (IDS)
Scans for unusual or suspicious traffic, acting as a passive defensive measure.
Intrusion Prevention Systems (IPS)
Takes immediate active defensive action, such as blocking specific IP addresses, in response to suspicious activity.
Antivirus Software
Detects malicious codes and prevents users from downloading them.
Firewalls (Security)
Filter network traffic and control what enters and exits a network. Security is an ongoing, unending process.
Risk Management (Formula)
Calculated as $\text{Risk} = \text{Probability of threat} \times \text{Total cost of harm}$.
Probable Maximum Loss (PML)
Calculated as $\text{PML} = \text{Probability} \times \text{Total cost of harm}$, used to estimate the worst-case financial impact of an event.
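The risk formula can be applied to rank threats so the costliest expected losses are addressed first. A minimal Python sketch; the threat names, probabilities, and costs are hypothetical.

```python
def expected_risk(probability_of_threat: float, total_cost_of_harm: float) -> float:
    """Probability of the threat times the total cost of harm."""
    if not 0.0 <= probability_of_threat <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability_of_threat * total_cost_of_harm

print(expected_risk(0.25, 400_000))  # 100000.0

# Hypothetical threats: (annual probability, total cost of harm in dollars)
threats = {
    "laptop theft": (0.25, 40_000),
    "ransomware": (0.10, 500_000),
}
ranked = sorted(threats, key=lambda name: expected_risk(*threats[name]), reverse=True)
print(ranked)  # ['ransomware', 'laptop theft']
```

A frequent but cheap threat can rank below a rarer, costlier one, which is the point of multiplying the two factors.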
Cloud Computing
Model where computing services are provided over a network.
Concerns regarding businesses using the cloud
Key concerns include:
Downtime: Caused by maintenance, system failure, natural disasters, or outside intrusions/attacks.
Security: While providers align with standards, a certain level of risk always exists.
Limited Control: Cloud infrastructure is owned, managed, controlled, and monitored by the provider.
Vendor Agreements: The Cloud Service Agreement (CSA) dictates customer rights.
Search Engine Optimization (SEO)
A process aimed at increasing website visibility in search engine results. It involves:
ON-PAGE (Directly Controlled): Factors like content quality, relevance, and freshness, plus functionality & programming (responsiveness, load time, secure connection, metadata, click-through rate, keyword connection). Content is often considered 'KING', and these on-page elements are set through the page's HTML.
OFF-PAGE (Influenced but not directly controlled): Factors such as relevance and credibility (backlinks), click-through rate (CTR), dwell time (how long a user stays on a page), and personalized search (location-based, past history, social experience). The goal is organic search listings through quality content and functionality.
PPC Marketing Metrics
Metrics used to evaluate Pay Per Click campaigns:
Click Through Rate (CTR): Evaluates keyword selection and ad copy decisions, calculated as $\text{CTR} = \text{number of clicks} / \text{number of impressions}$.
Return on Advertising Spend (RoAS): Measures overall financial effectiveness, calculated as $\text{RoAS} = \text{total revenue} / \text{total ad spend}$; a net variant, $\text{total revenue} - \text{budget}$, is also used.
Conversion Rate: Indicates success in leading to sales, calculated as $\text{number of customers} / \text{number of clicks}$.
Cost of Customer Acquisition (CoCA): The amount spent to attract a paying customer, calculated as $\text{total budget} / \text{number of customers acquired with that budget}$.
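The four PPC metrics computed together in Python for one hypothetical campaign. This sketch assumes the campaign budget equals the total ad spend and does not guard against zero clicks or customers.

```python
def ppc_metrics(impressions, clicks, customers, revenue, ad_spend):
    """Compute the PPC campaign metrics from raw campaign totals."""
    return {
        "ctr": clicks / impressions,               # click-through rate
        "conversion_rate": customers / clicks,     # share of clicks that became customers
        "roas": revenue / ad_spend,                # revenue per dollar of ad spend
        "revenue_minus_spend": revenue - ad_spend, # the net variant of RoAS
        "coca": ad_spend / customers,              # cost to acquire one paying customer
    }

# Hypothetical campaign
m = ppc_metrics(impressions=20_000, clicks=400, customers=20, revenue=5_000, ad_spend=1_000)
print(m)
# {'ctr': 0.02, 'conversion_rate': 0.05, 'roas': 5.0, 'revenue_minus_spend': 4000, 'coca': 50.0}
```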
Digital Demography
Categories of consumers based on their relationship with digital technology:
Digital Immigrants: Older consumers who view retail channels as separate and distinct.
Digital Natives: The first generation surrounded by digital devices and internet connectivity.
Digital Dependents: The emerging generation growing up with broadband and constant connectivity, placing greater demands on retailers for technology use and integration.
Search Engine
An application for locating webpages or other content on a computer network.
Crawler/Spider Search
A search method that uses automated bots (spiders/web bots) to index the web through repetitive scanning, returning information to be stored in a page repository.
Spiders/Web Bots
Small computer programs designed to perform automated, repetitive tasks over the internet, scanning webpages and returning information.
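A minimal sketch of what a spider does, assuming a hypothetical in-memory "web" instead of real HTTP requests: starting from a seed page, it repeatedly fetches a page, stores it in a page repository, and queues the page's links (breadth-first), skipping pages it has already stored.

```python
from collections import deque

# Hypothetical mini-web: page -> (page text, outgoing links)
WEB = {
    "a.html": ("home page about data", ["b.html", "c.html"]),
    "b.html": ("cloud computing notes", ["c.html"]),
    "c.html": ("security basics", ["a.html", "d.html"]),
    "d.html": ("search engines", []),
}

def crawl(seed: str) -> dict:
    """Breadth-first crawl from a seed page; returns the page repository."""
    repository = {}
    frontier = deque([seed])
    while frontier:
        url = frontier.popleft()
        if url in repository or url not in WEB:
            continue  # already stored, or a dead link
        text, links = WEB[url]   # "fetch" the page
        repository[url] = text   # store it in the page repository
        frontier.extend(links)   # queue outgoing links for later scanning
    return repository

print(list(crawl("a.html")))  # ['a.html', 'b.html', 'c.html', 'd.html']
```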
Semantic Search
The process of getting search results beyond an exact keyword match, where the search engine tries to understand the user's intent and context.
SERP (Search Engine Results Page)
The page displayed by search engines in response to a user's query.
Basic Search Types
Categories of user search intent:
Informational: User is looking for information or answers.
Navigational: User is looking for a specific website.
Transactional: User is ready to complete an action (e.g., purchase, download).
Mobile: Searches conducted on mobile devices, often more transactional in nature.
Components of the Web Search Process
WWW (spiders/crawlers)
Page Repository
Indexer Module & Collection Analysis Module
Indexes (Text, Structure, Utility)
Query Formulation
Ranking
Results to User
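The indexing, query, and ranking stages can be sketched as an inverted index over a hypothetical page repository. Ranking by summed term frequency is a deliberate simplification; real search engines combine many more signals (links, freshness, personalization).

```python
import re
from collections import defaultdict

# Hypothetical page repository (what a crawler stored)
REPOSITORY = {
    "p1": "cloud computing and cloud security",
    "p2": "data warehouse and data lake",
    "p3": "security threats and cloud risk",
}

def build_index(pages):
    """Indexer: map each term to the pages containing it and how often it appears there."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term][page_id] = index[term].get(page_id, 0) + 1
    return index

def search(index, query):
    """Query formulation + ranking: score pages by summed term frequency, best first."""
    scores = defaultdict(int)
    for term in re.findall(r"[a-z]+", query.lower()):
        for page_id, freq in index.get(term, {}).items():
            scores[page_id] += freq
    return sorted(scores, key=lambda p: (-scores[p], p))

index = build_index(REPOSITORY)
print(search(index, "cloud security"))  # ['p1', 'p3']
```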
Hacking (Intentional Forms)
Unauthorized access to or manipulation of computer systems. Includes: Phishing, Malware (Spyware, Adware, Ransomware, Trojan horses), DDoS, Botnets, APTs, Insider threats, Physical theft.
Phishing
A social engineering technique used to steal credentials, often through deceptive emails.
Malware
Malicious software designed to damage or gain unauthorized access to computer systems.
Spyware
Tracking software designed to monitor user activity without explicit consent.
Adware
Software that automatically embeds advertisements, often unwelcome ones, into a user's web browser or other applications.
Ransomware
Malicious software that blocks access to a computer system or data until a payment is made.
Trojan Horses
Malicious programs that rely on social engineering and user interaction by disguising themselves as legitimate software.
DDoS (Distributed Denial-of-Service)
An attack that crashes a network by bombarding it with excessive traffic from multiple compromised computer systems.
Botnets
A group of infected computers (referred to as 'zombies') controlled remotely by a 'botmaster', often used for DDoS attacks or sending spam.
APT (Advanced Persistent Threats)
Sophisticated, long-term espionage attacks often launched through phishing to gain and maintain unauthorized access to a system.
Insider Threats
Security risks posed by internal employees, often involving data tampering or misuse of privileges.
Physical Theft
The act of stealing physical assets, such as devices containing sensitive data.
Unintentional Hacking Forms
Security risks not caused by malicious intent, including human error (e.g., poorly designed systems, faulty programming, unaware users, neglecting to change passwords) and environmental hazards (e.g., natural disasters, faulty HVAC systems).
White Hat Hacker
A computer security specialist who tests security systems for vulnerabilities with ethical intent, helping to improve security.
Black Hat Hacker
An individual who finds and exploits vulnerabilities for personal or financial gain, acting with malicious intent.
Gray Hat Hacker
An individual who may violate ethical standards but typically without malicious intent, often operating in a questionable legal or ethical area.
Hacktivist
A hacker-activist who supports social or political causes through hacking activities, with mixed perceptions regarding their actions.
BYOD (Bring Your Own Device)
Employees use their personal devices for business purposes. Benefits include cutting business costs (no need to purchase/maintain employee devices). Security risks include weak authentication, lack of access controls and encryption, potential for theft or loss, and connection to mission-critical data and cloud services.
Social Engineering
Techniques aimed at manipulating a target into revealing specific information or performing actions for illegitimate reasons, exploiting human psychology rather than technical vulnerabilities (e.g., phishing emails, pretexting, baiting).
Artificial Intelligence (AI)
The simulation of human intelligence processes by machines, especially computer systems, including learning, reasoning, and self-correction; more broadly, the ability of a machine to imitate intelligent human behavior.
Machine Learning (ML)
An application of AI where a system automatically learns and improves from experience. The process involves: Input → Human in the Loop → Training → Output, where humans tell the system what features to learn.
Deep Learning (DL)
A subset of Machine Learning that uses complex algorithms and deep neural networks for feature extraction and pattern recognition. The process involves: Input → Feature Extraction + Training → Output, with no human in the loop. The system learns autonomously (e.g., learning different kinds of trees after being told what a tree is).
Key Differences between Machine Learning and Deep Learning
In Machine Learning, humans explicitly tell the system what features to look for, while in Deep Learning, the system automatically extracts features and learns patterns on its own.
Smart Contracts
Self-executing contracts where terms are directly written into computer code. They provide a business logic layer prior to block submission and can be simple 'if-then' requirements or complex multi-participant processes, automatically executing transactions based on predetermined conditions.
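The "if-then" logic of a simple smart contract, sketched in plain Python rather than an actual blockchain contract language. The escrow terms, party names, and `on_event` method are hypothetical; the point is that predetermined conditions trigger execution automatically, and only once.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EscrowContract:
    """Hypothetical escrow terms written as code."""
    buyer: str
    seller: str
    amount: int
    deadline_day: int
    paid_to: Optional[str] = None  # set once the contract has executed

    def on_event(self, day: int, delivered: bool) -> Optional[str]:
        """Check the predetermined conditions and execute the payout if one is met."""
        if self.paid_to is not None:
            return self.paid_to        # already executed; terms cannot run twice
        if delivered:                  # IF delivery is confirmed THEN pay the seller
            self.paid_to = self.seller
        elif day > self.deadline_day:  # IF the deadline passes THEN refund the buyer
            self.paid_to = self.buyer
        return self.paid_to

deal = EscrowContract(buyer="alice", seller="bob", amount=100, deadline_day=10)
print(deal.on_event(day=3, delivered=False))  # None (no condition met yet)
print(deal.on_event(day=5, delivered=True))   # bob
```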
Consensus Mechanisms
Protocols that ensure the next block in a blockchain is the one and only version of truth, preventing powerful adversaries from derailing the system (e.g., Proof of Work (PoW), Proof of Stake (PoS)).
Blockchain
A distributed, decentralized, and immutable ledger system that records transactions across many computers, ensuring security and transparency. Its components include:
Decentralized Peers: Not a 'hub and spoke' centralized network; a peer-to-peer decentralized network where each node has a copy of the ledger, keeping each other honest.
Distributed Ledger: A sequential chain of data blocks that records transactions and contracts and establishes user identity.
Immutability: Guaranteed by a hash function where each block contains its number, a Nonce, data (transaction records), and the hash of the previous block (creating the link). A valid hash often starts with leading zeros (e.g., 0000…). Any change to a block breaks the chain and requires re-mining all subsequent blocks.
Consensus Mechanisms: Ensure the next block is the only version of truth and prevent adversaries from compromising the system.
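A minimal hash-chain sketch in Python using `hashlib.sha256`, illustrating the block structure described above (number, nonce, data, previous hash) and a Proof-of-Work style search for a hash with leading zeros. The difficulty of three hex zeros is an assumption chosen to keep the example fast; real networks require far more work. Tampering with one block's data breaks its link to the next block.

```python
import hashlib
import json

DIFFICULTY = 3  # a valid hash must start with this many zeros (kept small for speed)

def block_hash(block: dict) -> str:
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def mine(number: int, data: str, prev_hash: str) -> dict:
    """Try nonces until the block's hash starts with the required leading zeros."""
    nonce = 0
    while True:
        block = {"number": number, "nonce": nonce, "data": data, "prev_hash": prev_hash}
        if block_hash(block).startswith("0" * DIFFICULTY):
            return block
        nonce += 1

def is_valid(chain: list) -> bool:
    for i, block in enumerate(chain):
        if not block_hash(block).startswith("0" * DIFFICULTY):
            return False  # block was changed without re-mining
        if i > 0 and block["prev_hash"] != block_hash(chain[i - 1]):
            return False  # link to the previous block is broken
    return True

chain = [mine(1, "genesis", "0" * 64)]
for n, data in enumerate(["alice pays bob 5", "bob pays carol 2"], start=2):
    chain.append(mine(n, data, block_hash(chain[-1])))

print(is_valid(chain))  # True
chain[1]["data"] = "alice pays bob 500"  # tamper with a recorded transaction
print(is_valid(chain))  # False
```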
Benefits of Blockchain
Key advantages of blockchain technology include:
Immutability: Historical records cannot be altered.
Transparency: All participants can see all transactions.
Decentralization: No single point of failure or control.
Security: Achieved through cryptographic hashing and its distributed nature.
Traceability: Transactions can be traced back to their source (important for NFTs and ownership).
Data Mining
The process of discovering patterns and insights from large datasets, often using statistical and machine learning techniques.
Business Intelligence (BI)
Technologies, applications, and practices for the collection, integration, analysis, and presentation of business information to support better business decision-making.
Internet of Things (IoT)
A network of physical objects embedded with sensors, software, and other technologies for the purpose of connecting and exchanging data with other devices and systems over the internet.
API (Application Programming Interface)
A set of commands and programming standards that allows developers to write applications that communicate with other applications. APIs are often included in SDKs.
SDK (Software Development Kit)
A collection of software tools for writing applications that run on a specific device or platform, containing multiple tools including APIs.
AJAX Technology Stack
Technologies (HTML, CSS, JavaScript) that make webpages respond to user actions without reloading the entire page, creating more accessible webpages:
HTML: Provides the content structure.
CSS: Handles the formatting and styling.
JavaScript: Implements functionality and interactivity.
Cluetrain Manifesto
A revolutionary way of thinking about the Web, positing that 'Markets are Conversations.' It emphasizes understanding how people behave and think, advocating for successful companies to engage customers in conversations rather than broad broadcasting.