data
raw numbers, letters, symbols, sounds, or images with no meaning
Information
data that has been given meaning
Direct data
data that is collected for a specific purpose or task and used for that purpose only
Indirect data
data obtained from a third party and used for a different purpose to that for which it was collected
Questionnaires
a set of questions that is easy to distribute, complete and collect; can be completed on a computer or on paper - direct data source
Interviews
a formal meeting between two people that gives the interviewee the opportunity to expand on their answers - direct data source
Observation
data collectors watch what happens in a given situation - direct data source
Data logging
using a computer and sensors to collect data - direct data source
Weather data
data that comes from weather stations and is later sold to other companies - indirect data source
Electoral register
a list of adults entitled to vote in an election, containing personal information such as name, address and age - indirect data source
Data brokers
businesses that collect data from third parties and sell it, usually without the individuals' knowledge - indirect data source
research
data from textbooks, journals and websites - indirect data source
census
an official count of the number of people in a country, with information collected about them; usually carried out by the government in the form of a questionnaire - indirect data source
advantage of direct data
we know how reliable it is because we know where it originated
disadvantage of indirect data
we may not know where the data originated, and the source may cover only a small section of the group rather than a cross-section of the whole group (sampling bias)
disadvantage of direct data
because of cost and time constraints, the sample or group size may be small, whereas indirect data sources tend to provide larger sets of data
advantage of indirect data
the person collecting the data may not be able to gain physical access to particular groups of people (usually for geographical reasons), whereas indirect data sources allow data from such groups to be used
advantage of direct data
the person collecting the data can use methods to gather specific data even if the data is obscure
disadvantage of indirect data
the data might be so obscure it has never been collected before
disadvantage of direct data
it may not be possible to collect original data because of the time of year, whereas with indirect data, historical weather data is available whatever the time of year
advantage of direct data
the gatherer only needs to collect as much or as little data as necessary
disadvantage of indirect data
irrelevant data may need to be removed
disadvantage of direct data
by the time all the required data has been collected, it may be out of date, so an indirect data source might have been a better choice
advantage of direct data
there may be opportunities to sell the data later on, reducing the cost of collection
disadvantage of direct data
collecting the data may be more expensive than using an indirect data source, as many people may have to be paid to carry out the collection
accuracy
factors that affect quality of information 1
relevance
factors that affect quality of information 2
Age
factors that affect quality of information 3
level of detail
factors that affect quality of information 4
completeness of information
factors that affect quality of information 5
risk of not encrypting your data
identity theft, cyber fraud and ransom demands; if the data includes company secrets, it could be sold to rival companies
process of encryption
plaintext → encryption algorithm + encryption key → ciphertext → decryption algorithm + decryption key → plaintext
symmetric encryption
the sending and receiving computers have the same key, which is used both to encrypt and to decrypt data; it is very fast but less secure than asymmetric encryption, because the key has to be shared
asymmetric encryption
uses two keys: a public key, which is distributed to many users or computers and used to encrypt the data, and a private key, which is available only to the receiving computer and used to decrypt the data
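The plaintext → ciphertext → plaintext process with one shared key can be sketched with a toy repeating-key XOR cipher. This is an illustrative stand-in chosen for simplicity, not a real symmetric algorithm such as AES, and it must not be used to protect real data; the function name `xor_cipher`, the key and the message are all hypothetical.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with the repeating key.
    Because XOR undoes itself, the same call both encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


shared_key = b"secret"          # both computers hold this same key
plaintext = b"Pay Alice 100"

ciphertext = xor_cipher(plaintext, shared_key)   # sending computer encrypts
recovered = xor_cipher(ciphertext, shared_key)   # receiving computer decrypts with the same key

print(ciphertext != plaintext)  # True
print(recovered)                # b'Pay Alice 100'
```

The weakness named above shows up directly: anyone who obtains `shared_key` can decrypt, so the key itself has to be sent safely.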
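The two-key idea can be sketched with "textbook RSA" using the tiny primes 61 and 53. RSA is one well-known asymmetric algorithm, used here only as an illustration; real systems use keys thousands of bits long plus padding schemes, so this sketch is not secure. `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Toy textbook RSA with tiny primes: for illustration only, NOT secure.
p, q = 61, 53
n = p * q                  # 3233, shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent: public key is (e, n)
d = pow(e, -1, phi)        # private exponent 2753: private key is (d, n)


def encrypt(m: int) -> int:
    """Anyone holding the public key (e, n) can encrypt."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Only the holder of the private key (d, n) can decrypt."""
    return pow(c, d, n)


message = 65
cipher = encrypt(message)
print(cipher)           # 2790
print(decrypt(cipher))  # 65
```

Publishing `(e, n)` is safe in principle because recovering `d` requires factoring `n`, which is trivial here but infeasible for real key sizes.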
Secure Sockets Layer/Transport Layer Security (SSL/TLS)
enables encryption
enables authentication
makes sure data has not been corrupted or altered
helps websites meet the Payment Card Industry Data Security Standard (PCI DSS)
improves customer trust
Internet Protocol Security (IPsec)
an extension to the IP layer that provides authentication, integrity and confidentiality of data
the use of SSL/TLS in client server communication
used to exchange data securely between client and server applications, such as in web browsing and file transfers
the use of IPsec in client server communication
used within businesses to protect confidential data transmitted across a network, such as financial transactions or medical records; mostly used in VPNs
data protection
use of encryption 1
system encryption
use of encryption 2
hard disk encryption
use of encryption 3
email encryption
use of encryption 4
encryption in HTTPS websites
use of encryption 5
validation
checks that data is reasonable and sensible, but not that it is correct; always carried out by a computer
presence check
makes sure data has been entered in certain fields
range check
makes sure numeric data falls between a minimum and a maximum value
type check
ensures data is a particular data type
length check
ensures data has a set number of characters, or a number of characters within a set range; not used on numeric fields
format check
makes sure the data follows a certain pattern
check digit
an extra digit, calculated from the other digits, is added to the data; when the data is entered, the check recalculates this digit, and if it matches, the data is accepted
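The source does not name a particular scheme, so as one common example, here is a sketch of the ISBN-13 check digit: the first 12 digits are weighted alternately 1 and 3, summed, and the check digit is whatever brings the total up to a multiple of 10. The function names are hypothetical.

```python
def isbn13_check_digit(first12: str) -> int:
    """Weights alternate 1, 3, 1, 3, ... across the first 12 digits."""
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Recalculate the check digit and compare it with the last digit entered."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40616-7"))  # False: one digit mistyped
```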
lookup check
compares the data that has been entered with a limited list of valid entries
consistency check
checks that data across fields is consistent (e.g. someone born in 2010 cannot be in grade 12)
limit check
like a range check, but with only one boundary (a maximum or a minimum)
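The validation checks above can be sketched as small functions that each return True (accept) or False (reject). The lookup list, the format pattern and the age rule in the consistency check are all hypothetical examples, not standard rules.

```python
import re

# Hypothetical lookup list of valid entries
VALID_SUBJECTS = {"ICT", "Maths", "Physics"}


def presence_check(text: str) -> bool:
    """Something other than blanks must have been entered."""
    return text.strip() != ""


def range_check(value: int, minimum: int, maximum: int) -> bool:
    """Two boundaries: value must lie between minimum and maximum inclusive."""
    return minimum <= value <= maximum


def limit_check(value: int, maximum: int) -> bool:
    """One boundary only (here an upper limit)."""
    return value <= maximum


def type_check_int(text: str) -> bool:
    """Type check: can the entered text be read as a whole number?"""
    try:
        int(text)
    except ValueError:
        return False
    return True


def length_check(text: str, min_len: int, max_len: int) -> bool:
    """Number of characters must lie within a set range."""
    return min_len <= len(text) <= max_len


def format_check(text: str) -> bool:
    """Hypothetical pattern: two capital letters then four digits, e.g. AB1234."""
    return re.fullmatch(r"[A-Z]{2}[0-9]{4}", text) is not None


def lookup_check(text: str) -> bool:
    """Entry must be one of a limited list of valid entries."""
    return text in VALID_SUBJECTS


def consistency_check(birth_year: int, grade: int, current_year: int) -> bool:
    """Hypothetical rule: a student must be at least (grade + 4) years old."""
    return current_year - birth_year >= grade + 4


print(presence_check("   "))              # False
print(range_check(15, 0, 100))            # True
print(limit_check(150, 120))              # False
print(type_check_int("12a"))              # False
print(length_check("password1", 8, 12))   # True
print(format_check("AB1234"))             # True
print(lookup_check("Art"))                # False
print(consistency_check(2010, 12, 2024))  # False
```

Every check can still pass on wrong data: an age of 51 typed instead of 15 passes the 0 to 100 range check, which is why validation cannot prove data is correct.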
verification
ensures either that data has been entered accurately by a human or that it has been transferred accurately from one storage medium to another
visual checking
the person entering the data visually compares the data they have entered with the data on the source document; this can also be done by two people, one to type and one to double-check
this is time-consuming and therefore costly
double data entry
involves entering the data twice: the first version is stored, the second entry is compared with it by a computer, and the person is alerted to any differences
alternatively, a different person can enter the data the second time and is alerted to any differences as they type
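The computer's comparison step in double data entry can be sketched as a field-by-field comparison that returns the positions to flag. The function name and the sample record are hypothetical.

```python
from itertools import zip_longest


def double_entry_differences(first_entry, second_entry):
    """Compare the stored first entry with the second entry, field by field.
    Returns the positions of fields the person should be alerted to."""
    return [
        i
        for i, (a, b) in enumerate(zip_longest(first_entry, second_entry))
        if a != b
    ]


first = ["Amina Khan", "12 High St", "01632 960123"]
second = ["Amina Khan", "21 High St", "01632 960123"]
print(double_entry_differences(first, second))  # [1]
```

Note that if the same mistake is made in both entries, the comparison finds nothing, so double entry reduces but does not eliminate errors.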
Parity check
when data is transmitted from one device to another:
the sending device counts the number of 1s in each byte
if the number of 1s is even, it sets the parity bit to 0 and adds it to the end of the byte
if it is odd, it sets the parity bit to 1 and adds it to the end
the receiving device counts the 1s again (including the parity bit); if the total is not even, the data has been altered
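The even parity steps above can be sketched with bit strings, assuming 7 data bits plus 1 parity bit per byte (a common textbook layout). The function names are hypothetical.

```python
def add_even_parity(bits7: str) -> str:
    """Sender: append a parity bit so the total number of 1s is even."""
    parity = "0" if bits7.count("1") % 2 == 0 else "1"
    return bits7 + parity


def passes_even_parity(byte: str) -> bool:
    """Receiver: the total number of 1s, including the parity bit, must be even."""
    return byte.count("1") % 2 == 0


sent = add_even_parity("1011001")   # four 1s, so parity bit 0 -> "10110010"
received_bad = "10100010"           # one bit flipped in transit

print(sent)                              # 10110010
print(passes_even_parity(sent))          # True
print(passes_even_parity(received_bad))  # False
```

If two bits flip, the count of 1s stays even and the error goes unnoticed, so a parity check only detects an odd number of bit errors.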
checksum
used for whole files of data, not individual bytes
can be any calculation (such as the sum of all the byte values in the file)
the problem with a simple sum is that it cannot detect bytes that have changed position
the checksum is added to the end of the file and recalculated after transmission to make sure nothing has changed
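A simple additive checksum can be sketched as the sum of all byte values, kept to one byte with modulo 256; this particular calculation is an assumption, since the notes allow any calculation. The example also shows the weakness described above: swapping two bytes leaves the sum unchanged.

```python
def simple_checksum(data: bytes) -> int:
    """Add up every byte value and keep the result within one byte (0-255)."""
    return sum(data) % 256


original = b"HELLO"
stored = simple_checksum(original)   # sender appends this value to the file

corrupted = b"HELLP"                 # one byte changed in transit
swapped = b"EHLLO"                   # two bytes swapped in transit

print(simple_checksum(corrupted) == stored)  # False: change detected
print(simple_checksum(swapped) == stored)    # True: reordering NOT detected
```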
hash total
a calculation performed on the data before it is sent and then repeated after it is received, like a checksum
however, the value is usually found by adding up all the values in a specific field or fields of a file, often fields with non-numeric values, which first have to be converted to numbers
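A hash total over an alphanumeric ID field can be sketched as follows. The way text is converted to a number (adding up character codes) is a hypothetical choice; the resulting total means nothing on its own and is only compared before and after transfer.

```python
def field_to_number(value: str) -> int:
    """Hypothetical conversion of an alphanumeric value: add up its character codes."""
    return sum(ord(ch) for ch in value)


def hash_total(records, field):
    """Add up one field across every record."""
    return sum(field_to_number(str(r[field])) for r in records)


sent_records = [
    {"id": "A17", "name": "Lee"},
    {"id": "B22", "name": "Osei"},
    {"id": "C05", "name": "Park"},
]
received_records = [
    {"id": "A17", "name": "Lee"},
    {"id": "B23", "name": "Osei"},   # one character changed during transfer
    {"id": "C05", "name": "Park"},
]

before = hash_total(sent_records, "id")
after = hash_total(received_records, "id")
print(before == after)  # False: the change is detected
```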
control total
calculated in exactly the same way as a hash total, except that there is no need to convert alphanumeric fields to numbers, because it is only carried out on numeric fields
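A control total needs no conversion step because the field is already numeric. This sketch assumes a hypothetical field of whole hours worked; here the total also happens to be meaningful (the total hours in the batch).

```python
hours_sent = [38, 40, 22, 35]       # hypothetical numeric field: hours worked
control_total = sum(hours_sent)     # calculated before the data is sent

hours_received = [38, 40, 25, 35]   # one value altered during transfer

print(control_total)                          # 135
print(sum(hours_received) == control_total)   # False: the change is detected
```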
advantage of validation & verification
make systems more accurate: verification ensures the data is copied accurately, and validation ensures that the data is sensible
disadvantage of validation and verification
both tend to slow down data processing, and neither method checks whether the data is actually correct
batch processing
effective for processing large amounts of data
all the data is entered and processed together in one batch, with little or no human interaction
used in payrolls and billing systems
master file
a file that contains the important data that does not change often, such as workers' names, worker numbers and hourly rates
transaction file
a file that contains data that changes each week, such as hours worked
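A weekly payroll batch run combining the master file and the transaction file can be sketched like this. The worker numbers, names and rates are hypothetical, and real systems typically store these as files on disk rather than in-memory structures.

```python
# Hypothetical master file: data that rarely changes, keyed by worker number
master_file = {
    101: {"name": "A. Patel", "hourly_rate": 12.50},
    102: {"name": "J. Smith", "hourly_rate": 15.00},
    103: {"name": "M. Garcia", "hourly_rate": 11.00},
}

# Hypothetical transaction file: this week's (worker number, hours worked)
transaction_file = [(101, 40), (103, 25), (102, 38)]


def run_payroll_batch(master, transactions):
    """Process every transaction in one run, with no human interaction."""
    payslips = []
    # Sort transactions into worker-number order, the same order as the master file
    for worker_no, hours in sorted(transactions):
        worker = master[worker_no]
        pay = hours * worker["hourly_rate"]
        payslips.append((worker_no, worker["name"], pay))
    return payslips


for slip in run_payroll_batch(master_file, transaction_file):
    print(slip)
# (101, 'A. Patel', 500.0)
# (102, 'J. Smith', 570.0)
# (103, 'M. Garcia', 275.0)
```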