Data Processing Notes
Data Processing
Unit Roadmap
The unit covers the following topics:
- Introduction
- Data Processing
- Data Communication, Networking and the Internet
- Big Data and Data Analytics
- Social Media, Social Networking, Virtual Reality and Cyberspace
- Artificial Intelligence
- E-commerce
- Security
- Cyber Warfare
- The impact of digital technology
- Cultural, Ethical, Environmental and Legal Issues Relating to Computing
- Summary and Assignment
Scope and Coverage
This session will:
- Explain the difference between data and information
- Discuss the development of large-scale data processing systems
- Describe how data is processed using relational databases
- Explain what the software crisis is and what measures can be taken to solve the crisis in software engineering
- Identify issues of privacy and accuracy
Learning Outcomes
By the end of this topic, students will be able to:
- Discuss the development of the Digital Computer and its characteristics
- Explain the difference between data and information
- Discuss the development of large-scale data processing systems
- Describe how data is processed using relational databases
- Explain what the software crisis is and what measures can be taken to solve the crisis in software engineering
- Identify issues of privacy and accuracy
Recall Quiz
- The theory of computation involves understanding whether a computer can be used to solve a problem.
- The 3 methods of computational thinking are abstraction, decomposition, and algorithms.
- Decomposition is a computational thinking method that breaks down a problem into smaller, easier-to-manage problems.
Data and Information
- Data: Raw facts, figures, numbers that have no meaning or make no sense (e.g., 10, $30, 65%, apples).
- Information: Data that has been processed by a computer so that it makes sense. Data with meaning.
- Example: A student achieved 65% in their computing exam.
- Example: An employee at a local coffee shop earns $30 per hour.
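As a minimal sketch of this distinction, the snippet below takes two of the raw values listed above and attaches context so that they carry meaning. The helper name `to_information` and its parameters are illustrative assumptions, not part of the unit material.

```python
# Raw data: values with no context attached.
raw_data = [10, "$30", "65%", "apples"]


def to_information(value: str, subject: str, context: str) -> str:
    """Attach context to a raw value so that it carries meaning.

    Hypothetical helper, for illustration only.
    """
    return f"{subject} {context} {value}."


# The same raw values become information once they are given meaning.
exam_info = to_information("65%", "A student", "achieved a computing exam score of")
pay_info = to_information("$30", "A coffee shop employee", "earns an hourly wage of")

print(exam_info)
print(pay_info)
```

On its own, "65%" could describe anything; the surrounding sentence is what turns it into information.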
Personal Data
Examples of personal data include:
- First name
- Date of birth
- Place of birth
- Gender
Large Scale Data Processing
The data economy is predicted to be worth 94.6 billion by 2025.
Large-scale data processing is the processing of a large volume of personal data.
Group discussion points:
- What is Personal Data?
- What personal data is stored about you online?
- Are you concerned that these organisations process your data? What are your concerns?
- Why do organisations want your data?
Issues of Data Privacy and Accuracy
- It is important that data that is stored / processed is accurate.
- It matters who has access to your personal data.
Data Accuracy
Data accuracy is when the data stored or processed is error-free and the information used is reliable.
Inaccurate data has real-world implications across industries:
- Healthcare: Could mean making a fatal mistake in patient care.
- Retail: Could mean making costly mistakes in business expansions.
Causes of Inaccurate Data
- Ignoring quality of the data – companies are too busy selling / sharing / marketing to consider whether the data is accurate.
- Poor data entry practices – no validation or verification process to check that the data entered or stored in a system is accurate.
- It is important that data stored / processed is accurate and reliable to enable better decision making by organisations.
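The validation and verification checks mentioned above could look something like the sketch below. It assumes a student record with the fields used elsewhere in these notes and a DDMMYY date format; the function names and the specific checks (presence, format, range, double entry) are illustrative choices rather than a prescribed process.

```python
from datetime import datetime


def validate_record(record: dict) -> list[str]:
    """Return a list of validation problems; an empty list means the record passed."""
    problems = []

    # Presence check: a first name must be supplied.
    if not record.get("first_name", "").strip():
        problems.append("first_name is missing")

    # Format check: date of birth must be a real date in DDMMYY form (assumed format).
    try:
        datetime.strptime(record.get("date_of_birth", ""), "%d%m%y")
    except ValueError:
        problems.append("date_of_birth is not a valid DDMMYY date")

    # Range check: a percentage must lie between 0 and 100.
    result = record.get("test_result")
    if not isinstance(result, int) or not 0 <= result <= 100:
        problems.append("test_result must be a whole number from 0 to 100")

    return problems


def verify_double_entry(first_entry: str, second_entry: str) -> bool:
    """Verification by double entry: the value is typed twice and must match."""
    return first_entry == second_entry


good = {"first_name": "Esther", "date_of_birth": "010704", "test_result": 67}
bad = {"first_name": "", "date_of_birth": "310204", "test_result": 167}

print(validate_record(good))   # no problems found
print(validate_record(bad))    # three problems: name, date and result
print(verify_double_entry("A0101", "A0110"))  # mistyped second entry is caught
```

Validation can only confirm that data is sensible, not that it is true: a result of 76 typed in place of 67 would still pass every check here, which is why a separate verification step is useful.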
Data Privacy
- Once data is put into a computer it can easily be copied or shared.
- This means that people's personal private data is at risk and organisations should do their best to ensure it remains private.
- Data privacy is concerned with proper handling, processing, storage, and usage of personal data / information.
- It is all about the rights of individuals regarding their personal information.
- Data protection laws exist to ensure that organisations comply with the law regarding data privacy.
Relational Databases
- Data can be structured or unstructured.
- Databases are used to store and process information in a structured format so that it can be searched and used for creating reports.
- Organisations use databases to store data in a table:
- A database where all data is stored in one table is known as a flat file.
| Student ID | First name | Surname | Date of birth (DDMMYY) | Test result (%) |
| ---------- | ---------- | ------- | ------------- | ------------- |
| A0101 | Esther | Amos | 010704 | 67 |
| A0162 | Michael | Sale | 230404 | 70 |
| A0173 | Yasmin | Wallace | 030804 | 66 |
Limitations of Flat File Databases
Storing and processing data in a flat file database has many limitations:
- Potential duplication of data
- Harder to update due to data duplications
- Difficult and complicated to search for data
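To see how duplication and update problems arise, the sketch below models a flat file as a list of rows. The `course` and `tutor` columns and their values are hypothetical additions for illustration; they do not appear in the student table in these notes.

```python
# A flat file: every fact lives in one table, so shared details repeat on each row.
# The "course" and "tutor" columns are hypothetical, added only for illustration.
flat_file = [
    {"student_id": "A0101", "first_name": "Esther", "course": "Computing", "tutor": "Tutor A"},
    {"student_id": "A0162", "first_name": "Michael", "course": "Computing", "tutor": "Tutor A"},
    {"student_id": "A0173", "first_name": "Yasmin", "course": "Computing", "tutor": "Tutor A"},
]

# Duplication: the same course/tutor pair is stored once per student.
copies = sum(1 for row in flat_file if row["tutor"] == "Tutor A")
print(copies)

# Updating is error-prone: every copy must be changed, and missing one leaves
# the file inconsistent. Here only the first row is updated.
flat_file[0]["tutor"] = "Tutor B"
tutors_for_computing = {row["tutor"] for row in flat_file if row["course"] == "Computing"}
print(tutors_for_computing)  # two different tutors now recorded for one course
```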
Relational Databases Defined
To overcome the limitations of a simple flat file database that has only a single table, another type of database has been developed called a 'relational database'.
- Stores data in more than one table.
- Records within the tables are linked (related) to records held in other tables.
- Each table has a primary key: a field that uniquely identifies each record in that table.
- In a database diagram, a line drawn between two tables shows that there is a link (relationship) between them.
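A minimal sketch of these ideas, using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`student`, `test_result`, `result_id`, `score`) are assumptions chosen to match the student example; the notes do not prescribe a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces links between tables when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE student (
    student_id TEXT PRIMARY KEY,      -- primary key: unique for each student
    first_name TEXT NOT NULL,
    surname    TEXT NOT NULL
);
CREATE TABLE test_result (
    result_id  INTEGER PRIMARY KEY,   -- primary key for each result
    student_id TEXT NOT NULL REFERENCES student(student_id),  -- link to student
    score      INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO student VALUES ('A0101', 'Esther', 'Amos')")
conn.execute("INSERT INTO test_result (student_id, score) VALUES ('A0101', 67)")

# A result for a student who does not exist would break the relationship,
# so the database rejects it.
try:
    conn.execute("INSERT INTO test_result (student_id, score) VALUES ('Z9999', 50)")
    link_enforced = False
except sqlite3.IntegrityError:
    link_enforced = True

print(link_enforced)
```

The `REFERENCES` clause is what records the relationship: each result row points at exactly one student through that student's primary key.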
Advantages of Relational Databases
Separating the data into several related tables brings many advantages over a flat file database. These include:
- Data is only stored once.
- Complex queries can be carried out to search for data in the database using SQL.
- Better security – certain tables can be made confidential.
- It is much easier to update and change information in the database.
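The sketch below illustrates two of these advantages with Python's built-in `sqlite3` module: an SQL query that searches across linked tables, and an update that only has to be made in one place. The schema is an assumed one based on the student example, and the surname change is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, first_name TEXT, surname TEXT);
CREATE TABLE test_result (student_id TEXT REFERENCES student(student_id), score INTEGER);
INSERT INTO student VALUES
    ('A0101', 'Esther', 'Amos'),
    ('A0162', 'Michael', 'Sale'),
    ('A0173', 'Yasmin', 'Wallace');
INSERT INTO test_result VALUES ('A0101', 67), ('A0162', 70), ('A0173', 66);
""")

# A query across both tables: which students scored more than 66?
high_scorers = conn.execute("""
    SELECT s.first_name, r.score
    FROM student AS s
    JOIN test_result AS r ON r.student_id = s.student_id
    WHERE r.score > 66
    ORDER BY r.score DESC
""").fetchall()
print(high_scorers)  # [('Michael', 70), ('Esther', 67)]

# Each surname is stored once, so a single UPDATE changes it everywhere.
# The new surname is a made-up value for illustration.
conn.execute("UPDATE student SET surname = 'Smith' WHERE student_id = 'A0101'")
updated = conn.execute("""
    SELECT s.surname, r.score
    FROM student AS s
    JOIN test_result AS r ON r.student_id = s.student_id
    WHERE s.student_id = 'A0101'
""").fetchall()
print(updated)  # [('Smith', 67)]
```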
The Software Crisis
- Hardware was developing at a much faster rate than software.
- As computer technology became more sophisticated, organisations wanted to solve more complex problems using large amounts of personal data.
- Programmers struggled to keep pace with these developments which led to the Software Crisis.
- The software crisis led to projects that were over budget, delivered late, of low quality and did not meet the organisation's requirements.
- This impacted not only the development of new software but also the maintenance of older systems that needed to be adapted / updated.
Definition
The software crisis is a term used in computer science for the difficulty of writing effective and efficient software in the required time.
Causes of the Software Crisis
The software crisis arose because the same workforce, the same design methods and the same tools were being used while demand for software, the complexity of software and the challenges it posed all increased.
$$
\text{Software Crisis} = \frac{\text{Increase in demand} + \text{Increase in challenges} + \text{Increase in complexity}}{\text{Same workforce} + \text{Same design methods} + \text{Same tools}}
$$
Contributing Factors to the Software Crisis
- Poor project management.
- The need to store and process large amounts of personal data.
- Lack of adequate training in software engineering.
- Less skilled project members.
- Low productivity improvements.
- Increase in demands for software.
- Increase in complexity of the software.
Solutions to the Software Crisis
- The main causes of the software crisis were linked to the overall complexity of hardware and the software development process.
- There is no single solution to the crisis.
- One possible solution to the software crisis is to develop software using different software design methodologies that utilise computational thinking to solve problems:
- Waterfall
- Agile
- Cyclical Model
Software Design Methodologies
Waterfall Model
- Defines definite steps that are completed one at a time.
- Each step has specific outputs that lead to the next step.
- You can return to a previous stage but you must then work your way back down through the following stages.
- The user / customer is involved at the beginning in the analysis stage but has limited input until the evaluation stage.
Stages
Feasibility stage → Analysis → Design → Implementation → Evaluation → Maintenance
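The rule that returning to an earlier stage means working back down through every stage that follows can be sketched as below. The function name `rework_path` is a hypothetical helper; the stage names are taken from the sequence above.

```python
# Waterfall stages in the order listed above.
STAGES = ["Feasibility", "Analysis", "Design", "Implementation", "Evaluation", "Maintenance"]


def rework_path(current: str, return_to: str) -> list[str]:
    """Stages that must be repeated when a project at `current` returns to `return_to`.

    Hypothetical helper, for illustration only.
    """
    here = STAGES.index(current)      # raises ValueError for an unknown stage
    target = STAGES.index(return_to)
    if target > here:
        raise ValueError("can only return to an earlier stage")
    # Every stage from the one returned to, back down to the current one.
    return STAGES[target:here + 1]


print(rework_path("Evaluation", "Design"))
# ['Design', 'Implementation', 'Evaluation']
```

The further back a change sends the project, the longer the list of stages to repeat, which is one way to see why requirement changes are costly in this model.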
Advantages
- Self-contained steps are easy to manage.
- Defined process and outputs per step.
- Good for managing large groups of developers.
Disadvantages
- Requirement changes mean going back to an earlier stage.
- Changes can be costly in time and money.
- Lack of customer involvement.
Cyclical Model
- Works through each stage in order but allows you to add new requirements after maintaining the software / system for a while.
- Each time a new requirement or feature is added you go back to the start and go through each stage in turn.
- This type of model is often used by smartphone manufacturers: new features are added to an existing design and a new model is then released.
Agile Model
- The Agile development methodology is an iterative process using small multitasking teams of developers.
- The process starts by making a prototype, which users / customers give feedback on.
- This informs the changes to be made to the prototype which is then developed further.
- At every stage, the customer is involved and gives feedback as the prototype is developed into the full product.
Advantages
- The Agile method of developing software is best suited to small groups of developers.
- Good for rapidly changing environments.
Disadvantage
- It is not a good development methodology for large products.
Summary Quiz
- Information is data with meaning.
- Large-scale data processing is when an organisation processes a large volume of personal data.
- Causes of inaccurate data include ignoring the quality of the data and poor data entry practices.
- A relational database stores and processes data in more than one table, with the tables linked to each other.
- Software design methodologies include Agile, Cyclical, and Waterfall.
- The Agile model is good for small teams of developers.
- Iterative models include Agile and Cyclical.
Summary of data processing
- Data is raw facts and figures, whereas information is data with context (meaning).
- Data is processed by computers to help people / organisations to form judgements and to make predictions.
- Data accuracy is when data stored or processed is error-free. Issues with the accuracy of data can lead to costly mistakes for organisations.
- Causes for inaccurate data can include poor data entry and organisations ignoring the quality of data.
- Relational databases are one method used by organisations to store and process data in a structured format.
- These relational databases allow organisations to process data effectively.
- The software crisis meant that software developed did not meet expectations or requirements effectively.
- One possible solution to this crisis is to develop software using different design methodologies: the waterfall model, the cyclical model and the agile model.
Next lesson
In the next lesson, we will be looking at:
- Sharing data over distance: the Internet
- Bandwidth: constraints and enablement
- The world wide web: Technology and applications
- Web services
- Digital convergence: Telecoms and Computing
- Cloud Computing
- IoT – Internet of Things
- The ‘Dark Web’