What is an API
Set of rules and instructions (Requests/Responses) that different systems/apps can use to communicate with each other.
Makes it possible for different programs and apps to work together, share data, and accomplish things even if they were built by different people or companies.
Exposes some of a program's internal functions to the outside world, making it possible to share data and take actions (e.g., signing up with a Facebook/Gmail account).
Benefit of APIs
let developers use features from existing services w/o having to build them from scratch.
Like borrowing ready-made tools to make creating new applications faster and easier.
Approaches to creating APIs
Design first: the API is designed first using a tool like Swagger or API Builder; this helps make it highly consistent and reusable.
Code first: code is written first then a tool is used to generate the API specification from the code.
Middleware
Simplifies building distributed applications by providing ready-made tools and features that hide complex details.
Devs can focus on their tasks without dealing with every technical aspect of the application. Makes it easier to build software to call an API.
Types of middleware
Java RMI (Remote Method Invocation) - allows a program to call methods on Java objects running on other computers (see the client-side sketch after this list)
SOAP web services - allow procedures running on other computers to be called.
REST web services - allow remote resources to be created, queried, updated or deleted.
Message brokers - special servers that allow multiple clients to send messages to each other (asynchronously).
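Picking up the Java RMI entry above, a minimal client-side sketch. The StockService interface, host name and registry binding are hypothetical; a real server would implement the interface, export it (e.g. via UnicastRemoteObject) and bind it under that name.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

// Hypothetical remote interface: callers on other machines invoke these
// methods as if they were local, while RMI handles the network communication.
interface StockService extends Remote {
    double getPrice(String symbol) throws RemoteException;
}

public class StockClient {
    public static void main(String[] args) throws Exception {
        // Look up a server object registered (elsewhere) under the name "StockService".
        Registry registry = LocateRegistry.getRegistry("server.example.com", 1099);
        StockService stocks = (StockService) registry.lookup("StockService");
        System.out.println("ACME: " + stocks.getPrice("ACME"));
    }
}
```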
Benefits of Middleware
Takes care of the lower level aspects, so devs can focus on building important aspects of their applications.
saves time and effort by handling tasks like data transfer, communication protocols, and managing connections between different parts of an application.
Microservices
a way of building an application by dividing it into smaller, self-contained parts. Each service can run independently and communicate with other services using a simple and lightweight method.
Each service can be updated or changed without affecting the other services.
Microservices Pros
Ravioli code: services are like individual compartments in a box of ravioli; it is easier to understand and work on different parts of the system without getting tangled up in complex connections.
Independent and flexible: each micro-service operates on its own. It can be deployed and updated separately, w/o affecting other services.
Technology diversity: devs can pick the best tools for each micro-service.
Microservices Cons
Harder to program: developers must handle communication between services.
Remote calls are slower than internal method calls and they have a risk of failing.
Consistency problems: Changes made to one service may not be immediately reflected in other services, leading to potential consistency problems.
Operational complexity: managing deployment, monitoring, and scaling multiple services adds another layer of complexity (and a need for experienced developers).
Motivation for REST web services
Other people might want to build their own services on top of yours, which a web service makes possible
Scalability: the service can be scaled easily if, say, a flood of users arrives in 10 years' time that you could not plan for
Safe and idempotent in HTTP
Safe: the request should have no side effects on the server or the data it operates on (read-only).
Idempotent: sending the same request multiple times has the same effect as sending it once (illustrated in the sketch after the DELETE notes below).
GET
Used for requesting and retrieving data across the web
A safe and idempotent operation
Sending it 10 times has the same effect as sending it once, and it changes nothing on the server
POST
Used for uploading / submitting a file or form to the web
Not safe or idempotent
It alters data, and submitting it multiple times can produce different outcomes (e.g., duplicate records).
PUT
Used for updating something at a certain URI
Idempotent but not safe
Repeating the same request produces the same result, but it can have side effects on the server or data (it replaces the resource at that URI).
DELETE
Used for removing something from the web
Idempotent but not safe.
It can have side effects on the server or data, potentially causing permanent deletion/modification of resources.
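A minimal sketch of the four methods using Java's built-in HttpClient (Java 11+). The https://api.example.com/products resource is hypothetical; the comments note which calls are safe and/or idempotent.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HttpMethodsDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "https://api.example.com/products";   // hypothetical API
        String json = "{\"name\": \"Widget\", \"price\": 9.95}";

        // GET: safe and idempotent - repeating it changes nothing on the server
        HttpRequest get = HttpRequest.newBuilder(URI.create(base + "/42")).GET().build();

        // POST: neither safe nor idempotent - each call may create another product
        HttpRequest post = HttpRequest.newBuilder(URI.create(base))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json)).build();

        // PUT: idempotent but not safe - repeating it leaves the same final state
        HttpRequest put = HttpRequest.newBuilder(URI.create(base + "/42"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json)).build();

        // DELETE: idempotent but not safe - the resource is gone after the first call
        HttpRequest delete = HttpRequest.newBuilder(URI.create(base + "/42")).DELETE().build();

        for (HttpRequest r : new HttpRequest[]{get, post, put, delete}) {
            HttpResponse<String> resp = client.send(r, HttpResponse.BodyHandlers.ofString());
            System.out.println(r.method() + " -> " + resp.statusCode());
        }
    }
}
```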
Addressability (REST principle)
Each resource in a web service has a unique URL, making it easy for clients to locate and interact with specific information and functionalities. It enhances organisation and accessibility.
e.g., For an online store, each product has its own address (URL). This lets clients easily access and perform actions on specific products
Statelessness (REST principle)
Every HTTP request to the server contains all the needed information, so the server doesn't remember any past interactions or session data from the client.
e.g., users must provide their login credentials with each action they take, and the server verifies these credentials for authorisation before processing their request.
Uniform Interface (REST principle)
Web services use a common language (methods like GET, POST, etc. and data formats like JSON or XML) to communicate and share information. This makes it easier for different systems to understand and work together.
Connectedness (REST principle)
Hyperlinks in the API responses make it easier to explore and access related resources, providing clear navigation and interaction guidance.
e.g., by including hyperlinks, the server guides clients on how to navigate the application's functionality and access related resources.
Statelessness benefits
The server never has to worry about clients timing out as interactions only last one single request
Server never loses track of where each client is because the client sends all necessary info with each request
Relational database
have a fixed structure called a schema and use SQL for managing data. They are easy to work with, have predictable organisation, minimal data redundancy, and support efficient querying using SQL.
Semi-structured (data organisation)
Standard structure or format but the schema is flexible and is sometimes self-describing.
Structured text like JSON, XML, YAML
Unstructured (Data Organisation)
Implies no specific schema or data model. Free form text or binary data (Word, PDF)
Data lake
a storage repository that holds a vast amount of raw data in its original format until the business needs it. No structure at all.
Can easily become a "data swamp" with no meta-data, irrelevant data, no automation and poor cleaning
Parallel processing
Computing multiple parts of a task at the same time to increase performance. Any part that cannot be done in parallel will become a bottleneck. The data for each parallel process should be stored locally.
Amdahl's Law
A rule stating that the performance enhancement possible with a given improvement is limited by the amount that the improved feature is used. It is a quantitative version of the law of diminishing returns. Applies to any part of the system that cannot be done in parallel. Diminishing returns for throwing hardware at a problem
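A worked form of the law, assuming a fraction p of the work benefits from the improvement and that part is sped up by a factor s:

\[ S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}} \]

For example, if 90% of a job can be parallelised (p = 0.9) across 10 nodes (s = 10), the overall speedup is 1 / (0.1 + 0.09) ≈ 5.3; even with unlimited nodes it can never exceed 1 / (1 − p) = 10, hence the diminishing returns for throwing hardware at the problem.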
Batch processing
Occurs at regular intervals (monthly, every GB, every 5 million clicks)
Data is not always up to date
Processing is not as time-constrained
Probably a better option if data comes in chunks already
Real-time processing
Processed as it arrives
Tight time constraints
Immediate response needed
Susceptible to 'event storms'
Needed for data streams like news feeds etc.
Operational Databases
Day to day operations, transactional databases.
Relational
Mostly update commands used on it
Simple and predictable queries
High level of detail in the transactions
Functional and process oriented
Analytical databases
For management decision making
Relational and non-relational, denormalised
Data warehouses
Complex, ad hoc queries
Aggregation of facts, grouped by variables (dimensions)
Lower level of detail, looking at a broader scope
Data warehouse
Designed and optimised for analytical purposes
Schema is very different from an operational database
Organised around business metrics and facts
analysis conducted through dimensions
Aggregated at many levels of detail, caring about the trends in the data not so much individual transactions
Usually are massive (Petabytes)
Traditional DB vs Data Warehouse
Timespan - Op DBs focus on current transactions. DWs take a longer view
Granularity - Op DBs have a fine level of detail. DWs aggregate at various levels; may only include aggregate data
Dimensionality - Operational tables are "flat". DWs aggregate data by many dimensions
Op DBs focus more on the data itself (the attributes of a product, for example)
DWs focus more on results; answering queries is the whole point
Facts (Warehouses)
The values that we are interested in. The measure or the dependent variable. They are usually aggregate numbers like total revenue, average profit. Simple values like price, cost, GST, Profit. Attributes we use in calculations.
Dimensions (Warehouse)
Influence our view of the facts. The factor or the independent variable. Time is a key dimension. They are usually internally hierarchical (day, month, year). Dimensions are usually used to filter the facts
Time dimension
Not as simple as it seems: many possible granularities (unit sizes), e.g. year, month, week, day; alternative units such as season and quarter; inconsistencies such as fiscal years differing across the world, months of different lengths, and leap years; and time zones.
Star Schema
Central fact table, cluster of related dimension tables (relational DB). Each row represents a combination of dimension values. A partially denormalised schema is the essence of a star schema.
Slowly changing dimensions
The technique used to manage attribute changes in a dimension over time. Options:
Retain the old history and leave the data as it is
overwrite with new data
row versioning
extra columns
Dimensional Hierarchy
Dimension tables are usually denormalised, different levels of aggregation are needed.
Base fact table always aggregated at the highest level of detail across all dimensions
Pre compute commonly used levels of aggregation
Problem: too many fact tables
How many fact tables are needed to cover all possible combinations of dimensional aggregation levels?
Options:
Pick the ones that are used more commonly and precompute those
Create additional fact tables as materialised views derived from the base fact table with query rewrite enabled
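A hedged sketch of the second option, using Oracle-style materialised view syntax over a hypothetical sales_fact star schema via JDBC; the connection details and table names are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MonthlySalesView {
    public static void main(String[] args) throws Exception {
        // Hypothetical warehouse connection; URL, user and password are placeholders.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//warehouse:1521/DW", "dw_user", "secret");
             Statement stmt = con.createStatement()) {

            // Pre-compute a commonly used aggregation level as a materialised view.
            // With query rewrite enabled, the optimiser can answer matching queries
            // on the base fact table from this view instead.
            stmt.execute(
                "CREATE MATERIALIZED VIEW sales_by_month " +
                "ENABLE QUERY REWRITE AS " +
                "SELECT t.year, t.month, p.category, SUM(f.revenue) AS total_revenue " +
                "FROM sales_fact f " +
                "JOIN time_dim t ON f.time_id = t.time_id " +
                "JOIN product_dim p ON f.product_id = p.product_id " +
                "GROUP BY t.year, t.month, p.category");
        }
    }
}
```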
OLAP
Online Analytical Processing. For huge volumes of data; pre-processed and summarised; fast user reports; no access to individual details.
OLAP tools
Data transformation. Business modelling. Statistical analysis. Powerful GUI query facility. Visualisation (graphics)
Drilling down
a dimension hierarchy provides a more detailed view of the facts (aggregation by smaller units).
Rolling up (data warehouses)
The opposite of drilling down: it collapses the data from multiple items into a single value. Rolling up a dimension hierarchy provides a more summarised view of the facts (aggregation by larger units).
OLAP storage
Internal proprietary DB (MDD); relational (ROLAP); multidimensional (MOLAP); both (HOLAP).
Analytic query (warehouses)
Ranking queries; running/cumulative totals; computing multiple aggregates with different groupings in one operation; windowed queries; first/last/nth value; converting rows to a list.
Some are possible in plain SQL
Analytic function
Enhanced version of aggregation. Simple aggregate functions are affected by GROUP BY and summarise multiple rows into a single value.
Analytic functions are not affected by GROUP BY; they summarise across multiple rows but do not reduce the number of rows in the result
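A minimal sketch, assuming a JDBC driver on the classpath and a hypothetical sales table (sale_date, region, amount): the window function adds a running total to every row without collapsing rows the way GROUP BY would.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RunningTotalDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details and 'sales' table.
        try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost/dw", "dw", "dw");
             Statement stmt = con.createStatement();
             // SUM(...) OVER (...) is an analytic (window) function: it adds a running
             // total to every row; every input row still appears in the result.
             ResultSet rs = stmt.executeQuery(
                 "SELECT sale_date, region, amount, " +
                 "       SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) AS running_total " +
                 "FROM sales ORDER BY region, sale_date")) {
            while (rs.next()) {
                System.out.printf("%s %s %.2f running=%.2f%n",
                        rs.getDate("sale_date"), rs.getString("region"),
                        rs.getDouble("amount"), rs.getDouble("running_total"));
            }
        }
    }
}
```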
GraphQL
An alternative to REST
A query language for APIs and a runtime for fulfilling those queries with your existing data. Provides a description of the data in your API, gives your clients the power to ask for what they need and nothing more, makes it easier to evolve APIs over time and enables powerful developer tools.
GraphQL needs a little more to run (a middleware/runtime layer), whereas a typical REST system runs directly alongside HTTP.
GraphQL vs REST
REST offered great ideas initially (Statelessness, structured access to servers) but is too inflexible to keep up with changing requirements.
A clearly defined schema makes it easier for front and backend teams to work since they both know the definite structure of the data that is sent across the network.
No more over and underfetching of the data
GraphQL is more efficient than REST or RPC APIs
Underfetching (REST vs. GraphQL)
a specific endpoint doesn't provide enough of the needed info. The client will need to make more requests to fetch everything it needs. E.g. having to send 20+ requests to get information about a football team
GQL queries can include nested structures (e.g. a team roster) and all matching data will be returned. This is all done with a single URL and a single request, so there is less time spent waiting and less processing required.
Overfetching (REST vs GQL)
A REST GET call returns the same representation whether the client is a desktop or mobile app; a mobile client might have to filter out data it doesn't need, which is a waste of bandwidth and time.
GQL queries state what data fields they want.
Selecting and filtering fields for data retrieval
Specifies precisely what the client wants
No need for filtering within the API which can get far too complex
You can give the query a name so that if it fails you can see which query went wrong; this is also good for documentation.
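A minimal sketch of a client sending a named GraphQL query over HTTP with Java's built-in HttpClient. The endpoint URL and field names (team, roster, etc.) are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GraphQLQueryDemo {
    public static void main(String[] args) throws Exception {
        // A named query asking only for the fields the client needs, including a
        // nested structure (the roster), so one request replaces many REST calls.
        String query = """
            query TeamOverview {
              team(id: "42") {
                name
                coach { name }
                roster { playerName position }
              }
            }""";

        // GraphQL requests are normally posted as JSON: {"query": "..."}
        String payload = "{\"query\": " + toJsonString(query) + "}";

        HttpRequest request = HttpRequest.newBuilder(URI.create("https://api.example.com/graphql"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // only the requested fields come back
    }

    // Minimal JSON string escaping for the embedded query text.
    private static String toJsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }
}
```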
Server side: Resolvers / data fetchers
queries are parsed
validated (against the schema)
executed
Every field on every type is backed by a function called a resolver. A resolver is a function that resolves a value for a type or field in a schema.
Resolvers can be asynchronous ... They can resolve values from another REST API, database, cache, constant, etc
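A minimal sketch using the graphql-java library (assumed to be on the classpath): the schema is parsed, a resolver (data fetcher) is wired to the Query.team field, and a query is then parsed, validated and executed against it. The schema and stand-in data are hypothetical.

```java
import graphql.GraphQL;
import graphql.schema.DataFetcher;
import graphql.schema.GraphQLSchema;
import graphql.schema.idl.RuntimeWiring;
import graphql.schema.idl.SchemaGenerator;
import graphql.schema.idl.SchemaParser;
import graphql.schema.idl.TypeDefinitionRegistry;

import java.util.Map;

public class ResolverDemo {
    public static void main(String[] args) {
        // A tiny schema: one query field backed by one resolver.
        String sdl = """
            type Query { team(id: ID!): Team }
            type Team  { id: ID! name: String }""";

        // The resolver (data fetcher) for Query.team: it could equally fetch the
        // value from a REST API, a database, a cache or a constant.
        DataFetcher<Map<String, Object>> teamFetcher = env -> {
            String id = env.getArgument("id");
            return Map.of("id", id, "name", "Example FC");   // stand-in data
        };

        TypeDefinitionRegistry registry = new SchemaParser().parse(sdl);
        RuntimeWiring wiring = RuntimeWiring.newRuntimeWiring()
                .type("Query", builder -> builder.dataFetcher("team", teamFetcher))
                .build();
        GraphQLSchema schema = new SchemaGenerator().makeExecutableSchema(registry, wiring);
        GraphQL graphQL = GraphQL.newGraphQL(schema).build();

        // Parse, validate and execute a query against the schema.
        System.out.println(graphQL.execute("{ team(id: \"42\") { name } }").getData());
    }
}
```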
Enterprise Application Integration (EAI)
EAI is a problem faced by businesses; developers have addressed it by creating message brokers and related middleware.
Covers the plans, methods, and tools aimed at integrating separate enterprise systems, of which there may be hundreds if not thousands of custom-built or off-the-shelf systems.
(EAI Problem #1) if a process takes a while, how do we not waste time?
Communicate via asynchronous messages
When there are multiple asynchronous requests sent, how do we make sure the response is matched with the right request?
Need some sort of reference number: something included in each message (a correlation identifier) that ties a reply back to its request.
The solution to this problem is to send messages to queues within a messaging server, which forwards them on to their destination.
(EAI Problem #2) business processes change, how do we futureproof?
Send messages to named channels via a commonly known messaging server (or, in larger organisations, a federated collection of message servers).
This means that as old apps are replaced with new ones, messages still get sent to the same place.
Message Oriented Middleware (MOM)
Supports asynchronous communication between applications using structured messages, and allows documents to be passed between them for processing.
Motivation for MOM
RPC or RMI
sender and receiver need to be available at the same time
sender must know the methods provided by the recipient
tight coupling
MOM
sent to a queue, recipient can retrieve at any time
message format must be understood by both but there is loose coupling and usually represents a type of business data
(EAI Problem #3) Independent programs run on their own schedules; the programs they need to talk to might not be running at the same time.
Ensure your messaging server supports "store and forward" messaging. Messages are stored in a database until they are successfully delivered
(EAI Problem #4) Apps may use different data representations. How do we keep loose coupling but still allow for translation of messages?
Ensure your messaging server is a message broker that allows message transformation rules to be defined for particular channels within the server
Use a message routing and mediation engine (e.g. Apache Camel) that can read and write to message brokers, web services, etc., and contain application logic to transform and route messages.
(EAI Problem #5) How do we avoid configuring the messaging server with a channel for each message type?
Ensure your messaging server is a message broker that allows content-based routing rules to be attached to channels within the server.
Use a message routing and mediation engine (e.g. Apache Camel) that can read and write to message brokers, web services, etc., and contain application logic to transform and route messages
The Java Message Service (JMS) API
JMS is a Java application programming interface that lets clients of a MOM send and receive messages
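A minimal sender sketch using the JMS API. It assumes an ActiveMQ broker on localhost (ActiveMQConnectionFactory is provider-specific; any JMS ConnectionFactory could be substituted) and a hypothetical "orders" queue; the correlation ID shows how an asynchronous reply can later be matched to this request (EAI Problem #1).

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

public class OrderSender {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("orders");          // named channel
            MessageProducer producer = session.createProducer(queue);

            TextMessage message = session.createTextMessage("{\"orderId\": 1234, \"total\": 99.50}");
            // Correlation ID lets an asynchronous reply be matched to this request.
            message.setJMSCorrelationID("order-1234");
            producer.send(message);                               // store-and-forward delivery
        } finally {
            connection.close();
        }
    }
}
```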
Enterprise Integrations Patterns (EIP)
65 design patterns for solving basic problems that commonly arise in messaging based EAI
Each pattern has a graphical icon so that diagrams can be used to show integration logic at a high level, making it easier to visualise what each pattern does.
The patterns can be combined together to solve complex integration problems
Service integration middleware
Allows for a connection of disparate apps and services together.
Provides tools to enable the services to communicate with each other.
Allows us to coordinate the communication between services
Apache Camel
Open source integration framework. Allows for the construction of a business process or service that integrates many types of application, service and data source.
Support for many message formats (XML, JSON, etc.)
Endpoints
A channel for messages to enter or leave Camel routes. An endpoint can be a consumer, a producer, or both. Endpoints are defined by endpoint URIs, e.g.:
http://
jdbc:
jms:queue:blahblah
Endpoints are implemented by Camel components
Processors (Message oriented middleware)
some Java DSL methods are predefined processors that perform operations on messages
The Exchange class
represents a message exchange along a route
Message processors can read and change the contents
MEP
Message Exchange Pattern. Most commonly the values are InOnly and InOut
Message bodies
Camel can store any data structure within a message body (String, XML, JSON, Java objects, etc.)
Meaning that Camel is unopinionated
The processing pipeline
The consumer (from()) endpoint creates the exchange object to represent a received message. It sets the MEP to InOut if it expects a response otherwise it is InOnly. The out message from one processor becomes the in for the next processor
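A minimal Camel route sketch illustrating the pipeline above. The endpoint URIs are illustrative; the jms: endpoint assumes the camel-jms component and a connection factory have been configured on the context.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class OrderRoute {
    public static void main(String[] args) throws Exception {
        DefaultCamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Consumer endpoint: each message taken from the queue becomes
                // an Exchange (MEP InOnly here, since no reply is expected).
                from("jms:queue:orders")
                    // A processor can read and change the message as it moves along the pipeline.
                    .process(exchange -> {
                        String body = exchange.getIn().getBody(String.class);
                        exchange.getIn().setBody(body.toUpperCase());
                    })
                    // Producer endpoint: the (possibly transformed) message leaves the route.
                    .to("file:outbox");
            }
        });
        context.start();
        Thread.sleep(10_000);   // let the route run briefly before shutting down
        context.stop();
    }
}
```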
Message broker with Camel
Supports multiple clients
Messages are persistent
We can look at the contents of queues and messages using ActiveMQ
We can test individual routes by manually adding messages to queues using ActiveMQ
Cloud and Virtualisation
Cloud is essentially virtualisation: your services will be running in some form of virtualised environment.
Deploying a service to a cloud platform will usually result in
A machine / container image being retrieved or created
A hypervisor booting that image to create a virtual guest
The guest may install some additional software it needs to run the service
The guest will start your service
Hypervisor
Software that provides virtual hardware to virtual machines.
Translates virtual instructions into real instructions on the physical hardware
Allows the booting of multiple guest OS that can run at the same time
each guest is isolated from the others
Provides features for managing VMs such as making snapshots of virtual disks and memory for backup and migration
Virtualisation benefits
Make more efficient use of existing IT infrastructure. In the past, a server would be dedicated to running one app. Able to use the infrastructure to its full potential.
Cut down OPEX and IT expenditure.
More eco-friendly.
Reduces downtime
Provisioning
Installing and configuring environments using simple, repeatable and consistent configuration
Server configuration
Installing and configuring the software on the servers needed for hosting the service. The same configuration should be used for different hosts.
This config should be repeatable (Puppet, Terraform)
Monitoring / management
tools that monitor the state of servers and services and warn admins when something is wrong. allow admins to configure remote servers usually via a web app.
IaaS (Infrastructure as a Service) platforms
Platforms that provide a complete stack for creating, running and managing IaaS servers. They allow admins to provision, run, and monitor virtual servers via a web app. Reduced risk compared to relying on one provider's higher-level offerings: if you no longer like a provider, you can move your machine images to another.
PaaS platforms
Platforms that provide a full stack of servers and libs for developing services. Web servers, DBMS servers. Can be deployed to virtual servers on any of the big providers
DevOps
An approach based on lean and agile principles in which business owners and the development, operations (hence DevOps) and quality assurance departments collaborate. Aimed at shortening systems development life cycles.
Infrastructure as Code
The idea of using declarative config files to provision your servers, allowing you to automate provisioning. Declarative means you describe what you want rather than providing the commands that perform the configuration; the tools work it out for you. The aim of DevOps is to have the entire infrastructure provisioned automatically from source code. Services can be deployed repeatably and reliably no matter where you start from. All information about how the various systems are configured is in the IaC scripts, and versioning can be used.
Container orchestration
The automated deployment, scaling and management of containerised applications
Addresses issues such as:
How do I distribute replicated applications across multiple servers
What order do apps need to be started
How to make sure apps keep running, and how they recover from crashes automatically
Kubernetes (k8s)
an open source system for automating deployment, scaling, and management of containerised applications.
K8s clusters
Cluster: a set of computing nodes (physical or VMs) that can run containers under the management of K8s.
K8s namespaces
Separate working areas within a k8s cluster. Not needed if you are the only one using a cluster. If there is more than one, you need to specify your namespace in each operation (e.g. kubectl -n <namespace> <operation>).
K8s Pods
K8s manages pods, not individual containers. A pod can contain one or more containers. Provides a way to set environment variables, mount storage, and feed info into a container.
Containers
Similar to virtualisation but there is no hypervisor and no guest OS. The guest only has the libs and programs needed for the service to run and piggybacks off the host's kernel. Much smaller images and higher performance than VMs.
LXC containers
Basic Linux container. Provides the engine that allows containers to run in a way that they are isolated between the host and other containers. Can be standalone
LXD
Runs on top of LXC and extends it. Its focus is on containerising entire Linux servers. Provides features similar to a hypervisor, but with better performance.
Docker
Runs on top of LXC and extends its capabilities. Its focus is on containerising single applications/services. There is a hub (Docker Hub), similar to GitHub, for hosting Docker images. Most large server software projects have an official Docker image. Uses the concept of layers.
CRIU
Checkpoint/Restore In Userspace: allows snapshots of running containers
Shared everything architecture
Disk and memory are shared between nodes; access to the shared resources becomes a bottleneck at scale (huge DBs). Nodes are not completely independent (parallel architecture).
Scaling follows Amdahl's law
Shared Nothing architecture
No shared resources and no dependency between the nodes. Scaling is linear and unlimited. data consistency can become a problem
Multi-node queries may require copying data between nodes
Scale up vs Scale out
Scale up: add more resources like CPU and memory to individual nodes. Compatible with SE and SN.
Scale out: add more nodes to the system (more web application servers, DB servers, etc.). Compatible with SN but problematic with SE.
NoSQL
"not only sql" a broad class of non-relational DBMSs, inspired by growing infrastructure needs on the web. Made use of Flexible schemas
Flexible schemas
No schema at all; schema defined by the data (self-describing, e.g. JSON and XML); entity-attribute-value (EAV) structure. All have implications for integrity, consistency and querying.
SQL problems
impedance mismatch between SQL and typical programming environments.
SQL is not well designed but is the standard. Hard to write and debug.
ACID is difficult to enforce on distributed node systems
NewSQL
A database model that attempts to provide ACID-compliant transactions across a highly distributed infrastructure.
Each NoSQL type focuses on solving a specific kind of problem: key-value data stores, column-oriented data stores (e.g. BigTable), document-oriented data stores.
Column-oriented data stores
Extensible columns of closely related data. Both rows and columns can be split over multiple nodes.
Google BigTable
Shared nothing
Supports single row transactions
Simple queries
No secondary indexes
Only one server is responsible for a given piece of data
Document oriented data stores
Organises data as a collection of documents
(e.g. MongoDB)
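A minimal sketch using the MongoDB Java driver (assumed to be on the classpath) against a local server; the "shop" database and "products" collection are hypothetical. It illustrates the document model and its flexible schema.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

import java.util.List;

public class ProductStore {
    public static void main(String[] args) {
        // Assumes a MongoDB server on localhost.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> products =
                    client.getDatabase("shop").getCollection("products");

            // Documents in the same collection need not share a fixed schema:
            // the second document simply has extra fields.
            products.insertOne(new Document("name", "Widget").append("price", 9.95));
            products.insertOne(new Document("name", "Gadget").append("price", 19.95)
                    .append("tags", List.of("new", "featured")));

            // Query by field value, much like a WHERE clause.
            Document found = products.find(Filters.eq("name", "Gadget")).first();
            System.out.println(found.toJson());
        }
    }
}
```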