CSC 430

ORGANIZATION OF PROGRAMMING LANGUAGES

Course Information

CSC 430 is a 3-credit course.
Prerequisite: CIT/CSC 304.
45 hours of lectures.

Course Objectives

Equip learners with the tools to evaluate programming languages.
Understand the design decisions behind different programming languages.
Expand student's understanding of programming concepts.
Enhance student's understanding of computer programming and computer science.

Diversity of Languages

Over a thousand programming languages have been designed.
Most languages remain confined to their design group, while others have been replaced by newer languages.

Reasons for Studying Organization of Programming Languages

1. Increased Capacity to Express Ideas

Limited language knowledge restricts the complexity of thought.
Programmers are constrained by the languages they use, which limits control and data structures.
Awareness of a wider variety of programming language features can reduce such limitations.

2. Improved Background for Choosing Appropriate Languages

Many programmers lack formal computer science education.
Programmers often use familiar languages even if unsuitable for new projects.
Familiarity with diverse languages enables informed language choices.

3. Increased Ability to Learn New Languages

Computer programming is a continuously evolving discipline.
Learning new languages can be lengthy and difficult.
Understanding fundamental language concepts facilitates learning new languages.

4. Better Understanding of Implementation Significance

Understanding implementation issues leads to more intelligent language use.
For example, a programmer unaware of recursion implementation might not realize its potential slowness compared to iteration.

5. Increased Ability to Design New Languages

6. Overall Advancement of Computing

ALGOL 60's superior design was not widely understood.
Better-informed language choices could lead to quicker adoption of superior languages.

Language Categories

Programming Domains

Different areas of computer applications have associated languages.

1. Scientific Applications

Involve simple data structures and floating-point arithmetic.
Common data structures: Arrays and matrices.
Common control structures: Loops and selections.
Examples: FORTRAN and ALGOL.

2. Business Applications

Example: COBOL.

3. Artificial Intelligence

Characterized by symbolic computations.
Examples: Lisp, Prolog.

4. Systems Programming

Involves operating systems and programming support tools.
Requires fast execution and software interfaces to external devices.
Examples: PL/1, PL/S, BLISS, Extended ALGOL, BCPL, Coral 66, Jovial, Java, and XPL.
Widely used languages: C and C++.

5. Special-Purpose Languages

Domain-specific languages.
Examples: * String Manipulation: COMIT, SNOBOL, SNOBOL 4 * List Processing Languages: IPL-V, Lisp * Simulation Languages: * Used for system simulation. * Examples: GPSS (General-Purpose Simulation System) and Simula 67 * Scripting Languages: * Used for communicating with components in other languages. * Interpreted instead of compiled. * Examples: awk, Perl, Python, Tcl, JavaScript, PHP, and ASP.

Language Evaluation Criteria

A. Readability

Ease with which programs can be read and understood.

B. Writability

Ease with which a language can be used to create programs.

C. Reliability

A program performs to its specifications under all conditions.

D. Cost

Financial and computing resources.

Characteristics Contributing to Programming Language Readability

i. Overall Simplicity

The size of basic types: A language that has large number of basic components is more difficult to learn than one with a small number of basic types
Feature Multiplicity: Having more than one way to accomplish a particular task might be complicating for some programmers e.g count = count +1, count + = 1, count ++ and ++count in C++ and C all have the same meaning when used as standalone expressions although the last two have slightly different meanings.
Overloading. Overloading is a useful feature, but can reduce program readability.

ii. Orthogonality

Features work independently without interference.
Allows combining features consistently, simplifying the language.
A small set of primitive constructs can build control and data structures.
Every possible combination of primitives is legal and meaningful.
Extreme orthogonality can lead to complexity.
Example: Conditional statements on the left side of assignment statements in ALGOL 68.

Example: Orthogonality in Variables and Data Types

Variables can be declared independently of their data types.
Integers, floats, and strings can be used without special rules. * Orthogonal example (Python): $x = 10$ # Integer $y = 10.5$ # Float $z = "Hello"$ # String * Non-orthogonal example (C): $int x = 10;$ $float y = 10.5;$ $char* z = "Hello";$

iii. Control Statements

Availability of control structures aids readability.
Indiscriminate use of goto reduces readability.

iv. Data Types and Structures

Facilities for defining data types aid readability.
Using numeric types for Boolean types reduces readability.

v. Syntax Considerations

Identifier Forms: Restricting identifiers to short lengths reduces readability.
Special Words: Using special words as variable names reduces readability.

vi. Forms and Meaning

The appearance of a statement should indicate its purpose.

Factors Affecting Writability

i. Simplicity and Orthogonality

ii. Support for Abstraction

iii. Expressivity

A convenient way of specifying computations.
Example: count++ instead of count = count + 1

Factors Affecting Reliability

i. Type Checking

Compile-time type checking is better and less expensive.

ii. Exception Handling

Intercepting run-time errors and taking corrective measures aids reliability.

iii. Aliasing

Having multiple names for the same memory cell.
A dangerous feature.

iv. Readability and Writability

Easier programs are more likely to be correct.
Readability affects reliability in writing and maintenance phases.

Factors Affecting Cost

Cost of training programmers
Cost of writing programs
Cost of compiling programs
Cost of language implementation systems (e.g., Java's free compiler).
Cost of maintaining programs
Language design: Languages requiring many run-time type checks prohibit fast code execution.

Influences on Language Design

1. Computer Architecture

Imperative languages designed around the von Neumann architecture.
Data and instructions stored in the same memory.
Variables model memory cells.
Assignment statements based on piping operations.

2. Programming Design Methodologies

Process-oriented and data-oriented designs.

Language Design Tradeoffs

Choosing constructs and features involves compromises between criteria like reliability and cost of execution.

Defining the Syntax and Semantics of Languages

Providing a concise and understandable language description is difficult but essential.
Language descriptions must cater to diverse users.
New languages undergo scrutiny before completion.

Language Study is Divided Into

Syntax
Semantics

Syntax

The form of a language's expressions, statements, and program units.

Semantics

The meaning of the expressions, statements, and program units.
Example: while (<expr>) { <statements> } in C++.
Meaning: The statement should be executed as long as the expression is true.

The General Problem of Describing Syntax

Languages are sets of strings of characters from some alphabet.
Strings of a language are called statements or sentences.
Syntax rules specify which strings are in the language.
Formal descriptions often exclude the lowest-level syntactic units (lexemes).
Lexemes include identifiers, literals, operators, and special words.
Semantics should follow directly from syntax in a well-designed language.

Token

A token is the category of its lexemes.
Example: index = 2 * count + 17 * Lexemes: index, =, 2, *, count, +, 17, ; * Tokens: Identifier, Equal_sign, Int_literal, Mult_op, Identifier, Plus-op, Int_literal, Semicolon

Ways to Define a Language Formally

Recognition
Generation

Recognition

A device reads strings and indicates whether they belong to the language.
The syntax analyzer or parser of a compiler is a recognizer.

Generation

A device generates sentences of a language.

Methods of Describing the Syntax of a Language

Context-free grammar or Backus-Naur Form/Backus-Normal Form (BNF) is the most common method.

BNF

A metalanguage.
A language used to describe other languages.
A generative device for defining languages.
Sentences are generated through rule applications, beginning with a start symbol.
Example: * ::=0|1|2|3|4|5|6|7|8|9 * ::=a|b|c|d|…x|y|z * ::=||

Nonterminal

Written within angle brackets < and >.

Symbol ::=

Means 'is defined as’.

Symbol |

Means 'or’
‘ch1’ is an identifier derived from : * * * * * c * ch * ch1
The sequence of terminals and nonterminals produced at each stage is a derivation is called a sentential form.
Final sentential form with no nonterminals is called a sentence.
The sentence of a derivation is best shown by a derivation tree.

Tree for ‘ch1’:

A complete programming language is defined by starting with a nonterminal such as <program>, called the start symbol.
All possible programs can be defined from the start symbol by replacing the left parts of production rules by one of their corresponding right parts.
Trees can be created top-down or bottom-up.
Top-down: Derive the sentence from the start symbol (e.g., ‘ch1’).
Bottom-up: Reduce the sentence to the start symbol.
Both approaches are used in the syntax analysis phase of compilers.

Recursion in Grammar

A grammar defining a programming language can be left-recursive or right-recursive.
Example: * ::= * The above is left-recursive because the nonterminal being defined, that is, , is the leftmost symbol in the right part.

Context-Free BNF

A BNF grammar is context-free if the left part always consists of a single nonterminal.
The left part can always be replaced by one of its alternative right parts, irrespective of the context.

Ambiguity

A grammar that generates a sentence for which there are two or more distinct derivation trees is said to be ambiguous.
Example: * ::=|beginend * ::=if then |if then else
Two different derivation trees exist for the conditional statement: if <condition> then if <condition> then <statement> else <statement>
The problem is, knowing to which then the else belongs.
Note: A grammar is always ambiguous if it is both left- and right-recursive with respect to the same nonterminal.

Extended BNF (EBNF)

Several variants of BNF notations have been used in language definitions. An example is EBNF
It sometimes uses ‘->’ in place of ‘::=’
It does not use angle brackets
It encloses terminals within quotes
Example: * ::= + + will be written as exp -> exp ‘+’ term

Describing the Semantics of a Programming Language

Defining semantics is problematic compared to formal syntax definition.
BNF was used for Algol 60, but the semantic problem was addressed using English.
English prose is ambiguous.

Formal Approaches to Describing Semantics

Operational Semantics
Axiomatic Semantics
Denotational Semantics

Operational Semantics

Describing a program's meaning by executing its statements on a machine.
Changes in the machine’s state define the meaning of the statement.
Example: * Let the state of a machine be the values of all its registers and memory locations including condition codes and status registers. * If one simply records the state of the computer, executes the instruction for which meaning is sought, and then examines the machine’s new state, the semantics of that instruction are clear.

Example 2: Step-by-Step Execution

Given:

if x > 0:
 y = 10
else:
 y = 20

Check x > 0. * If true, execute y = 10. * If false, execute y = 20.
Store the result in y.

Axiomatic Semantics

Based on symbolic logic and used to prove program correctness.
Uses rules of inference to deduce the effect of executing a construct.
Defines the meaning of a program using logical rules, describing what must be true before and after code execution.
The meaning of a statement S is described in terms of: * The condition P (the pre-condition) that is assumed or asserted to be true before S is executed * The condition Q (the post-condition) which can be deduced to be true after execution of S
This is usually written as: {P} S {Q}
Example: Consider the assignment statement $x:= x + 1$ If the condition $x \ge 0$ is true before execution of the statement, then x > 0 will be true after execution of the statement.
This is written as: {x >= 0} x:=x + 1 {x > 0}

Example 2

Given:

if x > 0:
 y = 10
else:
 y = 20

We express its meaning using logical rules: * Precondition: We know x is some number. * Postconditions: * If x > 0, then y = 10. * If x ≤ 0, then y = 20. * This lets us prove that y will always be either 10 or 20 after execution.

Denotational Semantics

The most rigorous widely known method for describing the semantics of a program.
Based on recursive function theory.
Defines a mathematical object and a function that maps instances of the entity onto instances of the mathematical object for each language entity.

Example 1: Describing the Meaning of Binary Numbers

Consider the grammar rules: * :=0|1| 0| 1
The objects in this case are simple decimal numbers.
In this example, the meaning of a binary number will be the meaning of its decimal equivalent.
Let the domain of semantic values of the objects be N, the set of non- negative decimal integer values.
It is these objects that we want to associate with binary numbers
Let the semantic function be $M_{bin}$
The function $M_{bin}$ is defined as follows: * $M_{bin} (’0’) = 0$ * $M_{bin} (’1’) = 1$ * $M_{bin} (<bin-num> ‘0’) = 2 * M_{bin} (<bin-num>)$ * $M_{bin} (<bin-num> ‘1’) = 2 * M_{bin} (<bin-num>) + 1$

Example 2

Given:

if x > 0:
 y = 10
else:
 y = 20

We can define the meaning using a mathematical function:

Runtime Considerations

Programming languages are designed with different runtime considerations that impact their efficiency, reliability, and usability.
Understanding these considerations is crucial for selecting the right language for a given application.

Various Runtime Concerns

Execution Models

Different programming languages use distinct execution models that influence runtime behaviour. The primary execution models include: * Interpreted Execution: Code is executed line-by-line by an interpreter, e.g., Python, JavaScript. * Compiled Execution: Code is translated into machine code before execution, e.g., C, C++. * Hybrid Execution: A mix of compilation and interpretation, e.g., Java (bytecode executed by JVM), .NET languages (executed by CLR). * Each model affects runtime performance, portability, and flexibility.

Memory Management

Efficient memory management is essential for optimizing runtime performance. Programming languages employ different strategies: * Manual Memory Management: Requires explicit allocation and deallocation (e.g., C, C++ with malloc/free/delete). * Automatic Garbage Collection: The system automatically reclaims unused memory (e.g., Java, Python, Go). * Reference Counting: Tracks object references and deallocates when no references remain (e.g., Swift, Objective-C). * Region-Based Memory Management: Allocates memory in blocks and releases them together (e.g., Rust with ownership model).

Runtime Performance Considerations

Several factors impact the runtime efficiency of programming languages:

Execution Speed: Compiled languages typically perform faster than interpreted ones.
Memory Usage: Garbage collection can introduce overhead but helps prevent memory leaks.
Concurrency and Parallelism: Support for multithreading and asynchronous execution affects performance.
Optimization Techniques: Just-In-Time (JIT) compilation, loop unrolling, and inlining improve runtime efficiency.

Type Systems and Runtime Behaviour

The choice between static and dynamic typing influences runtime performance:

Statically Typed Languages (e.g., Java, C) perform type checking at compile-time, reducing runtime errors.
Dynamically Typed Languages (e.g., Python, JavaScript) check types at runtime, allowing flexibility but adding overhead.

Exception Handling and Error Management

Error handling mechanisms impact the robustness of a runtime system:

Checked Exceptions: Enforced at compile-time (e.g., Java).
Unchecked Exceptions: Occur at runtime without mandatory handling (e.g., Python, C++).
Structured Exception Handling: Used in systems programming for reliability (e.g., Windows SEH, POSIX signals).

Understanding runtime considerations is crucial for selecting and optimizing programming languages for various applications.

Types, Values and Declarations

The type of a data item determines its allowed values together with the set of operations that can be used to manipulate these values.
Data items have a value and a type and may be held in what are called variables.
Variables have a name, various attributes and refer to an area of computer store.
There is a distinction between the following: * The name of a variable : Identifier * Where the variable is stored: Reference or Address * The value stored.

1. Identifier Types of Characters Used

An identifier typically is a combination of letters and digits, with the first character being a letter.
An exception is Perl where an identifier has prefix to show whether it is scalar ($) and array @ or a hash(%) variable.

Spaces

Spaces are not normally allowed within identifiers, but some languages like Ada, C, C++, etc allow the use of underscore.
Although Java allows the use of underscore, the convention is to use internal capitals instead e.gleftLink instead of left_link.

Length

Early languages often restricted the length of identifiers to six or eight characters, but most modern languages have no limit.

Case Sensitivity

In C, C++ and Java, case matters
In Pascal, Fortran 90 and Ada, case does not matter.

2. Declaration and Binding

Binding: The time when different language features are associated with, or bound to one another.
Binding can take place at: * Compile time; * Load time; and * Run time.
Early (compile-time) binding leads to efficient execution while late (run- time) binding leads to more flexibility.

Name-Declaration Binding/Scope

The connection between the use of a name in a statement and its declaration is called name-declaration binding/scope.
Java, C, C++ and Ada allow us to determine scope by looking at the program text alone i.e at compile time. This is called Static Scope.
Consider the Java class declaration:

class Example {
 private double x, y;
 public voidop1() {
 int y, z;
 y=34;
 z=27.4;
 }//
 public void op2() {
 x=3.768;
 y=x;
 }//op2
}//Example

y and z declared inside op1 are local variables. They are not visible outside op1
x and y declared under Example are global variables. They are visible in op1 and op2.
The use of y in op1 and op2 refer to local variables.
Ada, however, allows y declared under Example to be accessed inside op1 and op2 by the name Example.y.
A language that employs static binding is said to be statically typed.
When there are no loopholes in the type checking, the language is said to be strongly typed e.g Ada and Algol 68.

Advantages of Static Typing

Static typing leads to programs which are:

Reliable: Because errors are detected at compile-time.
Understandable: Because the connection, or binding between the use of an identifier and its declaration can be determined from the program text.
Efficient: Because type checks do not have to be made at run time.
Dynamic Scope: The binding between the use of an identifier and its declaration depends on the order of execution, and so is delayed until run time.

Declaration-Reference Binding/Scope

During program execution, a program variable is allocated memory location in which its value is stored.
The store location has a reference (i.e address) through which they can be accessed.
When a variable is allocated storage, we say that the declaration of the variable is bound to a reference.

How Variables are Allocated Memory Locations

1. On Block Entry

This is when the procedure or method is called. When control is returned from the method call, the memory is de-allocated.
Data members of class are allocated when the object is created and de- allocated when the object expires.

2. Static Variables

Binding the declaration of a local variable to a different store location (i.e reference) each time a procedure or method is entered has the consequence that a local variable does not retain the value it had when the method was last executed.
To handle this problem, Algol 60, C, C++ and Delphi allow the declaration of static local variable.
In object-oriented languages, there is no need for static variables as information that needs to be held from one method call to the other can be declared at the class level called class variable.
For example, only one instance of counter variable will be created in the following:

class Example {
 private static int counter=0;
 …
}

Reference-Value Binding

The binding of a variable to its value involves three bindings: * The binding of the variable’s name to its declaration (name- declaration binding) * The binding of its declaration to a store location (declaration- reference binding). * The binding of the store location to a value (reference-value binding).
Dereferencing: The process of finding the value, given the reference is called dereferencing.
In the statement below, the y on the right-hand side is dereferenced to find its value so that 1 can be added to it. * y=y+1
A reference is sometimes called an L-value and the value of a variable an R-value.

Constants

Constants are data items which do not change in a program.
A constant has a name and a value, but the constant identifier cannot appear on the left-hand side of an assignment statement.
The example below shows how to define constants in C++ and Java. * const int size =20; // C++ * final int size =29; //Java

3. Type Definitions

The most important attribute of a variable is its type.
The characteristics of a type are its allowed values together with the set of operations that can be used to manipulate these values.

Kinds of Types

The three main kinds of types are: * Scalar type * Structured type and * Reference type

Scalar Types

Numeric Data Types: Integer, floating point and fixed point and complex.
Logical Type: Called Boolean
Character Type
Scalar types can be split into two categories: * Discrete/Ordinal Type: Where each value (except the minimum and the maximum values) has a successor and a predecessor. * Others: Like floating-point types for which this is not the case.
Built-in Types: These are types that are immediately available to the programmer.

Specifying Type Information

In most programming languages, all variables must be declared before they can be used.
Variables can be declared to be either one of the built-in types or of a user- defined type.

4. Numeric Data Types

The arithmetic operators (+,-,\,/) are defined for numeric data types.
However, the effect of these operators depends on the context of use. * Example * 2 + 3 ---Integer addition * 2.5 + 3.5----Floating-point addition
When the effect of an operator depends on the type of its operands, the operator is said to be overloaded.
C, C++ and Java use a single overloaded division operator. * E.g 7/2 gives 3----Integer division * 7.0/2.0 gives 3.5----Floating-point division
Other operators available for numeric types are: * Exponentiation * Relational operator e.g
Integer * The range of integers is dependent on the machine hardware and the representation (no. of bits) used.

How Integers can be Used

In normal mathematical sense
As counters in a loop or array inde