Joe Herbert - Computing



1.1 Contemporary Processors, Input, Output and Storage Devices

1.1.1 Structure and function of the processor

The Central Processing Unit (CPU):
The CPU is the brain of the computer. It manipulates information sent to its registers, whether this is instructions or data.
A CPU can contain multiple cores (dual-core, quad-core or more), and some computers have more than one CPU.

Arithmetic and Logic Unit (ALU):
The ALU carries out arithmetic operations such as addition, multiplication, subtraction and division. It can also carry out logical comparisons on data, like finding out which number is higher. The result of the ALU is stored in the accumulator.

Control Unit (CU):
The CU governs the operation of all hardware, including input and output (I/O) devices and the CPU. It controls the FDE cycle.

Registers are a very small section of memory built into the CPU, which are very quick to access.
A bus is used to carry information from one place to another.

Fetch-Decode-Execute (FDE) Cycle:
- Fetch:
  1. The contents of the PC are copied into the MAR. The MAR now contains the location address of the next instruction and a memory read is initiated to copy the instruction from memory into the MDR.
  2. The PC is incremented so it now contains the address of the next instruction to be executed.
- Decode:
  1. The instruction is then copied from the MDR to the CIR.
  2. An instruction is made up from the opcode and the operand, and these are what the CPU uses to decode the instruction.
- Execute:
  1. The instruction in the CIR is executed. If the result needs to be committed to memory, the address is held in the MAR.
  2. Unless the instruction is a STOP instruction, the cycle is then repeated.
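The registers and steps above can be traced with a toy one-address machine in Python. The instruction set, memory layout and values here are invented purely for illustration, not a real architecture:

```python
# Toy fetch-decode-execute trace. Opcodes LDA/ADD/STA/STOP and the
# memory contents are made up for this sketch.
memory = {
    0: ("LDA", 10),    # load the value at address 10 into the accumulator
    1: ("ADD", 11),    # add the value at address 11
    2: ("STA", 12),    # store the accumulator at address 12
    3: ("STOP", None),
    10: 7, 11: 5, 12: 0,
}

pc, acc = 0, 0
while True:
    mar = pc                  # fetch: contents of the PC copied into the MAR
    mdr = memory[mar]         # memory read copies the instruction into the MDR
    pc += 1                   # PC incremented to the next instruction
    cir = mdr                 # decode: instruction copied from MDR to CIR
    opcode, operand = cir     # split into opcode and operand
    if opcode == "STOP":      # execute: cycle repeats unless it is a STOP
        break
    elif opcode == "LDA":
        acc = memory[operand]
    elif opcode == "ADD":
        acc += memory[operand]
    elif opcode == "STA":
        memory[operand] = acc

print(memory[12])   # 12
```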

Performance of a CPU:
- Clock speed: The clock is used to regulate the rate at which certain instructions are carried out. The faster the clock speed the more instructions it can perform every second.
- Number of cores: Multiple core processors are essentially multiple processors in one chip. This means they can perform multiple instructions at the same time. However, applications have to be specifically coded to use threading for multiple cores.
- Cache: The cache is a small amount of very fast memory built directly into the CPU. It stores frequently used data and instructions and is very quick to access as it is part of the CPU.

Using pipelining, the CPU is capable of handling more than one instruction at a time. Pipelining suits a RISC structure: the stages are handled by separate circuitry, which allows fetching, decoding, calculation and memory/register access to happen in the same clock cycle, each for a different instruction. Each instruction moves along one stage per clock cycle.
The main issue with pipelining is that you can't always correctly predetermine the next instruction, e.g. if there's an if condition the CPU won't know what code to run until the previous instruction has finished. The CPU will load the most likely instruction, and if the wrong instruction is loaded the pipeline must be flushed to clear it so the right instruction can be loaded. Modern CPUs have many pipelines so they can store multiple possible next instructions close to the CPU.

            Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
Instr 1     Fetch    Decode   Execute
Instr 2              Fetch    Decode   Execute
Instr 3                       Fetch    Decode   Execute
Instr 4                                Fetch    Decode   Execute
Instr 5                                         Fetch    Decode   Execute

CPU Architectures:

1.1.2 Types of processor

CISC (Complex Instruction Set Computing):
RISC (Reduced Instruction Set Computing):
High level: y = z * x

Low level:
 RISC: LDA 100        ; load the value at address 100
       LDB 200        ; load the value at address 200
       MUL            ; multiply the two registers
       STA 300        ; store the result at address 300
 CISC: MULT 100, 200
       STA 300

Or even just:

MULT 100, 200, 300

Graphical Processing Unit (GPU):
The GPU is a specialised processor designed to manipulate images and graphics. It is often dedicated, embedded hardware which allows faster manipulation of graphics where CPUs would struggle.
GPUs often act as Array Processors, executing one instruction on multiple sets of data at the same time, e.g. rotating an image applies the same action to a huge amount of data. This is sometimes called Vector Processing or Single Instruction Multiple Data (SIMD).
GPUs often also use Parallel Processing, as described below.

Multicore Systems:
A multicore processing unit is a single Chip Multiprocessor (CMP) which contains multiple cores (CPUs). Each core can execute ordinary CPU instructions on its own. These cores could also share cache or have an inter-core communication system to improve performance.
Software must be designed to run over multiple cores.
Parallel Systems:
Parallel Processing is the processing of program instructions by dividing them across multiple cores in order to speed up the processing time. (This differs from pipelining, which overlaps the stages of successive instructions within a single core.)

1.1.3 Input, output and storage

Input Devices:
Output Devices:
Storage Devices:

Magnetic Storage:
Data patterns are stored on a magnetised medium. Each bit is magnetised to either positive or negative, representing a binary 1 or 0. Hard drives have 'heads' which move across the spinning discs to read or write data.
They have low cost per GB but are sensitive to movement, so aren't very portable. They're used for large capacity storage.
Examples include hard drives, floppy discs and magnetic tape.
Flash Storage:
Flash devices have no moving parts, and store data using tiny switches/gates on integrated circuits. They won't break if dropped, and they have a small physical size.
They have very low power requirements, but are relatively expensive per GB.
Examples include USB sticks, flash memory cards and SSD drives.
Optical Storage:
Data is stored using a laser which changes how a layer on a disc reflects light. Different wavelengths of laser can be used, e.g. Blu-Ray, to increase the amount of storage available.
Discs are very portable and can be swapped/distributed easily. Can be RW or WORM. Good for making backups.
Examples include CDs and DVD discs.

Random Access Memory (RAM):
RAM, also known as Immediate Access Store (IAS) when directly connected to the CPU, holds instructions and data needed by a program currently in use, ready for the FDE cycle.
The CPU can't tell the difference between data code and instruction (executable) code. To solve this, each memory location has a bit pattern that identifies it.
RAM can be changed but it is volatile.
Read Only Memory (ROM):
ROM cannot normally be changed, and it is non-volatile. It is possible to update it (known as flashing), but this wears out the memory so it is done as little as possible.
It is used to store programs that always have to be ready to run and do not need to be altered, and programs that control the very beginning stages of start-up before the OS is loaded. The BIOS or UEFI firmware is stored in a type of flash ROM. The whole OS is not stored on ROM because ROM is more expensive than RAM, and because you want to flash ROM as little as possible while OSs are updated often.

Virtual Storage:
You can pool the storage of several devices together so they appear as one device. Virtual storage often uses a Redundant Array of Independent Disks (RAID). This has the advantage of providing a continuous service if one disk fails: a failed disk can be rebuilt from the other disks, since they hold redundant information about each other.
Cloud storage is an extension of virtual storage, where the pooled storage is hosted remotely and accessed over the internet.

Complementary Metal Oxide Semiconductor (CMOS):
This is a small amount of RAM supported by a battery to stop information being lost when power is lost. It is used to store things like user settings (e.g. where to boot from) and current date and time. The battery can be removed to restore default settings.

1.2 Software

1.2.1 Systems Software

Operating System (OS):
The OS manages the system's resources and processes, including managing peripherals and programs, and acts as an interface between the user and the computer.
It is a collection of programs that work together to supply a level of abstraction for the user, allowing them to perform functions but not showing the complexities.

Memory Management:
Memory in a computer is finite so the OS has to divide it up between the processes which need it, and it keeps track of the memory that is available. Memory management can cause fragmentation, which the OS tries to reduce. Different methods of memory management are shown below and can be combined to provide the most efficient solution.
Paging is where the OS will load a fixed-size block of data, called a page, from secondary storage into RAM, allowing the computer to run applications larger than the RAM available as only the part it needs at that moment is kept in RAM. A page table is used to keep track of whether the page is stored in RAM or secondary storage. This is like a person reading a book of instructions, where they only need to read the page they're on, as the other instructions aren't needed now.
The disadvantages are that the OS has to allocate a whole page even if some of it isn't used, meaning there are gaps which could be exploited by viruses. Also to transfer a page from secondary storage to RAM requires a lot of moving data around and can cause fragmentation.
Segmentation is where the memory is split into variable-sized segments, and blocks of memory are then used to store segments. The segment will have an ID and an offset (length) which allows a program that has different modules to be linked together within the memory. Segments can be different sizes depending on their contents and the OS uses a table to track and handle the memory. This table is smaller than the page table, but it is more complicated to retrieve a specific segment as it is not a uniform size.
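The paging scheme above can be sketched with a small virtual-to-physical address translation. The page size, table contents and page-fault handling here are invented for illustration:

```python
# Sketch of paging: a virtual address is split into a page number and an
# offset, and the page table says which RAM frame (or disk slot) holds it.
PAGE_SIZE = 256

# page table: virtual page number -> ("RAM", frame number) or ("DISK", slot)
page_table = {0: ("RAM", 3), 1: ("DISK", 0), 2: ("RAM", 7)}

def translate(virtual_address):
    page, offset = divmod(virtual_address, PAGE_SIZE)
    location, frame = page_table[page]
    if location == "DISK":
        # page fault: the OS would copy the page into a free RAM frame here
        raise RuntimeError("page fault: page %d is on disk" % page)
    return frame * PAGE_SIZE + offset

print(translate(5))    # page 0, offset 5 -> frame 3 -> 773
print(translate(513))  # page 2, offset 1 -> frame 7 -> 1793
```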
Virtual Memory:
Virtual memory is where the operating system moves the contents of memory to and from a hard disk. Overuse can cause disk thrashing, when the hard drive has to do lots of small operations over and over, damaging the hard drive. Virtual memory can also cause programs to slow down, since hard disks are generally much slower than RAM. This is often used alongside paging or segmentation, so a whole page or segment can be moved between RAM and a hard drive.

Interrupts cause a computer to stop its normal sequential processing. When an internal error occurs within a computer system, or a message needs to be sent to the CPU, an interrupt is sent from the device to notify the processor. Interrupts are stored in a priority queue called an interrupt queue, which in most computers is checked after each instruction has been executed. If there is something in the interrupt queue the processor runs a program called the Interrupt Service Routine (ISR).
The ISR is different for each type of error. While the ISR is at work, the volatile environment (the state of whatever was running before the interrupt occurred) needs to be stored, so that when the main program resumes the user can continue where they left off.
Types of Interrupt, in order of priority from highest to lowest:
The Interrupt Process:

Scheduling:
Single core CPUs can only run one task at a time, so processes must be scheduled to provide the most efficient solution when trying to give the impression of multiple programs running at once.
Round Robin:
This is where each process is given a set amount of time and the processor is switched to each process in a circular method. This is an easy method of implementing scheduling but can be inefficient, as the processor may be allocated to a process that is waiting for an input or output signal, so time is lost while it waits.
First Come First Served:
Also known as 'run to completion' or 'First In First Out (FIFO)', this is where each process is queued in order of arrival and then processed until completion. This is the simplest to implement but can be very inefficient: a short process that is important for giving the user feedback may have to wait a long time behind a long process that is taking up the processing power.
Multi-level Feedback Queues:
This allows important jobs to be completed first. When a job arrives it is allocated a priority, depending on the process type, the amount of CPU time needed, I/O access or memory size. It is then added to a queue based on its priority, and the CPU processes the queues using either round robin or first come first served, processing higher priority queues first. If a process is added to a queue of higher priority than the one the CPU is currently working on, it immediately performs the higher priority process. Processes can change priority, meaning they move to another queue; the CPU may lower the priority of a process that is taking too long to complete, so other processes are not ignored. This system allows I/O processes to be performed quickly while lower-priority jobs are still performed.
Shortest Job First:
This is where the CPU orders the processes it needs to complete by the amount of time they will take, and completes the shortest tasks first. This means short jobs will be executed very quickly, but long jobs could get processor starvation, where they are not performed for a long time.
Shortest Remaining Time First:
This is very similar to shortest job first but the CPU orders the tasks by the amount of time left to complete the process. If a task with less time comes up, the CPU will switch and do this first, then go back to completing the process it was on. This means short jobs are executed quickly, but again longer jobs could be subject to processor starvation.
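The round robin method described above can be sketched in a few lines of Python. The process names and burst times are invented, and a real scheduler would also handle I/O waits and priorities:

```python
from collections import deque

# Minimal round-robin sketch: each process runs for a fixed time slice
# (quantum); an unfinished process rejoins the back of the queue.
def round_robin(bursts, quantum):
    queue = deque(bursts.items())   # (name, remaining time) pairs
    order = []
    while queue:
        name, remaining = queue.popleft()
        order.append(name)          # this process gets the CPU now
        remaining -= quantum
        if remaining > 0:
            queue.append((name, remaining))
    return order

print(round_robin({"A": 5, "B": 2, "C": 3}, quantum=2))
# -> ['A', 'B', 'C', 'A', 'C', 'A']
```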

OS Types:
Distributed OS:
Distributed OSs are split across multiple computers, or nodes, so that each can specialise in a particular role. The nodes communicate with each other to work together, and the separation is not visible to the user. This method allows for easy expansion and scalability.
Embedded OS:
These are hidden within many devices, and are within the hardware of the device itself. They have a dedicated purpose, have little or no user interface, are fully autonomous and have limited resources, using only what is required.
Multi-tasking OS:
These allow the OS to perform several tasks at the same time, either by the CPU switching between different jobs using one of the scheduling methods above, or by using parallel processing.
Multi-user OS:
Often known as mainframe systems, these allow several users to access the processor, programs and resources at the same time. They usually use a round robin scheduling method where each user is allocated an amount of time for the processor to perform their jobs.
Real Time OS:
These are specialised to allow inputs to be processed under strict time limits so the output can affect the source of the inputs. They may be designed to take a certain amount of time to complete the processing each time. They usually have very limited user interaction but respond automatically to inputs.

Basic Input Output System (BIOS):
Checks all necessary hardware is present and working. Looks for and loads bootstrap from hard drive/CD. Bootstrap then loads the OS.

Unified Extensible Firmware Interface (UEFI):
This is a modern replacement for BIOS, offering a GUI. It is hardware and CPU independent so can run on many devices. It can also boot from larger disks and provide a secure boot as the OS has to be digitally signed.

Device Drivers:
Device drivers are software that enable communication between peripherals and the CPU. A generic driver allows basic features of many peripherals, whereas a specific driver provides more functionality but may only work with certain peripherals. Specific drivers can often be downloaded from the manufacturer's website. A Hardware Abstraction Layer (HAL) is used so that individual programs do not need to know how the target device works; the OS handles it.

Virtual Machines:
A virtual machine is software which emulates hardware, acting as the base for other programs or OSs. It can execute intermediate code as well as software running on the virtualised OS. It uses the hypervisor mode of a CPU to ensure anything inside the virtual machine has no effect on the rest of the computer.

1.2.2 Applications Generation

The Nature of Applications:
Applications are designed to allow the user to complete a task, aiming to make their lives easier normally with large amounts of functionality. Applications may include word processors, desktop publishers, spreadsheets, database managers, CAD software, presentation graphics, games, photo or video editing software etc.

Utilities are small, useful programs, usually designed for one specific function.
System Update - checks online database for newer versions of system software. Often automatically installs when newer version available.
System Cleanup - removes unwanted/out-of-date files in order to free up Hard Drive space and remove clashing versions. E.g. remove temporary internet files or delete files in recycle folder.
System Diagnostics - provides information about the computer to help resolve any problems. E.g. amount of RAM or hard drive space, check for installed drivers for new hardware.
Defragmentation - files often get split up on a hard drive in order to fill up space available. Lots of small parts of a file are harder to read/use than one large contiguous file. Defrag utility re-organises files and puts pieces together where possible, thus speeding up hard drive operations.
Format - sets up a storage device (eg hard drive or USB stick etc) so the OS can use it to save files. Format creates a table on the storage device showing where/how the files will be stored. As files are written to the device, the OS updates the table.
File Transfer - used to move files between folders on the hard drive, on to memory stick etc.
Firewall - prevents data entering or leaving computer. Firewall intercepts any requests to transfer data and compares it to a set of rules about which data to allow or block. E.g. access can be prevented for certain types of software, from certain addresses or on certain ports.
Anti-virus - tries to block infection from virus programs designed to damage the computer or data. Must be kept up to date as new viruses are constantly created.
Anti-Spyware - tries to block infection from spyware programs, i.e. ones designed to sit quietly and send personal/confidential/key data to another computer. Must be kept up to date.
Compression Software - used to reduce size of a file by removing redundant space or by using codes for common phrases/data.
Back up - Creates a second/additional copy/copies of data onto separate storage medium (eg DVD). Used to retrieve original files/data in case of loss/theft/fire damage/corruption etc.
Archiving - frees up storage space by moving old data onto another medium (eg tape). Old data may need to be accessed in future, but not currently needed. E.g. finance data can be archived once tax returns completed.

Open Source vs Closed Source:
Open Source:
Closed Source:
Programmers produce source code which somehow needs to be converted into executable code.
Interpreters translate one line of source code into machine code, execute it, then translate the next line and repeat. This is slower at runtime, but it allows variables to be examined at the point of an error and changed code to be run without re-compilation, making debugging and development quicker. Interpreters are also easier to create than compilers.
Compilers translate the whole of the source code into object code (very similar to machine code), which is executable. This means compilation only has to be performed once, and the object code can be run on machines other than the one the source code was compiled on. Source code can be compiled in modules, then linked together. Object code is difficult to reverse-engineer, preventing people from stealing code. Compilers also detect errors when the program is converted to object code, before it is run.
Assemblers are used to convert assembly language into machine code. Assembly language is very low level, meaning it is quick to convert to machine code but hard for a programmer to use.
Mixed/Dual Mode:
Some languages, like Java, require both compilation and interpretation. Source code is compiled into intermediary bytecode, which is distributed to users. The user's CPU then runs an interpreter which converts bytecode into machine code. The result of this is a slower application at runtime, and a larger application to distribute as the interpreter must be distributed with it, but an application which should be able to run on any platform.

Stages of Compilation:
Lexical Analysis:
Source code is examined. Comments and spaces are removed. Keywords are replaced with generic tokens. This creates patterns which can be examined.
totalCost = cost * 1.2
After tokenisation a look-up table is created to store variable names, data types and the value of constants.
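A toy lexical analyser for the statement above might look like this in Python. The token categories and look-up table format are simplified assumptions, not those of a real compiler:

```python
import re

# Toy lexical analysis: spaces are removed, each word is replaced with a
# generic token, and variable names go into a look-up (symbol) table.
KEYWORDS = {"if", "while", "print"}
TOKEN_SPEC = [
    ("NUMBER", r"\d+(\.\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[=+\-*/]"),
    ("SKIP",   r"\s+"),
]

def tokenise(source):
    tokens, symbol_table = [], {}
    pos = 0
    while pos < len(source):
        for name, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                text = m.group()
                if name == "IDENT" and text not in KEYWORDS:
                    symbol_table.setdefault(text, len(symbol_table))
                if name != "SKIP":
                    tokens.append((name, text))
                pos += len(text)
                break
        else:
            raise SyntaxError("unexpected character: " + source[pos])
    return tokens, symbol_table

tokens, table = tokenise("totalCost = cost * 1.2")
print(tokens)
print(table)   # {'totalCost': 0, 'cost': 1}
```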
Syntax Analysis:
Checks to see if program makes sense. The token string is split into phrases, and each phrase is examined against language syntax, checking for missing tokens, incorrect order etc. This is known as parsing code, where you check phrases against the allowed patterns.
Code Generation:
Errors are passed to report generator/error handling routines. If no errors are present, the code is generated. Variables are substituted with their addresses from the look-up table, likewise for any library routines needed. The linker works out how to join sequences, and the loader works out where things are at run time.
Code Optimisation:
The code that has been generated is not necessarily efficient in terms of size of object code or time to execute. Optimisation removes redundant code (like adding 0 or multiplying by 1) and optimises it for the CPU architecture.

Linkers join library modules or other separate code sections to the current code. They combine modules into one larger block. Libraries store data in relative addresses, as the program and libraries are compiled separately, but the linker joins the modules together in the correct way and updates the relative addresses, to produce a single executable.
Loaders are responsible for locating object code in RAM. They adjust relative addresses in programs and load the program/data into efficient RAM space.
Libraries allow programmers to reuse code, saving time and money as they only need to be coded and tested once.
One type of library is a Dynamically Linked Library (DLL), which is a collection of executable modules usually stored as part of OS files. This allows other programs to use a common set of routines, preventing duplication in code and allowing easier upgrading of common routines. A single DLL can be used by many programs at the same time.

1.2.3 Software Development

Waterfall Lifecycle:
Each stage is classed as a milestone of the development, and consists of a number of subtasks that need to be completed. In practice it is often necessary to go back to refine previous stages.

Spiral Model:
This is an extension of the waterfall lifecycle but is designed to ensure that reviews and risks are evaluated regularly. It results in a lower project failure rate, and involves lots of regular communication with the client to get feedback.

Rapid Application Development (RAD):
RAD is common in Object Oriented Programming, and it aims to speed up the process. It allows users/clients to view a work in progress, and increases flexibility and adaptability. It works using a repeated process of prototype, feedback, prototype, feedback, etc. This gives plenty of opportunity for the client to give their views, allowing the developers to create the product they want.

Agile Methodologies:
A type of RAD, Agile development is often employed in larger projects. It involves teams focusing on specific areas of a project, and then the client uses this solution and the product is evolved based on their feedback. This means the client sees progress sooner and is able to provide suggestions.

Extreme Programming:
Extreme programming is a type of agile software development, performed using frequent releases in short development cycles. It includes a constant review of progress and feedback from the client. The system only works with a very flat management structure and works best in small projects. Features may not be implemented until they are needed, focusing effort on the more important parts of the solution. The disadvantage of extreme programming is a lack of up-front design, so the result may not be the optimal solution to the problem.

1.2.4 Types of Programming Language (Programming Paradigms)

First Generation Languages
Second Generation Languages
Third Generation Languages
Fourth Generation Languages
Fifth Generation Languages
Procedural Programming
Procedural languages are an example of third generation languages.
Instructions are written and executed in order. Lots of use of sequence, selection and iteration.
Uses functions and procedures, and modularisation so code can be reused. It is done in structured/imperative languages.

Functional Programming
Usually maths/equation based. Involves breaking the problem down into smallest units (decomposition).
Functions are written to receive parameters and return results.
Used for science, maths and engineering in particular.
An example is Haskell; HUGS is a widely used interpreter for Haskell.
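The functional style can be imitated with Python's functional features. This is a rough sketch of the idea (pure functions receiving parameters and returning results), not idiomatic Haskell:

```python
from functools import reduce

# Functional style: the problem is decomposed into small pure functions,
# then combined with map and reduce rather than loops and mutable state.
square = lambda x: x * x
add = lambda a, b: a + b

total = reduce(add, map(square, [1, 2, 3, 4]))   # sum of squares
print(total)   # 30
```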

Declarative Programming
Based on facts and relationships. Facts are entered into a knowledge base, relationships are defined and then a goal is set. Program works through the knowledge base using the rules/constraints.
Often used for AI (often with PROLOG), or diagnostic tools in medicine or technology.
e.g. Given male(David), female(Sarah), male(Alex), female(Claire), female(Paula), male(Tom)
If the goal is 'male(x)', the output would be: David, Alex, Tom
When something is found, this is an instance. If something is not found, this is a failure.
When you return to the original set of facts to check the next instance, this is backtracking.
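The PROLOG-style query above can be mimicked in plain Python. This is a toy knowledge base, not a real logic engine with unification:

```python
# Sketch of a declarative query: facts are entered into a knowledge base,
# a goal is set, and the program works through the facts to find instances.
facts = [
    ("male", "David"), ("female", "Sarah"), ("male", "Alex"),
    ("female", "Claire"), ("female", "Paula"), ("male", "Tom"),
]

def query(goal):
    # each matching fact found is an instance; moving on through the
    # remaining facts to check for the next instance is backtracking
    return [name for predicate, name in facts if predicate == goal]

print(query("male"))   # ['David', 'Alex', 'Tom']
```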

Jackson Structured Programming (JSP)
Used with 3rd generation languages.
Uses a top-down approach - continued breaking down of problem into increasingly smaller units/modules (decomposition).
JSP diagrams help break a program down and maintain an understandable view of system.
To create a JSP diagram:
  1. Define problem. Consider Initialise, Input, Process, Output
  2. Break problem down into smaller modules
  3. Break modules down into smaller modules

An example could look like this:

This uses:
Sequence - modules are processed in order.
Iteration - use of asterisk and comment.
Selection - use of small circle showing alternatives.

Object-Oriented Programming (OOP)
OOP aims to create models of real world objects. Key characteristics are known as attributes, and key actions are known as methods.
The benefits of an OOP are they make it easier to write working code; a class can be fully tested & released for other coders in the team to use. Classes can be treated as black boxes, as other coders don't need to know how it works internally, just how to manipulate it through its methods. There are many design patterns already available which solve common programming tasks. A coder can pick up a pattern for a UI for example, and begin coding straight away. Classes can easily be used in multiple projects as they can be reused. The code is also portable, as it only requires the CPU specific compiler.
Drawbacks of OOP include its steep learning curve, as becoming proficient in an OOP language can be difficult, as it requires a different way of thinking about problems to a standard programming language. This complexity requires skill to create efficient and flexible classes. They are also not as efficient and compact as low level language coding.

Encapsulation:
This is the binding of an object's data and methods together. Assigning and retrieving object property values is only done using methods to access the data. You make attributes in a class private, but allow them to be changed and accessed through public methods. This means that once you've written code that works, you can guarantee that it is not the source of any future problems, as it will not change.

Inheritance is where a child class (subclass) is defined based on parent classes (superclasses).
For single inheritance, a subclass inherits all the attributes and methods of a single parent class.
For multiple inheritance, a subclass inherits attributes and methods from multiple parent classes.

Overloading:
This is when calling one method has a different effect depending on the data supplied to it.
e.g. Tom might have a method Sleep(). Calling Sleep(8) will have him sleep for 8 hours. Calling Sleep('Tight') will have him sleep tight.

Polymorphism:
This is closely related to overloading: multiple objects act consistently when the same method name is called on them.
e.g. Tom and Tom's dog both have a method Sleep(). Calling Sleep() on either of them has the same effect of sending them to sleep.
Polymorphism gives a single interface to multiple versions of a function or method performing similar actions or processing.
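A short Python sketch tying together encapsulation (a private attribute with public accessor methods), single inheritance and polymorphism. The class names and behaviour are invented for illustration:

```python
class Animal:
    def __init__(self, name):
        self.__name = name          # private attribute (encapsulation)

    def get_name(self):             # public accessor method
        return self.__name

    def sleep(self, hours=8):
        return "%s sleeps for %d hours" % (self.__name, hours)

class Dog(Animal):                  # single inheritance from a parent class
    def sleep(self, hours=12):      # same interface, its own behaviour
        return "%s curls up for %d hours" % (self.get_name(), hours)

# Polymorphism: the same method call works on either object
for animal in (Animal("Tom"), Dog("Rex")):
    print(animal.sleep())
```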

Some key components used for OOP include:

UML (Unified Modelling Language) Diagrams:
Range of diagrams used to describe a system and the relationships between classes/objects.
Memory Addressing Modes
Direct Addressing
This is where the operand represents the location of the data in memory. e.g. LDA 18 means load the data from address 18.
Indirect Addressing
With Indirect Addressing, the operand represents the memory location which holds the memory address of the data. e.g. LDA 1, if memory address 1 contains 2, means load the contents of memory address 2.
Immediate Addressing
Here the operand is the actual value to be loaded, so LDA 2 means the value 2 is loaded.
Indexed Addressing
One of the CPU registers is the index register. This holds the base memory address.
Indexed Addressing then functions like Direct Addressing, except the addresses are relative to the base address. e.g. If the base address is 2, LDA 3 would load the contents of memory address 5.
Iteration can be achieved by incrementing the base address.
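The four addressing modes can be simulated with a hypothetical LDA instruction in Python. The memory contents and index register value are invented to match the examples above:

```python
# Toy simulation of addressing modes for a made-up LDA instruction.
memory = {1: 2, 2: 99, 3: 40, 5: 77, 18: 123}
index_register = 2   # base address for indexed addressing

def lda(operand, mode):
    if mode == "immediate":
        return operand                           # operand is the value itself
    if mode == "direct":
        return memory[operand]                   # operand is the address
    if mode == "indirect":
        return memory[memory[operand]]           # operand points to the address
    if mode == "indexed":
        return memory[index_register + operand]  # relative to the base address

print(lda(18, "direct"))     # data at address 18 -> 123
print(lda(1, "indirect"))    # address 1 holds 2, so load memory[2] -> 99
print(lda(2, "immediate"))   # the value 2 itself
print(lda(3, "indexed"))     # base 2 + 3 = address 5 -> 77
```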

1.3 Exchanging Data

1.3.1 Compression, Encryption and Hashing

Lossless Compression:
Lossless compression is used for text, programs and data where all the information is required. However, less space is saved compared to lossy compression. It is usually a two stage process - analyse, then compress. The compression and decompression algorithm is needed at both ends. Examples include .png and .flac.

Lossy Compression:
Data is lost, hopefully redundant or less important data. More space is saved. Different files can be compressed different amounts. Lossy compression is often used for video and sound, where certain frequencies can be ignored or detail of image can be reduced without visible loss of quality. Examples are .jpg and .mp3.

Run-Length Encoding (RLE):
Identifies repeating patterns. Stores one copy of the pattern along with how many times it repeats consecutively. For example, the string "00000011110000110111100" could be compressed to "60414021104120", which reduces the string from 23 characters to 14. The following code will compress a string called "fileText" using RLE.
string fileText = "...";
string rleCompressed = "";
int rleIndex = 0;

private void RleEncoding() {
  rleIndex = 0;
  rleCompressed = "";
  while (rleIndex < fileText.Length) {
    char tmpChar = fileText[rleIndex];
    int count = getCountOfChar(tmpChar); // also advances rleIndex past the run
    rleCompressed += tmpChar.ToString() + count.ToString();
  }
  MessageBox.Show(rleCompressed + ": " + rleCompressed.Length);
}

private int getCountOfChar(char charac) {
  int count = 0;
  while (rleIndex < fileText.Length && fileText[rleIndex] == charac) {
    count++;
    rleIndex++;
  }
  return count;
}


Dictionary Encoding:
This creates a dictionary of key words/phrases which are repeated throughout the data. Each item is given an index, and a substitution encoder replaces the key phrases with the relevant index. An example is shown below:

Original Text (100 chars):
Happy birthday to you
Happy birthday to you
Happy birthday dear <name>
Happy birthday to you

Dictionary (26 chars):
1 Happy
2 birthday
3 to
4 you

Compressed Text (47 chars):
1 2 3 4
1 2 3 4
1 2 dear <name>
1 2 3 4
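A minimal sketch of a substitution encoder in Python (the function and variable names are illustrative, not from the notes):

```python
def dictionary_encode(text):
    # Build a dictionary of distinct words, then replace each word
    # with its index in the dictionary.
    dictionary = {}
    encoded = []
    for word in text.split():
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1
        encoded.append(str(dictionary[word]))
    return dictionary, " ".join(encoded)

dictionary, compressed = dictionary_encode(
    "Happy birthday to you Happy birthday to you")
# dictionary: {'Happy': 1, 'birthday': 2, 'to': 3, 'you': 4}
# compressed: "1 2 3 4 1 2 3 4"
```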

Encryption:
Algorithms called ciphers are used to convert data in plain text into cipher text. Keys are used to convert between plain text and cipher text.

Symmetric Encryption
This uses the same key to encrypt and decrypt some data. This means the key must be shared between the sender and the receiver, which means it can be intercepted. Symmetric encryption is very easy to understand and implement, which means it is often used in low profile situations which need a quick set up, but don't need the highest level of security. It is also easier to keep track of, as there is only one key you need to be aware of.
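As a toy illustration (a repeating-key XOR is not a real cipher), the defining property of symmetric encryption is that the same key both encrypts and decrypts:

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the repeating key; XOR is its own inverse,
    # so applying the same key twice recovers the plain text.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

cipher_text = xor_cipher(b"attack at dawn", b"secret")
plain_text = xor_cipher(cipher_text, b"secret")  # → b"attack at dawn"
```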

Asymmetric Encryption
This uses a different key for encryption and decryption, so the key used to decrypt never needs to be shared. This also gives more control over permissions: the encryption key can be made available to anyone (known as a public key), while the decryption key is kept secret (a private key). Anyone can encrypt data, but only those with the private key can read it. Even if someone were to intercept the data being sent, it would be useless, as it is cipher text and they don't have the private key. Anyone being able to encrypt doesn't in itself pose a security threat, as a malicious message still has to get past the checks applied when it is decrypted and processed at the receiver's end.

Hashing is the process of applying a mathematical (hash) function to data to produce a fixed-size value, called a hash. Unlike encryption, hashing is one-way: the original data can't be recovered from the hash. Hashes are used to store passwords securely and to index data for fast lookup (hash tables).
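For example, using Python's hashlib (SHA-256 chosen here purely as an illustrative hash function):

```python
import hashlib

# The digest is a fixed length regardless of input size, and a tiny
# change to the input produces a completely different digest.
digest1 = hashlib.sha256(b"hello world").hexdigest()
digest2 = hashlib.sha256(b"hello worlD").hexdigest()
```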

1.3.2 Databases

A database is a collection of data that can be used to store, sort and search large quantities of information.
The titles of the groups (columns) of data in a database are called fields. Each set of data (each row) is known as a record.

Database Types
Flat file
A flat file database is just a single table which stores data. Flat file databases are hard to maintain as they often have repeated or redundant data, and there is often very little pattern to how the data is stored. This means it is confusing and slow to use.
Flat file databases are often stored in plain-text form, where each line holds one record, and fields in a record are separated by commas. This is known as a CSV (Comma-Separated Values) file.

A relational database is one which contains multiple tables of data. These tables are known as entities, and entity names should always be singular. Entities are related to each other through special fields known as keys.
There are three common types of key: the primary key, which uniquely identifies each record in a table; the foreign key, a field which references the primary key of another table; and the secondary key, an extra key which allows records to be searched or sorted on a different field. An example of a database is shown below. This shows two tables, and the keys used to link them together.

Entity Relationship Modelling
An Entity Relationship (ER) Diagram is a simplified diagram of how tables are connected within a database. There are three types of relationship: one-to-one, one-to-many and many-to-many.

The following ER Diagram is based on the student and class tables above.

The school changes their system, and each student can now take multiple classes. The ER diagram is shown below.

Many-to-many relationships are not allowed. They must be resolved with an Association Table. The following ER Diagram uses an association table, studentTakes, to resolve the many-to-many relationship.

To make this work, the classId field must also be removed from the student table.

Methods of Capturing, Selecting, Managing and Exchanging Data
Data is usually managed with queries. The most common types of query are: Select, used to find/sort data; Append, used to add new records; Update, used to change existing records; and Delete, used to remove records.
Data is captured using forms. These are on-screen sections where users can input data. Forms display data in a readable format, as well as allowing validation to be employed before a query is executed. Data is output from a database in reports. These are generally in paper form.

Normalisation is a tool used to validate ER Diagram models and optimise database structure.
A Normal Form is a label given to an arrangement of data, depending on how it is arranged.
A dependency is when one field is based upon another field. For example, an age field would be dependent on a date of birth field, as the age can be inferred from the date of birth. Dependencies are important in normalisation.

Un-Normalised Form (UNF or 0NF)
A database which has had no normalisation applied to it is in UNF.

First Normal Form (1NF)
To get a database into 1NF, you must:
  1. Eliminate duplicate columns (any columns with the same field name or which store the same data)
  2. Get rid of any groups of repeating data (where a single field contains an array of repeating values)
  3. Identify the Primary Key, or create one if it doesn't already exist
  4. Separate out any fields which are not atomic
Second Normal Form (2NF)
To get a database into 2NF, you must:
  1. Check data is already in 1NF
  2. Remove any partial dependencies by splitting the table on that field and moving the data to a new table. These can only occur if the primary key is a composite key
  3. Fix any many-to-many relationships using association tables
Third Normal Form (3NF)
To get the database into 3NF, you must:
  1. Check data is already in 2NF
  2. Check there are no non-key dependencies
To get to 3NF, all fields should be dependent on the key, the whole key, and nothing but the key.

Starting in UNF:
product(Item, Colours, Price, Tax)

Item Colours Price Tax
T-shirt red, blue 12.00 0.60
Polo red, yellow 12.00 0.60
T-shirt red, blue 12.00 0.60
Sweatshirt blue, black 25.00 1.25

Not currently in 1NF, as the Colours field is not atomic (it holds multiple values) and there is a duplicate record. Normalise to 1NF:

product(Item, Colour, Price, Tax)

Item Colour Price Tax
T-shirt red 12.00 0.60
T-shirt blue 12.00 0.60
Polo red 12.00 0.60
Polo yellow 12.00 0.60
Sweatshirt blue 25.00 1.25
Sweatshirt black 25.00 1.25

Not currently in 2NF, as Price and Tax depend only on Item, which is just part of the composite primary key (Item, Colour) - a partial dependency. Normalise to 2NF:

product(Item, Colour)
cost(Item, Price, Tax)

Item Colour
T-shirt red
T-shirt blue
Polo red
Polo yellow
Sweatshirt blue
Sweatshirt black
Item Price Tax
T-shirt 12.00 0.60
Polo 12.00 0.60
Sweatshirt 25.00 1.25

Not currently in 3NF, as Tax depends on Price, which is not a key - a non-key (transitive) dependency. Normalise to 3NF:

product(Item, Colour)
cost(Item, Price)
tax(Price, Tax)

Item Colour
T-shirt red
T-shirt blue
Polo red
Polo yellow
Sweatshirt blue
Sweatshirt black
Item Price
T-shirt 12.00
Polo 12.00
Sweatshirt 25.00
Price Tax
12.00 0.60
25.00 1.25

Structured Query Language (SQL)
SQL is used to manipulate data in databases. There are many variations but the principle remains the same. SQL can be split into two parts - the Data Definition Language (DDL) and the Data Manipulation Language (DML).

The DDL is used to build the structure of the database. The main commands are CREATE, ALTER and DROP. Example:
The following code creates the database in 3NF above, but adds a field for the primary key of the product table. SQL commands are usually written in capitals.

CREATE DATABASE db;
GRANT ALL ON db.* TO dbUser;
USE db;

CREATE TABLE product (
  productId INT NOT NULL AUTO_INCREMENT,
  item VARCHAR(20) NOT NULL,
  colour VARCHAR(10) NOT NULL,
  PRIMARY KEY (productId)
);

CREATE TABLE tax (
  price DECIMAL(5,2) NOT NULL,
  tax DECIMAL(5,2) NOT NULL,
  PRIMARY KEY (price)
);

CREATE TABLE cost (
  item VARCHAR(20) NOT NULL,
  price DECIMAL(5,2) NOT NULL,
  PRIMARY KEY (item),
  FOREIGN KEY (price) REFERENCES tax(price)
);

The DML manages the data inside the database. The main commands are SELECT, INSERT, UPDATE and DELETE. Commands other than SELECT often return TRUE if they complete successfully, depending on the language. Examples:
INSERT INTO product (item, colour) VALUES
   ('T-shirt', 'red'),
   ('T-shirt', 'blue'),
   ('Polo', 'red'),
   ('Polo', 'yellow'),
   ('Sweatshirt', 'blue'),
   ('Sweatshirt', 'black');
INSERT INTO cost (item, price) VALUES
   ('T-shirt', 12.00),
   ('Polo', 12.00),
   ('Sweatshirt', 25.00);
INSERT INTO tax (price, tax) VALUES
   (12.00, 0.60),
   (25.00, 1.25);
   $\longrightarrow$ TRUE
UPDATE product SET colour = 'red' WHERE productId = 5;    $\longrightarrow$ TRUE
SELECT * FROM product WHERE colour = 'red';    $\longrightarrow$
productId item colour
1 T-shirt red
3 Polo red
5 Sweatshirt red
SELECT colour FROM product WHERE item = 'Polo';    $\longrightarrow$
colour
red
yellow
SELECT * FROM cost WHERE item LIKE '%shirt';    $\longrightarrow$
item price
T-shirt 12.00
Sweatshirt 25.00
SELECT * FROM db.product WHERE colour = 'red' OR colour = 'blue';    $\longrightarrow$
productId item colour
1 T-shirt red
2 T-shirt blue
3 Polo red
5 Sweatshirt red
SELECT product.productId, product.item, product.colour, cost.price FROM product JOIN cost ON product.item = cost.item; This gets the specified fields for every record where there is a matching value in product.item and cost.item. Any values in product.item which aren't in cost.item, or vice versa, wouldn't be returned.

productId item colour price
1 T-shirt red 12.00
2 T-shirt blue 12.00
3 Polo red 12.00
4 Polo yellow 12.00
5 Sweatshirt red 25.00
6 Sweatshirt black 25.00

Different commands can be merged into one statement, for example:
SELECT product.productId, product.item, product.colour, cost.price FROM product JOIN cost ON product.item = cost.item WHERE cost.price = 12.00;    $\longrightarrow$
productId item colour price
1 T-shirt red 12.00
2 T-shirt blue 12.00
3 Polo red 12.00
4 Polo yellow 12.00
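The JOIN examples above can be reproduced with Python's built-in sqlite3 module (a sketch using a cut-down schema, not the full database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE product (productId INTEGER PRIMARY KEY, item TEXT, colour TEXT)")
cur.execute("CREATE TABLE cost (item TEXT PRIMARY KEY, price REAL)")
cur.executemany("INSERT INTO product (item, colour) VALUES (?, ?)",
                [("T-shirt", "red"), ("T-shirt", "blue"),
                 ("Polo", "red"), ("Polo", "yellow")])
cur.executemany("INSERT INTO cost (item, price) VALUES (?, ?)",
                [("T-shirt", 12.00), ("Polo", 12.00)])

rows = cur.execute(
    "SELECT product.productId, product.item, product.colour, cost.price "
    "FROM product JOIN cost ON product.item = cost.item "
    "WHERE cost.price = 12.00").fetchall()
# Every product row here has a matching cost row at 12.00,
# so all four records are returned.
```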

Referential Integrity
The maintenance and consistency of relationships in a database is known as referential integrity.
One part of this is making it impossible to add a record with a foreign key that doesn't exist. E.g. In the above examples, it's impossible to add a record in db.product which has a value for the item field which doesn't exist in db.cost.
Another part of referential integrity is the cascade delete, where a deleted primary key results in all foreign key records with that primary key being deleted. E.g. Deleting T-shirt from db.cost would also delete the first two records from db.product as these reference the T-shirt primary key.

Transaction Processing
Transaction processing maintains database integrity. A transaction is where all of the relevant information is entered, so a record is complete. A transaction can't remain in an intermediate or incomplete state. Other processes can't access the data until either the transaction is complete, or the database is rolled back after it fails. The database can process the information once all data has been entered.

The ACID transaction method guarantees the database integrity is kept. The transaction must comply with the following properties.
Atomicity: All or nothing. Everything must be completed in one action, so if the database crashes the transaction is either complete or nothing's changed.
Consistency: Ensures the transaction complies with the update rules, e.g. format, referential integrity, not causing the database to fail. Data must be valid whether or not it is correct.
Isolation: Each transaction must be performed independently of any other transaction.
Durability: Once a transaction is complete, the change must remain even in the case of a power failure. This can be done by immediately saving the data to storage when the transaction is complete, and not queuing or holding the data.

Record Locking
This stops records or fields from being changed by multiple queries at once. While a query is being performed, other queries can read the data but can't change it, preventing inconsistent updates (such as two queries overwriting each other's changes).

Duplicate data in a database is known as redundant. This should always be minimised as it compromises the database integrity, and it is often removed through normalisation.

1.3.3 Networks

Topology is the way computers are linked together, and it describes the physical cabling layout and how data moves between components.
Protocols and standards are rules which must be followed when networking computers, allowing different hardware and software to communicate.

The Internet Structure
The TCP/IP Stack
The Transmission Control Protocol / Internet Protocol allows communications on LAN and WAN networks. The stack has four layers: the Application layer, the Transport layer, the Internet layer and the Link layer.
Once the stages have been completed, the data is transmitted, and when received the stages are performed in reverse to retrieve the original data.

Internet Protocol (IP) Address
IPv4: 32-bit address, the standard format (e.g. 192.168.0.1). In 2011, the last address was allocated, but IPv4 addresses can still be used, as lots of addresses have been bought but are dormant, and addresses are dynamic so can be reallocated.
IPv6: 128-bit address. Written in Hex to save space (e.g. 4aae:1803:4647:2:200:b9be:de12:94cd).

Public (Routable) IP:
These are assigned by IANA (Internet Assigned Numbers Authority), who distribute numbers to local registrars, who in turn assign numbers to users or companies. Often each computer doesn't have its own public IP address; instead the router for the network it's connected to has one, and this router manages the traffic going in and out, distributing it as necessary. This reduces the number of IP addresses needed, and provides some security as you can't directly access a computer based on an IP.

Private (Non-routable) IP:
Some addresses are specified as private by IANA. These are used in LANs not directly connected to the Internet. A LAN can be connected to the internet with a router using Network Address Translation (NAT), so incoming packets go to the right place and outgoing packets look like they came from the router's public IP. Private IPs start with 10, 172.16-172.31, or 192.168.

Subnet masking:
This tells the computer which IP addresses it can reach directly and which it needs to access through a router. The mask is in the form of IPv4, where 255 means that part of the IP must be the same as the host's for direct access, and 0 means that part can vary but can still be accessed directly. The mask is applied to an IP by converting both the mask and the IP to binary, then applying a bitwise logical AND. The result is the network address; the bits the mask zeroes out form the host identifier within that network.
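The AND step can be sketched in Python (the helper function is illustrative, not from the notes):

```python
def network_address(ip: str, mask: str) -> str:
    # AND each octet of the IP with the corresponding octet of the mask;
    # the result is the network address.
    return ".".join(str(int(i) & int(m))
                    for i, m in zip(ip.split("."), mask.split(".")))

network_address("192.168.1.37", "255.255.255.0")  # → "192.168.1.0"
```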

Domain Name System (DNS)
The DNS is primarily a database which translates domain names, which are easily memorable by humans, into the IP addresses corresponding to the location of the site.
The browser sends a request to the DNS for a specific domain name. DNS servers return the corresponding IP address. If the DNS server can't resolve the domain, it recursively passes the request onto another DNS server. When the address has been returned, the browser retrieves the website from the server at the IP address.

Protocol Layering
Protocols can be layered on top of each other (applied one after another). Some common protocols are:

HTTP (Hypertext Transfer Protocol):
This defines how data from web pages is transferred from the server to the client when the page is viewed over a TCP/IP network. The protocol makes a request of the server at the IP address and the server will return something for the client to view.
HTTPS (HTTP Secure):
This is also used to download web pages but it also applies a layer of encryption, which means if the data is intercepted during transmission it is very difficult to decode it. HTTPS is mainly necessary for passwords, bank details and other personal information.
FTP (File Transfer Protocol):
This is used to upload and download files. Most browsers include an FTP utility to download non-webpage files from the internet. FTP transfers can often continue after reconnecting if there is an issue with the connection partway through, meaning progress isn't lost.
POP3 (Post Office Protocol Version 3):
This allows an application to receive emails from a server. The protocol connects to the server, downloads the messages to the user's computer, and then deletes the messages from the server. This means once viewed on one email client, emails are no longer available on the server for other clients to use. POP3 uses the user's computer as the primary storage location.
IMAP (Internet Message Access Protocol):
Modern users access emails on multiple clients, so IMAP is used, which only deletes messages when users explicitly request it. IMAP uses the server as the primary storage location.
SMTP (Simple Mail Transfer Protocol):
This distributes emails. The message will include the intended recipients, attachments and the email content. This message is then sent to the sender's mail server, which uses DNS to send the message to the client's mail server.
SSH (Secure Shell):
This is a remote-access protocol, allowing secure communication between a client and a server. Both the client and server must be running SSH programs. SSH ensures no one else can see the data transferred.

LANs (Local Area Networks) cover a small geographical area and use their own cabling and topology structure. They're usually within an organisation.
WANs (Wide Area Networks) cover larger areas, usually using the internet to share data.

Packet and Circuit Switching
Packet switching is where the packets which make up a message are sent using the optimum route for each packet, which may not be the same for all of them. When the destination receives the packets they are reassembled into the correct order, regardless of which computers they went through to get to the destination. If a packet is lost due to a collision and the destination computer doesn't send a confirmation message back to the source, the source computer will retransmit after a certain time.

Circuit switching is where a physical connection is made between two computers. This is only possible on LAN networks, but is fast and secure.

Network Security
A virus is a program embedded into another apparently harmless program which is intended to cause harm to a computer. When the harmless program is executed, the virus copies itself onto the disk, where it can hide and make changes to the system to cause damage. Viruses often duplicate themselves and spread to other computers. Antivirus programs detect and remove viruses. Most browsers scan files being downloaded for viruses.
A worm is a malicious program designed to replicate itself and spread across a network such as the internet. Unlike a virus, a worm is a complete program in itself and doesn't hide in another program. They're most commonly spread through email, and they often slow down servers and data transfer.
A Trojan is a non-self-replicating piece of malware which hides in a file; when the file is executed it gains access to the system. This then acts as a back door for someone to fully control the infected system.

A packet filtering firewall studies each packet as it arrives, and compares the features of the packet with the filtering rules in use. It is applied to whichever ports the network admin chooses.
A stateful inspection firewall examines each packet in the context of the connection it belongs to, rather than in isolation. It stores the states of connections in a state table, speeding up processing of the data, and packets are checked against the connection states allowed by the policies.
A proxy server is a server which connections go through to get from the client to other servers; it checks transfers to ensure data entering or leaving the system is safe. Proxies are often used along with a firewall to increase security and reduce bandwidth use. Computers inside a network use the proxy server to access external data, so only one computer is exposed to the outside network, and this connection can be managed closely.
Encryption
A digital signature allows the receiver to determine who the sender is, and to tell whether or not a message has been tampered with during transmission. The signature is generated by applying a hash function to the message, which produces a hash specific to its contents, and encrypting that hash with the sender's private key. The receiver decrypts the signature with the sender's public key and recomputes the hash; if the contents have changed, the hashes won't match.
A digital certificate identifies a specific user. It contains a name, unique ID such as serial number, the user's public key and the certificate authority's signature to show it's real. This prevents someone pretending to be someone else and sending you their public key rather than the intended recipient's public key, which would allow the imposter to decode the data.

Network Hardware
Network Interface Card (NIC)
This allows a device to connect to a network. Most motherboards have these built in.

Hub
The simplest form of network device, this receives a signal from a computer, then transmits it to all other computers in that network. Cheap and effective in small networks, but for larger networks it causes collisions and slows the network down.

Switch
This utilises built-in memory to form a lookup table. When data is first sent, the switch sends it to all computers, and receives a signal back from the correct computer. It stores this information in the lookup table, so in the future it can easily send data back and forth between these computers, without sending the data to everyone. This cuts down on traffic and results in more reliable communications.
Managed switches are more advanced switches which can be configured by a user, e.g. to filter certain ports or give priorities to certain ports or computers.

Router
A router forwards packets from one network to another. Each network has a router which works out where to send packets which are destined for other networks. A packet sent over the internet will go through many routers before it reaches its destination. Routers work by changing the destination MAC address to forward the packet to the next router, without changing the destination IP address.

Gateway
These connect multiple networks of different architectures, forming an internetworking system. Gateways can be a physical object or they can be built into a network's subsystems and into the network software. The gateway takes in a packet from an external computer and strips it down to the raw data, which is repacked using the protocol which that network supports.

Wi-Fi
This is a high-bandwidth wireless method of communication which is an alternative to wired Ethernet networks, despite being slower and less reliable. Wi-Fi networks are accessed through hotspots - the area in which a device can connect to a Wi-Fi network. All networks are based on an international standard so any device can connect to any network.

Wi-Fi networks can be secured in the following ways: encrypting traffic with WPA/WPA2, hiding the SSID so the network isn't broadcast, and whitelisting the MAC addresses of allowed devices.

Types of networking
Peer-to-Peer
These have no central server or central management and every computer has the same rights. These are cheap as they require no expensive server or hardware which needs to be maintained. Programs must be installed on every computer and monitored to make sure everything's up to date. On most peer-to-peer networks the data is stored locally on each machine, so regular backups are needed to prevent data loss. Mostly used for small networks, as maintaining a small number of machines isn't too difficult. If one connection or computer goes down, most of the system can continue to function.

Client-Server
AKA server-based networking, this is where one or more central computers act as a server and are responsible for managing resources, security and more. The server holds all information and functionality, which is advantageous when lots of interaction takes place between computers. Resources can be deployed remotely from the server without visiting each computer individually. Security and accounts are the same across the network, so users can log in and view their data on any computer on the network. Backups are done automatically by the server so no data is lost. However, if the server goes down, the whole network stops working.

Client-Server | Peer-to-Peer
All data and resources are held on the server | Data and resources are held on individual machines
Software can be shared across the network and easily distributed and managed | Copies of software are held on individual machines which need to be maintained individually
More complicated and expensive to set up and maintain | Generally simpler to implement, but gets more complex with more machines
Usernames, passwords and access levels are controlled centrally so it is easier to ensure security | No central security, can only access accounts/data on a specific computer
Centralised and regular backup facilities to prevent data loss | Backing up is down to individual users so is less reliable
If the server goes down, the whole network stops, but if a client goes down then there is no data loss or impact to other clients | No central server dependence, but if a client goes down the data on that machine is unavailable to everyone
The server controls the communication protocol, so it doesn't matter what type of computer the clients are, or if they're the same | Individual computers must all have the same software loaded to control communication with other computers

1.4 Data Types and Structures

1.4.1 Data Types

Primitive Data Types:
Binary is what the computer stores data in, as the computer can only understand two states and binary is base 2, so it can only be 0s or 1s.

Positive Numbers:
Add up the place value for each position where there's a 1 in the number, as shown in the table below.
128 64 32 16 8 4 2 1
1 1 1 1 1 1 1 1

11111111 = 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255.
10101010 = 128 + 32 + 8 + 2 = 170.
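This place-value addition can be checked in Python:

```python
# int() with base 2 performs the place-value addition for us.
assert int("11111111", 2) == 255
assert int("10101010", 2) == 128 + 32 + 8 + 2  # 170
```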

Negative Numbers:
Negative numbers in binary can be displayed either using a sign bit or using two's complement.
Two's Complement:
To get a two's complement number, follow the following steps:
  1. Find the binary of the modulus of the number
  2. Invert this, so all 0s turn to 1s and all 1s turn to 0s
  3. Add one to the result
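The three steps can be checked in Python (masking with the bit width performs all three at once for a negative number):

```python
def twos_complement(n: int, bits: int = 8) -> str:
    # Masking with 2**bits - 1 gives the two's complement bit pattern.
    return format(n & ((1 << bits) - 1), "0{}b".format(bits))

twos_complement(5)   # → "00000101"
twos_complement(-5)  # → "11111011" (invert 00000101, then add one)
```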
Sign Bit:
A sign bit number simply uses the first bit to represent either positive or negative, where 1 is negative and 0 is positive. This reduces the maximum value a binary number can be, as an 8-bit number can now only go up to 127.

For addition, put one number over the other. Work from right to left, adding each binary digit to the one below it: 0 + 0 = 0; 0 + 1 = 1 + 0 = 1; 1 + 1 = 0, carry 1 to the column to the left.

If the furthest left column results in a carry, so the number becomes more than 8 bits, it is known as an overflow. Write the furthest left bit with brackets around it. In a computer, this would be ignored so the result would be incorrect.

To subtract numbers, find the two's complement of the number you want to subtract and add this to the number you want to subtract from.
Alternatively, use the borrowing method, as in denary column subtraction.
Fixed Point:
Decimal numbers can be either fixed or floating point. Fixed point is where a certain number of bits are allocated for the integer part, and the rest are allocated for the fractional part, with place values 0.5, 0.25, 0.125 and so on after the point.

For example, with five bits before the point and three after, 01011.101 is 11.625 in denary (8 + 2 + 1 + 0.5 + 0.125).
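The fixed point value can be verified in Python:

```python
# 01011.101: five integer bits, three fractional bits.
integer_part = int("01011", 2)          # 8 + 2 + 1 = 11
fraction_part = int("101", 2) / 2 ** 3  # (4 + 1) / 8 = 0.625
value = integer_part + fraction_part    # 11.625
```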

Floating Point:
Uses a mantissa, the main number, and an exponent, the power of 2 the mantissa is multiplied by (i.e. how many places the binary point moves) to get the original number.
The sign can either be a sign bit or the mantissa can be in two's complement.
The binary point in the mantissa goes between the most significant bit and the second most significant bit.

Converting binary to denary:
  1. Convert the exponent to denary, remembering the left most 1 has a negative place value
  2. Move the binary point in the mantissa this many places to the right (or to the left if negative)
  3. Convert the mantissa to denary, remembering the left most 1 has a negative place value
Normalising a floating point number:
  1. If the binary number starts with a 1, remove all leading 1s; if it starts with a 0, remove all leading 0s
  2. Pad out the mantissa with 0s on the right
  3. Decrease the exponent by the number of leading 1s or 0s you removed (i.e. how many places the binary point moves to the right)
Adding or subtracting floating point numbers:
  1. Change the numbers so the exponents are the same. If you increase the exponent by x, shift the mantissa to the right by x places; if decreasing the exponent, shift the mantissa to the left
  2. Perform the addition or subtraction on the mantissa parts
  3. Join the mantissa and exponent
  4. Normalise
Hexadecimal:
Hexadecimal is easier for humans to understand than binary, but computers still work in binary, so it is used as a compromise between words and binary.
Hex is base 16. As we don't have digits to go this high, letters are used: A represents 10, B 11, and so on up to F representing 15.

Convert between binary, hex and denary:
Each hex digit corresponds to exactly four binary digits (a nibble), so to convert between binary and hex, split the binary number into groups of four bits and convert each group. To convert to or from denary, go via binary or use place values of 16.
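Python's int() and format() can perform these conversions:

```python
value = int("A7", 16)            # hex -> denary: 10*16 + 7 = 167
binary = format(value, "08b")    # denary -> binary: "10100111"
hex_again = format(value, "X")   # denary -> hex: "A7"
```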
Character Sets:
ASCII (American Standard Code for Information Interchange) is 7 bits, which worked for all characters on English keyboards plus some extra control characters. The capital alphabet is 65-90, lower case is 97-122.
Extended ASCII (8 bits) was incompatible across different devices.
Unicode is a universal character set, and is universally supported. All standard ASCII characters keep the same code point, so it is backwards compatible. It allows for enough characters to represent all human languages, including emoji, and only about half the space has been used. Unicode Transformation Format (UTF-8) is space efficient: common characters are represented by fewer bits than less common characters, with each character taking 1, 2, 3 or 4 bytes. Each byte has one or more bits reserved as control bits, which indicate how many bytes this character is made up of and whether this byte is the first byte or a continuation.
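The variable byte lengths of UTF-8 can be seen in Python:

```python
# Common (ASCII) characters take one byte; rarer characters take more.
ascii_bytes = len("A".encode("utf-8"))      # 1 byte
accented_bytes = len("é".encode("utf-8"))   # 2 bytes
emoji_bytes = len("😀".encode("utf-8"))     # 4 bytes
```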

1.4.2 Data Structures

Static Data Structures:
An array is a data structure that holds elements of data of the same type. You cannot change the length of an array; a structure whose length can change is a list. A number after the variable name (the subscript) indicates the index of the array element to be used.
A two dimensional array is essentially an array of arrays, and can be visualised as having two columns, or being two arrays next to each other.
Records
This is a data structure which can provide a structure within a structure. For example, to store the name, age and postcode of 100 customers, you can define a structure which can be used for each element.
e.g. Customer[72].name = "Joe Herbert"
     Customer[72].postcode = "AB01 2CD"
Lists are a collection of data which follows on from each other in sequence. Arrays are a type of list with a fixed length. Linked Lists are also a type of list.

Tuples are lists of a finite length, like arrays, but their contents are immutable.

Dynamic Data Structures:
Queues and stacks are often implemented using arrays, so they are dynamic structures implemented inside a static one. A completely dynamic data structure uses the heap. This is an area of RAM reserved for processes that don't know exactly how much storage space they need for their processing. Processes can request a heap block from the OS.

Queue
A new item is added after the previous one, and items are removed from the front. Known as FIFO (First In First Out).
Use a pointer called FRONT to point to the memory location of the front of the queue and a pointer called END to point to the end of the queue. Therefore the queue itself is dynamic but its information can be stored in a static array.
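A queue can be sketched with Python's collections.deque:

```python
from collections import deque

queue = deque()
queue.append("first")     # join at the END of the queue
queue.append("second")
served = queue.popleft()  # leave from the FRONT (FIFO) → "first"
```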

Stack
Works like a stack of plates, where a new item is added on the top. Known as LIFO (Last In First Out). Data is put in and retrieved from the same end.
Uses a TOP pointer to point to the top of the stack. "Push" = add new item to stack. "Pop" = remove item from stack.
A stack can be used to quickly reverse the order of items in a list, just by adding everything to a stack and taking it off again. If you add another item to a full stack, there will be an overflow error.
A stack is used when the CPU receives an interrupt. Current program details or registers are pushed onto stack and retrieved after interrupt dealt with.
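The list-reversal trick mentioned above can be sketched with a Python list acting as a stack:

```python
def reverse_with_stack(items):
    # Push everything onto a stack, then pop it all off again;
    # LIFO order means the items come back reversed.
    stack = []
    for item in items:
        stack.append(item)               # push
    return [stack.pop() for _ in items]  # pop

reverse_with_stack([1, 2, 3])  # → [3, 2, 1]
```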

Linked List
Linked lists are dynamic data structures which can change size during runtime. The structure expands as new data is added. It is memory efficient as it only uses the memory it needs, and it frees memory when it isn't needed.
The contents of linked lists aren't always defined at runtime so the data is often non-contiguous, so the OS links together the data regardless of where it is in memory. Pointers are used to keep track of the next item in the list, and a NULL pointer is used to terminate the list. When a new node needs to be stored, the OS finds spare memory from the heap. Memory location is returned to the heap when the node is deleted.
There is usually a list called the Head Of Lists which points to the address of the first node of each list being stored in memory, with a NULL pointer at the end.

Search for an item:
INPUT searchFor
currentNode = startOfList
found = false
WHILE !found AND !endOfList
 IF currentNode.Data == searchFor THEN
  OUTPUT currentNode.Pointer
  found = true
 ELSE
  currentNode = currentNode.Pointer
 ENDIF
ENDWHILE
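The pseudocode above translates into a short runnable example. Below is a minimal singly linked list sketch in Python (the `Node` class and attribute names are illustrative, not from the original notes):

```python
class Node:
    """One node of a singly linked list."""
    def __init__(self, data):
        self.data = data
        self.next = None  # a None 'pointer' terminates the list, like NULL


def search(head, target):
    """Walk the list from the head; return the matching node, or None."""
    current = head
    while current is not None:
        if current.data == target:
            return current
        current = current.next  # follow the pointer to the next node
    return None


# Build the list 3 -> 7 -> 12
head = Node(3)
head.next = Node(7)
head.next.next = Node(12)
```

Because each node only knows the address of the next, the data can be non-contiguous in memory, exactly as the notes describe.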

Graphs are a diagrammatic way to represent linked data and visualise a potential solution. Each graph consists of several nodes (or vertices). The lines connecting the nodes are called edges (or arcs).
A graph can either be a digraph (directed graph), or an undirected graph.

A weighted graph shows the relative weighting for each edge. This is often used to show distance or time to get from one point to another.

An adjacency list can be used to represent each node and its respective links. For the above weighted graph, the adjacency list is below.
0: 1, 2
1: 0, 2, 3, 4
2: 0, 1, 3
3: 1, 2, 4
4: 1, 3
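The adjacency list above maps directly onto a dictionary; a minimal Python sketch, which also derives the equivalent 0/1 matrix (1 for a link, 0 for no link):

```python
# Adjacency list: each node maps to the nodes it is linked to
adj_list = {
    0: [1, 2],
    1: [0, 2, 3, 4],
    2: [0, 1, 3],
    3: [1, 2, 4],
    4: [1, 3],
}

# Derive the equivalent adjacency matrix from the list
n = len(adj_list)
adj_matrix = [[0] * n for _ in range(n)]
for node, neighbours in adj_list.items():
    for neighbour in neighbours:
        adj_matrix[node][neighbour] = 1
```

For an undirected graph like this one, the matrix is symmetric about its diagonal.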

An adjacency matrix shows the same information, but in matrix form. 1 represents a link, 0 represents no link.
$$\begin{bmatrix} \space & \textbf{0} & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4}\\\textbf{0} & 0 & 1 & 1 & 0 & 0\\\textbf{1} & 1 & 0 & 1 & 1 & 1\\\textbf{2} & 1 & 1 & 0 & 1 & 0\\\textbf{3} & 0 & 1 & 1 & 0 & 1\\\textbf{4} & 0 & 1 & 0 & 1 & 0\end{bmatrix}$$
An adjacency matrix can also be used to show weightings.
$$\begin{bmatrix} \space & \textbf{0} & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4}\\\textbf{0} & 0 & 1 & 6 & \infty & \infty\\\textbf{1} & 1 & 0 & 4 & 3 & 1\\\textbf{2} & 6 & 4 & 0 & 1 & \infty\\\textbf{3} & \infty & 3 & 1 & 0 & 1\\\textbf{4} & \infty & 1 & \infty & 1 & 0\end{bmatrix}$$
Trees
A tree is a connected, undirected graph in which there is exactly one route between any two nodes. A tree also contains one fewer edge than it has nodes. A tree will often express a form of hierarchy, where a root node represents the main or first item.

Binary Trees:
E.g. Binary tree for the numbers 44, 89, 1, 12, 44, 00, 23, 86, 23, 55:

Depth-First Traversal of Binary Trees:
This attempts to get to the bottom of the tree quickly, using recursion. There are three ways to traverse a tree depth-first: pre-order, in-order and post-order.
For example, for the following tree:

Breadth-First Traversal of Binary Trees:
This attempts to traverse across the tree first, keeping a log of any nodes below the current node being checked, so the program knows which ones it needs to come back to.

Hash Table
A hashing algorithm can be performed on search criteria to speed up the process of finding the index corresponding to the data to be found.
For example, an algorithm could add the numerical values for the first two letters in the search term (e.g. a=0, b=1 etc). So Joe would give index 23 (9+14), and Alan would give index 11 (0+11). However, this causes a collision if the name John was searched for, as this would also give index 23. Hashing algorithms aim to cut down on collisions while optimising the space used.
One collision resolution strategy is to just use the next free space available to store data when the correct index is already in use. However, this gets very hard to keep track of as data can end up almost anywhere, and you can't assume data has been deleted if it's not in the expected place, so linear searches must be used if the hashing algorithm doesn't produce the right index.
Another collision resolution strategy is using overflow lists, so any elements which can't be put in the expected index are put in the overflow list, which must be linearly searched when searching for an element. This reduces the size of the linear search when an element isn't in the expected index.
Another strategy is to have linked lists at every index. This means as many elements as desired can be stored at each index, so the hashing algorithm always produces the correct position. However, this is more complicated to implement.
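The last strategy (a list at every index, often called chaining) can be sketched in Python, using the two-letter hashing algorithm from the example above:

```python
def hash_index(name, size=52):
    """Add the alphabet positions of the first two letters (a=0, b=1, ...)."""
    first, second = name.lower()[:2]
    return (ord(first) - ord('a') + ord(second) - ord('a')) % size


table = [[] for _ in range(52)]  # an (initially empty) chain at every index


def insert(name):
    table[hash_index(name)].append(name)


def lookup(name):
    # Only the one chain needs searching, however many collisions occurred
    return name in table[hash_index(name)]


insert("Joe")
insert("John")  # collides with Joe (both hash to 23), but both are stored
insert("Alan")
```

The `size=52` and the modulo are illustrative choices to keep the table bounded; the notes' algorithm itself only defines the letter sum.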

1.4.3 Boolean Algebra

Karnaugh Maps
When representing two or more inputs on one side of a K map, the values must start at all 0s, and then differ by only one bit for the next value, and so on, as shown in the picture above. 00, 01, 11, 10 is the correct order for two inputs.
When converting to an expression from a K map, form groups of 1s, as shown below.
These groups must be rectangular or square, and they must have $2^n$ 1s in, so a group which contains six 1s is invalid, but two groups of four 1s is valid. Each group should be as large as possible, and they can overlap each other.

Boolean Algebra Rules:
De Morgan's Laws:
$\neg (A \lor B) \equiv (\neg A) \land (\neg B)$ NOT (A OR B) is the same as (NOT A) AND (NOT B)
$\neg (A \land B) \equiv (\neg A) \lor (\neg B)$ NOT (A AND B) is the same as (NOT A) OR (NOT B)

Distribution:
$A \land (B \lor C) \equiv (A \land B) \lor (A \land C)$ A AND (B OR C) is the same as (A AND B) OR (A AND C)
$A \lor (B \land C) \equiv (A \lor B) \land (A \lor C)$ A OR (B AND C) is the same as (A OR B) AND (A OR C)

Association:
$A \lor (B \lor C) \equiv (A \lor B) \lor C \equiv A \lor B \lor C$ A OR (B OR C) is the same as (A OR B) OR C is the same as A OR B OR C
$A \land (B \land C) \equiv (A \land B) \land C \equiv A \land B \land C$ A AND (B AND C) is the same as (A AND B) AND C is the same as A AND B AND C

Commutation:
$A \land B \equiv B \land A$ and $A \lor B \equiv B \lor A$
Double Negation:
$\neg\neg A \equiv A$
Absorption:
$A \lor (A \land B) \equiv A$ A OR (A AND B) is the same as A
$A \land (A \lor B) \equiv A$ A AND (A OR B) is the same as A
Logic Gates and Truth Tables:

Conjunction AND Λ
A B Out
0 0 0
0 1 0
1 0 0
1 1 1
Disjunction OR V Can be written as "A + B"
A B Out
0 0 0
0 1 1
1 0 1
1 1 1
Exclusive Disjunction XOR ⊕ Can be written as "A ⊕ B"
A B Out
0 0 0
0 1 1
1 0 1
1 1 0
Negation NOT $\neg$ Can be written as "$\bar{\text{A}}$" or "~A"
A Out
0 1
1 0
Equivalence The same as $\equiv$ Can be written as "$\leftrightarrow$"
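Because every law and gate above has at most three inputs, each identity can be checked by brute force over all input combinations; a quick Python sketch:

```python
from itertools import product

# Two-input identities: De Morgan's laws and absorption
for A, B in product([False, True], repeat=2):
    assert (not (A or B)) == ((not A) and (not B))   # NOT (A OR B)
    assert (not (A and B)) == ((not A) or (not B))   # NOT (A AND B)
    assert (A or (A and B)) == A                     # absorption
    assert (A and (A or B)) == A

# Three-input identities: distribution
for A, B, C in product([False, True], repeat=3):
    assert (A and (B or C)) == ((A and B) or (A and C))
    assert (A or (B and C)) == ((A or B) and (A or C))

print("All identities hold")
```

This is exactly how a truth table proves equivalence: two expressions are equivalent if they agree on every row.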

Complex Logic Gates:
A flip-flop circuit changes its state depending upon the previous state. The idea is that the flip-flop will remain in its state until a set or reset signal is sent. SSDs use flip-flops to store data. (In the truth table below the inputs are active-low: a 0 on S sets Q to 1, and a 0 on R resets Q to 0.)
S: Set    R: Reset    Q: Output    Q': Complement

S R Q Q' Effect
1 0 0 1 Turn Off
1 1 0 1 No Change
0 1 1 0 Turn On
0 0 1 1 Invalid

D-Type Flip-Flop:
A D-type flip-flop circuit effectively stores the input received. Usually used with a Clock signal to store the D input at a given time.
D: Data    Q: Output    Q': Complement

D CLK Q Q' Effect
1 1 1 0 Set to On
0 1 0 1 Set to Off

Half Adder:
A half adder adds 2 bits, producing a Sum and a Carry bit (an overflow). However, the overflow is often lost from the calculation.

A B S C
0 0 0 0
1 0 1 0
0 1 1 0
1 1 0 1

Full Adder:
A full adder combines two half adders (plus an OR gate), using the carry bit as a carry-in bit, so full adders can be chained to add numbers of any width.

A B C$_{in}$ S C$_{out}$
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1
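Both truth tables can be reproduced with bitwise operators; a minimal Python sketch (`^` is XOR, `&` is AND, `|` is OR):

```python
def half_adder(a, b):
    """Add two bits: Sum = A XOR B, Carry = A AND B."""
    return a ^ b, a & b


def full_adder(a, b, c_in):
    """Chain two half adders; their carry outputs are combined with OR."""
    s1, c1 = half_adder(a, b)      # add the two input bits
    s, c2 = half_adder(s1, c_in)   # then add the carry-in
    return s, c1 | c2

# The last row of the table: 1 + 1 + carry-in 1 gives sum 1, carry-out 1
print(full_adder(1, 1, 1))  # (1, 1)
```

Chaining the carry-out of one full adder into the carry-in of the next builds a ripple-carry adder for multi-bit numbers.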

1.5.1 Computing related legislation

The Data Protection Act 1998:
With tech advancing, more data was being stored, and it was hard to know who stored what. This law is designed to protect individuals and their personal data if it is automatically processed.
People who handle personal data (data controllers) have to register with the Information Commissioner's Office (ICO), who are responsible for:
The Data Protection Act has 8 principles:
The ICO requires controllers to register certain information:
There are a few exceptions from the act:
A data subject is entitled to compensation for a number of reasons:
The Computer Misuse Act 1990:
Designed to protect organisations' data, software and IT structures. The law consists of three levels.
- Level 1 - Access to unauthorised material:
- Level 2 - Access to unauthorised material with intent to commit a crime:
- Level 3 - Access and modification of data on a computer system:
There are very few prosecutions under this act because:
The Copyright Design and Patents Act 1988:
Designed to protect individuals' intellectual property. People and organisations have the right for their produced work to not be stolen. This covers images, text, software and music. In computing the main concern is software piracy, where illegal copies are made of software.
This damages the industry in various ways:
The law in the UK is very simple:
The designs part of the act protects original designs:
The Regulation of Investigatory Powers Act 2000:
As encryption and security improved, criminals were able to use computers as tools for the first time.
RIPA allows investigators to force access into organisations' IT structures via legislation. RIPA allows for:
If a RIPA-backed request is made to an organisation, they have to comply or face legal action.

1.5.2 Moral and ethical issues

Computers in the Workforce:
There is a concern about management spying on the workforce.
Constant monitoring could be a breach of privacy.
Many view monitoring as intrusive and showing a lack of trust.
Computers can now multi-task very easily, enabling cyber-slacking.
Increased pressure to do tasks rapidly leads to more stress-related illnesses.

Automated Decision Making:
Computers can use a vast database to perform diagnostics.
These systems are very effective, quick and reliable.
When a decision is wrong, who is to blame? The computer, the programmer, the user entering the information or the person acting on the results?
Some people are now very reliant on computers for diagnostics.

Artificial Intelligence:
This sector is still in very early development, but is already mimicking human and animal behaviour.
Systems can effectively think and solve problems.
At what stage is a computer cognitive/alive? Is it right to turn them off when they behave like live creatures? What if computers become android-like?
Tests exist and are being designed to measure artificial intelligence of a system.

Environmental Effects:
Computer equipment uses valuable resources during construction, use and decommission.
With technological advancements, computers reach end-of-life by becoming obsolete rather than breaking, unlike other machines.
These machines often contain private data and require destruction.
Schemes exist to distribute these machines to less developed nations but are viewed as dumping our trash on the unfortunate.
Companies buy machines to harvest and resell precious metals (e.g. Envirofone).
Computers have a huge carbon footprint and are running 24/7.

Censorship and the Internet:
It is now very easy to produce and distribute any content you want.
Pre-Internet, every publication went through a moral and accuracy check.
This is no longer feasible as so much is produced and distributed.
The Internet allows freedom of information, but organisations are responsible for uploading content suitable for their intended audiences.
Some governments, like the Chinese, censor their Internet, but this means they can have political bias (e.g. Tiananmen Square).

Monitoring Behaviour:
All computers are addressable, so it's possible to monitor what someone uses a computer for.
Some sites use cookies to collate data to sell on to third-parties.
There is a moral issue on how extensive this tracking should be.

Analyse Personal Information:
Computers are now able to perform complex analysis of data.
Previously it would be difficult to scan customer records and find out who lives near who but now it's easy.
This is serious if it involves personal data, which is very dangerous in the wrong hands.

Piracy and Offensive Communications:
With technology, piracy has gone through the roof, and it's damaging to authors.
Computers can be used to generate offensive material, such as cyber-bullying or anti-Semitic hate speech.
It is illegal in most countries as it has led to suicide and depression, but it's very hard to trace and prosecute.
However, it does clash with the idea of Freedom of Speech.

Layout, Colour Paradigms and Character Sets:
There are many cultures in the world, often with clashing views.
Not all cultures write with the same characters (e.g. the Middle East doesn't use the Latin alphabet) and not all cultures write left to right. Certain colours also have cultural significance.
This has to be considered when designing media.

2.1 Computational Thinking

2.1.1 Thinking abstractly

Abstraction is filtering out less important characteristics and specific details to focus on the important parts of a problem. For example, an abstraction of a train could be that it runs on tracks and has carriages. The colour, size, number of carriages etc are ignored since they're not important to it being a train.
Decomposition (also known as stepwise refinement, factoring, the top-down approach, or divide and conquer) is breaking down complex problems into less complex problems. These problems can be decomposed further until they're a manageable size, like a command or short set of commands. Each set of commands can then be brought together to solve the original complex problem. When decomposing, look out for sequence, selection and iteration.
The advantages of decomposition are:
The disadvantages of decomposition are:

2.1.2 Thinking ahead

Thinking should take place before coding starts.
Think about inputs - Origin? Format? Range? Validation? Best method for inputting?
Think about outputs - Format? Presentation? Target audience?
Determine the preconditions of the system - Hardware? Software? Could impact what language is used.
Should the new solution build upon, adapt or replace the existing solution?
Consider the value of replacing the old system - Time? Cost?

Caching may be needed for large systems, especially databases.
All of the data may not fit into the system at once.
Data is organised into blocks, which can be loaded/unloaded as separate chunks rather than individual items.
The advantages of caching are:
The disadvantages of caching are:
Reusable components:

2.1.3 Thinking procedurally

Thinking procedurally involves decomposing the problem, ensuring the original problem can be recreated from the component parts; the final solution is then built up from those constituent components.
It is very important to think about the order in which you create and use components.

2.1.4 Thinking logically

Thinking logically is all about using sequence, selection and iteration.

2.1.5 Thinking concurrently

The aim of thinking concurrently is to minimise the time, and therefore the cost, needed.
Which components need to run at the same time?
Which components can be developed at the same time?
Use of teams?
Isolate components, test, then combine and re-test.
Be careful when combining, particularly with global variables.
Need to consider whether to combine source code or object code.

2.2 Problem Solving and Programming

2.2.1 Programming techniques

Programming Constructs:
Sequence: Putting the right commands in the right order.
e.g. A very simple example of incorrect sequencing, using a variable before it is declared:
Console.WriteLine(x);
int x = 10;
Iteration: Repeating the same command or action. When using iteration, ensure you consider the stopping condition.
e.g. The following iteration will output 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9:
int x = 0;
while (x < 10) {
 Console.WriteLine(x);
 x = x + 1;
}
Selection/Branching: Deciding which commands to run based on a given condition.
e.g. This code will welcome people differently depending on whether their name is Joe or not:
string name = Console.ReadLine();
if (name == "Joe") {
 Console.WriteLine("Hi Joe!");
} else {
 Console.WriteLine("Hi stranger!");
}
Recursion: This is when a function calls itself, creating iteration. The following example outputs "Hi!" 10 times:
outputMessage(10, "Hi!");
function outputMessage(count, message) {
 if (count > 0) {
  Console.WriteLine(message);
  outputMessage(count - 1, message);
 }
}
Variable Scope
A global variable is available throughout the whole program. A local variable is only available to one particular section/function. Global variables are more powerful but more dangerous. If you use global variables and something goes wrong, the error could be anywhere in thousands of lines of code. Local variables limit the area you need to test, making it easier to fix errors. Global variables could also be accidentally changed somewhere in the program, and they mean you can't use that variable name again. When a function/procedure finishes, all the local variables contained inside it are cleared, freeing up memory for other programs to use. Local variables can be passed between functions, either by reference or by value, so they are not completely stuck in one section.

The advantages of modularisation are:
The disadvantages are:
Integrated Development Environment (IDE)
An IDE is a single program made from a number of components, used for developing programs.

2.2.2 Computational methods

Computable Problems
A computable problem is one that can be solved using an algorithm, producing the right answer every time in a finite amount of time.

Intractable Problems
Intractable problems are ones which can be solved by a computer, but which cannot be solved in a reasonable amount of time for large inputs. A reasonable amount of time is defined as polynomial time or less, so any algorithm that takes longer than polynomial time is classified as intractable. Intractable algorithms therefore have worse than polynomial (e.g. exponential) order of complexity.

Divide and Conquer
This is an algorithmic technique where a problem is broken down recursively until it is solved using a simple technique.
E.g. for 5!, we divide it down to 5 * 4 * 3! then 5 * 4 * 3 * 2! and then 5 * 4 * 3 * 2 * 1! which we can conquer.
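The factorial example maps naturally onto a recursive function; a minimal Python sketch:

```python
def factorial(n):
    """Divide n! into n * (n-1)! until the base case 1! is reached."""
    if n <= 1:
        return 1          # conquer: 1! is known directly
    return n * factorial(n - 1)  # divide: defer the rest to a smaller problem


print(factorial(5))  # 120
```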

Backtracking
This is where you find solutions to a problem by trying a method and then going back if it fails to try another method. It often uses recursion. It can take a long time to complete.

Data mining
This is using pattern recognition or summarising of data within large data sets. This allows you to spot erroneous data and trends, which helps give information which can be used to increase profitability in a business.

Heuristic Solutions
A Heuristic solution is one that's 'good enough'. It may not be the best solution, but it works and saves time compared to the perfect solution.

Performance Modelling
This is simulating and testing the behaviour of a system before it is used.

Visualisation
This is the method of finding a solution to a problem by using visual methods like diagrams. For example, a tree structure is not actually stored in the shape of a tree in the computer but it is easier to visualise that way.

2.3 Algorithms

2.3.1 Algorithms

Big O Notation
Big O notation describes the complexity of an algorithm: how the time and space it needs grow as the size of the input grows. There are six main types of complexity.

Notation Description Example Code Example Use
O(1) Constant. Always executes in the same amount of time regardless of the size of the data set. Always efficient. random_num = data_set(x) Extracting data from any element from an array.
Hashing algorithm.
O(n) Linear. Takes longer for a larger data set. Reduces efficiency the larger the data set. For x = 1 To y
 data_set(x) = counter
A loop iterating through a single dimension array.
Linear search.
O(log n) Logarithmic. Halves the data set in each pass. Opposite to exponential. Efficient with large data sets.
While Found = False And LowerBound ≤ UpperBound
 MidPoint = LowerBound + (UpperBound - LowerBound) \ 2
 If data_set(MidPoint) = searchedFor Then
  Found = True
 ElseIf data_set(MidPoint) < searchedFor Then
  LowerBound = MidPoint + 1
 Else
  UpperBound = MidPoint - 1
 End If
End While
Binary search.
O(n log n) Linearithmic. Divides a data set but can be solved using concurrency on independent divided lists. Opposite to exponential. Efficient with large data sets. Quick sort.
Merge sort.
O(n$^2$) Polynomial. Performance is proportional to the square of the size of the data set. Significantly reduces efficiency the larger the data set. O(n$^3$), O(n$^4$) etc. are also polynomial. For x = 1 To w
 For y = 1 To z
  data_set(x, y) = 0
A nested loop iterating through a two dimension array.
Bubble sort.
O($2^\text{n}$) Exponential. Doubles with each addition to the data set in each pass. Opposite to logarithmic. Inefficient especially for large data sets. Function fib(x)
 If x ≤ 1 Then Return x
 Return fib(x - 2) + fib(x - 1)
End Function
Recursive functions with two calls, e.g. Fibonacci number calculation with recursion.

Algorithms for Data Structures
Push (add to a stack):
  1. Check to see if the stack is full
  2. If it is full, return an error and stop
  3. If it's not full, increment the stack pointer (which represents the top of the stack), then
  4. Insert the new data item into the cell being pointed to by the stack pointer and stop
Pop (remove from a stack):
  1. Check to see if the stack is empty
  2. If it is empty, return an error and stop
  3. If it's not empty, copy out the data in the cell pointed to by the stack pointer, then
  4. Decrement the stack pointer and stop
Enqueue (add to a queue):
  1. Check to see if the queue is full
  2. If it is full, return an error and stop
  3. If it's not full, increment the tail pointer, then
  4. Insert the new data into the cell pointed to by the tail pointer and stop
Dequeue (remove from a queue):
  1. Check to see if the queue is empty
  2. If it is empty, return an error and stop
  3. If it's not empty, copy out the data in the cell pointed to by the head pointer, then
  4. Increment the head pointer and stop
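The four routines above can be sketched together in Python, using fixed-size arrays and pointers to match the static implementations described earlier (the size and variable names are illustrative):

```python
SIZE = 5

stack = [None] * SIZE
top = -1  # stack pointer: -1 means the stack is empty


def push(item):
    global top
    if top == SIZE - 1:
        raise OverflowError("stack full")  # steps 1-2: check, report, stop
    top += 1                               # step 3: increment the pointer
    stack[top] = item                      # step 4: insert at the pointed cell


def pop():
    global top
    if top == -1:
        raise IndexError("stack empty")
    item = stack[top]                      # copy out the pointed cell
    top -= 1                               # then decrement the pointer
    return item


queue = [None] * SIZE
head = 0    # points to the front of the queue
tail = -1   # points to the back of the queue


def enqueue(item):
    global tail
    if tail == SIZE - 1:
        raise OverflowError("queue full")
    tail += 1                              # increment the tail pointer
    queue[tail] = item


def dequeue():
    global head
    if head > tail:
        raise IndexError("queue empty")
    item = queue[head]                     # copy out the head cell
    head += 1                              # the head pointer moves forward
    return item
```

This is the simple (non-circular) queue from the text: the head pointer only ever moves forward, so in practice circular queues reuse the freed cells.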
Graph Traversal
Depth-first (using a stack):
  1. Push root vertex onto stack
  2. Visit an unvisited vertex connected to the vertex on top of the stack
  3. Push this vertex onto the stack
  4. If there are no unvisited vertices connected to the vertex on top of the stack, pop the vertex on top of the stack
  5. Repeat points 2, 3 and 4 until the stack is empty and all vertices have been visited
Breadth-first (using a queue):
  1. Push root vertex onto queue
  2. Visit all unvisited vertices connected to the root vertex
  3. Push these vertices onto the queue
  4. Pop vertex at the front of the queue
  5. Visit all unvisited vertices connected to the vertex at the front of the queue
  6. Push these vertices onto the queue
  7. Repeat points 4, 5 and 6 until the queue is empty and all vertices have been visited
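The two traversals can be sketched in Python with an explicit stack and queue; the five-node adjacency-list graph here is illustrative:

```python
def depth_first(graph, root):
    visited, stack = [], [root]
    while stack:
        vertex = stack[-1]                  # the vertex on top of the stack
        if vertex not in visited:
            visited.append(vertex)
        unvisited = [v for v in graph[vertex] if v not in visited]
        if unvisited:
            stack.append(unvisited[0])      # push a connected unvisited vertex
        else:
            stack.pop()                     # nothing left here: pop
    return visited


def breadth_first(graph, root):
    visited, queue = [root], [root]
    while queue:
        vertex = queue.pop(0)               # pop the vertex at the front
        for v in graph[vertex]:
            if v not in visited:
                visited.append(v)           # visit unvisited neighbours
                queue.append(v)             # and queue them for later
    return visited


graph = {0: [1, 2], 1: [0, 2, 3, 4], 2: [0, 1, 3], 3: [1, 2, 4], 4: [1, 3]}
```

Swapping the stack for a queue is the only structural difference between the two traversals.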
Tree Traversal
Pre-order traversal:
  1. Visit node
  2. Go left
  3. Go right
In-order traversal:
  1. Go left
  2. Visit node
  3. Go right
Post-order traversal:
  1. Go left
  2. Go right
  3. Visit node
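The three orders translate into short recursive functions; a Python sketch over a small illustrative binary tree:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right


def pre_order(node):   # visit node, go left, go right
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)


def in_order(node):    # go left, visit node, go right
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


def post_order(node):  # go left, go right, visit node
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


# A small binary search tree with root 2, left child 1, right child 3
tree = Node(2, Node(1), Node(3))
print(in_order(tree))  # [1, 2, 3]
```

An in-order traversal visits a binary search tree in sorted order, which is why it is the usual choice for reading one back.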
Bubble Sort
Bubble sort runs through the data comparing each pair of adjacent items. If they are the wrong way around it swaps them and carries on to the next pair. This is either repeated one less time than the length of the data, or repeated until a pass makes no swaps, which requires an extra variable.
Bubble sort is very inefficient as if some data starts at the top when it should be at the bottom, it will check every pair over and over many times just to make a small change.
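A minimal Python sketch of the version that stops once a pass makes no swaps:

```python
def bubble_sort(data):
    """Repeatedly compare adjacent pairs, swapping any that are out of order."""
    data = list(data)          # sort a copy, leaving the input untouched
    swapped = True             # the 'extra variable' the notes mention
    while swapped:
        swapped = False
        for i in range(len(data) - 1):
            if data[i] > data[i + 1]:  # wrong way around: swap the pair
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
    return data
```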

Insertion Sort
This essentially creates a new array and one by one moves each value into the new array. If the value is greater than the top position, it inserts it at the top, else it moves down and checks each position to find where it should go. When it inserts the value, all values below it in the new array must be moved down one by one to make space for the value to be inserted. This is not very efficient.
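A Python sketch of the new-array version described above (insertion sort is more often done in place, but this follows the text):

```python
def insertion_sort(data):
    """Build a sorted list by inserting each value into its correct position."""
    result = []                    # the 'new array' from the description
    for value in data:
        pos = 0
        while pos < len(result) and result[pos] < value:
            pos += 1               # walk along until the right place is found
        result.insert(pos, value)  # later values shuffle along to make space
    return result
```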

Merge Sort
This is a recursive technique which follows the basic 'divide and conquer' approach. The idea is to break down the inputs into smaller lists until each contains just a single element so you can compare them individually, then merge the elements back together so that they are sorted.
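A minimal recursive Python sketch of the divide-then-merge idea:

```python
def merge_sort(data):
    """Split the list down to single elements, then merge halves in order."""
    if len(data) <= 1:
        return data                       # a single element is already sorted
    mid = len(data) // 2
    left = merge_sort(data[:mid])         # divide and sort each half
    right = merge_sort(data[mid:])
    merged = []
    while left and right:                 # take the smaller front item each time
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return merged + left + right          # one side may still hold leftovers
```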

Quick Sort
This is another divide and conquer method. It uses a pivot point, splitting the list into two parts, with all the values less than the pivot on the left and all values more than the pivot on the right. Both lists are then individually sorted using the same method, repeating until all items are in the correct positions.
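A minimal Python sketch, taking the first element as the pivot (any element can be chosen):

```python
def quick_sort(data):
    """Split around a pivot, then sort each side the same way."""
    if len(data) <= 1:
        return data
    pivot, rest = data[0], data[1:]           # first element as the pivot
    left = [x for x in rest if x < pivot]     # values less than the pivot
    right = [x for x in rest if x >= pivot]   # values greater than or equal
    return quick_sort(left) + [pivot] + quick_sort(right)
```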

Dijkstra's Shortest Path Algorithm
Dijkstra's algorithm finds a shortest path tree from a single source node, by building a set of nodes that have minimum distance from the source.
Simplified:
Let distance of start vertex from start vertex = 0
Let distance of all other vertices from start = $ \infty $

Repeat:
 Visit the unvisited vertex with the smallest known distance from the start vertex
 For the current vertex, examine its unvisited neighbours
 For the current vertex, calculate the distance of each neighbour from start vertex
 If the calculated distance of a vertex is less than the known distance, update the shortest distance
 Update the previous vertex for each of the updated distances
 Add the current vertex to the list of visited vertices
Until all vertices visited
In more detail
The graph has the following:
This is done by initialising three values: $dist$, an array of minimum distances from the source node $s$ to each node; $Q$, a queue of all the nodes in the graph; and $S$, an empty set recording which nodes have been visited.
The algorithm proceeds as follows:
  1. While $Q$ is not empty, remove from $Q$ the node $v$, not already in $S$, with the smallest $dist(v)$. In the first run, source node $s$ will be chosen because $dist(s)$ was initialised to 0. In the next run, the next node with the smallest $dist$ value is chosen.
  2. Add node $v$ to $S$, to indicate that $v$ has been visited
  3. Update $dist$ values of adjacent nodes of the current node $v$ as follows, for each new adjacent node $u$:
    • If $dist(v) + w (u, v) \lt dist(u)$, there is a new minimal distance found for $u$, so update $dist(u)$ to the new minimal distance value
    • Otherwise, no updates are made to $dist(u)$
$dist$ now contains the shortest path tree from source $s$.
Pseudocode for Dijkstra's Shortest Path Algorithm is shown below:
function Dijkstra(Graph, source):
  dist[source] := 0              // Distance from source to source is set to 0
  for each vertex v in Graph:        // Initialisations
    if v $\ne$ source
      dist[v] := ∞         // Unknown distance function from source to each node set to infinity
    add v to Q                // All nodes initially in Q

  while Q is not empty:           // The main loop
    v := vertex in Q with min dist[v]  // In the first run-through, this vertex is the source node
    remove v from Q

    for each neighbour u of v:       // Where neighbour u has not yet been removed from Q.
      alt := dist[v] + length(v, u)
      if alt < dist[u]:         // A shorter path to u has been found
        dist[u] := alt         // Update distance of u

  return dist[]
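The pseudocode translates almost line for line into Python; the example graph below is the weighted graph shown earlier as an adjacency matrix, written as a dictionary of edge weights (the $\infty$ entries are simply omitted):

```python
INF = float('inf')


def dijkstra(graph, source):
    """graph: dict mapping each vertex to {neighbour: edge_weight}."""
    dist = {v: INF for v in graph}  # unknown distances start at infinity
    dist[source] = 0                # distance from source to itself is 0
    Q = set(graph)                  # all nodes start unvisited

    while Q:
        v = min(Q, key=lambda node: dist[node])  # smallest known distance
        Q.remove(v)
        for u, weight in graph[v].items():
            alt = dist[v] + weight
            if alt < dist[u]:       # a shorter path to u has been found
                dist[u] = alt       # update the distance of u
    return dist


graph = {
    0: {1: 1, 2: 6},
    1: {0: 1, 2: 4, 3: 3, 4: 1},
    2: {0: 6, 1: 4, 3: 1},
    3: {1: 3, 2: 1, 4: 1},
    4: {1: 1, 3: 1},
}
print(dijkstra(graph, 0))
```

Scanning `Q` with `min` keeps the sketch close to the pseudocode; a production version would use a priority queue (e.g. Python's `heapq`) to pick the nearest node faster.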
A* Algorithm
This is another pathfinding algorithm that traverses a graph to find the shortest route. Each candidate square is given a score $f = g + h$, where $g$ is the cost of the route from the start so far and $h$ is a heuristic estimate of the remaining cost to the goal; the square with the lowest score is explored first.
Create two lists: open and closed. The closed list contains squares we do not need to evaluate again; the open list is the opposite. Each square will be given a score which determines how good that square is in terms of getting to the goal.
Find the square on the open list which has the lowest score, and call it $S$.
Remove $S$ from the open list and add it to the closed list.
For each square $T$ in $S$'s adjacent tiles:
Binary Search
This will only work if the list has previously been sorted.
  1. Find the median. If the list length is even, you must use the number to the right of the median line (round up the median)
  2. If the median = target then found
  3. If target > median then go to step 1 with the RIGHT sub list
  4. If target < median then go to step 1 with the LEFT sub list
  5. Repeat until found or list = 1 and not found
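The five steps above, as a Python sketch using index bounds rather than copied sub-lists, with the median rounded up as in step 1:

```python
def binary_search(data, target):
    """data must already be sorted. Returns the index of target, or -1."""
    lower, upper = 0, len(data) - 1
    while lower <= upper:
        mid = (lower + upper + 1) // 2   # round the median up, per step 1
        if data[mid] == target:
            return mid                   # step 2: found
        elif target > data[mid]:
            lower = mid + 1              # step 3: continue in the RIGHT part
        else:
            upper = mid - 1              # step 4: continue in the LEFT part
    return -1                            # step 5: exhausted, not found
```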
Linear Search
  1. Get target number
  2. Start at the beginning of the list
  3. Get an item
  4. If the item matches the target then it is found
  5. If not, move on to the next item in the list
  6. If the end of the list is reached, the target does not exist in the list
Disadvantage of linear searching: