1.1 Contemporary Processors, Input, Output and Storage Devices
1.1.1 Structure and function of the processor
The Central Processing Unit (CPU):
The CPU is the brains of the computer. It manipulates information sent to its registers, whether this is instructions or data.
Computers can have multiple processing cores, being dual-core, quad-core or even more.
Arithmetic and Logic Unit (ALU):
The ALU carries out arithmetic operations such as addition, multiplication, subtraction and division. It can also carry out logical comparisons on data, like finding out which number is higher. The result of the ALU is stored in the accumulator.
Control Unit (CU):
The CU governs the operation of all hardware, including input and output (I/O) devices and the CPU. It controls the FDE cycle.
Registers:
Registers are a very small section of memory built into the CPU, which are very quick to access.
Program Counter (PC):
The PC holds the address of the next instruction to be executed.
Accumulator (ACC):
The accumulator holds the results of calculations.
Status Register:
Uses 'flags' to store information about the calculation. It can alter program flow as a result of setting status flags.
Memory Address Register (MAR):
The MAR stores the address of the memory location currently being accessed, whether that location holds an instruction or data.
Memory Data Register (MDR) or Memory Buffer Register (MBR):
The MDR stores the data to be used by an instruction.
Current Instruction Register (CIR):
The CIR stores the instruction currently being run, copied from the MDR and split into the opcode and the address of the data to be worked upon.
Buses:
A bus is used to carry information from one place to another.
Data bus:
Carries the data being exchanged around a system in its binary form.
Address bus:
Carries the information about where the data is being sent.
Control bus:
Carries the signal that regulates data flow around a system. It controls the timing of operations.
Fetch-Decode-Execute (FDE) Cycle:
- Fetch:
The contents of the PC are copied into the MAR. The MAR now contains the location address of the next instruction and a memory read is initiated to copy the instruction from memory into the MDR.
The PC is incremented so it now contains the address of the next instruction to be executed.
- Decode:
The instruction is then copied from the MDR to the CIR.
An instruction is made up of an opcode and an operand, and these are what the CPU uses to decode the instruction.
- Execute:
The instruction in the CIR is executed. If the result needs to be committed to memory, the address is held in the MAR.
Unless the instruction is a STOP instruction, the cycle is then repeated.
Performance of a CPU:
- Clock speed: The clock is used to regulate the rate at which certain instructions are carried out. The faster the clock speed the more instructions it can perform every second.
- Number of cores: Multiple core processors are essentially multiple processors in one chip. This means they can perform multiple instructions at the same time. However, applications have to be specifically coded to use threading for multiple cores.
- Cache: The cache is a small amount of RAM built directly into the CPU. It stores repeatedly used data and is very quick to access as it is part of the CPU.
Pipelining:
Using pipelining, the CPU is capable of handling more than one instruction at a time. Pipelining typically uses a RISC structure; the stages are handled by separate circuitry, which allows the fetch, decode and execute (calculation and memory/register access) steps of different instructions to be performed in the same clock cycle. Each instruction is passed along one stage per clock cycle.
The main issue with pipelining is that you can't always correctly predetermine the next instruction, e.g. if there's an if condition the CPU won't know what code to run until the previous instruction has finished. The CPU will load the most likely instruction, and if the wrong instruction is loaded the pipeline must be flushed to clear it so the right instruction can be loaded. Modern CPUs have many pipelines so they can store multiple possible next instructions close to the CPU.
Cycle 1: Fetch (instruction 1)
Cycle 2: Decode (instruction 1), Fetch (instruction 2)
Cycle 3: Execute (instruction 1), Decode (instruction 2), Fetch (instruction 3)
Cycle 4: Execute (instruction 2), Decode (instruction 3), Fetch (instruction 4)
Cycle 5: Execute (instruction 3), Decode (instruction 4), Fetch (instruction 5)
Cycle 6: Execute (instruction 4), Decode (instruction 5)
Cycle 7: Execute (instruction 5)
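With a three-stage pipeline, n instructions finish in n + 2 cycles rather than 3n; the five instructions above take 7 cycles instead of 15. A minimal C# sketch that prints such a schedule (stage names and counts follow the table above):

using System;

class PipelineDemo
{
    static void Main()
    {
        string[] stages = { "Fetch", "Decode", "Execute" };
        int instructions = 5;

        // Instruction i enters the pipeline one cycle after instruction i-1,
        // so it occupies stage s during cycle i + s + 1 (1-indexed).
        for (int cycle = 1; cycle <= stages.Length + instructions - 1; cycle++)
        {
            for (int i = 0; i < instructions; i++)
            {
                int stage = cycle - i - 1;
                if (stage >= 0 && stage < stages.Length)
                    Console.WriteLine($"Cycle {cycle}: {stages[stage]} (instruction {i + 1})");
            }
        }
    }
}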
CPU Architectures:
Von Neumann Architecture has one processor, one memory block for both data and code, one bus, and 3 components: the CPU, IAS/RAM, and I/O devices.
Uses the FDE cycle and works on one instruction at a time. Each instruction takes two clock cycles (decode and execute).
Harvard Architecture has multiple buses, one for instructions and one for data. It uses separate memory for data and instructions and places the control unit at the centre of the structure. It also allows the CPU to pipeline, often utilising a RISC instruction set. Co-processors can also be added, providing functionality like floating point maths.
The processor can complete an instruction in a single clock cycle if pipelining is used.
Often used in microcontrollers and digital signal processors.
Contemporary Architecture is a modified Harvard Architecture which relaxes the strict separation of data and instructions, but still gives the CPU more than one memory bus (e.g. separate instruction and data caches over a shared main memory).
1.1.2 Types of processor
CISC (Complex Instruction Set Computing):
Wide range of actual instructions
Instructions themselves are more complex
Instructions may vary in size
More addressing modes to access memory
LDA/STA often part of instructions
Each instruction may take more than one cycle to complete
Theory is that running complex instructions gets more done
RISC (Reduced Instruction Set Computing):
Fewer actual instructions
Instructions are reduced in complexity
Instructions are fixed and consistent in size
Fewer addressing modes to access memory
Increased use of LDA/STA instructions
Each machine instruction can be completed in a single cycle
Theory is that running multiple short instructions offers better performance
Has reduced power/heat requirements
Main proponent for RISC is ARM
e.g.
High Level:
y = z * x
Low Level:
RISC:
LDA 100
LDB 200
MULT A,B
STA 300
CISC:
MULT 100, 200
STA 300
Or even just:
MULT 100, 200, 300
Graphical Processing Unit (GPU):
The GPU is a specialised CPU designed to manipulate images and graphics. It is often embedded hardware which allows faster manipulation of graphics where CPUs would struggle.
GPUs often act as Array Processors, executing one instruction on multiple sets of data at the same time, e.g. rotating an image applies the same action to a huge amount of data. This is sometimes called Vector Processing or Single Instruction Multiple Data (SIMD).
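A rough illustration of the SIMD idea using C#'s System.Numerics vector types, which map onto SIMD registers where available (the method and array names are invented for the example):

using System.Numerics;

class SimdDemo
{
    // Scales every element of data by factor, handling
    // Vector<float>.Count elements per SIMD operation.
    static void Scale(float[] data, float factor)
    {
        int width = Vector<float>.Count;   // e.g. 8 floats per 256-bit register
        int i = 0;
        for (; i <= data.Length - width; i += width)
        {
            var block = new Vector<float>(data, i);  // load several floats at once
            (block * factor).CopyTo(data, i);        // one operation, many data items
        }
        for (; i < data.Length; i++)                 // scalar tail for leftovers
            data[i] *= factor;
    }
}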
GPUs often also use Parallel Processing, as described below.
Multicore Systems:
A multicore processing unit is a single Chip Multiprocessor (CMP) which contains multiple cores (CPUs). Each core can execute ordinary CPU instructions on its own. These cores could also share cache or have an inter-core communication system to improve performance.
Software must be designed to run over multiple cores.
Parallel Systems:
Parallel Processing is the processing of program instructions by dividing them across multiple cores in order to speed up the processing time. (This is distinct from pipelining, which overlaps the stages of successive instructions within a single core.)
1.1.3 Input, output and storage
Storage Devices:
Magnetic Storage:
Data patterns are stored on a magnetised medium. Each bit is magnetised to either positive or negative, representing a binary 1 or 0. Hard drives have 'heads' which move across the spinning discs to read or write data.
They have low cost per GB but are sensitive to movement, so aren't very portable. They're used for large capacity storage.
Examples include hard drives, floppy discs and magnetic tape.
Flash Storage:
Flash devices have no moving parts, and store data using tiny switches/gates on integrated circuits. They won't break if dropped, and they have a small physical size.
They have very low power requirements, but are relatively expensive per GB.
Examples include USB sticks, flash memory cards and SSD drives.
Optical Storage:
Data is stored using a laser which changes how a layer on a disc reflects light. Different wavelengths of laser can be used, e.g. Blu-Ray, to increase the amount of storage available.
Discs are very portable and can be swapped/distributed easily. Can be RW or WORM. Good for making backups.
Examples include CDs and DVD discs.
Random Access Memory (RAM):
RAM, also known as Immediate Access Store (IAS) when directly connected to the CPU, holds instructions and data needed by a program currently in use, ready for the FDE cycle.
The CPU can't tell the difference between data code and instruction (executable) code. To solve this, each memory location has a bit pattern that identifies it.
RAM can be changed but it is volatile.
Read Only Memory (ROM):
ROM cannot normally be changed, and it is non-volatile. It is possible to update it, known as flashing, but this wears the chip so it is done as little as possible.
It is used to store programs that always have to be ready to run and do not need to be altered, and to store programs that control the very beginning stages of start-up before the OS is loaded. The BIOS or UEFI is firmware, stored in a type of flash ROM. You don't store the whole OS in ROM as it is more expensive than RAM, and you want to flash ROM as little as possible but OSs are updated a lot.
Virtual Storage:
You can pool the storage of several devices together to appear to be one device. The advantages of this are:
It's easier to backup and maintain
Many cheaper devices can be combined to provide better storage
The storage device is irrelevant to many users as they are able to access their data from various sources
The storage can be easily expanded to meet demand over time
Virtual storage often uses Redundant Array of Independent Disks (RAID). These have the advantage of providing a continuous service if one disk were to fail. A failed disk can be rebuilt from the other disks since they hold information about each other.
Cloud storage is an extension of virtual storage. The advantages of cloud storage are:
If something happens to the workplace then the data is not lost
The information is available anywhere with an internet connection
Storage is 'rented' so can be expanded or decreased as needed
The disadvantages are:
The company hosting the storage has complete control over the data
The company could increase costs with no warning
If the user doesn't have an internet connection they can't access any of their data
Complementary Metal Oxide Semiconductor (CMOS):
This is a small amount of RAM supported by a battery to stop information being lost when power is lost. It is used to store things like user settings (e.g. where to boot from) and current date and time. The battery can be removed to restore default settings.
1.2 Software
1.2.1 Systems Software
Operating System (OS):
The OS manages the system's resources and processes, including managing peripherals and programs. Acts as an interface between the user and the computer.
It is a collection of programs that work together to supply a level of abstraction for the user, allowing them to perform functions but not showing the complexities.
Memory Management:
Memory in a computer is finite so the OS has to divide it up between the processes which need it, and it keeps track of the memory that is available. Memory management can cause fragmentation, which the OS tries to reduce. Different methods of memory management are shown below and can be combined to provide the most efficient solution.
Paging:
Paging is where the OS will load a fixed-size block of data, called a page, from secondary storage into RAM, allowing the computer to run applications larger than the RAM available as only the part it needs at that moment is kept in RAM. A page table is used to keep track of whether the page is stored in RAM or secondary storage. This is like a person reading a book of instructions, where they only need to read the page they're on, as the other instructions aren't needed now.
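A sketch of the address translation paging implies, with an invented page size and page table (a real OS also handles protection and page replacement):

class PagingDemo
{
    const int PageSize = 4096;                  // assumed 4 KiB pages

    // pageTable[pageNumber] = physical frame number, or -1 if on disk.
    static int[] pageTable = { 2, -1, 7, 0 };

    // Translates a virtual address into a physical address.
    static int Translate(int virtualAddress)
    {
        int page = virtualAddress / PageSize;    // which page is it in?
        int offset = virtualAddress % PageSize;  // position within that page
        int frame = pageTable[page];
        if (frame == -1)
            throw new System.Exception("Page fault: fetch page from disk");
        return frame * PageSize + offset;
    }
}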
The disadvantages are that the OS has to allocate a whole page even if some of it isn't used, meaning there are gaps which could be exploited by viruses. Also, transferring a page from secondary storage to RAM requires a lot of moving data around and can cause fragmentation.
Segmentation:
Segmentation is where the memory is split into variable-sized segments, and blocks of memory are then used to store segments. The segment will have an ID and an offset (length) which allows a program that has different modules to be linked together within the memory. Segments can be different sizes depending on their contents and the OS uses a table to track and handle the memory. This table is smaller than the page table, but it is more complicated to retrieve a specific segment as it is not a uniform size.
Virtual Memory:
Virtual memory is where the operating system moves the contents of memory to and from a hard disk. Overuse can cause disk thrashing, where the hard drive has to do lots of small operations over and over, damaging the hard drive. Virtual memory can also cause programs to slow down, since hard disks are generally much slower than RAM. It is often used alongside paging or segmentation, so a whole page or segment can be moved between RAM and a hard drive.
Interrupts:
Interrupts cause a computer to stop its normal sequential processing. When an internal error occurs within a computer system, or a message needs to be sent to the CPU, an interrupt is sent from the device to notify the processor. Interrupts are stored in a priority queue called an interrupt queue, which in most computers is checked after each instruction has been executed. If there is something in the interrupt queue the processor runs a program called the Interrupt Service Routine (ISR).
The ISR is different for each type of error. While the ISR is at work, the volatile environment of what was running before the interrupt occurred needs to be stored, so when the main program resumes the user can continue where they left off.
Types of Interrupt, in order of priority from highest to lowest:
Hardware interrupts: These signify critical errors in the hardware of the computer itself that affect the current process being performed by the CPU.
Restart interrupt: These are sent by a reset button and cause the CPU to restart the system.
Program interrupts: These are produced by software on a computer, telling the CPU an error has occurred. The most common cause of program interrupts is memory access violations where a program tries to access a memory location that doesn't exist or is in use. When a program enters a state of failure it is said to be unresponsive.
Timer interrupts: These happen at fixed timing intervals.
I/O interrupts: These occur when the computer is transferring data, when the transfer has been completed or when there is an error during the transfer. They also occur for situations like printer buffers or keyboard presses.
The Interrupt Process:
Interrupt Register (IR) in the CPU stores the interrupts
During FDE cycle each bit in IR is checked
If a bit is set, CPU decides what to do
Current process is pushed onto stack (PC and ACC)
CPU runs ISR
Stack is put back and CPU continues
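A rough sketch of that loop in C# (the register, stack and ISR names are invented for illustration):

using System.Collections.Generic;

class InterruptDemo
{
    static int interruptRegister = 0;    // one bit per pending interrupt source

    static void Run()
    {
        var stack = new Stack<(int pc, int acc)>();
        int pc = 0, acc = 0;
        bool running = true;

        while (running)
        {
            // ... fetch, decode and execute one instruction here;
            // a STOP instruction would set running = false ...

            if (interruptRegister != 0)   // check the IR after each instruction
            {
                stack.Push((pc, acc));    // save the volatile environment
                ServiceInterrupts();      // run the ISR for each set bit
                interruptRegister = 0;
                (pc, acc) = stack.Pop();  // restore and carry on
            }
        }
    }

    static void ServiceInterrupts() { /* hypothetical ISR dispatch */ }
}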
Scheduling:
Single-core CPUs can only run one task at a time, so processes must be scheduled to provide the most efficient solution when trying to give the impression of multiple programs running at once.
Round Robin:
This is where each process is given a set amount of time and the processor is switched between processes in a circular order (a code sketch follows at the end of this section). This is an easy method of implementing scheduling but can be inefficient, as the processor may be allocated to a process that is waiting for an input or output signal, so time is lost while it waits for the signal.
First Come First Served:
Also known as 'run to completion' or 'First In First Out (FIFO)', this is where each process is queued in order of arrival and then processed to completion. This is the simplest to implement but is very inefficient: a short process that is important for giving the user feedback may have to wait a long time while a long process takes up the processing power.
Multi-level Feedback Queues:
This allows important jobs to be completed first. When a job arrives it is allocated a priority, dependent on the process type, amount of CPU time needed, I/O access or memory size. It is then added to a queue based on its priority, and the CPU processes the queues using either round robin or first come first served, processing higher-priority queues first. If a process is added to a queue of higher priority than the one the CPU is currently working on, the CPU immediately performs the higher-priority process. Processes can change priority, meaning they move to another queue; a CPU may lower the priority of a process if it is taking too long to complete, so other processes are not ignored. This system allows I/O processes to be performed quickly while lower-priority jobs are still performed.
Shortest Job First:
This is where the CPU orders the processes it needs to complete by the amount of time they will take, and completes the shortest tasks first. This means short jobs will be executed very quickly, but long jobs could suffer processor starvation, where they are not performed for a long time.
Shortest Remaining Time First:
This is very similar to shortest job first but the CPU orders the tasks by the amount of time left to complete the process. If a task with less time comes up, the CPU will switch and do this first, then go back to completing the process it was on. This means short jobs are executed quickly, but again longer jobs could be subject to processor starvation.
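A minimal round robin sketch, relating to the scheduling methods above (the process names, times and time slice are invented):

using System;
using System.Collections.Generic;

class RoundRobinDemo
{
    static void Main()
    {
        // Remaining CPU time needed by each process.
        var queue = new Queue<(string name, int remaining)>();
        queue.Enqueue(("A", 5));
        queue.Enqueue(("B", 2));
        queue.Enqueue(("C", 4));

        int timeSlice = 2;    // maximum time a process runs per turn

        while (queue.Count > 0)
        {
            var (name, remaining) = queue.Dequeue();
            int run = Math.Min(timeSlice, remaining);
            Console.WriteLine($"{name} runs for {run} unit(s)");
            remaining -= run;
            if (remaining > 0)
                queue.Enqueue((name, remaining));   // back of the queue
            else
                Console.WriteLine($"{name} has finished");
        }
    }
}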
OS Types:
Distributed OS:
Distributed OSs are split across multiple computers, or nodes, so that each can specialise for a particular configuration. The nodes communicate with each other to work together. The separation is not visible to the user. This method allows for easy expansion and scalability.
Embedded OS:
These are hidden within many devices, and are within the hardware of the device itself. They have a dedicated purpose, have little or no user interface, are fully autonomous and have limited resources, using only what is required.
Multi-tasking OS:
These allow the OS to perform several tasks at the same time, either by the CPU switching between different jobs using one of the scheduling methods above, or by using parallel processing.
Multi-user OS:
Often known as mainframe systems, these allow several users to access the processor, programs and resources at the same time. They usually use a round robin scheduling method where each user is allocated an amount of time for the processor to perform their jobs.
Real Time OS:
These are specialised to allow inputs to be processed under strict time limits so the output can affect the source of the inputs. They may be designed to take a certain amount of time to complete the processing each time. They usually have very limited user interaction but respond automatically to inputs.
Basic Input Output System (BIOS):
Checks all necessary hardware is present and working. Looks for and loads bootstrap from hard drive/CD. Bootstrap then loads the OS.
Unified Extensible Firmware Interface (UEFI):
This is a modern replacement for BIOS, offering a GUI. It is hardware and CPU independent so can run on many devices. It can also boot from larger disks and provide a secure boot as the OS has to be digitally signed.
Device Drivers:
Device drivers are software that enables communication between peripherals and the CPU. A generic driver supports the basic features of many peripherals, whereas a specific driver provides more functionality but may only work with certain peripherals. Specific drivers can often be downloaded from the manufacturer's website. A Hardware Abstraction Layer (HAL) is used so that individual programs do not need to know how the target device works; the OS handles it.
Virtual Machines:
A virtual machine is software which provides a virtual hardware environment on which other programs or OSs can run. It can execute intermediate code as well as software running on the virtualised OS. It uses the hypervisor mode of a CPU to ensure anything inside the virtual machine has no effect on the rest of the computer.
1.2.2 Applications Generation
The Nature of Applications:
Applications are designed to allow the user to complete a task, aiming to make their lives easier normally with large amounts of functionality. Applications may include word processors, desktop publishers, spreadsheets, database managers, CAD software, presentation graphics, games, photo or video editing software etc.
Utilities:
Utilities are small, useful programs, usually designed for one specific function.
System Update - checks an online database for newer versions of system software. Often automatically installs when a newer version is available.
System Cleanup - removes unwanted/out-of-date files in order to free up hard drive space and remove clashing versions, e.g. removing temporary internet files or deleting files in the recycle folder.
System Diagnostics - provides information about the computer to help resolve any problems, e.g. amount of RAM or hard drive space, or checking for installed drivers for new hardware.
Defragmentation - files often get split up on a hard drive in order to fill up the space available. Lots of small parts of a file are harder to read/use than one large contiguous file. A defrag utility re-organises files and puts pieces together where possible, thus speeding up hard drive operations.
Format - sets up a storage device (e.g. hard drive or USB stick) so the OS can use it to save files. Formatting creates a table on the storage device showing where/how the files will be stored. As files are written to the device, the OS updates the table.
File Transfer - used to move files between folders on the hard drive, onto a memory stick, etc.
Firewall - prevents data entering or leaving the computer. The firewall intercepts any requests to transfer data and compares them to a set of rules about which data to allow or block, e.g. access can be prevented for certain types of software, from certain addresses or on certain ports.
Anti-virus - tries to block infection from virus programs designed to damage the computer or data. Must be kept up to date as new viruses are constantly created.
Anti-Spyware - tries to block infection from spyware programs, i.e. ones designed to sit quietly and send personal/confidential/key data to another computer. Must be kept up to date.
Compression Software - used to reduce the size of a file by removing redundant space or by using codes for common phrases/data.
Back up - creates a second/additional copy of data onto a separate storage medium (e.g. DVD). Used to retrieve the original files/data in case of loss/theft/fire damage/corruption.
Archiving - frees up storage space by moving old data onto another medium (e.g. tape). Old data may need to be accessed in future, but is not currently needed, e.g. finance data can be archived once tax returns are completed.
Open Source vs Closed Source:
Open Source:
Code available for other people to see or alter
Reduces cost of development, often volunteers will improve the code for free
Lacks formal development methods and testing
No deadlines for development, and often has quick fixes for problems so there is no guarantee the software will function fully
Often developed by people who do not have full knowledge of all the functionalities and methods used by the software so code may be inefficient
Closed Source:
Code kept by author, only executable code is distributed
Development funded by the company, costs are recovered through sales
Tested fully and not released until the software is considered reliable
Has formal deadlines for full solutions to be produced by so you know the software is complete
Technical solution is fully understood so the program is more efficient
Translators:
Programmers produce source code, which needs to be converted into executable code.
Interpreters:
Interpreters translate one line of source code into machine code, execute it, then translate the next line, and so on. This is slower at runtime, but it allows variables to be examined at the point of an error, and changed code can be run without re-compilation, making debugging and development quicker. Interpreters are also easier to create than compilers.
Compilers:
The whole of the source code is compiled into object code (very similar to machine code), which is executable. This means compilation only has to be performed once, and the object code can be run on machines other than the one the source code was compiled on. Source code can be compiled in modules, then linked together. Object code is difficult to reverse-engineer, preventing people from stealing code. Compilers will also detect errors when the program is converted to object code, before it is run.
Assemblers:
Assemblers are used to convert assembly language into machine code. Assembly language is very low level, meaning it is quick to convert to machine code but hard for a programmer to use.
Mixed/Dual Mode:
Some languages, like Java, require both compilation and interpretation. Source code is compiled into intermediary bytecode, which is distributed to users. The user's CPU then runs an interpreter which converts bytecode into machine code. The result of this is a slower application at runtime, and a larger application to distribute as the interpreter must be distributed with it, but an application which should be able to run on any platform.
Stages of Compilation:
Lexical Analysis:
Source code is examined. Comments and spaces are removed. Keywords are replaced with generic tokens. This creates patterns which can be examined.
E.g.
totalCost = cost * 1.2
Becomes
VARIABLE-ASSIGN-VARIABLE-OPERATOR-CONSTANT
After tokenisation a look-up table is created to store variable names, data types and the values of constants.
Syntax Analysis:
Checks to see if program makes sense. The token string is split into phrases, and each phrase is examined against language syntax, checking for missing tokens, incorrect order etc. This is known as parsing code, where you check phrases against the allowed patterns.
E.g.
SELECTION/IF-VARIABLE-OPERATOR-CONSTANT-SELECTION/THEN
Is okay.
SELECTION/THEN-VARIABLE-OPERATOR-CONSTANT-SELECTION/IF
Is not okay.
Code Generation:
Errors are passed to report generator/error handling routines. If no errors are present, the code is generated. Variables are substituted with their addresses from the look-up table, likewise for any library routines needed. The linker works out how to join sequences, and the loader works out where things are at run time.
Code Optimisation:
The code that has been generated is not necessarily efficient in terms of size of object code or time to execute. Optimisation removes redundant code (like adding 0 or multiplying by 1) and optimises it for the CPU architecture.
Libraries:
Linkers:
Linkers join library modules or other separate code sections to the current code. They combine modules into one larger block. Libraries store data in relative addresses, as the program and libraries are compiled separately, but the linker joins the modules together in the correct way and updates the relative addresses, to produce a single executable.
Loaders:
Loaders are responsible for locating object code in RAM. They adjust relative addresses in programs and load the program/data into efficient RAM space.
Libraries:
Libraries allow programmers to reuse code, saving time and money as they only need to be coded and tested once.
One type of library is a Dynamically Linked Library (DLL), which is a collection of executable modules usually stored as part of OS files. This allows other programs to use a common set of routines, preventing duplication in code and allowing easier upgrading of common routines. A single DLL can be used by many programs at the same time.
1.2.3 Software Development
Waterfall Lifecycle:
Each stage is classed as a milestone of the development, and consists of a number of subtasks that need to be completed. In practice it is often necessary to go back to refine previous stages.
Spiral Model:
This is an extension of the waterfall lifecycle but is designed to ensure that review and risks are evaluated regularly. It results in a lower project failure rate, and involves lots of regular communication with the client to get feedback.
Rapid Application Development (RAD):
RAD is common in Object Oriented Programming, and it aims to speed up the process. It allows users/clients to view a work in progress, and increases flexibility and adaptability. It works using a repeated process of prototype, feedback, prototype, feedback, etc. This gives plenty of opportunity for the client to give their views, allowing the developers to create the product they want.
Agile Methodologies:
A type of RAD, Agile development is often employed in larger projects. It involves teams focusing on specific areas of a project, and then the client uses this solution and the product is evolved based on their feedback. This means the client sees progress sooner and is able to provide suggestions.
Extreme Programming:
Extreme programming is a type of agile software development, performed by releasing frequent solutions in short development cycles. It includes a constant review of progress and feedback from the client. The system only works with a very flat management structure and works well in small projects. Features may also not be implemented until they are needed, focusing effort on the more important parts of the solution. The disadvantage of extreme programming is a lack of up-front design, so the result may not be the optimal solution to the problem.
1.2.4 Types of Programming Language (Programming Paradigms)
First Generation Languages
Written in binary or hex
No translation needed
Specific to instruction set of CPU
Quick to run
Slow to create
Easy to make mistakes
Hard to debug
Second Generation Languages
Assembly language
Low-level language
A little closer to English/mnemonics
Needed to be translated into machine code
Specific to instruction set of the CPU
e.g. LMC
Third Generation Languages
High level languages
Keywords very close to English
Additional built-in routines for printing and displaying, etc.
Start to see IDEs
Easier to write and debug code
e.g. BASIC, PASCAL, COBOL, FORTRAN, C, C#, Java, etc.
Fourth Generation Languages
Limited training required
Reduced time to create programs
Aimed towards problem-solving and systems engineering
Often automatic generation of code from diagrams - CASE
Report generators/Data managers (SQL, MAPPER)
e.g. AppInventor, Scratch, etc.
Fifth Generation Languages
Uses constraints rather than algorithms
Rules and knowledge based
Expert system
Interface sits above inference engine
Artificial intelligence
e.g. PROLOG, Mercury, etc.
Procedural Programming
Procedural languages are an example of third generation languages.
Instructions are written and executed in order. Lots of use of sequence, selection and iteration.
Uses functions and procedures. Uses modularisation, so code can be reused. Written in structured/imperative languages.
Functional Programming
Usually maths/equation based. Involves breaking the problem down into smallest units (decomposition).
Functions are written to receive parameters and return results.
Used for science, maths and engineering in particular.
An example is Haskell, often run using the HUGS interpreter.
Declarative Programming
Based on facts and relationships. Facts are entered into a knowledge base, relationships are defined and then a goal is set. Program works through the knowledge base using the rules/constraints.
Often used for AI (often with PROLOG), or diagnostic tools in medicine or technology.
e.g. Given male(David), female(Sarah), male(Alex), female(Claire), female(Paula), male(Tom)
If the goal is 'male(x)', the output would be: David, Alex, Tom
When something is found, this is an instance. If something is not found, this is a failure.
When you return to the original set of facts to check the next instance, this is backtracking.
Jackson Structured Programming (JSP)
Used with 3rd generation languages.
Uses a top-down approach - continued breaking down of problem into increasingly smaller units/modules (decomposition).
JSP diagrams help break a program down and maintain an understandable view of system.
A JSP diagram (not reproduced here) uses:
Sequence - modules are processed in order.
Iteration - use of asterisk and comment.
Selection - use of small circle showing alternatives.
Object-Oriented Programming (OOP)
OOP aims to create models of real world objects. Key characteristics are known as attributes, and key actions are known as methods.
The benefits of OOP are that it makes it easier to write working code; a class can be fully tested and released for other coders in the team to use. Classes can be treated as black boxes, as other coders don't need to know how a class works internally, just how to manipulate it through its methods. There are many design patterns already available which solve common programming tasks; a coder can pick up a pattern for a UI, for example, and begin coding straight away. Classes can easily be used in multiple projects as they can be reused. The code is also portable, as it only requires a CPU-specific compiler.
Drawbacks of OOP include its steep learning curve, as becoming proficient in an OOP language can be difficult, as it requires a different way of thinking about problems to a standard programming language. This complexity requires skill to create efficient and flexible classes. They are also not as efficient and compact as low level language coding.
Encapsulation:
This is binding an object's data and methods together. Assigning and retrieving object property values is only done using methods to access the data. You make attributes in a class private, but allow them to be changed and accessed through public methods. This means that once you've written code that works, you can guarantee that it is not the source of any future problems, as it will not change.
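A minimal sketch of encapsulation (the class and its members are invented for the example):

class BankAccount
{
    private decimal balance;             // hidden attribute

    public decimal GetBalance() => balance;

    public void Deposit(decimal amount)
    {
        if (amount <= 0)                 // the method can enforce rules
            throw new System.ArgumentException("Deposit must be positive");
        balance += amount;
    }
}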
Inheritance:
Inheritance is where a child class (subclass) is defined based on parent classes (superclasses).
For single inheritance, a subclass inherits all the attributes and methods of a single parent class.
For multiple inheritance, a subclass inherits attributes and methods from multiple parent classes.
Overloading:
This is when calling one method will have a different effect depending on the data supplied to it.
e.g. Tom might have a method Sleep(). Calling Sleep(8) will have him sleep for 8 hours. Calling Sleep('Tight') will have him sleep tight.
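The Sleep() example written as C# method overloading (the Person class is hypothetical):

class Person
{
    public void Sleep(int hours) =>
        System.Console.WriteLine($"Sleeping for {hours} hours");

    public void Sleep(string how) =>
        System.Console.WriteLine($"Sleeping {how}");
}

// new Person().Sleep(8);        prints "Sleeping for 8 hours"
// new Person().Sleep("Tight");  prints "Sleeping Tight"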
Polymorphism:
This is a type of overloading, where multiple objects act consistently when the same method name is called on them.
e.g. Tom and Tom's dog both have a method Sleep(). Calling Sleep() on either of them will have the same effect of sending them to sleep.
Polymorphism gives a single interface to multiple versions of a function or method performing similar actions or processing.
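The Tom-and-dog example as a sketch (the classes are hypothetical; each type supplies its own Sleep() behind a single interface):

abstract class Animal
{
    public abstract void Sleep();
}

class Human : Animal
{
    public override void Sleep() => System.Console.WriteLine("Human sleeps");
}

class Dog : Animal
{
    public override void Sleep() => System.Console.WriteLine("Dog sleeps");
}

class PolymorphismDemo
{
    static void Main()
    {
        Animal[] animals = { new Human(), new Dog() };
        foreach (var a in animals)
            a.Sleep();   // same call; behaviour depends on the object's class
    }
}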
Some key components used for OOP include:
Class - A template for a set of objects which have state and behaviour.
Describes properties and methods of a real world entity. A "Blueprint" for actual objects created later.
Each specific object is an instance of a class.
Object - An instance of a class.
A real world entity.
Holds attributes and methods.
Attributes - An attribute describes an individual data item within an entity.
Methods - A method is a procedure associated with an object.
Inheritance - The ability of a class (the derived class) to use properties and methods of another class (the parent class).
Derived class is the class resulting from the inheritance process.
UML (Unified Modelling Language) Diagrams:
Range of diagrams used to describe a system and the relationships between classes/objects.
Use-Case Diagram
Class Diagram
Object Diagram
Communication Diagram
Sequence Diagram
Activity Diagram
State Transition Diagram
Data Flow Diagram
Memory Addressing Modes
Direct Addressing
This is where the operand represents the location of the data in memory, e.g. LDA 18 means load the data from address 18.
Indirect Addressing
With indirect addressing, the operand represents the memory location which holds the memory address of the data, e.g. LDA 1, if memory address 1 contains 2, means load the contents of memory address 2.
Immediate Addressing
Here the operand is the actual value to be loaded, so LDA 2 means the value 2 is loaded.
Indexed Addressing
One of the CPU registers is the index register. This holds the base memory address.
One of the CPU registers is the index register. This holds the base memory address.
Indexed Addressing then functions like Direct Addressing, except the addresses are relative to the base address. e.g. If the base address is 2, LDA 3 would load the contents of memory address 5.
Iteration can be achieved by incrementing the base address.
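A toy C# simulation of the four modes against a small memory array (the memory contents and index register value are made up):

class AddressingDemo
{
    static int[] memory = { 5, 2, 99, 7, 1, 42 };
    static int indexRegister = 2;    // base address for indexed addressing

    static int Direct(int operand) => memory[operand];               // LDA 3 loads 7
    static int Indirect(int operand) => memory[memory[operand]];     // LDA 1 loads memory[2] = 99
    static int Immediate(int operand) => operand;                    // LDA 2 loads the value 2
    static int Indexed(int operand) => memory[indexRegister + operand]; // LDA 3 loads memory[5] = 42
}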
1.3 Exchanging Data
1.3.1 Compression, Encryption and Hashing
Lossless Compression:
Lossless compression is used for text, programs and data where all the information is required. However, less space is saved compared to lossy compression. It is usually a two stage process - analyse, then compress. The compression and decompression algorithm is needed at both ends. Examples include .png and .flac.
Lossy Compression:
Data is lost, hopefully redundant or less important data. More space is saved. Different files can be compressed different amounts. Lossy compression is often used for video and sound, where certain frequencies can be ignored or detail of image can be reduced without visible loss of quality. Examples are .jpg and .mp3.
Run-Length Encoding (RLE):
Identifies repeating patterns. Stores one copy of the pattern along with how many times it repeats consecutively. For example, the string "00000011110000110111100" could be compressed to "60414021104120", which reduces the string from 23 characters to 14. The following code will compress a string called "fileText" using RLE.
// Compresses fileText using run-length encoding: each run of a repeated
// character is stored as its count followed by the character,
// e.g. "000011" becomes "4021".
private static string RleCompress(string fileText) {
    string rleCompressed = "";
    int rleIndex = 0;
    while (rleIndex < fileText.Length) {
        char charac = fileText[rleIndex];
        int rleCharCount = 0;
        // Count how many times the current character repeats consecutively.
        while (rleIndex < fileText.Length && fileText[rleIndex] == charac) {
            rleCharCount++;
            rleIndex++;
        }
        rleCompressed += rleCharCount.ToString() + charac;
    }
    return rleCompressed;
}
Dictionary Encoding:
This creates a dictionary of key words/phrases which are repeated throughout the data. Each item is given an index, and a substitution encoder replaces the key phrases with the relevant index. An example is shown below:
Original Text (100 chars):
Happy birthday to you
Happy birthday to you
Happy birthday dear <name>
Happy birthday to you

Dictionary (26 chars):
1 Happy
2 birthday
3 to
4 you

Compressed Text (47 chars):
1 2 3 4
1 2 3 4
1 2 dear <name>
1 2 3 4
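A minimal sketch of a substitution encoder that builds such a dictionary, working word by word (invented for illustration):

using System;
using System.Collections.Generic;

class DictionaryEncodingDemo
{
    static void Main()
    {
        string text = "Happy birthday to you";
        var dictionary = new Dictionary<string, int>();
        var encoded = new List<string>();

        foreach (string word in text.Split(' '))
        {
            if (!dictionary.ContainsKey(word))            // first occurrence:
                dictionary[word] = dictionary.Count + 1;  // assign the next index
            encoded.Add(dictionary[word].ToString());     // substitute the index
        }

        Console.WriteLine(string.Join(" ", encoded));     // prints "1 2 3 4"
    }
}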
Encryption
Algorithms called ciphers are used to convert data in plain text into cipher text. Keys are used to convert between plain text and cipher text.
Symmetric Encryption
This uses the same key to encrypt and decrypt some data. This means the key must be shared between the sender and the receiver, which means it can be intercepted. Symmetric encryption is very easy to understand and implement, which means it is often used in low profile situations which need a quick set up, but don't need the highest level of security. It is also easier to keep track of, as there is only one key you need to be aware of.
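A toy illustration of the symmetric idea, using XOR with a single shared key byte (not a real cipher; the key and message are made up):

using System;
using System.Text;

class SymmetricDemo
{
    // XORing with the same key both encrypts and decrypts.
    static byte[] Apply(byte[] data, byte key)
    {
        var output = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            output[i] = (byte)(data[i] ^ key);
        return output;
    }

    static void Main()
    {
        byte key = 0x5A;    // the shared secret
        byte[] cipherText = Apply(Encoding.ASCII.GetBytes("hello"), key);
        string decrypted = Encoding.ASCII.GetString(Apply(cipherText, key));
        Console.WriteLine(decrypted);    // prints "hello"
    }
}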
Asymmetric Encryption
This uses a different key for encryption and decryption. This means the key doesn't need to be shared by the sender, as the receiver has a different key. This also gives more control over permissions, as the encrypt key could be made available to anyone (known as a public key), but the decrypt key kept private (a private key) so anyone can encrypt data, but only those with the decrypt key can see the data. Even if someone was to intercept the data being sent, it would be useless as it is in cipher text, and they don't have the private key. This does mean anyone can encrypt data, but this doesn't pose a security threat, as malicious messages would just be detected when decrypted on the receiver's end.
Hashing
Hashing is the process of applying a mathematical function (a hash function) to data to produce a fixed-size result. The same input always produces the same hash, but the original data cannot be recovered from the hash, which makes hashing useful for storing passwords and for fast look-ups in hash tables.
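A minimal sketch of a simple, non-cryptographic hash function (the constants and table size are arbitrary choices):

class HashDemo
{
    // Mixes each character into an accumulator, then reduces the
    // result to a fixed range, as a hash table index would need.
    static int Hash(string input, int tableSize)
    {
        int hash = 17;
        foreach (char c in input)
            hash = unchecked(hash * 31 + c);      // combine each character
        return System.Math.Abs(hash % tableSize); // fixed-size result
    }

    static void Main()
    {
        System.Console.WriteLine(Hash("password123", 1000)); // same input...
        System.Console.WriteLine(Hash("password123", 1000)); // ...same hash
    }
}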
1.3.2 Databases
A database is a collection of data that can be used to store, sort and search large quantities of information.
The titles of the groups (columns) of data in a database are called fields. Each set of data (each row) is known as a record.
Database Types
Flat file
A flat file database is just a single table which stores data. Flat file databases are hard to maintain as they often have repeated or redundant data, and there is often very little pattern to how the data is stored. This means it is confusing and slow to use.
Flat file databases are often stored in plain-text form, where each line holds one record, and fields in a record are separated by commas. This is known as a CSV (Comma-Separated Values) file.
Relational
A relational database is one which contains multiple tables of data. These tables are known as entities, and entity names should always be singular. Entities are related to each other through special fields known as keys.
There are three common types of key:
Primary Key:
A primary key is a field which has a unique identifier for every record in that table. The primary key for each record must be different.
A composite key is a primary key which is made up of multiple already existing fields to create a unique value.
A primary key is often denoted by the field name being underlined.
Foreign Key:
A foreign key is a field in one table that is the primary key of another table. This allows tables to be linked together.
Secondary Key:
A secondary key is an important field, which doesn't necessarily have to be unique. The field should be one which is likely to be searched for, indexed or sorted. One table can have multiple secondary keys.
An example of a database is shown below. This shows two tables, and the keys used to link them together.
Entity Relationship Modelling
An Entity Relationship (ER) Diagram is a simplified diagram of how tables are connected within a database. There are three types of relationship:
One-to-one: These relationships are often redundant and can be resolved by merging the two tables, as this relation only tends to occur between fields within the same entity. E.g. a student and a seat. Each student only has one seat, and each seat is for one student. This could be resolved by making the seat a field in the student table.
One-to-many: These are the most common relationship type. E.g. a student is in one class, but a class has many students.
Many-to-many: Although these relationships are common, they cannot exist in a database. To resolve this, an Association Table is used, so there are only one-to-one and one-to-many relationships.
Examples
The following ER Diagram is based on the student and class tables above.
The school changes their system, and each student can now take multiple classes. The ER diagram is shown below. Many-to-many relationships are not allowed. They must be resolved with an Association Table.
The following ER Diagram uses an association table, studentTakes, to resolve the many-to-many relationship.
To make this work, the classId field must also be removed from the student table.
Methods of Capturing, Selecting, Managing and Exchanging Data
Data is usually managed with queries. The most common types of query are: Select, used to find/sort data; Append, used to add new records; Update, used to change existing records; and Delete, used to remove records.
Data is captured using forms. These are on-screen sections where users can input things. Forms display data in a readable format as well as allowing validation to be employed before a query is executed. Data is outputted from a database in reports. These are generally in paper form.
Normalisation
Normalisation is a tool used to validate ER Diagram models and optimise database structure.
A Normal Form is a label given to an arrangement of data, depending on how it is arranged.
A dependency is when one field is based upon another field. For example, an age field would be dependent on a date of birth field, as the age can be inferred from the date of birth. Dependencies are important in normalisation.
Un-Normalised Form (UNF or 0NF)
A database which has had no normalisation applied to it is in UNF.
First Normal Form (1NF)
To get a database into 1NF, you must:
Eliminate duplicate columns (any columns with the same field name or which store the same data)
Get rid of any groups of repeating data (where a single field contains an array of repeating values)
Identify the Primary Key, or create one if it doesn't already exist
Separate out any fields which are not atomic
Second Normal Form (2NF)
To get a database into 2NF, you must:
Check data is already in 1NF
Remove any partial dependencies by splitting the table on that field and moving the data to a new table. These can only occur if the primary key is a composite key
Fix any many-to-many relationships using association tables
Third Normal Form (3NF)
To get the database into 3NF, you must:
Check data is already in 2NF
Check there are no non-key dependencies
To get to 3NF, all fields should be dependent on the key, the whole key, and nothing but the key.
Example
Starting in UNF:
product(Item, Colours, Price, Tax)
Item       | Colours     | Price | Tax
T-shirt    | red, blue   | 12.00 | 0.60
Polo       | red, yellow | 12.00 | 0.60
T-shirt    | red, blue   | 12.00 | 0.60
Sweatshirt | blue, black | 25.00 | 1.25
Not currently in 1NF as:
Colours field is not atomic
There are duplicate records
There's no primary key
Normalise to 1NF:
product(Item, Colour, Price, Tax)
Item       | Colour | Price | Tax
T-shirt    | red    | 12.00 | 0.60
T-shirt    | blue   | 12.00 | 0.60
Polo       | red    | 12.00 | 0.60
Polo       | yellow | 12.00 | 0.60
Sweatshirt | blue   | 25.00 | 1.25
Sweatshirt | black  | 25.00 | 1.25
Not currently in 2NF as:
There is a partial dependency, as Price and Tax depend on Item, but not Colour
Normalise to 2NF:
product(Item, Colour)
cost(Item, Price, Tax)
Item       | Colour
T-shirt    | red
T-shirt    | blue
Polo       | red
Polo       | yellow
Sweatshirt | blue
Sweatshirt | black

Item       | Price | Tax
T-shirt    | 12.00 | 0.60
Polo       | 12.00 | 0.60
Sweatshirt | 25.00 | 1.25
Not currently in 3NF as:
There is a non-key dependency, as Tax depends on Price, not Item
Normalise to 3NF:
product(Item, Colour)
cost(Item, Price)
tax(Price, Tax)
Structured Query Language (SQL)
SQL is used to manipulate data in databases. There are many variations but the principle remains the same. SQL can be split into two parts - the Data Definition Language (DDL) and the Data Manipulation Language (DML).
DDL
The DDL is used to build the structure of the database. The main commands are:
CREATE
Used to create databases, tables and users
GRANT
Gives certain permissions to the user, allowing them to perform the given DML commands
DROP
Deletes a database, table or user
Example:
The following code creates the database in 3NF above, but adds a field for the primary key of the product table. SQL commands are usually written in capitals.
CREATE DATABASE db;
CREATE USER dbUser;
GRANT ALL ON db.* TO dbUser;
USE db;
CREATE TABLE product (
    productId INTEGER NOT NULL AUTO_INCREMENT,
    item VARCHAR(15) NOT NULL,
    colour VARCHAR(10) NOT NULL,
    PRIMARY KEY (productId)
);
CREATE TABLE cost (
    item VARCHAR(15) NOT NULL,
    price DECIMAL NOT NULL,
    PRIMARY KEY (item)
);
CREATE TABLE tax (
    price DECIMAL NOT NULL,
    tax DECIMAL NOT NULL,
    PRIMARY KEY (price)
);
DML
The DML manages the data inside the database. Commands except SELECT often return TRUE if they complete successfully, depending on the language. The main commands are:
INSERT
Used to insert a new record to the specified table using the given data. Any fields not specified will be given the default value, unless there is no default and they have NOT NULL, in which case an error will be returned
UPDATE
Used to update a specific record with the given data
SELECT
Used to retrieve certain fields (or all fields using the * symbol) from records which match the given condition.
These queries often use many other keywords:
FROM
Specifies the table the record to be found is in
WHERE
Begins the condition any record must meet to be returned
LIKE
Allows searching for a pattern rather than an exact value. Wildcards are used to define a pattern and vary depending on the system used:
* or % represents any zero or more characters. These are the only wildcards you need to know
? or _ represents any single character
- represents a range of characters e.g. [a-f] or [0-9]
AND
The record must meet both conditions on either side of the AND keyword to be returned
OR
The record must meet either condition to be returned
JOIN
Equivalent to INNER JOIN; this allows you to retrieve records from multiple tables in one query, selecting records which have matching values in all tables for the specified columns
DELETE
Used to delete a specific record
Examples:
INSERT INTO product (item, colour) VALUES
('T-shirt', 'red'),
('T-shirt', 'blue'),
('Polo', 'red'),
('Polo', 'yellow'),
('Sweatshirt', 'blue'),
('Sweatshirt', 'black');
INSERT INTO cost (item, price) VALUES
('T-shirt', 12.00),
('Polo', 12.00),
('Sweatshirt', 25.00);
INSERT INTO tax (price, tax) VALUES
(12.00, 0.60),
(25.00, 1.25);
→ TRUE
UPDATE product SET colour = 'red' WHERE productId = 5;
→ TRUE
SELECT * FROM product WHERE colour = 'red';
→
productId | item       | colour
1         | T-shirt    | red
3         | Polo       | red
5         | Sweatshirt | red
SELECT colour FROM product WHERE item = 'Polo';
→
colour
red
yellow
SELECT * FROM cost WHERE item LIKE '%shirt';
→
item       | price
T-shirt    | 12.00
Sweatshirt | 25.00
SELECT * FROM db.product WHERE colour = 'red' OR colour = 'blue';
→
productId | item       | colour
1         | T-shirt    | red
2         | T-shirt    | blue
3         | Polo       | red
5         | Sweatshirt | red
SELECT product.productId, product.item, product.colour, cost.price FROM product JOIN cost ON product.item = cost.item;
This gets the specified fields for every record where there is a matching value in product.item and cost.item. Any values in product.item which aren't in cost.item, or vice versa, wouldn't be returned.
→
productId | item       | colour | price
1         | T-shirt    | red    | 12.00
2         | T-shirt    | blue   | 12.00
3         | Polo       | red    | 12.00
4         | Polo       | yellow | 12.00
5         | Sweatshirt | red    | 25.00
6         | Sweatshirt | black  | 25.00
Different commands can be merged into one statement, for example: SELECT product.productId, product.item, product.colour, cost.price FROM product JOIN cost ON product.item = cost.item WHERE cost.price = 12.00;
→
productId | item    | colour | price
1         | T-shirt | red    | 12.00
2         | T-shirt | blue   | 12.00
3         | Polo    | red    | 12.00
4         | Polo    | yellow | 12.00
Referential Integrity
The maintenance and consistency of relationships in a database is known as referential integrity.
One part of this is making it impossible to add a record with a foreign key that doesn't exist. E.g. In the above examples, it's impossible to add a record in db.product which has a value for the item field which doesn't exist in db.cost.
Another part of referential integrity is the cascade delete, where a deleted primary key results in all foreign key records with that primary key being deleted. E.g. Deleting T-shirt from db.cost would also delete the first two records from db.product as these reference the T-shirt primary key.
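A sketch of how this could be declared in SQL (syntax varies between systems; the FOREIGN KEY clause added to the earlier product table is the assumed part):

CREATE TABLE product (
    productId INTEGER NOT NULL AUTO_INCREMENT,
    item VARCHAR(15) NOT NULL,
    colour VARCHAR(10) NOT NULL,
    PRIMARY KEY (productId),
    FOREIGN KEY (item) REFERENCES cost(item) ON DELETE CASCADE
);

With this in place, inserting a product whose item does not exist in cost fails, and DELETE FROM cost WHERE item = 'T-shirt'; also removes the matching product records.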
Transaction Processing
Transaction processing maintains database integrity. A transaction is where all of the relevant information is entered, so a record is complete. A transaction can't remain in an intermediate or incomplete state. Other processes can't access the data until either the transaction is complete, or the database is rolled back after it fails. The database can process the information once all data has been entered.
ACID
The ACID transaction method guarantees the database integrity is kept. The transaction must comply with the following properties.
Atomicity:
All or nothing. Everything must be completed in one action, so if the database crashes the transaction is either complete or nothing's changed.
Consistency:
Ensures the transaction complies with the update rules, e.g. format, referential integrity, not causing the database to fail. Data must be valid whether or not it is correct.
Isolation:
Each transaction must be performed independently of any other transaction.
Durability:
Once a transaction is complete, the change must remain even in the case of a power failure. This can be done by immediately saving the data to storage when the transaction is complete, and not queuing or holding the data.
Record Locking
This stops records or fields from being changed by multiple queries at once. While a query is being performed, other queries can read the data but can't change it, preventing inconsistent results.
Redundancy
Duplicate data in a database is known as redundant. This should always be minimised as it compromises the database integrity, and it is often removed through normalisation.
1.3.3 Networks
Topology is the way computers are linked together, and it describes the physical cabling layout and how data moves between components.
Protocols and standards are rules which must be followed when networking computers, allowing different hardware and software to communicate.
The Internet Structure
The TCP/IP Stack
The Transmission Control Protocol / Internet Protocol allows communications on LAN and WAN networks. The stack has four layers:
The Application Layer (top layer) uses an appropriate protocol relating to whatever application is being used to transmit data.
The Transport Layer is responsible for establishing an end-to-end connection, and then breaks the information down into packets. Each packet is given a header which includes the order of the packets, so they can be reassembled at the other end.
The Network Layer (IP) adds to the header the source IP address and the destination IP address. Routers operate at this layer, using the IP address to know where to send the data. After this layer, a complete socket is included in the header.
The Link Layer (bottom layer) represents the physical connection between various network nodes. It adds the MAC (Media Access Control) address of both the source device and destination device.
Once the stages have been completed, the data is transmitted, and when received the stages are performed in reverse to retrieve the original data.
Internet Protocol (IP) Address
IPv4: 32-bit address, the standard format (e.g. 192.168.0.0). In 2011, the last address was allocated, but they can still be used as lots of addresses have been bought but are dormant, and addresses are dynamic so can be reallocated.
IPv6: 128-bit address. Written in Hex to save space (e.g. 4aae:1803:4647:2:200:b9be:de12:94cd).
Advantages of IPv6:
Easier to identify a device as all devices are given a public IP address
No possibility of private address collisions on a network
Removes the need for a System Admin to set up complex network settings
Higher data transmission speed due to direct routing of packets
Increase in security due to packet routing and size of IP pool
Simplified network infrastructure
Disadvantages of IPv6:
Specifying addresses is harder as increased number of bits
ISPs have absolute control over their network which could cause privacy issues or restricted usage unless you pay a premium
No clear method for transitioning from v4 to v6 and could affect usage in the meantime
When the transition is complete, all v4 machines will immediately become obsolete
Manufacturers could charge more for new computer systems as everyone will need new systems to connect using v6
Public (Routable) IP:
These are assigned by IANA (Internet Assigned Numbers Authority), who distribute numbers to local registrars, who in turn assign numbers to users or companies. Each computer often doesn't have its own public IP address: the router for the network it's connected to has one, and the router manages the traffic going in and out, distributing it as necessary. This reduces the number of IP addresses needed, and provides some security as you can't directly access a computer based on its IP.
Private (Non-routable) IP:
Some addresses are specified as private by IANA. These are used in LANs not directly connected to the Internet. A LAN can be connected to the internet with a router with Network Address Translation (NAT) so the packages go to the right place and outgoing packages look like they came from the router's public IP. Private IPs start with 10, 172.16-172.31, or 192.168.
Subnet masking:
This tells the computer which IP addresses it can reach directly and which it needs to access through a router. The mask is in the form of an IPv4 address, but each octet can only be 255 (that part of the IP must be the same as the host's) or 0 (that part can vary from the host's, but can still be accessed directly). The mask is applied to an IP by converting both the mask and the IP to binary, then applying a bitwise logical AND. The bits where the mask is 255 give the network address; the remaining bits give the host identifier.
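A sketch of that bitwise AND, applied octet by octet (the addresses are examples):

class SubnetDemo
{
    static void Main()
    {
        byte[] ip   = { 192, 168, 1, 37 };
        byte[] mask = { 255, 255, 255, 0 };
        var network = new byte[4];
        for (int i = 0; i < 4; i++)
            network[i] = (byte)(ip[i] & mask[i]);  // bitwise AND per octet
        System.Console.WriteLine(string.Join(".", network)); // prints 192.168.1.0
    }
}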
Domain Name Server (DNS)
The DNS is primarily a database which translates domain names, as these are easily memorable by humans, to IP addresses corresponding to the location of the site.
The browser sends a request to the DNS for a specific domain name. DNS servers return the corresponding IP address. If the DNS server can't resolve the domain, it recursively passes the request onto another DNS server. When the address has been returned, the browser retrieves the website from the server at the IP address.
Protocol Layering
Protocols can be layered on top of each other (applied one after another). Some common protocols are:
HTTP (Hypertext Transfer Protocol):
This defines how data from web pages is transferred from the server to the client when the page is viewed over a TCP/IP network. The protocol makes a request of the server at the IP address and the server will return something for the client to view.
HTTPS (HTTP Secure):
This is also used to download web pages but it also applies a layer of encryption, which means if the data is intercepted during transmission it is very difficult to decode it. HTTPS is mainly necessary for passwords, bank details and other personal information.
FTP (File Transfer Protocol):
This is used to upload and download files. Most browsers include an FTP utility to download non-webpage files from the internet. FTP transfers can often continue after reconnecting if there is an issue with the connection partway through, meaning progress isn't lost.
POP3 (Post Office Protocol Version 3):
This allows an application to receive emails from a server. The protocol connects to the server, downloads the messages to the user's computer, and then deletes the messages from the server. This means once viewed on one email client, emails are no longer available on the server for other clients to use. POP3 uses the user's computer as the primary storage location.
IMAP (Internet Message Access Protocol):
Modern users access emails on multiple clients, so IMAP is used, which only deletes messages when users explicitly request it. IMAP uses the server as the primary storage location.
SMTP (Simple Mail Transfer Protocol):
This distributes emails. The message will include the intended recipients, attachments and the email content. This message is then sent to the sender's mail server, which uses DNS to find and send the message to the recipient's mail server.
SSH (Secure Shell):
This is a remote-access protocol, allowing secure communication between a client and a server. Both the client and server must be running SSH programs. SSH ensures no one else can see the data transferred.
LAN vs WAN
LANs (Local Area Networks) cover a small geographical area and use their own cabling and topology structure. They're usually within a single organisation.
WANs (Wide Area Networks) cover larger areas, usually using the internet to share data.
Packet and Circuit Switching
Packet switching is where the packets which make up a message are sent using the optimum route for each packet, which may not be the same for all of them. When the destination receives the packets they are reassembled into the correct order, regardless of which computers they went through to get to the destination. If a packet is lost due to a collision and the destination computer doesn't send a confirmation message back to the source, the source computer will retransmit after a certain time.
Circuit switching is where a dedicated physical connection is made between two computers for the duration of the communication. It is fast and secure, but the whole route is reserved even when no data is being sent.
Network Security Threats
A virus is a program embedded into another apparently harmless program which is intended to cause harm to a computer. When the harmless program is executed, the virus copies itself onto the disk, where it can hide and make changes to the system to cause damage. Viruses often duplicate themselves and spread to other computers. Antivirus programs detect and remove viruses. Most browsers scan files being downloaded for viruses.
A worm is a malicious program designed to replicate itself and spread across a network such as the internet. Unlike a virus, a worm is a complete program in itself and doesn't hide in another program. They're most commonly spread through email, and they often slow down servers and data transfer.
A Trojan is a non-self-replicating virus which hides in a file; when the file is executed, it gains access to the system. This then acts as a back door for someone to fully control the infected system.
Firewalls
A packet filtering firewall examines each packet as it arrives and compares the features of the packet with the filtering rules in use. It is applied to whichever ports the network administrator chooses.
Advantages:
Functionality provided by standard router firmware
Fast
Flexible
No user action required for installation
Disadvantages:
Filtering rules are complex to specify
Rules can't be tested before they're in use
May not cover all ports
A stateful inspection firewall examines each packet in the context of the connection it belongs to, rather than judging every packet in isolation. It stores the state of each connection in a state table, which speeds up processing the data, and incoming packets are checked against the states allowed by the policies.
Advantages:
Faster than a proxy server
Added security built in by performing application layer filtering of protocols
More secure than packet filtering
Cheap to set up
Disadvantages:
Less secure than proxy servers
Slower than packet filtering
Resource-intensive
Vulnerable to new forms of attacks which target the protocols of the connections
Can be difficult to create the policies to control the connections
A proxy server is a server which sits between the client and other servers, checking all transfers to ensure data entering or leaving the system is safe. They are often used along with a firewall to increase security and reduce bandwidth use. Computers inside a network use proxy servers to access external data. This means only one computer is exposed to the outside network, and this connection can be managed closely.
Advantages:
Hides the internal structure from external sources
Improved authentication and logging
Cost effective
Rule set is less complex than packet filtering
Can be used for privacy as the server being accessed doesn't know who the client is, only who the proxy server is
Disadvantages:
Protocols are more advanced and require multiple authentication steps
Changing protocols can affect other traffic
Can be expensive to implement
Can be slow as data needs to be sent twice
Encryption
A digital signature allows the receiver to determine who the sender is and to tell whether or not a message has been tampered with during transmission. The signature is generated by applying a hash function to the message, producing a hash specific to its content, which is then encrypted with the sender's private key. If the contents change during transmission, they won't match the hash.
A digital certificate identifies a specific user. It contains a name, unique ID such as serial number, the user's public key and the certificate authority's signature to show it's real. This prevents someone pretending to be someone else and sending you their public key rather than the intended recipient's public key, which would allow the imposter to decode the data.
Network Hardware
Network Interface Card (NIC)
This allows a device to connect to a network. Most motherboards have these built in.
Hub
The simplest form of network device, this receives a signal from a computer, then transmits it to all other computers in that network. Cheap and effective in small networks, but for larger networks it causes collisions and slows the network down.
Switch
This utilises built-in memory to form a lookup table. When data is sent, the switch sends it to all computers, and receives a signal back from the correct computer. It stores this information in the lookup table, so in the future it can easily send data back and forth between these computers, without sending the data to everyone. This cuts down on traffic and results in more reliable communications.
Managed switches are more advanced switches which can be configured by a user, e.g. to filter certain ports or give priorities to certain ports or computers.
Router
A router forwards packets from one network to another. Each network has a router which works out where to send packets which are destined for other networks. A packet sent over the internet will go through many routers before it reaches its destination. Routers work by changing the destination MAC address to forward the packet to another router without changing the destination IP address.
Gateways
These connect multiple networks of different architectures, called an internetworking system. Gateways can be a physical object or they can be built into a network's subsystems and into the network software. The gateway takes in a packet from an external computer and strips it down to the raw data, which is repacked using the protocol which that network supports.
Wi-Fi
This is a high-bandwidth wireless method of communication which is an alternative to Ethernet networks, despite being slower and less reliable. Wi-Fi networks are accessed through hotspots: the areas where a device can connect to a Wi-Fi network. All networks are based on an international standard, so any device can connect to any network.
Wi-Fi networks can be secured in the following ways.
Wi-Fi Protected Access (WPA/WPA2): The user must enter a passphrase to connect, which is used along with the SSID to create an encryption key. This key is unique to each device and temporary.
Service Set Identification (SSID): This is the name given to a wireless local area network by the admin during installation. To communicate with each other through the network, devices must use the same SSID. Networks can be secured by turning off the automatic SSID broadcast, so the network won't appear to all devices, and users who don't know its exact name can't access it.
MAC address whitelisting: This blocks every device other than those with one of the specified MAC addresses. This is used when only a small number of people should be allowed to join the network, and when encrypting the data sent over the network isn't feasible.
Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA): A transmission protocol that acts to prevent the collision of packets. When a computer has a packet to send, it checks to see if there is a clear channel which can be used; if not, it checks again after a random period of time.
Types of Networking
Peer-to-peer
These have no central server or central management and every computer has the same rights. These are cheap as they require no expensive server or hardware which needs to be maintained. Programs must be installed on every computer and monitored to make sure everything's up to date. On most peer-to-peer networks the data is stored locally on each machine so regular backups are needed to prevent data loss. Mostly used for small networks as maintaining a small number of machines isn't too difficult. This is good as if one connection/computer goes down, most of the system can continue to function.
Client-server
AKA server-based networking, this is where one or more central computers act as a server and are responsible for managing resources, security and more. The server is the only computer which holds all information and functionality, which is advantageous when lots of interaction takes place between computers. Resources can be deployed remotely from the server without visiting each computer individually. Security and accounts are the same across the network, so users can log in and view their data on any computer on the network. Backups are automatically done by the server, so no data is lost. However, if the server goes down, the whole network stops working.
| Client-server | Peer-to-peer |
| --- | --- |
| All data and resources are held on the server | Data and resources are held on individual machines |
| Software can be shared across the network and easily distributed and managed | Copies of software are held on individual machines which need to be maintained individually |
| More complicated and expensive to set up and maintain | Generally simpler to implement, but gets more complex with more machines |
| Usernames, passwords and access levels are controlled centrally so it is easier to ensure security | No central security; can only access accounts/data on a specific computer |
| Centralised and regular backup facilities to prevent data loss | Backing up is down to individual users so is less reliable |
| If the server goes down, the whole network stops, but if a client goes down then there is no data loss or impact on other clients | No central server dependence, but if a client goes down the data on that machine is unavailable to everyone |
| Server controls the communication protocol, so it doesn't matter what type of computer the clients are, or if they're the same | Individual computers must all have the same software loaded to control communication with other computers |
1.4 Data Types and Structures
1.4.1 Data Types
Primitive Data Types:
Integer: A whole number.
Real/Floating Point: A decimal number.
Character: A single character.
String: A series of characters, like a sentence or name.
Boolean: Either True or False.
Binary:
Binary is what the computer stores data in, as the computer can only understand two states and binary is base 2, so it can only be 0s or 1s.
Positive Numbers:
Add up the place value (128, 64, 32, 16, 8, 4, 2, 1 for an 8-bit number) for each position where there's a 1.
Negative Numbers:
Negative numbers in binary can be displayed either using a sign bit or using two's complement.
Two's Complement:
To get a two's complement number, follow the following steps:
Find the binary of the modulus of the number
Invert this, so all 0s turn to 1s and all 1s turn to 0s
Add one to the result
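A minimal sketch of these steps in C#, assuming 8-bit values and using -5 as an illustrative example:
byte magnitude = 0b0000_0101;                    // step 1: |-5| = 5 in binary
byte inverted = (byte)~magnitude;                // step 2: invert to 1111_1010
byte result = (byte)(inverted + 1);              // step 3: add one, giving 1111_1011
Console.WriteLine(Convert.ToString(result, 2));  // 11111011, which is -5 in two's complement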
Sign Bit:
A sign bit number simply uses the first bit to represent either positive or negative, where 1 is negative and 0 is positive. This reduces the maximum value a binary number can be, as an 8-bit number can now only go up to 127.
Addition:
For addition, put one number over the other. Work from right to left, adding each binary digit to the one below it: 0+0 = 0; 1+0 = 0+1 = 1; 1+1 = 0, carry 1 to the left column; 1+1+1 = 1, carry 1 to the left column.
E.g. 0011 + 0110 = 1001 (carry bits are generated in the 2s and 4s columns).
If the furthest left column results in a carry, so the number becomes more than 8 bits, it is known as an overflow. Write the furthest left bit with brackets around it. In a computer, this would be ignored so the result would be incorrect.
Subtraction:
To subtract numbers, find the two's complement of the number you want to subtract and add this to the number you want to subtract from.
Alternatively, use the borrowing method.
Fixed Point:
Decimal numbers can be either fixed or floating point. Fixed point is where a certain number of bits are allocated for the integer part, and the rest are allocated for the fractional part.
The number above, 01011.101, is 11.625 in denary.
Floating Point:
Uses a mantissa (the main number) and an exponent (how many times the mantissa needs to be doubled to get the original number).
The sign can either be a sign bit or the mantissa can be in two's complement.
The binary point in the mantissa goes between the most significant bit and the second most significant bit.
Converting binary to denary:
Convert the exponent to denary, remembering the left most 1 has a negative place value
Move the binary point in the mantissa this many places to the right (or to the left if negative)
Convert the mantissa to denary, remembering the left most 1 has a negative place value
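For example (an illustrative value, not from the notes): a mantissa of 0.1011 with an exponent of 0011. The exponent is 3 in denary, so the binary point moves three places to the right, giving 101.1, which is 4 + 1 + 0.5 = 5.5 in denary.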
Normalisation:
If the mantissa starts with a 1, remove all repeated leading 1s so it begins 1.0; if it starts with a 0, remove all repeated leading 0s so it begins 0.1.
Pad out the mantissa with 0s on the right
Decrease the exponent by the number of leading 1s or 0s you remove (a.k.a. how many places the binary point moves to the right)
Addition/Subtraction:
Change the number so the exponents are the same. If you increase the exponent by x, shift the mantissa to the right by x places. If decreasing the exponent, shift the mantissa to the left
Perform the addition or subtraction on the mantissa parts
Join the mantissa and exponent
Normalise
Floating Point Errors:
Rounding - Recurring numbers, e.g. $\frac{1}{3}$, can't be represented exactly.
Cancellation - When numbers of completely different magnitudes are subtracted, the larger number is sometimes returned unchanged, e.g. 1 - 0.000000004 = 1. It also occurs when subtracting nearly equal numbers, where rounding noise dominates the result, e.g. a computed 1.000 - 1.000 returning 0.00000000234.
Absolute - The difference between the number desired and the nearest computer representation of that number.
Relative - The absolute error divided by the desired value.
Underflow - When using very small numbers and you reach the boundary of what the computer can store, e.g. if the computer can store no lower than $\frac{1}{128}$, then $\frac{1}{128} \times \frac{1}{128}$ results in an underflow error as the result can't be stored.
Overflow - When the result of a calculation is too large to be stored.
Hexadecimal:
Hexadecimal is easier for humans to understand than binary, but harder for computers to understand. It is used as a compromise between words and binary.
Hex is base 16. As we don't have numbers to go this high, letters are used. A represents 10, B 11, etc until F representing 15.
Convert between binary, hex and denary:
To convert between binary and denary, use the positive numbers table above.
To convert between hex and denary, multiply the first part of the hex number by 16, and then add on the denary value of the last hex digit. To convert back from denary to hex, divide the number by 16 and convert this to hex to get the first part, then convert the remainder of the previous division to hex and put that after the first part.
To convert between hex and binary, you just have to convert each hex digit to a binary nibble using the positive numbers table above for the denary version of the digit. To convert back, just convert each nibble to hex.
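A minimal sketch of these conversions using C#'s built-in Convert helpers (9F is an illustrative value):
int value = Convert.ToInt32("9F", 16);          // hex -> denary: 9 * 16 + 15 = 159
Console.WriteLine(Convert.ToString(value, 2));  // denary -> binary: 10011111
Console.WriteLine(value.ToString("X"));         // denary -> hex: 9F
Console.WriteLine(Convert.ToInt32("1001", 2));  // binary -> denary: 9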
Character Sets:
ASCII (American Standard Code for Information Interchange) is 7 bits, and covers all characters on English keyboards plus some extra control characters. The capital alphabet is 65-90, lower case is 97-122.
Extended ASCII (8 bits) was incompatible across different devices.
Unicode is 16-bit and is universally supported. All standard ASCII characters have the same code point to be backwards compatible. Allows for enough characters to represent all human languages, including emoji, and only about half the space has been used. It is space efficient, so common characters are represented by fewer bits than less common characters so they're quicker and easier to use. Unicode Transformation Format (UTF-8) allows characters to be represented by 1, 2, 3 or 4 bytes. Each byte has one or more bits reserved for control bits, which indicates how many bytes this character is made up of and whether this is the first byte or a continuation.
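A minimal sketch of UTF-8's variable-width encoding, assuming .NET's System.Text.Encoding API:
using System.Text;

Console.WriteLine(Encoding.UTF8.GetByteCount("A"));   // 1 byte: a standard ASCII character
Console.WriteLine(Encoding.UTF8.GetByteCount("é"));   // 2 bytes
Console.WriteLine(Encoding.UTF8.GetByteCount("€"));   // 3 bytes
Console.WriteLine(Encoding.UTF8.GetByteCount("😀"));  // 4 bytes: an emoji outside the basic plane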
1.4.2 Data Structures
Static Data Structures:
Arrays
An array is a data structure that can be used to hold elements of data of the same type. You cannot change the length of an array. If you change the length, it becomes a list. A number after the variable name (the subscript) indicates the index of the array element to be used.
A two dimensional array is essentially an array of arrays, and can be visualised as having two columns, or being two arrays next to each other.
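A minimal sketch of one- and two-dimensional arrays in C# (the values are illustrative):
int[] scores = new int[5];    // fixed length of 5, all elements the same type
scores[2] = 99;               // the subscript selects an element
int[,] grid = new int[2, 3];  // a 2D array: 2 rows, 3 columns
grid[1, 2] = 7;               // row 1, column 2
Console.WriteLine(grid[1, 2]);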
Advantages:
Easily accessed
Fits into loops
Static - you know how many items there are and how much memory it takes up
Disadvantages:
Stored in volatile RAM
Single data type for one array
Memory intensive, often they are defined as larger sizes than necessary in case they get bigger
Hard to sort
Static - can't grow if necessary
Records
This is a data structure which can provide a structure within a structure. For example, to store the name, age and postcode of 100 customers, you can define a structure which can be used for each element.
e.g.
Customer[72] = { name: "Joe Herbert", age: …, postcode: "AB01 2CD" }
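A minimal sketch of this record structure in C# (the age value is left unset, as in the example above):
var customers = new Customer[100];
customers[72].Name = "Joe Herbert";
customers[72].Postcode = "AB01 2CD";
Console.WriteLine(customers[72].Name);

struct Customer
{
    public string Name;      // one field per item stored about a customer
    public int Age;
    public string Postcode;
}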
Lists
Lists are a collection of data which follows on from each other in sequence. Arrays are a type of list with a fixed length. Linked Lists are also a type of list.
Tuple
Tuples are lists of a finite length, like arrays, but their contents are immutable.
Dynamic Data Structures:
Queues and stacks are often implemented using arrays, so they are dynamic structures implemented inside a static one. A completely dynamic data structure uses the heap. This is an area of RAM reserved for processes that don't know exactly how much storage space they need for their processing. Processes can request a heap block from the OS.
Queue
New item is added after the previous one. Known as FIFO (First In First Out).
Use a pointer called FRONT to point to the memory location of the front of the queue and a pointer called END to point to the end of the queue. Therefore the queue itself is dynamic but its information can be stored in a static array.
Stack
Works like a stack of plates, where the new item is added on the top. Known as LIFO (Last In First Out). Data is put in and retrieved from the same end.
Uses a TOP pointer to point to the top of the stack. "Push" = add new item to stack. "Pop" = remove item from stack.
A stack can be used to quickly reverse the order of items in a list, just by adding everything to a stack and taking it off again. If you add another item to a full stack, there will be an overflow error.
A stack is used when the CPU receives an interrupt. The current program details and registers are pushed onto the stack and retrieved after the interrupt has been dealt with.
Linked List
Linked lists are dynamic data structures which can change size during runtime. The structure expands as new data is added. It is memory efficient as it only uses the memory it needs, and it frees memory when it isn't needed.
The contents of linked lists aren't always defined at runtime so the data is often non-contiguous, so the OS links together the data regardless of where it is in memory. Pointers are used to keep track of the next item in the list, and a NULL pointer is used to terminate the list. When a new node needs to be stored, the OS finds spare memory from the heap. Memory location is returned to the heap when the node is deleted.
Advantages:
Expands or contracts as necessary
Memory efficient
Easy to add and delete items
Disadvantages:
Slow to traverse, as they have no index
Possibly not able to reserve additional memory for new items, if memory is filled during runtime it could cause errors
More complex to manage, but most languages offer helpful libraries
There is usually a list called the Head Of Lists which points to the address of the first node of each list being stored in memory, with a NULL pointer at the end.
Pseudocode:
Search for an item:
INPUT searchFor
currentNode = startOfList
found = false
WHILE !found AND !EndOfList
IF currentNode == searchFor THEN
OUTPUT currentNode.Pointer
found = true
ELSE
currentNode = currentNode.next
END IF
DO
Graphs
Graphs are a diagrammatic way to represent linked data and visualise a potential solution. Each graph consists of several nodes (also called vertices). The lines connecting the nodes are called edges (or arcs).
A graph can either be a digraph (directed graph), or an undirected graph.
A weighted graph shows the relative weighting for each edge. This is often used to show distance or time to get from one point to another.
An adjacency list can be used to represent each node and its respective links. For the above weighted graph, the adjacency list is below.
0: 1, 2
1: 0, 2, 3, 4
2: 0, 1, 3
3: 1, 2, 4
4: 1, 3
An adjacency matrix shows the same information, but in matrix form. 1 represents a link, 0 represents no link.
$$\begin{bmatrix} \space & \textbf{0} & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4}\\\textbf{0} & 0 & 1 & 1 & 0 & 0\\\textbf{1} & 1 & 0 & 1 & 1 & 1\\\textbf{2} & 1 & 1 & 0 & 1 & 0\\\textbf{3} & 0 & 1 & 1 & 0 & 1\\\textbf{4} & 0 & 1 & 0 & 1 & 0\end{bmatrix}$$
An adjacency matrix can also be used to show weightings.
$$\begin{bmatrix} \space & \textbf{0} & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4}\\\textbf{0} & 0 & 1 & 6 & \infty & \infty\\\textbf{1} & 1 & 0 & 4 & 3 & 1\\\textbf{2} & 6 & 4 & 0 & 1 & \infty\\\textbf{3} & \infty & 3 & 1 & 0 & 1\\\textbf{4} & \infty & 1 & \infty & 1 & 0\end{bmatrix}$$
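A minimal sketch in C# of the adjacency list and the unweighted adjacency matrix above:
using System.Collections.Generic;

var adjacencyList = new Dictionary<int, int[]>
{
    [0] = new[] { 1, 2 },
    [1] = new[] { 0, 2, 3, 4 },
    [2] = new[] { 0, 1, 3 },
    [3] = new[] { 1, 2, 4 },
    [4] = new[] { 1, 3 },
};

int[,] adjacencyMatrix =
{
    { 0, 1, 1, 0, 0 },
    { 1, 0, 1, 1, 1 },
    { 1, 1, 0, 1, 0 },
    { 0, 1, 1, 0, 1 },
    { 0, 1, 0, 1, 0 },
};

Console.WriteLine(adjacencyMatrix[0, 1] == 1);  // True: nodes 0 and 1 are linked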
Trees
A tree is an undirected graph in which there is only ever one route from one node to another. A tree also contains one fewer edge than the number of nodes. A tree will often express a form of hierarchy, where a root node represents the main or first item.
Binary Trees:
Data organised according to rules
Data sorted rather than being stored serially
First item is the Root Node
The last items are terminal nodes
Maximum of two nodes can be added to the previous node
Branch left if item is less than previous
Branch right if item is more than previous
E.g. Binary tree for the numbers 44, 89, 1, 12, 44, 00, 23, 86, 23, 55:
Depth-First Traversal of Binary Trees:
This attempts to get to the bottom of the tree quickly, using recursion. There are three ways to traverse a tree depth-first:
In-order:
Examine left sub-tree
Examine root node
Examine right sub-tree
In-order is most common as it produces the result in the correct order.
Pre-order:
Examine root node
Examine left sub-tree
Examine right sub-tree
Post-order:
Examine left sub-tree
Examine right sub-tree
Examine root node
Post-order can be very useful, especially for compilers, as commands are usually stored [DATA][DATA][INSTRUCTION], e.g. 4 6 +. Post-order takes the data first, then deals with the operation.
E.g. post-order traversal of a binary tree of car manufacturers: AUDI, DAIMLER, CITROEN, VOLKSWAGEN, FORD.
Breadth-First Traversal of Binary Trees:
This attempts to traverse across the tree first, keeping a log of any nodes below the current node being checked, so the program knows which ones it needs to come back to.
Hash Table
A hashing algorithm can be performed on search criteria to speed up the process of finding the index corresponding to the data to be found.
For example, an algorithm could add the numerical values for the first two letters in the search term (e.g. a=0, b=1 etc). So Joe would give index 23 (9+14), and Alan would give index 11 (0+11). However, this causes a collision if the name John was searched for, as this would also give index 23. Hashing algorithms aim to cut down on collisions while optimising the space used.
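A minimal sketch in C# of the toy hashing algorithm just described:
int Hash(string key)
{
    key = key.ToLower();
    return (key[0] - 'a') + (key[1] - 'a');  // add the alphabet positions (a = 0) of the first two letters
}

Console.WriteLine(Hash("Joe"));   // 9 + 14 = 23
Console.WriteLine(Hash("Alan"));  // 0 + 11 = 11
Console.WriteLine(Hash("John"));  // 9 + 14 = 23, a collision with "Joe"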
One collision resolution strategy is to just use the next free space available to store data when the correct index is already in use. However, this gets very hard to keep track of as data can end up almost anywhere, and you can't assume data has been deleted if it's not in the expected place, so linear searches must be used if the hashing algorithm doesn't produce the right index.
Another collision resolution strategy is using overflow lists, so any elements which can't be put in the expected index are put in the overflow list, which must be linearly searched when searching for an element. This reduces the size of the linear search when an element isn't in the expected index.
Another strategy is to have linked lists at every index. This means as many elements as desired can be stored at each index, so the hashing algorithm always produces the correct position. However, this is more complicated to implement.
1.4.3 Boolean Algebra
Karnaugh Maps
When representing two or more inputs on one side of a K-map, the values must start at all 0s, and then differ by only one bit for each subsequent value (Gray code order). 00, 01, 11, 10 is the correct order for two inputs.
When converting from a K-map to an expression, form groups of 1s.
These groups must be rectangular or square, and they must contain $2^n$ 1s, so a group which contains six 1s is invalid, but two groups of four 1s is valid. Each group should be as large as possible, and groups can overlap each other.
Boolean Algebra Rules:
De Morgan's Laws:
$\neg (A \lor B) \equiv (\neg A) \land (\neg B)$
NOT (A OR B) is the same as (NOT A) AND (NOT B)
$\neg (A \land B) \equiv (\neg A) \lor (\neg B)$
NOT (A AND B) is the same as (NOT A) OR (NOT B)
Distribution:
$A \land (B \lor C) \equiv (A \land B) \lor (A \land C)$
A AND (B OR C) is the same as (A AND B) OR (A AND C)
$A \lor (B \land C) \equiv (A \lor B) \land (A \lor C)$
A OR (B AND C) is the same as (A OR B) AND (A OR C)
Association:
$A \lor (B \lor C) \equiv (A \lor B) \lor C \equiv A \lor B \lor C$
A OR (B OR C) is the same as (A OR B) OR C is the same as A OR B OR C
$A \land (B \land C) \equiv (A \land B) \land C \equiv A \land B \land C$
A AND (B AND C) is the same as (A AND B) AND C is the same as A AND B AND C
Commutation:
$A \land B \equiv B \land A$
$A \lor B \equiv B \lor A$
Double Negation:
$\neg\neg A \equiv A$
Absorption:
$A \lor (A \land B) \equiv A$
A OR (A AND B) is the same as A
$A \land (A \lor B) \equiv A$
A AND (A OR B) is the same as A
Logic Gates and Truth Tables:
Conjunction (AND, Λ):

| A | B | Out |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

Disjunction (OR, V), can also be written as "A + B":

| A | B | Out |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

Exclusive Disjunction (XOR, ⊻), can also be written as "A ⊕ B":

| A | B | Out |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Negation (NOT, $\neg$), can also be written as "$\bar{\text{A}}$" or "~A":

| A | Out |
| --- | --- |
| 0 | 1 |
| 1 | 0 |

Equivalence (the same as, $\equiv$), can also be written as "$\leftrightarrow$".
Complex Logic Gates:
Flip-Flops:
A flip-flop circuit changes its state depending upon the previous state. The idea is that the flip-flop will remain in its state until a set or reset signal is sent. Registers and static RAM use flip-flops to store data.
S: Set, R: Reset, Q: Output, Q': Complement
| S | R | Q | Q' | Effect |
| --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 1 | Turn Off |
| 1 | 1 | 0 | 1 | No Change |
| 0 | 1 | 1 | 0 | Turn On |
| 0 | 0 | 1 | 1 | Invalid |
D-Type Flip-Flop:
A D-type flip-flop circuit effectively stores the input received. Usually used with a Clock signal to store the D input at a given time.
D: Data, CLK: Clock, Q: Output, Q': Complement
| D | CLK | Q | Q' | Effect |
| --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 0 | Set to On |
| 0 | 1 | 0 | 1 | Set to Off |
Half Adder:
A half adder adds 2 bits, producing a Sum bit and a Carry bit. On its own, it cannot feed the carry into a further column, so the overflow is often lost from the calculation.
| A | B | S | C |
| --- | --- | --- | --- |
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 |
Full Adder:
A full adder combines two half adders (plus an OR gate for the carries), using the incoming carry bit as a carry-in bit.
| A | B | C$_{in}$ | S | C$_{out}$ |
| --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 |
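A minimal sketch of both adders as Boolean logic in C# (the sum is XOR, the carry is AND):
(bool Sum, bool Carry) HalfAdder(bool a, bool b) => (a ^ b, a & b);

(bool Sum, bool CarryOut) FullAdder(bool a, bool b, bool carryIn)
{
    var first = HalfAdder(a, b);                 // add the two input bits
    var second = HalfAdder(first.Sum, carryIn);  // add the carry-in to their sum
    return (second.Sum, first.Carry | second.Carry);
}

Console.WriteLine(FullAdder(true, true, true));  // (True, True): 1 + 1 + 1 = 1 carry 1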
1.5 Legal, Moral, Cultural and Ethical Issues
1.5.1 Computing related legislation
The Data Protection Act 1998:
With tech advancing, more data was being stored, and it was hard to know who stored what. This law is designed to protect individuals and their personal data if it is automatically processed.
People who handle personal data (data controllers) have to register with the Information Commissioner's Office (ICO), which is responsible for:
Running the public list of data controllers
Advising controllers on the principles of the Act
Promoting compliance with the Act
Receiving complaints about breaches of the Act
Prosecuting offenders on behalf of subjects
The Data Protection Act has 8 principles:
Data must be obtained and processed fairly and legally
Data must be held for legal specified purposes
Data collected has to be adequate, relevant and not excessive
Data must be kept accurate and up-to-date
Data must not be kept longer than necessary
Data must be processed within the subjects' rights
Data must be kept secure
Data must not leave European Economic Area without protection
The ICO requires controllers to register certain information:
Why the data is being collected (provision of financial services)
Description of data subjects (customers)
Description of data classes (personal details, financial details)
List of data recipients (financial organisations)
If the data will leave the EEA
There are a few exceptions from the act:
Some charities and non-profit organisations
Personal, family or household reasons (contact list on phones)
Data collection for a public register
Data for internal business affairs (staff admin, ads, accounts)
Special purposes (national security, crime prevention, tax)
A data subject is entitled to compensation for a number of reasons:
Unauthorised disclosure of data
Company doesn't update data they are informed of
Unauthorised access, loss or destruction of data
The Computer Misuse Act 1990:
Designed to protect organisations' data, software and IT structures. The law consists of three levels.
- Level 1 - Access to unauthorised material:
A person commits an offence if they cause a computer to gain access to unauthorised content
This has to be premeditated
The organisation affected determines what counts as authorised, so a student accessing a staff account is a Level 1 infringement
- Level 2 - Access to unauthorised material with intent to commit a crime:
This is essentially Level 1 CMA, but with the intent of fraud, blackmail or extortion
Carries a custodial sentence of five years
- Level 3 - Access and modification of data on a computer system:
This covers modification of the contents of a computer with intent to:
impair the operation of any computer
prevent or hinder access to any program or data
impair the operation of a program or the reliability of data
This includes intentional malware distribution
There are very few prosecutions under this act because:
It's difficult to prove, especially for Levels 1 and 2
If an organisation prosecutes through CMA, they are publicly declaring they've been breached, so clients lose trust in them
Other crimes carry longer sentences than the CMA. If a hacker commits fraud via computer misuse, police would prosecute for fraud (10 years) rather than Level 2 CMA (5 years)
The Copyright Design and Patents Act 1988:
Designed to protect individuals' intellectual property. People and organisations have the right for their produced work to not be stolen. This covers images, text, software and music. In computing the main concern is software piracy, where illegal copies are made of software.
This damages the industry in various ways:
Less income for the author
Without sales, no money to invest in newer versions
Software quality is reduced as security measures are bypassed
Software may contain malware
The law in the UK is very simple:
Only use copyrighted material with the owner's permission
It was illegal to change formats of a file (CD $\rightarrow$ MP3)
It is illegal to download and copy images
It is illegal to copy software without a relevant license. Some software is single-user, multi-user or site-licensed
The designs part of the act protects original designs:
This must be registered with the Designs and Patents Office
Designs are valid for 25 years
Patents are valid for 20 years
Patents must involve an inventive step that is not obvious
Scientific discovery can't be patented as it would hinder progress
The Regulation of Investigatory Powers Act 2000:
As encryption and security improved, criminals were increasingly able to use computers as tools for crime.
RIPA allows investigations to force access into organisations' IT structure via legislation. RIPA allows for:
Interception of communications
Acquisition of data relating to communications
Carrying out surveillance
Covert human intelligence sources
Access to secure data
If a RIPA-backed request is made to an organisation, they have to comply or face legal action.
1.5.2 Moral and ethical issues
Computers in the Workforce:
There is a concern about management spying on the workforce.
Constant monitoring could be a breach of privacy.
Many view monitoring as intrusive and showing a lack of trust.
Computers can now multi-task very easily, enabling cyber-slacking.
Increased pressure to complete tasks rapidly leads to more stress-related illnesses.
Automated Decision Making:
Computers can use a vast database to perform diagnostics.
These systems are very effective, quick and reliable.
When a decision is wrong, who is to blame? The computer, the programmer, the user entering the information or the person acting on the results?
Some people are now very reliant on computers for diagnostics.
Artificial Intelligence:
This sector is still in very early development, but is already mimicking human and animal behaviour.
Systems can effectively think and solve problems.
At what stage is a computer cognitive/alive? Is it right to turn them off when they behave like live creatures? What if computers become android-like?
Tests exist and are being designed to measure artificial intelligence of a system.
Environmental Effects:
Computer equipment uses valuable resources during construction, use and decommission.
With technological advancements, computers reach end-of-life by becoming obsolete rather than breaking, unlike other machines.
These machines often contain private data and require destruction.
Schemes exist to distribute these machines to less developed nations but are viewed as dumping our trash on the unfortunate.
Companies buy machines to harvest and resell precious metals (e.g. Envirofone).
Computers have a huge carbon footprint and are running 24/7.
Censorship and the Internet:
It is now very easy to produce and distribute any content you want.
Pre-Internet, every publication went through a moral and accuracy check.
This is no longer feasible as so much is produced and distributed.
The Internet allows freedom of information, but organisations are responsible for uploading content suitable for their intended audiences.
Some governments, like the Chinese, censor their Internet, but this means they can have political bias (e.g. Tiananmen Square).
Monitoring Behaviour:
All computers are addressable, so it's possible to monitor what someone uses a computer for.
Some sites use cookies to collate data to sell on to third-parties.
There is a moral issue on how extensive this tracking should be.
Analyse Personal Information:
Computers are now able to perform complex analysis of data.
Previously it would be difficult to scan customer records and find out who lives near who but now it's easy.
This is serious when it involves personal data, which is very dangerous in the wrong hands.
Piracy and Offensive Communications:
With technology, piracy has gone through the roof, and it's damaging to authors.
Computers can be used to generate offensive material, such as cyber-bullying or anti-Semitic hate speech.
It is illegal in most countries as it has led to suicide and depression, but it's very hard to trace and prosecute.
However, it does clash with the idea of Freedom of Speech.
Layout, Colour Paradigms and Character Sets:
There are many cultures in the world, often with clashing views.
Not all cultures write with the same characters (e.g. the Middle East doesn't use the Latin alphabet) and not all cultures write left to right. Certain colours also have cultural significance.
This has to be considered when designing media.
2.1 Computational Thinking
2.1.1 Thinking abstractly
Abstraction is filtering out less important characteristics and specific details to focus on the important parts of a problem. For example, an abstraction of a train could be that it runs on tracks and has carriages. The colour, size, number of carriages etc are ignored since they're not important to it being a train.
Decomposition (also known as stepwise refinement, factoring, the top-down approach, or divide and conquer) is breaking down complex problems into less complex problems. These problems can be decomposed further until they're a manageable size, like a command or short set of commands. Each set of commands can then be brought together to solve the original complex problem. When decomposing, look out for sequence, selection and iteration.
The advantages of decomposition are:
It makes it easier to tackle smaller problems
Different sections can be worked on by different people at once
The design and the code for a process can be tested separately to the rest of the program
Code can be re-used in other projects
Modular code is often easier to read and maintain
The disadvantages of decomposition are:
You have to spend time decomposing the problem
Modules might not recombine effectively
Modules will need testing as separate units and then need to be tested again when recombined
Complex problems might not be fully understood by people working on a small part so erroneous assumptions might be made
2.1.2 Thinking ahead
Thinking should take place before coding starts.
Think about inputs - Origin? Format? Range? Validation? Best method for inputting?
Think about outputs - Format? Presentation? Target audience?
Determine the preconditions of the system - Hardware? Software? Could impact what language is used.
Should the new solution build upon, adapt or replace the existing solution?
Consider the value of replacing the old system - Time? Cost?
Caching:
Caching may be needed for large systems, especially databases.
All of the data may not fit into the system at once.
Data is organised into blocks, which can be loaded/unloaded as separate chunks rather than individual items. The advantages of caching are:
It reduces overall access times when reading/writing
Data blocks can be held locally to improve processing performance
Less reading/writing means components wear down slower and last longer
If the central database fails cached data can be used to restore some of it
The disadvantages of caching are:
Data blocks may be locked to prevent other machines accessing the same data, which could cause errors
Synchronising data can be awkward as some versions might be out of date
Increasing complexity leads to higher development, testing and maintenance costs
Reusable components:
Using library routines can speed up development as the code only needs to be written and tested once
They save energy, time, effort and cost in the long run
They improve readability as the components are given names to describe their function
Modules can be reused in different projects
2.1.3 Thinking procedurally
Thinking procedurally involves decomposing the problem, ensuring the original can be recreated from the component parts, and then building the final solution from those constituent components.
It is very important to think about the order in which you create and use components.
2.1.4 Thinking logically
Thinking logically is all about using sequence, selection and iteration.
2.1.5 Thinking concurrently
The aim of thinking concurrently is to minimise the time, and therefore the cost, needed.
Which components need to run at the same time?
Which components can be developed at the same time?
Use of teams?
Isolate components, test, then combine and re-test.
Be careful when combining, particularly with global variables.
Need to consider whether to combine source code or object code.
2.2 Problem Solving and Programming
2.2.1 Programming techniques
Programming Constructs:
Sequence: Putting the right commands in the right order.
e.g. A very simple example of incorrect sequencing, using a variable before it is declared:
Console.WriteLine(x);
int x = 10;
Iteration: Repeating the same command or action. When using iteration, ensure you consider the stopping condition.
e.g. The following iteration will output 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9:
int x = 0;
while (x < 10) {
Console.WriteLine(x);
x++;
}
Selection/Branching: Deciding which commands to run based on a given condition.
e.g. This code will welcome people differently depending on whether their name is Joe or not:
string name = Console.ReadLine();
if (name == "Joe") {
Console.WriteLine("Hi Joe!");
} else {
Console.WriteLine("Hi stranger!");
}
Recursion
This is when a function calls itself, creating iteration. The following example outputs "Hi!" 10 times.
OutputMessage(10, "Hi!");

void OutputMessage(int count, string message) {
    Console.WriteLine(message);
    count = count - 1;
    if (count > 0) {
        OutputMessage(count, message);
    }
}
Variable Scope
A global variable is available throughout the whole program. A local variable is only available to one particular section/function. Global variables are more powerful but more dangerous. If you use global variables and something goes wrong, the error could be anywhere in thousands of lines of code. Local variables limit the area you need to test, making it easier to fix errors. Global variables could also be accidentally changed somewhere in the program, and they mean you can't use that variable name again. When a function/procedure finishes, all the local variables contained inside it are cleared, freeing up memory for other programs to use. Local variables can be passed between functions, either by reference or by value, so they are not completely stuck in one section.
Modularisation
The advantages of modularisation are:
Easier to debug as you know the problem is in a specific area
Reusable code means the programming only has to be done once and it only has to be tested once, saving time and money
Readability is greatly improved by using modules as sections of code have names describing what they do and everything is clearer - this makes it easier when working on large projects where multiple developers are editing the same code
Allows multiple programmers to work on a project at the same time as they code specific modules, decreasing development time and cost
Modules can be programmed in different languages which are more appropriate for the specific function of that module
The disadvantages are:
Documentation of modules must be thorough
Problems can occur when recombining modules, e.g. same variable names or not linking to the correct function
Modules need to be tested individually then all together which takes time
Integrated Development Environment (IDE)
An IDE is a single program made from a number of components, used for developing programs.
Features:
Text-editor:
Allows code to be written
Often customisable theme so the developer is more comfortable
Translator/Run-time environment:
Allows the code to be executed without being exported
Auto-complete:
Speeds up typing
Avoids spelling mistakes
Can view available identifiers
Colour-coding/Syntax highlighting:
Easier to understand code
Can identify features quickly
Can be used to check code is correct and has no spelling mistakes
Stepping:
Allows you to run one line at a time and check the output's correct
Makes it easier to diagnose where a bug is
Breakpoints:
Stops the program's execution at a set point so you can check variables
Also makes it easier to diagnose where a bug is
Variable watch:
Check values of variables at different points during execution
Often used in conjunction with breakpoints or stepping
Error diagnostics:
Find and give information about errors before or during program execution
Makes it easier to solve errors as you get feedback on them
Some errors are found and underlined before the program executes which saves a lot of time
Traces:
Prints the values of variables at different points in the program
Allows you to identify where the program goes wrong
Crash-dump/post-mortem routine:
Shows the state of variables when an error occurs
Helps diagnose what caused the error
Cross-referencers:
Identifies where variables are used within a program
Helps avoid duplicates or using the wrong variable
2.2.2 Computational methods
Computable Problems
A computable problem is one that can be solved using an algorithm, producing the right answer every time in less than infinite time.
Intractable Problems
Intractable problems are ones which can be solved by a computer, but which cannot be solved in a reasonable amount of time for large inputs. A reasonable amount of time is defined as polynomial time or less, so any algorithms that take more than this are classified as intractable. Intractable algorithms therefore have an exponential order of complexity.
Divide and Conquer
This is an algorithmic technique where a problem is broken down recursively until it is solved using a simple technique.
E.g. for 5!, we divide it down to 5 * 4 * 3! then 5 * 4 * 3 * 2! and then 5 * 4 * 3 * 2 * 1! which we can conquer.
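A minimal recursive sketch of this factorial example in C#:
long Factorial(int n) => n <= 1 ? 1 : n * Factorial(n - 1);  // divide 5! into 5 * 4!, and so on, until 1! can be conquered directly
Console.WriteLine(Factorial(5));  // 120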
Backtracking
This is where you find solutions to a problem by trying a method and then going back if it fails to try another method. Often uses recursion. It can take a long time to complete.
Data mining
This is using pattern recognition or summarising of data within large data sets. This allows you to spot erroneous data and trends, which helps give information which can be used to increase profitability in a business.
Heuristic Solutions
A heuristic solution is one that's 'good enough'. It may not be the best solution, but it works and saves time compared to finding the perfect solution.
Performance Modelling
This is simulating and testing the behaviour of a system before it is used.
Visualisation
This is the method of finding a solution to a problem by using visual methods like diagrams. For example, a tree structure is not actually stored in the shape of a tree in the computer but it is easier to visualise that way.
2.3 Algorithms
2.3.1 Algorithms
Big O Notation
Big O notation describes the complexity of an algorithm relative to another, in terms of both time and space. There are six main types of complexity.
Each complexity is listed below with a description, example code and example uses (the code and use examples are taken from craigndave.org).
O(1)
Constant. Always executes in the same amount of time regardless of the size of the data set. Always efficient.
random_num = data_set(x)
Extracting data from any element from an array.
Hashing algorithm.
O(n)
Linear. Takes longer for a larger data set. Reduces efficiency the larger the data set.
For x = 1 To y
data_set(x) = counter
Next
A loop iterating through a single dimension array.
Linear search.
O(log n)
Logarithmic. Halves the data set in each pass. Opposite to exponential. Efficient with large data sets.
While Found = False And LowerBound ≤ UpperBound
MidPoint = LowerBound + (UpperBound - LowerBound) \ 2
If data_set (MidPoint) = searchedFor Then
Found = True
ElseIf data_set (MidPoint) < searchedFor Then
LowerBound = MidPoint + 1
Else
UpperBound = MidPoint - 1
End If
End While
Binary search.
O(n log n)
Linearithmic. Divides a data set but can be solved using concurrency on independent divided lists. Opposite to exponential. Efficient with large data sets.
Quick sort.
Merge sort.
O(n$^2$)
Polynomial. Performance is proportional to the square of the size of the data set. Significantly reduces efficiency the larger the data set. O(n$^3$), O(n$^4$) etc. are also polynomial.
For x = 1 To w
For y = 1 To z
data_set(x, y) = 0
Next
Next
A nested loop iterating through a two dimension array.
Bubble sort.
O($2^\text{n}$)
Exponential. Doubles with each addition to the data set in each pass. Opposite to logarithmic. Inefficient especially for large data sets.
Function fib(x)
If x ≤ 1 Then Return x
Return fib(x - 2) + fib(x - 1)
End Function
Recursive functions with two calls, e.g. Fibonacci number calculation with recursion.
Algorithms for Data Structures
Stack
Insertion:
Check to see if the stack is full
If it is full, return an error and stop
If it's not full, increment the stack pointer (which represents the top of the stack), then
Insert the new data item into the cell being pointed to by the stack pointer and stop
Deletion:
Check to see if the stack is empty
If it is empty, return an error and stop
If it's not empty, copy out the data in the cell pointed to by the stack pointer, then
Decrement the stack pointer and stop
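A minimal array-based sketch of these steps in C# (the size and names are illustrative):
int[] stack = new int[8];
int top = -1;  // stack pointer; -1 means empty

void Push(int item)
{
    if (top == stack.Length - 1) throw new InvalidOperationException("Stack full");
    top++;              // increment the stack pointer first...
    stack[top] = item;  // ...then insert the item into the cell pointed to
}

int Pop()
{
    if (top == -1) throw new InvalidOperationException("Stack empty");
    int item = stack[top];  // copy out the item pointed to...
    top--;                  // ...then decrement the stack pointer
    return item;
}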
Queue
Insertion:
Check to see if the queue is full
If it is full, return an error and stop
If it's not full, increment the tail pointer, then
Insert the new data into the cell pointed to by the tail pointer and stop
Deletion:
Check to see if the queue is empty
If it is empty, return an error and stop
If it's not empty, copy out the data in the cell pointed to by the head pointer, then
Increment the head pointer and stop
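A minimal circular-array sketch of these steps in C# (the size and names are illustrative):
int[] queue = new int[8];
int head = 0, tail = -1, count = 0;

void Enqueue(int item)
{
    if (count == queue.Length) throw new InvalidOperationException("Queue full");
    tail = (tail + 1) % queue.Length;  // increment the tail pointer, wrapping round
    queue[tail] = item;                // insert into the cell pointed to by the tail
    count++;
}

int Dequeue()
{
    if (count == 0) throw new InvalidOperationException("Queue empty");
    int item = queue[head];            // copy out the item at the head
    head = (head + 1) % queue.Length;  // increment the head pointer, wrapping round
    count--;
    return item;
}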
Graph Traversal
Depth-first (using a stack):
Push root vertex onto stack
Visit an unvisited vertex connected to the vertex on top of the stack
Push this vertex onto the stack
If there are no unvisited vertices connected to the vertex on top of the stack, pop the vertex on top of the stack
Repeat points 2, 3 and 4 until the stack is empty and all vertices have been visited
Breadth-first (using a queue):
Push root vertex onto queue
Visit all unvisited vertices connected to the root vertex
Push these vertices onto the queue
Pop vertex at the front of the queue
Visit all unvisited vertices connected to the vertex at the front of the queue
Push these vertices onto the queue
Repeat points 4, 5 and 6 until the queue is empty and all vertices have been visited
Tree Traversal
Pre-order traversal:
Visit node
Go left
Go right
In-order traversal:
Go left
Visit node
Go right
Post-order traversal:
Go left
Go right
Visit node
Bubble Sort
Bubble sort runs through the data comparing each pair of items. If they are the wrong way around, it swaps them and carries on to the next pair. This is either repeated one less time than the length of the data, or it is repeated until no swaps are made, which requires an extra variable.
Bubble sort is very inefficient: if some data starts at the top when it should be at the bottom, it will check every pair over and over many times just to make a small change.
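A minimal sketch in C# of the swap-flag version described above (the data values are illustrative):
int[] data = { 5, 1, 4, 2, 8 };
bool swapped = true;  // the extra variable: repeat until a pass makes no swaps
while (swapped)
{
    swapped = false;
    for (int i = 0; i < data.Length - 1; i++)
    {
        if (data[i] > data[i + 1])
        {
            (data[i], data[i + 1]) = (data[i + 1], data[i]);  // swap the pair
            swapped = true;
        }
    }
}
Console.WriteLine(string.Join(", ", data));  // 1, 2, 4, 5, 8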
Insertion Sort
This essentially creates a new array and one by one moves each value into the new array. If the value is greater than the top position, it inserts it at the top; otherwise it moves down, checking each position to find where the value should go. When it inserts the value, all values below it in the new array must be moved down one by one to make space. This is not very efficient.
Merge Sort
This is a recursive technique which follows the basic 'divide and conquer' approach. The idea is to break down the inputs into smaller lists until each contains just a single element so you can compare them individually, then merge the elements back together so that they are sorted.
Quicksort
This is another divide and conquer method. It uses a pivot point, splitting the list into two parts, with all the values less than the pivot on the left and all values more than the pivot on the right. Both lists are then individually sorted using the same method, repeating until all items are in the correct positions.
Dijkstra's Shortest Path Algorithm
Dijkstra's algorithm finds a shortest path tree from a single source node, by building a set of nodes that have minimum distance from the source.
Simplified:
Let distance of start vertex from start vertex = 0
Let distance of all other vertices from start = $ \infty $
Repeat
Visit the unvisited vertex with the smallest known distance from the start vertex
For the current vertex, examine its unvisited neighbours
For the current vertex, calculate the distance of each neighbour from start vertex
If the calculated distance of a vertex is less than the known distance, update the shortest distance
Update the previous vertex for each of the updated distances
Add the current vertex to the list of visited vertices
Until all vertices visited
In more detail
The graph has the following:
vertices, or nodes, denoted in the algorithm by $v$ or $u$
weighted edges that connect two nodes: $(u, v)$ denotes an edge, and $w (u, v)$ denotes its weight.
The algorithm starts by initialising three values:
$dist$, an array of distances from the source node $s$ to each node in the graph, initialized the following way: $dist(s) = 0$; and for all other nodes $v$, $dist(v) = \infty$. This is done at the beginning because as the algorithm proceeds, the $dist$ from the source to each node $v$ in the graph will be recalculated and finalised when the shortest distance to $v$ is found.
$Q$, a queue of all nodes in the graph. At the end of the algorithm's progress, $Q$ will be empty.
$S$, an empty set, to indicate which nodes the algorithm has visited. At the end of the algorithm's run, $S$ will contain all the nodes of the graph.
The algorithm proceeds as follows:
While $Q$ is not empty, take from $Q$ the node $v$, not already in $S$, with the smallest $dist(v)$. In the first run, source node $s$ will be chosen because $dist(s)$ was initialised to 0. In the next run, the next node with the smallest $dist$ value is chosen.
Add node $v$ to $S$, to indicate that $v$ has been visited
Update $dist$ values of adjacent nodes of the current node $v$ as follows, for each new adjacent node $u$:
If $dist(v) + w (u, v) \lt dist(u)$, there is a new minimal distance found for $u$, so update $dist(u)$ to the new minimal distance value
Otherwise, no updates are made to $dist(u)$
$dist$ now contains the shortest path tree from source $s$.
Pseudocode for Dijkstra's Shortest Path Algorithm is shown below:
function Dijkstra(Graph, source):
dist[source] := 0 // Distance from source to source is set to 0
for each vertex v in Graph: // Initialisations
if v $\ne$ source
dist[v] := ∞ // Unknown distance function from source to each node set to infinity
add v to Q // All nodes initially in Q
while Q is not empty: // The main loop
v := vertex in Q with min dist[v] // In the first run-through, this vertex is the source node
remove v from Q
for each neighbour u of v: // Where neighbour u has not yet been removed from Q.
alt := dist[v] + length(v, u)
if alt < dist[u]: // A shorter path to u has been found
dist[u] := alt // Update distance of u
return dist[]
A* Algorithm
This is another pathfinding algorithm that traverses a graph to find the shortest route. Each candidate square is given a score built from the cost to reach it so far plus a heuristic estimate of the distance remaining to the goal; the best route is the one with the lowest score.
Create two lists: open and closed. Closed list contains squares we do not need to evaluate again, open list is the opposite. Each square will be given a score which determines how good that square is in terms of getting to the goal.
Find the square on the open list which has the lowest score, and call it $S$.
Remove $S$ from the open list and add it to the closed list.
For each square $T$ in $S$'s adjacent tiles:
If $T$ is in the closed list, ignore it
If $T$ is not in the open list, add it and compute its score
If $T$ is already in the open list, check if the final score is lower when we use the current generated path to get there. If it is, update its score and update its parent as well.
Binary Search
This will only work if the list has previously been sorted.
Steps:
Find the median. If the list length is even, you must use the number to the right of the median line (round up the median)
If the median = target then found
If target > median then go to step 1 with the RIGHT sub list
If target < median then go to step 1 with the LEFT sub list
Repeat until found or list = 1 and not found
Linear Search
Steps:
Get target number
Start at the beginning of the list
Get an item
If the item matches the target then it is found
If not, move on to the next item in the list
If the end of the list is reached, the target does not exist in the list
Disadvantages of linear searching:
As the size of the list grows, so does the maximum potential time required to find the target
Worst case scenario - the target is at the end of the list
It is difficult to predict how long a search will take, so it is hard to provide user feedback