Team Wunuontoo: ERDS
From SoftwarePractice.org
Guide to Document according to Marking Guide
For markers' convenience, this section provides links to areas of interest according to each section of the Milestone 2 marking guide.
Section A
A complete and coherent description of your architectural analysis including purpose of system, context, stakeholder concerns and user needs.
- Purpose of the System
- System Context
- Stakeholder Concerns / User Needs
- Quality Attributes of the System
Section B
A coherent set of architectural views describing the structure, behaviour, implementation and constraints of your system architecture.
Section C
Refinement of your execution architecture into a process view or concurrent subsystems view, and a corresponding deployment view.
Section D
Exploration of the behaviour of your execution architecture through correct application of use-case maps, linked to key events. Supported by careful reasoning of why/how your use-case maps validate or reveal deficiencies in your architecture.
Section E
Refinement of your implementation architecture showing the mapping of conceptual components to application components, and infrastructure components. Reasoned choice of off-the-shelf and custom-built components in your implementation architecture.
Section F
Thoughtful explanation of key architectural decisions and issues with reference to the system context, stakeholder input, results of prototypes, etc.
- Conceptual Architecture Justifications
- Execution Architecture Justifications
- Implementation Architecture Justifications
- Results of Prototypes
Section G
Critique of architectural design in the light of your executable prototype.
Team Members
Team T4
| Name | Student Number |
| James Dibley (Boris) | 10422931 |
| Dean McNiven (McChicken) | 10463988 |
| Phillip Fyvie | 10407688 |
| Rui Huang (Rayman84) | 10451817 |
| Daniel Le Clere | 10460970 |
| Christopher Davies | 10437046 |
Project Plan & Milestones
The Project Plan, Milestones, Meeting Minutes and Team Charter are located here.
System Overview
System Purpose
The purpose of the Emergency Response Dispatch System (ERDS) is to allow Dispatchers to track the vehicles of the various Emergency Services and allocate them efficiently to emergencies. Tracking the vehicles also allows reports on vehicle movements and activity to be generated and viewed.
System Scope
The ERDS will do the following:
- Provide support for the following Emergency Services:
  - Ambulance
  - Fire
  - Police
  - Police Rescue
  - Coastal
  - Harbour Patrol
  - Traffic Patrol
  - Accident Response Units
  - State Emergency Services
- Cover geographic regions ranging from suburban to regional and state-wide scale.
- Allow Dispatchers to enter call information received from a 000 call
- Allow Dispatchers to contact resources and allocate them to certain emergencies, as well as reallocate current active resources, on a case-by-case basis.
- Find the closest, appropriate Emergency Service resource to the current emergency using its route calculator.
- Allow the tracking of those Emergency Services vehicles.
- Allow users to build activity reports based on vehicle, centre or region.
- Be available and responsive continuously, 24 hours a day, 7 days a week, with little to no interruption to service. Even when service is interrupted, dispatch of vehicles must still be possible.
The ERDS will not do the following:
- Track Emergency Services units that operate on foot, or not in a vehicle
- Select a route for a contacted resource
- Select which calls are sent to which dispatch centre
The ERDS will interface with the following:
- The existing 000 emergency system.
  - This is the telecommunications network that handles and routes 000 calls based on their location and service.
- The individual resources of each service branch.
Quality Attributes
The two primary qualities that our ERDS architecture must take into account are Reliability and Performance. The other emphasised attributes are Security, Scalability and Testability.
Primary Quality Attributes
- Reliability
  - One of the requirements of the system is that it must operate 24 hours a day, 7 days a week, so reliability is an innate requirement.
  - Furthermore, if the system is not reliable, operators will abandon it and it won't be used.
  - To ensure public safety, the system must keep running and improve on current Emergency Services operations, which means it must be at least as reliable as the current system.
- Performance
  - The system will undergo periods of heavy load, for instance during sporting events, New Year's Eve, special events (APEC/WYD) or even an ordinary Friday night, and must not collapse under these loads.
  - Efficient processing is required to ensure that services are dispatched quickly and that operators keep using the system.
Secondary Quality Attributes
- Security
  - Because sensitive information is handled on a regular basis, the system must not allow data in its possession to be compromised by a third party.
- Scalability
  - While the system might start operation in only a limited area, it must be capable of expanding to cover many areas of various sizes. It also needs to scale to each emergency service that adopts it, since each has its own areas of coverage and system priorities.
- Testability
  - To adequately implement the other quality attributes, especially reliability, the system must be testable enough that its operation can be verified.
Stakeholder Needs
This section details the vision each stakeholder has for the system, what they desire it to do and the priorities they would have that are relevant to the system.
Emergency Services
Dispatchers
Ethel McNab
000 Daytime Dispatcher
Ethel is an inexperienced dispatcher, still adjusting to the pressured conversations the job involves. On top of this, she then has to contact an emergency service and deliver details so that the fastest response time is achieved. She is excited about having a system that would take up some of the burden, but afraid of having to learn a new system while still adapting to the job.
Andre Fheland
000 Night Dispatcher
Andre is an experienced dispatcher, and knows existing procedure inside-out. He has worked during previous attempts to implement similar dispatch control systems and regards the ERDS as a potential annoyance at best and a potential disaster at worst. He often finds himself under very heavy pressure and must multi-task during calls - especially for calls that come in from districts he's unfamiliar with.
Drivers for Emergency Services
Constable Ryan, Fireman Lance and Paramedic Chip
Members of the various Emergency Services around Sydney
All these people belong to the Emergency Services and operate out of their respective vehicles, so any technology that affects how they are selected for, and attend to, an emergency matters greatly to them. They all want to be the best and most efficient Emergency Services possible, but each has a degree of scepticism about any new system.
Administration in Emergency Services
Snr. Sergeant Alex Scipione & Snr. Sergeant Bob Spain
Regional Commanders of Burwood and Croydon respectively
Currently, when there is a major emergency in Burwood and local emergency resources are depleted, Snr. Sergeant Scipione must manually call Snr. Sergeant Spain to ask for aid and any spare emergency resources. This takes time that can cost lives, so any reduction in this delay will also reduce the stress on both Snr. Sergeants. While they still want to be kept in the loop when resources are borrowed, they want to be able to authorise transfers quickly and easily.
Government
Federal Government
The Hon Anthony Albanese MP
Minister for Infrastructure, Transport, Regional Development and Local Government
As Minister for Local Government, Mr. Albanese has influence over the Fire, Police and Ambulance services in each state. He has the ability to approve or disapprove any system that would affect these services, and thus needs to be shown not just the benefits but also how much it would cost to integrate into the existing systems.
State Government
Gregory Peck
Systems Engineer for RTA Traffic Systems
Mr. Peck helps develop and maintain the traffic monitoring/management systems the RTA has in place on NSW roads, particularly in the cities. He has extensive knowledge of their operation and capabilities, and would need to be consulted for any changes to how emergency services are routed to see how they would affect his systems.
Mr. A.R. Hilfitcher
Analyst for the Office of Emergency Services, NSW
Mr. Hilfitcher currently analyses the response times achieved by the different services in various areas, looking for areas that need improvement and areas that are exceeding expectations. Much of the data is currently recorded on paper, so analysing these records is painstakingly slow and results often arrive too late to be useful. Anything that allows automated digital recording of how the Emergency Services are used will enable more timely and accurate analysis reports.
International
The Rt Hon. Jacqui Smith, MP
Home Secretary, Home Office
United Kingdom
Jacqueline Smith is currently the department head of the United Kingdom Home Office, with responsibility for the fire services and police services. While not being involved in the day-to-day running of these services, she still has to set overarching goals and improve the department she is running in order to reach her political goals.
Anything that would improve the services offered by the fire and police forces will reflect well on her, so she keeps an eye out for such systems. A successful ERDS implemented in Australia would be attractive to transfer due to the similar system contexts and objectives. A system developed and tested elsewhere is also financially attractive, since her department would not have to bear the cost of development, nor would it carry the same risk of failure as a newly developed system that has not yet been successfully piloted in the real world.
Public
General Public
Eugen d'Bean
Eugen is a member of the public who has called the emergency services number, 000, to report an emergency. He doesn't worry about how the Emergency Services travel from one point to another; all he knows is that he needs them immediately. He will appreciate anything that improves the response time, and he reports inadequate response times to the ESM group.
Interest Group
Emergency Services Monitoring (ESM)
The ESM group is constantly lobbying both state and federal government departments to improve the services they offer the public. It also monitors the response times of the various services to emergencies, and uses examples of inadequate response times to put pressure on the government through the media.
System
ERDS Developers
Eliah Smith
System Architect at Wunuontoo Systems
Eliah Smith has just been assigned as the System Architect for the ERDS, the first project he is leading, and is concerned with doing an excellent job. Although he has been involved with complex systems before, he has never been the lead developer. As such, the ERDS will be developed more cautiously, with less ambitious aims than an experienced architect would set. Eliah would like to continue as a System Architect, and by setting less ambitious aims overall he hopes to deliver a higher-quality system.
London Ambulance Service Computer Assisted Dispatch (LASCAD)
Joy Wang
Consultant to Wunuontoo Systems
Joy Wang was one of the Engineers who worked on the failed 1992 LASCAD program for the London Ambulance Service. She also helped with the program wrap-up, including compiling the list of issues identified as having inhibited its success.
She is now working as a Consultant helping to guide the development of the ERDS. Her experience, particularly of the previous program's failures, will be invaluable guidance to the developers at Wunuontoo Systems. She also hopes that a successful ERDS will help the development of a new LASCAD program.
Support Staff
Eric Palisin
Technical Support for ERDS
Eric works nights ensuring that the ERDS is functioning, and helps users with any queries. While he knows that dispatchers and the emergency services can work without it, he also knows that they rely on it, and its high criticality means he needs to be able to repair it as quickly as possible whenever something goes wrong.
Administrator of Routing System
Deepak Kahn
Administrator and Support for Routing System
Deepak maintains the Routing System that the ERDS uses to deduce the quickest path between two points. His system is currently not critical and so has a lower level of support, and he is concerned about the ERDS being attached to it: this would require him to spend more time maintaining his system and to ensure that it is scalable, maintainable and reliable enough to meet the ERDS' criticality requirements.
System Context
Market/Competitive
Wunuontoo Systems is a relatively new software company, but it has tackled some large, important systems quickly and with positive feedback from the stakeholders involved. While it has experienced developers (including Joy Wang) and a unique culture (Enabler), developing the ERDS will require all of the company's resources (Risk). It won the tender to develop the ERDS, shutting out any Australian competition (Enabler); however, both the cost of and the potential liability in the system mean that stringent reviews will need to take place constantly (Constraint).
Organisational
Organisational Procedures
Currently, when a call is made it is taken by a Telstra 000 operator, who asks for the service type and the current location of the emergency. The call is then routed to the actual Emergency Operator local to that location, who records the details of the emergency. This procedure can't be modified greatly, but it is common to all emergency services (Constraint & Enabler). Also, any modifications by Telstra to the 000 system would require changes to the ERDS (Risk).
Current dispatcher procedures will also have to be complied with so that switch-over and retraining time are minimal (Constraint).
Organisation Structure
The Telstra 000 network works across all states/territories and emergency service types (Enabler). However each state runs its own set of Emergency Services, with divisions of each as required (for instance there are local police/fire/ambulance, but state-wide general Emergency Services) whose various structures and operations will have to be accommodated by a generalised Dispatch System (Risk).
Technological
The Emergency Services are bound to very strict protocols and guides in relation to how they operate and how the information is passed and stored between the various departments (Constraint).
The Emergency Services have existing fleets of vehicles which will prevent any radical changes being made by the implementation of the ERDS (Constraint). Similarly to the vehicle fleets, the Dispatch for all Emergency Services is run by Telstra (Constraint) and therefore any major infrastructure changes would be both costly and undesirable. However, Emergency Service vehicles all have Global Positioning System already installed (Enabler). This will allow the ERDS to identify which required Emergency unit is closest to the required location.
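Because every vehicle already reports a GPS position, the "which unit is closest" query can be sketched directly from those fixes. The following is a minimal, hypothetical sketch: the `Unit` record and the `closest_units` function (with its `exclude` parameter) are illustrative names, not part of any existing ERDS design. It ranks units by straight-line (great-circle) distance as a stand-in for the route calculator, which in practice would rank by estimated travel time.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Unit:
    unit_id: str
    service: str       # e.g. "Ambulance", "Fire", "Police"
    lat: float         # latest GPS fix, decimal degrees
    lon: float
    available: bool = True


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two GPS fixes, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_units(units, service, lat, lon, exclude=frozenset()):
    """Available units of the required service, nearest first.

    Straight-line distance stands in for the route calculator (assumption).
    """
    candidates = [
        u for u in units
        if u.service == service and u.available and u.unit_id not in exclude
    ]
    return sorted(candidates, key=lambda u: haversine_km(u.lat, u.lon, lat, lon))


# Usage: engine "F1" is excluded, so the next closest engine is offered.
fleet = [
    Unit("F1", "Fire", -33.880, 151.100),
    Unit("F2", "Fire", -33.800, 151.000),
    Unit("A1", "Ambulance", -33.877, 151.104),
]
print([u.unit_id for u in closest_units(fleet, "Fire", -33.877, 151.104, exclude={"F1"})])
```

The `exclude` set covers situations like those in the usage narratives, where the nearest engine is the vehicle involved in the incident or a unit has rejected the request; querying again with that unit excluded yields the next closest.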
The different Emergency Services divisions all have differing needs of the ERDS (Constraint). Some services, such as the police, may be transferred directly between incidents, whereas an ambulance would typically need to leave a patient at a hospital and return to base to replenish supplies. Similarly, the resources of each division, both physical resources and personnel, are limited (Constraint).
The prevalence of mobile phone coverage in Australia (Enabler) makes it possible to build the ERDS around a mobile networking infrastructure (e.g. encrypted tunnelling over HSDPA or EDGE to create a Virtual Private Network (VPN), etc.). However, we must take into account that despite wide mobile phone coverage in Australia, the device itself is still not fully reliable (Risk).
Policy
Emergency response in NSW is governed by the State Emergency and Rescue Management Act 1989, which sets forth the structure of the State's emergency response agencies and procedures, at least in theory. The Office for Emergency Services, which is responsible for co-ordinating the State's emergency services under the Act, has also put forward several precise policies that stipulate the procedures to be followed for each type of emergency (Constraint). These policies form the core of the State's training and procedures, so any departures (intentional or unintentional) from the structures and systems outlined in policy are likely to cause significant confusion (Risk).
Usage Narratives
Ethel McNab
Narrative One
Ethel McNab has just returned to her work area after a break period. Shortly after registering with the system as active, Ethel receives a call from Eugen d'Bean. Eugen frantically informs Ethel that his cat, Doogle, climbed up a tree and refuses to disembark. Ethel gathers Eugen's address, 123 Fake St, Seedy Meadow. Ethel uses her own judgment to list the event as low priority and allocate a unit that includes general rescue operations among its listed capabilities.
Narrative Two
Ethel McNab, having dealt with Doogle, immediately receives another call, this time from Jerry Harvey. Jerry has just witnessed a crash in which a Fire Engine collided with a car travelling along a major motorway. Ethel logs the event in the ERDS and schedules the nearest Fire Engine, Ambulance and Police Car. To Ethel's surprise, the Fire Engine is instantly listed as having arrived. Being clever, Ethel realises that the Fire Engine listed is the one involved in the accident, and uses the ERDS to schedule the next closest Fire Engine. The system allows this because it is intended to aid the operator's judgement, not to force certain decisions.
Senior Sergeant Alex Scipione & Senior Sergeant Bob Spain
Narrative One
There is a robbery in progress on a busy Friday night, and currently all the Burwood police units are deployed on other emergencies. Senior Sergeant Scipione knows that nearby Croydon is experiencing fewer emergencies and is likely to have free police units. He contacts Snr. Sergeant Spain and requests a spare unit to attend urgently, and Spain sees that he has a free unit nearby and transfers it to Burwood dispatch centre temporarily. The robbery is resolved and the unit transferred back.
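The loan in this narrative implies that a unit has a home centre and, temporarily, a borrowing centre, and that the lending commander stays in the loop by approving the transfer. A minimal sketch under those assumptions; `Unit`, `TransferRequest` and the three functions are hypothetical names, and approval is modelled as a single call rather than a real authorisation workflow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    unit_id: str
    home_centre: str                   # dispatch centre that owns the unit
    on_loan_to: Optional[str] = None   # borrowing centre, if currently lent out

    @property
    def centre(self) -> str:
        """Centre currently entitled to dispatch this unit."""
        return self.on_loan_to or self.home_centre


@dataclass
class TransferRequest:
    unit: Unit
    to_centre: str
    approved: bool = False


def request_transfer(unit: Unit, to_centre: str) -> TransferRequest:
    if unit.on_loan_to is not None:
        raise ValueError(f"{unit.unit_id} is already on loan to {unit.on_loan_to}")
    return TransferRequest(unit, to_centre)


def approve_transfer(req: TransferRequest, approver_centre: str) -> None:
    # Only the lending (home) centre may approve, keeping its commander informed.
    if approver_centre != req.unit.home_centre:
        raise PermissionError("only the unit's home centre can approve a loan")
    req.approved = True
    req.unit.on_loan_to = req.to_centre


def return_unit(unit: Unit) -> None:
    """End the loan; the unit is dispatched by its home centre again."""
    unit.on_loan_to = None


# Usage: Burwood borrows a Croydon car for the robbery, then hands it back.
car = Unit("CROY-12", "Croydon")
req = request_transfer(car, "Burwood")
approve_transfer(req, "Croydon")
print(car.centre)  # Burwood while on loan
return_unit(car)
```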
Andre Fheland
Narrative One
About an hour into his shift, Andre receives a call from a very shaken Eugen d'Bean. Eugen informs Andre that he's just witnessed an aggravated assault, and multiple people have been injured. Andre follows procedure, and begins collecting information - creating a new case in the ERDS marked as requiring high-priority response by both Ambulance and Police Services.
Andre asks Eugen which area he is calling from - after receiving the answer he pulls up a map of Seedy Meadows. However, when asked for the location that emergency units need to be sent, Eugen responds that "it's near where the old pub on Baker St used to be" but is unable to give an exact street number.
Normally, Andre would pass this information directly to the local stations he requests emergency units from and they'd handle the finer details - but he isn't sure how to do that using the ERDS system. He begins to feel irritated with the system, and zooms his map in on Baker Street (a long street that bisects Seedy Meadows). Searching for Pub buildings on Baker Street returns six marks on his map, all quite far from each other. Andre is now prepared to fall back on backup procedure and abandon use of the ERDS entirely - Eugen's nervous babbling isn't helping any. He asks the system to locate all available ambulance units "near Baker Street", and sends a request for the closest one to attend to his current case "near where the old pub on Baker St used to be".
A few seconds later, the ambulance unit requests clarification. Andre twitches in frustration, but notices an option to "contact unit". He selects this as a last resort and is put through on a second line to the ambulance driver, who has no idea where the place is either. After Andre explains the situation, the driver says that he'll ask the others in the ambulance or, failing that, radio his base hospital for help, and then cuts the connection.
Andre spends a minute collecting further details from Eugen (five injured - one with deep lacerations, no current threats since assailant ran away) and updating the case details while watching the ambulance icon nervously. To his delight, the destination co-ordinates for the case are filled in (the paramedic riding shotgun has input them after receiving them from the base hospital) and the ambulance accepts responsibility for the critically injured patient and one non-critically injured patient - but the case is still marked as "Partially Assigned". Since the co-ordinates are now known, Andre simply searches for the nearest available ambulance (the one he contacted earlier is no longer "Available", of course) and police unit and requests that they attend the scene. A few seconds later, both accept and the case is marked as "Assigned" for both police and ambulance response.
Relieved, Andre informs Eugen that help for everyone is on the way and prepares to take another call.
Narrative Two
Andre has pulled the Saturday night shift and has just taken a call reporting an assault - one of a few that Andre has already received tonight. He enters the location and incident details given to him by the caller and notices that the incident is close to where police on foot should be patrolling. Foot units are not tracked by his ERDS console, so he asks the police dispatch co-ordinator if police on foot can look into the incident. After hearing that foot units will take care of things, Andre manually sets the newly created case's status to "Responding" and additionally notes that it's being responded to by police foot units rather than a vehicle. The case changes to the colour for "Responding" and additionally displays a small "hand" icon next to the case to signify that Andre has manually set its status.
Since the injuries that the caller has reported sound severe, Andre adds another response slot to the case and assigns an ambulance to the slot. The case expands to show two response slots - one for the Police, coloured with "Responding" and a hand icon, the other for Ambulance and set to "Response Pending" and no hand icon. A few seconds later, the ambulance's slot changes to "Responding" as the driver accepts the case. Andre minimises the case and its entry in the case tray collapses to show that it has two slots and that the case overall has a status of "Responding".
A while later, Andre notes that one of the two case slots has changed to "Resolved". Expanding the case in the case tray, he sees that the slot in question is the Ambulance's. A few seconds later, the Police slot changes to "Resolved" - but the "hand" icon remains. Curious, Andre brings up the case in detail and notices that the Police slot was changed to "Resolved" by the user that corresponds to the police dispatch co-ordinator.
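The case tray behaviour in this narrative can be modelled as a case holding one response slot per service, each with its own status and an optional record of who last set it manually (the "hand" icon). The narrative does not say how the overall case status is derived once slots disagree; the hypothetical sketch below assumes a case is only as far along as its least advanced slot. All class and field names are illustrative.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class SlotStatus(IntEnum):
    # Ordered from least to most advanced, so min() finds the lagging slot.
    RESPONSE_PENDING = 0
    RESPONDING = 1
    RESOLVED = 2


@dataclass
class ResponseSlot:
    service: str                           # e.g. "Police", "Ambulance"
    status: SlotStatus = SlotStatus.RESPONSE_PENDING
    manually_set_by: Optional[str] = None  # last user to override the status

    @property
    def is_manual(self) -> bool:
        """True if the console should show the "hand" icon."""
        return self.manually_set_by is not None

    def set_status(self, status: SlotStatus, user: Optional[str] = None) -> None:
        # Updates from a vehicle's console pass user=None;
        # a dispatcher or co-ordinator passes their user id.
        self.status = status
        if user is not None:
            self.manually_set_by = user


@dataclass
class Case:
    slots: List[ResponseSlot] = field(default_factory=list)

    def overall_status(self) -> SlotStatus:
        # Assumption: a case is only as far along as its least advanced slot.
        if not self.slots:
            return SlotStatus.RESPONSE_PENDING
        return min(slot.status for slot in self.slots)
```

Under this rule the case in the narrative shows "Responding" once both slots are responding, stays "Responding" after only the Ambulance slot resolves, and becomes "Resolved" when the Police slot does; the Police slot keeps its hand icon, now attributed to the police dispatch co-ordinator.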
Constable Ryan
Constable Ryan is a regular police officer who works the beat around the city. He has been in the police force for 5 years and is competent at what he does. Whilst he may not be as skilled as an Ambulance Driver, he is still quite an accomplished driver. He has a detailed knowledge of his usual patrol routes and is fairly knowledgeable about city roads in general.
Constable Ryan does not trust the ERDS system, since he remembers how unsuccessful past systems were. He especially mistrusts the dispatch allocation portion of the system, and will quickly fall back to direct radio if he becomes frustrated. He will be inclined to ignore the pathfinding system if he feels he has a better idea of how to get to a destination, especially if it lies somewhere along his usual patrol routes.
Narrative One: Officer Ignores Directions, Mutes System
Ryan is patrolling his beat in a vehicle when the ERDS console beeps: there has been a break-in at a house in the Marrickville area, and his car has been asked to respond. After pressing the large green "accept" button, the ERDS displays a route from his current location to the home. However, Ryan finds the unit's directions irritating since he already knows the way there and mutes the driving directions. The ERDS unit stops reading directions to him out loud, although it still displays an updated map and will still audibly notify him of new dispatch requests.
Narrative Two: Officer Finishes Shift while Allocated, is Re-Allocated to Emergency.
Ryan is at the end of a long day and is on his way to respond to his last dispatch. While he is driving there his shift ends, and he changes his status to "Offline" so that he won't be allocated any further jobs after this call-out. Ethel (who assigned the dispatch to Ryan's vehicle) does not notice any change to the case on her console. However, Andre (who has spotted Ryan's vehicle and wants to assign a call to it) notices that Ryan's vehicle's icon is overlaid by a clock icon, indicating the driver has gone off shift. Since his call is an urgent emergency and Ryan's vehicle is the only one close enough to respond in time, Andre allocates the call to Ryan's vehicle and confirms the allocation in spite of the vehicle's status.
The ERDS console in Ryan's car beeps, notifying him that he's been requested at a high-priority emergency. Sighing, Ryan accepts the new case and reminds himself to clock overtime when he returns to the station.
Ethel notices that the case she originally assigned Ryan's vehicle to has changed to the colour corresponding to "allocated but not currently being responded to". Curious, she brings up the case and notices that Ryan's vehicle has been allocated to a higher-priority emergency. She also notices that he's supposed to be off-shift, so she allocates another, further-away vehicle to the original case.
After Ryan finishes with the high-priority emergency and returns to his car, he notices that the case he was originally responding to is no longer in his list - he can return to the station and clock off!
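Two rules are implicit in this narrative: allocating an off-shift vehicle needs an explicit confirmation from the dispatcher, and re-allocating a vehicle leaves its earlier case flagged rather than silently dropped. A hypothetical sketch of just those two rules; `Vehicle`, `allocate` and `AllocationNeedsConfirmation` are illustrative names, and the status strings mirror the narratives' wording.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class AllocationNeedsConfirmation(Exception):
    """Raised when the dispatcher must explicitly confirm an unusual allocation."""


@dataclass
class Vehicle:
    vehicle_id: str
    on_shift: bool = True
    current_case: Optional[str] = None


def allocate(vehicle: Vehicle, case_id: str, case_status: Dict[str, str],
             confirm: bool = False) -> Optional[str]:
    """Allocate `vehicle` to `case_id` and return the case it was taken off, if any."""
    if not vehicle.on_shift and not confirm:
        # Off-shift vehicles (the clock icon) need an explicit override.
        raise AllocationNeedsConfirmation(f"{vehicle.vehicle_id} is off shift")
    previous = vehicle.current_case
    if previous is not None and previous != case_id:
        # Flag the earlier case so its dispatcher notices and can re-allocate.
        case_status[previous] = "allocated but not currently being responded to"
    vehicle.current_case = case_id
    case_status[case_id] = "Response Pending"  # until the crew accepts
    return previous
```

In the narrative, Andre's first attempt would raise `AllocationNeedsConfirmation`; his confirmed retry succeeds and flags Ethel's case, which is what prompts her to allocate another vehicle.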
Fireman Lance
Lance has been a driver for the Burwood fire station for the last 10 years. Over the years Lance has gained a great deal of experience in navigating Burwood. He’s proven himself to be a competent driver for the fire department over and over again with his knowledge of the area.
Lance doesn’t know what to think of the ERDS - to him the ERDS is just another radio allocator and Global Positioning System (GPS) navigator. Lance trusts his own experience and knowledge of the area more than some electronic device. But he firmly believes that one way or another the ERDS will assist him with his job, and that he needs to adapt to using it.
Narrative One: Driver Uses Own Route, Mutes Directions
It’s 5 pm and peak-hour traffic is clogging Burwood’s main roads. A call has come in regarding a house fire. It’s estimated that under normal circumstances Lance's truck would take no more than 5 minutes to reach the destination. Lance’s unit has been selected to attend to the emergency and his ERDS console has presented him with a route. The route presented would indeed allow the team to reach the target building in 5 minutes under the best conditions.
However, Lance’s experience tells him that the route generated by the ERDS will actually take longer, because it goes through major roads and it is currently peak hour. Lance knows the best way to reach the destination is to take back roads and avoid the main roads; the trip will still take longer than usual, but it will be faster than the route the ERDS is currently offering.
Lance decides to use his own route. Since he doesn't want to be distracted, he mutes the ERDS' driving directions.
Narrative Two: Vehicle Low on Supplies, Cannot Attend
Lance's crew has just finished putting out a fire in an apartment block, and most of the oxygen in Lance’s unit has been used up. Dispatch receives another call about a fire in the Burwood area, and Lance’s unit is closest to the location. Lance's ERDS unit beeps, notifying him that his vehicle has been requested to attend an apartment fire.
Using the ERDS, Lance notifies Dispatch that the resources in his unit are running low and that he cannot adequately deal with the event. Dispatch agrees with Lance and sends another unit to the second fire instead of Lance’s.
Paramedic Chip
Chip has been a paramedic with the Bankstown branch of the NSW Ambulance Service for the last 15 years. Over the years Chip has gained extensive knowledge of the geographical layout of Bankstown. However, due to funding problems, Chip has been reassigned to the Ultimo branch of the NSW Ambulance Service, an area that is completely foreign to him.
Suddenly being placed in an unfamiliar area has made Chip regard the ERDS as an important tool which has assisted him to respond to emergencies and to be a productive member of the team immediately following his transfer.
Narrative One: Low Supplies, Attends in Limited Capacity as Triage
Ethel receives a notification from Constable Ryan requesting immediate assistance at the scene of a robbery to treat several people suffering from deep lacerations, an incident with a high priority.
Although Chip is close, his ambulance is low on supplies. When his ERDS unit beeps and notifies him of the emergency he's being requested to attend, he informs Dispatch of this. Due to the urgency of the situation, Dispatch tells Chip to attend and treat the most urgent cases until another unit can relieve him.
Narrative Two: Unit Exhausted
Chip’s ambulance is returning from its third successive call without a chance for rest. The supplies in the ambulance are depleted and its crew exhausted. The ERDS alerts Chip that his unit has been requested at the scene of an accident.
Chip knows that his ambulance will not be sufficient due to its depleted supplies, so the paramedic riding shotgun rejects the request citing insufficient capacity. A few seconds later, the request disappears - although he feels bad for turning down a request where time is a factor, Chip knows that it has been reassigned to another ambulance that has a better chance of success than his.
Mr A.R. Hilfitcher
Narrative One
Mr. Hilfitcher is preparing a report on ambulance response times in Ultimo during various stages of the day over the last three months – a task that usually means a great deal of dreaded copying, pasting, manual entry and spreadsheet wrangling.
He brings up the Report Creation panel of the ERDS system and requests information on ambulance movement in Ultimo. Since he wants more than the last recorded day, he ignores the detailed map presented to him and asks the system to produce a spreadsheet file with all cases logged in the last three months – specifically their date, time and time between case creation and case response.
He is pleased when he is presented with a spreadsheet neatly organised into columns with the data he wanted – even more so when he is able to paste the information directly into the spreadsheet he is currently working on in Microsoft Excel.
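The export Mr. Hilfitcher asks for amounts to filtering the logged cases and writing one row per case. A minimal sketch using Python's standard `csv` module, assuming a hypothetical `CaseRecord` that stores each case's creation time and the time the first unit responded, and approximating "the last three months" as 90 days:

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CaseRecord:
    region: str
    service: str
    created: datetime         # when the dispatcher created the case
    first_response: datetime  # when the first unit accepted it (assumed field)


def response_time_report(cases, region, service, now, days=90):
    """CSV text with one row per matching case: date, time, minutes to response."""
    since = now - timedelta(days=days)  # "last three months" approximated as 90 days
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "time", "response_minutes"])
    for c in sorted(cases, key=lambda c: c.created):
        if c.region == region and c.service == service and since <= c.created <= now:
            minutes = (c.first_response - c.created).total_seconds() / 60
            writer.writerow([c.created.date().isoformat(),
                             c.created.strftime("%H:%M"),
                             f"{minutes:.1f}"])
    return buf.getvalue()
```

Comma-separated text like this opens directly in Microsoft Excel. A real report would query the ERDS case log rather than an in-memory list.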
Quality Attributes
Reliability
Because of the nature of the ERDS, the data it contains and each individual transaction (each call) must not be lost. No failure of the system may lose any data; otherwise people's lives will be placed in jeopardy, and the Emergency Services or even the developer of the ERDS may be criminally liable if actual harm occurs.
Server Failure
Ethel McNab & Eric Palisin & Paramedic Chip
Ethel is processing a triple zero call, making entries within the Current Case module. As she does so, the network server her cell is currently connected to crashes, something that happens on average once every 30 days (MTBF). Her system momentarily becomes disconnected before her cell reconnects to a different server, a task that takes a mean time (MTTR) of 15 seconds. It doesn't affect how she enters and processes calls; however, her data won't be backed up until at most one minute after her cell reconnects to a server.
Eric Palisin, being in charge of technical support, quickly moves to assess the situation. He notes that while server traffic has increased on the other servers, the increases are spread out across the whole network, as each cell is allocated a different server to send data to. He also knows that at peak rates, it would take three servers going down simultaneously before users would notice any significant performance degradation. This spread ensures that no single server takes too much traffic and prevents a cascade failure from occurring. With these concerns settled, Eric begins working on fixing the downed server, a task with a mean time (MTTR) of 2 hours.
Once the server comes back up, he notes that it resynchronises the latest data from the other servers before offering itself to users, a process that takes a mean time of 10 minutes.
Terminal Crash
Andre Fheland
Andre is working when his display becomes unresponsive and then abruptly disappears. Surprised, he remembers his training with the system mentioned this situation - his console has crashed. While he waits, he uses a pen and paper he has on hand to take down any important points from the caller he currently has on the line. He waits for half a minute (MTTR), in which time his display has restored itself and asked for his credentials. He identifies himself to the system and it displays the same view he had before it crashed. Relieved, Andre continues to input the information supplied to him by the caller.
Dispatcher Cell Failure
Ethel McNab & Andre Fheland & Constable Ryan
Ethel is again processing a request when the terminal power fails, as does the power of the terminals around her. The backup power does not come on and the network link to the server has also failed. Ethel realises that her entire cell has failed.
After determining that she can’t quickly recover from this problem, she falls back on the backup procedure and uses the police radio to coordinate the current case with the neighbouring cells that have picked up her Burwood cell’s workload. This process takes approximately 2 minutes (mean time to use fall-back procedures).
Andre has come in early today and is managing the neighbouring cell in Croydon. A minute after Ethel experiences the power blackout (mean timeout period before control is transferred), the system informs Andre that Burwood has been disconnected and that Croydon is now in charge of Eastern Burwood. After approximately 30 seconds (mean time to transfer cell data), Andre sees that his ERDS console now has the cases that Burwood was handling before their cell failed - most have been allocated and need no urgent attention. For the cases where vehicles still need to be allocated, he coordinates with Ethel and the drivers he is responsible for to ensure the current case is still dealt with.
Constable Ryan is patrolling the Burwood area. A minute after Ethel experiences the power blackout (mean timeout period before control is transferred), he is notified by his vehicle interface that he is now temporarily under the control of the Croydon cell as a result of system failure at Burwood. This has no immediate effect on Ryan - although he is now receiving orders from Croydon until Burwood comes online, the system behaves no differently from before and he does not need to change his own procedure.
Vehicle Disconnected
Ethel McNab & Fireman Lance
Lance's vehicle has been called to check an automatic fire alarm call in a high-rise apartment block. He has parked the fire truck in the building's underground carpark. He glances at the ERDS panel, expecting to see the big green button that automatically appears when the system knows he's close to his destination. Ordinarily, touching this button would confirm that his vehicle has arrived and that the emergency is being responded to - but he's alarmed to see that the button is greyed out and a small area in the top-left corner of his display reads "No Signal - Connection Lost". Lance realises that where he has parked doesn't have sufficient reception for the ERDS, so he contacts his local dispatch using the truck's radio to confirm that he has arrived.
Ethel is handling the automatic fire alarm call case - it is currently minimised in her case queue as "Assigned". However, she notices that the case has turned purple. Curious, she selects it. The system tells her that connection with one of her assigned vehicles for that case has been lost and shows her a greyed-out vehicle icon at its last known location - a few metres away from its destination. Ethel knows that this probably just means that the vehicle has lost reception or that its ERDS console has malfunctioned so she decides to leave it for a minute while she checks some of her other cases. Sure enough, about a minute later she receives word that Lance has radioed in a confirmation. Ethel selects the purple case and manually changes its status to "Responding". After she confirms her manual override, the case changes back to the normal colour for "Responding" and shows a small "hand" icon next to it to signify that its status has been manually set.
Lance and his crew finish checking the false alarm. As Lance drives out of the underground area, his ERDS beeps and the "Connection Lost" message changes to "Reconnecting...". A few seconds later, the console updates to show that his case has been changed to "Responding". Since they've already finished, Lance confirms that the emergency has been resolved.
Ethel notices that the "hand" icon next to the case she assigned Lance has disappeared and that the case has changed to the colour for "Resolved".
Performance
Lucy Johnson
Emergency Dispatch Centre
Rosehill Local Area Command
Lucy Johnson works at the Emergency Dispatch Centre at Rosehill. Her job is to find and contact an available emergency unit to attend the appropriate emergency situation. Since she acts as the intermediary between the emergency unit and the caller, it is important that she has up-to-date information on real-time movements of emergency units within her area.
It is Wednesday night, and Lucy is on duty. She receives an emergency request for an ambulance. On her console, she enters the address where the ambulance is needed. The map on her console centres itself on and points out the location of the address in approximately two seconds (mean time to respond to map search). The map also shows the location of the nearest available ambulance within the local command area. Since the ERDS updates constantly, Lucy knows that the location of the ambulance is reliable - the data shown on her map is a maximum of 15 seconds old (maximum location update latency). She selects the closest unit available to the location, and allocates it to the case she's just created. The selected unit receives the request within 5 seconds (maximum time to transfer request). Once the driver accepts or otherwise responds to the request, Lucy knows she'll see the result within 5 seconds (maximum time to transfer response).
Lucy remains on the phone with the caller to keep her calm. On the console, Lucy sees that the ambulance has accepted the request - its status has changed to Allocated and it's heading in the direction of the caller's house. Since Lucy knows that the events she's seeing on the screen of her ERDS console reflect actual movements in real time, she's able to reassure the caller by keeping them updated as to the progress of the ambulance toward them.
Initial Conceptual Architecture
Descriptions
These specify the domain level responsibilities of each component.
Current Case
- Stores cases for that area for up to 3 hours
- Allows sorting and retrieval of cases by operators
- Tracks which vehicle is responding to which case
Archive DB
- Long term storage of all case information for the area
- Allows sorting/filter for report generation
Report Generation
- Accesses the archive and implements any queries from the interface and returns the relevant information
Report User Interface
- Allows viewing of the reports
- Allows entry of queries, filtering criteria
- Do we need to store the reports once generated?
Route Finder
- Takes a list of points as well as a single point, and returns the closest point (from the list) to that single point
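The Route Finder's contract above can be sketched as a nearest-point search. This minimal Python sketch assumes points are (x, y) pairs on a flat local grid and uses straight-line distance; the names `Point` and `closest_point` are hypothetical, and a production Route Finder would rank by road travel time rather than geometry.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: (x, y) coordinates on a flat local map grid.
Point = Tuple[float, float]


def closest_point(candidates: Sequence[Point], target: Point) -> Point:
    """Return the candidate nearest to target by straight-line distance."""
    if not candidates:
        raise ValueError("candidate list is empty")
    # min() with a key scans the list once and keeps the smallest distance.
    return min(candidates, key=lambda p: math.dist(p, target))


print(closest_point([(0.0, 0.0), (5.0, 5.0), (2.0, 1.0)], (3.0, 1.0)))  # (2.0, 1.0)
```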
Vehicle Movement DB
- Tracks current position of vehicle
- Also tracks position of vehicle at certain intervals of times and at certain events
Operator Panel
- Allows entry of call data
- Viewing of current calls
- Allows selection of vehicle to respond
- Ability to contact vehicles
- Provides pictorial representation of call and vehicle locations
In Vehicle Interface
- Allow vehicles to update their status
- Allow them to see pictorial representation of call and vehicle locations in local area
- Allow them to contact operators?
Elaborated Conceptual Architecture
Overall
In order to best meet reliability and performance criteria, the system's architecture is divided into different levels, separated by responsibility and scope.
The Cell-Level Architecture has responsibility for entering cases, tracking vehicles and allocating vehicles within a given geographical area. (This gives the advantage of stakeholder familiarity: Emergency service dispatch already uses a command structure separated into Local Command Areas, or Cells.)
Each cell platform operates independently of other cells, meaning that heavy loads within a given cell will not affect performance in other cells. The limited geographical area of command will also mean that a failure of one cell will have a limited impact that can be mitigated by nearby cells taking responsibility for part of its area - reliability is thus improved by allowing uninterrupted service without the danger of a cascade failure.
Multiple cells connect and synchronise to form the Distributed Network Architecture. To maximise reliability without impacting on performance or risking a cascade failure, cells will form a decentralised, distributed backup architecture based on the Distributed Hash Table (DHT). This model is scalable, meaning that we can add servers when peaks in demand occur (e.g. APEC, WYD). If certain servers go down, all data they were handling remains accessible (in line with the type of reliability required). It is auto-balancing (preventing cascade failure), self-healing, and only requires knowledge of a few key nodes for cells to access the backup network.
Interoperation of Cells & Distributed Network
Each cell has responsibility over a geographical area. This area will be defined according to existing Emergency Services Local Command Areas (stakeholder familiarity), population size, the number of emergency services to be handled, any organisational limitations or guidelines, and technical limits on the range of communication available between the dispatch centres and the vehicles.
There will be a number of servers located at different sites, each handling a number of cells. Each server will be given a range of cell hashes to cover; any cell whose hash falls within this range will connect to that server. The server will handle backup of data from its cells, send data to nearby cells (i.e. cells also connected to that server) when requested, and synchronise data with other servers.
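The assignment of cells to servers by hash range might look like the sketch below. It assumes a hash space of 0-999 (sized like the ranges used in the use case maps), SHA-1 as the hash function, and sorted range start points; none of these choices, nor the names `cell_hash` and `server_for`, come from the design itself.

```python
import hashlib
from bisect import bisect_right
from typing import List

HASH_SPACE = 1000  # hypothetical hash space, sized like the ranges in the use case maps


def cell_hash(cell_name: str) -> int:
    """Map a cell name to a stable position in the hash space."""
    digest = hashlib.sha1(cell_name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % HASH_SPACE


def server_for(h: int, range_starts: List[int], servers: List[str]) -> str:
    """servers[i] covers hashes from range_starts[i] up to range_starts[i + 1] - 1.

    range_starts must be sorted; a hash below the first start wraps round to the
    last server, treating the hash space as a ring.
    """
    i = bisect_right(range_starts, h) - 1
    return servers[i]  # i == -1 selects the last server (ring wrap-around)


starts, names = [0, 400, 600, 801], ["srv-a", "srv-b", "srv-c", "srv-d"]
print(server_for(cell_hash("Burwood"), starts, names))
```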
Interoperation Conceptual architecture
Description
Common to DHT and Cell
Database Layer
General datastore used for storing relevant data.
- Stores data
- Delivers requested data
XML Layer
XML Handler common to both the DHT and Cell nodes and responsible for:
- Packaging query requests from the Cell to prepare them for transport to the DHT
- Packaging query responses from the DHT resulting from a request
- Unpacking query requests from Cell Nodes
- Unpacking query responses from the DHT
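The XML Layer's packaging and unpacking roles could be sketched with Python's standard `xml.etree.ElementTree` module. The element names (`query`, `field`) and the flat name/value layout are assumptions for illustration, not a schema defined by the design; note that every value arrives back as a string, so the receiving Database Layer would still need to convert types.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def package_query(operation: str, table: str, fields: Dict[str, object]) -> bytes:
    """Cell side: wrap a query in XML for transport to the DHT."""
    root = ET.Element("query", {"op": operation, "table": table})
    for name, value in fields.items():
        ET.SubElement(root, "field", {"name": name}).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def unpack_query(payload: bytes) -> Tuple[str, str, Dict[str, str]]:
    """DHT side: recover the operation, target table and field values."""
    root = ET.fromstring(payload)
    fields = {f.get("name"): f.text or "" for f in root.findall("field")}
    return root.get("op"), root.get("table"), fields


wire = package_query("insert", "case", {"case_id": 42, "type": "fire"})
print(unpack_query(wire))  # ('insert', 'case', {'case_id': '42', 'type': 'fire'})
```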
Transport
This is the general transmission layer and is responsible for:
- Moving data from one entity to another
Internetwork
This layer gives network addresses to the cell/DHT.
Cell Node
DHT Hash Layer
This component is responsible for:
- Creating a hash representative of the data given
- Finding the DHT Server Node responsible for data of the given hash range
DHT Server
DHT Hash Layer
This component is responsible for:
- Receiving requests from Cell Nodes and checking whether the request hash falls within the DHT Node's domain of responsibility
- Referring requests that fall outside this domain of responsibility to the next DHT Node
Use Case Maps
Add data
- 'XML Layer' creates an XML representation of the database query
- 'DHT Hash Layer' in Cell Query Layer
- Create a hash representative of the data to be inserted into 'Database Layer' located in DHT Query Layer
- Transmit hash to 'DHT Hash Layer' in DHT Query Layer
- 'DHT Hash Layer' in DHT Query Layer - Hash(400-599)
- Check to see if hash received from 'DHT Hash Layer' in Cell Query Layer falls within its hash domain
- In this case it does not, so it responds with the address of the next DHT Node
- 'DHT Hash Layer' in Cell Query Layer
- Receives negative response from the DHT Query Layer - Hash(400-599)
- Sends hash to referred 'DHT Hash Layer' in DHT Query Layer - Hash(600-800)
- 'DHT Hash Layer' in DHT Query Layer - Hash(600-800)
- Check to see if hash received from 'DHT Hash Layer' in Cell Query Layer falls within its hash domain
- In this case the provided hash falls within the hash domain
- Query is allowed to proceed to 'XML Layer' in DHT Query Layer - Hash(600-800)
- 'XML Layer' located in DHT Query Layer - Hash(600-800) will unpackage the query received from the 'Cell Query Layer' into a query format recognised by the 'Database Layer'
- 'Database Layer' in DHT Query Layer performs query
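The referral steps in this use case map can be sketched as a loop on the Cell side that keeps following "next node" referrals until a DHT node accepts the hash. `DhtNode`, `add_data` and the `max_hops` guard are hypothetical names; the two nodes in the usage example reuse the Hash(400-599) and Hash(600-800) domains from the map, and an in-memory dictionary stands in for the 'Database Layer'.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DhtNode:
    name: str
    low: int                            # inclusive lower bound of the hash domain
    high: int                           # inclusive upper bound of the hash domain
    next_node: Optional["DhtNode"] = None
    store: Dict[int, str] = field(default_factory=dict)

    def handle(self, h: int, record: str) -> Tuple[bool, Optional["DhtNode"]]:
        """Store the record if h is in this node's domain, otherwise refer onwards."""
        if self.low <= h <= self.high:
            self.store[h] = record
            return True, self
        return False, self.next_node


def add_data(entry: DhtNode, h: int, record: str, max_hops: int = 16) -> DhtNode:
    """Cell-side loop: follow referrals until a node accepts the hash."""
    node: Optional[DhtNode] = entry
    for _ in range(max_hops):
        if node is None:
            break
        accepted, node = node.handle(h, record)
        if accepted:
            return node
    raise LookupError(f"no DHT node accepted hash {h}")


first = DhtNode("Hash(400-599)", 400, 599)
second = DhtNode("Hash(600-800)", 600, 800)
first.next_node = second
print(add_data(first, 650, "case 17").name)  # Hash(600-800)
```

The hop limit is a defensive choice so that a misconfigured chain of referrals cannot loop forever.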
Cell Level Architecture
Description
In-Vehicle GUI
This is the GUI inside each Emergency Vehicle, displayed on an adjustable Touch Screen:
- Displays Dispatch Requests that include location and case details in graphical and textual format.
- Allows driver/co-driver to accept requests and give additional feedback to Operator by pressing buttons displayed on touch interface.
- Allows driver/co-driver to update vehicle's Status (e.g. arrived, off-shift, supplies low).
- Allows driver/co-driver to reject requests and supply feedback to dispatch operator.
- Displays vehicle's current position and calculated routes from the GPS system.
GPS
This is a Global Positioning System located in the vehicle.
- Receives current location coordinates of the vehicle from GPS sensor hardware.
- Sends vehicle's current location and calculated routes to destination coordinates to the In-Vehicle GUI for display.
- Communicates the vehicle's current location to the Vehicle Handler via the Vehicle Connection handler.
Vehicle Handler
This handles communication between the Vehicle components and the rest of the Cell system. By passing all communication between a vehicle and the rest of the Cell through this common decoupled interface, the system can be evolved to use a different wireless connection methodology by swapping in a new version of this component.
- Provides a wireless link between the Physical Interface, the Vehicle Components and Vehicle Handler.
- Tracks which vehicles are attached to the node, and handles all the communication to/from those vehicles.
- Sends status, position and case assignment data to the Current Vehicle Position & Status database.
- Acts as an intermediary between dispatch operators (using the Operator GUI component) and vehicles (using the Vehicle GUI component), passing requests from dispatch operators and replies from vehicles.
Vehicle Position and Status
Stores the current position, status and case assignment of vehicles attached to the node. Separation of this component (which is of the Real-Time type) from the Vehicle Movement history database allows the critical performance of this component to be maintained without the overhead of the large database of the Vehicle Movement component.
- When GPS location or status received from the Vehicle Handler changes, sends a timestamped update to the Vehicle Movement component.
- Sends current vehicle data to the Type Filter component when a vehicle proximity search is requested from the Operator GUI via the Current Cases component.
Vehicle Movement
Stores the history of position and status changes of vehicles attached to the node.
- Position and status of a vehicle over time are stored as a series of co-ordinate and data "snap-shots". A new snap-shot is added when:
- A given interval of time has passed.
- Vehicle status changes.
- New vehicle is added to the node.
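The three snapshot triggers above can be expressed as a single check per incoming update. This sketch assumes a 60-second interval (the design does not fix one) and treats the first update for a vehicle as "new vehicle added to the node"; `VehicleHistory` and `SNAPSHOT_INTERVAL_S` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SNAPSHOT_INTERVAL_S = 60.0  # hypothetical interval; the design does not specify one


@dataclass
class Snapshot:
    time: float
    position: Tuple[float, float]
    status: str


class VehicleHistory:
    """Snapshot history for one vehicle attached to the node."""

    def __init__(self) -> None:
        self.snapshots: List[Snapshot] = []

    def record(self, time: float, position: Tuple[float, float], status: str) -> bool:
        """Append a snapshot if a trigger fires; return True when one was added."""
        last: Optional[Snapshot] = self.snapshots[-1] if self.snapshots else None
        if (last is None                                      # new vehicle added to the node
                or status != last.status                      # vehicle status changed
                or time - last.time >= SNAPSHOT_INTERVAL_S):  # interval has passed
            self.snapshots.append(Snapshot(time, position, status))
            return True
        return False
```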
Type Filter
Filters appropriate vehicle types for a given emergency, so only vehicles that can respond to a given emergency are considered for dispatch.
- Accepts an emergency type, compares it against the type of the vehicles in the list from Vehicle Position and Status.
- Accepts location co-ordinates, compares it against locations of vehicles in the list from Vehicle Position and Status.
- Sends a list of vehicles matching search criteria and emergency location to the Route Finder module.
Route Finder
Calculates fastest driving route and estimated trip time between two co-ordinates.
- Accepts an emergency location and a list of vehicles.
- Sends a list of nearby vehicles (ordered by proximity and status) and the routes to the emergency location for each.
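The Type Filter and Route Finder hand-off can be sketched as a filter followed by a sort. This is a simplified stand-in under stated assumptions: straight-line distance at a fixed average speed replaces real driving routes, and the status priorities in `STATUS_RANK` (including which statuses count as dispatchable) are invented for illustration. The vehicle type is matched directly, which assumes the emergency type has already been translated into a required vehicle type.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

AVERAGE_SPEED_KMH = 50.0  # hypothetical: straight-line stand-in for real road routing
# Hypothetical dispatch priority per status; statuses not listed are not dispatchable.
STATUS_RANK: Dict[str, int] = {"available": 0, "returning": 1}


@dataclass
class Vehicle:
    vehicle_id: str
    vehicle_type: str
    status: str
    position: Tuple[float, float]  # (x_km, y_km) on a local grid


def type_filter(vehicles: List[Vehicle], vehicle_type: str) -> List[Vehicle]:
    """Keep vehicles of the requested type whose status allows dispatch."""
    return [v for v in vehicles
            if v.vehicle_type == vehicle_type and v.status in STATUS_RANK]


def route_finder(vehicles: List[Vehicle],
                 location: Tuple[float, float]) -> List[Tuple[Vehicle, float]]:
    """Pair each vehicle with an estimated ETA in minutes, ordered by status then ETA."""
    ranked = [(v, math.dist(v.position, location) / AVERAGE_SPEED_KMH * 60)
              for v in vehicles]
    ranked.sort(key=lambda pair: (STATUS_RANK[pair[0].status], pair[1]))
    return ranked


fleet = [
    Vehicle("A1", "ambulance", "available", (0.0, 0.0)),
    Vehicle("A2", "ambulance", "available", (3.0, 4.0)),
    Vehicle("A3", "ambulance", "off-shift", (3.0, 3.0)),
    Vehicle("A4", "ambulance", "returning", (3.0, 4.5)),
    Vehicle("F1", "fire", "available", (3.0, 4.0)),
]
for v, eta in route_finder(type_filter(fleet, "ambulance"), (3.0, 4.0)):
    print(v.vehicle_id, round(eta, 1))
```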
Current Case
Holds all the details for currently active cases.
- Stores case information entered by the dispatch operator and received from the Vehicle Handler.
- Allows cases currently in the system to be searched.
- Sends new cases to the servers on the distributed network for backup (Reliability).
- If changes are made to the case, the updated copy is sent to the distributed server cloud.
Operator GUI
Displays the interface upon which dispatch operators enter case information, manage cases and allocate vehicles.
- Displays operator's current cases, map, vehicle locations, addresses and other information pursuant to dispatch operations.
- Updates vehicle locations in real-time (performance requirements mandate a maximum lag time of 15 seconds).
- Optimised to allow quick entry of case information, management of existing cases and interaction with vehicles under the area's command.
Local Authentication
Checks credentials supplied by an operator and grants access to the functionality of the GUI if the supplied credentials match those of an authorised user.
- Accepts authentication details (e.g. username, password) and compares them with stored authentication hashes.
- Allows authorisation data to be altered by system administrators.
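One way Local Authentication could store and compare authentication hashes is with a salted, deliberately slow key-derivation function. This sketch uses PBKDF2-HMAC-SHA256 from Python's `hashlib` and a constant-time comparison from `hmac`; the algorithm choice, the iteration count and the class name `LocalAuthentication` are assumptions, not requirements of the design.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple

ITERATIONS = 100_000  # hypothetical work factor


class LocalAuthentication:
    def __init__(self) -> None:
        self._users: Dict[str, Tuple[bytes, bytes]] = {}  # username -> (salt, hash)

    def set_password(self, username: str, password: str) -> None:
        """Administrator operation: create or change a user's credentials."""
        salt = secrets.token_bytes(16)  # fresh random salt per user
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        self._users[username] = (salt, digest)

    def check(self, username: str, password: str) -> bool:
        """Return True only if the user exists and the password hash matches."""
        record = self._users.get(username)
        if record is None:
            return False
        salt, expected = record
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return hmac.compare_digest(digest, expected)  # constant-time comparison
```

Only the salt and derived hash are kept, so a copied credential store does not directly reveal operators' passwords.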
Remote Network Interface
Interface to the distributed network that will hold copies of current data and archive older data (allowing uninterrupted system operation, a type of reliability).
- Connects to neighbouring cell servers using the DHT network architecture.
- Exchanges current and historical case data for purposes of backup.
- Transports the current case details and vehicle movements.
- Collects and sends backed-up data back to the Current Case modules from the distributed network in the event of failure.
Use Case Maps
User Login
- User supplies login credentials via the Operator GUI.
- Login credentials are hashed and sent to Local Authentication.
- Local Authentication sends indication of Success/Fail back to Operator GUI.
Creating a New Case File
Triggered by the creation of a new case by a Dispatch Operator, presumably in response to an emergency call or other request.
- Dispatcher creates a "New Case" from the 'Operator GUI'.
- Case is sent to the 'Current Case File' data store.
- Case is also sent to the 'Remote Network Interface' for backup within the distributed server cloud and retrieval from the cloud by other authorised operators from different cells.
- Case emergency location and requested vehicle type are sent to the Type Filter, which generates a list of appropriate vehicles close to the emergency location, ordered by status and proximity. This list is sent to the Route Finder.
- The Route Finder generates paths to the emergency location for each vehicle, along with an estimated time of arrival (ETA). These routes and the vehicle list are sent to the Operator GUI for display.
- The Operator GUI presents a list of vehicles able to respond to the emergency, ordered by ETA and status.
Sending message to Vehicles & Response
Invoked when a message is sent to a given vehicle (e.g. a request to attend an emergency), and the vehicle sends back a response.
- A dispatch operator selects a vehicle using the Operator GUI and makes a request of that unit.
- The request is forwarded through the Vehicle Handler module.
- The request (along with suitable options for response) is presented to the driver/co-driver through the In-Vehicle GUI.
- Response is selected using the In-Vehicle GUI.
- Response is forwarded through the Vehicle Handler.
- Response message is presented to the dispatch operator who made the request via the Operator GUI.
- The response is added to the case history in the Current Cases data store.
- If the response implied a status change for the vehicle (e.g. available -> responding), the status update is simultaneously propagated to the Current Vehicle Position & Status data store.
Updating the Vehicle Position
The node must know each vehicle's actual position. Vehicle position data presented to dispatch operators must be no more than 15 seconds old.
- GPS sends current co-ordinates to the Connection Tracker, which pushes the data to the node via the Vehicle Handler.
- Vehicle Handler sends the location update to the Current Vehicle Position & Status data store.
- The location update is added to the Vehicle Movement history.
- Location updates are backed up to the DHT network cloud through the Remote Network Interface.
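The 15-second requirement suggests a simple freshness check on the Vehicle Position and Status store, which could also drive the greyed-out "last known location" icon from the Vehicle Disconnected scenario. This is a sketch: `VehiclePositionStore`, `stale_vehicles` and the rule of discarding out-of-order updates are assumptions for illustration.

```python
import time
from typing import Dict, List, Optional, Tuple

MAX_LOCATION_AGE_S = 15.0  # maximum location update latency from the performance scenario

Position = Tuple[float, float]


class VehiclePositionStore:
    """Hypothetical sketch of the Vehicle Position and Status store's freshness check."""

    def __init__(self) -> None:
        self._latest: Dict[str, Tuple[float, Position]] = {}

    def update(self, vehicle_id: str, position: Position,
               timestamp: Optional[float] = None) -> None:
        ts = time.time() if timestamp is None else timestamp
        previous = self._latest.get(vehicle_id)
        if previous is None or ts >= previous[0]:  # drop updates that arrive out of order
            self._latest[vehicle_id] = (ts, position)

    def position(self, vehicle_id: str) -> Position:
        return self._latest[vehicle_id][1]

    def stale_vehicles(self, now: Optional[float] = None) -> List[str]:
        """Vehicles whose last fix is older than the 15-second limit."""
        now = time.time() if now is None else now
        return sorted(v for v, (ts, _) in self._latest.items()
                      if now - ts > MAX_LOCATION_AGE_S)
```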
Data Models
Local Authentication Persistent Storage
These processes are located within Local Authentication, which needs to be able to store currently logged in users, as well as manage their login sessions (in this case, using a cookie-like structure). The data store is an aggregation of User Details.
Case data
Case data stores all the metrics relevant to a current case for passing between vehicles and the database.
Vehicle Data
Vehicle Data needs to keep track of the vehicle ID, type and status, either for recording in the database or for use by the Vehicle Handler. In addition, it takes data from the vehicle GPS in the form of Location Data.
Request Data
Request Data is used to pass requests between the operator and the individual vehicles, passing individual case data to the vehicle GUIs. To coordinate this, it needs to store which cell is sending the request (request source), since the requesting cell is not always the vehicle's home cell. In addition, it needs to store the type of request sent, since different cases will require different types of emergency response. Finally, each request only ever contains one case.
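The Vehicle, Location and Request data described above could be modelled with plain dataclasses. The field names, the `RequestType` values and the `home_cell` field (implied by the note that the requesting cell is not always the vehicle's home cell) are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RequestType(Enum):  # hypothetical request types
    ATTEND = "attend"
    STATUS_QUERY = "status_query"


@dataclass(frozen=True)
class LocationData:
    latitude: float
    longitude: float
    timestamp: float


@dataclass
class VehicleData:
    vehicle_id: str
    vehicle_type: str
    status: str
    location: LocationData  # taken from the vehicle GPS
    home_cell: str          # hypothetical field: the cell that normally controls the vehicle


@dataclass(frozen=True)
class RequestData:
    request_source: str       # cell sending the request
    request_type: RequestType
    case_id: str              # each request carries exactly one case

    def is_foreign(self, vehicle: VehicleData) -> bool:
        """True when the request comes from a cell other than the vehicle's home cell."""
        return self.request_source != vehicle.home_cell
```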
Distributed Network Architecture
Description
Admin Panel
This is a thin GUI for administering a Server Node.
- Presents an interface for sending activation, reallocation, and deactivation requests to Update Status.
- Presents outcomes of requests.
- Presents critical network cloud status statistics.
Update Status
Central regulatory unit of the DHT Server Node.
- Calculates the average of the hashes allocated to the two existing 'side-by-side' DHT Server Nodes whose hashes differ the most.
- Allocates responsibility for hash ranges to the local DHT component.
- Informs all other servers on the DHT that the node is active and has taken responsibility for the local hash range.
- Invokes the Sync component when responsibility for a new hash range is accepted.
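The hash-allocation step can be read as choosing the midpoint of the widest gap between adjacent nodes on the hash ring, which places a new node beside the server covering the largest range. This sketch assumes a circular hash space of 0-999 and that the result becomes the new node's own hash, as the activation use case map suggests; `choose_new_node_hash` is a hypothetical name.

```python
from typing import List

HASH_SPACE = 1000  # hypothetical size of the circular hash space


def choose_new_node_hash(node_hashes: List[int]) -> int:
    """Midpoint of the widest gap between adjacent ('side-by-side') nodes on the ring."""
    if not node_hashes:
        return 0  # first node on an empty network
    ring = sorted(node_hashes)
    # Start with the gap that wraps from the highest hash back round to the lowest.
    best_start, best_gap = ring[-1], ring[0] + HASH_SPACE - ring[-1]
    for a, b in zip(ring, ring[1:]):
        if b - a > best_gap:
            best_start, best_gap = a, b - a
    # Averaging the two neighbouring hashes is start + gap / 2, taken modulo the ring size.
    return (best_start + best_gap // 2) % HASH_SPACE
```

For nodes at 100, 400 and 900 the widest gap is 400 to 900, so the new node would take hash 650.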
Sync
Actively probes other DHT Server Nodes upon system startup and every 15 minutes thereafter (also whenever responsibility for a new hash range is accepted) for data it doesn't have stored and updates the local data stores. This ensures that every DHT Server Node on the network is an exact mirror of every other Server Node. In the event of several Server Nodes 'crashing', the other nodes will automatically take over their hash domain without even needing to be aware a crash has taken place.
- Detects changes to data in other nodes.
- Copies unsynchronised data from other nodes into local data stores.
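The Sync component's "copy unsynchronised data" step might be sketched as a merge between the local store and a neighbour's store. The design does not say how conflicting versions are resolved, so this sketch assumes each record carries a last-modified timestamp and that the newest version wins; the record shape and `sync_from` are hypothetical.

```python
from typing import Dict, Tuple

Record = Tuple[float, str]  # (last-modified timestamp, payload); hypothetical record shape


def sync_from(local: Dict[str, Record], remote: Dict[str, Record]) -> int:
    """Copy records the local store lacks or holds an older version of.

    Returns the number of records copied into the local store.
    """
    copied = 0
    for key, (remote_ts, payload) in remote.items():
        local_entry = local.get(key)
        if local_entry is None or local_entry[0] < remote_ts:
            local[key] = (remote_ts, payload)
            copied += 1
    return copied
```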
Request Caller
Interface between local components and components in other DHT Nodes.
- Sends requests from local components looking to gain data through the DHT network protocol.
- Handles incoming responses given to these requests.
Neighbour Monitor
Scans the DHT Network for changes in the server cloud. This ensures that a server can never be 'stranded' by its two closest neighbours unexpectedly going offline (essential for overall system reliability).
- Checks links in the DHT chain to ensure Neighbour Store retains a full, true picture of the network.
- Updates Neighbour Store with changes detected in the server cloud.
Neighbour Store
Stores a list of the members of the DHT Network's server cloud.
- Stores a list of servers within the DHT cloud.
- Provides this list to other components.
Request Listener
Waits for external requests for data contained in local data stores.
- Handles incoming requests.
- Handles outgoing responses.
- Informs inquiring DHT Nodes of the local Node's Hash responsibilities and the hash responsibilities of the next node.
Query Handler
Takes raw buffer data from the Request Listener and uses it to execute queries on the correct data store.
- Converts DHT Protocol data into a format compatible with the data stores.
- Directs queries to correct data stores.
- Transmits responses back to the client in the form of DHT Protocol data.
Vehicle/Case Store
Stores backup information received from the ERDS Client.
- Stores backed-up data.
- Responds to queries for backed-up data.
Report Development
User Interface that allows users to formulate queries for report data on ERDS history (e.g. vehicle movement, response times, number of calls logged, vehicles on shift, communication history).
- Presents a GUI for formulating queries.
- Presents the returned Report Information.
- Presents a GUI for exporting data to other applications.
Use Case Maps
A DHT Server Node Synchronising itself against other nodes
This is when one of the distributed servers tries to ensure that it has a complete copy of the data by comparing against its neighbour nodes.
To do this it:
- Contact Neighbours:
- Sync retrieves from the neighbour store the neighbour data.
- It then uses this data to contact its neighbours through the Request Caller (i.e. contact other DHT nodes) to update the records
- Update Records:
- When the Request Caller determines that the incoming data is data to be stored, it is passed to Sync
- First the Vehicle related information is passed to Vehicle Store
- Then the Case related information is passed to Case Store
A DHT Server Node responding to a synchronisation request from another DHT Server Node
This is when a distributed server receives a request from another node to ensure it is synchronised. The request specifies particular entries.
To do this it:
- Receives a Sync Request through its Request Listener component
- Builds the appropriate Query in order to find entries
- This query is run against the Vehicle and Case stores as needed and the results stored temporarily in the Query Handler
- The Query Handler returns the entries through the Request Listener to the correct node.
A DHT Server Node processing an 'add case' request from an ERDS Cell
This is when a Cell has added a case and is sending the first set of data through to be backed up.
To do this it:
- Receives the request to store the data through the Request Listener, followed by the data itself.
- Request Listener informs the Query Handler that this is an Add operation
- Query Handler formats the information for entry into the database, as well as any metadata needed
- Query Handler sends back an acknowledgement to the ERDS Cell.
A DHT Server Node being activated for the first time
This is where the Server first becomes active, and the tasks it needs to perform before it can come online. Because of the network structure this is somewhat complicated.
- User at Admin Panel tells the Update Status component to activate.
- The Update Status module then
- Activates neighbour store, gets a hash and informs other servers of its hash.
- Activates Sync to update Case and Vehicle Store
- Ensures that neighbours are found
Data Models
Query
Query data consists of an SQL Statement composed of a series of three vectors: Select, Insert and Update. These are passed like variables but act as commands to the database, and they can be stacked in any order.
Select collects data from a source; it needs to know the source location and must hold the retrieved data for return to the system.
Insert puts data into the system, and only needs to know what the data is.
Update takes data and changes it, needing to know where the data is and what to change it to.
Results
Result data is returned as a series of the same vectors as Query: Select, Insert and Update. The vectors are returned on a one-for-one basis with the equivalent requests in the Query statement, so that each request has a corresponding return.
Insert and Update return a timestamp that specifies when the data operation was carried out, as well as a Boolean indicating whether the changes were successful or not.
Select holds the retrieved information, its destination and the timestamp when the information was retrieved.
Finally, these results are combined into a unified Return Statement, which returns to the source of the request with the relevant information (returned data, its data type, the source of each request and the update time, for each corresponding vector).
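The one-for-one pairing between Query vectors and Result vectors can be illustrated with a small executor over an in-memory store. This is a sketch, not the system's query engine: the `Operation` and `Result` classes, the `target` key used to address records and the dictionary standing in for the data store are all assumptions.

```python
import time
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Operation:
    kind: str         # "select", "insert" or "update"
    target: str       # hypothetical key identifying the record
    data: Any = None


@dataclass
class Result:
    kind: str
    target: str
    timestamp: float  # when the operation was carried out
    success: bool
    data: Any = None  # only populated for select


def execute(query: List[Operation], store: dict) -> List[Result]:
    """Run each operation in order, producing exactly one Result per Operation."""
    results = []
    for op in query:
        now = time.time()
        if op.kind == "select":
            results.append(Result(op.kind, op.target, now,
                                  op.target in store, store.get(op.target)))
        elif op.kind == "insert":
            ok = op.target not in store  # refuse to overwrite an existing record
            if ok:
                store[op.target] = op.data
            results.append(Result(op.kind, op.target, now, ok))
        elif op.kind == "update":
            ok = op.target in store      # can only change data that exists
            if ok:
                store[op.target] = op.data
            results.append(Result(op.kind, op.target, now, ok))
        else:
            raise ValueError(f"unknown operation {op.kind!r}")
    return results
```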
Vehicle and Case
Vehicle and Case share the same data types between network and cell levels. This is important because otherwise the data would remain incompatible between network layers, which would defeat the purpose of the system.
The only addition to this Data Model is the inclusion of "Inserted Data", which represents the use of SQL requests (Select, Insert and Update) to manipulate data. Effectively, every SQL request can contain information on vehicles or cases (for instance, what vehicles responded to a certain case or what cases a vehicle has undertaken within a period of time).
Neighbour Data (AKA Hash data)
Neighbour data contains the relevant information for the Distributed Hash Table: the hash of each cell, the corresponding hash table the cell’s hash belongs to, and the server-wide hash information. This data type is generalised for compatibility, since it is used by all aspects of the DHT system on the network architecture.
The important data types that neighbour data carries are the individual hash numbers, the corresponding hash table, and the location of any neighbours. The neighbour data is critical to making the DHT work, since otherwise the individual nodes won’t communicate with each other.
Impact Maps
Reliability
Server Failure
In the event that the distributed network server that the Cell is attached to fails, this is what will occur:
- No Impact on entry and processing of current cases or of finding vehicles
- The remote network interface stops sending data to the server
- The Remote Network Interface then searches the remote network for another available server
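The server-failure behaviour, together with the Server Failure scenario (case entry continues while backups resume after reconnection), suggests a Remote Network Interface that queues backups locally and fails over between known servers. This sketch assumes a caller-supplied `send(server, payload)` function that returns `True` on acknowledgement; the class and method names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


class RemoteNetworkInterface:
    """Hypothetical sketch: queue backups locally and fail over between known servers."""

    def __init__(self, servers: List[str], send: Callable[[str, str], bool]) -> None:
        if not servers:
            raise ValueError("at least one DHT server address is required")
        self.servers = list(servers)   # known DHT Server Nodes, preferred first
        self.send = send               # send(server, payload) returns True on acknowledgement
        self.current = self.servers[0]
        self.pending: Deque[str] = deque()

    def backup(self, payload: str) -> None:
        self.pending.append(payload)   # case entry never waits on the network
        self.flush()

    def flush(self) -> None:
        while self.pending:
            if not self._deliver(self.pending[0]):
                return                 # nothing reachable: keep the data queued for a retry
            self.pending.popleft()

    def _deliver(self, payload: str) -> bool:
        # Try the current server first, then every other known server once.
        for server in [self.current] + [s for s in self.servers if s != self.current]:
            if self.send(server, payload):
                self.current = server  # stay with whichever server acknowledged
                return True
        return False
```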
Cell Failure
In the event that a single cell fails, the following needs to occur:
- Vehicle Perspective
- When not receiving a response from the Vehicle Handler, the vehicle would search for the closest available node.
- The vehicle would inform that node's Vehicle Handler that it is new to the network, along with the case number it is responding to at the time (assumed to be non-null for this impact map)
- The vehicle proceeds on its current call, or takes directions from the new node, until its primary node is brought back online.
- The new node
- Vehicle handler updates its Current Vehicle Position & Status, as well as informing the Operator GUI that there is a new Case under its control
- The Operator GUI will request the case from the Current Case Data store, which will be retrieved off the remote network
- Distributed Network
- The ERDS Cell Node redirects its request to the DHT Server Node it queried previously.
- All future ERDS Cell Node requests for data with a hash in the failed server's domain go through that previous DHT Server Node.
- If another DHT Server Node is added to the network, it may take over the hash responsibilities of the failed node. This is unpredictable, as new DHT Server Nodes allocate themselves to take over from the server with the greatest workload.
- Neighbour Monitors in the other DHT Server Nodes detect the unexpected failure and update their maps of the network accordingly.
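The redirect step can be sketched as follows, assuming the cell remembers the chain of DHT Server Nodes it contacted on the way to the responsible node. `DhtRedirect` and the `send` predicate (a stand-in for a real network call that returns false when a node is unreachable) are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: the cell re-sends a request back along the path of
// DHT Server Nodes it has already queried, starting from the most recent one.
final class DhtRedirect {
    static String deliver(List<String> queriedPath, Predicate<String> send) {
        for (int i = queriedPath.size() - 1; i >= 0; i--) {
            String node = queriedPath.get(i);
            if (send.test(node)) {
                return node; // first node (newest first) that accepted the request
            }
        }
        throw new IllegalStateException("no DHT Server Node on the path answered");
    }
}
```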
Architectural Justifications
We are using a cell and server architecture for various reasons:
- Reliability - the system as a whole must remain available and responsive, and data must be neither lost nor made unavailable, even if any given component fails.
- If a single node fails, responsibility for its area is assumed by neighbouring nodes.
- If a server fails, the DHT architecture will distribute the system load among the remaining servers and keep the overall system running while the failed server is recovered. Once the failed server recovers, it automatically heals back into the server cloud and accepts its share of load.
- Data is always mirrored across multiple servers - if any given server goes down, data is still accessible without any visible difference from an operator's point of view.
- Performance - the system must provide current data to operators in real time, since the nature of emergency dispatch requires instant access to a view of the world as it exists in reality at any given moment.
- Each node only needs to be responsible for a smaller geographical area and smaller number of vehicles, thus reducing the load.
- The server network is self-healing and load-balances automatically.
- Each server only needs to take care of a certain number of nodes.
- Current cases are stored in a leaner, faster database and archived to a separate database in order to reduce latency in access times.
- Scalability - the system must scale to serve during times of increased demand, such as World Youth Day or APEC.
- More nodes and servers can be added without changing the architecture, requiring only configuration changes.
- Adding servers to or removing servers from the distributed network cloud is very easy, since the architecture is self-healing and automatically load-balances without need for central configuration changes.
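One common way to realise the mirroring listed above is to store each record on the node responsible for its hash and on the next few nodes round the ring. The sketch below assumes that scheme and a configurable replica count; neither is specified in this design, and `ReplicaPlacement` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical replica placement: the owning node plus its successors round the ring.
final class ReplicaPlacement {
    static List<Long> replicaNodes(TreeSet<Long> nodeHashes, long keyHash, int replicas) {
        List<Long> chosen = new ArrayList<>();
        if (nodeHashes.isEmpty()) {
            return chosen;
        }
        // Owner: first node hash >= key hash, wrapping to the lowest hash.
        Long current = nodeHashes.ceiling(keyHash);
        if (current == null) {
            current = nodeHashes.first();
        }
        // Never ask for more distinct copies than there are nodes.
        int count = Math.min(replicas, nodeHashes.size());
        while (chosen.size() < count) {
            chosen.add(current);
            Long next = nodeHashes.higher(current);
            current = (next != null) ? next : nodeHashes.first();
        }
        return chosen;
    }
}
```

Because copies sit on consecutive nodes, the node that inherits a failed server's hash range already holds its data, which is what lets operators see no visible difference.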
The disadvantages of this are:
- Reporting may be more complex.
- Getting complete data for vehicles that operate across servers, or after server crashes, can involve searching multiple servers/records for cell areas.
- The system concept is more complex, and overhead is greater.
- Although operation is smoother from a user's point of view, this type of system will require more extensive development efforts and refinement.
- DHT operation involves greater traffic overhead between servers in the cloud.
- The vehicle modules need to take on more responsibility.
- Vehicles have to be able to coordinate with their set node, and other nodes if that set node goes down.
Elaborated Execution Architecture
Cell Level
Concurrent Subsystems View
Vehicle Application
There are multiple Vehicle Application processes, one in each emergency vehicle. Each sends information to the Vehicle Handler asynchronously (as the data is not time dependent), and receives requests from the Vehicle Handler synchronously (as a response is needed quickly).
Vehicle Handler
There is one Vehicle Handler running in a node, handling the various requests to and from vehicle applications. It sends data to the database asynchronously (the data is not time dependent), and receives requests from the Operator Application asynchronously with a callback, as a request could take some time to be filtered through and responded to, and we do not want to block the Operator Application.
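The asynchronous-with-callback interaction between the Operator Application and the Vehicle Handler could be sketched with Java's `CompletableFuture`. `VehicleHandlerSketch`, `sendToVehicle` and the stubbed `contactVehicle` round trip are illustrative assumptions, not the prototype's actual classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical Vehicle Handler: the operator submits a request plus a callback and carries on;
// the handler does the slow work on its own threads and invokes the callback when done.
final class VehicleHandlerSketch {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    CompletableFuture<Void> sendToVehicle(String vehicleId, String caseInfo, Consumer<String> callback) {
        return CompletableFuture
                .supplyAsync(() -> contactVehicle(vehicleId, caseInfo), workers)
                .thenAccept(callback); // runs once the vehicle's answer is available
    }

    // Stand-in for the (potentially slow) round trip to the Vehicle Application.
    private String contactVehicle(String vehicleId, String caseInfo) {
        return vehicleId + " ACCEPT " + caseInfo;
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

The Operator Application only blocks if it chooses to call `join()` on the returned future; in normal use it would ignore the future and let the callback update the GUI.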
Operator Application
There are multiple Operator Application processes, one used by each Dispatcher. It has to receive an authentication token from the Authentication Server first, thus must wait for feedback (synchronous). It also sends requests and updates to both Vehicle Handler & Database and receives feedback from requests. It is asynchronous as we do not want the Operator Application to hang while waiting for feedback from these processes.
Authentication Server
There is a single Local Authentication Service for security purposes that will give back authentication tokens to the Operator Application processes.
Type Filter
There are multiple Type Filter processes that receive requests and information from an Operator Application process. Each gets information synchronously from the database (it needs the data before it can proceed), filters it, then passes it to the Route Calculator.
Route Calculator
There will be multiple Route Calculator processes to handle the multiple requests from the Type Filter processes. It will receive the information and then pass it on to Operator Application.
Database
This will be a database application running multiple processes. It handles requests and sends back information when it has it (asynchronous call back).
Network Interface and Backup
This is a single interface for the Cell to the Network. It periodically gets data from the database and sends it to the database on the Distributed Network. It can also act as a service to get data from the network that the Operator Application needs (after a cell failure, for instance).
Behavioural Analysis
Connect To Cell
This is how a vehicle connects to a node, with this case assuming that it knows the node it is connecting to (i.e. this isn't the search for nearest node process). This is done whenever a vehicle loses connection to a node.
The Vehicle sends an authentication packet to the server and waits for the reply (synchronously); if it does not receive one, it tries to connect to the next node in its own list.
The Vehicle Handler adds a mapping of the ID to the IP (to allow later communication), and it also passes the vehicle data to the database.
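A sketch of both sides of this exchange, assuming authentication can be modelled as a yes/no reply per node: the vehicle walks its own node list until one node answers, and the Vehicle Handler records the vehicle's ID-to-IP mapping. `VehicleConnector`, `VehicleRegistry` and the `authenticate` predicate are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Vehicle Handler side: remembers which IP each vehicle ID is reachable at.
final class VehicleRegistry {
    private final Map<String, String> addressById = new ConcurrentHashMap<>();

    void register(String vehicleId, String ip) {
        addressById.put(vehicleId, ip);
    }

    Optional<String> lookup(String vehicleId) {
        return Optional.ofNullable(addressById.get(vehicleId));
    }
}

// Vehicle side: authenticate with each node in its own list until one replies.
final class VehicleConnector {
    static Optional<String> connect(List<String> nodeList, Predicate<String> authenticate) {
        for (String node : nodeList) {
            // Synchronous: wait for this node's reply (or time-out) before trying the next.
            if (authenticate.test(node)) {
                return Optional.of(node);
            }
        }
        return Optional.empty(); // no node reachable; the caller would retry later
    }
}
```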
Add Case to Database
This is what happens when a case is entered by an operator. The request is sent asynchronously to the Route Calculator as soon as the location and the type of emergency vehicle required have been entered (so that the operator can continue to enter notes).
The Route Calculator builds a query to insert the data and find a vehicle and sends it off to the database, again asynchronously so that it can continue to take requests from other operators.
The Database takes a request, places it at the end of the queue and continues to process the queue, sending vehicle lists as a call back to the Route Calculator to then be passed back to the Operator GUI.
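The behavioural analysis results note that a plain FIFO queue would not prioritise new cases over backups. A sketch of a queue that does, while keeping FIFO order within each request type, is shown here. The request types and their ranking are assumptions for illustration, and `DbRequestQueue` is a hypothetical name.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical database request queue: urgent request types are served first,
// and requests of the same type keep their arrival (FIFO) order.
final class DbRequestQueue {
    // Declaration order defines priority: NEW_CASE is served before BACKUP.
    enum Type { NEW_CASE, VEHICLE_LIST, UPDATE, BACKUP }

    static final class Request implements Comparable<Request> {
        final Type type;
        final String sql;
        final long seq; // arrival number, keeps FIFO order within a type

        Request(Type type, String sql, long seq) {
            this.type = type;
            this.sql = sql;
            this.seq = seq;
        }

        @Override
        public int compareTo(Request other) {
            int byType = Integer.compare(type.ordinal(), other.type.ordinal());
            return byType != 0 ? byType : Long.compare(seq, other.seq);
        }
    }

    private final PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();
    private final AtomicLong arrivals = new AtomicLong();

    void submit(Type type, String sql) {
        queue.add(new Request(type, sql, arrivals.getAndIncrement()));
    }

    // Non-blocking: returns null when empty. A worker thread would use queue.take() instead.
    Request next() {
        return queue.poll();
    }
}
```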
Send Request to Vehicle
Once an Operator has been given a list of available vehicles by the Route Calculator, the operator chooses one to send a request to (typically the first). The Operator Application does not wait for a reply, in case further information needs to be entered (e.g. new emergency services are required).
The Vehicle Handler takes the Request (consisting of Case Information and destined Vehicle ID), looks up the IP based on Vehicle ID and sends a Request to it.
The Vehicle Application then has its GUI updated and allows the crew to either Accept or Reject the request.
Response to Case
This walkthrough assumes the response is Accept, as that path exercises more of the execution architecture.
The crew accept the case and begin responding without having to check any further details (asynchronously in this case).
The Vehicle Handler interprets the response as Accept, updates the Case Information as being responded to as well as informing the Operator GUI that the Vehicle is responding.
If the crew had rejected the request, the Database would not be updated.
Recover Data from DHT Network
This recovery is run when a node fails or crashes, to gather all the case data that belongs to that node from the DHT network. This process multicasts a recovery request for the node, but doesn't wait for reply.
Each DHT node builds a set of cases and sends it back to the cell, where the Backup Process passes it on to the Database for entry.
On bootup the rest of the Cell does not wait for the recovery to be completed so that new calls can be taken immediately.
Backup New Data to DHT Network
This happens periodically to ensure data integrity and overall system reliability. A request is sent to the database for any "new" (since the last update) information.
The Database processes the request like any other, returning the information to the calling process, and once the backup process has all the "new data", it formats it and sends it to the relevant DHT nodes.
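A minimal sketch of the "new data since the last update" selection, assuming each record carries an update timestamp and that a routing function maps a case ID to its DHT node. `IncrementalBackup` and `CaseRecord` are hypothetical names. This sketch advances its watermark as soon as the batches are built; a real backup process would more likely advance it only after the DHT nodes acknowledge receipt, so that a lost transmission is retried.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical incremental backup: select records newer than the last run and
// group them by the DHT node they should be sent to.
final class IncrementalBackup {
    static final class CaseRecord {
        final String caseId;
        final Instant updated;

        CaseRecord(String caseId, Instant updated) {
            this.caseId = caseId;
            this.updated = updated;
        }
    }

    private Instant lastBackup;

    IncrementalBackup(Instant lastBackup) {
        this.lastBackup = lastBackup;
    }

    Map<String, List<CaseRecord>> collect(List<CaseRecord> all, Function<String, String> nodeFor) {
        Map<String, List<CaseRecord>> batches = new TreeMap<>();
        Instant newest = lastBackup;
        for (CaseRecord r : all) {
            if (r.updated.isAfter(lastBackup)) {
                batches.computeIfAbsent(nodeFor.apply(r.caseId), k -> new ArrayList<>()).add(r);
                if (r.updated.isAfter(newest)) {
                    newest = r.updated;
                }
            }
        }
        // Simplification: advanced immediately; a real process would wait for DHT acknowledgement.
        lastBackup = newest;
        return batches;
    }
}
```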
Results from Behavioural Analysis
The behavioural analysis above highlighted several issues:
- The complete separation of the behaviour of the Vehicle/Operator part of the system from that of the DHT/Backup part of the system.
- This informed our Deployment view below with the Backup being on a separate server to the Vehicle/Operator Application logic.
- That the requests are queued in a FIFO (First In First Out) fashion, which would mean new cases being entered or new vehicle lists being requested would not be prioritised over backups being run.
- This means that our database will have to prioritise requests based on their type.
- That the Operator will need some mechanism of knowing when to attempt to contact the next vehicle
- This could be procedural, a time-out mechanism in the handler (based on twice the round-trip delay plus some reaction time), or blocking further input until one of the above happens (not recommended).
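The time-out option can be expressed directly as twice the round-trip delay plus a reaction allowance. The values used below are illustrative assumptions, and `ResponseTimeout` is a hypothetical name.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical time-out check for the handler: after 2 x round trip + reaction time
// without an Accept/Reject, the operator is prompted to contact the next vehicle.
final class ResponseTimeout {
    static Duration timeout(Duration roundTrip, Duration reactionTime) {
        return roundTrip.multipliedBy(2).plus(reactionTime);
    }

    static boolean shouldTryNextVehicle(Instant sentAt, Instant now, Duration roundTrip, Duration reactionTime) {
        return !now.isBefore(sentAt.plus(timeout(roundTrip, reactionTime)));
    }
}
```

For example, with a 300 ms round trip and 20 s of crew reaction time, the operator would be prompted to try the next vehicle after 20.6 s.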
Deployment View
The deployment was done with the three tier architectural style, with a rich client, application layer and a data layer.
Vehicle
These are the thick user clients located in each emergency vehicle that connect to the application server to receive emergency information and give responses to requests. Each also contains a GPS unit that allows the vehicle's movements to be tracked.
Operator Consoles
These are the thick user clients that allow the entry of emergency information. There is one for each operator (hence the replication), connecting to the application server to send information to the database and the vehicles.
Application Server
This application server provides all the data processing and logic for requests and information from the operator and vehicle systems. The replication represents redundancy: multiple servers performing the same operation. Only one would be active at a time; however, at least one other server would hold the same information and would seamlessly take over the active server's duties (at least as far as the other components are concerned).
Backup and DHT Connection Server
This is a separate server to the application server as it does no data processing in response to requests, it only sends data to and collects data from the DHT backup network. Furthermore if this server goes down the primary function of the cell can be maintained. Also if the connection to the DHT is lost, then backups can be built up and stored on this server without slowing down the main functionality. Like other processes, it is replicated so that redundant computers can be used.
DHT Backup Network
This is the remote backup network running with a Distributed Hash Table, made up of DHT Nodes, detailed in other sections.
Database Server
This is the server that holds all the persistent data entered into the system, and responds to requests from the Application and Backup Servers. It will also be replicated, so that redundant computers perform the same duties and can take over if the main server fails.
Distributed Network Level
Concurrent Process View
Active Listener
The Active Listener monitors the connection interface, waiting for another DHT Node to make a connection. This node may be another DHT Server Node, a Cell Node, or some sort of reporting node. The external node queries the Active Listener for its Hash Domain responsibilities. If this DHT Server Node is responsible for the query, it returns a response indicating so and passes the data in any supplementary query to the Data Store via a callback. A callback was chosen so that the Active Listener can process other requests while the Data Store prepares its response; the response data is eventually sent back to the querying node, and neither the timing nor the order of responses matters.
Server Manager
The Server Manager is maintained through a remote HTTP connection. It retrieves the addresses of other DHT Nodes from the Neighbour Monitor and asynchronously calls the Active Listener, allocating it a hash derived from the data retrieved from the Neighbour Monitor. It periodically reassigns a hash to the Active Listener, the Neighbour Monitor and the synchronisation class. Because this hash is updated regularly, the Server Manager does not need to know if or when the package is received by the Active Listener, so the call is made asynchronously.
Neighbour Monitor
The Neighbour Monitor periodically triggers a multicast request to every DHT Server Node on the network for its IP address. This multicast call is made via an asynchronous request to the Request Service, because there is no way of knowing in which order, or when, the responses to the multicast will arrive.
DHT Server Synchroniser
This class periodically queries every other node in the DHT Server Network for all the non-synchronised data within their domain of hash responsibility. Once received, this data is inserted into the datastore asynchronously, as the DHT Server Synchroniser will be receiving a lot of data and needs to keep accepting responses.
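The design does not say how the Synchroniser resolves two copies of the same record. One simple rule, assumed here purely for illustration, is last-writer-wins on the record's update time; `SyncMerger` is a hypothetical name.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical merge step for incoming synchronisation data (last-writer-wins assumed).
final class SyncMerger {
    static final class Entry {
        final String value;
        final Instant updated;

        Entry(String value, Instant updated) {
            this.value = value;
            this.updated = updated;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    // Keeps whichever copy was updated most recently; ties keep the local copy.
    synchronized void merge(String key, Entry incoming) {
        store.merge(key, incoming, (local, remote) -> remote.updated.isAfter(local.updated) ? remote : local);
    }

    synchronized Entry get(String key) {
        return store.get(key);
    }
}
```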
Behavioural Analysis - Impact Maps
Cell Recovery
Suppose a cell within the system crashes...
- Upon being re-initiated the cell queries every DHT Server Node for data under its domain of responsibility.
- Here is one such transaction between a DHT Server Node and a recovering cell
This method of cell recovery is much better than if our team were to implement a single-server system, because the DHT spreads the restoration burden across all servers according to their hash ranges. In qualitative terms, each server's performance degrades less during this process, and the overall reliability of the system increases because a distributed server load makes a cascade failure less likely.
DHT Server Node Recovery
Suppose a DHT Server Node within the system crashes and starts up again...
- Upon being re-initiated the DHT Server Node Synchroniser multicasts a request for data to every other DHT Server Node.
- The responding DHT Server Nodes then gather all the data they are responsible for and transmit it to the requesting DHT Server Node.
- The diagram provided illustrates one such transaction between a re-initiated DHT Server Node and a responding node.
It is useful to note that this is no different from the normal process of starting a DHT Server Node. This is an important architectural consideration, as it reduces maintenance complexity: every DHT Server Node is assumed to have a small hash domain, with multiple redundant copies of all its data across many other nodes, so no local redundancy is required. Like the recovery of a cell, this is a distributed process across every DHT Server Node according to its hash-domain responsibilities, which reduces performance issues and reliability issues such as cascade failure. Some data integrity issues may arise as the hashes of different servers change over time, but we believe we have adequately minimised such potential problems.
Both these impact maps show times where a great deal of load would be put on the DHT Server. Considering the possibility that both these events could take place at the same time it makes sense to have these two components run independently.
Deployment View
Reasoning for this deployment
This mode of deployment improves the configurability, performance, reliability and scalability of every DHT Server Node. Spreading the regulation, query, datastore and synchronisation components of the system onto separate compute nodes provides these benefits for the following reasons.
Performance and reliability
The 'DHT Server Node Recovery' and 'Cell Node Recovery' impact maps show that responsibility for handling regular and expected network operations is spread across several compute nodes. This is a deliberate design feature: a great deal of network traffic is expected to pass through these compute nodes, and this traffic will have very different behavioural patterns. The DHT Hash & Query Handler is expected mainly to respond to two kinds of request: small and extremely frequent database transactions, such as the addition of cases to the datastore, for which a small and efficient transport protocol such as UDP might be used; and irregular, infrequent cell recovery requests in which larger amounts of data might be moved. These recovery events are too irregular to warrant a separate architectural component.
The DHT Server Synchroniser would seem to 'replicate' many of the functions of the DHT Hash & Query Handler. It operates on a separate compute node because it is responsible for regularly sending and receiving much larger data elements. The DHT Server Node Network is designed to be massively distributed, with nodes constantly being added and removed; when a node is added, it synchronises itself across the network, collecting every cell database entry that falls within its share of the DHT hash load distribution. If these two components were housed on the same compute node, they would compete for both computation and network resources in a way that would harm performance. For these reasons, it makes sense to separate them.
The Server Manager and Neighbour Monitor are tightly coupled components that communicate regularly. They were put on the same compute node because introducing an inter-node communication protocol, and possibly a physical network layer, between them was thought to add burden to the system for little if any realisable gain.
The reliability of the DHT Server System stems largely from it being massively distributed and self-healing when a node goes down. Individual DHT Server Nodes are not designed to be particularly reliable, as their reliability has minimal bearing on the overall reliability of the network. That said, some allowances for reliability were made in the design of the deployment architecture, reducing the mean time to recovery and possibly increasing the mean time to failure, with some administrator intervention. While all the components of the system are reliant on each other for proper functionality (otherwise they would not exist), the system can continue to function, to varying extents, when the following events occur.
- Regulation Server fails
If the regulation server were to fail, the DHT Server Node would function perfectly well for a short period. Eventually, as other DHT Server Nodes updated their local server location and hash lists, they would stop referring requests to the node; the node, while otherwise functioning perfectly well, would then be 'lost' in the network, never receiving requests. However, the synchronisation component does not depend on the Neighbour Monitor to function, so the local datastore will remain fully synchronised during this period. It is hoped that in the interval between the failure of the regulation server and other DHT Server Nodes removing the node from their hash lists, administrators will become aware of the issue and perform the necessary corrective procedures. This delay is expected to increase the mean time to failure, because administrators will be able to prevent many DHT Server Node failures. Because the synchronisation component continues to maintain the integrity of the datastore in the absence of the regulation server, the mean time to recovery is also reduced: the DHT Server Node does not have to perform any additional data fetching before rejoining the DHT Server Network. In fact, from a network perspective, all that happens is that the DHT Server Node acquires a new hash, a process that all DHT Server Nodes perform regularly to keep hash responsibilities evenly distributed.
- Synchronise Server fails
This is a much more serious problem than a failure of the regulation server, but it has a less immediate impact on reliability. If the Synchronisation Server were to fail, the system would still handle requests from Cell Nodes perfectly; however, newly entered data would not be synchronised across the DHT Network, and the DHT Server Node would not synchronise its local datastore against those of other nodes. If the hash of the affected DHT Server Node were to change during this period, the integrity of the data held by the system would be seriously affected. Because this failure is so critical, this compute node will have redundancy. Furthermore, if all the redundant nodes were to fail, the entire DHT Server Node would shut down immediately to prevent any possibility of data disruption that would otherwise necessitate complex and burdensome re-synchronisation measures.
Scalability, security and configurability
By decoupling the datastore, active listener, synchronisation and regulation compute nodes from each other, the designers of this system have enabled system administrators to configure their DHT Server installations in the way that best suits their available technology and resources. For instance, an administrator with enormous resources could install thousands of DHT Server Nodes across their local network, keeping all the datastore components on a central data bank with high-performance storage and access capacity, accessible only via a secure internal network, and placing the other, network-orientated components on a distributed cluster with access to the DHT Server, Cell and HTTP networks. An administrator of a smaller network, by contrast, may choose to keep all these elements on one high-performance multi-threaded server.
Firewalls have been placed at all communication points between the local DHT Server and all other external elements. This is a specific example of a layered security measure, preventing any connection to elements of the system that is not required for the system's proper function.
The DHT Server Network is expected to comprise somewhere between 4 and 2^128 DHT Server Nodes, spread across unpredictable geographic regions and distances, each with different configurations and environments over which the development team will have no control, all working together to handle operations on data distributed evenly across them. This massively distributed environment makes denial-of-service attacks very difficult, as there is no single point of failure in the system: if one node, or a thousand nodes, are brought down, the system self-heals in a distributed manner, preventing secondary effects of such attacks such as cascade failure.
Architectural Justifications and Reasoning
- Reliability
- Reliability (in the form of uninterrupted data availability and prevention of data loss) was achieved for the cell by dividing the backup and data processing responsibilities into different processes on different compute nodes. This ensures that data remains accessible and case data is not lost if a cell goes down; an emergency case disappearing from the system before it is responded to could be disastrous.
- This was further enhanced by ensuring that these processes are replicated and configured for failover.
- As the vehicles are loosely coupled from the Cell and can connect to other nodes, they can continue to operate even in the event of their home node failing.
- Reliability was achieved for the DHT mainly through it running as a self-healing, automatically load-balancing concurrent subsystem that operates as a gestalt across many, many servers.
- The nature of Distributed Hash Tables allows them to be self-healing: the overall system's performance is uninterrupted and unhindered even if its individual servers are not.
- Performance
- Achieved at a cell level by the division of responsibility, with separate nodes for backup and application execution.
- The ability to get data based upon its hash minimises the number of handshakes and reduces the time taken to retrieve it from the database.
- Each DHT node only needs to maintain responsibility for a small hash range; by automatically balancing load across a distributed server cloud, the average time to retrieve data is reduced drastically.
- Scalability
- The geographical command area for a cell is flexible, and thus can be adjusted to either city or country areas.
- For special events, extra cells and DHT nodes can be added temporarily to deal with any extra expected load.
- The system can be deployed at any level, from local suburb to national level, based simply upon the geographical area that the nodes need to cover and the DHT backup network it uses.
Elaborated Implementation Architecture
Cell Level
Network
This is the distributed network of servers that will provide data backup and coordination services for each individual node. The link will run TCP/IP for connectivity, and carry our API and data to the data handlers at the other end.
GPS
This is an off-the-shelf Global Positioning System end-unit that will give the location of the unit as a pair of longitude and latitude coordinates. It will have a standard API that lets us access those coordinates when needed. In our system we will develop an interface to simulate the input from the GPS system.
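A sketch of the GPS abstraction, assuming positions are latitude/longitude in decimal degrees: the vehicle application would depend only on the interface, so the simulated source used in the prototype could later be swapped for a real end-unit. `GpsSource`, `GeoPosition` and `SimulatedGps` are hypothetical names.

```java
// Latitude/longitude in decimal degrees (representation assumed for the sketch).
final class GeoPosition {
    final double latitude;
    final double longitude;

    GeoPosition(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }
}

// What the vehicle application needs from a GPS end-unit.
interface GpsSource {
    GeoPosition currentPosition();
}

// Prototype stand-in: replays a fixed route, one fix per call, then stays at the last point.
final class SimulatedGps implements GpsSource {
    private final GeoPosition[] route;
    private int index = 0;

    SimulatedGps(GeoPosition... route) {
        if (route.length == 0) {
            throw new IllegalArgumentException("route must contain at least one position");
        }
        this.route = route.clone();
    }

    @Override
    public synchronized GeoPosition currentPosition() {
        GeoPosition fix = route[index];
        if (index < route.length - 1) {
            index++;
        }
        return fix;
    }
}
```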
Operator GUI
This is an AWT or Swing GUI that will allow the operator to enter call information, find vehicles, allocate vehicles and view responses from vehicles. It will be custom produced to ensure it interfaces smoothly with the rest of the ERDS system.
In Vehicle GUI
This will be a basic AWT or Swing GUI that will run on a small computer in each vehicle. It will be touch-screen orientated to allow fast operation, and thus be simplistic and allow only basic functionality. In our system we will custom build the GUI and run it on a separate computer, but simulate input with a mouse and not a touch screen.
Dispatch Application Server
This will be a server running the components as multiple Java processes, which need to operate separately due to their different time constraints. It can be a single server as it will only have to support a single dispatch centre. This will be produced by modifying an off-the-shelf server to interface with the rest of the ERDS.
Data Handler
This will be a threaded process that allows the many different types of data, or requests for data, to be processed and appropriately formatted for the database. It will have to be custom built in order to translate the various pieces of data. It will connect to the Database using Java Database Connectivity (JDBC), because the rest of the system will be developed in Java.
Database
This will be an off-the-shelf database. Oracle would be preferred for its reliability and capability, but cost limitations mean we will use MySQL instead. It will receive requests from the Data Handler in SQL form.
Distributed Network Level
Server Manager
The Server Manager is the core execution process of the DHT node. It is responsible for calculating the hash value that positions the DHT node in the region of most need. To do this, it retrieves a list of neighbours from the Neighbour Monitor process and inserts itself where the gap between neighbouring hashes is greatest across the network as a whole.
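Interpreting the Server Manager's placement rule as taking the midpoint of the widest gap between neighbouring hashes, the calculation might look like the following. The 2^32 ring size and the `HashPlacement` name are assumptions chosen for readability; the gap that wraps round from the highest hash to the lowest is included.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical hash placement: join the ring at the midpoint of the widest gap.
final class HashPlacement {
    static final long RING = 1L << 32; // assumed hash space size

    static long chooseHash(List<Long> neighbourHashes) {
        TreeSet<Long> sorted = new TreeSet<>(neighbourHashes);
        if (sorted.isEmpty()) {
            return 0L; // first node on the network: any position will do
        }
        // Start with the wrap-around gap from the highest hash back to the lowest.
        long bestStart = sorted.last();
        long bestGap = sorted.first() + RING - sorted.last();
        Long prev = null;
        for (long h : sorted) {
            if (prev != null && h - prev > bestGap) {
                bestGap = h - prev;
                bestStart = prev;
            }
            prev = h;
        }
        return (bestStart + bestGap / 2) % RING;
    }
}
```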
Synchronise DHT
This component is responsible for ensuring synchronisation between all the DHT nodes within the network. It is primarily used when rebooting a DHT node, to populate the node before joining it to the greater network. It is also used periodically to ensure that the node remains up to date.
Neighbour Monitor
The Neighbour Monitor polls the network over a multicast stream to ensure that each node has an accurate list of all DHT nodes within the network, along with their respective hashes and IP addresses. This process then makes this data available to other processes as an ordered list of neighbours.
Active Listener
This process listens for requests that arrive from the Cell level and responds accordingly. If the request is for handshaking, the module responds by itself to confirm the handshake. If the request is for data from the database, the process passes the query to the DHT Data Processor before sending the reply back to the client.
DHT Data Processor
The DHT Data Processor translates XML-formatted queries that arrive from other parts of the system into SQL queries that can be passed on to the database, and vice versa. It is the only interface to the database, having been developed this way to allow easy switching of database management systems (e.g. SQLite to Oracle).
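A sketch of the XML-to-SQL step using the JDK's built-in DOM parser. The XML vocabulary (`<query table="..."><where column="..." value="..."/></query>`), the table whitelist and the `XmlQueryTranslator` name are invented for illustration. Values are returned as separate parameters so that they can be bound through a JDBC `PreparedStatement` rather than concatenated into the SQL string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical translator from an XML query document to parameterised SQL.
final class XmlQueryTranslator {
    static final class SqlQuery {
        final String sql;
        final List<String> params; // to be bound in order with PreparedStatement.setString

        SqlQuery(String sql, List<String> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    private static final Set<String> ALLOWED_TABLES = Set.of("cases", "vehicles");

    static SqlQuery translate(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DTDs so untrusted input cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed query XML", e);
        }
        String table = root.getAttribute("table");
        if (!ALLOWED_TABLES.contains(table)) {
            throw new IllegalArgumentException("unknown table: " + table);
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        List<String> params = new ArrayList<>();
        NodeList wheres = root.getElementsByTagName("where");
        for (int i = 0; i < wheres.getLength(); i++) {
            Element w = (Element) wheres.item(i);
            String column = w.getAttribute("column");
            if (!column.matches("[A-Za-z_]+")) { // identifiers cannot be bound, so validate them
                throw new IllegalArgumentException("bad column: " + column);
            }
            sql.append(i == 0 ? " WHERE " : " AND ").append(column).append(" = ?");
            params.add(w.getAttribute("value"));
        }
        return new SqlQuery(sql.toString(), params);
    }
}
```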
SQLite Database
As specified above, this will be a generic off-the-shelf database running on SQLite. It accepts and returns queries in an SQL format. On a network level, this component stores case and vehicle data, for use when generating reports, creating & refreshing nodes, or to perform system backups.
Mapping of Conceptual to Execution to Implementation
Cell Level
| Conceptual Component | Execution Component | Implementation Component |
| In Vehicle GUI | Vehicle Application | In Vehicle GUI |
| GPS | Vehicle Application | GPS |
| Connection Tracker | Vehicle Application | Connection Tracker |
| Vehicle Handler | Vehicle Handler | Vehicle Handler |
| Current Vehicle Position & Status | Database | Database |
| Vehicle Movement | Database | Database |
| Current Cases | Database | Database |
| Type Filter | Route Calculator | Route Calculator |
| Route Finder | Route Calculator | Route Calculator |
| Operator GUI | Operator Application | Operator GUI |
| Remote Network Interface | Backup and DHT Interface | Backup Manager |
DHT Level
| Conceptual Component | Execution Component | Implementation Component |
| DHT Hash & Query Listener | DHT Hash & Query Listener | Active Listener |
| Query Handler | DHT Hash & Query Listener | DHTDataProcessor |
| Case Store | Datastore | Case Data Model |
| DHT Server Node Synchroniser | DHT Server Node Synchroniser | SynchroniseDHT |
| Update Status | Server Manager | Server Manager |
| Admin Panel | Server Manager | HTTP Port |
| Neighbour Monitor | Neighbour Monitor | Neighbour Monitor |
| Neighbour Store | Neighbour Monitor | Neighbour Data Model |
Architectural Decisions and Justifications
- Reliability
- The system has no single point of failure in the overall structure
- Performance
- The performance overhead of translating data to XML was accepted, as other options such as serialising objects would cause major maintenance issues.
- Use of hashing to distribute server load reduced the incidence of overburdened servers delaying execution of queries.
- Scalability
- As the connections between the vehicle and the cell, and the connections between the cell and the DHT backup network are loosely coupled, this overall architecture can be extended to almost any scale.
- Decoupling the application layer from the database layer via the JDBC driver allows for the use of a threaded database system if the non-threaded SQL-Lite becomes a problem.
- Configurability
- Due to the 3 tier architectural style and clear interfaces between components the inner workings of the components can be switched out easily. This includes the Database (which would probably be switched to Oracle) and the Network Layer (implementing more security and TCP).
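The load-distribution decision above relies on Chord-style consistent hashing. The sketch below is a simplified, hypothetical Java illustration:

- Node identifiers are supplied directly.
- Each node owns the keys from its predecessor (exclusive) up to its own identifier (inclusive).
- Finger tables and the stabilisation protocol are omitted.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified Chord-style ring: key placement only, no finger tables. */
class HashRing {
    private final int ringSize;                                     // size of the identifier space, e.g. 2^m
    private final TreeMap<Integer, String> nodes = new TreeMap<>(); // node ID -> node address

    HashRing(int ringSize) {
        this.ringSize = ringSize;
    }

    void addNode(int id, String address) {
        nodes.put(Math.floorMod(id, ringSize), address);
    }

    /** When a node leaves, its range is absorbed by its successor automatically. */
    void removeNode(int id) {
        nodes.remove(Math.floorMod(id, ringSize));
    }

    /** The successor of a key is the first node at or after it, wrapping round the ring. */
    String successor(int key) {
        if (nodes.isEmpty()) {
            throw new IllegalStateException("ring is empty");
        }
        Map.Entry<Integer, String> entry = nodes.ceilingEntry(Math.floorMod(key, ringSize));
        return (entry != null ? entry : nodes.firstEntry()).getValue();
    }
}
```

A key always maps to the next node clockwise. Removing a node therefore hands its range to a single successor rather than reshuffling every key. This is the property that lets the backup network heal itself while keeping load spread across nodes.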
Constraints on Development
When considering how the executable prototype will be developed, the following constraints must be taken into account:
- The prototype will have to run on the Engineering Faculty computers
- Open-source software, or software licensed by the Engineering Faculty, is strongly preferred
- The strongest programming language Team Wunuontoo have in common is Java
- Development must take place in Eclipse and use the SVN repository
- All developers must be involved (which places limits on complexity)
- Development must be completed in 7 weeks
Design Decisions for Implementation Components
Off the Shelf Components
- Digester
  - This is a library developed as part of the Apache Commons collection.
  - It is used to read XML documents into Java objects.
  - This meant we did not have to serialise objects to pass them across the network or between processes on different compute nodes.
  - Writing XML documents from object data is also straightforward.
  - While this made the executable prototype faster and easier to test, it comes with a performance cost.
  - If the drop in performance is significant, object serialisation could be used instead. Alternatively, the number of Object->XML->Object transformations could be reduced at the cost of increased coupling.
- SQLite
  - This was used as the database backend because of its simplicity and price (free).
  - It has a JDBC driver that makes it easy to execute commands and retrieve result sets.
  - It could be run on the Engineering computers without any installation.
  - Its drawback is limited support for concurrent access from multiple threads, so it might not cope with the true load of a cell.
  - A better solution would be to use Oracle or a similar large database system.
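The JDK's own DOM and Transformer APIs can illustrate the Object->XML->Object round trip described for Digester. This is a swapped-in technique, not Digester's API, and the `CaseXml` class and its fields are hypothetical. It simply shows the two transformations whose performance cost is discussed above.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical case message used to illustrate Object -> XML -> Object. */
class CaseXml {
    final String caseId;
    final String location;
    final int priority;

    CaseXml(String caseId, String location, int priority) {
        this.caseId = caseId;
        this.location = location;
        this.priority = priority;
    }

    /** Object -> XML using a DOM tree and the identity Transformer (escaping is automatic). */
    String toXml() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("case");
            root.setAttribute("id", caseId);
            root.setAttribute("priority", Integer.toString(priority));
            Element loc = doc.createElement("location");
            loc.setTextContent(location);
            root.appendChild(loc);
            doc.appendChild(root);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not write case XML", e);
        }
    }

    /** XML -> Object: parse the document and read the fields back. */
    static CaseXml fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            String location = root.getElementsByTagName("location").item(0).getTextContent();
            return new CaseXml(root.getAttribute("id"), location,
                    Integer.parseInt(root.getAttribute("priority")));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed case XML", e);
        }
    }
}
```

Every message therefore pays for building a document tree, writing it out as text and parsing it again. That is the performance hit weighed against the maintainability of a readable text format.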
Custom Components
Custom components had to be built for functionality specific to our system, such as the GUIs and the DHT-related functions.
- XML Digester
  - This was a set of configurations adapting the Digester package listed above to our XML document format.
- Query Handler
  - This translated the various types of query into SQL, so it had to be adapted both to Digester-style input and to the SQLite database.
- Route Calculator
  - This found which vehicle could respond fastest to an emergency, and was developed exclusively for this system.
- Vehicle Handler
  - This was a relatively simple component that mapped each VehicleID to an IP address. It was faster to write ourselves than to adapt an existing component.
- In Vehicle GUI
  - This was necessarily custom to our system, displaying our specific information and allowing the entry of certain commands. It was, however, based on the Swing/AWT libraries.
- Connection Tracker
  - Like the Vehicle Handler, this was a fairly simple component that tracked which node the vehicle was attached to, so it was faster to write ourselves.
- Operator GUI
  - Like the In Vehicle GUI, this was customised to display our specific information and allow the entry of certain commands. It was also based on the Swing/AWT libraries.
- Backup Manager
  - This had to be custom-built because it interfaces with our DHT backup network to perform backups and recoveries.
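The following Java sketch roughly illustrates the Type Filter and Route Finder steps behind the Route Calculator. It picks the closest available vehicle of the requested type. It assumes straight-line (haversine) distance as a proxy for response time; a production version would presumably use road-network travel times. The `Vehicle` fields and class names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the Type Filter and Route Finder steps. */
class RouteCalculator {

    static final class Vehicle {
        final String id;
        final String type;        // e.g. "ambulance", "fire"
        final boolean available;  // false while assigned to another case
        final double lat;
        final double lon;

        Vehicle(String id, String type, boolean available, double lat, double lon) {
            this.id = id;
            this.type = type;
            this.available = available;
            this.lat = lat;
            this.lon = lon;
        }
    }

    private static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in kilometres using the haversine formula. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Type Filter: keep available vehicles of the right type; Route Finder: take the nearest. */
    static Optional<Vehicle> closest(List<Vehicle> fleet, String type, double caseLat, double caseLon) {
        return fleet.stream()
                .filter(v -> v.available && v.type.equals(type))
                .min(Comparator.comparingDouble(v -> distanceKm(v.lat, v.lon, caseLat, caseLon)));
    }
}
```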
Architecture Evaluation
To properly evaluate the architecture of this system, an explanation of the prototype is necessary.
Prototype Analysis
In terms of functionality, our prototype was a success. Using the existing architecture, we built a prototype that simulated a small emergency network with several nodes spread across many different computers. These nodes represented vehicles, their corresponding emergency dispatch cells, and a series of hash-distributed backup nodes. They were able to locate each other, form a network and communicate using XML and a string-based protocol, developed by our team, that ran over UDP.
Addressing, backup and message handling were implemented using a self-healing hashing system based on the Chord distributed hash table, with each node responsible for a range of the hash space. Data storage was managed with an SQLite database. Each dispatch node was able to send case data to its assigned vehicles, and each vehicle node was able to respond to requests to confirm or deny whether it wanted to take a particular case. Interactions used XML to carry data and a string-based protocol to pass lightweight requests and responses (such as handshaking and vehicle authentication).
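The string-based protocol over UDP can be sketched as a single request/reply exchange. The `HELLO`/`WELCOME` message format and the `UdpHandshake` class are hypothetical stand-ins for the team's protocol. Both endpoints run on the loopback interface in one thread, which works because the operating system buffers each datagram until it is read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a string-based handshake over UDP on the loopback interface. */
class UdpHandshake {

    static void send(DatagramSocket socket, String message, InetAddress host, int port) throws IOException {
        byte[] data = message.getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(data, data.length, host, port));
    }

    static DatagramPacket receive(DatagramSocket socket) throws IOException {
        byte[] buffer = new byte[512];                 // requests are short, single-datagram strings
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        socket.receive(packet);                        // blocks until a datagram arrives or the timeout expires
        return packet;
    }

    static String text(DatagramPacket packet) {
        return new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
    }

    /** Runs one HELLO/WELCOME exchange and returns the reply the vehicle received. */
    static String handshake(String vehicleId, String cellId) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket cell = new DatagramSocket(0, loopback);
             DatagramSocket vehicle = new DatagramSocket(0, loopback)) {
            cell.setSoTimeout(2000);
            vehicle.setSoTimeout(2000);

            // Vehicle -> cell: announce itself
            send(vehicle, "HELLO " + vehicleId, loopback, cell.getLocalPort());

            // Cell: parse the request and reply to the sender's address and port
            DatagramPacket request = receive(cell);
            String[] parts = text(request).split(" ");
            String reply = (parts.length == 2 && parts[0].equals("HELLO"))
                    ? "WELCOME " + parts[1] + " " + cellId
                    : "ERROR bad-request";
            send(cell, reply, request.getAddress(), request.getPort());

            // Vehicle: read the cell's reply
            return text(receive(vehicle));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

UDP does not guarantee delivery. A real handshake would therefore also need retries and sequence checking, which is one reason TCP is proposed for some connections later in this evaluation.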
Positive Aspects of our Prototype
The strength of this prototype was that it adhered closely to most of the original quality attributes: reliability, performance, testability and scalability were all well covered. This is especially apparent for reliability. The prototype can repair itself if random nodes are disconnected, as long as one emergency dispatch cell remains operational. In addition, thanks to the design of the DHT network, the prototype can scale to support a large number of nodes. At present, our prototype hash algorithm can support up to 800 different nodes; with a standard hashing algorithm such as SHA-1, the prototype could support an order of magnitude more.
Testing the prototype was fairly easy because only a few parts of the system process user-entered data (such as creating a hash). Most data was simply displayed or sent to the database, so problems with user input were minimal. Data-flow problems were easy to manage and track down, since each module was self-contained.
Finally, using the 3-tier rich client architectural style allowed us to divide responsibilities by tier (presentation, application and database) once the interfaces were defined. This meant we could develop in parallel, unit test, and then perform integration testing, rather than having to test integration as development progressed.
Negative Aspects of our Prototype
A weakness of this prototype is that several sections of the code became more complex than anticipated. In particular, the DHT, the network layer and the various data handlers grew more complex than originally envisaged, because our development constraints denied us access to more powerful off-the-shelf components. Digester also added significant overhead to the design process. It proved temperamental to use, accepting only very specifically formatted XML, which required several ad-hoc workarounds.
Another weakness was performance testing, which was hard to do without automated tests. Most testing relied on scripts that generated large numbers of nodes, vehicles, etc. to see how many simultaneous connections were possible. Automated testing could not be implemented within a reasonable time frame because of configuration and complexity issues.
Furthermore, operator-level security was deferred as a concern for an external interface and was not fully considered in the prototype. We felt it could be set aside until later, since the prototype did not need fully implemented operator authentication for demonstration purposes.
Finally, writing our own web server and web interface was unnecessary; a simple off-the-shelf web server such as Jetty could have been used instead. The time spent writing this part of the prototype could have been better spent elsewhere. However, because the information displayed and entered through our server was simple, no functionality was lost overall.
Architecture Analysis
Overall, our architecture met the needs of the primary quality attributes (performance and reliability) and achieved scalability in the process. This was primarily due to the extensive analysis phase, during which we carefully examined the problem, the requirements and the relevant quality attributes to inform our development decisions. The team's skills and its willingness to research existing options also contributed to the architecture's success. Together they allowed us to overcome the complexity of the problem and build the executable prototype.
Positive Aspects of our Architecture
The primary factor that allowed our architecture to achieve performance and reliability was the division of tasks within the system. Specifically, resource management was handled at the cell level and the backup network was kept as a separate entity, which allowed the two to be developed concurrently.
This also meant that system resources could be split across many local cells. Doing so lowers the performance requirements on each cell and enhances reliability, because the cells overlap and can temporarily take over responsibilities for other areas. To ensure interoperability and successful transfer of data in the event of failure, a Distributed Hash Table was used, which was both fast (due to hashing) and reliable (self-healing).
Furthermore, using the 3-tier architectural style in the cell and DHT architectures helped development by further dividing responsibilities into clear units. This reduced coupling and enhanced cohesion. It also ensures that, once deployed, the failure of one component will not stop the overall system from executing.
This also produced a modularity that would ease implementation changes and even some architectural changes. For instance, our choice of database component affects only the query handler and the JDBC driver, since only these two components interact with the database directly. The choice does not matter to the rest of the system. Similarly, our network layer implementation could be altered internally to support improved network security or encryption, without altering any of its connections to other components.
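A small Java sketch shows the modularity argument: callers depend on a storage interface, and only its implementation knows which database is behind it. `CaseStore`, `InMemoryCaseStore` and `CaseService` are hypothetical names. An SQLite- or Oracle-backed class using JDBC would implement the same interface but is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical storage interface: only implementations know which DBMS is used. */
interface CaseStore {
    void saveStatus(String caseId, String status);
    Optional<String> findStatus(String caseId);
}

/** Stand-in implementation; a JDBC-backed SQLite or Oracle version would implement the same interface. */
class InMemoryCaseStore implements CaseStore {
    private final Map<String, String> statuses = new HashMap<>();

    public void saveStatus(String caseId, String status) {
        statuses.put(caseId, status);
    }

    public Optional<String> findStatus(String caseId) {
        return Optional.ofNullable(statuses.get(caseId));
    }
}

/** A caller written against the interface is unaffected when the store is swapped. */
class CaseService {
    private final CaseStore store;

    CaseService(CaseStore store) {
        this.store = store;
    }

    void closeCase(String caseId) {
        store.saveStatus(caseId, "closed");
    }

    boolean isOpen(String caseId) {
        return store.findStatus(caseId).map("open"::equals).orElse(false);
    }
}
```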
Beyond achieving the quality attributes, the architecture also closely matched the system constraints and implementation considerations. The decision to divide the ERDS into a series of area-based cells mirrors how the Telstra emergency network operates (calls are taken on a national level, then passed down to the relevant local level command) and how the emergency services allocate their units to an emergency (allocation is based on distance, currently assigned tasks and resources).
Negative Aspects of our Architecture
This architecture rests on a number of assumptions that, if they do not hold, could alter its implementation. First, it assumes that the control of emergency resources is cellular in nature. Second, it assumes a direct relationship between the command cells and the physical infrastructure commanding their assets. If this relationship were not maintained, or the two did not directly overlap, the connection protocol for the vehicles would become more complex. It would probably require tighter integration of the GPS and the Connection Tracker, which could come at the cost of modularity.
Also, while the components were fairly simple due to the division of tasks, the overall system was fairly complex, with a client cell dynamically connecting to various DHT nodes as needed. This made both implementation and some of the full-system testing more complex, especially where the DHT nodes were involved. However, once their operation was verified, very few changes would need to be made, since each component was modular enough to remain unaffected by most changes to the rest of the system.
There is also a large amount of redundancy in the system, especially at the cell deployment level, where both the backup and application services are replicated in a failover configuration. This would considerably raise the cost of implementing the system, both financially and in terms of system resources. However, given the nature of the project, the government would be strongly inclined to approve such spending if the deployment cost could be properly justified. Furthermore, the system would slightly ease the demands on operators and emergency crews and help save lives, so this cost would be fully justified.
Finally, while our prototype explored many issues and concepts of the full system, some implementation details need to change. These include implementing a more reliable database system, adding further security (especially on wireless links), and using TCP as the transport layer for some connections. However, thanks to the overall architectural design and its coherent division of responsibilities, these changes will not require an architectural overhaul.
Conclusions
When designing the architecture, we kept the scale and purpose of the system in mind throughout the process. We realised that, for the system to perform to expectations, the scenario would need to be explored in great depth. This extensive analysis paid off in a functional prototype, a solid architecture and a system design that could be repurposed for a wide range of emergency response scenarios. The trade-off comes in the form of system complexity and redundancy, as well as a large overall project size.
Overall, though, we believe that we have designed a suitable architecture for the design and creation of an Emergency Response Dispatch System. In our opinion, the negative aspects of the system are heavily outweighed by the positive aspects, and this is a viable architecture for future development.
Future Considerations
Hashing Considerations
Full system implementation would need an industry-standard hashing algorithm such as SHA-1, together with a consistent way of hashing the data. SHA-1 produces 160-bit digests, giving 2^160 possible hash values. This should be sufficient, since data will be moved off the system after a certain period of time. In fact, this should be ample for data from a nation-wide network. Switching to a different hashing algorithm should be simple due to component modularity.
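A consistent hashing scheme mainly requires every node to encode keys identically before hashing. The hypothetical helper below assumes UTF-8 key encoding. It derives a 160-bit identifier with the JDK's `MessageDigest` and can optionally reduce it to a smaller m-bit ring.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derives a DHT identifier from a key using SHA-1. */
class DhtKeys {

    /** Full 160-bit identifier; the key is always encoded as UTF-8 so every node agrees. */
    static BigInteger sha1Id(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(key.getBytes(StandardCharsets.UTF_8)); // always 20 bytes
            return new BigInteger(1, digest);                      // signum 1: treat digest as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    /** Reduces the identifier to an m-bit space (m <= 160), e.g. for a smaller test ring. */
    static BigInteger idInSpace(String key, int m) {
        return sha1Id(key).mod(BigInteger.ONE.shiftLeft(m));
    }
}
```

Treating the digest as unsigned keeps identifiers in the range 0 to 2^160 - 1, so they order consistently around the ring.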
Security Considerations
On the cell level, the Operator GUI will primarily be secured through physical security measures (e.g. photo ID checks, passwords, fingerprints). If deemed necessary, an authentication server can be added for local authentication.
Vehicle-to-cell communication will be conducted over some form of wireless link, which is inherently insecure. To stop unauthorised users eavesdropping, a vehicle connecting to a node will exchange a private 3DES/AES session key with it, protected by PGP-style public/private key pairs. The cell structure also helps to stop or reduce the effects of denial-of-service attacks by allowing the vehicle to connect to another node.
For DHT-to-cell communication, unauthorised users will be prevented from listening to traffic by the use of single-mode fibre, which is very difficult to "sniff" without physically disturbing the medium. This reduces the need for encryption. Encryption can still be added using dedicated 3DES/AES encryption/decryption devices at each end of the links.
The failure of individual DHT nodes, or even a large number of them, is not a major concern, as the DHT can operate effectively with a variable number of nodes. The threat of information disclosure is mainly handled by the closed nature of the network, which limits unauthorised access. Preventing unauthorised data modification will depend on how data is entered into the database.
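The key-exchange idea can be sketched with the standard Java Cryptography Architecture: a public/private key pair protects a fresh symmetric session key, as in PGP. This is not PGP itself. It uses RSA-OAEP to wrap a 128-bit AES key and AES-GCM for the payload. It also assumes the vehicle already holds a trusted copy of the cell's public key. Class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch of hybrid encryption: RSA protects a per-connection AES key. */
class SessionKeyExchange {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Simulates one connection and returns the message as decrypted by the cell. */
    static String exchangeAndSend(String message) {
        try {
            // Cell: long-term key pair (assumed: the vehicle already trusts the public half)
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
            kpg.initialize(2048);
            KeyPair cell = kpg.generateKeyPair();

            // Vehicle: create a fresh AES session key and wrap it with the cell's public key
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            SecretKey sessionKey = kg.generateKey();
            Cipher wrapper = Cipher.getInstance(RSA);
            wrapper.init(Cipher.WRAP_MODE, cell.getPublic());
            byte[] wrappedKey = wrapper.wrap(sessionKey);

            // Vehicle: encrypt the payload with AES-GCM using a random 96-bit nonce
            byte[] iv = new byte[12];
            RANDOM.nextBytes(iv);
            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, sessionKey, new GCMParameterSpec(128, iv));
            byte[] ciphertext = enc.doFinal(message.getBytes(StandardCharsets.UTF_8));

            // Cell: unwrap the session key with its private key, then decrypt and verify the tag
            Cipher unwrapper = Cipher.getInstance(RSA);
            unwrapper.init(Cipher.UNWRAP_MODE, cell.getPrivate());
            Key recovered = unwrapper.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY);
            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, recovered, new GCMParameterSpec(128, iv));
            return new String(dec.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

AES-GCM also authenticates the ciphertext, so tampering on the wireless link makes decryption fail rather than yield altered data. Distributing and verifying the cells' public keys remains a separate design problem.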
Data Handling Considerations
Once case data in the DHT has been in a closed state for several hours, it should be moved to other storage so as not to clog the system with unnecessary data. However, the mechanism for doing this is not explicitly defined.
This mechanism would need to be designed, and government data-handling procedures researched to ensure that the system can comply with them. These, however, are implementation issues.
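One possible shape for the undefined archival mechanism is a periodic sweep that selects cases closed for longer than a retention window. The sketch below is purely hypothetical: the `CaseEntry` fields, the window and the class names are assumptions. It leaves out the actual move to long-term storage and any government retention rules.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: selects closed cases that have exceeded a retention window. */
class ArchiveSweep {

    static final class CaseEntry {
        final String caseId;
        final boolean closed;
        final Instant closedAt;   // null while the case is still open

        CaseEntry(String caseId, boolean closed, Instant closedAt) {
            this.caseId = caseId;
            this.closed = closed;
            this.closedAt = closedAt;
        }
    }

    /** Returns the IDs of cases that should be moved to long-term storage at time {@code now}. */
    static List<String> dueForArchive(List<CaseEntry> cases, Duration retention, Instant now) {
        Instant cutoff = now.minus(retention);
        List<String> due = new ArrayList<>();
        for (CaseEntry c : cases) {
            // Only closed cases whose closing time is at or before the cutoff qualify
            if (c.closed && c.closedAt != null && !c.closedAt.isAfter(cutoff)) {
                due.add(c.caseId);
            }
        }
        return due;
    }
}
```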
Failover Configuration Considerations
Particularly in the cell level architecture, the Application and Backup Servers are replicated to indicate multiple servers ready to take over in the event of the primary failing.
The mechanism for this is not defined. However, it would probably follow the methods used to run firewalls and routers in failover mode. Research and testing in this area are needed to determine how this can be done to increase the reliability of the system.
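Firewall and router failover pairs typically rely on periodic heartbeats; VRRP, for example, uses periodic advertisements from the active router. A minimal, hypothetical Java sketch of that idea for a standby application or backup server follows. Timestamps are passed in explicitly so the logic can be tested without a real clock, and split-brain handling is not addressed.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of heartbeat-based failover between a primary and a standby server. */
class FailoverMonitor {
    enum Role { STANDBY, ACTIVE }

    private final Duration timeout;
    private Instant lastHeartbeat;
    private Role role = Role.STANDBY;

    FailoverMonitor(Duration timeout, Instant start) {
        this.timeout = timeout;
        this.lastHeartbeat = start;
    }

    /** Called whenever a heartbeat from the primary arrives. */
    void onHeartbeat(Instant at) {
        lastHeartbeat = at;
    }

    /** Called periodically; the standby promotes itself once the primary has been silent too long. */
    Role check(Instant now) {
        if (role == Role.STANDBY && Duration.between(lastHeartbeat, now).compareTo(timeout) > 0) {
            role = Role.ACTIVE;
        }
        return role;
    }
}
```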
References & Errata
Some of the references used in the design of the system:
- A Software Architecture Primer
- Computer-Assisted Dispatch (Wikipedia): http://en.wikipedia.org/wiki/Computer-assisted_dispatch
- XMLDigester: http://commons.apache.org/digester/
- Prototyping Screens: Team_Wunuontoo:_PrototypeScreens
- Port Numbering Conventions: Team_Wunuontoo:_Port_Numbering
- Coding and other Development Standards: Team_Wunuontoo:_Project_Plans#Milestone_2
Instructor comments
A fantastic start! I really like the way you have presented your identification of risks, enablers and constraints within a narrative text. 28th Aug Lian