Responding to Computer Emergencies:
Triage, Expertise, Escalation and Tracking
Summary of a week-long online discussion with the following team members as participants:
Stuart Cate | Grace Dalton | Timothy Dzierzek |
James Franklin | Gary Hummel | |
Carolyn Keith | Saravanan Muthuswamy | Francois Pelletier |
Peter Reganti | Niklaus Schild | Jeffrey Smyth |
Richard Tuttle | Stephen Watson | Mani Akella |
Discussion Moderator :
MSIA Seminar 5 – Week 3 Discussion
Introduction
The Week 3 discussions about triage, expertise, escalation, and tracking in the Computer Security Incident Response Team (CSIRT) Management course of the Norwich University MSIA program proved insightful. This article summarizes the discussions of that week.
The cohort’s collective work experience in the Information Security (IS) arena is wide in scope. The group includes members from financial, chemical, legal, educational, government, military, and consumer-industry organizations. This diverse background gives the discussion forum a range of viewpoints and modus operandi beyond those found within a single, narrow professional work environment.
The discussion summary comprises the following topics:
· Triage and Escalation
· Problem-Tracking Software
· Triage and internal politics
Triage and Escalation
Group consensus is that triage is an important and required component of CSIRT operation. The size of the CSIRT must be proportional to the average number of incidents handled, which highlights the importance of maintaining historical statistical data on incident response. Because we cannot accurately predict how often incidents will occur, triage becomes vital to effective CSIRT operation. By categorizing each incident accurately, triage makes it possible to understand the incident’s effects and to assign it a priority upon arrival at the CSIRT desk. CSIRT management then follows the incident throughout its lifecycle and proactively adjusts its priority up or down as required. This process allows for effective resource utilization while responding to all incidents with a consistent IS focus.
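The categorize, prioritize, and re-prioritize cycle described above can be sketched as a small priority queue. This is only an illustrative sketch, not a process the cohort described in code: the category names, the priority numbers, and the TriageQueue class are all assumptions made for the example, and a real CSIRT would take them from its own policy.

```python
import heapq
from itertools import count

# Hypothetical category-to-priority mapping (lower number = more urgent).
# These values are assumptions for illustration, not from the discussion.
CATEGORY_PRIORITY = {
    "data_loss": 1,
    "intrusion": 2,
    "malware": 3,
    "policy_violation": 4,
}
DEFAULT_PRIORITY = 5  # uncategorized incidents wait longest


class TriageQueue:
    """Priority queue of incidents that supports re-prioritization."""

    def __init__(self):
        self._heap = []           # entries: [priority, sequence, incident_id]
        self._entries = {}        # incident_id -> its live heap entry
        self._sequence = count()  # tie-breaker: earlier entries come first

    def _push(self, incident_id, priority):
        entry = [priority, next(self._sequence), incident_id]
        self._entries[incident_id] = entry
        heapq.heappush(self._heap, entry)

    def report(self, incident_id, category):
        """Categorize a new incident and queue it at the mapped priority."""
        priority = CATEGORY_PRIORITY.get(category, DEFAULT_PRIORITY)
        self._push(incident_id, priority)
        return priority

    def adjust(self, incident_id, new_priority):
        """Move a queued incident up or down (KeyError if it is not queued)."""
        stale = self._entries.pop(incident_id)
        stale[2] = None  # mark the old heap entry as stale; it is skipped later
        self._push(incident_id, new_priority)

    def next_incident(self):
        """Return (incident_id, priority) for the most urgent incident, or None."""
        while self._heap:
            priority, _, incident_id = heapq.heappop(self._heap)
            if incident_id is not None:
                del self._entries[incident_id]
                return incident_id, priority
        return None


queue = TriageQueue()
queue.report("INC-1", "malware")
queue.report("INC-2", "policy_violation")
queue.report("INC-3", "data_loss")
queue.adjust("INC-2", 1)  # management escalates INC-2 during its lifecycle

order = []
while (item := queue.next_incident()) is not None:
    order.append(item)
```

In this sketch the escalated incident INC-2 is handled right after INC-3, which already held the top priority, and ahead of INC-1. Marking stale entries instead of searching the heap keeps re-prioritization cheap, which matters if priorities change often.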
For this cohort, many represented organizations do not have a separate formal CSIRT. Instead, organizations use the IT Help Desk (HD) and associated incident escalation process to perform CSIRT response functions. For those cases where a separate CSIRT exists, organizations often utilize a single HD as point-of-contact (POC) for all incidents. HD staff then use the triage process to assign the incident response to the appropriate functional team.[1]
The prime business of the organization takes the leading role in determining the response and escalation process. For example, credit card data loss is a high-priority incident for a financial organization. For these organizations, the response activity affects, and possibly stops, all other CSIRT members’ work tasks until the incident is resolved. For a retailer, the same data loss may affect only the transaction and sales functional area. Management attention to the incident parallels the group response: managers view the incident in terms of its effect either on the entire organization or on the individual group it disrupts.
Policy and procedure definition, combined with the rigor with which the organization implements them, greatly affects the frequency and quantity of incidents an organization faces. Triage-based incident statistics help the organization understand the big picture, which in turn helps it refine policies and procedures to achieve continuous improvement. Another important point was that while the core process can remain consistent across organizations, policy, the actual escalation path, and individual incident priorities will always remain specific to each organization’s needs, functions, and structure.
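As a rough illustration of triage-based incident statistics, the sketch below tallies a triage log by category and by month and computes an average monthly load. The log entries, dates, category names, and function names are all hypothetical; the cohort did not specify which statistics to keep.

```python
from collections import Counter
from datetime import date

# Hypothetical triage log: (date reported, category, priority assigned at triage).
incident_log = [
    (date(2024, 1, 8), "malware", 3),
    (date(2024, 1, 19), "data_loss", 1),
    (date(2024, 1, 27), "malware", 3),
    (date(2024, 2, 2), "malware", 3),
    (date(2024, 2, 14), "policy_violation", 4),
    (date(2024, 3, 5), "malware", 3),
]


def incidents_per_category(log):
    """Count incidents by the category assigned at triage."""
    return Counter(category for _, category, _ in log)


def incidents_per_month(log):
    """Count incidents by (year, month) of the report date."""
    return Counter((reported.year, reported.month) for reported, _, _ in log)


def average_monthly_load(log):
    """Average incidents per month, over months with at least one incident."""
    per_month = incidents_per_month(log)
    return sum(per_month.values()) / len(per_month) if per_month else 0.0


by_category = incidents_per_category(incident_log)
by_month = incidents_per_month(incident_log)
monthly_load = average_monthly_load(incident_log)
```

With this sample data, malware dominates the per-category count, which is the kind of signal that could prompt a policy or procedure review. The monthly average is one possible input when sizing a team against historical incident volume.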
Cohort members agree that training is vital to successful CSIRT operation. Because the HD is the POC, CSIRT-provided training ensures that HD staff captures all relevant information when taking the incident report. The training also ensures that the triage process functions appropriately. In addition, the training helps ensure that the response team captures all relevant information and evidence in a forensically correct fashion to preserve the chain of evidence.
The quote below draws an interesting parallel between triage in a medical emergency and triage in a CSIRT. While the individual processes may differ, it shows that the core thinking is the same.
“It (triage) is a wonderful system in emergency scenarios, and adapts well to Computer Emergency Response. Now triage generally comes into play when you have a lot of casualties, although it is also done whenever you have multiple patients. Generally, you prioritize your patients. You have those that can wait, those who need emergency and immediate care, and those who are too far gone to bother helping. It seems cruel, but to save some people you can't bother treating those who are going to die anyways.
So, you do a quick evaluation of each patient. Can they wait in the treatment area? Do they need to be treated before they are shipped, or do they need to be loaded in the helicopter and shipped immediately?
CSIRT can benefit from such an arrangement. During busy times and major incidents you need to prioritize your responses so that you can make the best use of your time. What systems and incidents need treating immediately and which can wait until you can get to them? After all you have to seal the intrusion holes before you fix the servers, or you will just be doing it again later.
Triage is very appropriate in my opinion, and works well for most types of emergency response. Taking a few minutes to analyze the situation and prioritize your responses.”[2]
A response to this posting emphasizes the importance of triage to CSIRT operation:
“I think that no matter how great an organization's procedures are, every incident will be different. That point probably is obvious, but even with a single, simple incident, a CSIRT needs to look at and see how their procedures fit into the response. In a mass incidents, it gets much trickier. You have probably seen this in the medical side, though I hope not. There are not enough responders to go around. A CSIRT cannot possibly fix everything at once. So having a CSIRT that is skilled at triage is extremely important.”[3]
ENISA (European Network and Information Security Agency)[4] agrees: “Triage is an essential element of any incident management capability, particularly for any established CSIRT…This process can help to identify potential security problems and prioritize the workload.”
Problem-Tracking Software
Based on group postings, the most-used software for problem reporting and tracking is BMC’s Remedy, by a fair margin. The group also reported using Track-IT, Support Magic, Help Box (http://www.laytontechnology.com/pages/helpbox.asp), Heat, and, from the open-source world, OTRS (http://otrs.org/). However, cohort members report many issues with Remedy that make using it a fair travail at times. Given the large number of available tracking solutions, its popularity seems to indicate excellent marketing rather than spectacular design.
Some Remedy implementations lacked the web interface, which provides user interaction. Other postings decried the lack of an efficient GUI design, which requires organizations to customize their installation to fit their individual needs. One can interpret the lack of an efficient GUI design, coupled with the capability to customize, as both a success and a failure. It is a success because it shows clear recognition by BMC that individual organizations’ needs differ so widely that creating a single interface is a challenge. However, it is a failure because small organizations lack the work force, ability, or desire to customize COTS software, which reduces Remedy’s marketability. One member suggested that BMC could improve usability and product acceptance by providing three templates: complete (today’s default), help desk and asset management, and a single-screen help desk only.
An interesting sub-discussion focused on a case where one IT manager disbanded the HD after implementing user-facing HD software. The manager expected each user to report issues through the software, the software’s built-in triage function to route the issues to appropriate support teams, and both users and IT staff to monitor system reports to track status. This approach eliminated effective service to those users who could not or would not use the software. It also provided no capability for dynamic re-prioritization and no method to correct the routing of misreported issues.
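The missing capability can be made concrete with a short sketch: automatic routing on the user’s self-reported category, plus a manual override that lets a person correct a misreported issue and leaves an audit trail. The routing table, team names, Ticket fields, and function names are all hypothetical and do not describe any particular HD product.

```python
from dataclasses import dataclass, field

# Hypothetical routing table: self-reported category -> support team.
ROUTES = {
    "email": "messaging-team",
    "network": "network-team",
    "security": "csirt",
}
FALLBACK_TEAM = "help-desk"  # a person reviews anything the table cannot place


@dataclass
class Ticket:
    ticket_id: str
    reported_category: str
    team: str = ""
    history: list = field(default_factory=list)  # audit trail of routing decisions


def auto_route(ticket):
    """Route on the user's self-reported category, as software alone would."""
    ticket.team = ROUTES.get(ticket.reported_category, FALLBACK_TEAM)
    ticket.history.append(("auto", ticket.team))
    return ticket


def correct_route(ticket, new_team, reviewer):
    """Manual override by a human reviewer when the issue was misreported."""
    ticket.team = new_team
    ticket.history.append((reviewer, new_team))
    return ticket


# A user reports a phishing message as an ordinary email problem.
ticket = auto_route(Ticket("T-1", "email"))
# An HD analyst recognizes the security incident and reroutes it.
correct_route(ticket, "csirt", reviewer="hd-analyst")
```

Without something like correct_route and a person to call it, the ticket in this example would stay with the messaging team. That is the gap the cohort identified in the software-only approach.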
Triage and internal politics
Internal politics are a major consideration for any activity in the organization, especially for sensitive functions like the CSIRT. Because the CSIRT, by definition, affects the computer operations of the entire organization during the investigation process, its members may interact directly with many of the organization’s personnel over time. To somebody not intimately familiar with CSIRT operation, the brief interaction might seem more like an abrasive intrusion than a genuine effort to help.
This means that CSIRT members need to be consummate service-oriented personnel with well-developed communication skills. In addition to communication, the team members need to be very sensitive to the political nuances within an organization. They must be able to interpret the true import of any statement rather than taking it at face value. CSIRT members must be able to isolate themselves from political influences in their investigative process, in order to stay true to their objective and be effective in proper incident resolution.
The potential exists for internal politics to cause HD staff to misrepresent incident ticket priorities - the team needs to be able to recognize this and present it to their management for appropriate action. At the same time, team members need a healthy respect for authority limits. They must be conscientious in not over-stepping their bounds without appropriate reason and permission.
The team needs to be aware of the organization’s internal drivers and business objectives, and be able to give the triage, resolution, and analysis process the appropriate levels of attention and response. For a financial organization, the prime driver will be financial effect; for a military team, team safety or mission objectives rather than cost could drive priority.
For each organization, service offerings are weighted in light of their perceived relation to the primary business. Additionally, the team must all accept that a person's perceptions are their reality - whether or not they agree with the rest. This acceptance helps the team to respond accordingly and appropriately. Each proposal needs a business case. One posting provides the following example:
“What is an industrious network administrator who needs an IPS system to do? They can take the initiative to test Snort via the freeware route. Assuming good results, they write up a business case to purchase required hardware and software including support. For the operating system, they can chose say either Red Hat or Novell offerings that include support. For the IPS, they can include a Sourcefire quote. But, even if it is the best system at a low cost, it will not fly if the network administrator is the single point of failure in manning the system.”[5]
Another important aspect of internal politics vis-à-vis the CSIRT is managing the business teams. During an incident, it is important for the CSIRT to manage not only the technical aspects of the incident but also the personnel representing the various aspects of the business who may have vested interests in following the progress of the incident.
To quote cohort members:
“.. the politics is not in the triage process. It is in managing the business unit at the root of the incident. It’s a natural human reaction to want to protect your turf especially if you are at the root of the problem. These politics can be difficult to manage if people’s jobs are at stake.”
“From a disjointed perspective, it could be pointed out that business needs to be placed before personal considerations - however, this never seems to successfully happen in the real world.”
“..politics is closely, deeply interwoven into the fabric of our societies. However, in this case, granted that some parts of the 'parent structures' are more equal than the others, and always receive greater priority than the others - would you not accept that (apart from the extreme cases when we hop-step-jump to fix the CEO's son's games on a personal laptop) the simpler problems on the CEO's machine still have greater impact to the organization's working than perhaps a minor server crashing? Anything that has an impact on the parent structures' time has to be, in pure business value, of higher impact than large isolated technology failures.”[6]
“We could easily venture off into a discussion on political philosophy; I understand what you are saying. However, there is a psychological component missing in the value argument and I'm suggesting it is the psychology and not the value that drives behavior. This is the politics.”[7]
“C-Level positions have power. People respond to that power. From inside the company, when a C-level person asks for something the response is immediate and palpable because the C-level has power. That power can make or break a career and it can end a job. From outside the company, the board, stockholders, analysts, etc. may think the C-level person adds no value. Even if that view is held, from within the company people still respond because they want their job tomorrow and they may want to advance.
Value is determined outside by the market. People inside react to the power. Thus, the politics.”[8]
Conclusions
The cohort’s collective work experience in the Information Security (IS) arena is wide in scope. This diverse background provides the discussion forum with diverse viewpoints and different modus operandi beyond that considered within a narrow professional work environment. Group consensus is that triage is an important and required component for CSIRT operation. The size of the CSIRT must be proportional to the average number of incidents handled. The prime business of the organization takes the leading role in determining the response and escalation process. Cohort members agree that training is vital to successful CSIRT operation particularly in cases where the HD is the POC. One posting summarized the reasoning: “I think that no matter how great an organization's procedures are, every incident will be different.”[9]
Based on group postings, the most-used software for problem reporting and tracking is BMC’s Remedy, by a fair margin. The group also reported using Track-IT, Support Magic, Help Box (http://www.laytontechnology.com/pages/helpbox.asp), Heat, and, from the open-source world, OTRS (http://otrs.org/). However, cohort members report many issues with Remedy that make using it a fair travail at times. Given the large number of available tracking solutions, its popularity seems to indicate excellent marketing rather than spectacular design. One in-progress case study included an example where the IT manager took the unusual approach of disbanding the HD after implementing user-facing HD software. The manager’s expectation was that each user would use the software to report and monitor issues.
The group recognized that internal politics are a major consideration for sensitive functions like the CSIRT. Since the CSIRT, by definition, affects the computer operations of the entire organization during the investigation process, the team must consist of strong communicators to avoid negative perceptions. In addition to communication, the team members need to be very sensitive to the political nuances within an organization. They must be able to interpret the true import of any statement rather than taking it at face value, or worse, weighting the value based on the person’s position in the organization. CSIRT members must be able to isolate themselves from political influences in their investigative process, in order to stay true to their objective and be effective in proper incident resolution.
[1] Grance, Tim et al. (2004) Computer Security Incident Handling Guide. p. 3-14, quoted by Timothy Dzierzak in discussion
[2]
[3] Timothy Dzierzek, quoted from discussion
[4] ENISA (http://www.enisa.europa.eu/), A Step-by-Step Approach on How to Set Up a CSIRT, p. 49, as quoted by Gary Hummel in discussion
[5] Richard Tuttle, quoted from discussion
[6] Mani Akella, quoted from discussion
[7] James Franklin, quoted from discussion
[8] James Franklin, quoted from discussion
[9] Timothy Dzierzak, quoted from discussion