Friday, October 5, 2007

Politics, triage and CSIRTs

Below is the original version of an article on this subject that I worked on with my fellow student, Rick Tuttle, and that appeared in the Network World Security Strategies Newsletter (http://security-world.blogspot.com/2007_09_25_archive.html):

----------------------------------------------------------------------------------------------------------------------------------------

Responding to Computer Emergencies:
Triage, Expertise, Escalation and Tracking

Summary of a week-long online discussion with the following team members as participants:

Stuart Cate

Grace Dalton

Timothy Dzierzek

James Franklin

Gary Hummel

Stanley Jamrog

Carolyn Keith

Saravanan Muthuswamy

Francois Pelletier

Peter Reganti

Niklaus Schild

Jeffrey Smyth

Richard Tuttle

Stephen Watson

Mani Akella

Discussion Moderator : Mich Kabay


Norwich University

MSIA Seminar 5 – Week 3 Discussion

Introduction

The Week 3 discussions about triage, expertise, escalation, and tracking in the Computer Security Incident Response Team (CSIRT) Management course of the Norwich University MSIA program proved insightful. This article summarizes the discussions of that week.

The cohort’s collective work experience in the Information Security (IS) arena is wide in scope. The group includes members from financial, chemical, legal, educational, government, military, and consumer industry organizations. This background brings the discussion forum a range of viewpoints and operating practices broader than any single professional environment would provide.

The discussion summary comprises the following topics:

· Triage and Escalation

· Problem-Tracking Software

· Triage and Internal Politics

Triage and Escalation

Group consensus is that triage is an important and required component of CSIRT operation. The size of the CSIRT must be proportional to the average number of incidents handled, which highlights the importance of maintaining historical statistical data for incident response. Because we cannot accurately predict the frequency of incident occurrence, triage becomes vital to effective CSIRT operation. By providing accurate categorization of each incident, triage makes it possible to understand incident effects and to assign a priority to each incident upon arrival at the CSIRT desk. CSIRT management assumes the task of following the incident throughout its lifecycle and proactively adjusting its priority up or down as required. This process allows for effective resource utilization while responding optimally to all incidents with a consistent IS focus.
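
As a minimal illustration of the triage idea described above, here is a short Python sketch that assigns a priority to each incoming incident and lets that priority be re-scored as the incident evolves. The severity and impact categories, the weights, and the ticket examples are all assumptions made for illustration, not any particular organization's scheme.

  # Minimal triage sketch; categories, weights and tickets are illustrative only.
  from dataclasses import dataclass, field
  from datetime import datetime

  SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 5}                  # assumed scale
  IMPACT_WEIGHT = {"single_user": 1, "department": 3, "enterprise": 5}  # assumed scale

  @dataclass
  class Incident:
      ticket_id: str
      severity: str
      impact: str
      history: list = field(default_factory=list)

      def priority(self) -> int:
          # Higher score = handled sooner.
          return SEVERITY_WEIGHT[self.severity] * IMPACT_WEIGHT[self.impact]

      def rescore(self, severity=None, impact=None):
          # Triage is not one-shot: priority is adjusted throughout the lifecycle.
          self.history.append((datetime.now(), self.severity, self.impact))
          self.severity = severity or self.severity
          self.impact = impact or self.impact

  def triage_queue(incidents):
      # Work the highest-priority incidents first.
      return sorted(incidents, key=lambda i: i.priority(), reverse=True)

  if __name__ == "__main__":
      queue = [
          Incident("T-101", "high", "enterprise"),   # e.g. credit card data loss
          Incident("T-102", "low", "single_user"),   # e.g. a stuck print job
      ]
      for inc in triage_queue(queue):
          print(inc.ticket_id, inc.priority())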

For this cohort, many represented organizations do not have a separate formal CSIRT. Instead, organizations use the IT Help Desk (HD) and associated incident escalation process to perform CSIRT response functions. For those cases where a separate CSIRT exists, organizations often utilize a single HD as point-of-contact (POC) for all incidents. HD staff then use the triage process to assign the incident response to the appropriate functional team.[1]

The prime business of the organization takes the leading role in determining the response and escalation process. For example, credit card data loss is a high-priority incident for a financial organization. For these organizations, the response activity impacts, and possibly stops, all other CSIRT members’ work tasks until the incident is resolved. For a retailer, the same data loss may affect only the transaction and sales group functional area. Management attention to the incident parallels the group response, as management views the incident in terms of its effect either on the entire organization or on the individual group.

Policy and procedure definition, combined with the rigor with which the organization implements them, greatly affects the frequency and quantity of incidents an organization faces. Triage-based incident statistics help the organization understand the big picture. This factor, in turn, helps the organization refine policies and procedures to achieve continuous improvement. Another important point highlighted was that while the core process can remain consistent across organizations, both policy and the actual escalation and individual incident priority always remain specific to each organization’s needs, functions, and structure.
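
As a small, hedged sketch of how triage categories can feed the "big picture" statistics mentioned above (the categories and counts are invented for illustration), one could simply tally closed tickets by their triage category and review the trend against policy:

  # Sketch: summarize closed incidents by triage category to spot trends.
  # The categories and sample data are hypothetical.
  from collections import Counter

  closed_incidents = [
      {"category": "phishing", "quarter": "Q1"},
      {"category": "phishing", "quarter": "Q1"},
      {"category": "malware", "quarter": "Q1"},
      {"category": "lost_media", "quarter": "Q2"},
  ]

  by_category = Counter(i["category"] for i in closed_incidents)
  for category, count in by_category.most_common():
      # A rising count in one category is a prompt to revisit the related policy.
      print(category, count)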

Cohort members agree that training is vital to successful CSIRT operation. Because the HD is the POC, CSIRT-provided training ensures that HD staff captures all relevant information when taking the incident report. The training also ensures that the triage process functions appropriately. In addition, the training helps ensure that the response team captures all relevant information and evidence in a forensically correct fashion to preserve the chain of evidence.

The quote below draws an interesting parallel between the triage process for a medical emergency and the triage process for a CSIRT. While the individual processes may differ, it shows that the core thinking processes are the same.

“It (triage) is a wonderful system in emergency scenarios, and adapts well to Computer Emergency Response. Now triage generally comes into play when you have a lot of casualties, although it is also done whenever you have multiple patients. Generally, you prioritize your patients. You have those that can wait, those who need emergency and immediate care, and those who are too far gone to bother helping. It seems cruel, but to save some people you can't bother treating those who are going to die anyways.

So, you do a quick evaluation of each patient. Can they wait in the treatment area? Do they need to be treated before they are shipped, or do they need to be loaded in the helicopter and shipped immediately?

CSIRT can benefit from such an arrangement. During busy times and major incidents you need to prioritize your responses so that you can make the best use of your time. What systems and incidents need treating immediately and which can wait until you can get to them? After all you have to seal the intrusion holes before you fix the servers, or you will just be doing it again later.

Triage is very appropriate in my opinion, and works well for most types of emergency response. Taking a few minutes to analyze the situation and prioritize your responses.”[2]

Another response emphasizes the importance of triage to CSIRT operation:

“I think that no matter how great an organization's procedures are, every incident will be different. That point probably is obvious, but even with a single, simple incident, a CSIRT needs to look at and see how their procedures fit into the response. In mass incidents, it gets much trickier. You have probably seen this in the medical side, though I hope not. There are not enough responders to go around. A CSIRT cannot possibly fix everything at once. So having a CSIRT that is skilled at triage is extremely important.”[3]

ENISA (European Network and Information Security Agency)[4] agrees:

“Triage is an essential element of any incident management capability, particularly for any established CSIRT…This process can help to identify potential security problems and prioritize the workload.”

Problem-Tracking Software

Based on group postings, the most-used software for problem reporting and tracking is BMC’s Remedy, by a fair margin. The group reported using other software including Track-IT, Support Magic, Help Box (http://www.laytontechnology.com/pages/helpbox.asp), HEAT, and, from the open-source world, OTRS (http://otrs.org/). However, cohort members report many issues with Remedy that make using it a fair travail at times. Its popularity, then, seems to be an indicator of excellent marketing rather than spectacular design, as the number of available tracking solutions is large.

Some Remedy implementations lacked the web interface, which allows users to interact with the system directly. Other postings decried the lack of an efficient GUI design, a shortcoming that requires organizations to customize their installation to fit their individual needs. One can interpret the lack of an efficient GUI design coupled with the capability to customize as both a success and a failure. It is a success because it is clear recognition by BMC that individual organizations’ needs are so widely different that creating a single interface is a challenge. However, it is a failure because small organizations lack the work force, ability, or desire to customize COTS software, which reduces Remedy’s marketability. One member suggested that BMC could improve usability and product acceptance by providing three templates: complete (today’s default), help desk and asset management, and a single-screen help desk only.

An interesting sub-discussion focused on a case where one IT manager disbanded the HD after implementing user-facing HD software. The manager’s expectation was that each user would use the software to report issues. He expected the software’s built-in triage function to route the issues to the appropriate support teams. The manager believed that both users and IT staff would monitor system reports to track status. This approach eliminated effective service for those users who could not or would not use the software. It also provided no capability for dynamic re-prioritization and no method to correct the routing of misreported issues.

Triage and Internal Politics

Internal politics are a major consideration for any activity in the organization – especially for sensitive functions like the CSIRT. Since the CSIRT, by definition, affects the computer operations of the entire organization during an investigation, its members will, over time, interact directly with many of the organization’s personnel. For somebody not intimately familiar with CSIRT operation, a brief interaction might seem more an abrasive intrusion than a genuine effort to help.

This means that CSIRT members need to be consummate service-oriented personnel with well-developed communication skills. In addition to communication, the team members need to be very sensitive to the political nuances within an organization. They must be able to interpret the true import of any statement rather than taking it at face value. CSIRT members must be able to isolate themselves from political influences in their investigative process, in order to stay true to their objective and be effective in proper incident resolution.

The potential exists for internal politics to cause HD staff to misrepresent incident ticket priorities - the team needs to be able to recognize this and present it to their management for appropriate action. At the same time, team members need a healthy respect for authority limits. They must be conscientious in not over-stepping their bounds without appropriate reason and permission.

The team needs to be aware of an organization’s internal drivers and business objectives, and be able to associate the triage, resolution, and analysis process with the appropriate levels of attention and response. For a financial organization, the prime driver will be financial effect – for a military team, it could be team safety or mission objectives rather than cost that drive priority.

For each organization, service offerings are weighted in light of their perceived relation to the primary business. Additionally, the team must all accept that a person's perceptions are their reality - whether or not they agree with the rest. This acceptance helps the team to respond accordingly and appropriately. Each proposal needs a business case. One posting provides the following example:

“What is an industrious network administrator who needs an IPS system to do? They can take the initiative to test Snort via the freeware route. Assuming good results, they write up a business case to purchase the required hardware and software including support. For the operating system, they can choose, say, either Red Hat or Novell offerings that include support. For the IPS, they can include a Sourcefire quote. But, even if it is the best system at a low cost, it will not fly if the network administrator is the single point of failure in manning the system.”[5]

Another important aspect of internal politics vis-à-vis the CSIRT is managing the business teams. During an incident, it is important for the CSIRT to manage not only the technical aspects of the incident but also the personnel representing the various aspects of the business who may have vested interests in following the progress of the incident.

To quote the cohort members,

“.. the politics is not in the triage process. It is in managing the business unit at the root of the incident. It’s a natural human reaction to want to protect your turf especially if you are at the root of the problem. These politics can be difficult to manage if people’s jobs are at stake.”

“From a disjointed perspective, it could be pointed out that business needs to be placed before personal considerations - however, this never seems to successfully happen in the real world.”

“..politics is closely, deeply interwoven into the fabric of our societies. However, in this case, granted that some parts of the 'parent structures' are more equal than the others, and always receive greater priority than the others - would you not accept that (apart from the extreme cases when we hop-step-jump to fix the CEO's son's games on a personal laptop) the simpler problems on the CEO's machine still have greater impact to the organization's working than perhaps a minor server crashing? Anything that has an impact on the parent structures' time has to be, in pure business value, of higher impact than large isolated technology failures.”[6]

“We could easily venture off into a discussion on political philosophy; I understand what you are saying. However, there is a psychological component missing in the value argument and I'm suggesting it is the psychology and not the value that drives behavior. This is the politics.”[7]

“C-Level positions have power. People respond to that power. From inside the company, when a C-level person asks for something the response is immediate and palpable because the C-level has power. That power can make or break a career and it can end a job. From outside the company, the board, stockholders, analysts, etc. may think the C-level person adds no value. Even if that view is held, from within the company people still respond because they want their job tomorrow and they may want to advance.

Value is determined outside by the market. People inside react to the power. Thus, the politics.”[8]

Conclusions

The cohort’s collective work experience in the Information Security (IS) arena is wide in scope. This background brings the discussion forum a range of viewpoints and operating practices broader than any single professional environment would provide. Group consensus is that triage is an important and required component of CSIRT operation. The size of the CSIRT must be proportional to the average number of incidents handled. The prime business of the organization takes the leading role in determining the response and escalation process. Cohort members agree that training is vital to successful CSIRT operation, particularly in cases where the HD is the POC. One posting summarized the reasoning: “I think that no matter how great an organization's procedures are, every incident will be different.”[9]

Based on group postings, the most-used software for problem reporting and tracking is BMC’s Remedy, by a fair margin. The group reported using other software including Track-IT, Support Magic, Help Box (http://www.laytontechnology.com/pages/helpbox.asp), HEAT, and, from the open-source world, OTRS (http://otrs.org/). However, cohort members report many issues with Remedy that make using it a fair travail at times. Its popularity, then, seems to be an indicator of excellent marketing rather than spectacular design, as the number of available tracking solutions is large. One in-progress case study included an example where the IT manager took the unusual approach of disbanding the HD after implementing user-facing HD software. The manager’s expectation was that each user would use the software to report and monitor issues.

The group recognized that internal politics are a major consideration for sensitive functions like the CSIRT. Since the CSIRT, by definition, affects the computer operations of the entire organization during the investigation process, the team must be composed of strong communicators to avoid negative perceptions. In addition to communication, the team members need to be very sensitive to the political nuances within an organization. They must be able to interpret the true import of any statement rather than taking it at face value, or worse, weighting the value based on the person’s position in the organization. CSIRT members must be able to isolate themselves from political influences in their investigative process, in order to stay true to their objective and be effective in proper incident resolution.


[1] Grance, Tim et al. (2004). Computer Security Incident Handling Guide, p. 3-14, as quoted by Timothy Dzierzek in discussion

[2] Stanley Jamrog, quoted from discussion

[3] Timothy Dzierzek, quoted from discussion

[4] ENISA (http://www.enisa.europa.eu/), A Step-by-Step Approach on How to Set Up a CSIRT, p. 49, as quoted by Gary Hummel in discussion

[5] Richard Tuttle, quoted from discussion

[6] Mani Akella, quoted from discussion

[7] James Franklin, quoted from discussion

[8] James Franklin, quoted from discussion

[9] Timothy Dzierzek, quoted from discussion

Monday, September 24, 2007

Vulnerability Assessment and Intrusion Detection Systems

Is this take "infinity"? Anyways, here are my views on the subject...

Information Technology (IT) has permeated the core of almost every business function today[1]. Technology has automated most of our business processes, enabling us to conduct business at a much faster pace and with greater reliability.

However, IT, along with its benefits, has brought along a complement of complexity and security concerns. Data volumes have grown exponentially, and systems and applications continue to proliferate daily. The increased footprint of the application space means that there are more applications that could be vulnerable to intrusion and unauthorized access.

The early days of IT security focused primarily on perimeter security and authentication controls. Firewalls provided perimeter security, while network-wide authentication was handled by solutions such as NIS+ (Network Information Service), LDAP (Lightweight Directory Access Protocol), and Microsoft’s Active Directory; network devices rely on RADIUS (Remote Authentication Dial-In User Service) and TACACS+ (Terminal Access Controller Access-Control System). However, firewalls and authentication servers do not provide the necessary protection against application vulnerabilities. This is primarily because applications, to function, need access (outbound or inbound) and are afforded permission accordingly through the firewalls and proxy servers. This authorized route is then exploited to take advantage of any vulnerability that the application may contain.

Vulnerability assessment systems (VAS) are used to scan systems, applications, and networks for any vulnerabilities that may be present. The findings then need to be analyzed for cause and effect, and any additional necessary protections put in place.

Intrusion detection systems (IDS) constantly watch systems and networks for any functional anomaly or activity that looks like an intrusion attempt. They are configured to react according to the nature and severity of the detected event.

The systems in detail

Vulnerability Assessment Systems

System and network vulnerabilities can be classified into three broad categories[2]:

  • Software vulnerabilities – bugs, missing patches, and insecure default configurations
  • Administration vulnerabilities – insecure administrative privileges, improper use of administrative options, or insecure passwords allowed
  • Risks associated with user activity – policy avoidance by users (bypassing virus scans), installing unapproved software, or sharing access with others

Vulnerability scanners are used to scan systems, applications and networks to identify vulnerabilities that cause these risks.

Vulnerability assessment systems come in two flavors – network-based and host-based. Network-based scanners scan the entire network to provide an overall view of the most critical vulnerabilities present on the network. They are able to quickly identify perimeter vulnerabilities and insecure locations that could provide easy access to an intruder. These include unauthorized telephone modems on systems, insecure system services and accounts, vulnerable network services (SNMP[3] and DNS[4] are two examples), network devices (e.g., routers) configured with default passwords, and insecure configurations (e.g., a default allow rule for all traffic on a firewall).

One issue frequently faced by anyone using a network vulnerability scanner is that it can cause network interruptions and even service disruptions and server outages during a scan. This happens because the scanner, in the process of scanning for vulnerabilities, can actually exploit existing vulnerabilities and generate Denial-of-Service (DoS[5]) attacks against networks and systems. To mitigate this risk, scans are often scheduled for times when the business faces minimal interruption of service from scenarios like those described above. However, this also risks missing critical vulnerabilities, since some services, applications, and servers may not be present on the network when they are not in use, and their weaknesses stay hidden from the scan.

A clear advantage that network-based scanners have is that they are independent of the hosts and devices in use. They use their own resources for operation and do not need to be installed on hosts or network devices in order to complete their function. However, this also means that they cannot perform deep scans of individual systems since they can only scan those services and applications that are available and can be probed from the network.
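
As a rough sketch of the kind of external probing a network-based scanner performs, the following Python snippet uses only the standard socket module to check whether a handful of well-known TCP ports answer on a host. The host address and port list are placeholders, and real scanners go much further (service fingerprinting, vulnerability tests, credentialed checks):

  # Rough sketch of a network-facing probe; host and ports are placeholders.
  # Real network-based scanners add fingerprinting and vulnerability checks.
  import socket

  COMMON_PORTS = {21: "ftp", 22: "ssh", 23: "telnet", 25: "smtp", 80: "http", 443: "https"}

  def probe(host, port, timeout=1.0):
      # Returns True if a TCP connection to host:port succeeds.
      with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
          s.settimeout(timeout)
          return s.connect_ex((host, port)) == 0

  def scan(host):
      for port, name in COMMON_PORTS.items():
          if probe(host, port):
              # Each reachable service is a candidate for deeper, service-specific checks.
              print(f"{host}:{port} ({name}) is reachable")

  if __name__ == "__main__":
      scan("192.0.2.10")   # documentation-range address; scan only hosts you are authorized to test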

This is where host-based scanners excel. They are installed on the host and have the ability to scan it deeply to identify all possible vulnerabilities.

Host-based vulnerability scanners are more granular in their scanning and results. Since they are installed on the host, they can probe deeply into it, searching for vulnerabilities that would otherwise be invisible or not easily identifiable from the network. They are able to probe applications, the host operating system, and system processes for possible weaknesses and vulnerabilities.
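
For a flavor of the deeper, local visibility described above, here is a minimal host-based check written in Python. It only flags world-writable files under a directory tree; the starting directory is an example, and real host-based scanners also examine patch levels, accounts, services, and configurations:

  # Minimal host-based check: flag world-writable files under a directory tree.
  # The starting directory is an example; real scanners check far more.
  import os
  import stat

  def world_writable_files(root):
      findings = []
      for dirpath, _dirnames, filenames in os.walk(root):
          for name in filenames:
              path = os.path.join(dirpath, name)
              try:
                  mode = os.stat(path).st_mode
              except OSError:
                  continue              # unreadable or vanished; skip it
              if mode & stat.S_IWOTH:   # the "other" write bit is set
                  findings.append(path)
      return findings

  if __name__ == "__main__":
      for path in world_writable_files("/etc"):
          print("world-writable:", path)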

However, by the very nature of their function, they are intrusive and have the ability to upset the functional balance of a server. They are a powerful tool that, if subject to any form of misuse, can cause unforeseeable problems on the server and networks. Since they are designed to probe for vulnerabilities, any misuse can lead to a serious compromise of an organization’s digital assets.

Intrusion Detection Systems (IDS)

Intrusion detection systems complement the function of vulnerability assessment systems. While VASs probe for vulnerabilities, IDSs look at the network and system activity, inbound network data streams, and anomalous behavior. IDSs are designed to identify behavior that does not conform to pre-defined ‘normal’ activity. On detecting any signs of abnormal activity, they can trigger alerts or even evasive and preventive measures to halt or slow down the suspected attack while relevant personnel can investigate and clear or escalate the alert.

IDSs come in two types – the traditional signature-based kind, which identify intrusions by searching data streams for patterns that match signatures in a pre-built database, and anomaly-detecting systems, which watch networks and systems continuously to build a baseline of normal behavior and compare ongoing activity against that baseline to detect possible intrusions. The newer IDSs available commercially tend to be hybrids, using both methods to improve their chances of positively detecting intrusions while reducing their rate of false positives.
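
To make the two detection styles concrete, here is a deliberately simplified Python sketch that raises an alert when traffic either matches a known bad pattern (signature detection) or spikes well above a learned baseline (anomaly detection). The signatures, the baseline, and the three-sigma threshold are toy assumptions, not how any commercial IDS is implemented:

  # Toy hybrid detector: signature matching plus a naive volume-anomaly check.
  # Signatures, baseline and threshold are illustrative only.
  import re
  from statistics import mean, stdev

  SIGNATURES = [
      re.compile(rb"/etc/passwd"),          # crude path-traversal indicator
      re.compile(rb"(?i)union\s+select"),   # crude SQL-injection indicator
  ]

  def signature_match(payload):
      return any(sig.search(payload) for sig in SIGNATURES)

  def anomaly(requests_per_minute, baseline, k=3.0):
      # Flag traffic more than k standard deviations above the learned baseline.
      if len(baseline) < 2:
          return False
      return requests_per_minute > mean(baseline) + k * stdev(baseline)

  def alert(payload, rpm, baseline):
      return signature_match(payload) or anomaly(rpm, baseline)

  if __name__ == "__main__":
      baseline = [40, 55, 48, 52, 45]                                 # requests/minute seen as "normal"
      print(alert(b"GET /../../etc/passwd HTTP/1.0", 50, baseline))   # True: signature hit
      print(alert(b"GET /index.html HTTP/1.0", 400, baseline))        # True: anomalous volume
      print(alert(b"GET /index.html HTTP/1.0", 47, baseline))         # False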

Like VASs, IDSs are also of two distinct types based on their deployment method. Network-based IDSs (sometimes referred to as NIDS) are stand-alone devices that sit on the network, normally at the points of ingress and egress, acting as watchdogs for the network. Host-based IDSs (referred to as HIDS) are more intrusive, being installed on individual hosts and watching over all host activity intimately from that vantage point.

Host-based IDSs and VASs are mostly limited in scope to the host they are installed on, but are able to perform a deep inspection of that host. Network-based IDSs and VASs can scan large networks and vast numbers of networked hosts and devices, but cannot get into the inner workings of individual devices – they are limited to what is visible from the network.


[1] Lucas, Henry C., Jr. and Baroudi, Jack. The Role of Information Technology in Organization Design. http://hdl.handle.net/2451/14315

[2] ISS Whitepaper on vulnerability scanners - http://documents.iss.net/whitepapers/nva.pdf

Thursday, September 20, 2007

Living in a world of 'spin'

Today, 'spin' seems to direct more public thinking than ever before. Marketing and spin (check out this article: http://www.onlineopinion.com.au/view.asp?article=3752) are used more than ever to steer thinking into a specific pattern and to direct public action. Is this the reason children are being taught less self-reliance in schools now? Big brother - is it slowly becoming reality?

Browsing the web, I chanced across an article that said Microsoft puts out patches quicker than any other OS vendor, and hence MS users face much less risk - they even measured it using a new metric they called "days at risk". This set me thinking. Sure, MS may release patches faster - but:

Do all users always patch everything the moment the patch is released?
Why has no one compared the number of times each OS vendor patches their patches?
For each vendor, what product needs the patch each time?
Which vendor has a much-used popular product that needs patches?
Does anyone have any measured statistics on the relation between the software needing patches and its popularity, use and misuse?

If these numbers are available, how can we, as the 'hopeful' guardians of cyber-integrity, link them to articles that present such a lop-sided view of the situation? Growing up, I learnt about 'lies, damned lies and statistics'.

Some interesting links on thinking about lies, statistics and lawyers - when will marketing and sales be added to this roll?

http://www.experts.com/showArticle.asp?id=153
http://www.rgj.com/blogs/inside-nevada-politics/2006/09/tarkanian-refutes-lies.html
http://weblog.leidenuniv.nl/users/gillrd/2007/06/lies_damned_lies_and_legal_truths_1.php


So who will teach the world common sense again?

Wednesday, September 5, 2007

Engineering failures - or 20/20 hindsight?

A few minutes ago I read an email saying Palm is withdrawing the Foleo platform at the twelfth hour - http://blog.palm.com/palm/2007/09/a-message-to-pa.html.

Yesterday, I heard a VOIP BlueBox podcast (http://www.blueboxpodcast.com/) arguing the relative merits of the SIP protocol from an engineering and design perspective – and the fact that security considerations seem to have been added much later, after the protocol design was essentially complete and the first set of users were already using it in the public user space.


A few weeks ago, newspapers and media were wringing their hands at the ghastly bridge collapse in Minnesota (http://en.wikipedia.org/wiki/I-35W_Mississippi_River_bridge). Every media report was quick to focus on the seeming 'design failures', and money was quickly sanctioned across the country to 'inspect' the rest of the existing bridges.


Two years ago, after hurricanes Katrina and Rita wreaked havoc, more studies focused on the engineering failures there.


But is all of this truly engineering failure? Are we looking at the original specifications for the designs in consideration? TCP/IP and the associated network protocols worked perfectly for their original design – fault-tolerant robust network connectivity to share information between peer universities. The security problem surfaced after commercial interests worked to expand the original network into the Internet of today, without adapting the original protocol for their proposed use and/or testing it for the proposed set of uses.


The same is true for the bridge collapse and the hurricane stories – the engineers did their work and highlighted the limits of their design. However, other interests kicked in, signed off on unknown risks without complete information, and the result is the slurry pools we see today :) So I ask myself the question – should we be blaming the engineers for poor design?


Then again, not all testing necessarily highlights all issues, as the Skype issue (http://blog.tmcnet.com/blog/tom-keating/skype/skype-offline-latest-update.) proves. The protocol seemed to work fine – till it reached the perfect tipping point – software updates, a P2P mesh that was never tested at this volume (I too would love a lab that could test 20 million simultaneous online users and help me prepare for all eventualities – but is that a fair request to make commercially of any organization to set up?), and a network with unprecedented global usage. So who do we blame this on? Skype - (who else?) for providing a service that costs - for basic usage - nothing at all except the cost of an internet connection :)


The current environment seems to focus on finding someone to blame for all failures - irrespective of the validity of the failure and the invalidity of the use that caused it. Engineers need to step up to the plate in their own defense. They need to shake off their reticence about public speaking and document their engineered specifications better. And others need to match their usage patterns to the engineered specifications - or ask for engineering modifications to fit the newly proposed usage. In the absence of this rigor, watch out for more failures in similar patterns! 20/20 hindsight is always correct - how about moving that correctness to before the failure rather than indulging in armchair pontification?

Tuesday, August 28, 2007

Do infosec engineers live in ivory towers?

Okay - own up, infosec engineers - do you train in the business process and understand the relevance of business need for survival?

Most of the engineering I have seen seems to have little relevance to or understanding of ground reality - is this endemic? For example - Risk Management and Information Security will spend days closeted with security vendors and their offered toys - and come up with new designs centered on their toys of choice for the *new and improved* infrastructure security model – with little to scant attention paid to how this will translate to reality. There is little indication of a proper upgrade/replacement path, no attention to forward-facing support structures, no explanation of ROI, and scant information on what to do with the current setup in place or why it is no longer good enough.

Would this explain the lack of respect for most IT and IS outfits in business circles? Would this be a reason businesses tend to treat IT as an exotic toy that can enable business, but can be dropped like hot potatoes at the first sign of financial distress? Are we responsible for our lack of appreciation of digital information handling as a true business enabler instead of just another tool?

If those are the questions, here are some suggestions!

1) Involve business and production support in the requirements gathering and the design process from the beginning.

2) Understand the business drivers that are funding the current infrastructure, and relate the change and its cost to these drivers. A solution that costs more than the expected return will end up in the gutter every time!

3) Understand at least the basic accounting principles that the firm uses – depreciation, asset classes and expense classification are good starters. Use these to present the offering in language that the bean counters will understand (and do not look down on them – they are the ones that keep the business moving forward on the financial path even as marketing and sales do their own wonders – this is the grease that smooths the path after all – and no money = no job pretty soon!)

4) Understand compliance needs. The latest security gadget might not fit after all if the logging schema requires a full redesign to accommodate compliance and audit requirements.

5) Support – can the new stuff all be supported once in? What is the associated cost there? Do the support teams need training? When will this happen? Can the shifts all have proper representation at training while still manning the fort? What is the escalation process? Who is responsible for design updates after the product is in production? Will engineering offer sufficient support for the teething period in the initial production days? How will this commitment affect other engineering work?

Okay – all questions are answered to everyone’s satisfaction, and the project is ready for rollout. Now for the BIG question – is it still required?

To quote Benjamin Disraeli (from the closing passage of his novel Coningsby) – “They stand now on the threshold of public life. They are in the leash, but in a moment they will be slipped. What will be their fate? Will they maintain in august assemblies and high places the great truths which, in study and in solitude, they have embraced? Or will their courage exhaust itself in the struggle, their enthusiasm evaporate before hollow-hearted ridicule, their generous impulses yield with a vulgar catastrophe to the tawdry temptations of a low ambition? Will their skilled intelligence subside into being the adroit tool of a corrupt party? Will Vanity confound their fortunes, or Jealousy wither their sympathies? Or will they remain brave, single, and true; refuse to bow before shadows and worship phrases; sensible of the greatness of their position, recognise the greatness of their duties; denounce to a perplexed and disheartened world the frigid theories of a generalising age that have destroyed the individuality of man, and restore the happiness of their country by believing in their own energies, and daring to be great?”

Friday, August 24, 2007

Project management in the security space - trials and travails

Today's information security project manager faces the same dilemma that many engineers have learned to deal with over the decades - how to deal with incomplete information. Rigid time-lines do not allow for much flexibility when the information needed to complete designs or implement solutions is not available. Given management concerns and compliance reluctance to release ANY information even barely classifiable as sensitive, this happens more often than one might believe. So how do project managers deal with this?

Some project managers I have worked with in the past have used the universal favorite - the segments of the project that could be held up by the delayed information are somehow contrived to be 'out-of-scope' - and the rest of the project is repainted as the complete project, and delivered 'on-time' and as an added bonus, 'under-budget' since the removal of the difficult items did not extend to corresponding budget corrections :)

Other project managers have made a fine art of this principle, using it fairly indiscriminately to always deliver projects on time and under or on budget. But this still leaves the base question unanswered - how do we handle getting the information in time? What are the project manager's responsibilities and practical options for dealing with the situation?

Here are some tools that I have found useful.
  1. Education - Project managers need to comprehensively understand information security and information assurance, at least to the extent we understand the fields today, before they are asked to manage IA and IS projects. These projects have the capability to negatively affect both public image and the bottom line if not carefully herded through to completion with due diligence. While the project management skills requirements remain the same, these project managers additionally need to understand not just the politics in an organization but also the information security and assurance drivers within the organization, and their interaction with all connected external clients and partners, and the way the client and partner relations drive the information security and assurance processes.
  2. Due diligence - This is true for any project - do all necessary AND apparently unnecessary investigation and research up front to avoid unpleasant surprises later. Investigate the relevance of all required data, and clear it with the legal, compliance and risk teams well in advance. If new data paths are being created or old ones removed, do the same check with the above teams - and repeat in case the one member not present in the original meetings has a difference of opinion. If anyone complains about redundant questions or questioning, repeat the question again - and then again, for good measure. One person's displeasure is a fair trade against possible senior management unhappiness over project delays and expanding budgets.
  3. Measure risk - and report as widely as possible. Redo risk measures for all relevant systems, and all ancillary systems that could get touched. Develop a process to automatically (hopefully!) update the risk measures of individual components as other components get added or deleted, or as the metrics for a specific component change (a minimal sketch of this idea follows after this list).
  4. Work closely with risk management, help desk and computer security incident response teams - involve them from the get-go to ensure that any incidents that occur as the project progresses keep the project team in the information loop for necessary course corrections - instead of the 'end-of-project review' when the security teams reject the entire suggestion!
  5. Inform the architecture team (create one if the organization does not already have one) and ensure that there is a two-way flow of information regarding changes in the infrastructure and architecture landscape as the project progresses: think of the number of projects that lost steam when someone realized that their work was no longer needed due to other design or business need changes.
  6. Ensure business representation and participation - the project exists because the business felt the need for it, after all. No IT work exists in a vacuum - it is driven by business need and not vice versa.
  7. The Information Security project manager needs SPECIAL people management skills. If you thought that IT professionals were a weird bunch, wait till you meet your first IT security tech. Then multiply that by 7 (the average team size?!) and consider the fact that you will be closeted with them in conference rooms feeding them brain food (pizza and coke) for extended periods of time, at the same time having lost all contact with the real world. Think of spending sixteen hours at a stretch between fun terms like 10Base-T, Fuzzy logic randomizer, BMUS, CIA Triad, AES/DES/3DES, and YMMV interspersed with an extremely animated discussion (polite name for heated argument) on the relative merits of a layer two firewall versus a layer three firewall, or a unified threat management appliance versus discrete specialized components for each desired function, and their combined throughput merits.
  8. Recognize excuses - and deal with them appropriately. Information security personnel are no different from the rest of the world - they succumb to pressure. Recognize the symptoms, and deal with them. However, realize that this crew is not easily replaceable, and their institutional knowledge is definitely not - it takes a replacement a long time to reacquire.
  9. Pay attention to the undertone murmurings. In this environment, they could indicate potential show-stoppers, and make or break the project. Typically, most project failures can be blamed on poor management rather than technical ineptitude, but information security projects can fail for a new reason - changes in the information security landscape. Keep an ear to the ground, and follow the regulatory world with extra attention.
  10. Love the job - Very important. If you do not enjoy this work, stay away - it can make a lot of people sleep a lot easier, including yourself! These projects can add feathers, but can also bring on the tar!
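
Here is the minimal sketch promised in point 3. It recomputes a component's effective risk from its own score and the scores of the things it depends on, so that changing one component's metric automatically changes its dependents' risk as well. The component names, scores, and the simple max-propagation rule are assumptions made purely for illustration:

  # Sketch for point 3: recompute effective risk when a dependency's metric changes.
  # Component names, base scores and the max-propagation rule are illustrative only.
  class RiskModel:
      def __init__(self):
          self.base_risk = {}     # component -> its own risk score (0-10)
          self.depends_on = {}    # component -> components it depends on

      def add(self, component, base_risk, depends_on=()):
          self.base_risk[component] = base_risk
          self.depends_on[component] = list(depends_on)

      def effective_risk(self, component, _path=frozenset()):
          # A component is at least as risky as the riskiest thing it depends on.
          if component in _path:      # guard against dependency cycles
              return self.base_risk.get(component, 0)
          path = _path | {component}
          deps = self.depends_on.get(component, [])
          dep_risk = max((self.effective_risk(d, path) for d in deps), default=0)
          return max(self.base_risk.get(component, 0), dep_risk)

  if __name__ == "__main__":
      model = RiskModel()
      model.add("payment_app", 4, depends_on=["db_server"])
      model.add("db_server", 3)
      print(model.effective_risk("payment_app"))   # 4
      model.base_risk["db_server"] = 9             # one component's metric changes...
      print(model.effective_risk("payment_app"))   # ...and its dependents are re-scored: 9
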
I would love to hear others' project management tales and experiences. So please share if you can!

Thursday, August 23, 2007

Some thoughts on understanding the people value in an Information Security team
Where is the new internet world headed? Complexity begets an accompanying loss of assurable security, as is evidenced by all the unhappy digital break-in news around us. There is even less comfort in the fact that most of the software out today was never designed with security in mind, and is now uncomfortably ensconced in an ostensibly protective cocoon of security devices that often seem to do more to prevent the application from working than to protect it from attack.
Our biggest shortfall today seems to be our lack of recognition that what we know is not even the tip of the iceberg - and yet most leaders and managers focus on just that little tidbit and ignore the larger danger of the unknown and undefined lurking below. In this headlong rush to cut costs while maintaining operations, the easiest win SEEMS to be to automate functions and drop head count, but that is the worst thing to do in the security domain. The big losses are:
  1. Loss of institutional knowledge that seasoned warriors have, that will take newbies ages to learn
  2. Automated scanners and detectors can only recognize known attacks - they are helpless against the unknown or zero-day attacks and vulnerabilities
  3. Today's fuzzy logic solutions are not seasoned solutions. While they represent cutting edge technology, they still have to be field proven - and do you want to be the one providing the field test opportunity, especially with the crown jewels of your digital assets at stake?
Automated solutions can at best complement a well-rounded security team - they cannot replace them (not yet, anyways!).