We have all gone through this cycle of vulnerability detected and patches applied in our careers. Some of us still go through this vicious cycle of tense, challenging and nerve wrecking moments when you are racing against time and people in business are asking for updates while the customer support is assuring the customers with eyes on the screen waiting for the announcement “Patch Applied”, “Service Restored” etc
Software Patch and Vulnerability Management continue to be a major challenge for many organizations. There is no single software product or vendor source of these vulnerabilities. Organizations must consider patching at all levels of software and only applying Microsoft Patch through Tuesday updates to protect systems and data from cyber-attack is not sufficient.
Organizations that were diligent with Microsoft patches avoided WannaCry related ransomware. However, flaws with Apache Struts and Intel Processors left organizations vulnerable to cyber-attack (e.g., Spectre and Meltdown).
A lot of software companies have elected to stop providing individual patches each release period. Instead, separate and distinct patches are bundled in a roll-up model. The reason for this change is to prevent patch fragmentation that led to problems like dependency errors, lengthy scans, and testing complexity. This practice has created an all or nothing condition for customers in which selecting individual patches are no longer available. Further, software companies are building these patch bundles in a monthly rollup manner. These patch bundles not only contain all the recently announced patches, but also the previously shipped patches. This cumulative update model is intended to improve security, quality, and reliability. Yes i am referring to the Microsoft and Adobe model, however with this model in practice comes the requirement for customers to perform extensive application program compatibility testing in a short period of time—especially when functionality and non-functionality (i.e., security) code changes are mixed in the update. The days of cherry-picking patches are over.
Orchestrating patching is complex and costly. Patching has many dependencies including asset management, notification tracking, risk assessment, patch preparation, QA, release management, communications, and auditing. As with the installation of any software update, many teams must collaborate to ensure success and avoid unintended interruption of service. If any of these teams are not resourced and prepared for this demand, then patches are not properly tested and announced prior to deployment creating availability and integrity risks. If patch deployment is delayed to perform necessary QA and communication, vulnerabilities linger longer for cyber- criminals to discover and exploit. Traditional operations and project management methods of patching are not nearly rapid enough.
Sadly most organizations claim to have adopted the Agile methodology which is an iterative approach to software development and delivery but fail to address when it comes to the needs of patch upgrades and its mostly neglected until an incident takes place. I’ll try my best to summarize how Agile can be implemented for patch vulnerability assessment and the structure through which you will be able to maintain pace as well as deliver quality.
Agile Scrum Patching Framework
The following table maps the terms for describing patching:
Agile Scrum Terms | Patch Terms |
Stories | Patches |
Grooming | Patch Prioritization |
Product Backlog | Remediation Database |
Sprint Planning Meeting | Selection of next patches to deploy |
Sprint Backlog | Patch activity definition and Sign up |
1 -4 week Sprint | Patch preparation and Testing |
Sprint Review | Remediation Verification Audit |
Sprint Retrospective | Patch Lessons Learned |
The Agile Scrum framework has essentially three major components: roles, artifacts, and events. Key roles include the Scrum Team, Scrum Master, and Product Owner. The Scrum Team is cross-functional (programmers, testers, QA, etc.) with 7-9 dedicated full-time members. The Scrum Master is a servant leader immersed in the activities of the team. This role is vital to keeping the team productive. This role is accountable for team alignment with Scrum values and practices. Additional Scrum Master duties include assuring dates and deliverables do not change (scope creep). The Product Owner represents the stakeholders and business requirements. The role defines feature deliverables and priorities.
Artifacts of Scrum include Stories, Backlogs and Burn Down Charts. Stories are a light-weight approach to define business requirements. Stories are intended to promote discussion. They provide a structured way for separate and discrete business needs to be conveyed so a technical team understands at a high level what work must be done.
Stories focus on “what?” and “why?” of requirements—not the “how?”. Product Backlogs contain the list of prioritized work (Stories) to be performed. Sprint Backlog contains a list of tasks to complete during the Sprint that Scrum members select from. A Burn Down Chart shows progress with the work committed to by the Scrum Team within a Sprint and is updated daily.
You can read more about Scrum Basics of Scrum 1, Basics of Scrum 2 and Basics of Scrum 3
Why Agile Scrum
Agile is release-oriented and intended for conditions where value is required fast, frequent, and without fail. The advantages of the Agile approach for patching are:
Terms | Details |
Iterative & Frequent | Continuous feedback during each incremental change daily standup meeting |
Fast, low overhead and visibility | Short 30 days sprints, low costs in terms of process and management. Backlog shared with business and IT Management |
Adaptive and cross functional | Retrospective meetings after each sprint operations, infrastructure, QA and Security teams |
Flexible | Priority can change as new critical security patches released |
A key principle of Agile is recognition that during a project, business requirements (including risk) change. Therefore, the Agile framework is built to maximize the ability to adapt to evolving requirements in rapid manner.
Because the proposed Agile Scrum approach avoids “borrowing” team members each time a new patch is announced, it prevents the mad scramble to find free key resources when a critical security update is released. Scrum teams are dedicated and cross-functional, with all necessary skills represented to fulfill the Story commitments.
Further, multiple Scrum teams can work autonomously or be linked as part of a software “train”. If the volume of patches gets too high, an additional Agile team can be temporarily formed to reduce the backlog queue length. With security updates and enhancements, each train can be related to a common platform (e.g., RHEL, Windows, etc.) or environment (e.g., Enterprise Web Apps, Customer-facing Apps, PCI apps, etc.).
As each month passes, new patches are released that result in priority and requirement changes. The Agile framework is intended for this condition of dynamic requirements. A backlog of patch requirements is continuously groomed by Security and IT Operations to ensure work is properly defined and prioritized. The backlog also serves as a simple risk register for patching. The backlog is presented to the Agile team to advance patches in risk priority order. This approach ensures the team is always dedicating their limited time and resources to the highest risk security updates. Many software companies release new security updates to the general public every month with the recommendation to implement within 30 days (e.g., Microsoft Patch Tuesday). The recurring 1-4 week “time-boxed” sprints align well with today’s typical vendor patch release cycle.
One of the biggest objections to patching is the risk of “breaking” existing application programs (e.g., financial applications, human resources applications, supply chain, etc.). A patch might target a single piece of software, this one software item (e.g., Java, Internet Explorer, etc.) might be used by multiple application programs. Unless testing occurs, patching might cause unintended service availability and data integrity issues with application programs. For example, patching Internet Explorer or Chrome might cause a financial reporting application to present errors for key menu items within the browser, or Java patch also includes functionality changes that cause HTTP errors on the website. Further, many patches do not uninstall easily, Quality Assurance (QA) and customer testing are incorporated into Agile with test planning, tracking and reporting during the entire sprint. This continuous testing approach provides organizations the ability to identify patch compatibility issues early before any unintended harm occurs from updates and patches.
Prerequisites
The first step to operate patch and vulnerability management effectively is a patch policy. This policy provides the authority to advance patch remediation when there are conflicting IT related development / delivery priorities and limited resources. The policy must contain a few key components to be effective. The policy must clearly articulate what must be patched and target completion period. Data classification, regulation, asset value, redundancy, location, and business purpose are all factors that determine patch eligibility and timing. These all must be considered in the policy so that anyone that reads the policy draws the same conclusion on scope and timing. Procedures for
Obtaining an exemption and who can authorize the exemption (and ultimately accept risk) must also be included in the patch policy. Other key policies include asset management, change management, and release management. Without these policies, organizations will not have the necessary information to know what system is eligible for a required patch and how the required software update is to be delivered properly. Lastly, a software / Application lifecycle management policy is vital. The bad news is End-of-Life (EOL) software might have unknown vulnerabilities or exploits in the wild in which a remediation will never be made available. Patching can’t fix EOL vulnerabilities.
Risk assessment is necessary to determine the severity of security vulnerabilities, scope of the vulnerability, the likelihood of being the target of an exploit, and organizational appetite for risk duration. When conducting a risk assessment, it is important for organizations to consider the following factors that could lead to increased risk:
- high value or high exposure assets are impacted
- assets historically targeted are impacted
- patch was released outside of a vendor’s regular patch release schedule
- exploits related to a security vulnerability are workable or can be automated
Patching demands a high volume of unique risk assessments as new patches are released seemingly every day. Further, risk assessment can require a significant investment of time and resources. Organizations are advised to adopt a uniform standard to evaluate and report patch risk levels across multiple products and vendors. Risk ratings can be acquired by an authoritative source (e.g., NIST, FIRST CVSS, FAIR, or OCTAVE), by software vendor (e.g., Microsoft DREAD), or patch management system (e.g., Ivanti or Flexera). These ratings are then evaluated and revised in the context of the organization and target systems.
Lastly, Audit and Regulatory Compliance teams require the asset inventory to select test samples and confirm the operational effectiveness of control. Therefore, Security, Application Development, Operations, and Infrastructure teams should collaborate to establish an enterprise-wide approach to automate software delivery that covers all technology. This is a key strategic and tactical conversation that should occur independent of any specific vendor patch notification and occur repeatedly.
How to Implement
For patch management to operate continuously and effectively, organizations must consider people, process, and technology. Lets explore these areas as they apply to Agile Patching, focus on people, roles, and organization.
The Agile roles are mapped for patching framework in the table below.
Task | Owner | Responsibility |
Patch & Vulnerability Group | Development Team | Dedicated, Cross functional team working together to deliver the committed product/patch |
Patch Service Owner | Scrum Master | Servant-leader responsible for keeping the team focused on principles, scope and deadlines. Removes impediments to team commitments and shields team from external interference. Ensures definition of done is agreed. |
Security Manager | Product Owner | Monitor new requirements (vendor patch announcements), assess risk, prioritize order of work, measure progress, communicate with business stakeholders. |
Patch & Vulnerability Group – Scrum
Scrum Teams are not assembled and then disassembled as each new project is approved. New work is presented to teams instead of new teams presented to work. A Scrum Team typically remains together through multiple projects (even multiple years). This approach optimizes staff performance and product quality while patching gains these same benefits with a dedicated team having a common mission to rapidly implement patch remediation and predictably reduce risk.
As mentioned earlier, Agile is not exclusively for software development. When Scrum is applied to patch and vulnerability management, the majority of members are not developers. Security, IT Operations, and QA represent the majority of the PVG membership. With PVG, there is very little coding done. The coding is done substantially by the software manufacturer (e.g., Microsoft, Google, WordPress, Adobe, etc.). However, coding in the form of scripting (e.g., shell scripts, Microsoft PowerShell, etc.) and installation packaging is often necessary for patch automation. Therefore, there is a requirement for software engineering expertise within the PVG. The Security representative confirms IT assets and software eligible for patch, effectiveness of patch, and order of deployment. The IT Operations representatives in the Scrum lead software packaging, scripting, and deployment efforts. The QA representatives in the Scrum confirm the patches do not break applications that reside on the target systems. Testing is an integral part of the Agile framework and necessity for PVG.
Scrum Master = Patch Service Owner
A Scrum Team can face many hurdles on their journey thru each Sprint. External interference (e.g., IT operational incidents, other projects competing for the same resource, etc.); team confusion about objectives; conflicting stakeholder demands; and, poor meeting etiquette all threaten progression of Sprint. A team member is necessary that represents the management of Scrum to sustain the principles and pace of Agile.
This is the purpose of the Scrum Master who facilitates communication, builds consensus, enables cooperation across all roles and functions. In addition, the Scrum Master embraces innovation and is a change agent by welcoming continuous improvement. The Patch Service Owner and Security Manager collaborate to ensure the right patches are being advanced in the right order. In addition, they come to an agreement of Definition of Done (DoD). DoD is the acceptance criteria for a Story and is used to assess that a Story has been completed as required. This is vital as exemptions, resource limitations, and timing all can affect the outcome. Patching can face many impediments including change freezes, end-of-life software, and dependencies on surrounding people, process, and technology. The Patch Service Owner works with the Security Manager (like Scrum Master works with Product Owner) to overcome these challenges so that they do not affect the Patch and Vulnerability Group (PVG) performance. It is important to understand that the Patch Service Owner (like Scrum Master) is not the staff manager of the PVG. The Patch Service Owner ensures the patching process is followed and understood—but has no formal authority over the PVG like a traditional staff manager. Separate staff managers are still required for each member of the PVG. The staff managers plan resourcing and rotation of the PVG (Scrum) members so that there are no key functions (i.e., Security, IT Operations, etc.) missing or unavailable for Sprints to advance. The Patch Service Owner facilitates collaboration and tries to drive decision-making at the team level—not staff manager level. Lastly, the Patch Service Owner drives cadence so that the PVG is all in step together and advancing in the right direction.
Product Owner = Security Manager
In Agile, the Product Owner is responsible for managing relationships with stakeholders and Product Backlog grooming. This includes defining new features, prioritization, scope, and timing. The PVG Security Manager is equivalent to the Agile Product Owner. PVG Security Manager monitors vendor patch notifications, performs preliminary risk assessment, leads discussion with stakeholders, identifies dependencies, and considers compensating controls. Each patch remediation is equivalent to an Agile Story. Each story contains the risk, remediation guidance, scope, timing, dependencies, and Definition of Done. As new vulnerabilities are identified and patch remediations are available, grooming is required of the Remediation Database. What was priority #1 yesterday, might no longer be today. The Security Manager considers the appropriateness of exemptions and engages all necessary stakeholders to accept risk. At the end of each Sprint, the PVG Security Manager leads a review of the Sprint deliverables to confirm vulnerability has been properly mitigated. Metrics are discussed and possible sample testing is performed. The Security Manager might then engage Internal Audit for additional control testing. Lastly, the Security Manager participates in a lessons learned meeting with the PVG after every sprint. This is equivalent to Agile Retrospective Review and is intended to sponsor continuous improvement of the patching process.
Additional PVG Members
Depending on the size of an organization, the PVG might benefit from additional members not typically found in the Agile model. For example, an individual responsible for communications might be appropriate. This individual could routinely collaborate with change and release management to obtain approval for patch deployment. Out-of- band or emergency patches might demand a communications representative from PVG to explain roles and approach to remediation.
Certification and Accreditation procedures might need updated reflecting the new mandatory patches that must be in place prior to production deployment of new systems. Another member to consider for PVG is a representative from the software distribution infrastructure team. The scope of the PVG does not include the “weeding and feeding” of the infrastructure used to deploy software. PVG is not intended to build a software deployment solution or ensure new systems have properly operating agents. Patches and security updates must blend into established software lifecycle management processes and technology standards. This includes the technology platform used to deliver the software or patches. The PVG is a customer of the automation technology platform used to deploy software—not owner or administrator. As mentioned earlier, prerequisites for patching include asset inventory management, configuration management, and automated software deployment infrastructure. These IT services are important and must to be in place and operating effectively independent of PVG.
Patching Process
Now that we have the key roles identified, we propose the necessary processes for patching using Agile Scrum framework.
Stories = Patches
As mentioned earlier, customers define features and requirements for code development at a high level in Agile using Stories. Risk, dependencies (also known as surround), and initial estimate of effort are also included in each Story. The Scrum team then selects Stories to advance in 1-4 week Sprints. For Patch and Vulnerability Group (PVG), each Story represents a patch remediation. The Security team communicates through these Stories the requirements necessary for successful patching including initial risk assessment, system eligibility, vendor instruction, location of patch code, priority, scope, timing, exemptions, and dependencies.
Product Backlog = Remediation Database
Agile development planning and project requirements are defined using a collection of stories within the Product Backlog. The Agile Product Backlog is equivalent to Remediation Database. As the Agile Product Backlog conveys to development teams what development needs to be done and when, the Remediation Database conveys to IT what patching needs to be done and when. As new patches are released from vendors, a risk assessment is performed by the Security Manager (Agile Product Owner). Once high-level requirements for patching are determined, the patch and associated information is added to Remediation Database (i.e., Agile Story representing the new patch is placed in the Product Backlog).
Grooming = Patch Prioritization
The Agile Product Backlog is dynamic as new customer demands and features are identified with new Stories. Because of this, Stories in the Product Backlog are routinely examined by the Product Owner to ensure that the most important features are being advanced first for development. The patches listed in the Remediation Database also require this continuous grooming by the Security Manager as new patches are released frequently. New patches often change work priority and effort. This Agile approach with grooming stories ensures all IT teams are on the same page with new business conditions, new requirements, and appropriate priority of feature development. Patching must have this same recurring grooming practice to ensure all IT teams are on the same page with evolving risk conditions, new patching demands, and appropriate priority of patch remediation.
Sprint Zero = Infrastructure and Organization Preparation
Many organizations conduct a Sprint Zero event before formally beginning an Agile project. The purpose is to work out environmental, organizational, and technical issues prior to the code development. This way the Scrum project work is optimized and value is maximized. Prior to any patching being performed, there is the same need to prepare environmental, organizational, and technical issues. All technologies and standards used by the PVG are examined during this event to ensure they are ready when called upon for patch preparation and deployment. During this event, the IT Operations team confirms that the automation tools are working as intended and prepared for deployment of software to computer assets. If the software deployment infrastructure is only partially ready, then the patches will be deployed to only a fraction of the intended targets leaving unintended residual risk. Further, during Sprint Zero the Security team confirms that vulnerability scanners are working as intended to identify configuration and code weaknesses. Vulnerability Scanning is necessary for confirming the patch remediation design is effective during QA and in-place after deployment to PROD. All of these actions during Sprint Zero are independent of any new patches being released. Essentially Sprint Zero for PVG means keeping the organization ready for patch demands. Sprint Zero is blended into the Scrum lifecycle and Scrum practioners either include Sprint Zero before starting a Sprint or dedicate an entire Sprint.
Sprint Execution = Patch Preparation and Testing
Once a story or stories have been selected by the Scrum team, a Sprint begins. During the Sprint no new stories can be introduced and the functionality requirements may not change. During the Sprint, the Scrum team is dedicated and works uninterrupted on a single Sprint. The Scrum team breaks down the Stories into tasks necessary to complete within Sprint. These tasks are posted within the Sprint Backlog. Scrum team decides on the actions, dependencies, and time required to complete the tasks. Scrum team members then select and commit to advance these tasks from the Sprint Backlog. Daily Standup meetings keep the team in sync with what is to be accomplished. A Burn Down Chart is used by the Scrum Master (Patch Service Owner) to track progress with tasks as all Scrum team members work toward a common completion date.
Sprint Execution is equivalent to Patch Preparation and Testing. Vendor requirements for patching are broken down into common tasks. The Scrum team then estimates effort (typically in units of days) to complete patch task). Each member of the team selects patching task(s) from the Sprint Backlog that they are committing to advance. Daily standup meetings occur to answer what was completed yesterday; what will be done today; and, are there any blockers. The daily standup is not intended to be a brainstorming session for problem-solving. It is for communicating the problem to all and identifying the necessary resources to overcome blockers. Separate breakout sessions during the day will focus on the problem and identify the necessary problem busters. PVG member attendance is mandatory for these 15-30 minute meetings to enable optimum communication and collaboration. Other teams are also welcome— though only PVG members are to be primary communicators.
Sprint Review = Patch Remediation Verification / Audit
When a Sprint is for developing functionality improvements and new features, a demonstration called Sprint Review is done to show how Scrum team delivered on its commitments. The Product Owner and Stakeholders can observe the new features at the Sprint Review and confirm the new code is working as intended. This Sprint Review ceremony must also occur for patching. The challenge for the Scrum team is how to credibly demonstrate non-functionality improvements like code fixes or security patches. Definition of Done is critical for non-functionality requirements. As the “new features” of patches are not easily observable, the DoD provides the only means of qualifying the work is complete as committed.
For patching, the following must be addressed during the Sprint Review:
- Patches included in scope
- Collection (systems in scope for patch)
- Compensating Controls and Approved Exemptions
- Necessary surround changes and dependencies
- QA results (application program capability)
- Vulnerability Scan results
The Patch Process Owner should lead this conversation with support from the Security Manager. Also, blockers that were discovered during the Sprint should be discussed when relevant to Release Management. The Burn Down Chart should be presented so that Management has a better understanding of task estimate accuracy and baseline for the duration of time (velocity) to prepare patches for release. This is insightful for emergency or out-of-band critical patch events.
The Sprint Review artifacts are critical to demonstrating the continuous and operational effectiveness of patching. Auditors (e.g., Internal Audit, PCI Qualified Security Assessor, etc.) may require samples of the Sprint Review to satisfy PCI Data Security Standard Requirement.
Release Management = Release Management
Once the Sprint Review is completed and the Security Manager has accepted the work, the patch package is ready for release. This is similar to Agile practices for software development. The Scrum team does not actually implement the code—they only create it. The PVG does not actually implement the patch—they only prepare it. Ideally, the organization already has a software distribution infrastructure or release management automation in-place. The patch packages are then blended with the other software lifecycle management activities. This approach avoids conflict by having competing software distribution standards.
Sprint Retrospectives = Patching Lessons Learned
One of the major advantages of the Sprint methodology is continuous learning and improvement. Discussions about what is and what is not working are held after every Sprint. Eligible topics include practices, estimates, communication, recurring blockers, environment, tools, etc.. The whole team participates. Short, iterative Sprint cycles allow improvements to be incorporated easily and quickly. The PVG lessons learned from patching are also valuable for software lifecycle management. The benefits realized by optimizing patching also can be realized directly with general software updates (e.g., software packaging) and indirectly (e.g., disaster recovery test scripts).
Agile Patching Sprint and Release Calendar
Below is a proposed calendar of events based on the Agile Patching approach.
Conclusion
Patching is complex, people expensive, and time-consuming. Many resources are needed to prepare and deploy critical security updates. Unfortunately for most security updates, no new software functionality is being introduced. Management might perceive this activity as having a high cost but no or limited tangible benefit.
The cost of deploying the patches is not typically funded with new revenue streams or consumer markets. Further, the liability of not rapidly deploying patches can be significant. Risk is transferred from the software manufacturer to organization/individual when a defect repair is not implemented in the manner and timeline required by the manufacturer.
Lastly, the requirements and priorities for patching are quite fluid. Organizations want patching that is fast, frequent, and without fail.