1 It’s the Big One: Managing Your Remediation Efforts After a Cyberattack
April 19, 2016
2 UVa Background
- Founded by Thomas Jefferson in 1819, the University of Virginia is made up of eleven schools in Charlottesville plus the College at Wise in Southwest Virginia.
- For 2016, the University was ranked the No. 3 public university by U.S. News and World Report.
- 15,669 total undergraduate students (on Grounds)
- 6,316 total graduate and professional students (on Grounds)
- 12,845 full-time faculty and staff
- 240 Central IT staff (half of the University’s IT spend is decentralized, in the units and schools)
3 Panelist Background
- Dana German: Deputy CIO & AVP for Strategic Initiatives & Portfolio Management
- Brian Davis: Director of Information Security, Policy & Access
- Dave Strite: AVP for User Experience & Engagement
- Clayton Lockhart: AVP for Enterprise Infrastructure
4 The Big One (Summer of 2015)
Timeline (June, July, August 2015):
- May 29, 2015: Initial contact by Federal authorities
- June 11, 2015: UVa attack confirmed (Federal authorities; Mandiant)
- June through August: Active monitoring & continued investigation; remediation prep (Phoenix Project)
- August 14, 2015, 5:00 pm: Remediation begins
5 The Big One: How Did It Happen?
- Contacted by a Federal agency investigating an attack from our network against a third party by a nation-state actor
- UVa’s investigation determined that, in addition to using UVa as a pass-through, the attackers pivoted into our Active Directory infrastructure; subsequent investigation found a second attack group
- Attack/Group A:
  - Exploited a user-installed dropbox on a public website to upload webshells in May 2015
  - Leveraged webshells to obtain administrative passwords and move laterally into our domain controllers and our Windows-based infrastructure
- Attack/Group B:
  - Uploaded webshells in March 2014 through a ColdFusion administration page
  - Leveraged webshells to obtain an administrative password and move laterally into other servers
- Average time an APT is inside a network before discovery: 18 months (flagged "CHECK STAT" in the source, so treat this figure as unverified)
6 Resist immediate temptation to ‘fix’
- Worked with the Federal agency and Mandiant to respond
- Following normal procedures, once identification is complete we move quickly to containment and eradication. Step 1.1: “Immediately contain and limit the exposure.”
- Given the depth and breadth of the penetration and the sophistication of the attackers, we were advised to allow the attackers to remain active in our systems until we could be sure that the full extent of their activities was known, so that we could effectively remove them
- Don’t act too soon; don’t wait too long; take your institutional calendar into account
- Had to repeatedly restrain engineers from trying to immediately fix the problems identified
- Mandiant provided investigative tools and expertise that we didn’t have
- In addition to a few site visits, had daily debriefings with the core team
- We watched for weeks as the list of affected systems grew (retroactively)
- By the second week of July, we chose August 14 as our remediation date: the last weekend before student move-in and the earliest we could hope to have the infrastructure fixes in place
7 Phoenix Project
- Magnitude of the remediation effort and constrained resources/timeline led to a “project” approach
- Project approach/methodology kicked into high gear on 7/13/15
- Project team structure
- Daily team lead meetings w/ PM & CIO each morning
- Daily status meetings w/ PM & CIO at end of the day
- Many specialized sub-group meetings
- Temporary project space (privacy & confidentiality; large meeting room; smaller break-out rooms; regularly refreshed snacks/coffee/sodas/water; office supplies; AV equipment, etc.)
8 Phoenix Project
- Utilized standard project planning/management templates:
  - Risk Register (Example: Security compromise would become public before we were ready to remediate)
  - Issues Log (Example: CyberArk procurement & planned use versus time available for setup & configuration)
  - MS Project Schedule (initially based upon Mandiant’s recommendations, and evolved during project)
  - Communication Plan (Example: Entry for notifying BOV members, Governor’s office, Deans, etc. at 4pm on 8/14/15)
- Twice-weekly calls with Mandiant to review continued investigation results
- Twice-weekly meetings with CIO and designated University leadership
- Developed many MOPs (method-of-procedure documents)
- Developed an ‘UberMOP’ that served as our detailed master plan/checklist during the final remediation weekend
9 Phoenix Project
PROJECT TEAM LEADS:
- Active Directory
- Exchange
- Windows Servers
- Network
- Workstations
- Password Changes
- Enhanced Logging
- Help Desk/User Support/Access Management
- Data Scanning
- Communications: Internal
- Communications: External
Total of 88 Central IT staff involved, including CIO, PM, Team Leads
10 MS Infrastructure/Server Rebuild
- Number of servers involved:
  - Accessed: 46
  - Compromised: 12
  - Suspect: 4
  - False Positives: 3
  - Total: 65
- Compromised servers: 7 DCs; 3 Exchange; 2 SCCM
- Infrastructure rebuild
  - Server rebuilds, scrubbing, some had to wait
  - Special clean rebuild network
- Special challenges:
  - No unified systems inventory
  - Non-standard builds
  - Identification of application owners, ability to re-install application software, etc.
  - Rebuilding NE
  - Rehosting; consideration of implications; shut-down of some servers/applications
  - Activities completed pre-remediation weekend vs. during remediation weekend vs. later
  - Sequencing and dependencies during remediation weekend
  - Figuring out impact of going dark; location of machines affected; which segments could be left open during remediation weekend/Go-Dark period
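A quick consistency check of the server figures on this slide can be sketched as follows. The dictionary keys are illustrative names chosen here, not terms from the original deck; only the numbers come from the slide.

```python
# Server counts as reported on the slide (key names are illustrative).
server_counts = {
    "accessed": 46,
    "compromised": 12,
    "suspect": 4,
    "false_positives": 3,
}

# Breakdown of the compromised servers as reported on the slide.
compromised_by_role = {"domain_controllers": 7, "exchange": 3, "sccm": 2}

total_servers = sum(server_counts.values())
compromised_total = sum(compromised_by_role.values())

print(f"Total servers reviewed: {total_servers}")
print(f"Compromised breakdown adds up: {compromised_total == server_counts['compromised']}")
```

The four categories sum to the slide's stated total, and the role breakdown matches the compromised count.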
11 Support Planning
- Busiest support time of year
- 53,000+ user password changes
- Service account passwords
- Predicting support volume
  - Analyzed support scenarios with user experience testing
  - Enhanced self-service password application to reduce calls
- Increased support staff
  - Help Desk (prepping Blackboard with little information)
  - Access Management 420 Tier 2 1236 with extended hours
- Support staff not told or trained until two days before the event
- Revised support tools and documentation put in place
- 90% of passwords changed through self-service
- Call center provided by insurance
12 Communication Planning & Challenges
- Stakeholders
  - University Communications and University Spokesperson
  - Legal
  - Enterprise Risk Management
- Timing
  - Need to know (“Read-in” list)
  - Just-in-time
  - Going public at the last minute
- Audience
  - In addition to faculty, staff, and students, we had the governor, public, parents, press, and board members
- Message
  - Limited in what we could say
  - In some cases we had last-minute information
  - Content had to be reviewed
- Channels
  - Emergency notification system
  - Backup web sites (University websites were down)
13 Tabletop Exercise
- Dry run/partial simulation of remediation weekend
- Team leads gathered in project space; technicians/engineers worked from their desks
- Incident bridge used
- Walked through all details in UberMOP
- Technicians knew about their individual role/piece of the puzzle, but the tabletop provided the big picture for how everything was to come together
- Technicians/engineers informed about the overall, well-planned/timed communication strategy
- 100% participation in the tabletop was mandatory (for all with a primary or backup role)
- Tabletop occurred on the morning of 8/13/15 (originally slated for earlier that week, but we weren’t ready)
14 Remediation Weekend (Aug 14 – 16)
- Religiously executed UberMOP and Communication Plan
- “Incident” bridge commenced at 4:00pm Friday, August 14th
- Go-Dark period began at 5:00pm
- PM served in a hybrid Incident Manager role; had designated Incident Coordinator assigned in shifts
- Regular status reporting on bridge; UberMOP items checked off as completed
- Worked around the clock Friday through Saturday 10pm
- Resumed at 8am Sunday; finished at 4pm Sunday
15 Remediation Summary
- Total staff “read-in”: 198 through Aug. 12; adding the support group brought it to 241 by Aug. 14
- Total Central IT staff involved (at varying levels): 88
- Total Central IT staff time spent: 12,179 hours during 6/11/15 – 8/30/15 (equivalent of 5.9 years for 1 FTE)
- Project costs:
  - Out-of-pocket: $1,311,000
  - IT staff time: $800,000
  - TOTAL: $2,111,000
- Notes:
  - Possible additional vendor costs for further vulnerability identification/analysis: $400,000
  - Insurance offsets: $304,864 (3rd party insurance helped with ‘investigation’ cost); $19,000 (self-insured account)
- 3,141 hours logged during remediation week/weekend: 8/10/15 – 8/16/15
- 21 people worked >70 hours that week; up to a high mark of 91 hours
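The arithmetic behind these figures can be reproduced with a short sketch. Two things here are assumptions rather than statements from the deck: the 2,080-hour FTE year (40 hours x 52 weeks), which happens to reproduce the slide's 5.9-year figure, and the net-of-insurance number, which the slide does not state.

```python
# Assumption: one FTE year = 40 hours/week * 52 weeks. The slide does not
# say which convention was used; this one reproduces its 5.9-year figure.
HOURS_PER_FTE_YEAR = 40 * 52

# Figures taken from the slide.
staff_hours = 12_179            # 6/11/15 - 8/30/15
remediation_week_hours = 3_141  # 8/10/15 - 8/16/15
out_of_pocket = 1_311_000
staff_time_cost = 800_000
insurance_offsets = 304_864 + 19_000

fte_years = staff_hours / HOURS_PER_FTE_YEAR
total_cost = out_of_pocket + staff_time_cost

# Derived here for illustration only; not a figure reported on the slide.
net_after_insurance = total_cost - insurance_offsets
remediation_week_share = remediation_week_hours / staff_hours

print(f"FTE years: {fte_years:.1f}")
print(f"Total project cost: ${total_cost:,}")
print(f"Net of insurance offsets: ${net_after_insurance:,}")
print(f"Share of hours in remediation week: {remediation_week_share:.0%}")
```

Roughly a quarter of all logged staff hours fell in the single remediation week, which is consistent with the overtime noted on the slide.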
16 General Lessons Learned
- Strong leadership support was key; CIO engagement contributed to rapid, balanced decision making
- Project management approach (discipline/structure/daily meetings; logging risks and issues; formal communication strategy) was vital given the magnitude of the remediation effort
- Delineate & document specific team/individual roles & responsibilities for all remediation activities (w/ primary and backup assignments) and ensure that the entire team knows what everyone is doing
- Assign responsibilities for issue resolution; regularly review and update the risk register and issues log
- Provide secure space and collaboration resources for the core team
- You really can stop working on projects and other ‘high priority’ items when you have to; Phoenix Project was our highest priority and trumped everything
- Bring in outside experts (Mandiant, etc.) for investigation and remediation support
- Have at least one tabletop/practice run
- Use a conference bridge throughout the entire remediation event, and if you have formalized “incident management” practices, utilize them to augment project efforts
- Inventory and document systems/applications/data and cross-train staff BEFORE you have an event!
17 General Lessons Learned (continued)
- There is great need for proactive detection tools
- Invest in insurance! (not just for $ assistance, but also for outside counsel expertise, etc.)
- What other lessons have YOU learned during incident remediation(s) that you’d be willing to share with this group?
18 Questions & Answers ?
19 Contact Info
- Dana German
- Brian Davis
- Dave Strite
- Clayton Lockhart
20 Appendix
- Risk Register Template
- Issues Log Template
- Communication Plan Template
21 Risk Register Template
Log risks that can derail your project! Pay extra attention to ‘red’ risks. They are more likely to happen and have a greater negative impact if they occur!
Columns: Risk Number | Risk Status | Date Identified | Risk Description | Impact (1-4) | Probability (1-4) | Risk Factor (I*P) | Mitigation Strategy/Status | Owner | Next Review or Expected Mitigation Date
Sample rows: ten placeholder entries (Test Risk 1 through Test Risk 10, Mitigation 1 through Mitigation 10) with statuses Active and Retired, owners Bob, Suzie, and Sam, and dates from 12-Jan-14 to 2-Apr-14.
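The Risk Factor column (Impact times Probability, each scored 1-4) can be sketched in code as below. The `Risk` class, the `RED_THRESHOLD` cutoff, and the impact/probability scores are all hypothetical: the template does not say at what score a risk turns ‘red’, and the deck gives no scores for its example risk.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the template does not define when a risk is 'red'.
RED_THRESHOLD = 9


@dataclass
class Risk:
    number: int
    description: str
    impact: int       # 1-4, per the template
    probability: int  # 1-4, per the template
    owner: str = ""
    status: str = "Active"

    def __post_init__(self):
        # Reject scores outside the template's 1-4 scale.
        for name in ("impact", "probability"):
            value = getattr(self, name)
            if not 1 <= value <= 4:
                raise ValueError(f"{name} must be between 1 and 4, got {value}")

    @property
    def risk_factor(self) -> int:
        # Risk Factor (I*P) column from the template.
        return self.impact * self.probability

    @property
    def is_red(self) -> bool:
        return self.risk_factor >= RED_THRESHOLD


# Descriptions echo the deck's example; the scores are made up for illustration.
register = [
    Risk(1, "Compromise becomes public before remediation is ready", impact=4, probability=3),
    Risk(2, "Placeholder low-priority risk", impact=1, probability=2),
]

# Review the highest-scoring risks first.
ranked = sorted(register, key=lambda r: r.risk_factor, reverse=True)
red_risks = [r for r in ranked if r.is_red]

for risk in ranked:
    flag = "RED" if risk.is_red else "ok"
    print(risk.number, risk.risk_factor, flag, risk.description)
```

Sorting by risk factor keeps the ‘red’ items at the top of each review, which matches the template's advice to pay them extra attention.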
22 Issues Log Template
Document issues that surface and need to be addressed. Make specific assignments for issue resolution and regularly track status until issues are resolved.
Columns: ID | Date Reported | Status | Priority | Short Description | Description | Assigned To | Addtl Comments | Resolution
Sample rows: ten placeholder entries (IDs 1 through 10) reported between 11-Mar-14 and 28-Mar-14, with Status values Open and Resolved and Priority values Normal and High.
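One way to track the log's columns and surface items still needing attention is sketched below. The `Issue` class, the helper functions, and the sample entries are hypothetical; only the column names and the Status/Priority values come from the template.

```python
from dataclasses import dataclass
from datetime import date

# Priority values seen in the template; lower rank sorts first.
PRIORITY_RANK = {"High": 0, "Normal": 1}


@dataclass
class Issue:
    issue_id: int
    date_reported: date
    short_description: str
    status: str = "Open"
    priority: str = "Normal"
    assigned_to: str = ""
    resolution: str = ""


def open_issues(log):
    """Return unresolved issues, high priority first, then oldest first."""
    pending = [issue for issue in log if issue.status != "Resolved"]
    return sorted(
        pending,
        key=lambda issue: (
            PRIORITY_RANK.get(issue.priority, len(PRIORITY_RANK)),
            issue.date_reported,
        ),
    )


def unassigned(log):
    """Return open issues that nobody owns yet."""
    return [issue for issue in open_issues(log) if not issue.assigned_to]


# Placeholder entries for illustration only.
issues_log = [
    Issue(1, date(2014, 3, 11), "Placeholder issue A", assigned_to="Owner 1"),
    Issue(4, date(2014, 3, 13), "Placeholder issue B", priority="High"),
    Issue(8, date(2014, 3, 28), "Placeholder issue C", status="Resolved",
          assigned_to="Owner 2", resolution="Placeholder resolution"),
]

for issue in open_issues(issues_log):
    print(issue.issue_id, issue.priority, issue.assigned_to or "UNASSIGNED")
```

Listing unassigned open issues at each status meeting is one simple way to act on the template's advice to make specific assignments and track them until resolved.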
23 Communication Plan Template
Columns: Communication Date | Time of Day | Audience | Message | Mechanism | Assigned To | Status
Audience entries, with the mechanism where one survives in the source:
- Faculty
- Staff: …, website updates
- Students
- Board of Visitors: …, phone calls
- Deans: In-person, Dean's Meeting
- President's Cabinet
- Local IT Liaisons: …, website updates
- Governor: Phone call