CISO Guidance for AI Security

by Chris Salerno | Jan 4, 2024

Editor’s Note: This blog post was originally published on June 28, 2023 and has been revised and updated for accuracy and comprehensiveness.

Every day, organizations are finding new uses for AI models. As with any transformative technology, AI introduces both risks and opportunities for businesses, and organizations should be prepared to protect their AI technology at the same level they protect traditional “crown jewel” systems and other sensitive data. Senior leadership will look to CISOs for guidance, both on how to protect AI and how to use it to enhance the security of their organizations. What follows is some guidance on how a CISO might proceed as AI technology evolves and is deployed.

CISO Scenario: Microsoft Copilot

Security Considerations in Copilot

Sensitive Data Exposure to Employees

Once enabled, Copilot will provide answers based on existing user access controls.

Existing SharePoint and OneDrive user access controls can be over-permissive. This can result in employees accessing data not intended for their use.

This risk is not new, but it is exacerbated by the ease with which AI can correlate, analyze, and present the data back to the employee.

External Connections

Copilot allows for read/write API connections to cloud and external data sources such as ticketing systems.

There are two options for connections: read-only Graph connectors and read/write plugins. Both respect the authorizations defined by the external application, which underscores the importance of strong access controls everywhere.

This risk is the same as an external API which performs functions in an application.

No Out-of-the-Box Alerting Available

Copilot search queries and results are not indexed for real-time security monitoring.

Existing SharePoint search alerts do not currently work with Copilot.

Threat actors with access to an employee account may have an advantage in gaining access to sensitive data compared with previously established techniques.

Copilot queries and results are currently available only in a forensic eDiscovery context.

Top Recommendations
DATA CLASSIFICATION AND RIGHTS MANAGEMENT

Use Purview sensitivity labels to classify and encrypt documents containing sensitive data, and restrict access. Labels can be applied automatically and retroactively with Purview. Accuracy is understandably a challenge, and tuning requires resources.

Syntex is a $3/user/month add-on that can help manage SharePoint permissions, but will require significant effort.
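The detection logic behind auto-labeling is essentially pattern matching against sensitive information types. The sketch below is not the Purview API; it is a minimal, hypothetical illustration of how a pattern-driven classifier suggests a label for a document, assuming made-up label names, regex patterns, and file locations.

```python
import re
from pathlib import Path

# Hypothetical label names and regex patterns standing in for the sensitive
# information types that Purview auto-labeling provides out of the box.
SENSITIVE_PATTERNS = {
    "Confidential - PII": [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-style identifier
        re.compile(r"\b(?:\d[ -]?){15}\d\b"),        # 16-digit card-style number
    ],
    "Confidential - Credentials": [
        re.compile(r"(?i)password\s*[:=]\s*\S+"),
    ],
}

def suggest_label(doc_path: Path) -> str | None:
    """Return the first label whose patterns match the document text, if any."""
    text = doc_path.read_text(errors="ignore")
    for label, patterns in SENSITIVE_PATTERNS.items():
        if any(p.search(text) for p in patterns):
            return label
    return None

if __name__ == "__main__":
    # Hypothetical local export of documents; Purview would apply (and encrypt)
    # labels through its own policies rather than a script like this.
    for doc in Path("./sharepoint_export").rglob("*.txt"):
        label = suggest_label(doc)
        if label:
            print(f"{doc}: suggested label '{label}'")
```

Even a toy example like this shows why tuning matters: overly broad patterns generate false positives, and that is where the resource investment goes.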

 

DATA LOSS PREVENTION

Microsoft Purview or existing DLP controls can and should be used to retroactively and temporarily limit access to documents that have overly permissive access controls.

 

METHODICALLY ENABLE SHAREPOINT SITES

The semantic index is queried when a user submits a Copilot search. Excluding SharePoint sites from Microsoft Search and the semantic index will prevent Copilot from returning results from those sites, regardless of the permissions on the files within them.

CISO Scenario: Internal LLMs

Security Considerations in Internal LLMs

Training Data

Training data that helps to build and train internal LLM models may contain sensitive data.

Internally developed AI training sets can quickly become “crown jewels” of the organization.

Employees with overly permissive access or threat actors with malicious intent may alter the training data, resulting in a bad model or one that produces incorrect results.

Training data can also contain PII, ePHI and other sensitive business IP.

LLM Access Control

Access to the production LLM will have the same implications as access to production web apps.

Traditional access controls can help prevent unauthorized access to production internal LLMs.

This is not a new risk; access to purpose-built LLMs should be limited, via your existing IAM controls, to the individuals and groups that require it.

Dev Environments

LLM creation and development won’t be limited to approved developer environments.

The ease with which a new LLM can be created and deployed will lead to LLM sprawl and unauthorized LLM usage within business units.

Consider this risk like “shadow IT”: providing approved architectures, environments, and tools will help curb it.

Top Recommendations
AI and LLM TRAINING AND INTERNAL POLICIES

Continue to educate your employees about the risks associated with public LLMs and the importance of safeguarding sensitive data. Training should include how to recognize and handle sensitive information and the consequences of policy violations.

 

REPOSITORY SCANNING

Establish and enforce a policy that bans storing training data sets in code repositories. Establish regular scanning of inventoried repositories for configuration violations, the presence of data sets, and other sensitive data, such as credentials. Look to establish regular OSINT scanning for repositories that should not be public. Leverage inventory data to communicate alerts back to asset owners. A sketch of this kind of repository scan follows.
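As a rough illustration of what such a scan looks for, the sketch below walks a cloned repository and flags large data-set files and credential-like strings. The file extensions, size threshold, and patterns are assumptions for illustration; a production program would rely on an inventoried repository list and a dedicated secret-scanning tool.

```python
import re
from pathlib import Path

# Hypothetical thresholds, extensions, and patterns for illustration only.
DATA_SET_EXTENSIONS = {".csv", ".parquet", ".jsonl", ".xlsx"}
MAX_DATA_FILE_BYTES = 5 * 1024 * 1024           # flag data files over 5 MB
TEXT_EXTENSIONS = {".py", ".yaml", ".yml", ".json", ".env", ".txt", ".cfg"}
CREDENTIAL_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|secret|password)\s*[:=]\s*['\"]?\S+"),
    re.compile(r"AKIA[0-9A-Z]{16}"),             # AWS access key ID format
]

def scan_repo(repo_path: str) -> list[str]:
    """Flag checked-in data sets and credential-like strings in a cloned repo."""
    findings = []
    for path in Path(repo_path).rglob("*"):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in DATA_SET_EXTENSIONS and path.stat().st_size > MAX_DATA_FILE_BYTES:
            findings.append(f"Possible training data set checked in: {path}")
        elif suffix in TEXT_EXTENSIONS:
            text = path.read_text(errors="ignore")
            if any(p.search(text) for p in CREDENTIAL_PATTERNS):
                findings.append(f"Possible credential in: {path}")
    return findings

if __name__ == "__main__":
    # In practice, findings would be routed to asset owners via ticketing or alerting.
    for finding in scan_repo("./cloned-repo"):
        print(finding)
```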

CISO Scenario: Public and Third Party LLMs

Risks of Exposing Internal Data to Public LLMs

Accidental Sensitive Data Exposure

Without blocking, it is easy for an employee to accidentally expose sensitive data to an LLM via file upload and copy/paste.

LLMs allow for file upload; however, the likelihood of exposing a complete large data set that a public LLM could train on is currently minimal, due to the size of complete data sets and the current limitations of public LLMs.

Leaking incomplete data sets is a more common scenario but can be limited by implementing appropriate firewall or proxy blocks. Currently, ChatGPT has a 10GB-per-day limit on file uploads.

Malicious Sensitive Data Exposure

Threat actors may become interested in gaining access to company data, but the risk has not increased.

The existence of publicly available LLMs does not significantly change the threat profile for organizations, except those creating sought-after private AI algorithms.

A threat actor motivated to steal sensitive data has not gained any significant advantage with the advent of public LLMs.

Third Party Data Loss

Third parties may be using the data you provide them to train their LLMs.

Third-party usage of data is, and will continue to be, difficult to track.

While policies and contracts can stipulate that third parties should not use data in public or semi-public LLMs, there are no controls limiting this exposure. In addition, third-party tools may be silently training a non-public LLM for later use.

Top Recommendations
KEEP FOCUS ON SECURITY FUNDAMENTALS

Continue to focus on your security program’s initiatives, driven by measurable results from purple teams within your continuous security testing program, NIST/ISO gap assessments, and basics like strong authentication, vulnerability management, and IAM.

 

THIRD PARTY RISK ASSESSMENTS 

Create new standard questions for third-party risk assessments and disseminate them across the organization. Review TPAs to ensure that proper legal protections are in place to minimize unwanted exposure to AI services. This also applies to M&A activity. If M&A includes new data sets, ensure a process is in place to onboard them into the enterprise data set inventory.

 

MONITORING AND DATA LOSS PREVENTION

Adapt your current detect-and-block solutions to focus on new AI risks such as blocking unapproved public LLMs, forcing always-on VPN/proxy to monitor traffic, alerting on large data uploads to known public LLMs, and alerting on training data moving from approved locations. A sketch of the upload alerting follows.
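The example below sums bytes sent per user to a set of known public LLM domains from a daily proxy log export and flags anything over a threshold. The domain list, CSV schema, and 50 MB threshold are assumptions; your proxy or DLP platform would supply the real fields and policy values, and the same logic would normally be a DLP or SIEM rule rather than a script.

```python
import csv
from collections import defaultdict

# Hypothetical domain list, CSV schema, and threshold for illustration only.
PUBLIC_LLM_DOMAINS = {"chat.openai.com", "chatgpt.com", "gemini.google.com", "claude.ai"}
UPLOAD_ALERT_BYTES = 50 * 1024 * 1024            # alert above 50 MB per user/domain/day

def large_upload_alerts(proxy_log_csv: str) -> dict[tuple[str, str], int]:
    """Sum bytes sent per (user, domain) pair and return the pairs over threshold."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    with open(proxy_log_csv, newline="") as fh:
        for row in csv.DictReader(fh):           # expects columns: user, dest_domain, bytes_sent
            domain = row["dest_domain"].lower()
            if domain in PUBLIC_LLM_DOMAINS:
                totals[(row["user"], domain)] += int(row["bytes_sent"])
    return {pair: sent for pair, sent in totals.items() if sent > UPLOAD_ALERT_BYTES}

if __name__ == "__main__":
    for (user, domain), sent in large_upload_alerts("proxy_daily_export.csv").items():
        print(f"ALERT (High): {user} sent {sent / 1_048_576:.1f} MB to {domain} today")
```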

Understanding Organizational AI Initiatives

  • Build Informed Policies: The business-engaged CISO should begin by understanding current and anticipated uses of AI. The goal is to outline and approve policies covering data privacy, ethical use and compliance with regulations. The CISO will not solely be responsible for these policies but must be a leader and stakeholder.
  • Collaborate and Integrate Security into the AI pipeline: Following a model similar to their engagement with traditional Development Teams, the CISO should work closely with AI stakeholders for early consideration of AI security policies.

 

Consuming Publicly Available AI

  • Train and Spread Awareness: The CISO should review and update the Security Awareness program with simple guidance and resources for all employees to understand policy and risks with the use of AI and large language models. This includes training employees on how to use AI technologies responsibly and securely. CISOs should add “deep fake” content to existing phishing awareness training.
  • Identify Uses: Gather uses through inquiry, then verify and hunt using existing tools such as CASB and EDR to gather data on public AI endpoint traffic volume and the sources generating the traffic; see the sketch after this list.
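A simple way to turn that traffic data into a usage picture is to aggregate requests to AI-related domains by destination and source. The sketch below assumes a CSV export with user and dest_domain columns and a hypothetical list of domain hints; swap in the fields your CASB, EDR, or proxy actually provides.

```python
import csv
from collections import Counter, defaultdict

# Hypothetical domain hints and log schema (user, dest_domain) for illustration only.
AI_DOMAIN_HINTS = ("openai", "chatgpt", "anthropic", "claude", "gemini", "copilot")

def summarize_ai_usage(traffic_csv: str) -> None:
    """Print request counts and distinct users per AI-related destination domain."""
    requests_by_domain: Counter[str] = Counter()
    users_by_domain: dict[str, set[str]] = defaultdict(set)
    with open(traffic_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            domain = row["dest_domain"].lower()
            if any(hint in domain for hint in AI_DOMAIN_HINTS):
                requests_by_domain[domain] += 1
                users_by_domain[domain].add(row["user"])
    for domain, count in requests_by_domain.most_common():
        print(f"{domain}: {count} requests from {len(users_by_domain[domain])} distinct users")

if __name__ == "__main__":
    summarize_ai_usage("casb_traffic_export.csv")
```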

 

Protecting Internally Developed AI

  • Protect Models: Internally developed AI models can quickly become “crown jewels” of the organization and the same policies and protections that are placed on existing high target data should also be placed on AI models.
  • Use “Traditional” Controls: Use existing data loss prevention and identity and access management controls to prevent unauthorized access to the training data or models and to alert when anomalous access occurs; see the sketch after this list.
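As a minimal sketch of the alerting half, the example below reads a storage audit log and flags access to the training-data location by principals outside an allow list. The allow list, path prefix, and log schema are hypothetical; in practice the same logic would be a SIEM or IAM analytics rule rather than a standalone script.

```python
import csv

# Hypothetical allow list, path prefix, and audit-log schema for illustration only.
AUTHORIZED_PRINCIPALS = {"svc-llm-training", "ml-engineering-group"}
TRAINING_DATA_PREFIX = "s3://internal-llm-training-data/"

def unauthorized_access_events(audit_log_csv: str) -> list[dict]:
    """Flag reads or writes of the training data store by principals outside the allow list."""
    hits = []
    with open(audit_log_csv, newline="") as fh:
        for row in csv.DictReader(fh):           # expects columns: principal, action, resource
            if (row["resource"].startswith(TRAINING_DATA_PREFIX)
                    and row["principal"] not in AUTHORIZED_PRINCIPALS):
                hits.append(row)
    return hits

if __name__ == "__main__":
    for event in unauthorized_access_events("storage_audit_log.csv"):
        print(f"ALERT: {event['principal']} performed {event['action']} on {event['resource']}")
```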

 

Enhancing Incident Response

  • Role of Purple Teams: Conduct purple teams that include AI-generated payloads and scripts to understand the effectiveness and limits of controls.
  • New TTX Injects: Include AI in the next tabletop exercise scenario and begin to generate an incident response plan for security incidents (deep fakes, data loss, AI model poisoning, etc.).

 

Enhancing Detection and Response

  • Increase SOC Efficiency: Boost SOC agility by integrating AI for rapid SIEM and EDR content prototyping for emerging threats. Use modern SOAR to show analysts full contextual information to enable faster triage and mitigation.
  • Reduce Attack Surface: Promote fewer vulnerabilities to production by using AI-assisted SAST tooling to identify security flaws in source code.

 

Potential Risks with AI

There are some clear risks that the CISO should also convey to the organization as the use of public AI becomes more ingrained in the culture:

IP and Content Risks

  • Potential intellectual property (IP) or contractual issues may arise given the lack of approvals necessary to use the data to train or develop the AI models.
  • AI may generate misleading and harmful content as a result of low-quality data used to train generative AI models.

Data Loss and Tampering Risks

  • AI models create a risk of losing client or internal data as it is entered into the models. We don’t yet know how the public models treat input data or whether queries are stored with security in mind.
  • Models are vulnerable to “poisoning” if the data they use as input is tampered with.

Usage Risks

  • Many AI models lack transparency in how they make decisions.
  • They can create misinformation and other harmful content. Humans may then pass off AI-generated “hallucinations” or incorrect responses as fact.

AI-Focused Alerts and Blocks for Consideration

Rule: High Volume of Data Uploaded to Public LLM
Description: An employee or contractor uploads a large volume of data to a public LLM such as ChatGPT.
Log/Alert/Block: ALERT – High (DLP)
Purple Team Test Case: Attempt to upload a large data set to an approved public LLM.

Rule: Attempt to Access Blocked Public LLM
Description: An employee or contractor attempts to gain access to a public LLM such as ChatGPT that has been blocked.
Log/Alert/Block: LOG – Informational (SIEM)
Purple Team Test Case: Attempt to access a blocked public LLM using a corporate asset.

Rule: Block Unauthorized Public LLM
Description: Block unauthorized public LLMs at the firewall and/or proxy.
Log/Alert/Block: BLOCK (Firewall / Proxy)
Purple Team Test Case: Attempt to access a blocked public LLM using a corporate asset.

Rule: LLM Training Data has Sensitive Information
Description: Code repository scanning tool identifies sensitive data sets within code repositories. The sensitive data sets should be stored outside of the code repository.
Log/Alert/Block: ALERT – High (Code Repository Scanner)
Purple Team Test Case: Insert benign sensitive data into a code repository.

Rule: LLM Training Data Potentially Tampered
Description: Code repository scanning tool identifies a change to the training data used for internal LLMs outside of the change control window.
Log/Alert/Block: ALERT – Medium (Code Repository Scanner)
Purple Team Test Case: Make a benign change to the training data set outside of approved change control windows.

Rule: Copilot Returned Sensitive Information in Query Response
Description: A Copilot response to an employee or contractor query returns sensitive data such as PII or passwords.
Log/Alert/Block: ALERT – Medium (SIEM)
Purple Team Test Case: Design and execute a query that returns sensitive benign data from a fake data set.
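To make the “LLM Training Data Potentially Tampered” rule concrete, the sketch below compares current file hashes against a stored manifest and alerts when training data has changed outside an approved change window. The manifest file, data directory, and window are assumptions, and the sketch keys off scan time rather than commit timestamps (a simplification); the equivalent check would normally run inside a repository scanner or CI pipeline.

```python
import hashlib
import json
from datetime import datetime, time
from pathlib import Path

# Hypothetical manifest location, data directory, and change window for illustration only.
MANIFEST_FILE = Path("training_data_manifest.json")
CHANGE_WINDOW = (time(22, 0), time(23, 59))      # approved nightly change window

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def build_manifest(data_dir: str) -> dict[str, str]:
    """Hash every file under the training data directory."""
    return {str(p): sha256(p) for p in Path(data_dir).rglob("*") if p.is_file()}

def tampering_alerts(data_dir: str) -> list[str]:
    """Alert on files whose hashes changed when we are outside the approved change window."""
    previous = json.loads(MANIFEST_FILE.read_text())
    current = build_manifest(data_dir)
    if CHANGE_WINDOW[0] <= datetime.now().time() <= CHANGE_WINDOW[1]:
        return []                                # changes during the window are expected
    changed = [p for p, digest in current.items() if previous.get(p) != digest]
    return [f"ALERT (Medium): training data changed outside change window: {p}" for p in changed]

if __name__ == "__main__":
    for alert in tampering_alerts("./training_data"):
        print(alert)
```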

AI Poll Results

SRA AI Security Quick Poll (sample size: 28). The poll charted the percent of organizations blocking public LLMs at the firewall and the toolsets in use that are specifically designed to address net-new LLM threats.
AI will continue to be part of how organizations conduct business. As with other recent technology adoptions such as cloud, protecting the business processes and the systems that support AI will need to be part of the cybersecurity plan.

Chris Salerno

Chris leads SRA’s 24x7 CyberSOC services. His background is in cybersecurity strategy based on NIST CSF, red and purple teams, improving network defenses, technical penetration testing, and web applications.

Prior to shifting his focus to defense and secops, he led hundreds of penetration tests and security assessments and brings that deep expertise to the blue team.

Chris has been a distinguished speaker at BlackHat Arsenal, RSA, B-Sides and SecureWorld.

Prior to Security Risk Advisors, Chris was the lead penetration tester for a Big4 security practice.