Reforming the Security Clearance Process and Modernizing the Trusted Workforce for the 21st Century
On April 24, 2019, President Trump signed the Executive Order on Transferring Responsibility for Background Investigations to the Department of Defense, noting that the Secretary of Defense “shall design, develop, deploy, operate, secure, defend, and continuously update and modernize, as necessary, information technology systems that support all personnel vetting processes….”
The Administration identified the clearance process as a target for government reform, noting in 2018 that “background investigations are critical to enabling national security missions and ensuring public trust in the workforce across the Government.”
The Administration’s efforts are part of an ongoing focus on reforming the clearance process and reducing the existing backlog. That backlog peaked at 725,000 in 2018, and some Americans waited more than 500 days before they could start their first day at work.
In addition to the Executive Order, Senator Mark Warner (D-VA) re-introduced his legislation, the Modernizing the Trusted Workforce for the 21st Century Act (S.314), in February 2019. The bill outlines a plan that would constitute the largest overhaul of the security clearance process since its creation in 1947. It sets specific and ambitious targets not only to reduce the backlog of investigations but also to significantly shorten the investigations themselves.
S.314 requires a plan to reduce the investigative backlog to 200,000 by the end of 2020, to issue Secret-level clearances in 30 days or fewer, and to issue Top Secret-level clearances within 90 days.
Additionally, S.314 would establish a “clearance in person” concept, under which “reciprocally recognized” clearances would follow an individual who changes agencies.
According to the legislation, a clearance at the same level should be reciprocally recognized in two weeks or fewer. To achieve this goal and the targets stated above, the bill calls for monitoring tools that assess an individual’s risk profile on a dynamic, ongoing basis.
The legislation allows individuals who once held a clearance to keep access to classified information (CI) with a lapsed clearance for up to three years if they voluntarily enroll in a continuous evaluation (CE) program. Extending the previous two-year limit by one year would significantly increase the number of people in CE programs.
Hence, leveraging technology to achieve the goals stipulated in S.314 is an absolute necessity. A security clearance process that was initially developed nearly 50 years ago is anachronistic in a time when individuals are often better known by members of digital communities than their physical neighborhoods.
Investigators often refrain from interviewing neighbors who have never interacted with the person of interest (POI), yet the individuals who do interact with the POI online remain largely untapped. Furthermore, while interviews and record collection provide static data on an individual’s behavior, continuous collection and analysis of online data enable deeper insight into emerging behaviors.
AI and machine learning technologies provide a compelling solution set for detecting and analyzing emergent behavioral data and insider threat risk. They can provide investigators with near-real-time information, allowing them to prioritize investigations and allocate investigatory resources. This paper outlines the critical need to leverage these technological approaches in developing a 21st Century security clearance process.
The shift from a clearance process bogged down by periodic re-investigations to a system of continuous evaluation will require automation and the ability to ingest, filter, and prioritize massive amounts of data from the open web.
Lumina’s Radiance system is an AI-driven risk analysis SaaS platform that can be deployed easily and rapidly and can revolutionize the security clearance screening process through machine-learning automation. Radiance has two components: Open Source Intelligence (OS-INT) and Internet Intelligence (NET-INT).
The Radiance OS-INT component provides continuous deep-web extraction. It ingests data containing names of persons, entities, or screen names from public sources, executing over 324,000 queries for each name across all the major search engines and cross-referencing over 1,000,000 queries against Lumina’s proprietary databases of risk. The data is cleaned and prioritized by behavioral risk profiles (BRPs) configured against selectors for the Adjudicative Guidelines for Determining Security Clearance Eligibility. OS-INT returns prioritized, actionable results in an average of 4 to 5 minutes. Reading and analyzing similar results through manual web queries would take an individual more than 18 years.
Radiance’s NET-INT component identifies, catalogues, and monitors the research behavior of Internet Protocol (IP) addresses exhibiting anomalous behavior across the globe. The platform collects and stores more than a million interactions every day and, since its inception, has recorded more than 623,000 IP addresses engaging with threat-related risk topics. Behavioral dimensions are configured to capture content relevant to client selectors and to provide pattern-of-life data through near-real-time behavioral analysis.
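The 18-year figure can be checked with simple arithmetic. The sketch below rests on our own assumptions, since the source does not say how the estimate was derived: each of the roughly 1,324,000 queries (324,000 search-engine queries plus 1,000,000 database queries) yields one results page of the average length given in the notes (about 1,890 words), and the reader reads nonstop, around the clock, at 200 to 250 words per minute.

```python
# Back-of-the-envelope check of the "more than 18 years" reading estimate.
# Assumptions (ours, not the source's): one results page per query, and
# continuous reading 24 hours a day, 365 days a year.

WORDS_PER_PAGE = 1_890            # average first page of Google results (from the notes)
QUERIES = 324_000 + 1_000_000     # search-engine queries + proprietary-database queries


def years_to_read(pages: int, words_per_minute: float) -> float:
    """Years of nonstop reading needed to get through `pages` results pages."""
    minutes = pages * WORDS_PER_PAGE / words_per_minute
    return minutes / 60 / 24 / 365


for wpm in (200, 250):
    print(f"{wpm} wpm: {years_to_read(QUERIES, wpm):.1f} years")
```

Under these assumptions the script prints about 23.8 years at 200 words per minute and 19.0 years at 250 words per minute, both above 18 years. Counting only the 324,000 search-engine queries gives under six years at either speed, so the figure appears to rely on the combined query count.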
Radiance OS-INT is a scalable proprietary platform designed to overcome the challenges of massive data ingestion and unstructured data processing. Leveraging advanced machine intelligence, Radiance OS-INT speeds up source assessment and risk identification by identifying keywords or sources associated with the client’s stipulated topic areas.
OS-INT uses data mining, artificial intelligence, and machine learning to search the entire Internet for all documents containing a given input, or keyword(s). A keyword can be the name of an individual or entity, a social media handle, a username, an email address, a location, or any other term the user chooses.
Unlike social media monitoring, OS-INT does not rely on a single platform or social media API, which allows continuous ingestion of all open source data. OS-INT uses continuous deep-web extraction to ingest data from public sources. The resulting volumes of publicly available electronic information (PAEI) are cleaned and prioritized, yielding relevant insights into high-risk individuals, entities, events, or sources by aggregating data scattered across the Internet and measuring it against configurable BRPs. BRPs are the text classification component of Radiance OS-INT: they classify a document in relation to a given topic area.
Behavioral Risk Profiles
A major obstacle for OS-INT SaaS solutions is noise: the inability to distill a massive amount of data into a volume that a human analyst or investigator can consume.
The adjudicative decision is a human decision and always should be. Data distillation, filtering the massive data trove so that only truly actionable intelligence is presented, is therefore imperative. Radiance BRPs filter data down to the actionable intelligence needed to make informed decisions about a subject’s suitability for positions with access to national security information.
Radiance BRPs drastically outperform existing natural language processing approaches at identifying behavioral affinities in unstructured data. The technology achieved 93.1 percent accuracy on unstructured, lower-case, or messy documents such as HTML, JSON, or other file types, compared to 0 percent for the Stanford Named Entity Recognizer and other popular open-source named entity recognition software.
BRPs are proprietary, configurable data filters that capture online content associated with areas of risk or interest, such as the 13 Adjudicative Guidelines. Each BRP is a collection of selectors (terms, phrases, and expressions) representative of a specific area of risk or interest. Keywords, including names of individuals or entities, usernames, social media handles, and locations, can be run through Radiance OS-INT and filtered by BRPs via text correlation. BRPs can be configured to the DDS’s selector criteria to identify sources in relation to various topics.
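A BRP, as described above, pairs a topic with a list of selectors and filters keyword results by text correlation. The minimal Python sketch below illustrates that idea under stated assumptions: the class `BehavioralRiskProfile`, the function `filter_documents`, the regex selectors, the `min_hits` threshold, and the sample documents are all hypothetical, and counting selector hits is a simple stand-in for Radiance’s proprietary correlation method.

```python
import re
from dataclasses import dataclass, field


@dataclass
class BehavioralRiskProfile:
    """Hypothetical stand-in for a BRP: a named list of selectors (regex patterns)."""
    name: str
    selectors: list[str]
    patterns: list[re.Pattern] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Compile selectors case-insensitively so lower-case or messy text still matches.
        self.patterns = [re.compile(s, re.IGNORECASE) for s in self.selectors]

    def score(self, text: str) -> int:
        """Count how many distinct selectors occur in the text."""
        return sum(1 for p in self.patterns if p.search(text))


def filter_documents(keyword: str, documents: list[str],
                     brp: BehavioralRiskProfile, min_hits: int = 2) -> list[tuple[int, str]]:
    """Keep documents that mention the keyword and hit at least `min_hits`
    selectors, returned as (score, document) pairs, highest score first."""
    results = []
    for doc in documents:
        if keyword.lower() not in doc.lower():
            continue  # the document does not mention the subject at all
        hits = brp.score(doc)
        if hits >= min_hits:
            results.append((hits, doc))
    return sorted(results, key=lambda pair: pair[0], reverse=True)


# Illustrative profile loosely themed on the Financial Considerations guideline.
financial = BehavioralRiskProfile(
    name="Financial Considerations (illustrative)",
    selectors=[r"\bbankrupt(cy)?\b", r"\bdebt collector", r"\bgambling\b", r"\bforeclos(ure|ed)\b"],
)
docs = [
    "jdoe_1984 posted: filed for bankruptcy after gambling losses",
    "jdoe_1984 shared a recipe for banana bread",
    "Forum thread: jdoe_1984 says the debt collector called again about the foreclosure",
]
for hits, doc in filter_documents("jdoe_1984", docs, financial):
    print(hits, doc)
```

Here the first and third documents each hit two selectors and pass the filter, while the recipe post mentions the screen name but no selector and is dropped. Keeping selectors as plain patterns also keeps the profile human-readable.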
As mentioned above, BRPs define the topic areas of interest. They are created either through subject matter expertise accompanied by extensive research or through supervised machine learning. The latter process uses a genetic algorithm for BRP creation, as described below (Figure 1):
- Search and ingest every webpage and document on the entire Internet (or a desired part of the Internet) related to the chosen risk profile or topic of interest, such as each of the Adjudicative Guidelines.
- Tag content relevant to each of the 13 Adjudicative Guidelines and divide it into testing and training data sets.
- The training data teaches the algorithm to identify relevant content, and the testing data provides an evaluation of the final model fit on the training data.
- This is an automated, continuous, “evolutionary” process that enhances the algorithm’s accuracy over time.
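The steps above can be illustrated with a toy genetic algorithm. In this hypothetical Python sketch, a candidate BRP is a bit mask over the words of a small hand-tagged training set, fitness is the F1 score on that training split minus a small penalty per selector (to keep the evolved profile short and human-readable), and the fittest profile is then scored on a held-out test split. The corpus, the penalty, and the population and mutation parameters are illustrative assumptions, not Lumina’s implementation.

```python
import random

# Hypothetical hand-tagged corpus: (text, relevant_to_financial_risk).
TRAIN = [
    ("missed payments and mounting credit card debt", True),
    ("filed for bankruptcy last spring", True),
    ("lost savings gambling at the casino", True),
    ("collection agency keeps calling about unpaid loan", True),
    ("weekend hiking trip photos", False),
    ("new recipe for sourdough bread", False),
    ("team won the softball game", False),
    ("reviewing the quarterly budget at work", False),
]
TEST = [
    ("behind on mortgage payments again", True),
    ("casino trip wiped out the savings", True),
    ("photos from the beach", False),
    ("work budget meeting moved to friday", False),
]


def words(text: str) -> set[str]:
    return set(text.lower().split())


# Candidate selectors: every distinct word in the training split.
VOCAB = sorted(set().union(*(words(t) for t, _ in TRAIN)))


def f1(selectors: set[str], data) -> float:
    """F1 score of the rule 'relevant if the text contains any selector'."""
    preds = [(bool(selectors & words(t)), y) for t, y in data]
    tp = sum(1 for p, y in preds if p and y)
    fp = sum(1 for p, y in preds if p and not y)
    fn = sum(1 for p, y in preds if not p and y)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def decode(genome) -> set[str]:
    return {w for w, bit in zip(VOCAB, genome) if bit}


def fitness(genome) -> float:
    # Reward training F1; lightly penalise long selector lists.
    return f1(decode(genome), TRAIN) - 0.01 * sum(genome)


def evolve(pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() < 0.1 for _ in VOCAB] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best forward unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            a = max(rng.sample(population, 3), key=fitness)
            b = max(rng.sample(population, 3), key=fitness)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [(not g) if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


best = sorted(decode(evolve()))
print("evolved selectors:", best)
print("train F1:", round(f1(set(best), TRAIN), 2), "| test F1:", round(f1(set(best), TEST), 2))
```

Because the evolved profile is just a list of words, an analyst can read and audit it directly. A production system would need far larger tagged corpora and richer selectors than single words.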
The generated BRPs are fully “human readable,” so subject matter experts can read and audit them if desired.
Current BRPs cover the 13 Adjudicative Guidelines: Allegiance to the United States, Foreign Influence, Foreign Preference, Sexual Behavior, Personal Conduct, Financial Considerations, Alcohol Consumption, Drug Involvement, Psychological Conditions, Criminal Conduct, Handling Protected Information, Outside Activities, and Use of Information Technology. BRPs for other use cases cover drug misuse, financial crime and Know Your Customer (KYC), school shooting threats, suicide, workplace violence, and bribery and corruption.
Security Executive Agent Directive (SEAD) 5 stipulates that only publicly available information, and not data protected by passwords, held in private accounts, or otherwise not publicly accessible, is permissible for decision making.
BRPs are configured so that collection adheres to SEAD 5: they collect only publicly available information within the scope of the investigation, and collection never involves creating accounts or interacting digitally with POIs.
Critically, this advanced SaaS platform can scrape the entire Internet, not just social media. This capability helps meet the requirement that social media data be substantiated by other sources.
Social media data is often dismissed for lack of substantiation, but the Radiance platform can link social media data to corroborating sources outright by connecting screen names to Personally Identifiable Information (PII), or it can inform traditional investigative processes for incorporation into adjudicative decision making.
Current AI and machine learning technologies can make SEAD 5, which has been largely aspirational, actionable. This is critical to modernizing the security clearance process to fit the realities of the current information age.
Radiance NET-INT Overview
Most online activity consists of consuming data. Commenting (reacting to original content) and generating original content are dwarfed by the consumption of information online. What someone consumes online is a much stronger indicator of their behavior than comments or writing they know are exposed to open observation and evaluation.
A CE system that monitors POIs’ Internet research behavior can help predict emergent behavior indicative of violations of the Adjudicative Guidelines.
Radiance NET-INT is a one-of-a-kind, scalable, proprietary, fully deployed and operational platform that identifies IP addresses accessing content of interest to the client. The system catalogues and monitors the research behavior of the IP addresses.
Designed to overcome the challenges of massive data ingestion and of processing unstructured data, NET-INT monitors behavior across pre-categorized behavioral dimensions that can be customized to the client’s selectors and chained together to achieve sufficient topical coverage.
Currently, there are 43 pre-categorized behavioral dimensions, ranging from attack planning and cyber tactics, to radicalization and insider threat behavior. NET-INT may be configured to add behavioral dimensions directly related to the 13 Adjudicative Guidelines.
NET-INT identifies the geographic areas where anomalous online behavior originates. For IP addresses that can be linked to an individual or an organization, NET-INT can provide real-time insight into what that individual or organization is researching and, thus, what data streams are informing their actions. This can be a critical input to the decision-making process for security clearance evaluation.
NET-INT uses the massive amounts of ingested data to catalogue, index, and redeploy Internet content covering the client’s topics of interest. This collection of documents is updated, expanded, and redeployed through a machine learning process that broadens the depth and breadth of coverage. NET-INT captures the pattern-of-life data of an IP address to identify and categorize the topical coverage of its online research activity. Anomalous behavior is prioritized for analysis by statistical modeling that can be configured to emphasize key indicators selected by the client.
NET-INT captures a user’s IP address when the user clicks on the deployed content to access the associated web page, and it aggregates these hits for each IP address. For each address, the platform presents the URL clicked (both live and cached page), click dates and times, approximate geolocation, and whether the address is associated with a known VPS, Tor node, or other anonymizing technique.
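The click capture and prioritization described above can be sketched as follows. This is a hypothetical illustration: the `Click` record mirrors the fields the platform is said to present, each click is tagged with the behavioral dimension of the deployed content, client-chosen weights stand in for “key indicators,” and a simple z-score over per-IP weighted totals stands in for NET-INT’s unspecified statistical modeling. The IP addresses come from reserved documentation ranges.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, pstdev


@dataclass
class Click:
    """One hit on deployed content; fields mirror what the platform reportedly presents."""
    ip: str
    url: str
    dimension: str      # behavioral dimension of the deployed content (hypothetical tag)
    cached: bool        # True if the cached copy, not the live page, was opened
    when: datetime      # click date and time
    geo: str            # approximate geolocation
    anonymized: bool    # e.g. matched a list of known VPS or Tor nodes


def score_ips(clicks, weights):
    """Aggregate hits per IP address, weighting each by its behavioral dimension."""
    totals = defaultdict(float)
    for c in clicks:
        totals[c.ip] += weights.get(c.dimension, 1.0)
    return dict(totals)


def flag_anomalies(totals, z_threshold=2.0):
    """Return (z-score, ip) pairs at or above the threshold, highest first."""
    values = list(totals.values())
    mu = mean(values)
    sigma = pstdev(values) or 1.0   # every score equal: avoid dividing by zero
    scored = [((s - mu) / sigma, ip) for ip, s in totals.items()]
    return sorted((pair for pair in scored if pair[0] >= z_threshold), reverse=True)


t = datetime(2019, 6, 1, 12, 0)
# Nine background IPs with one ordinary click each.
clicks = [Click(f"198.51.100.{i}", "https://example.org/news", "general", False, t, "US", False)
          for i in range(1, 10)]
# One IP repeatedly opening content in a client-prioritized dimension.
clicks += [Click("203.0.113.7", "https://example.org/ref", "insider_threat", i % 2 == 0, t, "unknown", True)
           for i in range(4)]
weights = {"general": 1.0, "insider_threat": 2.5}   # hypothetical client-selected weights

totals = score_ips(clicks, weights)
for z, ip in flag_anomalies(totals):
    print(f"{ip}: z = {z:.1f}")
```

With these inputs only 203.0.113.7 is flagged, at a z-score of about 3.0. A z-score is only one simple choice; any model that ranks per-IP behavior against a baseline would fit the same pipeline.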
In delivering the scenarios shown in Figure 2, Lumina deploys the following additional proprietary and unique intellectual property (IP):
- Proprietary AI in the form of Unsupervised Machine Learning for Risk Detection;
- Advanced Proprietary Algorithm in the form of Distributed Injection of Internet Content; and
- Proprietary Data Sets in the form of the NET-INT Sampling of Internet Behavior.
NET-INT screens every IP address that touches an organization’s online infrastructure, such as the IP address of a user submitting an SF-86 form online, against all IP addresses exhibiting anomalous behavior that the system has collected over its lifespan.
A screening match exists when an IP address associated with an organization’s online infrastructure, or with an entity or person of interest, is also present in the NET-INT database of at-risk IP addresses. Matches are prioritized by behavior-driven risk scoring.
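Screening as described reduces to a set intersection followed by a sort on risk score. In the minimal sketch below, the dictionary `at_risk_db`, its scores, the portal addresses, and the function name `screen` are hypothetical; the real NET-INT database and its behavior-driven scoring are proprietary.

```python
def screen(org_ips, at_risk):
    """Return (ip, risk_score) for every address seen on the organization's
    infrastructure that is also in the at-risk database, highest risk first."""
    matches = set(org_ips).intersection(at_risk)   # iterating a dict yields its keys
    return sorted(((ip, at_risk[ip]) for ip in matches), key=lambda m: m[1], reverse=True)


# Hypothetical at-risk database: IP address -> behavior-driven risk score in [0, 1].
at_risk_db = {"203.0.113.7": 0.92, "198.51.100.23": 0.41, "192.0.2.55": 0.77}
# Hypothetical addresses observed on an online SF-86 submission portal.
portal_ips = ["192.0.2.10", "203.0.113.7", "198.51.100.23"]

for ip, score in screen(portal_ips, at_risk_db):
    print(ip, score)
```

This prints 203.0.113.7 with 0.92 first and 198.51.100.23 with 0.41 second. 192.0.2.55 is at risk but never touched the portal, so it does not match.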
Artificial intelligence and machine learning will play a critical role in ending the security clearance backlog, reducing the amount of time required for an investigation and allowing for continuous, dynamic evaluation rather than periodic review.
The Radiance platform can help achieve these goals. It ingests, filters, and prioritizes massive amounts of data from the open web, providing investigators with near real time information, allowing them to prioritize investigations and allocate investigatory resources.
Radiance and its AI and machine-learning capabilities can revolutionize the security clearance screening process, creating a 21st century system to ensure a trusted federal workforce.
“Executive Order on Transferring Responsibility for Background Investigations to the Department of Defense,” April 24, 2019, https://www.whitehouse.gov/presidential-actions/executive-order-transferring-responsibility-background-investigations-department-defense/.
Executive Office of the President, “Delivering Government Solutions in the 21st Century: Reform Plan and Reorganization Recommendations,” June 2018, https://www.performance.gov/GovReform/Reform-and-Reorg-Plan-Final.pdf.
Aaron Boyd, “The Security Clearance Process Is About to Get Its Biggest Overhaul in 50 Years,” Nextgov, February 28, 2019, https://www.nextgov.com/cio-briefing/2019/02/security-clearance-process-about-get-its-biggest-overhaul-50-years/155229/.
Erica Fanning, “Four Steps to Fix the Security Clearance Backlog,” Defense One, December 11, 2018, https://www.defenseone.com/ideas/2018/12/four-steps-fix-security-clearance-backlog/153445/.
Sen. Mark Warner, “Modernizing the Trusted Workforce for the 21st Century Act of 2019,” S. 314, 116th Cong. (2019), https://www.congress.gov/bill/116th-congress/senate-bill/314/text?format=txt.
Sen. Mark Warner, “Modernizing the Trusted Workforce for the 21st Century Act of 2019,” § 3.
Sen. Mark Warner, “Modernizing the Trusted Workforce for the 21st Century Act of 2019,” § 5.
Sen. Mark Warner, “Modernizing the Trusted Workforce for the 21st Century Act of 2019,” § 8.
Erica Fanning, “Four Steps to Fix the Security Clearance Backlog.”
Note: This analysis assumes that the average first page of Google search results contains approximately 1,890 words and that the average person reads 200 to 250 words per minute.
“Security Executive Agent Directive 5,” May 5, 2016, https://www.dni.gov/files/documents/Newsroom/Press%20Releases/SEAD5-12May2016.pdf.