Getting Started 2: How to Load and Visualize Data for Cyber Threat Intelligence Analysis
To make this step more engaging, we’ll use a dataset on State-Sponsored Cyber Operations. This dataset provides an opportunity to apply EDA techniques to real-world, security-related data, helping to uncover patterns and relationships that may inform further modeling or research.
Tutorial Objectives
By the end of this tutorial, you will be able to:
Load, clean, and manipulate data using Pandas DataFrames
Perform numerical operations and transformations using NumPy
Create informative visualizations using Matplotlib and Seaborn
Notebook Setup and Visualization Configuration
Key Configuration Commands
%matplotlib inlineEnsures that all matplotlib plots are displayed directly within the notebook cells.%config InlineBackend.figure_formatControls the output format of plots. Different formats are suited for different use cases:
Format |
Description |
When to Use |
|---|---|---|
|
High-resolution (2× DPI) PNG output |
Ideal for high-quality visuals. |
|
Standard-resolution PNG output |
Suitable for general use; faster rendering and smaller output size. |
|
Scalable Vector Graphics (vector-based) |
Best for publication-quality plots that scale without losing quality; ideal for LaTeX or web embedding. |
|
Portable Document Format (vector-based) |
Useful for exporting plots to reports; less common in notebooks. |
|
Compressed raster format |
Rarely used for plots due to lossy compression; generally not recommended for analysis. |
[356]:
# Render our plots inline
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
[357]:
# Import requiered packages
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
Modifying the Style of Graphs
[358]:
# Modifying the style of the graphs
plt.style.use(['seaborn-v0_8-talk', 'seaborn-v0_8-ticks', 'seaborn-v0_8-whitegrid'])
Load the Data
pd.read_csv function has many options; you can explore them in the official documentation.Features in the Dataset
Title: Title/Summary of the cyber operation.
Date: Date when the cyber operation was conducted.
Affiliations: Affiliations behind the cyber operation, if known.
Descriptions: Description of the cyber operation conducted.
Response: Response of the suspects behind the cyber operation, if any.
Victims: Victims that were targeted by the cyber operation, if known.
Sponsor: The suspected state sponsor of the cyber operation.
Type: Type of cyber operation conducted.
Category: Demographic category of the cyber operation’s victims.
Sources_1: Source that reported the cyber operation.
Sources_2: Source #2 that reported the cyber operation, if any.
Sources_3: Source #3 that reported the cyber operation, if any.
[359]:
df = pd.read_csv(
'cyber-operations-incidents.csv',
delimiter=',', # Define a custom separator (default is ',')
header=0, # Row to use as column names
index_col=None, # Column to use as row labels
na_values=['NA'], # Values to treat as missing
encoding='utf-8' # File encoding
)
Notes:
delimiterlets you specify the character separating fields (default is comma,).headerindicates which row to treat as column names.index_colallows using a column as the DataFrame index.na_valuesdefines strings to interpret as missing data (NaN).Always ensure the encoding matches your file to avoid read errors.
Data Inspection and Basic Statistics
Before performing detailed analysis, it is important to inspect the dataset and understand its structure.
Preview the first 5 rows of the dataset:
[360]:
df.head()
[360]:
| Title | Date | Affiliations | Description | Response | Victims | Sponsor | Type | Category | Sources_1 | Sources_2 | Sources_3 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Targeting of dissidents and opposition politic... | 1/20/2022 | Believed to be the work of Ghana's government | The Israeli spyware firm NSO Group sold Pegasu... | Denial https://www.haaretz.com/israel-news/t... | Political dissidents and opposition leaders in... | Ghana | Espionage | Civil society | https://www.haaretz.com/israel-news/tech-news/... | NaN | NaN |
| 1 | Targeting of Ukrainian state entities | 3/7/2022 | UNC1151 | The Belarusian threat actor UNC1151 targeted m... | NaN | Ukrainian state entities | Belarus | Espionage | Government | https://cert.gov.ua/article/37626 | NaN | NaN |
| 2 | Targeting of Russian and Belarusian websites | 5/4/2022 | Ukrainian IT Army | Ukraine?? IT Army compromised Docker Engine ho... | NaN | Russian and Belarusian websites were targeted,... | Ukraine | Denial of service | Government, Private sector, Civil society | https://www.crowdstrike.com/blog/compromised-d... | NaN | NaN |
| 3 | Targeting of the Belgian Federal Public Servic... | 7/18/2022 | Zirconium | Chinese threat actors targeted the Belgian Fed... | Denouncement https://diplomatie.belgium.be/e... | Belgian Federal Public Service Interior | China | Espionage | Government | https://diplomatie.belgium.be/en/news/declarat... | https://www.bleepingcomputer.com/news/security... | NaN |
| 4 | Targeting of Ukrainian government agencies | 8/11/2022 | Gamaredon | The Computer Emergency Response Team of Ukrain... | Confirmation | Ukrainian government agencies | Russian Federation | Espionage | Government | https://cert.gov.ua/article/2681855 | NaN | NaN |
Preview the last 2 rows of the dataset:
[361]:
df.tail(2)
[361]:
| Title | Date | Affiliations | Description | Response | Victims | Sponsor | Type | Category | Sources_1 | Sources_2 | Sources_3 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 766 | Trisis | NaN | Also known as Triton and Xenotime | This threat actor targets the Triconex safety ... | Confirmation https://www.justice.gov/opa/pr/... | Saudi Arabia | Russian Federation | Sabotage | Private sector | https://www.fireeye.com/blog/threat-research/2... | https://www.fireeye.com/blog/threat-research/2... | https://dragos.com/blog/trisis/TRISIS-01.pdf |
| 767 | Targeting of the Belgian Federal Public Servic... | NaN | APT 30 | Chinese threat actors targeted the Belgian Fed... | Denouncement https://diplomatie.belgium.be/e... | Belgian Federal Public Service Interior | China | Espionage | Government | https://diplomatie.belgium.be/en/news/declarat... | https://www.bleepingcomputer.com/news/security... | NaN |
[362]:
df[3:7]
[362]:
| Title | Date | Affiliations | Description | Response | Victims | Sponsor | Type | Category | Sources_1 | Sources_2 | Sources_3 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Targeting of the Belgian Federal Public Servic... | 7/18/2022 | Zirconium | Chinese threat actors targeted the Belgian Fed... | Denouncement https://diplomatie.belgium.be/e... | Belgian Federal Public Service Interior | China | Espionage | Government | https://diplomatie.belgium.be/en/news/declarat... | https://www.bleepingcomputer.com/news/security... | NaN |
| 4 | Targeting of Ukrainian government agencies | 8/11/2022 | Gamaredon | The Computer Emergency Response Team of Ukrain... | Confirmation | Ukrainian government agencies | Russian Federation | Espionage | Government | https://cert.gov.ua/article/2681855 | NaN | NaN |
| 5 | Targeting of Palestinian entities and activists | 2/2/2022 | APT-C-23 | The Hamas-linked threat actor APT-C-23 used ma... | NaN | NaN | NaN | Espionage | Civil society | https://www.cyberscoop.com/palestinian-targete... | https://blog.talosintelligence.com/2022/02/ari... | NaN |
| 6 | Targeting of government agencies in the Europe... | 4/5/2022 | Gamaredon | Russian threat actor Gamaredon targeted Ukrain... | Confirmation https://cert.gov.ua/article/39086 | Government agencies of EU member states | Russian Federation | Espionage | Government | https://www.bleepingcomputer.com/news/security... | NaN | NaN |
"Title", "Date", and "Sponsor":[363]:
df.loc[3:7, ["Title", "Date", "Sponsor"]]
[363]:
| Title | Date | Sponsor | |
|---|---|---|---|
| 3 | Targeting of the Belgian Federal Public Servic... | 7/18/2022 | China |
| 4 | Targeting of Ukrainian government agencies | 8/11/2022 | Russian Federation |
| 5 | Targeting of Palestinian entities and activists | 2/2/2022 | NaN |
| 6 | Targeting of government agencies in the Europe... | 4/5/2022 | Russian Federation |
| 7 | Targeting of Rostec | 5/19/2022 | China |
[364]:
df.iloc[[0,1,3,4,6], 0:3]
[364]:
| Title | Date | Affiliations | |
|---|---|---|---|
| 0 | Targeting of dissidents and opposition politic... | 1/20/2022 | Believed to be the work of Ghana's government |
| 1 | Targeting of Ukrainian state entities | 3/7/2022 | UNC1151 |
| 3 | Targeting of the Belgian Federal Public Servic... | 7/18/2022 | Zirconium |
| 4 | Targeting of Ukrainian government agencies | 8/11/2022 | Gamaredon |
| 6 | Targeting of government agencies in the Europe... | 4/5/2022 | Gamaredon |
Obtain a concise summary of the dataset, including column names, non-null counts, and data types:
[365]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Title 768 non-null object
1 Date 753 non-null object
2 Affiliations 621 non-null object
3 Description 768 non-null object
4 Response 186 non-null object
5 Victims 741 non-null object
6 Sponsor 728 non-null object
7 Type 731 non-null object
8 Category 743 non-null object
9 Sources_1 763 non-null object
10 Sources_2 488 non-null object
11 Sources_3 190 non-null object
dtypes: object(12)
memory usage: 72.1+ KB
Generate descriptive statistics for all numeric columns, such as mean, standard deviation, minimum, and maximum values:
[366]:
df.describe()
[366]:
| Title | Date | Affiliations | Description | Response | Victims | Sponsor | Type | Category | Sources_1 | Sources_2 | Sources_3 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 768 | 753 | 621 | 768 | 186 | 741 | 728 | 731 | 743 | 763 | 488 | 190 |
| unique | 764 | 603 | 377 | 767 | 128 | 657 | 61 | 12 | 35 | 715 | 470 | 186 |
| top | Targeting of Southeast Asian telecommunication... | 10/4/2018 | Lazarus Group | Chinese threat actors targeted the Belgian Fed... | Unknown | United States | China | Espionage | Private sector | https://blogs.microsoft.com/on-the-issues/2020... | https://www.bleepingcomputer.com/news/security... | https://www.fireeye.com/content/dam/fireeye-ww... |
| freq | 2 | 6 | 26 | 2 | 47 | 21 | 263 | 608 | 199 | 5 | 3 | 2 |
The dataset contains 768 rows and 12 columns:
[367]:
df.shape
[367]:
(768, 12)
[368]:
# Number of rows/observations - the data numerosity
df.shape[0]
[368]:
768
[369]:
# Number of features/column/attributes - the data dimensionality
df.shape[1]
[369]:
12
These commands provide a first overview of the data, helping identify missing values, data types, and key summary statistics.
Data Cleaning and Preparation
We can list all columns and check their data types using:
[370]:
df.columns
[370]:
Index(['Title', 'Date', 'Affiliations', 'Description', 'Response', 'Victims',
'Sponsor', 'Type', 'Category', 'Sources_1', 'Sources_2', 'Sources_3'],
dtype='object')
[371]:
df.dtypes
[371]:
Title object
Date object
Affiliations object
Description object
Response object
Victims object
Sponsor object
Type object
Category object
Sources_1 object
Sources_2 object
Sources_3 object
dtype: object
- Columns with the data type
objectare strings. - Some columns, such as dates, are better represented in a proper datetime format.
To convert strings to datetime, we can use
pd.to_datetime. Before doing so, it is important to understand Python date parsing conventions. See the official documentation for reference.Relevant formatting codes include:
%m– Month as a zero-padded decimal number%d– Day of the month as a zero-padded decimal number%Y– Year with century as a decimal number
[372]:
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df.dtypes
[372]:
Title object
Date datetime64[ns]
Affiliations object
Description object
Response object
Victims object
Sponsor object
Type object
Category object
Sources_1 object
Sources_2 object
Sources_3 object
dtype: object
[373]:
df.head(2)
[373]:
| Title | Date | Affiliations | Description | Response | Victims | Sponsor | Type | Category | Sources_1 | Sources_2 | Sources_3 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Targeting of dissidents and opposition politic... | 2022-01-20 | Believed to be the work of Ghana's government | The Israeli spyware firm NSO Group sold Pegasu... | Denial https://www.haaretz.com/israel-news/t... | Political dissidents and opposition leaders in... | Ghana | Espionage | Civil society | https://www.haaretz.com/israel-news/tech-news/... | NaN | NaN |
| 1 | Targeting of Ukrainian state entities | 2022-03-07 | UNC1151 | The Belarusian threat actor UNC1151 targeted m... | NaN | Ukrainian state entities | Belarus | Espionage | Government | https://cert.gov.ua/article/37626 | NaN | NaN |
Display all rows in the DataFrame that are identified as duplicates based on all columns:
[374]:
df[df.duplicated()==True]
[374]:
| Title | Date | Affiliations | Description | Response | Victims | Sponsor | Type | Category | Sources_1 | Sources_2 | Sources_3 |
|---|
Display all rows where duplicate values occur in the “Description” column:
[375]:
df.loc[df.duplicated(subset=['Description'], keep=False)]
[375]:
| Title | Date | Affiliations | Description | Response | Victims | Sponsor | Type | Category | Sources_1 | Sources_2 | Sources_3 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Targeting of the Belgian Federal Public Servic... | 2022-07-18 | Zirconium | Chinese threat actors targeted the Belgian Fed... | Denouncement https://diplomatie.belgium.be/e... | Belgian Federal Public Service Interior | China | Espionage | Government | https://diplomatie.belgium.be/en/news/declarat... | https://www.bleepingcomputer.com/news/security... | NaN |
| 767 | Targeting of the Belgian Federal Public Servic... | NaT | APT 30 | Chinese threat actors targeted the Belgian Fed... | Denouncement https://diplomatie.belgium.be/e... | Belgian Federal Public Service Interior | China | Espionage | Government | https://diplomatie.belgium.be/en/news/declarat... | https://www.bleepingcomputer.com/news/security... | NaN |
Boolean Indexing
[376]:
# Define date range for 2022-2023
start_date = pd.Timestamp('2022-01-01')
end_date = pd.Timestamp('2023-12-31')
# Boolean condition
boolean_condition = (df['Sponsor'] == 'Belarus') & (df['Date'] >= start_date) & (df['Date'] <= end_date)
# Filtered DataFrame
df_filtered = df[boolean_condition]
df_filtered.shape[0] # Number of rows in the filtered DataFrame
[376]:
2
Aggregation
Aggregation allows us to summarize data by groups, which is useful to identify patterns and trends in cyber operations.
For example, we can count the number of cyber operations conducted by each sponsor
[377]:
# Count the number of cyber operations per sponsor
df.groupby("Sponsor", as_index=False).size().sort_values(by='size', ascending=False)
[377]:
| Sponsor | size | |
|---|---|---|
| 6 | China | 263 |
| 37 | Russian Federation | 155 |
| 20 | Iran (Islamic Republic of) | 98 |
| 26 | Korea (Democratic People's Republic of) | 79 |
| 53 | United States | 12 |
| ... | ... | ... |
| 25 | Kazakhstan | 1 |
| 28 | Lebanon | 1 |
| 31 | Myanmar | 1 |
| 32 | Netherlands | 1 |
| 30 | Morocco | 1 |
61 rows × 2 columns
Explanation:
groupby("Sponsor")groups the data by theSponsorcolumn..size()counts the number of rows in each group, representing the number of operations..sort_values(by='size', ascending=False)sorts the sponsors from most to least active.
Sum the number of victims per sponsor to identify which sponsors have caused the greatest impact:
[378]:
# Sum the number of victims per sponsor
df.groupby("Sponsor", as_index=False)["Victims"].sum().sort_values(by="Victims", ascending=False)
[378]:
| Sponsor | Victims | |
|---|---|---|
| 55 | United States, Australia, Canada, New Zealand,... | Yandex |
| 27 | Korea (Republic of) | World Health OrganizationNorth Koreans and Nor... |
| 59 | Uzbekistan | Uzbek activists and dissidents |
| 28 | Lebanon | United States, Canada, Germany, Lebanon, France |
| 3 | Belarus | Ukrainian state entitiesOver seventy Ukrainian... |
| ... | ... | ... |
| 21 | Israel | An activist in IsraelMayors and other local of... |
| 52 | United Kingdom, United States | Algeria, Afghanistan, Belgium, Brazil, Fiji, G... |
| 20 | Iran (Islamic Republic of) | Albanian government networksVMware Horizon ser... |
| 39 | Saudi Arabia, United Arab Emirates | Al Jazeera employees |
| 0 | United Kingdom" | Israel |
61 rows × 2 columns
Individual Variables Exploration
Display the number of records per year by extracting the year component from the Date column and counting its occurrences, sorted in chronological order:
[379]:
df['Date'].dt.year.value_counts().sort_index()
[379]:
Date
2005.0 1
2006.0 3
2007.0 10
2008.0 7
2009.0 4
2010.0 9
2011.0 15
2012.0 12
2013.0 18
2014.0 32
2015.0 34
2016.0 33
2017.0 46
2018.0 80
2019.0 74
2020.0 120
2021.0 108
2022.0 147
Name: count, dtype: int64
Visualize the trend of cyber operations over time by plotting the number of recorded cyber events per year:
[380]:
# Extract number of cyber operations per year
cyber_per_year = df['Date'].dt.year.value_counts().sort_index()
# Set plot style
sns.set(style="whitegrid")
# Create the plot
plt.figure(figsize=(10,6))
sns.lineplot(x=cyber_per_year.index.astype(int), y=cyber_per_year.values, marker="o")
plt.title("Number of Cyber Events per Year", fontsize=16)
plt.xlabel("Year", fontsize=12)
plt.ylabel("Cyber Operations", fontsize=12)
plt.xticks(cyber_per_year.index.astype(int)) # Ensure x-axis shows integer ticks
plt.grid(True)
plt.show()
Display the 15 most frequent affiliations involved in cyber operations, excluding entries labeled as Unknown, to identify the primary actors appearing in the dataset:
[381]:
aff_counts = df[df.Affiliations!='Unknown'].Affiliations.value_counts()
aff_counts.nlargest(15)
[381]:
Affiliations
Lazarus Group 26
APT 28 15
Believed to be the work of APT 28 15
Sandworm 15
Winnti Umbrella 10
Mustang Panda 10
Gamaredon 10
Ocean Lotus 9
MuddyWater 9
Kimsuky 9
Charming Kitten 8
Believed to be the work of?APT 28 7
Turla 7
Believed to be the work of Russia's government 6
APT 10 6
Name: count, dtype: int64
Visualize the top 15 affiliations involved in cyber operations using a bar chart:
[382]:
top_10_aff = aff_counts.nlargest(15)
plt.figure()
ax = sns.barplot(x=top_10_aff.values, y=top_10_aff.index)
ax.bar_label(ax.containers[0])
ax.set_title('Top 10 Affiliations')
plt.ylabel('Affiliation')
plt.show()
Display the top 15 state sponsors associated with cyber operations in the dataset:
[383]:
top_15_sponsors = df[df.Sponsor!='Unknown'].Sponsor.value_counts().nlargest(15)
plt.figure()
ax = sns.barplot(x=top_15_sponsors.values, y=top_15_sponsors.index)
ax.bar_label(ax.containers[0])
ax.set_title('Top 15 Sponsors')
plt.ylabel('Sponsor')
plt.show()
Exercises
Answer the following Questions about the Dataset
Exercise 1: Distribution of Operation Types
Investigate the distribution of Type of state-sponsored cyber operations in the dataset.
[384]:
# TODO: Step 1 - Remove invalid or unknown Type entries
# Hint: Filter out invalid rows
[385]:
# TODO: Step 2 - Create the countplot
# Hint: Use sns.countplot() with the `order` parameter set to value_counts().index
[386]:
# TODO: Step 3 - Annotate counts above each bar
# Hint: Use ax.bar_label(ax.containers[0])
[387]:
# TODO: Step 4 - Set title and rotate x-axis labels
# show the plot
Exercise 2: Top 10 Affiliations
Some affiliations refer to the same group in different ways (e.g., “APT 28”). Normalize all variations containing “APT 28” into a single value, then display the top 10 affiliations using a barplot.
[388]:
# TODO: Step 1 - Normalize "APT 28" variations
# Hint: Some entries refer to the same group differently, e.g.,
# 'APT 28', 'Believed to be the work of APT 28', 'Believed to be the work of?APT 28'
[389]:
# TODO: Step 2 - Filter out 'Unknown' affiliations
[390]:
# TODO: Step 3 - Count affiliations and get top 10
[391]:
# TODO: Step 4 - Create a barplot of the top 10 affiliations
Exercise 3: APT 28 Cyber Attacks Over Time
Analyze the activity of APT 28 by plotting the number of cyber operations attributed to this group over the years 2010–2020.
[392]:
# TODO: Step 1 - Filter the dataset for all rows where 'Affiliations' contains "APT 28"
# Hint: Use str.contains() with case=False and na=False
[393]:
# TODO: Step 2 - Extract the year from the 'Date' column
# Hint: Use .dt.year
[394]:
# TODO: Step 3 - Sort the counts by year
[395]:
# TODO: Step 4 - Create a line plot showing the number of attacks per year
Solution - Exercise 1: Distribution of Operation Types
First, we analyze the current missing values. There are 37 NaN entries in the Type column:
[396]:
# Step 1 - Remove invalid or unknown Type entries
# count NaN values in each column
df.isna().sum()
[396]:
Title 0
Date 15
Affiliations 147
Description 0
Response 582
Victims 27
Sponsor 40
Type 37
Category 25
Sources_1 5
Sources_2 280
Sources_3 578
dtype: int64
[397]:
# Delete rows with NaN in the 'Type' column
df.dropna(subset=['Type'], inplace=True)
# Verify that all NaNs in 'Type' are removed
df.isna().sum()
[397]:
Title 0
Date 11
Affiliations 135
Description 0
Response 549
Victims 18
Sponsor 35
Type 0
Category 12
Sources_1 4
Sources_2 270
Sources_3 552
dtype: int64
We analyze all current values in the Type column and notice that some entries contain invalid data, such as URLs.
[398]:
# show unique values in 'Type' column
df.Type.unique()
[398]:
array(['Espionage', 'Denial of service', 'Data destruction',
'Financial Theft', 'Sabotage',
'https://blog.talosintelligence.com/2022/03/transparent-tribe-new-campaign.html',
'Defacement',
'https://www.direkt36.hu/en/putyin-hekkerei-is-latjak-a-magyar-kulugy-titkait-az-orban-kormany-evek-ota-nem-birja-elharitani-oket/',
'Private sector', 'Government', 'Government, Military', 'Doxing'],
dtype=object)
[399]:
# Filter out rows where 'Type' is a URL using regular expression
df = df[~df['Type'].str.contains(r'^https?://', na=False)].reset_index(drop=True)
# show unique values in 'Type' column
df.Type.unique()
[399]:
array(['Espionage', 'Denial of service', 'Data destruction',
'Financial Theft', 'Sabotage', 'Defacement', 'Private sector',
'Government', 'Government, Military', 'Doxing'], dtype=object)
Now we observe that there are two invalid categories in the Type column. Therefore, we need to delete the rows containing 'Government' and 'Government, Military' types.
[400]:
# List of types to remove
types_to_remove = [
'Government',
'Government, Military'
]
# Filter the DataFrame
df = df[~df['Type'].isin(types_to_remove)].reset_index(drop=True)
df.Type.unique()
[400]:
array(['Espionage', 'Denial of service', 'Data destruction',
'Financial Theft', 'Sabotage', 'Defacement', 'Private sector',
'Doxing'], dtype=object)
Create the plot.
[401]:
# Step 2 - Create the countplot
# Hint: Use sns.countplot() with the `order` parameter set to value_counts().index
plt.figure(figsize=(12,6))
ax = sns.countplot(data=df, x='Type', order=df['Type'].value_counts().index)
ax.set_title('Distribution of Operation Types', fontsize=16)
# Step 3 - Annotate counts above each bar
# Hint: Use ax.bar_label(ax.containers[0])
ax.bar_label(ax.containers[0])
# Step 4 - Set title and rotate x-axis labels
# show the plot
plt.xticks(rotation=45)
plt.show()
Solution - Exercise 2: Top 10 Affiliations
We start by printing all unique values in the Affiliations column.
[402]:
# Step 1 - Normalize "APT 28" variations
# Hint: Some entries refer to the same group differently, e.g.,
# 'APT 28', 'Believed to be the work of APT 28', 'Believed to be the work of?APT 28'
df.Affiliations.unique()
[402]:
array(["Believed to be the work of Ghana's government", 'UNC1151',
'Ukrainian IT Army', 'Zirconium', 'Gamaredon', 'APT-C-23',
'Believed to be the work of the Chinese government',
'Believed to be the work of the Iranian government',
'Lazarus Group', 'APT 35',
'Believed to be the work of the Canadian government',
"Believed to be the work of Israel's government.",
'Believed to be the work of the Russian government', 'Gallium',
'Mirage', 'Believed to be the work of the Jordanian government',
'Lucky Cat', 'APT 40', 'Sandworm', 'APT 37',
"Believed to be the work of Israel's government", 'APT 28', nan,
'Believed to be the work of the Iranian government.', 'APT 18',
'Ocean Lotus', 'Mustang Panda',
'Believed to be the work of the government of the Russian Federation.',
'APT 10', 'Kimsuky', 'Charming Kitten',
"Believed to be the work of China's government.",
'National Security Agency, Tailored Access Operations', 'APT 5',
'Believed to be the work of the Spanish government',
'Believed to be the work of the government of Thailand.',
'Believed to be the work of Hamas.',
"Believed to be the work of China's government", 'APT 36',
"Believed to be the work of Palestine's government",
'Iran, APT 35',
'Believed to be the work of the Chinese government\r\n?',
'Confucius', 'Winnti Umbrella',
"Believed to be the work of Russia's government", 'Konni Group',
'Emissary Panda', 'Patchwork', 'APT 41', 'Turla', 'Poison Carp',
'Hafnium', 'Siamesekitten, OilRig',
"Believed to be the work of El Salvador's government.",
'The Dukes', 'Prince of Persia', 'Red Menshen', 'MuddyWater',
'Space Pirates', 'Thrip', 'UNC1151?',
"Believed to be the work of Bahrain's government",
'ModifiedElephant, Pune Police', 'Also known as Space Pirates',
'Believed to be the work of the Mexican government', 'APT 34',
'Believed to be the work of the United Arab Emirates government',
'Also known as PLA Unit 69010.', 'TAG-28', 'Russian Federation',
'APT37', 'China', 'UNC215', 'BlackTech', 'RedFoxTrot', 'Cozy Bear',
'TAG-22, Winnti Umbrella', 'Nobelium', 'Aggah, Gorgon Group',
'TA453, Charming Kitten', 'Domestic Kitten',
'APT 41, Barium, TA415', 'Lazarus Group?',
'APT 31, Hurricane Panda', 'Calypso',
'Law enforcement agencies in the United States, Canada, France, Germany, Lithuania, the Netherlands, Ukraine, and United Kingdom',
'HoneyMyte, Mustang Panda, Bronze President',
'North Korean government', 'Nobelium, APT 29', 'Pakistan',
'Federal Bureau of Investigation, Bulgarian National Investigative Service',
'APT 34, OilRig', 'Tortoiseshell, ?Charming Kitten', 'Sharp Panda',
'APT 37, ScarCruft, Stardust Chollima', 'Emennet Pasargad?',
'Belarus, Ghostwriter.', 'Comment Crew', 'Fancy Bear, APT 28',
'Siamesekitten, Lyceum', 'Chinese government', 'Philippine Army',
'MysterySnail', 'Primitive Bear, Gamaredon',
'Judgement Panda, APT 31',
'Also known as?Lyceum. The group has also been connected to OilRig.',
'IndigoZebra', 'South Sudanese government', 'Tick',
'Naikon, APT 30, Deadringer', 'Charming Kitten?',
'Russian government', 'TA428, Wekby, Dynamite Panda',
'Winnti Umbrella, SparklingGoblin',
'NSA, U.S. Cyber Command, FBI, and partner agencies in several unknown countries.?',
'Emirati government', 'Soft Cell, GALLIUM, Deadringer',
'Believed to be the work of the Saudi Arabian government.',
'Cozy Bear, APT29', 'Winnti Umbrella, Grayfly',
"Believed to be the work of the Islamic Republic of Iran's government",
'Static Kitten', 'Tonto Team or TA428', 'APT 31, Judgement Panda',
'PLA Unit 69010, RedFoxTrot', 'Mustang Panda, Bronze President',
'Cozy Bear, APT 29', 'APT 36, Mythic Leopard',
'Charming Kitten, Phosphorous',
'Believed to be the work of the Polish government.', 'Israel',
'Saudi and Emirati governments', 'APT 30',
'Believed to be the work of the Chinese government.?',
'Government of China', 'Unknown, suspected Bronze Butler.',
'Iranian government',
'Believed to be the work of the Russian government.?',
'Energetic Bear', 'Also known as APT 31 and Judgement Panda.',
'Zinc', 'Believed to be the work of Togolese security forces.?',
'DarkHotel', 'Greenbug', 'Cerium', 'Lotus Blossom',
'Cobalt Dickens', 'Pirate Panda', 'Also known as Parasite',
'Fox Kitten', 'Also known as Parasite, UNC757, and Fox Kitten.',
'Government of Saudi Arabia', 'Hidden Cobra', 'Pioneer Kitten',
'KINGDOM', 'Syrian Electronic Army', 'KilllSomeOne',
'APT 34, possibly APT 33/APT 39.', 'Cozy Bear,?APT 29',
'U.S. Cyber Command', 'Tonto Team',
'Australian Signals Directorate', 'APT 33',
'Believed to be Lazarus Group', 'Bronze Butler',
'Guangdong State Security Department (GSSD) of China?? Ministry of State Security (MSS)',
'Vicious Panda', 'Crouching Yeti', 'Government of Turkey',
'Ghostwriter', 'Believed to be the work of Fancy Bear.',
'Russian military intelligence agency, known as GRU', 'PKplug',
'Unnamed state-sponsored APT',
'Believed to be the work of APT 10.', 'The Lazarus Group',
'Five Eyes',
'Believed to be the work of Chinese state-sponsored hackers.',
'Phosphorus', 'Government of Egypt', 'Iran', 'Rancor', 'Unknown',
'Also known as Earth Longzhi.',
'Believed to be the work of Leviathan.',
'Believed to be the work of Phosphoros, also known as APT 35',
'Government of Uzbkeistan', 'Government of the United States',
'Government of Morocco', 'Hong Kong police',
'Believed to be the work of Russian and Chinese hackers.',
'Believed to be the work of APT 10 or the Jiangsu Province Ministry of State Security.',
'the Dukes', 'China?? Ministry of State Security',
'Believed to be the work of at least two unnamed APT groups, including Evil Eye',
'APT 3', 'Believed to be the work of Turbine Panda', 'APT 20',
'APTs associated with Beijing',
'Suspected to be the work of Iranian actors.',
'Government of the UAE', 'Iranian Revolutionary Guard',
'Suspected to be the work of the Dukes.?',
'Believed to be the work of Cobalt Dickens.', 'Wicked Panda',
'Believed to be the work of Poison Carp.',
'APT 17 and Bronze Butler', 'Believed to be the work of Winnti.',
'Bronze President',
'Believed to be the work of the UAE government.',
'Also known as Evil Eye', 'Golden Falcon',
'Also known as Billbug and Lotus Blossom.',
'Believed to be associated with the Lazarus Group',
'Believed to be the work of Crouching Yeti',
'Believed to be the work of APT 28',
'Believed to be associated with Sandworm. Also known as VPNFilter.',
'Believed to use similar infrastructure as the NetTraveler, Icefog, and APT 17 threat actors',
'Believed to be the work of Whitefly.?',
'Believed to be the work of TempTick?and Turla',
'Believed to use similar tools as DragonOK',
"China's Jiangsu Ministry of State Security",
'Believed to be the work of Turla',
'Believed to be the work of APT 10',
'Also known as Phosphoros and Newscaster Team',
'Believed to be the work of the Lazarus Group',
'Suspected to be the work of the Lazarus Group.',
"China's Tianjin State Security Bureau",
'Believed to be the work of Lazarus Group',
'Suspected to be the work of Chinese military hackers.',
'Believed to be the work of Leviathan',
'Believed to be associated with the Axiom, APT 17, and Mirage threat actors. Believed to share the same tools and infrastructure as the threat actors that carried out Operation Aurora, the 2015 targeting of video game companies, the 2015 targeting of the Thai government, and the 2017 targeting of Chinese-language news websites',
'Believed to use similar tools as the Lazarus Group',
'Possibly linked to an Iran-based group that uses the Shamoon virus to target oil and gas entities with a connection to Saudi Arabia.',
'Also known as Transparent Tribe and C-Major',
'Believed to be the work of?APT 28',
'Believed to be linked to Crouching Yeti',
'Believed to be the work of Gamaredon.',
'Believed to be linked to the Crouching Yeti threat actor',
'Suspected to be the work of North Korean authorities.',
'Also known as Bronze President and?HoneyMyte.?Also known as RedDelta.',
"Linked to Turbine Panda and China's Ministry of State Security",
'Believed to be the work of the same threat actors behind the alleged Russian compromise of networking equipment',
'Believed to be the work of the Mexican government.',
'Believed to be the work of Russian hacker group Snake.',
'Believed to be the work of Russian state-sponsored group Dragonfly, also known as Energetic Bear',
'Believed to be the work of the same actors behind the compromise of the 2018 Winter Olympics',
'Suspected to be the work of Chinese operatives working for the Ministry of State Security.',
'Believed to be the work of Sandworm.',
'Also known as the Silent Librarian. Believed to be associated with the Mabna Institute.',
'Believed to be associated with the U.S. targeting of the Islamic State Group.',
'Believed to be the work of a Russian hacking group.',
'Believed to be the work of a subset of the Lazarus Group known as Bluenoroff',
'Also known as TEMP.Zagros, Seedworm and Static Kitten.?',
'Believed to be the work of APT 28.?',
"Also known as TEMP.Periscope, Leviathan, and Mudcarp.?Believed to be behind the compromise of Cambodia?? election organizations?and the Target of universities' maritime military secrets.",
'Possibly linked to Sandworm.', 'Possibly linked to APT 30',
'Also known as Tick and RedBaldKnight',
'Also known as Cloud Hopper, Red Apollo, CNVX, Stone Panda, MenuPass, and POTASSIUM.?Believed to be a part of the Tianjin bureau of the Chinese Ministry of State Security.',
'Believed to be the work of the same threat actors who used the?Duqu 2.0 tool.?',
'Possibly linked to OilRig', 'Possibly linked to NetTraveler',
'Believed to be the work of Winnti Umbrella',
'Believed to be the work of the Lazarus Group.',
'Also known as Magnallium and Elfin. Suspected to be linked to the Shamoon malware attacks in 2018.',
'Also known as the Lamberts',
'Also known as Cobalt Gypsy and possibly associated with Rocket Kitten',
'Associated with Turla', 'Believed to be the work of the Dukes',
'Believed to have been partially the work of Chrysene',
"This incident is believed to be the work of APT 28. The Dukes were also found to be in the Democratic National Committee's networks.",
'Also known as Twisted Kitten and Crambus; possibly related to Magic Hound,?APT 34, and Chrysene.?Believed to use similar tools and infrastructure as Chafer.? The group has also been connected to SiameseKitten.',
'Believed to be associated with the United Kingdom?? cyber operations against the Islamic State, revealed in 2018.',
'Used an evolution of the tools used by Sandworm.', 'Indonesia',
'Believed to be the work of?Winnti Umbrella',
'Same state sponsor as the 2016 breach',
'Believed to be responsible for Dark Seoul, Ten Days of Rain, the Sony Pictures Entertainment attack,?the SWIFT-related bank heists,?WannaCry, and?Operation GhostSecret. Believed to use the same tools as Covellite.?Known to the U.S. government as Hidden Cobra.?Also known as BlueNoroff.',
'Possibly linked to Icefog?and the Dark Seoul compromise',
'Believed to be the work of the Turla group',
'Same state sponsor as the 2014 breach', 'Also known as Strider',
'Possible link to Sandworm.',
'Also known as Naikon, PLA Unit 78020, Lotus Panda. Possibly responsible for the JadeRAT tool.?',
'Believed to be the work of Deep Panda',
'Also known as the Gamaredon Group and?Primitive Bear.',
'Also known as APT 32',
'Associated with Duqu. Believed to have been the tool used to compromise Kaspersky Labs.',
'Believed to use similar tools and infrastructure as OilRig',
'Also known as UPS, Gothic Panda, Buckeye, and Boyusec',
'The government of Uganda',
'Also known as TG-3390, APT 27, TEMP.Hippo, and?LuckyMouse.?Believed to have been responsible for the compromise of a Mongolian data center.',
'Suspected to be the work of APT 28',
'Also known as Deputy Dog and Axiom.?Believed to be associated with Winnti Umbrella',
'Possibly?linked to?Saffron Rose',
'Also known as Tilded Team, it is associated with Flame, Stuxnet, Regin, Gauss, and Duqu.',
'Suspected to be the work of the Dukes',
'Suspected to be the work of Deep Panda',
'Possibly uses the same infrastructure as?Mirage',
'Believed to be the work of the same actors behind Dark Seoul?and possibly linked to Onion Dog',
'This threat actor is believed to have been responsible for the Japan Pension Service incident. It is also known as Emdivi?and Cloudy Omega.',
'Possibly attributed to APT 28',
'Believed to be the work of Saffron Rose',
'Linked to Blue Termite',
'Also known as Charming Kitten, APT 35, Newscaster, Ajax Security Team, Phosphorus, Group 83, and TA453. Possibly linked to Rocket Kitten',
'Uses similar tools as Moafee?and Rancor, possibly indicating a link between the threat actors',
'Uses similar tools as DragonOK,?possibly indicating a link between the two',
'Also known as Spring Dragon and Eslie',
'Also known as Energetic Bear, Dragonfly, Group 24, and Koala Team. Possibly associated with Allanite and Dymalloy.?Believed to be behind the targeting of critical infrastructure entities in the United States and Germany, prompting both countries to issue technical alerts',
'Turla is also known as Snake, Venomous Bear, Uroburos, Group 88, Waterbug, and Turla Team. Estonian intelligence services?associate this group with the Russian federal security service (FSB). Turla is associated with Agent.btz?and believed to be behind the RUAG espionage incident, an attempted compromise of the Swiss defense ministry,?the targeting of diplomatic missions in Eastern Europe, the targeting of South Korean actors prior to the meeting of U.S. President Donald J. Trump and North Korean leader Kim Jong-un, and the compromise of the German?and Finnish foreign ministries.?',
'Believed to be closely linked to, though distinct from, Winnti.',
'Also known as Voodoo Bear, and Electrum. Believed to be responsible for the 2008 DDoS attacks in Georgia and the 2015?Ukraine power grid outage.?Code used by this threat actor was also found in the alleged Russian compromise of networking equipment. There is possible overlap between Black Energy and the threat actor behind the compromise of the 2018 Winter Olympics.',
'Also known as Dynamite Panda, TG-0416, and TA428',
'Believed to be the work of APT 18',
'Also known as APT 2, PLA Unit 61486, and TG-6952',
'Related to the compromise at the Office of Personnel Management',
'Also known as IXESHE, Numbered Panda, and Group 22',
'Also known as Mask and Ugly Face', 'Animal Farm',
'Believed to be the work of the Equation Group',
'Also known as Flying Kitten, Sayad, Ajax Security Team, and Group 26',
'Also known as Cutting Kitten and Group 41',
'Believed to be responsible for the attempted compromise of U.S. think tanks',
'Mirage is possibly linked to the Hellsing?threat actor. It is also known as Vixen Panda, GREF, APT 15, and Playful Dragon.?Believed to be associated with Winnti Umbrella',
'Also known as APT 29, Cozy Bear, Dark Halo, Nobelium, and Cloaked Ursa.?Estonian intelligence services associate this group with the Russian Federal Security Service (FSB) and Foreign Intelligence Service (SVR).',
'Also known as APT 21',
'Also known as JS Sykipot, Backdoor Sykipot, and Maverick Panda',
'Also known as Dagger Panda and possibly linked to Onion Dog',
'Also known as Shell Crew, Web Masters, APT 19, Kung Fu Kittens, Black Vine, Temp.Avengers, and Group 13',
'Also known as Operation Troy and Dark Seoul',
'Also known as APT 1, Comment Crew, Comment Panda, TG-8223, Group 3, GIF89a, and Byzantine Candor',
'Has links to the Shadow Network?and?SabPub. Also known as TA413.',
'A variant of Lucky Cat for Mac computers',
'This tool is believed to be the work of the Equation Group. The tool Gauss?is believed to be based on the Flame platform.',
'Also known as Operation Ababil',
'Believed to be based on the Flame platform and possibly the work of the Equation Group',
'Believed to be the work of PLA Unit 61398',
'Possibly assisted in Operation Aurora, the RSA incident, and the Joint Strike Fighter Program compromise',
'This is believed be a tool set used by the Equation Group. It is related to Stuxnet?and Flame.',
'Believed to be a tool used by PLA Unit 61398',
'Possibly linked to the Nitro incident',
'Possibly linked to the Mitsubishi Heavy Industries incident.',
'This incident is believed to be the work of Unit 61398 and at least two other Chinese groups, according to the New York Times. The threat actor Sneaky Panda may also have assisted.',
'Believed to be the work of PLA Unit 61398 with possible assistance from Sneaky Panda.?Believed to be associated with Winnti Umbrella',
'Also known as Manganese',
'Codenamed ??lympic Games??and believed to be the work of the Equation Group',
'Has links to Lucky Cat and GhostNet',
'Sneaky Panda is among some of the actors suspected to be behind this incident.',
'Believed to be linked to Sandworm.',
'Believed?to be partially the work of PLA Unit 61398.', 'Antlion',
'Believed to be the work of MuddyWater.',
'Believed to be responsible for the targeting of employees of companies that operate U.S. power plants. Also believed to be linked to Crouching Yeti',
'Believed to be the work of the Ugandan government.', 'RedAlpha',
'Also known as Triton and Xenotime'], dtype=object)
Affiliations column is NaN to ensure we only work with valid entries.[403]:
# import re for regular expressions
import re
# Drop rows where 'Affiliations' is NaN
df = df.dropna(subset=['Affiliations']).reset_index(drop=True)
# Now you can normalize APT 28
def normalize_apt28(x):
if re.search(r'\bAPT\s*28\b', x, flags=re.IGNORECASE):
return 'APT 28'
return x
df['Affiliations'] = df['Affiliations'].apply(normalize_apt28)
df.Affiliations.unique()
[403]:
array(["Believed to be the work of Ghana's government", 'UNC1151',
'Ukrainian IT Army', 'Zirconium', 'Gamaredon', 'APT-C-23',
'Believed to be the work of the Chinese government',
'Believed to be the work of the Iranian government',
'Lazarus Group', 'APT 35',
'Believed to be the work of the Canadian government',
"Believed to be the work of Israel's government.",
'Believed to be the work of the Russian government', 'Gallium',
'Mirage', 'Believed to be the work of the Jordanian government',
'Lucky Cat', 'APT 40', 'Sandworm', 'APT 37',
"Believed to be the work of Israel's government", 'APT 28',
'Believed to be the work of the Iranian government.', 'APT 18',
'Ocean Lotus', 'Mustang Panda',
'Believed to be the work of the government of the Russian Federation.',
'APT 10', 'Kimsuky', 'Charming Kitten',
"Believed to be the work of China's government.",
'National Security Agency, Tailored Access Operations', 'APT 5',
'Believed to be the work of the Spanish government',
'Believed to be the work of the government of Thailand.',
'Believed to be the work of Hamas.',
"Believed to be the work of China's government", 'APT 36',
"Believed to be the work of Palestine's government",
'Iran, APT 35',
'Believed to be the work of the Chinese government\r\n?',
'Confucius', 'Winnti Umbrella',
"Believed to be the work of Russia's government", 'Konni Group',
'Emissary Panda', 'Patchwork', 'APT 41', 'Turla', 'Poison Carp',
'Hafnium', 'Siamesekitten, OilRig',
"Believed to be the work of El Salvador's government.",
'The Dukes', 'Prince of Persia', 'Red Menshen', 'MuddyWater',
'Space Pirates', 'Thrip', 'UNC1151?',
"Believed to be the work of Bahrain's government",
'ModifiedElephant, Pune Police', 'Also known as Space Pirates',
'Believed to be the work of the Mexican government', 'APT 34',
'Believed to be the work of the United Arab Emirates government',
'Also known as PLA Unit 69010.', 'TAG-28', 'Russian Federation',
'APT37', 'China', 'UNC215', 'BlackTech', 'RedFoxTrot', 'Cozy Bear',
'TAG-22, Winnti Umbrella', 'Nobelium', 'Aggah, Gorgon Group',
'TA453, Charming Kitten', 'Domestic Kitten',
'APT 41, Barium, TA415', 'Lazarus Group?',
'APT 31, Hurricane Panda', 'Calypso',
'Law enforcement agencies in the United States, Canada, France, Germany, Lithuania, the Netherlands, Ukraine, and United Kingdom',
'HoneyMyte, Mustang Panda, Bronze President',
'North Korean government', 'Nobelium, APT 29', 'Pakistan',
'Federal Bureau of Investigation, Bulgarian National Investigative Service',
'APT 34, OilRig', 'Tortoiseshell, ?Charming Kitten', 'Sharp Panda',
'APT 37, ScarCruft, Stardust Chollima', 'Emennet Pasargad?',
'Belarus, Ghostwriter.', 'Comment Crew', 'Siamesekitten, Lyceum',
'Chinese government', 'Philippine Army', 'MysterySnail',
'Primitive Bear, Gamaredon', 'Judgement Panda, APT 31',
'Also known as?Lyceum. The group has also been connected to OilRig.',
'IndigoZebra', 'South Sudanese government', 'Tick',
'Naikon, APT 30, Deadringer', 'Charming Kitten?',
'Russian government', 'TA428, Wekby, Dynamite Panda',
'Winnti Umbrella, SparklingGoblin',
'NSA, U.S. Cyber Command, FBI, and partner agencies in several unknown countries.?',
'Emirati government', 'Soft Cell, GALLIUM, Deadringer',
'Believed to be the work of the Saudi Arabian government.',
'Cozy Bear, APT29', 'Winnti Umbrella, Grayfly',
"Believed to be the work of the Islamic Republic of Iran's government",
'Static Kitten', 'Tonto Team or TA428', 'APT 31, Judgement Panda',
'PLA Unit 69010, RedFoxTrot', 'Mustang Panda, Bronze President',
'Cozy Bear, APT 29', 'APT 36, Mythic Leopard',
'Charming Kitten, Phosphorous',
'Believed to be the work of the Polish government.', 'Israel',
'Saudi and Emirati governments', 'APT 30',
'Believed to be the work of the Chinese government.?',
'Government of China', 'Unknown, suspected Bronze Butler.',
'Iranian government',
'Believed to be the work of the Russian government.?',
'Energetic Bear', 'Also known as APT 31 and Judgement Panda.',
'Zinc', 'Believed to be the work of Togolese security forces.?',
'DarkHotel', 'Greenbug', 'Cerium', 'Lotus Blossom',
'Cobalt Dickens', 'Pirate Panda', 'Also known as Parasite',
'Fox Kitten', 'Also known as Parasite, UNC757, and Fox Kitten.',
'Government of Saudi Arabia', 'Hidden Cobra', 'Pioneer Kitten',
'KINGDOM', 'Syrian Electronic Army', 'KilllSomeOne',
'APT 34, possibly APT 33/APT 39.', 'Cozy Bear,?APT 29',
'U.S. Cyber Command', 'Tonto Team',
'Australian Signals Directorate', 'APT 33',
'Believed to be Lazarus Group', 'Bronze Butler',
'Guangdong State Security Department (GSSD) of China?? Ministry of State Security (MSS)',
'Vicious Panda', 'Crouching Yeti', 'Government of Turkey',
'Ghostwriter', 'Believed to be the work of Fancy Bear.',
'Russian military intelligence agency, known as GRU', 'PKplug',
'Unnamed state-sponsored APT',
'Believed to be the work of APT 10.', 'The Lazarus Group',
'Five Eyes',
'Believed to be the work of Chinese state-sponsored hackers.',
'Phosphorus', 'Government of Egypt', 'Iran', 'Rancor', 'Unknown',
'Also known as Earth Longzhi.',
'Believed to be the work of Leviathan.',
'Believed to be the work of Phosphoros, also known as APT 35',
'Government of Uzbkeistan', 'Government of the United States',
'Government of Morocco', 'Hong Kong police',
'Believed to be the work of Russian and Chinese hackers.',
'Believed to be the work of APT 10 or the Jiangsu Province Ministry of State Security.',
'the Dukes', 'China?? Ministry of State Security',
'Believed to be the work of at least two unnamed APT groups, including Evil Eye',
'APT 3', 'Believed to be the work of Turbine Panda', 'APT 20',
'APTs associated with Beijing',
'Suspected to be the work of Iranian actors.',
'Government of the UAE', 'Iranian Revolutionary Guard',
'Suspected to be the work of the Dukes.?',
'Believed to be the work of Cobalt Dickens.', 'Wicked Panda',
'Believed to be the work of Poison Carp.',
'APT 17 and Bronze Butler', 'Believed to be the work of Winnti.',
'Bronze President',
'Believed to be the work of the UAE government.',
'Also known as Evil Eye', 'Golden Falcon',
'Also known as Billbug and Lotus Blossom.',
'Believed to be associated with the Lazarus Group',
'Believed to be the work of Crouching Yeti',
'Believed to be associated with Sandworm. Also known as VPNFilter.',
'Believed to use similar infrastructure as the NetTraveler, Icefog, and APT 17 threat actors',
'Believed to be the work of Whitefly.?',
'Believed to be the work of TempTick?and Turla',
'Believed to use similar tools as DragonOK',
"China's Jiangsu Ministry of State Security",
'Believed to be the work of Turla',
'Believed to be the work of APT 10',
'Also known as Phosphoros and Newscaster Team',
'Believed to be the work of the Lazarus Group',
'Suspected to be the work of the Lazarus Group.',
"China's Tianjin State Security Bureau",
'Believed to be the work of Lazarus Group',
'Suspected to be the work of Chinese military hackers.',
'Believed to be the work of Leviathan',
'Believed to be associated with the Axiom, APT 17, and Mirage threat actors. Believed to share the same tools and infrastructure as the threat actors that carried out Operation Aurora, the 2015 targeting of video game companies, the 2015 targeting of the Thai government, and the 2017 targeting of Chinese-language news websites',
'Believed to use similar tools as the Lazarus Group',
'Possibly linked to an Iran-based group that uses the Shamoon virus to target oil and gas entities with a connection to Saudi Arabia.',
'Also known as Transparent Tribe and C-Major',
'Believed to be linked to Crouching Yeti',
'Believed to be the work of Gamaredon.',
'Believed to be linked to the Crouching Yeti threat actor',
'Suspected to be the work of North Korean authorities.',
'Also known as Bronze President and?HoneyMyte.?Also known as RedDelta.',
"Linked to Turbine Panda and China's Ministry of State Security",
'Believed to be the work of the same threat actors behind the alleged Russian compromise of networking equipment',
'Believed to be the work of the Mexican government.',
'Believed to be the work of Russian hacker group Snake.',
'Believed to be the work of Russian state-sponsored group Dragonfly, also known as Energetic Bear',
'Believed to be the work of the same actors behind the compromise of the 2018 Winter Olympics',
'Suspected to be the work of Chinese operatives working for the Ministry of State Security.',
'Believed to be the work of Sandworm.',
'Also known as the Silent Librarian. Believed to be associated with the Mabna Institute.',
'Believed to be associated with the U.S. targeting of the Islamic State Group.',
'Believed to be the work of a Russian hacking group.',
'Believed to be the work of a subset of the Lazarus Group known as Bluenoroff',
'Also known as TEMP.Zagros, Seedworm and Static Kitten.?',
"Also known as TEMP.Periscope, Leviathan, and Mudcarp.?Believed to be behind the compromise of Cambodia?? election organizations?and the Target of universities' maritime military secrets.",
'Possibly linked to Sandworm.', 'Possibly linked to APT 30',
'Also known as Tick and RedBaldKnight',
'Also known as Cloud Hopper, Red Apollo, CNVX, Stone Panda, MenuPass, and POTASSIUM.?Believed to be a part of the Tianjin bureau of the Chinese Ministry of State Security.',
'Believed to be the work of the same threat actors who used the?Duqu 2.0 tool.?',
'Possibly linked to OilRig', 'Possibly linked to NetTraveler',
'Believed to be the work of Winnti Umbrella',
'Believed to be the work of the Lazarus Group.',
'Also known as Magnallium and Elfin. Suspected to be linked to the Shamoon malware attacks in 2018.',
'Also known as the Lamberts',
'Also known as Cobalt Gypsy and possibly associated with Rocket Kitten',
'Associated with Turla', 'Believed to be the work of the Dukes',
'Believed to have been partially the work of Chrysene',
'Also known as Twisted Kitten and Crambus; possibly related to Magic Hound,?APT 34, and Chrysene.?Believed to use similar tools and infrastructure as Chafer.? The group has also been connected to SiameseKitten.',
'Believed to be associated with the United Kingdom?? cyber operations against the Islamic State, revealed in 2018.',
'Used an evolution of the tools used by Sandworm.', 'Indonesia',
'Believed to be the work of?Winnti Umbrella',
'Same state sponsor as the 2016 breach',
'Believed to be responsible for Dark Seoul, Ten Days of Rain, the Sony Pictures Entertainment attack,?the SWIFT-related bank heists,?WannaCry, and?Operation GhostSecret. Believed to use the same tools as Covellite.?Known to the U.S. government as Hidden Cobra.?Also known as BlueNoroff.',
'Possibly linked to Icefog?and the Dark Seoul compromise',
'Believed to be the work of the Turla group',
'Same state sponsor as the 2014 breach', 'Also known as Strider',
'Possible link to Sandworm.',
'Also known as Naikon, PLA Unit 78020, Lotus Panda. Possibly responsible for the JadeRAT tool.?',
'Believed to be the work of Deep Panda',
'Also known as the Gamaredon Group and?Primitive Bear.',
'Also known as APT 32',
'Associated with Duqu. Believed to have been the tool used to compromise Kaspersky Labs.',
'Believed to use similar tools and infrastructure as OilRig',
'Also known as UPS, Gothic Panda, Buckeye, and Boyusec',
'The government of Uganda',
'Also known as TG-3390, APT 27, TEMP.Hippo, and?LuckyMouse.?Believed to have been responsible for the compromise of a Mongolian data center.',
'Also known as Deputy Dog and Axiom.?Believed to be associated with Winnti Umbrella',
'Possibly?linked to?Saffron Rose',
'Also known as Tilded Team, it is associated with Flame, Stuxnet, Regin, Gauss, and Duqu.',
'Suspected to be the work of the Dukes',
'Suspected to be the work of Deep Panda',
'Possibly uses the same infrastructure as?Mirage',
'Believed to be the work of the same actors behind Dark Seoul?and possibly linked to Onion Dog',
'This threat actor is believed to have been responsible for the Japan Pension Service incident. It is also known as Emdivi?and Cloudy Omega.',
'Believed to be the work of Saffron Rose',
'Linked to Blue Termite',
'Also known as Charming Kitten, APT 35, Newscaster, Ajax Security Team, Phosphorus, Group 83, and TA453. Possibly linked to Rocket Kitten',
'Uses similar tools as Moafee?and Rancor, possibly indicating a link between the threat actors',
'Uses similar tools as DragonOK,?possibly indicating a link between the two',
'Also known as Spring Dragon and Eslie',
'Also known as Energetic Bear, Dragonfly, Group 24, and Koala Team. Possibly associated with Allanite and Dymalloy.?Believed to be behind the targeting of critical infrastructure entities in the United States and Germany, prompting both countries to issue technical alerts',
'Turla is also known as Snake, Venomous Bear, Uroburos, Group 88, Waterbug, and Turla Team. Estonian intelligence services?associate this group with the Russian federal security service (FSB). Turla is associated with Agent.btz?and believed to be behind the RUAG espionage incident, an attempted compromise of the Swiss defense ministry,?the targeting of diplomatic missions in Eastern Europe, the targeting of South Korean actors prior to the meeting of U.S. President Donald J. Trump and North Korean leader Kim Jong-un, and the compromise of the German?and Finnish foreign ministries.?',
'Believed to be closely linked to, though distinct from, Winnti.',
'Also known as Voodoo Bear, and Electrum. Believed to be responsible for the 2008 DDoS attacks in Georgia and the 2015?Ukraine power grid outage.?Code used by this threat actor was also found in the alleged Russian compromise of networking equipment. There is possible overlap between Black Energy and the threat actor behind the compromise of the 2018 Winter Olympics.',
'Also known as Dynamite Panda, TG-0416, and TA428',
'Believed to be the work of APT 18',
'Also known as APT 2, PLA Unit 61486, and TG-6952',
'Related to the compromise at the Office of Personnel Management',
'Also known as IXESHE, Numbered Panda, and Group 22',
'Also known as Mask and Ugly Face', 'Animal Farm',
'Believed to be the work of the Equation Group',
'Also known as Flying Kitten, Sayad, Ajax Security Team, and Group 26',
'Also known as Cutting Kitten and Group 41',
'Believed to be responsible for the attempted compromise of U.S. think tanks',
'Mirage is possibly linked to the Hellsing?threat actor. It is also known as Vixen Panda, GREF, APT 15, and Playful Dragon.?Believed to be associated with Winnti Umbrella',
'Also known as APT 29, Cozy Bear, Dark Halo, Nobelium, and Cloaked Ursa.?Estonian intelligence services associate this group with the Russian Federal Security Service (FSB) and Foreign Intelligence Service (SVR).',
'Also known as APT 21',
'Also known as JS Sykipot, Backdoor Sykipot, and Maverick Panda',
'Also known as Dagger Panda and possibly linked to Onion Dog',
'Also known as Shell Crew, Web Masters, APT 19, Kung Fu Kittens, Black Vine, Temp.Avengers, and Group 13',
'Also known as Operation Troy and Dark Seoul',
'Also known as APT 1, Comment Crew, Comment Panda, TG-8223, Group 3, GIF89a, and Byzantine Candor',
'Has links to the Shadow Network?and?SabPub. Also known as TA413.',
'A variant of Lucky Cat for Mac computers',
'This tool is believed to be the work of the Equation Group. The tool Gauss?is believed to be based on the Flame platform.',
'Also known as Operation Ababil',
'Believed to be based on the Flame platform and possibly the work of the Equation Group',
'Believed to be the work of PLA Unit 61398',
'Possibly assisted in Operation Aurora, the RSA incident, and the Joint Strike Fighter Program compromise',
'This is believed be a tool set used by the Equation Group. It is related to Stuxnet?and Flame.',
'Believed to be a tool used by PLA Unit 61398',
'Possibly linked to the Nitro incident',
'Possibly linked to the Mitsubishi Heavy Industries incident.',
'This incident is believed to be the work of Unit 61398 and at least two other Chinese groups, according to the New York Times. The threat actor Sneaky Panda may also have assisted.',
'Believed to be the work of PLA Unit 61398 with possible assistance from Sneaky Panda.?Believed to be associated with Winnti Umbrella',
'Also known as Manganese',
'Codenamed ??lympic Games??and believed to be the work of the Equation Group',
'Has links to Lucky Cat and GhostNet',
'Sneaky Panda is among some of the actors suspected to be behind this incident.',
'Believed to be linked to Sandworm.',
'Believed?to be partially the work of PLA Unit 61398.', 'Antlion',
'Believed to be the work of MuddyWater.',
'Believed to be responsible for the targeting of employees of companies that operate U.S. power plants. Also believed to be linked to Crouching Yeti',
'Believed to be the work of the Ugandan government.', 'RedAlpha',
'Also known as Triton and Xenotime'], dtype=object)
Affiliations column contain each known alias of APT 28, including “Fancy Bear”, “Sofacy”, “Pawn Storm”, and “Sednit”.[404]:
# Count rows where 'Affiliations' contains 'Fancy Bear' (case-insensitive)
count_fancy_bear = df['Affiliations'].str.contains('Fancy Bear', case=False, na=False).sum()
# Sofacy
count_sofacy = df['Affiliations'].str.contains('Sofacy', case=False, na=False).sum()
# Pawn Storm
count_pawn_storm = df['Affiliations'].str.contains('Pawn Storm', case=False, na=False).sum()
# r'Sednit'
count_sednit = df['Affiliations'].str.contains('Sednit', case=False, na=False).sum()
print(f"Number of rows containing 'Fancy Bear': {count_fancy_bear}")
print(f"Number of rows containing 'Sofacy': {count_sofacy}")
print(f"Number of rows containing 'Pawn Storm': {count_pawn_storm}")
print(f"Number of rows containing 'Sednit': {count_sednit}")
Number of rows containing 'Fancy Bear': 2
Number of rows containing 'Sofacy': 0
Number of rows containing 'Pawn Storm': 0
Number of rows containing 'Sednit': 0
[405]:
df.loc[df['Affiliations'].str.contains('Fancy Bear', case=False, na=False), 'Affiliations'] = 'APT 28'
[406]:
# TODO: Step 2 - Filter out 'Unknown' affiliations
df = df[df['Affiliations'] != 'Unknown'].reset_index(drop=True)
[407]:
# TODO: Step 3 - Count affiliations and get top 10
# Hint: Use value_counts().nlargest(10)
top_10_aff = df['Affiliations'].value_counts().nlargest(10)
[408]:
# TODO: Step 4 - Create a barplot of the top 10 affiliations
plt.figure(figsize=(10,6))
ax = sns.barplot(x=top_10_aff.values, y=top_10_aff.index)
ax.bar_label(ax.containers[0])
ax.set_title('Top 10 Affiliations')
plt.ylabel('Affiliation')
plt.show()
Solution - Exercise 3: APT 28 Cyber Attacks Over Time
[ ]:
# Step 1 - Filter the dataset for all rows where 'Affiliations' contains "APT 28"
# Hint: Use str.contains() with case=False and na=False
mask_apt28 = df['Affiliations'].str.contains('APT 28', case=False, na=False)
df_apt28 = df[mask_apt28].reset_index(drop=True)
[ ]:
# Step 2 - Extract the year from the 'Date' column
# Hint: Use .dt.year
df_apt28['Year'] = df_apt28['Date'].dt.year
[ ]:
# Step 3 - Sort the counts by year
cyber_per_year_apt28 = df_apt28['Year'].value_counts().sort_index()
[ ]:
# Step 4 - Create a line plot showing the number of attacks per year
plt.figure(figsize=(10,6))
sns.lineplot(x=cyber_per_year_apt28.index.astype(int), y=cyber_per_year_apt28.values, marker="o")
plt.title("Number of APT 28 Cyber Events per Year", fontsize=16)
plt.xlabel("Year", fontsize=12)
plt.ylabel("Cyber Operations", fontsize=12)
plt.xticks(cyber_per_year_apt28.index.astype(int)) # Ensure x-axis shows integer ticks
plt.grid(True)
plt.show()
The observed peak in APT 28 activity around 2016–2019 could be related to the deployment of the malware X-Agent, which has been linked to the group during that period.
Conclusion
In this tutorial, we learned how to load, clean, and visualize data using Python. Through Exploratory Data Analysis (EDA), we explored trends in cyber operations over time, identified the most active affiliations, sponsors, and targeted sectors, and handled inconsistent or missing data. Using plots and charts, we visualized insights from the dataset, providing a solid foundation for more advanced analyses in machine learning and cybersecurity.