Starting with version 4.0, Firegen saves a summary of each analysis, capturing statistics that can be used to detect anomalies across several aspects of the collected data. These statistics can be accessed and analyzed via the “Stats” button on the main Firegen GUI. The Statistics window provides a view of all the records and the ability to generate reports on detected anomalies:

Firegen Statistics Window

Firegen will collect the following statistics for the analyzed time interval (when available – some logs don’t record all the information that Firegen is able to collect):

– Connections – The number of TCP/IP connections
– Denials – The number of denials
– Warnings – The number of warning messages
– URLs – The number of URLs accessed (not all logs contain this information)
– Message types – The number of distinct types of firewall messages (some firewalls may have just one or two types of messages)
– Traffic (in MB) – The amount of traffic in megabytes
– For each hour in the analyzed interval: connections, denials and traffic

In total, there are 79 distinct streams of data that can be used to determine if the current analysis contains an anomaly. All the data is saved in XML format in the C:\Program Files\Altair Technologies\Firegen 4.0\configs\Firegen Statistics.xml file. Years’ worth of stats can be stored in under 1 MB.
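Because the statistics live in a single XML file, they can also be consumed by external tools. Below is a minimal Python sketch of how that file could be read for ad-hoc analysis; the element and attribute names (Record, date, Connections, Denials, Traffic) are hypothetical, since the actual schema of Firegen Statistics.xml is not documented here.

# Minimal sketch: load daily statistics from the Firegen statistics file for
# external analysis. Element and attribute names below are hypothetical; the
# real schema of "Firegen Statistics.xml" may differ.
import xml.etree.ElementTree as ET

STATS_PATH = r"C:\Program Files\Altair Technologies\Firegen 4.0\configs\Firegen Statistics.xml"

def load_daily_stats(path=STATS_PATH):
    """Return one dictionary of daily totals per analyzed day."""
    tree = ET.parse(path)
    records = []
    for rec in tree.getroot().iter("Record"):              # hypothetical element name
        records.append({
            "date":        rec.get("date"),                 # hypothetical attribute name
            "connections": int(rec.findtext("Connections", default="0")),
            "denials":     int(rec.findtext("Denials", default="0")),
            "traffic_mb":  float(rec.findtext("Traffic", default="0")),
        })
    return records

if __name__ == "__main__":
    for day in load_daily_stats():
        print(day["date"], day["connections"], day["denials"], day["traffic_mb"])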

When an analysis is performed, the current set of data is tested against the historical records using a customized Multivariate Gaussian Distribution Probability algorithm. The base algorithm uses the following formula:
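The formula below is the standard multivariate Gaussian density; the exact customized parameterization Firegen uses may differ.

p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right)

Here x is the vector of statistics for the analyzed interval, \mu is the mean vector and \Sigma the covariance matrix estimated from the historical records, and n is the number of data streams included in the computation.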

The p function describes how likely a given set of data is under the historical distribution; when p falls below the anomaly probability threshold (epsilon), the data is flagged as an outlier. Through normalization, several sets of data can be combined in order to compute the anomaly probability for that particular combination (e.g., connections and denials, connections and message types, or even all combined).

The Multivariate Distribution Probability can be implemented as either supervised or unsupervised machine learning. Supervised learning uses a training set to estimate the anomaly threshold (epsilon). A supervised implementation is not practical for situations where the types of anomalies differ vastly from one user to another. In Firegen, we have implemented an unsupervised version of the algorithm, with a regression function used to estimate an epsilon that matches the user’s estimated percentage of anomalies vs. normal data. The default is 1%, meaning that Firegen will calculate the anomaly threshold required to designate 1% of the analyzed records as “anomalies”.
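A minimal sketch of this kind of unsupervised thresholding is shown below. It fits a single Gaussian to the historical records and picks epsilon as a simple percentile of the resulting probabilities; Firegen’s regression-based estimate is more elaborate, so treat this only as an illustration of the idea.

# Sketch of unsupervised epsilon selection: fit a Gaussian to the historical
# records, score every record, and choose the threshold so that roughly the
# requested fraction of records (1% by default) falls below it.
import numpy as np
from scipy.stats import multivariate_normal

def estimate_epsilon(history, anomaly_fraction=0.01):
    """history: (num_records, num_features) array of daily statistics."""
    mu = history.mean(axis=0)
    sigma = np.cov(history, rowvar=False)
    p = multivariate_normal(mean=mu, cov=sigma).pdf(history)
    # Threshold such that ~anomaly_fraction of the historical records score below it.
    return np.percentile(p, anomaly_fraction * 100)

if __name__ == "__main__":
    # Hypothetical example: 60 days of [connections, denials] counts.
    rng = np.random.default_rng(0)
    history = rng.normal([111_120, 9_351], [35_248, 6_154], size=(60, 2))
    print("anomaly threshold (epsilon):", estimate_epsilon(history))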

Firegen provides the user with the ability to adjust the weight of each data set (e.g., denials may be more important than traffic or message types). A data set can also be excluded altogether from the anomaly computation.

Tweaking the anomaly threshold and the weight of each data set allows for customized anomaly criteria that fit the organization’s monitoring requirements.
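One common way to implement this kind of per-stream weighting – not necessarily the exact scheme Firegen uses – is to combine the per-stream probabilities as a weighted product, where a weight of zero removes a stream from the computation entirely. A rough sketch:

# Sketch of combining independently modelled data streams with user weights.
# Each stream gets a univariate Gaussian probability; a weight of 0 drops the
# stream, and larger weights make its contribution more decisive.
import numpy as np
from scipy.stats import norm

def combined_probability(values, means, stds, weights):
    """values, means, stds, weights: 1-D arrays, one entry per data stream."""
    p = norm.pdf(values, loc=means, scale=stds)       # per-stream probabilities
    return float(np.prod(p ** np.asarray(weights)))   # weight 0 -> factor of 1 (excluded)

# Hypothetical example: denials weighted heavier than traffic, message types excluded.
values  = np.array([36_923, 1_200.0, 27])   # denials, traffic (MB), message types
means   = np.array([9_351, 500.0, 19])
stds    = np.array([6_154, 120.0, 2])
weights = np.array([2.0, 1.0, 0.0])
print(combined_probability(values, means, stds, weights))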

For example, when running the anomaly assessment for just the connections, Firegen will create an HTML report with the list of anomalies:

No | Date     | Anomaly Type | Value   | Average | Standard Deviation | Anomaly Threshold
1  | 4/7/2016 | Connections  | 568,725 | 111,120 | 35,248 | 3.07643333070155E-37
2  | 4/6/2016 | Connections  | 530,577 | 111,120 | 35,248 | 2.16592060682371E-31

Graphically, a one-dimensional set of data for these anomalies would look as shown below (the graph is plotted in Python using the stats generated by Firegen – the script is available to Firegen customers on request):
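That customer script is not reproduced here; the following is only a minimal, hypothetical matplotlib sketch of the same kind of one-dimensional plot, with the flagged records highlighted.

# Minimal sketch of a one-dimensional anomaly plot: daily connection counts
# over time, with records flagged as anomalies drawn in red.
import numpy as np
import matplotlib.pyplot as plt

def plot_connections(dates, connections, anomaly_mask):
    """dates: date strings; connections: daily counts; anomaly_mask: booleans."""
    dates = np.asarray(dates)
    connections = np.asarray(connections)
    anomaly_mask = np.asarray(anomaly_mask)
    plt.figure(figsize=(10, 4))
    plt.plot(dates, connections, "o", color="steelblue", label="normal")
    plt.plot(dates[anomaly_mask], connections[anomaly_mask], "o", color="red", label="anomaly")
    plt.axhline(connections.mean(), linestyle="--", color="gray", label="average")
    plt.ylabel("Connections per day")
    plt.xticks(rotation=45)
    plt.legend()
    plt.tight_layout()
    plt.show()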

The two anomalies, from April 6 and 7, 2016, are clear outliers, with over 500,000 connections compared to the 111,000 average. In certain situations, such extreme outliers may skew the anomaly detection for “normal” outliers. For this reason, the user has the option to delete a particular record once its analysis has been completed. For example, if the two anomalies highlighted above are removed from the records, the remaining firewall connection anomalies are no longer as extreme:

Performing such adjustments is optional, but it allows the user to “teach” Firegen how to better estimate anomalies.

Firegen generates a similar report for Denials:

No | Date       | Anomaly Type | Value  | Average | Standard Deviation | Anomaly Threshold
1  | 12/11/2016 | Denials | 36,923 | 9,351 | 6,154 | 5.4186813891671E-05
2  | 1/23/2016  | Denials | 38,047 | 9,351 | 6,154 | 2.35102901322701E-05
3  | 11/29/2015 | Denials | 29,965 | 9,351 | 6,154 | 0.00453225829447875
4  | 11/27/2015 | Denials | 30,200 | 9,351 | 6,154 | 0.0039851572259011
5  | 11/23/2015 | Denials | 31,595 | 9,351 | 6,154 | 0.00180201630091334
6  | 11/22/2015 | Denials | 32,479 | 9,351 | 6,154 | 0.00106116306871397
7  | 11/21/2015 | Denials | 31,172 | 9,351 | 6,154 | 0.00230480454316011
8  | 11/20/2015 | Denials | 42,424 | 9,351 | 6,154 | 6.62295230253028E-07
9  | 11/19/2015 | Denials | 37,115 | 9,351 | 6,154 | 4.70949957455379E-05
10 | 11/18/2015 | Denials | 46,040 | 9,351 | 6,154 | 2.36945409735921E-08
11 | 11/17/2015 | Denials | 40,470 | 9,351 | 6,154 | 3.46935261175945E-06
12 | 11/13/2015 | Denials | 42,195 | 9,351 | 6,154 | 8.08357056190902E-07
13 | 11/12/2015 | Denials | 37,553 | 9,351 | 6,154 | 3.40738692736354E-05

This indicates that the firewall might have been under attack towards the end of 2015, along with some increased activity in January and December 2016.

For message types (from a Cisco ASA firewall):

No | Date      | Anomaly Type  | Value | Average | Standard Deviation | Anomaly Threshold
1  | 9/27/2016 | Message types | 27 | 19 | 02 | 0.00222262943167033
2  | 8/16/2016 | Message types | 27 | 19 | 02 | 0.00222262943167033
3  | 2/20/2016 | Message types | 27 | 19 | 02 | 0.00222262943167033
4  | 1/22/2016 | Message types | 27 | 19 | 02 | 0.00222262943167033

The dates where 27 message types were recorded (vs. the normal 16–20 range) indicate that different types of events were logged by the firewall. In many cases we found that this type of anomaly is an indication of admins making configuration changes.

Analyzing all the data collected by Firegen creates an aggregated report:

While the anomaly reports can be generated on demand, Firegen will perform an analysis of each set of data for all the scheduled and on-demand reports. A summary of the findings (anomaly or normal) is included in the report, and if the report is emailed, the subject line will provide either a notification that the data is anomalous or, if it is within normal limits, a “No anomalies detected” confirmation. An attack will not always create a logging anomaly; however, those that do will draw the attention of the administrator and alleviate the risk of “reporting fatigue”.

The hourly statistics are not yet included in the Firegen analysis, but they are nevertheless collected for future use or integration with third-party analysis tools. For example, the anomalies for the traffic between 00:00 and 01:00, plotted using a Python script:

From this graphic it appears that there was an unusual amount of traffic on Sep 20, 2016, between 00:00 and 01:00. At almost 4 GB, this outlier is way above the 500 MB average.
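As an illustration of the kind of third-party check this enables, the hourly traffic values could be screened with a simple z-score cut-off (a cruder test than Firegen’s Gaussian probability, used here only to keep the sketch short); all values below are hypothetical.

# Sketch: flag hours whose traffic is far above the historical average using a
# simple z-score. The series and the 3-sigma cut-off are illustrative only.
import numpy as np

def flag_traffic_outliers(dates, traffic_mb, z_cutoff=3.0):
    traffic_mb = np.asarray(traffic_mb, dtype=float)
    z = (traffic_mb - traffic_mb.mean()) / traffic_mb.std()
    return [(d, t) for d, t, score in zip(dates, traffic_mb, z) if score > z_cutoff]

# Hypothetical 00:00-01:00 traffic series in MB: mostly near 500 MB, one ~4 GB day.
rng = np.random.default_rng(1)
traffic = rng.normal(500, 60, size=30)
traffic[20] = 3_900.0
dates = [f"day {i + 1}" for i in range(30)]
print(flag_traffic_outliers(dates, traffic))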

In order to compute anomalies, a certain number of records is required; the more, the better. For new Firegen installations, if the statistics are left to accumulate just through the daily scheduled reports, it may take several months to achieve a reasonable level of confidence that data reported as anomalous really is. We recommend at least 45 days’ worth of analysis in order to get relevant anomaly reports. If the raw logs are available, Firegen can be scheduled to perform the analysis of a custom range of days in order to create the basis for the anomaly threshold. This is particularly useful for those who want to see the anomaly detection feature in action during the 30-day trial period. The Firegen40CLI command-line analyzer allows the administrator to launch an analysis iterating through a number of days starting from a specific date (e.g. to perform the daily analysis for 90 days starting with 2017/01/01, provided that the logs are available). Once the analysis is completed, the statistics database will be filled with 90 days’ worth of data that can be used for anomaly detection. Example:

firegen40cli -n -a analysis_profile_name -r 90 -s "2017/01/01" -i

The switches have the following meaning:
– -n – Run the analysis without opening the report in the default browser
– -a – Analyze "analysis_profile_name" (the name of the analysis profile configured for the particular firewall(s))
– -r – Repeat the analysis for 90 consecutive days
– -s – Start with 2017/01/01 (yyyy/mm/dd)
– -i – Initialize the statistics