.. BroAPT documentation master file, created by
   sphinx-quickstart on Sat Mar 14 11:21:30 2020.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

BroAPT: A system for detecting APT attacks in real-time
=======================================================

.. toctree::
   :maxdepth: 2

   quickstart
   configuration
   framework
   api

Cybersecurity has long been a significant subject of discussion. With the rapid
evolution of cyber attack methods, threats on the Internet are becoming more and
more severe. Advanced persistent threats (APTs) have become a major source of
cybersecurity incidents. It is therefore more important than ever to identify and
classify network traffic accurately and in a timely manner by analysing the
traffic itself.

We hereby describe the BroAPT system, an APT detection system based on `Bro IDS`_
(its name at the time of implementation; it is now known as Zeek IDS). The system
detects APT attacks through comprehensive analysis of network traffic, and it
offers high performance and extensibility. It reassembles and extracts files
transmitted in the traffic, and analyses the traffic and generates log files in
real time; it classifies the extracted files through configurable, targeted
malicious file detection; and it detects APT attacks by analysing the log files
it generates itself.

.. _Bro IDS: https://www.zeek.org

The BroAPT system consists of two major parts. The first part provides the core
functions and runs in a Docker container, currently based on the CentOS 7 image.
The core functions comprise two components: an extraction framework,
BroAPT-Core, and a detection framework, BroAPT-App. The second part consists of
the command line interface (CLI) and a daemon server, BroAPT-Daemon, which is a
RESTful API server based on the `Flask`_ framework. This part runs on the host
machine of the Docker container.

.. _Flask: https://flask.palletsprojects.com

The CLI is the entry point of the whole BroAPT system. When running, the CLI
configures and brings up the daemon server, then starts the Docker container
with the core functions. Within the BroAPT-Core extraction framework, the system
reads in a PCAP file and processes it with Bro IDS, which reassembles and
extracts the files transmitted in the traffic and generates log files through
its logging system. Afterwards, the BroAPT-App detection framework takes each
extracted file and parses its file name to obtain the MIME type of the file. The
framework then fetches the detection API for that MIME type and runs it to
determine whether the file is malicious. If needed, the framework sends a
request to the BroAPT-Daemon server to perform a remote (privileged) detection
on the file.
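
The exact naming scheme of extracted files is defined by the extraction scripts
and is not documented here. The sketch below assumes a hypothetical scheme,
``<source>-<uid>.<type>_<subtype>.<extension>`` (for instance
``HTTP-FaB3xQ1.application_vnd.ms-excel.xls``), purely to illustrate how a MIME
type and UID could be recovered from a file name; ``ExtractedFile`` and
``parse_extracted_name`` are illustrative names, not part of BroAPT.

.. code-block:: python

   import re
   from dataclasses import dataclass
   from pathlib import PurePosixPath

   # Hypothetical naming scheme (an assumption, not BroAPT's actual one):
   #   <source>-<uid>.<type>_<subtype>.<extension>
   # The subtype may itself contain dots (e.g. "vnd.ms-excel"), so only the
   # last dot-separated component is treated as the extension.
   _NAME_RE = re.compile(
       r"^(?P<source>[^-]+)-(?P<uid>[^.]+)"
       r"\.(?P<type>[^_.]+)_(?P<subtype>.+)"
       r"\.(?P<ext>[^.]+)$"
   )


   @dataclass(frozen=True)
   class ExtractedFile:
       path: str
       source: str     # application layer protocol that carried the file
       uid: str        # unique identifier of the file
       mime_type: str  # e.g. "application/vnd.ms-excel"


   def parse_extracted_name(path: str) -> ExtractedFile:
       """Recover source protocol, UID and MIME type from an extracted file name."""
       name = PurePosixPath(path).name
       match = _NAME_RE.match(name)
       if match is None:
           raise ValueError(f"unrecognised file name: {name!r}")
       return ExtractedFile(
           path=path,
           source=match["source"],
           uid=match["uid"],
           mime_type=f"{match['type']}/{match['subtype']}",
       )

Under this assumed scheme, ``parse_extracted_name("/dump/HTTP-FaB3xQ1.application_vnd.ms-excel.xls")``
yields the MIME type ``application/vnd.ms-excel`` and the UID ``FaB3xQ1``.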

The BroAPT-Core extraction framework works in three main steps. First, file
check: the system scans for new PCAP files and sends them to the BroAPT-Core
extraction framework. Second, Bro analysis: the system processes each PCAP file
with file extraction scripts, which reassemble and extract the files transmitted
through the traffic. Extracted files can be grouped by their MIME type or by the
application layer protocol that transmitted them. The user may also load
external Bro scripts as site functions, which run along with the main extraction
scripts. Third, post-processing and cross-analysis: after Bro IDS has processed
the PCAP file, the system has a number of extracted files and a set of log files.
Besides the standard Bro logs, these include logs defined by the site functions
and generated by the logging system of Bro IDS. By default, the system then
derives connection information for the extracted files from the Bro logs,
including the timestamp, source and destination, MIME type and hash values. In
addition, the user may register Python hooks with the system, which are called
every time a PCAP file is processed. These hooks can be used to investigate the
logs generated by Bro IDS further.
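
The three steps can be pictured as a loop over new PCAP files. In the sketch
below, which is an assumption rather than BroAPT's code, the file check
remembers which ``*.pcap`` files have already been seen, a caller-supplied
``analyse`` callable stands in for the Bro run and returns the directory holding
its logs, and every registered Python hook is then called with that directory.
``register_hook``, ``scan_new_pcaps`` and ``process_pcaps`` are hypothetical
names.

.. code-block:: python

   from pathlib import Path
   from typing import Callable, List, Set

   # A hook receives the directory holding the Bro logs of one processed
   # PCAP file (hypothetical signature).
   Hook = Callable[[Path], None]
   HOOKS: List[Hook] = []


   def register_hook(func: Hook) -> Hook:
       """Register a Python hook; usable as a decorator."""
       HOOKS.append(func)
       return func


   def scan_new_pcaps(pcap_dir: Path, seen: Set[Path]) -> List[Path]:
       """Step 1 (file check): return PCAP files not processed before and mark them as seen."""
       new = sorted(p for p in pcap_dir.glob("*.pcap") if p not in seen)
       seen.update(new)
       return new


   def process_pcaps(pcap_dir: Path, seen: Set[Path],
                     analyse: Callable[[Path], Path]) -> List[Path]:
       """Run steps 1-3 once over `pcap_dir` and return the PCAP files processed."""
       processed = []
       for pcap in scan_new_pcaps(pcap_dir, seen):
           log_dir = analyse(pcap)   # step 2: stand-in for running Bro on the PCAP
           for hook in HOOKS:        # step 3: hooks run every time a PCAP is processed
               hook(log_dir)
           processed.append(pcap)
       return processed

Keeping the ``seen`` set outside the function lets a caller invoke
``process_pcaps`` periodically and only pick up files that arrived since the
previous scan.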

To work along with the Bro intrusion detection system (IDS), the system is
implemented in a multi-processing manner. Since CPython's multi-threading cannot
run Python code in parallel (because of the global interpreter lock), we
implemented the BroAPT system with full support of multi-processing to
accelerate the main processing logic. Synchronised queues are used for
communication and coordination among the processes of the system: in the
BroAPT-Core extraction framework, one queue sends basic information about the
extracted files to the BroAPT-App detection framework, and another queue passes
the generated log files to the Python hooks.
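
A minimal sketch of such queue-based coordination, using the standard
``multiprocessing`` module: worker processes consume basic file records from one
queue and put their results on another, and one ``None`` sentinel per worker
signals the end of input. The record layout, the placeholder ``"scanned"``
verdict and the function names are assumptions for illustration; the ``fork``
start method used here is only available on POSIX systems.

.. code-block:: python

   import multiprocessing as mp
   from typing import List, Tuple

   FileRecord = Tuple[str, str]   # (path, MIME type), hypothetical layout
   Result = Tuple[str, str, str]  # (path, MIME type, verdict)


   def detection_worker(inbox, outbox) -> None:
       """Consume file records until the None sentinel arrives."""
       while True:
           record = inbox.get()
           if record is None:
               break
           path, mime_type = record
           # A real worker would dispatch to the MIME-type-specific API here;
           # this sketch only reports a placeholder verdict.
           outbox.put((path, mime_type, "scanned"))


   def run_pipeline(records: List[FileRecord], num_workers: int = 2) -> List[Result]:
       ctx = mp.get_context("fork")  # POSIX-only start method
       inbox, outbox = ctx.Queue(), ctx.Queue()
       workers = [ctx.Process(target=detection_worker, args=(inbox, outbox))
                  for _ in range(num_workers)]
       for worker in workers:
           worker.start()
       for record in records:
           inbox.put(record)
       for _ in workers:
           inbox.put(None)  # one sentinel per worker
       # Drain results before join(): joining a process that still has queued
       # items buffered can deadlock.
       results = [outbox.get() for _ in records]
       for worker in workers:
           worker.join()
       return sorted(results)  # arrival order depends on scheduling

Because the verdicts arrive in whatever order the workers finish, the sketch
sorts them before returning.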

Currently, we have introduced several site functions and Python hooks to the
BroAPT system. The six sets of Bro scripts provide:

* constant definitions for common application layer protocols, such as HTTP and
  FTP, fetched from the IANA registry;
* an extension of the standard Bro log ``http.log`` with new entries for COOKIE
  information and POST request data;
* calculation of hash values for all files transmitted through the network
  traffic;
* two Bro modules that detect phishing emails based on cross-analysis of SMTP
  and HTTP traffic.

The Python hook function currently included parses ``http.log``, extracts
information about HTTP connections and generates a new log file.
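
A Python hook of this kind mostly amounts to reading Bro's tab-separated ASCII
logs, in which header lines start with ``#``, the ``#fields`` line names the
columns, and ``-`` marks an unset field. The sketch below assumes the default tab
separator and unset marker; the selected columns, the output layout and the
names ``read_bro_log`` and ``http_summary_hook`` are illustrative choices, not
the hook shipped with BroAPT.

.. code-block:: python

   from pathlib import Path
   from typing import Dict, Iterable, Iterator, List, Optional

   # Columns copied into the new log (an illustrative choice)
   SELECTED = ("ts", "uid", "id.orig_h", "id.resp_h", "method", "host", "uri")


   def read_bro_log(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
       """Yield one dict per record of a Bro ASCII log; unset fields become None."""
       fields: List[str] = []
       for line in lines:
           line = line.rstrip("\r\n")
           if line.startswith("#fields"):
               fields = line.split("\t")[1:]  # column names follow the tag
           elif line and not line.startswith("#"):
               values = line.split("\t")
               yield {k: (None if v == "-" else v) for k, v in zip(fields, values)}


   def http_summary_hook(http_log: Path, output: Path) -> int:
       """Write selected HTTP connection fields to a new log; return the row count."""
       rows = 0
       with http_log.open() as src, output.open("w") as dst:
           dst.write("\t".join(SELECTED) + "\n")
           for record in read_bro_log(src):
               dst.write("\t".join(record.get(k) or "-" for k in SELECTED) + "\n")
               rows += 1
       return rows

Mapping ``-`` to ``None`` while reading, and back to ``-`` while writing, keeps
the new log consistent with Bro's own convention for unset fields.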

As for the BroAPT-App detection framework, we designed a generic client-server
remote detection framework supported by the BroAPT-Daemon server. In brief, the
BroAPT-App detection framework takes the extracted files as its input. The system
first performs a file check to extract information about each file, including
the path to the file, its MIME type and its unique identifier (UID). The system
then parses an API configuration file to obtain a mapping from MIME types to
specific malicious file detection APIs, and performs APT detection on the file
with the API selected by its MIME type. During detection, the system first
prepares the working environment according to the API configuration: it assigns
environment variables, changes the working directory accordingly, expands the
variables defined in the scripts, and then executes the installation scripts.
Afterwards, the system executes the detection scripts and then the report
generation script to generate the detection results for the target file. If a
remote detection is required, the system prepares the request data and posts it
to the BroAPT-Daemon server running on the host machine, which then performs the
detection in the same manner.
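
The environment preparation and script execution described above can be
sketched with the standard library alone. The configuration keys (``environ``,
``workdir``, ``install``, ``scripts``, ``report``) and the function
``run_detection`` are hypothetical and do not reproduce BroAPT's actual API
configuration format; commands are split with ``shlex`` and run without a shell,
and ``${VAR}`` references are expanded from the prepared environment.

.. code-block:: python

   import os
   import shlex
   import subprocess
   from string import Template
   from typing import Any, List, Mapping


   def _expand(command: str, env: Mapping[str, str]) -> List[str]:
       """Split a command line, then expand ${VAR} references in each argument."""
       return [Template(token).safe_substitute(env) for token in shlex.split(command)]


   def run_detection(api: Mapping[str, Any]) -> str:
       """Prepare the environment for one API, run its scripts, return the report text."""
       # 1. Assign environment variables on top of the current environment
       env = dict(os.environ)
       env.update(api.get("environ", {}))
       # 2. Resolve the working directory (it may reference the variables above)
       workdir = Template(api.get("workdir", ".")).safe_substitute(env)

       def run(command: str) -> subprocess.CompletedProcess:
           # check=True turns a non-zero exit status into CalledProcessError
           return subprocess.run(_expand(command, env), cwd=workdir, env=env,
                                 check=True, capture_output=True, text=True)

       # 3. Installation scripts first, then detection scripts
       for command in api.get("install", []):
           run(command)
       for command in api.get("scripts", []):
           run(command)
       # 4. The report script's standard output is taken as the detection result
       return run(api["report"]).stdout.strip()

Expanding variables after splitting, rather than before, keeps values that
contain spaces (such as interpreter paths) intact as single arguments.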

For installation, we introduced several mechanisms to manage resource
competition. A flag in shared memory indicates whether an API has already been
installed, which avoids reinstalling APIs. The flag is shared by all
MIME-type-specific APIs that use the same detection process, not only by the
processes using the same API. Additionally, a synchronised process lock prevents
the same API from being installed in parallel. However, since installation may
fail because of network connection issues, the system reruns the installation
script if it fails.
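
The install-once logic can be sketched with a shared ``multiprocessing.Value``
flag and a ``multiprocessing.Lock``; every API that shares a detection process
would receive the same flag and lock. The retry count, the optional delay
between attempts and the name ``ensure_installed`` are assumptions for
illustration.

.. code-block:: python

   import time
   from typing import Any, Callable


   def ensure_installed(installed: Any, lock: Any, install: Callable[[], None],
                        retries: int = 3, delay: float = 0.0) -> bool:
       """Run `install` at most once across all processes sharing `installed` and `lock`.

       `installed` is a shared flag such as multiprocessing.Value("b", 0).
       Returns True if this call performed the installation and False if it
       had already been done.
       """
       if retries < 1:
           raise ValueError("retries must be at least 1")
       with lock:  # no two processes install the same API in parallel
           if installed.value:
               return False  # already installed: skip reinstallation
           for attempt in range(retries):
               try:
                   install()
                   break
               except Exception:
                   # e.g. a transient network error while fetching dependencies
                   if attempt == retries - 1:
                       raise
                   time.sleep(delay)
           installed.value = 1
           return True

In a worker pool, the same flag and lock objects would be passed to each process
running an API that shares the detection process, for example through the
``args`` of ``multiprocessing.Process``.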

So far, we have introduced six different APIs targeting dozens of MIME types:

* VirusTotal serves as the basic, general detection method of BroAPT and handles
  any MIME type without a registered API. VirusTotal aggregates many antivirus
  products and online scan engines to check for viruses that the user's own
  antivirus may have missed, or to verify against false positives.
* `AndroPyTool`_ detects APK files (MIME type:
  ``application/vnd.android.package-archive``). AndroPyTool is a tool for
  extracting static and dynamic features from Android APKs, which combines
  well-known Android application analysis tools such as DroidBox, FlowDroid,
  Strace, AndroGuard and VirusTotal analysis.
* `MaliciousMacroBot`_ detects Office documents (MIME types:
  ``application/vnd.openxmlformats-officedocument``, ``application/msword``,
  ``application/vnd.ms-excel``, ``application/vnd.ms-powerpoint``, etc.).
  MaliciousMacroBot provides a powerful malicious file triage tool through clever
  feature engineering and applied machine learning techniques such as Random
  Forest and TF-IDF.
* `ELF Parser`_ detects Linux ELF binaries (MIME type:
  ``application/x-executable``). ELF Parser is a static analysis tool that
  quickly determines the capabilities of an ELF binary.
* `LMD`_ detects other common exploitable files on Linux (MIME types:
  ``application/octet-stream``, ``text/html``, ``text/x-c``, ``text/x-perl``,
  ``text/x-php``, etc.). LMD is a malware scanner for Linux that uses threat data
  from network edge intrusion detection systems to extract malware actively used
  in attacks and to generate signatures for detection.
* `JaSt`_ detects JavaScript files (MIME types: ``application/javascript`` or
  ``text/javascript``). JaSt is a tool that syntactically detects malicious
  (obfuscated) JavaScript files based on machine learning and clustering
  algorithms.
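
The MIME-type-to-API dispatch, including the VirusTotal fallback for
unregistered types, can be sketched as a dictionary lookup over the MIME types
listed above. The dictionary, the prefix rule for Office Open XML types (whose
MIME types all begin with ``application/vnd.openxmlformats-officedocument``) and
the name ``select_api`` are illustrative assumptions rather than BroAPT's
configuration format.

.. code-block:: python

   from typing import Dict

   # MIME types taken from the list above; the dictionary form is illustrative
   API_BY_MIME: Dict[str, str] = {
       "application/vnd.android.package-archive": "AndroPyTool",
       "application/msword": "MaliciousMacroBot",
       "application/vnd.ms-excel": "MaliciousMacroBot",
       "application/vnd.ms-powerpoint": "MaliciousMacroBot",
       "application/x-executable": "ELF Parser",
       "application/octet-stream": "LMD",
       "text/html": "LMD",
       "text/x-c": "LMD",
       "text/x-perl": "LMD",
       "text/x-php": "LMD",
       "application/javascript": "JaSt",
       "text/javascript": "JaSt",
   }
   # Assumption: all Office Open XML types are matched by their common prefix
   OOXML_PREFIX = "application/vnd.openxmlformats-officedocument"
   DEFAULT_API = "VirusTotal"  # fallback for MIME types without a registered API


   def select_api(mime_type: str) -> str:
       """Return the name of the detection API for a MIME type."""
       # Normalise case and drop parameters such as "; charset=utf-8"
       mime_type = mime_type.lower().split(";", 1)[0].strip()
       if mime_type in API_BY_MIME:
           return API_BY_MIME[mime_type]
       if mime_type.startswith(OOXML_PREFIX):
           return "MaliciousMacroBot"
       return DEFAULT_API

Any MIME type that matches neither an exact entry nor the Office Open XML prefix
falls through to the general-purpose default.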

.. _AndroPyTool: https://github.com/alexMyG/AndroPyTool
.. _MaliciousMacroBot: https://github.com/egaus/MaliciousMacroBot
.. _ELF Parser: http://elfparser.com
.. _LMD: https://www.rfxn.com/projects/linux-malware-detect
.. _JaSt: https://github.com/Aurore54F/JaSt

As described above, BroAPT is an APT detection system based on Bro IDS, with high
extensibility and compatibility with high-speed traffic. We tested the BroAPT
system with real-time traffic collected from the network edge of a college. The
system extracted all targeted files from a PCAP file of approximately 35 GB
within one minute. The Bro site functions introduced in the BroAPT-Core
extraction framework had no significant impact on the performance of the system,
while the Python hook functions worked smoothly alongside and generated new log
files as intended. Also, the detection APIs used in the BroAPT-App detection
framework proved to work well with reasonable false-positive rates. In short,
the BroAPT system works as expected in a real network environment.

Besides the implementation above, we tried several other implementations during
the project. We used pure Python scripts based on `PyPCAPKit`_ (a multi-engine
PCAP file analysis tool) with support of `DPKT`_ to reassemble and extract files
transmitted through the traffic, but the processing efficiency was poor. A
hybrid implementation, with Bro scripts logging TCP traffic data and Python or
C/C++ programs reassembling the traffic and extracting the files, fared even
worse, as did a pure Bro implementation of TCP reassembly. In the end, the File
Analysis framework of Bro IDS proved its worth to the BroAPT system, and thus we
adopted the current implementation.

.. _PyPCAPKit: https://github.com/JarryShaw/PyPCAPKit
.. _DPKT: https://dpkt.readthedocs.io

Although our research on APT detection is still preliminary, the BroAPT system
utilises Bro IDS and works as an APT detection system compatible with high-speed
network traffic. The system has been validated in practical scenarios and serves
as the basis for follow-up research on APT detection.

For more information, please refer to the
:download:`Graduation Thesis <../../thesis.pdf>` of BroAPT (in Chinese).

Licensing
=========

In general, this work is licensed under the `Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License <http://creativecommons.org/licenses/by-nc-nd/4.0/>`__.
Parts of this work are derived and copied from `Zeek <https://github.com/zeek/zeek>`__,
`Broker <https://github.com/zeek/broker>`__, and `file-extraction <https://github.com/hosom/file-extraction>`__,
all of which are licensed under the **BSD 3-Clause License**; these parts shall be dual-licensed under the two licenses.

The originally developed parts of this software and associated documentation files (the "*Software*")
are hereby licensed under the **Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License**.
No additional permissions are implied unless explicitly granted by the author and maintainer of the *Software*, i.e.
`Jarry Shaw <https://github.com/JarryShaw>`__.

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`