.. BroAPT documentation master file, created by
   sphinx-quickstart on Sat Mar 14 11:21:30 2020.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

BroAPT: A system for detecting APT attacks in real-time
=======================================================

.. toctree::
   :maxdepth: 2

   quickstart
   configuration
   framework
   api

Cybersecurity has long been a significant subject of discussion. With the
rapid evolution of new attack methods, threats on the Internet are becoming
increasingly severe, and advanced persistent threats (APTs) have become a main
source of cybersecurity incidents. It is now even more important to identify
and classify network traffic accurately and promptly by analysing the traffic
itself.

We describe here the BroAPT system, an APT detection system based on
`Bro IDS`_ (the name at the time of implementation; now known as Zeek IDS).
The system detects APTs through comprehensive analysis of network traffic and
is designed for high performance and extensibility. It can reassemble and
extract files transmitted in the traffic, then analyse them and generate log
files in real time; it can classify the extracted files through targeted
malicious file detection configurations; and it detects APT attacks by
analysing the log files that the system itself generates.

.. _Bro IDS: https://www.zeek.org

The BroAPT system consists of two major parts. The first is the core
functions, which run in a Docker container currently based on a CentOS 7
image. The core functions comprise two components: an extraction framework,
BroAPT-Core, and a detection framework, BroAPT-App. The second part is the
command line interface (CLI) together with a daemon server, BroAPT-Daemon,
which is a RESTful API server based on the `Flask`_ framework. This part runs
on the host machine of the Docker container.

.. _Flask: https://flask.palletsprojects.com

The CLI is the entry point for the whole BroAPT system.
When run, the CLI configures the daemon server and brings it up, then starts
the Docker container with the core functions.

The BroAPT-Core extraction framework reads in a PCAP file and processes it
with Bro IDS, which reassembles and extracts the files transmitted in the
traffic and generates log files through its logging system. Afterwards, the
BroAPT-App detection framework takes each extracted file and parses its file
name to obtain the MIME type of the file. The framework then fetches the
detection API registered for that MIME type and runs it to determine whether
the file is malicious. If needed, the framework sends a request to the
BroAPT-Daemon server to perform a remote (privileged) detection on the file.

The BroAPT-Core extraction framework works in three main steps.

First, file check. The system scans for new PCAP files and sends them to the
BroAPT-Core extraction framework.

Second, Bro analysis. The system processes the PCAP file with the file
extraction scripts, which reassemble and extract the files transmitted in the
traffic. Extraction can be grouped by the MIME type of the files or by the
application layer protocol that carried them. The user may also load external
Bro scripts as site functions, which run alongside the main extraction
scripts.

Third, post-processing and cross-analysis. After the PCAP file has been
processed with Bro IDS, the system holds the extracted files and a set of log
files. Besides the standard Bro logs, these include the logs defined by the
site functions and written by the logging system of Bro IDS. By default, the
system then derives connection information for the extracted files from the
Bro logs, including timestamp, source and destination, MIME type and hash
values. In addition, the user may register Python hooks with the system, which
are called every time a PCAP file is processed. These hooks can be used to
investigate the logs generated by Bro IDS further.
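The Python hook mechanism described above can be pictured with a minimal sketch. The names ``register_hook``, ``run_hooks`` and ``count_http_entries`` are hypothetical and chosen for illustration only; the actual hook interface of BroAPT may differ. The sketch assumes that each hook receives the directory holding the Bro logs of one processed PCAP file, and relies on the fact that metadata lines in Bro's default log format start with ``#``.

```python
from pathlib import Path
from typing import Callable, List

# Hypothetical registry; BroAPT's real hook interface may differ.
HOOKS: List[Callable[[Path], object]] = []


def register_hook(func):
    """Register a callable to run after each PCAP file has been processed."""
    HOOKS.append(func)
    return func


def run_hooks(log_dir: Path) -> list:
    """Call every registered hook with the directory holding the Bro logs."""
    results = []
    for hook in HOOKS:
        try:
            results.append(hook(log_dir))
        except Exception as error:
            # A failing hook should not prevent the remaining hooks from running.
            results.append(error)
    return results


@register_hook
def count_http_entries(log_dir: Path) -> int:
    """Example hook: count data rows in http.log, skipping '#' metadata lines."""
    http_log = log_dir / "http.log"
    if not http_log.exists():
        return 0
    with http_log.open() as file:
        return sum(1 for line in file if line.strip() and not line.startswith("#"))
```

In this sketch, the framework would call ``run_hooks(log_dir)`` once per processed PCAP file, so a hook only needs to know where the logs are and can stay independent of the extraction scripts.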
To work alongside the Bro intrusion detection system (IDS), the system is
implemented in a multi-processing manner. Since CPython's multi-threading
cannot run Python code in parallel (because of the global interpreter lock),
we implemented the BroAPT system with full support for multi-processing to
accelerate the main processing logic. Synchronised queues are used to
communicate between and coordinate the processes within the system: in the
BroAPT-Core extraction framework, one queue sends basic information about the
extracted files to the BroAPT-App detection framework, and another queue
passes the generated log files on to the Python hooks.

Currently, we have introduced several site functions and Python hooks to the
BroAPT system. There are six sets of Bro scripts. They include constant
definitions for common application layer protocols, such as HTTP and FTP,
with the constants taken from the IANA registry; an extension of the standard
Bro log ``http.log`` with new entries for cookie information and the data in
POST requests; hash calculation for all files transmitted in the network
traffic; and two Bro modules that detect phishing emails through
cross-analysis of SMTP and HTTP traffic. The Python hook function currently
included parses ``http.log``, extracts information about the HTTP connections
and generates a new log file.

As for the BroAPT-App detection framework, we designed a generic client-server
remote detection framework with the support of the BroAPT-Daemon server.
Briefly, the BroAPT-App detection framework takes the extracted files as its
input. The system performs a file check on each file to extract information
such as the path to the file, its MIME type and its unique identifier (UID).
The system then parses an API configuration file to obtain a mapping from
MIME types to specific malicious file detection APIs. Based on the MIME type
obtained from the file, the system performs APT detection with the selected
API.
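The selection of a detection API by MIME type can be sketched as a plain lookup with a fallback. The mapping below is illustrative only: it is not read from a real BroAPT API configuration file, whose format is not shown here, and the names ``API_BY_MIME``, ``DEFAULT_API`` and ``select_api`` are hypothetical.

```python
from typing import Dict

# Illustrative mapping only; not the content of a real API configuration file.
API_BY_MIME: Dict[str, str] = {
    "application/vnd.android.package-archive": "AndroPyTool",
    "application/x-executable": "ELF Parser",
    "application/javascript": "JaSt",
    "text/javascript": "JaSt",
}

# General detection method used when no API is registered for a MIME type.
DEFAULT_API = "VirusTotal"


def select_api(mime_type: str) -> str:
    """Return the name of the detection API registered for a MIME type."""
    # MIME types are case-insensitive, so normalise before the lookup.
    return API_BY_MIME.get(mime_type.strip().lower(), DEFAULT_API)
```

For example, ``select_api("text/javascript")`` returns ``"JaSt"``, while an unregistered type such as ``"image/png"`` falls back to ``"VirusTotal"``.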
During detection, the system first prepares the working environment according
to the API configuration: it assigns environment variables, changes the
working directory accordingly, expands the variables defined in the scripts
and then executes the installation scripts. Afterwards, the system executes
the detection scripts, followed by the report generation script, to produce
the detection results for the target file. If a remote detection is required,
the system prepares the request data and posts it to the BroAPT-Daemon server
running on the host machine, which processes the detection in the same way.

For installation, we introduced several attributes to manage and avoid
resource competition. A shared memory space indicates whether installation
has already been carried out for an API, which avoids reinstalling it. This
indicator is shared by all MIME type specific APIs that share the same
detection process, not just by the processes using the same API.
Additionally, a synchronised process lock prevents parallel installation of
the same API. Because an API may fail owing to network connection issues, the
system tries to rerun the script if it fails.

So far, we have introduced six different APIs targeting dozens of MIME types.
We used VirusTotal as the basic general detection method for BroAPT, which
handles any MIME type that has no registered API; VirusTotal aggregates many
antivirus products and online scan engines to check for viruses that the
user's own antivirus may have missed, or to verify against any false
positives. We used `AndroPyTool`_ to detect APK files (MIME type:
``application/vnd.android.package-archive``); AndroPyTool is a tool for
extracting static and dynamic features from Android APKs, which combines
different well-known Android application analysis tools such as DroidBox,
FlowDroid, Strace, AndroGuard and VirusTotal analysis.
We used `MaliciousMacroBot`_ to detect Office documents (MIME type:
``application/vnd.openxmlformats-officedocument``, or ``application/msword``,
``application/vnd.ms-excel``, ``application/vnd.ms-powerpoint``, etc.);
MaliciousMacroBot provides a powerful malicious file triage tool through
clever feature engineering and applied machine learning techniques like
Random Forest and TF-IDF. We used `ELF Parser`_ to detect Linux ELF binaries
(MIME type: ``application/x-executable``); ELF Parser is a tool to quickly
determine the capabilities of an ELF binary through static analysis. We used
`LMD`_ to detect other common Linux exploitable files (MIME type:
``application/octet-stream``, ``text/html``, ``text/x-c``, ``text/x-perl``,
``text/x-php``, etc.); LMD is a malware scanner for Linux systems that uses
threat data from network edge intrusion detection systems to extract malware
that is actively being used in attacks and generates signatures for
detection. And we used `JaSt`_ to detect JavaScript files (MIME type:
``application/javascript`` or ``text/javascript``); JaSt is a tool to
syntactically detect malicious (obfuscated) JavaScript files based on machine
learning and clustering algorithms.

.. _AndroPyTool: https://github.com/alexMyG/AndroPyTool
.. _MaliciousMacroBot: https://github.com/egaus/MaliciousMacroBot
.. _ELF Parser: http://elfparser.com
.. _LMD: https://www.rfxn.com/projects/linux-malware-detect
.. _JaSt: https://github.com/Aurore54F/JaSt

As described above, BroAPT is an APT detection system based on Bro IDS with
high extensibility and compatibility with high-speed traffic. We tested the
BroAPT system with real-time traffic collected from the network edge of a
college. The system extracted all targeted files from a PCAP file of
approximately 35 GB within one minute.
The Bro site functions introduced in the BroAPT-Core extraction framework
have no significant impact on the performance of the system, whilst the
Python hook functions work alongside them smoothly and generate new log files
as intended. The detection APIs used in the BroAPT-App detection framework
have also proved to work well, with reasonable false-positive rates. In
short, the BroAPT system works as expected in a real network environment.

Besides the implementation above, we tried several other implementations
during the project. We used pure Python scripts based on `PyPCAPKit`_ (a
multi-engine PCAP file analysis tool) with the support of `DPKT`_ to
reassemble and extract files transmitted in the traffic, but the processing
efficiency was poor. The same was true of a hybrid implementation, in which
Bro scripts logged the TCP traffic data and Python or C/C++ programs
reassembled and extracted the files, and of a pure Bro implementation of TCP
reassembly, which performed worst. In the end, the File Analysis framework of
Bro IDS proved its worth to the BroAPT system, and we therefore adopted the
current implementation.

.. _PyPCAPKit: https://github.com/JarryShaw/PyPCAPKit
.. _DPKT: https://dpkt.readthedocs.io

Although our research on APT detection is still preliminary, the BroAPT
system builds on Bro IDS and works as an APT detection system that is
compatible with high-speed network traffic. The system has been proven in
practical scenarios and serves as the basis for follow-up research on APT
detection.

For more information, please refer to the
:download:`Graduation Thesis <../../thesis.pdf>` of BroAPT (in Chinese).

Licensing
=========

This work is in general licensed under the
`Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License `__.
Part of this work is derived and copied from `Zeek `__, `Broker `__, and
`file-extraction `__, all with the **BSD 3-Clause License**, which shall be
dual-licensed under the two licenses.
The originally developed parts of this software and associated documentation
files (the "*Software*") are hereby licensed under the **Creative Commons
Attribution-NonCommercial-NoDerivatives 4.0 International License**. No
permissions are granted in advance unless given by the author and maintainer
of the *Software*, i.e. `Jarry Shaw `__.

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`