BroAPT-Core Extration Framework

The BroAPT-Core framework processes PCAP files, extracts files transferred through traffic contained in the PCAP files, and perform analysis to the log files generated by Bro scripts.

BroAPT-Core Extration Framework
  1. When the BroAPT-Core framework first reads in a new PCAP file, it will validate if it’s a valid tcpdump (tcpdump(1)) format file, through libmagic (libmagic(3)).

  2. If validated, the BroAPT-Core framework will utilise the Bro IDS to perform analysis upon the PCAP file, extracting files and generating logs.

    When extracting, you may toggle through environment variables to configure which MIME types and/or what application layer protocol files transferred with should be extracted.

    Also, site functions from user-defined Bro scripts will be loaded and executed at the same time.

    This step will produce extracted files and standard Bro logs, as well as extra artefacts elevated through the site functions.

  3. Later, the BroAPT-Core framework will perform post-processing, a.k.a. cross-analysis, upon the logs generated in previous step.

    By default, the BroAPT-Core framework will gather connection information of the extracted files from the Bro logs (files.log). Some other analysis will also be performed as defined in the Python hooks.

    The result of analysis will be elevated as BroAPT logs.

Custom Bro Scripts

In the BroAPT system, you can customise your own Bro script. The BroAPT-Core framework will load those scripts when running Bro IDS to process PCAP files.

User defined Bro scripts will be mapped into the Docker container at runtime. The directory structure would be as following:

/broapt/scripts/
│   # load FileExtraction module
├── __load__.bro
│   # configurations
├── config.bro
│   # MIME-extension mappings
├── file-extensions.bro
│   # protocol hooks
├── hooks/
│   │   # extract DTLS
│   ├── extract-dtls.bro
│   │   # extract FTP_DATA
│   ├── extract-ftp.bro
│   │   # extract HTTP
│   ├── extract-http.bro
│   │   # extract IRC_DATA
│   ├── extract-irc.bro
│   │   # extract SMTP
│   └── extract-smtp.bro
│   # core logic
├── main.bro
│   # MIME hooks
│── plugins/
│   │   # extract all files
│   ├── extract-all-files.bro
│   │   # extract APK
│   ├── extract-application-vnd-android-package-archive.bro
│   │   # extract PDF
│   ├── extract-application-pdf.bro
│   │   # extract PE
│   ├── extract-application-vnd-microsoft-portable-executable.bro
│   │   # extract by BRO_MIME
│   └── extract-white-list.bro
│   # site functions by user
└── sites/
   │   # load site functions
   ├── __load__.bro
   └── ...

where extract-application-vnd-android-package-archive.bro, extract-application-pdf.bro and extract-application-vnd-microsoft-portable-executable.bro are Bro scripts generated automatically by the BroAPT-Core framework based on the BROAPT_LOAD_MIME environment vairable.

Important

The BROAPT_LOAD_MIME supports UNIX shell-like pattern matching, c.f. fnmatch module from Python.

And /broapt/scripts/sites/ are mapped from the host machine, which includes the Bro scripts defined by user. You may include your scripts into the BroAPT-Core framework by loading (@load) them in the /broapt/scripts/sites/__load__.bro file.

At the moment, we have six sets of Bro scripts included in the distribution.

Common Constants

In the BroAPT system, it predefines many constants of common protocols and systems, such as FTP commands, HTTP methods, etc. We used crawlers to fetch relevant data from the IANA registry, generate and/or update Bro constants, such as HTTP::header_names for HTTP headers fields.

HTTP Cookies

The script utilised http_header event, and extends the builtin http.log record object HTTP::Info with data from the COOKIE header.

Unknown HTTP Headers

As defined in RFC 2616 and RFC 7230, and registered in IANA, there’re a list of known HTTP headers. However, customised headers may be introduced when implementation. Such unknown headers may contain significant information about the HTTP traffic. Therefore, the script utilised http_header event and search for unknown headers, i.e. not included in HTTP::header_names, then record them in the http.log files.

HTTP POST Data

As RFC 2616 suggests, we can utilise the data sent from POST command to analyse information about outbound traffic. The script utilised http_entity_data event, and save the POST data to http.log files.

Calculate Hash Values

Hash value of files can be used to detect malware. The script utilised file_new event, calculated and saved the hash values of files transferred in the files.log file.

SMTP Phishing Detect

Since files transferred through SMTP traffic are not easy to gather and detect phishing information. We introduced two Bro modules to perform such detection on the SMTP traffic.

A. Phishing Module

The Phishing module mainly provides mass scam emails; phishing email detection based on Levenshtein distance of sender address. It will elevate a phishing_link.log log file, containing such malicious connections and URLs.

B. Phish Module

Primary scope of these bro policies is to give more insights into smtp-analysis esp to track phishing events.

This is a subset of phish-analysis repo and doesn’t use any backed postgres database. So relieves the user from postgres dependency while getting basic phishing detection up and running very quickly.

Custom Python Hooks

In the BroAPT system, you can customise your own Python hooks for cross-analysis to the log files. The BroAPT-Core framework will call such registered hooks on each set of log files generacted from a PCAP file after processing of Bro.

See also

Log analysis and generation can be done through the ZLogging project, which provides both loading and dumping interface to the processing of Bro logs in an elegant Pythonic way.

User defined Bro scripts will be mapped into the Docker container at runtime. The directory structure would be as following:

/broapt/python/
│   # setup PYTHONPATH
├── __init__.py
│   # entry point
├── __main__.py
│   # config parser
├── cfgparser.py
│   # Bro script composer
├── compose.py
│   # global constants
├── const.py
│   # Bro log parser
├── logparser.py
│   # BroAPT-Core logic
├── process.py
│   # multiprocessing support
├── remote.py
│   # BroAPT-App logic
├── scan.py
│   # Python hooks
├── sites
│   │   # register hooks
│   ├── __init__.py
│   └── ...
│   # utility functions
└── utils.py

where /broapt/python/sites/ is mapped from the host machine, which includes user-defined site customisation Python hooks.

You can register your own hooks in the /broapt/python/sites/__init__.py, by importing (import) them and add them to the HOOK and/or EXIT registry lists.

In the HOOK registry, each registered hook function will be called after a PCAP file is processed by the Bro IDS, and perform analysis on the logs generated from the PCAP file.

Note

The hook function will be called with ONE argument, log_name, a string (str) representing the folder name to the target logs.

In the EXIT registry, each registered hook function will be called before the main process of the BroAPT-Core framework exits.

Note

The hook function will be called with NO argument.

At the moment, we have bundled two sets of Python hooks in the system.

Extracted File Information

Through conn.log and files.log, the BroAPT system generates a new log file for information of extracted files, which includes the timestamp, source and destination IP addresses of the transport layer connection (TCP/UDP) transferring the file, MIME type of the file, as well as hash values, see below:

Field Name

Bro Type

Description

timestamp

float

Connection timestamp

log_uuid

string

UUID of source logs

log_path

string

Absolute path to source logs (in Docker container)

log_name

string

Relative path to source logs

dump_path

string

Absolute path to extracted file (in Docker container)

local_name

string

Relative path to extracted file

source_name

string

Original filename (if present)

hosts

vector

Transferrer and receiver

conns

vector

Source and destination IP addresses and ports

bro_mime_type

string

MIME type probed by Bro IDS

real_mime_type

string

MIME type detected by libmagic

hash

table

Hash values (MD5, SHA1 and SHA256)

The equivalent ZLogging data model can be declared as following:

class ExtractedFiles(Model):
    timestamp = FloatType()
    log_uuid = StringType()
    log_path = StringType()
    dump_path = StringType()
    local_name = StringType()
    source_name  = StringType()
    hosts = VectorType(element_type=RecordType(
        tx=AddrType(),
        rx=AddrType(),
    ))
    conns = VectorType(element_type=RecordType(
        src_h=AddrType(),
        src_p=PortType(),
        dst_h=AddrType(),
        dst_p=PortType(),
    ))
    bro_mime_type = StringType()
    real_mime_type = StringType()
    hash = RecordType(
        md5=StringType(),
        sha1=StringType(),
        sha256=StringType(),
    )

HTTP Connection Information

Through analysis upon http.log, the BroAPT system elevated a new log file with more concentrated information about HTTP connections. Such log file contains all HTTP connections from every processed PCAP file, and can be used for further analysis based on big data.

Field Name

Bro Type

Description

srcip

addr

Client IP address

ts

float

Request timestamp (microseconds)

url

string

Requests URL path

ref

string

Referer header of the request (base64 encoded)

ua

string

User-Agent header of the request (base64 encoded)

dstip

addr

Server IP address

cookie

string

Cookie header of the request (base64 encoded)

src_port

port

Client port

json

vector

Unregistered HTTP header fields (JSON encoded)

method

string

HTTP method

body

string

POST body data (base64 encoded)

The equivalent ZLogging data model can be declared as following (with type annotations):

class HTTPConnections(Model):
    srcip: bro_addr
    ts: bro_float
    url: bro_string
    ref: bro_string
    ua: bro_string
    dstip: bro_addr
    cookie: bro_string
    src_port: bro_port
    json: bro_vector[bro_string]
    method: bro_string
    body: bro_string