BroAPT-Core Extration Framework¶

The BroAPT-Core framework processes PCAP files, extracts files transferred through traffic contained in the PCAP files, and perform analysis to the log files generated by Bro scripts.

When the BroAPT-Core framework first reads in a new PCAP file, it will validate if it’s a valid tcpdump (tcpdump(1)) format file, through libmagic (libmagic(3)).
If validated, the BroAPT-Core framework will utilise the Bro IDS to perform analysis upon the PCAP file, extracting files and generating logs.

When extracting, you may toggle through environment variables to configure which MIME types and/or what application layer protocol files transferred with should be extracted.

Also, site functions from user-defined Bro scripts will be loaded and executed at the same time.

This step will produce extracted files and standard Bro logs, as well as extra artefacts elevated through the site functions.
Later, the BroAPT-Core framework will perform post-processing, a.k.a. cross-analysis, upon the logs generated in previous step.

By default, the BroAPT-Core framework will gather connection information of the extracted files from the Bro logs (files.log). Some other analysis will also be performed as defined in the Python hooks.

The result of analysis will be elevated as BroAPT logs.

Custom Bro Scripts¶

In the BroAPT system, you can customise your own Bro script. The BroAPT-Core framework will load those scripts when running Bro IDS to process PCAP files.

User defined Bro scripts will be mapped into the Docker container at runtime. The directory structure would be as following:

/broapt/scripts/
│   # load FileExtraction module
├── __load__.bro
│   # configurations
├── config.bro
│   # MIME-extension mappings
├── file-extensions.bro
│   # protocol hooks
├── hooks/
│   │   # extract DTLS
│   ├── extract-dtls.bro
│   │   # extract FTP_DATA
│   ├── extract-ftp.bro
│   │   # extract HTTP
│   ├── extract-http.bro
│   │   # extract IRC_DATA
│   ├── extract-irc.bro
│   │   # extract SMTP
│   └── extract-smtp.bro
│   # core logic
├── main.bro
│   # MIME hooks
│── plugins/
│   │   # extract all files
│   ├── extract-all-files.bro
│   │   # extract APK
│   ├── extract-application-vnd-android-package-archive.bro
│   │   # extract PDF
│   ├── extract-application-pdf.bro
│   │   # extract PE
│   ├── extract-application-vnd-microsoft-portable-executable.bro
│   │   # extract by BRO_MIME
│   └── extract-white-list.bro
│   # site functions by user
└── sites/
   │   # load site functions
   ├── __load__.bro
   └── ...

where extract-application-vnd-android-package-archive.bro, extract-application-pdf.bro and extract-application-vnd-microsoft-portable-executable.bro are Bro scripts generated automatically by the BroAPT-Core framework based on the BROAPT_LOAD_MIME environment vairable.

Important

The BROAPT_LOAD_MIME supports UNIX shell-like pattern matching, c.f. fnmatch module from Python.

And /broapt/scripts/sites/ are mapped from the host machine, which includes the Bro scripts defined by user. You may include your scripts into the BroAPT-Core framework by loading (@load) them in the /broapt/scripts/sites/__load__.bro file.

At the moment, we have six sets of Bro scripts included in the distribution.

Common Constants¶

In the BroAPT system, it predefines many constants of common protocols and systems, such as FTP commands, HTTP methods, etc. We used crawlers to fetch relevant data from the IANA registry, generate and/or update Bro constants, such as HTTP::header_names for HTTP headers fields.

HTTP Cookies¶

The script utilised http_header event, and extends the builtin http.log record object HTTP::Info with data from the COOKIE header.

Unknown HTTP Headers¶

As defined in RFC 2616 and RFC 7230, and registered in IANA, there’re a list of known HTTP headers. However, customised headers may be introduced when implementation. Such unknown headers may contain significant information about the HTTP traffic. Therefore, the script utilised http_header event and search for unknown headers, i.e. not included in HTTP::header_names, then record them in the http.log files.

HTTP `POST` Data¶

As RFC 2616 suggests, we can utilise the data sent from POST command to analyse information about outbound traffic. The script utilised http_entity_data event, and save the POST data to http.log files.

Calculate Hash Values¶

Hash value of files can be used to detect malware. The script utilised file_new event, calculated and saved the hash values of files transferred in the files.log file.

SMTP Phishing Detect¶

Since files transferred through SMTP traffic are not easy to gather and detect phishing information. We introduced two Bro modules to perform such detection on the SMTP traffic.

A. `Phishing` Module¶

The Phishing module mainly provides mass scam emails; phishing email detection based on Levenshtein distance of sender address. It will elevate a phishing_link.log log file, containing such malicious connections and URLs.

B. `Phish` Module¶

Primary scope of these bro policies is to give more insights into smtp-analysis esp to track phishing events.

This is a subset of phish-analysis repo and doesn’t use any backed postgres database. So relieves the user from postgres dependency while getting basic phishing detection up and running very quickly.

Custom Python Hooks¶

In the BroAPT system, you can customise your own Python hooks for cross-analysis to the log files. The BroAPT-Core framework will call such registered hooks on each set of log files generacted from a PCAP file after processing of Bro.

Extracted File Information¶

Through conn.log and files.log, the BroAPT system generates a new log file for information of extracted files, which includes the timestamp, source and destination IP addresses of the transport layer connection (TCP/UDP) transferring the file, MIME type of the file, as well as hash values, see below:

Field Name	Bro Type	Description
`timestamp`	`float`	Connection timestamp
`log_uuid`	`string`	UUID of source logs
`log_path`	`string`	Absolute path to source logs (in Docker container)
`log_name`	`string`	Relative path to source logs
`dump_path`	`string`	Absolute path to extracted file (in Docker container)
`local_name`	`string`	Relative path to extracted file
`source_name`	`string`	Original filename (if present)
`hosts`	`vector`	Transferrer and receiver
`conns`	`vector`	Source and destination IP addresses and ports
`bro_mime_type`	`string`	MIME type probed by Bro IDS
`real_mime_type`	`string`	MIME type detected by `libmagic`
`hash`	`table`	Hash values (MD5, SHA1 and SHA256)

The equivalent ZLogging data model can be declared as following:

class ExtractedFiles(Model):
    timestamp = FloatType()
    log_uuid = StringType()
    log_path = StringType()
    dump_path = StringType()
    local_name = StringType()
    source_name  = StringType()
    hosts = VectorType(element_type=RecordType(
        tx=AddrType(),
        rx=AddrType(),
    ))
    conns = VectorType(element_type=RecordType(
        src_h=AddrType(),
        src_p=PortType(),
        dst_h=AddrType(),
        dst_p=PortType(),
    ))
    bro_mime_type = StringType()
    real_mime_type = StringType()
    hash = RecordType(
        md5=StringType(),
        sha1=StringType(),
        sha256=StringType(),
    )

HTTP Connection Information¶

Through analysis upon http.log, the BroAPT system elevated a new log file with more concentrated information about HTTP connections. Such log file contains all HTTP connections from every processed PCAP file, and can be used for further analysis based on big data.

Field Name	Bro Type	Description
`srcip`	`addr`	Client IP address
`ts`	`float`	Request timestamp (microseconds)
`url`	`string`	Requests URL path
`ref`	`string`	`Referer` header of the request (base64 encoded)
`ua`	`string`	`User-Agent` header of the request (base64 encoded)
`dstip`	`addr`	Server IP address
`cookie`	`string`	`Cookie` header of the request (base64 encoded)
`src_port`	`port`	Client port
`json`	`vector`	Unregistered HTTP header fields (JSON encoded)
`method`	`string`	HTTP method
`body`	`string`	`POST` body data (base64 encoded)

The equivalent ZLogging data model can be declared as following (with type annotations):

class HTTPConnections(Model):
    srcip: bro_addr
    ts: bro_float
    url: bro_string
    ref: bro_string
    ua: bro_string
    dstip: bro_addr
    cookie: bro_string
    src_port: bro_port
    json: bro_vector[bro_string]
    method: bro_string
    body: bro_string

BroAPT-Core Extration Framework¶

Custom Bro Scripts¶

Common Constants¶

HTTP Cookies¶

Unknown HTTP Headers¶

HTTP `POST` Data¶

Calculate Hash Values¶

SMTP Phishing Detect¶

A. `Phishing` Module¶

B. `Phish` Module¶

Custom Python Hooks¶

Extracted File Information¶

HTTP Connection Information¶

BroAPT

Navigation

Related Topics

BroAPT-Core Extration Framework¶

Custom Bro Scripts¶

Common Constants¶

HTTP Cookies¶

Unknown HTTP Headers¶

HTTP POST Data¶

Calculate Hash Values¶

SMTP Phishing Detect¶

A. Phishing Module¶

B. Phish Module¶

Custom Python Hooks¶

Extracted File Information¶

HTTP Connection Information¶

HTTP `POST` Data¶

A. `Phishing` Module¶

B. `Phish` Module¶