Home SIEM Architecture: Wazuh + OpenSearch + Zeek

A home SIEM does not need to be massive, but it should be coherent. The most stable pattern I have found is Wazuh for host telemetry, Zeek for network metadata, and OpenSearch for storage and search. This mirrors a real security stack while staying within the resource constraints of a small lab.

The goal is to build a pipeline that collects and enriches data, stores it with consistent timestamps, and makes it easy to pivot from a host alert to a network flow. That is the core of incident response, even in a small lab.

Context

Problem: Disconnected lab telemetry makes investigations slow and unreliable.
Approach: Use Wazuh for host data, Zeek for network metadata, and OpenSearch for storage.
Outcome: Fast pivots from host alerts to network flows.

Architecture overview

The basic layout uses three roles:

  • Wazuh manager + indexer (OpenSearch) on a single VM for small labs
  • Wazuh agents on Windows and Linux hosts
  • Zeek sensor pushing JSON logs into OpenSearch

If you have limited CPU, merge the Wazuh manager and OpenSearch into one VM and keep Zeek on a separate sensor. The sensor should be close to the network mirror to avoid drops.

Minimal docker-compose

For a lab, docker-compose keeps the stack reproducible. The example below is trimmed to the essentials and will run on a 16 GB RAM box.

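version: "3.7"
services:
  opensearch:
    image: opensearchproject/opensearch:2.11.1
    environment:
      - discovery.type=single-node
      # Lab only: serve plain HTTP on 9200 without authentication.
      - DISABLE_SECURITY_PLUGIN=true
      # Needed for the memlock ulimit below to have any effect.
      - bootstrap.memory_lock=true
      - OPENSEARCH_JAVA_OPTS=-Xms2g -Xmx2g
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - "9200:9200"

  wazuh-manager:
    image: wazuh/wazuh-manager:4.7.3
    ports:
      - "1514:1514"      # agent events (TCP is the Wazuh 4.x default)
      - "1515:1515"      # agent enrollment
      - "55000:55000"    # Wazuh API
    depends_on:
      - opensearch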
Keep the OpenSearch heap low (2 GB here, set through OPENSEARCH_JAVA_OPTS) and raise it only if you see memory pressure or slow queries. In a lab, fast queries are less important than keeping the system stable.

Wazuh agent deployment

Install the Wazuh agent on each endpoint and enroll it with the manager. Use the native packages and set a clear agent name that includes the host role.

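# Debian/Ubuntu endpoint; "ubuntu-lab" is an example name. Windows hosts such as win10-lab use the MSI installer instead.
curl -sO https://packages.wazuh.com/4.x/apt/pool/main/w/wazuh-agent/wazuh-agent_4.7.3-1_amd64.deb
sudo WAZUH_MANAGER="192.168.56.20" WAZUH_AGENT_NAME="ubuntu-lab" dpkg -i ./wazuh-agent_4.7.3-1_amd64.deb
sudo systemctl daemon-reload
sudo systemctl enable --now wazuh-agent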

Use Wazuh for file integrity monitoring, command auditing, and basic detection rules. On Windows hosts in a lab, you can also enable log collection from Sysmon and forward the events as JSON.

Ingesting Zeek logs into OpenSearch

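Zeek writes tab-separated logs by default, so switch it to JSON first (for example with redef LogAscii::use_json = T; in local.zeek). JSON logs are straightforward to index. You can ship them with Filebeat or a lightweight Logstash pipeline. The key is to normalize timestamps and set a consistent index naming scheme.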

A simple Filebeat input for Zeek looks like this:

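filebeat.inputs:
  - type: filestream
    paths:
      - /opt/zeek/logs/current/*.log
    parsers:
      - ndjson:
          keys_under_root: true

# A custom index name requires an explicit template name and pattern.
setup.ilm.enabled: false
setup.template.name: "zeek"
setup.template.pattern: "zeek-*"

output.elasticsearch:
  hosts: ["http://opensearch:9200"]
  index: "zeek-%{+yyyy.MM.dd}"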

If you do not want a full shipper, you can use a small Python script to push batches into OpenSearch using the bulk API. This is often enough for a lab where volume is modest.
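The script approach can be sketched as follows. This is a minimal sketch, not a hardened shipper: it assumes Zeek is already writing one JSON object per line, that OpenSearch is reachable over plain HTTP as in the compose file, and the function names (build_bulk_body, read_zeek_log, send_bulk) are my own rather than part of any library.

```python
import json
import urllib.request


def build_bulk_body(records, index):
    """Return an NDJSON _bulk payload: one action line plus one document line per record."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    # The bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


def read_zeek_log(path):
    """Yield one dict per non-empty line of a Zeek JSON log file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def send_bulk(base_url, body):
    """POST the payload to <base_url>/_bulk and return the parsed JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A typical run would read /opt/zeek/logs/current/conn.log with read_zeek_log, build a payload for an index such as zeek-2024.01.01, and pass it to send_bulk("http://opensearch:9200", body). Check the "errors" flag in the response, since the bulk API reports per-document failures there rather than through the HTTP status.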

Field normalization and enrichment

You get the most value when host and network data line up. Normalize timestamps to UTC and use consistent hostnames. Add a small enrichment step for GeoIP and ASN on network fields.
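If you normalize in a script before indexing, the timestamp and hostname steps might look like the sketch below. It assumes Zeek's default JSON output, where ts is epoch seconds, and that hostnames are names rather than IP addresses. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_utc_iso(ts):
    """Convert an epoch-seconds value (Zeek's default `ts`) to an ISO 8601 UTC string."""
    moment = datetime.fromtimestamp(float(ts), tz=timezone.utc)
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def normalize_hostname(name):
    """Lowercase a hostname and drop any domain suffix (assumes a name, not an IP)."""
    return name.strip().lower().split(".")[0]


def normalize_record(record):
    """Return a copy of a Zeek record with @timestamp derived from `ts`."""
    normalized = dict(record)
    if "ts" in normalized:
        normalized["@timestamp"] = epoch_to_utc_iso(normalized["ts"])
    return normalized
```

Keeping the original ts next to the derived @timestamp makes it easy to spot conversion mistakes later.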

If you use OpenSearch ingest pipelines, add a pipeline that maps Zeek's id.orig_h and id.resp_h to a common pair of source and destination IP fields (source.ip and destination.ip if you follow ECS) and attaches GeoIP data. This allows you to reuse dashboards and queries across multiple data sources.

Index mappings and ECS alignment

If you want better cross-source queries, align your fields to a common schema. The Elastic Common Schema (ECS) is a de facto standard and works well even if you are not on Elasticsearch. At minimum, map IPs to source.ip and destination.ip, users to user.name, and process fields to process.name.

This small investment pays off when you write detections. You can search for source.ip and have it match both Zeek and Wazuh data. In a lab, you can do the mapping in an ingest pipeline with a few simple rename processors.
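A rename-based pipeline for the Zeek IP fields could be built like this. Treat it as a sketch under assumptions: the pipeline body would be sent with PUT _ingest/pipeline/<name>, the function name is hypothetical, and the dot_expander step assumes your shipper delivers id.orig_h as a literal dotted key rather than a nested object. If the fields already arrive nested, pass expand_dotted=False.

```python
def build_ecs_pipeline(expand_dotted=True):
    """Build an ingest pipeline body that renames Zeek IP fields to ECS names."""
    renames = {
        "id.orig_h": "source.ip",
        "id.resp_h": "destination.ip",
    }
    processors = []
    if expand_dotted:
        # Turn the literal key "id.orig_h" into a nested id -> orig_h object
        # so the rename processor can resolve the path.
        for field in renames:
            processors.append({"dot_expander": {"field": field}})
    for field, target in renames.items():
        processors.append(
            {"rename": {"field": field, "target_field": target, "ignore_missing": True}}
        )
    return {
        "description": "Map Zeek connection endpoints to ECS field names",
        "processors": processors,
    }
```

Setting ignore_missing keeps the pipeline from rejecting Zeek logs that have no connection tuple. Test the pipeline against a few sample documents before pointing live traffic at it.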

Alerting and detection rules

The OpenSearch Alerting plugin runs monitors that execute a query on a schedule and fire a trigger when a condition matches. Create a few foundational rules such as:

  • Multiple failed logins across hosts in 5 minutes
  • A host contacting a new external domain for the first time
  • A process on a Windows host making an outbound connection to a non-standard port

Start with low-volume alerts and tune them. In a lab, you can validate these rules by generating the traffic and watching the alert fire. This builds confidence that your pipeline is usable in real investigations.
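As an illustration of the first rule, the query behind a "failed logins across hosts" monitor might be built like this. It is a sketch with assumed field names: I am relying on Wazuh alerts carrying rule.groups with the value authentication_failed and an agent.name keyword field. The threshold of three hosts and both function names are my own choices.

```python
def failed_login_query(window_minutes=5):
    """Build a search body counting distinct hosts with failed logins in the window."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"rule.groups": "authentication_failed"}},
                    {"range": {"@timestamp": {"gte": f"now-{window_minutes}m"}}},
                ]
            }
        },
        "aggs": {
            "distinct_hosts": {"cardinality": {"field": "agent.name"}},
        },
    }


def should_alert(search_response, min_hosts=3):
    """Return True when failures were seen on at least `min_hosts` distinct hosts."""
    return search_response["aggregations"]["distinct_hosts"]["value"] >= min_hosts
```

In a monitor, the should_alert logic would become the trigger condition on the aggregation value. Running the query by hand first is a quick way to confirm that the field names match your indices.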

Resource sizing and retention

Log pipelines fail when they run out of disk or heap. Monitor OpenSearch heap usage and disk utilization. For a small lab, 2 to 4 GB heap is enough, but you should still set a retention policy to avoid surprise disk exhaustion.
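Heap monitoring can be as simple as polling the node stats API. The sketch below reads heap_used_percent from a GET _nodes/stats/jvm response. The 85 percent threshold is an arbitrary starting point of mine, not a recommendation from OpenSearch, and the function names are hypothetical.

```python
import json
import urllib.request


def fetch_jvm_stats(base_url):
    """Fetch JVM statistics for every node from <base_url>/_nodes/stats/jvm."""
    url = base_url.rstrip("/") + "/_nodes/stats/jvm"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def heap_warnings(nodes_stats, threshold_percent=85):
    """Return (node_name, heap_used_percent) for nodes at or above the threshold."""
    warnings = []
    for node_id, node in nodes_stats.get("nodes", {}).items():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used >= threshold_percent:
            warnings.append((node.get("name", node_id), used))
    return warnings
```

Run it from cron every few minutes and log the result. A node that stays above the threshold is a sign to raise the heap or shorten retention.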

A simple approach is daily index rollover with a 14- or 30-day retention. This keeps indices small and queries fast. If you need longer retention, archive older indices to a slower disk or export them to object storage.
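With daily indices, deletion can be handled by an Index State Management (ISM) policy. The sketch below builds a policy body that could be sent with PUT _plugins/_ism/policies/<policy_id>. The state names, the priority value, and the function name are assumptions for illustration.

```python
def build_retention_policy(index_patterns, retention_days=14):
    """Build an ISM policy body that deletes matching indices after `retention_days`."""
    return {
        "policy": {
            "description": f"Delete indices older than {retention_days} days",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": f"{retention_days}d"},
                        }
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to newly created matching indices.
            "ism_template": [{"index_patterns": list(index_patterns), "priority": 100}],
        }
    }
```

For example, build_retention_policy(["zeek-*"], retention_days=30) covers the Zeek indices. The ism_template only applies to indices created after the policy exists, so older indices need the policy attached by hand.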

Building useful dashboards

A home SIEM should answer a few core questions quickly:

  • Which hosts are generating alerts right now?
  • Which internal IPs are talking to unknown external domains?
  • What processes are creating outbound connections?

Build a dashboard that shows Wazuh alerts by severity, Zeek connection volume by host, and DNS queries by base domain. Once these are in place, you can build detection rules that are actually testable.

Operational guidance

Retention matters for investigations as well as for disk space. Keep at least two weeks of data so you can compare behavior over time; on a small VM, daily indices deleted after 14 days are enough.

Also watch disk I/O. Zeek and Wazuh both write many small events. Put OpenSearch data on SSD if possible, or separate the data path to a faster disk.

Lab validation

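Trigger events to confirm the pipeline end to end. Run a port scan and verify that Zeek logs appear in OpenSearch. Trigger a failed login on a Windows host and make sure Wazuh produces an alert. Then correlate the two: the query below pulls the last 15 minutes of Wazuh events for the host, and the same time window run against the Zeek index gives you the network side.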

{
  "query": {
    "bool": {
      "must": [
        {"match": {"agent.name": "win10-lab"}},
        {"range": {"@timestamp": {"gte": "now-15m"}}}
      ]
    }
  }
}
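To line up the network side, you can generate the host query and a matching Zeek query from the same inputs. This is a sketch with assumptions: the host IP 192.168.56.30 used in the example call is hypothetical, the Zeek documents are assumed to carry @timestamp, and ip_field defaults to Zeek's raw id.orig_h (switch it to source.ip if you mapped fields to ECS).

```python
def time_window(minutes):
    """Return a range clause covering the last `minutes` minutes."""
    return {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}}


def build_pivot_queries(agent_name, host_ip, minutes=15, ip_field="id.orig_h"):
    """Return (host_query, network_query) sharing one time window."""
    host_query = {
        "query": {
            "bool": {
                "must": [
                    {"match": {"agent.name": agent_name}},
                    time_window(minutes),
                ]
            }
        }
    }
    network_query = {
        "query": {
            "bool": {
                "must": [
                    {"term": {ip_field: host_ip}},
                    time_window(minutes),
                ]
            }
        }
    }
    return host_query, network_query
```

Calling build_pivot_queries("win10-lab", "192.168.56.30") yields the host query shown above plus its Zeek counterpart. Run the first against the Wazuh alerts index and the second against zeek-*.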

Takeaways

A home SIEM is not about flashy dashboards. It is about consistent telemetry, predictable pipelines, and the ability to answer investigative questions quickly. Wazuh plus OpenSearch plus Zeek provides that foundation. Once it works, you can add more data sources, build detections, and run tabletop exercises that feel real.

This post is licensed under CC BY 4.0 by the author.