Architecture

How a SIEM is actually built.

SIEM stands for Security Information and Event Management. Strip away the marketing and it is a pipeline: get data in, make it searchable, look for patterns, and surface the few events worth acting on. This guide walks that pipeline stage by stage, vendor-neutral, with Splunk used as a concrete example where it helps.

Stage 1

Collection: getting data in

Sources

  • Operating system and authentication logs.
  • Network gear: firewalls, proxies, DNS, VPN.
  • Endpoints, cloud platforms, and applications.

How it arrives

  • Agents or forwarders push data from each host.
  • Syslog and API pulls cover devices and cloud services.
  • In Splunk terms, this is the forwarder and input layer.
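
As a small illustration of what a collector does with a syslog line on arrival, the sketch below decodes the leading `<PRI>` value into facility and severity (facility is the value divided by 8, severity is the remainder). The function name and the returned dictionary layout are assumptions made for this example, not part of any product.

```python
import re

# Severity names as defined for syslog; list index = numeric severity.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# A syslog line starts with a priority value in angle brackets, e.g. "<34>".
PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def decode_syslog_priority(line):
    """Split the leading <PRI> into facility and severity (hypothetical helper)."""
    match = PRI_PATTERN.match(line)
    if match is None:
        return None  # no priority header; leave the line for another parser
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23 * 8 + severity 7
        return None
    return {
        "facility": pri // 8,
        "severity": SEVERITIES[pri % 8],
        "message": match.group(2),
    }
```

A line such as `<34>Oct 11 22:14:15 mymachine su: 'su root' failed` decodes to facility 4 (security/authorisation) and severity `crit`.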

Stage 2

Parsing and normalisation

Raw logs are messy. One device writes a timestamp one way, another writes it differently, and the same idea (a username, a source address, an action) hides under different labels everywhere. Normalisation maps all of those into a common set of fields so a single search can ask "show me every failed login" without knowing each vendor's format.

This is where data models and field extractions matter. A SIEM that normalises well lets you write one detection that works across many sources. A SIEM with poor normalisation forces you to write the same logic ten times, once per device, which is how detection coverage quietly falls apart.
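
A minimal sketch of that mapping, assuming two made-up sources. The source names, vendor field labels, and value vocabulary below are all hypothetical, chosen only to show the shape of the idea.

```python
# Hypothetical per-source maps: vendor label -> common field name.
FIELD_MAPS = {
    "vendor_a_firewall": {"usr": "user", "srcaddr": "src_ip", "act": "action"},
    "vendor_b_vpn": {"UserName": "user", "ClientIP": "src_ip", "Result": "action"},
}

# Hypothetical value map so different vendor wordings share one vocabulary.
ACTION_VALUES = {
    "deny": "failure",
    "failed": "failure",
    "allow": "success",
    "ok": "success",
}


def normalise(source_type, raw_event):
    """Rename vendor fields to common names and unify the action value."""
    field_map = FIELD_MAPS[source_type]
    event = {"source_type": source_type}
    for raw_name, value in raw_event.items():
        common_name = field_map.get(raw_name)
        if common_name is None:
            continue  # unmapped fields are dropped in this sketch
        event[common_name] = value
    if "action" in event:
        key = str(event["action"]).lower()
        event["action"] = ACTION_VALUES.get(key, key)
    return event
```

Once both sources pass through `normalise`, one filter on `action == "failure"` covers the firewall and the VPN alike, which is the "write one detection" benefit described above.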

Stage 3

Indexing and storage

Why indexing matters

  • Indexing is what lets a search over years of logs return in seconds instead of scanning every raw file.
  • Hot, warm, and cold tiers balance speed against cost.
  • Retention policy decides how far back an investigation can reach.
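
The tiering idea can be sketched as a function of data age. The thresholds below are placeholder assumptions for illustration; real values depend on budget, compliance requirements, and how the product implements its tiers.

```python
from datetime import timedelta

# Hypothetical thresholds; tune to your own cost and compliance needs.
HOT_MAX_AGE = timedelta(days=7)
WARM_MAX_AGE = timedelta(days=90)
RETENTION = timedelta(days=365)


def storage_tier(age):
    """Return the tier for data of the given age, or 'expired' past retention."""
    if age <= HOT_MAX_AGE:
        return "hot"
    if age <= WARM_MAX_AGE:
        return "warm"
    if age <= RETENTION:
        return "cold"
    return "expired"
```

With these numbers, an investigation can reach back at most a year, and anything older than a week is served from slower, cheaper storage.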

The trade-offs

  • More data means more cost and more to search through.
  • Too little retention means the evidence is already gone by the time an incident is investigated.
  • Good SIEM design is mostly deciding what to keep and for how long.
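
The cost side of that decision is mostly arithmetic. A rough sketch, assuming storage grows linearly with daily ingest and retention; the `stored_ratio` parameter (stored size divided by raw size, after compression and index overhead) is an assumption that varies widely between products and data types.

```python
def estimate_storage_gb(daily_ingest_gb, retention_days, stored_ratio):
    """Rough storage estimate: raw volume kept, scaled by the stored/raw ratio."""
    return daily_ingest_gb * retention_days * stored_ratio
```

For example, 50 GB per day kept for 90 days at a ratio of 0.5 comes to about 2,250 GB. Doubling retention doubles that figure, which is why what to keep and for how long dominates the design.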

Stage 4

Correlation and detection

Single events rarely tell the whole story. Correlation is the act of joining events across time and sources into something meaningful: a failed-login burst followed by a success followed by access to a sensitive share is far more interesting than any one of those alone. Detections encode these patterns as rules that run continuously.
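
The failed-login example above can be sketched in plain Python, independent of any SIEM. The event layout (`time` in epoch seconds, `user`, `type`), the three type names, and both thresholds are assumptions made for this illustration.

```python
from collections import defaultdict

# Hypothetical thresholds for this sketch.
BURST_SIZE = 5          # failures needed to count as a burst
WINDOW_SECONDS = 600    # how close together the steps must be


def find_suspicious_users(events):
    """Flag users with a failure burst, then a success, then sensitive access."""
    by_user = defaultdict(list)
    for event in events:
        by_user[event["user"]].append(event)

    flagged = []
    for user, user_events in by_user.items():
        user_events.sort(key=lambda e: e["time"])
        failures = []        # timestamps of failures since the last success
        success_time = None  # time of a success that followed a burst
        for event in user_events:
            now = event["time"]
            if event["type"] == "login_failure":
                failures.append(now)
            elif event["type"] == "login_success":
                recent = [t for t in failures if now - t <= WINDOW_SECONDS]
                if len(recent) >= BURST_SIZE:
                    success_time = now
                failures = []
            elif event["type"] == "sensitive_access":
                if success_time is not None and now - success_time <= WINDOW_SECONDS:
                    flagged.append(user)
                    break
    return flagged
```

A user who logs in cleanly and then opens the same share is not flagged; only the full sequence within the window is.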

A simple Splunk-style detection reads almost like a sentence:

index=auth action=failure
| stats count by user, src_ip
| where count > 10

The skill is not the syntax; it is knowing which patterns indicate trouble and writing them so they fire on real attacks without drowning analysts in false positives.
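
For readers without Splunk to hand, the same threshold logic can be mirrored in a few lines of Python. This is only an equivalent sketch: it assumes the events are already authentication records carrying the `action`, `user`, and `src_ip` fields used in the search above, and the function name is made up.

```python
from collections import Counter


def repeated_failures(events, threshold=10):
    """Count failures per (user, src_ip) and keep pairs above the threshold."""
    counts = Counter(
        (event["user"], event["src_ip"])
        for event in events
        if event.get("action") == "failure"
    )
    return {pair: n for pair, n in counts.items() if n > threshold}
```

Raising `threshold` trades missed slow attacks for fewer false positives, which is the tuning judgement the paragraph above describes.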

Stage 5

Alerting, dashboards, and response

Output that humans use

  • Alerts: ranked events that need a decision.
  • Dashboards: the at-a-glance health of the environment.
  • Reports: trends for managers and audits.
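
One simple way to rank alerts, shown here only as a sketch, is to multiply the rule's severity by the importance of the affected asset. The `severity` and `asset_criticality` fields and the scoring formula are assumptions for this example; real products use their own risk models.

```python
def rank_alerts(alerts):
    """Order alerts so the highest severity-times-criticality score comes first."""
    return sorted(
        alerts,
        key=lambda alert: alert["severity"] * alert["asset_criticality"],
        reverse=True,
    )
```

Under this scoring, a medium-severity alert on a domain controller can outrank a high-severity alert on a test machine.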

Closing the loop

  • Response actions may be manual or automated through security orchestration, automation, and response (SOAR) tooling.
  • Every outcome should feed back into better detections.
  • A SIEM that never gets tuned slowly becomes background noise.
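
The feedback loop can be made measurable by tracking how often each rule's alerts turn out to be real. The sketch below assumes analysts record a verdict of `true_positive` or `false_positive` for every alert; the verdict labels, function names, and the 0.2 cut-off are hypothetical.

```python
from collections import defaultdict


def rule_precision(dispositions):
    """Return the share of true positives per rule from (rule, verdict) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rule, verdict in dispositions:
        totals[rule] += 1
        if verdict == "true_positive":
            hits[rule] += 1
    return {rule: hits[rule] / totals[rule] for rule in totals}


def rules_needing_tuning(dispositions, min_precision=0.2):
    """List rules whose precision falls below the chosen cut-off."""
    precision = rule_precision(dispositions)
    return sorted(rule for rule, value in precision.items() if value < min_precision)
```

Rules that land on this list are the ones most likely to be ignored, so they are the first candidates for tighter conditions or retirement.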
