Architecture

How a SIEM is actually built.

SIEM stands for Security Information and Event Management. Strip away the marketing and it is a pipeline: get data in, make it searchable, look for patterns, and surface the few events worth acting on. This guide walks that pipeline stage by stage, vendor-neutral, with Splunk used as a concrete example where it helps.

Stage 1

Collection: getting data in

Sources

Operating system and authentication logs.
Network gear: firewalls, proxies, DNS, VPN.
Endpoints, cloud platforms, and applications.

How it arrives

Agents or forwarders push data from each host.
Syslog feeds and API polling cover network devices and cloud services that cannot run an agent.
In Splunk terms, this is the forwarder and input layer.
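
As a small illustration of what a collector does with a syslog message on arrival, here is a minimal Python sketch that splits the leading priority value into facility and severity. The function name split_priority is hypothetical; the sample line is the example message from RFC 3164, and the arithmetic (priority = facility * 8 + severity) comes from the same specification. Real collectors do far more than this.

```python
import re

# A syslog message starts with a priority value in angle brackets, e.g. "<34>".
PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def split_priority(line: str) -> dict:
    """Split a raw syslog line into facility, severity, and the remaining text."""
    match = PRI_PATTERN.match(line)
    if match is None:
        # No priority prefix: keep the line untouched rather than guessing.
        return {"facility": None, "severity": None, "message": line}
    pri = int(match.group(1))
    # priority = facility * 8 + severity
    return {"facility": pri // 8, "severity": pri % 8, "message": match.group(2)}


sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
event = split_priority(sample)
print(event["facility"], event["severity"])
```

Facility 4 is the security/authorisation facility and severity 2 is "critical", so even this one prefix already tells the pipeline something about where the message should be routed.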

Stage 2

Parsing and normalisation

Raw logs are messy. One device writes a timestamp one way, another writes it differently, and the same idea (a username, a source address, an action) hides under a different label in each source. Normalisation maps all of those into a common set of fields so a single search can ask "show me every failed login" without knowing each vendor's format.

This is where data models and field extractions matter. A SIEM that normalises well lets you write one detection that works across many sources. A SIEM with poor normalisation forces you to write the same logic ten times, once per device, which is how detection coverage quietly falls apart.
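
The idea can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the two source types, their raw field names, and the target fields src_ip, user, and action are hypothetical, not any vendor's actual schema.

```python
# Per-source mapping from vendor field names to one common schema.
# Source types and vendor field names are invented for this example.
FIELD_ALIASES = {
    "hypothetical_firewall": {"srcip": "src_ip", "usr": "user", "act": "action"},
    "hypothetical_vpn": {"client_address": "src_ip", "login": "user", "result": "action"},
}

# Vendors also disagree on values, so those are mapped to a common vocabulary too.
ACTION_VALUES = {"deny": "failure", "failed": "failure", "allow": "success", "ok": "success"}


def normalise(source_type: str, raw: dict) -> dict:
    """Rename vendor-specific fields and values to the common schema."""
    aliases = FIELD_ALIASES[source_type]
    event = {aliases.get(key, key): value for key, value in raw.items()}
    if "action" in event:
        value = str(event["action"]).lower()
        event["action"] = ACTION_VALUES.get(value, event["action"])
    return event


fw = normalise("hypothetical_firewall", {"srcip": "10.0.0.5", "usr": "alice", "act": "deny"})
vpn = normalise("hypothetical_vpn", {"client_address": "203.0.113.7", "login": "bob", "result": "FAILED"})

# One query now covers both sources.
failed_logins = [e["user"] for e in (fw, vpn) if e["action"] == "failure"]
print(failed_logins)
```

The detection at the end never mentions a vendor. Adding a third source means adding one mapping entry, not rewriting the query.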

Stage 3

Indexing and storage

Why indexing matters

Indexing is what makes years of logs searchable in seconds.
Hot, warm, and cold tiers balance speed against cost.
Retention policy decides how far back an investigation can reach.
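
To show why an index makes search fast, here is a toy inverted index in Python. It illustrates the general technique only and is not a description of how any particular SIEM stores data; the function names and sample events are made up for the example.

```python
from collections import defaultdict


def build_index(events: list) -> dict:
    """Map each token to the set of event positions that contain it."""
    index = defaultdict(set)
    for event_id, text in enumerate(events):
        for token in text.lower().split():
            index[token].add(event_id)
    return index


def search(index: dict, *terms: str) -> set:
    """Return ids of events containing every term, without scanning raw text."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


events = [
    "failed login user=alice src=10.0.0.5",
    "accepted login user=bob src=10.0.0.9",
    "failed login user=bob src=10.0.0.5",
]
index = build_index(events)
hits = search(index, "failed", "user=bob")
print(hits)
```

The work of reading every event is paid once, at index time. A search then touches only the small sets attached to its terms, which is why query time does not grow in step with the volume of stored logs.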

The trade-offs

More data means more cost and more to search through.
Too little retention means the evidence for an incident may be gone before anyone looks for it.
Good SIEM design is mostly deciding what to keep and for how long.
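
The trade-off is easy to put into numbers. The sketch below is back-of-envelope arithmetic only: the function name is hypothetical, and the stored_fraction default of 0.5 (disk used per byte ingested, after compression and index overhead) is an assumed placeholder, not a vendor figure. Measure your own ratio before relying on it.

```python
def storage_needed_gb(daily_ingest_gb: float, retention_days: int,
                      stored_fraction: float = 0.5) -> float:
    """Estimate disk needed to keep `retention_days` of logs online.

    stored_fraction is an assumed ratio of bytes on disk to bytes ingested.
    """
    return daily_ingest_gb * retention_days * stored_fraction


ninety_days = storage_needed_gb(100, 90)
one_year = storage_needed_gb(100, 365)
print(ninety_days, one_year)
```

Under these assumptions, moving from 90 days to a year of retention roughly quadruples the storage bill for the same daily volume, which is why dropping a noisy, low-value source can matter more than buying faster disks.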

Stage 4

Correlation and detection

Single events rarely tell the whole story. Correlation is the act of joining events across time and sources into something meaningful: a failed-login burst followed by a success followed by access to a sensitive share is far more interesting than any one of those alone. Detections encode these patterns as rules that run continuously.

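A simple Splunk-style detection reads almost like a sentence: take failed authentication events, count them per user and source address, and keep any pair with more than ten.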

index=auth action=failure
| stats count by user, src_ip
| where count > 10

The skill is not the syntax; it is knowing which patterns indicate trouble and writing them so they fire on real attacks without drowning analysts in false positives.
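
The multi-step pattern described above (a failed-login burst, then a success, then access to a sensitive share) can be sketched outside any query language. This Python version is a minimal illustration: the Event fields, the event kind names, and the thresholds of five failures and a ten-minute window are all assumptions chosen for the example, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: int   # seconds since some reference point
    user: str
    kind: str   # "login_failure", "login_success", or "share_access"


def suspicious_users(events, min_failures=5, window_seconds=600):
    """Flag users with a failure burst, then a success, then share access."""
    by_user = {}
    for event in sorted(events, key=lambda e: e.time):
        by_user.setdefault(event.user, []).append(event)

    flagged = []
    for user, user_events in by_user.items():
        failure_times = []
        success_time = None  # time of a success that followed a failure burst
        for event in user_events:
            if event.kind == "login_failure":
                failure_times.append(event.time)
            elif event.kind == "login_success":
                recent = [t for t in failure_times if event.time - t <= window_seconds]
                if len(recent) >= min_failures:
                    success_time = event.time
            elif event.kind == "share_access" and success_time is not None:
                if event.time - success_time <= window_seconds:
                    flagged.append(user)
                    break
    return flagged


events = [Event(t * 10, "alice", "login_failure") for t in range(5)]
events += [Event(60, "alice", "login_success"), Event(120, "alice", "share_access")]
events += [Event(10, "bob", "login_success"), Event(20, "bob", "share_access")]
print(suspicious_users(events))
```

Bob logs in and opens the same share but is not flagged, because nothing suspicious preceded his login. That is the difference between correlating a sequence and alerting on each event in isolation.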

Stage 5

Alerting, dashboards, and response

Output that humans use

Alerts: ranked events that need a decision.
Dashboards: the at-a-glance health of the environment.
Reports: trends for managers and audits.
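
"Ranked" implies some scoring step. One very simple approach, shown here as a hypothetical Python sketch, multiplies the rule's severity by how much the affected asset matters. The severity scale, the asset_criticality field, and the sample alerts are invented for illustration; real platforms use their own, usually richer, scoring.

```python
# Assumed three-level severity scale.
SEVERITY = {"low": 1, "medium": 2, "high": 3}


def rank_alerts(alerts: list) -> list:
    """Order alerts so the highest severity-times-criticality score comes first."""
    return sorted(
        alerts,
        key=lambda a: SEVERITY[a["severity"]] * a["asset_criticality"],
        reverse=True,
    )


alerts = [
    {"name": "port scan", "severity": "low", "asset_criticality": 1},
    {"name": "admin login from new country", "severity": "medium", "asset_criticality": 5},
    {"name": "malware on test VM", "severity": "high", "asset_criticality": 1},
]
ranked = rank_alerts(alerts)
print([a["name"] for a in ranked])
```

Here a medium-severity alert on a critical asset outranks a high-severity alert on a throwaway machine, which is usually the order an analyst would want to work in.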

Closing the loop

Response actions may be manual or automated (see SOAR).
Every outcome should feed back into better detections.
A SIEM that never gets tuned slowly becomes background noise.
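
One way to make that feedback concrete is to track how often each rule's alerts are closed as false positives. The Python sketch below assumes analysts record a verdict per alert; the verdict labels, rule names, and both thresholds are hypothetical.

```python
from collections import Counter


def rules_needing_tuning(outcomes, max_false_positive_rate=0.8, min_alerts=5):
    """Return rules whose closed alerts are mostly false positives.

    outcomes is an iterable of (rule_name, verdict) pairs recorded by analysts.
    """
    totals = Counter()
    false_positives = Counter()
    for rule, verdict in outcomes:
        totals[rule] += 1
        if verdict == "false_positive":
            false_positives[rule] += 1
    return sorted(
        rule
        for rule, total in totals.items()
        # Ignore rules with too few alerts to judge fairly.
        if total >= min_alerts and false_positives[rule] / total > max_false_positive_rate
    )


outcomes = (
    [("noisy_rule", "false_positive")] * 9 + [("noisy_rule", "true_positive")]
    + [("useful_rule", "false_positive")] * 2 + [("useful_rule", "true_positive")] * 8
    + [("new_rule", "false_positive")] * 3
)
print(rules_needing_tuning(outcomes))
```

Reviewing a list like this on a regular schedule turns "tune the SIEM" from a vague intention into a short, prioritised queue of rules to fix or retire.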

Keep going

Related guides

To see how one specific platform implements this pipeline, read the Splunk architecture guide. For how SIEM compares to the wider tool market, see the SIEM tools guide and the ArcSight vs Splunk comparison. For the automated response stage, see SOAR explained.