
Bringing NetBox to Elasticsearch: Turning your Source-of-truth into Search-at-Scale

By Steven vd Braak

When teams talk about “operational visibility,” they usually think about logs, SIEM data, metrics, or alerts. But there’s another dataset quietly powering everything beneath the surface: your infrastructure source-of-truth.

For many organizations, that’s NetBox — the authoritative registry for devices, racks, circuits, tenants, VLANs, VMs, and topology. But NetBox is not a search engine, not an analytics platform, and not designed for correlation across massive environments.

Recently, we built a complete NetBox → Elasticsearch integration, turning your static source-of-truth into a fast, cross-domain search and analytics layer. This post walks through the real architecture we deployed, the decisions involved, and the tooling that makes it all work.


Why Bring NetBox into Elasticsearch?

Once NetBox data becomes searchable in Elasticsearch, entirely new workflows appear:

  • Pivot from an EDR alert → asset → device → rack → site → tenant.

  • Correlate network events with topology and IPAM metadata.

  • Analyze utilization: racks, prefixes, circuits, wireless assets.

  • Detect infrastructure drift and outdated configuration.

  • Join operational logs with your infrastructure model.

NetBox contains authoritative truth. Elasticsearch gives you speed and correlation.

Together, they’re far more powerful than either system alone.


Architecture Overview

Below is the high-level architecture:

 +--------------------+      +-------------------------+
 |     NetBox API     | ---> |  query_netbox.py        |
 |  (dcim/ipam/etc.)  |      |  auto-discovers models  |
 +--------------------+      +------------+------------+
                                          |
                                          v
                             +--------------------------+
                             | Elasticsearch Ingest     |
                             | - ILM                    |
                             | - Rollover indices       |
                             | - Ingest pipelines       |
                             +--------------------------+

Everything flows through a single, schema-aware loader that discovers endpoints, paginates through the entire dataset, normalizes documents, and indexes them into rollover-managed indices inside Elasticsearch.


The Loader: Auto-Discovering and Indexing NetBox Data

The heart of the integration is the loader, query_netbox.py. It:

  • Discovers the real model endpoints under each category, e.g. /api/dcim/ ⇒ devices, racks, sites, interfaces.

  • Paginates through all list endpoints using ?limit=1000 and the DRF-style next links.

  • Indexes each endpoint into its own index:

    netbox-<category>-<endpoint>

    Example: netbox-dcim-devices, netbox-ipam-prefixes, netbox-plugins-netbox_topology_views-topology

  • Wraps each record with a metadata envelope describing its origin.
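
To make that envelope concrete, here is a minimal sketch of what one wrapped document could look like. The field names are illustrative assumptions, except that the original object lands under record, which the ingest pipeline shown later relies on:

# Hypothetical sketch of the metadata envelope added around each NetBox object
# before indexing; "record" matches what the ingest pipeline expects, the other
# field names are illustrative.
def wrap_record(record: dict, category: str, endpoint: str, base_url: str) -> dict:
    return {
        "netbox": {
            "category": category,                            # e.g. "dcim"
            "endpoint": endpoint,                            # e.g. "devices"
            "source_url": f"{base_url}/api/{category}/{endpoint}/",
        },
        "record": record,                                    # the original NetBox object
    }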


ASCII Diagram: Loader Workflow

         For each category:
         --------------------
         dcim
         ipam
         virtualization
         extras
         plugins
         ...

                |
                v

      [ Discover endpoints ]
                |
                v
      [ Paginate through all pages ]
                |
                v
      [ Normalize JSON structure ]
                |
                v
      [ Bulk index into Elasticsearch ]
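
A minimal sketch of that workflow in Python, assuming the requests and elasticsearch packages and DRF-style next links; the real query_netbox.py adds endpoint discovery, configuration, and error handling on top:

import requests
from elasticsearch import Elasticsearch, helpers

def fetch_endpoint(base_url, token, category, endpoint, page_size=1000):
    """Yield every record from one NetBox list endpoint, following the 'next' links."""
    session = requests.Session()
    session.headers["Authorization"] = f"Token {token}"
    url = f"{base_url}/api/{category}/{endpoint}/?limit={page_size}"
    while url:
        page = session.get(url, timeout=60).json()
        yield from page.get("results", [])
        url = page.get("next")  # None on the last page

def index_endpoint(es: Elasticsearch, base_url, token, category, endpoint):
    """Bulk-index one endpoint into its own netbox-<category>-<endpoint> index."""
    actions = (
        {
            "_index": f"netbox-{category}-{endpoint}",
            "_source": {"netbox": {"category": category, "endpoint": endpoint}, "record": rec},
        }
        for rec in fetch_endpoint(base_url, token, category, endpoint)
    )
    helpers.bulk(es, actions)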

Configuring the Loader

All configuration lives in config-netbox.yaml :

elastic:
  host: https://elastic-prod.cluster.local:9200
  api_key: "xxxxxxxx=="

netbox:
  base_url: https://netbox.cluster.local
  token: "xxxxx"
  verify_certs: false # optional
  page_size: 1000

  endpoints:
    dcim: "https://netbox.cluster.local/api/dcim/"
    ipam: "https://netbox.cluster.local/api/ipam/"
    virtualization: "https://netbox.cluster.local/api/virtualization/"
    <name all endpoints>...

Running a full ingest

python3 query_netbox.py -c config-netbox.yaml --all

Running a category-specific ingest

python3 query_netbox.py -c config-netbox.yaml --categories dcim,ipam

Dry-run (no Elasticsearch)

python3 query_netbox.py -c config-netbox.yaml --endpoints dcim.devices \
  --dry-run --out devices.ndjson --out-format ndjson
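
The NDJSON produced by a dry run can be inspected offline or replayed into Elasticsearch later. A small sketch, assuming one JSON document per line as written by --out-format ndjson:

import json
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("https://elastic-prod.cluster.local:9200", api_key="xxxxxxxx==")

# Replay a dry-run export into the same netbox-<category>-<endpoint> index
# the loader would have targeted.
with open("devices.ndjson") as fh:
    actions = (
        {"_index": "netbox-dcim-devices", "_source": json.loads(line)}
        for line in fh
        if line.strip()
    )
    helpers.bulk(es, actions)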

Rollover Index Strategy (ILM-Friendly)

Each category gets its own rollover index, following this pattern:

netbox-<category>-000001
netbox-<category>-write   (alias)

The bootstrap script (bootstrap_netbox_rollover.sh) creates these initial indices for all categories:

core
dcim
ipam
virtualization
tenancy
circuits
wireless
extras
users
plugins
schema
status
vpn

➞ These are all automatically created with write aliases (“netbox-*-write”) using the script below.

Example output:

CREATE netbox-core-000001 with alias netbox-core-write
CREATE netbox-dcim-000001 with alias netbox-dcim-write
CREATE netbox-ipam-000001 with alias netbox-ipam-write
...

ASCII Diagram: Rollover Structure

 netbox-dcim-write ---> netbox-dcim-000001
                        |
                        +--(will roll over to)--> netbox-dcim-000002
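
As a hedged sketch of the ILM side, here is how a rollover policy named netbox-default (the policy name passed to the template generator later) and a manual rollover check might look with the Python client; the thresholds are assumptions, not prescriptions:

from elasticsearch import Elasticsearch

es = Elasticsearch("https://elastic-prod.cluster.local:9200", api_key="xxxxxxxx==")

# Illustrative ILM policy: roll the hot write index over at 50 GB or 30 days.
# These thresholds are assumptions; tune them to your own growth rates.
es.ilm.put_lifecycle(
    name="netbox-default",
    policy={
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "30d"}
                }
            }
        }
    },
)

# A rollover can also be tested manually against a category's write alias.
print(es.indices.rollover(alias="netbox-dcim-write", dry_run=True))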

Bootstrap Script

#!/usr/bin/env bash
set -euo pipefail

# Elasticsearch endpoint and API key are expected in the environment.
HOST="${HOST:-https://elastic-prod.cluster.local:9200}"
API_KEY="${ES_API_KEY:?export ES_API_KEY with an Elasticsearch API key}"

CATEGORIES="core dcim ipam virtualization tenancy circuits wireless extras users plugins schema status vpn"

for c in $CATEGORIES; do
  IDX="netbox-$c-000001"
  ALIAS="netbox-$c-write"

  echo "CREATE $IDX with alias $ALIAS"
  curl -sS -X PUT "$HOST/$IDX" \
    -H "Authorization: ApiKey $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "aliases": { "'"$ALIAS"'": { "is_write_index": true } }
    }'
done

Index Templates from Schema

The template generator reads netbox-schema.json, merges every model per category, and emits:

  • Component templates

  • Index templates

  • Optional ILM policy attachments

Output: 13 component templates + 13 index templates were generated.

Example invocation:

python3 generate_netbox_es_templates_by_category.py \
  --schema netbox-schema.json \
  --outdir templates \
  --index-prefix netbox- \
  --emit-ilm --ilm-policy netbox-default
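
Once generated, the templates still need to be pushed to the cluster. A minimal sketch, assuming the generator writes one JSON body per file under templates/ with the standard component and index template keys (the filename convention below is illustrative, not the generator's exact output):

import json
from pathlib import Path
from elasticsearch import Elasticsearch

es = Elasticsearch("https://elastic-prod.cluster.local:9200", api_key="xxxxxxxx==")

# Assumption: component template files contain {"template": {...}} and index
# template files contain {"index_patterns": [...], "composed_of": [...], ...}.
for path in sorted(Path("templates").glob("*.json")):
    body = json.loads(path.read_text())
    if "component" in path.name:
        es.cluster.put_component_template(name=path.stem, **body)
    else:
        es.indices.put_index_template(name=path.stem, **body)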

Ingest Pipeline: Normalizing NetBox Documents

NetBox timestamps and nested fields vary per model. The ingest pipeline solves this by:

  • Setting @timestamp from record.last_updated or record.created.

  • Preserving originals as netbox.last_updated / netbox.created.

  • Extracting tags[].name → tags_keyword.

  • Flattening label/value fields (e.g. status).

ASCII Diagram: Pipeline Logic

       Incoming Document
                |
                v
   [ Extract timestamps ] ---> @timestamp
                |
                v
   [ Flatten nested objects ]
                |
                v
   [ Extract tags[].name ]
                |
                v
   [ Output normalized document ]

Pipeline Code Snippet

PUT _ingest/pipeline/netbox-default
{
  "description": "Normalize NetBox documents: derive @timestamp and extract tag names",
  "processors": [
    {
      "script": {
        "description": "Set @timestamp from record.last_updated, falling back to record.created",
        "ignore_failure": true,
        "source": """
          def ts = ctx.record?.last_updated ?: ctx.record?.created;
          if (ts != null) { ctx['@timestamp'] = ts; }
        """
      }
    },
    {
      "script": {
        "description": "Copy tags[].name into a flat tags_keyword array",
        "ignore_failure": true,
        "source": """
          if (ctx.record != null && ctx.record.tags != null) {
            ctx.tags_keyword = [];
            for (t in ctx.record.tags) { ctx.tags_keyword.add(t.name); }
          }
        """
      }
    }
  ]
}
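
Before pointing the loader at the pipeline, it is worth running a representative document through the simulate API. A quick sketch with the Python client, using a hypothetical device record:

from elasticsearch import Elasticsearch

es = Elasticsearch("https://elastic-prod.cluster.local:9200", api_key="xxxxxxxx==")

# Run one representative document through the pipeline and inspect the result.
resp = es.ingest.simulate(
    id="netbox-default",
    docs=[{
        "_source": {
            "record": {
                "name": "sw-ams-01",  # hypothetical device
                "last_updated": "2025-01-15T10:22:03Z",
                "tags": [{"name": "core"}, {"name": "amsterdam"}],
            }
        }
    }],
)
doc = resp["docs"][0]["doc"]["_source"]
print(doc["@timestamp"])     # 2025-01-15T10:22:03Z
print(doc["tags_keyword"])   # ['core', 'amsterdam']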

What This Integration Unlocks

Once NetBox data lives inside Elasticsearch, entirely new capabilities appear:

Security & IR

  • Enrich EDR alerts with devices, racks, sites, tenants.

  • Correlate lateral movement with topology.
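
As a concrete example of that pivot, here is a hypothetical lookup that resolves an alert's hostname against the indexed device data (field paths follow the record envelope used by the loader; the hostname is illustrative):

from elasticsearch import Elasticsearch

es = Elasticsearch("https://elastic-prod.cluster.local:9200", api_key="xxxxxxxx==")

def enrich_alert(hostname: str):
    """Resolve a hostname from an EDR alert to its NetBox device document."""
    resp = es.search(
        index="netbox-dcim-devices",
        query={"match": {"record.name": hostname}},
        size=1,
    )
    hits = resp["hits"]["hits"]
    return hits[0]["_source"] if hits else None

device = enrich_alert("sw-ams-01")   # hypothetical hostname
if device:
    rec = device["record"]
    # site, rack and tenant are standard nested objects on a NetBox device
    print(rec["site"], rec["rack"], rec["tenant"])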

Networking

  • Search prefixes, circuits, wireless assets globally.

  • Build dashboards on utilization, rack density, peering, VRFs.

SRE / Operations

  • Detect drift between config and NetBox truth.

  • Join system logs with VM placement, cluster metadata, assignment groups.

Architecture & Planning

  • Visualize infrastructure across sites or tenants.

  • Track lifecycle of devices and virtual assets over time.

NetBox becomes a real-time searchable dataset, not just a documentation tool.


Lessons Learned

1. Every NetBox category has inconsistent schemas

Some models offer timestamps, some don’t. Some embed labels. Tags differ. The ingest pipeline normalizes this consistently.

2. Pagination matters

NetBox uses ?limit=1000 and DRF-style next paging — miss this and you miss data.

3. Per-endpoint indices scale better

Keeping netbox-dcim-devices separate from netbox-dcim-interfaces avoids bloated mappings.

4. ILM is essential

Some categories (dcim, ipam) grow extremely fast.


Where This Goes Next

This integration turns NetBox into a first-class analytics source — enabling new workflows across security, network engineering, and operations.

If you want help designing:

  • A production-grade NetBox → Elasticsearch pipeline

  • A schema-driven index strategy

  • An ingest normalization pipeline

  • A dashboard suite / search layer

  • Correlation workflows with EDR/SIEM data


We’re happy to assist you.



© 2025 by Perceptive Security. All rights reserved.

Disclaimer: We are independent consultants specializing in the Elastic Stack, including Elasticsearch, Logstash, Kibana, and Elastic Security. Elastic and related marks are trademarks of Elastic N.V. in the U.S. and other countries. This website is not affiliated with, endorsed, or sponsored by Elastic N.V.