Generate Fake Logs for Testing Log Aggregation Platform: The Ultimate Guide

Complete guide to generating realistic fake logs for testing log aggregation systems like Loki, Elastic Stack (ELK), and Splunk. Includes tools, examples, and step-by-step implementation with Docker and Kubernetes.

Testing log aggregation platforms like Loki, Elastic Stack, and Splunk requires realistic fake log data that mimics production environments. This comprehensive log generation guide covers the best tools and techniques for generating fake logs, complete with Docker and Kubernetes deployment examples.

Quick Start: Jump to fuzzy-train setup for immediate log generation or explore our Docker examples for containerized testing.

Log Generation Architecture Overview

graph TB
    subgraph "Log Generation Tools"
        A[fuzzy-train<br/>Multi-format]
        B[flog<br/>Apache-focused]
        C[Custom Scripts<br/>Pattern-specific]
    end

    subgraph "Output Methods"
        D[File Output<br/>Persistent storage]
        E[stdout Output<br/>Real-time streaming]
    end

    subgraph "Log Shipping Agents"
        F[Fluent-bit]
        G[Vector.dev]
        H[Grafana Alloy]
    end

    subgraph "Aggregation Platforms"
        I[Grafana Loki]
        J[Elastic Stack]
        K[Splunk]
    end

    subgraph "Monitoring & Analysis"
        L[Performance Metrics]
        M[Alert Testing]
    end

    A --> D
    B --> D
    C --> D
    A --> E
    B --> E
    C --> E

    D --> F
    D --> G
    D --> H
    E --> F
    E --> G
    E --> H

    F --> I
    F --> J
    F --> K
    G --> I
    G --> J
    G --> K
    H --> I
    H --> J
    H --> K

    I --> L
    J --> L
    K --> L
    I --> M
    J --> M
    K --> M

    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style C fill:#e8f5e8
    style D fill:#fff3e0
    style E fill:#fce4ec
    style F fill:#e0f2f1
    style G fill:#f1f8e9
    style H fill:#f9fbe7
    style I fill:#e3f2fd
    style J fill:#f3e5f5
    style K fill:#e8f5e8
    style L fill:#fff8e1
    style M fill:#f3e5f5

Why Generate Fake Logs?

Fake log generation is essential for modern DevOps and testing workflows. Log generation tools enable you to:

  • Validate system performance under various load conditions without production data
  • Test log parsing rules and filtering logic safely
  • Develop monitoring dashboards with consistent, predictable data
  • Train teams on log analysis tools and techniques
  • Simulate error scenarios for alert testing
  • Load test infrastructure before production deployment

Log Generation Workflow

flowchart TD
    A[Define Testing Requirements] --> B[Choose Log Format]
    B --> C[Select Generation Tool]
    C --> D[Configure Parameters]
    D --> E[Generate Fake Logs]
    E --> F[Verify Log Output]
    F --> G[Configure Log Shipping]
    G --> H[Test Aggregation Platform]
    H --> I[Validate Performance]
    I --> J[Monitor & Optimize]

    subgraph "Tool Selection"
        C1[fuzzy-train<br/>Multi-format, Docker-ready]
        C2[flog<br/>Fast, Apache-focused]
        C3[Custom Scripts<br/>Specific patterns]
    end

    subgraph "Output Options"
        E1[File Output<br/>Persistent storage]
        E2[stdout Output<br/>Real-time streaming]
        E3[Both Outputs<br/>Flexible testing]
    end

    subgraph "Validation Steps"
        H1[Parse Accuracy<br/>Format compliance]
        H2[Performance Metrics<br/>Throughput & latency]
        H3[Resource Usage<br/>CPU, memory, disk]
    end

    C --> C1
    C --> C2
    C --> C3
    E --> E1
    E --> E2
    E --> E3
    H --> H1
    H --> H2
    H --> H3

    style A fill:#e3f2fd
    style B fill:#e8f5e8
    style C fill:#fff3e0
    style D fill:#fce4ec
    style E fill:#f1f8e9
    style F fill:#e0f2f1
    style G fill:#f9fbe7
    style H fill:#fff8e1
    style I fill:#f3e5f5
    style J fill:#e1f5fe

Use Cases for Log Generation

Testing Log Aggregation Systems

Log aggregation testing is crucial for validating your observability stack. Generated logs let you exercise ingestion, parsing, indexing, and query performance end to end without touching production data.

Validating Log Shipping Agents

Log shipping validation ensures your data pipeline works correctly. Feeding generated log files to agents such as Fluent-bit, Vector.dev, or Grafana Alloy confirms tailing, parsing, and delivery behavior before you rely on them in production.

Development & Operations Use Cases

DevOps log testing scenarios include:

  • Parser Development: Test regex patterns and log parsing rules
  • Alert System Testing: Generate specific patterns to trigger monitoring alerts
  • Dashboard Development: Create realistic data for visualization testing
  • Load Testing: Simulate disk I/O and system resource usage
  • Training & Demos: Provide realistic data for learning environments

Best Tools for Log Generation

Choosing the right log generation tool depends on your specific testing requirements. Here are the top-rated tools for fake log generation:

1. fuzzy-train - Versatile Log Generator

fuzzy-train is a versatile fake log generator designed for testing and development environments. This Docker-ready log generation tool runs anywhere and supports multiple output formats.

Features:

  • Multiple Formats: JSON, logfmt, Apache (common/combined/error), BSD syslog (RFC3164), Syslog (RFC5424)
  • Smart Tracking: trace_id with PID/Container ID or incremental integers for multi-instance tracking
  • Flexible Output: stdout, file, or both simultaneously
  • Smart File Handling: Auto-creates directories and default filename
  • Container-Aware: Uses container/pod identifiers in containerized environments
  • Field Control: Optional timestamp, log level, length, and trace_id fields

Python Script Usage:

# Clone repository
git clone https://github.com/sagarnikam123/fuzzy-train
cd fuzzy-train

# Default JSON logs (90-100 chars, 1 line/sec)
python3 fuzzy-train.py

# Apache common with custom parameters
python3 fuzzy-train.py \
    --min-log-length 100 \
    --max-log-length 200 \
    --lines-per-second 5 \
    --log-format "apache common" \
    --time-zone UTC \
    --output file

# Logfmt with simple trace IDs
python3 fuzzy-train.py \
    --log-format logfmt \
    --trace-id-type integer

# Clean logs (no metadata)
python3 fuzzy-train.py \
    --no-timestamp \
    --no-log-level \
    --no-length \
    --no-trace-id

# Output to both stdout and file
python3 fuzzy-train.py --output stdout --file fuzzy-train.log

Docker Usage:

# Quick start with defaults
docker pull sagarnikam123/fuzzy-train:latest
docker run --rm sagarnikam123/fuzzy-train:latest

# Run in background
docker run -d --name fuzzy-train-log-generator sagarnikam123/fuzzy-train:latest \
    --lines-per-second 2 --log-format JSON

# Logfmt logs with volume mount
docker run --rm -v "$(pwd)":/logs sagarnikam123/fuzzy-train:latest \
    --min-log-length 180 \
    --max-log-length 200 \
    --lines-per-second 2 \
    --time-zone UTC \
    --log-format logfmt \
    --output file \
    --file /logs/fuzzy-train.log

# High-volume syslog for load testing
docker run --rm sagarnikam123/fuzzy-train:latest \
    --lines-per-second 10 \
    --log-format syslog \
    --time-zone UTC \
    --output file

# Cleanup running containers
docker stop $(docker ps -q --filter ancestor=sagarnikam123/fuzzy-train:latest)
docker rm $(docker ps -aq --filter ancestor=sagarnikam123/fuzzy-train:latest)

Kubernetes Deployment:

# Download YAML files
wget https://raw.githubusercontent.com/sagarnikam123/fuzzy-train/refs/heads/main/fuzzy-train-file.yaml
wget https://raw.githubusercontent.com/sagarnikam123/fuzzy-train/refs/heads/main/fuzzy-train-stdout.yaml

# Deploy to Kubernetes cluster
kubectl apply -f fuzzy-train-file.yaml
kubectl apply -f fuzzy-train-stdout.yaml

# Check running pods
kubectl get pods -l app=fuzzy-train
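
To confirm the pods are actually emitting logs (assuming the downloaded manifests label pods with app=fuzzy-train, as the kubectl get command above does), sample their output:

# Sample recent output from all fuzzy-train pods
kubectl logs -l app=fuzzy-train --tail=5

# Follow logs from the first pod
kubectl logs -f $(kubectl get pods -l app=fuzzy-train -o jsonpath='{.items[0].metadata.name}')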

2. flog - Fast Log Generator

flog is a high-performance fake log generator optimized for common log formats including Apache, syslog, and JSON. This Go-based log generation tool is ideal for testing log streams and data pipelines with minimal resource overhead.

Supported Formats: apache_common (default), apache_combined, apache_error, rfc3164 (syslog), rfc5424 (syslog), json

Output Types: stdout (default), log (file), gz (gzip compressed)

Tool Comparison Matrix

graph LR
    subgraph "fuzzy-train"
        A1[Multi-format Support<br/>JSON, Apache, Syslog, Logfmt]
        A2[Docker & K8s Ready<br/>Container-native]
        A3[Smart Tracking<br/>trace_id, PID, Container ID]
        A4[Flexible Output<br/>File, stdout, or both]
        A5[Field Control<br/>Optional metadata]
    end

    subgraph "flog"
        B1[Apache Focused<br/>Web server logs]
        B2[High Performance<br/>Go-based, fast]
        B3[Compression Support<br/>gzip output]
        B4[Size-based Generation<br/>MB/GB targets]
        B5[Continuous Mode<br/>Loop generation]
    end

    subgraph "Use Case Decision"
        C1[<b>fuzzy-train</b> :<br/>• Multi-format needed<br/>• Container deployment<br/>• Custom trace tracking<br/>• Flexible output options]

        C2[<b>flog</b> :<br/>• Apache logs only<br/>• High performance needed<br/>• Compression required<br/>• Size-based generation]
    end

    style A1 fill:#e1f5fe
    style A2 fill:#e1f5fe
    style A3 fill:#e1f5fe
    style A4 fill:#e1f5fe
    style A5 fill:#e1f5fe
    style B1 fill:#f3e5f5
    style B2 fill:#f3e5f5
    style B3 fill:#f3e5f5
    style B4 fill:#f3e5f5
    style B5 fill:#f3e5f5
    style C1 fill:#e8f5e8
    style C2 fill:#fff3e0

Installation Options:

# Using go install (recommended)
go install github.com/mingrammer/flog@latest

# Using Homebrew
brew tap mingrammer/flog
brew install flog

# Using pre-built binary
# macOS
curl -O -L "https://github.com/mingrammer/flog/releases/download/v0.4.4/flog_0.4.4_darwin_amd64.tar.gz"
tar -xvzf flog_0.4.4_darwin_amd64.tar.gz
cd flog_0.4.4_darwin_amd64

# Linux
curl -O -L "https://github.com/mingrammer/flog/releases/download/v0.4.4/flog_0.4.4_linux_amd64.tar.gz"
tar -xvzf flog_0.4.4_linux_amd64.tar.gz
cd flog_0.4.4_linux_amd64

chmod +x ./flog
sudo mv ./flog /usr/local/bin/

Command Line Usage:

# Generate (-n) 1000 logs to stdout (default)
flog

# Generate logs with (-s) time interval and (-d) delay
flog -s 10s -n 200 -d 3s

# Apache combined (-f) format with (-w) overwrite to (-o) output file
flog -t log -f apache_combined -w -o apache.log

# Continuous generation with (--loop) mode
flog -f rfc3164 -l

Advanced Options:

# Generate logs by size instead of line count
flog -t log -b 10485760 -f json -o large.log

# Split logs every 1MB with gzip compression
flog -t gz -o log.gz -b 10485760 -p 1048576

# Generate logs with path structure
flog -t log -f apache_combined -o web/log/apache.log -n 5000

Docker Usage:

# Basic Docker run (interactive)
docker run -it --rm mingrammer/flog

# Generate logs to stdout with custom parameters
docker run --rm mingrammer/flog -f apache_combined -n 500

# Generate logs to file with volume mount
docker run --rm -v "$(pwd)":/logs mingrammer/flog -t log -o /logs/apache.log -n 1000

# Continuous log generation in background
docker run -d --name flog-generator mingrammer/flog -f json -l

# High-volume generation with gzip compression (50 MB; -b takes bytes)
docker run --rm -v "$(pwd)":/logs mingrammer/flog -t gz -o /logs/large.log.gz -b 52428800

# Cleanup running containers
docker stop $(docker ps -q --filter ancestor=mingrammer/flog)
docker rm $(docker ps -aq --filter ancestor=mingrammer/flog)

Step-by-Step Implementation

Follow this log generation implementation guide to set up fake log generation for your testing environment:

Step 1: Choose Your Log Format

Selecting the right log format for testing is crucial for realistic log aggregation testing:

  • Apache Common Log Format: Web server testing
  • JSON: Modern microservices
  • Syslog: System-level testing
  • Logfmt: Structured key-value logs
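
For orientation, here is roughly what a single line looks like in each format (illustrative samples, not literal tool output):

# Apache Common Log Format
127.0.0.1 - frank [10/Oct/2025:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326

# JSON
{"timestamp": "2025-10-10T13:55:36.000Z", "level": "INFO", "trace_id": "12345", "message": "connection established"}

# Syslog (RFC3164)
<34>Oct 10 13:55:36 myhost app[1234]: connection established

# Logfmt
ts=2025-10-10T13:55:36Z level=info trace_id=12345 msg="connection established"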

Step 2: Set Up Log Generation with fuzzy-train

Implement Docker-based log generation using fuzzy-train for scalable testing:

# Generate JSON logs to file
docker run -d --name fuzzy-train-generator \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --log-format JSON \
  --lines-per-second 5 \
  --output file \
  --file /logs/fuzzy-train.log

# Generate Apache combined logs
docker run -d --name apache-log-generator \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --log-format "apache combined" \
  --lines-per-second 10 \
  --output file \
  --file /logs/apache.log

Using Python Script

# Clone and setup fuzzy-train
git clone https://github.com/sagarnikam123/fuzzy-train
cd fuzzy-train

# Generate logs to file
python3 fuzzy-train.py \
  --log-format JSON \
  --lines-per-second 5 \
  --output file \
  --file $HOME/data/log/logger/fuzzy-train.log

Step 3: Verify Log Generation

# Check generated logs
tail -f /tmp/logs/fuzzy-train.log

# Monitor log generation rate
watch "wc -l /tmp/logs/fuzzy-train.log" # brew install watch
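
Beyond eyeballing tail -f, you can sanity-check the actual throughput by sampling the line count twice (a minimal sketch; adjust the path to your log file):

# Measure effective lines/sec over a 10-second window
BEFORE=$(wc -l < /tmp/logs/fuzzy-train.log)
sleep 10
AFTER=$(wc -l < /tmp/logs/fuzzy-train.log)
echo "Rate: $(( (AFTER - BEFORE) / 10 )) lines/sec"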

Log Shipping Configuration

Configure log shipping agents to collect and forward your generated fake logs to aggregation platforms:

Fluent-bit Configuration

fluent-bit --config=fluent-bit-local-fs-json-loki.yaml

# fluent-bit-local-fs-json-loki.yaml

service:
  log_level: info    # warn
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020

parsers:    # adjust according to your log format
  - name: json
    format: json
    time_key: timestamp
    time_format: "%Y-%m-%dT%H:%M:%S.%LZ"
    time_keep: on

pipeline:
  inputs:
    - name: tail
      path: /tmp/logs/*.log    # adjust as per your .log file location
      read_from_head: false
      refresh_interval: 10
      ignore_older: 1h
      tag: local.*
      parser: json

  outputs:
    - name: loki
      match: '*'
      host: 127.0.0.1    # change to your Loki instance host or IP
      port: 3100    # change to your Loki instance port
      labels: service_name=fluent-bit, source=fuzzy-train-log

Vector.dev Configuration

vector validate config/vector-local-fs-json-loki.yaml
vector --config=config/vector-local-fs-json-loki.yaml

# vector-local-fs-json-loki.yaml

data_dir: $HOME/data/vector

sources:
  fuzzy_logs:
    type: file
    include:
      - /tmp/logs/*.log    # adjust as per your .log file location
    read_from: beginning
    encoding:
      charset: utf-8

transforms:
  parse_logs:
    type: remap
    inputs:
      - fuzzy_logs
    source: |
      . = parse_json!(.message)

sinks:
  loki_sink:
    type: loki
    inputs:
      - parse_logs
    endpoint: http://127.0.0.1:3100    # change to your Loki instance endpoint
    encoding:
      codec: json
    healthcheck:
      enabled: true
    labels:
      service_name: fuzzy-train
      source: fuzzy-train-log

api:    # optional
  enabled: true
  address: 127.0.0.1:8686
  playground: true

Grafana Alloy Configuration

alloy run config/alloy-local-fs-json-loki.alloy
# Visit UI - http://127.0.0.1:12345/

# alloy-local-fs-json-loki.alloy

livedebugging {
  enabled = true
}

local.file_match "local_files" {
    path_targets = [{"__path__" = "/tmp/logs/*.log", "job" = "alloy", "hostname" = constants.hostname}]
    sync_period  = "5s"
}

loki.source.file "log_scrape" {
    targets    = local.file_match.local_files.targets
    forward_to = [loki.write.local_loki.receiver]
    tail_from_end = true
}

loki.write "local_loki" {
  endpoint {
    url = "http://127.0.0.1:3100/loki/api/v1/push"
  }
}
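
Whichever agent you choose, you can confirm end-to-end delivery by querying Loki's HTTP API directly (a sketch assuming a local Loki on port 3100; adjust the label selector to whatever labels your agent attaches, e.g. service_name="fluent-bit" from the Fluent-bit config above):

# Check Loki is ready
curl -s http://127.0.0.1:3100/ready

# Query recent shipped logs (query_range defaults to the last hour)
curl -sG http://127.0.0.1:3100/loki/api/v1/query_range \
  --data-urlencode 'query={service_name="fluent-bit"}'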

Advanced Log Generation Techniques

Implement sophisticated log generation patterns for comprehensive testing scenarios:

Volume Calculation Guide

Understanding log generation volume calculations helps optimize your testing infrastructure:

Log Encoding Information:

  • Character Encoding: UTF-8
  • Basic ASCII characters: 1 byte each (letters, numbers, punctuation)
  • Newlines: 1 byte each (\n)
  • JSON overhead: ~50-80 bytes per log (timestamp, level, trace_id, etc.)

Formula:

Data Rate (MB/sec) = (Average Log Size in bytes) × (Lines per second) × (Number of instances) ÷ 1,048,576

Note: 1,048,576 is the number of bytes in 1 MB (1024 × 1024 bytes)

Example: Generate 10 MB/sec

To generate 10 MB/sec of log data, you need to calculate the required lines per second based on your average log size (a helper sketch for this arithmetic follows the options below):

  • Option 1: Single high-volume instance - Average log size of ~200 bytes (150 characters + JSON overhead + newline). Required rate: 10 MB/sec ÷ 200 bytes = ~52,428 lines/sec. Use parameters: --lines-per-second 52428 --min-log-length 130 --max-log-length 170

  • Option 2: Multiple moderate instances (recommended) - 10 instances × 5,243 lines/sec each = 52,430 total lines/sec. Per instance parameters: --lines-per-second 5243 --min-log-length 130 --max-log-length 170

  • Option 3: Many small instances - 50 instances × 1,049 lines/sec each = 52,450 total lines/sec. Per instance parameters: --lines-per-second 1049 --min-log-length 130 --max-log-length 170
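
These options all fall out of the formula above. A small helper makes the arithmetic repeatable (a minimal sketch; the target rate, average log size, and instance count are illustrative values):

# Compute required lines/sec for a target data rate
TARGET_MB_PER_SEC=10
AVG_LOG_BYTES=200   # ~150 chars + JSON overhead + newline
INSTANCES=10

awk -v mb="$TARGET_MB_PER_SEC" -v size="$AVG_LOG_BYTES" -v n="$INSTANCES" 'BEGIN {
  total = mb * 1048576 / size
  printf "Total: %d lines/sec | Per instance (x%d): %d lines/sec\n", total, n, total / n
}'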

Volume Calculation Visualization

graph TD
    subgraph "Volume Formula"
        F[Data Rate MB/sec = <br/>Log Size × Lines/sec × Instances ÷ 1,048,576]
    end

    subgraph "10 MB/sec Examples"
        E1[Single Instance<br/>52,428 lines/sec]
        E2[10 Instances<br/>5,243 lines/sec each]
        E3[50 Instances<br/>1,049 lines/sec each]
    end

    subgraph "Common Targets"
        V1[1 MB/sec<br/>~7,000 lines/sec]
        V2[5 MB/sec<br/>~21,000 lines/sec]
        V3[100 MB/sec<br/>~300,000 lines/sec]
    end

    subgraph "Resource Impact"
        R1[CPU: Higher rate = More usage]
        R2[Memory: Larger logs = More memory]
        R3[Disk: File output = I/O pressure]
    end

    F --> E1
    F --> E2
    F --> E3
    F --> V1
    F --> V2
    F --> V3
    E1 --> R1
    E2 --> R2
    E3 --> R3

    style F fill:#e3f2fd
    style E1 fill:#e8f5e8
    style E2 fill:#fff3e0
    style E3 fill:#fce4ec
    style V1 fill:#f1f8e9
    style V2 fill:#e0f2f1
    style V3 fill:#f9fbe7
    style R1 fill:#ffebee
    style R2 fill:#fff3e0
    style R3 fill:#e8f5e8

High-Volume Log Generation

Implement high-volume log generation for performance testing and load testing scenarios:

Docker High-Volume Generation:

# Clean up any existing log files
rm -f /tmp/logs/*
mkdir -p /tmp/logs

# Generate 100 logs per second for load testing
# Volume generated: 100 lines/sec × ~150 bytes = ~14.3 KB/sec (0.014 MB/sec)
docker run -d --name high-volume-generator \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --lines-per-second 100 \
  --log-format JSON \
  --output file \
  --file /logs/high-volume.log

# Multiple containers for extreme load
for i in {1..5}; do
  docker run -d --name volume-gen-$i \
    -v /tmp/logs:/logs \
    sagarnikam123/fuzzy-train:latest \
    --lines-per-second 50 \
    --log-format JSON \
    --output file \
    --file /logs/volume-$i.log
done

# Cleanup when done testing (stopping each container takes a few seconds)
for i in {1..5}; do
  docker stop volume-gen-$i
  docker rm volume-gen-$i
done

📊 Volume Calculation Breakdown

| Container | Lines/sec | Bytes/line | Volume/sec | MB/sec |
|---|---|---|---|---|
| volume-gen-1 | 50 | ~150 | 7.1 KB | 0.007 |
| volume-gen-2 | 50 | ~150 | 7.1 KB | 0.007 |
| volume-gen-3 | 50 | ~150 | 7.1 KB | 0.007 |
| volume-gen-4 | 50 | ~150 | 7.1 KB | 0.007 |
| volume-gen-5 | 50 | ~150 | 7.1 KB | 0.007 |
| TOTAL | 250 | ~150 | 🔥 35.7 KB | 0.035 |

💡 Total Volume Generated: 5 containers × 7.1 KB/sec = 35.7 KB/sec (0.035 MB/sec)

Error Pattern Simulation

Simulate realistic error patterns and log bursts for comprehensive log aggregation testing:

Simulating Error Bursts:

# Clean up any existing log files
rm -f /tmp/logs/*
mkdir -p /tmp/logs

# Normal operation logs
# 📈 Normal Operations Volume
# Lines: 5/sec | Size: ~150 bytes/line | Volume: 0.7 KB/sec (0.0007 MB/sec)
docker run -d --name normal-ops \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --lines-per-second 5 \
  --log-format JSON \
  --output file \
  --file /logs/normal.log

# Simulate error burst (high frequency for 2 minutes)
# 🚨 ERROR BURST VOLUME CALCULATION
# Lines: 5,000/sec | Size: ~225 bytes/line | Volume: 1.07 MB/sec
# Duration: 2 minutes | Total data: ~128 MB
sleep 30
docker run -d --rm --name error-burst \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --lines-per-second 5000 \
  --min-log-length 150 \
  --max-log-length 200 \
  --log-format JSON \
  --output file \
  --file /logs/error-burst.log

# Stop error burst after 2 minutes (naming the container avoids stopping
# normal-ops too; --rm removes it automatically once stopped)
sleep 120
docker stop error-burst

# Stop and remove the normal-operation generator when finished
docker stop normal-ops
docker rm normal-ops
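
To confirm the burst actually produces the predicted ~1 MB/sec (~128 MB over two minutes), watch the file sizes from a second terminal while it runs:

# Watch generated file sizes during the burst
watch -n 5 'ls -lh /tmp/logs/'

# Check total volume once the burst finishes
du -sh /tmp/logs/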

Multi-Service Log Simulation

Create multi-service log generation scenarios using Docker Compose for microservices testing:

Docker Compose Multi-Service Setup:

# docker-compose.yml
# 🎯 MULTI-SERVICE TOTAL VOLUME: 918 KB/sec (0.92 MB/sec)

services:
  auth-service:
    # 🔐 AUTH SERVICE VOLUME
    # Lines: 2,000/sec | Size: ~200 bytes | Volume: 381 KB/sec (0.38 MB/sec)
    image: sagarnikam123/fuzzy-train:latest
    command: >
      --lines-per-second 2000
      --min-log-length 120
      --max-log-length 180
      --log-format JSON
      --output file
      --file /logs/auth-service.log
      --trace-id-type integer
    volumes:
      - ./logs:/logs
    container_name: auth-logs

  payment-service:
    # 💳 PAYMENT SERVICE VOLUME
    # Lines: 1,500/sec | Size: ~175 bytes | Volume: 251 KB/sec (0.25 MB/sec)
    image: sagarnikam123/fuzzy-train:latest
    command: >
      --lines-per-second 1500
      --min-log-length 100
      --max-log-length 150
      --log-format logfmt
      --output file
      --file /logs/payment-service.log
      --trace-id-type integer
    volumes:
      - ./logs:/logs
    container_name: payment-logs

  user-service:
    # 👤 USER SERVICE VOLUME
    # Lines: 1,000/sec | Size: ~300 bytes | Volume: 286 KB/sec (0.29 MB/sec)
    image: sagarnikam123/fuzzy-train:latest
    command: >
      --lines-per-second 1000
      --min-log-length 200
      --max-log-length 300
      --log-format "apache combined"
      --output file
      --file /logs/user-service.log
    volumes:
      - ./logs:/logs
    container_name: user-logs

Running the Docker Compose Setup:

# Clean up any existing log files
rm -f logs/* 2>/dev/null || true

# Create logs directory
mkdir -p logs

# Start all services in background
docker-compose up -d

# View running services
docker-compose ps

# Check generated log files
ls -la logs/

# Monitor log files (since containers write to files, not stdout)
tail -f logs/auth-service.log
tail -f logs/payment-service.log
tail -f logs/user-service.log

# Monitor all log files simultaneously
tail -f logs/*.log

# Stop all services
docker-compose down

📊 Volume Calculation Reference

Docker Compose Multi-Service Volume Breakdown

| Service | Format | Lines/sec | Bytes/line | Volume/sec | MB/hour |
|---|---|---|---|---|---|
| auth-service | JSON | 2,000 | ~200 | 381 KB | 1,372 MB |
| payment-service | logfmt | 1,500 | ~175 | 251 KB | 904 MB |
| user-service | Apache | 1,000 | ~300 | 286 KB | 1,030 MB |
| 🔥 TOTAL | Mixed | 4,500 | ~225 avg | 🚀 918 KB | 3,306 MB |

Volume Scaling Examples

| Scenario | Total Lines/sec | Volume/sec | Volume/hour | Volume/day |
|---|---|---|---|---|
| Light Testing | 100 | 20 KB | 72 MB | 1.7 GB |
| Medium Load | 1,000 | 200 KB | 720 MB | 17.3 GB |
| Heavy Load | 4,500 | 918 KB | 3.3 GB | 79.4 GB |
| Stress Test | 10,000 | 2.0 MB | 7.2 GB | 172.8 GB |

Storage Planning Guide

💾 Disk Space Requirements:

  • 1 hour testing: ~3.3 GB (heavy load)
  • 8 hour workday: ~26.4 GB (heavy load)
  • 24 hour continuous: ~79.4 GB (heavy load)
  • 1 week continuous: ~556 GB (heavy load)

⚠️ Important: Add 20-30% buffer for log rotation and compression overhead.
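
To turn these figures into a concrete disk budget, apply the buffer directly (a small sketch; the 3.3 GB/hour heavy-load figure and the 25% buffer are taken from the tables above):

# Disk budget for an 8-hour heavy-load run with a 25% buffer
HOURS=8
GB_PER_HOUR=3.3
awk -v h="$HOURS" -v g="$GB_PER_HOUR" \
  'BEGIN { printf "Raw: %.1f GB | With 25%% buffer: %.1f GB\n", h*g, h*g*1.25 }'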

Multi-Service Log Generation Architecture

graph TB
    subgraph "Docker Compose Services"
        AUTH[auth-service<br/>2,000 logs/sec<br/>~381 KB/sec]
        PAY[payment-service<br/>1,500 logs/sec<br/>~251 KB/sec]
        USER[user-service<br/>1,000 logs/sec<br/>~286 KB/sec]
    end

    subgraph "Log Output"
        AUTH --> AUTH_LOG[auth-service.log<br/>JSON format]
        PAY --> PAY_LOG[payment-service.log<br/>Logfmt format]
        USER --> USER_LOG[user-service.log<br/>Apache combined]
    end

    subgraph "Volume Mounts"
        AUTH_LOG --> VOL[./logs volume<br/>Host directory]
        PAY_LOG --> VOL
        USER_LOG --> VOL
    end

    subgraph "Total Generation"
        TOTAL[Total Volume<br/>~918 KB/sec<br/>0.92 MB/sec]
        AUTH --> TOTAL
        PAY --> TOTAL
        USER --> TOTAL
    end

    subgraph "Log Shipping Options"
        VOL --> FLUENT[Fluent-bit<br/>Tail input]
        VOL --> VECTOR[Vector.dev<br/>File source]
        VOL --> ALLOY[Grafana Alloy<br/>Local file match]
    end

    subgraph "Aggregation Platforms"
        FLUENT --> LOKI[Grafana Loki]
        VECTOR --> LOKI
        ALLOY --> LOKI

        FLUENT --> ELK[Elastic Stack]
        VECTOR --> ELK
        ALLOY --> ELK
    end

    style AUTH fill:#e1f5fe
    style PAY fill:#f3e5f5
    style USER fill:#e8f5e8
    style AUTH_LOG fill:#fff3e0
    style PAY_LOG fill:#fce4ec
    style USER_LOG fill:#f1f8e9
    style VOL fill:#e0f2f1
    style TOTAL fill:#f9fbe7
    style FLUENT fill:#fff8e1
    style VECTOR fill:#e3f2fd
    style ALLOY fill:#f3e5f5
    style LOKI fill:#e8f5e8
    style ELK fill:#fff3e0

Kubernetes DaemonSet for Cluster-Wide Generation

Deploy cluster-wide log generation using Kubernetes DaemonSet for distributed testing:

# fuzzy-train-stdout-daemonset.yaml
# 🌐 KUBERNETES CLUSTER VOLUME CALCULATION
# Per Node: 10 lines/sec × ~170 bytes = 1.6 KB/sec
# Total Cluster: 1.6 KB/sec × (Number of nodes)
# Example 4-node cluster: 4 × 1.6 KB/sec = 6.4 KB/sec

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fuzzy-train-stdout
spec:
  selector:
    matchLabels:
      app: fuzzy-train-stdout
  template:
    metadata:
      labels:
        app: fuzzy-train-stdout
    spec:
      containers:
      - name: fuzzy-train
        image: sagarnikam123/fuzzy-train:latest
        args:
          - "--lines-per-second"
          - "10"                    # Adjust for desired rate per node
          - "--min-log-length"
          - "90"                    # Adjust for desired log size
          - "--max-log-length"
          - "150"                   # Adjust for desired log size
          - "--time-zone"
          - "UTC"
          - "--log-format"
          - "JSON"
          - "--output"
          - "stdout"

Deploy and manage DaemonSet:

# Deploy DaemonSet
kubectl apply -f fuzzy-train-stdout-daemonset.yaml

# Check DaemonSet status
kubectl get daemonset fuzzy-train-stdout

# View pods on all nodes
kubectl get pods -l app=fuzzy-train-stdout -o wide

# View logs from all pods
kubectl logs -l app=fuzzy-train-stdout --tail=50

# View logs from specific pod
kubectl logs $(kubectl get pods -l app=fuzzy-train-stdout -o jsonpath='{.items[0].metadata.name}') -f

# Cleanup DaemonSet
kubectl delete -f fuzzy-train-stdout-daemonset.yaml
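
Before tearing the DaemonSet down, you can sanity-check the aggregate cluster rate by counting lines across all pods over a short window:

# Count log lines across all pods over the last 10 seconds, then divide by 10
# (a 4-node cluster at 10 lines/sec per node should show ~400 lines;
#  for more than 5 pods, raise kubectl's --max-log-requests limit)
kubectl logs -l app=fuzzy-train-stdout --since=10s --prefix | wc -l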

Kubernetes Cluster-Wide Log Generation

graph TB
    subgraph "Kubernetes Cluster"
        NODE1[Node 1<br/>fuzzy-train pod<br/>10 logs/sec]
        NODE2[Node 2<br/>fuzzy-train pod<br/>10 logs/sec]
        NODE3[Node 3<br/>fuzzy-train pod<br/>10 logs/sec]
        NODE4[Node 4<br/>fuzzy-train pod<br/>10 logs/sec]
    end

    subgraph "Log Generation per Node"
        NODE1 --> LOG1[stdout logs<br/>JSON format<br/>~1.6 KB/sec]
        NODE2 --> LOG2[stdout logs<br/>JSON format<br/>~1.6 KB/sec]
        NODE3 --> LOG3[stdout logs<br/>JSON format<br/>~1.6 KB/sec]
        NODE4 --> LOG4[stdout logs<br/>JSON format<br/>~1.6 KB/sec]
    end

    subgraph "Total Cluster Volume"
        TOTAL[Total Volume<br/>40 logs/sec<br/>~6.4 KB/sec]
        LOG1 --> TOTAL
        LOG2 --> TOTAL
        LOG3 --> TOTAL
        LOG4 --> TOTAL
    end

    subgraph "Log Collection Options"
        TOTAL --> PROMTAIL[Promtail<br/>DaemonSet]
        TOTAL --> FLUENT[Fluent-bit<br/>DaemonSet]
        TOTAL --> VECTOR[Vector.dev<br/>DaemonSet]
    end

    subgraph "Aggregation Platforms"
        PROMTAIL --> LOKI[Grafana Loki]
        FLUENT --> LOKI
        VECTOR --> LOKI

        PROMTAIL --> ELK[Elastic Stack]
        FLUENT --> ELK
        VECTOR --> ELK
    end

    subgraph "Scaling Options"
        SCALE1[High Volume: 50 logs/sec per node<br/>Total: 200 logs/sec ~32 KB/sec<br/>Daily: ~2.8 GB]
        SCALE2[Low Volume: 1 log/sec per node<br/>Total: 4 logs/sec ~0.64 KB/sec<br/>Daily: ~55 MB]
    end

    TOTAL --> SCALE1
    TOTAL --> SCALE2

    style NODE1 fill:#e1f5fe
    style NODE2 fill:#e1f5fe
    style NODE3 fill:#e1f5fe
    style NODE4 fill:#e1f5fe
    style LOG1 fill:#f3e5f5
    style LOG2 fill:#f3e5f5
    style LOG3 fill:#f3e5f5
    style LOG4 fill:#f3e5f5
    style TOTAL fill:#e8f5e8
    style PROMTAIL fill:#fff3e0
    style FLUENT fill:#fce4ec
    style VECTOR fill:#f1f8e9
    style LOKI fill:#e0f2f1
    style ELK fill:#f9fbe7
    style SCALE1 fill:#ffebee
    style SCALE2 fill:#fff8e1

Tweaking Volume and Rate:

# For high-volume testing - increase rate and log size
# Volume generated per node: 50 lines/sec × ~350 bytes (avg ~350 chars) = ~16.8 KB/sec
# Edit fuzzy-train-stdout-daemonset.yaml:
# --lines-per-second: "50"     # 50 logs/sec per node
# --min-log-length: "200"      # Larger logs
# --max-log-length: "500"      # Up to 500 chars

# For low-volume testing - decrease rate and log size
# Volume generated per node: 1 line/sec × ~125 bytes (75 chars + overhead) = ~0.12 KB/sec
# Edit fuzzy-train-stdout-daemonset.yaml:
# --lines-per-second: "1"       # 1 log/sec per node
# --min-log-length: "50"       # Smaller logs
# --max-log-length: "100"      # Up to 100 chars

# Apply changes
kubectl apply -f fuzzy-train-stdout-daemonset.yaml

Troubleshooting Log Generation

Resolve common log generation issues and optimize performance:

Common Log Generation Issues

Container Permission Issues:

# Fix volume permissions
sudo chown -R $USER:$USER /tmp/logs
chmod 755 /tmp/logs

High CPU Usage:

# Reduce log generation rate
docker run --cpus="0.5" --memory="256m" sagarnikam123/fuzzy-train:latest \
  --lines-per-second 10

Log Shipping Agent Not Reading Files:

# Check file permissions and paths
ls -la /tmp/logs/
# Verify agent configuration
tail -f /var/log/fluent-bit.log
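
Each agent can also validate its configuration before you dig into file permissions (exact flags vary by agent version, so treat these as a sketch):

# Fluent-bit: parse the config without starting the pipeline
fluent-bit --dry-run --config=fluent-bit-local-fs-json-loki.yaml

# Vector: validate the config (as shown earlier)
vector validate config/vector-local-fs-json-loki.yaml

# Alloy: format/parse-check the config file
alloy fmt config/alloy-local-fs-json-loki.alloy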

Log Generation Performance Optimization

Optimize for High Volume:

  • Use SSD storage for log files
  • Increase file system buffer sizes
  • Monitor disk I/O and memory usage
  • Use log rotation to prevent disk space issues (a minimal cleanup sketch follows)
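
A minimal cleanup sketch for test runs (assuming generated logs live under /tmp/logs; for production-style rotation, use logrotate instead):

# Truncate generated logs that exceed 100 MB
find /tmp/logs -name '*.log' -size +100M -exec truncate -s 0 {} \;

# Delete generated logs older than one day
find /tmp/logs -name '*.log' -mtime +1 -delete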

Custom Error Pattern Script

#!/usr/bin/env python3
# simulate_error_patterns.py
import subprocess
import time

def simulate_error_patterns():
    patterns = [
        {"rate": 2, "duration": 60, "format": "JSON"},    # Normal operation: 2 lines/sec × ~150 bytes = ~0.3 KB/sec
        {"rate": 20, "duration": 30, "format": "JSON"},   # Error spike: 20 lines/sec × ~150 bytes = ~2.9 KB/sec
        {"rate": 5, "duration": 45, "format": "JSON"},    # Recovery period: 5 lines/sec × ~150 bytes = ~0.7 KB/sec
        {"rate": 100, "duration": 15, "format": "syslog"} # Critical failure: 100 lines/sec × ~200 bytes = ~19.1 KB/sec
    ]

    for i, pattern in enumerate(patterns):
        print(f"Starting pattern {i+1}: {pattern['rate']} logs/sec for {pattern['duration']}s")

        process = subprocess.Popen([
            "python3", "fuzzy-train.py",
            "--lines-per-second", str(pattern["rate"]),
            "--log-format", pattern["format"],
            "--output", "file",
            "--file", f"/tmp/logs/pattern-{i+1}.log"
        ])

        time.sleep(pattern["duration"])
        process.terminate()
        time.sleep(2)

if __name__ == "__main__":
    simulate_error_patterns()

Run the error pattern simulation:

# Clean up any existing log files
rm -f /tmp/logs/*

# Create logs directory
mkdir -p /tmp/logs

# Make script executable
chmod +x simulate_error_patterns.py

# Run error pattern simulation
python3 simulate_error_patterns.py

# Monitor generated patterns
ls -la /tmp/logs/pattern-*.log
tail -f /tmp/logs/pattern-*.log

# Abort the simulation early if needed (also stops any spawned generators)
pkill -f "simulate_error_patterns.py"
pkill -f "fuzzy-train.py"

Troubleshooting Decision Tree

flowchart TD
    START[Log Generation Issue?] --> A{What's the problem?}

    A -->|Container won't start| B[Check Docker & Image]
    A -->|Permission denied| C[Fix File Permissions]
    A -->|High CPU usage| D[Reduce Generation Rate]
    A -->|Logs not appearing| E[Check Output Config]
    A -->|Agent not reading| F[Verify Log Shipping]

    B --> B1[Pull image & check logs]
    C --> C1[chown & chmod logs dir]
    D --> D1[Limit CPU & reduce rate]
    E --> E1[Verify file paths]
    F --> F1[Check agent config]

    B1 --> SUCCESS[Issue Resolved]
    C1 --> SUCCESS
    D1 --> SUCCESS
    E1 --> SUCCESS
    F1 --> SUCCESS

    style START fill:#e3f2fd
    style A fill:#e8f5e8
    style B fill:#fff3e0
    style C fill:#fce4ec
    style D fill:#f1f8e9
    style E fill:#e0f2f1
    style F fill:#f9fbe7
    style SUCCESS fill:#c8e6c9

Best Practices for Log Generation

Follow these log generation best practices for effective log aggregation testing:

  1. Start Small: Begin with low-volume fake log generation and gradually increase
  2. Monitor Resources: Watch disk space, CPU usage, and memory consumption during log generation
  3. Clean Up: Implement log rotation and cleanup procedures for generated logs
  4. Realistic Data: Use realistic timestamps and patterns for accurate log testing
  5. Version Control: Keep your log generation scripts and configurations in Git
  6. Test Incrementally: Validate each component before scaling up log generation volume
  7. Document Configurations: Maintain clear documentation of log generation parameters
  8. Security Considerations: Ensure fake logs don’t contain sensitive information (a quick scan sketch follows this list)
  9. Resource Planning: Calculate log generation volume requirements in advance
  10. Integration Testing: Test log shipping agents with various log formats
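
A quick check for point 8 on generated files (a sketch; extend the pattern list to whatever counts as sensitive in your environment):

# Scan generated logs for obviously sensitive-looking patterns
grep -rEn 'password|secret|api[_-]?key|[0-9]{3}-[0-9]{2}-[0-9]{4}' /tmp/logs/ \
  && echo "Review the matches above" \
  || echo "No obvious sensitive patterns found"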

Conclusion

Generating fake logs is essential for testing log aggregation systems safely and effectively. This comprehensive log generation guide provides solutions for:

  • Development environments - Test parsing and filtering logic with fake log data
  • Performance testing - Validate system capacity using high-volume log generation
  • Training scenarios - Provide realistic generated logs for learning
  • Production preparation - Ensure systems handle expected log generation load

Key Takeaways for Log Generation

  • fuzzy-train and flog are the leading log generation tools
  • Docker-based log generation provides scalable testing solutions
  • Kubernetes DaemonSet enables cluster-wide log generation
  • Volume calculations help optimize log generation performance
  • Log shipping configuration is crucial for end-to-end testing

Next Steps for Implementing Log Generation

  1. Choose the appropriate log generation tool based on your requirements
  2. Start with basic fake log generation and gradually increase complexity
  3. Integrate generated logs with your log shipping agents
  4. Monitor system performance and adjust log generation rates
  5. Implement log rotation and cleanup procedures for fake logs

This log generation guide helps DevOps teams implement effective fake log generation for comprehensive log aggregation testing. For more DevOps and observability content, explore our logging category and DevOps guides.

This post is licensed under CC BY 4.0 by the author.