Step-by-Step Guide to Generating Fake Logs for Log Aggregation Systems

Learn how to easily create realistic fake logs to test and improve your log aggregation system

Use Cases

Testing Log Aggregation Systems

Validating Log Shipping Agents

Development & Operations

  • Parser Development: Test regex patterns and log parsing rules
  • Alert System Testing: Generate specific patterns to trigger monitoring alerts
  • Dashboard Development: Create realistic data for visualization testing
  • Load Testing: Simulate disk I/O and system resource usage
  • Training & Demos: Provide realistic data for learning environments

Tools for Generating Fake Logs

1. fuzzy-train - Versatile Log Generator

A versatile fake log generator for testing and development - runs anywhere.

Features:

  • Multiple Formats: JSON, logfmt, Apache (common/combined/error), BSD syslog (RFC3164), Syslog (RFC5424)
  • Smart Tracking: trace_id with PID/Container ID or incremental integers for multi-instance tracking
  • Flexible Output: stdout, file, or both simultaneously
  • Smart File Handling: Auto-creates directories and default filename
  • Container-Aware: Uses container/pod identifiers in containerized environments
  • Field Control: Optional timestamp, log level, length, and trace_id fields

Python Script Usage:

# Clone repository
git clone https://github.com/sagarnikam123/fuzzy-train
cd fuzzy-train

# Default JSON logs (90-100 chars, 1 line/sec)
python3 fuzzy-train.py

# Apache common with custom parameters
python3 fuzzy-train.py \
    --min-log-length 100 \
    --max-log-length 200 \
    --lines-per-second 5 \
    --log-format "apache common" \
    --time-zone UTC \
    --output file

# Logfmt with simple trace IDs
python3 fuzzy-train.py \
    --log-format logfmt \
    --trace-id-type integer

# Clean logs (no metadata)
python3 fuzzy-train.py \
    --no-timestamp \
    --no-log-level \
    --no-length \
    --no-trace-id

# Output to both stdout and file
python3 fuzzy-train.py --output stdout --file fuzzy-train.log
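
A quick way to sanity-check the generated file is to tail it and, for JSON output, confirm the lines are well-formed (assuming jq is installed):

# Follow the file as it grows
tail -f fuzzy-train.log

# Check that the first few lines parse as JSON (requires jq)
head -n 5 fuzzy-train.log | jq .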

Docker Usage:

# Quick start with defaults
docker pull sagarnikam123/fuzzy-train:latest
docker run --rm sagarnikam123/fuzzy-train:latest

# Run in background
docker run -d --name fuzzy-train-log-generator sagarnikam123/fuzzy-train:latest \
    --lines-per-second 2 --log-format JSON

# Logfmt logs with volume mount
docker run --rm -v "$(pwd)":/logs sagarnikam123/fuzzy-train:latest \
    --min-log-length 180 \
    --max-log-length 200 \
    --lines-per-second 2 \
    --time-zone UTC \
    --log-format logfmt \
    --output file \
    --file /logs/fuzzy-train.log

# High-volume syslog for load testing
docker run --rm sagarnikam123/fuzzy-train:latest \
    --lines-per-second 10 \
    --log-format syslog \
    --time-zone UTC \
    --output file
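
When a generator runs detached (as in the background example above), you can confirm it is emitting output by following the container logs:

# Follow stdout of the detached generator started above
docker logs -f fuzzy-train-log-generator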

Kubernetes Deployment:

# Download YAML files
wget https://raw.githubusercontent.com/sagarnikam123/fuzzy-train/refs/heads/main/fuzzy-train-file.yaml
wget https://raw.githubusercontent.com/sagarnikam123/fuzzy-train/refs/heads/main/fuzzy-train-stdout.yaml

# Deploy to Kubernetes cluster
kubectl apply -f fuzzy-train-file.yaml
kubectl apply -f fuzzy-train-stdout.yaml

# Check running pods
kubectl get pods -l app=fuzzy-train
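
To confirm the stdout variant is producing output, stream logs from the pods matched by the same label selector:

# Stream recent logs from all fuzzy-train pods
kubectl logs -l app=fuzzy-train -f --tail=20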

2. flog - Fast Log Generator

A fake log generator for common log formats including Apache, syslog, and JSON. Useful for testing log streams and data pipelines.

Supported Formats: apache_common (default), apache_combined, apache_error, rfc3164 (syslog), rfc5424 (syslog), json
Output Types: stdout (default), log (file), gz (gzip compressed)

Installation Options:

# Using go install (recommended)
go install github.com/mingrammer/flog@latest

# Using Homebrew
brew tap mingrammer/flog
brew install flog

# Using pre-built binary
# macOS
curl -O -L "https://github.com/mingrammer/flog/releases/download/v0.4.4/flog_0.4.4_darwin_amd64.tar.gz"
tar -xvzf flog_0.4.4_darwin_amd64.tar.gz
cd flog_0.4.4_darwin_amd64

# Linux
curl -O -L "https://github.com/mingrammer/flog/releases/download/v0.4.4/flog_0.4.4_linux_amd64.tar.gz"
tar -xvzf flog_0.4.4_linux_amd64.tar.gz
cd flog_0.4.4_linux_amd64

chmod +x ./flog
sudo mv ./flog /usr/local/bin/

Command Line Usage:

# Generate (-n) 1000 logs to stdout (default)
flog

# Generate logs with (-s) time interval and (-d) delay
flog -s 10s -n 200 -d 3s

# Apache combined (-f) format with (-w) overwrite to (-o) output file
flog -t log -f apache_combined -w -o apache.log

# Continuous generation with (--loop) mode
flog -f rfc3164 -l

Advanced Options:

# Generate logs by size instead of line count
flog -b 10485760 -f json -o large.log

# Split logs every 1MB with gzip compression
flog -t gz -o log.gz -b 10485760 -p 1048576

# Generate logs with path structure
flog -t log -f apache_combined -o web/log/apache.log -n 5000

Docker Usage:

# Basic Docker run (interactive)
docker run -it --rm mingrammer/flog

# Generate logs to stdout with custom parameters
docker run --rm mingrammer/flog -f apache_combined -n 500

# Generate logs to file with volume mount
docker run --rm -v "$(pwd)":/logs mingrammer/flog -t log -o /logs/apache.log -n 1000

# Continuous log generation in background
docker run -d --name flog-generator mingrammer/flog -f json -l

# High-volume generation (50 MB, specified in bytes) with gzip compression
docker run --rm -v "$(pwd)":/logs mingrammer/flog -t gz -o /logs/large.log.gz -b 52428800

Step-by-Step Implementation

Step 1: Choose Your Log Format

Decide on the log format based on your testing needs:

  • Apache Common Log Format: Web server testing
  • JSON: Modern microservices
  • Syslog: System-level testing
  • Logfmt: Structured key-value logs
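
For reference, a single line in each of these formats looks roughly like the following (illustrative examples only, not literal fuzzy-train or flog output):

# Apache Common Log Format
127.0.0.1 - frank [10/Oct/2025:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326

# JSON
{"timestamp": "2025-10-10T13:55:36.123Z", "level": "INFO", "message": "user login succeeded", "trace_id": "a1b2c3"}

# Syslog (RFC3164)
<34>Oct 10 13:55:36 host01 app[123]: connection accepted from 10.0.0.5

# Logfmt
time=2025-10-10T13:55:36Z level=info msg="payment processed" duration_ms=42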

Step 2: Set Up Log Generation with fuzzy-train

# Generate JSON logs to file
docker run -d --name fuzzy-train-generator \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --log-format JSON \
  --lines-per-second 5 \
  --output file \
  --file /logs/fuzzy-train.log

# Generate Apache combined logs
docker run -d --name apache-log-generator \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --log-format "apache combined" \
  --lines-per-second 10 \
  --output file \
  --file /logs/apache.log

Using Python Script

# Clone and setup fuzzy-train
git clone https://github.com/sagarnikam123/fuzzy-train
cd fuzzy-train

# Generate logs to file
python3 fuzzy-train.py \
  --log-format JSON \
  --lines-per-second 5 \
  --output file \
  --file $HOME/data/log/logger/fuzzy-train.log    # change as per your directory structure

Step 3: Configure Log Shipping

Fluent-bit Configuration

fluent-bit --config=fluent-bit-local-fs-json-loki.yaml    # run Fluent-bit
# fluent-bit-local-fs-json-loki.yaml
service:
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020

parsers:
  - name: json
    format: json
    time_key: timestamp # check your log lines, this may be "time"
    time_format: "%Y-%m-%dT%H:%M:%S.%LZ" # or "%Y-%m-%dT%H:%M:%S.%L%z"
    time_keep: on

pipeline:
  inputs:
    - name: tail
      path: /Users/snikam/data/log/logger/*.log # change according to where your .log files are present
      read_from_head: false
      refresh_interval: 10
      ignore_older: 1h
      tag: local.*
      parser: json

  outputs:
    - name: loki
      match: '*'
      host: 127.0.0.1
      port: 3100
      labels: service_name=fluent-bit, source=fuzzy-train-log
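
Once Fluent-bit is running, a quick way to verify that logs are reaching Loki is to hit the Loki HTTP API (adjust host and port to your instance; the same check applies to the Vector and Alloy setups below):

# Confirm Loki is ready to receive data
curl -s http://127.0.0.1:3100/ready

# List label names that have been ingested (should include service_name and source)
curl -s http://127.0.0.1:3100/loki/api/v1/labels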

Vector.dev Configuration

vector validate config/vector-local-fs-json-loki.yaml    # validate the configuration
vector --config=config/vector-local-fs-json-loki.yaml    # run Vector
data_dir: $HOME/data/vector # set if no global data_dir set

sources:
  fuzzy_logs:
    type: file
    include:
      - $HOME/data/log/logger/*.log # change according to your file path
    read_from: beginning
    encoding:
      charset: utf-8

transforms:
  parse_logs:
    type: remap
    inputs:
      - fuzzy_logs
    source: |
      . = parse_json!(.message) # make sure each log event has a "message" field

sinks:
  loki_sink:
    type: loki
    inputs:
      - parse_logs
    endpoint: http://127.0.0.1:3100 # change as per your Loki endpoint
    encoding:
      codec: json
    healthcheck:
      enabled: true
    labels:
      service_name: fuzzy-train
      source: fuzzy-train-log

api:  # optional
  enabled: true
  address: 127.0.0.1:8686 # visit - http://127.0.0.1:8686/playground
  playground: true

Grafana Alloy Configuration

alloy-darwin-amd64 run config/alloy-local-fs-json-loki.alloy    # run alloy
alloy-1.10.1 run config/alloy-local-fs-json-loki.alloy    # run alloy
# visit UI - http://127.0.0.1:12345/
livedebugging {
  enabled = true
}

local.file_match "local_files" {
    path_targets = [{"__path__" = "/Users/snikam/data/log/logger/*.log", "job" = "alloy", "hostname" = constants.hostname}]
    sync_period  = "5s"
}

loki.source.file "log_scrape" {
    targets    = local.file_match.local_files.targets
    forward_to = [loki.write.local_loki.receiver]
    tail_from_end = true
}

loki.write "local_loki" {
  endpoint {
    url = "http://127.0.0.1:3100/loki/api/v1/push"
  }
}

Advanced Techniques

High-Volume Log Generation with fuzzy-train

Docker High-Volume Generation:

# Generate 100 logs per second for load testing
docker run -d --name high-volume-generator \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --lines-per-second 100 \
  --log-format JSON \
  --output file \
  --file /logs/high-volume.log

# Multiple containers for extreme load
for i in {1..5}; do
  docker run -d --name volume-gen-$i \
    -v /tmp/logs:/logs \
    sagarnikam123/fuzzy-train:latest \
    --lines-per-second 50 \
    --log-format JSON \
    --output file \
    --file /logs/volume-$i.log
done

# Kubernetes DaemonSet for cluster-wide generation
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fuzzy-train-volume
spec:
  selector:
    matchLabels:
      app: fuzzy-train-volume
  template:
    metadata:
      labels:
        app: fuzzy-train-volume
    spec:
      containers:
      - name: fuzzy-train
        image: sagarnikam123/fuzzy-train:latest
        args:
          - "--lines-per-second"
          - "200"
          - "--log-format"
          - "JSON"
          - "--output"
          - "file"
          - "--file"
          - "/logs/node-volume.log"
        volumeMounts:
        - name: log-volume
          mountPath: /logs
      volumes:
      - name: log-volume
        hostPath:
          path: /var/log/fuzzy-train
EOF

Python Script High-Volume:

# Generate massive logs with different formats
python3 fuzzy-train.py \
  --lines-per-second 500 \
  --log-format JSON \
  --min-log-length 200 \
  --max-log-length 500 \
  --output file \
  --file /var/log/massive-load.log

# Parallel generation with different trace IDs
for i in {1..10}; do
  python3 fuzzy-train.py \
    --lines-per-second 50 \
    --log-format logfmt \
    --trace-id-type integer \
    --output file \
    --file /var/log/parallel-$i.log &
done

Error Pattern Simulation with fuzzy-train

Simulating Error Bursts:

# Normal operation logs
docker run -d --name normal-ops \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --lines-per-second 5 \
  --log-format JSON \
  --output file \
  --file /logs/normal.log

# Simulate error burst (high frequency for 2 minutes)
sleep 30  # Normal operation for 30 seconds
docker run --rm \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --lines-per-second 50 \
  --log-format JSON \
  --output file \
  --file /logs/error-burst.log &

# Stop error burst after 2 minutes (this stops every running fuzzy-train container, including normal-ops)
sleep 120
docker stop $(docker ps -q --filter ancestor=sagarnikam123/fuzzy-train:latest)

Custom Error Pattern Script:

#!/usr/bin/env python3
import subprocess
import time
import random

def simulate_error_patterns():
    patterns = [
        # Normal operation
        {"rate": 2, "duration": 60, "format": "JSON"},
        # Error spike
        {"rate": 20, "duration": 30, "format": "JSON"},
        # Recovery period
        {"rate": 5, "duration": 45, "format": "JSON"},
        # Critical failure
        {"rate": 100, "duration": 15, "format": "syslog"}
    ]
    
    for i, pattern in enumerate(patterns):
        print(f"Starting pattern {i+1}: {pattern['rate']} logs/sec for {pattern['duration']}s")

        process = subprocess.Popen([
            "python3", "fuzzy-train.py",
            "--lines-per-second", str(pattern["rate"]),
            "--log-format", pattern["format"],
            "--output", "file",
            "--file", f"/tmp/logs/pattern-{i+1}.log"
        ])

        time.sleep(pattern["duration"])
        process.terminate()
        time.sleep(2)  # Brief pause between patterns

if __name__ == "__main__":
    simulate_error_patterns()
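
Save this as simulate_error_patterns.py (the filename referenced in the Cleanup section) and run it from the cloned fuzzy-train directory so fuzzy-train.py resolves; the script writes to /tmp/logs, so create that directory first:

mkdir -p /tmp/logs
python3 simulate_error_patterns.py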

Multi-Service Log Simulation with fuzzy-train

Docker Compose Multi-Service Setup:

# docker-compose.yml
version: '3.8'
services:
  auth-service:
    image: sagarnikam123/fuzzy-train:latest
    command: >
      --lines-per-second 10
      --log-format JSON
      --output file
      --file /logs/auth-service.log
      --trace-id-type integer
    volumes:
      - ./logs:/logs
    container_name: auth-logs

  payment-service:
    image: sagarnikam123/fuzzy-train:latest
    command: >
      --lines-per-second 15
      --log-format logfmt
      --output file
      --file /logs/payment-service.log
      --trace-id-type integer
    volumes:
      - ./logs:/logs
    container_name: payment-logs

  user-service:
    image: sagarnikam123/fuzzy-train:latest
    command: >
      --lines-per-second 8
      --log-format "apache combined"
      --output file
      --file /logs/user-service.log
    volumes:
      - ./logs:/logs
    container_name: user-logs

  notification-service:
    image: sagarnikam123/fuzzy-train:latest
    command: >
      --lines-per-second 5
      --log-format syslog
      --output file
      --file /logs/notification-service.log
      --no-trace-id
    volumes:
      - ./logs:/logs
    container_name: notification-logs

Start Multi-Service Simulation:

# Create logs directory
mkdir -p logs

# Start all services
docker-compose up -d

# Monitor log generation
tail -f logs/*.log

# Stop all services
docker-compose down

Kubernetes Multi-Service Deployment:

# Deploy multiple services with different log patterns
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: microservices-logs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: microservices-logs
  template:
    metadata:
      labels:
        app: microservices-logs
    spec:
      containers:
      - name: auth-service
        image: sagarnikam123/fuzzy-train:latest
        args: ["--lines-per-second", "10", "--log-format", "JSON", "--trace-id-type", "integer"]
      - name: payment-service
        image: sagarnikam123/fuzzy-train:latest
        args: ["--lines-per-second", "15", "--log-format", "logfmt", "--trace-id-type", "integer"]
      - name: user-service
        image: sagarnikam123/fuzzy-train:latest
        args: ["--lines-per-second", "8", "--log-format", "apache combined"]
      - name: notification-service
        image: sagarnikam123/fuzzy-train:latest
        args: ["--lines-per-second", "5", "--log-format", "syslog", "--no-trace-id"]
EOF

Service-Specific Log Generation Script:

#!/bin/bash
# multi-service-logs.sh

SERVICES=("auth:JSON:10" "payment:logfmt:15" "user:syslog:8" "notification:apache combined:5")

for service_config in "${SERVICES[@]}"; do
    IFS=':' read -r service format rate <<< "$service_config"

    echo "Starting $service service log generation..."

    python3 fuzzy-train.py \
        --lines-per-second "$rate" \
        --log-format "$format" \
        --output file \
        --file "/var/log/${service}-service.log" \
        --trace-id-type integer &

    echo "$service service started with PID $!"
done

echo "All services started. Press Ctrl+C to stop."
wait

Testing Scenarios

Load Testing

  • Generate 1000+ logs per second
  • Test log rotation and archival
  • Verify system performance under load
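
A rough way to confirm the achieved rate on the generator side is to count how many lines are appended over a fixed window (a sketch; adjust the path to whichever output file you are generating):

# Approximate lines-per-second for a growing log file
FILE=/tmp/logs/high-volume.log
BEFORE=$(wc -l < "$FILE")
sleep 10
AFTER=$(wc -l < "$FILE")
echo "$(( (AFTER - BEFORE) / 10 )) lines/sec"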

Alert Testing

  • Generate specific error patterns
  • Test threshold-based alerts
  • Verify notification systems
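
If logs are flowing into Loki as in the shipper configurations above, a threshold-style alert can be exercised with a LogQL metric query; the sketch below assumes the Fluent-bit labels (service_name=fluent-bit) and that the generated lines contain a log level with "ERROR":

# Count ERROR lines over the last 5 minutes via the Loki instant-query API
curl -s -G http://127.0.0.1:3100/loki/api/v1/query \
  --data-urlencode 'query=count_over_time({service_name="fluent-bit"} |= "ERROR" [5m])'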

Parser Testing

  • Create logs with various formats
  • Test regex patterns
  • Validate field extraction
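
As a minimal parser check, a line in the Apache common format shown earlier can be matched against the same kind of regex you would later wire into your shipper (a Python sketch using a generic Apache common log pattern, not something specific to fuzzy-train or flog):

import re

# Generic Apache common log regex: host ident user [time] "request" status bytes
PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)$'
)

line = '127.0.0.1 - frank [10/Oct/2025:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
match = PATTERN.match(line)
if match:
    print(match.groupdict())  # extracted fields: host, user, time, status, ...
else:
    print("line did not match the expected format")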

Best Practices

  1. Start Small: Begin with low-volume generation
  2. Monitor Resources: Watch disk space and CPU usage
  3. Clean Up: Implement log rotation and cleanup (see the logrotate sketch after this list)
  4. Realistic Data: Use realistic timestamps and patterns
  5. Version Control: Keep your log generation scripts in Git
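
For the cleanup point above, a small logrotate rule keeps generated files from filling the disk during longer runs (a sketch; adjust paths and thresholds to your environment):

# /etc/logrotate.d/fake-logs (sketch)
/tmp/logs/*.log {
    size 100M
    rotate 3
    compress
    missingok
    notifempty
}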

Cleanup

# Stop all fuzzy-train Docker containers
docker stop fuzzy-train-generator apache-log-generator fuzzy-train-log-generator
docker stop high-volume-generator normal-ops auth-logs payment-logs user-logs notification-logs
docker rm fuzzy-train-generator apache-log-generator fuzzy-train-log-generator
docker rm high-volume-generator normal-ops auth-logs payment-logs user-logs notification-logs

# Stop flog containers
docker stop flog-generator
docker rm flog-generator

# Stop volume generation containers
for i in {1..5}; do
  docker stop volume-gen-$i 2>/dev/null && docker rm volume-gen-$i 2>/dev/null
done

# Stop multi-service Docker Compose setup
docker-compose down

# Stop Python script processes
pkill -f "fuzzy-train.py"
pkill -f "flog"
pkill -f "simulate_error_patterns"
pkill -f "multi-service-logs.sh"

# Stop log shipping agents
pkill -f "fluent-bit"
pkill -f "vector"
pkill -f "alloy"

# Clean up log files and directories
rm -rf /tmp/logs/
rm -rf $HOME/data/log/logger/
rm -f /var/log/massive-load.log
rm -f /var/log/parallel-*.log
rm -f /var/log/*-service.log
rm -f /var/log/pattern-*.log
rm -rf logs/  # Docker Compose logs directory

# Clean up fuzzy-train repository
rm -rf fuzzy-train/

# Clean up configuration files
rm -f fluent-bit-local-fs-json-loki.yaml
rm -f vector-local-fs-json-loki.yaml
rm -f alloy-local-fs-json-loki.alloy
rm -f docker-compose.yml
rm -f multi-service-logs.sh
rm -f simulate_error_patterns.py

# Clean up Kubernetes deployments
kubectl delete -f fuzzy-train-file.yaml 2>/dev/null
kubectl delete -f fuzzy-train-stdout.yaml 2>/dev/null
kubectl delete daemonset fuzzy-train-volume 2>/dev/null
kubectl delete deployment microservices-logs 2>/dev/null

# Clean up downloaded YAML files
rm -f fuzzy-train-file.yaml fuzzy-train-stdout.yaml

# Clean up flog binary (if installed locally)
rm -f /usr/local/bin/flog
rm -rf flog_*

# Verify cleanup
echo "Cleanup completed. Checking for remaining processes..."
ps aux | grep -E "(fuzzy-train|flog|fluent-bit|vector|alloy)" | grep -v grep || echo "No remaining processes found."

Summary

Generating fake logs is essential for testing log aggregation systems. This guide covered:

  • Tools: fuzzy-train and flog for different use cases
  • Deployment: Docker, Kubernetes, and Python script options
  • Log Shipping: Fluent-bit, Vector.dev, and Grafana Alloy configurations
  • Advanced Scenarios: High-volume generation, error simulation, and multi-service setups
  • Best Practices: Resource monitoring, cleanup, and realistic testing

Start with simple tools like fuzzy-train for basic testing, then move to advanced scenarios for comprehensive log aggregation system validation. Always monitor system resources and clean up after testing to maintain a healthy development environment.

This post is licensed under CC BY 4.0 by the author.