Use Cases
Testing Log Aggregation Systems
- Cloud-Native Platforms: Grafana Loki, Elastic Stack (ELK), SigNoz, OpenSearch, Quickwit
- Enterprise Solutions: Splunk, Datadog, New Relic, Sumo Logic, Logz.io, Honeycomb
- Cloud Services: AWS CloudWatch, Azure Monitor, Google Cloud Logging, Papertrail
- Open Source: Rsyslog, Syslog-ng, Apache Flume, VictoriaLogs
- Performance Testing: Verify ingestion rates, query performance, and storage efficiency
- Scalability Testing: Test system behavior under high log volumes
Validating Log Shipping Agents
- Log Collectors: Fluent-bit, Grafana Alloy, Vector.dev, Promtail, Fluentd, Filebeat, Logstash, OpenTelemetry Collector, Telegraf
- Configuration Testing: Verify parsing rules, filtering, and routing logic
- Reliability Testing: Test agent behavior during network failures or high load
Development & Operations
- Parser Development: Test regex patterns and log parsing rules
- Alert System Testing: Generate specific patterns to trigger monitoring alerts
- Dashboard Development: Create realistic data for visualization testing
- Load Testing: Simulate disk I/O and system resource usage
- Training & Demos: Provide realistic data for learning environments
1. fuzzy-train - Versatile Log Generator
A fake log generator for testing and development that runs anywhere: as a Python script, in Docker, or on Kubernetes.
Features:
- Multiple Formats: JSON, logfmt, Apache (common/combined/error), BSD syslog (RFC3164), Syslog (RFC5424)
- Smart Tracking: trace_id with PID/Container ID or incremental integers for multi-instance tracking
- Flexible Output: stdout, file, or both simultaneously
- Smart File Handling: Auto-creates directories and default filename
- Container-Aware: Uses container/pod identifiers in containerized environments
- Field Control: Optional timestamp, log level, length, and trace_id fields
Python Script Usage:
```bash
# Clone repository
git clone https://github.com/sagarnikam123/fuzzy-train
cd fuzzy-train

# Default JSON logs (90-100 chars, 1 line/sec)
python3 fuzzy-train.py

# Apache common with custom parameters
python3 fuzzy-train.py \
  --min-log-length 100 \
  --max-log-length 200 \
  --lines-per-second 5 \
  --log-format "apache common" \
  --time-zone UTC \
  --output file

# Logfmt with simple trace IDs
python3 fuzzy-train.py \
  --log-format logfmt \
  --trace-id-type integer

# Clean logs (no metadata)
python3 fuzzy-train.py \
  --no-timestamp \
  --no-log-level \
  --no-length \
  --no-trace-id

# Output to both stdout and file
python3 fuzzy-train.py --output stdout --file fuzzy-train.log
```
Docker Usage:
```bash
# Quick start with defaults
docker pull sagarnikam123/fuzzy-train:latest
docker run --rm sagarnikam123/fuzzy-train:latest

# Run in background
docker run -d --name fuzzy-train-log-generator sagarnikam123/fuzzy-train:latest \
  --lines-per-second 2 --log-format JSON

# Logfmt logs written to a mounted volume
docker run --rm -v "$(pwd)":/logs sagarnikam123/fuzzy-train:latest \
  --min-log-length 180 \
  --max-log-length 200 \
  --lines-per-second 2 \
  --time-zone UTC \
  --log-format logfmt \
  --output file \
  --file /logs/fuzzy-train.log

# Syslog output to file for load testing
docker run --rm sagarnikam123/fuzzy-train:latest \
  --lines-per-second 10 \
  --log-format syslog \
  --time-zone UTC \
  --output file
```
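To confirm a detached generator is actually emitting logs, follow its container output:

```bash
# Follow the background generator started above
docker logs -f fuzzy-train-log-generator
```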
Kubernetes Deployment:
```bash
# Download YAML files
wget https://raw.githubusercontent.com/sagarnikam123/fuzzy-train/refs/heads/main/fuzzy-train-file.yaml
wget https://raw.githubusercontent.com/sagarnikam123/fuzzy-train/refs/heads/main/fuzzy-train-stdout.yaml

# Deploy to Kubernetes cluster
kubectl apply -f fuzzy-train-file.yaml
kubectl apply -f fuzzy-train-stdout.yaml

# Check running pods
kubectl get pods -l app=fuzzy-train
```
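Once the pods are Running, you can spot-check the generated stream (assuming the manifests label pods with app=fuzzy-train, as the selector above suggests):

```bash
# Stream logs from all fuzzy-train pods
kubectl logs -l app=fuzzy-train -f
```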
2. flog - Fast Log Generator
A fake log generator for common log formats including Apache, syslog, and JSON. Useful for testing log streams and data pipelines.
Supported Formats: `apache_common` (default), `apache_combined`, `apache_error`, `rfc3164` (syslog), `rfc5424` (syslog), `json`
Output Types: `stdout` (default), `log` (file), `gz` (gzip compressed)
Installation Options:
```bash
# Using go install (recommended)
go install github.com/mingrammer/flog@latest

# Using Homebrew
brew tap mingrammer/flog
brew install flog

# Using a pre-built binary
# macOS
curl -O -L "https://github.com/mingrammer/flog/releases/download/v0.4.4/flog_0.4.4_darwin_amd64.tar.gz"
tar -xvzf flog_0.4.4_darwin_amd64.tar.gz
cd flog_0.4.4_darwin_amd64

# Linux
curl -O -L "https://github.com/mingrammer/flog/releases/download/v0.4.4/flog_0.4.4_linux_amd64.tar.gz"
tar -xvzf flog_0.4.4_linux_amd64.tar.gz
cd flog_0.4.4_linux_amd64
chmod +x ./flog
sudo mv ./flog /usr/local/bin/
```
Command Line Usage:
```bash
# Generate 1000 logs (the -n default) to stdout (the default output)
flog

# Generate 200 logs with a time interval (-s) between entries and a delay (-d)
flog -s 10s -n 200 -d 3s

# Apache combined format (-f), overwriting (-w) the output file (-o)
flog -t log -f apache_combined -w -o apache.log

# Continuous generation with loop mode (-l)
flog -f rfc3164 -l
```
Advanced Options:
```bash
# Generate logs by size (bytes) instead of line count
flog -b 10485760 -f json -o large.log

# Split logs every 1 MB (-p) with gzip compression
flog -t gz -o log.gz -b 10485760 -p 1048576

# Write logs into a nested directory path (flog creates it)
flog -t log -f apache_combined -o web/log/apache.log -n 5000
```
Docker Usage:
```bash
# Basic Docker run (interactive)
docker run -it --rm mingrammer/flog

# Generate logs to stdout with custom parameters
docker run --rm mingrammer/flog -f apache_combined -n 500

# Generate logs to file with volume mount
docker run --rm -v "$(pwd)":/logs mingrammer/flog -t log -o /logs/apache.log -n 1000

# Continuous log generation in background
docker run -d --name flog-generator mingrammer/flog -f json -l

# High-volume generation with gzip compression (-b takes bytes; 52428800 is ~50 MB)
docker run --rm -v "$(pwd)":/logs mingrammer/flog -t gz -o /logs/large.log.gz -b 52428800
```
Step-by-Step Implementation
Step 1: Choose a Log Format
Decide on the log format based on your testing needs (a preview sketch follows this list):
- Apache Common Log Format: Web server testing
- JSON: Modern microservices
- Syslog: System-level testing
- Logfmt: Structured key-value logs
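Before committing to a format, it helps to eyeball a few sample lines of each candidate. A minimal sketch, assuming the fuzzy-train format names used in this guide's examples (JSON, logfmt, "apache common", syslog) and its default stdout output:

```bash
# Print three sample lines per candidate format, then move on
for fmt in JSON logfmt "apache common" syslog; do
  echo "--- $fmt ---"
  python3 fuzzy-train.py --log-format "$fmt" | head -n 3
done
```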
Step 2: Set Up Log Generation with fuzzy-train
Using Docker (Recommended)
```bash
# Generate JSON logs to file
docker run -d --name fuzzy-train-generator \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --log-format JSON \
  --lines-per-second 5 \
  --output file \
  --file /logs/fuzzy-train.log

# Generate Apache combined logs
docker run -d --name apache-log-generator \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --log-format "apache combined" \
  --lines-per-second 10 \
  --output file \
  --file /logs/apache.log
```
Using Python Script
```bash
# Clone and set up fuzzy-train
git clone https://github.com/sagarnikam123/fuzzy-train
cd fuzzy-train

# Generate logs to file
python3 fuzzy-train.py \
  --log-format JSON \
  --lines-per-second 5 \
  --output file \
  --file $HOME/data/log/logger/fuzzy-train.log  # change as per your directory structure
```
Step 3: Configure a Log Shipping Agent
Fluent-bit Configuration
```bash
fluent-bit --config=fluent-bit-local-fs-json-loki.yaml  # run Fluent-bit
```
```yaml
# fluent-bit-local-fs-json-loki.yaml
service:
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020

parsers:
  - name: json
    format: json
    time_key: timestamp                   # check your log lines, this may be "time"
    time_format: "%Y-%m-%dT%H:%M:%S.%LZ"  # or "%Y-%m-%dT%H:%M:%S.%L%z"
    time_keep: on

pipeline:
  inputs:
    - name: tail
      path: /Users/snikam/data/log/logger/*.log  # change according to where your .log file is present
      read_from_head: false
      refresh_interval: 10
      ignore_older: 1h
      tag: local.*
      parser: json
  outputs:
    - name: loki
      match: '*'
      host: 127.0.0.1
      port: 3100
      labels: service_name=fluent-bit, source=fuzzy-train-log
```
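To verify logs are landing in Loki, you can query its HTTP API directly; a quick check, assuming Loki's default port and the labels configured above:

```bash
# Ask Loki for recent entries carrying the service_name label set above
curl -sG "http://127.0.0.1:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={service_name="fluent-bit"}' \
  --data-urlencode 'limit=5'
```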
Vector.dev Configuration
```bash
vector validate config/vector-local-fs-json-loki.yaml  # validate the configuration
vector --config=config/vector-local-fs-json-loki.yaml  # run Vector
```
```yaml
# vector-local-fs-json-loki.yaml
data_dir: $HOME/data/vector  # set if no global data_dir is set

sources:
  fuzzy_logs:
    type: file
    include:
      - $HOME/data/log/logger/*.log  # change according to your file path
    read_from: beginning
    encoding:
      charset: utf-8

transforms:
  parse_logs:
    type: remap
    inputs:
      - fuzzy_logs
    source: |
      . = parse_json!(.message)  # make sure the raw line arrives in the "message" field

sinks:
  loki_sink:
    type: loki
    inputs:
      - parse_logs
    endpoint: http://127.0.0.1:3100  # change as per your Loki endpoint
    encoding:
      codec: json
    healthcheck:
      enabled: true
    labels:
      service_name: fuzzy-train
      source: fuzzy-train-log

api:  # optional
  enabled: true
  address: 127.0.0.1:8686  # visit http://127.0.0.1:8686/playground
  playground: true
```
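With the api section enabled, vector top shows live per-component event counts, which makes it easy to confirm the parse_logs transform is actually emitting events:

```bash
# Live component throughput; uses the API address configured above
vector top
```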
Grafana Alloy Configuration
```bash
# Run Alloy (the binary name depends on how you installed it)
alloy-darwin-amd64 run config/alloy-local-fs-json-loki.alloy  # macOS binary
alloy-1.10.1 run config/alloy-local-fs-json-loki.alloy        # versioned binary
# visit the UI at http://127.0.0.1:12345/
```
```alloy
livedebugging {
  enabled = true
}

local.file_match "local_files" {
  path_targets = [{"__path__" = "/Users/snikam/data/log/logger/*.log", "job" = "alloy", "hostname" = constants.hostname}]
  sync_period  = "5s"
}

loki.source.file "log_scrape" {
  targets       = local.file_match.local_files.targets
  forward_to    = [loki.write.local_loki.receiver]
  tail_from_end = true
}

loki.write "local_loki" {
  endpoint {
    url = "http://127.0.0.1:3100/loki/api/v1/push"
  }
}
```
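The same Loki API check works here; the file_match component attaches job="alloy" to every target, so that label is the natural filter:

```bash
# Confirm Alloy-shipped logs arrived in Loki
curl -sG "http://127.0.0.1:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="alloy"}' \
  --data-urlencode 'limit=5'
```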
Advanced Techniques
High-Volume Log Generation with fuzzy-train
Docker High-Volume Generation:
```bash
# Generate 100 logs per second for load testing
docker run -d --name high-volume-generator \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --lines-per-second 100 \
  --log-format JSON \
  --output file \
  --file /logs/high-volume.log

# Multiple containers for extreme load
for i in {1..5}; do
  docker run -d --name volume-gen-$i \
    -v /tmp/logs:/logs \
    sagarnikam123/fuzzy-train:latest \
    --lines-per-second 50 \
    --log-format JSON \
    --output file \
    --file /logs/volume-$i.log
done

# Kubernetes DaemonSet for cluster-wide generation
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fuzzy-train-volume
spec:
  selector:
    matchLabels:
      app: fuzzy-train-volume
  template:
    metadata:
      labels:
        app: fuzzy-train-volume
    spec:
      containers:
      - name: fuzzy-train
        image: sagarnikam123/fuzzy-train:latest
        args:
        - "--lines-per-second"
        - "200"
        - "--log-format"
        - "JSON"
        - "--output"
        - "file"
        - "--file"
        - "/logs/node-volume.log"
        volumeMounts:
        - name: log-volume
          mountPath: /logs
      volumes:
      - name: log-volume
        hostPath:
          path: /var/log/fuzzy-train
EOF
```
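High-volume runs fill disks fast. A simple watch loop keeps log growth and container resource usage visible while the generators run (a sketch, assuming the /tmp/logs path used above):

```bash
# Refresh directory size and container CPU/memory every 5 seconds
watch -n 5 'du -sh /tmp/logs; docker stats --no-stream --format "{{.Name}}: {{.CPUPerc}} {{.MemUsage}}"'
```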
Python Script High-Volume:
```bash
# Generate massive logs with different formats
# (writing under /var/log typically requires root)
python3 fuzzy-train.py \
  --lines-per-second 500 \
  --log-format JSON \
  --min-log-length 200 \
  --max-log-length 500 \
  --output file \
  --file /var/log/massive-load.log

# Parallel generation with different trace IDs
for i in {1..10}; do
  python3 fuzzy-train.py \
    --lines-per-second 50 \
    --log-format logfmt \
    --trace-id-type integer \
    --output file \
    --file /var/log/parallel-$i.log &
done
```
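The loop above backgrounds ten generator processes in the current shell; stop them from the same shell when the test is done:

```bash
# Kill all generators backgrounded from this shell session
kill $(jobs -p)
```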
Error Pattern Simulation with fuzzy-train
Simulating Error Bursts:
```bash
# Normal operation logs
docker run -d --name normal-ops \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --lines-per-second 5 \
  --log-format JSON \
  --output file \
  --file /logs/normal.log

# Simulate error burst (high frequency for 2 minutes)
sleep 30  # Normal operation for 30 seconds
docker run --rm \
  -v /tmp/logs:/logs \
  sagarnikam123/fuzzy-train:latest \
  --lines-per-second 50 \
  --log-format JSON \
  --output file \
  --file /logs/error-burst.log &

# Stop the burst after 2 minutes
# (note: filtering by ancestor stops every fuzzy-train container, including normal-ops)
sleep 120
docker stop $(docker ps -q --filter ancestor=sagarnikam123/fuzzy-train:latest)
```
Custom Error Pattern Script:
```python
#!/usr/bin/env python3
import subprocess
import time

def simulate_error_patterns():
    patterns = [
        # Normal operation
        {"rate": 2, "duration": 60, "format": "JSON"},
        # Error spike
        {"rate": 20, "duration": 30, "format": "JSON"},
        # Recovery period
        {"rate": 5, "duration": 45, "format": "JSON"},
        # Critical failure
        {"rate": 100, "duration": 15, "format": "syslog"},
    ]
    for i, pattern in enumerate(patterns):
        print(f"Starting pattern {i+1}: {pattern['rate']} logs/sec for {pattern['duration']}s")
        process = subprocess.Popen([
            "python3", "fuzzy-train.py",
            "--lines-per-second", str(pattern["rate"]),
            "--log-format", pattern["format"],
            "--output", "file",
            "--file", f"/tmp/logs/pattern-{i+1}.log",
        ])
        time.sleep(pattern["duration"])
        process.terminate()
        time.sleep(2)  # Brief pause between patterns

if __name__ == "__main__":
    simulate_error_patterns()
```
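Save this as simulate_error_patterns.py inside the fuzzy-train checkout (so fuzzy-train.py resolves) and run it; the pattern files land in /tmp/logs/:

```bash
mkdir -p /tmp/logs
python3 simulate_error_patterns.py
```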
Multi-Service Log Simulation with fuzzy-train
Docker Compose Multi-Service Setup:
```yaml
# docker-compose.yml
version: '3.8'
services:
  auth-service:
    image: sagarnikam123/fuzzy-train:latest
    command: >
      --lines-per-second 10
      --log-format JSON
      --output file
      --file /logs/auth-service.log
      --trace-id-type integer
    volumes:
      - ./logs:/logs
    container_name: auth-logs
  payment-service:
    image: sagarnikam123/fuzzy-train:latest
    command: >
      --lines-per-second 15
      --log-format logfmt
      --output file
      --file /logs/payment-service.log
      --trace-id-type integer
    volumes:
      - ./logs:/logs
    container_name: payment-logs
  user-service:
    image: sagarnikam123/fuzzy-train:latest
    command: >
      --lines-per-second 8
      --log-format "apache combined"
      --output file
      --file /logs/user-service.log
    volumes:
      - ./logs:/logs
    container_name: user-logs
  notification-service:
    image: sagarnikam123/fuzzy-train:latest
    command: >
      --lines-per-second 5
      --log-format syslog
      --output file
      --file /logs/notification-service.log
      --no-trace-id
    volumes:
      - ./logs:/logs
    container_name: notification-logs
```
Start Multi-Service Simulation:
```bash
# Create logs directory
mkdir -p logs

# Start all services
docker-compose up -d

# Monitor log generation
tail -f logs/*.log

# Stop all services
docker-compose down
```
Kubernetes Multi-Service Deployment:
```bash
# Deploy multiple services with different log patterns
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: microservices-logs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: microservices-logs
  template:
    metadata:
      labels:
        app: microservices-logs
    spec:
      containers:
      - name: auth-service
        image: sagarnikam123/fuzzy-train:latest
        args: ["--lines-per-second", "10", "--log-format", "JSON", "--trace-id-type", "integer"]
      - name: payment-service
        image: sagarnikam123/fuzzy-train:latest
        args: ["--lines-per-second", "15", "--log-format", "logfmt", "--trace-id-type", "integer"]
      - name: user-service
        image: sagarnikam123/fuzzy-train:latest
        args: ["--lines-per-second", "8", "--log-format", "apache combined"]
      - name: notification-service
        image: sagarnikam123/fuzzy-train:latest
        args: ["--lines-per-second", "5", "--log-format", "syslog", "--no-trace-id"]
EOF
```
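Each simulated service runs as a separate container in the pod, so its stream can be inspected individually:

```bash
# Follow a single service's stream from the multi-container pod
kubectl logs deployment/microservices-logs -c payment-service -f
```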
Service-Specific Log Generation Script:
```bash
#!/bin/bash
# multi-service-logs.sh
# Each entry is service:format:rate (format names as fuzzy-train expects them)
SERVICES=("auth:JSON:10" "payment:logfmt:15" "user:syslog:8" "notification:apache combined:5")

for service_config in "${SERVICES[@]}"; do
  IFS=':' read -r service format rate <<< "$service_config"
  echo "Starting $service service log generation..."
  python3 fuzzy-train.py \
    --lines-per-second "$rate" \
    --log-format "$format" \
    --output file \
    --file "/var/log/${service}-service.log" \
    --trace-id-type integer &
  echo "$service service started with PID $!"
done

echo "All services started. Press Ctrl+C to stop."
wait
```
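A quick way to run it; writing under /var/log usually requires root:

```bash
chmod +x multi-service-logs.sh
sudo ./multi-service-logs.sh
```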
Testing Scenarios
Load Testing
- Generate 1000+ logs per second (see the throughput check after these lists)
- Test log rotation and archival
- Verify system performance under load
Alert Testing
- Generate specific error patterns
- Test threshold-based alerts
- Verify notification systems
Parser Testing
- Create logs with various formats
- Test regex patterns
- Validate field extraction
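For the load-testing scenario, flog's loop mode piped through pv gives an immediate line-rate readout before you point the stream at a real pipeline (a sketch, assuming pv is installed):

```bash
# pv -l counts lines, -r shows the current rate (lines/sec)
flog -f json -l | pv -l -r > /dev/null
```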
Best Practices
- Start Small: Begin with low-volume generation
- Monitor Resources: Watch disk space and CPU usage
- Clean Up: Implement log rotation and cleanup (see the logrotate sketch below)
- Realistic Data: Use realistic timestamps and patterns
- Version Control: Keep your log generation scripts in Git
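For the clean-up point, a minimal logrotate policy for the generated files might look like this (a sketch, assuming the /tmp/logs directory used throughout this guide):

```bash
# Install a rotation policy for generated test logs (requires root)
sudo tee /etc/logrotate.d/fake-logs > /dev/null <<'EOF'
/tmp/logs/*.log {
    size 100M
    rotate 3
    compress
    missingok
    notifempty
}
EOF

# Dry-run to verify the policy parses
sudo logrotate -d /etc/logrotate.d/fake-logs
```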
Cleanup
```bash
# Stop all fuzzy-train Docker containers
docker stop fuzzy-train-generator apache-log-generator fuzzy-train-log-generator
docker stop high-volume-generator normal-ops auth-logs payment-logs user-logs notification-logs
docker rm fuzzy-train-generator apache-log-generator fuzzy-train-log-generator
docker rm high-volume-generator normal-ops auth-logs payment-logs user-logs notification-logs

# Stop flog containers
docker stop flog-generator
docker rm flog-generator

# Stop volume generation containers
for i in {1..5}; do
  docker stop volume-gen-$i 2>/dev/null && docker rm volume-gen-$i 2>/dev/null
done

# Stop multi-service Docker Compose setup
docker-compose down

# Stop Python script processes
pkill -f "fuzzy-train.py"
pkill -f "flog"
pkill -f "simulate_error_patterns"
pkill -f "multi-service-logs.sh"

# Stop log shipping agents
pkill -f "fluent-bit"
pkill -f "vector"
pkill -f "alloy"

# Clean up log files and directories
rm -rf /tmp/logs/
rm -rf $HOME/data/log/logger/
rm -f /var/log/massive-load.log
rm -f /var/log/parallel-*.log
rm -f /var/log/*-service.log
rm -f /var/log/pattern-*.log
rm -rf logs/  # Docker Compose logs directory

# Clean up fuzzy-train repository
rm -rf fuzzy-train/

# Clean up configuration files
rm -f fluent-bit-local-fs-json-loki.yaml
rm -f vector-local-fs-json-loki.yaml
rm -f alloy-local-fs-json-loki.alloy
rm -f docker-compose.yml
rm -f multi-service-logs.sh
rm -f simulate_error_patterns.py

# Clean up Kubernetes deployments
kubectl delete -f fuzzy-train-file.yaml 2>/dev/null
kubectl delete -f fuzzy-train-stdout.yaml 2>/dev/null
kubectl delete daemonset fuzzy-train-volume 2>/dev/null
kubectl delete deployment microservices-logs 2>/dev/null

# Clean up downloaded YAML files
rm -f fuzzy-train-file.yaml fuzzy-train-stdout.yaml

# Clean up flog binary (if installed locally; it was moved into place with sudo)
sudo rm -f /usr/local/bin/flog
rm -rf flog_*

# Verify cleanup
echo "Cleanup completed. Checking for remaining processes..."
ps aux | grep -E "(fuzzy-train|flog|fluent-bit|vector|alloy)" | grep -v grep || echo "No remaining processes found."
```
Summary
Generating fake logs is essential for testing log aggregation systems. This guide covered:
- Tools: fuzzy-train and flog for different use cases
- Deployment: Docker, Kubernetes, and Python script options
- Log Shipping: Fluent-bit, Vector.dev, and Grafana Alloy configurations
- Advanced Scenarios: High-volume generation, error simulation, and multi-service setups
- Best Practices: Resource monitoring, cleanup, and realistic testing
Start with simple tools like fuzzy-train for basic testing, then move to advanced scenarios for comprehensive log aggregation system validation. Always monitor system resources and clean up after testing to maintain a healthy development environment.