title: Log Compression CLI Reference
description: Complete CLI reference for LogStrip – flags, exit codes, I/O contract, stats, JSON output. Extend with .logstrip.yml: custom sources, diagnostic patterns, ignore rules, sanitization, and internal stack patterns.

CLI Reference

The CLI is the primary distribution channel for LogStrip. It is published to npm as logstrip and exposes two binaries:

Binary	Purpose
`logstrip`	Short alias - preferred when the name is free.
`logstrip`	Verbose alias - useful when `logstrip` is already taken.

Both binaries point at the same compiled entry: dist/cli/index.js.

Synopsis

logstrip [INPUT] [options]

Arguments

Argument	Description
`INPUT`	Path to a raw log file. When omitted, the CLI reads from `stdin`.

Options

Flag	Description	Default
`-o`, `--output <path>`	Write the compressed log to `<path>`. When omitted, the compressed log is written to `stdout`.	(stdout)
`-a`, `--aggressiveness <level>`	Compression preset: `low`, `medium`, `high`, `aggressive`, `auto`.	`auto`
`-s`, `--stats`	Print compression statistics to `stderr` after the log has been processed.	off
`-j`, `--json`	Print the `LogStripResult` as JSON to `stdout`. Requires `--output` so the compressed log does not collide with the report.	off
`-h`, `--help`	Print the help text and exit.	-
`-v`, `--version`	Print the CLI version and exit.	-
`--config <path>`	Path to a `.logstrip.yml` custom config file. When omitted, the CLI auto-detects `.logstrip.yml` in the current working directory.	(auto)

I/O contract

The compressed log goes to --output when set, otherwise to stdout.
Stats (--stats) always go to stderr so they never collide with the compressed log on stdout.
The JSON report (--json) always goes to stdout. To prevent contamination, --json requires --output.
When INPUT is omitted and stdin is a terminal (TTY), the CLI exits with code 2 rather than waiting forever for input.
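
The routing rules above can be sketched as a small pure function. This is an illustration only: `RoutingOptions`, `Routing`, and `routeOutputs` are hypothetical names, not part of the LogStrip API, and the sketch assumes the options have already passed validation (so `--json` is never set without `--output`).

```typescript
// Hypothetical option shape for illustration; not the CLI's internal type.
interface RoutingOptions {
  output?: string; // value of -o / --output, if given
  stats: boolean;  // -s / --stats
  json: boolean;   // -j / --json
}

interface Routing {
  compressedLog: string;       // a file path, or the literal 'stdout'
  stats: 'stderr' | null;      // stats never share a stream with the log
  jsonReport: 'stdout' | null; // the JSON report always targets stdout
}

// Decide where each artifact goes, following the I/O contract.
function routeOutputs(opts: RoutingOptions): Routing {
  return {
    compressedLog: opts.output !== undefined ? opts.output : 'stdout',
    stats: opts.stats ? 'stderr' : null,
    jsonReport: opts.json ? 'stdout' : null,
  };
}

// logstrip raw.log --stats > clean.log
const piped = routeOutputs({ stats: true, json: false });
// logstrip raw.log -o clean.log --json
const reported = routeOutputs({ output: 'clean.log', stats: false, json: true });
```

Because the compressed log moves to a file whenever the JSON report is requested, no two artifacts ever share `stdout`.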

Exit codes

Code	Meaning
`0`	Success.
`1`	Runtime failure (file not found, stream error, internal exception).
`2`	Usage error (unknown flag, unsupported aggressiveness, `--json` without `--output`, stdin is a TTY).
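
As a rough sketch of how the usage errors in the table map to exit code `2`, the checks can be written as one preflight function. `ParsedArgs`, `findUsageError`, and `preflightExitCode` are hypothetical names, and the error messages are invented for illustration. Unknown flags are not modelled, on the assumption that they are rejected earlier, during argument parsing.

```typescript
// Presets accepted by -a / --aggressiveness, as listed under Options.
const LEVELS: string[] = ['low', 'medium', 'high', 'aggressive', 'auto'];

// Hypothetical shape of already-parsed arguments.
interface ParsedArgs {
  input?: string;
  output?: string;
  aggressiveness: string;
  json: boolean;
  stdinIsTTY: boolean;
}

// Returns a description of the first usage error, or null when there is none.
function findUsageError(args: ParsedArgs): string | null {
  if (LEVELS.indexOf(args.aggressiveness) === -1) {
    return `unsupported aggressiveness: ${args.aggressiveness}`;
  }
  if (args.json && args.output === undefined) {
    return '--json requires --output';
  }
  if (args.input === undefined && args.stdinIsTTY) {
    return 'no INPUT given and stdin is a terminal';
  }
  return null;
}

// 2 for a usage error; null means processing may start (and later end in 0 or 1).
function preflightExitCode(args: ParsedArgs): 2 | null {
  return findUsageError(args) === null ? null : 2;
}
```

A caller can treat `2` as "fix the invocation" and `1` as "retry or inspect the environment", which is the distinction the table draws.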

Recipes

File in, file out

logstrip raw.log -o clean.log

Stdin in, stdout out

cat raw.log | logstrip > clean.log

PowerShell:

Get-Content raw.log | logstrip > clean.log

Stats alongside content

logstrip raw.log --stats > clean.log
# compressed log -> clean.log
# stats          -> stderr (visible in the terminal or CI summary)

Machine-readable report

logstrip raw.log -o clean.log --json

stdout will contain a LogStripResult object:

{
  "stats": {
    "inputLines": 4128,
    "outputLines": 312,
    "inputWords": 21450,
    "outputWords": 4138,
    "inputBytes": 412800,
    "outputBytes": 31200,
    "droppedLines": 3640,
    "duplicateLines": 87,
    "hiddenInternalStackLines": 89
  },
  "inputTokens": 27885,
  "outputTokens": 5379,
  "savedTokens": 22506,
  "savingsPercent": 80.71,
  "detectedSources": ["webpack", "npm", "kubernetes"],
  "outputPath": "clean.log"
}

detectedSources is ranked by lightweight source fingerprints gathered during streaming. It is informational and does not change the compressed log output.
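
For consumers of the report, the shape below is inferred from the sample object above; it is not copied from the package's type declarations, so treat field optionality as an assumption. The `checkReport` helper is hypothetical and only verifies the token arithmetic that the sample happens to satisfy (saved = input - output, and the percentage rounded to two decimals).

```typescript
// Field names taken from the sample report; types are inferred, not authoritative.
interface LogStripStats {
  inputLines: number;
  outputLines: number;
  inputWords: number;
  outputWords: number;
  inputBytes: number;
  outputBytes: number;
  droppedLines: number;
  duplicateLines: number;
  hiddenInternalStackLines: number;
}

interface LogStripResult {
  stats: LogStripStats;
  inputTokens: number;
  outputTokens: number;
  savedTokens: number;
  savingsPercent: number;
  detectedSources: string[];
  outputPath?: string; // assumption: absent when no output file was written
}

// Hypothetical sanity check: returns a list of inconsistencies (empty when none).
function checkReport(report: LogStripResult): string[] {
  const problems: string[] = [];
  if (report.savedTokens !== report.inputTokens - report.outputTokens) {
    problems.push('savedTokens does not equal inputTokens - outputTokens');
  }
  const expected = Math.round((report.savedTokens / report.inputTokens) * 10000) / 100;
  if (Math.abs(expected - report.savingsPercent) > 0.005) {
    problems.push('savingsPercent does not match savedTokens / inputTokens');
  }
  return problems;
}

const sampleReport: LogStripResult = {
  stats: {
    inputLines: 4128, outputLines: 312, inputWords: 21450, outputWords: 4138,
    inputBytes: 412800, outputBytes: 31200, droppedLines: 3640,
    duplicateLines: 87, hiddenInternalStackLines: 89,
  },
  inputTokens: 27885,
  outputTokens: 5379,
  savedTokens: 22506,
  savingsPercent: 80.71,
  detectedSources: ['webpack', 'npm', 'kubernetes'],
  outputPath: 'clean.log',
};
```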

Aggressiveness and context retention

--aggressiveness controls how much context survives around high-signal lines. The parser uses a hybrid scoring model instead of a single binary filter:

hard signals ([ERROR], JSON "level":"error", scanner findings, container failures, npm/yarn errors, stack frames) are emitted immediately;
nearby soft lines are kept through a small before/after context window;
repeated sanitized lines are dampened so spam eventually falls below the keep threshold;
adjacent diagnostic variants with the same stable shape are folded as delta summaries, so repeated lines like amount=99.99, amount=49.50, and amount=12.00 render as [x3] ... amount=[99.99 | 49.50 | 12.00];
aggressive still drops pure warning noise, but preserves warning lines with diagnostic keywords such as failed, timeout, refused, crashed, killed, terminated, unauthorized, and unavailable.
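
To make the delta-summary behaviour concrete, here is a minimal sketch of folding adjacent lines that share a stable shape. It is not LogStrip's implementation: it assumes that "shape" means the line with every number masked, whereas the real parser works on sanitized lines and may define shape differently. `shapeOf`, `foldRun`, and `foldDeltas` are hypothetical names.

```typescript
// Assumption: only numeric fields vary between lines of the same shape.
const NUMBER = /\d+(?:\.\d+)?/g;

// Mask every number so lines that differ only in values compare equal.
function shapeOf(line: string): string {
  return line.replace(NUMBER, '#');
}

// Collapse one run of same-shape lines into a single summary line.
function foldRun(run: string[]): string {
  if (run.length === 1) return run[0];
  const columns = run.map((line) => line.match(NUMBER) || []);
  let position = 0;
  const folded = run[0].replace(NUMBER, (first) => {
    const values = columns.map((numbers) => numbers[position]);
    position += 1;
    // Keep a value that never changes; list the variants otherwise.
    return values.every((v) => v === first) ? first : `[${values.join(' | ')}]`;
  });
  return `[x${run.length}] ${folded}`;
}

// Walk the lines once, grouping adjacent lines with an identical shape.
function foldDeltas(lines: string[]): string[] {
  const out: string[] = [];
  let run: string[] = [];
  for (const line of lines) {
    if (run.length > 0 && shapeOf(run[0]) !== shapeOf(line)) {
      out.push(foldRun(run));
      run = [];
    }
    run.push(line);
  }
  if (run.length > 0) out.push(foldRun(run));
  return out;
}

const foldedLines = foldDeltas([
  'ERROR charge failed amount=99.99',
  'ERROR charge failed amount=49.50',
  'ERROR charge failed amount=12.00',
  'ERROR gateway unreachable',
]);
// foldedLines[0] is '[x3] ERROR charge failed amount=[99.99 | 49.50 | 12.00]'
```

Only adjacent lines are folded, so the same shape appearing again later in the stream starts a new summary rather than joining an earlier one.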

Static levels

Level	Behavior
`low`	Keeps most lines including `[INFO]` and `[DEBUG]`. Minimal compression.
`medium`	Drops noise tags (`[INFO]`, `[DEBUG]`, `[TRACE]`) but keeps `[WARN]`.
`high`	Drops noise and pure warnings; keeps only diagnostic signals and their context window.
`aggressive`	Drops everything except errors, fatals, stack frames, and explicit diagnostic keywords. Maximum compression.

`auto` mode (default)

auto starts at the high static level and then adjusts dynamically based on what the parser sees in the stream:

The parser tracks a sliding window of the last 8 line decisions (kept vs dropped).
When the window contains mostly hard-keep signals (3+ errors/diagnostics), the effective level decreases toward medium: the log is signal-rich, so more context is preserved.
When the window shows many drops and repeated lines (6+ drops and repeats), the effective level increases toward aggressive: the log is mostly noise, so stricter filtering recovers more tokens.

This means auto is safe to use as the default: it preserves context in error-heavy logs and maximizes compression in noisy build output, without requiring the user to guess the right level up front.
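
The adjustment rules above can be approximated with a stateless function over the last eight decisions. This is a simplified sketch, not the parser's code: the `Decision` categories are hypothetical, drops and repeats are assumed to be counted together against the threshold of six, and the level jumps directly to `medium` or `aggressive` instead of moving gradually.

```typescript
// Hypothetical classification of each line decision.
type Decision = 'hard-keep' | 'soft-keep' | 'drop' | 'repeat';
type EffectiveLevel = 'medium' | 'high' | 'aggressive';

const WINDOW_SIZE = 8; // last 8 line decisions, as documented

function effectiveLevel(decisions: Decision[]): EffectiveLevel {
  const recent = decisions.slice(-WINDOW_SIZE);
  const hardKeeps = recent.filter((d) => d === 'hard-keep').length;
  const noise = recent.filter((d) => d === 'drop' || d === 'repeat').length;
  // 3 hard keeps plus 6 noisy lines would need 9 slots, so in a window
  // of 8 the two conditions can never both be true.
  if (hardKeeps >= 3) return 'medium';
  if (noise >= 6) return 'aggressive';
  return 'high'; // auto's starting level
}
```

With this model an empty history yields `high`, a burst of three errors inside the window relaxes filtering to `medium`, and a long stretch of dropped or repeated lines tightens it to `aggressive`.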

To pin a specific static level and disable dynamic adjustment, pass it explicitly:

logstrip raw.log -a high -o clean.log

Use inside a shell script

#!/usr/bin/env bash
set -euo pipefail

logstrip raw.log -o clean.log --json | jq '.savingsPercent'

Stats block format

When --stats is enabled the CLI writes a fixed-shape block to stderr:

LogStrip compression report
  input lines     : <int>
  output lines    : <int>
  dropped lines   : <int>
  duplicate lines : <int>
  hidden internal : <int>
  input tokens    : <int>
  output tokens   : <int>
  saved tokens    : <int>
  savings         : <float>%
  output path     : <path>   # only when --output was set

This format is stable across patch releases. If you need to parse it from shell, prefer --json instead.

Embedding in Node

If you'd rather call the CLI from JavaScript without spawning a subprocess, import the helper directly:

import { runCli } from 'logstrip/cli';

const exitCode = await runCli(['raw.log', '-o', 'clean.log', '--json'], {
  stdin: process.stdin,
  stdout: process.stdout,
  stderr: process.stderr,
  stdinIsTTY: Boolean(process.stdin.isTTY),
});

For library-style integration that returns a LogStripResult directly, use processLogFile / processLogStream instead.

Custom configuration (.logstrip.yml)

Organizations and teams that run internal tools can extend LogStrip without modifying its source code. Create a .logstrip.yml file in the repository root (or pass --config path/to/config.yml) to define custom log sources, diagnostic patterns, ignore rules, sanitization rules, and internal stack patterns that merge with the built-in set at runtime.

File format

# Custom log sources – markers are case-insensitive substrings
# matched against every line. If a source name matches a built-in
# source, the markers are merged (deduplicated).
sources:
  - name: acme-gateway
    markers:
      - acme-gateway
      - "[ACME-GW]"
  - name: acme-auth
    markers: [acme-auth-service, "[ACME-AUTH]"]

# Lines matching any of these regexes receive a +50 relevance boost,
# same as built-in DIAGNOSTIC_PATTERN.
diagnosticPatterns:
  - "ACME_ERROR_\\d+"
  - "\\bACME-FATAL\\b"

# Lines matching any of these regexes are dropped early (before
# sanitization and scoring), similar to built-in IGNORED_LOG_TAG_PATTERN.
ignorePatterns:
  - "\\bACME-HEARTBEAT\\b"
  - "\\bacme-metrics\\b"

# Each rule applies a regex replacement to every line after built-in
# sanitization. Use "flags" to control regex flags (default: "gu").
sanitizePatterns:
  - pattern: "\\bACME-USER-\\d+\\b"
    replacement: "[ACME-USER]"
  - pattern: "acme-tenant/[a-z0-9-]+"
    replacement: "acme-tenant/[ID]"
    flags: "gi"

# Lines matching any of these regexes are collapsed behind the
# [internal-stack] marker, same as built-in INTERNAL_STACK patterns.
internalStackPatterns:
  - "/opt/acme/lib/"
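
As an illustration of how the `sanitizePatterns` entries above behave, the sketch below applies each rule in order with the documented default flags of "gu". It assumes the rules are compiled with the standard JavaScript `RegExp` constructor; `SanitizeRule` and `applySanitizeRules` are hypothetical names, not LogStrip exports.

```typescript
// Mirrors one sanitizePatterns entry from .logstrip.yml.
interface SanitizeRule {
  pattern: string;
  replacement: string;
  flags?: string; // documented default: "gu"
}

// Apply every rule in order; each rule sees the output of the previous one.
function applySanitizeRules(line: string, rules: SanitizeRule[]): string {
  let result = line;
  for (const rule of rules) {
    const flags = rule.flags !== undefined ? rule.flags : 'gu';
    result = result.replace(new RegExp(rule.pattern, flags), rule.replacement);
  }
  return result;
}

const sanitizeRules: SanitizeRule[] = [
  { pattern: '\\bACME-USER-\\d+\\b', replacement: '[ACME-USER]' },
  { pattern: 'acme-tenant/[a-z0-9-]+', replacement: 'acme-tenant/[ID]', flags: 'gi' },
];

const sanitizedLine = applySanitizeRules('login ACME-USER-42 on acme-tenant/Blue-7', sanitizeRules);
// sanitizedLine is 'login [ACME-USER] on acme-tenant/[ID]'
```

The second rule only matches `Blue-7` because of its explicit `i` flag. Setting `flags` replaces the default instead of extending it, so include `g` whenever every occurrence on a line should be rewritten.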

How it works

Auto-detection – When --config is not provided, the CLI looks for .logstrip.yml in the current working directory. If the file does not exist, processing continues with built-in patterns only.
Merging – Custom sources with a name that already exists in the built-in set (e.g. docker) have their markers merged with the built-in markers. New source names are appended.
Order of application – Custom ignore patterns are checked before built-in noise-tag filtering. Custom sanitize rules run after built-in sanitization. Custom diagnostic patterns add +50 to the relevance score (same weight as built-in diagnostics). Custom internal-stack patterns are checked alongside built-in ones.
Zero runtime dependencies – The YAML subset parser is built into logstrip-config.ts and handles the constructs shown above (mappings, sequences, inline arrays, quoted and unquoted strings, comments). It does not require js-yaml or any external package.
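
The merging rule can be sketched as follows. This is a hypothetical helper, not the code in logstrip-config.ts: it assumes markers are deduplicated case-insensitively (they are matched as case-insensitive substrings), and the built-in `docker` markers used in the example are invented for illustration.

```typescript
interface SourceDef {
  name: string;
  markers: string[];
}

// Merge custom sources into the built-in list without mutating either input.
function mergeSources(builtIn: SourceDef[], custom: SourceDef[]): SourceDef[] {
  const merged: SourceDef[] = builtIn.map((s) => ({ name: s.name, markers: s.markers.slice() }));
  for (const extra of custom) {
    const index = merged.map((s) => s.name).indexOf(extra.name);
    let target: SourceDef;
    if (index === -1) {
      // New source name: append it after the built-in entries.
      target = { name: extra.name, markers: [] };
      merged.push(target);
    } else {
      target = merged[index];
    }
    for (const marker of extra.markers) {
      const key = marker.toLowerCase();
      const alreadyPresent = target.markers.some((m) => m.toLowerCase() === key);
      if (!alreadyPresent) target.markers.push(marker);
    }
  }
  return merged;
}

const mergedSources = mergeSources(
  [{ name: 'docker', markers: ['docker', 'dockerd'] }], // invented built-in markers
  [
    { name: 'docker', markers: ['Dockerd', '[DOCKER-CI]'] },
    { name: 'acme-ci', markers: ['acme-ci-runner'] },
  ],
);
// docker gains only '[DOCKER-CI]'; acme-ci is appended as a new source.
```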

Example: internal CI platform

# .logstrip.yml – Acme Corp CI extension
sources:
  - name: acme-ci
    markers: [acme-ci-runner, "[ACME-CI]"]

diagnosticPatterns:
  - "ACME_BUILD_FAILED"
  - "ACME_TEST_TIMEOUT"

ignorePatterns:
  - "\\bacme-ci heartbeat\\b"
  - "\\bacme-ci version check\\b"

sanitizePatterns:
  - pattern: "ACME-EMP-\\d{6}"
    replacement: "[ACME-EMP]"

internalStackPatterns:
  - "/opt/acme/ci-runner/"

Then run:

logstrip ci-output.log -o clean.log
# .logstrip.yml is auto-detected from the current directory

Or explicitly:

logstrip ci-output.log -o clean.log --config /etc/logstrip/acme.yml