Compression Flow for AI Readiness

LogStrip processes raw logs through a seven-step engine that optimizes them for AI consumption. The flow preserves high-signal context while minimizing noise.

| Step | What happens | Why it matters for AI |
| --- | --- | --- |
| 1. Drop obvious noise | Remove low-value telemetry (`[INFO]`, `[DEBUG]`, `[TRACE]`, `[VERBOSE]`) before it reaches the scorer. | Cuts prompt bloat immediately. |
| 2. Score relevance | Add signal from errors, JSON severity, scanner findings, container failures, npm/yarn errors, diagnostic keywords, and stack frames. | Works across server, CI, scanner, and container ecosystems. |
| 3. Preserve context windows | Keep a few soft lines before and after high-score diagnostics. | Captures setup and follow-up context without retaining the whole log. |
| 4. Sanitize entropy | Normalize UUIDs, timestamps, hashes, and IPs before deduplication. Repeated structured fields such as `amount=99.99`, `amount=49.50`, and `amount=12.00` collapse to `amount=[99.99 \| 49.50 \| 12.00]` only inside folded `[xN]` groups. | Stabilizes repeated events and masks sensitive-looking identifiers without hiding one-off values. |
| 5. Damp repeated spam | Penalize high-frequency sanitized lines and fold adjacent same-shape diagnostics into `[xN]` delta summaries. | Reduces repetition without losing frequency info. |
| 6. Collapse internals | Replace low-value framework/library stack frames with a single marker. | Keeps user-code frames visible while shrinking noisy stacks. |
| 7. Detect source | Identify dominant log ecosystems in the stream. | Helps route incidents to the right owner/toolchain. |
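
To make steps 1, 4, and 5 concrete, here is a minimal Python sketch of noise dropping, entropy sanitization, and adjacent-line folding. It is an illustration under stated assumptions, not LogStrip's actual implementation: the function names (`drop_noise`, `sanitize`, `shape`, `summarize`, `fold`, `compress`), the placeholder tokens (`<UUID>`, `<TS>`, `<IP>`, `<HASH>`), and the regular expressions are all hypothetical. Scoring, context windows, stack-frame collapsing, and source detection are left out.

```python
import re
from itertools import groupby

# Step 1 (assumed pattern): level tags treated as low-value telemetry.
NOISE = re.compile(r"\[(?:INFO|DEBUG|TRACE|VERBOSE)\]")

# Step 4 (assumed patterns and placeholder tokens): high-entropy values to normalize.
SANITIZERS = [
    (re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"), "<UUID>"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<TS>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
    (re.compile(r"\b[0-9a-fA-F]{32,64}\b"), "<HASH>"),
]

# Structured key=value fields whose values may vary between repeats.
FIELD = re.compile(r"(\w+)=(\S+)")


def drop_noise(lines):
    """Discard lines carrying a low-value level tag."""
    return [line for line in lines if not NOISE.search(line)]


def sanitize(line):
    """Replace UUIDs, timestamps, IPs, and long hex hashes with stable tokens."""
    for pattern, token in SANITIZERS:
        line = pattern.sub(token, line)
    return line


def shape(line):
    """Reduce a line to its shape by blanking every key=value field value."""
    return FIELD.sub(r"\1=<*>", line)


def summarize(group):
    """Render a run of same-shape lines as one [xN] delta summary."""
    values = {}
    for line in group:
        for key, val in FIELD.findall(line):
            seen = values.setdefault(key, [])
            if val not in seen:
                seen.append(val)  # keep distinct values in first-seen order

    def render(match):
        vals = values[match.group(1)]
        if len(vals) == 1:
            return match.group(0)  # constant field: leave untouched
        return f"{match.group(1)}=[{' | '.join(vals)}]"

    return f"[x{len(group)}] " + FIELD.sub(render, group[0])


def fold(lines):
    """Fold adjacent same-shape lines; one-off lines pass through unchanged."""
    out = []
    for _, run in groupby(lines, key=shape):
        run = list(run)
        out.append(run[0] if len(run) == 1 else summarize(run))
    return out


def compress(lines):
    """Apply steps 1, 4, and 5 in order."""
    return fold([sanitize(line) for line in drop_noise(lines)])
```

With this sketch, three adjacent `ERROR payment failed id=<uuid> amount=...` lines fold into a single `[x3]` line whose `amount` field lists all three values, while a lone `ERROR disk full amount=7.25` keeps its value as written, matching the "only inside folded `[xN]` groups" rule in step 4.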