Strip the filler words out of your documents before you embed them and embedding gets ~25% cheaper for one to two points of retrieval accuracy — flat, across every model I tried. The real lesson isn’t the caveman trick: it’s that twelve test questions will lie to you with a perfectly straight face, and a clean model-by-model story can be complete garbage until you run a few hundred.
The practical follow-up to the goldfish-memory post. Bring a Postgres database with pgvector and an agent that talks to users; an hour later you’ve got two-tier memory bolted on. Staging, realtime and consolidate cells, three scheduling options, three reader patterns, and an LLM fact extractor — Python and Rust both.
Agent memory has two completely different jobs — fast context for the next reply, and curated truth three weeks later — and most people try to do both with one tool. Here’s the two-tier pattern I built chunkshop’s memory layer around, the late-event bug that silently eats conversations, and why ‘just use pgvector’ isn’t the whole answer.
The features I’d argue are genuinely novel — framers, hierarchical summaries, BYO embedders via four lines of YAML, schema-flex append mode, cross-language vector compatibility, and the modular-backends roadmap toward MariaDB and ClickHouse. Plus the four bets chunkshop is making about where RAG infrastructure goes next.
Three corpora, three different winners, none of them the chunker the README recommended. Why nobody can tell you in advance which chunker to use, and the 30-minute primitive that does the work for you.
Real OLTP corpus, twelve-combo bakeoff with three baked-in models plus Snowflake Arctic via BYO YAML, hybrid search via promoted metadata, then wired into a LangGraph agent through inline mode. Every command actually run.
Seven walkthroughs with opinions — what each chunker is good at, where it falls over, and the corpus shape that flips the leaderboard between them. A field guide, not a recommendation. Bakeoff first.
An illegal chop shop for your data — the YAML-driven RAG ingest tool that ships a bakeoff primitive so you measure chunker × embedder × your corpus instead of vibe-picking from somebody else’s blog post.