The features I’d argue are genuinely novel — framers, hierarchical summaries, BYO embedders via four lines of YAML, schema-flex append mode, cross-language vector compatibility, and the modular-backends roadmap toward MariaDB and ClickHouse. Plus the four bets chunkshop is making about where RAG infrastructure goes next.
Three corpora, three different winners, none of them the chunker the README recommended. Why nobody can tell you in advance which chunker to use, and the 30-minute primitive that does the work for you.
Seven walkthroughs with opinions — what each chunker is good at, where it falls over, and the corpus shape that flips the leaderboard between them. A field guide, not a recommendation. Bakeoff first.
An illegal chop shop for your data — the YAML-driven RAG ingest tool that ships a bakeoff primitive so you measure chunker × embedder × your corpus instead of vibe-picking from somebody else’s blog post.
Standalone ingest-to-pgvector with a built-in chunker × embedder bakeoff. One YAML config = one end-to-end ingest cell. Python + Rust at parity.