Swap AI Prompts Instantly with MLflow Prompt Registry
Plus automate LLM evaluation with custom judges
Quick note before we start: CodeCut is now being sent from Substack. You don't need to do anything, and the cadence stays the same.
Today's Picks
Swap AI Prompts Instantly with MLflow Prompt Registry
Problem
Finding the right prompt often takes experimentation: tweaking wording, adjusting tone, testing different instructions.
But with prompts hardcoded in your codebase, each test requires a code change and redeployment.
Solution
MLflow Prompt Registry solves this with aliases. Your code references an alias like "production" instead of a version number, so you can swap versions without changing the code.
Here's how it works (see the sketch after this list):
Every prompt edit creates a new immutable version with a commit message
Register prompts once, then assign aliases to specific versions
Deploy to different environments by creating aliases like "staging" and "production"
Track full version history with metadata and tags for each prompt
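A minimal sketch of the workflow, assuming MLflow 3.x where the prompt registry lives under mlflow.genai (the prompt name, template, and version number are illustrative; exact module paths may differ in older releases):

```python
import mlflow

# Each register_prompt call with a changed template creates a new immutable version
mlflow.genai.register_prompt(
    name="summarizer",
    template="Summarize the following text in {{ num_sentences }} sentences:\n\n{{ text }}",
    commit_message="Shorter summaries, friendlier tone",
)

# Point the 'production' alias at whichever version you want to ship
mlflow.genai.set_prompt_alias(name="summarizer", alias="production", version=2)

# Application code loads by alias, so swapping versions never touches this line
prompt = mlflow.genai.load_prompt("prompts:/summarizer@production")
print(prompt.format(num_sentences=2, text="MLflow aliases decouple prompts from code."))
```

Because the alias is resolved at load time, promoting a new prompt version is a registry operation rather than a redeploy.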
→ View GitHub
Worth Revisiting
Automate LLM Evaluation at Scale with MLflow make_judge()
Problem
When you ship LLM features without evaluating them, models might hallucinate, violate safety guidelines, or return incorrectly formatted responses.
Manual review doesn't scale. Reviewers might miss subtle issues when evaluating thousands of outputs, and scoring standards often vary between people.
Solution
MLflow make_judge() applies the same evaluation standards to every output, whether you're checking 10 or 10,000 responses.
Key capabilities (a short sketch follows the list):
Define evaluation criteria once, reuse everywhere
Automatic rationale explaining each judgment
Built-in judges for safety, toxicity, and hallucination detection
Typed outputs that never return unexpected formats
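A minimal sketch, assuming MLflow 3.4+ where make_judge lives in mlflow.genai.judges; the judge name, instructions, and model URI below are illustrative placeholders:

```python
from mlflow.genai.judges import make_judge

# Define the criteria once; the same judge scores every response the same way
formatting_judge = make_judge(
    name="formatting",
    instructions=(
        "Check whether {{ outputs }} answers {{ inputs }} as valid JSON "
        "with the keys 'summary' and 'sources'. Answer 'pass' or 'fail'."
    ),
    model="openai:/gpt-4o-mini",
)

# Score a single response; the returned feedback carries a rationale for the judgment
feedback = formatting_judge(
    inputs={"question": "Summarize the release notes."},
    outputs='{"summary": "Adds prompt aliases.", "sources": ["CHANGELOG.md"]}',
)
print(feedback.value, feedback.rationale)
```

The same judge object can then be reused across datasets or plugged into an evaluation run, so the criteria stay consistent no matter how many outputs you score.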
→ View GitHub
Weekly Finds
gspread [Data Processing] - Google Sheets Python API for reading, writing, and formatting spreadsheets
zeppelin [Data Analysis] - Web-based notebook for interactive data analytics with SQL, Scala, and more
vectorbt [Data Science] - Fast engine for backtesting, algorithmic trading, and research in Python
Before You Go
Explore More on CodeCut
Tool Selector - Discover 70+ Python tools for AI and data science
Production Ready Data Science - A practical book for taking projects from prototype to production
Rate Your Experience
How would you rate your newsletter experience? Share your feedback →




Solid breakdown on the alias system. The decoupling between code and prompt versions is elegant, though the immutability part raises an interesting consideration. On one hand you get full version history, but if a team has dozens of prompts cycling weekly, managing all those versions could become its own overhead. I've found navigating old prompt trails helpful when debugging regressions, though.