MarkItDown: From Images to Searchable Text in Seconds
Plus query multiple databases at once with DuckDB
Grab your coffee. Here are this week’s highlights.
🤝 COLLABORATION
What Data Engineers Really Think About Airflow (5.8K Surveyed)
Astronomer analyzed 5.8K+ responses from data engineers on how they are navigating Airflow today, and the findings might surprise you.
You’ll learn:
How early adopters are using Airflow 3 features in production
Which teams are bringing AI into production and what’s holding others back
94% of respondents believe that Airflow is beneficial to their career
📅 Today’s Picks
Query Multiple Databases at Once with DuckDB
Problem
Working with data across PostgreSQL, MySQL, and SQLite often means managing multiple database connections and additional integration overhead.
That overhead adds up quickly when your goal is simply to analyze data across sources.
Solution
DuckDB removes the friction by allowing you to join tables across databases with a single query.
Key benefits:
Join SQLite, PostgreSQL, MySQL, and Parquet files in a single SQL statement
Automatic connection handling across all sources
Filters run at the source database, so only matching rows are transferred
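To make the benefits above concrete, here is a minimal sketch of a cross-database join using DuckDB's ATTACH statement. The aliases (sales, crm), file path, connection string, and table names are hypothetical placeholders, and the sketch assumes the duckdb package is installed and that both databases are reachable from your machine.

```python
def attach_sql(alias: str, db_type: str, conn: str) -> str:
    """Build a DuckDB ATTACH statement for an external database."""
    return f"ATTACH '{conn}' AS {alias} (TYPE {db_type});"


# Hypothetical sources: aliases, paths, and credentials are placeholders.
SOURCES = [
    ("sales", "sqlite", "sales.db"),
    ("crm", "postgres", "dbname=crm host=localhost user=analyst"),
]

# Hypothetical tables: sales.orders lives in SQLite, crm.customers in PostgreSQL.
QUERY = """
SELECT c.name, SUM(o.amount) AS total_spent
FROM sales.orders AS o
JOIN crm.customers AS c ON c.id = o.customer_id
WHERE o.amount > 100
GROUP BY c.name
ORDER BY total_spent DESC;
"""


def run_cross_database_query():
    """Attach every source to one DuckDB connection and run the join."""
    import duckdb  # imported here so the helpers above work without DuckDB

    con = duckdb.connect()  # in-memory DuckDB database
    for ext in ("sqlite", "postgres"):
        con.install_extension(ext)  # downloads the extension on first use
        con.load_extension(ext)
    for alias, db_type, conn in SOURCES:
        con.execute(attach_sql(alias, db_type, conn))
    return con.execute(QUERY).fetchall()
```

Calling run_cross_database_query() returns the joined rows as a list of tuples. A MySQL source could be added the same way with the mysql extension and its own connection string, and a Parquet file can be referenced directly in the FROM clause by its path.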
MarkItDown: From Images to Searchable Text in Seconds
Problem
Charts, diagrams, and screenshots in your documents need text descriptions to be searchable and processable.
But writing descriptions manually is slow and produces inconsistent results across large document sets.
Solution
MarkItDown, an open-source library from Microsoft, integrates with OpenAI to automatically generate detailed descriptions of images.
Key capabilities:
Generate consistent descriptions across hundreds of images
Process images from documents like PowerPoint and PDF files
Customize the description prompt for your specific needs
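As a rough illustration of the capabilities above, the sketch below describes every image in a folder with one shared prompt. The folder name, prompt text, and model name (gpt-4o) are assumptions chosen for the example. It also assumes the markitdown and openai packages are installed and that an OPENAI_API_KEY environment variable is set.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(folder):
    """Return image files in `folder`, sorted for a stable processing order."""
    return sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )


def describe_images(folder, prompt, model="gpt-4o"):
    """Map each image file name to an LLM-generated description."""
    # Imported here so find_images works without these packages installed.
    from markitdown import MarkItDown
    from openai import OpenAI

    # The same prompt is reused for every image to keep descriptions consistent.
    md = MarkItDown(llm_client=OpenAI(), llm_model=model, llm_prompt=prompt)
    return {
        path.name: md.convert(str(path)).text_content
        for path in find_images(folder)
    }
```

A call such as describe_images("charts", "Describe this chart, including axis labels and the main trend.") would return a dictionary of descriptions keyed by file name, ready to index for search.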
📖 View Full Article | 🧪 Run code | ⭐ View GitHub
☕️ Weekly Finds
Skill_Seekers [LLM] - Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection
sqlit [Data] - A user-friendly TUI for SQL databases supporting SQL Server, MySQL, PostgreSQL, SQLite, Turso and more
giskard [ML] - Open-source CI/CD platform for ML teams to eliminate AI bias and deliver quality ML products faster
📚 Latest Deep Dives
From CSS Selectors to Natural Language: Web Scraping with ScrapeGraphAI - Web scraping without selector maintenance. ScrapeGraphAI uses LLMs to extract data from any site using plain English prompts and Pydantic schemas.
Before You Go
🔍 Explore More on CodeCut
Tool Selector - Discover 70+ Python tools for AI and data science
Production Ready Data Science - A practical book for taking projects from prototype to production