As someone who works a lot with SQL and big datasets, the rise of DuckDB has been a game-changer. I usually handle data using Python’s pandas library, and then run SQL queries on it with the sqldf library. Let’s learn about DuckDB from scratch.

What is DuckDB?

DuckDB is a database system designed to process large datasets. DuckDB is also often called the “SQLite for analytics” because it uses the SQL dialect. (source)

How to use DuckDB in Python?

import duckdb
duckdb.sql("select 10").show()

Screenshot 2025-09-10 at 16.48.37.png

How do I read a CSV file using DuckDB?

duckdb.read_csv("example.csv") 
duckdb.sql("select * from 'example.csv'")

Screenshot 2025-09-10 at 17.06.47.png

Screenshot 2025-09-10 at 17.08.00.png

As you can see, we can directly run SQL queries on a CSV file in Python using DuckDB.

Can we read a Pandas DataFrame using DuckDB?

import pandas as pd

pandas_df = pd.DataFrame({"a": [10]})
duckdb.sql("select * from pandas_df")

Screenshot 2025-09-10 at 17.21.47.png

How can I convert a DuckDB SQL result into a Pandas DataFrame?

duckdb.sql("select 10").df()

Screenshot 2025-09-10 at 17.24.06.png

How can I export a DuckDB SQL result to a CSV file?

duckdb.sql("select 10").write_csv("out.csv")

Screenshot 2025-09-10 at 17.26.31.png