As someone who works a lot with SQL and big datasets, the rise of DuckDB has been a game-changer. I usually handle data using Python’s pandas library, and then run SQL queries on it with the sqldf library. Let’s learn about DuckDB from scratch.
What is DuckDB?
DuckDB is a database system designed to process large datasets. DuckDB is also often called the “SQLite for analytics” because it uses the SQL dialect. (source)
How to use DuckDB in Python?
import duckdb
duckdb.sql("select 10").show()

How do I read a CSV file using DuckDB?
duckdb.read_csv("example.csv")
duckdb.sql("select * from 'example.csv'")


As you can see, we can directly run SQL queries on a CSV file in Python using DuckDB.
Can we read a Pandas DataFrame using DuckDB?
import pandas as pd
pandas_df = pd.DataFrame({"a": [10]})
duckdb.sql("select * from pandas_df")

How can I convert a DuckDB SQL result into a Pandas DataFrame?
duckdb.sql("select 10").df()

How can I export a DuckDB SQL result to a CSV file?
duckdb.sql("select 10").write_csv("out.csv")
