Skip to content

Python Polars 常用操作

1 - lazy 模式

lazy 模式读取数据:

python
lazy = pl.scan_parquet(parquet_file_path)

DataFrame 转为 LazyFrame:

python
lazy = df.lazy()

把 LazyFrame 导出:

python
lazy.sink_parquet(parquet_file_path)

2 - LazyFrame 的常用操作

2.1 数据过滤

python
lazy.filter(pl.col("column_name") == "value")

2.2 数据列选择

python
lazy.select(["column_name1", "column_name2"])

2.3 获取符合条件的数据总行数

python
lazy.filter(pl.col("column_name") == "value").select(pl.count()).collect().item()

2.4 新增、替换列

根据已有列生成新列:

python
lazy.with_columns(
  pl.col("column_name")
  .alias("new_column_name")
)

用默认值生成新列:

python
lazy.with_columns(
  pl.lit("default_value", dtype=pl.Utf8)
  .alias("new_column_name")
)

2.5 条件赋值

python
lazy.with_columns(
  pl.when(pl.col("column_name") == "value")
  .then(pl.lit("new_value"))
  .otherwise(pl.col("column_name"))
  .alias("new_column_name")
)

2.6 查询 Schema

python
schema = lazy.collect_schema()
for col in schema.names():
  col_type = schema[col]
  print(col, col_type)
# 输出:
# column_name Utf8  # <== pl.Utf8
# another_column Int64  # <== pl.Int64

Released under the MIT License.