
Fix: Polars Not Working — AttributeError, InvalidOperationError, and ShapeError

FixDevs

Quick Answer

How to fix Polars errors — AttributeError groupby not found, InvalidOperationError from Python lambdas, ShapeError broadcasting mismatch, lazy vs eager collect confusion, type casting failures, and ColumnNotFoundError in with_columns.

The Error

You switch from Pandas to Polars and the familiar API doesn’t exist:

AttributeError: 'DataFrame' object has no attribute 'groupby'

Or you try to filter with a lambda and get a cryptic failure:

InvalidOperationError: expression not allowed in this context

Or a column operation crashes with a shape mismatch:

ShapeError: unable to add a column of length 3 to a DataFrame of height 5

Or you run a scan_csv pipeline and nothing happens — no error, no data, just a LazyFrame query plan printed to the console.

Polars is not a drop-in Pandas replacement. It has a different execution model, stricter type system, and an expression-based API that requires rethinking how you write transformations. These errors are all fixable once you understand the patterns.

Why This Happens

Polars separates eager and lazy execution explicitly. Operations on DataFrame run immediately; operations on LazyFrame build a query plan that only executes on .collect(). The expression system (pl.col("x") > 5) compiles to optimized Rust code — Python lambdas bypass this and are only allowed in specific slower-path methods.

The Pandas migration friction comes from subtle renames (.groupby() → .group_by()), removed conveniences (no .loc, .iloc, .values), and a stricter type system where nulls and NaN are distinct and shapes must always match.

Fix 1: Pandas API Errors — Method Names Changed

Polars deliberately renamed or removed several Pandas methods. These all surface as AttributeError.

groupby → group_by (with underscore):

import polars as pl

df = pl.DataFrame({"category": ["A", "A", "B"], "value": [10, 20, 15]})

# WRONG
result = df.groupby("category").agg(...)  # AttributeError

# CORRECT
result = df.group_by("category").agg(pl.col("value").sum())

No .loc or .iloc — use expressions instead:

# WRONG — Polars has no index-based selection
df.iloc[0:5]         # AttributeError
df.loc["label"]      # AttributeError

# CORRECT — slice by position
first_five = df.slice(0, 5)         # First 5 rows
first_five = df.head(5)             # Equivalent

# Filter by condition (replaces .loc[mask])
filtered = df.filter(pl.col("value") > 10)

# Select rows by index (integer position)
row = df[2]       # Single row as DataFrame
rows = df[1:4]    # Slice

.values → .to_numpy():

# WRONG
arr = df["value"].values   # AttributeError

# CORRECT
arr = df["value"].to_numpy()

# Or convert whole DataFrame
arr = df.to_numpy()

.iterrows() → .iter_rows(named=True):

# WRONG
for idx, row in df.iterrows():   # AttributeError
    print(row["value"])

# CORRECT
for row in df.iter_rows(named=True):
    print(row["value"])           # row is a dict

# Or iterate as tuples (faster)
for row in df.iter_rows():
    print(row[1])                 # Tuple access by position

.apply() → .map_elements() (renamed in Polars 0.19, removed in 1.0):

# WRONG (Polars 1.0+)
df.with_columns(pl.col("value").apply(lambda x: x * 2))   # AttributeError

# CORRECT
df.with_columns(
    doubled=pl.col("value").map_elements(lambda x: x * 2, return_dtype=pl.Int64)
)

Pro Tip: Before spending time on a workaround, check if Polars has a native expression for what you’re doing. df.apply(func) for squaring values becomes pl.col("x") ** 2 — zero Python overhead, and much faster.

Fix 2: Lazy vs Eager — Don’t Forget .collect()

pl.scan_csv(), pl.scan_parquet(), and other scan_* functions return a LazyFrame — a query plan, not data. Nothing executes until you call .collect().

import polars as pl

# scan_csv returns a LazyFrame — no data loaded yet
lf = pl.scan_csv("large_file.csv")
print(type(lf))  # <class 'polars.LazyFrame'>

# Filters and selections added to the query plan — still no execution
lf = lf.filter(pl.col("country") == "US").select(["name", "country", "revenue"])

# STILL nothing executed — lf just prints the query plan
print(lf)   # Prints "PLAN" not data

# Execute the plan — this is when disk I/O and filtering actually happen
df = lf.collect()
print(type(df))  # <class 'polars.DataFrame'>
print(df.shape)  # (n_rows, 3)

Use lazy evaluation by default for files. Polars optimizes the query plan before executing — it pushes filters down to the file reader (reading only matching rows) and projects only the columns you need:

# Reads the ENTIRE CSV then filters — inefficient
df = pl.read_csv("100gb_file.csv").filter(pl.col("year") == 2025)

# Pushes the filter to disk read — only scans matching rows
df = pl.scan_csv("100gb_file.csv").filter(pl.col("year") == 2025).collect()

Inspect the query plan before collecting to understand what Polars will do:

lf = pl.scan_csv("data.csv").filter(pl.col("x") > 5).select(["x", "y"])
print(lf.explain())            # Unoptimized plan
print(lf.explain(optimized=True))  # After predicate/projection pushdown

For very large files that don’t fit in memory, streaming processes the data in chunks:

df = (
    pl.scan_csv("huge_file.csv")
    .filter(pl.col("status") == "active")
    .group_by("region")
    .agg(pl.col("revenue").sum())
    .collect(streaming=True)   # Processes in batches, bounded memory
)

LazyFrame.schema was removed in Polars 1.0. To inspect columns and types without collecting:

# OLD (0.x, broken in 1.0)
schema = lf.schema   # AttributeError in 1.0

# CORRECT (1.0+)
schema = lf.collect_schema()
print(schema)   # Schema({'col1': Int64, 'col2': Utf8, ...})

Fix 3: InvalidOperationError — Use Polars Expressions, Not Python Lambdas

InvalidOperationError: expression not allowed in this context

Polars expressions (pl.col("x") > 5, pl.col("name").str.starts_with("A")) compile to optimized Rust. Python lambdas in .filter() or similar contexts break the expression system entirely.

import polars as pl

df = pl.DataFrame({"x": [1, 5, 10, 3, 8], "name": ["alice", "bob", "carol", "dave", "eve"]})

# WRONG — lambdas not allowed in filter
df.filter(lambda row: row["x"] > 5)    # InvalidOperationError

# CORRECT — use Polars expressions
df.filter(pl.col("x") > 5)

# String operations use the .str namespace
df.filter(pl.col("name").str.starts_with("a"))

# Combine conditions with & (and) and | (or)
df.filter((pl.col("x") > 3) & (pl.col("name").str.len_chars() > 3))

When you genuinely need a Python function, use map_elements() — but understand the performance cost:

import polars as pl

df = pl.DataFrame({"text": ["hello world", "foo bar", "baz"]})

# map_elements: Python called once per element (slow for large datasets)
df.with_columns(
    word_count=pl.col("text").map_elements(
        lambda s: len(s.split()),
        return_dtype=pl.Int32,
    )
)

# Always specify return_dtype — without it, Polars infers from the first element,
# which can produce unexpected types on later rows

map_batches() is faster — it passes an entire Series to your function at once rather than element by element. Use it when your function can operate on a whole Series:

import polars as pl

df = pl.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

# map_batches: Python called once with the full Series
df.with_columns(
    normalized=pl.col("x").map_batches(
        lambda s: (s - s.mean()) / s.std(),
        return_dtype=pl.Float64,
    )
)

Performance hierarchy (fastest to slowest):

  1. Native Polars expressions — pl.col("x") * 2, pl.col("x").log()
  2. map_batches() — Python called once per Series
  3. map_elements() — Python called once per element

Before reaching for map_elements, check the Polars expressions API — string methods, date operations, list operations, and statistics are all built in.

Fix 4: Type Casting Errors — Strict vs Lenient

InvalidOperationError: cannot cast Utf8 to Int64 in strict mode

Polars defaults to strict=True in .cast() — if any value can’t be converted, the entire operation fails. This is the right behavior for clean data but breaks on real-world data with missing markers.

import polars as pl

df = pl.DataFrame({"amount": ["100", "250", "N/A", "400", "null"]})

# WRONG — fails because "N/A" and "null" can't become Int64
df.with_columns(pl.col("amount").cast(pl.Int64))   # InvalidOperationError

# CORRECT — non-convertible values become null
df.with_columns(pl.col("amount").cast(pl.Int64, strict=False))
# [100, 250, null, 400, null]

# Fill nulls after casting
df.with_columns(
    pl.col("amount").cast(pl.Int64, strict=False).fill_null(0)
)
# [100, 250, 0, 400, 0]

Specify types at read time — more efficient than reading and casting:

df = pl.read_csv(
    "transactions.csv",
    schema_overrides={
        "amount": pl.Float64,
        "quantity": pl.Int32,
        "user_id": pl.Utf8,   # Keep as string even if it looks numeric
    },
    null_values=["N/A", "null", "", "NA"],
)

Polars separates null and NaN — two distinct concepts that Pandas conflates. null is a missing value (all types). NaN is a floating-point representation of “not a number” (only Float32/Float64). They need different handling:

import polars as pl
import math

df = pl.DataFrame({"x": [1.0, float("nan"), None, 4.0]})

print(df.select(pl.col("x").is_null()))   # [false, false, true, false]
print(df.select(pl.col("x").is_nan()))    # [false, true, false, false]

# fill_null handles missing values (None/null)
# fill_nan handles NaN (floating point only)
df.with_columns(pl.col("x").fill_nan(0.0).fill_null(0.0))
# [1.0, 0.0, 0.0, 4.0]

If you’re reading data that has CSV "NaN" strings, map them to Polars nulls at read time:

df = pl.read_csv("data.csv", null_values=["NaN", "nan", "N/A", ""])

Fix 5: ColumnNotFoundError and with_columns Chaining

ColumnNotFoundError: column 'total' not found

The most common cause: you create a column in one with_columns() call and try to reference it in the same call. New columns aren’t visible within the same with_columns() invocation.

import polars as pl

df = pl.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [2, 3, 1]})

# WRONG — 'total' doesn't exist yet when 'discount' is computed
df.with_columns(
    total=pl.col("price") * pl.col("qty"),
    discount=pl.col("total") * 0.1,   # ColumnNotFoundError
)

# CORRECT — chain two with_columns calls
df.with_columns(
    total=pl.col("price") * pl.col("qty"),
).with_columns(
    discount=pl.col("total") * 0.1,
)

select() vs with_columns() — these are different operations that Pandas users often confuse:

# select() — returns only the listed columns (like SQL SELECT)
df.select("price", "qty")            # DataFrame with 2 columns
df.select(pl.col("price") * 1.1)     # Computed column, original dropped

# with_columns() — keeps all original columns and adds/replaces
df.with_columns(adjusted_price=pl.col("price") * 1.1)  # 3 columns: price, qty, adjusted_price

Rename columns to fix mismatches between datasets:

df.rename({"old_name": "new_name", "another_old": "another_new"})

Check column names before referencing them:

print(df.columns)   # List of column names
print(df.schema)    # Dict of {name: dtype}

Fix 6: ShapeError — Broadcasting Rules

ShapeError: unable to add a column of length 3 to a DataFrame of height 5

Polars is strict about shapes. A Series added to a DataFrame must either match the DataFrame’s height exactly or have length 1 (which broadcasts). Unlike NumPy or Pandas, there is no silent truncation or repetition.

import polars as pl

df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})   # height = 5

# WRONG — Series has wrong length
s = pl.Series([10, 20, 30])   # length 3
df.with_columns(y=s)          # ShapeError

# CORRECT — Series matches height
s = pl.Series([10, 20, 30, 40, 50])
df.with_columns(y=s)          # Works

# CORRECT — Scalar broadcasts to all rows
df.with_columns(constant=pl.lit(42))   # 42 in every row

# CORRECT — Expressions operate row-by-row (automatic length match)
df.with_columns(doubled=pl.col("x") * 2)

For group-level aggregations that you want to join back to the original DataFrame, use .over() (window function) instead of .group_by().agg():

df = pl.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "value": [10, 20, 15, 25, 5],
})

# group_by returns one row per group (height changes)
totals = df.group_by("category").agg(total=pl.col("value").sum())
# totals has 2 rows, df has 5 — can't add this back with with_columns

# CORRECT — over() keeps original height, broadcasts group result
df.with_columns(
    group_total=pl.col("value").sum().over("category")
)
# Every row gets the sum for its category group

Fix 7: group_by and Aggregation Syntax

Polars aggregation is explicit — you must list every column you want in the output. There is no as_index=False or automatic column retention.

import polars as pl

df = pl.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "sub": ["x", "y", "x", "y", "x"],
    "value": [10, 20, 15, 25, 5],
})

# Basic aggregation
result = df.group_by("category").agg(
    pl.col("value").sum(),
    pl.col("value").mean().alias("avg_value"),
    pl.col("value").count().alias("n"),
)

# Multiple grouping columns
result = df.group_by("category", "sub").agg(
    total=pl.col("value").sum(),
)

# Group order is non-deterministic by default — use maintain_order for consistent output
result = df.group_by("category", maintain_order=True).agg(
    pl.col("value").sum()
)

# Multiple aggregations on the same column
result = df.group_by("category").agg([
    pl.col("value").sum().alias("total"),
    pl.col("value").mean().alias("avg"),
    pl.col("value").min().alias("min"),
    pl.col("value").max().alias("max"),
    pl.col("value").std().alias("std_dev"),
])

Window functions with .over() — like SQL’s PARTITION BY, they compute an aggregate per group but keep all original rows:

df.with_columns(
    # Sum per category, broadcast back to each row
    category_total=pl.col("value").sum().over("category"),
    # Rank within category
    category_rank=pl.col("value").rank(descending=True).over("category"),
    # Cumulative sum within category (in original row order)
    cumsum=pl.col("value").cum_sum().over("category"),
)

Fix 8: Reading Files and Schema Inference Problems

Polars infers column types from a sample of rows (infer_schema_length, which defaults to 100 for read_csv). If your data has type-breaking values past that sample, the CSV read fails mid-stream.

Increase inference scan depth or disable it entirely:

import polars as pl

# Scan more rows before inferring (slower but safer)
df = pl.read_csv("data.csv", infer_schema_length=10_000)

# Read everything as strings, then cast manually (safest)
df = pl.read_csv("data.csv", infer_schema_length=0)
# df.dtypes are all Utf8; cast what you need
df = df.with_columns(
    amount=pl.col("amount").cast(pl.Float64, strict=False),
    count=pl.col("count").cast(pl.Int32, strict=False),
)

Map missing value strings to null at read time:

df = pl.read_csv(
    "data.csv",
    null_values=["N/A", "NA", "null", "NULL", "-", ""],
)

You can also specify per-column null values as a dict when different columns use different conventions.

Use Parquet for large production pipelines — it stores schema alongside data and reads dramatically faster than CSV:

# Write once
df.write_parquet("data.parquet")

# Read (schema always correct, no inference needed)
df = pl.read_parquet("data.parquet")

# Lazy scan with predicate pushdown (reads only matching rows from disk)
df = (
    pl.scan_parquet("large_data.parquet")
    .filter(pl.col("year") == 2025)
    .select(["date", "revenue", "region"])
    .collect()
)

Common Mistake: Using read_csv for files that are gigabytes in size. Use scan_csv(...).collect() instead so Polars can optimize the read with projection and predicate pushdown. The difference can be 10x in both time and peak memory.

Still Not Working?

Polars 0.x Code Breaks on 1.0

The most disruptive 0.x → 1.0 changes:

Old (0.x)        New (1.0)                         Notes
.apply()         .map_elements()                   With return_dtype arg
.groupby()       .group_by()                       Underscore added
lf.schema        lf.collect_schema()               LazyFrame only
.replace()       .replace() + .replace_strict()    Behavior split
pl.map()         pl.map_batches()                  Global function

Run python -c "import polars; print(polars.__version__)" to confirm which version you’re on, then check the official upgrade guide.

Performance: Polars Is Slower Than Expected

If Polars feels slower than Pandas on small DataFrames — it often is. Polars’ Rust execution engine has startup overhead that only pays off on larger datasets (typically 100k+ rows). For small local tables, this is normal. The gains become significant at millions of rows.

If large operations are slow, check whether you’re accidentally using eager evaluation when lazy would benefit from predicate pushdown. And always profile before using map_elements() — it surrenders Polars’ performance advantage.

Using Polars with PyTorch or NumPy

Polars integrates cleanly with NumPy and PyTorch through Arrow zero-copy:

import polars as pl
import numpy as np
import torch

df = pl.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

# To NumPy (zero-copy if no nulls)
arr = df.to_numpy()

# To PyTorch tensor
tensor = torch.from_numpy(df.to_numpy())

For training loops and model pipelines that consume Polars DataFrames, see PyTorch not working for tensor device and dtype patterns.

Migrating Large Pandas Codebases

The official Polars migration guide maps common Pandas patterns to Polars equivalents. For Pandas-specific errors you encounter while migrating, see pandas SettingWithCopyWarning and pandas merge key error.

Installing Optional Extras

Some Polars features require additional dependencies:

# Excel support (read_excel / write_excel)
pip install "polars[fastexcel]"

# Cloud storage (S3, GCS, Azure Blob)
pip install "polars[cloud]"

# All extras
pip install "polars[all]"

For installation failures — particularly when building from source on unusual platforms — see Python packaging not working.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
