
Fix: TensorFlow Not Working — OOM, Shape Mismatch, GPU Not Found, and Keras Errors

FixDevs

Quick Answer

How to fix TensorFlow errors — GPU not detected (missing CUDA libraries), ResourceExhaustedError (OOM), InvalidArgumentError (shape mismatch), NaN loss, @tf.function AutoGraph failures, and Keras 3 breaking changes in TF 2.16+.

The Error

You install TensorFlow and immediately hit a wall:

Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file

Or training starts but crashes with an out-of-memory error:

ResourceExhaustedError: OOM when allocating tensor with shape [64, 512, 512, 3]

Or the model compiles but shapes don’t line up:

InvalidArgumentError: Incompatible shapes: [32, 10] vs [32]

Or the model trains but the loss stays at NaN from epoch one.

Or you upgrade TensorFlow and your working Keras code breaks with import errors you didn’t expect.

Each of these failures has a different root cause. This guide covers all of them.

Why This Happens

TensorFlow combines a Python API, a C++ runtime, and an optional CUDA layer for GPU execution — three independent systems that must be version-matched exactly. The Python API itself is split between eager execution (immediate, like NumPy) and graph execution (via @tf.function), which have meaningfully different behavior.
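
The split between the two modes is easy to see in a few lines (a minimal sketch):

```python
import tensorflow as tf

# Eager execution: ops run immediately and return concrete values, like NumPy
x = tf.constant([1.0, 2.0])
doubled_eager = (x * 2).numpy()

# Graph execution: @tf.function traces the Python function once into a graph,
# then reuses that graph on subsequent calls
@tf.function
def double(t):
    return t * 2

doubled_graph = double(x).numpy()
print(doubled_eager, doubled_graph)  # same values, different execution paths
```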

Keras adds another layer: TF 2.16 shipped Keras 3 as the default backend, breaking several TF 1.x and Keras 2.x patterns that codebases had relied on for years.

Most TensorFlow errors fall into one of these categories: environment mismatch, shape mismatch, training instability, graph execution semantics, or Keras API changes.

Fix 1: GPU Not Detected — CUDA Library Errors

Could not load dynamic library 'libcudart.so.11.0'
W tensorflow/core/common_runtime/gpu/gpu_device.cc: No GPU devices available

These warnings appear when TensorFlow can’t find the CUDA libraries it was built against. Your TF version and CUDA version must match:

TensorFlow   CUDA   cuDNN
2.16.x       12.3   8.9
2.15.x       12.2   8.9
2.14.x       11.8   8.7
2.13.x       11.8   8.6

Check what TF sees:

import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))  # Empty list = no GPU found
print(tf.sysconfig.get_build_info()['cuda_version'])  # CUDA TF was built with

The easiest fix on Linux is to install TF with its bundled CUDA via pip — no system CUDA needed:

pip install "tensorflow[and-cuda]"   # TF 2.14+, Linux only (quotes avoid shell globbing)

This installs the matching nvidia-* packages automatically. For Windows or older versions, install CUDA and cuDNN from NVIDIA separately, matching the table above.

Verify CUDA is visible to TF after install:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# Expected: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

If the list is empty despite CUDA being installed, check:

# Confirm CUDA libraries are on LD_LIBRARY_PATH (Linux)
ldconfig -p | grep libcudart

# Or set it explicitly
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Note: The CUDA library warning (“Could not load dynamic library”) is printed even when TF falls back gracefully to CPU. Your code still runs — just on CPU. This is a warning, not a crash. Silence it with export TF_CPP_MIN_LOG_LEVEL=2 if you’re intentionally running CPU-only.
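
The same variable can be set from Python, as long as it is set before the import:

```python
# Sketch: silence the CUDA warnings when intentionally running CPU-only
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'   # 0=all, 1=hide INFO, 2=also hide WARNING, 3=also hide ERROR

import tensorflow as tf   # import AFTER setting the variable, or it has no effect
```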

Fix 2: ResourceExhaustedError OOM — Managing GPU Memory

ResourceExhaustedError: OOM when allocating tensor with shape [64, 512, 512, 3]
failed to allocate memory

By default, TensorFlow grabs nearly all GPU memory as soon as the first GPU operation initializes the device — every other process gets almost nothing. Enable memory growth so TF only takes what it needs:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Must be set before GPU is initialized — call this before any model code
        print(e)

Call this before any model or tensor operations, or it raises RuntimeError. Put it immediately after import tensorflow as tf, before any code that creates tensors or models.

Cap TF to a specific memory fraction if you’re sharing the GPU with other processes:

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)]  # 4 GB cap
    )

Reduce batch size — the most direct fix. OOM during training almost always means your batch doesn’t fit in VRAM. Halve the batch size and double your gradient accumulation steps to maintain the effective batch size:

# Instead of batch_size=64 on an 8GB GPU:
batch_size = 16
accumulation_steps = 4  # Accumulate gradients over 4 steps before updating weights

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
accumulated = [tf.zeros_like(v) for v in model.trainable_variables]

for step, (x_batch, y_batch) in enumerate(dataset):
    with tf.GradientTape() as tape:
        predictions = model(x_batch, training=True)
        loss = loss_fn(y_batch, predictions) / accumulation_steps

    gradients = tape.gradient(loss, model.trainable_variables)
    # Accumulate — don't discard — the intermediate gradients
    accumulated = [a + g for a, g in zip(accumulated, gradients)]

    if (step + 1) % accumulation_steps == 0:
        optimizer.apply_gradients(zip(accumulated, model.trainable_variables))
        accumulated = [tf.zeros_like(v) for v in model.trainable_variables]

Use mixed precision to nearly halve memory usage on GPU:

from tensorflow.keras import mixed_precision

# Set at the top of your script, before building the model
mixed_precision.set_global_policy('mixed_float16')

# Build model normally — activations use float16, weights stay float32
model = tf.keras.Sequential([...])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# model.fit() handles loss scaling automatically
model.fit(train_dataset, epochs=10)

Pro Tip: Mixed precision alone can give 2–3x speedup on Ampere GPUs (RTX 30xx, A100) due to Tensor Core utilization, in addition to the memory savings. Use 'mixed_bfloat16' instead on TPUs.

Fix 3: InvalidArgumentError — Shape Mismatch

InvalidArgumentError: Incompatible shapes: [32, 10] vs [32]
logits and labels must have the same first dimension

Shape errors in TensorFlow usually come from one of three places: the label format doesn’t match the loss function, the first layer doesn’t know the input shape, or operations assume a dimension that doesn’t exist.

Label format vs. loss function mismatch:

import tensorflow as tf
import numpy as np

# Scenario: 10-class classification

# WRONG — sparse_categorical_crossentropy expects integer labels [0, 9]
# but you're passing one-hot encoded labels
labels = np.eye(10)[class_indices]   # shape (batch, 10) — one-hot
loss = tf.keras.losses.SparseCategoricalCrossentropy()
# Crashes: expects shape (batch,), got (batch, 10)

# FIX option 1 — use CategoricalCrossentropy for one-hot labels
loss = tf.keras.losses.CategoricalCrossentropy()

# FIX option 2 — use integer labels with SparseCategoricalCrossentropy
labels = class_indices   # shape (batch,) — integers

# Matrix: which loss to use
# SparseCategoricalCrossentropy → integer labels (0, 1, 2, ...)
# CategoricalCrossentropy       → one-hot labels ([0, 0, 1, 0, ...])
# BinaryCrossentropy            → binary labels (0 or 1), output shape (batch, 1)
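
A related mismatch that produces suspiciously flat or wrong losses rather than a crash: the loss's from_logits flag must match whether your last layer applies softmax. A minimal sketch:

```python
import tensorflow as tf

# Raw logits (no softmax on the last layer) → from_logits=True
logits = tf.constant([[2.0, 0.5, -1.0]])
labels = tf.constant([0])
loss_from_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_val = loss_from_logits(labels, logits).numpy()

# Probabilities (softmax already applied) → from_logits=False (the default)
probs = tf.nn.softmax(logits)
loss_from_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
loss_val_probs = loss_from_probs(labels, probs).numpy()

print(loss_val, loss_val_probs)  # nearly identical when the flag matches the output
```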

First layer input shape — Keras needs to know the input shape to build the model:

# WRONG — shape unknown until the model sees data; model.summary() or
# weight access fails before the model is built
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# CORRECT — specify input_shape in first layer
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Or use an explicit Input layer
inputs = tf.keras.Input(shape=(784,))
x = tf.keras.layers.Dense(128, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)

Flatten before Dense layers — Conv2D layers output 4D tensors (batch, height, width, channels). Dense layers expect 2D (batch, features):

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),   # Required before Dense
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

Check actual shapes at any point with eager execution:

x = tf.ones((4, 28, 28, 1))
for layer in model.layers:
    x = layer(x)
    print(f"{layer.name}: {x.shape}")

Fix 4: Loss Is NaN or Not Decreasing

A loss that starts at NaN means the model is numerically unstable before training begins. A loss that’s valid but flat means optimization isn’t working.

NaN loss — most common causes:

# 1. Learning rate too high — reduce by 10x first
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)   # Not 1e-2 or 0.1

# 2. Exploding gradients — clip gradient norm
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-4,
    clipnorm=1.0,    # Rescale gradient vector to unit norm if it exceeds 1.0
)

# 3. Log of zero in custom loss — add epsilon
def custom_loss(y_true, y_pred):
    epsilon = 1e-7
    y_pred = tf.clip_by_value(y_pred, epsilon, 1.0 - epsilon)
    return -tf.reduce_mean(y_true * tf.math.log(y_pred))

# 4. Check for NaN in your data before training
import numpy as np
assert not np.isnan(X_train).any(), "NaN in training features"
assert not np.isnan(y_train).any(), "NaN in training labels"

Use the NaN debug callback to find which batch causes the blow-up:

model.fit(
    X_train, y_train,
    callbacks=[tf.keras.callbacks.TerminateOnNaN()],
)
# Stops training and prints which batch/epoch triggered NaN

Loss not decreasing — check these in order:

# 1. Verify the model is actually training (not just evaluating)
model.compile(optimizer='adam', loss='...', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)   # Not model.predict() or model.evaluate()

# 2. Shuffle your dataset — ordered data stalls gradient descent
model.fit(X_train, y_train, shuffle=True, epochs=10)

# 3. Use learning rate scheduling to escape plateaus
lr_schedule = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.5, patience=5, min_lr=1e-7
)
model.fit(X_train, y_train, validation_split=0.2, callbacks=[lr_schedule], epochs=50)

# 4. Check class imbalance — a model predicting all zeros for 95% majority class
#    achieves 95% accuracy but learns nothing useful
from sklearn.utils.class_weight import compute_class_weight
class_weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
model.fit(X_train, y_train, class_weight=dict(enumerate(class_weights)))

Fix 5: @tf.function and AutoGraph Errors

OperatorNotAllowedInGraphError: Using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution

@tf.function converts Python functions to TensorFlow computation graphs for speed. This conversion (tracing) runs your Python code once and records operations — which means Python-level control flow over tf.Tensor values doesn’t work the way you expect.

import tensorflow as tf

# WRONG — Python bool check on a Tensor doesn't work in graph mode
@tf.function
def bad_function(x):
    if x > 0:    # OperatorNotAllowedInGraphError
        return x * 2
    return x

# CORRECT — use tf.cond for tensor-dependent branching
@tf.function
def good_function(x):
    return tf.cond(x > 0, lambda: x * 2, lambda: x)

# ALSO CORRECT — Python conditions on Python values work fine
@tf.function
def with_python_flag(x, scale=True):
    if scale:   # Python bool, not a Tensor — this traces two separate graphs
        return x * 2
    return x

print() only runs during tracing — use tf.print() for runtime debugging:

@tf.function
def debug_function(x):
    print("Tracing!")          # Runs ONCE during tracing, not on each call
    tf.print("x value:", x)   # Runs on EVERY call
    return x + 1

debug_function(tf.constant(5))
debug_function(tf.constant(10))
# Output:
# Tracing!
# x value: 5
# x value: 10

Prevent excessive retracing with input_signature:

@tf.function(input_signature=[
    tf.TensorSpec(shape=[None, 128], dtype=tf.float32),
    tf.TensorSpec(shape=[None],      dtype=tf.int32),
])
def train_step(features, labels):
    with tf.GradientTape() as tape:
        predictions = model(features, training=True)
        loss = loss_fn(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

Without input_signature, TF retraces for each new input shape — which is expensive. With None dimensions in TensorSpec, any size along that axis is accepted without retracing.
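
A small demo of this behavior (the counter increments only at trace time, not on every call):

```python
import tensorflow as tf

trace_count = 0

@tf.function
def reduce(x):
    global trace_count
    trace_count += 1   # Python side effect — runs only during tracing
    return tf.reduce_sum(x)

reduce(tf.ones([2, 128]))   # first call: traces
reduce(tf.ones([4, 128]))   # new static shape: retraces
reduce(tf.ones([4, 128]))   # same shape as before: reuses the existing graph
print(trace_count)
```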

Debug graph execution by temporarily disabling it:

# Disable @tf.function globally — everything runs eagerly
tf.config.run_functions_eagerly(True)

# Now errors show actual Python tracebacks, not graph execution errors
model.fit(X_train, y_train, epochs=1)

# Re-enable when done debugging
tf.config.run_functions_eagerly(False)

Fix 6: Custom Layers Not Working

Custom Keras layers break most often because build() isn’t implemented, training=False isn’t handled, or the layer doesn’t call super().__init__().

import tensorflow as tf

# WRONG — no build(), no training kwarg
class BadLayer(tf.keras.layers.Layer):
    def call(self, x):
        return x * self.scale   # AttributeError: no self.scale

# CORRECT — full custom layer pattern
class ScaledDropout(tf.keras.layers.Layer):
    def __init__(self, rate=0.5, scale=2.0, **kwargs):
        super().__init__(**kwargs)   # Always pass **kwargs to super
        self.rate = rate
        self.scale = scale

    def build(self, input_shape):
        # Create weights here — input_shape is known at this point
        self.learned_scale = self.add_weight(
            name='learned_scale',
            shape=(input_shape[-1],),
            initializer='ones',
            trainable=True,
        )
        super().build(input_shape)

    def call(self, inputs, training=False):
        # training=False is the correct default — same as built-in layers
        if training:
            inputs = tf.nn.dropout(inputs, rate=self.rate)
        return inputs * self.learned_scale * self.scale

    def get_config(self):
        # Required for model saving/loading with this custom layer
        config = super().get_config()
        config.update({'rate': self.rate, 'scale': self.scale})
        return config

Common Mistake: Omitting get_config(). If you save a model that contains your custom layer and then load it without the original class definition in scope, you get a deserialization error. get_config() is what allows tf.keras.models.load_model() to reconstruct the layer.

Fix 7: Model Save/Load Errors

ValueError: Unable to load model saved in TF SavedModel format
OSError: No such file or directory: 'model.h5'

TensorFlow supports three save formats with different tradeoffs:

# 1. Native Keras format (.keras) — recommended for TF 2.12+
model.save('model.keras')
loaded = tf.keras.models.load_model('model.keras')

# 2. SavedModel format — portable, works outside Python (TF Serving, TFLite)
model.save('saved_model_dir/')            # Saves as a directory
loaded = tf.saved_model.load('saved_model_dir/')  # Low-level object, not Keras model
# OR
loaded = tf.keras.models.load_model('saved_model_dir/')  # Keras model — works through TF 2.15; Keras 3 requires keras.layers.TFSMLayer instead

# 3. Legacy HDF5 format (.h5) — deprecated, for backwards compat only
model.save('model.h5')
loaded = tf.keras.models.load_model('model.h5')

Custom objects require registration when loading:

# Save
model.save('model.keras')

# Load with custom layer class provided
loaded = tf.keras.models.load_model(
    'model.keras',
    custom_objects={'ScaledDropout': ScaledDropout}
)

# Or register once globally
@tf.keras.utils.register_keras_serializable()
class ScaledDropout(tf.keras.layers.Layer):
    ...
# Now load_model() finds it automatically

Save only weights when you have the architecture separately:

# Save
model.save_weights('weights.weights.h5')

# Rebuild architecture, then load
model = build_model()  # Your function that creates the model
model.load_weights('weights.weights.h5')

tf.saved_model.load() returns a callable object, not a Keras model. You can’t call .fit() or .predict() on it — only the serving_default signature:

import tensorflow as tf

infer = tf.saved_model.load('saved_model_dir/')
serving_fn = infer.signatures['serving_default']

result = serving_fn(tf.constant(X_test, dtype=tf.float32))
predictions = result['output_0'].numpy()

Fix 8: Keras 3 Migration — TF 2.16+ Breaking Changes

TensorFlow 2.16 made Keras 3 the default backend. If you installed tensorflow>=2.16 and your code uses from tensorflow import keras or tf.keras, you may hit unexpected behavior.

Check which Keras version you have:

import tensorflow as tf
import keras

print(tf.__version__)     # e.g. 2.16.0
print(keras.__version__)  # e.g. 3.0.0 — Keras 3 installed

The most common breakage is code that relied on Keras 2.x internals:

# Keras 2.x pattern — works in TF 2.15 and earlier
model.weights[0].numpy()   # weights returned tf.Variable

# Keras 3.x behavior — weights are keras.Variable objects, not tf.Variable
# .numpy() still works in both versions for reading values
weights = [w.numpy() for w in model.weights]

Revert to Keras 2 behavior without downgrading TF:

# Shell: install the legacy Keras 2 package alongside TF 2.16+
pip install tf-keras~=2.16

# Shell: set the environment variable before launching Python
export TF_USE_LEGACY_KERAS=1

# Or from Python — must run before importing tensorflow
import os
os.environ['TF_USE_LEGACY_KERAS'] = '1'

import tensorflow as tf
from tensorflow import keras   # Now uses Keras 2 behavior

tf.estimator was removed in TF 2.16. If your code uses tf.estimator.Estimator, you must stay on TF 2.15 or rewrite using tf.keras.Model.fit().
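
A hypothetical sketch of that migration — the Estimator-era input_fn becomes a tf.data.Dataset, and estimator.train() becomes model.fit() (make_dataset and the layer sizes here are illustrative, not from any real codebase):

```python
import numpy as np
import tensorflow as tf

def make_dataset(features, labels, batch_size=32):
    # Plays the role the input_fn played for the Estimator
    return (tf.data.Dataset.from_tensor_slices((features, labels))
            .shuffle(1_000)
            .batch(batch_size))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

X = np.random.rand(64, 4).astype('float32')
y = np.random.rand(64, 1).astype('float32')
history = model.fit(make_dataset(X, y), epochs=1, verbose=0)  # replaces estimator.train(input_fn)
```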

Migrating to Keras 3 imports (the forward-compatible path):

# Old pattern (works but may break in future TF versions)
from tensorflow import keras
from tensorflow.keras import layers

# New pattern (Keras 3 standalone — works independently of TF version)
import keras
from keras import layers

The standalone keras package (installed via pip install keras) works with TF, JAX, or PyTorch as backends. It’s the long-term direction; tf.keras is a compatibility shim.
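
Selecting a backend is one environment variable, set before the import (assumes the standalone keras>=3 package is installed):

```python
import os
os.environ['KERAS_BACKEND'] = 'tensorflow'   # or 'jax', 'torch' — must be set before importing keras

import keras
print(keras.backend.backend())
```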

Still Not Working?

Check Your TF and Python Versions

TF drops Python version support over time:

python --version        # Check Python
pip show tensorflow     # Check installed TF version

# For GPU, also verify NVIDIA driver version
nvidia-smi

tf.data Pipeline Errors

OOM and slowness during training often originate in the data pipeline, not the model. Profile it:

import tensorflow as tf

dataset = (
    tf.data.Dataset.from_tensor_slices((X_train, y_train))
    .shuffle(buffer_size=10_000)
    .map(preprocess_fn, num_parallel_calls=tf.data.AUTOTUNE)
    .cache()                      # Cache preprocessed data to RAM (if the dataset fits)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)   # Overlap data loading with GPU compute — keep this last
)

.prefetch(tf.data.AUTOTUNE) is the single most impactful change for GPU utilization — without it, the GPU waits idle while the CPU loads the next batch. For DataLoader patterns in PyTorch, see PyTorch not working.

Mixed Precision in Custom Training Loops

model.fit() handles loss scaling automatically. In custom loops, you must do it manually:

from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy('mixed_float16')
optimizer = tf.keras.optimizers.Adam(1e-4)
optimizer = mixed_precision.LossScaleOptimizer(optimizer)   # Wrap it

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        loss = loss_fn(y, predictions)
        scaled_loss = optimizer.get_scaled_loss(loss)   # Scale before backward

    scaled_gradients = tape.gradient(scaled_loss, model.trainable_variables)
    gradients = optimizer.get_unscaled_gradients(scaled_gradients)   # Unscale after
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

Using Pretrained Models

For large pretrained models from HuggingFace that you want to run through TF, see Hugging Face Transformers not working for token authentication, device mapping, and quantization. For local model serving without writing code, Ollama not working covers the daemon and GPU setup.

Installing TF When pip Fails

If pip install tensorflow fails with build errors — especially on ARM Macs, older Linux distros, or unusual Python versions — see pip could not build wheels for platform-specific install strategies.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
