<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>inference — FixDevs</title><description>Latest fixes and solutions for inference errors on FixDevs.</description><link>https://fixdevs.com/</link><language>en</language><lastBuildDate>Wed, 20 May 2026 00:00:00 GMT</lastBuildDate><atom:link href="https://fixdevs.com/tags/inference/rss.xml" rel="self" type="application/rss+xml"/><item><title>Fix: Replicate Not Working — Model Versions, Prediction Polling, Webhooks, and Cog Build</title><link>https://fixdevs.com/blog/replicate-not-working/</link><guid isPermaLink="true">https://fixdevs.com/blog/replicate-not-working/</guid><description>How to fix Replicate API errors — model version ID required, prediction polling vs streaming, webhook signature verification, file inputs and HTTPS URLs, cold start latency, Cog deployment, and deployments vs predictions.</description><pubDate>Wed, 20 May 2026 00:00:00 GMT</pubDate><category>replicate</category><category>ml</category><category>inference</category><category>cog</category><category>ai</category><category>api</category><author>FixDevs</author></item><item><title>Fix: vLLM Not Working — CUDA OOM, Model Loading, and API Server Errors</title><link>https://fixdevs.com/blog/vllm-not-working/</link><guid isPermaLink="true">https://fixdevs.com/blog/vllm-not-working/</guid><description>How to fix vLLM errors — CUDA out of memory during model load, tokenizer mismatch with HuggingFace, tensor parallel size does not match GPU count, KV cache exceeds memory, OpenAI API compatibility issues, and max_model_len too large.</description><pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate><category>python</category><category>vllm</category><category>llm</category><category>inference</category><category>machine-learning</category><category>gpu</category><category>debugging</category><author>FixDevs</author></item></channel></rss>