OpenAI announced an upgrade to ChatGPT’s Deep Research feature, now powered by GPT-5.2. Users can connect apps, search websites, track search progress, and adjust queries. A new full-screen viewer facilitates navigation of AI reports with a table of contents and sources. It includes real-time report tracking, editing scope, adding sources, and downloading in formats like Markdown, Word, and PDF.
OpenAI has released updates to its Responses API, which now includes features like Server-side Compaction, Hosted Shell Containers, and a new Skills standard for agents. These updates allow for improved context retention during long-running tasks and provide a managed cloud environment for executing code with native execution capabilities for multiple programming languages. The Server-side Compaction feature lets agents operate without losing context over extended sessions, while the Hosted Shell Containers provide a full terminal environment with persistent storage.
Why do we care?
OpenAI isn’t just selling models now—it’s selling the runtime. That’s convenient, and it changes the risk profile.
Server-side Compaction is marketed as “agents that don’t lose context,” but it’s effectively black-box summarization. You don’t get to inspect what was kept, what was dropped, or how meaning shifts over repeated compressions. If you design a multi-hour diagnostic workflow around that, you’re trusting a hidden process to preserve the exact details your client will later argue mattered. That’s faith, not engineering.
Hosted Shell Containers reduce operational burden, but they also introduce dependency risk. When the hosted environment degrades, your agent doesn’t merely slow down—it can fail mid-process. Without clear service guarantees and a tested fallback path, you’ve moved an operational failure mode into the middle of client delivery.
And the new Skills standard is promising for interoperability, but it expands the procedural supply chain. You’re executing natural-language instructions via non-deterministic systems—exactly the kind of surface area prompt injection thrives in.
MSP takeaway: use these features where the downside is tolerable—internal tooling, first-pass research, draft generation, high-volume low-stakes workflows. Avoid them for regulated data, incident response, and “agent makes changes” automation unless you externalize state into auditable systems, add deterministic checkpoints, and prove performance with retention tests. If you can’t measure what the agent retained, you can’t promise what it will do.

