Recent research from Meta, Google DeepMind, Cornell University, and NVIDIA reveals that large language models have a fixed memorization capacity of approximately 3.6 bits per parameter. This finding is significant because it clarifies how much of the training data is memorized versus generalized, which is critical for understanding how these models work and for addressing copyright concerns. The study indicates that training on larger datasets does not increase the likelihood of memorizing specific data points; instead, the fixed capacity is spread across more data. For instance, a model with 1.5 billion parameters can hold about 5.4 billion bits of information (1.5 billion × 3.6). The researchers measured memorization by training models on random bitstrings, which contain none of the patterns found in natural language and therefore cannot be learned by generalizing. This method gives a clearer picture of how language models retain information, which is particularly relevant to ongoing discussions of data privacy and copyright in the AI industry.
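The capacity arithmetic above can be sketched in a few lines of Python. Only the 3.6 bits-per-parameter figure comes from the study as reported here; the function names and the even split of capacity across training examples are illustrative assumptions, not the researchers' code or method.

```python
# Back-of-the-envelope capacity arithmetic. Only the 3.6 bits/parameter
# figure comes from the reported study; everything else is illustrative.

BITS_PER_PARAM = 3.6


def capacity_bits(num_params: float, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Estimated total memorization capacity in bits."""
    return num_params * bits_per_param


def bits_to_megabytes(bits: float) -> float:
    """Convert bits to megabytes (1 MB = 1,000,000 bytes)."""
    return bits / 8 / 1_000_000


def average_bits_per_example(num_params: float, num_examples: int) -> float:
    """Assumption: capacity is split evenly across training examples.

    Real models will not split capacity evenly; this only shows the trend.
    """
    return capacity_bits(num_params) / num_examples


if __name__ == "__main__":
    total = capacity_bits(1.5e9)
    print(f"{total:.3g} bits, about {bits_to_megabytes(total):.0f} MB")
    # The same model trained on progressively larger datasets.
    for n in (1_000_000, 10_000_000, 100_000_000):
        per_example = average_bits_per_example(1.5e9, n)
        print(f"{n:,} examples: {per_example:.1f} bits each on average")
```

Under this simplification, a dataset ten times larger leaves a tenth of the budget per example, which is the intuition behind the claim that larger datasets make memorizing any single item less likely.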
Google CEO Sundar Pichai has introduced a new term for the current phase of artificial intelligence, referring to it as “artificial jagged intelligence.” This concept highlights the non-linear progress in AI development, as researchers and developers face significant challenges despite achieving impressive milestones. Pichai explained that while advancements continue, many AI models still struggle with basic tasks, describing the current state of AI as marked by “jagged edges”—a mix of extraordinary capabilities and notable errors. This perspective echoes sentiments shared by Andrej Karpathy, a prominent figure in deep learning, who emphasized the unpredictability of AI performance.
A new breed of artificial intelligence bot is rapidly transforming how users access information online: traffic from these retrieval bots rose 49 percent in the first quarter of 2025 compared with the previous quarter. Companies like OpenAI and Anthropic are deploying these bots to summarize content in real time, moving away from traditional search, which returns links to multiple sources. According to data from TollBit, a New York-based start-up, the growth of retrieval bots is exponential, reflecting rising demand for content even as human traffic to news sites declines. TollBit’s CEO, Toshit Panigrahi, emphasized the need for publishers to adapt to this shift, stating that “this is coming for everyone.” The report notes that more than 26 million AI scrapes bypassed content blockers in March 2025, highlighting the challenges publishers face in protecting their material. As the landscape changes, content producers may need to rethink how they engage with AI visitors to sustain their business models.
Why do we care?
Sundar Pichai’s framing of the current phase as jagged is more than a catchy term. It’s an admission of AI’s inconsistent maturity: astonishing performance in some areas, baffling failures in others.
The rise of retrieval bots—from OpenAI and Anthropic especially—signals a massive redefinition of the web’s economic structure. Users increasingly receive summarized outputs, not links. This has enormous implications for content-driven businesses. We’re moving into a world where content is consumed by AI first, human second. Businesses need to rethink attribution, tracking, and even monetization models accordingly.
The AI landscape is becoming less about building capability and more about governing, packaging, and protecting it. You’re no longer just implementing tools; you’re shaping AI outcomes, managing AI risks, and helping clients find relevance in a transformed digital economy.
Treat AI like electricity in a jagged power grid: valuable, unpredictable, and politically charged. How you wire it up for your clients—safely, sustainably, and strategically—will determine your future value.

