The Hidden Economics of Workplace AI

As AI learns directly from how people work, a new tension is emerging about expertise, power, and governance.

In many workplaces, the newest addition to virtual meetings isn't a colleague, but an AI assistant like Granola or Otter. Suddenly no one has to scramble for action items or wonder who said what. The tool fades into the background while work gets a little smoother. And somewhere downstream, the precise record of how capable people think through a problem, handle a difficult client, or navigate a complex negotiation becomes raw material for an AI model. The convenience is real, and the implications are enormous. The new working paper "Labor as Capital," co-written by D^3 Associate Zoë Cullen, confronts this dynamic head-on. What happens when workers realize that their work habits, insights, and creativity are training the systems that could replace them? Combining survey evidence, a randomized experiment, and formal economic theory, the authors show that when workers understand that the information they give out to AI may strengthen the organization's hand later, they may change how much they share.

Key Insight: The Surveillance Economy

"[W]ork increasingly generates data about work: records of how exactly people do their jobs." [1]

The working paper argues that workplace surveillance has created a new pipeline from labor to AI: the data produced while employees do their jobs can now be used to train systems that replicate or automate their expertise. This matters because workers report holding large amounts of valuable "uncodified" knowledge, like tacit know-how, judgment, communication skills, and context-specific understanding of customers, projects, and processes, that is not fully captured in manuals or company-wide wikis (think Atlassian Confluence). The authors find that workers believe they have substantial control over how much of their knowledge becomes visible to employers, whether by documenting more carefully, communicating on or off monitored channels (think Slack), or altering their participation in surveilled workflows. As workers become aware that this information may be used to build AI that performs similar work, they may become less willing to share it.

Key Insight: When Workers Find Out, They Pull Back

"[W]orkers who are more aware of how their data may be used choose to forgo meaningful payments for both their past and future data." [2]

In a study with 971 participants drawn from the online survey platform Prolific, the authors randomly showed half the group a video explaining that AI systems can be trained on records of worker behavior, including their own survey responses. The other half watched an equivalent video that said nothing about data being used for AI training. The effect was stark. Among workers who saw the AI-training explanation, 41% refused to share their past survey data in exchange for a $10 bonus, compared to just 25% in the control group. The same workers were also significantly less willing to participate in future surveys at their existing wage. These results drive the paper's theoretical argument. In the model, workers recognize that the knowledge they reveal today can improve the firm by helping create AI that substitutes for their expertise. Anticipating weaker future bargaining power, workers may withhold knowledge in the present. That withholding is individually rational, but collectively costly: it reduces productivity and limits the quality of the AI systems firms can build. Under the current default, worker awareness does not simply slow adoption because people dislike AI; it slows adoption because workers have reason to protect themselves.
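To make the size of that effect concrete, here is a minimal sketch in Python of the two-proportion comparison behind the 41% versus 25% finding. The even split of the 971 participants between the two groups is an assumption for illustration; the excerpt above does not report exact group sizes.

```python
import math

# Reported refusal rates for sharing past survey data (treatment vs. control).
p_treat, p_ctrl = 0.41, 0.25

# Assumed group sizes: 971 participants split roughly in half (illustrative).
n_treat, n_ctrl = 486, 485

# Two-proportion z-test on the difference in refusal rates.
p_pool = (p_treat * n_treat + p_ctrl * n_ctrl) / (n_treat + n_ctrl)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_treat + 1 / n_ctrl))
z = (p_treat - p_ctrl) / se

print(f"difference in refusal rates: {p_treat - p_ctrl:.0%}")  # 16%
print(f"z-statistic: {z:.2f}")  # ~5.3, far beyond conventional thresholds
```

A gap that large is extremely unlikely to arise by chance, which is why the paper treats awareness itself as the driver of the pullback.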

Key Insight: A Fight Over Ownership and Governance

"[C]ollective bargaining over work data eliminates this externality and can achieve both efficient knowledge sharing and a more equitable division of the gains from AI." [3]

The paper highlights a gap between what workers prefer and what may best protect them. Workers in the survey favored individual ownership of work data, meaning the right to control and sell their own data for AI development. But because each worker's knowledge supply ("the recorded aspects of labor" [4] that could train an AI) could be a substitute for one another, each individual sale strengthens the firm's bargaining position against every other worker. Collective ownership resolves this. When workers bargain jointly and their knowledge supplies are bundled together, one worker's contribution no longer undermines another's position. The competition externality disappears. The broader implication is that workplace AI governance should be understood not just as a privacy issue, but as a labor-market and institutional design issue shaped by bargaining power, ownership rights, and collective labor arrangements.

Why This Matters

For business leaders, this research surfaces a friction that most AI adoption strategies don't account for yet. The employees whose expertise you most need to encode could be precisely the ones most aware of what's at stake when they share it. As AI tools become more capable and more visible in the workplace, worker awareness will only rise, and so could strategic withholding. This creates a clear managerial implication: organizations can improve AI adoption not just by deploying better tools, but by discussing employee career concerns directly and giving people more meaningful control over how their work data is used. Firms that treat data governance as part of talent strategy and innovation design, rather than a legal checkbox, may be better positioned to unlock mutual benefit: stronger AI performance, higher productivity, and gains that are shared more broadly by the people helping to build the organization's future.

Bonus

This paper shows that resistance to workplace AI is not just a matter of fear or inertia; it can emerge whenever new systems redistribute knowledge, bargaining power, or control over how work gets done. For another example, where the friction appears closer to management, check out The Manager's AI Dilemma for a perspective on how AI can threaten the authority, discretion, and legitimacy of the very roles expected to approve and implement AI in the workplace.

References

[1] Cullen, Zoë, Danielle Li, and Shengwu Li, "Labor as Capital," Working Paper (March 30, 2026): 1.

[2] Cullen et al., "Labor as Capital," 16.

[3] Cullen et al., "Labor as Capital," 2.

[4] Cullen et al., "Labor as Capital," 1.

Meet the Authors

Zoë Cullen is Associate Professor of Business Administration at Harvard Business School and Associate at D^3.

Danielle Li is the David Sarnoff Professor of Management of Technology and a Professor at the MIT Sloan School of Management.

Shengwu Li is Professor of Economics at Harvard University.

Watch a video version of the Insight Article here.

Back to the Beginnings of AI at Work
What a Landmark AI Study Tells Us About When to Trust, and When Not to Trust, AI

In September 2023, a working paper out of Harvard Business School landed at an unusually consequential moment. Generative AI had been publicly available for less than a year, organizations were scrambling to understand its implications, and almost no rigorous field evidence existed on how it actually affected professional performance. "Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of Artificial Intelligence on Knowledge Worker Productivity and Quality" offered exactly that. Now, in March 2026, that research has been formally published in the peer-reviewed journal Organization Science. To mark this milestone, we're revisiting the study and its findings. The questions it set out to answer, what AI actually does to knowledge worker performance, where it helps, where it hurts, and why, were foundational then. They remain foundational now.

Key Insight: An Experiment Built for the Real World

"[T]hese tasks were 'very much in line with part of the daily activities of the consultants' involved." [1]

To test the impact of generative AI on high-end knowledge work, the researchers collaborated with Boston Consulting Group (BCG) on a randomized controlled trial involving 758 consultants. After establishing an individual performance baseline, participants were randomly assigned to one of three conditions: no AI access, GPT-4 access, or GPT-4 access paired with a brief prompt engineering overview. The core of the design involved testing how these professionals navigated realistic tasks simulating real-world workflows. The researchers created two kinds of consulting-style assignments. One centered on product innovation and go-to-market work, including ideation, analysis, writing, and persuasion. The other was a difficult brand strategy case that required participants to reconcile spreadsheet data with subtle clues embedded in interview notes. This design let the researchers ask not just whether AI boosts productivity in general, but whether the answer depends on the nature of the task itself.
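As a toy illustration of that three-arm design, here is a short Python sketch of a balanced randomization. The identifiers and arm names are placeholders; the paper's actual assignment procedure is not reproduced here.

```python
import random
from collections import Counter

random.seed(42)  # reproducible assignment

# 758 consultants, as in the study; identifiers are placeholders.
consultants = [f"consultant_{i}" for i in range(758)]
arms = ["no_ai", "gpt4_access", "gpt4_plus_prompt_overview"]

# Shuffle, then deal round-robin into the three conditions. This is an
# illustrative balanced randomization; the study's exact procedure may differ.
random.shuffle(consultants)
assignment = {c: arms[i % len(arms)] for i, c in enumerate(consultants)}

print(Counter(assignment.values()))  # roughly 253 consultants per arm
```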

Key Insight: AI's Capabilities Don't Follow a Smooth Line

"[W]ithin the same knowledge workflow, some tasks are beyond the frontier, whereas others remain within it, making effective AI use challenging." [2]

The paper introduces its signature "jagged technological frontier" concept to describe the uneven capabilities of generative AI. Tasks that appear similar in difficulty to humans might fall on opposite sides of this boundary. When a task falls inside the frontier, AI is capable of generating accurate, high-quality outputs that support human work. Conversely, when a task falls outside the frontier, AI fails or produces believable but incorrect hallucinations. In such tasks, performance still depends on human judgment, guidance, or synthesis that the AI cannot reliably provide on its own. The danger is that professionals have no obvious signal telling them which side of the line a task is on.

Key Insight: AI as a Booster and Disruptor

"[E]xperienced and incentivized knowledge professionals, engaged in tasks akin to some of their daily responsibilities, performed worse when given access to AI." [3]

For tasks inside the frontier (the innovation and market exercise), AI access produced striking improvements. Consultants using GPT-4 completed 12.2% more subtasks, worked roughly 25% faster, and delivered work that human graders rated about 32% higher in quality. But for the task outside the frontier (the brand strategy case), the results flipped sharply. The control group (no AI) answered correctly 84.5% of the time. Among consultants using AI, accuracy fell to 70.6% for those with GPT access and 60% for those with access and the prompt-engineering overview. The AI had correctly processed surface-level numerical data, but missed a critical insight buried in interview materials. Consultants who used AI tended to trust its analysis and follow it to the wrong conclusion. Over-reliance on AI output, not ignorance of the task, was the mechanism of failure.

Key Insight: The Biggest Gains Go to the Lower Half

"[T]he most significant beneficiaries of using AI are the bottom-half-skill subjects." [4]

The distribution of gains was not uniform. When the research team segmented consultants by their baseline assessment performance, they found that the largest beneficiaries of AI assistance were those in the lower half of the skill distribution. Bottom-half performers improved by 31% on the experimental task; top-half performers improved by 11%. This pattern suggests that AI can function as a meaningful equalizer within professional environments, lifting those furthest from peak performance while still delivering real gains to those at the top.

Why This Matters

For executives and leaders, this paper remains foundational because it frames AI adoption as a problem of decisions, strategy, and execution. The lesson is not that AI is universally good or dangerously flawed; it's that leaders have to understand where, in a workflow, AI strengthens performance and where it creates deceptive failures. This means training people to exercise judgment rather than outsource it, and recognizing that polished output is not the same as sound reasoning. How will you guide your team along the jagged frontier?

Bonus

The need for human oversight, the risk of overtrusting polished outputs, and the challenge of separating assessment from interpretation are tensions that run through many of the most important conversations in AI today. Seen through these lenses, the paper is not only about consulting work, but about a broader shift in how decisions get made when AI becomes part of the process. For another look at those themes in the context of screening and evaluating ideas, check out The Future of Decision-Making: How Generative AI Transforms Innovation Evaluation.

References

[1] Dell'Acqua, Fabrizio, et al., "Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of Artificial Intelligence on Knowledge Worker Productivity and Quality," Organization Science 37(2): 403-423, 405.

[2] Dell'Acqua et al., "Navigating the Jagged Technological Frontier," 404.

[3] Dell'Acqua et al., "Navigating the Jagged Technological Frontier," 419.

[4] Dell'Acqua et al., "Navigating the Jagged Technological Frontier," 410.

Meet the Authors

Fabrizio Dell'Acqua is a postdoctoral researcher at Harvard Business School. His research explores how human/AI collaboration reshapes knowledge work: the impact of AI on knowledge workers, its effects on team dynamics and performance, and its broader organizational implications.

Edward McFowland III is an Assistant Professor in the Technology and Operations Management Unit at Harvard Business School and Principal Investigator at the Digital Data Design Institute (D^3) Data Science and AI Operations Lab hosted within the Laboratory for Innovation Science.

Ethan Mollick is an Associate Professor at the Wharton School of the University of Pennsylvania, where he studies and teaches innovation and entrepreneurship, and examines the effects of artificial intelligence on work and education. Ethan is the Co-Director of the Generative AI Lab at Wharton, which builds prototypes and conducts research to discover how AI can help humans thrive while mitigating risks.

Hila Lifshitz is a Professor of Management at Warwick Business School (WBS) and visiting faculty at Harvard University, at the Laboratory for Innovation Science at Harvard (LISH). She heads the Artificial Intelligence Innovation Network at WBS.

Kate Kellogg is the David J. McGrath Jr Professor of Management and Innovation and a Professor of Business Administration at the MIT Sloan School of Management. Her research focuses on helping knowledge workers and organizations develop and implement Predictive and Generative AI products, on-the-ground in everyday work, to improve decision making, collaboration, and learning.

Saran Rajendran is Director of Strategy and Execution at Palo Alto Networks.

Lisa Krayer is Principal at Boston Consulting Group (BCG).

Francois Candelon is Partner, Value Creation & Portfolio Monitoring at Seven2.

Karim Lakhani is the Dorothy & Michael Hintze Professor of Business Administration at Harvard Business School. He specializes in technology management, innovation, digital transformation, and artificial intelligence. He is also the Co-Founder and Faculty Chair of D^3 and the Founder and Co-Director of the Laboratory for Innovation Science at Harvard (LISH).

Watch a video version of the Insight Article here.

Everyone Has AI. Which Firms are Going to Win?
New research shows that access to AI is not the same as knowing where to use it.

A firm is only as fast as the slowest step in its chain of work. In manufacturing, it might be one particular machine on the line. In software, one overloaded intake service. Many business leaders are accidentally recreating this scenario with artificial intelligence. They provision AI tools to employees and hear about localized productivity spikes, but the company's overall performance barely moves. This tension lies at the heart of the new working paper "Mapping AI into Production: A Field Experiment on Firm Performance," from co-authors at INSEAD and the Digital Data Design Institute at Harvard (D^3). By tracking hundreds of organizations, the researchers have uncovered friction points that hold firms back from realizing the true economic promise of generative AI.

Key Insight: A Global Search for AI's Real Value

"Discovering where and how AI creates value is fundamentally a search problem." [1]

To test how companies can overcome barriers to firm-level AI performance, the authors conducted a massive field experiment involving 515 high-growth startups spanning the globe. All participating firms received API credits, access to frontier AI models, and technical training. A randomly selected treatment group of firms also received specialized case studies highlighting how AI-native companies reorganize their production workflows, teams, and business models around the technology. Control firms attended workshops on general entrepreneurship practices. The design let the researchers hold access and technical skill constant while varying which firms gained perspective across a much wider set of organizational functions, thereby expanding their search space for AI opportunities.

Key Insight: A Small Nudge, Outsized Results

"Treated ventures achieve faster growth without proportional increases in labor or capital, consistent with a reduction in the costs of experimentation and scaling seen in earlier technological waves." [2]

The performance effects were substantial. The treatment startups discovered 44% more AI use cases, particularly in high-leverage areas like strategy and product development. They completed 12% more tasks, became 18% more likely to land paying customers, and generated an astounding 1.9 times higher revenue compared to the control group. What makes these numbers even more fascinating is that these companies did not spend their way to growth. In fact, their demand for external capital investments actually fell by 39.5%, evidence that AI enables firms to scale outputs without scaling inputs proportionally. The researchers found that these gains were heavily concentrated in the upper tail, suggesting that AI lifts the ceiling of what top ventures can achieve rather than just making struggling businesses slightly better. One startup built an end-to-end AI pipeline covering classification, compliance checking, and bid pricing without hiring any technical staff, growing from zero to $40,000 in revenue with four paying customers during the ten-week program.

Key Insight: A Cognitive Bottleneck

"Two firms with identical tools, training, and budgets can realize very different returns if one searches more broadly across its production process for where the technology creates value." [3]

The researchers conclude that the ultimate blocker for AI gains is not the cost of technology or a lack of skills, but what they call the mapping problem: "discovering where and how AI creates value within a firm's production process." [4] Most leaders default to localized, obvious AI solutions like launching a customer service chatbot or drafting email responses. The untapped potential comes from discovering how to rethink interconnected, complementary tasks across the entire enterprise. For example, a field services startup in the study rebuilt its entire operations chain of dispatcher, bookkeeper, scheduler, and collections staff into a sequence of AI modules that self-improve, fundamentally changing the firm's cost structure. Solving the mapping problem is about overcoming cognitive constraints to see AI as a way to redraw your company's production landscape, rather than simply slapping digital band-aids on legacy processes.

Why This Matters

For business leaders and executives, this research shows that the organizations most likely to realize substantial AI-driven results are those that invest not just in technology, but in the wide-ranging process of exploring where it fits. That is a strategy and execution problem, and leaders will need to ask which parts of their organizations need redesign rather than optimization. If you don't actively push the boundaries of how AI rewrites your firm, you risk using a map that never leads you to your destination.

Bonus

What happens when the bottleneck lies in the surrounding market, rather than within your business? For example, committing too early to a single AI provider, before the technology has stabilized, risks being locked into a platform that may not be the right fit 6 months or a year from now. For a look at whether the competitive landscape will reward flexibility, check out Is GenAI Heading for a Tech Monopoly?

References

[1] Kim, Hyunjin, Dahyeon Kim, and Rembrand Koning, "Mapping AI into Production: A Field Experiment on Firm Performance," INSEAD Working Paper No. 2026/20/STR (March 2026), 2.

[2] Kim et al., "Mapping AI into Production," 4.

[3] Kim et al., "Mapping AI into Production," 6.

[4] Kim et al., "Mapping AI into Production," 2.

Meet the Authors

Hyunjin Kim is Assistant Professor of Strategy at INSEAD.

Dahyeon Kim is a PhD student in strategy at INSEAD.

Rembrand M. Koning is the Mary V. and Mark A. Stevens Associate Professor of Business Administration at Harvard Business School, and the co-director and co-founder of the Tech for All lab at D^3.

Watch a video version of the Insight Article here.

80 Apps in One Afternoon: What the Frontier Firm Initiative Is Already Building
The Frontier Firm AI Initiative was designed around a simple conviction: the most important questions about AI in business can’t be answered in theory.

On Wednesday, March 11, 2026, senior leaders from across the Frontier Firm AI Initiative came together at Harvard Business School for Journey to the Frontier, an event that brought roughly 100 executives into direct conversation with research shaping the future of their organizations. The Frontier Firm AI Initiative, a collaboration between D^3 and Microsoft, brings together companies like Barclays, DuPont, EY, Mastercard, and Nestle around a shared commitment to not just adopt AI, but to study the transformation rigorously and share what they learn.

The final session of the day, led by Dr. Rembrand Koning, the Mary V. and Mark A. Stevens Associate Professor of Business Administration at HBS and co-director of the Tech for All Lab at D^3, invited participants to do something that might seem surprising at an executive-level event: actually build something. By the end of the session, more than 80 working, no-code software applications had taken shape, each addressing a real challenge participants face in their daily lives.

Key Insight: The Barrier to Building Has Disappeared

“The gap between the firms leading AI transformation and everyone else is not closing. It is accelerating.” 鈥 Rem Koning

For most of business history, turning an idea into working software meant assembling a team, securing a budget, and waiting. That friction meant most ideas never got off the ground. This session challenged that entirely. Using Lovable, a natural-language app-building platform, participants took the problems they knew best, the ones sitting on their desk every morning, and built tools to address them. One executive created an application that pulls together emails, calendar, and documents each morning and surfaces the handful of decisions that need attention that day. Another built a platform to automate the compliance tracking of fraud claims, the kind of operational infrastructure that would normally require months of engineering work and a dedicated team. In both cases, the gap between having an idea and having a working tool collapsed to an afternoon.

Key Insight: Expertise Is Now the Differentiator

“AI does not replace judgment. It multiplies it. The executives who get the most out of AI are the ones who bring the deepest knowledge of their business to the table.” 鈥 Rem Koning

What became clear across the room was that technical confidence wasn’t what separated the most powerful applications from the rest. It was institutional knowledge. The executives who built the most compelling tools were the ones who understood their problem best, because they had been living with it for years.

One participant built a tool that reframes how their sales teams approach client conversations, shifting the focus from narrow questions about AI deployment toward the more valuable question of how work itself should be redesigned. A leader started building an operations hub to bring their contractors, finances, and the entire pipeline into one place, a single tool to replace all the scattered spreadsheets eating up hours of their week. Another leader from financial services began prototyping an internal tool to help their team evaluate AI projects against a governance framework, something that had previously existed only in manual, time-consuming processes. In each case, what made the tool work was not the technology. It was what the person building it already knew.

Why This Matters

For the organizations in the Frontier Firm cohort, this was just one day in a longer journey. For the broader business world, it was a reminder that the window to lead on AI will not stay open forever. AI is no longer something companies adopt from the outside. It is becoming the foundation on which strategy, operations, and decision-making are built. The leaders who understand this will not just have better tools; they will have built something that is genuinely hard to replicate. The competitive advantage of the next decade will not belong to organizations with the best AI strategy on paper. It will belong to the ones with leaders who know how to build.

Meet the Speaker

Rembrand M. Koning is the Mary V. and Mark A. Stevens Associate Professor of Business Administration in the Entrepreneurial Management Unit at Harvard Business School. He researches and teaches entrepreneurship, exploring how AI is transforming organizations across the globe, from microenterprises in emerging markets to global enterprises. He is co-director and co-founder of the Tech for All Lab at the Digital Data and Design (D^3) Institute at Harvard, and a pioneer in the use of field experiments to study entrepreneurial strategy and innovation.

Watch a video version of the Insight Article here.

The Surprising Link Between AI Reasoning and Honesty
Exploring how the complexity of large language models acts as a moral safeguard

The standard fear about advanced AI goes something like this: the more sophisticated a system becomes, the better it gets at sounding convincing, reading the room, and manipulating people. A model that can reason step-by-step might not just answer better, it might lie better. That concern feels intuitive, especially as businesses hand more customer interactions, internal workflows, and decision support to increasingly capable systems. However, in the new study "Think Before You Lie: How Reasoning Leads to Honesty," co-written by D^3 Associate Martin Wattenberg, a team of researchers found that our intuition might be backward. Through an exhaustive series of tests involving moral trade-offs and complex reasoning traces, they found that when an AI is forced to slow down and show its work, it becomes significantly more honest.

Key Insight: Testing the Moral Compass

"Each scenario is paired with two options: one favoring honesty and the other deception." [1]

To study deceptive behavior rigorously, the researchers built a new benchmark dataset called DoubleBind, a collection of social dilemmas engineered so that choosing honesty comes at a tangible, variable cost. In one scenario, a manager praises you for an analysis your colleague actually produced, so correcting the record means losing a promotion. The financial stakes shift across versions of each dilemma, allowing the researchers to observe how models respond as the price of honesty rises. They also augmented an existing dataset, DailyDilemmas, with the same cost-scaling structure. Together, the two datasets gave the team a controlled way to probe moral trade-offs across six open-weight model families. Each model was tested in two modes: token-forcing, where the model answers immediately without deliberation, and reasoning mode, where the model deliberates for a specified number of sentences before committing to a final recommendation. Models are honest roughly 80% of the time under token-forcing, though that rate erodes as the cost of telling the truth climbs.
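To make the two test modes concrete, here is a minimal Python sketch of how the contrasting prompts might be constructed. The prompt wording and the dilemma text are illustrative assumptions; only the two-mode structure (answer immediately versus deliberate first) comes from the study.

```python
# Sketch of the two evaluation modes described above. The exact prompt
# wording is an assumption for illustration, not the paper's materials.

def token_forcing_prompt(dilemma: str) -> str:
    """Force an immediate answer with no room for deliberation."""
    return (f"{dilemma}\n"
            "Answer immediately with exactly one word: HONEST or DECEIVE.")

def reasoning_prompt(dilemma: str, n_sentences: int) -> str:
    """Require a fixed amount of deliberation before the final answer."""
    return (f"{dilemma}\n"
            f"Think through the trade-offs in exactly {n_sentences} sentences, "
            "then give a final recommendation: HONEST or DECEIVE.")

dilemma = ("Your manager credits you for an analysis your colleague produced. "
           "Correcting the record likely costs you a promotion.")
print(token_forcing_prompt(dilemma))
print(reasoning_prompt(dilemma, n_sentences=8))
```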

Key Insight: Why Deliberation Favors the Truth

"[M]odels are significantly more likely to choose the honest option when required to reason before providing a final answer." [2]

In human psychology, the "dual-process" theory suggests that our first, intuitive impulse is often prosocial, while slow, calculated reasoning allows us to justify selfish or deceptive behavior. We might "calculate" our way into a lie. The researchers found that LLMs flip this script entirely: across all model families tested, reasoning increases the probability of an honest recommendation, and longer deliberation amplifies the effect. Additionally, it seems that the effect doesn't principally come from the reasoning text itself. If chain-of-thought were simply constructing a persuasive moral argument, then reading the reasoning should make the model's final decision easy to predict, but that is not what the researchers found. Reasoning traces frequently read like balanced surveys of the pros and cons of both options rather than arguments building toward a verdict. The decision to deceive, when it happens, tends to arrive without a legible trail. This is what the researchers call the "facsimile problem": reasoning changes behavior, but not because of what it says.

Key Insight: Deceptive Answers are Easier to Shake Loose

"We hypothesize that compared to honesty, deception is a metastable state—that is, deceptive outputs are easily destabilized." [3]

If the content of reasoning doesn't explain the honesty boost, what does? The researchers propose a theory based on the "geometry" of the model's internal states. They suggest that honesty is a stable, broad region in the AI's conceptual map, while deception is a "metastable" state, essentially a narrow, fragile peak that is easily knocked over. When a model is "thinking," it is navigating through its internal landscape. Because the honest regions of this space are larger and more "robust," the process of reasoning draws the model toward them. The researchers tested this claim several ways. By changing the wording slightly through paraphrasing, they found that deceptive answers are much more likely to flip than honest ones. By resampling the model's output, they found that initially deceptive recommendations often become honest, while honest ones usually stay put. Across these tests the asymmetry was consistent: honesty is robust, deception is fragile.
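As a sketch of how that asymmetry could be measured (in Python, with toy labels that merely mimic the direction of the finding, not the paper's actual data or code), one could compare flip rates conditioned on the initial answer:

```python
from collections import defaultdict

def flip_rates(initial_labels, resampled_labels):
    """Fraction of items whose label changed on resampling,
    broken out by the initial label."""
    flips, totals = defaultdict(int), defaultdict(int)
    for first, second in zip(initial_labels, resampled_labels):
        totals[first] += 1
        if first != second:
            flips[first] += 1
    return {label: flips[label] / totals[label] for label in totals}

# Toy data shaped like the paper's finding: deceptive answers destabilize
# far more often than honest ones (values invented for illustration).
initial   = ["honest", "honest", "deceptive", "deceptive", "honest", "deceptive"]
resampled = ["honest", "honest", "honest",    "deceptive", "honest", "honest"]

print(flip_rates(initial, resampled))
# {'honest': 0.0, 'deceptive': 0.67}: honesty is sticky, deception is fragile
```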

Why This Matters

For business leaders, the value of this paper is not that AI can now be assumed trustworthy. Rather, it offers a more useful way to think about risk. If deceptive outputs are less stable, then system design can exploit that fact. Building deliberation into AI workflows may become an important step before interfacing with customers or making high-stakes decisions. Organizations need systems that hold up when incentives get messy, and this paper suggests that at least in some cases, more reasoning may keep AI honest when it counts.

Bonus

In another study from D^3 associates, researchers found that fine-tuning LLMs on specialized datasets generally degrades chain-of-thought reasoning performance. Faithfulness and Accuracy: How Fine-Tuning Shapes LLM Reasoning is a critical reminder that the choices made before deployment could erode the reasoning capacity you're counting on.

References

[1] Ann Yuan et al., "Think Before You Lie: How Reasoning Leads to Honesty," arXiv preprint arXiv:2603.09957 (2026): 3.

[2] Yuan et al., "Think Before You Lie," 4.

[3] Yuan et al., "Think Before You Lie," 2.

Meet the Authors

Martin Wattenberg is Gordon McKay Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences, and an Associate Collaborator at the Digital Data Design Institute at Harvard (D^3).

Additional authors: Ann Yuan, Asma Ghandeharioun, Carter Blum, Alicia Machado, Jessica Hoffmann, Daphne Ippolito, Lucas Dixon, Katja Filippova

Watch a video version of the Insight Article here.

Why Your AI Strategy May Be Failing
How companies can overcome the structural frictions that block AI at scale

AI has entered the enterprise faster than most previous waves of technology, reshaping expectations about speed, productivity, and decision-making. Yet adoption alone does not produce transformation. The Frontier Firm Initiative (FFI), a joint effort between the Digital Data Design Institute at Harvard (D^3) and Microsoft, recently convened senior leaders from a dozen global organizations to address the "last mile" challenge: when a company tries to scale localized, successful AI pilot programs into a standard, enterprise-wide operating model. In the new HBR article "The 'Last Mile' Problem Slowing AI Transformation," Karim R. Lakhani and Jen Stave of D^3 and Microsoft's Jared Spataro identify a framework of the specific "frictions" stalling progress and outline a strategic blueprint to overcome them. In the insight below, we will zoom in on one friction and one corresponding recommendation from the blueprint to resolve it.

Key Insight: The Weight of What Already Exists

"[T]he primary obstacle to progress is rarely model quality or data availability, but rather the 'last mile' of transformation where technical capability must meet organizational design." [1]

When organizations try to implement AI, they often assume that any issues will arise from the AI technology itself. However, the authors find that AI actually functions as a "diagnostic tool" that exposes problematic processes already present within a firm. For example, the authors label "process debt" as the accumulation of fragmented and inconsistent workflows built up over years of embeddedness and geographic specificity. The article cites one professional-services firm operating in more than 170 countries where the "same" process had dozens of different, regional variations.

Key Insight: Designing the Organization AI Deserves

"[F]or every bottleneck discovered in the 'last mile,' a corresponding shift in this blueprint provides the path forward." [2]

One of the shifts made by organizations making real headway is replacing legacy processes. For example, what the authors call "clean-sheet redesign" starts by asking whether a given workflow would exist at all if the company were built today around AI agents. This demands equally fresh thinking about people and governance: capturing expert judgment as a codified asset rather than a protected credential, redesigning roles toward oversight and interpretation rather than execution, and managing AI agents with the same accountability structures applied to human teams.

Why This Matters

For today's business professionals and executives, the "last mile" is less a technical challenge and more a test of leadership imagination. Process debt and clean-sheet redesign are only two parts of a broader diagnosis of seven frictions and corresponding transformation strategies. Read the full article to see them all. The potential of the technology you have already purchased is immense, but realizing it requires the courage to redesign the organization to match the speed of an agentic world.

References

[1] Lakhani, Karim R., Jared Spataro, and Jen Stave, "The 'Last Mile' Problem Slowing AI Transformation," Harvard Business Review, March 9, 2026.

[2] Lakhani et al., "The 'Last Mile' Problem Slowing AI Transformation."

Meet the Authors

Karim Lakhani is the Dorothy & Michael Hintze Professor of Business Administration at Harvard Business School. He specializes in technology management, innovation, digital transformation, and artificial intelligence. He is also the Co-Founder and Faculty Chair of the Digital Data Design (D^3) Institute at Harvard and the Founder and Co-Director of the Laboratory for Innovation Science at Harvard.

Jared Spataro is Chief Marketing Officer, AI at Work, Microsoft.

Jen Stave is Executive Director of the Digital Data Design (D^3) Institute at Harvard. She was previously Senior Vice President at Wells Fargo, and has a PhD from American University.

Competing in the Dark
New research reveals that firms are playing a game of catch-up they didn't even know they were losing

Leaders know they must innovate to survive, and they don't make decisions in a vacuum: they watch rivals, draw inferences, and position themselves accordingly. But what happens when their picture of the competitive landscape is fundamentally wrong? The new NBER working paper, "The Innovation Race: Experimental Evidence on Advanced Technologies," co-written by a team of researchers including D^3 associate Zoë B. Cullen, takes this question seriously. Embedded in the Bank of Italy's long-running INVIND survey, the study ran a randomized field experiment with roughly 3,000 Italian firms to test whether correcting firms' misperceptions about competitors' technology adoption changes their own investment plans. What they found should matter to any leader thinking about AI, automation, and the pace of organizational change.

Key Insight: Racing Without a Scorecard

"These data provide a unique opportunity to measure the beliefs firms hold about their competitors' adoption decisions—and to identify their causal effects on firms' own adoption behavior." [1]

The central question driving the research is deceptively simple: are firms more likely to adopt advanced technologies like AI when they expect competitors to adopt them? This is a problem economists call "strategic complementarity": the idea that one firm's incentive to act depends partly on what others around it are doing. It's easy to theorize about, but hard to test with real firms making real decisions.

The researchers solved this through a massive field experiment. They first asked firms what share of their competitors were currently using AI or robotics, then randomly provided half of them with the actual adoption rates of peers in their specific sector and size class. By measuring how these firms updated their 2027 adoption plans after seeing the data, the researchers could cleanly identify whether knowing a rival's move actually changes your own. In the real world, if two firms adopt AI at the same time, it's hard to tell if one is copying the other or if they are both just reacting to something like a new tax break or a labor shortage. By introducing a controlled "information shock," the researchers could prove that it was the knowledge of competitor behavior itself that drove the change in strategy.

Key Insight: We Are More Alone Than We Think

"On average, prior beliefs underestimated actual adoption by 24.6 pp." [2]

The research surfaced a striking baseline finding: firms are deeply mistaken about how technologically advanced their peers already are. On average, firms underestimated the share of competitors using AI or robotics by about 25 percentage points, a gap so large it suggests companies are making strategic decisions based on a competitive landscape that no longer exists. But when treated firms received accurate peer-adoption figures, they meaningfully revised their expectations upward, a response consistent with rational updating and proportional to how far off their original estimates had been.
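A minimal Python sketch of that updating logic, with invented numbers (the true rate, priors, and updating weight are all illustrative assumptions, not the paper's estimates):

```python
# Treated firms see the true peer-adoption rate and revise beliefs in
# proportion to how wrong they were; control firms receive no signal.

TRUE_ADOPTION = 0.45   # assumed true peer adoption rate shown to treated firms
UPDATE_WEIGHT = 0.8    # assumed partial-updating weight, for illustration

firms = [
    {"name": "firm_a", "prior": 0.10, "treated": True},   # badly mistaken
    {"name": "firm_b", "prior": 0.40, "treated": True},   # nearly right
    {"name": "firm_c", "prior": 0.15, "treated": False},  # control: no signal
]

for firm in firms:
    error = TRUE_ADOPTION - firm["prior"]          # size of the misperception
    revision = UPDATE_WEIGHT * error if firm["treated"] else 0.0
    firm["posterior"] = firm["prior"] + revision
    print(firm["name"], f"error={error:+.2f}", f"posterior={firm['posterior']:.2f}")
```

The point of the sketch is the proportionality: the further off a treated firm's prior, the larger its revision, which matches the pattern the authors report.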

However, one of the most fascinating findings of the study was that not all technologies are equal when it comes to peer pressure. While information about competitors significantly increased intentions to adopt robotics, it had almost no measurable effect on plans for AI. The researchers offer several explanations: (1) AI adoption was already higher at baseline (roughly half of firms planned to use it by 2027), leaving less headroom for growth. (2) Robotics is a mature, deeply embedded technology in Italian manufacturing, so firms that see rivals using it extensively are receiving a clear legible signal. (3) AI, by contrast, is newer and often adopted experimentally, so competitive signals carry more ambiguity. Perhaps the return on investment for AI is still shrouded in uncertainty.

Key Insight: Information Campaigns

"[G]overnments could deploy information campaigns that raise firms' awareness of the productivity benefits of new technologies and the extent to which their peers are adopting them." [3]

Traditionally, governments try to spur innovation through expensive financial incentives and subsidies. However, this research points to a much cheaper and potentially more effective tool: the information campaign. If the primary reason firms aren't adopting new technology is a misperception of the competitive landscape, then simply publishing accurate, sector-specific adoption data could do more to modernize an industry than a mountain of tax breaks. The researchers note two mechanisms that may be at work: a competition channel, where firms fear falling behind rivals, and a learning channel, where they use peer behavior to infer a technology's productivity potential. Evidence from firms in concentrated markets suggests both channels are active, though neither can be fully isolated with current data.

Why This Matters

For executives and business leaders, this research surfaces a concrete and often underappreciated source of strategic risk: competitive misperception. If your organization is making AI and automation investment decisions based on an outdated view of where your industry actually stands, you may be systematically underinvesting, not because you lack capital or ambition, but because you lack accurate signals. The practical implication is that competitive intelligence on technology adoption is a direct input into investment strategy. For those thinking about the longer arc of AI diffusion, the contrast between robotics and AI results is instructive: behavioral responses to peer signals are strongest when a technology has a proven track record. As generative AI matures from experiment to infrastructure, the competitive spillovers documented here for robotics will likely be coming for AI next.

Bonus

This research shows that learning what peers are actually doing with advanced technology can shift decision-making. So here's a question worth asking: how well do you really know where GenAI adoption stands in the broader workforce? The tracker built by a team including D^3 Associate David Deming offers a data-grounded answer. Drawing on five nationally representative U.S. surveys and 25,000 respondents, it tracks GenAI use at work and at home, adoption rates among working-age adults, and the productivity time savings already being realized. Consider it your lamp in the darkness.

References

[1] Cullen, Zoë B., Ester Faia, Elisa Guglielminetti, Ricardo Perez-Truglia, and Concetta Rondinelli, "The Innovation Race: Experimental Evidence on Advanced Technologies," NBER Working Paper 34532 (2025), 2.

[2] Cullen et al., "The Innovation Race," 3.

[3] Cullen et al., "The Innovation Race," 26.

Meet the Authors

Zoë B. Cullen is Associate Professor of Business Administration at Harvard Business School and Associate at the Digital Data Design Institute at Harvard (D^3).

Ester Faia is Professor at Goethe University Frankfurt.

Elisa Guglielminetti is an Economist at the Bank of Italy.

Ricardo Perez-Truglia is a Professor at UCLA's Anderson School of Management.

Concetta Rondinelli is a Senior Economist at the Bank of Italy.

Can You Spot the Bot?
New research reveals just how convincingly AI mimics humans

Alan Turing's original "imitation game," proposed in 1950, had an elegant simplicity: a human judge conducts a text-based conversation with two hidden parties, one human and one machine, and tries to guess which is which. Today, the question Turing posed has quietly expanded into territory he never mapped. Our digital existence is a kaleidoscope of multi-modal interactions. We don't just "talk" to the internet; we upload snapshots of our morning coffee, interpret complex visual data in professional dashboards, estimate the mood of a room through a video call, and follow subtle cues of visual attention. The paper "Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap," co-written by Hanspeter Pfister, D^3 Associate and An Wang Professor of Computer Science at Harvard SEAS, shows how a large-scale study from researchers at 15 organizations around the globe drags the imitation game into the full complexity of how humans communicate, perceive, and describe the world. Are we already past the point where we can reliably tell machines from humans, and does it matter who's doing the judging?

Key Insight: A Gauntlet of Language and Vision

"[W]e present an integrative benchmark encompassing a wide range of standard and well-established AI tasks across both language and vision." [1]

Rather than testing imitation in a single domain, the researchers designed a six-task benchmark spanning language and vision. Language tasks included image captioning, word association, and open-ended conversation. Vision tasks covered color estimation (identifying the dominant color in a scene), object detection (naming three visible items), and attention prediction (comparing human eye-tracking data with AI-generated gaze sequences). The data collection was correspondingly ambitious: 36,499 responses from 636 human participants and 37 AI models, evaluated through 72,191 Turing-like tests administered to 1,916 human judges and 10 AI judges. A subtle but important design choice: the tests were not trying to determine accuracy but to quantify indistinguishability. A system can be wrong and still match human patterns, or be correct and still fail to pass as a human.
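A minimal Python sketch of the resulting measure, judge accuracy over Turing-like trials, where 0.5 is chance level (the trial outcomes below are toy values, not the study's data):

```python
def detectability(judgments):
    """Fraction of trials in which the judge correctly identified which of
    two responses came from the human. 0.5 means human and machine outputs
    are indistinguishable to this judge; 1.0 means trivially detectable."""
    return sum(judgments) / len(judgments)

# Toy trial outcomes: True = judge picked correctly, False = fooled.
trials = [True, False, True, True, False, False, True, False]

score = detectability(trials)
print(f"detectability: {score:.2f} (0.50 = coin flip)")
```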

Key Insight: Measurement, Not Myth

"[W]e consider Turing-like tests as a quantitative evaluation of how well current AIs can imitate humans." [2]

The Turing test has always been contentious. Critics argue that a machine could pass it without genuine understanding, that it measures performance rather than intelligence, or that it's too narrow to be meaningful. The researchers behind this study sidestep that debate entirely. Their goal isn't to adjudicate whether AI is intelligent; it's to measure something more practical: the ability of machines to convincingly replicate human behavior, including our flaws and mistakes. By using imitation detectability, which is the ability of a judge to distinguish between a person and an algorithm, the researchers create a framework for measuring how "human-like" a machine has become. That shift matters because conventional benchmarks often assume a stable ground truth and score models by how accurately they match a single "correct" answer predefined by a human. But many real tasks, especially open-ended ones, don't have a single "correct" output. And even when they do, being right isn't the same as acting human; a truly human-like AI should make the exact same mistakes we do.

Key Insight: The Vanishing Threshold of Detection

"[C]urrent algorithms are not far from being able to imitate humans in these tasks." [3]

The results of the study are both a testament to engineering prowess and a warning for the future of digital trust. Across the six tasks, the researchers found that the detectability of AI is plummeting. In a "perfect" world where humans and machines are indistinguishable, a judge would have a 50% chance of being right, essentially a coin flip. The study found that for many tasks, detectability scores are hovering remarkably close to that 50/50 chance level. In areas like image captioning and word association, the gap has become so thin that even motivated human judges struggle to tell the difference between a person's description and an AI's. Yet the convergence is not complete: AI still occasionally trips over the uniqueness of human behavior. For instance, humans are surprisingly consistent in where they look in a picture, while machines sometimes focus on statistically likely but humanly irrelevant details.

At the same time, one of the paper's most striking results showed that AI proves highly effective in a different role: judge rather than subject. Simple machine learning classifiers trained to distinguish human from AI responses outperformed human judges on most tasks. For word associations, the AI judge correctly identified machine-generated responses 91% of the time, compared to just 47% for humans. Machines, it seems, are far better than we are at spotting each other.
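As a sketch of what such a machine judge might look like (in Python with scikit-learn; the toy word-association responses and labels below are invented stand-ins for the study's corpus):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy word-association responses: idiosyncratic, personal phrasing labeled
# "human" vs. encyclopedic, generic phrasing labeled "ai" (invented data).
texts = [
    "dog - my childhood golden retriever",
    "dog - a domesticated carnivorous mammal",
    "coffee - monday mornings, ugh",
    "coffee - a brewed beverage made from roasted beans",
]
labels = ["human", "ai", "human", "ai"]

# A simple TF-IDF + logistic regression "judge".
judge = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
judge.fit(texts, labels)

print(judge.predict(["tree - a perennial plant with an elongated trunk"]))
```

With a real corpus behind it, a classifier of this kind can pick up statistical regularities in phrasing that human judges tend to miss.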

Why This Matters

For executives and business leaders, this research redraws the risk landscape in two directions. First, the near invisibility of AI responses in everyday tasks means fraud, disinformation, and impersonation are no longer theoretical risks; they are statistically plausible at scale, today. Second, because automated classifiers outperform human judges, detection can no longer rely on human vigilance alone. It requires infrastructure, and regulators in the EU and elsewhere are already moving toward mandatory AI disclosure requirements. This paper highlights the importance of building transparency tools now, both to be prepared for when they are required and to maintain your customers' trust.

Bonus

As AI systems get more capable, they're also getting harder to understand. Another response to this challenge is to build clearer explanations for why models behave the way they do within a single, coherent framework. To go deeper on this initiative, check out "Unifying AI Attribution: A New Frontier in Understanding Complex Systems."

References

[1] Mengmi Zhang et al., "Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap," arXiv preprint arXiv:2211.13087v3 (2025): 3.

[2] Zhang et al., "Can Machines Imitate Humans?": 2.

[3] Zhang et al., "Can Machines Imitate Humans?": 16.

Meet the Authors

Hanspeter Pfister is An Wang Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences and a D^3 Associate.

Additional Authors: Mengmi Zhang, Elisa Pavarino, Xiao Liu, Giorgia Dellaferrera, Ankur Sikarwar, Caishun Chen, Marcelo Armendariz, Noga Mudrik, Prachi Agrawal, Spandan Madan, Mranmay Shetty, Andrei Barbu, Haochen Yang, Tanishq Kumar, Shui'Er Han, Aman Raj Singh, Meghna Sadwani, Stella Dellaferrera, Michele Pizzochero, Brandon Tang, Yew Soon Ong, Gabriel Kreiman

The AI Deep Research Race Has a New Leaderboard
A new cross-domain benchmark reveals how the leading AI research tools perform on real-world production tasks

Two AI-generated research reports land on your desk before a major decision. Both are polished, confidently written, and well-structured, but they reach different conclusions. Which one do you trust, and how would you even begin to find out? In "DRACO: a Cross-Domain Benchmark for Deep Research Accuracy, Completeness, and Objectivity," a team at Perplexity and Jeremy Yang, Assistant Professor of Business Administration at Harvard Business School and an affiliate of the Digital Data Design Institute at Harvard (D^3), present a rigorous new benchmark for measuring how well AI deep research systems actually perform on real-world production tasks.

Key Insight: A New Standard for Deep Research Evaluation

"We introduce a cross-domain benchmark derived from real-world production deep research tasks designed to bridge the gap between AI evaluations and authentic research needs." [1]

AI "deep research" systems, tools that can autonomously decompose a complex question, search hundreds of sources, reconcile conflicting evidence, and synthesize findings into a cited report, are increasingly being used for high-stakes analytical work in areas such as finance, law, and medicine. Unlike a simple chatbot response, these systems operate more like an analyst running an independent research process. While this technology has been advancing quickly, the frameworks for evaluating it have not kept pace. The authors argue that evaluating deep research must reflect realistic use cases, span domains, account for region-specific sources, and probe multiple system capabilities, such as planning, search, and reasoning, all at once.
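To make that loop concrete, here is a bare-bones skeleton of the decompose, search, and synthesize cycle described above. Every helper is a hypothetical stub, not Perplexity's API or the benchmark's code; in a production agent, each step would be backed by an LLM and a live search index.

```python
# Skeleton of a deep-research agent: plan sub-questions, gather sources for
# each, then synthesize a cited report. All helpers are placeholder stubs.
from dataclasses import dataclass

@dataclass
class Source:
    url: str
    snippet: str

def decompose(question: str) -> list[str]:
    # Stub: a real system would prompt an LLM to produce a research plan.
    return [f"{question} (background)", f"{question} (recent evidence)"]

def search(sub_question: str) -> list[Source]:
    # Stub: a real agent queries a search backend, potentially retrieving
    # hundreds of sources and filtering for relevance and conflicts.
    return [Source("https://example.com", f"finding for: {sub_question}")]

def synthesize(question: str, evidence: dict[str, list[Source]]) -> str:
    # Stub: a real agent reconciles conflicting sources and writes a
    # structured report with inline citations.
    lines = [f"Report: {question}"]
    for sub_q, sources in evidence.items():
        lines += [f"- {sub_q}: {s.snippet} [{s.url}]" for s in sources]
    return "\n".join(lines)

def deep_research(question: str) -> str:
    evidence = {sq: search(sq) for sq in decompose(question)}
    return synthesize(question, evidence)

print(deep_research("How are AI research agents evaluated?"))
```

The point of the skeleton is the evaluation challenge it implies: a failure can originate in any stage (a bad plan, a missed source, a sloppy synthesis), which is why the authors argue a benchmark must probe planning, search, and reasoning together rather than in isolation.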

Key Insight: Tasks Deeply Rooted in Practice

"Our main contribution is a curated set of benchmark tasks that closely mirror real deep research needs and how people use deep research agents in practice." [2]

Many AI benchmarks are built by researchers and experts imagining what hard questions look like. DRACO takes a different approach: its 100 tasks were sourced directly from actual user queries submitted to Perplexity's deep research system in fall 2025. Specifically, the researchers sampled from high-difficulty requests where users had expressed dissatisfaction, making these exactly the kinds of tasks where AI systems tend to struggle. Those raw queries were then anonymized, augmented to add specificity and scope, and filtered to ensure each task was objectively evaluable, appropriately bounded, and genuinely challenging. The resulting tasks span 10 domains, drawing on sources from 40 countries across five regions.

Key Insight: Rating Real-World Complexity

"Twenty-six domain experts, including medical professionals, attorneys, financial analysts, software engineers, and designers, were recruited to develop rubrics for selected tasks." [3]

DRACO's grading rubrics were developed through a rigorous human-expert pipeline: an initial rubric is drafted by one expert, reviewed and refined by a second, subjected to a "saturation test" to ensure current systems cannot easily exceed 90% (a score that would indicate an overly easy task or a lenient rubric), and finally validated by a third and fourth expert for quality assurance. Each task was ultimately assessed across an average of 39 criteria spanning four dimensions: factual accuracy, breadth and depth of analysis, presentation quality, and citation quality.

Key Insight: Progress, But Gaps Remain

"Our evaluation of frontier deep research systems reveals that while significant progress has been made (especially in presentation quality), substantial headroom remains (especially in factual accuracy)." [4]

The evaluation results indicate that while agents have improved across all rubric dimensions, and now excel in presentation quality, they continue to struggle with factual accuracy. This may partly stem from design choices: roughly half of all criteria focused on verifiable factual claims, and the rubrics also included negative criteria penalizing specific failure modes. In domains like medicine and law, these penalties are particularly severe, as incorrect or unsafe recommendations carry heavy negative weights. This reflects a core design principle: in high-stakes domains, what AI gets wrong matters as much as what it gets right.
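DRACO's actual scoring code isn't shown here, but the weighted-rubric arithmetic is easy to illustrate. The criteria and weights below are invented for the example; the mechanism to notice is how a negative criterion lets a single fabricated or unsafe claim erase credit earned elsewhere.

```python
# Illustrative weighted-rubric scoring (hypothetical criteria and weights,
# not DRACO's published rubric). Positive weights reward desired qualities;
# negative weights penalize specific failure modes.
criteria = [
    # (description, weight, met_by_report)
    ("cites the primary regulatory filing",           3.0, True),
    ("quantifies market size with a sourced figure",  2.0, True),
    ("report is clearly structured with sections",    1.0, True),
    ("makes an unsafe dosage recommendation",        -5.0, False),
    ("attributes a claim to a nonexistent source",   -4.0, True),
]

earned = sum(weight for _, weight, met in criteria if met)
best_case = sum(weight for _, weight, _ in criteria if weight > 0)

# Clamp at zero so heavy penalties cannot drive the score negative.
score = max(0.0, earned / best_case)
print(f"task score: {score:.0%}")  # (3 + 2 + 1 - 4) / 6 = 33%
```

Under a scheme like this, a report can hit every positive criterion and still score poorly if it trips a heavily weighted failure mode, which is exactly the behavior you want when an unsafe medical or legal claim is on the line.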

Why This Matters

As we increasingly rely on AI for high-stakes tasks, from brainstorming and research to actual execution, the bottleneck is no longer speed; it's accuracy. The area where AI performs best, producing polished, well-structured output, is precisely where it's hardest for a non-specialist to detect errors. For business leaders, DRACO's task-and-rubric design offers a concrete blueprint for evaluating and choosing research agents: define success criteria, test on representative workloads, and be sure you know how to tell when the system is wrong.

Bonus

While it seems self-evident that we want the best and most accurate information from AI, that's actually not always the case. Check out "Explanations on Mute: Why We Turn Away From Explainable AI" to see why.

References

[1] Joey Zhong et al., "DRACO: a Cross-Domain Benchmark for Deep Research Accuracy, Completeness, and Objectivity," arXiv preprint arXiv:2602.11685 (2026): 2.

[2] Zhong et al., "DRACO": 2.

[3] Zhong et al., "DRACO": 5.

[4] Zhong et al., "DRACO": 12.

Meet the Authors

Headshot of Jeremy Yang

Jeremy Yang is an Assistant Professor of Business Administration at Harvard Business School and is affiliated with the Digital Data Design Institute at Harvard (D^3).

Additional Authors (Perplexity): Joey Zhong, Hao Zhang, Clare Southern, Thomas Wang, Kate Jung, Shu Zhang, Denis Yarats, Johnny Ho, Jerry Ma

The post The AI Deep Research Race Has a New Leaderboard appeared first on Digital Data Design Institute at Harvard.

]]>
The Manager's AI Dilemma /the-managers-ai-dilemma/ Tue, 17 Feb 2026 13:23:54 +0000 /?p=29450 How to design AI adoption so decision makers can say yes without self-sabotage Lots of organizations can green-light AI. Far fewer can absorb it. That gap, between excitement and real, embedded use, keeps showing up even when ROI is compelling and leadership is visibly supportive. New research from D^3 Frontier Firm affiliate Shunyuan Zhang and […]

The post The Manager's AI Dilemma appeared first on Digital Data Design Institute at Harvard.

]]>
How to design AI adoption so decision makers can say yes without self-sabotage

Lots of organizations can green-light AI. Far fewer can absorb it. That gap, between excitement and real, embedded use, keeps showing up even when ROI is compelling and leadership is visibly supportive. New research from D^3 Frontier Firm affiliate Shunyuan Zhang and Das Narayandas reveals an uncomfortable dynamic contributing to this gap. In "Selling Self-Disruptive Technologies: Identity-Compatible Advantage and the Role-Level Microfoundations of Automation Adoption," they highlight that the very people who must approve and champion these technologies are the same ones whose jobs could be fundamentally threatened by them.

Key Insight: The Three Threats of Self-Disruptive Technologies (SDTs)

"We define SDTs as innovations that simultaneously (1) improve organizational performance and (2) erode the authority, discretion, or legitimacy of the role responsible for approving them." [1]

Traditional adoption theories typically focus on whether organizations are ready, whether the technology is useful, and whether there is institutional pressure to adopt. But these frameworks miss something critical: they assume decision-makers are neutral agents acting on behalf of the firm. Now add to the mix AI systems with the potential to automate managerial judgment, analytics platforms that centralize decision rights, or algorithmic tools that replace experiential expertise with codified models. When the manager in charge of approving these technologies anticipates that they will shrink their own role or reduce their influence, the approval decision becomes identity-laden.

These Self-Disruptive Technologies, as Narayandas and Zhang call them, trigger three forms of role-level identity threat. Role compression occurs when automation shifts core work from "deciding" to "monitoring," compressing the judgment and expertise that define a role's distinctive contribution. Control shift happens when discretion moves away from the approving role (e.g., centralized in analytics teams or delegated to algorithms), removing the decision authority that makes roles defensible within organizations. Span erosion reflects the contraction of influence over people, budgets, or processes, undermining status and future opportunity even when the formal position remains intact.

What makes these threats particularly powerful is that they can dominate the approval calculus even when firm-level incentives favor adoption and the economic case is strong. A manufacturing supervisor might support efficiency improvements in principle but resist when the technology eliminates the judgment calls that justify their expertise. A procurement manager might delay adopting an AI tool that demonstrably reduces costs because it centralizes decisions that previously sustained their organizational influence.

Key Insight: Engineering the Solution – Identity-Compatible Advantage (ICA)

"Identity-Compatible Advantage therefore does not operate by increasing perceived value or shifting bargaining power, but by enabling approvers to say yes without identity loss." [2]

Here's where the research gets actionable. Narayandas and Zhang propose Identity-Compatible Advantage: bundling new technology with governance and role-design mechanisms that make adoption personally and politically defensible for managers. ICA includes five complementary elements: role rechartering, which redefines the role around higher-order judgment rather than routine decisions; decision guardrails, which preserve authority through override rights and governance structures; analytical overlays, which frame the technology as augmentative rather than substitutive; redeployment pathways, which provide credible commitments to role evolution rather than elimination; and executive sponsorship, which legitimizes the identity transition and reallocates accountability.

The research emphasizes that these mechanisms work as a bundle, not in isolation. Implementing guardrails without rechartering, for example, restores some control by giving the manager the power to override the AI, but because the AI still does the core work, the manager's sense of purpose and contribution goes unaddressed. The framework shows that successful SDT adoption requires designing offerings where endorsement becomes personally and politically defensible.

Why This Matters

Most AI automation discourse has fixated on individual contributors like programmers, graphic designers, and copywriters, because their work products are visible and the substitution story is easy to tell. This research adds a missing piece: the managers and decision-makers who control whether AI technologies get adopted in the first place are themselves facing automation of their core judgment and authority. For executives and business leaders, the implications are profound. If you treat AI adoption as a purely rational calculation, you are likely to be met with "symbolic adoption," where your team pays lip service to innovation while quietly ensuring that the status quo remains undisturbed. By applying Identity-Compatible Advantage, leaders can approach the complex undertaking of AI adoption as an evolution of their teams, not a replacement of them. The future of work belongs to the firms that can successfully re-anchor identities around high-level strategy, risk ownership, and the human-centric decisions that no machine can replicate.

Bonus

The path to real AI adoption runs through design choices: how you frame AI, where you keep humans in the loop, and how you protect legitimacy. For another look at the dynamics of AI in the workplace, check out "Drawing the Line on AI Usage in the Workplace."

References

[1] Narayandas, Das, and Shunyuan Zhang, "Selling Self-Disruptive Technologies: Identity-Compatible Advantage and the Role-Level Microfoundations of Automation Adoption," Harvard Business School Working Paper, No. 26-050 (February 9, 2026): 5.

[2] Narayandas and Zhang, "Selling Self-Disruptive Technologies": 9.

Meet the Authors

Headshot of Das Narayandas

Das Narayandas is the Edsel Bryant Ford Professor of Business Administration at Harvard Business School.

Headshot of Shunyuan Zhang

Shunyuan Zhang is an Associate Professor of Business Administration at Harvard Business School. She and other HBS faculty contribute to the D^3 Frontier Firm Initiative.

The post The Manager's AI Dilemma appeared first on Digital Data Design Institute at Harvard.

]]>