The Hypothesis Trap

When Scientists Fall in Bad Love With Their Own Ideas

Approximately four decades ago, I became a witness in a scientific misconduct case. The charges had been brought by an international postdoc in the lab where I had also worked before moving on, and I cannot remember many of the details, except that my written testimony stated that I knew nothing. But I do remember, in the context of more recent high-profile cases, that the essence of the accusation then was the same as it is now: altering experimental data to support the ‘party line’.

The recent disruption to American science has been extensively documented. Given how deeply intertwined government research dollars are with the budget models of R1 universities and large academic medical centers, it’s not surprising that those funds were chosen for their leverage, or that putting them in jeopardy will profoundly alter the pursuit of Vannevar Bush’s endless frontier.

But I want to explore a different question raised by that long-ago case. When I recall that the essence involved “altering data to support the party line,” I need to ask: whose party line was it? In that case, and in many since, the party line wasn’t imposed by some external authority. It was the PI’s own hypothesis, their pet theory, the idea they’d invested years in developing and defending. The fraud wasn’t about serving power—it was about rescuing a cherished belief from contradictory evidence.

This raises uncomfortable questions about how we organize biomedical research. The current system—hypothesis-driven projects led by individual PIs who develop deep attachments to specific ideas—contains structural flaws that push even honest scientists toward motivated reasoning and occasionally push the dishonest ones past the line into fraud.

The Romantic Model of Science

Our funding system enshrines a particular vision of how science works. A brilliant investigator conceives a hypothesis. They design clever experiments to test it. They write a compelling grant proposal. If funded, they spend 3-5 years testing their idea. Success means publishing papers that confirm the hypothesis, which leads to more grants to extend the work.

This model has romantic appeal. It positions the PI as the creative genius whose insight drives discovery. It makes science a battle of ideas where the best hypotheses prevail. It creates clear narratives: an investigator proposes a theory, designs experiments to test it, and demonstrates it is correct. This is how we teach science, how we write about it in popular accounts, how we celebrate it in awards and prizes.

The problem is that this romantic model creates precisely the conditions under which fraud becomes tempting and honest self-deception becomes nearly inevitable.

When a Hypothesis Becomes One’s Identity

Here’s what happened in numerous misconduct cases from the 1980s onward: A researcher develops a hypothesis. It’s not just any hypothesis—it’s their hypothesis, the idea that defines their research program, the theory that distinguishes them from competitors. They build a laboratory around it, recruit students and postdocs to test it, and write grants that promise to extend it.

The hypothesis becomes their professional identity. Colleagues know them as “the person who works on that theory.” Graduate students join their lab specifically to work on that problem. Papers in high-impact journals describe their unique contribution. Tenure committees evaluate whether the hypothesis has generated sufficient publications. Grant review panels judge whether the approach is likely to continue producing results.

Then experiments start yielding contradictory data. Not every experiment—if every experiment failed, the researcher might abandon the hypothesis. However, when enough experiments yield ambiguous or contradictory results, the careful scientist should begin to question the core idea.

This is where the system’s design creates problems. Walking away from the hypothesis means walking away from professional identity, from grants that depend on that research program, from students and postdocs whose projects are built on that framework. It means admitting that years of work may have been directed toward the wrong question. It means watching competitors promote alternative theories.

The pressure isn’t external—nobody is ordering the researcher to maintain their hypothesis. The pressure is structural, built into how we organize careers and evaluate success. When your identity, your lab’s funding, and your scientific reputation all depend on a particular idea being correct, it takes extraordinary intellectual honesty to acknowledge that idea might be wrong.

On the Spectrum: From Delusion to Fraud

Most scientists don’t fabricate data. But many engage in practices that fall short of fraud while still distorting the scientific record. These practices stem from the same structural problem: excessive investment in a specific hypothesis.

Selective reporting occurs when experiments yielding inconvenient results are dismissed as “technical problems,” whereas experiments supporting the hypothesis are published. The researcher isn’t fabricating data—they’re making judgments about which data are “good.” But those judgments are biased by investment in the hypothesis.

Data massaging occurs when researchers make analytical decisions that favor their theory. Which outliers to exclude? How to set cutoffs? Which statistical tests to use? Each decision seems defensible individually, but collectively, they bias results toward the preferred outcome. Again, this isn’t fabrication—it’s motivated reasoning dressed up as methodological choice.

Hypothesis rescue manifests as increasingly elaborate explanations for why experiments that should have supported the theory failed. Maybe the conditions weren’t quite right. Maybe there’s an additional factor we didn’t control for. Maybe the effect is context-dependent. Some auxiliary hypotheses are legitimate scientific refinements. Others are epicycles added to save a failing theory.

Selective collaboration and citation appear when researchers preferentially cite papers supporting their view while ignoring contradictory work. They collaborate with scientists who share their hypothesis, while avoiding those who promote alternatives. This creates echo chambers where a contested theory looks like a consensus because the believers only talk to each other.

These practices aren’t fraud in the legal sense. They’re what happens when intelligent, well-meaning scientists become too invested in particular ideas. The investment doesn’t require conscious dishonesty—it just requires the normal human tendency to see what we expect to see, to value evidence confirming our beliefs more highly than evidence challenging them.

The Cases We Remember

The 1980s wave of misconduct cases illuminates this pattern. Take John Darsee at Harvard Medical School. His fraudulent cardiology research wasn’t random fabrication—it was data manufactured to support his ongoing research program. He was so invested in demonstrating that his approach worked that he fabricated results when experiments didn’t cooperate. His extraordinary productivity should have raised red flags, but it fit the romantic model: the brilliant investigator producing breakthrough after breakthrough.

The Baltimore affair involved Thereza Imanishi-Kari’s immunology data that Margot O’Toole couldn’t replicate. The decade-long controversy ended in 1996 when an appeals board cleared Imanishi-Kari of all misconduct charges. But the case revealed how competing interpretations of the same data can arise when different investigators bring different assumptions to their analysis, and how difficult it becomes to distinguish between legitimate scientific disagreement and potential misconduct when researchers are deeply invested in their theories.

Eric Poehlman’s obesity research fraud—falsifying data in 17 grant applications and 10 publications—followed the same pattern. He had a research program, a reputation, and a stream of funding dependent on showing that his hypotheses about aging and obesity were correct. When data didn’t cooperate, he made them cooperate.

The common thread isn’t that these individuals were uniquely evil. It’s that they were operating in a system where too much depended on specific hypotheses being correct. The same pressures that led them to commit fraud push others into questionable practices and drive everyone toward motivated reasoning.

The Structural Alternative: Team Science

Consider how differently science works in fields that have moved away from the PI-centered hypothesis-driven model.

Large-scale genomics operates with diverse teams interrogating datasets rather than testing specific hypotheses. The question isn’t “Is my theory correct?” but “What patterns exist in these data?” Multiple investigators with different backgrounds and biases analyze the same datasets. Results require replication across labs. The data-sharing infrastructure enables other groups to independently verify findings.

Nobody’s career depends on a specific gene being associated with a particular disease. If your analysis suggests gene X matters but another team’s analysis contradicts that, there’s no professional catastrophe. You’re contributing to collective understanding rather than defending personal theories.

The BRAIN Initiative that I helped launch during my tenure at NSF was designed in part to avoid the hypothesis trap. Rather than funding individual PIs to test specific theories about brain function, it funded tool development, data collection, and infrastructure that multiple investigators could use. The bet was that understanding the brain required comprehensive data and analytical capabilities, not just clever hypotheses.

This doesn’t eliminate all bias—researchers still have preferences about which tools to develop or which brain regions to map. But it reduces the intense personal investment in any particular theory about how the brain works. The focus shifts from testing hypotheses to building shared resources.

Particle physics has worked this way for decades. Nobody at CERN builds a career on predicting a specific particle will or won’t be found. The infrastructure supports collective inquiry. Results require consensus across large collaborations. Data are shared immediately. Multiple teams analyze the same detector output.

Can you imagine a particle physicist fabricating Higgs boson data? The system makes it nearly impossible—not because particle physicists are more ethical, but because the organizational structure distributes both credit and accountability across large teams working with shared data.

The Biomedical Research Counterfactual

What would biomedical research look like if we designed it to minimize the hypothesis trap?

Separation of hypothesis generation from testing. One team develops theories and predictions. A different team, with no stake in the theory’s success, conducts the experiments. The testing team is rewarded for rigorous methods and clear results, not for confirming or refuting specific hypotheses. This isn’t unprecedented—clinical trials often use this model, with statisticians who haven’t seen interim results conducting final analyses.

Registered reports and pre-registration. Require researchers to specify hypotheses, methods, and analyses before collecting data. Journals commit to publishing based on methodological quality, not results. This removes the temptation to massage data because publication is already guaranteed. The researcher benefits from doing careful work, not from obtaining specific results.

Adversarial collaboration. When competing theories exist, fund collaborations between proponents to design jointly agreed-upon decisive tests. Each side specifies in advance what results would falsify their theory. The collaboration is rewarded for clarity and rigor, not for one side winning.

Collective attribution and team leadership. Move away from the PI model toward team leadership with distributed authority. Make it normal for multiple investigators to share senior authorship without hierarchical ordering. Reward contributions to collective projects, not just defending personal theories. This reduces the intensity of individual investment in specific hypotheses.

Diverse parallel approaches. Rather than funding one investigator to test one hypothesis over five years, fund multiple teams to simultaneously test competing hypotheses. Make this explicit: “We think question X is important but don’t know which of three theories is correct, so we’re funding all three approaches.” The field benefits from comparative testing; individual investigators aren’t catastrophically invested in one answer.

The Objections

These suggestions will provoke immediate resistance, much of it justified. The romantic model of science—brilliant individual investigator pursuing visionary ideas—isn’t entirely fiction. Great insights do come from individuals. Breakthrough theories do require conviction to pursue against skepticism. Hypothesis-driven research has produced genuine discoveries.

Moreover, team science and collective approaches have their own challenges. Large collaborations can become bureaucratic. Consensus-building can delay needed action. Distributing credit across many people may reduce individual incentive for excellence. Pre-registration can be gamed by enrolling multiple studies and selectively reporting which ones to complete.

The adversarial collaboration model assumes good faith from competing investigators, which isn’t always present. Separating hypothesis generation from testing may slow progress if the best experiments require an intimate understanding of the theory. Distributed leadership creates coordination problems.

These are real concerns. I’m not arguing for the complete abandonment of hypothesis-driven research or the PI model. But I am arguing that we’ve over-indexed on one way of organizing science—a way that creates predictable problems around motivated reasoning and hypothesis attachment—without seriously considering alternatives that might mitigate those problems.

The Incentive Redesign

The deeper issue is incentive structure. We reward:

  • Publications in high-impact journals (which prefer dramatic confirmations of interesting hypotheses)
  • Grant funding (which requires convincing reviewers you’re pursuing important ideas likely to yield results)
  • Citations (which accumulate for papers making strong claims, not for careful null results)
  • Awards and prizes (which celebrate breakthroughs, not rigorous refutations)
  • Tenure and promotion (based on establishing an independent research program—meaning a distinctive hypothesis)

Each incentive encourages researchers to develop strong attachments to specific theories. The scientist who carefully tests a hypothesis, finds ambiguous results, and concludes, “This is more complicated than we thought,” doesn’t thrive under these incentives. The scientist who generates a provocative theory, designs experiments to support it, and publishes dramatic results thrives—even if the theory is ultimately wrong.

We could design different incentives:

  • Reward rigorous replication attempts
  • Fund adversarial collaborations that test competing theories
  • Celebrate careful negative results that prevent the field from pursuing dead ends
  • Promote scientists who change their minds when evidence demands it
  • Value contributions to infrastructure and methods that enable collective progress

None of this is unprecedented. Clinical trial statisticians build careers on methodological rigor, not therapeutic breakthroughs. Methods developers in genomics gain recognition for creating tools others use. Psychology researchers are valued for independently testing whether published findings hold up.

The question is whether biomedical research, more broadly, is willing to diversify its incentive structures and organizational models. The field is enormously successful—NIH funding, breakthrough therapeutics, extended lifespans. Why change a winning formula?

Back to That 1980s Case

The postdoc who brought misconduct charges understood something important: when data are being altered to support “the party line,” someone needs to object. That takes courage—postdocs are vulnerable, whistleblowers face retaliation, and questioning senior scientists is risky.

But here’s what I’ve come to understand that I didn’t fully appreciate forty years ago: the party line wasn’t imposed from outside. It emerged from structural features of how we organize research. The PI who allegedly manipulated data wasn’t serving some external master. They were serving their own hypothesis, the idea they’d built a career around, the theory their lab existed to develop.

That makes the problem both worse and better than simple corruption. Worse, because it means well-meaning scientists with good intentions can slide into questionable practices without recognizing it. The same motivated reasoning that drives fraud also drives less dramatic but equally problematic biases in how we collect, analyze, and report data.

Better, because it means organizational redesign might help. We can’t eliminate human fallibility or the emotional attachment scientists develop to their ideas. However, we can design systems that reduce the extent to which outcomes depend on any particular hypothesis being correct. We can create structures where admitting you were wrong is professionally survivable. We can reward rigor over drama, collective progress over individual breakthroughs.

The Path Forward

I’m not optimistic about radical transformation. The biomedical research enterprise is vast, successful, and institutionally entrenched. The romantic model of the lone investigator testing brilliant hypotheses is deeply embedded in how we tell science stories, train graduate students, and allocate prestige.

But incremental change is possible:

Funding agencies can require pre-registration for hypothesis-driven research while also funding more exploratory, team-based approaches. NIH’s BRAIN Initiative and precision medicine programs already point in this direction. Expanding these models would diversify how research gets organized.

Journals can mandate data sharing and the use of registered reports. Some journals already do this; others resist for fear of losing exciting submissions to competitors. But collective action could shift norms. If high-impact journals required rigorous transparency, researchers would adapt.

Universities can broaden tenure criteria to value methodological rigor, replication, infrastructure development, and collaborative contributions, alongside traditional metrics of independent research. This requires courage because it means promoting faculty who don’t fit the standard template, but it’s feasible.

Training programs can teach critical evaluation of one’s own hypotheses. Rather than just training students to design clever experiments and write compelling grants, we can teach them to actively look for ways they might be wrong, to value evidence against their theories, and to see changing one’s mind as a strength rather than a weakness. This is partly cultural, partly structural.

Funders can experiment with alternative models. Fund some research explicitly as adversarial collaboration. Fund some as team science with distributed leadership. Fund some as infrastructure development. Create parallel tracks so researchers can build careers through multiple pathways, reducing the pressure to develop intense attachment to specific hypotheses.

None of this will eliminate fraud—there will always be individuals who cheat. However, it might reduce the structural pressures that push honest scientists toward motivated reasoning and, in some cases, scientists toward outright fabrication.

Integrity is More Than Honesty

That 1980s case I barely remember continues to inform my thinking, not because I have clear memories of it but because it captures something essential: scientific integrity requires more than individual honesty. It requires organizational structures that don’t push even honest people toward biased reasoning.

The postdoc filing charges was practicing integrity. But they were fighting against a system where a PI’s attachment to their hypothesis created pressure—probably unconscious, probably rationalized, but pressure nonetheless—to make the data fit the theory. One brave postdoc can’t fix structural problems alone.

We’ve built an enormously productive research enterprise. Biomedical science has achieved genuine miracles. The hypothesis-driven, PI-centered model has generated breakthrough after breakthrough. I’m not arguing it’s failed—clearly it hasn’t.

However, I argue it’s flawed in predictable ways. The same features that make it successful—individual investigators developing strong convictions about important ideas and pursuing them relentlessly—also create conditions for motivated reasoning, questionable research practices, and occasional fraud.

Acknowledging those flaws doesn’t diminish the achievements. It opens space for experimentation with alternative models that might reduce the problematic incentives while preserving the creative energy that drives discovery. The question is whether we’re willing to diversify how we organize research or whether we’ll continue over-relying on a single model because it’s familiar and has worked in the past.

The endless frontier that Vannevar Bush envisioned shouldn’t be endless in just one direction. It should include exploring different ways of pursuing knowledge, different structures for organizing inquiry, and different incentives for rewarding contributions to collective understanding.

That’s the real challenge: not just preventing fraud but creating systems where the pressures toward fraud—and toward less dramatic but equally problematic biases—are reduced. Where changing your mind based on evidence is professionally rewarded rather than punished. Where attachment to ideas is balanced by commitment to collective truth-seeking.

The party line that worries me most isn’t imposed by political power. It’s the party line we impose on ourselves when we become too attached to our own hypotheses, when our professional identities become too entangled with specific theories, when the systems we’ve built make admitting error too costly. We need to extend that understanding to object not only to individual fraud but also to organizational structures that make such fraud more likely. Building not just oversight systems but alternative models of how to pursue science.

That’s the integrity challenge for the next forty years.

How Will You Know You’ve Succeeded? A BRAIN Story

August 2008: a summer day in Mountain View, California. The previous year, the Krasnow Institute for Advanced Study, which I was leading at George Mason University, had developed a proposal to invest heavily in figuring out how mind emerges from brains, and now I had to make the case that it deserved to be a centerpiece of a new administration’s science agenda. Three billion dollars is not a small ask, especially with the 2008 financial crisis accelerating.

Before this moment, the project had evolved organically: a kickoff meeting at the Krasnow Institute near D.C., a joint manifesto published in Science Magazine, and then follow-on events in Des Moines, Berlin, and Singapore to emphasize the broader aspects of such a large neuroscience collaboration. There had even been a radio interview with Oprah.

When I flew out to Google’s Mountain View headquarters in August 2008 for the SciFoo conference, I didn’t expect to be defending the future of neuroscience over lunch. But the individual running the science transition for the Obama presidential campaign had summoned me for what he described as a “simple” conversation: defend our idea of investing $3 billion over the next decade in neuroscience, with the audacious goal of explaining how “mind” emerges from “brains.” It was not the kind of meeting I was ready for.

I was nervous. As an institute director, I’d pitched for million-dollar checks. This was a whole new scale of fundraising for me. And though California was my native state, I’d never gone beyond being a student body president out there. Google’s headquarters in the summer of 2008 was an altar to Silicon Valley power.

SciFoo itself was still in its infancy then – the whole “unconference” concept felt radical and exciting, a fitting backdrop for pitching transformational science. But the Obama campaign wasn’t there for the unconventional meeting format. Google was a convenient meeting spot. And they wanted conventional answers.

I thought I made a compelling case: this investment could improve the lives of millions of patients with brain diseases. Neuroscience was on the verge of delivering cures. (I was wrong about that, but I believed it at the time.) The tools were ready. The knowledge was accumulating. We just needed the resources to put it all together.

Then I was asked the question that killed my pitch: “How will we know we have succeeded? What’s the equivalent of Kennedy’s moon landing – a clear milestone that tells us we’ve achieved what we set out to do?” You could see those astronauts come down the ladder of the lunar module. You could see that American flag on the moon. No such prospects with a large neuroscience initiative.

I had no answer.

I fumbled through some vague statements about understanding neural circuits and developing new therapies, but even as the words left my mouth, I knew they were inadequate. The moon landing worked as a political and scientific goal because it was binary: either we put a man on the moon or we didn’t. Either the flag was planted or it wasn’t.

But “explaining how mind emerges from brains”? When would we know we’d done that? What would success even look like?

The lunch ended politely. I flew back to DC convinced it had been an utter failure.

But that wasn’t the end of it. Five years later, at the beginning of Obama’s second presidential term, we began to hear news of a large initiative driven by the White House called the Brain Activity Map, or BAM for short. The idea was to map the functional activity of brains comprehensively, at spatial and temporal resolutions beyond what was then available. It resembled my original pitch both in scale (dollars) and in the conviction that understanding how mind emerges from brain function mattered. The goal of the new BAM project was to map between that activity and the brain’s emergent “mind”-like behavior, in both healthy and pathological cases. But the BAM trial balloon, even coming from the White House, was not an immediate slam dunk.

There was immediate push-back from large segments of the neuroscience community that felt excluded from BAM, but with a quick top-down recalibration from the White House Office of Science and Technology Policy and a whole-of-government approach that included multiple science agencies, BRAIN (Brain Research through Advancing Innovative Neurotechnologies) was born in April of 2013.

A year later, in April of 2014, I was approached to head Biological Sciences at the US National Science Foundation. When I took the job that October, I was leading a directorate with a budget of $750 million annually that supported research across the full spectrum of the life sciences – from molecular biology to ecosystems. I would also serve as NSF’s co-lead for the Obama Administration’s BRAIN Initiative—an acknowledgement of the failed pitch in Mountain View, I guess.

October 2014: sworn in and meeting with my senior management team, a little more than a year into BRAIN. I had gotten what I’d asked for in Mountain View. Sort of. We had the funding, we had the talent, we had review panels evaluating hundreds of proposals. But I kept thinking about the question—the one I couldn’t answer then and still struggled with now. We had built this entire apparatus for funding transformational research, yet we were asking reviewers to apply the same criteria that would have rejected Einstein’s miracle year. How do you evaluate research when you can’t articulate clear success metrics? How do you fund work that challenges fundamental assumptions when your review criteria reward preliminary data and well-defined hypotheses?

Several months later, testifying before Congress about the BRAIN project, I remember fumbling again at the direct question of when we would deliver cures for dreaded brain diseases like ALS and schizophrenia. I punted: that was an NIH problem (even though the original pitch had been about delivering revolutionary treatments). At NSF, we were about understanding the healthy brain. In fact, how could you ever understand brain disease without a deep comprehension of the non-pathological condition?

It was a reasonable bureaucratic answer. NIH does disease; NSF does basic science. Clean jurisdictional boundaries. But sitting there in that hearing room, I realized I was falling into the same trap that had seemingly doomed our pitch in 2008: asked for a delivery date and a clear criterion of success, I was waffling. Only this time, I was the agent for the funder: the American taxpayer.

The truth was uncomfortable. We had launched an initiative explicitly designed to support transformational research – research that would “show us how individual brain cells and complex neural circuits interact” in ways we couldn’t yet imagine. But when it came time to evaluate proposals, we fell back on the same criteria that favored incrementalism: preliminary data, clear hypotheses, established track records, well-defined deliverables. We were asking Einstein for preliminary data on special relativity.

And we weren’t unique. This was the system. This was how peer review worked across federal science funding. We had built an elaborate apparatus designed to be fair, objective, and accountable to Congress and taxpayers. What we had built was a machine that systematically filtered out the kind of work that might transform neuroscience.

All of this was years before the “neuroscience winter,” when massive scientific misconduct was unearthed in neurodegenerative disease research, including Alzheimer’s research. But the modus operandi of BRAIN foreshadowed it.

Starting in 2022, a series of investigations revealed that some of the most influential research on Alzheimer’s disease—work that had shaped the field for nearly two decades and guided billions in research funding—was built on fabricated data. Images had been manipulated. Results had been doctored. And this work had sailed through peer review at top journals, had been cited thousands of times, and had successfully competed for grant funding year after year. The amyloid hypothesis, which this fraudulent research had bolstered, had become scientific orthodoxy not because the evidence was overwhelming, but because it fit neatly into the kind of clear, well-defined research program that review panels knew how to evaluate.

Here was the other side of the Einstein problem that I’ve mentioned in previous posts. The same system that would have rejected Einstein’s 1905 papers for lack of preliminary data and institutional support had enthusiastically funded research that looked rigorous but was fabricated. Because the fraudulent work had all the elements that peer review rewards: clear hypotheses, preliminary data, incremental progress building on established findings, well-defined success metrics. It looked like good science. It checked all the boxes.

Meanwhile, genuinely transformational work—the kind that challenges fundamental assumptions, that crosses disciplinary boundaries, that can’t provide preliminary data because the questions are too new—struggles to get funded. Not because reviewers are incompetent or malicious, but because we’ve built a system that is literally optimized to make these mistakes. We’ve created an apparatus that rewards the appearance of rigor over actual discovery, that favors consensus over challenge, that funds incrementalism and filters out transformation.

So, what’s the real function of peer review? It’s supposed to identify transformative research, but I don’t think that’s its real purpose. To my mind, the real purpose of the peer review panels at NSF, and of the study sections at NIH, is to make inherently flawed funding decisions defensible, both to Congress and to the American taxpayer. The criteria at NSF, intellectual merit and broader impacts, exist because they make the awarding of grant dollars auditable and fair-seeming, not because they identify breakthrough work.

But honestly, there’s a real dilemma here: if you gave out NSF’s annual budget based on a program officer’s feeling that “this seems promising”, you’d face legitimate questions about cronyism, waste and arbitrary decision-making. The current system’s flaws aren’t bad policy accidents; they are the price we pay for other values we also care about.

So, did the BRAIN Initiative deliver on that pitch I made in Mountain View in 2008? Did we figure out how ‘mind’ emerges from ‘brains’? In retrospect, I remain super impressed by NSF’s NeuroNex program: we got impressive technology – better ways to record from more neurons, new imaging techniques, sophisticated tools. We trained a generation of neuroscientists. But that foundational question – the one that made the political case, the one that justified the investment – we’re not meaningfully closer to answering it. We made incremental progress on questions we already knew how to ask. Which is exactly what peer review is designed to deliver. Oh, and one other thing was produced: NIH’s parent agency, the Department of Health and Human Services, got a trademark issued on the name of the initiative itself, BRAIN.

I spent four years as NSF’s co-lead on BRAIN trying to make transformational neuroscience happen within this system. I believed in it. I still believe in federal science funding. But I’ve stopped pretending the tension doesn’t exist. The very structure that makes BRAIN funding defensible to Congress made the transformational science we promised nearly impossible to deliver.

That failed pitch at Google’s headquarters in 2008? Turns out the question was spot on; we just never answered it.

Why Transformational Science Can’t Get Funded: The Einstein Problem

Proposal declined. Insufficient institutional support. No preliminary data. Applicant lacks relevant expertise—they work in a patent office, not a research laboratory. The proposed research is too speculative and challenges well-established physical laws without adequate justification. The principal investigator is 26 years old and has no prior experience in physics.

This would have been the fate of Albert Einstein in 1905, had the NSF existed as it does today. Even with grant calls requesting ‘transformative ideas,’ an Einstein proposal would have been rejected outright. And yet, 1905 has been called Einstein’s miracle year. Yes, he was a patent clerk working in Bern, Switzerland, without a university affiliation. He had neither access to a laboratory nor equipment. He worked in isolation on evenings and weekends and was unknown in the physics community. Yet, despite those disadvantages, he produced four revolutionary papers: on the photoelectric effect, Brownian motion, special relativity, and the famous E=mc² energy-mass equivalence.

Taken as a whole, the work was purely theoretical. There were no preliminary data. The papers challenged fundamental assumptions of the field and, as such, were highly speculative and definitively high-risk. There were no broader impacts because there were no immediate practical applications. And the work was inherently multidisciplinary, bridging mechanics, optics, and thermodynamics. Yet, the work was transformative. By modern grant standards, Einstein’s work failed every criterion.

The Modern Grant Application – A Thought Experiment

Let’s imagine Einstein’s 1905 work packaged as a current NSF proposal. What would it look like, and how would it fare in peer review?

Einstein’s Hypothetical NSF Proposal

Project Title: Reconceptualizing the Fundamental Nature of Space, Time, and the Propagation of Light

Principal Investigator: Albert Einstein, Technical Expert Third Class, Swiss Federal Patent Office

Institution: None (individual applicant)

Requested Duration: 3 years

Budget: $150,000 (minimal – just salary support and travel to one conference)

Project Summary

This proposal challenges the fundamental assumptions underlying Newtonian mechanics and Maxwell’s electromagnetic theory. I propose that space and time are not absolute but relative, dependent on the observer’s state of motion. This requires abandoning the concept of the luminiferous ether and reconceptualizing the relationship between matter and energy. The work will be entirely theoretical, relying on thought experiments and mathematical derivation to establish a new framework for understanding physical reality.

How NSF Review Panels Would Evaluate This

Intellectual Merit: Poor

Criterion: Does the proposed activity advance knowledge and understanding?

Panel Assessment: The proposal makes extraordinary claims without adequate preliminary data. The applicant asserts that Newtonian mechanics—the foundation of physics for over 200 years—requires fundamental revision yet provides no experimental evidence supporting this radical departure.

Specific Concerns:

Lack of Preliminary Results: The proposal contains no preliminary data demonstrating the feasibility of the approach. There are no prior publications by the applicant in peer-reviewed physics journals. The applicant references his own unpublished manuscripts, which cannot be evaluated.

Methodology Insufficient: The proposed “thought experiments” do not constitute rigorous scientific methodology. How will hypotheses be tested? What experimental validation is planned? The proposal describes mathematical derivations but provides no pathway to empirical verification. Without experimental confirmation, these remain untestable speculations.

Contradicts Established Science: The proposal challenges Newton’s laws of motion and the existence of the luminiferous ether—concepts supported by centuries of successful physics. While scientific progress requires questioning assumptions, such fundamental challenges require extraordinary evidence. The applicant provides none.

Lack of Expertise: The PI works at a patent office and has no formal research position. He has no advisor supporting this work, no collaborators at research institutions, and no track record in theoretical physics. His biosketch lists a doctorate from the University of Zurich but no subsequent research appointments or publications in relevant areas.

Representative Reviewer Comments:

Reviewer 1: “While the mathematical treatment shows some sophistication, the fundamental premise—that simultaneity is relative—contradicts basic physical intuition and has no experimental support. The proposal reads more like philosophy than physics.”

Reviewer 2: “The applicant’s treatment of the photoelectric effect proposes that light behaves as discrete particles, directly contradicting Maxwell’s well-established wave theory. This is not innovation; it’s contradiction without justification.”

Reviewer 3: “I appreciate the applicant’s ambition, but this proposal is not ready for funding. I recommend the PI establish himself at a research institution, publish preliminary findings, and gather experimental evidence before requesting support for such speculative work. Perhaps a collaboration with experimentalists at a major university would strengthen future submissions.”

Broader Impacts: Very Poor

Criterion: Does the proposed activity benefit society and achieve specific societal outcomes?

Panel Assessment: The proposal fails to articulate any concrete broader impacts. The work is purely theoretical with no clear pathway to societal benefit.

Specific Concerns:

No Clear Applications: The proposal does not explain how reconceptualizing space and time would benefit society. What problems would this solve? What technologies would it enable? The PI suggests the work is “fundamental” but provides no examples of potential applications.

No Educational Component: There is no plan for training students or postdocs. The PI works alone at a patent office, with no access to students and no institutional infrastructure for education and training.

No Outreach Plan: The proposal includes no activities to communicate findings to the public or policymakers. There is no plan for broader dissemination beyond potential publication in physics journals.

Questionable Impact Timeline: Even if the proposed theories are correct, the proposal provides no timeline for practical applications. How long until these ideas translate into societal benefit? The proposal is silent on this critical question.

Representative Reviewer Comments:

Reviewer 1: “The broader impacts section is essentially non-existent. The PI states that ‘fundamental understanding of nature has intrinsic value,’ but this does not meet NSF’s requirement for concrete societal outcomes.”

Reviewer 2: “I cannot envision how this work, even if successful, would lead to practical applications within a reasonable timeframe. The proposal needs to articulate a clear pathway from theory to impact.”

Reviewer 3: “NSF has limited resources and must prioritize research with demonstrable benefits to society. This proposal does not make that case.”

Panel Summary and Recommendation

Intellectual Merit Rating: Poor
Broader Impacts Rating: Very Poor

Overall Assessment: While the panel appreciates the PI’s creativity and mathematical ability, the proposal is highly speculative, lacks preliminary data, contradicts established physical laws without sufficient justification, and fails to articulate broader impacts. The PI’s lack of institutional affiliation and research track record raises concerns about feasibility.

The panel notes that the PI appears talented and encourages resubmission after:

  1. Establishing an independent position at a research institution
  2. Publishing preliminary findings in peer-reviewed journals
  3. Developing collaborations with experimental physicists
  4. Articulating a clearer pathway to practical applications
  5. Demonstrating broader impacts through education and outreach

Recommendation: Decline

Panel Consensus: Not competitive for funding in the current cycle. The proposal would need substantial revision and preliminary results before it could be considered favorably.

The Summary Statement Einstein Would Receive

Dear Dr. Einstein,

Thank you for your submission to the National Science Foundation. Unfortunately, your proposal, “Reconceptualizing the Fundamental Nature of Space, Time, and the Propagation of Light,” was not recommended for funding.

The panel recognized your ambition and mathematical capabilities but identified several concerns that prevented a favorable recommendation:

– Lack of preliminary data supporting the feasibility of your approach
– Insufficient experimental validation of your theoretical claims
– Absence of institutional support and research infrastructure
– Inadequate articulation of broader impacts and societal benefits

We encourage you to address these concerns and consider resubmission in a future cycle. You may wish to establish collaborations with experimentalists and develop a clearer pathway from theory to application.

We appreciate your interest in NSF funding and wish you success in your future endeavors.

Sincerely,
NSF Program Officer

And that would be it. Einstein’s miracle year—four papers that transformed physics and laid the groundwork for quantum mechanics, nuclear energy, GPS satellites, and our modern understanding of the cosmos—would have died in peer review, never funded, never attempted.

The system would have protected us from wasting taxpayer dollars on such speculation. It would have worked exactly as designed.

The Preliminary Data Paradox

The contemporary grant review process expects even foundational, transformative work to present preliminary data, despite the fact that truly groundbreaking ideas often do not originate from tangible evidence but instead evolve through thought experiments and mathematical derivation, as Einstein’s did. This unrealistic expectation stifles innovation at its core: it forces researchers like Einstein to abandon pure theoretical exploration and confines them to a narrow experimental framework, where they cannot freely challenge existing paradigms even when their work, though lacking immediate empirical validation, promises to fundamentally revolutionize our understanding.

The Risk-Aversion Problem

Often, in grant reviews, I see a very junior reviewer criticize work as being too risky—dooming the proposal to failure—while I simultaneously sense their admiration for the promise and transformative nature of the work. The conservative, risk-averse mentality of modern review panels is deeply rooted in a scientific culture that values incremental advances over speculative leaps, a bias born of career incentives: funding decisions can make or break a professional trajectory. Reviewers are reluctant to support proposals like Einstein’s because such proposals invite controversy and carry a high risk of failure, a reflection of how science within academic institutions has traditionally advanced through evolutionary rather than revolutionary processes.

The Credentials Catch-22

To secure funding in today’s scientific landscape, one typically needs an institutional affiliation and an impressive publication record, a catch-22 in which groundbreaking innovators with no formal backing find it nearly impossible to gain reviewers’ trust. This requirement discriminates against fresh perspectives from individuals such as Einstein, who worked outside established institutions and lacked the mentorship typically deemed necessary for academic recognition, in stark contrast to the way transformative outsiders with unconventional backgrounds have historically nurtured science.

The Short-Term Timeline Problem

Einstein developed special relativity over years with no milestones, no quarterly reports, no renewals. How would he answer, ‘What will you accomplish in Year 2?’ The funding cycles of the major agencies, NSF’s typical three to five years for regular grants and NIH’s maximum of five, do not accommodate the long gestation that foundational theories require. Such timelines impose an unfair constraint on researchers like Einstein, whose transformative ideas did not unfold against strict milestones but in an unconstrained fashion, showing how incompatible this model is with truly revolutionary discoveries, for which a linear progression is unrealistic and even counterproductive.

The Impact Statement Trap

Requirements to demonstrate immediate “broader impacts” or societal benefits pose significant obstacles to transformative research, whose implications often reach far beyond any direct application, something Einstein’s work exemplifies in its foundational role in advancing physics. The trap springs when reviewers, unable to perceive future benefits or fearing the misuse of speculative science, force proposals into a mold where immediate practical impact takes precedence over visionary contribution, further marginalizing work that could unlock entirely new fields.

The Interdisciplinary Gap

The inherent disciplinarity of current funding schemes is at odds with the interdisciplinary essence of revolutionary proposals like Einstein’s, whose work merged concepts across mechanics, optics, and thermodynamics. Such proposals are excluded not only for lack of institutional affiliation but because they challenge compartmentalized funding models, which struggle with the non-linear, cross-disciplinary character of truly transformative science and cannot fit it neatly within existing program structures or reviewer expertise.

The hypothetical funding scenarios for transformational science, as presented through the lens of Albert Einstein’s groundbreaking work, illustrate the inherent challenges faced by revolutionary ideas. To further highlight this problem, let’s take a look at other seminal discoveries that may have been overlooked or deemed unworthy of support under current grant review criteria:

Copernicus’ Heliocentric Model: In a contemporary setting, Copernicus’ heliocentric model might face skepticism due to its challenge to the widely accepted geocentric view of the universe. Lacking preliminary data and facing resistance from established religious beliefs, his proposal would likely be rejected under modern grant review criteria, despite its ultimate validation through observation and mathematical proof.

Gregor Mendel’s Pea Plant Experiments: The foundation of modern genetics was laid by Mendel’s pea plant experiments, yet his work remained largely unnoticed for decades after publication. A grant reviewer in the 1860s would likely have dismissed Mendel’s findings as too speculative and lacking immediate practical application, overlooking the fundamental insights he provided about heredity and genetic inheritance.

mRNA Vaccines: Katalin Karikó spent decades struggling to fund mRNA therapeutic research. Too risky. Too speculative. No clear applications. Penn demoted her. NIH rejected her grants. Reviewers wanted proof that mRNA could work as a therapeutic platform, but without funding, she couldn’t generate that proof. Then COVID-19 hit, and mRNA vaccines saved millions of lives. The technology that couldn’t get funded became one of the most important medical breakthroughs of the century.

Why does all of this matter now? First, the evidence is mounting that American science is at an inflection point. The rate of truly disruptive discoveries—those that reshape fields rather than incrementally advance them—has been declining for decades, even as scientific output has grown. Both NSF and NIH leadership recognize this troubling trend.

This innovation crisis manifests in the problems we cannot solve. Cancer and Alzheimer’s have resisted decades of intensive research. AI alignment and safety remain fundamentally unsolved as we deploy increasingly powerful systems. We haven’t returned to the moon in over 50 years. In my own field of neuroscience, incremental progress has failed to produce treatments for the diseases that devastate millions of families.

These failures point to a deeper problem: we’ve optimized our funding system for incremental advances, not transformational breakthroughs. Making matters worse, we’re losing ground internationally. China’s funding models allow longer timelines and embrace higher risk. European ERC grants support more adventurous research. Many of our best researchers now weigh opportunities overseas or in industry, where they can pursue riskier ideas with greater freedom.

What Needs to Change

Fixing this requires fundamental changes at multiple levels—from how we structure programs to how we evaluate proposals to how we support unconventional researchers.

Create separate funding streams for high-risk research. NSF and NIH need more programs that emulate DARPA’s high-risk, high-reward model. These programs should be insulated from traditional grant review: no preliminary data required, longer timelines (10+ years), and peer review conducted by scientists who have themselves taken major risks and succeeded. I propose that 10 percent of each agency’s budget be set aside for “Einstein Grants”—awards that deliberately bet against the status quo. Judge proposals on originality and potential impact, not feasibility and preliminary data. Accept that most will fail, but the few that succeed will be transformational.

Protect exploratory research within traditional programs. Even standard grant programs should allow pivots when researchers discover unexpected directions. We should fund people with track records of insight, not just projects with detailed timelines. Judge proposals on the quality of thinking, not the completeness of deliverables.

Reform peer review processes. The current system needs three critical changes. First, separate review tracks for incremental versus transformational proposals—they require fundamentally different evaluation criteria. Second, don’t let a single negative review kill bold ideas; if three reviewers are enthusiastic and one is skeptical, fund it. Third, value originality over feasibility. The most transformational ideas often sound impossible until someone proves otherwise.

Support alternative career paths. We should fund more researchers outside traditional academic institutions and recognize that the best science doesn’t always emerge from R1 universities. Explicitly value interdisciplinary training and create flexible career paths that don’t punish researchers who take time to develop unconventional ideas. Track where our most creative researchers go when they leave academia—if we’re consistently losing them to industry or foreign institutions, that’s a failure signal we must heed.

Acknowledge the challenge ahead. These reforms require sustained political will across multiple administrations and consistent support from Congress. They demand patience—accepting that transformational breakthroughs can’t be scheduled or guaranteed. But the alternative is clear: we continue optimizing for incremental progress while the fundamental problems remain unsolved and our international competitors embrace the risk we’ve abandoned.

The choice before us is stark. We can optimize the current system for productivity—incremental papers, measurable progress—or we can create space for transformative discovery. We cannot have both with the same funding mechanisms.

The cost of inaction is clear: we will miss the next Einstein, fall further behind in fundamental discovery, watch science become a bureaucratic exercise, and lose what made American science into a powerhouse of discovery.

This requires action at every level. Scientists must advocate for reform and be willing to champion risky proposals. Program officers must have the courage to fund work that reviewers call too speculative. Policymakers must create new funding models and resist the temptation to demand near-term results. The public must understand that breakthrough science looks different from incremental progress—it’s messy, unpredictable, and often wrong before it’s right.

In 1905, Einstein changed our understanding of the universe while working in a patent office with no grant funding. Today, our funding system would never have let him try. We need to fix that.

How to reform NIH…

Recently, I’ve mostly written in this respect about the NSF, but I also spent six years at the NIH, as a staff fellow in the intramural program (the biomedical research center in Bethesda, Maryland). When most folks think about the NIH, they are not really focusing on the intramural program. Rather, it’s the extramural program, which gives grant awards to biomedical researchers at U.S. colleges and medical centers, that gets the attention. And I guess that’s fine, because the extramural program represents about 90% of the NIH budget.

But if I were going to magically reform the agency, I would focus on the intramural program, because it has so much potential. With an annual budget north of $4B, America’s largest research medical center, and thousands of young researchers from all over the world, the opportunity is enormous. If Woods Hole is a summer nexus for the life sciences, the NIH Bethesda campus is that thing on steroids, year round.

The special sauce for the intramural program is that ideas can become experiments and then discoveries without the usual intermediate step of writing a proposal and waiting to see if it was funded. When I was at NIH, I could literally conceive of a new experiment, order the equipment and reagents and publish the results several months later. Hence, the intramural program has the structure in place to be a major science accelerator.

But, for some reason, when we think of such science accelerators, we generally consider private institutions like HHMI, the Allen Institutes, and perhaps the Institute for Advanced Study in Princeton. What about NIH? On the criterion of critical mass, it dwarfs those places.

To my mind the problem lies in NIH’s ‘articles of confederation’ nature: it’s really 27 (or so) different institutes and other units that are largely independent (especially the NCI), with relatively weak central leadership. And this weak confederal organization plays out not only on the Hill or in the awarding of extramural grants, but crucially also on the Bethesda campus, where intramural program directors rule fiefdoms that are more insular than academic units on a college campus. This weak organizational architecture works against the science-accelerator advantage I described above.

So here’s a big idea: let’s make the intramural program its own, effectively separate NIH institute, and have Congress authorize and fund it separately, as a high-risk, high-payoff biomedical research program for the country. Does that sound like ARPA-H? Oops. Well, then maybe we should just give the Bethesda campus to ARPA-H.

The 54…

My colleague and friend T sent me this link to a Jeff Mervis piece in SCIENCE. Apparently 54 scientists have lost their jobs as a result of essentially hiding their connections to China while taking funding from the NIH. As with other funding compliance issues (for example, protection of human subjects), violations can be career-enders. I am quite sure that other US funding agencies are taking a close look at their PIs as well.

The key issue here, for me, is the failure to declare a conflict of interest. If they had declared it, then, were I on the enforcement side of the equation, I’d be looking at ways to manage that conflict. So if I were to hand out advice, it’d be this: disclose as much as you possibly can to a funder, all the time, about anything that might have questionable optics. I suspect these 54 individuals would still be gainfully employed had they pursued that approach.

That said, I’m disturbed by the implied national distrust of Asian scientists. The use of ethnic background as a trigger for suspicion has a long and sordid history, both here in the US and around the globe.

I’m also saddened by the de-coupling that’s occurring in science collaboration between nations, particularly between the US and China. That’ll be a loss for everyone, because the really big science questions can’t be solved in isolation, the Manhattan Project notwithstanding.

And just when you think things couldn’t get worse…

This news from today’s Washington Post on new procedures for entering the Bethesda campus. The NIH where I did my postdoc was like the United Nations. We came from all over the globe to help humans stay well. In my lab alone, there were individuals from Chile, Spain, Nigeria, Italy, Israel and Australia. Biomedical research is qualitatively different from defense R&D: Zika and malaria do not respect political boundaries. Nor does Alzheimer’s. I hope my former colleagues in positions of authority there are listening.

Putting a chill on international science…

I saw this piece by Jeff Mervis in SCIENCE today. Basically, if you are supported by NIH and you appear to them to be more “connected” to other nation states than you have explicitly disclosed, your institution may have some explaining to do. As Jeff points out, this can be somewhat confusing, since most productive scientists (particularly in biomedical research) do their work in a manner that crosses borders, just like Ebola or SARS. This new NIH action affects the many, not the few. As I’ve said from my time at the bully pulpit: science is inherently international. When you publish a journal article, it is read by your colleagues all over the globe (at least if it’s good science). And that dissemination is key to producing more excellent science.

I have no problem with disclosing contacts (although there is a paperwork burden). But creating a culture of intimidation that puts a chill on international collaboration–that is a problem.

The curse of soft money….

UCSF’s Henry Bourne has an interesting piece out in PNAS about the boom/bust cycle in biomedical research, and specifically how the most recent version played out: vast over-building of infrastructure combined with a shift to soft-money support for PIs. The documentation of the problems is very impressive; however, the notion that this can be fixed piecemeal at a few “pioneer” research institutions is, I think, dead wrong. To my mind, such elitism is exactly how we arrived at our current situation. And in fact, I’m pleased to report that it’s actually at non-elite institutions where the hard-money regime still exists, supported by tuition and, in the case of publics, some state support.

Do I have a solution? Here’s a possibility: I urge my biomedical colleagues to take a hard look at the decadal surveys of other fields (e.g., astronomy or oceanography), where hard prioritization choices are made nationally on the basis of evidence.