The Hypothesis Trap

When Scientists Fall in Bad Love With Their Own Ideas

Approximately four decades ago, I became a witness in a scientific misconduct case. The charges had been brought by an international postdoc in the lab where I had also worked before moving on, and I cannot remember many of the details, except that my written testimony stated that I knew nothing. But I do remember, in the context of more recent high-profile cases, that the essence of the accusation then was the same as it is now: altering experimental data to support the ‘party line’.

The recent disruption to American science has been extensively documented. Given how deeply intertwined government research dollars are with the budget models for R1 universities and the large academic medical centers, it’s not surprising that those funds were chosen for their leverage, or that putting them in jeopardy will profoundly alter the pursuit of Vannevar Bush’s vision of the endless frontier.

But I want to explore a different question raised by that long-ago case. When I recall that the essence involved “altering data to support the party line,” I need to ask: whose party line was it? In that case, and in many since, the party line wasn’t imposed by some external authority. It was the PI’s own hypothesis, their pet theory, the idea they’d invested years in developing and defending. The fraud wasn’t about serving power—it was about rescuing a cherished belief from contradictory evidence.

This raises uncomfortable questions about how we organize biomedical research. The current system—hypothesis-driven projects led by individual PIs who develop deep attachments to specific ideas—contains structural flaws that push even honest scientists toward motivated reasoning and occasionally push the dishonest ones past the line into fraud.

The Romantic Model of Science

Our funding system enshrines a particular vision of how science works. A brilliant investigator conceives a hypothesis. They design clever experiments to test it. They write a compelling grant proposal. If funded, they spend three to five years testing their idea. Success means publishing papers that confirm the hypothesis, which leads to more grants to extend the work.

This model has romantic appeal. It positions the PI as the creative genius whose insight drives discovery. It makes science a battle of ideas where the best hypotheses prevail. It creates clear narratives: an investigator proposes a theory, designs experiments to test it, and demonstrates it is correct. This is how we teach science, how we write about it in popular accounts, how we celebrate it in awards and prizes.

The problem is that this romantic model creates precisely the conditions under which fraud becomes tempting and honest self-deception becomes nearly inevitable.

When Hypothesis Becomes One’s Identity

Here’s what happened in numerous misconduct cases from the 1980s onward: A researcher develops a hypothesis. It’s not just any hypothesis—it’s their hypothesis, the idea that defines their research program, the theory that distinguishes them from competitors. They build a laboratory around it, recruit students and postdocs to test it, and write grants that promise to extend it.

The hypothesis becomes their professional identity. Colleagues know them as “the person who works on that theory.” Graduate students join their lab specifically to work on that problem. Papers in high-impact journals describe their unique contribution. Tenure committees evaluate whether the hypothesis has generated sufficient publications. Grant review panels judge whether the approach is likely to continue producing results.

Then experiments start yielding contradictory data. Not every experiment: if every experiment failed, the researcher might abandon the hypothesis. But enough experiments yield ambiguous or contradictory results that a careful scientist should begin to question the core idea.

This is where the system’s design creates problems. Walking away from the hypothesis means walking away from professional identity, from grants that depend on that research program, from students and postdocs whose projects are built on that framework. It means admitting that years of work may have been directed toward the wrong question. It means watching competitors promote alternative theories.

The pressure isn’t external—nobody is ordering the researcher to maintain their hypothesis. The pressure is structural, built into how we organize careers and evaluate success. When your identity, your lab’s funding, and your scientific reputation all depend on a particular idea being correct, it takes extraordinary intellectual honesty to acknowledge that idea might be wrong.

On the Spectrum: From Delusion to Fraud

Most scientists don’t fabricate data. But many engage in practices that fall short of fraud while still distorting the scientific record. These practices stem from the same structural problem: excessive investment in a specific hypothesis.

Selective reporting occurs when experiments yielding inconvenient results are dismissed as “technical problems,” whereas experiments supporting the hypothesis are published. The researcher isn’t fabricating data—they’re making judgments about which data are “good.” But those judgments are biased by investment in the hypothesis.

Data massaging occurs when researchers make analytical decisions that favor their theory. Which outliers to exclude? How to set cutoffs? Which statistical tests to use? Each decision seems defensible individually, but collectively, they bias results toward the preferred outcome. Again, this isn’t fabrication—it’s motivated reasoning dressed up as methodological choice.

Hypothesis rescue manifests as increasingly elaborate explanations for why experiments that should have supported the theory failed. Maybe the conditions weren’t quite right. Maybe there’s an additional factor we didn’t control for. Maybe the effect is context-dependent. Some auxiliary hypotheses are legitimate scientific refinements. Others are epicycles added to save a failing theory.

Selective collaboration and citation appear when researchers preferentially cite papers supporting their view while ignoring contradictory work. They collaborate with scientists who share their hypothesis, while avoiding those who promote alternatives. This creates echo chambers where a contested theory looks like a consensus because the believers only talk to each other.

These practices aren’t fraud in the legal sense. They’re what happens when intelligent, well-meaning scientists become too invested in particular ideas. The investment doesn’t require conscious dishonesty—it just requires the normal human tendency to see what we expect to see, to value evidence confirming our beliefs more highly than evidence challenging them.

The Cases We Remember

The 1980s wave of misconduct cases illuminates this pattern. Take John Darsee at Harvard Medical School. His fraudulent cardiology research wasn’t random fabrication—it was data manufactured to support his ongoing research program. He was so invested in demonstrating that his approach worked that he fabricated results when experiments didn’t cooperate. His extraordinary productivity should have raised red flags, but it fit the romantic model: the brilliant investigator producing breakthrough after breakthrough.

The Baltimore affair involved Thereza Imanishi-Kari’s immunology data that Margot O’Toole couldn’t replicate. The decade-long controversy ended in 1996 when an appeals board cleared Imanishi-Kari of all misconduct charges. But the case revealed how competing interpretations of the same data can arise when different investigators bring different assumptions to their analysis, and how difficult it becomes to distinguish between legitimate scientific disagreement and potential misconduct when researchers are deeply invested in their theories.

Eric Poehlman’s obesity research fraud—falsifying data in 17 grant applications and 10 publications—followed the same pattern. He had a research program, a reputation, and a stream of funding dependent on showing that his hypotheses about aging and obesity were correct. When data didn’t cooperate, he made them cooperate.

The common thread isn’t that these individuals were uniquely evil. It’s that they were operating in a system where too much depended on specific hypotheses being correct. The same pressures that led them to commit fraud push others into questionable practices and drive everyone toward motivated reasoning.

The Structural Alternative: Team Science

Consider how differently science works in fields that have moved away from the PI-centered hypothesis-driven model.

Large-scale genomics operates with diverse teams interrogating datasets rather than testing specific hypotheses. The question isn’t “Is my theory correct?” but “What patterns exist in these data?” Multiple investigators with different backgrounds and biases analyze the same datasets. Results require replication across labs. The data-sharing infrastructure enables other groups to independently verify findings.

Nobody’s career depends on a specific gene being associated with a particular disease. If your analysis suggests gene X matters but another team’s analysis contradicts that, there’s no professional catastrophe. You’re contributing to collective understanding rather than defending personal theories.

The BRAIN Initiative that I helped launch during my tenure at NSF was designed in part to avoid the hypothesis trap. Rather than funding individual PIs to test specific theories about brain function, it funded tool development, data collection, and infrastructure that multiple investigators could use. The bet was that understanding the brain required comprehensive data and analytical capabilities, not just clever hypotheses.

This doesn’t eliminate all bias—researchers still have preferences about which tools to develop or which brain regions to map. But it reduces the intense personal investment in any particular theory about how the brain works. The focus shifts from testing hypotheses to building shared resources.

Particle physics has worked this way for decades. Nobody at CERN builds a career on predicting a specific particle will or won’t be found. The infrastructure supports collective inquiry. Results require consensus across large collaborations. Data are shared immediately. Multiple teams analyze the same detector output.

Can you imagine a particle physicist fabricating Higgs boson data? The system makes it nearly impossible—not because particle physicists are more ethical, but because the organizational structure distributes both credit and accountability across large teams working with shared data.

The Biomedical Research Counterfactual

What would biomedical research look like if we designed it to minimize the hypothesis trap?

Separation of hypothesis generation from testing. One team develops theories and predictions. A different team, with no stake in the theory’s success, conducts the experiments. The testing team is rewarded for rigorous methods and clear results, not for confirming or refuting specific hypotheses. This isn’t unprecedented—clinical trials often use this model, with statisticians who haven’t seen interim results conducting final analyses.

Registered reports and pre-registration. Require researchers to specify hypotheses, methods, and analyses before collecting data. Journals commit to publishing based on methodological quality, not results. This removes the temptation to massage data because publication is already guaranteed. The researcher benefits from doing careful work, not from obtaining specific results.

Adversarial collaboration. When competing theories exist, fund collaborations between proponents to design jointly agreed-upon decisive tests. Each side specifies in advance what results would falsify their theory. The collaboration is rewarded for clarity and rigor, not for one side winning.

Collective attribution and team leadership. Move away from the PI model toward team leadership with distributed authority. Make it normal for multiple investigators to share senior authorship without hierarchical ordering. Reward contributions to collective projects, not just defending personal theories. This reduces the intensity of individual investment in specific hypotheses.

Diverse parallel approaches. Rather than funding one investigator to test one hypothesis over five years, fund multiple teams to simultaneously test competing hypotheses. Make this explicit: “We think question X is important but don’t know which of three theories is correct, so we’re funding all three approaches.” The field benefits from comparative testing; individual investigators aren’t catastrophically invested in one answer.

The Objections

These suggestions will provoke immediate resistance, much of it justified. The romantic model of science—brilliant individual investigator pursuing visionary ideas—isn’t entirely fiction. Great insights do come from individuals. Breakthrough theories do require conviction to pursue against skepticism. Hypothesis-driven research has produced genuine discoveries.

Moreover, team science and collective approaches have their own challenges. Large collaborations can become bureaucratic. Consensus-building can delay needed action. Distributing credit across many people may reduce individual incentive for excellence. Pre-registration can be gamed by registering multiple studies and then selectively completing and reporting only some of them.

The adversarial collaboration model assumes good faith from competing investigators, which isn’t always present. Separating hypothesis generation from testing may slow progress if the best experiments require an intimate understanding of the theory. Distributed leadership creates coordination problems.

These are real concerns. I’m not arguing for the complete abandonment of hypothesis-driven research or the PI model. But I am arguing that we’ve over-indexed on one way of organizing science—a way that creates predictable problems around motivated reasoning and hypothesis attachment—without seriously considering alternatives that might mitigate those problems.

The Incentive Redesign

The deeper issue is incentive structure. We reward:

Publications in high-impact journals (which prefer dramatic confirmations of interesting hypotheses)
Grant funding (which requires convincing reviewers you’re pursuing important ideas likely to yield results)
Citations (which accumulate for papers making strong claims, not for careful null results)
Awards and prizes (which celebrate breakthroughs, not rigorous refutations)
Tenure and promotion (based on establishing an independent research program—meaning a distinctive hypothesis)

Each incentive encourages researchers to develop strong attachments to specific theories. The scientist who carefully tests a hypothesis, finds ambiguous results, and concludes, “This is more complicated than we thought,” doesn’t thrive under these incentives. The scientist who generates a provocative theory, designs experiments to support it, and publishes dramatic results thrives—even if the theory is ultimately wrong.

We could design different incentives:

Reward rigorous replication attempts
Fund adversarial collaborations that test competing theories
Celebrate careful negative results that prevent the field from pursuing dead ends
Promote scientists who change their minds when evidence demands it
Value contributions to infrastructure and methods that enable collective progress

None of this is unprecedented. Clinical trial statisticians build careers on methodological rigor, not therapeutic breakthroughs. Methods developers in genomics gain recognition for creating tools others use. Psychology researchers are valued for independently testing whether published findings hold up.

The question is whether biomedical research, more broadly, is willing to diversify its incentive structures and organizational models. The field is enormously successful—NIH funding, breakthrough therapeutics, extended lifespans. Why change a winning formula?

Back to That 1980s Case

The postdoc who brought misconduct charges understood something important: when data are being altered to support “the party line,” someone needs to object. That takes courage—postdocs are vulnerable, whistleblowers face retaliation, and questioning senior scientists is risky.

But here’s what I’ve come to understand that I didn’t fully appreciate forty years ago: the party line wasn’t imposed from outside. It emerged from structural features of how we organize research. The PI who allegedly manipulated data wasn’t serving some external master. They were serving their own hypothesis, the idea they’d built a career around, the theory their lab existed to develop.

That makes the problem both worse and better than simple corruption. Worse, because it means well-meaning scientists with good intentions can slide into questionable practices without recognizing it. The same motivated reasoning that drives fraud also drives less dramatic but equally problematic biases in how we collect, analyze, and report data.

Better because it means organizational redesign might help. We can’t eliminate human fallibility or the emotional attachment scientists develop to their ideas. However, we can design systems that reduce the extent to which outcomes depend on any particular hypothesis being correct. We can create structures where admitting you were wrong is professionally survivable. We can reward rigor over drama, collective progress over individual breakthroughs.

The Path Forward

I’m not optimistic about radical transformation. The biomedical research enterprise is vast, successful, and institutionally entrenched. The romantic model of the lone investigator testing brilliant hypotheses is deeply embedded in how we tell science stories, train graduate students, and allocate prestige.

But incremental change is possible:

Funding agencies can require pre-registration for hypothesis-driven research while also funding more exploratory, team-based approaches. NIH’s BRAIN Initiative and precision medicine programs already point in this direction. Expanding these models would diversify how research gets organized.

Journals can mandate data sharing and the use of registered reports. Some journals already do this; others resist for fear of losing exciting submissions to competitors. But collective action could shift norms. If high-impact journals required rigorous transparency, researchers would adapt.

Universities can broaden tenure criteria to value methodological rigor, replication, infrastructure development, and collaborative contributions, alongside traditional metrics of independent research. This requires courage because it means promoting faculty who don’t fit the standard template, but it’s feasible.

Training programs can teach critical evaluation of one’s own hypotheses. Rather than just training students to design clever experiments and write compelling grants, we can teach them to actively look for ways they might be wrong, to value evidence against their theories, and to see changing one’s mind as a strength rather than a weakness. This is partly cultural, partly structural.

Funders can experiment with alternative models. Fund some research explicitly as adversarial collaboration. Fund some as team science with distributed leadership. Fund some as infrastructure development. Create parallel tracks so researchers can build careers through multiple pathways, reducing the pressure to develop intense attachment to specific hypotheses.

None of this will eliminate fraud; there will always be individuals who cheat. However, it might reduce the structural pressures that push honest scientists toward motivated reasoning and, in some cases, push others toward outright fabrication.

Integrity is More Than Honesty

That 1980s case I barely remember continues to inform my thinking, not because I have clear memories of it but because it captures something essential: scientific integrity requires more than individual honesty. It requires organizational structures that don’t push even honest people toward biased reasoning.

The postdoc filing charges was practicing integrity. But they were fighting against a system where a PI’s attachment to their hypothesis created pressure—probably unconscious, probably rationalized, but pressure nonetheless—to make the data fit the theory. One brave postdoc can’t fix structural problems alone.

We’ve built an enormously productive research enterprise. Biomedical science has achieved genuine miracles. The hypothesis-driven, PI-centered model has generated breakthrough after breakthrough. I’m not arguing it’s failed—clearly it hasn’t.

However, I argue it’s flawed in predictable ways. The same features that make it successful—individual investigators developing strong convictions about important ideas and pursuing them relentlessly—also create conditions for motivated reasoning, questionable research practices, and occasional fraud.

Acknowledging those flaws doesn’t diminish the achievements. It opens space for experimentation with alternative models that might reduce the problematic incentives while preserving the creative energy that drives discovery. The question is whether we’re willing to diversify how we organize research or whether we’ll continue over-relying on a single model because it’s familiar and has worked in the past.

The endless frontier that Vannevar Bush envisioned shouldn’t be endless in just one direction. It should include exploring different ways of pursuing knowledge, different structures for organizing inquiry, and different incentives for rewarding contributions to collective understanding.

That’s the real challenge: not just preventing fraud but creating systems where the pressures toward fraud—and toward less dramatic but equally problematic biases—are reduced. Where changing your mind based on evidence is professionally rewarded rather than punished. Where attachment to ideas is balanced by commitment to collective truth-seeking.

The party line that worries me most isn’t imposed by political power. It’s the party line we impose on ourselves when we become too attached to our own hypotheses, when our professional identities become too entangled with specific theories, when the systems we’ve built make admitting error too costly. That postdoc understood that someone needs to object. We need to extend that understanding, objecting not only to individual fraud but also to the organizational structures that make such fraud more likely, and building not just oversight systems but alternative models of how to pursue science.

That’s the integrity challenge for the next forty years.

A Grant Reviewer’s New Year Advice to Proposers: What I’d Tell My Younger Self

Happy New Year! Below is Science Policy Insider’s first posting of 2026:

We were reviewing a proposal whose preliminary data included gorgeous confocal microscopy images from what was, at the time, cutting-edge two-channel laser-scanning technology. The images were crisply in focus and colored green and red to show the locations of different subcellular fluorescent molecular probes, and on the strength of those images alone the proposal felt extraordinary. Never mind that it had no working hypothesis and no technical consideration of autofluorescence, a phenomenon in which the cell’s own biomolecules produce a signal that can be confused with the signals from the two probes.

The panel discussion revealed the problem. Some reviewers were ready to fund based solely on the images. Others raised the autofluorescence issue, the missing hypothesis. But even the skeptics prefaced their concerns with “The data are beautiful, but…” Those pictures had done their job—they made weak science look compelling.

That’s when I learned: awesome preliminary data can cloud objectivity. After reviewing thousands of grants at NIH and NSF over three decades, I’ve seen it happen repeatedly.

So, as you plan your 2026 submissions, here’s what I wish I’d known from the start—lessons that might save you the same learning curve.

Lesson 1: Clarity Beats Cleverness

In my early days, I thought impressive vocabulary and complex sentences demonstrated sophistication. Surely reviewers would appreciate nuanced, academic writing that showcased the full complexity of my thinking. I was wrong.

Clarity wins every time. Reviewers are overwhelmed, often reading up to 15 proposals a week while managing their own labs, teaching loads, and grant deadlines. Simple, direct writing isn’t dumbing down your science—it’s respecting your reviewers’ cognitive bandwidth and making your research accessible to the non-specialists who might be reading it.

Several decades ago, I learned the “grandmother” test. If you can’t explain your research clearly and simply to someone outside your immediate field (like maybe your grandmom), it’s probably not clear enough for a review panel where only one or two people are genuine specialists in your exact area.

Here’s my practical advice: Read your overall proposal goals (or aims) out loud. If you stumble over your own sentences, reviewers will too. Remember that if you can’t explain it, you probably don’t understand it well enough yourself. Make the first paragraph of each section a roadmap for what follows. And use jargon only when necessary—when there’s genuinely no more straightforward way to say it.

I once reviewed a proposal with brilliant research that was nearly incomprehensible to anyone outside the PI’s subspecialty. The same panel reviewed another proposal that explained equally complex ideas with straightforward language. Guess which one got funded?

Lesson 2: Preliminary Data Is About Trust, Not Volume

Early in my career, I believed more data equaled a stronger proposal. Fill those pages with figures! Show them everything you’ve got! Every additional graph strengthens your case, right?

Wrong. It’s the quality of the data that counts.

Here’s what preliminary data does: it answers the question “Can this PI execute what they’re proposing?” It’s not about impressing reviewers with how much you’ve already accomplished. It’s about building trust that you can deliver on your promises. And here’s the thing that surprised me most: including the wrong preliminary data raises more questions than having no data at all.

Show that you can execute the specific methods you’re proposing. Demonstrate the feasibility of your key innovation—the part that’s novel and risky. If you don’t have the correct preliminary data yet, address that gap head-on rather than papering over it with tangentially related work.

The deeper insight here is that reviewers are assessing risk. They’re not asking, “Do you have data?” They’re asking, “Do I trust you can deliver what you’re promising with taxpayer money?” Those are fundamentally different questions.

Lesson 3: Broader Impacts Require Situational Awareness

I initially treated broader impacts as a required checkbox. Standard language about societal benefits and outreach seemed perfectly adequate—everyone writes similar things, right? Just describe some plausible activities and move on.

Reviewers can spot boilerplate instantly. We’ve read hundreds of proposals with identical broader impacts sections, and they all blur together into meaningless noise.

The best broader impacts sections connect to who you are and what you’re genuinely already doing in ways that align with the nation’s best interests. Integration with your research and your actual life matters far more than ambitious plans that sound good on paper.

Scaling is essential: build on what you’re already doing rather than inventing entirely new programs you’ll never have time to implement. Be specific rather than grandiose. If you already mentor undergrads in your lab, explain how this project will train them in new techniques. If you have existing connections to a local K-12 program, describe how you’ll use them—don’t manufacture new partnerships from whole cloth.

Here’s the tell: “We will develop outreach materials” raises immediate skepticism. But “I teach a summer workshop at Lincoln High School’s science program—this research will provide three new hands-on modules on climate modeling” builds trust. One is a vague promise. The other is a concrete plan rooted in existing relationships.

Lesson 4: Budget Justification Actually Matters

I used to think budgets were purely administrative. Surely reviewers barely glanced at them—they cared about the science, not the accounting, right? Standard rates and percentages seemed perfectly sufficient.

Reviewers absolutely read budget justifications. We look for alignment between what you’re proposing to do and what you’re proposing to spend. Misalignment raises immediate red flags. And here’s something that surprised many junior faculty I’ve mentored: over-budgeting is just as problematic as under-budgeting.

Every major budget line should connect clearly to a specific aim in your proposal. Justify why you need that piece of equipment—what will it do that your existing infrastructure can’t? Personnel effort should match the work described. If you’re requesting 50% effort for a postdoc, reviewers should see that postdoc playing a central role in half your aims.

Red flags I’ve seen repeatedly: proposing ambitious international fieldwork with a minimal travel budget, or requesting a full postdoc salary when the proposal’s narrative gives that postdoc almost nothing to do. These inconsistencies make reviewers wonder whether you’ve really thought through how the work will get done.

Lesson 5: How You Handle Weaknesses Reveals Everything

I once believed you should never acknowledge limitations. Defend every choice—project confidence at all costs. Any admission of weakness would be seized upon by reviewers looking for reasons to reject your proposal.

This might be the lesson I wish I’d learned earliest. Reviewers already see the weaknesses in your proposal. Pretending problems don’t exist destroys your credibility far more than the limitations themselves.

How you address limitations reveals your scientific maturity. Acknowledge real problems early and directly. Then explain your mitigation strategy: “If plan A fails, we will try plan B because…” Show you’ve thought through alternatives and have realistic contingency plans.

This lesson became even clearer when I started seeing resubmissions. The response letter matters as much as the revised proposal itself. A defensive tone—arguing with reviewers, insisting they misunderstood you—equals instant rejection. But a response that says “We appreciate the panel’s insights. We have substantially revised Section 2.3 to address concerns about statistical power. New preliminary data (Figure 3) demonstrates feasibility of the alternative approach” shows growth and responsiveness.

Panels respect PIs who demonstrate scientific judgment far more than those who claim perfection. We know perfect proposals don’t exist. We want to see that you can identify problems and solve them.

Lesson 6: The Human Element of Review

I believed grant review was a purely objective, data-driven process where careful reviewers gave equal attention to every proposal, systematically evaluating each against clear criteria.

Reviewers are human. They’re tired. They’re distracted. They have bad days. Panel dynamics matter: who speaks up first, who’s respected, who’s combative. Your proposal isn’t evaluated in isolation; it competes with the others in that review cycle, and comparison effects are real even if the program officers say they shouldn’t be.

Here’s the practical reality: reviewers read proposals at night, on weekends, while traveling. They’re squeezing this work into already overwhelming schedules. If they’re confused by page two, they may never fully engage with your brilliant idea on page eight. Your first page matters disproportionately.

Make your innovation immediately clear. Give reviewers ammunition to advocate for you in panel discussions—clear summary statements they can quote, compelling preliminary data they can point to. The discussant’s job is to convince other panelists to fund your work. Make their job easy.

This isn’t unfair. It’s simply reality. Design your proposal for the actual conditions of review, not the idealized version where everyone reads every word with perfect attention on a quiet Sunday morning with fresh coffee.

Lesson 7: Resubmissions Are About Demonstrating You Listened

I initially thought resubmissions were second chances to explain myself better. The reviewers had clearly misunderstood my brilliant idea. Now I’d show them what I really meant, with more precise explanations and stronger arguments.

Resubmissions are about showing scientific growth. They demonstrate whether you can receive criticism, integrate feedback, and improve your work. The reviewers weren’t wrong—or at least, whether they were wrong doesn’t matter. What matters is whether you can respond constructively to their concerns.

Start your response letter with genuine gratitude, not perfunctory politeness. Group your responses to criticisms thematically rather than addressing them line by line, which makes you look defensive. Show clearly what you changed and where reviewers can find those changes in the revised proposal. If you genuinely disagree with a criticism, do so respectfully and support your position with data, not rhetoric.

The successful resubmissions I’ve seen follow a pattern: acknowledge the feedback, explain the changes, demonstrate improvement with new evidence. The unsuccessful ones argue, defend, and explain why the reviewers didn’t understand the first time.

What These Lessons Reveal About Science Funding

These aren’t just tips for better grant writing. They reveal something more profound about how American science funding works. As I’ve written before, the current system prioritizes risk mitigation over bold ideas. It values clear communication and demonstrated competence over theoretical brilliance. It rewards incremental progress from established investigators more readily than moonshots from newcomers.

I’m not criticizing (this time)—it’s how the system is designed. When you’re allocating hundreds of millions in taxpayer dollars, trust and deliverability matter. Understanding this cultural logic helps you work within the system more effectively.

And it raises an interesting question I’m exploring in my new work on international science policy: Do other countries fund science differently because they assess risk differently? Do European or Asian funding systems embed different assumptions about what science should accomplish? That’s a topic for future posts.

These lessons came from mistakes, from failed proposals, from thousands of hours in review rooms watching good science get rejected for preventable reasons. I wish I’d understood them earlier in my career. I’m offering them now to help you avoid the same learning curve.

What hard-won lessons have you learned about grant writing? What advice would you give your younger self? Share in the comments.

As you prepare your 2026 submissions, remember there’s a human being on the other side of your proposal. Make their job easier. Help them advocate for your science. Give them reasons to say yes.

How Will You Know You’ve Succeeded? A BRAIN story


August 2008: A summer day in Mountain View, California. The previous year, the Krasnow Institute for Advanced Study, which I was leading at George Mason University, had developed a proposal to invest tons of money in figuring out how mind emerges from brains, and now I had to make the case that it deserved to be a centerpiece of a new administration’s science agenda. Three billion dollars is not a small ask, especially in the context of the 2008 financial crisis that was accelerating.

Before this moment, the project had evolved organically: a kickoff meeting at the Krasnow Institute near D.C., a joint manifesto published in Science Magazine, and then follow-on events in Des Moines, Berlin and Singapore to emphasize the broader aspects of such a large neuroscience collaboration. There even had been a radio interview with Oprah.

When I flew out to Google’s Mountain View headquarters in August 2008 for the SciFoo conference, I didn’t expect to be defending the future of neuroscience over lunch. But the individual who was running the science transition for the Obama Presidential Campaign had summoned me for what he described as a “simple” conversation: defend our idea for investing $3 billion over the next decade in neuroscience with the audacious goal of explaining how “mind” emerges from “brains.” It was not the kind of meeting I was ready for.

I was nervous. As an institute director, I’d pitched for million-dollar checks. This was a whole new scale of fundraising for me. And though California was my native state, I’d never gone beyond being a student body president out there. Google’s headquarters in the summer of 2008 was an altar to Silicon Valley power.

SciFoo itself was still in its infancy then – the whole “unconference” concept felt radical and exciting, a fitting backdrop for pitching transformational science. But the Obama campaign wasn’t there for the unconventional meeting format. Google was a convenient meeting spot. And they wanted conventional answers.

I thought I made a compelling case: this investment could improve the lives of millions of patients with brain diseases. Neuroscience was on the verge of delivering cures. (I was wrong about that, but I believed it at the time.) The tools were ready. The knowledge was accumulating. We just needed the resources to put it all together.

Then I was asked the question that killed my pitch: “How will we know we have succeeded? What’s the equivalent of Kennedy’s moon landing – a clear milestone that tells us we’ve achieved what we set out to do?” You could see those astronauts come down the ladder of the lunar module. You could see that American flag on the moon. No such prospects with a large neuroscience initiative.

I had no answer.

I fumbled through some vague statements about understanding neural circuits and developing new therapies, but even as the words left my mouth, I knew they were inadequate. The moon landing worked as a political and scientific goal because it was binary: either we put a man on the moon or we didn’t. Either the flag was planted or it wasn’t.

But “explaining how mind emerges from brains”? When would we know we’d done that? What would success even look like?

The lunch ended politely. I flew back to DC convinced it had been an utter failure.

But that wasn’t the end of it. Five years later, at the beginning of Obama’s second presidential term, we began to hear news of a large initiative driven by the White House called the Brain Activity Map or BAM for short. The idea was to comprehensively map the functional activity of brains at high spatial and temporal resolution beyond that available at the time. It was like my original pitch both in scale (dollars) and in the notion that it was important to understand how mind emerges from brain function. The goal for the new BAM project was to be able to map between the activity and the brain’s emergent “mind”-like behavior, both in the healthy and pathological cases. But the BAM project trial balloon, even coming from the White House, was not an immediate slam dunk.

There was immediate pushback from large segments of the neuroscience community that felt excluded from BAM, but with a quick top-down recalibration from the White House Office of Science and Technology Policy and a whole-of-government approach that included multiple science agencies, BRAIN (Brain Research through Advancing Innovative Neurotechnologies) was born in April of 2013.

A year later, in April of 2014, I was approached to head Biological Sciences at the US National Science Foundation. When I took the job that October, I was leading a directorate with a budget of $750 million annually that supported research across the full spectrum of the life sciences – from molecular biology to ecosystems. I would also serve as NSF’s co-lead for the Obama Administration’s BRAIN Initiative—an acknowledgement of the failed pitch in Mountain View, I guess.

October 2014: sworn in and meeting with my senior management team. Now here I was, a little more than a year into BRAIN. I had gotten what I’d asked for in Mountain View. Sort of. We had the funding, we had the talent, we had review panels evaluating hundreds of proposals. But I kept thinking about the question, the one I couldn’t answer then and still struggled with now. We had built this entire apparatus for funding transformational research, yet we were asking reviewers to apply the same criteria that would have rejected Einstein’s miracle year. How do you evaluate research when you can’t articulate clear success metrics? How do you fund work that challenges fundamental assumptions when your review criteria reward preliminary data and well-defined hypotheses?

Several months later, testifying before Congress about the BRAIN project, I remember fumbling again at the direct question of when we would deliver cures for dreaded brain diseases like ALS and schizophrenia. I punted: that was an NIH problem (even though the original pitch had been about delivering revolutionary treatments). At NSF, we were about understanding the healthy brain. In fact, how could you ever understand brain disease without a deep comprehension of the non-pathological condition?

It was a reasonable bureaucratic answer. NIH does disease; NSF does basic science. Clean jurisdictional boundaries. But sitting there in that hearing room, I realized I was falling into the same trap that had seemingly doomed our pitch in 2008: on being asked for the delivery date of a clear criterion for success, I was waffling. Only this time, I was the agent for the funder: the American taxpayer.

The truth was uncomfortable. We had launched an initiative explicitly designed to support transformational research – research that would “show us how individual brain cells and complex neural circuits interact” in ways we couldn’t yet imagine. But when it came time to evaluate proposals, we fell back on the same criteria that favored incrementalism: preliminary data, clear hypotheses, established track records, well-defined deliverables. We were asking Einstein for preliminary data on special relativity.

And we weren’t unique. This was the system. This was how peer review worked across federal science funding. We had built an elaborate apparatus designed to be fair, objective, and accountable to Congress and taxpayers. What we had built was a machine that systematically filtered out the kind of work that might transform neuroscience.

All of this was years before the “neuroscience winter”—where massive scientific misconduct was unearthed in neurodegenerative disease research—which included Alzheimer’s. But the modus operandi of BRAIN foreshadowed it.

Starting in 2022, a series of investigations revealed that some of the most influential research on Alzheimer’s disease—work that had shaped the field for nearly two decades and guided billions in research funding—was built on fabricated data. Images had been manipulated. Results had been doctored. And this work had sailed through peer review at top journals, had been cited thousands of times, and had successfully competed for grant funding year after year. The amyloid hypothesis, which this fraudulent research had bolstered, had become scientific orthodoxy not because the evidence was overwhelming, but because it fit neatly into the kind of clear, well-defined research program that review panels knew how to evaluate.

Here was the other side of the Einstein problem that I’ve mentioned in previous posts. The same system that would have rejected Einstein’s 1905 papers for lack of preliminary data and institutional support had enthusiastically funded research that looked rigorous but was fabricated. Because the fraudulent work had all the elements that peer review rewards: clear hypotheses, preliminary data, incremental progress building on established findings, well-defined success metrics. It looked like good science. It checked all the boxes.

Meanwhile, genuinely transformational work—the kind that challenges fundamental assumptions, that crosses disciplinary boundaries, that can’t provide preliminary data because the questions are too new—struggles to get funded. Not because reviewers are incompetent or malicious, but because we’ve built a system that is literally optimized to make these mistakes. We’ve created an apparatus that rewards the appearance of rigor over actual discovery, that favors consensus over challenge, that funds incrementalism and filters out transformation.

So, what’s the real function of peer review? It’s supposed to be about identifying transformative research, but I don’t think that’s the real purpose. To my mind, the real purpose of the peer review panels at NSF, and of the study sections at NIH, is to make inherently flawed funding decisions defensible, both to Congress and to the American taxpayer. The criteria (intellectual merit and broader impacts at NSF) exist because they make awarding grant dollars auditable and fair-seeming, not because they identify breakthrough work.

But honestly, there’s a real dilemma here: if you gave out NSF’s annual budget based on a program officer’s feeling that “this seems promising”, you’d face legitimate questions about cronyism, waste and arbitrary decision-making. The current system’s flaws aren’t bad policy accidents; they are the price we pay for other values we also care about.

So, did the BRAIN Initiative deliver on that pitch I made in Mountain View in 2008? Did we figure out how ‘mind’ emerges from ‘brains’? In retrospect, I remain super impressed by NSF’s NeuroNex program: we got impressive technology – better ways to record from more neurons, new imaging techniques, sophisticated tools. We trained a generation of neuroscientists. But that foundational question – the one that made the political case, the one that justified the investment – we’re not meaningfully closer to answering it. We made incremental progress on questions we already knew how to ask. Which is exactly what peer review is designed to deliver. Oh, and one other thing that was produced: NIH’s parent agency, the Department of Health and Human Services, got a trademark issued on the name of the initiative itself, BRAIN.

I spent four years as NSF’s co-lead on BRAIN trying to make transformational neuroscience happen within this system. I believed in it. I still believe in federal science funding. But I’ve stopped pretending the tension doesn’t exist. The very structure that makes BRAIN funding defensible to Congress made the transformational science we promised nearly impossible to deliver.

That failed pitch at Google’s headquarters in 2008? Turns out the question was spot on. We just never answered it.

What Grant Reviewers Actually Look For (and What They Ignore)

A close colleague of mine at a major US research university begins the process of preparing a grant proposal by creating something he calls a “storyboard”. When I was growing up in LA, the concept of a storyboard was very familiar to me. Many of my high school friends, at the time, aspired to careers in the locally dominant entertainment industry. The storyboard, invented by Walt Disney, used pictures to visualize a movie’s plot flow before production—often even before a screenplay was complete. In the LA movie business, you could look at a storyboard and pretty much get right away what a movie is about.

Back to the colleague of mine who uses storyboards to create grant proposals: his key idea is that you’re done making the storyboard when someone outside the group can come in, look at it, and come away with a good understanding of what the grant is all about. If the storyboard is coherent, then it’s easy to make the proposal coherent as well. Further, the storyboard often gets reused in a modified fashion as the grant’s central graphic. Yes, a picture is worth several thousand words.

My colleague is onto something profound about how grant review works, across all funders, including those in the private sector. But for this issue of Science Policy Insider, we’re going to consider the agency where I headed up Biological Sciences, the NSF. What about NIH, you may ask? A lot of the principles here go for both agencies. But here, we’re going to focus, laser-like, on the National Science Foundation, even as it undergoes drastic changes.

The Brutal Reality of NSF Panel Review

After sitting through too many grant panels at NSF, I can tell you this: most proposals get 15-20 minutes of discussion time in a panel that’s reviewing 30-50 proposals over three days. Your carefully crafted 15-page research plan? The primary reviewer read it thoroughly. The other two panelists skimmed it. Everyone else glanced at the summary.

This isn’t because reviewers are lazy. They’re exhausted, brilliant researchers who read proposals outside their immediate expertise, often late at night, while also worrying about their own grants, their trainees, and the paper referee statements they owe.

The storyboard approach works because it acknowledges this reality: reviewers are looking for a straightforward narrative they can grasp quickly and defend to the panel.

What Actually Happens in Review Panels

Here’s how it typically unfolds:

9:00 AM, Day Two of panel: The primary reviewer presents your proposal. They have 5 minutes to summarize your aims, approach, and why it matters. If they struggle to articulate your story coherently, you’re in trouble—not because your proposed science is bad, but because they can’t effectively advocate for you.

The secondary and tertiary reviewers add their perspectives. Then the panel discusses. The program officers watch for enthusiasm, coherence of the argument, and whether anyone is deeply opposed.

The proposals that succeed have champions: reviewers who “get it” immediately and can explain why it matters to others. The storyboard method makes that kind of championing easier.

What Reviewers Actually Look For

After watching this process play out thousands of times, here’s what I learned reviewers truly care about:

1. Can I explain this to the panel in 3 minutes?

If your research plan requires a flowchart to understand, the primary reviewer will simplify it—possibly incorrectly. Better to give them the simplified version yourself.

2. Is the question worth answering?

Not “is this interesting?” but “will anyone care about the answer?” Reviewers need to justify spending taxpayer money. Give them that justification explicitly.

3. Can this person actually do this?

No matter what is written down in the solicitation, preliminary data matters enormously, but not for the reason applicants think. It’s not about proving the hypothesis—it’s about proving you have the technical capability and haven’t missed an obvious problem.

4. Is this the right approach?

Reviewers are surprisingly forgiving about whether your specific hypothesis is correct. They’re much less forgiving about whether you’re using appropriate methods or have thought through alternatives.

5. Will this move the field forward?

Notice: not “revolutionize” or “transform”—just move forward. Incremental progress from a well-designed study beats a transformative idea with unclear methods. But doesn’t the call state that the proposed work should change the world? Sure, but from a practical standpoint, what counts for the reviewers is steady progress. And here’s the tricky part: while steady is key for the reviewers, transformative really is important for the program officers who make the penultimate decision. So, a balance is necessary.

What Reviewers Ignore (Even Though You Spent Weeks on It)

The extensive literature review: They skim it to see if you know the field. The 47 citations demonstrating your comprehensive knowledge? They checked that you cited the key papers and moved on.

Your detailed budget justification: Unless something looks wildly off, reviewers assume you know what your research costs. The line-by-line explanation of why you need that particular microscope? Skimmed.

Your publication list: They look at: Do you publish in good journals? Are you productive? Have you published on this topic before? That’s it. The distinction between your 47th and 52nd paper doesn’t matter.

The broader impacts section that you agonized over: I feel guilty about this because I’ve often harped about broader impacts as a central criterion. Truth: most reviewers read this quickly to verify you addressed it competently. Unless it’s either exceptional or terrible, it rarely drives funding decisions. And these days, broader impacts means how the work will benefit all American citizens (think public health) or US national security.

The Elements That Actually Drive Decisions

Clarity of the research goals: Can the reviewer recite your three main questions without looking at the proposal? If not, rewrite.

Logical flow: Does each aim build on the previous one? Or are they three unrelated projects stapled together? Reviewers can tell.

Feasibility signals: Preliminary data, established collaborations, access to necessary resources, realistic timeline. These say, “this person will actually complete this work.”

Positioning: Is this filling a real gap, or are you slightly tweaking someone else’s approach? Reviewers want to fund work that moves us somewhere new, even if incrementally.

The writing quality: Clear, direct prose suggests clear thinking. Dense, jargon-heavy writing suggests unclear thinking (even if that’s unfair).

The Most Common Mistake

Applicants try to impress reviewers with complexity and comprehensiveness. They want to show they’ve thought of everything, considered every alternative, read every paper.

But reviewers are looking for clarity and confidence. They want to understand quickly what you’re proposing and why it matters. They want to feel confident you’ll succeed.

The storyboard method works because it forces simplicity. If you can’t draw a simple picture of your proposal that an outsider immediately understands, you don’t have a fundable story yet.

But Wait, There’s More

As hinted at above, at NSF, that panel review… it’s strictly advisory. I’ve personally seen proposals with excellent reviews get declined, and the reverse. The key decisional person? That’s the cognizant program officer for the solicitation. These days, there’s an additional vetting to look for alignment with the Administration’s political goals, but that’s a topic for a future newsletter.

What This Means for Your Proposal

Before you write a single word:

Can you explain your project in three sentences?
Can someone outside your subfield understand why it matters?
Do you have a clear narrative arc from question to approach to impact?

If not, you’re not ready to write. You’re ready to storyboard.

Build the simple, clear story first. Then elaborate carefully, making sure every detail serves that core narrative.

Reviewers are smart, busy people trying to identify good science under time pressure. Don’t make them work to understand your brilliance. Give them a story they can grasp, defend, and champion.

That’s what my colleague understood. And based on his funding success rate, the reviewers appreciate it.

The Replication Crisis Is a Market Failure (And We Designed It That Way)

Also published on my newsletter

The replication crisis isn’t a mystery. After presiding over the review of thousands of grants at NSF’s Biological Sciences Directorate, I can tell you exactly why science struggles to reproduce its own findings: we built incentives that reward novelty and punish verification.

A 2016 Nature survey found that over 70% of scientists have failed to reproduce another researcher’s experiments. But this isn’t about sloppy science or bad actors. It’s straightforward economics.


The Researcher’s Optimization Problem

You have limited time and resources. You can either:

Pursue novel findings → potential Nature paper, grant funding, tenure
Replicate someone’s work → maybe a minor publication, minimal funding, colleagues questioning your creativity

The expected value calculation is obvious. Replication is a public good with privatized costs.
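
To make that calculation concrete, here is a minimal sketch in Python. Every number in it is a hypothetical placeholder I made up for illustration (the probabilities, the payoffs in arbitrary "career units", the costs), and the Option class and expected_value function are names invented for this example. The point is only the shape of the comparison, not the specific values.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    p_success: float       # chance the project pays off in career terms (hypothetical)
    payoff_success: float  # career payoff if it does, arbitrary units (hypothetical)
    payoff_failure: float  # career payoff if it does not (hypothetical)
    cost: float            # time and money the researcher bears privately (hypothetical)


def expected_value(opt: Option) -> float:
    """Private expected payoff to the researcher, net of private cost."""
    gross = opt.p_success * opt.payoff_success + (1 - opt.p_success) * opt.payoff_failure
    return gross - opt.cost


# Illustrative values only: a long shot with a big prize versus a near-certain
# result that the reward system barely counts.
novel = Option("novel finding", p_success=0.2, payoff_success=100.0,
               payoff_failure=5.0, cost=10.0)
replication = Option("replication", p_success=0.9, payoff_success=8.0,
                     payoff_failure=4.0, cost=10.0)

for opt in (novel, replication):
    print(f"{opt.name}: {expected_value(opt):+.1f}")
```

Under these made-up inputs the novel project comes out ahead even though it usually fails, because the researcher pays the full cost of a replication while most of its benefit goes to everyone else. Different inputs would give different numbers, but the asymmetry is what drives the choice.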

How NSF Review Panels Work

At NSF, I watched this play out in every review panel. Proposals to replicate existing work faced an uphill battle. Reviewers—themselves successful researchers who got there by publishing novel findings—naturally favor creative, untested ideas over verification work.

We tried various fixes. Some programs explicitly funded replication studies. Some review criteria emphasized robustness over novelty. But the core incentive remained: breakthrough science gets you the next grant; careful verification doesn’t.

The problem runs deeper than any single agency. Universities want prestigious publications. Journals want citations. Researchers want tenure. Nobody’s optimization function includes “produces reliable knowledge that someone else can build on.”

The Information Market Is Broken

Even when researchers try to replicate, they’re working with incomplete information. Methods sections in papers are sanitized versions of what actually happened in the lab. “Cells were cultured under standard conditions” means something different in every lab. One researcher’s gentle mixing is another’s vigorous shaking.

This information asymmetry makes replication attempts inherently inefficient. You’re trying to reproduce a result while missing critical details that the original researcher might not even realize mattered.

The Time Horizon Problem

NSF grants run 3-5 years. Tenure clocks run 6-7 years. But scientific truth emerges over decades. We’re optimizing for the wrong timescale.

During my time at NSF, I saw brilliant researchers make pragmatic choices: publish something surprising now (even if it might not hold up) rather than spend two years carefully verifying it. That’s not a moral failing—it’s responding rationally to the incentives we created.

What Would Actually Fix This

Make replication profitable:

Count verification studies equally with novel findings in grant review and tenure decisions
Fund researchers whose job is rigorous replication—make it a legitimate career path
Require data and detailed methods sharing as a funding condition, not an afterthought
Make failed replications as publishable as successful ones

The challenge isn’t technical. It’s institutional. We designed a market that overproduces flashy results and underproduces reliable knowledge. Until we fix the incentives, we’ll keep getting exactly what we’re paying for.

On Reproducibility: Physics versus Life Sciences

Scientific reproducibility—the ability of researchers to obtain consistent results when repeating an experiment—sits at the heart of the scientific method. During my years at the bench and later as the leader of an Institute, it became clear that not all sciences struggle equally with this fundamental principle. Physics experiments tend to be more reproducible than those in life sciences, where researchers grapple with what many call a “reproducibility crisis.” Understanding why reveals something profound about the nature of these disciplines.

The State of Reproducibility Across Sciences

A 2016 Nature survey of over 1,500 researchers revealed the scope of the challenge: more than 70% of scientists have failed to reproduce another researcher’s experiments. The rates varied by field—87% of chemists, 77% of biologists, and 69% of physicists and engineers reported such failures. Notably, 52% of respondents agreed that a significant reproducibility crisis exists.

These numbers tell us something important: reproducibility challenges exist across all scientific disciplines, but they manifest with different severity. Physics hasn’t been immune to these issues, but it has been affected less severely than fields like psychology, clinical medicine, and biology. This isn’t a story of success versus failure—it’s a story of different sciences confronting different kinds of complexity.

The Physics Advantage

When a physicist measures the speed of light or the charge of an electron, they’re studying fundamental constants of nature. These values don’t change based on the lab, the researcher, or the day of the week. Protons colliding at a given energy obey the same physics in Geneva as they do in Illinois. The laws governing pendulum motion work identically whether you’re in Cambridge or Kyoto.

This consistency extends beyond fundamental constants. Physics experiments typically involve controlled, isolated systems where researchers can eliminate or account for confounding variables. A physics experiment might study a single particle in a vacuum, far removed from the messy complexity of the real world. Precise measurement tools, refined over centuries, allow astonishing accuracy. NSF’s LIGO, for instance, can detect gravitational waves by measuring changes smaller than one ten-thousandth the width of a proton, equivalent to noticing a hair’s width change in the distance to the nearest star. The centuries of theoretical understanding that physics has developed also make the field less susceptible to reproducibility failures.

The Life Sciences Challenge

Life sciences researchers face a fundamentally different landscape. They’re not studying isolated particles obeying immutable laws; they’re investigating complex, adaptive systems shaped by evolution, environment, and chance.

Consider a seemingly simple experiment: testing how a drug affects cancer cells. Those cells aren’t uniform entities like electrons. Research has revealed extensive genetic variation across supposedly identical cancer cell lines. The same cell line obtained from different sources can show staggering differences: studies have found that at least 75% of compounds that strongly inhibit some strains of a cell line are completely inactive in others. Each strain has accumulated unique mutations through genetic drift as it was independently passaged in different laboratories.

The cells’ behavior changes based on how many times they’ve been cultured, what nutrients they receive, even the material of the culture dish. Research has documented profound variability even in highly standardized experiments, with factors like cell density, passage number, temperature, and medium composition all significantly affecting results. The researcher’s technique in handling the cells matters. Countless variables play roles that are difficult or impossible to fully control.

This complexity manifests in several ways:

Biological variability is the norm, not the exception. No two mice are identical, even if they’re genetically similar. Human patients are wildly variable. A treatment that works brilliantly for one person may fail completely for another with the “same” disease.

Emergent properties mean that biological systems exhibit behaviors that can’t be predicted simply by understanding their components. You can’t predict consciousness by studying individual neurons, just as you can’t predict ecosystem dynamics by studying single organisms.

Context dependence is paramount. A gene doesn’t have a single function—its effects depend on the organism, developmental stage, tissue type, and environmental conditions. The same protein can play entirely different roles in different contexts.

Reframing the “Crisis”

It’s worth questioning whether “crisis” is the right word for what’s happening in life sciences. Some researchers argue that the apparent reproducibility problem may be partly a statistical phenomenon. When fields explore bold, uncertain hypotheses—as life sciences often do—a certain rate of non-replication is expected and even healthy. A hypothesis that’s unlikely to be true a priori may still test positive, and subsequent studies revealing the truth represent science’s self-correcting mechanisms at work rather than a failure.
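
The statistical argument can be sketched with Bayes’ rule. The snippet below is my own illustration, not something taken from the researchers I’m paraphrasing: it assumes a field where a fraction `prior` of tested hypotheses are actually true, studies detect true effects with probability `power`, and false positives occur at rate `alpha`. The function name and the example values are hypothetical.

```python
def positive_predictive_value(prior: float, power: float, alpha: float) -> float:
    """Fraction of 'positive' findings that reflect a true effect (Bayes' rule)."""
    true_positives = power * prior            # true hypotheses that test positive
    false_positives = alpha * (1.0 - prior)   # false hypotheses that test positive
    return true_positives / (true_positives + false_positives)


# Same study quality throughout (hypothetical power=0.8, alpha=0.05);
# only the boldness of the hypotheses being tested changes.
for prior in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(prior, power=0.8, alpha=0.05)
    print(f"prior={prior:.2f} -> share of positives that are true: {ppv:.2f}")
```

With identical methods, the share of positive results that hold up falls as the prior falls. A field that tests long shots should expect more of its initial positives to fail replication, even if nobody did anything wrong.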

The complexity of biological systems means that two experiments may differ in ways researchers don’t fully understand, leading to different results not because of poor methodology but because of hidden variables or context sensitivity. This doesn’t excuse sloppy work, but it does suggest we should expect life sciences to have inherently lower replication rates than physics due to the nature of what’s being studied.

The Methodological Gap

These fundamental differences create practical challenges. Physics papers often provide enough detail for precise replication: “We used a 532nm laser with 10mW power at normal incidence…” Life sciences papers might say “cells were cultured under standard conditions”—but what’s “standard” varies between labs. One lab’s “gentle mixing” is another’s vigorous shaking.

The statistical approaches differ too. Physics can often work with small sample sizes because measurement precision is high and variability is low. Life sciences need larger samples to overcome biological variability, yet often work with small sample sizes due to cost, time, or ethical constraints. This makes studies underpowered and results less reliable.
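To see why small samples leave a study underpowered, here is a rough sketch using a normal approximation for a two-group comparison. Everything in it is an assumption made for illustration: the function name, the standardized effect size of 0.5, and the group sizes. It is not an exact t-test calculation, only an approximation of how power grows with sample size.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    effect_size: assumed difference between groups in standard-deviation units
    n_per_group: number of samples in each group
    Uses a normal approximation and ignores the negligible opposite tail.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    signal = effect_size * sqrt(n_per_group / 2)
    return z.cdf(signal - z_crit)


# Hypothetical group sizes, from a typical small pilot to a larger study.
for n in (5, 10, 20, 64):
    print(f"n per group={n:3d} -> approximate power: {approx_power(0.5, n):.2f}")
```

With these assumed inputs, five samples per group detect a moderate effect only a small fraction of the time, while roughly 64 per group are needed to reach the conventional 80%.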

Moving Forward

Recognition of reproducibility challenges has sparked essential reforms. Pre-registration of studies, open data sharing, more rigorous statistical practices, and standardized protocols all help. Some fields are developing reference cell lines and model organisms to reduce variability between labs. Journals are implementing checklists to ensure critical details are reported. These efforts are making a real difference.

Yet we must also accept that perfect reproducibility may be neither achievable nor always desirable in life sciences. Biological variability is a feature, not a bug—it’s the raw material of evolution and the reason life adapts to changing environments. The goal shouldn’t be to make biology as reproducible as physics, but to develop methods appropriate for studying complex, variable systems and to be transparent about the limitations and uncertainties inherent in this work.

Understanding the Divide

The reproducibility divide between physics and life sciences doesn’t reflect a failure in the life sciences. It reflects the reality that living systems are profoundly different from the physical systems that physicists study. Both approaches to science are valid and necessary; they’re simply tackling different kinds of problems with appropriately different tools.

Even physics, with all its advantages, sees nearly 70% of researchers unable to reproduce some experiments. The difference is one of degree, not kind. All science involves uncertainty, iteration, and gradual convergence on truth through many studies rather than single definitive experiments.

Understanding these differences helps us appreciate both the elegant precision of physics and the challenging complexity of life. And perhaps most importantly, it reminds us that the scientific method must be flexible enough to accommodate the full diversity of natural phenomena we seek to understand—from the fundamental particles that never change to the living systems that are constantly evolving.

The Unsung Hero: Why Exploratory Science Deserves Equal Billing with Hypothesis-Driven Research

For decades, the scientific method taught in classrooms has followed a neat, linear path: observe, hypothesize, test, conclude. This hypothesis-driven approach has become so deeply embedded in our understanding of “real science” that research proposals without clear hypotheses often struggle to secure funding. Yet some of the most transformative discoveries in history emerged not from testing predictions, but from simply looking carefully at what nature had to show us.

It’s time we recognize exploratory science—sometimes called discovery science or descriptive science—as equally valuable to its hypothesis-testing counterpart.

What Makes Exploratory Science Different?

Hypothesis-driven science starts with a specific question and a predicted answer. You think protein X causes disease Y, so you design experiments to prove or disprove that relationship. It’s focused, efficient, and satisfyingly definitive when it works.

Exploratory science takes a different approach. It asks “what’s out there?” rather than “is this specific thing true?” Researchers might sequence every gene in an organism, catalog every species in an ecosystem, or map every neuron in a brain region. They’re generating data and looking for patterns without knowing exactly what they’ll find.

The Case for Exploration

The history of science is filled with examples where exploration led to revolutionary breakthroughs. One of my lab chiefs at NIH was Craig Venter, famous for his exploratory project: sequencing the human genome. The Human Genome Project didn’t test a hypothesis—it mapped our entire genetic code, creating a foundation for countless subsequent discoveries. Darwin’s theory of evolution emerged from years of cataloging specimens and observing patterns, not from testing a pre-formed hypothesis. The periodic table organized elements based on exploratory observations before anyone understood atomic structure.

More recently, large-scale exploratory efforts have transformed entire fields. The Sloan Digital Sky Survey mapped millions of galaxies, revealing unexpected structures in the universe. CRISPR technology was discovered through exploratory studies of bacterial immune systems, not because anyone was looking for a gene-editing tool. The explosive growth of machine learning has been fueled by massive exploratory datasets that revealed patterns no human could have hypothesized in advance.

Why Exploration Matters Now More Than Ever

We’re living in an era of unprecedented technological capability. We can sequence genomes for hundreds of dollars, image living brains in real time, and collect environmental data from every corner of the planet. These tools make exploration more powerful and more necessary than ever.

Exploratory science excels at revealing what we don’t know we don’t know. When you’re testing a hypothesis, you’re limited by your current understanding. You can only ask questions you’re smart enough to think of. Exploratory approaches let the data surprise you, pointing toward phenomena you never imagined.

This is particularly crucial in complex systems—ecosystems, brains, economies, climate—where interactions are so intricate that predicting specific outcomes is nearly impossible. In these domains, careful observation and pattern recognition often outperform narrow hypothesis testing.

The Complementary Relationship

None of this diminishes the importance of hypothesis-driven science. Testing specific predictions remains essential for establishing causation, validating mechanisms, and building reliable knowledge. The most powerful scientific progress often comes from the interplay between exploration and hypothesis testing.

Exploratory work generates observations and patterns that inspire hypotheses. Hypothesis testing validates or refutes these ideas, often raising new questions that require more exploration. It’s a virtuous cycle, not a competition.

Overcoming the Bias

Despite its value, exploratory science often faces skepticism. It’s sometimes dismissed as “fishing expeditions” or “stamp collecting”—mere data gathering without intellectual rigor. This bias shows up in grant reviews, promotion decisions, and journal publications.

This prejudice is both unfair and counterproductive. Good exploratory science requires tremendous rigor in experimental design, data quality, and analysis. It demands sophisticated statistical approaches to avoid false patterns and careful validation of findings. The difference isn’t in rigor but in starting point.

We need funding mechanisms that support high-quality exploratory work without forcing researchers to shoehorn discovery-oriented projects into hypothesis-testing frameworks. We need to train scientists who can move fluidly between both modes. And we need to celebrate exploratory breakthroughs with the same enthusiasm we reserve for hypothesis confirmation.

Looking Forward

As science tackles increasingly complex challenges—understanding consciousness, predicting climate change, curing cancer—we’ll need every tool in our methodological toolkit. Exploratory science helps us map unknown territory, revealing features of reality we didn’t know existed. Hypothesis-driven science helps us understand the mechanisms behind what we’ve discovered.

Both approaches are essential. Both require creativity, rigor, and insight. And both deserve recognition as legitimate, valuable paths to understanding our world.

The next time you hear about a massive dataset, a comprehensive catalog, or a systematic survey, don’t dismiss it as “just descriptive.” Remember that today’s exploration creates the foundation for tomorrow’s breakthroughs. In science, as in geography, you can’t know where you’re going until you know where you are.

How America Built Its Science Foundation Before the War Changed Everything

Most people think America’s scientific dominance began with the Manhattan Project or the space race. That’s not wrong, but it misses the real story. By the time World War II arrived, we’d already spent decades quietly building the infrastructure that would make those massive wartime projects possible.

The foundation was laid much earlier, and in ways that might surprise you. What’s more surprising is how close that foundation came to crumbling—and what we nearly lost along the way.

The Land-Grant Revolution

The story really starts in 1862 with the Morrill Act—arguably the most important piece of science policy legislation most Americans have never heard of. While the Civil War was tearing the country apart, Congress was simultaneously creating a network of universities designed to teach “agriculture and the mechanic arts.”

This wasn’t just about farming. The land-grant universities were America’s first systematic attempt to connect higher education with practical problem-solving. Schools like Cornell, Penn State, and the University of California weren’t just teaching Latin and philosophy—they were training engineers, studying crop diseases, and developing new manufacturing techniques.

But here’s what’s remarkable: this almost didn’t happen. The 1857 version of Morrill’s bill faced heavy opposition from Southern legislators who viewed it as federal overreach and Western states who objected to the population-based allocation formula. It passed both houses by narrow margins, only to be vetoed by President Buchanan. The legislation succeeded in 1862 primarily because Southern opponents had left Congress to join the Confederacy.

Private Money Fills a Critical Gap

What’s fascinating—and telling—is how much of early American scientific investment came from private philanthropy rather than government funding. The industrial fortunes of the late 1800s flowed into research, but this created a system entirely dependent on individual wealth and personal interest.

The Carnegie Institution of Washington, established in 1902, essentially functioned as America’s first NSF decades before the actual NSF existed. Andrew Carnegie’s $10 million endowment was enormous—equal to Harvard’s entire endowment and vastly more than what all American universities spent on basic research combined. The Rockefeller Foundation transformed medical education and research on a similar scale.

But imagine if Carnegie had been less interested in science, or if the robber baron fortunes had flowed entirely into art collections and European estates instead. This mixed ecosystem worked, but it was inherently unstable. When economic conditions tightened, private funding could vanish. When wealthy patrons died, research priorities shifted with their successors’ interests.

Corporate Labs: Innovation with Built-In Vulnerabilities

By the 1920s, major corporations were establishing research laboratories. General Electric’s lab, founded in 1900 as the first industrial research facility in America, became the model. Bell Labs, created in 1925 through the consolidation of AT&T and Western Electric research, would later become legendary for discoveries that shaped the modern world.

These corporate labs solved an important problem, bridging the gap between scientific discovery and commercial application. But they also created troubling dependencies. Research priorities followed profit potential, not necessarily national needs. Breakthrough discoveries in fundamental physics might be abandoned if they didn’t promise immediate commercial returns.

More concerning, these labs were vulnerable to economic cycles. During the Great Depression, even well-established research programs faced significant budget cuts and staffing reductions.

Government Stays Reluctantly on the Sidelines

Through all of this, the federal government remained a hesitant, minor player. The National Institute of Health, created in 1930 with a modest $750,000 for building construction, was one of the few exceptions—and even then, the federal government rarely funded medical research outside its own laboratories before 1938.

Most university science departments survived on whatever they could patch together from donors, industry partnerships, and minimal federal grants. The system worked, but precariously. During the Depression, university budgets were slashed, enrollment dropped, and research programs had to be scaled back or eliminated. The National Academy of Sciences saw its operating and maintenance funds drop by more than 15 percent each year during the early 1930s.

The Foundation That Held—Barely

By 1940, America had assembled what looked like a robust scientific infrastructure, but it was actually a precarious arrangement held together by fortunate timing and individual initiative: strong universities teaching practical skills, generous private funding that could shift with economic conditions, corporate labs vulnerable to business cycles, and minimal federal involvement.

When the war suddenly demanded massive scientific mobilization, the infrastructure held together long enough to support the Manhattan Project, radar development, and other crucial innovations. But it was a closer thing than most people realize. The Depression had already demonstrated the system’s vulnerabilities—funding cuts, program reductions, and the constant uncertainty that came with depending on private largesse.

What We Nearly Lost

Looking back, what’s remarkable isn’t just how much America invested in science before 1940, but how easily much of it could have been lost to economic downturns, shifting private interests, or political opposition. That decentralized mix of public and private initiatives created innovation capacity, but it also created significant vulnerabilities.

The war didn’t just expand American science—it revealed how unstable our previous funding system had been and demonstrated what sustained, coordinated investment could accomplish. The scientific breakthroughs that defined the next half-century emerged not from the patchwork system of the 1930s, but from the sustained federal commitment that followed.

Today’s scientific leadership isn’t an accident of American ingenuity. It’s the direct result of lessons learned from a system that worked despite its fragility—and the decision to build something more reliable in its place. The question is whether we remember why that change was necessary, and what we might lose if we return to depending on unstable, decentralized funding for our most critical research needs.

Post-lunch conversation with a colleague: trust in science

Yesterday, I had lunch with a colleague at a favorite BBQ spot in Arlington. Both of us work in science communication, so naturally our conversation drifted to the question that’s been nagging at many of us: why has public trust in scientific institutions declined in recent years? By the time we finished our (actually healthy) food, we’d both come to the same conclusion: the current way scientists communicate with the public might be contributing to the problem.

From vaccine hesitancy to questions about research reliability, the relationship between science and society has grown more complex. To understand this dynamic, we need to examine not only what people think about science but also how different cultures approach the validation of knowledge itself.

Harvard scholar Sheila Jasanoff offers valuable insights through her concept of “civic epistemologies”—the cultural practices societies use to test and apply knowledge in public decision-making. These practices vary significantly across nations and help explain why scientific controversies unfold differently in different places.

American Approaches to Knowledge Validation

Jasanoff’s research identifies distinctive features of how Americans evaluate scientific claims:

Public Challenge: Americans tend to trust knowledge that has withstood open debate and questioning. This reflects legal traditions where competing arguments help reveal the truth.

Community Voice: There’s a strong expectation that affected groups should participate in discussions about scientific evidence that impacts them, particularly in policy contexts.

Open Access: Citizens expect transparency in how conclusions are reached, including access to underlying data and reasoning processes.

Multiple Perspectives: Rather than relying on single authoritative sources, Americans prefer hearing from various independent institutions and experts.

How This Shapes Science Communication

These cultural expectations help explain some recent communication challenges. When public health recommendations changed during the COVID-19 pandemic, this appeared to violate expectations for thorough prior testing of ideas. Similarly, when social platforms restricted specific discussions, this conflicted with preferences for open debate over gatekeeping.

In scientific fields like neuroscience, these dynamics have actually driven positive reforms. When research reliability issues emerged, the American response emphasized transparency solutions: open data sharing, study preregistration, and public peer review platforms. Major funding agencies now require data management plans that promote accountability.

Interestingly, other countries have addressed similar scientific quality concerns in different ways. European approaches have relied more on institutional reforms and expert committees, while American solutions have emphasized broader participation and transparent processes.

Digital Platforms and Knowledge

Online platforms have both satisfied and complicated American expectations. They provide the transparency and diverse voices people want, but the sheer volume of information makes careful evaluation difficult. Platforms like PubPeer enable post-publication scientific review that aligns with cultural preferences for ongoing scrutiny; however, the same openness can also amplify misleading information.

Building Better Science Communication

Understanding these cultural patterns suggests more effective approaches:

Acknowledge Uncertainty: Present science as an evolving process rather than a collection of final answers. This matches realistic expectations about how knowledge develops.

Create Meaningful Participation: Include affected communities in research priority-setting and policy discussions, following successful models in patient advocacy and environmental research.

Increase Transparency: Share reasoning processes and data openly. Open science practices align well with cultural expectations for accountability.

Recognize Broader Concerns: Understand that skepticism often reflects deeper questions about who participates in knowledge creation and whose interests are served.

Moving Forward

Public skepticism toward science isn’t simply a matter of misunderstanding—it often reflects tensions between scientific institutions and cultural expectations about legitimate authority. Rather than dismissing these expectations, we might develop communication approaches that honor both scientific rigor and democratic values.

The goal isn’t to eliminate all skepticism, which serves essential functions in healthy societies. Instead, it is to channel critical thinking in ways that strengthen our collective ability to address complex challenges that require scientific insight.

Zero-based budgeting experiment: US STEM

At research universities, zero-based budgeting is pretty rare. It means starting from zero expenditures and then justifying each budget line to reach an annual budget. It is frowned upon for long-term R&D projects for the obvious reason that it’s hard to predict a discovery, let alone one that could be exploited to produce a measurable outcome.

Nevertheless, it’s worth considering using the process to optimize the entire US STEM/Biomedical enterprise from scratch.

Why Research Resists Zero-Based Budgeting

The resistance to zero-based budgeting in research environments stems from legitimate concerns. Academic institutions seldom adopt a zero-based budget model for three reasons. First, as I stated above, scientific discovery is inherently unpredictable. Second, zero-based budgets require a significant amount of time and labor from units and university administrators to prepare. Third, the model can seriously encumber long-term planning.

Research requires substantial upfront investments in equipment, facilities, and human capital that only pay dividends over extended periods. The peer review system, while imperfect, has evolved as a way to allocate resources based on scientific merit rather than easily quantifiable metrics.

The Case for a National Reset

Despite these concerns, there’s a compelling argument for applying zero-based budgeting principles to the broader American STEM enterprise. Not at the individual project level, but at the systemic level—questioning fundamental assumptions about how we organize, fund, and conduct research.

Addressing Systemic Inefficiencies

Our current research ecosystem has evolved organically over decades, creating layers of bureaucracy, redundant administrative structures, and misaligned incentives. Universities compete for the same federal funding while maintaining parallel administrative infrastructures. A zero-based approach would force examination of whether these patterns serve our ultimate goals of scientific progress and national competitiveness.

Responding to Global Competition

The US still retains a healthy lead, spending $806 billion on R&D, both public and private, in 2021, but China is rapidly closing the gap. The Chinese government recently announced a massive $52 billion investment in research and development for 2024 — a 10% surge over the previous year, while the U.S. cut total investment in research and development for fiscal 2024 by 2.7%.

China has significantly increased its R&D investment and now contributes over 24 percent of total global funding, according to data from the Congressional Research Service. While the U.S. total remains strong, CRS data show that its share of total global expenditure dropped to just under 31 percent in 2020, down from nearly 40 percent in 2000.

Realigning with National Priorities

AI, pandemic preparedness, cybersecurity, and advanced manufacturing require coordinated, interdisciplinary approaches that don’t always fit neatly into existing departmental structures or funding categories. Starting from zero would allow us to design funding mechanisms that better align with strategic priorities while preserving fundamental research.

A Practical Framework

Implementing zero-based budgeting for the STEM enterprise could be approached systematically:

Phase 1: Comprehensive Mapping. Begin by mapping the current research ecosystem: funding flows, personnel, infrastructure, outputs, and outcomes. This alone would be valuable, as we currently lack a complete picture of resource allocation.

Phase 2: Goal Setting. Involve stakeholders in defining desired outcomes. What should American STEM research accomplish in the next 10-20 years? How do we balance basic research with applied research?

Phase 3: Pilot Implementation. Rather than overhauling everything at once, implement zero-based approaches in specific domains or regions to identify what works while minimizing disruption.

Potential Benefits and Risks

A thoughtful application could yield improved efficiency by eliminating redundant processes, better alignment with national priorities, enhanced collaboration across institutional silos, and increased agility to respond to emerging threats.

However, any major reform involves significant risks. There’s danger of disrupting productive research programs, alienating talented researchers, or creating unintended bureaucratic complications. The political and logistical challenges would be immense.

Moreover, China has now surpassed the US in “STEM talent production, research publications, patents, and knowledge- and technology-intensive manufacturing”, suggesting that while spending matters, other factors are equally important.

Preserving What Works

Zero-based budgeting shouldn’t mean discarding what has made American research successful. The peer review system has generally identified quality research. The tradition of investigator-initiated research has fostered creativity and serendipitous discoveries. The partnership between universities, government, and industry has created a dynamic innovation ecosystem.

The goal isn’t elimination but examination of whether these elements are being implemented most effectively.

Conclusion

The idea of applying zero-based budgeting to American STEM research deserves serious consideration. By questioning assumptions, eliminating inefficiencies, and realigning priorities, we can create a research enterprise better positioned to tackle 21st-century challenges.

The process itself, a careful examination of how we conduct and fund research, could be as valuable as specific reforms. In an era when China is projected, based on current enrollment patterns, to produce more than 77,000 STEM PhD graduates per year by 2025, compared with approximately 40,000 in the United States (nearly double the US output), the ability to thoughtfully reimagine our institutions may be our greatest asset.

The question isn’t whether we can afford to undertake such a comprehensive review. The question is whether we can afford not to.