The Coordination Problem

What the geometry of a room taught me about the architecture of American science

Nearly two decades ago, from the mezzanine of the Krasnow Institute for Advanced Study, I had an accidental view into the federal government’s central nervous system and its central pathology.

Below me, perhaps sixty deputy-level civil servants had gathered for a reception following a day of lightning talks. These are the people who run the federal science enterprise between administrations: the Senior Executive Service officials and GS-15s who preserve institutional memory, make quiet go/no-go decisions that shape enormous investments, and keep the machine running while political appointees cycle in and out every four years. I had helped organize the day. I should have been down there working the room. Instead, I found myself watching.

The geometry was diagnostic.

At opposite diagonals of the square great room, the military establishment and the health establishment had each formed their own gravitational field. Uniforms clustered near the windows on one side; NIH and FDA people gathered near the bar on the other. In the center, NSF, the National Labs, and NASA circulated politely among themselves, moving between the two poles without quite bridging them. In an hour of watching, I did not see a single sustained conversation cross the diagonal. People talked past one another in the most literal sense — bodies angled slightly away, addressing their own.

We had brought them together to discuss the possibility of a very large, coordinated federal investment in neuroscience. What would eventually become the BRAIN Initiative had been conceived by a group of about eight academics, and we had started planning before the 2008 election. That timing is worth pausing on: designing a major scientific program before you know who the president will be requires a particular kind of institutional literacy. You’re not lobbying a specific administration. You’re trying to shape the landscape that any incoming administration will find. The goal was to socialize the concept across agencies before new leadership arrived, to get the deputies thinking in the same direction, because we understood something easy to forget: political will at the top is necessary but nowhere near sufficient. The deputies would have to cooperate. And cooperation, as the mezzanine made clear, was not their natural state.

What the room taught us was that the coordination problem was worse than we had imagined. These were talented, dedicated people who understood their own domains with real depth. They were not cynical or lazy. They were not obstructionist in any simple sense. But their domains didn’t speak to each other, and neither did they when placed in an unstructured social setting.

What we eventually learned, and what the BRAIN Initiative’s structure was specifically designed to reflect, is that interagency coordination on a scientific bet of this scale doesn’t emerge from shared interest, goodwill, or a well-intentioned convening. It requires a power center above the agencies, willing to spend real political capital making it happen. In our case, that meant the White House. Not a memo from the White House. Not a task force. Not a working group with a White House name attached. Active, sustained leadership from the Office of Science and Technology Policy, with the explicit backing of a President who is publicly committed.

That architecture is harder to build than it sounds, and easier to dismantle than anyone would like.

To understand why the view from the mezzanine looked the way it did, it helps to understand what happens inside federal agencies over decades and why the standard explanations for bureaucratic dysfunction miss the point.

The political science literature on interagency conflict tends to reach for two explanations: turf protection and bureaucratic inertia. Both are real. Neither is primary. The deeper explanation is cultural, and it runs much further down.

Large federal science agencies are not bureaucracies in the pejorative sense. They are genuine intellectual cultures, with their own histories, epistemologies, and ways of assigning value to scientific work. NIH thinks in terms of disease mechanisms and the pathway to clinical translation — its entire grant review apparatus, its study section culture, and its publication norms are organized around proximity to the patient. NSF thinks in terms of fundamental discovery and the long-term health of scientific disciplines; it is institutionally suspicious of applied framing and has spent decades defending basic research against short-term thinking in Congress. The Department of Energy thinks in terms of national security infrastructure and large-scale systems; its scientific culture comes out of the weapons labs and reflects a tolerance for massive, centrally managed projects that neither NIH nor NSF can quite replicate. DARPA is a different animal still — structurally flat, program-manager-driven, deliberately averse to peer review — existing in permanent tension with the academic science norms that dominate NIH and NSF.

These are not trivial differences in vocabulary. They reflect decades of accumulated mission, shaped by the constituencies each agency serves, the appropriations subcommittees that fund them, and the scientific communities that define their identity and police their boundaries. A neuroscientist trained in the NIH system and one trained in the NSF system have, in many cases, genuinely different intuitions about what good science looks like. They aren’t wrong to have those intuitions. The problem is that cultures, once formed, defend themselves — not cynically, but almost automatically, the way an immune system responds to foreign tissue.

Consider a seemingly simple question that arose during the early planning for what would become the BRAIN Initiative: where should federally funded neuroscience publications and data live? What cyberinfrastructure would anchor a shared repository?

NIH’s answer was immediate: PubMed. They had built it, maintained it, and trusted it. It was already the de facto home for biomedical literature globally. That it was showing its age as a platform and that a genuinely new large-scale data infrastructure might warrant a genuinely new architecture were beside the point from NIH’s perspective — it worked, it was theirs, and extending it was the obvious path.

NSF’s answer came just as quickly, pointing in an entirely different direction: use the Department of Energy’s platform. Whether this reflected a genuine assessment of DOE’s technical capabilities in large-scale data infrastructure — and those capabilities were real — or a reflexive resistance to being absorbed into NIH’s scientific ecosystem was never entirely clear. Probably both. The two things are hard to disentangle from the inside.

What was clear was that no one in either room was discussing compromise, nor was anyone discussing what the scientific community needed from a shared repository. The conversation was about which agency’s infrastructure would anchor the enterprise, which meant which agency’s culture would shape it, which metadata standards would prevail, which access controls would govern it, and whose bureaucratic stamp would appear on a decade of scientific output. The White House, watching from above, grew visibly frustrated. Here were two agencies that shared a nominal commitment to advancing American science, deadlocked over a question any genuinely neutral party might have resolved in an afternoon. But there was no neutral party. There was only the accumulated weight of two institutional identities, each pulling toward its own gravity.

This is what silos look like from the inside. Not obstruction. Not incompetence. Just two organizations being, with perfect fidelity, exactly what they had each spent decades becoming. The tragedy of it is that the people involved often know it’s happening and can’t stop it anyway.

The PubMed standoff faded, as many such disputes do, into institutional stalemate — each agency continuing to do what it had always done, the shared infrastructure question deferred rather than resolved. No one had been explicitly asked to back down. No one had been forced to. The question simply became too costly to revisit.

The Biden Cancer Moonshot episode was different in kind, and I was there.

The scientific logic was almost painfully straightforward. The Department of Defense maintains medical records, including biological samples collected at peak physical condition, for millions of young, healthy service members, a population that is among the most comprehensively documented in the country. The Veterans Administration maintains longitudinal health records on many of those same people decades later, including those who developed cancer in the intervening years. Linking those two datasets would give cancer researchers something extraordinarily rare: a biological baseline matched to long-term health outcomes, at scale, across a population of millions. The potential to identify early biomarkers, track environmental and occupational exposures, and understand the gap between apparent health and latent disease was enormous. This was not a speculative idea. The scientific community had been pointing at this dataset for years.

Biden wanted it done. He had staked his political identity on the Cancer Moonshot. Senior officials from both DOD and the VA were in the room. Their answer, expressed through lawyers and deputy secretaries rather than principals, was no.

The refusal came wrapped in legal and technical language: privacy regulations, incompatible record architectures, classification concerns, HIPAA obligations applied in ways that seemed to expand whenever the conversation moved toward a specific plan. These were not entirely fabricated objections. The legal and technical barriers were real enough to be inconvenient. But they were not the real structure underneath. The real structure was territorial: two agencies, each with its own medical infrastructure, its own relationships with its patient population, its own institutional identity built around serving a defined constituency, being asked to subordinate that identity to a shared project they hadn’t designed, didn’t control, and couldn’t shape. The language of legal compliance had become the language of institutional resistance, dressed in a costume that made resistance look like responsibility.

Biden pushed back. With visible anger. This is worth pausing on: a Vice President of the United States, sitting in his own office, pressing senior officials with the kind of controlled fury that comes from watching something obviously right fail in real time, because two agencies couldn’t agree to share what they already had. The frustration wasn’t abstract. He understood exactly what was in those records and exactly why they weren’t being linked, and he was watching it happen in front of him anyway.

Eventually, under that sustained pressure, the agencies moved toward compliance. Plans were made. Commitments were extracted. Progress, of a kind, was achieved.

But note what it took. Not a policy directive. Not a new framework. Not a cross-agency working group. A Vice President of the United States, in a room, pressing senior officials personally and with considerable force. And note also what happened the moment that pressure lifted: the agencies returned to being exactly what they had always been, because the underlying structure had not changed at all. The legal and technical barriers had dissolved under sufficient political heat. Which meant they had never quite been the barriers they appeared to be, but the institutional logic that had generated them was entirely intact.

This is the second face of the coordination problem. The first is passive: agencies that simply don’t interact, like the clusters in the Krasnow great room, or NIH and NSF talking past each other about platforms, because separation is their default state. The second is active: agencies that resist coordination when it threatens their domain and reach for procedural language to make that resistance look like something other than what it is.

The passive version is frustrating. The active version is dangerous because it is nearly invisible. It doesn’t look like resistance. It looks like due diligence.

What eventually worked was a structure we had begun designing before most of the relevant agencies knew the BRAIN Initiative existed as a concept. The eight or so of us who conceived it understood from the beginning that the science was the tractable part. Mapping the functional connectome of the human brain is hard. The coordination architecture was harder.

The core insight was simple, even if executing it was not: no agency would willingly subordinate itself to another. This is not a character flaw. It is a structural feature, almost a law of institutional physics. Any architecture that handed one agency primacy over the others, even for good programmatic reasons, would fail before it started, for exactly the reasons the PubMed dispute illustrated. The losing agency would not simply accept the outcome. It would use every procedural and legal tool available to relitigate, delay, and hollow out the shared project until it resembled the losing agency’s preferred alternative.

The Biden model, relying on a senior official’s personal fury, was not a system. It was a workaround that depended entirely on the political will, personal energy, and continued engagement of a single powerful individual who had approximately ten thousand other things demanding attention. That’s not architecture. That’s heroics, and heroics don’t scale and don’t persist.

The only authority above the agencies that was not one of them was the White House itself. So, we designed around that.

The BRAIN Initiative was structured from the outset with active OSTP leadership. Not OSTP as a convener or facilitator, roles that are easy to ignore, but OSTP as a genuine power center with the explicit backing of the President and a mandate that the agencies understood was real. Interagency working groups were stood up with White House participation, which changed the political valence of every meeting. When NIH, NSF, DARPA, the National Labs, and DOE sat down together, they were no longer negotiating as peers protecting their own turf in a vacuum. They were operating under a shared mandate from above: one that came with an audience, because the President had announced the initiative publicly and tied his name to it.

This last piece was not incidental. It was load-bearing. Obama’s public commitment changed the incentive structure in a specific way: failure to coordinate was no longer just an internal bureaucratic inconvenience, invisible to outsiders and costless to the agencies involved. It was a visible failure in a project the President had claimed ownership of, in a domain (neuroscience and brain disease) that commanded broad public sympathy. An agency that stonewalled the BRAIN Initiative was not just slowing down a program; it was creating a political liability for the White House. That kind of exposure concentrates minds in ways that no amount of memo-writing or task-force-convening can replicate.

The territorial instincts didn’t disappear. They were overridden. There’s a crucial difference. An instinct that is merely overridden will reassert itself the moment the override lifts. But consistently overriding it across multiple decision points and over multiple years can create precedents, working relationships, and shared infrastructure that outlast any individual’s political engagement. The goal was never to eliminate the silos. It was to build enough scaffolding above them that the work could proceed despite them, and that, over time, the scaffold itself would become part of the institutional landscape.

None of this was accidental, and none of it was obvious at the time. It was designed by a small group of scientists who had spent enough time studying the geometry of these rooms to understand what it required.

I tell these stories now for reasons that go beyond neuroscience history.

The conditions that made the BRAIN Initiative’s coordination architecture work are precisely the ones most difficult to sustain and easiest to dismantle. They require a White House that understands interagency coordination as a design problem, not a personnel problem — not something you fix by installing the right people, but by building the right structure and maintaining it actively. They require OSTP to function as a genuine scientific leadership office rather than a ceremonial one. They require a President willing to publicly and visibly stake political capital on a specific scientific program, so that agencies feel the cost of non-compliance.

Large scientific bets, the kind that require a decade of sustained investment, genuine coordination across agencies with different cultures and different constituencies, and a tolerance for uncertainty that normal appropriations logic resists, are exactly the kind of programs most vulnerable to the coordination problem. Neuroscience was one. Pandemic preparedness is another. Climate modeling. Nuclear fusion. Quantum computing. The list is long, and what unites them is that no single agency can do them alone, and the gap between what a single agency can do alone and what a well-coordinated federal enterprise could do is, in many cases, the difference between success and failure.

The mezzanine view is still available to anyone who looks. The question is whether the people who need to understand what it shows are in any position to act on what they see.

The Chronology Problem

How our bias towards recency in scientific discovery hurts our understanding

The white lab rat moved towards the food dispenser. It evidently heard the tone and correctly interpreted its meaning. The ten electrodes implanted deep within its brain were each recording the fingerprints of multiple neurons at millisecond time resolution—key to deciphering the neural code. A wireless transmitter relayed the massive data train to an analog-to-digital converter, where it was fed into a computer, first sorted by neuronal fingerprint and then collated and curated to make sense for the human experimenters. The year was 1975. The computer was a DEC PDP-8/E minicomputer—about as big as a dorm fridge and five orders of magnitude less powerful than your smartphone.

We are surprisingly bad at knowing when things began.

I’ve been thinking about this for a while, partly because I lived through several of the transitions we now misremember. In 1987, I used the Internet for early text-based email, file transfers, and reaching colleagues at other universities. In August of 1991, in the face of an impending direct hit of Hurricane Bob, I moved all of my image data from Woods Hole to NIH in Bethesda in a matter of minutes. This was entirely unremarkable at the time. And yet when I mention it today, people often look mildly startled, as if I’ve claimed to have owned a smartphone in 1987. In their minds, the Internet began sometime around 1994 or 1995, when the Web arrived and made it visible to everyone. Before that, apparently, there was nothing.

But of course, there was something. There was a rich, functional, and genuinely useful network that predated the Web by decades. Gopher, invented at essentially the same time as the Web, let users navigate and retrieve documents elegantly and simply before the Web achieved wide public adoption. There were mailing lists, FTP archives, Usenet — an entire ecology of networked communication that the Web didn’t so much replace as eclipse, the way online streaming eclipsed television programming. TV is still there if you know where to look. Most people don’t look.

This isn’t just a historical curiosity. When we misplace the origin of a technology, we lose something important: our understanding of why it evolved the way it did. The Web wasn’t designed in a vacuum. It was a solution to a specific problem: how to use hyperlinks to navigate the nascent Internet. The decisions Tim Berners-Lee made in 1989 were shaped by what already existed. If you don’t know what already existed, you can’t understand those decisions. You inherit the outcomes without understanding the tradeoffs. And some of what was traded away was worth keeping. Gopher, the Web’s now-forgotten contemporary, was simple and decentralized, qualities that look appealing again now that we’ve seen where social media and commercialization have taken us.

The same logic applies in science. The experiment I described at the top of this piece was real and took place in a Caltech lab. Multi-electrode neural recording, wireless transmission, real-time spike sorting — these capabilities existed fifty years ago. The people doing that work understood aspects of the neural code in the context of learning and memory that are often invisible to current neuroscience trainees, partly because the papers were published a long time ago and partly because all of science is biased towards the most recent, shiny things. The findings don’t disappear. They just become functionally unavailable.

The field of artificial intelligence may be the most dramatic case study in collective chronological confusion we have. Most people who interact with today’s language models and image generators believe they are witnessing something genuinely unprecedented — a technology that sprang into being sometime around 2017. What happened is more complicated and more interesting.

The mathematical foundations for neural networks were laid in 1943, when Warren McCulloch and Walter Pitts published a paper describing how neurons could, in principle, compute logical functions. Frank Rosenblatt simulated a working perceptron at the Cornell Aeronautical Laboratory in 1958 — a system that could learn from examples. The 1986 backpropagation paper by Rumelhart, Hinton, and Williams, which most practitioners treat as a founding document, was itself a rediscovery and refinement of ideas that had been circulating since the early 1970s. Yann LeCun was training convolutional neural networks to read handwritten digits for the U.S. Postal Service in 1989. The architecture underlying those systems is recognizably the ancestor of what powers modern computer vision.

None of this was secret. It was published, presented, and in some cases deployed in real systems. What happened instead was a kind of institutional forgetting, accelerated by two “AI winters” — periods when funding dried up, interest collapsed, and computer science turned its attention elsewhere. Researchers who had spent careers on neural approaches moved on or retired. Graduate students who might have built on their work were instead trained in other paradigms. When the hardware finally caught up with the ambitions of the 1980s, around 2012, the rediscovery felt like a revolution. In some ways, it was. But the conceptual foundations were not new, and the people who had laid them got less credit than they deserved, partly because so many of the field’s new practitioners didn’t know they existed.

The practical cost here is the same as elsewhere: repeated investment in problems that had already been partially solved, frameworks that were novel mainly to their authors, and a set of origin myths that flatter the present at the expense of the past. The deeper cost is that we don’t understand what was tried and discarded and why — which algorithms were abandoned for reasons of computational expense rather than theoretical inadequacy, and which might be worth revisiting now that the expense has fallen.

Climate science offers a different version of the same problem — one with considerably higher stakes. The standard cultural narrative places the discovery of anthropogenic climate disruption sometime in the 1980s, anchored perhaps by James Hansen’s 1988 Senate testimony, or by the formation of the IPCC. If you read serious journalism about the climate, you might push it back to the 1970s. If you are diligent, you might encounter the Keeling Curve, which has been tracking atmospheric CO₂ from Mauna Loa since 1958.

The scientific recognition of the greenhouse effect and its potential consequences for global temperature dates to 1896. That year, Svante Arrhenius published a paper in which he calculated, with considerable accuracy, how much warming a doubling of atmospheric CO₂ would produce. He arrived at a figure somewhere between 5 and 6 degrees Celsius — higher than modern estimates, but in the right direction and for the right reasons. He then speculated, in print, that industrial combustion might one day alter the atmosphere’s composition enough to matter.

This was not forgotten in the way a 1975 neuroscience paper was. Arrhenius was a Nobel laureate; his work was well-known. What happened instead was that the question was considered, examined, and provisionally set aside — partly because mid-century scientists underestimated how rapidly fossil fuel consumption would grow, and partly because they assumed the ocean would absorb most of the excess carbon. These were empirical mistakes, not failures of reasoning. The framework was sound. The inputs were wrong.

What we lose when we date climate science to Hansen or to the IPCC is the understanding that this is not a young field with tentative conclusions. The core physics has been understood for over a century. The measurement of its consequences has been underway for nearly seventy years. When people argue that the science is “still developing” or “too uncertain to act on,” they are often unconsciously drawing on a mental model in which the field is young and its conclusions preliminary. Knowing the actual timeline does not resolve all the uncertainties — science is always uncertain at its leading edge. But it changes how you should reason about the weight of evidence.

Economics has its own version of this confusion, though the consequences are harder to tabulate. The efficient market hypothesis is widely understood to have originated in the 1960s with Eugene Fama. The idea of index fund investing — holding the market rather than trying to beat it — is associated with John Bogle and the first retail index fund, launched in 1976. The behavioral critique of rational actor models, which demonstrated systematically that real human beings make predictable and consistent errors in judgment, is credited to Kahneman and Tversky’s work from the early 1970s.

All of this is broadly correct as a matter of attribution. What gets lost is the prior landscape of ideas these researchers were responding to. The observation that markets were difficult to beat systematically appeared in Louis Bachelier’s 1900 doctoral thesis on the mathematics of speculation — work so ahead of its time that it was largely ignored until Paul Samuelson encountered it in the 1950s and recognized what it contained. The psychological research on judgment and decision-making that Kahneman and Tversky formalized was in some respects a rigorous extension of observations that Herbert Simon had been making since the 1950s under the heading of “bounded rationality” — the recognition that human cognition operates under constraints that classical economics had simply assumed away.

Simon won the Nobel Prize in Economics in 1978. Kahneman won his in 2002. The ideas are clearly connected. And yet the field repeatedly had to be reminded that people are not rational actors, as if this were a new finding rather than a conclusion established, contested, partially absorbed, and then re-established over the course of half a century. Each rediscovery brought energy and refinement. But it also brought the inefficiency of not quite knowing what had been tried before.

This is the practical cost of chronological confusion: we reinvent. We pour resources into solving problems that are already solved, we fund theoretical frameworks that are novel mainly to us, and we write introduction sections that inadvertently misrepresent the state of the field by simply not knowing what came before the Internet made everything searchable.

But there’s a subtler cost, too. When we don’t understand how a technology or a scientific field evolved, we become poor navigators. We don’t know which roads were tried and abandoned and why. We don’t know which detours led to unexpected places. We can’t reason well about where to push next, because we don’t have an accurate map of where we’ve already been.

There is also a political cost, which the climate case makes vivid. When the historical depth of a finding is obscured, it becomes easier to argue that the finding itself is uncertain or contested. The chronological error licenses a kind of epistemic innocence: we can treat as open questions things that have, in fact, been largely closed for a long time. This is not a problem unique to climate science. Wherever institutional memory is thin, motivated actors can exploit the gap between what has been established and what is widely understood to have been established.

Technological and scientific genealogy isn’t nostalgia. It’s a form of rigor. The rat in the 1975 experiment knew something. So did the Caltech scientist, looking at the brain recordings. Arrhenius knew something in 1896. Bachelier knew something in 1900. Rosenblatt’s perceptron knew something in 1958. We could stand to know it, too.

Below Minimum Wage: The System We Built and the System We’re Losing

It was two years before the Berlin Wall fell, just after 3 in the morning. At my lab bench, I was preparing samples for calculating a blood glucose curve in one of the early brain imaging studies. Across from me, my grad student colleague was extracting DNA for his work on the molecular basis of neurodegeneration. We were working in the Neuroscience Laboratory Building (now long extinct), the former home of student food services for the University of Michigan, an irony I never really got over. It was mid-winter in Ann Arbor. Slush ruled the streets. When the day arrived in four hours, we could be sure the skies would be gray.

Suddenly, K. slammed down his pipetter and exclaimed, “I’m going to talk to the Boss tomorrow! I just figured it out, we make less than minimum wage!”

The calculation was straightforward. Our stipend was maybe $7,000 a year, with tuition covered. We worked—conservatively—60 hours a week, often more. Factor in the 3 AM sessions, the weekend tissue preparations, and the endless equipment maintenance that somehow became the grad students’ responsibility. Do the math: roughly 3,100 hours per year, $7,000 total. About $2.25 per hour—a third less than the 1987 minimum wage of $3.35—to do cutting-edge neuroscience in a converted cafeteria food prep building.
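For anyone inclined to check K.'s arithmetic, the back-of-the-envelope math works out as follows. The figures are the ones from this essay; the hours are, of course, an estimate:

```python
# Reconstructing the grad-student wage calculation described above.
# All inputs come from the essay: a ~$7,000 annual stipend, a conservative
# 60-hour week, and the 1987 federal minimum wage of $3.35/hour.
STIPEND = 7_000          # annual stipend, dollars
HOURS_PER_WEEK = 60      # conservative estimate of actual hours worked
WEEKS_PER_YEAR = 52
MIN_WAGE_1987 = 3.35     # federal minimum wage, dollars/hour

hours_per_year = HOURS_PER_WEEK * WEEKS_PER_YEAR   # 3,120 hours
hourly_rate = STIPEND / hours_per_year             # about $2.24/hour
shortfall = 1 - hourly_rate / MIN_WAGE_1987        # fraction below minimum wage

print(f"{hours_per_year} hours/year at ${hourly_rate:.2f}/hour, "
      f"{shortfall:.0%} below the 1987 minimum wage")
```

The result lands almost exactly where K. did: roughly $2.24 an hour, about a third below the legal floor for flipping burgers.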

K. did talk to the Boss the next day. I don’t know exactly what he expected—acknowledgment, perhaps, or some explanation of how this was a temporary sacrifice for future reward, or at minimum an expression of concern about the system we were trapped in.

What he got was simpler: “Why should I worry? I’ve got a nice car. I’ve got nice clothes.”

The Divergence

K. and I responded to that moment differently, though we both understood its implications with perfect clarity.

K. finished his PhD. Then he left research entirely. He’s now a practicing radiologist—work that pays substantially more than minimum wage, has defined schedules rather than 3 AM obligations, and doesn’t require pretending that exploitation is training.

I stayed. Not because I had some moral superiority or different principles. I stayed because I was too committed to getting my doctorate at that point. I’d tried blue-collar work before graduate school, and I didn’t want to do that. The sunk costs were real—years invested, experiments underway, a thesis taking shape. Walking away would mean admitting those years at $2.25 per hour had purchased nothing.

The samples I was preparing that night at 3 AM were for quantifying local cerebral glucose utilization using autoradiography. The data would contribute to my PhD thesis on cerebral metabolic variability. It was genuinely interesting work—understanding how the brain’s energy consumption varies spatially could inform everything from imaging diagnostics to our understanding of neurological disorders.

But it was work being done by someone making $2.25 per hour, with no leverage, no bargaining power, and no alternative but quitting. The quality of the science didn’t change the economics. If anything, the importance of the work made the exploitation easier to rationalize: we were suffering for something that mattered.

K. made a rational choice. He extracted himself from a system that valued his labor at below minimum wage and found work that valued it appropriately. He’s probably had a better work-life balance, made more money, had more control over his time, and still contributed to human welfare through medical practice.

I made a different calculation. I stayed in the system, finished the PhD, did a postdoc (at similarly exploitative wages), and eventually built an academic career that culminated in serving as NSF’s Assistant Director for Biological Sciences. I went from the bottom of the exploitation to administering the funding system that perpetuated it.

The View from the Other Side

Fast forward to 2014-2018. I’m now at NSF, overseeing hundreds of millions in biological research funding. I visit grant review panels regularly—watching as distinguished scientists evaluate proposals, debate scientific merit, and argue about which projects deserve support in a constrained budget environment.

And the panelists complain. Not about the science—they’re excited about the research. They complain about the funding decisions: why do we fund these projects and not others? Why these amounts? Why can’t we support more graduate students? Why are stipend levels what they are?

I’m sitting there thinking about 3 AM in 1987, about K.’s calculation, about the Boss’s nice car and nice clothes. And I’m the one explaining the constraints now. Limited budgets. Many worthy proposals. Tough choices. The same justifications, delivered more professionally than “why should I worry,” but fundamentally the same message: the system is what it is.

The irony wasn’t lost on me. I remembered pipetting at 3 AM. I remembered the calculation. I remembered the casual indifference to exploitation. And now I was administering fundamentally the same system, just with better rhetoric.

Here’s what hadn’t changed in those 27 years: the basic model of graduate STEM training still rested on extracting maximum labor at minimum cost, justified as “training” rather than employment. Stipends had risen nominally but not dramatically in real terms. The hours hadn’t decreased—if anything, competitive pressure had intensified. The power imbalance remained: PIs controlled everything, and students had no recourse.

If I could have redesigned the system from scratch, I would have created something different: fewer graduate students, higher wages, and much better mentoring. Quality over quantity. Living wages over exploitation. Professional development over just-in-time labor.

But that’s not what happened. Instead, the system expanded. More grad students, more postdocs, more soft-money positions, all built on the same below-minimum-wage foundation, just scaled up. We produced more PhDs chasing fewer permanent positions, intensifying competition at every level.

Why did it persist? Because it worked—not for the individuals trapped in it, but for the system itself. The model produced science. Papers got published. Grants got renewed. PIs advanced. Institutions collected overhead. The fact that it ran on exploitation was a feature, not a bug. It selected for people willing to accept it (like me) and filtered out those who wouldn’t (like K.).

And those of us who accepted it, who succeeded despite it, who rose through it—we administered it. We knew it was broken. We’d done the math ourselves. But we had competing obligations: limited budgets to allocate, scientific priorities to balance, and institutional constraints to navigate. Fixing the exploitation model wasn’t in our remit. Our job was to distribute resources within the system as it existed.

The System’s Logic

The defense of graduate student stipends—if anyone bothered to make one explicitly—would go something like this:

“It’s training, not employment.” Students are learning, not working. The stipend is support to enable education, not compensation for labor. Never mind that the “training” produces publishable research, grant-supported data, and intellectual property that belongs to the institution. Never mind that without graduate student labor, most academic research would halt.

“Everyone goes through it.” This is the initiation ritual, the paying of dues, the sacrifice that earns you entry to the profession. I suffered at $2.25 per hour; the Boss probably suffered at similar rates in his day; now you suffer too. The hazing justifies itself through tradition.

“The payoff comes later.” Yes, current compensation is terrible, but you’re investing in future earnings. The PhD opens doors. Except that it doesn’t, not reliably. The academic job market is brutal. Industry positions often don’t require a PhD. And many of those doors lead to postdocs—more exploitation at slightly higher rates.

“You’re doing what you love.” This is the passion tax: because you find the work intrinsically rewarding, because you’re intellectually engaged, because you care about the science, you should accept compensation far below market value. Your enthusiasm is exploitable.

“The alternative is worse.” No funding means no graduate programs means no research training means no next generation of scientists. We’re doing the best we can with limited resources. Which might be true, but doesn’t change the mathematical reality: $2.25 per hour is exploitation regardless of budget constraints.

None of these arguments would satisfy an outside observer. They barely satisfied those of us inside the system. But they were sufficient to maintain the equilibrium because both sides had reasons to accept it. Students needed credentials. PIs needed labor. Institutions needed productivity. Everyone was complicit.

The system persisted because it was stable—not fair, not optimal, but stable. An equilibrium based on asymmetric power: PIs had alternatives (they could recruit new students), students didn’t (switching programs meant losing years of work). That asymmetry meant PIs could extract labor at $2.25 per hour, and students would accept it.

K.’s confrontation with the Boss revealed this clearly. The Boss wasn’t defending the system or explaining its necessity. He was simply observing that it didn’t affect him negatively. Nice car. Nice clothes. Why should he worry? The graduate students’ misery wasn’t his problem.

That’s the logic of exploitation: those who benefit from it don’t experience its costs, so they have no incentive to change it. And those who bear the costs have no power to change it. The system perpetuates.

The International Contrast

It’s worth noting that this isn’t how all countries approach graduate STEM training.

In Germany, PhD students are employees with contracts, salaries, and benefits. They’re part of the research staff and are compensated as such. The fiction of “training not employment” doesn’t work there—if you’re doing research work, you’re paid for research work.

When I’d present at international conferences during my NSF tenure, European colleagues would sometimes ask about American graduate training. When I explained the stipend levels and working conditions, the response was consistent surprise. “How do your students survive?” they’d ask.

The answer: barely, and many don’t.

The American model—long programs, low stipends, no benefits, complete PI control—isn’t universal. It’s a choice, defended by inertia and rationalized by those who succeeded within it. Other countries produce excellent science without requiring graduate students to work below minimum wage. We could too, if we wanted to.

The Berlin Wall Moment

Two years before the Berlin Wall fell, K. and I were pipetting at 3 AM. The Wall seemed permanent then—an ugly fact of geopolitics, stable if not good. Systems that appear unshakable can collapse suddenly when their contradictions become unsustainable.

We’re in that moment now with American science.

The 2025 funding cuts aren’t routine budget tightening. They’re not temporary political fluctuations that will reverse with the next election. They represent something different: a fundamental questioning of the compact between government and science that has sustained American research since Vannevar Bush’s endless frontier.

More than 7,800 grants canceled or suspended at NIH and NSF. Billions in unspent funds frozen. Thousands of researchers terminated or leaving the country. Universities cutting graduate admissions, eliminating postdoc positions, restructuring programs. The infrastructure we spent 75 years building is being dismantled.

And here’s the uncomfortable question: Should we fight to rebuild it exactly as it was?

That system—the one now under assault—was the system where graduate students made $2.25 per hour, where the Boss had a nice car and nice clothes and didn’t worry, where exploitation was rationalized as training, where we produced too many PhDs for too few jobs and called it a pipeline problem rather than a design flaw.

The system produced important science. My thesis work on cerebral metabolic variability contributed to understanding brain function. K.’s work on neurodegeneration might have led somewhere if he’d stayed. The research mattered. But it was built on exploitation, and everyone involved understood and accepted that.

Now external force is breaking the system. Not because we collectively decided to reform it. Not because we recognized its flaws and chose differently. But because political power decided that science funding was a convenient target for leverage and cuts.

The question facing us isn’t whether the cuts are bad—they are. It’s not whether we should oppose them—we should. The question is: when we argue for restoration of science funding, what are we arguing to restore?

The System We Could Build

If we’re going to rebuild American science from this moment of crisis, we could choose differently.

Fewer graduate students, better compensation. Instead of admitting cohorts of 20 students to work as cheap labor, admit cohorts of 10 and pay them living wages. Fund fewer projects but fund them properly. This would require PIs to do more of their own work or hire professional staff, which would be appropriate, since it’s their research program.

Limited time-to-degree with guaranteed support. If a PhD genuinely takes five years, fund all five years from admission. No scrambling for RA positions. No anxiety about whether your PI’s grant will renew. No leverage for PIs to extract extra years of cheap labor by withholding degrees.

Professional development as a core mission. Graduate programs should be about training the next generation of scientists, not just producing data for current PIs. That means mentoring, career development, and skill-building beyond bench work. It means treating students as early-career professionals, not disposable labor.

Portable funding. Rather than money going to PIs who then allocate it to students, fund students directly through fellowships and training grants. This shifts power dynamics—students choose labs based on training quality, not desperation for any funding source.

Employment status with benefits. Stop the fiction that graduate students are just students. They’re researchers doing work that produces value. Compensate them as such, with real salaries, health insurance, retirement contributions, and labor protections.

Honest accounting of opportunity costs. A PhD takes 5-7 years, which are prime earning years. The compensation should reflect that cost. If we can’t afford to pay graduate students fairly, maybe we shouldn’t be running programs that require exploiting them.

This isn’t radical. It’s how many other countries already operate. It’s what we could build if we chose to prioritize quality over quantity, people over productivity, and sustainability over short-term extraction.

But building this requires admitting that the old system was fundamentally flawed, not just under-resourced. It requires PIs to accept they can’t run labs of 15 people on the cheap. It requires universities to acknowledge that graduate programs shouldn’t be profit centers via overhead. It requires funding agencies to insist on fair labor practices as grant conditions.

Most of all, it requires breaking the cycle where those of us who succeeded by enduring exploitation then administer systems that perpetuate it. The fact that we survived at $2.25 per hour doesn’t make it acceptable. The fact that we built careers despite the system doesn’t mean others should have to do the same.

The Reckoning

I’m still in touch with K. He’s doing fine—radiologists make good money, have reasonable schedules, and contribute meaningfully to patient care. He saw the system clearly, did the math, confronted the Boss, got an honest answer, and made a rational choice to exit.

I made a different choice. I stayed. I succeeded. I administered. And now I’m watching the system I succeeded within face potential collapse, and I’m wrestling with complicated feelings about that.

There’s grief—genuine grief—for what’s being lost. Brilliant research programs shut down mid-stream. Talented scientists leaving the country. Graduate students whose training is disrupted. The accumulated infrastructure of American scientific excellence under assault.

But there’s also—if I’m honest—something else. A recognition that the system we’re grieving was deeply flawed. That its excellence was built on exploitation. That those of us who rose through it had obligations to fix it, and we didn’t. We knew better—K.’s calculation proved we knew better—but knowing better didn’t translate to doing better.

When we fight to restore science funding—and we should fight—we need to be clear about what we’re fighting for. Not restoration of the exploitation model. Not rebuilding the $2.25-per-hour wage. Not recreating the power imbalances that let PIs accumulate nice cars and nice clothes while graduate students pipetted at 3 AM.

We should be fighting for something better: a system that produces excellent science while treating the people who produce it as valuable professionals rather than exploitable labor. A system where the next generation doesn’t have to choose between career aspirations and basic dignity. A system where doing the math doesn’t lead to the conclusion that you’re being exploited, because the math actually works out fairly.

What I Would Tell My Younger Self

If I could go back to that 3 AM moment in 1987, what would I say?

I wouldn’t tell younger-me to quit. The PhD mattered. The work mattered. The career I built was meaningful. I don’t regret staying.

But I would tell younger-me that K. was right. Not just about the $2.25 per hour—that was obviously correct mathematically. But about the fundamental point: the system was designed to extract maximum value while providing minimum compensation, and that design wasn’t accidental or temporary or likely to change through individual complaints.

I would tell younger-me that succeeding within an exploitative system doesn’t validate the system. That making it to the other side doesn’t mean the journey was necessary or appropriate. That future responsibility comes with having survived—responsibility to change things for those who come after.

I would tell younger-me to remember that moment, that calculation, that casual indifference, and to let it inform every decision about how science should be organized and funded and sustained. That when you have power later, you use it differently than it was used against you.

And I would tell younger-me that systems that seem permanent—like the Berlin Wall, like the graduate student exploitation model—can collapse suddenly when their contradictions become unsustainable. That the question is always what we build afterward, whether we repeat the same mistakes or choose differently.

The Choice Ahead

We’re at that Berlin Wall moment now. The old system is breaking. What comes next is undetermined.

We could fight to restore exactly what we had: the funding levels of 2024, the program structures we’re familiar with, the career paths we know. We could rebuild the $2.25-per-hour model, just with better marketing and more rhetoric about the nobility of sacrifice for science.

Or we could acknowledge that the crisis creates opportunity. That when systems break, we can build better ones. That American science doesn’t have to rest on exploitation to produce excellence.

Four decades after K. slammed down his pipetter and did the math, the system he calculated is facing its reckoning. Those of us who survived it, who succeeded within it, who administered it—we bear responsibility for what comes next.

We can rebuild exploitation with better PR. Or we can build something actually better.

K. figured out we made less than minimum wage. The Boss explained why that didn’t matter to him. And the system rolled on for nearly four decades.

It won’t roll on much longer. The question is what replaces it.

When we rebuild American science—and we will rebuild it—we should build it for people like K. and younger-me, not for people like the Boss. We should build it so the math works out differently. So the response to “we make less than minimum wage” is horror and reform, not nice cars and nice clothes.

The Berlin Wall fell. The system breaks. What we build next is our choice.

Let’s choose better.

The Hypothesis Trap

When Scientists Fall in Bad Love With Their Own Ideas

Approximately four decades ago, I became a witness in a scientific misconduct case. The charges had been brought by an international postdoc in the lab where I had also worked before moving on, and I cannot remember many of the details, except that my written testimony stated that I knew nothing. But I do remember, in the context of more recent high-profile cases, that the essence of the accusation then was the same as it is now: altering experimental data to support the ‘party line’.

The recent disruption to American science has been extensively documented. Given how deeply intertwined government research dollars are with the budget models for R1 universities and the large academic medical centers, it’s not surprising that those funds were chosen for their leverage, and that the consequence of their being in jeopardy will profoundly alter the course of pursuing Vannevar Bush’s version of the endless frontier.

But I want to explore a different question raised by that long-ago case. When I recall that the essence involved “altering data to support the party line,” I need to ask: whose party line was it? In that case, and in many since, the party line wasn’t imposed by some external authority. It was the PI’s own hypothesis, their pet theory, the idea they’d invested years in developing and defending. The fraud wasn’t about serving power—it was about rescuing a cherished belief from contradictory evidence.

This raises uncomfortable questions about how we organize biomedical research. The current system—hypothesis-driven projects led by individual PIs who develop deep attachments to specific ideas—contains structural flaws that push even honest scientists toward motivated reasoning and occasionally push the dishonest ones past the line into fraud.

The Romantic Model of Science

Our funding system enshrines a particular vision of how science works. A brilliant investigator conceives a hypothesis. They design clever experiments to test it. They write a compelling grant proposal. If funded, they spend 3-5 years testing their idea. Success means publishing papers that confirm the hypothesis, which leads to more grants to extend the work.

This model has romantic appeal. It positions the PI as the creative genius whose insight drives discovery. It makes science a battle of ideas where the best hypotheses prevail. It creates clear narratives: an investigator proposes a theory, designs experiments to test it, and demonstrates it is correct. This is how we teach science, how we write about it in popular accounts, how we celebrate it in awards and prizes.

The problem is that this romantic model creates precisely the conditions under which fraud becomes tempting and honest self-deception becomes nearly inevitable.

When a Hypothesis Becomes Identity

Here’s what happened in numerous misconduct cases from the 1980s onward: A researcher develops a hypothesis. It’s not just any hypothesis—it’s their hypothesis, the idea that defines their research program, the theory that distinguishes them from competitors. They build a laboratory around it, recruit students and postdocs to test it, and write grants that promise to extend it.

The hypothesis becomes their professional identity. Colleagues know them as “the person who works on that theory.” Graduate students join their lab specifically to work on that problem. Papers in high-impact journals describe their unique contribution. Tenure committees evaluate whether the hypothesis has generated sufficient publications. Grant review panels judge whether the approach is likely to continue producing results.

Then experiments start yielding contradictory data. Not every experiment—if every experiment failed, the researcher might abandon the hypothesis. However, when enough experiments yield ambiguous or contradictory results, the careful scientist should begin to question the core idea.

This is where the system’s design creates problems. Walking away from the hypothesis means walking away from professional identity, from grants that depend on that research program, from students and postdocs whose projects are built on that framework. It means admitting that years of work may have been directed toward the wrong question. It means watching competitors promote alternative theories.

The pressure isn’t external—nobody is ordering the researcher to maintain their hypothesis. The pressure is structural, built into how we organize careers and evaluate success. When your identity, your lab’s funding, and your scientific reputation all depend on a particular idea being correct, it takes extraordinary intellectual honesty to acknowledge that idea might be wrong.

On the Spectrum: From Delusion to Fraud

Most scientists don’t fabricate data. But many engage in practices that fall short of fraud while still distorting the scientific record. These practices stem from the same structural problem: excessive investment in a specific hypothesis.

Selective reporting occurs when experiments yielding inconvenient results are dismissed as “technical problems,” whereas experiments supporting the hypothesis are published. The researcher isn’t fabricating data—they’re making judgments about which data are “good.” But those judgments are biased by investment in the hypothesis.

Data massaging occurs when researchers make analytical decisions that favor their theory. Which outliers to exclude? How to set cutoffs? Which statistical tests to use? Each decision seems defensible individually, but collectively, they bias results toward the preferred outcome. Again, this isn’t fabrication—it’s motivated reasoning dressed up as methodological choice.

Hypothesis rescue manifests as increasingly elaborate explanations for why experiments that should have supported the theory failed. Maybe the conditions weren’t quite right. Maybe there’s an additional factor we didn’t control for. Maybe the effect is context-dependent. Some auxiliary hypotheses are legitimate scientific refinements. Others are epicycles added to save a failing theory.

Selective collaboration and citation appear when researchers preferentially cite papers supporting their view while ignoring contradictory work. They collaborate with scientists who share their hypothesis, while avoiding those who promote alternatives. This creates echo chambers where a contested theory looks like a consensus because the believers only talk to each other.

These practices aren’t fraud in the legal sense. They’re what happens when intelligent, well-meaning scientists become too invested in particular ideas. The investment doesn’t require conscious dishonesty—it just requires the normal human tendency to see what we expect to see, to value evidence confirming our beliefs more highly than evidence challenging them.

The Cases We Remember

The 1980s wave of misconduct cases illuminates this pattern. Take John Darsee at Harvard Medical School. His fraudulent cardiology research wasn’t random fabrication—it was data manufactured to support his ongoing research program. He was so invested in demonstrating that his approach worked that he fabricated results when experiments didn’t cooperate. His extraordinary productivity should have raised red flags, but it fit the romantic model: the brilliant investigator producing breakthrough after breakthrough.

The Baltimore affair involved Thereza Imanishi-Kari’s immunology data that Margot O’Toole couldn’t replicate. The decade-long controversy ended in 1996 when an appeals board cleared Imanishi-Kari of all misconduct charges. But the case revealed how competing interpretations of the same data can arise when different investigators bring different assumptions to their analysis, and how difficult it becomes to distinguish between legitimate scientific disagreement and potential misconduct when researchers are deeply invested in their theories.

Eric Poehlman’s obesity research fraud—falsifying data in 17 grant applications and 10 publications—followed the same pattern. He had a research program, a reputation, and a stream of funding dependent on showing that his hypotheses about aging and obesity were correct. When data didn’t cooperate, he made them cooperate.

The common thread isn’t that these individuals were uniquely evil. It’s that they were operating in a system where too much depended on specific hypotheses being correct. The same pressures that led them to commit fraud push others into questionable practices and drive everyone toward motivated reasoning.

The Structural Alternative: Team Science

Consider how differently science works in fields that have moved away from the PI-centered hypothesis-driven model.

Large-scale genomics operates with diverse teams interrogating datasets rather than testing specific hypotheses. The question isn’t “Is my theory correct?” but “What patterns exist in these data?” Multiple investigators with different backgrounds and biases analyze the same datasets. Results require replication across labs. The data-sharing infrastructure enables other groups to independently verify findings.

Nobody’s career depends on a specific gene being associated with a particular disease. If your analysis suggests gene X matters but another team’s analysis contradicts that, there’s no professional catastrophe. You’re contributing to collective understanding rather than defending personal theories.

The BRAIN Initiative that I helped launch during my tenure at NSF was designed in part to avoid the hypothesis trap. Rather than funding individual PIs to test specific theories about brain function, it funded tool development, data collection, and infrastructure that multiple investigators could use. The bet was that understanding the brain required comprehensive data and analytical capabilities, not just clever hypotheses.

This doesn’t eliminate all bias—researchers still have preferences about which tools to develop or which brain regions to map. But it reduces the intense personal investment in any particular theory about how the brain works. The focus shifts from testing hypotheses to building shared resources.

Particle physics has worked this way for decades. Nobody at CERN builds a career on predicting a specific particle will or won’t be found. The infrastructure supports collective inquiry. Results require consensus across large collaborations. Data are shared immediately. Multiple teams analyze the same detector output.

Can you imagine a particle physicist fabricating Higgs boson data? The system makes it nearly impossible—not because particle physicists are more ethical, but because the organizational structure distributes both credit and accountability across large teams working with shared data.

The Biomedical Research Counterfactual

What would biomedical research look like if we designed it to minimize the hypothesis trap?

Separation of hypothesis generation from testing. One team develops theories and predictions. A different team, with no stake in the theory’s success, conducts the experiments. The testing team is rewarded for rigorous methods and clear results, not for confirming or refuting specific hypotheses. This isn’t unprecedented—clinical trials often use this model, with statisticians who haven’t seen interim results conducting final analyses.

Registered reports and pre-registration. Require researchers to specify hypotheses, methods, and analyses before collecting data. Journals commit to publishing based on methodological quality, not results. This removes the temptation to massage data because publication is already guaranteed. The researcher benefits from doing careful work, not from obtaining specific results.

Adversarial collaboration. When competing theories exist, fund collaborations between proponents to design jointly agreed-upon decisive tests. Each side specifies in advance what results would falsify their theory. The collaboration is rewarded for clarity and rigor, not for one side winning.

Collective attribution and team leadership. Move away from the PI model toward team leadership with distributed authority. Make it normal for multiple investigators to share senior authorship without hierarchical ordering. Reward contributions to collective projects, not just defending personal theories. This reduces the intensity of individual investment in specific hypotheses.

Diverse parallel approaches. Rather than funding one investigator to test one hypothesis over five years, fund multiple teams to simultaneously test competing hypotheses. Make this explicit: “We think question X is important but don’t know which of three theories is correct, so we’re funding all three approaches.” The field benefits from comparative testing; individual investigators aren’t catastrophically invested in one answer.

The Objections

These suggestions will provoke immediate resistance, much of it justified. The romantic model of science—brilliant individual investigator pursuing visionary ideas—isn’t entirely fiction. Great insights do come from individuals. Breakthrough theories do require conviction to pursue against skepticism. Hypothesis-driven research has produced genuine discoveries.

Moreover, team science and collective approaches have their own challenges. Large collaborations can become bureaucratic. Consensus-building can delay needed action. Distributing credit across many people may reduce individual incentive for excellence. Pre-registration can be gamed by registering multiple studies and selectively completing the ones that look promising.

The adversarial collaboration model assumes good faith from competing investigators, which isn’t always present. Separating hypothesis generation from testing may slow progress if the best experiments require an intimate understanding of the theory. Distributed leadership creates coordination problems.

These are real concerns. I’m not arguing for the complete abandonment of hypothesis-driven research or the PI model. But I am arguing that we’ve over-indexed on one way of organizing science—a way that creates predictable problems around motivated reasoning and hypothesis attachment—without seriously considering alternatives that might mitigate those problems.

The Incentive Redesign

The deeper issue is incentive structure. We reward:

  • Publications in high-impact journals (which prefer dramatic confirmations of interesting hypotheses)
  • Grant funding (which requires convincing reviewers you’re pursuing important ideas likely to yield results)
  • Citations (which accumulate for papers making strong claims, not for careful null results)
  • Awards and prizes (which celebrate breakthroughs, not rigorous refutations)
  • Tenure and promotion (based on establishing an independent research program—meaning a distinctive hypothesis)

Each incentive encourages researchers to develop strong attachments to specific theories. The scientist who carefully tests a hypothesis, finds ambiguous results, and concludes, “This is more complicated than we thought,” doesn’t thrive under these incentives. The scientist who generates a provocative theory, designs experiments to support it, and publishes dramatic results thrives—even if the theory is ultimately wrong.

We could design different incentives:

  • Reward rigorous replication attempts
  • Fund adversarial collaborations that test competing theories
  • Celebrate careful negative results that prevent the field from pursuing dead ends
  • Promote scientists who change their minds when evidence demands it
  • Value contributions to infrastructure and methods that enable collective progress

None of this is unprecedented. Clinical trial statisticians build careers on methodological rigor, not therapeutic breakthroughs. Methods developers in genomics gain recognition for creating tools others use. Psychology researchers are valued for independently testing whether published findings hold up.

The question is whether biomedical research more broadly is willing to diversify its incentive structures and organizational models. The field is enormously successful—NIH funding, breakthrough therapeutics, extended lifespans. Why change a winning formula?

Back to That 1980s Case

The postdoc who brought misconduct charges understood something important: when data are being altered to support “the party line,” someone needs to object. That takes courage—postdocs are vulnerable, whistleblowers face retaliation, and questioning senior scientists is risky.

But here’s what I’ve come to understand that I didn’t fully appreciate forty years ago: the party line wasn’t imposed from outside. It emerged from structural features of how we organize research. The PI who allegedly manipulated data wasn’t serving some external master. They were serving their own hypothesis, the idea they’d built a career around, the theory their lab existed to develop.

That makes the problem both worse and better than simple corruption. Worse, because it means well-meaning scientists with good intentions can slide into questionable practices without recognizing it. The same motivated reasoning that drives fraud also drives less dramatic but equally problematic biases in how we collect, analyze, and report data.

Better, because it means organizational redesign might help. We can’t eliminate human fallibility or the emotional attachment scientists develop to their ideas. But we can design systems that reduce the extent to which outcomes depend on any particular hypothesis being correct. We can create structures where admitting you were wrong is professionally survivable. We can reward rigor over drama, collective progress over individual breakthroughs.

The Path Forward

I’m not optimistic about radical transformation. The biomedical research enterprise is vast, successful, and institutionally entrenched. The romantic model of the lone investigator testing brilliant hypotheses is deeply embedded in how we tell science stories, train graduate students, and allocate prestige.

But incremental change is possible:

Funding agencies can require pre-registration for hypothesis-driven research while also funding more exploratory, team-based approaches. NIH’s BRAIN Initiative and precision medicine programs already point in this direction. Expanding these models would diversify how research gets organized.

Journals can mandate data sharing and the use of registered reports. Some journals already do this; others resist for fear of losing exciting submissions to competitors. But collective action could shift norms. If high-impact journals required rigorous transparency, researchers would adapt.

Universities can broaden tenure criteria to value methodological rigor, replication, infrastructure development, and collaborative contributions, alongside traditional metrics of independent research. This requires courage because it means promoting faculty who don’t fit the standard template, but it’s feasible.

Training programs can teach critical evaluation of one’s own hypotheses. Rather than just training students to design clever experiments and write compelling grants, we can teach them to actively look for ways they might be wrong, to value evidence against their theories, and to see changing one’s mind as a strength rather than a weakness. This is partly cultural, partly structural.

Funders can experiment with alternative models. Fund some research explicitly as adversarial collaboration. Fund some as team science with distributed leadership. Fund some as infrastructure development. Create parallel tracks so researchers can build careers through multiple pathways, reducing the pressure to develop intense attachment to specific hypotheses.

None of this will eliminate fraud—there will always be individuals who cheat. But it might reduce the structural pressures that push honest scientists toward motivated reasoning and, in some cases, toward outright fabrication.

Integrity Is More Than Honesty

That 1980s case I barely remember continues to inform my thinking, not because I have clear memories of it but because it captures something essential: scientific integrity requires more than individual honesty. It requires organizational structures that don’t push even honest people toward biased reasoning.

The postdoc filing charges was practicing integrity. But they were fighting against a system where a PI’s attachment to their hypothesis created pressure—probably unconscious, probably rationalized, but pressure nonetheless—to make the data fit the theory. One brave postdoc can’t fix structural problems alone.

We’ve built an enormously productive research enterprise. Biomedical science has achieved genuine miracles. The hypothesis-driven, PI-centered model has generated breakthrough after breakthrough. I’m not arguing it’s failed—clearly it hasn’t.

But it is flawed in predictable ways. The same features that make it successful—individual investigators developing strong convictions about important ideas and pursuing them relentlessly—also create conditions for motivated reasoning, questionable research practices, and occasional fraud.

Acknowledging those flaws doesn’t diminish the achievements. It opens space for experimentation with alternative models that might reduce the problematic incentives while preserving the creative energy that drives discovery. The question is whether we’re willing to diversify how we organize research or whether we’ll continue over-relying on a single model because it’s familiar and has worked in the past.

The endless frontier that Vannevar Bush envisioned shouldn’t be endless in just one direction. It should include exploring different ways of pursuing knowledge, different structures for organizing inquiry, and different incentives for rewarding contributions to collective understanding.

That’s the real challenge: not just preventing fraud but creating systems where the pressures toward fraud—and toward less dramatic but equally problematic biases—are reduced. Where changing your mind based on evidence is professionally rewarded rather than punished. Where attachment to ideas is balanced by commitment to collective truth-seeking.

The party line that worries me most isn’t imposed by political power. It’s the party line we impose on ourselves when we become too attached to our own hypotheses, when our professional identities become too entangled with specific theories, when the systems we’ve built make admitting error too costly. We need to extend that understanding, objecting not only to individual fraud but also to the organizational structures that make such fraud more likely, and building not just oversight systems but alternative models of how to pursue science.

That’s the integrity challenge for the next forty years.

When Agencies Collaborate: What EEID Teaches Us About Pandemic Preparedness

The research team moved carefully through the forest canopy platform at dusk, nets ready. In Gabon and the Republic of Congo during the mid-2000s, international ecologists were hunting for the reservoir host of Ebola virus. They targeted fruit bat colonies—hammer-headed bats, Franquet’s epauletted bats, little collared fruit bats—collecting blood samples and oral swabs.

By December 2005, they had their answer, published in Nature. They’d found Ebola RNA and antibodies in three species of fruit bats across Central Africa. For years, scientists had known Ebola emerged periodically, but couldn’t identify where the virus persisted between human epidemics. This research provided the answer: fruit bats, widely distributed and increasingly in contact with humans as deforestation pushed people deeper into forests.

Thanks for reading sciencepolicyinsider! Subscribe for free to receive new posts and support my work.

That discovery triggered a wave of follow-up research, much of it funded through the Ecology and Evolution of Infectious Diseases program—EEID—a joint NSF-NIH-USDA initiative I would later help oversee. EEID-funded teams documented how human activities created spillover opportunities: bushmeat hunting, agricultural expansion into bat habitat, mining operations bringing workers into forests. They identified cultural practices that facilitated transmission: burial traditions, preparation of bushmeat, children playing with dead animals. They built mathematical models of how Ebola moved from bats to humans and then through human populations. The science showed where Ebola lived, how it spilled over, and which human behaviors created risk.

Yet nine years after that initial Nature paper—after years of EEID-funded research mapping Ebola ecology—the virus emerged in Guinea in late 2013 and was identified in March 2014. A two-year-old boy, likely exposed through contact with bats, became patient zero. Within months, the outbreak had spread to Liberia and Sierra Leone. By 2016, more than 28,000 people were infected and 11,000 died. The economic impact exceeded $2.8 billion.

I was leading NSF’s Biological Sciences Directorate at the time, overseeing NSF’s role in EEID. We had funded years of follow-up research. We knew fruit bats harbored Ebola. We had models for predicting transmission. We had mapped high-risk regions. And yet 11,000 people died anyway. All of this foreshadowed what would happen later, on a much larger scale, with SARS-CoV-2.

Here is the uncomfortable question I’ve been wrestling with ever since: If we funded the right science and had years of warning, why were we not better prepared?

What EEID Was Supposed to Do

EEID launched in 2000 because infectious disease ecology fell between agency missions. NSF supported ecology but wasn’t focused on disease. NIH funded disease research but wasn’t equipped for field ecology. USDA cared about agricultural diseases but not the broader ecological context. The program brought all three together: NSF’s ecological expertise, NIH’s disease knowledge, and USDA’s understanding of agricultural-wildlife interfaces.

The administrative structure was elegant on paper. All proposals submitted through NSF underwent joint review by all three agencies, and then any agency could fund meritorious proposals based on mission fit. For Ebola research, this meant NSF might fund the bat ecology, NIH’s Fogarty International Center might support the human health surveillance component, and USDA might fund work on bushmeat practices—different pieces of the same puzzle, coordinated through a single program.

The program typically made 6-10 awards per year, totaling $15-25 million across agencies. Not huge money, but enough to support interdisciplinary teams working across continents. And it worked—EEID funded excellent science at the intersection of ecology and disease that no single agency could have supported alone.

Why Interagency Collaboration Is Genuinely Hard

When I arrived at NSF in 2014 with the outbreak at its peak, I inherited EEID oversight and quickly discovered that elegant-on-paper doesn’t mean easy-in-practice. The deepest challenges weren’t administrative—they were cultural.

NSF and NIH approach science from fundamentally different starting points. NSF’s mission is discovery-driven basic research. When NSF reviewers evaluate proposals, they ask: Is this important science? Will it advance the field? NIH’s mission is health-focused and translational. NIH reviewers want to know: Will this help prevent or treat disease? What’s the public health significance?

I saw this play out in a particularly contentious panel meeting around 2016. Our panelists were reviewing a proposal on rodent-borne hantaviruses in the southwestern U.S.—excellent ecology, good epidemiology, solid modeling. The NSF reviewers loved it: beautiful natural history, important insights about how environmental variability affects transmission. The NIH reviewers were skeptical: where was the preliminary data on human infection? How would this lead to intervention?

We spent an hour debating what constituted “good preliminary data.” For NSF reviewers, the PI’s previous work establishing field sites was sufficient—it showed feasibility. NIH reviewers wanted preliminary data on the virus itself, on infection rates. They weren’t being unreasonable—they were applying NIH’s standards. But we were talking past each other.

That debate crystallized the challenge. Two agencies with different cultures had to agree on the same proposals. Sometimes it created productive tension. Sometimes it just meant frustration.

The administrative burden on investigators was worse than we acknowledged. When NIH selected a proposal for funding instead of NSF, the PI had to completely reformat everything for NIH’s system—different page limits, different budget structures, different reporting requirements. This could add 3-6 months to award start dates. Try explaining to a collaborator in Guinea why you don’t know which U.S. agency will fund your project or when you’ll actually get money.

For program officers, EEID meant constant coordination overhead—meetings to discuss priorities, coordinating review panel schedules across agencies, negotiating which agency would fund which proposals. This work wasn’t counted in official program costs, but it was real. Hours we could have spent on other portfolio management.

Despite all this friction, EEID succeeded at its core mission. It funded research that advanced both fundamental science and disease understanding. When the 2014 Ebola outbreak hit, epidemiologists reached for transmission models developed through EEID grants. The program had trained a generation of researchers in genuinely interdisciplinary work.

What the 2014 Outbreak Exposed

But here’s what haunts me: we funded the science but not the systems. By 2014, nearly a decade of research had confirmed fruit bats as Ebola reservoirs, mapped their distribution across Africa, and identified high-risk human-bat contact zones. Papers were published in top journals. And then… nothing. No one built surveillance systems in West African villages where contact with bats was common. No one established early warning networks. No one created mechanisms to translate “we found Ebola in these bats” into “we’re monitoring for spillover in Guinea.”

EEID funded research, not surveillance. That’s appropriate—it’s a research program, not an operational public health system. But there was no mechanism to bridge the gap. When EEID-funded scientists discovered important findings, those findings stayed in academic papers. They didn’t flow to CDC, didn’t trigger surveillance efforts, didn’t inform preparedness planning.

During our quarterly coordination calls with NIH and USDA program officers, the question would occasionally arise: Who’s responsible for acting on what we’re learning? If EEID research identifies high-risk pathogen reservoirs, whose job is it to establish surveillance? The answer was usually silence, then acknowledgment that it wasn’t our job—we fund research—but uncertainty about whose job it was.

The missing infrastructure was organizational, not intellectual. We knew enough to be better prepared. The problem was lack of systems to act on knowledge. No agency was responsible for translating academic research into surveillance systems. CDC focuses on domestic diseases. NIH funds research but doesn’t run operations overseas. USAID’s PREDICT program did fund surveillance but didn’t have coverage in Guinea. We had pieces of the puzzle but no mechanism to assemble them.

I remember discussions about whether EEID should become more operational—perhaps requiring funded projects to include surveillance components. The response was always that this would fundamentally change the program’s character. NSF resists mission-directed research. My former agency’s strength is supporting investigator-driven discovery. Making EEID operational would require multiple agencies and authorities, and, most importantly, substantially more funding. A research program can’t solve an operational preparedness gap.

The scale problem was obvious. At $15-25 million per year, EEID could support excellent science but not comprehensive surveillance. Think about what that would require: ongoing monitoring in multiple countries, relationships with local health systems, rapid response capacity, and laboratory infrastructure. That requires hundreds of millions annually, not tens of millions.

The timeline mismatch was equally frustrating. Research operates on slow timescales—EEID grants ran five years, and from proposal to publication might take 6-7 years. The initial bat reservoir discovery was published in 2005. If that had immediately triggered surveillance in West Africa, we’d have had nearly nine years before the 2014 outbreak. But triggering surveillance takes decisions, funding, international coordination—processes that themselves take years. By the time anyone might have acted, attention had moved elsewhere.

What This Means for Pandemic Preparedness

The most troubling insight: we knew enough to be better prepared for Ebola, and later for COVID-19, but knowledge alone wasn’t enough. EEID succeeds at advancing knowledge but can’t create surveillance systems, can’t fund operational preparedness, can’t bridge the gap between discovering threats and preventing epidemics. That gap is organizational and political, not scientific.

Should we expand EEID? More funding would support more projects, but it wouldn’t solve the fundamental problem. You could triple EEID’s budget and still have the research-to-surveillance gap. More papers about bat reservoirs don’t automatically create early warning systems. The limitation isn’t insufficient research funding—it’s absence of operational systems to act on research findings.

We need something structurally different. Here’s what I’d do:

First, create a rapid-response funding mechanism within EEID. When Ebola emerged in 2014, imagine if researchers could have gotten funding within weeks to investigate transmission dynamics and surveillance in surrounding regions, rather than waiting for the next annual competition. Model this on NSF’s RAPID program—streamlined review, modest awards ($100-200K for one year), quick deployment—but create an entirely different pocket of money for it from all the participating funders.

Second, establish formal connections between EEID and operational agencies. This is the biggest gap. Require EEID-funded researchers to submit one-page “surveillance implications” memos with final reports, which program officers share with CDC, USAID, and WHO. Better yet, have CDC or BARDA co-fund some EEID proposals with clear surveillance applications. Create visiting scholar programs where CDC epidemiologists spend time with EEID research teams and vice versa.

Third, strengthen international partnerships with genuine co-leadership. The 2014 outbreak showed the cost of inadequate surveillance infrastructure in West Africa. Expand EEID to include more disease hotspot regions—India, Brazil, Indonesia, DRC, West African nations—where foreign investigators can be lead PIs, foreign institutions receive and administer funds, and research priorities reflect host country needs. This isn’t altruism—it’s pragmatic self-interest.

The Larger Lesson

Interagency collaboration is genuinely hard—the friction I described isn’t fixable through better management. It’s inherent when bringing together organizations with different missions and cultures. EEID proves such collaboration can work and produce excellent science. But it requires sustained effort, goodwill, and tolerance for complexity.

The alternative—each agency in its silo—is worse. Infectious disease ecology requires expertise no single agency possesses. Complex problems require complex solutions. EEID demonstrated this is possible. The challenge is making it sufficient.

What haunts me is that we’re probably going to repeat the pattern. Right now, post-COVID, pandemic preparedness has political salience. But history suggests this won’t last. After the 2014-2016 Ebola outbreak, there was similar urgency. Within a few years, budgets declined and attention shifted. USAID’s PREDICT program was terminated in 2019—just months before COVID—due to budget constraints. We cut surveillance funding during a quiet period, then paid an enormous price when the next pandemic hit.

Prevention is invisible. We never know which pandemics we successfully prevented. There’s no constituency defending preparedness funding when cuts loom. That’s the structural problem we haven’t solved.

What Needs to Happen

Will we learn from EEID’s experience and build the infrastructure we need? Or will we fund the right research but lack systems to act on it—again?

The answer depends on recognizing that pandemic preparedness isn’t primarily a scientific challenge—we know enough—but an organizational and political one. Can we create structures spanning research and operations? Can we sustain funding between crises? Can we build systems robust enough to survive political leadership changes?

EEID succeeded at what a research program can do: funding excellent science that advanced understanding. The larger failure—inadequate pandemic preparedness—requires solutions at different organizational levels. But EEID’s experience provides a foundation: proof that interagency collaboration can work, that we can identify threats before they become catastrophes.

The team in Central African forests collecting bat samples did their job. They found the virus, mapped the threat, advanced our understanding. The question for the rest of us—program officers, policymakers, public health officials, citizens who fund this through taxes—is whether we’ll do our job: building systems that turn knowledge into prevention.

Science can identify threats. But preventing pandemics requires more than science. It requires sustained organizational commitment, interagency coordination, international cooperation, and political will—especially during quiet periods when threats seem distant. EEID demonstrated the scientific component is feasible.

The rest is up to us. And based on what I’ve seen, I’m not optimistic we’ll get it right before the next one hits.

Three Things Aviation Teaches Us About Science Funding

A trip to Long Beach Airport reveals something deep about policy

The LA Uber driver let me off at the small passenger terminal at Long Beach Airport, and I had to do some serious trial and error with Google Maps to find the old Douglas Aircraft hangar where JetZero had set up shop. The startup’s admirable goal: completely disrupt the commercial aviation market by building a wide-body blended wing aircraft that would carry a 787 Dreamliner load of passengers across the country for half the fuel cost.

The hangar was open to the air, the ramp and runway fully active, yet the ethos inside was pure early-2000s Google—when anything seemed possible. The enormous space was filled with a full-size cabin mock-up, engineers at workstations, cinema-size screens streaming CAD imagery of the new plane sporting various well-known airline liveries, and a collection of flying scale model drones. The plane itself looked like it had flown off a science fiction set.


The engineering team was equally striking: veterans from Boeing, Embraer, and McDonnell Douglas, each bringing decades of experience from very different aviation cultures. I met one of the chief designers—the inventor of sharklets, those upswept wingtips that reduce drag and improve fuel efficiency, now ubiquitous on commercial aircraft. Another engineer had come from Embraer, where he’d designed the popular 2×2 cabin configuration that passengers overwhelmingly prefer on narrow-body aircraft. Now he was tackling the challenge of designing a completely new kind of airplane cabin that would maximize comfort in a blended wing configuration.

These engineers had learned their craft in established organizations with very different approaches to decision-making, risk assessment, and innovation. The wave of consolidation in aviation—most notably Boeing’s merger with McDonnell Douglas and its subsequent shift from an engineer-driven culture to one focused on shareholder returns—had left many veteran engineers looking for something different. The 737 MAX crisis highlighted how far Boeing had drifted from its engineering roots. JetZero represented a chance to get back to what they loved: solving hard technical problems without the constraints of quarterly earnings calls and legacy infrastructure.

They were attempting something none of their former employers would touch: a radical departure from the tube-and-wing design that has dominated commercial aviation for seventy years. This raised a question that goes far beyond aircraft design: Why can radical innovation happen at a startup like JetZero but not at Boeing, Airbus, or Embraer?

This isn’t just about airplanes. It’s about how organizations—whether aircraft manufacturers or science funding agencies—decide what’s worth building, who gets to decide, and how they balance proven approaches against risky bets. Aviation and science funding face the same fundamental challenge: how to organize technical innovation.

Studying how Boeing, Airbus, and Embraer make these decisions has revealed patterns that apply directly to science funding. Here are three lessons from aviation that illuminate how research gets funded—and why some innovations happen while others never get off the ground.

Lesson 1: How Organizations Assess and Manage Technical Risk

The Aviation Pattern

Boeing, in its traditional engineer-driven culture, approached risk through data and testing. Engineers made decisions based on technical feasibility. They’d prove something worked, then seek regulatory approval. The 787 Dreamliner exemplified this: Boeing pushed carbon-composite technology to unprecedented levels while keeping the basic configuration conventional. The cultural assumption: engineers know best, prove it works, get approval, move forward.

Airbus operates from a completely different framework. As a consortium involving multiple governments, labor unions, and industry stakeholders, risk assessment includes political, economic, and social factors alongside technical ones. Workers’ councils have a voice in production decisions. Safety regulators participate earlier in the design process. The A380 Superjumbo was technically conservative—four engines, conventional configuration—but represented enormous manufacturing and political risks, requiring coordination across nations. The cultural assumption: technical decisions affect many stakeholders, and all deserve input.

Embraer’s approach reflects its position as a state development tool for Brazil (the Brazilian government holds a golden share giving it veto power over the company’s strategic direction). They can’t compete head-to-head with Boeing and Airbus, so their risk calculus focuses on market positioning. Find niches, develop partnerships, move quickly. The E-Jet family succeeded by targeting the underserved regional market. The cultural assumption: innovation means finding white space in a market dominated by established players.

Same engineering principles. Same physics. The same goal of building safe, efficient aircraft. But fundamentally different risk assessment frameworks.

The Parallel to Science Funding

The American system, through NSF and NIH, operates remarkably like Boeing’s traditional approach. Peer review is engineer-driven decision-making translated to science. Data—preliminary results, track record—drives decisions. The central question reviewers ask is Boeing’s question: “Can this PI deliver with taxpayer money?” Merit review happens after the proposal is submitted. The system rewards incremental progress from established investigators, just as Boeing refined the 737 through successive iterations.

European research funding embeds more stakeholder involvement. Horizon Europe’s missions approach brings policymakers, industry representatives, and public voices into the priority-setting process. Risk assessment explicitly includes societal benefit and economic impact. Clinical translation gets emphasized earlier in the research pipeline. Scientists remain central but aren’t the sole decision-makers.

Emerging science powers like China take yet another approach. Strategic national priorities drive funding decisions. The question isn’t “What’s the best science?” but “Where can we compete globally?” This enables leapfrog strategies: massive focused investments in AI, quantum computing, and biotechnology designed to establish leadership in emerging fields rather than catching up in established ones. This top-down approach is now also emerging within the US science ecosystem.

For researchers, understanding which risk framework you’re operating in helps you frame proposals effectively. The American system rewards demonstrated competence and incremental progress. Other systems may value societal impact, strategic positioning, or rapid deployment. Neither approach is better or worse—they reflect different cultural assumptions about how to allocate risk in technical innovation.

Lesson 2: Who Gets to Decide What Gets Built

The Aviation Pattern

At Boeing, engineers and program managers traditionally drove major decisions. Shareholders and the board provided financial constraints. Airlines shaped requirements. But core technical choices were the engineers’ responsibility. This produced technically sophisticated aircraft, sometimes disconnected from market realities. The 747-8 (the final iteration of the classic jumbo jet), for instance, was an engineer’s dream—but the market for it was lukewarm.

Airbus engages multiple stakeholders from day one. National governments in France, Germany, the UK, and Spain have seats at the table. Workers’ councils negotiate production methods. Industry partners across Europe collaborate on components. Customers get involved earlier. The result is more consensus-driven and sometimes slower, but with broader buy-in. The A350’s long development process reflected extensive consultation but yielded strong market acceptance.

Embraer takes direction from its alignment with Brazil’s national development goals, but the company maintains a partnership model with established players and responds quickly to market signals. Its less hierarchical decision-making enables nimble adaptation. The attempted Embraer-Boeing partnership, which ultimately fell apart, starkly illustrated the different decision-making speeds of the two companies.

JetZero represents something different entirely. A small team iterates rapidly. Engineers from different aviation cultures bring different assumptions. Venture capital’s risk tolerance differs fundamentally from corporate risk aversion. They can attempt radical innovation precisely because they’re not constrained by established stakeholder expectations or legacy infrastructure.

The Parallel to Science Funding

American peer review puts scientists in the decision-making seat. On its face, this seems ideal: who better to judge scientific merit than other scientists? But peer review favors known researchers using proven methods. Peers can become conservative gatekeepers. The result is high quality and incremental progress, but potentially missed breakthroughs.

European models bring more voices into the room. The European Research Council maintains scientific independence but operates within frameworks emphasizing societal missions and grand challenges. Policymakers, industry representatives, and public stakeholders help set priorities. Scientists remain central but aren’t the sole arbiters. This creates stronger connections to societal needs, though sometimes at the cost of researcher autonomy.

Directed research models flip the equation. Governments or funding agencies set priorities; researchers respond to calls for proposals. This is top-down rather than bottom-up. The advantage is alignment with national priorities. The risk is missing unexpected discoveries that don’t fit predetermined categories.

I’ve seen these differences firsthand, reviewing for both American and international funding agencies. The questions panels ask reveal cultural assumptions about whose judgment matters. American panels debate scientific rigor and PI capability. International panels I’ve participated in spend more time on broadening participation and strategic fit with national priorities.

For researchers, understanding who has a voice in funding decisions is crucial for navigating the system. American researchers working internationally need to recognize that peer review isn’t universal—other countries organize scientific decision-making to reflect different values about expertise, accountability, and public benefit.

Lesson 3: The Tension Between Incremental Improvement and Radical Innovation

The Aviation Pattern

Established aircraft manufacturers favor incremental improvement for sound reasons. The tube-and-wing design has been refined for seventy years. Every iteration builds on accumulated knowledge. Existing manufacturing facilities, pilot training programs, maintenance infrastructure, and regulatory pathways all assume this configuration. Airlines understand the operating economics. Risk is manageable, returns are predictable. The 737 MAX—an incremental update to a 1960s design—still makes economic sense despite its troubles.

JetZero’s blended wing body has been studied since the 1940s. Its technical advantages are clear: dramatic improvements in fuel efficiency, reduced noise, and potential for entirely new cabin configurations. But it requires new manufacturing processes, new pilot training, and new regulatory frameworks. The risk isn’t primarily technical—it’s organizational and systemic. There’s no clear path from prototype to profitable, scalable production. Established players, accountable to shareholders and constrained by quarterly earnings expectations, can’t justify the investment.

Startups like JetZero can attempt radical innovation because they have no legacy infrastructure to protect. They can accept higher technical risk. The venture capital model tolerates failure in ways public corporations cannot. They don’t need to satisfy existing stakeholders or worry about cannibalizing current product lines. They can focus on long-term disruption rather than next quarter’s earnings.

But we should be clear: most aviation innovation is incremental for good reason. Lives depend on safety. Capital requirements are enormous. Development timelines span 10-15 years. Regulatory burden is intense. Incremental improvement has delivered extraordinary gains—modern aircraft are unimaginably more efficient, safe, and capable than those of fifty years ago.

The Parallel to Science Funding

Science funding faces the same tension. Established PIs using proven methods dominate for sound reasons. Track records reduce risk. Incremental progress is predictable, publishable, and fundable. Infrastructure investments favor established approaches—if your university has a state-of-the-art imaging facility, proposals that use it have an advantage. Peer reviewers understand and can evaluate proven methods. The “preliminary data” requirement inherently favors ongoing work over genuinely new directions. The system is designed to minimize taxpayer waste through careful risk management.

Truly novel approaches struggle in this environment. High-risk/high-reward programs exist but represent a tiny fraction of overall funding. Early career investigators face a chicken-and-egg problem: “How will you do this?” reviewers ask, but gathering preliminary data requires resources they don’t yet have. Reviewers are more comfortable funding known quantities. Paradigm shifts are rare and unpredictable—there’s no clear “return on investment” for genuinely radical ideas.

Consider the BRAIN Initiative. The vision was bold: transform neuroscience through new technologies and approaches. But implementation favored established neuroscientists with proven track records. The system worked as designed, minimizing risk by funding demonstrated competence. As I’ve written earlier, BRAIN fell short of its ultimate delivery goal: curing brain diseases. ARPA-H was explicitly created to escape the incremental trap, but it’s still finding its model. The European Research Council’s advanced grants show somewhat higher tolerance for risk, but even there, track record matters enormously.

For researchers pursuing truly novel approaches, it’s crucial to understand you’re working against system design, not just reviewer bias. The system is optimized for reliable incremental progress, not moonshots. Radical innovation in science, like radical innovation in aviation, may require different funding models—something more like venture capital, tolerant of high failure rates in pursuit of occasional transformative breakthroughs.

This raises a deeper question: Should science funding favor incremental or radical innovation? Or do we need both, in different proportions? Aviation supports both Boeing’s incremental refinements and JetZero’s radical rethinking. Should science funding do the same—and if so, in what balance?
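The venture-capital logic behind that balance can be made concrete with a toy portfolio simulation. All of the numbers below are hypothetical, chosen only to illustrate the tradeoff: incremental grants that nearly always return a modest payoff versus moonshots that almost always fail but occasionally pay off enormously.

```python
import random

def portfolio_outcome(n_grants, moonshot_frac, rng):
    """Mean return per unit-cost grant for one simulated funding cycle.

    Hypothetical payoffs: incremental grants return 1.2x with 90%
    probability; moonshots return 50x with 5% probability.
    """
    total = 0.0
    n_moon = int(n_grants * moonshot_frac)
    for _ in range(n_moon):
        total += 50.0 if rng.random() < 0.05 else 0.0
    for _ in range(n_grants - n_moon):
        total += 1.2 if rng.random() < 0.90 else 0.0
    return total / n_grants

rng = random.Random(0)
for frac in (0.0, 0.1, 0.5):
    # Repeat each portfolio 2000 times to estimate mean and volatility.
    runs = [portfolio_outcome(200, frac, rng) for _ in range(2000)]
    mean = sum(runs) / len(runs)
    std = (sum((r - mean) ** 2 for r in runs) / len(runs)) ** 0.5
    print(f"moonshot share {frac:.0%}: mean return {mean:.2f}, volatility {std:.2f}")
```

Under these made-up parameters, raising the moonshot share raises the expected return and the volatility together: the all-incremental portfolio is predictable but capped, while a mixed portfolio buys higher expected payoff at the price of wide year-to-year swings. That, in miniature, is the tradeoff a funding agency accountable to taxpayers faces and a venture fund accepts by design.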

What This Means for Science Policy

These aviation patterns reveal a fundamental feature of how societies organize technical innovation. The choices Boeing, Airbus, and Embraer make about risk assessment, decision-making authority, and the balance between incremental and radical innovation aren’t purely business decisions. They’re cultural choices embedded in what Sheila Jasanoff calls civic epistemologies—different assumptions about how knowledge should be produced, who should decide, and what goals matter most.

American science funding has historically reflected American cultural values: individual merit and achievement drive review by scientific peers. Data-driven decision-making shows up in preliminary data requirements. Risk minimization operates through proven track records. Incremental progress represents the reliable path. This isn’t accidental; it’s deeply cultural.

Other countries organize differently because they value different things. European systems emphasize societal benefit and stakeholder input. Asian systems prioritize strategic national development goals. Different countries strike different balances between discovery and application, between researcher autonomy and national priorities, between tolerance for failure and demands for accountability.

For all researchers, understanding these cultural patterns helps you work more effectively within the system. Know what the system optimizes for—reliable incremental progress from established investigators. If you’re pursuing radical innovation, recognize you’re working against the grain. International collaborations require understanding that your partners may operate within fundamentally different funding cultures with different assumptions about what science is for and how it should be organized.

For science policy, we should be explicit about what our funding systems optimize for. There’s no “best” system—only different tradeoffs reflecting different values. Maybe we need multiple models, as aviation has both Boeing and JetZero. Comparing systems reveals assumptions we don’t normally question.

In future posts, I’ll explore specific country comparisons: How does the European Research Council actually work? What can we learn from how other countries fund AI research? How do different countries handle the tension between researcher autonomy and national priorities?

A Final Thought

Visiting JetZero and seeing engineers from Boeing, Embraer, and McDonnell Douglas collaborate on something radical that couldn’t happen within their former companies crystallized something I’d been observing in science policy work: innovation doesn’t just require good ideas and talented people. It requires organizational structures and cultural assumptions that allow certain kinds of ideas to be pursued.

The JetZero engineers didn’t suddenly become more creative or capable. They remained the same engineers who’d designed winglets at Boeing or cabin configurations at Embraer. What changed was the organizational context—the risk tolerance, decision-making authority, and freedom from legacy constraints. That shift in context enabled them to attempt what had been impossible in their former roles.

Science funding works the same way. Researchers operating within NSF’s peer review system are no less creative than those pursuing radical ideas through ARPA or venture-backed biotechs. But the organizational context shapes which ideas can be pursued and which innovations are possible.

Understanding how different countries organize technical innovation—whether building aircraft or funding research—helps us see our own system more clearly. And maybe, just maybe, it helps us imagine how we might do things differently.

What examples have you seen where organizational culture shaped what research got pursued? Have you experienced different funding cultures working internationally? Share in the comments.