AI, Scholarly Publishing, And The Future Of Research

Gita Manaktala draws upon decades of experience in scholarly publishing to examine the challenges and opportunities emerging at the intersection of artificial intelligence and knowledge production, in conversation with Elias Wondimu.

Artificial intelligence has rapidly moved from a technological curiosity to a force reshaping publishing, scholarship, education, and the global knowledge economy. While much of the public conversation has focused on chatbots, copyright lawsuits, and the race among technology giants to build ever more powerful systems, a deeper question remains largely unresolved: What happens to the intellectual ecosystem upon which these systems depend?

At the center of that question stands scholarly publishing. For generations, scholars, editors, peer reviewers, librarians, university presses, and research institutions have collectively built a vast body of knowledge that advances science, informs public policy, preserves culture, and expands human understanding. Today, that body of work has become one of the most valuable resources fueling the development of artificial intelligence.

In their widely discussed Chronicle of Higher Education essay, “Big Tech Owes Scholars,” Michael Schrage and Gita Manaktala argue that the relationship between AI companies and the scholarly community has reached a critical juncture. As AI firms increasingly rely on research literature, academic books, repositories, and journals to train their systems, questions of compensation, responsibility, sustainability, and public interest can no longer be postponed.

In this conversation, Manaktala draws upon decades of experience in scholarly publishing to examine the challenges and opportunities emerging at the intersection of artificial intelligence and knowledge production. She discusses copyright, open access, research infrastructure, the future of university presses, and why the global scholarly community must play an active role in shaping the policies that will define the next era of intellectual life.

The questions raised by AI are not merely technical or legal; they go to the heart of how societies create, value, and sustain knowledge. In the conversation that follows, Manaktala examines what is at stake and what must come next.

Elias: Your article opens with Anthropic’s $1.5-billion settlement but calls it only a “down payment.” What is the larger problem that settlement does not solve — and why is scholarly content arguably more valuable to AI systems than the 500,000 books at the heart of that case?

Gita: The Anthropic settlement provides some compensation to authors and publishers whose books the company used without permission to train its chatbot, Claude. In Bartz v. Anthropic, the court found that, on those facts, training on legitimately acquired books was fair and transformative, but that obtaining books from “shadow libraries,” illegal repositories of copyrighted content, was not.

The settlement paves the way for similar lawsuits against other AI companies that have relied on pirated content. But it does nothing to deter the scraping of scholarly journals, open-access books, and preprints from the web, a practice every major AI company engages in. That literature is enormously valuable as training data because it is curated, vetted, revised, and edited; it is the cumulative output of a research community. A growing body of work on training-data quality suggests that high-signal text — peer-reviewed articles and books, technical references, carefully edited prose — punches well above its weight in shaping how models reason, even when it makes up a small share of the corpus. High-quality inputs inform more reliable outputs.

The settlement amount reflects the fact that the 500,000 books in the Anthropic case are valuable to the company. The corpus includes peer-reviewed university-press books alongside commercially published trade books that embody a high standard of expertise and editing.

Elias: You write that “the window for licensing deals is closing fast.” What specific recent developments — court rulings, publisher deals, AI deployments — convinced you that this moment, rather than five years from now, is the time to act?

Gita: A few developments convinced us that this moment matters. First and foremost, Judge Alsup’s reasoning in Bartz v. Anthropic — that training on legitimately acquired books is fair and transformative — hands AI companies a clear path to ingest published scholarship without licensing it, provided they don’t source it from pirate libraries. If that reasoning holds and spreads, the legal pressure to negotiate disappears.

Elias: Beyond pirated trade books, where in the scholarly ecosystem is the most consequential extraction happening: peer-reviewed journals, disciplinary working papers, open-access repositories, university-press monographs, or somewhere else?

Gita: All of it is being scraped, but the consequences do vary. Peer-reviewed journal articles and disciplinary review pieces do the heaviest lifting in training, because they resolve ambiguity, define concepts precisely, and aggregate prior evidence — exactly the qualities that improve model reliability and reduce hallucination. Open-access repositories like arXiv, SSRN, PubMed Central, and bioRxiv are particularly exposed because they can be freely scraped at scale and contain frontier work, often before formal publication. University press monographs matter for sustained argument and synthesis — the kind of long-form reasoning that short articles cannot capture. Disciplinary working papers and technical reports often carry the proprietary methods and datasets that AI systems most want to learn from.

If pressed to name a single point of greatest concern, I would point to the open-access scientific corpus. The openness that was meant to serve the public is now disproportionately serving private companies that have made no comparable contribution to its production. That is a failure of design, not of the open-access mission, and it is one the proposal in our article is meant to begin addressing.

Elias: Wiley and Taylor & Francis have cut licensing deals on terms their authors never saw. Scholars peer-review for free. Universities subsidize the system with public and philanthropic funding. In your view, who is being most harmed — and who is profiting at their expense?

Gita: AI companies are profiting massively at the expense of scholars, researchers, reviewers, universities, and nonprofit publishers. Ultimately the public is harmed if the system that produces high-quality research and scholarship is damaged — a real concern in the current funding environment. AI companies are not the only cause of that strain, but they are uniquely positioned to support an ecosystem they benefit from disproportionately.

Elias: You observe that courts have been “remarkably comfortable with fair use at scale.” Why has that interpretation taken hold so quickly, and what would it realistically take — a landmark case, new legislation, public pressure — to revisit it?

Gita: Well, I’m not a lawyer or a copyright expert. My understanding is that while some U.S. courts have treated the ingestion of creative and scholarly works as transformative and fair, the reasoning is contested and unsettled. The EU has gone in a different direction — the Digital Single Market Directive and the AI Act between them require transparency about training data and an opt-out for rightsholders. India’s framework recognizes fair dealing rather than fair use, and the Delhi High Court’s ongoing consideration of ANI v. OpenAI may produce a different doctrinal answer than U.S. courts have so far. The U.S. is increasingly the outlier, not the norm.

Open-access advocates argue persuasively that the public deserves access to the research it has funded, and the ability to use that research with attribution. But the ingestion of nearly all human knowledge at scale by a handful of private firms is a circumstance copyright law was not written to address. Common sense suggests it is not in fact analogous to the use of books to educate schoolchildren — the example Judge Alsup cited in Bartz — and the law will eventually have to catch up.

Elias: AI firms often argue that training on publicly available research is no different from a human scholar reading thousands of papers. Drawing on your years at MIT Press, how does that analogy break down once you account for scale, commercial purpose, and the unpaid labor of authors and reviewers?

Gita: Scale is key. Human scholars are limited in how many papers they can read, and someone — usually a library — is paying for that access unless the work is in an open journal or archive. Authors and reviewers may not be directly compensated, but the work is certainly not free to produce. Universities recognize its value and incentivize its production; hence the imperative to “publish or perish.”

The system has real problems, which are themselves the subject of entire books. Anonymous reviewers are poorly recognized and scarcely compensated for the value they add. If scholars went on a peer-review strike, the system would soon collapse.

U.S. copyright law has an explicit goal: to promote the progress of science and the useful arts. To do that, it creates economic incentives for original and expressive work. The scholar reading papers is expected to produce something new informed by existing knowledge. If she publishes enough, she can make the case for a higher salary and eventually tenure. AI companies argue that their systems, too, advance science and the arts, and benefit the public. They may be right. But in the process they are undermining the markets and incentives for the very creative and intellectual work humans produce — work they are using at massive scale, without payment or permission.

Elias: You and Michael Schrage propose that Big Tech commit 1 percent of average market capitalization to a permanent endowment and revolving loan facility. Walk us through what that fund would actually do.

Gita: The fund should provide research grants and fellowships that recognize and support the work of authors, peer reviewers, translators, and editors. It should subsidize the publication of work with specialized audiences — monographs, regional scholarship, translations — that the market won’t otherwise sustain. And it should fund infrastructure: open-access platforms, disciplinary repositories, and peer-review systems, along with the costs of editing, design, production, and distribution that are increasingly unsustainable, particularly for small and nonprofit publishers.

Elias: You warn that if firms refuse to act, “legislators will improvise cruder levies.” If that becomes inevitable, what specific policy mechanism — a data-training levy, copyright reform, federal procurement rules, mandatory licensing — would you most want to see, and which would you most fear?

Gita: I’m not a policy specialist, but I’ll venture a preference. What I’d most want to see is something close to what we propose — a governed endowment funded by a windfall clause on the largest AI firms, with proceeds dedicated to public-interest research infrastructure rather than to publishers as intermediaries. If that proves impossible, a transparent training-data levy with revenues ring-fenced for the same purpose would be a reasonable second-best.

The proposal Michael and I make is meant to be a constructive entry into that policy conversation, not the last word. It is a starting point that asks the firms benefiting most to act, voluntarily, before legislators do it for them.

It’s important to note that the scholarly knowledge being ingested by AI systems was produced by researchers around the world, including a rapidly growing share by Indian scholars. Any fund that emerges from this conversation must be globally accessible, not limited in its eligibility to North America and Europe.

India is doing things the U.S. is not. The One Nation One Subscription program, operational since January 2025, is a serious public investment in research access and a model others should study. The Delhi High Court’s consideration of ANI v. OpenAI may produce a doctrine more protective of rightsholders than the one U.S. courts have adopted so far. India’s broader digital-public-goods agenda offers a different model for thinking about AI in the public interest, one that takes seriously the question of who benefits and on what terms.

Whatever mechanism eventually emerges, the design conversation needs Indian publishers, scholars, and policymakers in it — not as an afterthought but as principal participants. The scholarly commons is global. It will only survive the AI moment if its defense is global, too.


Gita Manaktala is a publishing professional with more than two decades of experience in scholarly publishing. Most recently, she served as Executive Editor at the MIT Press, following fourteen years as the Press’s Editorial Director. Throughout her career, she has worked with hundreds of authors and acquired books on topics ranging from information science and AI policy to data science and communication.

Elias Wondimu is the founder and publisher of TSEHAI Publishers, an independent academic publishing house established in Los Angeles and dedicated to African scholarship, literature, and cultural heritage. For nearly two decades, TSEHAI stood as the only African academic press based at a university in North America, Europe, or Asia. He is also the founder of the International Journal of Ethiopian Studies. Through his work as a publisher, editor, writer, and advocate, Wondimu has advanced the global visibility of African scholarship and supported initiatives that strengthen knowledge production across Africa and the diaspora.
