EDITOR’S NOTE:
One of the great promises of AI for journalism is that we may be able to find efficiencies in our research, presentation and distribution workflows so that journalists have more time to do critical reporting and meaningful analysis. So far, the results have been a mixed bag, in part because we are not willing to sacrifice accuracy or quality for speed. At Bay City News, we have a comprehensive and transparent policy about the use of AI that governs how we employ these powerful tools, a tip sheet that explains use cases, and a strict rule about providing full disclosure to our audience. We have completed seven newsroom experiments to date, all described in articles published on our “Experiments in AI” sandbox on our free, public-facing news site: LocalNewsMatters.org. Our latest test — converting human-created news stories into a roundup and then an AI podcast — led us to challenge some of our assumptions around news judgment, audience expectations and the willingness of journalists to cede some tasks to AI.
ABOUT THE PROJECT
Bay City News wanted to test a new workflow that could use AI tools like Gemini and ElevenLabs to convert the text version of our daily news stories into a new audio format. How far could we push the envelope without compromising quality and trust? The project, which spanned several months, was described and rolled out in our “sandbox” section on LocalNewsMatters.org. The project evolved over several phases, the last of which involved five deliverables:
1. Generate a news roundup based on an AI agent’s selection of six out of 30 stories produced by our Bay Area newsroom every day.
2. Compare the AI roundup with the human-produced roundup to review and analyze the difference in choices.
3. Convert the AI roundup text into a written script in a radio/audio style and have it read by an AI voice trained by one of our journalists.
4. Review, critique and evolve the AI podcast to learn what hiccups, stumbles or inaccuracies surfaced.
5. Collect feedback from readers/listeners, peers and staff to learn what was acceptable, problematic or worth incorporating into our routines in the future.
THE RESULTS
What we found was fascinating — and potentially actionable — as our newsroom seeks to roll out new products that could grow our audience and generate revenue. The AI tools’ ability to smartly pick stories, condense them into accurate audio formats and create a new product was impressive. And it gave us ideas for how we might efficiently iterate on the original journalism we do every day to save some time, reach new audiences and expand our capacity. There was also plenty of discussion about the best mix of stories, the identity of the newscaster and the best time to run the podcasts. What do other media in this space do? What does the audience want? What works best given our staffing as a small newsroom? We even talked about producing multiple podcasts each day and the ability to create variations in different languages — focused on a specific topic, geography or time frame. This would open opportunities for us to quickly create customized versions of the new news product for highly segmented and targeted audiences.
But the final result — the synthetic voicing of the podcast — was disappointing, and we did not feel it was ready for wide distribution outside our experimental space. This might change in the near future, given the public’s growing exposure to this type of audio and the rapid improvement of tools.
Here are summaries of the five components of our experiment and the top takeaways.
1. First, let’s look at what we tell the human editors to consider when selecting top news stories for our roundup versus what we prompted the Gemini AI agent to do.
Human instructions given to every desk editor compiling the news roundup for BCN:
- There should be a total of 6 to 10 stories from what was posted on the wire since the 4 p.m. roundup, representing the most important news in most if not all geographic regions of the Bay Area (SF, East Bay, South Bay, Peninsula, North Bay). A regional story is a bonus.
- The stories should be condensed when possible to be no more than 3-5 paragraphs (try taking out quotes, trimming background, etc. to keep the length down).
- Try not to make it too crime heavy.
- Add a forward-looking story to the top that is fresh, current and timely. It’s OK to include partner stories, although we primarily use our own BCN original content.
For context, the editor on the news desk generally creates the roundup of stories toward the end of their shift so the most important stories are top of mind. Cutting and pasting the copy, trimming the items to the right length and proofreading takes about an hour.
Prompt to Gemini AI agent after pulling in a 24-hour block of about 30 news stories (main points here, full prompt in Appendix 2 below):
- You will be provided with a block of raw news reports. Select, synthesize, and rewrite stories from this material. Prioritize stories that are most newsworthy and relevant to the specified broadcast date from the provided source material. Target 5 to 6 distinct news stories selected from the provided source material.
- If source material provides multiple updates on the same event, synthesize them into a single, coherent story reflecting the latest available information. Do not simply copy sections; rephrase for a newscast style.
- Conciseness: Use clear, concise sentences. Avoid jargon or overly complex sentence structures.
- Factual & Neutral: Maintain a neutral, objective, and factual tone. Avoid opinions or sensationalism.
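Instructions like those above are typically packaged into a reusable prompt template that gets filled with the day's raw stories before being sent to the model. The sketch below is a hypothetical illustration of that assembly step only; the template wording, function names and story format are our own assumptions, not Bay City News' actual pipeline.

```python
from datetime import date

# Hypothetical sketch of assembling a roundup-selection prompt.
# The template paraphrases the instructions described above.
PROMPT_TEMPLATE = """You will be provided with a block of raw news reports.
Select, synthesize, and rewrite stories from this material.
Prioritize stories that are most newsworthy and relevant to the
broadcast date of {broadcast_date}. Target 5 to 6 distinct news stories.
If the source material provides multiple updates on the same event,
synthesize them into a single, coherent story reflecting the latest
available information. Rephrase for a newscast style; do not copy sections.
Use clear, concise sentences. Maintain a neutral, objective, factual tone.

SOURCE MATERIAL:
{stories}"""


def build_roundup_prompt(stories: list[str], broadcast_date: date) -> str:
    """Join the day's raw stories and fill in the prompt template."""
    block = "\n\n---\n\n".join(stories)
    return PROMPT_TEMPLATE.format(
        broadcast_date=broadcast_date.isoformat(), stories=block
    )


prompt = build_roundup_prompt(
    ["Story one text...", "Story two text..."], date(2025, 10, 10)
)
```

The resulting string would then be passed to whatever model API the newsroom uses; keeping the template separate from the story block makes it easy to adjust the editorial instructions between experiment runs.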
The Gemini agent had no trouble creating a roundup as instructed and was able to apply the “newsworthy” criteria to select interesting and important stories. It was even able to distinguish between a routine weather forecast (excluded from the roundup, appropriately) and a weather event story, which was relevant and included.
The engineering team described the process in this way:
- Brevity: Each podcast script is designed to be read in 3-5 minutes, forcing the AI to ruthlessly prioritize 4-6 stories from potentially dozens of available leads.
- Geographic Bound: The agent strictly adheres to the Bay Area, rarely venturing into national or international news unless there is a direct, tangible local hook (e.g., the Nobel Prize going to a Berkeley professor).
- Tone: The agent maintains a “Public Radio” tone—serious, measured, and objective—avoiding clickbait headlines or sensationalism, even when reporting on sensational crimes.
2. Next, let’s consider the comparison of humans and AI when it comes to news judgment.
Bay City News is a regional news service that covers 13 counties, 24 hours a day; it’s like a local AP, distributing original reporting to dozens of other news organizations. All of our reporting on about 30 stories per day is original, based on human research, fact-checking and writing under the oversight of an experienced editor who checks every story for accuracy, proper grammar and clarity. This database provided the underlying content for both the human and the AI story selection. The newsroom workflow for creating roundups twice per day resulted in highlighting about 6-8 stories based on the previous 12 hours of content, for a total of about 15 top stories per day.
The AI workflow for creating a single daily roundup used the previous 24 hours of content, for a total of about 6-7 top stories per day. The topical overlap — and the differences — between the human choices and the AI choices are shown in this histogram below, breaking down the story counts by thematic category.

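The comparison itself is straightforward to compute once each roundup's picks are labeled by category: tally picks per category for each side, then intersect the story IDs to find the overlap. The story IDs and category labels below are invented for illustration; they are not our actual data.

```python
from collections import Counter

# Illustrative sketch of the human-vs-AI comparison. Each roundup is a
# mapping of story ID -> thematic category (all values invented here).
human_picks = {"s1": "crime", "s2": "crime", "s3": "transit", "s4": "weather"}
ai_picks = {"s3": "transit", "s5": "housing", "s6": "politics"}

# Per-category story counts, the raw material for the histogram.
human_by_category = Counter(human_picks.values())
ai_by_category = Counter(ai_picks.values())

# Stories chosen by both the human editors and the AI agent.
overlap = set(human_picks) & set(ai_picks)
```

In this toy example the two sides share only one story, which mirrors the small overlap we observed in practice.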
The relatively small overlap between stories that were chosen by the human editors versus the AI agent caused us to wonder about what news judgment was involved. Was an experienced human journalist a better judge of what the audience wants than AI? (We know readers say they want nuance and context, but they actually click more on crime and disaster news and highly localized information.) And in many cases, the human chose the more dramatic headlines and highly local stories over more regional ones.
In contrast, the AI agent sometimes missed the best local stories. Our quality control editor noted, for example: “If the AI had gone back to 11:31 p.m. there was a good local story about San Francisco playground renovations, and at 11:39, there was an item on a Sausalito special election that our reporter did. Seems like we would rather have the Marin County-focused news than a random statewide education story.”
In quite a few cases, the AI agent actually did choose stories that were more in line with what we should or could provide to our audiences, picking less sensational news and more meaningful stories with deeper context and history. Although we often rely on experienced editors to consistently bring their institutional memory to bear when editing stories and following through on a theme, the AI agent was also — and somewhat surprisingly — able to do this on its own.
An observation from an editor: “I was often impressed by how the software would independently choose to combine similar yet unrelated stories into a single item. For example, an apartment fire in San Jose and a business blaze in Oakland were woven together.” — Glenn Gehlke, web producer and quality control editor
Our project engineers (see their full report in Appendix 1, below) noted that the most impressive capability of the Gemini agent is its ability to thread long-term narratives together over several months. It was not just compiling isolated facts; it was building a story arc from our stories.
One Bay Area storyline in particular stands out as an example: The agent constructed a cohesive story of a politically progressive region defending itself against a hostile federal administration.
- August: The agent planted seeds about looming federal budget cuts and “Project 2025” implications for local grants.
- October: It heavily chose stories about the federal shutdown, framing it not as a D.C. stalemate, but as a direct attack on local families (CalFresh cuts).
- November: The narrative culminated with the threat of federal “surges” (National Guard/ICE) into San Francisco.
- Editorial effect: By linking these disparate events, the agent created a storyline of resistance. It portrayed local mayors and the state Attorney General as defenders of the region’s values against external aggression.
Also of interest, the post-experiment analysis of this process and the results revealed a dominant AI persona that seemed to value institutional skepticism and equity. This mirrors the traditional watchdog role of journalism. The AI prioritized stories that highlight the fragility of essential systems. It was fascinated by infrastructure failure and administrative collapse.
- Evidence: The relentless, almost daily tracking of the BART system meltdowns in September. The agent did not just pick stories about the delays; it selected the apologies, the technical root causes, and the funding deficits.
- Evidence: The detailed chronicling of the San Mateo County Sheriff’s scandal. The agent moved beyond the headlines to include the procedural mechanics of the ouster—the votes, the reports, the defiance.
- Analysis: The agent seems tuned to detect “system failure.” It selects stories about computer glitches, budget “fiscal cliffs,” and leadership vacuums over routine operational successes. This creates a broadcast tone that is urgent and slightly alarmist, designed to keep the listener vigilant against the collapse of civic order.
One of the most striking differences between the human editors and the AI agent was the selection of crime stories. Even accounting for the near doubling of story counts in the twice-daily BCN roundup versus the once-daily AI roundup, crime was chosen much more often by the humans.

This made us wonder whether the AI agent was actually more rigorous than the human editors in following instructions related to news values. Instead of adjusting the AI prompt — what we were expecting to do — should we instead change the human prompts (aka instructions to editors) to select different types of stories? If the AI agent does an adequate, good or even superior job choosing which stories to highlight, would it be a better use of limited newsroom resources to let the tool take over this job of selecting and compiling the daily roundups? It would save an hour a day, allowing an editor to work on a different, higher-value task such as writing stories, editing copy with more time and care, or researching topics that we might want to explore as a newsroom.
3. Converting the AI roundup into an audio transcript took a little work to refine. (See the full prompt below in Appendix 2.)
The process of turning written stories meant for print or digital republication into an audio format was very straightforward. This was done via the same Gemini agent tool, and our fact-checking team found no inaccuracies or hallucinations. This part of the workflow seems to build on a strength of AI: summarizing or condensing copy and effectively interpreting what we need to create an alternative (audio) format. (We did find one odd case of a little extraneous editorializing on top of the just-the-facts preferred style when the transcript inserted “thankfully” into this sentence: “Youth and veteran homelessness, thankfully, saw declines.”)
The conversion of the transcript into a voice, however, was imperfect and likely would have annoyed listeners. We trained ElevenLabs to use the recorded voice of our editor Leslie Katz as the baseline data input. She read multiple stories, lists of place names like cities and counties, common phrases and other articles into the system so that it could “learn” how to properly say certain words and rely on a robust repertoire of likely scenarios with a natural cadence. The result? The “AI Leslie” voice sounded OK but was very monotone and lifeless. It frequently stumbled, added unnatural pauses, tripped over punctuation, and failed to take a breath between stories. It also routinely mispronounced some words: instead of Marin, the AI voice said “Marine,” read ICE as “ICEE,” and rendered Contra Costa County as “Contra Coasta County.”
The quality went up and down over the course of three months as we added more instructions, tried to train with more examples and adjusted the prompt. On occasion the entire segment, lasting 5 to 7 minutes, sounded like it was recorded under water and lacked the “human” tone and cadence of a live reader. It was generally good at handling pronunciations of people’s names and of cities, although frequently it would “forget” to do things we had corrected in previous shows. The biggest annoyance had to do with how it handled numbers. Instead of saying 6,803 acres, it would come out as “six-thous-nay-hun-nn-n-three acres.” Dollar figures such as $39.95 became “thirty-nine dollars ninety-five.” AI Leslie read a 30-8 vote count as a single number: “… thirty-eight along party lines.”
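One common workaround for number stumbles like these is to normalize the script before it reaches the voice model, spelling figures out the way a newscaster would say them. The sketch below is a simplified, hypothetical pre-processing step of our own devising, not part of the pipeline we actually used; it handles plain integers, dollar amounts and tallies such as vote counts.

```python
import re

# Spelled-out building blocks for integers below one million.
ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = ("", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety")


def spell_int(n: int) -> str:
    """Spell out a non-negative integer below one million."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        word = ONES[hundreds] + " hundred"
        return word + (" " + spell_int(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    word = spell_int(thousands) + " thousand"
    return word + (" " + spell_int(rest) if rest else "")


def normalize_script(text: str) -> str:
    """Replace figures with spoken forms so the TTS voice cannot stumble."""
    def dollars(m: re.Match) -> str:
        spoken = spell_int(int(m.group(1).replace(",", ""))) + " dollars"
        if m.group(2):
            spoken += " and " + spell_int(int(m.group(2))) + " cents"
        return spoken

    # Dollar amounts first, so "$39.95" is not split into two numbers.
    text = re.sub(r"\$([\d,]+)(?:\.(\d{2}))?", dollars, text)
    # Read tallies like "30-8" as "thirty to eight", not as one number.
    text = re.sub(r"\b(\d+)-(\d+)\b",
                  lambda m: spell_int(int(m.group(1))) + " to "
                  + spell_int(int(m.group(2))), text)
    # Remaining plain integers, with or without thousands separators.
    return re.sub(r"\b\d+(?:,\d{3})*\b",
                  lambda m: spell_int(int(m.group().replace(",", ""))), text)
```

Running the problem phrases from above through such a filter would hand the voice model unambiguous words instead of digits, sidestepping the "thirty-eight along party lines" class of error entirely.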
Leslie Katz, whose voice was used in this AI experiment for the news reading and also provides a traditional weekly broadcast on arts events for Radio Sausalito, had this to say:
“As a person who isn’t an early adopter of new technology and often feels frustrated using it in everyday life, I must admit I enjoyed providing my voice for the AI podcasts in the Bay City News news roundup experiment. It was quite easy for me to contribute, and I’m impressed by the product. The broadcast sounds good! I also would be a satisfied consumer; however, only if it was clear to me that it was created using AI. Similarly, I have enjoyed creating the weekly live Radio Sausalito broadcasts. I feel, more or less, the same hearing my voice in both the AI and real versions. I like it! I’d say I support AI if it enables/enhances Bay City News’ and Local News Matters’ relevant local news coverage, and that readers (and listeners) are aware that it’s being used.”
Here is a comprehensive accounting of our quality control notes, compiled by our web producer Glenn Gehlke. Listen to one of our experimental broadcasts from Oct. 10, 2025 here. A disclaimer about how AI was used in the making of the audio was noted at the beginning of each episode.
Our impact manager, Ciara Zavala, described it this way:
“The AI podcast experiment was a surreal experience. At first, I was impressed by how realistic Leslie’s AI-generated voice sounded, far more lifelike than the stiff, robotic voices we’ve come to expect. That said, it had its flaws. Early on, the voice would occasionally glitch, sounding raspy or slurred, almost as if it had smoked a pack a day. There were moments of mispronunciation, awkward phrasing and a lack of emotional inflection or natural pauses, which are the subtle elements that make human speech feel empathetic and authentic. Still, the concept showed real promise. With further refinement, this kind of technology could evolve into a viable news product and even open new revenue streams through ads, sponsorships and a new audience.”
As we kept iterating, including a test to compare the AI-trained voice of AI Leslie versus a fully automated synthetic “robot” voice, we learned more about the limitations. While the tone of the fully synthetic AI robot voice was consistent and the pronunciations clear, it still made the same errors in handling numbers. Worse, it began each podcast by identifying itself as Leslie Katz, which our testers “found thoroughly creepy as it sounded nothing like Leslie, which the casual listener would have picked up on immediately. And if we couldn’t be transparent about who was voicing the podcast, how would listeners believe we hadn’t made up other parts of it?”
4. We explained our process and asked for feedback.
From start to finish, we were curious to hear reactions — from readers and listeners to media peers to staff and contractors. We told readers we sought “an innovative way to bring you the daily news, so take a moment to listen in! This experiment is all about learning and gaining feedback from the community about what is useful to our readers and listeners.” In fact, we heard very little feedback from readers/listeners — despite participating in a Trusting News cohort that sought broad feedback. This may have been, in part, due to the confinement of our project to our sandbox testing area instead of wide distribution. We did have a wide variety of internal staff reactions, which likely reflect the industry as newsrooms test new tools and journalists react in both positive and negative ways:
A big picture assessment from Ciara Zavala, BCN impact manager for Bay City News

“What I value most about these experiments is that we never see them as failures, even when the results don’t align with our expectations. Every project teaches us something about the limitations of current AI tools, the rapid pace of technological change and ways we might apply these capabilities more thoughtfully next time. These experiences have also sparked ongoing, important conversations within our team about the ethical use of AI in journalism, which feels just as crucial as the tech itself.
Another takeaway from the project was how much it challenged us to rethink storytelling itself. It made us consider what really makes audio journalism compelling: is it just the information being shared, or is it the human connection that comes through tone, pacing and emotion? AI can do a lot well, but it still struggles with subtle emotional cues and contextual nuance, which are often what make a story resonate with listeners. That distinction matters as we think about where AI can genuinely enhance our work and where the human touch remains essential.
I also see real potential for this kind of technology to help newsrooms like ours reach audiences we might otherwise miss. Synthetic voices could make it easier to produce audio versions of stories, translate content into multiple languages or create accessible formats for people who prefer listening over reading. For a small newsroom with limited staff and resources, AI could help us cover more ground and experiment with new forms of storytelling without needing the same level of infrastructure larger organizations have. In that sense, AI can be a useful assistant that helps us extend the reach of our journalism.
At the same time, the experiment reinforced how important transparency and ethics are in this space. We can’t think only about efficiency. We also have to consider audience trust, clear disclosure and the broader implications of using synthetic voices in journalism. There is a balance to strike between innovation and responsibility. But being part of these early experiments helps us better understand where that balance should be and how to use these tools in a way that supports both creativity and credibility.”
A reality check on time, effort and value from Glenn Gehlke, BCN web producer and quality control editor

“The real limitation came down to manpower, which we didn’t have enough of. A human editor was still required to fact-check the AI content, and for this experiment we also had to document any errors or inconsistencies for the engineering team. At a minimum, it took about 30-40 minutes for each podcast. I found very few factual errors in the copy, almost none, which made me trust the AI much more than I originally expected.
Unlike a colleague I tag-teamed with, who eventually opted out of the experiment over ethical concerns (see below for his reflections), I found it fascinating what the software could do and see it (in its present form) not as a threat to the work that journalists do, but as a valuable tool that frees our time from the most mundane tasks to enable us to spend more time in the field talking with sources and developing story leads.”
A contrarian staff view from Thomas Hughes, a reporter and editor who was uncomfortable with this experiment and the use of AI in general

“I don’t think there’s ever been a television or radio broadcast journalist (or podcaster) who didn’t get a kick out of what they ‘got’ to do each day. We love what we do as journalists. And ultimately to see Leslie spending so much time and effort to make the thing sound like her, sound like a human, sound like a better newscaster, well, I thought of my own time in journalism school, getting audio coaching and working to make my broadcasts better and it made me irritated that we were spending so much time to refine the technology when the reality is, no matter how good it gets, I will never want to listen to a fake person telling me about current events.
I fundamentally reject the loss of human-to-human connection. It’s not about being fooled in print, in image, or in audio. It’s about me not wanting to communicate with AI, even as an intermediary, especially where a person can do better.”
An analysis from the project engineers who helped with this experiment
“The Gemini AI Podcast Generator is a highly competent, risk-averse and equity-focused editor. It has successfully captured the anxiety and complexity of the Bay Area in late 2025, weaving together threads of political resistance, environmental instability and social struggle.
It functions less like a storyteller and more like an emergency broadcast system—highly effective at warning citizens of dangers and injustices, but less effective at capturing the full, vibrant spectrum of life in the region. With targeted tuning to broaden its source material and narrative tone, it has the potential to become a truly indispensable community asset.”
THE CONCLUSION
So what are our next steps?
- We want to repeat the experiment in 2026, given the huge technology leaps forward made in the last six months. We believe that the next iterations of the story selections, the text-to-audio transcript, and the voicing will all be noticeably improved.
- We intend to have a robust internal conversation about the revelations uncovered by the AI analysis, especially regarding story selections. Not only is it instructive to look at how the AI and human gatekeepers differed, it’s worth a deeper look into why we choose the topics we present to our audiences. Are we basing our selections on a fact-based survey of audience behavior? Are we relying on our assumptions, our experience or subjective opinions as reporters, editors and publishers? Are we adapting to changing audience needs and formats as much as we should to inform, to grow and to monetize the work we do for long-term sustainability?
- We want to look at how we can incorporate the lessons learned from this experiment into our workflow and find additional ways to use AI efficiently and ethically in our daily work and our news products. Could we use AI to analyze daily, weekly or monthly whether our editorial intentions are actually being followed in practice? Should we use the tremendous power of AI with data analysis, record searches and story discovery by turning it inward to look at our own operations? Why hire a consultant when we can use AI to do our own analysis? What might that look like going forward?
We’ll do those three things in the months ahead and report back!
