Don't put rocks on pizza

Hello!

Here are three things I found interesting in the world of AI over the last week:

People are laughing at Google's AI search summaries

Google has received a bunch of bad press about its 'AI Overview' feature recommending that users put glue on pizza or eat rocks. Google folks say these are isolated instances and that by and large the feature is working well, but it's a PR disaster for a company betting so much on AI.

The most interesting thing about this for me is that it highlights how risky AI can be for large brands. I don't know if Google's claims about it being mostly fine are true or not, but it doesn't matter. The tool could be working perfectly for nearly all queries and all it takes is a couple of absurd screenshots to trash the public perception of it. I guess it's a good way for more people to learn that AIs hallucinate and that you always need to validate their output.

Ironically, search is a problem that I think is very tractable for AI. But I think the solution looks like an AI agent that searches for you, so you never have to visit Google again, which might be tough on Google's business model.

Anthropic are pushing the envelope on LLM explainability

The team behind Claude have released a detailed piece of research on identifying the concepts an AI learns. They trained a second, smaller model which learned which patterns of activations inside the large model map to individual concepts (the paper calls these 'features'). This means they can strengthen a concept by 'clamping' its feature to a higher value.

Concepts can be things like 'coding errors', 'sarcastic praise', or 'the Golden Gate Bridge'. To demonstrate, they clamped the feature linked to the Golden Gate Bridge and released a version of Claude which would always talk about the bridge. The whole paper is fascinating, but I particularly recommend the section about correcting deceptive behaviours by clamping features like 'internal conflicts and dilemmas' or 'openness and honesty' to get the model to own up when it was lying.
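
If the clamping idea is hard to picture, here is a toy sketch in Python. It is not Anthropic's code. It assumes each concept is a direction in a tiny three-number activation vector, and every name and number in it (`DIRECTIONS`, `clamp_feature`, the value 10.0) is made up for illustration. The real research works with millions of learned features, so treat this as a cartoon of the mechanism only.

```python
# Toy illustration of feature clamping. All names and numbers are hypothetical.

def encode(activation, directions):
    """Score how strongly each feature is present (negative scores are cut to 0)."""
    return [max(0.0, sum(a * d for a, d in zip(activation, direction)))
            for direction in directions]


def decode(feature_values, directions):
    """Rebuild an activation vector as a weighted sum of the feature directions."""
    size = len(directions[0])
    return [sum(value * direction[i]
                for value, direction in zip(feature_values, directions))
            for i in range(size)]


def clamp_feature(activation, directions, index, value):
    """Force one feature to a fixed value, then rebuild the activation."""
    features = encode(activation, directions)
    features[index] = value
    return decode(features, directions)


# Two made-up features, each pointing along one axis of a 3-number activation.
FEATURES = {"coding errors": 0, "golden gate bridge": 1}
DIRECTIONS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

activation = [0.5, 0.1, 0.0]  # the bridge feature is barely active here
steered = clamp_feature(activation, DIRECTIONS, FEATURES["golden gate bridge"], 10.0)
print(steered)
```

After clamping, the 'golden gate bridge' part of the activation dominates everything else, which is roughly why the demo model kept steering conversations back to the bridge.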

What's the bet Google engineers are scrambling to find the 'bad things for humans to eat' feature in Gemini?

Software development agents are on their way

The main way I use AI when coding is as a copilot. I write code myself and sometimes give the AI small tasks to complete. For simple stuff I might lean on the AI more; for harder stuff I do more myself. It's been a massive productivity boost.

Another way to use AI is to give it a high-level task which is less of a 'do this one small thing' and more of a 'do this large thing which takes multiple steps'. This typically involves the AI making a plan, (maybe) a human reviewing it, and then the AI working through the tasks one by one. One benchmark for measuring this is called SWE-bench, and my favourite coding assistant just crushed the Lite version.
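
The plan, review, execute loop described above can be sketched in a few lines of Python. This is a bare-bones illustration, not how any particular coding assistant works. The planner and executor here are stand-in functions that just return strings; in a real agent they would be calls to a model and to tools like a test runner. All the names (`run_agent`, `fake_planner`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentRun:
    """Record of one agent run: the goal, the plan, and what each step produced."""
    goal: str
    plan: List[str] = field(default_factory=list)
    results: List[str] = field(default_factory=list)


def run_agent(
    goal: str,
    make_plan: Callable[[str], List[str]],
    execute_step: Callable[[str], str],
    review_plan: Optional[Callable[[List[str]], List[str]]] = None,
) -> AgentRun:
    plan = make_plan(goal)              # 1. the AI drafts a plan
    if review_plan is not None:
        plan = review_plan(plan)        # 2. (maybe) a human edits or approves it
    run = AgentRun(goal=goal, plan=plan)
    for step in plan:                   # 3. the AI works through the steps in order
        run.results.append(execute_step(step))
    return run


# Stand-ins for the model and the human reviewer (purely illustrative).
def fake_planner(goal: str) -> List[str]:
    return [
        f"write a failing test for: {goal}",
        f"change the code for: {goal}",
        "run the test suite",
    ]


def fake_executor(step: str) -> str:
    return f"done: {step}"


def approve_everything(plan: List[str]) -> List[str]:
    return list(plan)


run = run_agent("fix the login bug", fake_planner, fake_executor, approve_everything)
for result in run.results:
    print(result)
```

The review step is optional on purpose: leaving it out gives you a fully autonomous agent, while keeping it gives a human a chance to catch a bad plan before any code gets touched.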

Personally, I'm not in a rush to integrate this approach into my workflow, but I wouldn't be surprised if I'm shipping production code with software agents later this year.