What's new in AI in October?
It's been a while since I did a tech roundup! A lot has been announced in the way of AI—let's dive in.
Anthropic has released updates, including a "computer use" mode that will control your computer with AI, allowing you to automate cumbersome tasks such as filling data into a webpage that's already stored in a document somewhere:
I enjoyed this anecdote from the Ars Technica article:
Anthropic has also said it is known to be "cumbersome and error-prone" at times. A blog post about developing the tool gave one example of a way it has gone wrong in testing: It abandoned a coding task before completing it and began instead "to peruse photos of Yellowstone National Park"—perhaps one of the most human-like things an AI bot has done. (I kid.)
Someone got it to order lunch for them:
XAI released an API and people are already building things with it, such as this AI-enhanced scraper:
Runway are doing some really interesting stuff to automate motion-capture for filmmaking. Gone are the days of having to place dots on Andy Serkis in order to track him for Gollum in Lord of the Rings. This tech allows you to record videos of people and then map their facial expressions to 3D models:
Midjourney has released an editor feature that allows you to upload a photo and then make edits using AI:
Perplexity has released features to search historical financial data and internal company data:
While OpenAI has yet to release Sora, they have collaborated with Toys 'R' Us for a commercial:
Despite the criticism, I think it's still impressive overall. However, an Indian company Hotshot have released a similar AI text-to-video generator, and the results are equally impressive to the Sora demos. Here's a compilation I put together of all of their demo videos, plus a few extra I generated myself:
Hilariously, one of the OpenAI demo videos got copyrighted by a South Korean news channel on YouTube!
Another competitor:
OpenAI now allows you to send text or audio to the API, and it will then give you responses back in either text, audio, or both:
What about code editing? I had an interesting listen to the team behind Cursor AI, a Visual Studio Code fork that allows you to edit chunks of code using models such as Claude Sonnet (this one is long, a podcast for several sessions):
I've been trying out Cursor and will document my experience in a separate post. I'm pretty impressed. It allows me to select pieces of code and say "fix this" if I'm getting a runtime error (and yes, it fixes it!), and I can add sorting to lists by selecting a chunk of code, and say "sort this by most recent name"—I can stay in a high-level thinking state rather than having to dive into nitty gritty of writing the correct arrow function.
v0.dev is seriously impressive re code generation. I did a short video converting a hand-drawn mockup to working code using a common React component library, shadcn:
Another code generation tool is the talk of the town: bolt.new:
Moving onto general AI tools, I came across this tool that converts PDFs into flashcards for memorisation:
People are using AI tools to come up with a comic idea and then feed it to another AI to generate the image:
I'll leave you with this video from The Wall Street Journal showing how NVIDIA used their own chips and super accurate lighting in architectural renderings to design their Silicon Valley HQ: