What's new in AI in October?

Steve Harrison

28 Oct 2024 — 5 min read

It's been a while since I did a tech roundup! A lot has been announced in the way of AI—let's dive in.

Anthropic has released updates, including a "computer use" mode that will control your computer with AI, allowing you to automate cumbersome tasks such as filling data into a webpage that's already stored in a document somewhere:

I enjoyed this anecdote from the Ars Technica article:

Anthropic has also said it is known to be "cumbersome and error-prone" at times. A blog post about developing the tool gave one example of a way it has gone wrong in testing: It abandoned a coding task before completing it and began instead "to peruse photos of Yellowstone National Park"—perhaps one of the most human-like things an AI bot has done. (I kid.)

Someone got it to order lunch for them:

Successfully got Claude to order me lunch all by himself!

Notes after 8 hours of using the new model:

• Anthropic really does not want you to do this - anything involving logging into accounts and especially making purchases is RLHF'd away more intensely than usual. In fact my… pic.twitter.com/ueo8kvZUPF
— near (@nearcyan) October 22, 2024

XAI released an API and people are already building things with it, such as this AI-enhanced scraper:

Introducing Grok-2 web crawler 🕸️

It crawls any website with @xai's new Grok-2 model and @firecrawl_dev

Just give it a URL and a goal then it will navigate + return the requested data in a structured format

Take a look: pic.twitter.com/irY4bnt3qs
— Nicolas Camara (@nickscamara_) October 21, 2024

Runway are doing some really interesting stuff to automate motion-capture for filmmaking. Gone are the days of having to place dots on Andy Serkis in order to track him for Gollum in Lord of the Rings. This tech allows you to record videos of people and then map their facial expressions to 3D models:

Midjourney has released an editor feature that allows you to upload a photo and then make edits using AI:

Perplexity has released features to search historical financial data and internal company data:

Perplexity AI’s new tool for researching the stock market | Hacker News

Hacker News

While OpenAI has yet to release Sora, they have collaborated with Toys 'R' Us for a commercial:

Just like a weird dream it is a different person each time you see them pic.twitter.com/5Vc85ElqO1
— syndrowm (@syndrowm) June 25, 2024

Despite the criticism, I think it's still impressive overall. However, an Indian company Hotshot have released a similar AI text-to-video generator, and the results are equally impressive to the Sora demos. Here's a compilation I put together of all of their demo videos, plus a few extra I generated myself:

Hilariously, one of the OpenAI demo videos got copyrighted by a South Korean news channel on YouTube!

OpenAI lost "ownership" over one of their Sora videos because a news channel in S.Korea used it in their news first and copyrighted it so when somebody tries uploading it on Youtube they will get claimed. (News shouldn't be able to claim them either)
by u/WonderfulWanderer777 in ArtistHate

Another competitor:

Closed AI won the left brain of AGI. We're here to make sure there's an open alternative for the right brain.

Mochi 1 sets a new SOTA for open-source video generation models. It is the strongest OSS model in the ecosystem. This will be a force for good, both for AI research and… https://t.co/1QORiG2Swr
— Paras Jain (@_parasj) October 22, 2024

OpenAI now allows you to send text or audio to the API, and it will then give you responses back in either text, audio, or both:

🔊 The Chat Completions API supports audio now. Pass text or audio inputs, then receive responses in text, audio, or both. https://t.co/468QclBSBU pic.twitter.com/uUFrJa9kZH
— OpenAI Developers (@OpenAIDevs) October 17, 2024

What about code editing? I had an interesting listen to the team behind Cursor AI, a Visual Studio Code fork that allows you to edit chunks of code using models such as Claude Sonnet (this one is long, a podcast for several sessions):

I've been trying out Cursor and will document my experience in a separate post. I'm pretty impressed. It allows me to select pieces of code and say "fix this" if I'm getting a runtime error (and yes, it fixes it!), and I can add sorting to lists by selecting a chunk of code, and say "sort this by most recent name"—I can stay in a high-level thinking state rather than having to dive into nitty gritty of writing the correct arrow function.

v0.dev is seriously impressive re code generation. I did a short video converting a hand-drawn mockup to working code using a common React component library, shadcn:

Another code generation tool is the talk of the town: bolt.new:

I'm not kidding around: go to bolt․new and type in:

"make a spotify clone"

I'm speechless. pic.twitter.com/ntR6eggRtk
— Tomek Sułkowski (@sulco) October 23, 2024

Moving onto general AI tools, I came across this tool that converts PDFs into flashcards for memorisation:

we’re launching our rebrand of memo from pdf2anki on ProductHunt! https://t.co/OhK5qfZqwq

my name is Jason - the co-founder of Memo (previously PDF2Anki) and a 21-year-old in medical student and builder from Hong Kong 🇭🇰

Memo is the smarter way to learn with flashcards. We… pic.twitter.com/uymyNheosk
— Jason (@thetechjason) October 20, 2024

People are using AI tools to come up with a comic idea and then feed it to another AI to generate the image:

Flux 1.1 does pretty decent political cartoons, so I wired up a glif workflow that let's Gemini Pro 1.5 write the cartoon and then passes it on the Flux the cartoonist to render

"EU regulations" pic.twitter.com/ynT17RHRox
— fabian (@fabianstelzer) October 16, 2024

I'll leave you with this video from The Wall Street Journal showing how NVIDIA used their own chips and super accurate lighting in architectural renderings to design their Silicon Valley HQ:

What's new in AI in October?

Steve Harrison

Read more

Local-first development & InstantDB

Exploring CloudKit JS

Query strings & cache keys

OpenAI announcements, dev creates entire web game using ChatGPT, and creating UI components with AI