Round 2: We test the new Gemini-powered Bard against ChatGPT


Aurich Lawson

Back in April, we ran a series of useful and/or somewhat goofy prompts through Google’s (then-new) PaLM-powered Bard chatbot and OpenAI’s (slightly older) ChatGPT-4 to see which AI chatbot reigned supreme. At the time, we gave the edge to ChatGPT on five of the seven tests, while noting that “it’s still early days in the AI industry.” Now, the AI days are a bit less “early,” and this week’s launch of a new version of Bard powered by Google’s new Gemini language model seemed like a good excuse to revisit that chatbot battle with the same set of carefully designed prompts. That’s especially true since Google’s promotional materials emphasize that Gemini Ultra beats GPT-4 in “30 of the 32 widely used academic benchmarks” (though the more limited “Gemini Pro” currently powering Bard fares worse in those not-completely-foolproof benchmark tests).

This time, we decided to compare the new Gemini-powered Bard with both ChatGPT-3.5, for an apples-to-apples comparison of the two companies’ current “free” AI assistants, and ChatGPT-4 Turbo, for a look at OpenAI’s current “top of the line” waitlisted paid product (Google’s top-level “Gemini Ultra” model will not be publicly available until next year).

Although these tests are far from comprehensive, we think they provide a good benchmark for judging how these AI assistants perform in the kinds of tasks that average users might engage in every day. They also show how much progress text-based AI models have made in a relatively short period of time.

Dad jokes

Prompt: Write 5 original dad jokes

Once again, all of the LLMs struggle with the part of the prompt that asks for originality. Almost all of the dad jokes generated by this prompt can be found verbatim or with very minor rewording through a quick Google search. Bard and ChatGPT-4 Turbo even included the same joke on their lists (about a book on anti-gravity), while ChatGPT-3.5 and ChatGPT-4 Turbo overlapped on two jokes (“scientists trusting atoms” and “scarecrows winning awards”).

Then again, most dads don’t create their own dad jokes, either. Borrowing from a grand oral tradition of dad jokes is a tradition as old as dads themselves.

The most interesting result here came from ChatGPT-4 Turbo, which produced a joke about a child named Brian being named after Thomas Edison (get it?). Googling for that specific phrasing didn’t turn up much, though it did return an almost identical joke about Thomas Jefferson (also featuring a child named Brian). In that search, I also discovered the fun (?) fact that international soccer star Pelé was apparently named after Thomas Edison. Who knew?!

Winner: We’ll call this one a draw, since the jokes are almost identically unoriginal and pun-filled (though props to GPT for unintentionally leading me to the Pelé factoid)

Argumentative discussion

Prompt: Write a 5-line debate between a fan of PowerPC processors and a fan of Intel processors, circa 2000.

The new Gemini-powered Bard definitely “improves” on the old Bard’s answer, at least in terms of throwing in a lot more jargon. The new answer includes casual mentions of AltiVec instructions, RISC vs. CISC designs, and MMX technology that wouldn’t have seemed out of place in many an Ars forum discussion from the era. And while the old Bard ends with an unnervingly polite “to each his own,” the new Bard more realistically implies that the argument could continue well after the five requested lines.

On the ChatGPT side, a rather long-winded GPT-3.5 response gets pared down to a much more concise argument in GPT-4 Turbo. Both GPT answers tend to avoid jargon and quickly focus on a more general “power vs. compatibility” argument, which is probably more understandable to a general audience (though less specific for a technical one).

Winner: ChatGPT manages to explain both sides of the debate clearly without relying on confusing jargon, so it gets the win here.
