Google Gemini AI Tries to Outdo ChatGPT With Photos, Videos


Google has begun bringing a more intuitive understanding of video, audio and images to its Bard AI chatbot with a new AI model called Gemini. Google Pixel 8 phone owners will be the first to use its new abilities.

The first glimpse of the new technology arrived Wednesday in many countries with Gemini updates to Google Bard, though only in English. Google says the text-based chat improvements boost the AI’s skills at complex tasks such as summarizing documents, reasoning and writing programming code. The bigger change, multimedia abilities such as understanding hand gestures in a video or figuring out the solution to a child’s puzzle, will arrive “soon,” Google said.



Gemini is a big step for AI. Text-based chats are important, but humans process much richer information as we inhabit our three-dimensional, ever-changing world. We respond to complex forms of communication, such as speech and imagery, not just written words. Gemini is an attempt to come closer to our own fuller understanding of the world.

Gemini comes in three versions tailored to different levels of computing power, Google says:

  • Gemini Nano runs on mobile phones, with two varieties built for different amounts of available memory. It will power new features on Google’s Pixel 8 phones, such as summarizing conversations in its Recorder app or suggesting replies in WhatsApp typed with Google’s Gboard keyboard.
  • Gemini Pro, tuned for quick responses, runs in Google’s data centers and will power a new version of Bard, starting on Wednesday.
  • Gemini Ultra, limited to a testing phase for now, will power a new Bard Advanced chatbot due to arrive in early 2024. Google declined to disclose pricing, but expect to pay a premium for this top-tier capability.

The release underscores the rapid pace of progress in generative AI, in which chatbots create their own responses to prompts we write in plain language rather than in programming instructions. Google’s top competitor, OpenAI, stole a march with the launch of ChatGPT last year, but Gemini is now Google’s third generation of AI model, and the company hopes to deliver the technology through products used by billions of people, such as Search, Chrome, Google Docs and Gmail.

“For a long time we wanted to build a new generation of AI models inspired by the way people understand and interact with the world, an AI that feels more like an assistant and less like a smart piece of software,” said Eli Collins, a vice president of product at Google’s DeepMind division. “Gemini brings us a step closer to that vision.”

OpenAI also provides the brains behind Microsoft’s Copilot AI technology, including the new GPT-4 Turbo AI model that OpenAI released in November. Microsoft, like Google, has major products such as Office and Windows that are adding AI features.

AI is smarter, but not perfect

Multimedia abilities will be a big change compared with text when they arrive. But what hasn’t changed is the fundamental problem with AI models trained by recognizing patterns in vast amounts of real-world data. They can turn complex problems into step-by-step answers, but you still can’t trust that they didn’t just give you an answer that sounds plausible rather than one that is actually correct. As Google’s chatbot warns when you use it, “Bard may display inaccurate info, including about people, so double-check its responses.”

Gemini is the next generation of Google’s large language models, a sequel to PaLM and PaLM 2, which have been the foundation of Bard until now. But because Gemini is trained simultaneously on text, programming code, images, audio and video, Google says it can handle multimedia input better than separate but interconnected AI models for each type of input.

Examples of Gemini’s abilities, according to a Google research paper (PDF), are varied.

Given a series of shapes consisting of a triangle, a square and a pentagon, it correctly guesses that the next shape in the series is a hexagon. Presented with pictures of the moon and a hand holding a golf ball and asked to find the connection, it correctly states that Apollo astronauts hit two golf balls on the moon in 1971. It converted charts of data into a labeled table and spotted an outlier, namely that the US throws more plastic in the trash than other regions.

The company also showed Gemini processing a handwritten physics problem that includes a simple drawing, figuring out where a student’s mistake lies and explaining a correction. A more involved demo video showed Gemini recognizing a blue duck, hand puppets, magic tricks and other scenes in video. None of the demos were live, however, and it’s unclear how often Gemini fails at such challenges.

Is the Google Gemini video fake?

Google introduced Gemini with a demo video showing it recognizing hand gestures, following magic tricks and ordering pictures of the planets by their distance from the sun, all from visual information. You should think of it as a dramatization of Gemini’s true abilities, though.

It’s not unusual for promotional videos to make products look better than they really are. In this case, you might think Gemini was responding in real time to live video and spoken commands. Google did include some caveats: a note in the video saying latency had been reduced, meaning Gemini doesn’t actually respond that quickly, and a link in the video description to an explanation of how the Gemini demo actually worked. You may not have noticed any of that, though. The video was followed by a post on X, formerly Twitter, that revealed more about how quickly Gemini actually responds.

The key point, however, is that the video doesn’t misrepresent Gemini’s potential, even if outsiders generally can’t try those abilities for themselves yet. Speech and video input are within Gemini’s reach.

Gemini Ultra is coming in 2024

Gemini Ultra is undergoing further testing before its release next year.

“Red teaming,” in which a product maker enlists people to find bugs and other problems, is underway for Gemini Ultra. Such tests are more complicated with multimedia data. For example, a text prompt and a photo can each be harmless on their own, but combined they can convey a very different meaning.

“We’re approaching this work boldly and responsibly,” Google CEO Sundar Pichai said in a blog post. That means ambitious research aimed at enormous potential benefits, but also building in safeguards and working with governments and others “to address risks as AI becomes more capable.”

Editor’s note: CNET uses an AI engine to help generate stories. For more information, see this post.
