We have seen a dramatic increase in the capabilities of artificial intelligence (AI) over the last few years, and the tech community is still trying to figure out the potential applications and ramifications. Models like ChatGPT and DALL-E have obvious use cases in the world of content creation (something professional content creators like myself find troubling), but makers find more innovative uses for the technology every day. Mina Fahmi took advantage of several different AI services to create Project Ring, a hand-worn device that perceives the world and communicates what it sees to the user.
Project Ring is a small device that straps onto the top of the user’s hand and includes a ring extension worn on the user’s index finger. The main unit on the hand houses most of the hardware necessary for processing, while the ring unit contains a joystick for user interaction and a camera to look at its surroundings. When the user points the camera at something, Project Ring analyzes what it sees and provides a spoken description to the user through their headphones. It also listens for user commands to aid in interaction.
All of this works using existing AI services that anyone can use. The kicker is that Fahmi also programmed the entire system using an AI service (GPT-4). So, in a manner of speaking, an AI created this device that uses AI.
Of course, Fahmi still had to conceptualize the device, guide the AI programming, devise a hardware strategy, design the 3D-printed parts, and assemble everything. The primary piece of hardware is a Raspberry Pi Zero W single-board computer, which accepts input from the joystick and camera. It communicates with Google Cloud Run to access the various AI services needed for this all to come together: image-to-text, voice-to-text, text-to-text, and text-to-voice. The Raspberry Pi does not do any of that processing itself and instead offloads everything to those cloud services, meaning that it requires an internet connection.
Project Ring speaks to the user through their Android phone and a headset. If, for example, the user asks Project Ring (with a voice command) to describe what it sees, it will capture an image with the camera, run that image through the image-to-text service, and then run the resulting description through the text-to-speech service to generate the audio fed to the user’s headset.
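To make that flow concrete, here is a minimal Python sketch of the three-step "describe what you see" chain. It is not Fahmi's code: the class name, the function names, and the stand-in stubs are all hypothetical, and the real device would call cloud-hosted services over the network in place of each stub.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DescribePipeline:
    """Hypothetical wiring of the capture -> caption -> speech chain.

    Each stage is injected as a plain callable, so a local stub can later be
    swapped for a call to a cloud service without changing the flow.
    """

    capture_image: Callable[[], bytes]       # camera: returns raw image bytes
    image_to_text: Callable[[bytes], str]    # captioning: image -> description
    text_to_speech: Callable[[str], bytes]   # speech synthesis: text -> audio

    def describe_scene(self) -> bytes:
        """Run the three stages in order and return audio for the headset."""
        image = self.capture_image()
        description = self.image_to_text(image)
        return self.text_to_speech(description)


# Stand-in stubs (assumptions for illustration only, not real services).
def fake_camera() -> bytes:
    return b"raw-image-bytes"


def fake_captioner(image: bytes) -> str:
    return "a coffee mug on a desk"


def fake_tts(text: str) -> bytes:
    # A real service would return encoded audio; the stub just encodes the text.
    return text.encode("utf-8")


pipeline = DescribePipeline(fake_camera, fake_captioner, fake_tts)
audio = pipeline.describe_scene()
```

Passing each stage in as a callable keeps the on-device logic thin, which matches the article's point that the Raspberry Pi offloads all of the heavy processing to cloud services.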
And while Project Ring is mostly just an experiment in what one can achieve with AI, it could have real-world benefits for people with poor eyesight.