g33 labs
Team consisting of CallCrewAI CTO/CEO (ex-Meta, IBM, SIG) and TCD students, specializing in multi-agent systems, SRE, and multimodal RAG.
YouTube Video
Project Description
The concept is “Music at the speed of thought”: a platform that lets music lovers and artists work without a producer, saving producer costs ($100 to $2,000+ per song), encouraging creativity, and improving productivity. The user uploads a song or asks for one to be generated, then describes changes in natural language that are applied to the song in real time, for example: “make it a rap song”, “make it more like Mozart”, “add a saxophone”, “slow down the tempo”, or “change the pitch”.
The project achieves Theme Alignment by integrating its components into a single voice-driven workflow. The browser captures user input from the microphone. ElevenLabs TTS and its associated ASR/Scribe technology serve as the voice agent, transcribing commands such as “add some drums”. Gemini interprets the transcribed text into executable DSP parameters, and the Gemini model (Lyria) also acts as the DSP, applying the change to the song. We used Bolt and Blackbox for UI and ideation, Clerk for authentication, and Firebase to host the deployment. The real technical challenge was making everything work together seamlessly.
P.S. We could also have used ElevenLabs Music, which generates studio-quality tracks and is excellent for static composition, but Gemini's Lyria architecture is focused on live, interactive manipulation and steering, which is what we needed.
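To illustrate the step where a transcribed command becomes executable parameters, here is a minimal sketch in Python. It is not the project's code: the `EditParams` schema, its field names, the clamping ranges, and the prompt wording are all hypothetical, and the model call itself is left out. The sketch only shows how a prompt might request JSON and how the reply could be validated before it reaches the audio stage.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical schema: field names and ranges are illustrative assumptions,
# not the project's actual parameter set.
@dataclass
class EditParams:
    tempo_scale: float = 1.0          # 1.0 = unchanged, below 1.0 = slower
    pitch_semitones: int = 0          # positive = higher pitch
    add_instruments: List[str] = field(default_factory=list)
    style: Optional[str] = None       # e.g. "rap", "Mozart"


def build_prompt(command: str) -> str:
    """Build a prompt asking the model to reply with JSON matching EditParams."""
    return (
        "Convert the music edit request into JSON with keys "
        "tempo_scale (float), pitch_semitones (int), "
        "add_instruments (list of strings), style (string or null).\n"
        f"Request: {command}"
    )


def parse_reply(reply: str) -> EditParams:
    """Parse a JSON reply, fill in defaults, and clamp numeric values."""
    data = json.loads(reply)
    tempo = float(data.get("tempo_scale", 1.0))
    pitch = int(data.get("pitch_semitones", 0))
    return EditParams(
        tempo_scale=min(max(tempo, 0.5), 2.0),      # assumed safe range
        pitch_semitones=min(max(pitch, -12), 12),   # assumed: one octave
        add_instruments=[str(i) for i in data.get("add_instruments", [])],
        style=data.get("style"),
    )


# A hand-written reply standing in for what a model might return
# for "add a saxophone and slow down the tempo".
example_reply = '{"tempo_scale": 0.8, "add_instruments": ["saxophone"]}'
params = parse_reply(example_reply)
```

Clamping and defaulting in `parse_reply` means a malformed or extreme reply degrades to a mild or no-op edit instead of interrupting playback, which matters when changes are applied live.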
Prior Work
No prior work was done.