Swecha AI Chandamama Kathalu
Hyderabad, 6th January 2024:
Swecha released “AI Chandamama Kathalu”, a first-of-its-kind AI built for storytelling. Shobu Yarlagadda, Founder of Arka Media Works, graced the event as chief guest and launched it to the public. Guest of Honor Ram Miriyala, Singer and Composer; Prof Gaurav Raina, IITM, Former Chairman, MPFI; Kiran Chandra, Founder, Swecha; Chaitanya Chokkareddy, Co-Founder & CTO, Ozonetel; Ramadevi Lanka, Director, Emerging Tech, Govt of Telangana; Prof K S Rajan, Registrar, IIITH; Kiran Kumar, Intel Fellow & GM; Sasikant Vallepalli, Director, Tana Foundation; and Nagarjun Malladi, Vice President, Tech Mahindra, were present on the occasion.
The goal is to bring back the moral and ethical values embedded in “Chandamama Kathalu” using a new and creative AI approach. Thousands took part in this initiative, contributing their time and effort and charting a path towards Telugu AI.
We are seeing rapid growth in AI, especially in Large Language Models (LLMs) such as ChatGPT. These LLMs are trained on large amounts (trillions of words) of raw text, typically scraped from the web. Consequently, they require an immense amount of compute power and specialized hardware (GPUs) to learn the patterns that let them produce coherent text. For instance, it reportedly costs nearly 6 crore rupees (about 700,000 dollars) a day just to run ChatGPT.
For AI to be accessible to every farmer, every rural teacher, and every shopkeeper, it must be developed for Indian languages. Pondering the problem of creating AI in Indian languages, Chaitanya (CPO & Co-Founder, Ozonetel), Kiran Chandra (Founder, Swecha), and Professor Gaurav Raina (Professor, IIT Madras) got together over a breakfast conversation and formulated an idea to build an AI solution for stories. Building a story-oriented AI language model does not require a large language model, which is very resource-intensive; a small language model (SLM) should be adequate.
An AI vision for India does not call for a one-size-fits-all approach; rather, AI needs to be built for the specific needs of India. In this case, the model had to be built with storytelling in mind. Chandamama Kathalu was chosen as the source of data for this model. These short stories, beloved by all Telugu-speaking people, are the perfect choice: they have instilled moral values for generations, and an AI version of Chandamama would bring them back in a new form.
The stories, available only as scanned PDFs, had to be digitized first. Normally, this would have taken months. Through the contributions of the Swecha community, however, more than 10,000 students and faculty across 30 engineering colleges proofread the scanned and converted Telugu text from the old magazines in a little over 4 hours. A total of 40,000 stories were converted during this sprint. The dataset has been released to the public and is available on the internet for anyone to download and improve upon. This part of the effort was led by Kiran Chandra, who looked back at his earlier work: “This whole effort reminds me of the effort we put in 20 years ago in creating the first Telugu Operating System, creating the font and the glossary. This seems a logical step in our journey towards democratizing technology and we shall continue to take up more work in this space.”
Once the data was collected, the next step was to train the AI model on these stories so that it could create new ones. Chaitanya led this effort, which required technical expertise and no shortage of midnight oil. After close to a month of intense effort, the project has now been launched to the public. Chaitanya stresses that “this is only the beginning. The team is working on not just improving this, but also has ambitious plans for Telugu AI models.” Further elaborating on the importance of this effort, Chaitanya said, “The success of the Chandamama Kathalu datathon proved that a volunteer-driven effort for creating high-quality datasets is possible.” These datasets, being open, can now ignite research in universities and startups, creating a flywheel for AI for India. Swecha, Professor Gaurav Raina of IIT Madras, and companies like Ozonetel/Alpes are showing that such a flywheel is possible. The core focus is to create a truly open AI ecosystem that serves the needs of India.
Stories of India, in Indian languages, for all Indians
As the next step, Professor Gaurav Raina encouraged us to build this into something far bigger: an Indian AI-based language model that can cater to stories of India across all Indian languages.
Professor Gaurav Raina, commenting on the opportunity, said, “Storytelling is central to our beliefs and our values, and while it’s great to start with Telugu, the team has to help tell the stories across all Indian languages.”
Professor Gaurav Raina further added, “I am really excited to work towards an Indian, storytelling AI-based language framework. Let’s hope that the joy of Chandamama stories also eventually finds an international audience.”
Our partnership with Professor Gaurav Raina will help us engage with researchers and students at IIT Madras to take this idea to a national level.