The Watchtower 28th Edition (2024 Recap)

For the best possible viewing experience, we recommend viewing this edition online.

🗓 Upcoming: In AI Society

Fortnightly Discussion Session

Image Credit - Fireflies AI

Join AI Society for our fortnightly discussion sessions! Each session starts with 45 minutes of key AI news and trends from our newsletter, followed by 45 minutes exploring recent research papers and their potential impact. Stay informed, engage in discussions, and deepen your understanding of this rapidly evolving field.

📅 Date: Wednesday, Week 1, Term 1 (19/02/2025)

🕒 Time: 1:00 - 2:30 pm

📍 Location: Squarehouse E4 208

“How To Prompt” Workshop

Image Credit - DALL·E

Join AI Society for a hands-on workshop on crafting effective prompts to unleash the true power of ChatGPT and other text based models! Whether you're new to AI or just looking to refine your skills, this session is designed to help students of all backgrounds boost their productivity with AI-powered tools.

📅 Date: Week 2-3, Term 1 (Date TBD)

🕒 Time: TBD

📍 Location: TBD

Stay tuned for more updates from the UNSW AI Soc Socials and Discord!

AI News Recap

DeepSeek-V3: Chinese Company Made An Exceptional LLM … For Fun

DeepSeek, a Chinese AI lab backed by the hedge fund High-Flyer, achieved a staggering advance in the AI industry with the launch of its latest model, DeepSeek-V3, earning widespread acclaim for its exceptional efficiency compared to existing language models. Remarkably, DeepSeek-V3 was trained in just 2.788 million GPU hours at a cost of $5.576 million USD, only a fraction of what is typically required to train similar large-scale models. Interestingly, DeepSeek-V3 was developed as an exploratory research project.

The model has profoundly disrupted the artificial intelligence landscape by demonstrating that high-performance models can be developed at drastically reduced computing cost. This breakthrough triggered significant market reactions, including a sharp decline in the stock value of major tech companies such as Nvidia, whose share price fell 17% on January 27, 2025, wiping nearly $600 billion off its market value.

However, the company has also faced its fair share of controversy. OpenAI has publicly accused DeepSeek of misusing its proprietary models and data, raising concerns that DeepSeek may have unlawfully leveraged OpenAI's systems to develop a competing product. The controversy centres on allegations that DeepSeek employed a method known as "distillation" to replicate OpenAI's language models: a smaller model is trained to mimic the capabilities of a larger, pre-trained one by learning from its outputs. OpenAI asserts that there is evidence suggesting DeepSeek used this method illicitly to bolster its AI systems, which could carry serious legal and ethical consequences. Separately, due to its Chinese origin, DeepSeek has faced bans and restrictions on the devices of government employees in countries such as the United States, South Korea, Australia and Taiwan over security and privacy concerns.
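The "distillation" idea at the heart of the allegations can be illustrated with a toy objective: a student model is trained to match a teacher's softened output distribution. The sketch below shows the generic technique in NumPy; the logits, temperature and loss form are illustrative and are not DeepSeek's or OpenAI's actual training setup.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Convert logits to probabilities, softened by temperature T."""
    z = np.exp((logits - logits.max()) / T)
    return z / z.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the teacher's softened output distribution
    and the student's: minimising it trains the student to mimic the teacher."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -np.sum(p_teacher * np.log(p_student + 1e-12))

# Toy check: a student whose outputs track the teacher's gets a lower loss.
teacher = np.array([4.0, 1.0, 0.5])
aligned = np.array([3.8, 1.1, 0.4])   # close to the teacher
diverged = np.array([0.5, 4.0, 1.0])  # far from the teacher
assert distillation_loss(aligned, teacher) < distillation_loss(diverged, teacher)
```

In practice the "teacher outputs" would be responses or token probabilities collected from the larger model, which is exactly why providers restrict using their APIs to train competing systems.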

Published by Victor Velloso, January 2025

UNSW Signs Deal With OpenAI To Test ChatGPT Edu

The University of New South Wales became the first university in the Asia-Pacific region to collaborate with OpenAI to deploy ChatGPT Edu in its academic and research environment.

A 12-month targeted pilot program is currently underway, involving 500 participants across the university. The pilot focuses on enhancing productivity, learning and teaching, with feedback collected every three months to assess the impact in these areas. ChatGPT Edu offers several distinct features, including the ability to build custom GPTs, higher message limits, expanded language support, enterprise-level security and stronger data privacy.

The project aims to enhance learning experiences and prepare students for an AI-integrated future. It marks an important milestone in the evolution of education.

Published by Victor Velloso, January 2025

NVIDIA’s Rise to Most Valuable Company

Nvidia’s ascent to the pinnacle of value during the recent AI boom was a defining moment in the tech industry. Once known primarily for manufacturing graphics and gaming cards, the company leveraged rapid advances in AI to surpass long-standing giants of the tech world such as Microsoft, Google and Apple.

Jensen Huang - CEO of Nvidia

At the start of 2024, Nvidia was already dominating the AI hardware market. Its H100 and A100 GPUs were the backbone of AI models developed by OpenAI, Google, Meta, and other industry leaders, and demand for Nvidia’s AI chips surged as businesses raced to integrate generative AI into their products and services. GPUs are central to AI development because they process data in parallel, allowing enormous numbers of calculations to happen at the same time; this drastically reduces the time needed to train machine learning models on large datasets. In February 2024, Nvidia reported record-breaking earnings for the fourth quarter of 2023, with data center revenue surpassing expectations as the future of LLMs and AI infrastructure began to take shape around Nvidia’s cutting-edge GPUs.
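The benefit of data parallelism described above can be felt even on a CPU: applying one operation to a whole array at once (as NumPy's vectorised operations do, and as GPUs do at vastly larger scale) is much faster than looping over elements one at a time. This is a rough CPU-side analogy, not a GPU benchmark:

```python
import time
import numpy as np

x = np.random.rand(1_000_000)

# Serial: one multiply-add at a time, like a single scalar core.
t0 = time.perf_counter()
serial = [2.0 * v + 1.0 for v in x]
serial_time = time.perf_counter() - t0

# Data-parallel style: the whole array in one vectorised call,
# analogous to a GPU applying the same op to many elements at once.
t0 = time.perf_counter()
vectorised = 2.0 * x + 1.0
vector_time = time.perf_counter() - t0

assert np.allclose(serial, vectorised)  # identical results, very different speed
print(f"serial: {serial_time:.4f}s, vectorised: {vector_time:.4f}s")
```

Training a neural network is dominated by exactly this kind of bulk arithmetic (matrix multiplications), which is why GPUs cut training times so dramatically.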

By mid-2024, Nvidia’s market capitalisation had surpassed Apple’s. The achievement was largely due to the launch of Nvidia’s Blackwell architecture promising massive leaps in AI processing power, growing partnerships with major cloud providers, and the continued rise of AI applications in sectors beyond tech, including the healthcare, finance and automotive industries. Then, on June 18, Nvidia became the world’s most valuable company, surpassing Microsoft with a market capitalisation exceeding $3.3 trillion USD. This near-tripling of its stock price in just over a year reflects increased investor confidence in Nvidia’s AI dominance. Key developments contributing to this achievement included hype around its next-generation AI chips, expanding partnerships with AI startups, and growing anticipation of the GH200 “superchip” promising faster training of AI models.

As 2025 begins, Nvidia faces both opportunities and challenges. Competition is rising, with AMD and Intel ramping up their AI chip development, and tech giants like Google and Meta designing their own AI accelerators. Additionally, regulatory scrutiny over Nvidia’s dominance in AI hardware could impact its trajectory. However, Nvidia’s 2024 performance solidified its status as the defining player in the AI revolution, setting the stage for its continued leadership in the years to come.

Published by Aditya Shrivastava, Jan 2025

The Fall of Apple Intelligence

Tech giant Apple has faced heavy backlash over the rollout of its AI-generated news summary feature. In a recent beta update of iOS 18.3, Apple announced the suspension of the AI-generated summary feature for news apps. The rollback came after reports from the BBC detailing the feature producing “obscenely incorrect” and misleading information, such as false alerts and misrepresented news reports.

A sample of Apple's AI summary feature. Bear in mind, Luigi did not shoot himself.

The reason behind the incorrect summaries is well known in the field of AI: hallucination. A large language model typically hallucinates when it has been trained on limited or biased data, or when a prompt lacks sufficient context. One could compare a hallucinating model to a person trying to guess the missing words of an incomplete sentence whose proper answer they do not actually know.

Apple, like many of its big tech competitors, rode the wave of AI-driven innovation throughout 2024. Reports surfaced early in the year about Apple’s behind-the-scenes work on “Apple Intelligence”, a suite of generative AI frameworks built into iOS, macOS and iPadOS to assist user interaction; unlike competitors that relied heavily on cloud-based AI models, Apple focused on on-device processing to ensure privacy and efficiency. In June, Apple’s Worldwide Developers Conference unveiled its AI roadmap. Although no major generative AI tools had yet launched, it previewed AI-powered enhancements for Siri, text predictions, and image editing features, a clear response to growing AI competition from counterparts such as Google and Samsung. Despite these advancements, one survey reported that 73% of iPhone users felt the AI features available to them offered little to no value, indicating that user satisfaction with AI implementations remains a challenge for Apple.

Despite these challenges, Apple remains committed to advancing AI across its products. By late 2024, Apple had ramped up hiring of AI and machine learning experts and acquired multiple AI startups specialising in natural language processing and computer vision. Furthermore, the company began integrating AI into its chip designs, with the A18 and M4 processors optimised for machine learning tasks. Moving forward, the company is expected to unveil improved AI-driven features in late 2025, potentially alongside the iPhone 17 launch. With increasing pressure from its competition, Apple’s AI strategy will be closely watched in the months ahead.

Published by Aditya Shrivastava, Jan 2025

2024 Nobel Prizes - AI Leads The Charge

Nobel Prize In Chemistry

AI is revolutionizing scientific research, as the winners of the 2024 Nobel Prize in Chemistry make especially evident. Demis Hassabis and John M. Jumper shared the prize (together with David Baker, recognised for computational protein design) for developing AlphaFold2, an AI model that has made extraordinary progress on the infamous protein folding problem.

Generally, a gene encodes an amino acid sequence which, depending on the environment of the cell, folds and assembles into a complex 3D protein structure that determines its properties and function. Since a given sequence almost always folds into the same structure, it should be possible to predict the protein structure from that sequence alone. However, the mapping is immensely complex: the way a protein's atoms interact is shaped by numerous forces, the properties of the amino acids, and the surrounding environment. As a result, although over 200 million amino acid sequences have been identified, the three-dimensional structures of less than 1% of them have been experimentally determined. In 2020, Hassabis and Jumper developed a transformer-based AI model named AlphaFold2 that leapfrogged all previous results, placing first in CASP (a structure prediction contest) and scoring above 90% on CASP's global distance test for two-thirds of the proteins. While accuracy is still not high enough for the remaining third, this result is astounding, and the AI approach shows potential to cover all 200 million sequences. The tool has already given scientists access to orders of magnitude more structural information, vastly accelerating their ability to study diseases, develop targeted therapeutics, and engineer solutions to antibiotic resistance, among many other applications.

Nobel Prize In Physics

The founding fathers of prominent proto-AI algorithms have also received their flowers, with the 2024 Nobel Prize in Physics going to John J. Hopfield and Geoffrey E. Hinton for their inventions of Hopfield networks and Boltzmann machines, foundational discoveries that enabled machine learning with artificial neural networks. Hopfield invented the Hopfield network, which works on the principle of associative memory, allowing it to retrieve complete patterns from partial or noisy inputs. This makes the network valuable for tasks like pattern recognition, image restoration, and solving optimization problems, even when presented with incomplete or corrupted data.
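Associative recall in a Hopfield network is simple enough to sketch in a few lines: store a pattern of ±1 states with the Hebbian rule, then let the network settle from a corrupted input back to the stored memory. The tiny 8-neuron pattern below is purely illustrative.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian rule: the weight between neurons i and j grows when
    they take the same value across the stored patterns."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0)  # no self-connections
    return W

def recall(W, state, steps=10):
    """Repeatedly set each neuron to the sign of its weighted input,
    letting the network settle into the nearest stored pattern."""
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1  # break ties deterministically
    return state

# Store one pattern, then recover it from a copy with one flipped bit.
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = train_hopfield(pattern[None, :])
noisy = pattern.copy()
noisy[0] *= -1  # corrupt the input
assert np.array_equal(recall(W, noisy), pattern)
```

This "settle to the nearest memory" behaviour is exactly the associative recall the prize citation highlights: the corrupted input falls into the energy basin of the stored pattern.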

Hinton built on the Hopfield network to develop the Boltzmann machines which add layers of hidden nodes that affect how the network functions, which is a technique that has inspired modern deep neural networks. These networks are able to autonomously discover properties in data and learn complex probability distributions. Through these discoveries, Hopfield and Hinton made revolutionary but fundamental progress in artificial neural networks, greatly inspiring the techniques that have allowed AI to have such powerful capabilities today.

Published by Abhishek Moramganti, January 2025

CES 2025 - AI Highlights

AI took center stage at CES 2025, with numerous companies unveiling innovative AI-driven products and services across various sectors.

Samsung's AI-Powered Home Ecosystem

Samsung unveiled its "Home AI" system, which integrates multiple household devices into a single smart network that anticipates your needs without requiring endless manual commands. The system uses its connected appliances as motion and sound sensors to understand your habits and automate your home accordingly. For example, if your dog hops onto the couch, the SmartThings Home AI system can trigger an air purifier to clear the air of allergens. However, such deep integration of AI raises major privacy concerns. Samsung addresses this by relying solely on local processing within the Home AI hub, ensuring that no data is sent to an external server for storage or processing.

Innovations in Robotics 

Robotics and physical AI were also a big focus at this year's CES, whether as companions, appliances or industrial machines. Highlights include Mixi’s palm-sized emotional support robot, Romi Lacatan, which holds lifelike conversations, providing real-time human-like interaction to help combat loneliness. Enchanted Tools’ Mirokai is a much more sophisticated humanoid robot designed for customer service and interaction; it can hold conversations and even has opposable thumbs to pick up objects, and its stylised anthropomorphic traits make it well suited to its purpose. Robots like the Kubota KATR, an all-terrain robot with a carrying capacity of 240 kg, and the Conit Runner, a lidar- and ultrasound-equipped robot that can precisely groove concrete to improve shear strength, also stand out as innovative advancements in the agriculture and construction industries.

NVIDIA at CES

Image Credit: CNET

The big player at CES however, was NVIDIA, who brought a series of both incremental upgrades and groundbreaking AI products.

GeForce RTX 50 Series GPUs

First of the upgrades introduced was the GeForce RTX 50 series, powered by the new Blackwell architecture, succeeding the popular RTX 40 series. This lineup includes the RTX 5070, RTX 5080, and the flagship RTX 5090, offering impressive performance improvements in both gaming and AI computation. The RTX 5090 is priced at $1,999 USD, while the RTX 5070 is available for $549 USD.

DLSS 4 Technology

The company also announced DLSS 4, the latest iteration of its neural rendering technologies that uses AI to boost FPS, reduce latency, and improve image quality. The suite of features includes:

  • Multi Frame Generation, which greatly boosts FPS by AI-generating up to three frames per rendered frame.

  • Frame Generation which allows for a more responsive gaming experience while generating slightly fewer frames.

  • Ray Reconstruction which uses AI to generate higher quality additional pixels for intensive ray-traced scenes, improving the overall ray traced image.

  • Super Resolution which boosts performance by rendering at a lower image quality and then improving the output via AI upscaling.

  • Anti-Aliasing which uses AI to improve the finer details within a rendered image.

The full suite of DLSS features will be readily accessible by the RTX 50 series GPUs, with a smaller subset of features being made available to older RTX models.

Cosmos World Foundation Models

NVIDIA unveiled Cosmos, a groundbreaking platform designed to accelerate the development of physical AI using synthetic data. Cosmos aims to democratise robotics by providing tools and resources to train robots and automated services more efficiently and cost-effectively. The platform uses physics-aware AI models to develop synthetic training data that vastly accelerates the training of physical AI models. This process can be integrated with any specific physical environment like a custom warehouse layout or factory pipeline, and provides end-to-end tools that allow for training, deploying and monitoring, all within the Cosmos platform. Ride-sharing company Uber is among the early adopters of this platform.

Project DIGITS

Another significant announcement was Project DIGITS, a high-end desktop AI supercomputer roughly the size of a Mac Mini. Aimed at developers and AI enthusiasts, the $3,000 USD desktop enables the execution of AI models with up to 200 billion parameters without relying on any cloud infrastructure. In the current market, the cost of high-end GPU and AI processing hardware has been a significant barrier to wider adoption of AI development and research, with a comprehensive and powerful AI desktop costing anywhere between A$5,000 and A$10,000 to build. With DIGITS, Nvidia has the potential to massively streamline and democratise local AI development by providing a hassle-free, powerful and well-integrated AI machine at a very competitive price. Moreover, encouraging powerful local processing not only eases the surging dependence on subscription-based cloud AI solutions that can limit control and freedom, but also mitigates the privacy and security risks of relying on those systems. Overall, DIGITS has been an exciting development for developers everywhere.

Published by Abhishek Moramganti, January 2025

Upgrades To ChatGPT: 4o, o1, o1-mini & o3-mini

GPT-4o

In May 2024, OpenAI launched ChatGPT-4o, its new flagship large language model, with the “o” standing for “omni”, emphasizing its ability to process and generate text, vision, and audio. The live demo stunned audiences with its near-instantaneous voice responses, emotional inflections, and real-time conversational ability, showcasing a major leap in AI interaction. Since then, OpenAI has continuously refined the model, adding structured outputs in August 2024, which allow developers to generate responses in specific JSON schemas, making it easier to integrate AI into software applications. In July 2024, OpenAI introduced GPT-4o Mini to balance performance and cost, replacing GPT-3.5 Turbo as a more efficient, lightweight alternative for less demanding applications. 
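The structured outputs mentioned above work by attaching a JSON Schema to the request, so the model must reply in exactly that shape. Below is a minimal sketch of such a request body; the event-extraction schema and its field names are invented for illustration, and the precise parameter shape should be checked against OpenAI's current API reference.

```python
import json

# Hypothetical schema for extracting an event from free text;
# the field names here are illustrative, not an official example.
event_schema = {
    "name": "event_extraction",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "date": {"type": "string"},
            "location": {"type": "string"},
        },
        "required": ["title", "date", "location"],
        "additionalProperties": False,
    },
}

# Chat-completion request body asking the model to answer only in this shape.
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Extract the event details: ..."}],
    "response_format": {"type": "json_schema", "json_schema": event_schema},
}

print(json.dumps(request_body, indent=2))
```

Because the reply is guaranteed to parse against the schema, downstream code can consume it directly instead of scraping fields out of free-form text.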

GPT-4o unlocks a wide range of possibilities through its powerful multimodal capabilities. Users can, for example, take a photo of a menu in one language and have it translated in real time, analyze charts or diagrams for insights, or even generate images from text descriptions. Developers can further fine-tune responses for specific needs, integrate AI assistants into workflows, and leverage these multimodal features to build applications that go beyond traditional text interactions.

OpenAI - o series

OpenAI introduced further upgrades through its o1, o1-mini and o3-mini series of reasoning models. While these models still use the same underlying transformer architecture seen in GPT-4o, they attempt to improve reasoning by slowing down and "thinking" through complex problems before responding. This works by integrating a prompt engineering technique called Chain of Thought (CoT) reasoning directly into the model itself, trained with reinforcement learning to follow an effective sequence of steps towards a solution. When you give o1 a complex prompt, rather than immediately generating a response, it breaks the task down into multiple simpler steps, then works through this chain of thought step by step before producing its output. The technique has yielded incredible results:

  • o1 ranks in the 89th percentile on competitive programming questions (Codeforces)

  • Places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME)

  • Exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).

Recently, o1-mini has been replaced by o3-mini, the newest, streamlined and cost-efficient model in this reasoning series, with developer features such as function calling, structured outputs and developer messages. Developers can also choose between three reasoning effort options: low, medium and high to optimize the speed and performance of the model for their specific use case. However, o3-mini does not currently support vision capabilities so users should continue to use o1 for visual reasoning tasks. Limited use of o3-mini is available to free users, but full use of o1 and o3-mini is available to ChatGPT Plus users and through the OpenAI API.

Published by Abhishek Moramganti, January 2025

AI Goes Nuclear

Artificial Intelligence development requires substantial energy for training and operating very large models. Wells Fargo is projecting AI power demand to surge 550% by 2026, from 8 TWh in 2024 to 52 TWh, before rising another 1,150% to 652 TWh by 2030. This is an astounding 8,050% growth from their projected level for 2024. This energy-intensive nature of AI has led to significant increases in both electricity costs and carbon emissions for Big Tech. In July 2024, Google reported a 48% rise in carbon emissions since 2019, primarily due to AI development efforts. Similarly, Microsoft disclosed a 30% increase in emissions since 2020, largely attributed to the construction and operations of new data centers. Nuclear power presents a viable solution to these challenges, offering a carbon-free and reliable energy source capable of meeting the continuous power demands of AI data centers. 

The Three Mile Island Nuclear Generating Station

In September 2024, Microsoft announced an agreement with Constellation Energy to reopen the Three Mile Island nuclear power plant. The agreement aims to bring the reactor back online by 2028. Once operational, the plant will supply 100% of its electricity output to Microsoft's data centers for the next 20 years. This move marks a significant shift in the perception of nuclear power, especially given Three Mile Island’s history as the site of the worst nuclear accident in U.S. history. Meanwhile, Google has partnered with Kairos Power to purchase electricity from small modular reactors (SMRs). These compact reactors offer flexibility and are designed to be more cost-effective than traditional nuclear plants. Amazon is also investing in SMR technology, leading a $500 million funding round for X-Energy's development of SMRs, highlighting the tech sector's growing interest in next-generation nuclear power solutions to support the era of AI.

Published by Abhishek Moramganti, January 2025

AI In The 2024 United States Elections

Misinformation has long posed a significant threat to democratic institutions worldwide, even before the advent of generative artificial intelligence. However, the widespread availability of AI tools has amplified these concerns, particularly with the rise of deepfake technology and its potential misuse.

In the U.S., only 6% of misinformation during presidential elections involved AI, highlighting the effectiveness of generative tools in implementing safeguards. For instance, OpenAI reported that ChatGPT successfully blocked approximately 250,000 attempts to generate deepfake images of political candidates, demonstrating the impact of these constraints.

These events underscore the need for ongoing vigilance in the development of hyper-realistic visual generation tools. At the same time, they highlight our ability to implement effective ethical safeguards, demonstrating that it is possible to balance technological advancement with responsible use.

Published by Victor Velloso, January 2025

Painting Created By Ai-Da Humanoid Robot Sells For $1 Million

The integration of AI into the artistic medium has sparked widespread debate, with generative images challenging traditional notions of what qualifies as true artistic expression. Despite this ongoing controversy, a painting created by Ai-Da, a humanoid robot, recently sold for $1 million, igniting further discussions about the role of artificial intelligence in redefining creativity and value in the art world.

What sets Ai-Da’s piece apart from other AI-generated works is the robot's physical presence, which mimics the process of human creation. This tangible, human-like execution lends a sense of authenticity to the artwork, making it seem closer to traditional art.

Published by Victor Velloso, January 2025

Welcome To The Year Of AI Agents

2025 is shaping up to be a turning point within the AI landscape as many dub it as the year of AI agents. These intelligent, autonomous systems are moving beyond simple chatbots and into complex decision-making, task automation, and real-world interaction. AI agents have existed in smaller forms for years, but recent breakthroughs in large language models and their integration of reinforcement learning, and multi-modal AI have made them significantly more capable. Unlike traditional AI models that require direct user input for every interaction, AI agents can now set goals, plan actions, and adapt to changing conditions, marking a major shift from passive tools to proactive and autonomous problem solvers.

Just last year, Anthropic teased agentic features through a beta version of the “Computer Use” feature within its Claude models. The feature gives the AI control over a computer’s mouse and keyboard, allowing it to autonomously navigate tasks such as browsing web pages and entering data. While still experimental and error-prone, it represents an incredible step toward versatile AI agents. Other key players in this space include OpenAI’s GPT-based agents, Google DeepMind’s AlphaCode successors, and emerging open-source frameworks like AutoGPT. These systems are not just answering prompts but managing projects, automating workflows, and even executing software development tasks with minimal human oversight.

Incredible productivity improvements, operational cost savings and technological advancements will all motivate the development and adoption of AI agents in 2025. Advances in AI now allow agents to be equipped with improved contextual awareness, memory and long-term planning capabilities, enabling them to handle complex, multi-step tasks without constant supervision. Moreover, new frameworks that embed them in productivity tools, customer service platforms, and even robotics make their widespread use all but inevitable.

However, this promised rise of AI agents also brings challenges. Ethical concerns around automation, data privacy, and AI decision-making biases need to be addressed. Moreover, AI agents are set to displace millions of workers over the next 5 years, making their adoption controversial to the wider public as they attempt to navigate sweeping changes across the job market. While some propose that these agents will augment the capabilities of human workers, their true effects remain to be seen as the technology progresses. With major tech companies investing heavily in AI agent development and open-source initiatives rapidly expanding capabilities, 2025 is likely just the beginning.

Published by Abhishek Moramganti, January 2025

The Best Models For Your AI Needs

Text and Chatbots

  • OpenAI o1: Designed to handle complex multi-step tasks with advanced accuracy, making it suitable for intricate reasoning and problem-solving.

  • DeepSeek R1: An open-source model that offers competitive performance, providing a cost-effective and customisable alternative to proprietary models.

  • Gemini 2.0: Demonstrates impressive multimodal and image recognition capabilities and integrates well with Google's ecosystem, however it is also proprietary and slightly lags behind o1 in a few reasoning based benchmarks.

Image Generation

  • Imagen 3: Excels in generating high-quality, photorealistic images from textual descriptions, leveraging advanced diffusion techniques.

  • Flux 1.1: Has a free open-source version that, while not as powerful as Imagen 3 or DALL·E 3, is still impressive and offers high customizability, allowing for fine-tuning to specific needs.

  • DALL·E 3: Known for its ability to generate creative images from text prompts, though it is not open-source and may have limitations compared to newer models.

Audio Generation

Music Generation
  • Suno AI: Currently the best AI music generator for creating full-length, high-quality songs with coherent lyrics and melody. It excels in natural-sounding vocals and commercial-ready output, but is proprietary.

  • Udio: A strong competitor to Suno AI, capable of generating detailed, structured songs with impressive coherence. Slightly weaker in vocal clarity, but still among the best.

  • Riffusion: An open-source model that generates music by visualizing it as spectrograms. While it allows for customization and experimentation, its quality is lower than Suno and Udio, and it struggles with long-form compositions.

Text-to-Speech Models
  • Eleven Labs: Current gold standard for natural and expressive speech synthesis, with near-human quality, multilingual support, and custom voice cloning. However, it is proprietary and can be costly for heavy use.

  • Kokoro: A newer, open-source alternative that delivers impressive emotional expressiveness. While not as polished as Eleven Labs, its openness allows for more customization and fine-tuning.

  • PlayHT: A strong commercial-grade TTS model with great voice cloning, making it ideal for businesses and content creators. However, it is proprietary and less flexible than open alternatives.

Video Generation

  • Sora: A state-of-the-art text-to-video model capable of producing detailed and coherent videos, suitable for various applications. However, it is proprietary, with access currently gated behind paid ChatGPT plans.

  • OmniHuman-1: A newly unveiled model that specializes in stunningly realistic human motion synthesis, making it ideal for applications requiring lifelike human representations, such as digital avatars and virtual production.

  • Runway Gen-3: A proprietary text-to-video model that delivers high-quality motion, better coherence than its predecessors, and strong adaptability across creative styles. While not as advanced as Sora, it is accessible and user-friendly, making it a top choice for creators.

Honorable Mention - Mochi 1: A free and open-source model that shows great motion consistency and detail, emerging as a strong competitor in the video generation space. While it lacks the refinement of proprietary models, its open nature makes it valuable for customization and research.

Published by Abhishek Moramganti, January 2025

📑 Research Spotlight 💡

DeepSeek

January 2025

Transformers²

January 2025

Google Titans

December 2024

Titans: Learning to Memorize at Test Time

Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention to attend to the current context while utilizing long past information. We show that this neural memory has the advantage of fast parallelizable training while maintaining a fast inference. From a memory perspective, we argue that attention due to its limited context but accurate dependency modeling performs as a short-term memory, while neural memory due to its ability to memorize the data, acts as a long-term, more persistent, memory. Based on these two modules, we introduce a new family of architectures, called Titans, and present three variants to address how one can effectively incorporate memory into this architecture. Our experimental results on language modeling, common-sense reasoning, genomics, and time series tasks show that Titans are more effective than Transformers and recent modern linear recurrent models. They further can effectively scale to larger than 2M context window size with higher accuracy in needle-in-haystack tasks compared to baselines.

OmniThink

January 2025

Closing Notes

We welcome any feedback / suggestions for future editions here or email us at [email protected]

Stay curious,

🥫 Sauces 🥫

Here you can find all sources used in constructing this edition of The Watchtower: