The Turing Point: 29th Edition
For the best possible experience, we recommend viewing this edition online.
📰 Featured in This Edition:
Events
Neurons & Notions - AI SOC Fortnightly Discussion Session
AI News Recap
Research Spotlight
🗓 Upcoming: In AI Society
Neurons & Notions

Image Credit: UGAResearch
Join AI Society for our fortnightly discussion sessions! Each session starts with 45 minutes of key AI news and trends from our newsletter, followed by 45 minutes exploring recent research papers and their potential impact. Stay informed, engage in discussions, and deepen your understanding of this rapidly evolving field.
We will also be streaming the discussion on YouTube, so feel free to join us live (physically or virtually) or catch up on the session later!
📅 Date: Wednesday Week 4, Term 1 (12/03/2025)
🕒 Time: 1:00 - 2:30 pm
📍 Location: UNSW Business School 119
📺 YouTube Channel (Subscribe!): UNSW AISoc - Neurons & Notions #1
AI News Recap
Grok 3 - The New Top Dog?
Grok 3 is the latest LLM from Elon Musk’s xAI, and it has surprised the world with its incredible performance, feature-rich package and apparent lack of censorship. The new model boasts significant advancements in decision-making, reasoning and human-like thinking and interaction. To achieve this, xAI assembled one of the largest computing clusters in the world, training the model on 200,000 NVIDIA H100 GPUs, an order-of-magnitude increase in training compute over GPT-4. The model was also partly trained on the endless stream of data from the X (Twitter) platform, which improves its performance but raises questions about privacy and data quality.

Image Credit: Business Insider
So how does Grok 3 compare to other LLMs? On reasoning and STEM benchmarks such as the American Invitational Mathematics Examination (AIME), LiveCodeBench and the science benchmark GPQA, Grok 3 outperformed all of its competitors by a significant margin. It also placed first on Chatbot Arena’s LLM leaderboard, which is impressive given that those rankings come from blind user comparisons and therefore capture intangible qualities that traditional benchmarks miss.
Grok 3 also comes equipped with the following features:
Reasoning: As mentioned earlier, Grok 3 joins the new wave of reasoning models and boasts impressive performance across various reasoning tasks. There are two reasoning modes available:
Think, which is the standard mode and will display Grok’s reasoning as it performs a task
Big Brain, which is a more powerful mode for complex tasks that require more time and computational power.
DeepSearch: An agentic feature that lets Grok search the web for sources relevant to the task at hand. It then reasons over the information it gathers and provides an interface that traces its chain of thought (a generic sketch of this kind of loop follows the list below).
Voice Mode: Grok 3’s conversation mode, which lets it produce audio outputs. It is powerful but far less censored than the voice modes of LLMs like ChatGPT: the model can make distressing noises, adopt stronger tones conveying sorrow, anger or annoyance, and even swear.
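xAI has not said how DeepSearch is implemented, but agentic search features generally follow a search-summarise-reason loop. Here is a generic, minimal sketch of that pattern; search_web, fetch and the llm callable are hypothetical stand-ins, not xAI APIs:

```python
# search_web, fetch and llm are hypothetical stand-ins for whatever tooling
# the real system uses; nothing here is an xAI API.
def deep_search(question, llm, search_web, fetch, max_sources=5):
    """Generic agentic search loop: gather sources, reason over them,
    then answer with the accumulated notes as context."""
    notes = []
    for result in search_web(question)[:max_sources]:
        page = fetch(result["url"])              # raw page text
        notes.append(llm(f"Summarise what this page says about "
                         f"{question!r}:\n{page}"))
    reasoning = llm("Think step by step over these notes:\n" + "\n".join(notes))
    return llm(f"Question: {question}\nReasoning: {reasoning}\nAnswer concisely:")
```

The reasoning trace users see in the interface corresponds roughly to the intermediate summaries and step-by-step pass in this loop.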
As mentioned earlier, Grok 3 is far less censored than models like GPT-4o, and will respond on ethical dilemmas, moral conflicts, political figures and other controversial topics. It is not completely uncensored, however: it will still avoid generating responses involving violence, crime or explicit details.
Grok 3 initially debuted exclusively for Premium+ users on the X (Twitter) platform, but is now available to all X users for free, including the DeepSearch and Think features. The Big Brain feature, however, is still reserved for Premium+ subscribers. Additionally, xAI offers a SuperGrok subscription that gives users access to the latest Grok updates and advancements on the Grok website and the Grok app.
Published by Abhishek Moramganti, February 2025
Hunyuan Turbo S - Never-Before-Seen Speed
Moments after we were introduced to the acclaimed DeepSeek-V3, China's AI landscape has delivered yet another significant development. The Chinese company Tencent has introduced its latest AI model, Hunyuan Turbo S, which boasts response times faster than both ChatGPT and DeepSeek, often delivering answers in under a second.

Image Credit: CommonWealth Magazine (天下雜誌)
Even though the model’s most notable feature is its speed, it also possesses other attributes matching or surpassing those of mainstream models:
Performance: In benchmark tests across various domains including knowledge, reasoning, math, and code, Hunyuan Turbo S has allegedly demonstrated capabilities on par with DeepSeek-V3, OpenAI's GPT-4o, Claude 3.5 Sonnet, and Llama 3.1.
Architecture: Hunyuan Turbo S introduces a Hybrid-Mamba-Transformer fusion architecture, which Tencent describes as the first successful integration of the Mamba and Transformer deep-learning architectures in a large-scale model. The design reduces the computational complexity and Key-Value (KV) cache usage associated with pure Transformer stacks (a toy sketch of the idea follows this list).
Efficiency: Tencent claims that the cost of deploying Hunyuan Turbo S is significantly lower than that of mainstream large-scale models, greatly lowering the barrier to adopting advanced AI technologies.
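Tencent has not published implementation details, so the following is only a toy PyTorch sketch of the general hybrid idea: interleave cheap recurrent state-space layers (a simple stand-in for real Mamba blocks, which use learned, input-dependent dynamics) with occasional attention layers, so most of the stack runs in linear time and only a few layers carry a KV cache. Every class, layer count and parameter here is illustrative rather than Tencent's design.

```python
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Stand-in for a Mamba-style state-space block: a gated linear
    recurrence with O(sequence length) cost and no KV cache."""
    def __init__(self, d_model):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.decay = nn.Parameter(torch.full((d_model,), 0.9))
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        state = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):             # recurrent scan, constant memory
            state = self.decay * state + u[:, t]
            outs.append(state)
        h = torch.stack(outs, dim=1)
        return self.out_proj(h * torch.sigmoid(gate))

class AttentionBlock(nn.Module):
    """Standard self-attention block: quadratic cost and a KV cache at
    inference time, but strong at global token mixing."""
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        h, _ = self.attn(x, x, x, need_weights=False)
        return self.norm(x + h)

class HybridStack(nn.Module):
    """Interleave SSM and attention layers: mostly SSM, with periodic
    attention, to cut compute and KV-cache size."""
    def __init__(self, d_model=64, n_layers=6, attn_every=3):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0
            else ToySSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 16, 64)                     # (batch, seq, d_model)
print(HybridStack()(x).shape)                  # torch.Size([2, 16, 64])
```

With attn_every=3, only a third of the layers pay the quadratic attention cost, which is the rough intuition behind the efficiency claims above.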
Tencent’s Hunyuan Turbo S represents a pivotal evolution in AI technology. Its blend of speed, advanced architectural design, and cost efficiency demonstrates that high-quality AI can be achieved with innovative engineering, further intensifying the global AI race.
Published by Victor Velloso, February 2025
Claude 3.7 - A friend or foe of the modern SWE?
Anthropic has unveiled Claude 3.7 Sonnet, a cutting-edge AI model that pushes the boundaries of artificial intelligence in software development. With enhanced reasoning, coding capabilities, and a new tool called Claude Code, this release is poised to reshape the way engineers interact with AI.

Claude 3.7 Sonnet introduces hybrid reasoning, allowing users to toggle between quick responses and detailed step-by-step analyses. This flexibility enhances problem-solving, catering to both rapid prototyping and complex debugging tasks. The model’s improved coding capabilities also make it a powerful tool for developers, particularly for generating and understanding code across multiple languages. A new extended thinking mode lets the model self-reflect before answering, producing more nuanced and thoughtful responses than Anthropic’s previous Claude models, with visible improvements in math, physics and coding-related tasks. Claude 3.7 Sonnet particularly excels at coding, achieving an industry-leading 70.3% accuracy on the SWE-bench Verified benchmark, a significant improvement over Claude 3.5 Sonnet that opens the door to complex coding workflows and AI agent integration.
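For developers, the hybrid reasoning toggle is exposed through Anthropic's Messages API as an optional thinking budget. A minimal sketch, assuming the anthropic Python SDK and the dated model id current at the time of writing (check Anthropic's docs for exact names and token limits):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",   # dated id; may change over time
    max_tokens=4096,
    # Extended thinking: give the model an explicit reasoning budget.
    # Omit this parameter entirely for the fast, non-thinking mode.
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user",
               "content": "Why does 0.1 + 0.2 != 0.3 in floating point?"}],
)

# The reply interleaves "thinking" blocks (the visible reasoning)
# with ordinary "text" blocks (the final answer).
for block in response.content:
    if block.type == "thinking":
        print("[reasoning]", block.thinking[:200], "...")
    elif block.type == "text":
        print("[answer]", block.text)
```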
The software engineering landscape is shifting as AI-driven tools become more prevalent. Studies show that over 60% of developers already use AI-powered code assistants like Claude, GitHub Copilot, or Google Gemini in their workflows. Moreover, a recent survey by Stack Overflow found that 44% of professional developers believe AI will significantly change their daily responsibilities within the next five years.
Despite fears of automation, industry leaders argue that AI is more likely to augment than replace engineers. Senior developers benefit the most from AI’s capabilities, using it to automate repetitive tasks so they can focus on higher-level problem-solving. Indeed, Claude 3.7 Sonnet exemplifies AI’s potential to revolutionise software development. As AI becomes an integral part of engineering workflows, the industry will likely shift toward a collaborative model, where AI handles routine coding while human engineers focus on innovation, architecture, and ethical considerations.
While automation will undoubtedly change the job market, software engineering is far from obsolete. Instead, developers must adapt to a future where coding knowledge is essential, but the ability to leverage AI effectively will be the true differentiator.
Published by Aditya Shrivastava, February 2025
OpenAI Unveils GPT-4.5
OpenAI has unveiled GPT-4.5, its latest language model, which promises significant leaps over its predecessors with the core aim of being more natural and human-like. Early impressions, however, have been underwhelming.

Apart from the expected improvements in its knowledge base, OpenAI claims that the defining features of 4.5 are that the model feels more natural, shows higher emotional intelligence and hallucinates less. Although some of these improvements are immediately noticeable, qualitative testing such as this video by AI Explained suggests that 4.5 still slightly underperforms Claude 3.7 on EQ and creativity. Needless to say, such claims are vague and difficult to capture in popular benchmarks, which has contributed to the early mixed reception.
Speaking of benchmarks, while 4.5 is an upgrade over 4o, the improvements are marginal rather than generational, unlike the leap from GPT-3.5 to GPT-4. Moreover, OpenAI has only released benchmarks comparing the model to its own 4o series, excluding comparisons with models like Claude 3.7 or Grok 3, making it difficult to gauge the model’s true performance. Nevertheless, even relatively small improvements in the base model can be incredibly useful, as they enable better reasoning models (such as o1 and o3) and other post-training innovations that have greatly boosted performance in this new age of LLMs. Currently, GPT-4.5 is only available to ChatGPT Pro tier users (arriving for Plus users next week) or through the OpenAI API, where it costs a whopping $150 USD per million output tokens, making it the most expensive model currently on the market.
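To put that price in perspective, a quick back-of-the-envelope comparison helps. The $150-per-million output rate is from above; the GPT-4.5 input rate ($75 per million) and the GPT-4o rates ($2.50 in, $10 out per million) are launch-time list prices and may have changed since:

```python
def api_cost_usd(tokens_in, tokens_out, rate_in, rate_out):
    """Cost in USD, with rates given per million tokens."""
    return (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000

# 1,000 requests, each ~500 tokens in and ~500 tokens out.
n_in = n_out = 1_000 * 500
print(f"GPT-4.5: ${api_cost_usd(n_in, n_out, 75.0, 150.0):.2f}")   # $112.50
print(f"GPT-4o:  ${api_cost_usd(n_in, n_out, 2.50, 10.0):.2f}")    # $6.25
# Roughly an 18x price gap for the same traffic.
```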
Published by Abhishek Moramganti, February 2025
Wan 2.1 - Best AI Video Model?
Wan 2.1 is the new free, open-source and feature-rich AI video generation model from Alibaba that has swiftly taken its place among top proprietary contenders like OpenAI’s Sora.

While it is common for open-source models to lag behind monetised proprietary models, Wan 2.1 follows DeepSeek into this new age of free, open-source models that match or even outperform the best, with Wan 2.1 beating Sora on metrics like scene generation quality, single-object accuracy and spatial positioning. Moreover, Wan 2.1 handles spatial and temporal consistency well and can deliver smooth 1080p video at 30 fps, contributing to its impressive 84.7% VBench score. At its core, Wan pairs a diffusion transformer with a 3D causal variational autoencoder, trained on a mammoth 1.5 billion videos and 10 billion images. Wan 2.1 itself isn’t a single model and comes in four variants (a hedged loading sketch follows this list):
T2V-1.3B: The lightweight text-to-video model, requiring just 8.19GB of VRAM, making it practical on a wide range of consumer GPUs
T2V-14B: The heavyweight text-to-video model with enhanced quality
I2V-14B-720P: Image-to-video transformation at 720p resolution
I2V-14B-480P: Image-to-video transformation at 480p resolution
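If you want to try the lightweight variant yourself, below is a minimal sketch using the Hugging Face Diffusers integration. The WanPipeline and AutoencoderKLWan classes and the model id are taken from recent Diffusers releases; argument names and defaults may change, so treat this as a starting point rather than canonical usage:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Model id on the Hugging Face Hub; requires a recent diffusers release.
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"

# The 3D causal VAE is kept in float32 for stability; the diffusion
# transformer runs in bfloat16 to fit consumer-GPU VRAM.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae",
                                       torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae,
                                   torch_dtype=torch.bfloat16).to("cuda")

frames = pipe(
    prompt="A cat walking through tall grass, golden hour, realistic style",
    height=480, width=832,        # 480p-class output from the 1.3B model
    num_frames=81,                # ~5 seconds at 16 fps
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "wan_output.mp4", fps=16)
```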
The full feature list of Wan 2.1 includes:
Multilingual Text Support (English and Chinese)
Video editing to enhance existing videos
Text-to-Video
Image-to-Video
Text-to-Image
And even Video-to-Audio
Overall, Wan 2.1 looks to be a powerful contender in the GenAI space, boosted by its flexibility and open-source availability, which improve AI accessibility for artists and developers alike. The core Wan 2.1 model is free to download from Hugging Face, and also free to use on Alibaba’s Model Studio, a cloud-based generative AI platform.
Published by Abhishek Moramganti, March 2025
Sesame Voice Assistant

Image Credit: medial.app
The question Alan Turing originally posed, "Can machines think?", is more usefully reframed as "Can machines be distinguished from humans in conversation?". This shift in perspective gave rise to the Turing Test, which sought to explore the potential for machines to exhibit human-like intelligence. During the early days of computing, this was a groundbreaking concept. Today, however, we find ourselves approaching a new frontier, one that surpasses Turing's vision. Enter Sesame, the latest leap forward in speech-based generative AI, which promises to blur the boundaries of what we know as sentience.
Sesame distinguishes itself from other AI systems by focusing not only on conversational ability but also on contextual understanding and emotional nuance. While earlier systems could replicate human-like responses from pre-programmed rules or statistical models, Sesame goes a step further, leveraging deep learning that enables it to grasp the underlying emotions, intentions, and other subtleties in human speech. This allows it to generate responses that feel more genuine, empathetic, and contextually aware.
As this technology continues to evolve, it raises important questions about the nature of sentience and consciousness. If machines like Sesame can engage in conversations that evoke empathy, understanding, and emotional connection, can they truly be considered sentient? Or is the line between human and machine proposed by Turing becoming so blurred that the distinction itself may become irrelevant?
Published by Victor Velloso, March 2025
📑 Research Spotlight💡
Reinforcement Learning … For Among Us?
February 2025
Minecraft Played By Multimodal AI Agents?
February 2025
Can AI Trained On Small Sample Rival The Big Players?
February 2025
Closing Notes
We welcome any feedback / suggestions for future editions here or email us at [email protected]
Stay curious,

🥫Sauces 🥫
Here, you can find all sources used in constructing this edition of The Turing Point:
Grok 3:
https://www.oneusefulthing.org/p/a-new-generation-of-ais-claude-37
https://medium.com/ai-unscripted/grok-3-best-model-on-the-planet-c6d008f24848
Hunyuan Turbo S:
https://pandaily.com/tencent-hunyuans-new-generation-turbo-s-fast-thinking-model-released/
OpenAI GPT-4.5:
Wan 2.1:
https://bgr.com/tech/alibabas-new-wan-2-1-text-to-video-ai-is-unbelievable/
Sesame:
https://www.theverge.com/news/621022/sesame-voice-assistant-ai-glasses-oculus-brendan-iribe