The WatchTower: 12th Edition

Welcome to the captivating world of Artificial Intelligence! 🤩

Hello innovators and curious minds of AI Soc! 🤗

Welcome to the 12th Edition of The WatchTower 🙌, your quintessential guide to the riveting world of AI. In this edition, we're set to ignite your imagination with a dazzling array of AI innovations that are reshaping our society, art, and personal lives 🌄.

For the best viewing experience, we recommend reading this edition online.

📰 Featured in This Edition:

  • BBQ Takeover 🍖 

  • Intro to LLM Workshop 💻️ 

  • Sora: OpenAI’s Leap into the Future of Text-to-Video Technology

  • Revolutionizing AI with Agentic Workflows: A Glimpse into the Future

FREE FOOD!? SIGN ME UP!

Get ready to taste the future at AI Soc’s BBQ Takeover!

Calling all AI enthusiasts! Get ready to take your knowledge to the next level with our LLM (Large Language Model) Masterclass happening this Thursday, 11/04/24.

Image Credit: OpenAI

Sora: OpenAI’s Leap into the Future of Text-to-Video Technology

In case you missed it, OpenAI captivated the tech community in February with its unveiling of Sora, a pioneering text-to-video model able to produce strikingly convincing, high-definition videos lasting up to a minute. The creation of these videos represents a giant leap in AI capabilities, hinting at a future where visual content is limited only by our imaginations. Check it out for yourself: OpenAI - Sora. While the release date of Sora is yet to be announced, recent interviews with members of OpenAI suggest that it could be available for public use by the end of this year.

How it Works - Understanding Diffusion Models and Transformers

At its core, Sora is a latent diffusion model, an approach that underpins many recent advances in image and video generation. Put simply, a diffusion model is trained by gradually adding noise to its training data and learning to reverse that process; to generate something new, it starts from pure noise and removes it step by step. The key innovation of latent diffusion models like Sora lies in how they handle data. Instead of manipulating raw pixel values, these models work with an encoded latent representation of the image or video. The input is compressed into a more compact form that retains its essential features and can then be manipulated in a computationally efficient and flexible manner. Sora combines these diffusion models with transformers, a type of neural network designed to capture context and meaning by analyzing the relationships between elements of sequential data.
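To make the noising-and-denoising idea concrete, here is a minimal sketch of the forward (noising) step of a diffusion model in plain Python. It follows the standard DDPM formulation, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, with an assumed linear noise schedule. OpenAI has not published Sora's exact objective or schedule, so treat this as an illustration of the general technique, not Sora's implementation. The short vector is a toy stand-in for a compressed latent, and passing in the true noise plays the role of a perfectly trained noise-prediction network.

```python
import math
import random


def make_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Cumulative products of (1 - beta_t) for an assumed linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0: list[float], t: int, alpha_bars: list[float],
              rng: random.Random) -> tuple[list[float], list[float]]:
    """Forward process: jump straight to step t with x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
    return xt, eps


def estimate_x0(xt: list[float], predicted_eps: list[float], t: int,
                alpha_bars: list[float]) -> list[float]:
    """Invert the forward equation using a noise estimate (normally produced by the trained network)."""
    a_bar = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar) for x, e in zip(xt, predicted_eps)]


alpha_bars = make_alpha_bars(1000)
latent = [0.5, -1.2, 0.3, 0.8]  # hypothetical toy stand-in for an encoded latent
noisy, true_eps = add_noise(latent, t=500, alpha_bars=alpha_bars, rng=random.Random(0))
# A real model would predict the noise; passing the true noise shows the inversion is exact.
recovered = estimate_x0(noisy, true_eps, t=500, alpha_bars=alpha_bars)
```

In practice, generation starts from pure noise and applies the learned denoiser over many small steps; the one-shot inversion here only shows why learning to predict the noise is enough to undo the forward process.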

For a deeper dive into the technical side of Sora, check out the following links:

Why Sora Stands Out

While the AI world has seen various implementations involving transformers and diffusion models, Sora represents one of the early successes of applying diffusion transformers specifically to video generation. This achievement showcases the ability of diffusion transformers to create video content that is not only visually appealing but also remarkably coherent and consistent over time. For more on diffusion transformers for video, check out the following paper: Latte: Latent Diffusion Transformer for Video Generation
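One ingredient behind the temporal coherence described above is self-attention, the mechanism transformers use to relate the elements of a sequence to one another. The sketch below is a simplified, single-head version in plain Python. It assumes identity query/key/value projections (real models learn these) and uses one hypothetical toy vector per frame, whereas video diffusion transformers typically attend over many patches per frame; Sora's exact architecture is not fully public. Each frame's output becomes a weighted mix of every frame, with the weights set by similarity.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections (a simplification)."""
    d = len(tokens[0])
    outputs, weight_rows = [], []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        weights = softmax(scores)
        # Output = attention-weighted average of all tokens (values are the tokens themselves here).
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
        weight_rows.append(weights)
    return outputs, weight_rows


# Hypothetical per-frame latent vectors: frames 0 and 1 are similar, frame 2 differs.
frames = [
    [1.0, 0.0, 0.5],
    [0.9, 0.1, 0.5],
    [0.0, 1.0, -0.5],
]
mixed, weights = self_attention(frames)
for i, row in enumerate(weights):
    print(f"frame {i} attends with weights {[round(w, 3) for w in row]}")
```

Because every frame can attend to every other frame, information about an object in one frame can shape how it is rendered in the others, which is one intuition for why attention-based models can help keep video consistent over time.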

Does Sora Truly Understand?

Sora's outputs are so impressive that they have sparked discussion about whether it learns complex world models. Its ability to maintain temporal coherence and object permanence suggests to some observers a deep understanding of the physical and visual elements that make up our reality. As Dr Jim Fan, a Senior Research Scientist at NVIDIA, put it in a recent tweet, “Sora learns a physics engine implicitly in the neural parameters by gradient descent through massive amounts of videos.”

Looking Ahead

As we stand on the brink of what seems like a new era of creative possibilities, the introduction of Sora by OpenAI underscores both the immense potential and the responsibilities that come with advanced AI technologies. Industries ranging from entertainment to education could be transformed by the ability to generate bespoke video content on demand. Yet the path forward requires a balanced and cautious approach, as it’s easy to imagine the many ways this technology could be misused (e.g., deepfakes and misinformation). So, while we harness the incredible capabilities of technologies like Sora for innovation and growth, we must take care to safeguard the fundamental pillars of trust and integrity that uphold our digital world.

Published by Jonas Macken, April 8 2024

Image Credit: OpenAI

Revolutionizing AI with Agentic Workflows: A Glimpse into the Future

In the ever-evolving world of Artificial Intelligence, we are constantly looking for new ways to boost our productivity with the help of AI. Today, thanks to the insights of computer scientist Andrew Ng, we can use agentic workflows to transform the way we interact with AI.

Traditionally, when we interact with AI models like ChatGPT or Claude, we provide a prompt and receive an answer. Andrew compares this to asking someone to write an essay in one go without ever hitting the backspace key: a daunting, if not impossible, task for most.

Agentic workflows, on the other hand, offer a game-changing alternative: a multi-step, reflective process in which the AI not only generates an initial output but also reviews and revises it, much like a human iterating on a draft.

By using an agentic workflow, an AI model can critique its own output and refine its results over a series of revisions. Much as students revise their essays or programmers fix bugs, this feedback loop can substantially improve the quality of the final output.
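Here is a minimal sketch of the generate, critique, revise loop described above, in Python. The three callables are hypothetical stand-ins: in a real agentic workflow each would be a call to an LLM (often the same model with a different prompt), and nothing here reflects a specific vendor API or Andrew Ng's own code. The toy critic checks two simple rules so that the loop runs deterministically.

```python
from typing import Callable, Optional

MISSING_WORD = "Mention the word 'agentic'."
MISSING_PERIOD = "End the sentence with a period."


def reflect_and_revise(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Optional[str]],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Draft once, then alternate critique and revision until the critic is satisfied."""
    draft = generate(task)
    for round_num in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:  # no issues found: stop early
            return draft, round_num
        draft = revise(task, draft, feedback)
    return draft, max_rounds  # gave up after max_rounds revisions


# Hypothetical rule-based stand-ins for LLM calls, so the example is deterministic.
def toy_generate(task: str) -> str:
    return "AI models revise their work"


def toy_critique(task: str, draft: str) -> Optional[str]:
    if "agentic" not in draft.lower():
        return MISSING_WORD
    if not draft.endswith("."):
        return MISSING_PERIOD
    return None


def toy_revise(task: str, draft: str, feedback: str) -> str:
    if feedback == MISSING_WORD:
        return "Agentic " + draft
    if feedback == MISSING_PERIOD:
        return draft + "."
    return draft


final, rounds = reflect_and_revise(
    "Write one sentence about agentic AI.", toy_generate, toy_critique, toy_revise
)
print(rounds, final)  # 2 Agentic AI models revise their work.
```

Capping the loop with `max_rounds` matters in practice: a model critiquing itself may never declare a draft perfect, so the workflow needs a stopping rule.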

Andrew has identified four key design patterns for agentic reasoning:

1: Reflection: The AI tool is prompted to evaluate its own output. When asked to review its work, a model can often catch its own errors and improve its output iteratively.

2: Tool Use: AI models, including large language models (LLMs), can use tools such as web search to improve their output. You might have already seen this in ChatGPT, for example in Code Copilot, one of the GPTs available on the platform.

3: Planning: AI models can devise a strategy, anticipate future actions, and make decisions in pursuit of a goal, much as a human would plan a solution to a complex problem.

4: Multi-agent collaboration: Much as people divide responsibilities when working in a group, multiple AI models can work together on a single prompt, each handling a different part of the task. Although tools for this pattern are still emerging, it can improve overall performance.
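The tool-use and planning patterns can be sketched together as a plan-and-execute loop: a planner turns a goal into an ordered list of tool calls, and an orchestrator runs each one. In this Python sketch the planner is hard-coded in place of an LLM, and the tool names (`uppercase`, `word_count`) and the `"$prev"` convention for feeding one step's output into the next are hypothetical choices for illustration, not any framework's API. A real agent would expose tools such as web search and let the model choose among them.

```python
from typing import Callable

# Hypothetical tool registry: the orchestrator may only call functions listed here.
TOOLS: dict[str, Callable[[str], str]] = {
    "uppercase": lambda text: text.upper(),
    "word_count": lambda text: str(len(text.split())),
}


def toy_planner(goal: str) -> list[dict[str, str]]:
    """Stand-in for an LLM planner: returns an ordered list of tool calls."""
    return [
        {"tool": "uppercase", "input": goal},
        {"tool": "word_count", "input": "$prev"},  # "$prev" = previous step's output
    ]


def run_plan(
    goal: str,
    planner: Callable[[str], list[dict[str, str]]],
    tools: dict[str, Callable[[str], str]],
) -> list[str]:
    """Execute each planned step in order, feeding outputs forward when requested."""
    results: list[str] = []
    previous = goal
    for step in planner(goal):
        tool = tools.get(step["tool"])
        if tool is None:
            # Report unknown tools instead of crashing, so a model could re-plan.
            results.append(f"error: unknown tool {step['tool']!r}")
            continue
        tool_input = previous if step["input"] == "$prev" else step["input"]
        previous = tool(tool_input)
        results.append(previous)
    return results


results = run_plan("agents can plan and use tools", toy_planner, TOOLS)
print(results)  # ['AGENTS CAN PLAN AND USE TOOLS', '6']
```

Returning errors as data rather than raising lets the calling model see what went wrong and adjust its plan, which ties tool use back to the reflection pattern.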

Andrew Ng's vision of agentic workflows marks a significant leap towards more intelligent, adaptive AI models. By harnessing the power of reflection and iteration, these workflows open up new possibilities for AI development, making the technology more useful, versatile, and, ultimately, more human-like in its problem-solving approach.

Published by Lucy, April 8 2024

🗣 Sponsors 🗣

The ambitious projects we aim to accomplish would not be possible without the support of our GOLD sponsor, UNOVA.

Closing Notes

As always, we welcome any and all feedback and suggestions for future topics. Share them here or email us at [email protected]

Stay curious,

🥫Sauces 🥫

Here you can find all the sources used in putting together this edition of The WatchTower: