The WatchTower: 13th Edition

Welcome to the captivating world of Artificial Intelligence! 🤩

Hello innovators and curious minds of AI Soc!

Welcome to the 13th Edition of The WatchTower, your quintessential guide to the riveting world of AI. With exams approaching, the team at AISoc wishes you the best of luck and trusts that you’re all making the most of your personalised AI tutors (e.g., ChatGPT) to ace them!

For the best possible viewing experience, we recommend viewing this edition online.

📰 Featured in This Edition:

  • Devin - The First AI Software Engineer

  • Introduction to Large Language Models (LLMs) - Masterclass Review

Image Credit: OpenAI

Devin – The First AI Software Engineer

Last month, Cognition Labs unveiled Devin, billed as the first AI software engineer: an autonomous system that can plan and execute complex software engineering tasks, designed to mimic the problem-solving skills of a seasoned engineer. With the capability to autonomously find and fix bugs in codebases, this groundbreaking advancement is reshaping the landscape of software development while also raising fears that it could take the jobs of its creators.

 

What can Devin do?

Devin is an autonomous agent that can plan, analyze, and execute complex software engineering tasks. Like other large language models (LLMs), Devin can understand natural language and translate it into functional code, so users can simply type a prompt into the interface to start a project.

Devin first creates a detailed step-by-step plan to tackle the problem, gathers information from websites with its own browser, and writes code. It is also capable of fixing issues, testing, and reporting on its progress in real time. Users can jump into the chat interface at any time to give new commands and guide the process. This capability makes Devin a powerful teammate that can work alongside human engineers or independently.

Currently, Devin can handle a wide range of tasks, including building and deploying apps end-to-end, finding and fixing bugs in codebases autonomously, and training and fine-tuning AI models.

 

How good is Devin?

Devin was evaluated on SWE-bench, a dataset of 2,294 issues and pull requests drawn from 12 popular Python repositories that tests a system’s ability to resolve GitHub issues automatically. Devin achieved an impressive success rate of 13.86% on these problems, significantly surpassing previous models such as GPT-4 (1.74%) and Claude 2 (4.80%). Unlike other large language models that may require human intervention, Devin can break a project down into smaller tasks and handle them independently.
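To put those percentages in perspective, here is a back-of-the-envelope sketch converting the reported SWE-bench pass rates into approximate counts of resolved issues out of the 2,294-problem benchmark (rounded; the rates are the ones quoted above):

```python
# Convert SWE-bench pass rates into approximate numbers of
# resolved issues on the 2,294-problem benchmark (rounded).
TOTAL_ISSUES = 2294

pass_rates = {
    "Devin": 0.1386,
    "Claude 2": 0.048,
    "GPT-4": 0.0174,
}

for model, rate in pass_rates.items():
    solved = round(TOTAL_ISSUES * rate)
    print(f"{model}: ~{solved} of {TOTAL_ISSUES} issues resolved")
```

So a ~14% pass rate means roughly 318 real GitHub issues closed autonomously, versus about 40 for GPT-4, which is what makes the gap feel so large in practice.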

Outlook

Devin is without doubt a huge stride forward in the generative AI realm, poised to reshape the software development field by automating coding tasks and tackling complex problems. Scott Wu, the CEO of Cognition, says the team is trying to build “tireless, skilled AI teammates” that assist humans so engineers can focus on more interesting problems. There has been heated debate over whether Devin will eventually replace most software engineers. It is probably too early to tell, but one thing is for sure: Devin is still learning, and we will see Devin 2 not long from now.

Published by David Hung, April 15 2024

Image Credit: OpenAI

Introduction to Large Language Models (LLMs)

Last week, AISoc held an introductory workshop on Large Language Models (LLMs), laying the foundation for understanding and using these powerful AI tools. The workshop covered the fundamentals of LLMs, emphasizing their crucial role in enabling machines to perform human-language tasks with an advanced level of proficiency.

Language, being a defining human ability, requires sophisticated AI algorithms for machines to interpret and communicate effectively. The workshop outlined the evolution of language modeling, from basic frameworks to advanced neural network-based approaches, highlighting the technological strides that have made current models possible.

The applications of LLMs span a wide array of tasks including summarization, text generation, translation, text classification, and question answering, showcasing the versatility and impact of LLMs in various domains.

Optimization techniques such as fine-tuning, instruction tuning, and prompt engineering were also discussed. These methods are crucial for enhancing the performance of LLMs, enabling them to adapt dynamically and effectively to specific tasks or requirements.
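As a small taste of prompt engineering, one common trick is the few-shot prompt: showing the model a couple of worked examples before the real input. The sketch below assembles such a prompt as a plain string; the sentiment-classification task and the example reviews are our own invented illustration, not material from the workshop, and the resulting string could be sent to any LLM chat API:

```python
# A minimal few-shot prompt template for sentiment classification.
# The task, example reviews, and labels are invented for illustration.
FEW_SHOT_EXAMPLES = [
    ("The battery lasts all day, love it!", "positive"),
    ("Screen cracked after one week.", "negative"),
]

def build_prompt(review: str) -> str:
    """Assemble an instruction, worked examples, and the new input."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # Leave the final "Sentiment:" open for the model to complete.
    lines.append(f"Review: {review}")
    lines.append("Sentiment:")
    return "\n".join(lines)

print(build_prompt("Fast shipping and great quality."))
```

The point of the pattern is that the examples steer both the task and the output format, so the model's completion after the trailing "Sentiment:" is far more likely to be a single clean label.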

An empirical evaluation segment underscored the importance of testing LLMs across several metrics: their prowess in language generation, their utility in leveraging knowledge for tasks like question answering, and their capacity for complex reasoning, from symbolic reasoning to solving mathematical problems.

Overall, the workshop provided an overview of LLMs, covering their development, applications, and evaluation, and reflecting the ongoing advancements and the breadth of possibilities these models offer for the future of AI and language processing.

We are looking forward to seeing more new members attend our workshops and join us in exploring the exciting field of Large Language Models.

Published by Ziming, April 15 2024

🗣 Sponsors 🗣

Our ambitious projects would not be possible without the support of our GOLD sponsor, UNOVA.

Closing Notes

As always, we welcome any and all feedback and suggestions for future topics here, or email us at [email protected].

Stay curious,

🥫Sauces🥫

Here, you can find all sources used in constructing this edition of WatchTower: