The Story of Modular

Revolutionising the nature of AI programmability, usability, scalability & compute!

Elemento
7 min read · May 30, 2023

Have you guys ever heard of Modular? Well, I hadn’t until 2nd May 2023, when the teaser for their launch event was unveiled on YouTube, carrying some seemingly unrealistic claims:

  • The world’s fastest unified AI inference engine
  • A new way to unlock hardware
  • A programming language that gives Python super powers

This teaser was more than enough to capture the attention of every AI developer, and the statistics bear this out: the official YouTube channel of Modular had only 4.9K subscribers, yet the views on the product launch video exceeded 100K (at the time of writing this blog).

So, in this two-part blog, we are going to set off on a journey together to explore the universe of Modular. In this first part, we are going to explore how Modular came into existence, who its founding members are, and what they have to offer to the AI community.

The History

Founded in January 2022 by Chris Lattner and Tim Davis, Modular is a for-profit, next-generation AI developer platform that unifies the development and deployment of AI for the world. Headquartered in Palo Alto, California, United States, it has raised a total of $30M in funding to date from Google Ventures, Greylock, Factory and SV Angel.

It all started when Chris and Tim met at Google, and felt that AI was being held back by overly complex and fragmented architecture. Motivated by a desire to accelerate the impact of AI on the world by lifting the industry towards production-quality AI software, they founded Modular.

P.S. I have borrowed these details from Modular’s Crunchbase page and its official website.

The Team


The founders of Modular, Chris Lattner (Co-Founder and CEO) and Tim Davis (Co-Founder and Chief Product Officer), are among the top AI infrastructure visionaries, and they believe that “fixing” AI as it stands would amount to little more than a patch. Instead, they are working towards a modular rewrite of the entire software layer underlying the AI applications that assist us today.

Chris Lattner

He has been responsible for founding and scaling critical infrastructure including LLVM, Clang, MLIR, Cloud TPUs and the Swift programming language.

  • LLVM — It is often compared to the GNU Compiler Collection (GCC) (installed to compile and run C/C++ programs), but while GCC is a compiler for a fixed set of programming languages, LLVM isn’t a compiler for any given language. Rather, it is a framework for generating optimised machine code from any source language that has a front end targeting it (a small illustration follows this list). Note that LLVM is not an acronym, but the name of the project itself.
  • Clang — It is a compiler front end used to compile programming languages such as C, C++, Objective-C and Objective-C++ into machine code. It uses LLVM as its back end.
  • MLIR — The Multi-Level Intermediate Representation (MLIR) compiler framework is used to build reusable and extensible compiler infrastructure and to reduce code duplication.
  • Cloud TPUs — They are designed to run cutting-edge machine learning models with AI services on Google Cloud.
  • Swift — Built with the LLVM compiler framework, it is a high-level, general-purpose, multi-paradigm, compiled programming language developed by Apple Inc. and the open-source community. It was developed as a replacement for Apple’s earlier programming language, Objective-C.
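
To make LLVM’s role a little more concrete, here is a minimal sketch in Python. It assumes the third-party llvmlite package (the LLVM binding used by Numba), which is my choice for illustration and not something Modular uses; it builds the LLVM IR for a trivial add function, the kind of language-agnostic intermediate form that front ends like Clang hand to LLVM for optimisation and code generation.

```python
# A minimal sketch: building LLVM IR from Python via the third-party
# llvmlite package (pip install llvmlite). Front ends like Clang emit
# IR of exactly this kind, which LLVM then optimises and lowers to
# machine code for the target hardware.
from llvmlite import ir

# An LLVM module is a container for functions and globals.
module = ir.Module(name="demo")

# Declare the signature: i32 add(i32, i32)
int32 = ir.IntType(32)
fn_type = ir.FunctionType(int32, [int32, int32])
fn = ir.Function(module, fn_type, name="add")

# Fill in the body: a single basic block that returns x + y.
block = fn.append_basic_block(name="entry")
builder = ir.IRBuilder(block)
x, y = fn.args
builder.ret(builder.add(x, y, name="sum"))

# Print the generated, human-readable LLVM IR.
print(module)
```

Running this prints the textual IR for add; LLVM’s tooling can then optimise it and lower it to object code for any supported target.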

Tim Davis

He helped found, build and scale large parts of Google’s AI infrastructure at Google Brain and Core Systems, spanning APIs (TensorFlow), compilers (XLA & MLIR), runtimes for servers (CPU/GPU/TPU) and TF Lite (mobile/micro/web), Android ML & NNAPI, and large-model infrastructure & OSS for billions of users and devices.

  • TensorFlow — It is an end-to-end machine learning platform for creating production-grade machine learning models.
  • XLA — Accelerated Linear Algebra (XLA) is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes (see the short example after this list).
  • TF Lite — It is a library for deploying models on mobile devices, microcontrollers and other edge devices.
  • Android ML — It brings Google’s machine learning expertise to mobile developers in a powerful and easy-to-use package.
  • NNAPI — The Android Neural Networks API (NNAPI) is an Android C API designed for running computationally intensive machine learning operations on mobile devices, enabling hardware-accelerated inference on Android devices.
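
To give a feel for how XLA plugs into TensorFlow, here is a small sketch: the same toy computation run eagerly and then compiled with XLA via tf.function(jit_compile=True), which lets XLA fuse the operations into optimised kernels without any change to the model code itself.

```python
# A small sketch of XLA in TensorFlow: jit_compile=True asks XLA to
# compile the traced computation into fused, optimised kernels.
import tensorflow as tf

def dense_layer(x, w, b):
    # A toy computation: matmul + bias + activation. XLA can fuse
    # these three ops into a single kernel.
    return tf.nn.relu(tf.matmul(x, w) + b)

# The same function, compiled with XLA. No source changes needed.
compiled_dense_layer = tf.function(dense_layer, jit_compile=True)

x = tf.random.normal([64, 128])
w = tf.random.normal([128, 256])
b = tf.zeros([256])

eager = dense_layer(x, w, b)            # ordinary eager execution
fused = compiled_dense_layer(x, w, b)   # XLA-compiled execution

# Both paths compute the same result.
print(tf.reduce_max(tf.abs(eager - fused)).numpy())
```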

If knowing more about compilers and how they work intrigues you, do check out the blogs linked below.

The Offerings

The world’s fastest unified AI inference engine

A single engine consolidating distinct AI toolchains to simplify AI deployment (Source: official website of Modular)

In today’s scenario, AI teams face a multitude of problems. Teams working with different frameworks have to create and maintain individual pipelines for each framework. Teams targeting different platforms (large-scale servers, workstations, edge devices, etc.) have to deal with different tools and apply the same optimisations over and over again. And teams are developing and deploying ever larger models to improve the user experience, at the expense of inference latency and heavy runtime costs. This highly complex and fragmented ecosystem is hampering AI innovation and holding back the AI community as a whole.

In order to tackle this, the team at Modular developed a modular inference engine. In essence, it is a powerful combination of state-of-the-art compiler and runtime technologies that unifies multiple AI frameworks (TensorFlow, PyTorch, etc.) and devices (servers, mobile devices, etc.). It enables us to rely on a single execution engine for all our workloads, and, through a single simple API, to run models trained with different frameworks. Read more about it here.
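
Modular had not published the engine’s API at the time of writing, so the sketch below is purely hypothetical: every name in it is invented for illustration. It only captures the shape of the idea the team describes: one engine, one API, models from any framework.

```python
# Purely hypothetical sketch -- Modular's actual API was not public at
# the time of writing; every name here is invented for illustration.
# The point is the shape of the idea: one engine, one API, models
# trained in any framework.

class UnifiedInferenceEngine:
    """Illustrative stand-in for a unified inference engine."""

    def load(self, model_path: str) -> "UnifiedInferenceEngine":
        # A real engine would detect the model format (TensorFlow
        # SavedModel, TorchScript, ONNX, ...) and compile it once
        # for the hardware it is running on.
        print(f"Compiling {model_path} for the local hardware...")
        return self

    def execute(self, inputs: list) -> list:
        # One execution path, regardless of the source framework.
        print(f"Running inference on {len(inputs)} inputs")
        return [0.0 for _ in inputs]

# The same two calls serve models from different frameworks:
engine = UnifiedInferenceEngine()
engine.load("resnet50_savedmodel").execute([1, 2, 3])
engine.load("bert_torchscript.pt").execute([4, 5, 6])
```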

A new way to unlock hardware

We are living through a time of unprecedented hardware growth. From CPUs to GPUs to TPUs to other specialised accelerators, there is a profound increase in the variety of new hardware platforms and AI compute capabilities. But taking advantage of these new capabilities isn’t trivial. The primary reason is that each hardware vendor builds its own bespoke toolchain and programming model that works best for its hardware, each riddled with its own bugs, error messages and limitations. This does not work at scale.

In order to overcome this, the team at Modular developed a unified software platform for AI hardware. In essence, it is a platform that makes it easy for application developers to migrate their models to new hardware. With a single set of tools and a single programming model, AI developers can deploy their models across a wide variety of hardware platforms, enabling the hardware community to speed up its innovation cycle. Read more about it here.
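
Again in purely hypothetical terms (nothing below is Modular’s actual tooling), the promise is that retargeting a model becomes a configuration change rather than a switch to a new vendor toolchain:

```python
# Purely hypothetical sketch: with a unified software stack, moving a
# model to new hardware is a parameter change, not a new toolchain.

def deploy(model_path: str, target: str) -> None:
    # One set of tools; the hardware target is just configuration.
    supported = {"cpu", "gpu", "tpu", "custom-accelerator"}
    if target not in supported:
        raise ValueError(f"Unknown target: {target}")
    print(f"Compiling {model_path} for {target} with the same toolchain")

# The same deployment call covers every platform:
for target in ("cpu", "gpu", "tpu", "custom-accelerator"):
    deploy("my_model", target)
```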

A programming language that gives Python super powers

Speed comparison of different programming languages. Algorithm: Mandelbrot; instance: AWS r7iz.metal-16xl (Intel Xeon). (Source: official website of Modular)

Another significant gap that existing AI systems struggle with is programmability. When state-of-the-art models are moved from the research phase to the production phase, developers are usually compelled to rewrite large portions of their models in languages more performant than Python in order to meet latency and cost targets. Creating and maintaining a code base containing fragments in different programming languages heavily impacts team collaboration and productivity. But why must we tread down this path? It all starts with Python: a powerful high-level language, equipped with a clean and simple syntax and an expansive ecosystem of libraries. At the same time, however, it lacks the scalability, performance and compatibility (across platforms and devices) demanded by the cutting-edge models of today.

In order to deal with this, Modular came up with one scalable language in which we can write everything, be it model code, system code or hardware code. Extending Python to become more magical, the Modular team introduced Mojo, which incorporates all the strengths of Python that researchers love and adds the systems programming features that production engineers love but Python misses. In other words, it brings the best of both worlds: research and production. Read more about it here.
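
For reference, the benchmark in the chart above is based on the Mandelbrot set. A plain-Python kernel for it looks something like the sketch below (my illustration, not Modular’s benchmark code): exactly the kind of clean but slow numeric loop that Mojo aims to run at near machine speed.

```python
# A plain-Python Mandelbrot kernel: the kind of clean but slow numeric
# loop used in benchmarks like the one above. Each point iterates
# z = z*z + c until it escapes or the iteration budget runs out.

def mandelbrot_point(c: complex, max_iter: int = 200) -> int:
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i      # escaped after i iterations
    return max_iter       # assumed to be inside the set

def mandelbrot_grid(width: int, height: int) -> list:
    # Map pixel coordinates onto the complex plane [-2, 1] x [-1.5, 1.5].
    return [
        [
            mandelbrot_point(complex(-2 + 3 * x / width, -1.5 + 3 * y / height))
            for x in range(width)
        ]
        for y in range(height)
    ]

counts = mandelbrot_grid(80, 40)
print(sum(map(sum, counts)))  # crude checksum of the iteration counts
```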

In the second part of this blog, we are going to delve deeper into the world of Mojo, so stay tuned for that.

Conclusion 👋

At first glance, it seems like Modular is going to bring about a revolution in AI as we perceive it today. From ease of deployment, to hardware unification, to enhanced programmability, Modular aims to bring about significant advancements in each. Let me know in the comments section whether these offerings will help you be more productive, and if so, how.

I really hope that you liked this blog, and if you did, do put your hands together 👏. If you would like to read more blogs, then #StayTuned. Connect with me on LinkedIn and follow me on Twitter, and if you have suggestions on how to improve this blog, just send me a DM, and I will be more than happy to incorporate them.


