Audience: This is a primer for those who are hearing a ton about AI, aren't exactly sure what it all means, and are looking for a framework for understanding it.
Abstract: Generative AI is as consequential a technology as we've seen in decades. Many are comparing this moment to the 1940s and the race to build nuclear weapons. While AI builds on trends we've been witnessing since the advent of the world wide web in the mid-1990s, there are significant features that make it a difference in kind more than a difference in degree. There are three primary questions this document will address:
- What is it? Generative AI is an example of the broader principle of math modeling. Understanding the equation f(x) = y goes a long way to comprehending how things like ChatGPT work.
- Why is this technology so important? Generative AI is a mathematical model encoded in software. Software has been “eating the world” for 75 years as it replaces and augments human effort. Generative AI takes this trend to new heights by harnessing the entirety of recorded information to potentially replace an entire class of human labor in our economy.
- What should concern us? Dystopian narratives of AI working against humanity have captured the public imagination. However, a more near-term concern is the concentration of generative AI resources and capabilities in the hands of a small number of Big Tech firms- particularly the four horsemen: Microsoft, Google, Facebook and Amazon.
Part I: What is it?
Background on Mathematical Modeling
As a first step to understanding AI, it helps to zoom out and begin with a broader concept of which AI is just a specific example: mathematical modeling. Math modeling is an effective way to take a complex system, reduce it to a handful of variables, and establish the relationships between them with equations. By isolating the essentials, a good model provides insight into a problem, which leads to better decision making. For example, the film Moneyball demonstrated how a mathematical model was used to put together an elite professional baseball team by selecting players based on underappreciated attributes (on-base percentage instead of the more commonly utilized batting average). In the early 2000s the Oakland A's utilized this approach to field a team that could compete with the resource-rich New York Yankees and Boston Red Sox at a fraction of the cost. In the media space, companies like Netflix demonstrated how a mathematical model could help determine the ingredients for a TV show likely to be a hit with certain audiences, thus reducing the risk of funds being poorly spent. In 2011, Netflix decided to pay $100 million for the rights to House of Cards without seeing a single episode, because they had a model that informed them that a political drama directed by David Fincher and starring Kevin Spacey would be a hit with many of their then 20 million subscribers.
Math modeling is about processing the information that is available to us at the moment and then predicting the future value of some variable of interest. Consider the simple equation below that you probably remember from high school algebra:
f(x) = y
This statement is just saying that a function f takes an element x of a set X and maps it to an element y of a set Y. We can think of f as defining some relationship between two variables of interest. A common example is the relationship between a person's age and their income: we can say in words that a person's income is an increasing function of their age. In the case of Netflix, we can think of X as the attributes of a television program- the subject matter, the director, the producer, the actors, the age-appropriate rating, etc. We can think of Y as the likelihood of success, or the number of accounts that will watch it in the first 3 months of release. And we can think of f as the way to translate what we know today (X) into what we would like to forecast (Y) tomorrow.
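To make this concrete, here is a minimal sketch of the Netflix-style f in Python. Every attribute name and weight below is invented purely for illustration- the point is only the shape of the mapping from X to Y:

```python
# A toy version of f(x) = y for the Netflix example.
# x is a dict of show attributes; y is a (made-up) predicted number
# of accounts that will watch in the first 3 months of release.
# Every weight below is invented purely for illustration.

def f(x):
    score = 0.0
    if x["genre"] == "political drama":
        score += 2.0
    if x["director"] == "David Fincher":
        score += 1.5
    if x["lead_actor"] == "Kevin Spacey":
        score += 1.0
    return score * 1_000_000  # map the score to a viewer forecast

x = {"genre": "political drama",
     "director": "David Fincher",
     "lead_actor": "Kevin Spacey"}

print(f(x))  # the model's forecast, y
```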
A good model is one that is able to take what we know today and come up with reliable predictions of future outcomes. Those are the success stories that you hear about- like Netflix and House of Cards. Similarly, companies like Google and Facebook are very good at learning a user's attributes- the X in this case is their demographics, their browsing history, their friends' demographics and browsing history- and the Y is the kind of advertisement that is likely to draw significant engagement from that particular user. Google and Facebook have a myriad of f's and more X (data about all of us) than any private enterprise in history, allowing them to provide exactly the audience that advertisers want. Similarly, Amazon is very good at taking those same attributes and determining which products you are likely to purchase. Competency in mathematical modeling turns out to be one of the most sought-after skills in our current economy.
In conclusion, the equation f(x) = y is very simple, but it turns out to provide the basic framework for understanding a whole class of problems, including generative AI. Most of the complexity and difficulty in mathematical modeling lies in finding a set of f's and X's and combining them in a way that produces good forecasts for Y. We call this process of fitting a model to data "training", and it is a term you will hear frequently when discussing AI models.
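To give a picture of what "training" looks like in the simplest possible case, here is a minimal sketch that fits the age-to-income example from above with a straight line. The numbers are made up for illustration:

```python
import numpy as np

# Made-up training data: ages (x) and incomes (y).
ages = np.array([25, 30, 35, 40, 45, 50])
incomes = np.array([40_000, 52_000, 61_000, 73_000, 80_000, 91_000])

# "Training" = choosing the parameters of f that best fit the data.
# Here f(x) = a*x + b, and polyfit finds a and b by least squares.
a, b = np.polyfit(ages, incomes, deg=1)

def f(x):
    return a * x + b

print(f(42))  # the trained model's income forecast for a 42-year-old
```

Real-world models have far more variables and far more flexible f's, but the workflow- pick a form for f, then calibrate it against data- is the same.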
An important thing to note is that the proliferation of data sources over the last couple of decades means that models being utilized today are far more complex than their predecessors. A good dataset for training a model may have had 500–1,000 observations at the turn of the century. It is common for datasets today to have millions of observations, which allows models of greater complexity to be calibrated. Thus, while the sheer volume of X's and the complexity of f's produce increasingly precise forecasts for an expanding set of Y's, the simple equation f(x) = y remains the conceptual grid that frames all of this research.
AI as an example of Mathematical Modeling
Armed with this tool, we will now see how generative AI is just an example of the broader discipline of mathematical modeling. There are many f's that have been deployed in the history of AI, but the current flavor of the month is the Large Language Model (LLM). Using the framework we developed above, LLMs like OpenAI's ChatGPT, Google's Bard and Facebook's Llama are all built on a certain type of f called a neural network. Neural networks are loosely inspired by our understanding of how human brains function, and they work by trying to predict the next word or character that is likely to follow what has already come prior. So using our parlance, the X is whatever text string is already present, the f is a neural network, and the prediction (Y) is the ensuing text string that ought to follow. A simple way to think about it is as a significantly expanded version of the autocomplete function you already know from various text, email and search applications.
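Here is a toy sketch of that predict-what-comes-next idea. Real LLMs use neural networks trained on vast corpora rather than the simple word counts below, but the shape- X in, f, prediction Y out- is the same:

```python
from collections import Counter, defaultdict

# Toy "training corpus". Real LLMs train on vast swaths of the Internet.
corpus = ("the greatest basketball player of all time is "
          "michael jordan and the greatest team of all time").split()

# Count, for each word, which words followed it and how often.
next_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_counts[current][nxt] += 1

def f(x):
    """Given a prompt (X), predict the word most likely to follow (Y)."""
    last_word = x.split()[-1]
    candidates = next_counts.get(last_word)
    return candidates.most_common(1)[0][0] if candidates else None

print(f("who is the greatest"))  # -> "basketball" (per the toy counts)
```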
I suspect you have had at least some interaction with a service like ChatGPT, Bard or Llama. The interface is one where the user asks a question- something like, "Who is considered the greatest basketball player of all time?". That question (X) is simply a string of characters as far as an LLM is concerned, and the response you get is each LLM's best guess as to the string of characters (sentences) that ought to follow. The response will depend on how each model was trained: if an LLM was trained on a set of documents that placed Michael Jordan of the Chicago Bulls as the greatest basketball player ever, that's what the response would be, along with an explanation that mirrors the logic of those documents. However, if another model was trained on a set of documents that placed LeBron James as the greatest of all time, the response to that same question would be LeBron James, along with the explanations embedded in those documents. For many of these LLMs, the training is done on publicly available information on the Internet, so both the accuracy and whatever biases appear to be present are a function of the quality and biases of publicly available Internet data.
What's amazing about the set of models released over the past year is that despite a very simple underlying process- trying to predict the next character that ought to follow whatever came prior- the results we get when we ask even complex questions of these LLMs sound not only reasonable, but are oftentimes quite accurate and largely valid. There are many anecdotes of the earliest researchers in the field being surprised at the quality of the output from the latest LLMs. Furthermore, the range of input prompts these models can take and the quality of the responses generated are improving at an extremely fast rate. If you had asked the state-of-the-art LLMs from 5 years ago the questions on a high school math or history exam, the results would not have been very good. But today, it appears that the very best LLMs are able to ace many written exams. I recently asked ChatGPT a bunch of questions about some very loaded topics in the realm of politics and foreign affairs (the 2020 election and the Arab/Israeli conflict), and the answers were far more nuanced and complete than the answers I typically get when asking the same questions of a human being.
IMO- what this implies is that the knowledge encapsulated in recorded language is well defined, and a computer that is properly trained can capture an astoundingly large portion of mankind's recorded knowledge. There is an open question as to whether or not these models are intelligent, and my general answer is no, they do not exhibit intelligence. As of this writing, state-of-the-art LLMs are more like calculators or advanced software programs like Mathematica- while they do things that no human could possibly do, they fall far short of the standards of intelligence of the typical human. A full articulation goes beyond the scope of this document, but I find the thoughts of one of AI's leading researchers, Yann LeCun of NYU/Facebook, on this matter quite persuasive.
As a pragmatic matter, this basic understanding of state-of-the-art LLMs is quite useful to anyone trying to get the most out of today's publicly available models. There is a burgeoning field called "prompt engineering", which is essentially about how to seed an LLM with the right information to "help" it generate the best response. In our parlance, this means giving a model the best possible X to elicit the Y that is most useful for the end-user. It's very similar to how an important skill for research analysts over the past twenty years has been knowing how to phrase a Google search so that the algorithm returns the most desired results.
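A sketch of the idea is below. The query_llm function is a hypothetical stand-in for whatever chat interface or API you happen to be using, not a real library call- the point is the contrast between a bare X and an engineered one:

```python
def query_llm(prompt):
    # Hypothetical stand-in for a call to a real LLM service.
    raise NotImplementedError("plug in your LLM of choice here")

# A bare prompt: the model must guess at context, audience, and format.
bare_x = "Explain inflation."

# An engineered prompt: the same question, seeded with the context
# that steers the model toward the Y the end-user actually wants.
engineered_x = (
    "You are an economics tutor for high school students. "
    "In three short paragraphs, using one concrete grocery-store "
    "example, explain what inflation is and why central banks care."
)

# Same f, different X, very different Y:
# query_llm(bare_x); query_llm(engineered_x)
```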
Part II: Why is this technology so important?
Now that we've established what generative AI is, we can tackle the question of what makes this technology so consequential. Generative AI is just applied math modeling encoded in software. So to understand why this technology is so important, we first need to understand the evolution of software, and how it became the core competency of the most valuable companies and organizations of the last 75 years.
In the beginning: 1948.
In 1948, the first program to run on a stored-program computer was successfully executed at the University of Manchester, England. It calculated the highest proper factor of the integer 262,144. Interestingly, this historic event took place shortly after the Manhattan Project was disbanded.
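For the curious, that first computation- which reportedly took the Manchester machine 52 minutes- can be reproduced today in a few lines of Python:

```python
def highest_proper_factor(n):
    # The largest factor of n that is smaller than n itself:
    # divide n by its smallest divisor greater than 1.
    for d in range(2, n + 1):
        if n % d == 0:
            return n // d

print(highest_proper_factor(262_144))  # -> 131072
```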
For the next 75 years…
Since that day, software has continued to evolve and grow to the point where nearly every aspect of our daily lives is mediated by it. Whether you are shopping, learning something new, consuming media, or hailing a ride- a piece of code is somewhere in the middle of the experience.
The main purpose of every piece of software ever written is to either replace or augment human effort. This bears repeating: the main purpose of every piece of code ever written is to either replace or augment human effort. Google Docs makes it easier to write and store documents than a typewriter and paper. Google Sheets makes the work of financial accountants and analysts much easier. Apps like Uber and DoorDash replace dispatchers with a few swipes and taps on your smartphone. Tasks that used to require countless man-hours, or were heretofore not possible, can be accomplished quickly and seamlessly with software. In the same way that the Industrial Revolution was about replacing physical human effort with machines powered by an energy source, software is designed to replace or augment work that requires mental energy.
It turns out that replacing or augmenting human effort generates a tremendous amount of economic value. Today, the most valuable firms and organizations in the world are masters of creating and deploying software at scale. Some of them build and sell software (Microsoft, Apple, Oracle) used by individuals and corporations, while others build software systems and deploy them towards the production of other goods and services (Tesla, Walmart, Amazon, Facebook, Google). Furthermore, the founders and executives of these companies are also among the wealthiest individuals and families in the world.
While the ingenuity that goes into designing and operating state-of-the-art software systems is quite impressive, even the most advanced software systems today accomplish rather simple, low-level tasks. They answer basic questions like "where's my package?". They can help with a hotel booking, a car rental, or a flight reservation. These systems essentially store and retrieve information. Even the latest programming languages like Solidity, which enables smart contracts on the Ethereum network, simply allow a set of business processes to be automated. Software remembers and recalls, but it cannot plan or reason. It is not intelligent.
As I stated in Part I, generative AI is also not intelligent in the classic sense of the term. However, what is remarkable about the currently deployed technology is its ability to harness the entirety of recorded information in a way that is accessible and usable when properly prompted. Anyone who is using programs like ChatGPT rightfully marvels at their ability to provide reasonably accurate and informed answers to a wide range of questions across a variety of fields of human knowledge. It is not difficult to draw a line from what we observe right now to what is possible as more and more sources of data, both public and private, are fed into these LLMs. We are on the cusp (if not already at the point) of seeing LLMs draw heretofore undiscovered insights in any field of interest, similar to how AI models trained to play high-dimensional games like chess and Go came up with novel strategies that surprised the very best human players, like Kasparov in chess and Lee Sedol in Go.
To the extent that the human effort and activity valued in the marketplace is essentially recorded knowledge brought to bear on a specific problem, a whole class of jobs and economic opportunities is ripe to be upended by this technology. The Industrial Revolution disrupted an entire class of laborers by replacing human physical energy with machines powered by hydrocarbons. Online retail disrupted an entire class of brick-and-mortar retailers by replacing an in-real-life (IRL) experience with a virtual one. In the same way, generative AI embedded in software has the potential to disrupt a whole class of knowledge workers whose value comes from years of schooling, training and experience. The recent Hollywood screenwriters strike was centered on exactly this issue. It is not difficult to imagine a world where a properly trained AI is substantially better at diagnosing physical health issues than even the best physicians, or an anthropomorphized AI providing mental health counseling in a way that makes a patient feel far more understood than a human could.
In the words of Elon Musk- Tesla CEO and OpenAI co-founder: "AI is the most disruptive force in history. We have something for the first time that is smarter than a human."
Part III: What Should Concern Us?
Many observers are comparing this moment in generative AI to the development of the atom bomb memorialized in the 2023 film Oppenheimer. A powerful technology is emerging, and there are calls from politicians and practitioners for it to be developed and stewarded in a responsible manner. Geoffrey Hinton, aka the Godfather of AI and a Turing Award winner, resigned from his post at Google over his concerns about the future, stating, "I'm sounding the alarm, saying we have to worry about this".
Concentration of Power.
Those of us who remember the 1984 film The Terminator have a mental picture of a dystopian world where machines rebel against their human creators and reduce mankind to a resistance movement against their AI overlords. While this particular risk has captured much of the public's imagination, I think we have a more near-term and pressing issue at hand. At the time of this writing, the expertise and resources required to further develop generative AI are concentrated in the hands of a small number of for-profit corporate entities. Specifically, the four horsemen of Big Tech- Microsoft, Google, Amazon and Facebook- substantially control the trajectory of this technology, either internally or through significant financial investments in entities like OpenAI and Anthropic. IMO, there is nothing overtly wicked or nefarious about these particular companies. However, the folks at the Center for Humane Technology put together a presentation called The AI Dilemma that highlights how AI's nearest predecessor technology- social media- was our first contact with AI, and it provides some insights into what happens when power and influence are concentrated in a small number of entities.
To single out Facebook and Google for a moment- both of these companies began with very admirable mission statements. Facebook’s mission statement was:
“To give people the power to build community and bring the world closer together”
No one argues against a desire to bring the world closer together. But we now know that in reality, Facebook, as a result of its fiduciary responsibility to shareholders, is hostage to the engagement-maximization algorithm. The best way to maximize engagement was to create echo chambers that further polarize and "create dividing walls of hostility" between different segments of society. In other words, by design the company operates in a manner that is the exact opposite of its intended purpose.
Similarly, Google’s company motto was:
“Don’t be evil. We believe that in the long term we are better served- as shareholders and in all other ways- by a company that does good things for the world even if we forgo some short term gains”
This motto was a reference to much of Silicon Valley in the 1990s referring to Microsoft as the "evil empire". However, Google as a public company also had to fulfill its fiduciary duty to shareholders, and it became the leading practitioner of what professor Shoshana Zuboff calls "surveillance capitalism". Google and firms like it in its ecosystem collect our personal data and transform it into prediction products and behavioral futures markets. They possess information about their 4.3 billion users that the kings and priests of yesterday could only dream of having. In the limit, the essence of what it means to be a free-thinking, autonomous person is challenged: they know almost everything about us, but we know little to nothing about them.
A barrier to entry: Increasing returns to scale.
Firms like Google and Facebook were able to achieve near-monopoly status in their respective domains because their business processes exhibited network effects and their production functions experienced increasing returns to certain inputs. The more users who joined Facebook, the more likely the marginal user was to join Facebook. The more data Google captures about an end-user, the more likely it is to show a relevant ad, which leads to more revenue and more investment in other ways to capture data about the end-user, and so on.
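One classic way to see the flavor of this dynamic is Metcalfe's-law-style arithmetic (a stylized sketch, not anything specific to Facebook's or Google's actual economics): the number of possible connections in a network grows much faster than the number of users, so each marginal user adds more value than the one before.

```python
# Stylized network-effect arithmetic: a network of n users has
# n*(n-1)/2 possible pairwise connections, so value (proxied here
# by connections) grows much faster than the user count itself.
def connections(n):
    return n * (n - 1) // 2

for n in [10, 100, 1_000, 10_000]:
    print(f"{n:>6} users -> {connections(n):>12,} possible connections")
```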
Generative AI has a similar underlying production process. The three main inputs into generative AI are compute, data, and engineering talent. As of this writing, the scale of computing infrastructure necessary to train LLMs is alone a significant deterrent to entry, and firms like Amazon, Google, Facebook and Microsoft already operate at massive scale to support their existing businesses. ChatGPT, with its nearly 200 million users, receives a tremendous amount of human feedback that improves its model, and thus it is acquiring important data at a faster rate than its peers. And finally, while much of the baseline knowledge for improving generative AI models is taught in universities and other academic settings, the frontier of this research is happening inside Big Tech, where that knowledge is a key asset legally protected by various employment agreements.
The antidote for these concerns?
A natural question is what can be done to prevent a replay of what we experienced over the past several decades with our first contact with AI. While government regulation is a common approach advocated by concerned parties, I believe that government regulation can quickly devolve into government overreach that will either stifle innovation or result in regulatory capture that protects the incumbents. Another approach is to adopt mechanisms rooted in decentralized systems, where power accrues to a protocol more so than to a particular company. This is where innovation centered on the blockchain (aka the distributed ledger) and open-source technologies provides a way forward. A fuller exposition goes beyond the scope of this article, but stay tuned, as I will have more to say about this in the near future.