Offshoring and AI Agents

Plus! Capital Structure; Designer Brands and Transaction Costs; SMBs; Top-Down Market Reform; Chosen Obligations

In this issue:

  * Offshoring and AI Agents
  * Capital Structure
  * Designer Brands and Transaction Costs
  * SMBs
  * Top-Down Market Reform
  * Chosen Obligations

Today's free post is brought to you by our sponsor, Mainshares.

Offshoring and AI Agents

This piece is a guest post from TW Lim of Antithesis, which is building amazing things. For more on them, see this Diff profile.

Coding assistants are frequently compared to humans we’re still shaping – children, or high schoolers, or interns – and I’ve worked with a fair few of these, because I used to run a fancy restaurant. Every few months, we’d get a stagiaire coming through. A stagiaire is the restaurant equivalent of an intern, raw and wide-eyed. With any luck, they’d be eager to learn, and with even more luck, they’d actually be capable of working somewhat independently in the kitchen. So I’d take these stagiaires and find something useful for them to do, like slicing onions.

On one level, what slicing onions involves is:

  1. Take onions, cut into 1/16” slices. 

In an ideal world, the stagiaire will take my bag of onions, go away, and return with a pile of sliced onions, saving me the 15 minutes it would have taken to do it myself. In reality, though, this apparently simple task is a minefield. A few things that I’ve seen go wrong:

  1. Onions got sliced into the wrong shape (rings instead of sticks, half rings instead of full rings).
  2. Huge mess in the prep station, onion detritus everywhere, other cooks pissed.
  3. Stagiaire used a knife instead of the mandoline, returned with 1/4” slices with 100% variance in thickness, not nice, even 1/16” slices.
  4. Stagiaire used the mandoline instead of a knife, cut off fingertips, blood everywhere.
  5. Stagiaire used the finger guard on the mandoline, tiny plastic chips in the onions (but no blood).
  6. Stagiaire used the meat slicer, failed to clean it (or tell anyone).

You can see where this is going. If all I care about is getting onions subdivided, this is an easy job to specify. It’s even relatively easy to tighten up my spec, so I know they’re going to come back in the right shape:

  1. Take onions, cut into 1/16” slices like for onion rings, using a mandoline.

But if I care about things like there not being a mess in the kitchen, or not having to drive the stagiaire to the emergency room, or the trimmings being saved for stock, then I have to specify all that too. These specifications get much more complex, because doing the job the way this particular kitchen does it involves more tacit knowledge than explicit knowledge. All this is stuff my sous chef just knows, but needs to be spelled out for a stagiaire (except maybe the part about not cutting your fingers off).

At this point you can probably see the parallel. You might not need to write the function/slice the onions yourself, but unless you lay out all your tacit requirements – everything you’d expect a human colleague to take into consideration when given the same assignment – you might not be delighted with the result. And knowing what to specify is a different skill than actually writing code. (In fact, it’s one of the skills that contributes to the enormous variance in how productive different people are in jobs that are all nominally about “writing code.”)
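
To make the parallel concrete in code rather than onions, here’s a minimal sketch; the chunk function and its requirements below are invented for illustration, not drawn from any real assignment. The one-line ask is trivially easy to satisfy, while the things a colleague would just know (don’t mutate the input, decide what the edge cases do, fail loudly on nonsense) have to be written down somewhere, here as tests.

```python
# A hypothetical "slice the onions" of code: the one-line ask is
# "write a function that splits a list into chunks of a given size."
def chunk(items, size):
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


# The tacit requirements, made explicit as tests. None of these were in the
# one-line ask, but a colleague working in this codebase would be expected
# to satisfy all of them anyway.
def test_chunk_tacit_requirements():
    # Everything comes back, in order, in the right "shape."
    assert chunk([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]

    # The kitchen is left clean: the input is not mutated.
    original = [1, 2, 3]
    chunk(original, 2)
    assert original == [1, 2, 3]

    # Edge cases someone has to decide on: empty input, oversized chunks.
    assert chunk([], 3) == []
    assert chunk([1], 10) == [[1]]

    # Nonsense input fails loudly instead of returning something surprising.
    raised = False
    try:
        chunk([1, 2, 3], 0)
    except ValueError:
        raised = True
    assert raised


if __name__ == "__main__":
    test_chunk_tacit_requirements()
    print("all tacit requirements satisfied")
```

The tests themselves are mundane; the point is that someone has to know which of them to write, and that knowledge lives in people’s heads until it gets spelled out.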

If, like me, you remember when offshoring the bulk of our coding seemed like a great idea, this is probably sounding doubly familiar, because the same tacit-knowledge problem shows up in offshoring relationships as well. The promise was exactly the same – offshoring would cause developer productivity to skyrocket, development costs would plummet, and we’d see a thousand techno-flowers bloom. But we soon discovered that specifying what to build took almost as much time as actually building it, and underspecification would result in surreal nightmares. We learned that we had to specify what happened in every single failure state, because that wasn’t something offshore teams just knew. We learned that we had to specify which libraries to use, how and where to log, which external APIs needed to be checked for updates when.

And even when we specified everything we thought could possibly be specified, we’d sometimes get things back that met the acceptance criteria and also took the system down on every 57th run. Or the demo would go great and the madness lay under the hood, linked lists in unnatural places, applications written without a single library, data structures designed by H.R. Giger. Or maybe the shortcomings were subtler, just a slight difference of conventions, of coding styles, that mark the code as “not something we wrote here,” and then over time as the application accumulates new features and bits of cruft, the divergence grows greater and greater and the folks who originally worked with the offshore team leave—taking their mental spec, and all the refinements they’ve made to it since, with them—and sooner or later the application becomes hard enough to follow that it’s now technical debt. With enough bad outsourcing experiences, you start to long for some structured and completely unambiguous way to explain what a program is supposed to do—and then you realize that this exists, and it’s called a programming language.

If what we’re doing is outsourcing the slicing of onions to a stagiaire, it’s relatively easy to verify. Do you have a bin of onions sliced to your liking? Is there a mess in the kitchen? Is the stagiaire bleeding? Great – we’re done. Or are we? Were those the last onions, or are the vegetable trimmings you were saving for stock now in the compost, or has some other less obvious disaster occurred? Even then, there are only so many places in a kitchen to check. But what if we’re not making French onion soup, but a complex distributed system? How do we verify that there isn’t a mess somewhere?

All three of these helpers – the stagiaire, the offshore team, and the coding assistant – have shifted the nature of the work, from doing to specifying, from building to teaching, from creating to verifying. If you’re a lead engineer or CTO, you’re probably not making a lot of commits yourself anyway, but engineers who generate a lot of code with AI will find themselves doing more of this too. The change will be asymmetrical: we’ll be spending more time reviewing code, while we’re not necessarily going to spend that much more time problem solving or building new stuff. The profession, in other words, is going to get a lot more review-intensive.

We’re not going to be spending that much more time problem solving because coding assistants as they exist today haven’t really changed that part of the work. Even before ChatGPT came on the scene, having clear goals and a solid vision of how those broke down into discrete functions was a terrific indicator of whether your software project would succeed. Many teams and many companies don’t allot enough time to this part of software engineering, but the nature of what needs to be done remains the same. 

Right now most engineers aren’t trying to get AI assistants to write large chunks of software at once – rather, they’re feeding them requests for one function at a time, or relying on a kind of high-grade autocomplete native to their coding environment. There are companies like Cognition.ai out there aiming to change this, to build coding AIs able to execute complex engineering tasks, and if we get to that point, the way we think about software specification, and the amount of time we spend doing it, will most likely change. But for now, today’s function-level assistants won’t change the way humans relate to the work of planning what they want software to do. 

However, AI assistants will likely make the work of knowing what exactly is in your codebase harder, for several reasons – none of which have to do with the well-documented tendency for AIs to hallucinate weird code strings and phantom methods in phantom libraries, and all of which have to do with the practice and organization of software teams.

The first, most basic issue is volume. GitHub, for instance, cites massive increases in throughput when developers use GitHub Copilot – 55% faster coding, a 25% increase in developer speed, and a 70% increase in pull requests. However you look at it, that’s just a lot more code to review. They also cite a 67% decrease in median code review turnaround time, so maybe this largely evens out, but I think that story is incomplete, because…

Second, more code will be written by folks who are less familiar with what they’re writing. A major reason developers love AI assistants is that the assistants help them deal with unfamiliar problems faster than Stack Overflow or web search can, which enables them to both educate themselves and tackle a wider range of projects. Most software companies already expect devs to figure out how to do things they don’t already know, but there’s a difference between giving someone time to really figure something out, and expecting that AI will help them almost instantly come up with a solution that they may be seeing for the first time, for a codebase they may not fully understand either. In addition, more code will be contributed by people whose core job is not software development, because AI assistants can take an intention expressed in natural language and convert that to code. This is a great indication of how well they work, but it also ensures that the average contributor to a codebase will have less familiarity with it. Whether this pressure to work at the edge of your knowledge will extend to reviewers and repo owners will probably depend on the particular team and work environment, but the underlying direction is pretty clear.

Third, reading code is just harder than writing code. Reading code requires getting into someone else’s head, because you’re trying to understand a slightly different dialect of the same language – dialects which can differ not just cosmetically, but in some pretty deep ways, like logic, construction, and framing. If you’re a code reviewer, think about every time you’ve had to walk over to someone’s desk and ask (nicely) what they were thinking, why they put something together a certain way. Think about the kind of answers you’d get from an AI assistant when you walk over to its desk. 

While you’ll probably learn your colleagues’ differing mental models over time, it might be harder to form a similar mental model of your AI assistant, and harder still to know what your AI assistant’s mental model of your system might be. And if your colleagues are all using AI assistants, how does that affect your ability to form and update your mental model of your system? All these things affect how easily you can review code, and I suspect AI assistants make all of these more challenging. 

Fourth, all of this is subject to the autopilot problem: A 99% reliable system can actually be safer than a 99.999% reliable one, because when errors are sufficiently rare, it's hard to pay close enough attention to spot them. This is especially true if the failure isn’t obvious to another human in the first place, and the way AI assistants work means most of their errors are either glaringly obvious to a human being, or subtle enough to get past a human. Finding these errors, by definition, is review-intensive, especially in a distributed system, where a perfectly logical line of code can, in the right context, create a problem.

Fifth and finally, I suspect that in the long term, there’s going to be a fundamental shift in the way developers learn their craft. Not everyone loved using assistants to write whole functions, but I couldn’t find anyone who disliked in-environment autocomplete. And no one complained about unit tests either – I don’t think anyone regards writing unit tests as anything other than pure drudgery, and fortunately AI assistants are terrific at them.
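
For a sense of what that drudgery looks like, here’s a small hypothetical example (the parse_price helper and its cases are invented here, not taken from any real codebase): one tiny parsing function and the table of inputs and expected outputs that exercises it, exactly the kind of mechanical enumeration an assistant will happily grind through.

```python
# A hypothetical example of unit-test drudgery: a small parsing helper and
# the table-driven cases that exercise it.
def parse_price(text):
    """Parse a price string like '$1,299.50' into a float."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    return float(cleaned)


def test_parse_price():
    # Each case pairs a raw input with the value we expect back.
    cases = [
        ("$5", 5.0),
        ("$1,299.50", 1299.5),
        ("  $0.99 ", 0.99),
        ("100", 100.0),
    ]
    for raw, expected in cases:
        assert parse_price(raw) == expected


if __name__ == "__main__":
    test_parse_price()
    print("ok")
```

Nothing in it is hard; it’s just the kind of case-by-case enumeration that eats an afternoon when done by hand.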

On some level, I don’t particularly want to slice another 50 lb bag of onions for French onion soup ever again. On another level, I can’t imagine trying to make something elaborate, like a pâté en croûte, without having sliced a lot of onions first, working without a foundation of ingrained, basic knowledge. As we rely more and more on our AI assistants to autocomplete things for us, it becomes more and more like cooking without knowing how to use a knife (but in this scenario, you’re still very much a person who’s holding a knife and would prefer to keep having ten fingers).

It’s easy to see how all these factors mutually compound. I'm not arguing that machine-written code is not as good as handmade code, because I hope AI coders will one day be as good as, or better than, humans. At that point, the change I laid out earlier will probably feel a lot more complete – software engineering will be much more about specifying and reviewing than actually writing code. But in the meantime, we’ve taken another step in the long march towards abstraction that started with the first compiler, and it’s up to us, the humans, to make sure there aren’t problems hiding in the abstractions.

A Word From Our Sponsors

Invest in the next generation acquiring American SMBs.

Over the next decade, an estimated $10T of small business value will be transferred to the next generation. These businesses are the backbone of our country. They employ over 60M Americans, generate 43.5% of our GDP, and are consistently ranked the most trusted institution in the nation.

But their owners have limited exit options. They can sell to an internal successor for pennies on the dollar, or sell out to a mediocre private-equity shop that may fire their employees and tarnish their reputation.

At Mainshares, we are backing the next generation of American owner-operators keeping these businesses locally owned. We train them on acquisitions, help them finance deals with the backing of investors, and provide ongoing tools and support as they grow.

Investors join Mainshares to generate cash flow and grow wealth, while backing the next generation of American entrepreneurs.

Elsewhere

Capital Structure

Public market investors who want to bet on AI have three basic options:

  1. Large tech companies with the resources to develop and deploy AI tools, but who also face the risk that someone else will build an AI-first version of their product before the AI-second version is ready.
  2. Smaller companies that have refashioned themselves as purer bets on AI, or at least that have a narrative that this is true.
  3. Nvidia.

The AI trades that involve betting on a company that is mostly doing AI, and that isn't an attempt to gin up investor interest ahead of a secondary offering, are mostly restricted to private markets. And any time investors' preferred investment is outside their asset-class habitat, there's an opportunity. So SuRo Capital shares rose 22% on Friday after the company disclosed investments in OpenAI and CoreWeave ($, WSJ). Aficionados of this particular kind of market history may recognize SuRo Capital from one of its previous incarnations, NeXt Innovation, which also had the strategy of investing in privately-held companies. That stock did well in the first few months of 2012 ahead of the Facebook IPO, and then lost two-thirds of its value in a few months after the IPO happened.

Disclosure: Long META.

Designer Brands and Transaction Costs

Economists do a good job of constructing models of mostly rational agents, but obviously have a lot more fun explaining seemingly irrational behavior. Why, for example, does putting a designer's name on a piece of clothing make it worth more? It doesn't change the comfort of the product, but it's a signal to the buyer that it's high-quality, and it signals the same thing to whoever sees it. So it serves a similar function to the signaling model of education: it isn't changing the underlying reality so much as it's providing a high-bandwidth, low-latency way to communicate that it's true.

In this model, those brands get pricing power from an environment with just the right cost of information—low enough that it's possible for people to recognize a brand name, but not so low that they can buy cheaper but otherwise identical replacements. The latter is increasingly the environment in which fashion companies operate: either their suppliers, or competing suppliers with a knack for attentive copying, can sell copies of their products that are almost identical, but that are substantially cheaper. This market has existed for a long time, but two barriers have fallen: first, it's much easier for people to find and vet sellers of these products; it's a purchase that naturally leads to bragging rights, haul videos, and the like, which means the goods are being constantly inspected in public and unreliable sellers get shamed. Second, those communities lower another barrier to entry: it doesn't occur to many people to even look for identical knockoffs at low prices, but the existence of these online communities makes the option hard to ignore.

SMBs

Sometimes there's a shorthand way to describe different investing strategies: you might use "satellite images of store parking lots" as a metonym for alt data signals, and "measuring whether the weather in Paris predicts the outperformance of Brazilian stocks whose name starts with the letter 'P'" as a shorthand for data-mining-flavored quant approaches. The strategy of a search fund or PE firm buying a service business, often a blue-collar business with a founder approaching retirement age, tends to get summed up as something like "buying out HVAC contractors in the Midwest." There are even memes. And the WSJ has done the meme by writing about PE rollups of blue-collar services businesses like HVAC repair ($, WSJ).

The thesis is similar to the thesis for any other roll-up (The Diff has profiled a few examples which all focus on the specific opportunity of baby boomer entrepreneurs retiring ($)). If there were intrinsic economies of scale, the industry wouldn't have been so fragmented in the first place—but there can be moderate ones, from borrowing ideas across different companies, cutting a better deal with suppliers, and just getting the basic accounting right. And there's the natural social status factor: many of the people who ought to run these companies are similar to the people who started them, and, if they'd been born a few decades earlier, might well have started such a company. But now such people are identified early, sorted into better schools and then better starting jobs, and end up with a massive social status gap—even if they'd earn more and be happier running a small business than working as an investment banker, it would look like a step down in status. Wrapping it in what sounds like a high-status financial transaction rather than a job running a small business is a winning move.

Top-Down Market Reform

The Diff has been covering efforts in Japan, on the part of companies, exchanges, and regulators, to raise their market values. Korea has been following a similar mix of policies, and is also seeing results, with the value of shares canceled after buybacks more than doubling in the last year. In the US, trends in buybacks are more driven by which companies are public and by how margins are doing, but in markets where companies retain more of their earnings, and reinvest in lower-return activities, the aggregate direction of buybacks is less of a near-term macro indicator and more of a long-term policy indicator (which, in a nice symmetry, feeds back into the macro picture in the form of higher long-term growth).

Chosen Obligations

In light of this morning's news from Sweden, a good little institutions story: one of the ways people who focus on institutions and property rights frame their view is that one of the most important rights people have is the right to be sued. Without it, there's no ability to make binding commitments. This is useful as an object-level observation about how property rights work, but it's also useful as a more general claim that there's sometimes value in deliberately constraining one's future behavior. This shows up in markets, too; handbag maker Vera Bradley adopted a poison pill plan, making it harder for the company to agree to a takeover, and shares rose ($, WSJ). This entrenches management, which is negative for long-term returns, but it also means that anyone who wants to fix that problem needs to pay a premium to do so. In this case, at least, the marginal shareholder believes that shareholders as a group are better-off when some of them have fewer rights.

Diff Jobs

Companies in the Diff network are actively looking for talent. See a sampling of current open roles below:

Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up.

If you’re at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.