Adoption of microservices and software supply chain
Software development is undergoing industrialization: more and more software is rapidly assembled from components, with growing emphasis on automating software validation and release processes.
Modern cloud-native software is no longer a monolithic application living in a single repo with the majority of its dependencies self-contained. It is assembled from third-party components supplied by vendors, cloud providers, and open source projects, with as much as 90% of the code coming from such dependencies. This lets us build applications much faster, but makes maintenance much harder because that code is no longer under our control. When a third-party vendor changes its API, we are on the hook to update our applications before they break.
Each microservice interacts with hundreds of other microservices, and each is built and released independently, which makes it hard to coordinate API changes across them. For example, if we want to make an API change to our microservice, what impact will it have on consumers elsewhere in the organization? Whom will we need to talk to in order to coordinate the change? When everything lived in the same repository and was released all at once, such changes were much easier to make.
Changing nature of technical debt
What we continue to call technical debt is really the work of tending to and upgrading our software as third-party components evolve or are found to have common vulnerabilities and exposures (CVEs).
These are tedious, repetitive tasks that usually fall to the most experienced engineers because they require technical expertise to do correctly. Such work can paralyze engineering organizations and is a tremendous burden that leads to burnout. Up to 30% of engineering time is spent on it, and the perception that developers somehow accrued this technical debt, and that failing to keep up means they are doing something wrong, is hugely demoralizing and demotivating.
Here is an example of one such tedious migration that a developer would have to do many thousands of times on a large codebase to migrate JUnit asserts to use AssertJ. You can see the full PR here.
However, if we reframe technical debt as software supply chain management and stop blaming engineering for it, we can make maintenance more predictable and consistent. By taking steps such as inventorying its third-party components and determining how pervasive they are in the application (a framework takes more effort to maintain than a third-party API that is called from one part of the application), an organization can arrive at a maintenance estimate.
These activities are highly repetitive across organizations as everyone is integrating some subset of the same third-party components to create business value. This high level of repeatability points us to automation.
Specialized developer tools for software maintenance are non-existent
When we write code, our hands lag behind our thoughts. We think much faster than we are able to type the code out. You hear developers finishing some tedious task complain “my fingers hurt.” When the IDE autocompletes syntax, it helps reduce this lag to minutes.
When a major framework version is released, we contemplate our codebase. Maybe we need to change this type to that, reorder some method arguments, or change dependencies. The types of changes we need are enumerable almost immediately, but again our ability to implement them lags far behind our recognition of the problem. Often this type of lag can represent months or years of work in a large codebase. Accumulate enough of this lag, and the codebase can grind to a halt.
That lag kills the joy in development for most senior developers and leads to burnout. Upgrades/migrations amplify that lag by orders of magnitude.
However, there are few if any tools that focus on helping developers automate these remediations/upgrades/migrations. The IDE's focus is rightly on helping us write new code and maintain our own code, but when we have so much existing integrated code, we need a tool to help us maintain it as external components change.
Code authorship and code maintenance are quite different activities. Take, for example, the rename method refactoring in the IDE: it lets us rename a method definition together with all of its call sites. This works well when the method declaration and all its call sites are in the same repository and open in the IDE at once. But suppose a vendor changes a method declaration. The IDE's refactoring operations cannot help us with this. When a vendor changes a method or API, we end up using regular expressions or other text-based search, which hugely over-reports, to look for the API in our code and then refactor it by hand.
When we write new code, these are all valid branches that the IDE suggests and that we could decide to pursue: Import, Create class ‘Path’, Create enum ‘Path’, Create interface ‘Path’, Create type parameter ‘Path’ in function ‘assertChanged’, Create annotation ‘Path’.
Writing new code is like going to a restaurant where the waiter asks, “Do you want red wine or white?” Neither is wrong per se at that point.
Note that new AI-based autocompletion tools like GitHub Copilot are also code authoring tools. In the IDE, which is traditionally a rule-based engine, when we hit a shortcut, we know exactly what code is going to be generated. AI-based autocompletion is likely to generate a block of code that is unpredictable. As an authorship experience localized to a single point in the code, this can be valuable. Developers can review and accept or reject the suggestion because they are working on the code in that place.
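To make the over-reporting concrete, here is a minimal sketch using only the Java standard library. The source snippet, the `com.vendor.Client` type, and the `fetch` method are all hypothetical names invented for illustration. Only one line in the snippet is a real call to the vendor API, yet a text search flags three.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TextSearchDemo {
    // Hypothetical source file: only line 4 is a real call to the vendor API.
    static final String SOURCE = String.join("\n",
        "import com.vendor.Client;",
        "class Orders {",
        "    void run(Client client, LocalCache cache) {",
        "        client.fetch(\"orders\");     // vendor API call",
        "        cache.fetch(\"orders\");      // unrelated method with the same name",
        "        // TODO: stop calling client.fetch( here",
        "    }",
        "}");

    // Returns the 1-based numbers of the lines whose text matches the regex.
    static List<Integer> matchingLines(String source, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\n");
        for (int i = 0; i < lines.length; i++) {
            if (pattern.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

Searching for `\.fetch\(` reports lines 4, 5, and 6. Tightening the pattern to `client\.fetch\(` still reports the comment on line 6, and it would miss the call entirely if the variable had a different name. Text alone cannot tell which type a method belongs to.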
There aren’t multiple valid branches to implement remediations like upgrading from JUnit 4 to 5. There are edge cases that require creativity in how they are approached, but this is the enjoyable part of work and should be left to developers. The majority is immediately enumerable. We look at release notes for any well documented upgrades and instantly know what we are going to be doing for a long while.
Because of this one-to-one correspondence between code before and after upgrades, a rule-based engine for code transformations can be developed. Our IDEs use similar underlying technology, but are oriented toward single-repository manipulation, not the management of organization-wide software assets. Moreover, when transforming large bodies of existing code, we can make the simplifying assumption that the code is in a working state to begin with, so we can assemble much more complex rules while still being 100% accurate. We can progressively encapsulate lower-level building blocks to do amazing things.
One last difference between code authoring and code maintenance is that code authoring is single-threaded. We can only author in one place in the code at any one time. Code maintenance needs to be coordinated across multiple places in the codebase, within the same repository or across repository boundaries. Often, we have a pattern we want to change throughout our repository, but the IDE only suggests it as an improvement in our current location (one file). Other times, changes need to be coordinated across repository boundaries because we want to change APIs and their consumers (dependency management is one case of this, where the producer and consumer of the change are different organizations). That’s why this technology needs to exist outside of the IDE.
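As a rough illustration of what "enumerable" rules look like, the sketch below writes down the well-known JUnit 4 to JUnit 5 lifecycle annotation renames as a lookup table and applies them. It is deliberately a toy: it rewrites annotation names in plain text and does not check that `@Before` really is `org.junit.Before`, which is exactly the gap a type-aware engine closes. The class and method names are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnnotationRules {
    // One-to-one rules: JUnit 4 annotation name -> JUnit 5 (Jupiter) name.
    static final Map<String, String> JUNIT4_TO_JUNIT5 = new LinkedHashMap<>();
    static {
        JUNIT4_TO_JUNIT5.put("Before", "BeforeEach");
        JUNIT4_TO_JUNIT5.put("After", "AfterEach");
        JUNIT4_TO_JUNIT5.put("BeforeClass", "BeforeAll");
        JUNIT4_TO_JUNIT5.put("AfterClass", "AfterAll");
        JUNIT4_TO_JUNIT5.put("Ignore", "Disabled");
    }

    // Matches a whole annotation name, so "@BeforeClass" is never mistaken for "@Before".
    private static final Pattern ANNOTATION = Pattern.compile("@(\\w+)");

    // Toy, text-level application of the rules; annotations without a rule are left alone.
    static String apply(String source) {
        Matcher matcher = ANNOTATION.matcher(source);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String replacement = JUNIT4_TO_JUNIT5.getOrDefault(name, name);
            matcher.appendReplacement(out, Matcher.quoteReplacement("@" + replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

The point of the table is that every entry is known the moment the release notes are read; what takes months is applying it everywhere by hand.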
OpenRewrite is an OSS project that offers semantic analysis and refactoring of code as standalone operations, so that everyone can use it and contribute to it, over time composing more and more refactoring operations that make whole framework migrations possible. It is integrated with build tooling and can be plugged into different workflows, from CI integrations to mass refactoring of multiple microservices/repositories.
OpenRewrite originated at Netflix, where Jonathan Schneider, founder of OpenRewrite, worked in engineering tools. Netflix’s culture of freedom and responsibility led to code that didn’t conform to any single style. It also prevented a central team from imposing any gates on product engineers, so they couldn’t just break the build when they saw a pattern they didn’t like. When a central team asked product teams to develop in a specific way, what they heard repeatedly was “I do not have time to deal with this, but if you do it for me, I’ll merge”. So eventually Jonathan took it literally and built automation to effect such changes. The predecessor to this was the Gradle lint plugin, a popular plugin that manipulates Groovy build files and upgrades dependencies. The usage of OpenRewrite expanded to Java when the platform team wanted to stop maintaining an internal logging library that originated before SLF4J and replace it with standard SLF4J. At the time, the library had been deprecated for six years, but because of this culture there were still countless references to it in the Netflix codebase. With the adoption of automation, those references were removed and the library was finally deleted within days.
How OpenRewrite operates/internals
Similar to an IDE or other developer tools, OpenRewrite manipulates an abstract syntax tree (AST) representation of code, but its AST has special characteristics that allow it to be transformed and then printed back as standard source text.
The OpenRewrite AST is produced by guiding the compiler through the first two phases of compilation to generate the compiler's type-attributed AST, and then mapping that to the OpenRewrite AST, which also preserves formatting and breaks cycles in the tree. The three unique characteristics of the OpenRewrite AST are:
- Type aware, allowing 100% correct semantic code analysis and transformation.
- Style-preserving, so the transformations it produces are idiomatic within the projects they are applied to. The same transformation may therefore look different in each project.
- Serializable, so the AST can be output from the build and operated on en masse outside of it.
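The style-preserving property can be pictured with a small toy. The sketch below is not the OpenRewrite data model; it works on flat tokens rather than a tree, and all names in it are hypothetical. It shows only the underlying idea: if every element remembers the whitespace that preceded it, the code can be changed and printed back without disturbing the project's formatting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A token that remembers the whitespace that came before it.
final class Token {
    final String prefix;
    final String text;

    Token(String prefix, String text) {
        this.prefix = prefix;
        this.text = text;
    }
}

class LosslessTokens {
    // Leading whitespace, then either a word or a single punctuation character.
    private static final Pattern TOKEN = Pattern.compile("(\\s*)(\\w+|[^\\s\\w])");

    static List<Token> parse(String source) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        int end = 0;
        while (matcher.find()) {
            tokens.add(new Token(matcher.group(1), matcher.group(2)));
            end = matcher.end();
        }
        if (end < source.length()) {
            // Keep trailing whitespace so printing reproduces the input exactly.
            tokens.add(new Token(source.substring(end), ""));
        }
        return tokens;
    }

    // Printing concatenates prefix + text, so untouched code comes back byte for byte.
    static String print(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (Token token : tokens) {
            out.append(token.prefix).append(token.text);
        }
        return out.toString();
    }

    // Renames an identifier while keeping each token's original whitespace.
    static List<Token> rename(List<Token> tokens, String from, String to) {
        List<Token> result = new ArrayList<>();
        for (Token token : tokens) {
            result.add(token.text.equals(from) ? new Token(token.prefix, to) : token);
        }
        return result;
    }
}
```

Parsing and printing any input returns the same text, and a rename applied to `foo( a )` and to `foo(a)` yields `baz( a )` and `baz(a)` respectively, each in its own project's style.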
OpenRewrite calls a single code search or transformation operation a recipe. OpenRewrite provides a lot of building block recipes like find method, change method, find transitive dependency, upgrade or exclude dependency. These recipes in turn can be composed into more complex recipes by grouping them together into a composite recipe. When the building blocks are not enough, a recipe can be written as a program in the same language as the code we want to transform, allowing us to encapsulate complex logic with the full expressiveness of the language already familiar to developers. We don’t need to learn a new DSL or programming language.
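The composition idea can be sketched with a toy model in plain Java. This is not the OpenRewrite API: `ToyRecipe`, `Call`, and the `com.example` types are hypothetical names used only to show how small, type-aware building blocks chain into a larger migration.

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-in for a method call whose declaring type is already known.
final class Call {
    final String declaringType;
    final String methodName;

    Call(String declaringType, String methodName) {
        this.declaringType = declaringType;
        this.methodName = methodName;
    }

    @Override
    public String toString() {
        return declaringType + "#" + methodName;
    }
}

interface ToyRecipe {
    Call apply(Call call);
}

// Building block: rename a method, but only on the given declaring type.
final class ChangeMethodName implements ToyRecipe {
    private final String type, oldName, newName;

    ChangeMethodName(String type, String oldName, String newName) {
        this.type = type;
        this.oldName = oldName;
        this.newName = newName;
    }

    public Call apply(Call call) {
        boolean matches = call.declaringType.equals(type) && call.methodName.equals(oldName);
        return matches ? new Call(type, newName) : call;
    }
}

// Building block: move calls from one declaring type to another.
final class ChangeType implements ToyRecipe {
    private final String oldType, newType;

    ChangeType(String oldType, String newType) {
        this.oldType = oldType;
        this.newType = newType;
    }

    public Call apply(Call call) {
        return call.declaringType.equals(oldType) ? new Call(newType, call.methodName) : call;
    }
}

// Composite: runs its steps in order, feeding each result to the next step.
final class CompositeRecipe implements ToyRecipe {
    private final List<ToyRecipe> steps;

    CompositeRecipe(ToyRecipe... steps) {
        this.steps = Arrays.asList(steps);
    }

    public Call apply(Call call) {
        Call current = call;
        for (ToyRecipe step : steps) {
            current = step.apply(current);
        }
        return current;
    }
}
```

A composite of `ChangeMethodName("com.example.legacy.Log", "warning", "warn")` followed by `ChangeType("com.example.legacy.Log", "com.example.logging.Logger")` turns `com.example.legacy.Log#warning` into `com.example.logging.Logger#warn`, while a `warning` method on any other type is left untouched.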
These building blocks abstract away many of the details needed to ensure that the edits we make to source code obey the original style of the project. This holds even for complex changes: this automated migration from Spring Boot 1.x @ConditionalOnBean to Spring Boot 2.x AnyNestedCondition looks idiomatically consistent in the context of the project that it is inserted into:
For example, our composite JUnit 4 to 5 recipe contains the following transformations:
One curious thing about OpenRewrite recipes is that they started out effecting code transformations and backed into search from there, to address the common request: “I'd like to first search for this problem in my code to see how pervasive it is before I write a refactoring recipe for it.” OpenRewrite therefore treats search as a special case of transformation: a search recipe adds markers to the AST elements it finds, and those markers can then be rendered into text however the developer wishes (usually as comments that can contain TODOs, Jira issue numbers, etc.).
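A toy version of "search as a transformation that only adds markers" might look like the following. The classes are hypothetical and the `/* FOUND */` rendering is an arbitrary choice for this sketch, not OpenRewrite's output format. The search matches on the declaring type's package, not on the method name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy method invocation: its resolved declaring type, its source text, and a search marker.
final class Invocation {
    final String declaringType;
    final String source;
    final boolean found;

    Invocation(String declaringType, String source) {
        this(declaringType, source, false);
    }

    private Invocation(String declaringType, String source, boolean found) {
        this.declaringType = declaringType;
        this.source = source;
        this.found = found;
    }

    // "Transformation" that changes nothing except attaching the marker.
    Invocation markFound() {
        return new Invocation(declaringType, source, true);
    }
}

class MarkerSearch {
    // Marks every invocation whose declaring type lives in the given package or a subpackage.
    static List<Invocation> find(List<Invocation> calls, String packageName) {
        List<Invocation> result = new ArrayList<>();
        for (Invocation call : calls) {
            boolean inPackage = call.declaringType.startsWith(packageName + ".");
            result.add(inPackage ? call.markFound() : call);
        }
        return result;
    }

    // Renders markers as comments; unmarked code prints exactly as it was.
    static String print(List<Invocation> calls) {
        List<String> lines = new ArrayList<>();
        for (Invocation call : calls) {
            lines.add((call.found ? "/* FOUND */ " : "") + call.source);
        }
        return String.join("\n", lines);
    }
}
```

Given two calls that both read `ImmutableList.of(...)`, one resolving to `com.google.common.collect.ImmutableList` and one to a hypothetical `com.example.ImmutableList`, a search for `com.google.common` marks only the first, even though the two look alike in text.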
To identify every method invocation in a whole package of Google’s Guava library, we can instruct OpenRewrite to perform this search:
The results show that it isn’t just looking for methods with particular names. OpenRewrite can actually prove that these method calls come from the desired package:
This recipe in plain OpenRewrite YAML looks like:
One last thing to note: OpenRewrite is not a replacement for an IDE or other developer tools, but a specialized tool for managing existing codebases and the software supply chain. To continue with the IDE rename example, OpenRewrite provides a change type recipe that changes only the call sites, which is what we need when a third-party dependency has changed.
Moderne hosts a public and a private service that complement OpenRewrite, allowing organizations with large codebases to search and transform code in real time by applying OpenRewrite recipes, and to run organization-wide change campaigns that rid code of bad patterns and make sure they never come back. Developers can search and transform code, examine diffs, and issue pull requests that can later be reviewed and put through the usual CI/CD pipeline.
Moderne operates on already produced and cached ASTs in a horizontally scalable way, allowing it to produce results in seconds.
Open source authors can use the public Moderne platform to fix their code so they can focus on the enjoyable aspects of developing their project. There are a lot of facilities in the service that make it easier to write and validate recipes on millions of lines of other OSS code, so framework authors or security researchers can create recipes for upgrades or CVE patching, helping their consumers with this tedium as well.