OverflowAI takes the company's core asset, exposes answers in a usable interface, and creates a generative AI loop to create new content
With the emergence of highly capable models like GPT-4 supercharging generative AI, how data professionals can provide long-term value to the organizations they are part of is evolving. Real value is going to come from not just being the most technically competent person in the room but being able to shape how this impacts product and business outcomes. This includes being able to guide your team towards the right data strategy and shaping how data products tie seamlessly into product experiences. The analysis in this article of the transformation at StackOverflow serves as a compelling case study towards that goal.
StackOverflow, the platform most commonly used by software developers for programming help, has been through a rough journey lately. If you haven't used StackOverflow before, it's a Quora / Reddit-like Q&A forum where you can ask programming-related questions. It's been a few years since I wrote production-quality code but back when I did, StackOverflow was incredible. For example, if you ran into the most obscure of errors while compiling your code and got an error message you could not make sense of, you would paste it into Google search. More often than not, you would find a StackOverflow page where someone had asked the same question and got an answer. Less often, you would find another soul who had the exact same obscure problem as you but got no answer, in which case, good luck. More precisely, 69% of questions on StackOverflow are answered, which is pretty impressive.
Recently, however, StackOverflow's traffic has been in decline. Similarweb's data shows that their traffic dropped 14% year over year (StackOverflow says it's closer to 5%). Either way, the trend is downward and is explained primarily by the emergence of AI coding products like ChatGPT and GitHub Copilot. These products have meaningful code-writing capabilities and are therefore able to provide programming help that is, at least in part, as good as what StackOverflow offers. Ironically, several of the large language models (LLMs) behind these AI products were trained using scraped StackOverflow data.
The company has gotten pretty harsh media coverage with these developments. Business Insider, in their article Death by LLM, wrote:
Welcome to the future of the internet in an AI world. Online communities like Stack Overflow and Wikipedia thrived as hubs for experts and curious browsers to come together and share information freely. Now these digital meeting places are being pillaged by big tech companies prowling for human data to train their large language models.
The new products emerging from this generative-AI boom are putting the future of these online forums in doubt. The chatbots answer questions clearly, automatically, and often pleasantly, so humans don't have to deal with other humans to get information.
In the midst of all this attention, StackOverflow has played a steady hand and articulated a two-pronged approach to addressing this challenge:
- A few weeks back, they announced that they will start charging large AI developers who use the platform's 50M+ questions and answers for model training (we dug into this issue in the data scraping article earlier)
- Last week, they launched the OverflowAI product, a set of actually useful generative AI features that can help kick off their second innings; this is what we will focus on today
In this article, we'll dive deep into:
- AI code writing tools disrupting StackOverflow
- What OverflowAI does
- Underlying trends from the StackOverflow strategy
There are several AI code writing and editing tools available in the market today. These are either independent products (like OpenAI Codex, ChatGPT, Google Bard) or products that are natively integrated within existing platforms (like GitHub Copilot, Replit Ghostwriter, Amazon CodeWhisperer). They have a broad range of capabilities including code generation, code editing, autocomplete and debugging.
The products that have native distribution (like GitHub Copilot) are at a significant advantage because they can operate seamlessly within environments that programmers already use today, and we will see more products attempting to get plugged into existing environments. For example, CodeGPT has a plugin that lets developers use the product from within Visual Studio Code (a popular code editing tool).
Current AI code writing tools are good at certain tasks. For example, this Reddit thread captures feedback from several web developers about GitHub Copilot. The overarching theme is that the product is useful in a subset of situations where developers have to write net new code and don't want to spend time writing it from scratch. Even in those situations, it's often hit and miss.
The reason is not surprising. Conceptually, large language models (LLMs) take in a ton of data and generate output on the basis of this construct: in a particular context, for the question you asked, what is the most likely word / text to follow the previous word? It's essentially calculating the probability of a word following another, and generating output based on that. Despite this construct, given the amount of data that's gone into training these models, the results for the more general ChatGPT use cases (like drafting an email or summarizing a page) have been nothing short of impressive. But it's important to remember that language models, by design, have limited analytical / math capabilities. In other words, when you ask the model "what's 2+2", it may give you the right answer, not because it knows math but because it has seen that text pattern before in its training data.
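To make the "probability of a word following another" idea concrete, here is a deliberately tiny sketch. It is a toy bigram counter, not how GPT-style models are actually built (those use neural networks over much longer contexts), and the function names and the four-line training corpus are made up for illustration. What it does show is the failure mode described above: the answer comes from pattern frequency, not arithmetic.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the training text."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts


def next_word_probabilities(model, word):
    """Turn raw follow counts into probabilities for the word after `word`."""
    counts = model.get(word.lower())
    if not counts:
        return {}  # the model has never seen this word
    total = sum(counts.values())
    return {candidate: count / total for candidate, count in counts.items()}


def predict_next(model, word):
    """Return the most likely next word, or None for an unseen word."""
    probabilities = next_word_probabilities(model, word)
    if not probabilities:
        return None
    return max(probabilities, key=probabilities.get)


# Hypothetical training data; note that one line is simply wrong.
corpus = [
    "2 + 2 = 4",
    "2 + 2 = 4",
    "2 + 2 = 5",
    "3 + 3 = 6",
]
model = train_bigram_model(corpus)

print(next_word_probabilities(model, "="))  # {'4': 0.5, '5': 0.25, '6': 0.25}
print(predict_next(model, "="))             # 4
```

The toy model says "4" after "=" because that is the most frequent pattern it has seen. It would give the same answer for "3 + 3 =", since it only looks at the previous word, and it assigns a 25% chance to the wrong "5" simply because that line appeared in its training text.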
Similarly, when it comes to code generation, the model does not really "know" the underlying concepts behind programming but is predicting outcomes based on its training with a ton of text data. The consequence of this is the GitHub Copilot feedback above: it's often good at generating the base code you need, but its ability to actually understand code, debug and provide you with explanations is limited. It will get better over time but it's hard to say if it will ever get to the point of high accuracy / high reliability.
StackOverflow CEO Prashanth Chandrasekar describes it succinctly:
One problem with modern LLM systems is that they will provide incorrect answers with the same confidence as correct ones, and will 'hallucinate' facts and figures if they feel it fits the pattern of the answer a user seeks.
At some point you're going to need to know what you're building. You may have to debug it and have no idea what was just built, and it's hard to skip the learning journey by taking shortcuts.
This is the opportunity for StackOverflow. Their traffic drop may be permanent and it's very likely that programmers will come to StackOverflow less often for simpler questions (e.g. they might not visit StackOverflow anymore for an off-the-shelf sorting algorithm). But where the product can shine is: 1) providing high accuracy / high reliability answers to more complex questions that language models might not have the capability to answer, and 2) providing answers to questions in new technologies / problem areas that the models have not had prior data to train on. OverflowAI is designed to directly tap into this opportunity.
There are three key aspects they are betting on: direct answers to questions, usability from within developer environments, and supercharging knowledge within enterprises.
OverflowAI Search provides direct answers to users in a Q&A format (similar to ChatGPT), but also provides several links to actual StackOverflow posts. Besides helping create trust, this gives users the opportunity to go deeper where the answer provided by AI doesn't fully solve their problem. This strikes the delicate balance of giving a direct answer when the question is simple, but also guiding the user along a more exploratory path for difficult questions.
If the user is not satisfied with the responses, they can enter a chat-like interface to ask follow-up questions. If none of the answers are satisfactory, they can ask StackOverflow to draft a question on their behalf, ready to be posted to the Q&A forum. This experience also saves users from the fairly common situation where the question they ask has already been answered.
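The flow described above (answer directly, cite the underlying posts, and fall back to drafting a new question when nothing matches) can be sketched in a few lines. This is only an illustration of the shape of the experience, not StackOverflow's implementation: the `Post` type, the function names, the example.com URLs and the naive word-overlap scoring are all assumptions, and a real system would use semantic search plus an LLM to write the answer and the draft.

```python
from dataclasses import dataclass


@dataclass
class Post:
    url: str
    question: str
    answer: str


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    return {word.strip(".,?!:;()\"'").lower() for word in text.split()} - {""}


def search(posts, query, min_overlap=2, top_k=2):
    """Rank posts by how many words their question shares with the query."""
    query_words = tokenize(query)
    scored = []
    for post in posts:
        overlap = len(query_words & tokenize(post.question))
        if overlap >= min_overlap:
            scored.append((overlap, post))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored[:top_k]]


def answer_or_draft(posts, query):
    """Give a direct answer with source links, or draft a new question."""
    matches = search(posts, query)
    if matches:
        return {
            "type": "answer",
            "text": matches[0].answer,
            "sources": [post.url for post in matches],  # links build trust
        }
    # Nothing relevant exists yet: prepare a question for the community.
    return {"type": "draft_question", "title": query.strip().rstrip("?") + "?"}


# Hypothetical mini knowledge base.
posts = [
    Post("https://example.com/q/1", "How do I reverse a list in Python?",
         "Use reversed(items) or items[::-1]."),
    Post("https://example.com/q/2", "How do I sort a list of dicts in Python?",
         "Use sorted(items, key=lambda d: d['name'])."),
]

print(answer_or_draft(posts, "reverse a list in python"))
print(answer_or_draft(posts, "segfault when linking rust ffi"))
```

The first query returns an answer along with both source links, so the user can dig deeper. The second finds nothing and comes back as a drafted question, which is the step that feeds new content back into the forum.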
The product also doubles down on usability by making all of this capability available from Visual Studio Code through an extension. This helps StackOverflow compete more effectively with natively integrated coding assistants by letting developers get answers from within their coding environments (instead of having to context switch and search from a browser).
In addition to this, for enterprise customers, OverflowAI is creating the ability to plug in several different sources of data within a company (internal Q&A, wiki pages, document repositories), to provide a cohesive Q&A experience for developers. Being able to utilize internal and StackOverflow data, and more importantly exposing this easily in a Q&A type interface, can be a big productivity boost for engineering organizations. They also intend to launch a Slack integration as a seamless interface to expose this capability.
What's impressive about OverflowAI's product approach is that it takes the company's core asset (answers to difficult questions), exposes answers in a highly usable interface wherever the users are (whether on Slack or within developer environments), and in turn creates a loop where users can leverage generative AI to submit new questions.
StackOverflow is not exactly a public company. They are owned by Prosus, which is in turn part of a bigger holding company, Naspers, which is publicly traded. Therefore, it's hard to get clear revenue data, but a report from Prosus published in May 2022 sheds some light:
- The company made ~$89M revenue in 2022, split 50-50 between the enterprise product StackOverflow for Teams and Reach products (advertising and employer branding)
- From 2021 to 2022, StackOverflow for Teams revenue was +69% while Reach products revenue was -12% (there could have been extraneous factors that impacted 2022 revenue, like slower hiring)
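As a back-of-the-envelope check on what these two bullets imply, the 2021 figures can be backed out from the 2022 numbers. This is a rough sketch that assumes the split is exactly 50-50 and that the rounded growth rates apply to each segment as stated; the variable names are mine and none of the derived numbers come from the report itself.

```python
# Inputs taken from the bullets above (approximate, in $M).
total_2022 = 89.0
teams_2022 = reach_2022 = total_2022 / 2  # assumes an exact 50-50 split
teams_growth = 0.69                       # Teams: +69% from 2021 to 2022
reach_growth = -0.12                      # Reach: -12% from 2021 to 2022

# Back out the implied 2021 revenue for each segment.
teams_2021 = teams_2022 / (1 + teams_growth)
reach_2021 = reach_2022 / (1 + reach_growth)
total_2021 = teams_2021 + reach_2021

overall_growth = total_2022 / total_2021 - 1
reach_share_2021 = reach_2021 / total_2021

print(f"Teams 2021: ~${teams_2021:.1f}M")               # ~$26.3M
print(f"Reach 2021: ~${reach_2021:.1f}M")               # ~$50.6M
print(f"Total 2021: ~${total_2021:.1f}M")               # ~$76.9M
print(f"Overall growth: ~{overall_growth:.1%}")         # ~15.7%
print(f"Reach share in 2021: ~{reach_share_2021:.0%}")  # ~66%
```

Under these assumptions, Reach would have been roughly two thirds of revenue in 2021 and only half in 2022, which lines up with the shift away from traffic-driven revenue.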
This revenue data, combined with what the OverflowAI product does, points to some clear trends in where StackOverflow is headed in the world of generative AI (these trends can also be extended to other Q&A platforms):
- Their advertising business, whose success is directly tied to traffic, is in decline. This isn't necessarily dire and just points towards a broader trend: there will likely be fewer eyeballs / page views because consumers will directly get answers to easier questions (which is good), and therefore advertising becomes a less critical revenue source.
- StackOverflow will continue to be a valuable source of answers for difficult questions, and the volume of questions and answers will continue to grow with the company's generative AI push to automatically draft / submit questions. In addition, it's also likely that if StackOverflow can keep the content engine running, the quality of content on the platform will improve, as repetitive / easy questions will no longer make up the highest volume of content.
- StackOverflow will double down on building experiences where they can deliver maximum value to users (like OverflowAI Search and the Visual Studio Code extension), and focus on product lines where customers are willing to pay for these advanced experiences (e.g. StackOverflow for Teams)
- Data licensing programs, where they charge AI companies for training on their data, will accelerate
The trends all point towards a direction where StackOverflow is successfully pivoting to the next phase of the company, and the company has made the right product / business investments to weather what was a potential disruption. In addition, they have also done a valuable community service and laid out a playbook for other Q&A platforms to leverage. Overall, I'm optimistic about the direction they are headed in and that this will ignite a thriving content ecosystem in the future.