Datadog's Developer Experience for APIs and More

The Guest

Stephen Boak is the Vice President of Design at Datadog, a monitoring service for cloud-scale applications. Datadog provides monitoring of servers, databases, tools, and services through a SaaS-based data analytics platform. The company offers a product for exploring large amounts of data and helps users understand their infrastructure through interactive visualizations.

The Interview

What problem is Datadog solving?

Datadog is the platform for Devs and Ops in the cloud age. We began as an infrastructure monitoring company; we have since expanded the platform to include application performance monitoring, logging, and synthetics, along with a bunch of other tools. We aim to be the single source for Devs and Operations professionals in the age of the cloud.


For your main user being a developer, what’s the developer experience like?

The developer experience for Datadog begins with an agent. This is something Datadog has in common with a lot of developer tools, and something most consumer and non-developer products don’t share. You still need to get started by installing an agent to run Datadog.

Depending on your setup, that could mean using SSH (Secure Shell) to access a host and running a few commands to install the agent. There are other ways to do it, but this is how we begin to collect data from the servers on which you are running Datadog.

Our agent software collects everything from system metrics to traces to logs. Then you get started creating things like dashboards, monitors, and the other things people in Datadog use every day.


Are the users moving elements in the UI or doing something on the backend?

This is an interesting thing about monitoring. It’s at a nice intersection. For me as a product designer, this is an interesting thing too. Datadog exists at the intersection of a developer tool and a really compelling UI.

A lot of developer tools are just APIs. When you think of companies like Stripe or Twilio, their core business is selling APIs. For Datadog, our core business is dashboards [and] monitors: things that are highly visual and highly interactive. So a lot of the business in the product is done in the UI.


If someone did not want to use Datadog’s UI, could they do so?

Yes, you can automate via our APIs. A lot of our biggest customers do this.

Do you see the term “developer” transforming to include a wide range of skills and proficiency levels?

The move to DevOps has really broadened the meaning of developer. When developers and operations were separate, there was a pretty clean wall between writing code and managing infrastructure. That line, to say the least, has gotten a lot more blurry.

A lot of developers maintain their own infrastructure. They are on call for their services, so when things go down, they are responsible for it. They are using tools like PagerDuty, in concert with Datadog, to manage those kinds of things. In addition to the ownership of the infrastructure, you have this full-stack visibility across the frontend and the backend.

Datadog’s platform combines infrastructure monitoring, tracing, and logging. This can span across many different kinds of developer roles. Frontend developers may be very interested in logs as the source of a problem. Backend developers may be looking at traces or various infrastructure issues. All of those pieces could represent many different developer roles in a single company that are all coming together to solve a problem.    

When you approach a design problem, are you looking at the user called “developer” or do you try to segment?

We segment in all kinds of ways. Depending on the size of the company and the kind of company that it is, developers will be stratified in their roles in very different ways.

Whenever Datadog releases a new feature, we target pretty specifically for both the kind of company and the kind of developer. We will even go as far as to do private betas and small rollouts of things to a specific customer group to make sure we are on the right track.


Can you pick a specific story about a pain point?

When we started to look at AWS Lambda, and serverless functions in general, that was a real test of what it meant to combine data from different sources. To fully understand Lambda and serverless, you are not just looking at infrastructure, but you have to bring together metrics, traces and logs. We could not paint a complete picture of Lambda without all three of those pillars.  

To support Lambda, we are ingesting data. We are getting metrics from Amazon CloudWatch, traces from AWS X-Ray, and logs from AWS CloudTrail. All of those things come together into one central dashboard for the product. That was a new test for us: to bring these disparate sources together to really understand a new type of product.

What kind of user research did you do for this project?

We do qualitative user experience testing. That can be everything from interviews with our early testers to closed beta releases, where we’ll go interview a bunch of people we know are interested in the new product. We try to iterate quickly in those early phases with the people who care the most about it and get to something that has much wider appeal.

You did over ten episodes on the podcast Don’t Make Me Code. What developer experience lessons stick with you?

There are a few episodes that really stand out to me. One of them was with Dustin Larimer from Keen.io. He was at Keen.io, but he’s not now. At that time, we were talking about Keen’s APIs, and how they were going through a major refactor of those APIs.

They began that process with the documentation. They wrote the documentation that they wanted for their APIs first and they actually refactored the APIs to match what they wanted from the docs, which I thought was really cool.

One of the themes from the episode was about affordances. For designers, you think of affordances in physical objects, like a toothbrush is about the size of your hand and you can sort of figure out what it is based on that. With software interfaces, we still have affordances, things like buttons and links. We all know what they are. Those are visual affordances.

Developers often don’t have the luxury of visual affordances. In fact, if you are dealing with an API, the documentation for that API is the only affordance. We spent a lot of time talking about the meaning and importance of good documentation. As an actual affordance for developers, it matters a lot. The consistency, the lexicon, and the choice of words, all of these things are super critical.

Datadog did a great job refactoring the Datadog API documentation to take into account the different ways people are using the platform and to be consistent in terminology. It also gives people lots of easy ways to find things, with these same ideas in mind.


What does it mean to have “good” documentation?

Specifically with documentation, I think there are three things Stripe has done really well from the beginning. First, have good, verbose documentation of everything. Document every function and document it clearly. Second, provide examples, like this is how you would use this thing. Third, have a copy-and-paste example where you could literally take the example verbatim, paste it into your terminal, and get the results you expect.

With Datadog, and any developer tool, onboarding is almost certain to be harder than it would be for some consumer tool. It’s not just enter your email address and get going. The bar is certainly higher.

For Datadog, it’s installing an agent. For other companies, it’s running shell commands, using APIs, or putting in API keys for some other service. Recognizing what the critical piece of your onboarding experience is, and making it as absolutely simple as it possibly can be, is super important to traction.

How do you design for unhappy paths?

This is actually a great point: error states and error messages are important parts of the user experience, particularly for APIs. If you do happen to use an API incorrectly, the only feedback you are going to get is the error message that comes back. The error message should be as verbose as possible and explain what went wrong so you know how to fix it.
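
As a rough illustration of that idea, here is a minimal Python sketch of an API request validator whose errors say what went wrong and how to fix it. Everything in it is hypothetical and invented for illustration (the `validate_monitor_request` function, the `ApiError` fields, and the allowed `type` values); it is not Datadog’s API.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical set of accepted values, used only for this example.
ALLOWED_MONITOR_TYPES = {"metric", "log", "trace"}


@dataclass
class ApiError:
    code: str     # stable, machine-readable identifier
    message: str  # what went wrong, in plain language
    hint: str     # how the caller can fix it


def validate_monitor_request(payload: dict) -> List[ApiError]:
    """Return every problem found in the payload, not just the first one."""
    errors: List[ApiError] = []

    name = payload.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append(ApiError(
            code="missing_name",
            message="Field 'name' is required and must be a non-empty string.",
            hint="Add a descriptive name, for example 'High CPU on web hosts'.",
        ))

    monitor_type = payload.get("type")
    if monitor_type not in ALLOWED_MONITOR_TYPES:
        allowed = ", ".join(sorted(ALLOWED_MONITOR_TYPES))
        errors.append(ApiError(
            code="invalid_type",
            message=f"Field 'type' was {monitor_type!r}, which is not supported.",
            hint=f"Use one of: {allowed}.",
        ))

    return errors


if __name__ == "__main__":
    for error in validate_monitor_request({"name": "", "type": "cpu"}):
        print(f"[{error.code}] {error.message} {error.hint}")
```

Returning all problems at once, each with a hint, means the caller can fix the request in one pass instead of discovering errors one failed call at a time.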

This is a common pattern across all design. The prototyping tools we use have come a long way. We can build dynamic simulations of user experiences, but we still almost never design for all of those other cases: what does this look like if no data comes back, if the data is poorly formatted, or if some kind of error occurs? We, as designers, don’t think about those things enough.


Have you had an experience where you were leading a design project and a person said ‘well our audience is very technical, they like complexity’?

Assuming that a developer wants something that is extremely dense and extremely technical is a red herring. Yes, our users are very savvy and smart. They use our tools as part of their profession so they are expert users. It does not mean they want to employ all those parts of their brain all the time.

I have tried hard pretty much everywhere I worked to kind of fight the automatic assumption that people want more density, more control, more icons popping up everywhere to offer every single option. Yes, these are our power users and yes these are people that want to do a lot of things. That does not mean exposing everything all the time. Those are the wrong things to focus on.


It shouldn’t be called developer experience. It should be called people experience. They are still people. Do you resonate with that statement?

Absolutely. They want the same kind of automation, sensible defaults, and streamlined experiences that anyone else wants. Because developers have had such great experiences with consumer products, they are expecting more and more of us.


Has there been a part of the developer experience that you feel is difficult to change or move the needle on?

To influence the experience for developers, developers have to trust you. As a designer, I think that has compelled me to learn a lot about all of this. Now, growing a design team is a big challenge. Not just hiring people, but hiring designers to work on developer tools is not the easiest thing in the world. It’s not the most natural place for a designer.

One, it is getting people who are interested in working on these kinds of problems; these highly technical, more developer-driven problems. Also, getting the designers confident in what they are talking about. Getting them to know enough about what they are doing so they can effectively design. To approach a designer that’s brand new to the developer tool space and say, “Hey, we have this new container scheduling platform and we want you to design an experience for it,” there is a pretty big gap between where they are starting and where they have to get to, to effectively design the product.    


How were you able to ramp up in the developer tool space?

I have only learned these things slowly. I have been at developer tool companies for almost 10 years. I never had anyone else pulling me up the ladder. Now, this is the thing we are dealing with at Datadog. We have a team of 16 product designers and I would like for them to learn faster than I did.

We have tried to put processes in place where, at the beginning of a big project, we have some sort of kickoff meeting, a sort of design sprint, where a product manager, an engineer, and a couple of designers get into a room and talk about all of this stuff: what are the use cases, what does the competition look like, what do the integration points look like, and how does this new thing talk to other parts of Datadog?

We try to accelerate all of this. We try to do a rapid fire education to kind of get everyone knowing what they need to know.


When the designers join Datadog, what sort of questions do they have for you?

As a designer, you are not a user of Datadog. From day one, the challenge is getting you to understand what this platform is for and why people use it. For designers, we have some specific onboarding that goes through different product demos and tries to mimic how people actually use the product.

Our solutions team built this incredible tool that lets us spin up a new Datadog environment with actual hosts running, and we can simulate certain kinds of events, like an outage or a spike in resource usage. With the click of a button, I can simulate one of these events. That will generate a page, or a set of emails, to these new Datadog designers so they get the feel of what it is like to be on call.

It’s an exercise that actually walks them through this. I explain what the environment is like. I explain the different kinds of incidents that can happen. Then when they get this page, it is their responsibility to try to figure out what went wrong, and where it went wrong. For example, what host failed and in what way. In the end, they report it back as part of the exercise. That’s been a great thing to help with the empathy of a new designer on the team; to understand how people are using Datadog.

It’s even revealed to us some big pain points in our experience. When you get an alert, you want to get to the right part of the product, so how can we streamline those kinds of workflows?

Designers gravitate toward products that already have great design. The design teams at Instagram and Airbnb are very well established, so a lot of designers want to go there. But in reality, as a new designer on a team like that, your impact is quite small. Developer tools, by contrast, are still largely underserved by design. If you want to be a designer at a company like this, you can have a big impact.

How is design for B2B (business-to-business) different from B2C (business-to-consumer)?

B2B, SaaS, and developer tools are tools people are using every day, so we are not measuring engagement. Our goal is not to keep you on the platform for as long as possible. We are not serving you ads. Speed, power, and efficiency are the kinds of things we are tracking. What is the shortest amount of time we can keep them on the platform? How can we get them to finish their jobs and move on with their lives?

Do you think designers in B2C (business-to-consumer) should aim for a lower duration metric?

We have started to see that engagement is a toxic metric and that society is being reshaped around people spending far too much time on social media. I feel lucky to work on a product like this. It’s a pretty straightforward business. People pay us money for software. If they don’t like it, they stop paying us and leave. There is no serving of two masters. There’s really no complexity to any of this.

It’s a tool people need to use to do their jobs. I like building tools. I also like having that relationship with customers. I can tell you the names and faces of our biggest customers, because I see them pretty frequently.     


Any last words of advice for designers or engineers in the developer experience space?

The diversity of the developer tool space has grown so much over time that I don’t know if there’s one piece of advice I could give to a new company. But I would point to the seamlessness of your onboarding experience and the effectiveness of a small surface area of product.

We talk about MVPs a lot: the minimum viable product that you can build. There is often one killer feature that you can build for one specific kind of workflow that is really going to get people onto your platform. Relentlessly focusing on that one thing, and doing it really well, can be a great start to a much broader experience. So it’s about not trying to build a large surface area of product early, but really focusing on one thing and getting it right.

I’m Steve and it’s been fun chatting.


Bonus Resource

Datadog is hiring a Product Designer (as of 5/8/19). For other roles, view Careers at Datadog.

