Required reading before reading this blog post:
You're using HttpClient wrong and it is destabilizing your software

Introduction

All code in this blog post can be found here, together with instructions on how to run the benchmarks. This post focuses on .NET Core; the benchmarks are run on .NET Core 3.0 preview 3.

HttpClient is really easy to use, and because of that, it's also really easy to use it wrong. Since it's so easy to use, nobody takes the time to learn how to use it correctly: "My code works, why should I change it?" My goal with this post is to show how to use HttpClient in the most efficient way.
What do I mean by the most efficient way?

  • Try to run as fast as possible
  • Try to allocate as little memory as possible

Steve Gordon has a bunch of posts regarding HttpClient; I really recommend that you check out his blog if you want to learn more.

The problem

We want to fetch JSON data from an external API and return a subset of it from a controller.
Our architecture will look like this:
Controller -> IGetAllProjectsQuery -> IGitHubClient.
The Query is responsible for mapping the data from DTOs to "domain models".
Note: We will NOT use the real GitHub API. I've created an API that returns dummy data; I just chose to name it GitHub to make the code more...authentic.

The size of the JSON response will differ between the benchmarks. We will run the benchmarks 4 times with the following sizes:

  • 10 items ~ 11 KB
  • 100 items ~ 112 KB
  • 1 000 items ~ 1 116 KB
  • 10 000 items ~ 11 134 KB

Error handling

I've intentionally left out all error handling here because I wanted to focus on the code that fetches/deserializes data. I have another post coming up that's full of opinions on how to use HttpClient throughout your code base with many different third-party integrations, together with error handling, Polly and so on, so stay tuned!

Without IHttpClientFactory

This is just to show how one could do this before the IHttpClientFactory existed.
If you don't have access to IHttpClientFactory for whatever reason, look at Version 2 and store the HttpClient as a private field so that it can be reused. Also, don't forget to read the Optimization section!

Version 0

Ah, only a few lines of code, what could POSSIBLY be wrong with this code?
This code creates a new HttpClient in a using statement, calls .Result on GetStringAsync and saves the whole response in a string. It was hard writing this code because it goes against everything I stand for :).

Version0Configurator.cs

public static class Version0Configurator
{
    public static void AddVersion0(this IServiceCollection services)
    {
        services.AddSingleton<GetAllProjectsQuery>();
        services.AddSingleton<GitHubClient>();
    }
}

GitHubClient.cs

public class GitHubClient
{
    public IReadOnlyCollection<GitHubRepositoryDto> GetRepositories()
    {
        using (var httpClient = new HttpClient{BaseAddress = new Uri(GitHubConstants.ApiBaseUrl)})
        {
            var result = httpClient.GetStringAsync(GitHubConstants.RepositoriesPath).Result;
            return JsonConvert.DeserializeObject<List<GitHubRepositoryDto>>(result);
        }
    }
}

Pros

NONE

Cons

  • Using .Result on an asynchronous method. It's never a good idea, never.
    But what if I use .GetAwaiter().GetResult()???
    Nope, still bad. You can read more about common async gotchas/pitfalls/recommendations here. It's written by David Fowler (member of the ASP.NET team), he knows what he's talking about.
  • Creating a new HttpClient for every call in a using statement. HttpClient should not be disposed (well, it should, but not by you; more on that further down where I talk about IHttpClientFactory).
  • Fetching the whole response and storing it as a string. This is obviously bad when working with large responses, since strings larger than 85 000 bytes end up on the large object heap.
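To make the 85 000 byte threshold concrete: .NET strings are stored as UTF-16, so each char takes two bytes and a response of roughly 42 500 characters is already large enough. The following is a minimal sketch and not part of the benchmark code. It relies on the runtime reporting large object heap objects as generation 2, and the string lengths are arbitrary values picked for illustration.

```csharp
using System;

public static class LargeObjectHeapDemo
{
    // Returns true when the object is reported as generation 2.
    // Large object heap allocations are reported as generation 2
    // (GC.MaxGeneration) right away, whereas freshly allocated small
    // objects start out in generation 0.
    public static bool IsOnLargeObjectHeap(object value)
    {
        return GC.GetGeneration(value) == GC.MaxGeneration;
    }

    public static void Main()
    {
        // About 2 KB of character data: stays on the small object heap.
        var small = new string('a', 1_000);

        // About 100 KB of character data: above the 85 000 byte threshold.
        var large = new string('a', 50_000);

        Console.WriteLine($"small on LOH: {IsOnLargeObjectHeap(small)}");
        Console.WriteLine($"large on LOH: {IsOnLargeObjectHeap(large)}");
    }
}
```

Keep in mind that the check only makes sense for objects that were just allocated, since a small object that survives enough collections is also promoted to generation 2.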

Version 1

We have now wrapped the return type in a Task<> and also added the async keyword. This allows us to await the call to .GetStringAsync.

Version1Configurator.cs

public static class Version1Configurator
{
    public static void AddVersion1(this IServiceCollection services)
    {
        services.AddSingleton<GetAllProjectsQuery>();
        services.AddSingleton<GitHubClient>();
    }
}

GitHubClient.cs

public class GitHubClient : IGitHubClient
{
    public async Task<IReadOnlyCollection<GitHubRepositoryDto>> GetRepositories()
    {
        using (var httpClient = new HttpClient { BaseAddress = new Uri(GitHubConstants.ApiBaseUrl) })
        {
            var result = await httpClient.GetStringAsync(GitHubConstants.RepositoriesPath).ConfigureAwait(false);
            return JsonConvert.DeserializeObject<List<GitHubRepositoryDto>>(result);
        }
    }
}

Pros

  • The request to the API is asynchronous.

Cons

  • Creating a new HttpClient for every call in a using statement.
  • Fetching and storing the whole response in a string.

Version 2

We are now creating an HttpClient in the constructor and storing it as a field so that we can reuse it. Note: all implementations of GitHubClient so far are intended to be used/registered as singletons.

Version2Configurator.cs

public static class Version2Configurator
{
    public static void AddVersion2(this IServiceCollection services)
    {
        services.AddSingleton<GetAllProjectsQuery>();
        services.AddSingleton<GitHubClient>();
    }
}

GitHubClient.cs

public class GitHubClient : IGitHubClient
{
    private readonly HttpClient _httpClient;

    public GitHubClient()
    {
        _httpClient = new HttpClient { BaseAddress = new Uri(GitHubConstants.ApiBaseUrl) };
    }

    public async Task<IReadOnlyCollection<GitHubRepositoryDto>> GetRepositories()
    {
        var result = await _httpClient.GetStringAsync(GitHubConstants.RepositoriesPath).ConfigureAwait(false);
        return JsonConvert.DeserializeObject<List<GitHubRepositoryDto>>(result);
    }
}

Pros

  • The request to the API is asynchronous.
  • Reusing the HttpClient

Cons

  • Fetching and storing the whole response in a string.
  • Since we are reusing the HttpClient forever, DNS changes won't be respected. You can read more about that here.
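If you are stuck with a long-lived HttpClient like this one, one possible mitigation on .NET Core 2.1 and later is to cap how long pooled connections are kept, so that new connections (and therefore new DNS lookups) are made periodically. This is a sketch under that assumption, not what the benchmarked code does. The LongLivedHttpClient helper and its parameters are hypothetical names used for illustration.

```csharp
using System;
using System.Net.Http;

public static class LongLivedHttpClient
{
    // Creates an HttpClient that is meant to be kept for the lifetime of the
    // application. Pooled connections are closed once they are older than
    // connectionLifetime, so the next request opens a new connection and
    // resolves the host name again.
    public static HttpClient Create(Uri baseAddress, TimeSpan connectionLifetime)
    {
        var handler = new SocketsHttpHandler
        {
            PooledConnectionLifetime = connectionLifetime
        };

        return new HttpClient(handler) { BaseAddress = baseAddress };
    }
}
```

In the Version 2 constructor this could be used as `_httpClient = LongLivedHttpClient.Create(new Uri(GitHubConstants.ApiBaseUrl), TimeSpan.FromMinutes(2));`, where the two minute value is only an example.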

With IHttpClientFactory

As you have seen so far, it's really easy to use HttpClient wrong. Here's what Microsoft has to say about it:

The original and well-known HttpClient class can be easily used, but in some cases, it isn't being properly used by many developers.
As a first issue, while this class is disposable, using it with the using statement is not the best choice because even when you dispose HttpClient object, the underlying socket is not immediately released and can cause a serious issue named ‘sockets exhaustion’.

Therefore, HttpClient is intended to be instantiated once and reused throughout the life of an application. Instantiating an HttpClient class for every request will exhaust the number of sockets available under heavy loads. That issue will result in SocketException errors. Possible approaches to solve that problem are based on the creation of the HttpClient object as singleton or static.

But there’s a second issue with HttpClient that you can have when you use it as singleton or static object. In this case, a singleton or static HttpClient doesn't respect DNS changes.
To address those mentioned issues and make the management of HttpClient instances easier, .NET Core 2.1 introduced a new HttpClientFactory...

Version 3

Here, we are injecting the IHttpClientFactory and then using it to create a new HttpClient every time the method gets called.
Yes, we are creating a new HttpClient every time, that's not a bad thing anymore since we are using the IHttpClientFactory.
Straight from Microsoft:

Each time you get an HttpClient object from the IHttpClientFactory, a new instance is returned. But each HttpClient uses an HttpMessageHandler that's pooled and reused by the IHttpClientFactory to reduce resource consumption, as long as the HttpMessageHandler's lifetime hasn't expired.
Pooling of handlers is desirable as each handler typically manages its own underlying HTTP connections; creating more handlers than necessary can result in connection delays. Some handlers also keep connections open indefinitely, which can prevent the handler from reacting to DNS changes.
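The handler lifetime mentioned in the quote is two minutes by default and can be changed per client when registering it. Below is a minimal sketch of a named client registration with an explicit lifetime. The five minute value, the AddGitHubHttpClient method name and the baseAddress parameter are assumptions for illustration; the registrations used in the benchmarks keep the default.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

public static class HandlerLifetimeConfigurator
{
    // Hypothetical variant of a named client registration with an explicit
    // handler lifetime. When the lifetime expires, the factory creates a new
    // HttpMessageHandler for clients created after that point.
    public static void AddGitHubHttpClient(this IServiceCollection services, Uri baseAddress)
    {
        services
            .AddHttpClient("GitHub", x => { x.BaseAddress = baseAddress; })
            .SetHandlerLifetime(TimeSpan.FromMinutes(5));
    }
}
```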

Version3Configurator.cs

public static class Version3Configurator
{
    public static void AddVersion3(this IServiceCollection services)
    {
        services.AddSingleton<GetAllProjectsQuery>();
        services.AddHttpClient("GitHub", x => { x.BaseAddress = new Uri(GitHubConstants.ApiBaseUrl); });
        services.AddSingleton<GitHubClient>();
    }
}

GitHubClient.cs

public class GitHubClient : IGitHubClient
{
    private readonly IHttpClientFactory _httpClientFactory;

    public GitHubClient(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory ?? throw new ArgumentNullException(nameof(httpClientFactory));
    }

    public async Task<IReadOnlyCollection<GitHubRepositoryDto>> GetRepositories()
    {
        var httpClient = _httpClientFactory.CreateClient("GitHub");
        var result = await httpClient.GetStringAsync(GitHubConstants.RepositoriesPath).ConfigureAwait(false);
        return JsonConvert.DeserializeObject<List<GitHubRepositoryDto>>(result);
    }
}

Pros

  • Using IHttpClientFactory

Cons

  • Fetching and storing the whole response in a string.

Version 4

Here we are using a typed client instead of a named one. We are registering the typed client with the .AddHttpClient<> method. Note that we also changed the registration of GetAllProjectsQuery from singleton to transient since typed clients are registered as transient.

Version4Configurator.cs

public static class Version4Configurator
{
    public static void AddVersion4(this IServiceCollection services)
    {
        services.AddTransient<GetAllProjectsQuery>();
        services.AddHttpClient<GitHubClient>(x => { x.BaseAddress = new Uri(GitHubConstants.ApiBaseUrl); });
    }
}

GitHubClient.cs

public class GitHubClient : IGitHubClient
{
    private readonly HttpClient _httpClient;

    public GitHubClient(HttpClient httpClient)
    {
        _httpClient = httpClient ?? throw new ArgumentNullException(nameof(httpClient));
    }

    public async Task<IReadOnlyCollection<GitHubRepositoryDto>> GetRepositories()
    {
        var result = await _httpClient.GetStringAsync(GitHubConstants.RepositoriesPath).ConfigureAwait(false);
        return JsonConvert.DeserializeObject<List<GitHubRepositoryDto>>(result);
    }
}

Pros

  • Using IHttpClientFactory
  • Using a typed client

Cons

  • Needed to change the lifetime of GetAllProjectsQuery from singleton to transient.
  • Fetching and storing the whole response in a string.

Version 5

I really want my GetAllProjectsQuery to be a singleton. To solve this, I've created a GitHubClientFactory that returns a GitHubClient. The trick here is that it resolves the GitHubClient from the ServiceProvider, thus injecting all the dependencies that we need; no need to new it up ourselves. Who taught me that? Steve, of course.

Version5Configurator.cs

public static class Version5Configurator
{
    public static void AddVersion5(this IServiceCollection services)
    {
        services.AddSingleton<GetAllProjectsQuery>();
        services.AddHttpClient<GitHubClient>(x => { x.BaseAddress = new Uri(GitHubConstants.ApiBaseUrl); });
        services.AddSingleton<GitHubClientFactory>();
    }
}

GitHubClientFactory.cs

public class GitHubClientFactory
{
    private readonly IServiceProvider _serviceProvider;

    public GitHubClientFactory(IServiceProvider serviceProvider)
    {
        _serviceProvider = serviceProvider;
    }

    public GitHubClient Create()
    {
        return _serviceProvider.GetRequiredService<GitHubClient>();
    }
}

GetAllProjectsQuery.cs

public class GetAllProjectsQuery : IGetAllProjectsQuery
{
    private readonly GitHubClientFactory _gitHubClientFactory;

    public GetAllProjectsQuery(GitHubClientFactory gitHubClientFactory)
    {
        _gitHubClientFactory = gitHubClientFactory ?? throw new ArgumentNullException(nameof(gitHubClientFactory));
    }

    public async Task<IReadOnlyCollection<Project>> Execute()
    {
        var gitHubClient = _gitHubClientFactory.Create();
        var response = await gitHubClient.GetRepositories().ConfigureAwait(false);
        return response.Select(x => new Project(x.Name, x.Url, x.Stars)).ToArray();
    }
}

GitHubClient.cs
Looks the same as Version 4.

Pros

  • Using IHttpClientFactory
  • Using a typed client
  • GetAllProjectsQuery is now a singleton again

Cons

  • Fetching and storing the whole response in a string.

Optimization

So, with version 5 we are using a typed client and GetAllProjectsQuery is registered as a singleton. Nice! The only thing left is to try to get the code to perform as well as possible. I have a few tricks :).

Version 6

We are now using the .SendAsync method instead of GetStringAsync. This allows us to stream the response instead of fetching it as a string. I've added the HttpCompletionOption.ResponseContentRead parameter to the code for clarity; it's the default option.
Now we are streaming the response from the HttpClient straight into the Deserialize method. We are also injecting the JsonSerializer that we have registered as a singleton.

Version6Configurator.cs

public static class Version6Configurator
{
    public static void AddVersion6(this IServiceCollection services)
    {
        services.AddSingleton<GetAllProjectsQuery>();
        services.AddHttpClient<GitHubClient>(x => { x.BaseAddress = new Uri(GitHubConstants.ApiBaseUrl); });
        services.AddSingleton<GitHubClientFactory>();
        services.AddSingleton<JsonSerializer>();
    }
}

GitHubClient.cs

public class GitHubClient : IGitHubClient
{
    private readonly HttpClient _httpClient;
    private readonly JsonSerializer _jsonSerializer;

    public GitHubClient(HttpClient httpClient, JsonSerializer jsonSerializer)
    {
        _httpClient = httpClient ?? throw new ArgumentNullException(nameof(httpClient));
        _jsonSerializer = jsonSerializer ?? throw new ArgumentNullException(nameof(jsonSerializer));
    }

    public async Task<IReadOnlyCollection<GitHubRepositoryDto>> GetRepositories()
    {
        var request = CreateRequest();
        var result = await _httpClient.SendAsync(request, HttpCompletionOption.ResponseContentRead).ConfigureAwait(false);

        using (var responseStream = await result.Content.ReadAsStreamAsync())
        {
            using (var streamReader = new StreamReader(responseStream))
            using (var jsonTextReader = new JsonTextReader(streamReader))
            {
                return _jsonSerializer.Deserialize<List<GitHubRepositoryDto>>(jsonTextReader);
            }
        }
    }

    private static HttpRequestMessage CreateRequest()
    {
        return new HttpRequestMessage(HttpMethod.Get, GitHubConstants.RepositoriesPath);
    }
}

Pros

  • Using IHttpClientFactory
  • Using a typed client
  • Streaming the response

Cons

  • Using ResponseContentRead, so the whole response is read into memory before we start deserializing it.

Version 7

The only difference here is that we are using ResponseHeadersRead instead of ResponseContentRead. You can read more about the difference between ResponseContentRead and ResponseHeadersRead here, but it basically boils down to this: methods using ResponseContentRead wait until both the headers AND the content have been read, whereas methods using ResponseHeadersRead return as soon as the headers have been read.

public class GitHubClient : IGitHubClient
{
    private readonly HttpClient _httpClient;
    private readonly JsonSerializer _jsonSerializer;

    public GitHubClient(HttpClient httpClient, JsonSerializer jsonSerializer)
    {
        _httpClient = httpClient ?? throw new ArgumentNullException(nameof(httpClient));
        _jsonSerializer = jsonSerializer ?? throw new ArgumentNullException(nameof(jsonSerializer));
    }

    public async Task<IReadOnlyCollection<GitHubRepositoryDto>> GetRepositories()
    {
        var request = CreateRequest();
        var result = await _httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false);

        using (var responseStream = await result.Content.ReadAsStreamAsync())
        {
            using (var streamReader = new StreamReader(responseStream))
            using (var jsonTextReader = new JsonTextReader(streamReader))
            {
                return _jsonSerializer.Deserialize<List<GitHubRepositoryDto>>(jsonTextReader);
            }
        }
    }

    private static HttpRequestMessage CreateRequest()
    {
        return new HttpRequestMessage(HttpMethod.Get, GitHubConstants.RepositoriesPath);
    }
}
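One thing to keep in mind with ResponseHeadersRead: because the body has not been read when SendAsync returns, the connection stays in use until the body has been fully read or the response is disposed. The following is a minimal sketch of the same idea with the request and the response wrapped in using statements. It is a variation for illustration, not the code that was benchmarked, and the StreamingJsonClient class and GetJsonAsync method are hypothetical names.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

public static class StreamingJsonClient
{
    // Hypothetical helper: sends a GET request, returns from SendAsync as soon
    // as the headers have arrived and deserializes the body directly from the
    // response stream.
    public static async Task<T> GetJsonAsync<T>(HttpClient httpClient, string path, JsonSerializer jsonSerializer)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, path))
        // Disposing the response frees the underlying connection once we are
        // done reading, even if deserialization throws.
        using (var response = await httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false))
        using (var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
        using (var streamReader = new StreamReader(responseStream))
        using (var jsonTextReader = new JsonTextReader(streamReader))
        {
            return jsonSerializer.Deserialize<T>(jsonTextReader);
        }
    }
}
```

Inside GetRepositories this could be called as `await StreamingJsonClient.GetJsonAsync<List<GitHubRepositoryDto>>(_httpClient, GitHubConstants.RepositoriesPath, _jsonSerializer).ConfigureAwait(false)`.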

Pros

  • Using IHttpClientFactory
  • Using a typed client
  • Streaming the response
  • Using ResponseHeadersRead

Cons

  • Maybe the JSON deserialization could be improved?

Version 8

Here we've created a custom JSON deserializer for Json.NET. You can find a bunch of performance tips regarding Json.NET here.

Version8Configurator.cs

public static class Version8Configurator
{
    public static void AddVersion8(this IServiceCollection services)
    {
        services.AddSingleton<GetAllProjectsQuery>();
        services.AddHttpClient<GitHubClient>(x => { x.BaseAddress = new Uri(GitHubConstants.ApiBaseUrl); });
        services.AddSingleton<GitHubClientFactory>();
        services.AddSingleton<JsonSerializer>();
        services.AddSingleton<ProjectDeserializer>();
    }
}

ProjectDeserializer.cs

public class ProjectDeserializer
{
    public IReadOnlyCollection<GitHubRepositoryDto> Deserialize(JsonTextReader jsonTextReader)
    {
        var repositories = new List<GitHubRepositoryDto>();
        var currentPropertyName = string.Empty;
        GitHubRepositoryDto repository = null;
        while (jsonTextReader.Read())
        {
            switch (jsonTextReader.TokenType)
            {
                case JsonToken.StartObject:
                    repository = new GitHubRepositoryDto();
                    continue;
                case JsonToken.EndObject:
                    repositories.Add(repository);
                    continue;
                case JsonToken.PropertyName:
                    currentPropertyName = jsonTextReader.Value.ToString();
                    continue;
                case JsonToken.String:
                    switch (currentPropertyName)
                    {
                        case "name":
                            repository.Name = jsonTextReader.Value.ToString();
                            continue;
                        case "url":
                            repository.Url = jsonTextReader.Value.ToString();
                            continue;
                    }
                    continue;
                case JsonToken.Integer:
                    switch (currentPropertyName)
                    {
                        case "stars":
                            repository.Stars = int.Parse(jsonTextReader.Value.ToString());
                            continue;
                    }
                    continue;
            }
        }
        return repositories;
    }
}

GitHubClient.cs

public class GitHubClient : IGitHubClient
{
    private readonly HttpClient _httpClient;
    private readonly ProjectDeserializer _projectDeserializer;

    public GitHubClient(HttpClient httpClient, ProjectDeserializer projectDeserializer)
    {
        _httpClient = httpClient ?? throw new ArgumentNullException(nameof(httpClient));
        _projectDeserializer = projectDeserializer ?? throw new ArgumentNullException(nameof(projectDeserializer));
    }

    public async Task<IReadOnlyCollection<GitHubRepositoryDto>> GetRepositories()
    {
        var request = CreateRequest();
        var result = await _httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false);

        using (var streamReader = new StreamReader(await result.Content.ReadAsStreamAsync()))
        using (var jsonTextReader = new JsonTextReader(streamReader))
        {
            return _projectDeserializer.Deserialize(jsonTextReader);
        }
    }

    private static HttpRequestMessage CreateRequest()
    {
        return new HttpRequestMessage(HttpMethod.Get, GitHubConstants.RepositoriesPath);
    }
}

Pros

  • Using IHttpClientFactory
  • Using a typed client
  • Streaming the response
  • Using ResponseHeadersRead
  • Optimized the JSON deserialization

Cons

  • ?

Version 9

Waiting for the new JSON deserializer from Microsoft, which will be available in .NET Core 3.0 preview 4.
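For reference, a version built on that serializer could look roughly like the sketch below. It assumes the System.Text.Json API as it later shipped in .NET Core 3.0 (JsonSerializer.DeserializeAsync), which may differ from the preview builds, and it has not been benchmarked as part of this post. RepositoryDto is a hypothetical stand-in for GitHubRepositoryDto, and SystemTextJsonGitHubClient is a hypothetical name as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical stand-in for GitHubRepositoryDto, reduced to the three
// properties that are mapped in this post.
public class RepositoryDto
{
    public string Name { get; set; }
    public string Url { get; set; }
    public int Stars { get; set; }
}

public class SystemTextJsonGitHubClient
{
    // The JSON uses lower-case property names ("name", "url", "stars"),
    // so property matching has to be case-insensitive.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        PropertyNameCaseInsensitive = true
    };

    private readonly HttpClient _httpClient;

    public SystemTextJsonGitHubClient(HttpClient httpClient)
    {
        _httpClient = httpClient ?? throw new ArgumentNullException(nameof(httpClient));
    }

    public async Task<IReadOnlyCollection<RepositoryDto>> GetRepositories(string path)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, path))
        using (var response = await _httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false))
        using (var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
        {
            return await DeserializeAsync(responseStream).ConfigureAwait(false);
        }
    }

    // Reads UTF-8 JSON from the stream without first buffering it in a string.
    public static async Task<List<RepositoryDto>> DeserializeAsync(Stream utf8Json)
    {
        return await JsonSerializer.DeserializeAsync<List<RepositoryDto>>(utf8Json, Options).ConfigureAwait(false);
    }
}
```

The serializer works on the UTF-8 bytes directly, so there is no StreamReader or JsonTextReader in between.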

Version 10

Your version! Do you think you can make an even better version than my best version? Feel free to send a PR on GitHub and I will happily add it to the benchmarks and to this post!

Benchmarks

Can I prove that version 7/8 is in fact the best version?
Am I SURE that version 0 sucks?

Time for some data!

I will run the benchmarks four times, changing how many items the API returns between runs.

  • First run: 10 items ~ 11 KB
  • Second run: 100 items ~ 112 KB
  • Third run: 1 000 items ~ 1 116 KB
  • Fourth run: 10 000 items ~ 11 134 KB

Benchmark 1

This benchmark resolves the GetAllProjectsQuery from the ServiceProvider, fetches the data with the GitHubClient, deserializes it and then maps it to domain objects.

10 projects

GetAllProjectsQuery - Benchmark 10 projects

Nothing weird here except for version 0, which sucks, but we already knew that. It's worth noting that version 2, followed by version 3, allocates the least amount of memory.

100 projects

GetAllProjectsQuery - Benchmark 100 projects

Same pattern: versions 2 and 3 allocate the least. What will happen when the JSON response grows?

1 000 projects

GetAllProjectsQuery - Benchmark 1 000 projects
Now our optimizations really pay off. Only versions 7 and 8 do not cause any gen 2 collections. Version 6, which uses ResponseContentRead, causes gen 2 collections, but version 7, which uses ResponseHeadersRead, does not... :)
It's also worth looking at the Ratio column: we can clearly see that versions 7 and 8 start to outperform the other versions.

10 000 projects

GetAllProjectsQuery - Benchmark 10 000 projects
Version 8 shines! The custom JSON deserializer makes a rather big difference here in terms of speed, but also in terms of allocations. It's worth noting that the custom deserializer is something that I hacked together in 5 minutes, so I know that it has room for improvement.

Benchmark 2

This benchmark resolves the GitHubClient from the ServiceProvider, fetches the data and deserializes it.

10 projects

GitHubClient - Benchmark 10 projects
Same pattern as in benchmark 1: version 0 sucks, and versions 2 and 3 allocate the least.

100 projects

GitHubClient - Benchmark 100 projects
Same as in benchmark 1. Worth noting is that versions 7 and 8 cause the same number of GC collections as version 5.

1 000 projects

GitHubClient - Benchmark 1 000 projects
Versions 7 and 8 start to run away; they are the only two versions that do not cause any gen 2 collections.

10 000 projects

GitHubClient - Benchmark 10 000 projects
The custom JSON deserializer strikes again; version 8 rocks.