Working With Attachments in DocumentDB

I’m building a prototype for a new project and it was decided to use DocumentDB to store our data. There will be very little data and even less relationship between the data, so document database is a good fit. Also there is a chance for us to use DocumentDB in production.

There is a comprehensive documentation about the structure and how it all ties together. Yet not enough coding samples on how to use attachments. And I struggled a bit to come up with the working solution. So I’ll explain it all here for future generations.

Structure

This diagram is from the documentation

And this is correct, but incomplete. Store this for a moment, I’ll come back to this point later.

Ignore the left three nodes on the diagram, look on Documents and Attachments nodes. This basically shows that if you create a document, it will be available on URI like this:

https://{accountname}.documents.azure.com/dbs/{databaseId}/colls/{collectionId}/docs/{docId}

That’s fine – you call an authenticated request to the correctly formed URI (and authenticated) and you’ll get JSON back as a result.

According to the schema you will also get attachment on this address:

https://{accountname}.documents.azure.com/dbs/{databaseId}/colls/{collectionId}/docs/{docId}/attachments{attachId}

And this is correct. If you do HTTP GET to this address – you’ll get JSON. Something like this:

{
    "contentType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "id": "1",
    "media": "/media/5VEpAMZpeasdfdfdAAAAAOFDl80B",
    "_rid": "5VEpAMZpeasdfdfdAAAAAOFDl80B=",
    "_self": "dbs\/5VEpAA==\/colls\/5VEpEWZpsQA=\/docs\/5VEpAMZpeasdfdfdAAAAAOFDl80B==\/attachments\/5VEpAMZpeasdfdfdAAAAAOFDl80B=",
    "_etag": "\"0000533e-0000-0000-0000-59079b8a0000\"",
    "_ts": 1493673393
}

Turns out that there are 2 ways you can do attachments in DocumentDB – managed and (surpise!) unmanaged. Unmanaged is when you don’t really attach anything, but just provide a link to an external storage. To be honest, I don’t see much sense in doing it that way – why bother with extra resource just to keep external links? It would be much easier to make these links as part of the actual document, so you don’t have to do another call to retrieve them.

Managed attachments is when you actually do store binaries in DocumentDB and this is what I chose to use. And unfortunately had to discover for myself that it is not straight forward.

Managed Attachments

If you noticed in the JSON above there is a line "media": "/media/5VEpAMZpeasdfdfdAAAAAOFDl80B". This is actually the link to the stored binary payload. And you need to query that URI to get the payload. So from knowing document id, you’ll need 2 requests to get your hands on attached binaries:

  1. Get list of attachments
  2. Every attachment contains link to Media – get that.

So this /media/{mediaId} is missing in the diagram above. Perhaps this is deliberate not to confuse users. I’ll go with that.

Code Samples

Now to the code samples.

I’m using NuGet package provided by Microsoft to do the requests for me:

Install-Package Microsoft.Azure.DocumentDB

Start with basics to get them out of the way:

private async Task<DocumentClient> GetClientAsync()
{
    if (documentClient == null)
    {
        var endpointUrl = configuration["DocumentDb:EndpointUri"];
        var primaryKey = configuration["DocumentDb:PrimaryKey"];

        documentClient = new DocumentClient(new Uri(endpointUrl), primaryKey);
        await documentClient.OpenAsync();
    }

    return documentClient;
}

where documentClient is a local variable in the containing class.

Now let’s create a document and attach a binary:

var myDoc = new { id = "42", Name = "Max", City="Aberdeen" }; // this is the document you are trying to save
var attachmentStream = File.OpenRead("c:/Path/To/File.pdf"); // this is the document stream you are attaching

var client = await GetClientAsync();
var createUrl = UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName);
Document document = await client.CreateDocumentAsync(createUrl, myDoc);

await client.CreateAttachmentAsync(document.SelfLink, attachmentStream, new MediaOptions()
    {
        ContentType = "application/pdf", // your application type
        Slug = "78", // this is actually attachment ID
    });

Now a few things are going on here: I create an anonymous class for sample sake – use strongly typed models. Reading attachment stream from file system – that is also for sample sake; whatever source you have, you’ll need to provide an instance of Stream to upload an attachment.

Now this is worth paying attention to: var createUrl = UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName);. UriFactory class is not really a factory in the broad OOP sense – it does not produce other objects that will do actual work. This class gives you a lot of patterns that create URI addressess based on names of things you use. In other words there are a lot of String.Format with templates.

Method UriFactory.CreateDocumentCollectionUri is a going to give you link in format /dbs/{documentId}/colls/{collectionId}/. If you are looking on CreateAttachmentUri it will work with this template: dbs/{dbId}/colls/{collectionId}/docs/{docId}/attachments/{attachmentId}.

Next line with await client.CreateDocumentAsync(createUrl, myDoc) is doing what you think it is doing – creating a document on Azure – no surprises here.

But when you look on block of code with client.CreateAttachmentAsync(), not everything might be clear. document.SelfLink is a URI that links back to the document – it will be in format of dbs/{dbId}/colls/{collectionId}/docs/{docId}. Next big question is Slug – this is actually works as attachment ID. They might as well could’ve called it Id because this is what goes into id field when you look on the storage.

Retrieving Attachments

Once we’ve put something in the storage, some time in the future we’ll have to take it out. Let’s get back our attached file.

var client = await GetClientAsync();
var attachmentUri = UriFactory.CreateAttachmentUri(DatabaseName, CollectionName, docId, attachId);

var attachmentResponse = await client.ReadAttachmentAsync(attachmentUri);

var resourceMediaLink = attachmentResponse.Resource.MediaLink;

var mediaLinkResponse = await client.ReadMediaAsync(resourceMediaLink);

var contentType = mediaLinkResponse.ContentType;
var stream = mediaLinkResponse.Media;

Here we have some funky things going on again. This part UriFactory.CreateAttachmentUri(DatabaseName, CollectionName, docId, attachId) will give dbs/{dbId}/colls/{collectionId}/docs/{docId}/attachments/{attachmentId}. And GETting to this address will return you JSON same as in the start of the article. Value for attachmentResponse.Resource.MediaLink will look like /media/5VEpAMZpeasdfdfdAAAAAOFDl80B3 and this is the path to GET the actual attached binary – this is what we are doing in await client.ReadMediaAsync(resourceMediaLink). The rest should be self-explanatory.

Conclusion

To be honest, lack of explanation in documentation of this /media/{mediaId} does not add kudos to the team. And I feel like the provided API is not straight-forwrard and not easy to use – I had to decompile and have a wonder about what is actually happening inside of the API library. Also there is too much leakage of the implementation: I really could’ve lived without ever having to know about UriFactory.