I’m building a prototype for a new project and it was decided to use DocumentDB to store our data. There will be very little data and even fewer relationships between the data, so a document database is a good fit. There is also a chance that we will use DocumentDB in production.

There is comprehensive documentation about the structure and how it all ties together, yet not enough code samples on how to use attachments, and I struggled a bit to come up with a working solution. So I’ll explain it all here for future generations.

Structure

This diagram is from the documentation

This is correct, but incomplete. Keep it in mind for a moment; I’ll come back to this point later.

Ignore the three nodes on the left of the diagram and look at the Documents and Attachments nodes. They show that if you create a document, it will be available at a URI like this:

https://{accountname}.documents.azure.com/dbs/{databaseId}/colls/{collectionId}/docs/{docId}

That’s fine: you send an authenticated request to the correctly formed URI and you get JSON back as a result.

According to the schema, you will also get an attachment at this address:

https://{accountname}.documents.azure.com/dbs/{databaseId}/colls/{collectionId}/docs/{docId}/attachments/{attachId}

And this is correct. If you do an HTTP GET on this address, you’ll get JSON, something like this:

{
    "contentType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "id": "1",
    "media": "/media/5VEpAMZpeasdfdfdAAAAAOFDl80B",
    "_rid": "5VEpAMZpeasdfdfdAAAAAOFDl80B=",
    "_self": "dbs\/5VEpAA==\/colls\/5VEpEWZpsQA=\/docs\/5VEpAMZpeasdfdfdAAAAAOFDl80B==\/attachments\/5VEpAMZpeasdfdfdAAAAAOFDl80B=",
    "_etag": "\"0000533e-0000-0000-0000-59079b8a0000\"",
    "_ts": 1493673393
}

It turns out that there are 2 ways to do attachments in DocumentDB: managed and (surprise!) unmanaged. Unmanaged is when you don’t really attach anything, but just provide a link to external storage. To be honest, I don’t see much sense in doing it that way: why bother with an extra resource just to keep external links? It would be much easier to make these links part of the actual document, so you don’t have to make another call to retrieve them.

Managed attachments are when you actually store binaries in DocumentDB, and this is what I chose to use. Unfortunately, I had to discover for myself that it is not straightforward.

Managed Attachments

Notice the line "media": "/media/5VEpAMZpeasdfdfdAAAAAOFDl80B" in the JSON above. This is the link to the stored binary payload, and you need to query that URI to get the payload. So, knowing the document id, you’ll need 2 requests to get your hands on the attached binaries:

  1. Get the list of attachments.
  2. Every attachment contains a link to its media; get that.

So this /media/{mediaId} is missing from the diagram above. Perhaps that is deliberate, so as not to confuse users. I’ll go with that.
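To make step 2 concrete, here is a minimal sketch in C#, assuming the attachment JSON has already been fetched as a string (for example with an authenticated GET on the attachment URI). AttachmentJson and GetMediaLink are hypothetical names of mine, not part of the DocumentDB SDK, and the parsing uses System.Text.Json, which assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of the DocumentDB SDK): extracts the
// "media" link from an attachment resource returned by a GET on
// .../docs/{docId}/attachments/{attachId}.
public static class AttachmentJson
{
    public static string GetMediaLink(string attachmentJson)
    {
        using (JsonDocument doc = JsonDocument.Parse(attachmentJson))
        {
            // Managed attachments carry a /media/{mediaId} path; that is the second GET.
            if (!doc.RootElement.TryGetProperty("media", out JsonElement media))
                throw new InvalidOperationException("Attachment JSON has no \"media\" property.");

            return media.GetString();
        }
    }
}
```

Fed the JSON from the start of the article, it returns /media/5VEpAMZpeasdfdfdAAAAAOFDl80B, which is the path you GET next to receive the binary.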

Code Samples

Now to the code samples.

I’m using the NuGet package provided by Microsoft to do the requests for me:

Install-Package Microsoft.Azure.DocumentDB

Let’s start with the basics to get them out of the way:

private async Task<DocumentClient> GetClientAsync()
{
    if (documentClient == null)
    {
        var endpointUrl = configuration["DocumentDb:EndpointUri"];
        var primaryKey = configuration["DocumentDb:PrimaryKey"];

        documentClient = new DocumentClient(new Uri(endpointUrl), primaryKey);
        await documentClient.OpenAsync();
    }

    return documentClient;
}

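where documentClient is a field in the containing class. Note that this lazy initialisation is not thread-safe: two concurrent first calls may each create a client. That is fine for a prototype; otherwise consider a Lazy<Task<DocumentClient>>.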

Now let’s create a document and attach a binary:

var myDoc = new { id = "42", Name = "Max", City = "Aberdeen" }; // this is the document you are trying to save

var client = await GetClientAsync();
var createUrl = UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName);
Document document = await client.CreateDocumentAsync(createUrl, myDoc);

// this is the file stream you are attaching; the using block disposes it after the upload
using (var attachmentStream = File.OpenRead("c:/Path/To/File.pdf"))
{
    await client.CreateAttachmentAsync(document.SelfLink, attachmentStream, new MediaOptions()
    {
        ContentType = "application/pdf", // MIME type of the attachment
        Slug = "78", // this is actually the attachment ID
    });
}

A few things are going on here. I create an anonymous class for the sake of the sample; use strongly typed models in real code. Reading the attachment stream from the file system is also just for the sample: whatever source you have, you’ll need to provide an instance of Stream to upload an attachment.

This line is worth paying attention to: var createUrl = UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName);. The UriFactory class is not really a factory in the broad OOP sense, because it does not produce other objects that do the actual work. Instead, it gives you a set of methods that build URI addresses from the names of the things you use. In other words, it is a lot of String.Format calls with templates.

UriFactory.CreateDocumentCollectionUri gives you a link in the format dbs/{databaseId}/colls/{collectionId}. If you look at CreateAttachmentUri, it works with this template: dbs/{dbId}/colls/{collectionId}/docs/{docId}/attachments/{attachmentId}.
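To show what those templates boil down to, here is a rough sketch that builds the same address shapes with string interpolation. DocDbPaths is a hypothetical helper, not the SDK’s implementation: the real UriFactory returns Uri objects and may escape the ids, while this sketch uses them as-is.

```csharp
// Hypothetical helper: mimics the address templates described above.
// Unlike UriFactory it returns plain strings and does no escaping.
public static class DocDbPaths
{
    public static string Collection(string dbId, string collectionId) =>
        $"dbs/{dbId}/colls/{collectionId}";

    public static string Document(string dbId, string collectionId, string docId) =>
        $"{Collection(dbId, collectionId)}/docs/{docId}";

    public static string Attachment(string dbId, string collectionId, string docId, string attachmentId) =>
        $"{Document(dbId, collectionId, docId)}/attachments/{attachmentId}";
}
```

With the document id "42" and Slug "78" from the sample, and hypothetical database and collection names, Attachment("MyDb", "MyColl", "42", "78") gives dbs/MyDb/colls/MyColl/docs/42/attachments/78.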

The next line, await client.CreateDocumentAsync(createUrl, myDoc), does what you think it does: it creates a document on Azure. No surprises here.

But when you look at the block of code with client.CreateAttachmentAsync(), not everything might be clear. document.SelfLink is a URI that links back to the document, in the format dbs/{dbId}/colls/{collectionId}/docs/{docId} (built from the internal resource ids, as in the _self field of the JSON above). The next big question is Slug: this actually works as the attachment ID. They might as well have called it Id, because this is what goes into the id field when you look at the storage.

Retrieving Attachments

Once we’ve put something in the storage, some time in the future we’ll have to take it out. Let’s get back our attached file.

var client = await GetClientAsync();
var attachmentUri = UriFactory.CreateAttachmentUri(DatabaseName, CollectionName, docId, attachId);

var attachmentResponse = await client.ReadAttachmentAsync(attachmentUri);

var resourceMediaLink = attachmentResponse.Resource.MediaLink;

var mediaLinkResponse = await client.ReadMediaAsync(resourceMediaLink);

var contentType = mediaLinkResponse.ContentType;
var stream = mediaLinkResponse.Media;

Here we have some funky things going on again. UriFactory.CreateAttachmentUri(DatabaseName, CollectionName, docId, attachId) gives you dbs/{dbId}/colls/{collectionId}/docs/{docId}/attachments/{attachmentId}, and a GET to this address returns the same JSON as at the start of the article. The value of attachmentResponse.Resource.MediaLink will look like /media/5VEpAMZpeasdfdfdAAAAAOFDl80B3; this is the path to GET the actual attached binary, which is what await client.ReadMediaAsync(resourceMediaLink) does. The rest should be self-explanatory.
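One thing the snippet leaves open is what happens to stream afterwards. A small sketch, assuming you want to save the payload to disk: MediaSaver is a hypothetical helper that copies a Stream such as mediaLinkResponse.Media to a file and disposes the source once done.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: writes an attachment payload to a file.
public static class MediaSaver
{
    public static async Task SaveAsync(Stream media, string path)
    {
        // Both the source stream and the file are disposed when the copy finishes.
        using (media)
        using (FileStream file = File.Create(path))
        {
            await media.CopyToAsync(file);
        }
    }
}
```

Usage would be along the lines of await MediaSaver.SaveAsync(mediaLinkResponse.Media, "c:/Path/To/Downloaded.pdf"), where the target path is just an example.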

Conclusion

To be honest, the lack of any explanation of this /media/{mediaId} in the documentation does not earn the team any kudos. I also feel that the provided API is not straightforward or easy to use: I had to decompile the library and wonder about what is actually happening inside it. There is also too much leakage of the implementation; I really could’ve lived without ever having to know about UriFactory.

Last year, before Christmas, I had a jolly time in my office cleaning up old bits of code and other stuff. One of the things I “cleaned up” was old expired Azure Management Certificates. Then I went on a 2-week holiday travelling around England, until I got a text message from a colleague saying that one of our production systems was down and not coming back up. The error was something to do with Azure authentication.

Quickly doing the math, I realised that I had accidentally removed a current management certificate that was used by our system. The cert was used in the start-up phase of the app to check that all the Azure resources the system needed were available. And of course, at the time I deleted the certs there were no outages and all systems ran as normal, until one of them decided to restart (because Azure). When it went down, it never came back up, because the certificate it relied on had been removed by me.

After fixing the systems (sorry, man!) and a bit of investigation, it turned out that our system had been using an expired certificate and could still authenticate with the service. So the Azure Management Portal did not check whether the management certificate had expired.

I’m not sure whether this is a bug or a feature, but I’m certainly not going to report it: if it gets fixed, how many other systems with expired certs will go down without warning?

From now on, my rule of thumb: never delete management certificates, even expired ones!

Today I spent quite some time figuring out why my new Azure Subscription Settings file was not being picked up by Octopus Deploy. I was getting an obscure error message:

Get-AzureWebsite : Communication could not be established. This could be due to an invalid subscription ID. Note that subscription IDs are case sensitive.

It turned out that the old Subscription Settings File was stuck in the user cache, and I had to “unstick” it by running this command under the user account that was executing the PowerShell scripts:

Remove-AzureSubscription 'Subscription Name' -Force

This does not do anything to the Azure subscription itself (I panicked about that at first). The documentation says it only deletes the subscription data file, so that PowerShell can’t use it anymore.

After removing the old subscription data, you can import the new Subscription Settings File:

Import-AzurePublishSettingsFile 'C:/path/to/subscription.publishsettings'

Select-AzureSubscription -SubscriptionName "Subscription Name"

Hope this helps someone!

This post is a summary of links I’ve studied about ARM.

One of the things we do on our project is automatic provisioning of services in MS Azure for new clients. A couple of years ago we did the provisioning manually, and that took days. Now we have a web-based system that does it all for us: it talks to the Azure API and asks for new websites/databases/storage/etc. to be created for the system we work on. I’ve been actively writing this system for the last 3-4 months.

I’ve been using the Azure Management C# libraries to access the Azure API for a couple of years now. As far as I can remember, these libraries never actually came out of preview and were never approved for production. I had a lot of trouble with them, especially when I took half-year breaks from the project, came back and realised that half the APIs I’d used had changed.

This time I came back to the problem and realised that I’d missed the Azure Resource Management bandwagon and the plethora of new libraries, and needed to start learning from scratch again (don’t you hate that?).

There are now 2 ways to access the Azure API: Azure Resource Management (ARM) and Classic Azure Management (the old way). ARM is the cool new kid on the block and looks like it is here to stay, because the new Azure Portal is entirely based on it. See the comparison of the new and old ways.

There are a lot of differences between the new and the old. The old system required you to authenticate with a certificate attached to your HTTP requests, and you had to manage these certs and all that. I’m sure there were other ways to authenticate, but when I started working with the Azure Management API this was the only way I knew. ARM lets you authenticate via Azure AD, where you need to know a couple of GUIDs and a password. Here is the overview of ARM.

The most radical change is that ARM is based on Resource Groups, and everything you create must be in a group. So you need to create a Resource Group first, then the resources. There are benefits to that: you can view billing per group, e.g. put all resources related to a project into one group and see how much that project costs you without going through per-item subscription billing. Another massive benefit is access control: you can now give users access to a specific group of resources only (before, you could only give access to a whole subscription). Read more about Role-Based Access Control and built-in roles.

Authentication

But I digress; I’m working with the API at the moment. Authentication is slightly easier now. You need to create an application in Azure Active Directory, get a “client secret” from it and run 3 lines of C# to get an authentication bearer token. Read more about the authentication process here (including a code sample). And this one shows the creation of an AD application.
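For reference, the token request those few lines perform can also be assembled by hand. The sketch below builds, but does not send, an OAuth2 client-credentials request against the Azure AD v1 token endpoint. ArmToken is a hypothetical name, tenantId, clientId and clientSecret stand for the values of your AD application, and https://management.azure.com/ is assumed as the resource.

```csharp
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical helper: builds an OAuth2 client-credentials token request.
// Sending it (e.g. with HttpClient) is left out of this sketch.
public static class ArmToken
{
    public static HttpRequestMessage BuildTokenRequest(string tenantId, string clientId, string clientSecret)
    {
        var form = new Dictionary<string, string>
        {
            ["grant_type"] = "client_credentials",
            ["client_id"] = clientId,
            ["client_secret"] = clientSecret,
            // Assumed audience for Azure Resource Manager
            ["resource"] = "https://management.azure.com/",
        };

        return new HttpRequestMessage(HttpMethod.Post,
            $"https://login.microsoftonline.com/{tenantId}/oauth2/token")
        {
            // Sent as application/x-www-form-urlencoded
            Content = new FormUrlEncodedContent(form)
        };
    }
}
```

If the request succeeds, the JSON response should contain an access_token field; that value is the bearer token.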

Then, for every request to ARM, you need to attach this token as an Authorization header of the HTTP request: request.Headers.Add(HttpRequestHeader.Authorization, "Bearer " + token);

Requests

You don’t even need any libraries now; you can form the requests yourself pretty easily. You need the authentication token as a header and you need to know the URL you are working with. Then you POST/PUT a JSON-formatted object, and to delete you send a DELETE request to that URL. The URL always maps to a resource: very RESTful indeed.
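A minimal sketch of such hand-made requests, assuming you already have the bearer token and the resource URL. ArmRequests is a hypothetical helper that only builds the HttpRequestMessage objects; sending them with an HttpClient is left out.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical helper: builds authenticated ARM requests without sending them.
public static class ArmRequests
{
    // PUT creates or updates the resource described by the JSON body.
    public static HttpRequestMessage Put(string resourceUrl, string token, string json)
    {
        var request = new HttpRequestMessage(HttpMethod.Put, resourceUrl)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);
        return request;
    }

    // DELETE removes the resource the URL points to.
    public static HttpRequestMessage Delete(string resourceUrl, string token)
    {
        var request = new HttpRequestMessage(HttpMethod.Delete, resourceUrl);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);
        return request;
    }
}
```

Setting Headers.Authorization through AuthenticationHeaderValue produces the same "Bearer {token}" header as the string concatenation above, just typed.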

The URL you need to work with looks similar to this:

https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/{resourcetype}/{resourceName}?api-version={apiversion}

Here is a sample URL accessing a website resource:

https://management.azure.com/subscriptions/suy64wae-f814-3189-8742-b53d5a4532cd/resourceGroups/NewResourceGroup/providers/Microsoft.Web/sites/MyHappyWebsiteName?api-version=2015-08-01
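Since the URL is a fixed pattern, a small hypothetical helper can assemble it from its parts. This sketch assumes the names are already URL-safe and does no escaping; providerAndType is the provider namespace plus the resource type, e.g. Microsoft.Web/sites.

```csharp
// Hypothetical helper: assembles an ARM resource URL from its parts.
public static class ArmUrls
{
    public static string Resource(string subscriptionId, string resourceGroup,
        string providerAndType, string resourceName, string apiVersion) =>
        "https://management.azure.com/subscriptions/" + subscriptionId +
        "/resourceGroups/" + resourceGroup +
        "/providers/" + providerAndType + "/" + resourceName +
        "?api-version=" + apiVersion;
}
```

Fed with the parts of the sample above (subscription id, NewResourceGroup, Microsoft.Web/sites, MyHappyWebsiteName, 2015-08-01), it reproduces that URL exactly.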

But don’t worry about this: there is a Resource browser, https://resources.azure.com/, that will tell you exactly the URL you need. Just navigate to an existing item and look at the generated URL. The site also gives you the JSON format/data to send through.

Templates

Another great feature of ARM is Templates. Basically, you describe the resources you need in JSON, add parameters and feed that to ARM (either programmatically or through the Azure Portal). However, I’ve not found a good programmatic way to use templates: the samples I have seen require you to upload the template JSON file and the parameters JSON file to Azure Storage and tell ARM where to find them. I’m not convinced about this; it sounds like too many upload steps.

Here is the definition of templates. You can create them in Visual Studio 2015, or you can copy-paste JSON from existing objects in the resource browser and modify it to your needs.

Links

My current setup for one of the projects requires a VM running in Azure: a tiny build server I set up just for myself. I only need it when I’m actually working on the project, so the idea is to shut the VM down when I’m not. The fear was that one day I’d forget to shut it down and be billed for time I had not used.

So I’ve created a tiny script in PowerShell:

# Get your publishsettings file here: https://manage.windowsazure.com/publishsettings/
Import-AzurePublishSettingsFile "c:\Path\to\subscription\file.publishsettings"

Select-AzureSubscription -SubscriptionName "MySubscriptionName"

Stop-AzureVM -Name myVmName -ServiceName myVmName -Force

NB: you need the -Force parameter, otherwise the script will prompt for Y/N and require user intervention.

Then I hooked the script into the shutdown of my work PC:

  1. Run gpedit.msc
  2. Go to Computer Configuration -> Windows Settings -> Scripts -> Shutdown
  3. Open the Shutdown option and switch to the PowerShell tab; add your script there.
  4. PROFIT!

Now your VM will be shut down when you shut down your PC.

Another alternative was to have a free website running on Azure with a WebJob that constantly polls your PC to check whether it is running, and shuts the VM down if your PC is offline. But this sounds like a lot of work, and you’d have to expose some sort of publicly available endpoint on your PC. So, nothing fancy!

We are going hard at our site performance and optimisation. It is fast, but we want it uber-fast! So the next stage is to keep IIS up and active all the time, with all the views compiled and ready before any user comes to them.

By default, IIS compiles views only when a request for that view comes in. So the first time a user visits some rarely used page in your application, they wait a bit longer while IIS does just-in-time compilation. In fact, if you look under the hood, IIS does stacks of things before it shows you a website.

Contrary to common belief, IIS does not run your web application from the /bin folder; it copies all required files to a temp folder, specifically c:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\. The reason for that is file locking: for just-in-time compilation it needs to update binaries, but binaries in the /bin folder can be locked.


I’ve updated one of our Azure-hosted projects from MVC3 to MVC5, and at the same time bumped it from .NET 4 to .NET 4.5.1. After deployment I bumped into a cryptic error:

The feature named NetFx451 that is required by the uploaded package is not available in the OS * chosen for the deployment.

A quick search revealed that if your service runs on an OS older than Windows Server 2012 R2, you’ll get this error for .NET 4.5.1. To fix it, open your *.cscfg files: at the very top of each file, on the <ServiceConfiguration> element, there is an osFamily="3" attribute. Update it to osFamily="4".


Don’t forget to update this in all of your .cscfg files. By the way, this is recommended by Scott Gu in one of his announcements.
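If you have many .cscfg files, the edit can be scripted. Here is a rough sketch using LINQ to XML, assuming every .cscfg file under a given folder should get the same osFamily. CscfgUpdater is a hypothetical name, and since XDocument.Save may reformat a file slightly, it is worth reviewing the diff before committing.

```csharp
using System.IO;
using System.Xml.Linq;

// Hypothetical helper: sets osFamily on the root <ServiceConfiguration> element.
public static class CscfgUpdater
{
    // Works on XML text; the attribute is unprefixed, so no namespace is needed.
    public static string SetOsFamily(string cscfgXml, string osFamily)
    {
        XDocument doc = XDocument.Parse(cscfgXml);
        doc.Root.SetAttributeValue("osFamily", osFamily);
        return doc.ToString();
    }

    // Updates every *.cscfg file under rootDir and returns how many were changed.
    public static int UpdateAll(string rootDir, string osFamily)
    {
        int count = 0;
        foreach (string path in Directory.GetFiles(rootDir, "*.cscfg", SearchOption.AllDirectories))
        {
            XDocument doc = XDocument.Load(path);
            doc.Root.SetAttributeValue("osFamily", osFamily);
            doc.Save(path);
            count++;
        }
        return count;
    }
}
```

For this article’s case it would be called as CscfgUpdater.UpdateAll(solutionFolder, "4"), where solutionFolder is your own path.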

I wrote about uploading files to Azure Storage via the REST API before. It turns out that implementation was very naive: as soon as I tried uploading anything larger than 64 MB, I hit a brick wall of exceptions.

Azure Blob Storage has 2 types of blobs: Page Blobs and Block Blobs. Page Blobs are optimised for random read-write operations; Block Blobs are for general storage of files. Here is more about Page vs Block blobs. If you are just storing files in Azure, you’ll most likely use Block Blobs.

It turns out that Block Blob storage has some limitations on the upload front. In one go you can upload up to 64 MB (1024 * 1024 * 64 bytes). Add one extra byte and you get an error from the API (I tested it). So instead of uploading a large file in one request, you need to cut it into blocks and upload separate pieces of no more than 4 MB each. Once all the pieces (blocks) are uploaded, you need to commit them and specify the order in which they should appear.


Update: There is an updated implementation of this code: see this blog post

One of my applications has a feature where it is given a URL with a Shared Access Signature to Azure Blob Storage and then uploads files to the storage via the REST API. You can do that with the provided Azure Storage libraries, and originally I had this functionality implemented with them. But now I’m slimming down the application (it is a small command-line app) and getting rid of unnecessary dependencies. For the sake of one upload method I don’t want to tie myself to masses of extra DLLs, so I’m replacing the libraries with a REST API call.
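A minimal sketch of the single-request upload, assuming a container-level SAS URL and a blob name that needs no escaping. SasUpload is a hypothetical helper: it inserts the blob name into the URL path while keeping the SAS query string, and builds a Put Blob request with the x-ms-blob-type header that Put Blob requires. As described earlier, a single request like this only works for files up to 64 MB.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: builds a Put Blob request against a container SAS URL.
public static class SasUpload
{
    // containerSasUrl looks like https://account.blob.core.windows.net/container?sv=...&sig=...
    public static Uri BlobUri(string containerSasUrl, string blobName)
    {
        var builder = new UriBuilder(containerSasUrl);
        builder.Path = builder.Path.TrimEnd('/') + "/" + blobName;
        return builder.Uri; // the query string (the SAS token) is preserved
    }

    public static HttpRequestMessage PutBlob(Uri blobUri, byte[] content)
    {
        var request = new HttpRequestMessage(HttpMethod.Put, blobUri)
        {
            Content = new ByteArrayContent(content)
        };
        // Put Blob needs to know which kind of blob to create.
        request.Headers.Add("x-ms-blob-type", "BlockBlob");
        return request;
    }
}
```

Sending the request with an HttpClient and checking for a 201 Created status is left out of the sketch.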

I have spent some time trying to test this functionality. You may say that this is not a unit test. Yes, it is not a unit test. Any other questions? You may also say that you shouldn’t test this kind of thing, and should just wrap a class around it and ignore it. That is what I have done in the past, but that particular piece of code caused endless pain because I’d done it all wrong and it was not covered by tests.


ELMAH is a nifty tool that allows you to record exceptions that arise in your web application. It is very handy, and everyone should use it in addition to application logging. But if you are using Dependency Injection and have your connection strings provided by some service, you might have trouble. For example, in our app we use Azure Configuration settings, and we have a service IConfiguration with a lot of get-methods for the different config settings in our app, including the database connection string.
