
This article explains how you can achieve Azure Media Services (AMS) streaming traffic distribution with Traffic Manager.

The process for a client to find target AMS streaming endpoints

The figure shows how a client finds the target AMS streaming endpoints with Traffic Manager and how requests from video players are distributed to streaming endpoints in AMS:

Controlling Azure Media Services Traffic with Traffic Manager

When AMS endpoints are added to an Azure Traffic Manager profile, Azure Traffic Manager keeps track of the status of the endpoints (running, stopped, or deleted) so that it can decide which of those endpoints should receive traffic. You can configure the way to route network traffic to the endpoints by choosing traffic routing methods available in Traffic Manager.

Configuration procedure

Suppose that you have 2 AMS accounts (amsaccount1, amsaccount2), and that you want to distribute requests sent to your Traffic Manager domain (myamsstreaming.trafficmanager.net) from video player clients across the streaming endpoints in those accounts. When the AMS endpoints are added to an Azure Traffic Manager profile (myamsstreaming), Azure Traffic Manager keeps track of the status of your AMS streaming endpoints (running, stopped, or deleted) so that it can decide which of those endpoints should receive traffic. However, for AMS endpoints, clients have to access a custom domain, NOT the Traffic Manager domain itself, for their traffic to be distributed to the AMS endpoints through Traffic Manager. So let’s suppose you prepare a custom domain: streaming.mydomain.com.

Example accounts and domains

(1) Point a custom domain to a Traffic Manager domain

Add the following alias (CNAME) record to point the custom domain to the Traffic Manager domain:

streaming.mydomain.com IN CNAME myamsstreaming.trafficmanager.net

Useful Link for this step: Point a company Internet domain to an Azure Traffic Manager domain

(2) Add Azure Media Services origin hosts to the Traffic Manager

Add the AMS streaming endpoints to an Azure Traffic Manager profile as “External Endpoints”, via either the Azure portal or Azure CLI / PowerShell (a CLI sketch follows the link below). Here is an image of adding endpoints in the Azure portal:

Add endpoints to Traffic Manager

Useful Link for this step: Add, disable, enable, or delete endpoints
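
For the CLI route, here is a minimal sketch using Azure CLI 2.0 (the az command). The resource group name and the Weighted routing method are example assumptions, and the older cross-platform CLI and PowerShell use different commands, so treat this only as a rough outline:

# A rough sketch with Azure CLI 2.0; resource group and routing method are example values.
az network traffic-manager profile create \
  --resource-group myResourceGroup \
  --name myamsstreaming \
  --routing-method Weighted \
  --unique-dns-name myamsstreaming

# Register each AMS streaming endpoint host as an external endpoint of the profile.
az network traffic-manager endpoint create \
  --resource-group myResourceGroup \
  --profile-name myamsstreaming \
  --name amsaccount1 \
  --type externalEndpoints \
  --target amsaccount1.streaming.mediaservices.windows.net \
  --endpoint-status enabled

az network traffic-manager endpoint create \
  --resource-group myResourceGroup \
  --profile-name myamsstreaming \
  --name amsaccount2 \
  --type externalEndpoints \
  --target amsaccount2.streaming.mediaservices.windows.net \
  --endpoint-status enabled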

(3) Custom domain name ownership verification

First of all, get the Media Services account IDs (GUIDs) for your AMS accounts in this step. To find the Azure Media Services ID, go to the Azure portal and select your Media Services account. The Azure Media Services ID appears on the right of the DASHBOARD page. Let’s suppose you get the following account IDs for the AMS accounts (amsaccount1, amsaccount2):

Then, create a CNAME record that maps (accountId).(parent custom domain) to verifydns.(mediaservices-dns-zone). This is necessary to prove that the Azure Media Services account has ownership of the custom domain. Here are the CNAME records to create in this step:

## <MediaServicesAccountID>.<custom parent domain> IN CNAME verifydns.<mediaservices-dns-zone>

## Custom domain ownership verification for amsaccount1
8dcbe520-59c7-4591-8d98-1e765b7f3729.mydomain.com IN CNAME verifydns.mediaservices.windows.net

## Custom domain ownership verification for amsaccount2
5e0e6784-4ed0-40a0-8444-33da6d4f7171.mydomain.com IN CNAME verifydns.mediaservices.windows.net

Refer to the CustomHostNames section of the StreamingEndpoint document to learn more about the configuration in this step.
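
Before moving on, you may want to confirm that these verification records have propagated. A quick check with dig, using the record for amsaccount1 defined above, should return the verifydns host:

$ dig +short 8dcbe520-59c7-4591-8d98-1e765b7f3729.mydomain.com CNAME
verifydns.mediaservices.windows.net.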

(4) Add a custom host name to each Azure Media Services streaming endpoint

Once you have completed the “Custom domain name ownership verification” configuration, you then need to add the custom host name to each Azure Media Services streaming endpoint. Unfortunately, the new portal doesn’t include the capability to add a custom domain to a streaming endpoint. You can set it either by using the REST API directly or by using Azure Media Explorer. Here is how you add a custom host name to the streaming endpoint in Azure Media Explorer:

azure-media-explorer-custom-host-streaming-endpoint

Choose “Streaming endpoint” in the top menu, right-click your streaming endpoint, and select “Streaming endpoint information and settings”. In the streaming endpoint information form, type in your custom host name for the endpoint and click “Update settings and close”. That’s it.
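
If you take the REST API route mentioned above instead of Azure Media Explorer, the update is a MERGE request that sets the streaming endpoint’s CustomHostNames property. The following curl sketch is only a rough outline for the classic AMS REST API: the API host form, the x-ms-version value, the streaming endpoint ID, and the access token are placeholders/assumptions, so verify the exact request against the StreamingEndpoint REST documentation referenced in step (3):

# Rough sketch only: the API host form, API version, endpoint ID, and token below are assumptions/placeholders.
ACCESS_TOKEN="<your-access-token>"
API_HOST="https://<accountname>.restv2.<location>.media.azure.net"
STREAMING_ENDPOINT_ID="nb:oid:UUID:<streaming-endpoint-guid>"

curl -X MERGE "${API_HOST}/api/StreamingEndpoints('${STREAMING_ENDPOINT_ID}')" \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "x-ms-version: 2.19" \
  -H "DataServiceVersion: 3.0" \
  -H "MaxDataServiceVersion: 3.0" \
  -H "Content-Type: application/json" \
  -d '{"CustomHostNames": ["streaming.mydomain.com"]}'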

(5) Test video playback with your custom domain

Once all of the configurations above are complete (and the DNS changes have propagated), the custom domain name lookup will point to one of the AMS endpoints added to Traffic Manager, like this:

$ dig streaming.mydomain.com

;; ANSWER SECTION:
streaming.mydomain.com. 600 IN CNAME myamsstreaming.trafficmanager.net.
myamsstreaming.trafficmanager.net. 300 IN CNAME amsaccount1.streaming.mediaservices.windows.net.
amsaccount1.streaming.mediaservices.windows.net. 60 IN CNAME wamsorigin-origin-903f1f37105244fba2270ae7b64021bd.cloudapp.net.
wamsorigin-origin-903f1f37105244fba2270ae7b64021bd.cloudapp.net. 60 IN A 104.215.4.76

Make sure to check that the custom domain lookup points to the other endpoint when one of the endpoints is down.
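
One simple way to observe the failover is to poll the resolution while you stop one of the streaming endpoints, for example:

# Re-resolve the custom domain every 30 seconds while one endpoint is stopped.
watch -n 30 'dig +short streaming.mydomain.com'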

Finally, check that you can play back video with your custom domain.

$ curl http://amsaccount1.streaming.mediaservices.windows.net/ee2e5286-d3fa-40fc-a393-c3c2a3ca5a84/BigBuckBunny.ism/manifest
<?xml version="1.0" encoding="UTF-8"?><SmoothStreamingMedia MajorVersion="2" MinorVersion="2" Duration="84693333" TimeScale="10000000"><StreamIndex Chunks="2" Type="audio" Url="QualityLevels({bitrate})/Fragments(aac_eng_2_128={start time})" QualityLevels="1" Language="eng" Name="aac_eng_2_128"><QualityLevel AudioTag="255" Index="0" BitsPerSample="16" Bitrate="128000" FourCC="AACL" CodecPrivateData="1190" Channels="2" PacketSize="4" SamplingRate="48000" /><c t="0" d="60160000" /><c d="24533333" /></StreamIndex><StreamIndex Chunks="2" Type="video" Url="QualityLevels({bitrate})/Fragments(video={start time})" QualityLevels="5"><QualityLevel Index="0" Bitrate="2896000" FourCC="H264" MaxWidth="1280" MaxHeight="720" CodecPrivateData="000000016764001FACD9405005BB011000000300100000030300F18319600000000168EBECB22C" /><QualityLevel Index="1" Bitrate="1789000" FourCC="H264" MaxWidth="960" MaxHeight="540" CodecPrivateData="000000016764001FACD940F0117EF011000003000100000300300F1831960000000168EBECB22C" /><QualityLevel Index="2" Bitrate="946000" FourCC="H264" MaxWidth="640" MaxHeight="360" CodecPrivateData="000000016764001EACD940A02FF97011000003000100000300300F162D960000000168EBECB22C" /><QualityLevel Index="3" Bitrate="612000" FourCC="H264" MaxWidth="480" MaxHeight="270" CodecPrivateData="0000000167640015ACD941E08FEB011000000300100000030300F162D9600000000168EBECB22C" /><QualityLevel Index="4" Bitrate="324000" FourCC="H264" MaxWidth="320" MaxHeight="180" CodecPrivateData="000000016764000DACD941419F9F011000000300100000030300F14299600000000168EBECB22C" /><c t="0" d="60000000" /><c d="24583333" /></StreamIndex></SmoothStreamingMedia>

$ curl http://streaming.mydomain.com/ee2e5286-d3fa-40fc-a393-c3c2a3ca5a84/BigBuckBunny.ism/manifest
<?xml version="1.0" encoding="UTF-8"?><SmoothStreamingMedia MajorVersion="2" MinorVersion="2" Duration="84693333" TimeScale="10000000"><StreamIndex Chunks="2" Type="audio" Url="QualityLevels({bitrate})/Fragments(aac_eng_2_128={start time})" QualityLevels="1" Language="eng" Name="aac_eng_2_128"><QualityLevel AudioTag="255" Index="0" BitsPerSample="16" Bitrate="128000" FourCC="AACL" CodecPrivateData="1190" Channels="2" PacketSize="4" SamplingRate="48000" /><c t="0" d="60160000" /><c d="24533333" /></StreamIndex><StreamIndex Chunks="2" Type="video" Url="QualityLevels({bitrate})/Fragments(video={start time})" QualityLevels="5"><QualityLevel Index="0" Bitrate="2896000" FourCC="H264" MaxWidth="1280" MaxHeight="720" CodecPrivateData="000000016764001FACD9405005BB011000000300100000030300F18319600000000168EBECB22C" /><QualityLevel Index="1" Bitrate="1789000" FourCC="H264" MaxWidth="960" MaxHeight="540" CodecPrivateData="000000016764001FACD940F0117EF011000003000100000300300F1831960000000168EBECB22C" /><QualityLevel Index="2" Bitrate="946000" FourCC="H264" MaxWidth="640" MaxHeight="360" CodecPrivateData="000000016764001EACD940A02FF97011000003000100000300300F162D960000000168EBECB22C" /><QualityLevel Index="3" Bitrate="612000" FourCC="H264" MaxWidth="480" MaxHeight="270" CodecPrivateData="0000000167640015ACD941E08FEB011000000300100000030300F162D9600000000168EBECB22C" /><QualityLevel Index="4" Bitrate="324000" FourCC="H264" MaxWidth="320" MaxHeight="180" CodecPrivateData="000000016764000DACD941419F9F011000000300100000030300F14299600000000168EBECB22C" /><c t="0" d="60000000" /><c d="24583333" /></StreamIndex></SmoothStreamingMedia>

Useful Links

Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite destinations. Here is a list of logstash plugins for Microsoft Azure Services.

Plugin Name | Target Azure Services | Note
logstash-input-azureeventhub | Event Hubs | Logstash input plugin that reads data from the specified Azure Event Hubs
logstash-input-azureblob | Blob Storage | Logstash input plugin that reads and parses data from Azure Storage Blobs
logstash-input-azuretopic | Service Bus Topics | Logstash input plugin that reads messages from Azure Service Bus Topics
logstash-input-azuretopicthreadable | Service Bus Topics | Logstash input plugin that reads messages from Azure Service Bus Topics using multiple threads
logstash-output-applicationinsights | Application Insights | Logstash output plugin that stores events to Application Insights
logstash-input-azurewadtable | Table Storage | Logstash input plugin for Azure Diagnostics; reads Windows Azure Diagnostics data from the specified Azure Storage Tables and parses it for output
logstash-input-azurewadeventhub | Event Hubs | Logstash input plugin that reads Azure Diagnostics data from the specified Azure Event Hubs and parses it for output
logstash-output-documentdb | DocumentDB | Logstash output plugin that stores events to Azure DocumentDB
logstash-output-azuresearch | Azure Search | Logstash output plugin that stores events to Azure Search
logstash-output-azure_loganalytics | Log Analytics | Logstash output plugin that stores events to Azure Log Analytics
logstash-input-jdbc | SQL Database, Azure Database for MySQL/PostgreSQL | Logstash input plugin that ingests data from any database with a JDBC interface; supports most major RDBMSs such as MySQL, PostgreSQL, Oracle DB, Microsoft SQL Server, etc.

(as of Dec 29, 2016)
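
As a quick sketch, most of these plugins can be installed with the Logstash plugin manager (the command below assumes a Logstash 5.x layout; older 2.x releases use bin/plugin instead):

# Install an Azure plugin into an existing Logstash installation (run from the Logstash home directory).
bin/logstash-plugin install logstash-output-azuresearch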



logstash

In this article, I’d like to introduce a solution to collect events from various sources and send them to an HTTP Trigger function in Azure Functions using fluent-plugin-azurefunctions. Triggers in Azure Functions are event responses used to trigger your custom code. HTTP Trigger functions allow you to respond to HTTP events sent from fluentd and cook them into whatever you want!


fluent-plugin-azurefunctions

[note] Azure Functions is a (“serverless”) solution for easily running small pieces of code, or “functions,” in Azure. Fluentd is an open source data collector, which lets you unify data collection and consumption for better use and understanding of data. fluent-plugin-azurefunctions is a fluentd output plugin that sends collected events to Azure Functions.

Pre-requisites

Setup: Azure Functions (HTTP Trigger Function)

Create a function (HTTP Trigger). First, you need a function app that hosts the execution of your functions in Azure, if you don’t already have one. Once you have a function app, you can create a function. Here are the instructions:

A quick-start HTTP trigger function sample is included under examples/function-csharp in the GitHub repository. You simply need to save the code (run.csx) and the configuration files (function.json, project.json) in the same Azure Functions folder. To explain each of the files briefly: the function.json file defines the function bindings and other configuration settings; the runtime uses this file to determine the events to monitor and how to pass data into and return data from a function execution. The project.json file defines the packages that the application depends on. The run.csx file is the core application file where you write the code that processes your jobs (see the sample run.csx in the repository).

Setup: Fluentd

First of all, install Fluentd. The following shows how to install Fluentd using the Ruby gem package manager, but if you are not using RubyGems for the installation, please refer to this installation guide, where you can find many other ways to install Fluentd on many platforms.

# install fluentd
sudo gem install fluentd --no-ri --no-rdoc

# create fluent.conf
fluentd --setup <directory-path-to-fluent-conf>

Also, install fluent-plugin-azurefunctions so that the fluentd aggregator can send collected event data to Azure Functions.

sudo gem install fluent-plugin-azurefunctions

Next, configure fluent.conf, the fluentd configuration file, as follows. Please refer to this for fluent-plugin-azurefunctions configuration details. The following is a sample configuration where the plugin writes only the record fields specified by key_names in the incoming event stream out to Azure Functions:

# This is used by event forwarding and the fluent-cat command
<source>
    @type forward
    @id forward_input
</source>

# Send Data to Azure Functions
<match azurefunctions.**>
    @type azurefunctions
    endpoint  AZURE_FUNCTION_ENDPOINT   # ex. https://<accountname>.azurewebsites.net/api/<functionname>
    function_key AZURE_FUNCTION_KEY     # ex. aRVQ7Lj0vzDhY0JBYF8gpxYyEBxLwhO51JSC7X5dZFbTvROs7uNg==
    key_names key1,key2,key3
    add_time_field true
    time_field_name mytime
    time_format %s
    localtime true
    add_tag_field true
    tag_field_name mytag
</match>

[note] If key_names is not specified above, all incoming records are posted to Azure Functions (see also this).

Finally, run fluentd with the fluent.conf that you configured above.

fluentd -c ./fluent.conf -vv &

TEST

Let’s check that test events are sent to Azure Functions and trigger the HTTP function (we’ll use the sample function included in the GitHub repo this time). First, generate test events using fluent-cat like this:

echo ' { "key1":"value1", "key2":"value2", "key3":"value3"}' | fluent-cat azurefunctions.msg

As both add_time_field and add_tag_field are enabled, time and tag fields are added to the record, along with the fields selected by key_names, before it is posted to Azure Functions, so the actual HTTP POST request body would look like this:

{
    "payload": '{"key1":"value1", "key2":"value2", "key3":"value3", "mytime":"1480195100", "mytag":"azurefunctions.msg"}'
}
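
If you want to exercise the HTTP trigger function without going through fluentd, you can POST an equivalent body directly. This is just a sketch; <accountname>, <functionname>, and AZURE_FUNCTION_KEY are placeholders for your own values:

# POST the same payload shape directly to the function; the ?code= query parameter carries the function key.
curl -X POST "https://<accountname>.azurewebsites.net/api/<functionname>?code=AZURE_FUNCTION_KEY" \
  -H "Content-Type: application/json" \
  -d '{"payload": "{\"key1\":\"value1\", \"key2\":\"value2\", \"key3\":\"value3\", \"mytime\":\"1480195100\", \"mytag\":\"azurefunctions.msg\"}"}'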

If the events are sent to the function successfully, the HTTP trigger function handles them and the following logs can be seen in the Azure Functions log stream:

2016-11-26T21:18:55.200 Function started (Id=5392e7ae-3b8e-4f65-9fc1-6ae529cdfe3a)
2016-11-26T21:18:55.200 C# HTTP trigger function to process fluentd output request.
2016-11-26T21:18:55.200 key1=value1
2016-11-26T21:18:55.200 key2=value2
2016-11-26T21:18:55.200 key3=value3
2016-11-26T21:18:55.200 mytime=1480195100
2016-11-26T21:18:55.200 mytag=azurefunctions.msg
2016-11-26T21:18:55.200 Function completed (Success, Id=5392e7ae-3b8e-4f65-9fc1-6ae529cdfe3a)

Advanced Scenarios

1. Near Real-time processing

Function apps can output messages to various channels or data stores. For example, fluentd collects events generated by IoT devices and sends them to Azure Functions, and the HTTP trigger function transforms the events and processes the data, either storing it in persistent storage or passing it on to other services. Here are some of the options available at the time of writing:

2. Background job processing

If the jobs are expected to be large, long-running ones, it’s recommended that you refactor them into smaller function sets that work together and return fast responses. For example, you can pass the HTTP trigger payload into a queue to be processed by a queue trigger function. Or, if the payload is too big to pass into the queue, you can store it in Azure Blob storage first and then pass only a limited amount of the data into a queue, just to trigger background workers that process the actual work. These approaches allow you to do the actual work asynchronously and return an immediate response.

LINKS


Here is a list of embulk plugins that you can leverage to transfer your data between Microsoft Azure Services and various other databases/storages/cloud services.

Plugin Name | Target Azure Services | Note
embulk-output-azure_blob_storage | Blob Storage | Embulk output plugin that stores files onto Microsoft Azure Blob Storage
embulk-input-azure_blob_storage | Blob Storage | Embulk input plugin that reads files stored on Microsoft Azure Blob Storage
embulk-output-sqlserver | SQL Database, SQL DWH | Embulk output plugin that inserts or updates records in SQL Server-type services such as SQL Database and SQL DWH
embulk-input-sqlserver | SQL Database, SQL DWH | Embulk input plugin that selects records from SQL Server-type services such as SQL Database and SQL DWH
embulk-output-documentdb | DocumentDB | Embulk output plugin that dumps records to Azure DocumentDB
embulk-output-azuresearch | Azure Search | Embulk output plugin that dumps records to Azure Search

(as of Aug 30, 2016)

For embulk, check this site: https://github.com/embulk/embulk
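
As a quick sketch, each of these plugins can be installed with the embulk gem command (assuming Embulk itself is already installed and on your PATH):

# Install an Embulk plugin, for example the Blob Storage output plugin.
embulk gem install embulk-output-azure_blob_storage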



embulk-screenshot

I’d like to announce that fluent-plugin-documentdb finally supports Azure DocumentDB Partitioned collections for higher storage and throughput. If you’re not familiar with fluent-plugin-documentdb, read my previous article before moving on.

Partitioned collections are a kick-ass feature that I had wanted to support in fluent-plugin-documentdb ever since the feature went public (see the announcement). To the big fans of fluent-plugin-documentdb: sorry for keeping you waiting for such a long time 🙂 If I may make excuses, I haven’t had as much time for the project, and I had to write a Ruby client implementation of Partitioned collections myself, as there is no official DocumentDB Ruby SDK that supports it (as a result, I’ve created tiny Ruby DocumentDB client libraries that support the feature; check them out if you’re interested).


fluentd-azure-documentdb-collection

What are Partitioned collections?

According to the official documentation, Partitioned collections can span multiple partitions and support very large amounts of storage and throughput. You must specify a partition key for the collection. Partitioned collections can support larger data volumes and process more requests than single-partition collections. Partitioned collections support up to 250 GB of storage and 250,000 request units per second of provisioned throughput. [Updated Aug 21, 2016] (@arkramac pointed this out for me) Partitioned collections actually support unlimited storage and throughput; the 250 GB storage and 250k req/sec figures are soft caps, and you can increase these limits by contacting Azure support.

On the other hand, Single-partition collections have lower price options and the ability to query and perform transactions across all collection data. They have the scalability and storage limits of a single partition. You do not have to specify a partition key for these collections.

Creation of Partitioned collections

You can create Partitioned collections via the Azure portal, the REST API (version 2015-12-16 or later), and the client SDKs in .NET, Node.js, Java, and Python. In addition, you can let fluent-plugin-documentdb create Partitioned collections automatically by adding the following configuration options (partitioned_collection, partition_key, and offer_throughput, shown in the example below) on top of the ones for a single-partition collection in fluent.conf:

The plugin creates the partitioned collection as configured when it starts, if the collection does not exist at that time.

Configuration Example

Suppose that you want to read an Apache access log as a source for fluentd, and that you pick “host” as the partition key for the collection. You can configure the plugin like the following:

<source>
    @type tail                          # input plugin
    path /var/log/apache2/access.log   # monitoring file
    pos_file /tmp/fluentd_pos_file     # position file
    format apache                      # format
    tag documentdb.access              # tag
</source>

<match documentdb.*>
    @type documentdb
    docdb_endpoint https://yoichikademo.documents.azure.com:443/
    docdb_account_key Tl1xykQxnExUisJ+BXwbbaC8NtUqYVE9kUDXCNust5aYBduhui29Xtxz3DLP88PayjtgtnARc1PW+2wlA6jCJw==
    docdb_database mydb
    docdb_collection my-partitioned-collection
    auto_create_database true
    auto_create_collection true
    partitioned_collection true
    partition_key host
    offer_throughput 10100
    localtime true
    time_format %Y%m%d-%H:%M:%S
    add_time_field true
    time_field_name time
    add_tag_field true
    tag_field_name tag
</match>

Basically, that’s all the additional configuration needed for Partitioned collections. Please refer to my previous article for the rest of the setup and for how to run the plugin.
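
If you are already running an older version of the plugin, make sure to pull in a release that includes Partitioned collection support. A minimal sketch with RubyGems:

# Install (or update to) the latest released fluent-plugin-documentdb.
sudo gem install fluent-plugin-documentdb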

Happy log collections with fluent-plugin-documentdb!!

LINKS

In this article, I’d like to introduce a solution to collect logs and store them in Azure DocumentDB using fluentd and its plugin, fluent-plugin-documentdb.

Azure DocumentDB is a managed NoSQL database service provided by Microsoft Azure. It’s schemaless, natively supports JSON, is very easy to use, very fast, highly reliable, and enables rapid deployment; you name it. Fluentd is an open source data collector, which lets you unify data collection and consumption for better use and understanding of data. fluent-plugin-documentdb is a fluentd output plugin that stores event collections in Azure DocumentDB.

This article shows how to

UPDATED:

Here is a list of fluentd plugins for Microsoft Azure Services.

Plugin Name | Target Azure Services | Note
fluent-plugin-azurestorage | Blob Storage | Azure Storage output plugin that buffers logs in local files and uploads them to Azure Storage periodically
fluent-plugin-azureeventhubs | Event Hubs | Azure Event Hubs buffered output plugin for Fluentd. Currently it supports only HTTPS (not AMQP)
fluent-plugin-azuretables | Azure Tables | Fluentd plugin to add event records into Azure Table Storage
fluent-plugin-azuresearch | Azure Search | Fluentd plugin to add event records into Azure Search
fluent-plugin-documentdb | DocumentDB | Fluentd plugin to add event records into Azure DocumentDB
fluent-plugin-azurefunctions | Azure Functions | Azure Functions (HTTP Trigger) output plugin for Fluentd. The plugin aggregates semi-structured data in real-time and writes the buffered data via HTTPS requests to an HTTP Trigger Function.
fluent-plugin-azure-loganalytics | Log Analytics | Azure Log Analytics output plugin for Fluentd. The plugin aggregates semi-structured data in real-time and writes the buffered data via HTTPS requests to Azure Log Analytics.

(as of Nov 23, 2016)



fluentd

I’ve just noticed that some of the patents whose filing process I was involved in while I was at Yahoo! JAPAN have been granted. Well, what I did for them was just a tiny part, but a patent is a patent; it’s good news anyway.