I’d like to announce that fluent-plugin-documentdb finally supports Azure DocumentDB Partitioned collections for higher storage and throughput. If you’re not familiar with fluent-plugin-documentdb, read my previous article before moving on.
Partitioned collections is a kick-ass feature that I have wanted to support in fluent-plugin-documentdb ever since it was made public (see the announcement). To the big fans of fluent-plugin-documentdb, sorry for keeping you waiting for such a long time 🙂 If I may make excuses, I haven’t had much time for the project, and I had to write the Ruby client implementation for Partitioned collections myself because there is no official DocumentDB Ruby SDK that supports them. (As a result, I’ve created tiny Ruby DocumentDB client libraries that support the feature. Check them out if you’re interested.)
According to the official documentation, Partitioned collections can span multiple partitions and support very large amounts of storage and throughput, so they can hold larger data volumes and process more requests than Single-partition collections. You must specify a partition key for the collection.
[Updated Aug 21, 2016] Partitioned collections support unlimited storage and throughput (thanks to @arkramac for pointing this out). The 250 GB of storage and 250,000 request units per second of provisioned throughput that I originally quoted here are soft caps. You can increase these limits by contacting Azure support.
On the other hand, Single-partition collections have lower price options and the ability to query and perform transactions across all collection data. They have the scalability and storage limits of a single partition. You do not have to specify a partition key for these collections.
You can create Partitioned collections via the Azure portal, the REST API (version 2015-12-16 or later), and the client SDKs for .NET, Node.js, Java, and Python. In addition, you can let fluent-plugin-documentdb create Partitioned collections automatically by adding a few configuration options on top of those used for a Single-partition collection in fluentd.conf.
When the plugin starts, it creates the partitioned collection as configured if the collection does not already exist.
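As a rough sketch, the additional options look something like the following. The option names (`partitioned_collection`, `partition_key`, `offer_throughput`) and the example throughput value are written from memory of the plugin README, so treat them as assumptions and check the README for your plugin version.

```
# Added on top of the existing single-partition options.
# Option names and values below are assumptions; verify against the README.

# Turn on partitioned collection support
partitioned_collection true

# Field in each record that is used as the partition key (placeholder)
partition_key PARTITION_KEY

# Provisioned throughput for the collection (example value only)
offer_throughput 10100
```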
Suppose you want to read the Apache access log as a source for fluentd and you pick “host” as the partition key for the collection. In that case, the partition key setting in the plugin configuration is simply “host”.
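Here is a minimal sketch of such a fluentd.conf, not a definitive configuration. The log and position file paths, the tag, the database and collection names, and the endpoint and key placeholders are all hypothetical. The `docdb_*`, `auto_create_*`, `partitioned_collection`, `partition_key`, and `offer_throughput` option names are assumed from the plugin README, so verify them there before use.

```
<source>
  @type tail
  # Hypothetical paths; adjust to your environment
  path /var/log/apache2/access.log
  pos_file /var/log/td-agent/apache2.access_log.pos
  # The apache2 parser emits a "host" field among others
  format apache2
  tag documentdb.access
</source>

<match documentdb.*>
  @type documentdb
  # Placeholders for your account endpoint and key
  docdb_endpoint DOCUMENTDB_ACCOUNT_ENDPOINT
  docdb_account_key DOCUMENTDB_ACCOUNT_KEY
  # Hypothetical database and collection names
  docdb_database mydb
  docdb_collection my_collection
  auto_create_database true
  auto_create_collection true
  # Partitioned collection settings (names assumed from the README)
  partitioned_collection true
  partition_key host
  offer_throughput 10100
</match>
```

With this in place, each parsed access log record is written to the collection with its “host” value acting as the partition key.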
Basically, that’s all the additional configuration needed for Partitioned collections. Please refer to my previous article for the rest of the setup and for how to run the plugin.
Happy log collections with fluent-plugin-documentdb!!