Azure Data Lake Storage Gen2 preview – More features, more performance, better availability

Since we announced the limited public preview of Azure Data Lake Storage (ADLS) Gen2 in June, the response has been resounding. Customers participating in the ADLS Gen2 preview have benefited directly from the scale, performance, security, manageability, and cost-effectiveness inherent in the ADLS Gen2 offering. Today, we are very pleased to announce significant updates to the preview that will give customers an even better experience.

Today’s announcements include additional features that preview customers have been asking for:

  • Enterprise-class security features integrated into Azure Databricks and Azure HDInsight (available shortly)
  • Azure Storage Explorer support to view and manage data in ADLS Gen2 accounts, including data exploration and access control management
  • Support for connecting external tables in SQL Data Warehouse, including when Storage Firewalls are active on the account
  • Power BI and SQL Data Warehouse supporting the Common Data Model for entities stored in ADLS Gen2
  • Storage Firewall and Virtual Network rules integration for all analytics services
  • Encryption of data at rest using either Microsoft-managed or customer-supplied keys, as well as encryption in transit via TLS 1.2
  • Ability to mount an ADLS Gen2 filesystem into the Databricks File System (DBFS)

Additionally, as of today, the ADLS Gen2 public preview is fully open to all Azure customers in all public and sovereign Azure regions. Customers may take advantage of the ABFS driver in HDInsight, Databricks, or SQL Data Warehouse with the Hierarchical Namespace enabled on all new accounts, without any requirement to sign up or be whitelisted.
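As a rough illustration of how the ABFS driver addresses data, the sketch below builds an ABFS URI for a filesystem in an ADLS Gen2 account. The helper name `abfs_uri` and the sample account and filesystem names are hypothetical. The default endpoint suffix is the public-cloud one, and sovereign regions use different suffixes, so treat it as an assumption to adjust.

```python
def abfs_uri(filesystem: str, account: str, path: str = "",
             secure: bool = True,
             endpoint_suffix: str = "dfs.core.windows.net") -> str:
    """Build an ABFS URI of the form scheme://filesystem@account.suffix/path.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    # "abfss" selects the TLS-protected variant of the driver scheme.
    scheme = "abfss" if secure else "abfs"
    # Normalise the path so the URI never contains a double slash.
    clean_path = path.lstrip("/")
    return f"{scheme}://{filesystem}@{account}.{endpoint_suffix}/{clean_path}"


# Hypothetical account ("contosolake") and filesystem ("raw").
uri = abfs_uri("raw", "contosolake", "/sales/2018")
print(uri)
```

A URI of this shape is what an analytics engine would typically be pointed at when reading from or mounting the filesystem.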

Providing enterprise-class security for your Data Lake 

As customers create vast enterprise-wide repositories of data for analytics, not only do they need a storage solution that is capable of scaling and performing to meet their increasing demands, they MUST be able to secure this data. There are multiple aspects to securing the rich assets in an enterprise data lake:

  • Apply permissions so that only authorized users or groups may have access to read or write the data
  • Encrypt the data at rest (using customer-supplied or system-managed keys) and in transit so that intercepted or stolen data cannot be read
  • Provide transport-layer protections so that, even if user credentials are compromised, the network layer still provides protection

In the same manner as ADLS Gen1, ADLS Gen2 now provides both Role Based Access Control (RBAC) and POSIX-compliant Access Control Lists (ACLs) that restrict access to only authorized users, groups, or service principals in a flexible, fine-grained, and manageable manner. Authentication is via Azure Active Directory OAuth 2.0 bearer tokens, which allow flexible authentication schemes, including federation with AAD Connect and multi-factor authentication for stronger protection than passwords alone. More significantly, these authentication schemes are now integrated into the main analytics services, including Azure Databricks, HDInsight, and SQL Data Warehouse, as well as management tools such as Azure Storage Explorer. Once a user is authenticated, permissions are applied at the finest granularity to ensure the right level of authorization for protecting an enterprise’s big data assets.
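To make the ACL model a little more concrete, here is a minimal sketch that parses POSIX-style short-form ACL text such as `user::rwx,group::r-x,other::---`. The `AclEntry` class and `parse_acl` function are hypothetical names for illustration, not part of any Azure SDK, and the sketch deliberately ignores the mask and the full POSIX evaluation order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AclEntry:
    scope: str      # "access" or "default"
    kind: str       # "user", "group", "mask", or "other"
    qualifier: str  # object ID, or "" for the owning user/group
    perms: str      # three characters, e.g. "r-x"

    def allows(self, op: str) -> bool:
        """Return True if this single entry grants op ("r", "w", or "x")."""
        if op not in ("r", "w", "x"):
            raise ValueError(f"unknown operation: {op!r}")
        return op in self.perms


def parse_acl(text: str) -> List[AclEntry]:
    """Parse comma-separated entries of the form [default:]type:id:perms."""
    entries = []
    for raw in text.split(","):
        parts = raw.strip().split(":")
        scope = "access"
        if parts[0] == "default":
            scope, parts = "default", parts[1:]
        if len(parts) != 3:
            raise ValueError(f"malformed ACL entry: {raw!r}")
        kind, qualifier, perms = parts
        if kind not in ("user", "group", "mask", "other"):
            raise ValueError(f"unknown entry type: {kind!r}")
        if len(perms) != 3 or any(c not in "rwx-" for c in perms):
            raise ValueError(f"bad permission string: {perms!r}")
        entries.append(AclEntry(scope, kind, qualifier, perms))
    return entries


# "analyst-object-id" is a placeholder for an Azure AD object ID.
acl = parse_acl("user::rwx,user:analyst-object-id:r-x,group::r--,other::---")
named = acl[1]
print(named.qualifier, named.allows("r"), named.allows("w"))
```

In this example the named user can read and traverse but not write, which is the kind of fine-grained grant the ACLs are meant to express.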

End-to-end encryption of data and transport-layer protections complete the security shield for an enterprise data lake. Because ADLS Gen2 is built on top of the Azure Blobs service, the existing capabilities already trusted by Blobs customers automatically apply to ADLS Gen2 data. The same set of analytics engines and tools can take advantage of these additional layers of protection, resulting in complete end-to-end protection of your analytics pipelines.
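On the client side, encryption in transit can be reinforced by refusing anything older than TLS 1.2. The following is a minimal sketch using Python's standard `ssl` module. The function name `strict_tls_context` is hypothetical, and how such a context is passed to a particular storage client is left as an assumption.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Create a client TLS context that rejects protocols older than TLS 1.2."""
    # The default context already verifies certificates and host names.
    ctx = ssl.create_default_context()
    # Refuse to negotiate anything below TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = strict_tls_context()
print(ctx.minimum_version)
```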

Your Data Lake is powered by performance

As we’ve discussed many times, the performance of the storage layer has an outsized impact on the total cost of ownership (TCO) for your complete analytics pipeline. This is because every percentage point of improvement in storage performance results in the same percentage reduction in the requirement for the very expensive compute layer. Because the disaggregated storage model allows us to scale compute and storage independently, that percentage reduction in compute requirement results in almost the same reduction in TCO, since compute typically accounts for 90 percent of it.

So, when I say that ADLS Gen2 provides performance improvements of 10 to 50 percent over existing storage solutions, depending on the nature of the workload, this equates to VERY significant reductions in the monthly analytics spend. It also has the added benefit of providing your insights sooner!
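The arithmetic behind that claim can be sketched in a few lines. The function below takes the premise stated above at face value (a storage performance gain translates one-for-one into a compute saving, and compute is about 90 percent of TCO). The function name and the assumption that storage cost stays flat are illustrative simplifications.

```python
def tco_reduction(storage_gain: float, compute_share: float = 0.90) -> float:
    """Estimate the fractional TCO reduction for a given storage performance gain.

    Assumes the compute requirement falls by the same fraction as the
    storage gain, and that the non-compute share of TCO is unchanged.
    """
    compute_saving = storage_gain
    return compute_share * compute_saving


# The two ends of the 10 to 50 percent range quoted above.
low = tco_reduction(0.10)
high = tco_reduction(0.50)
print(f"TCO reduction: {low:.0%} to {high:.0%}")
```

Under these assumptions, the quoted range of performance improvements corresponds to roughly a 9 to 45 percent reduction in TCO.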

ADLS Gen2 is priced the same as general-purpose object (Blob) storage, yet all of the above performance and security features are now included at that price. This makes ADLS Gen2 the ideal environment in which to create or migrate your enterprise data lake, as you get all of this dedicated functionality at commodity object storage prices.

Data Lakes everywhere

Because ADLS Gen2 is a feature of the Azure Blobs service, it has to be available in every Azure region. This is significant for enterprises that want to run their data lakes close to their employees, who can then benefit without the latency of data travelling halfway around the world. Many countries, and therefore enterprises, stipulate sovereignty requirements for where data may reside. Azure already has the largest regional footprint of any public cloud provider, and now that ADLS Gen2 is available in all of those regions, customers can build their data lakes wherever they desire.

Start using Azure Data Lake Storage Gen2

To find out more you can:

Source: Azure Blog Feed
