Azure Event Hubs integration with Apache Spark now generally available
The Event Hubs team is happy to announce the general availability of our integration with Apache Spark. Now, Event Hubs users can use Spark to easily build end-to-end streaming applications. The Event Hubs connector for Spark supports Spark Core, Spark Streaming, and Structured Streaming for Spark 2.1, Spark 2.2, and Spark 2.3.
For users new to Spark, Spark Streaming and Structured Streaming are scalable, fault-tolerant stream processing engines. These processing engines allow users to process huge amounts of data using complex algorithms expressed with high-level functions like map, reduce, join, and window. This data can then be pushed to file systems, databases, or even back to Event Hubs.
Setting up a stream is easy. Check it out:
import org.apache.spark.eventhubs._
import org.apache.spark.sql.SparkSession

val eventHubsConf = EventHubsConf("{EVENT HUB CONNECTION STRING FROM AZURE PORTAL}")
  .setStartingPosition(EventPosition.fromEndOfStream)

// Create a stream that reads data from the specified Event Hub.
val spark = SparkSession.builder.appName("SimpleStream").getOrCreate()
val eventHubStream = spark.readStream
  .format("eventhubs")
  .options(eventHubsConf.toMap)
  .load()
It's as easy as that! Once your events are streaming into Spark, you can process them as you wish. Spark provides a variety of processing options, such as graph analysis and machine learning. Our documentation has more details on linking our connector with your project!
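As a rough illustration of what "process them as you wish" can look like, here is a minimal sketch that decodes each event payload to text and prints it to the console. It assumes the connector exposes the payload as a binary `body` column and the enqueue time as an `enqueuedTime` column. The `DecodeStream` object and `decodeBodies` helper are hypothetical names used only for this example, and the console sink is just a convenient choice for trying things out.

```scala
import org.apache.spark.eventhubs._
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DecodeStream {
  // Hypothetical helper: cast the binary `body` column to a string and keep
  // the enqueue timestamp. Column names are assumed from the connector schema.
  def decodeBodies(events: DataFrame): DataFrame =
    events.select(col("body").cast("string").as("body"), col("enqueuedTime"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("DecodeStream").getOrCreate()

    // Same configuration as the setup snippet: start reading from the end of the stream.
    val eventHubsConf = EventHubsConf("{EVENT HUB CONNECTION STRING FROM AZURE PORTAL}")
      .setStartingPosition(EventPosition.fromEndOfStream)

    val eventHubStream = spark.readStream
      .format("eventhubs")
      .options(eventHubsConf.toMap)
      .load()

    // Print each decoded event to the console as it arrives.
    val query = decodeBodies(eventHubStream)
      .writeStream
      .format("console")
      .outputMode("append")
      .start()

    // Block until the streaming query is stopped or fails.
    query.awaitTermination()
  }
}
```

Keeping the transformation in its own function means it can be tried against a small static DataFrame before pointing it at a live Event Hub.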
The project is open source and available on GitHub. All details and documentation can be found there. Any and all community involvement is welcome, so come say hello! If you like what you see, please star the repo to show your support!
Finally, if you have any questions, comments, or feedback, please join our Gitter chat. Contributors are in the channel to chat and answer questions as they come up. Enjoy the connector!
Source: Azure Blog Feed