In this meetup, we’re diving deeper into the newest real-time analytics database on the market: Druid.
We’ll discuss how to tune Druid clusters at high scale (several million events per second)
and how to keep queries fast under this level of traffic.
You’ll hear from our expert speakers from ironSource, Lyft, and Imply about how each company deploys Druid and designs its architecture around this cutting-edge technology.
Elad Eldor is a Data Infrastructure Team Leader at ironSource, working mainly with Druid, Kafka, Presto and Spark on AWS. He has 12 years of experience as a Java software engineer and 5 years as an SRE in big data Linux-based clusters. Before joining ironSource, Elad was an SRE at Verint (currently Cognyte), where he developed big data applications (using Spark, Hadoop and Kafka) and handled the reliability and scalability of Spark and Kafka clusters in production. His main interests are JVM tuning, performance tuning, and cost reduction of big data clusters (Kafka, Druid, Spark, Presto).
Jonathan Kaplan is a Data Infrastructure Engineer at ironSource, specializing in performance tuning and deployments of Big Data technologies such as Druid, Redshift, and EMR (Trino and Spark) on AWS. Prior to joining ironSource, Jonathan was a DBA Team Lead at the Israeli Military Intelligence, and a Data Engineering Team Lead as part of the IDF Covid-19 Task Force with the Israeli Ministry of Health.
Rachel Pedreschi is the VP of Community & Developer Relations at Imply. A "Data Geek-ette", Rachel is no stranger to the world of high-performance databases and data warehouses. She is a Vertica, Informix and Redbrick certified DBA on top of her work with Cassandra and has 20+ years of business intelligence and ETL tool experience. Rachel has an MBA from San Francisco State University and a BA in Mathematics from the University of California, Santa Cruz.
Decision making is changing: Apache Druid is a new type of database for creating the next generation of analytics applications that maximize flexible exploration over fresh, fast-arriving data. In this talk, Rachel Pedreschi introduces these new "immediate intelligence" applications, tells the story of Druid's emergence, and describes how data pipelines built with Druid differ from those you may already be familiar with.
In this talk, we'll learn more about how Lyft builds data pipelines using Apache Druid, which is useful for several use cases including metrics tracking, model forecasting, and internal tools. We'll also talk about the challenges we faced while setting up our real-time ingestion pipeline into Druid using Apache Flink and Kafka, and how we went about solving them.
Arrival instructions:
To get to our offices, please go through the building's main lobby (floor 0) and pass through the turnstiles to get to the elevators. Head to Elevator Group A and go to floor 12.
