The Learn Druid repository contains all manner of resources to help you learn and apply Apache Druid®.
It contains:
- Jupyter Notebooks that guide you through query, ingestion, and data management with Apache Druid.
- A Docker Compose file to get you up and running with a learning lab.
Suggestions or comments? Join the discussions. Found a problem or want to request a notebook? Raise an issue. Want to contribute? Raise a PR.
Contributions to this community resource are welcome! Contribute your own notebook on a topic that's not listed here, and check out the issue list, where you'll find bugs and enhancement requests.
Come meet the friendly Apache Druid community if you have any questions about the functionality you see here.
Imply's courses on Apache Druid at https://learn.imply.io have additional commentary for these notebooks, and you can earn a certificate of completion.
If your team is just getting to know Druid, Imply also offers bookable team tech talks on the basics of Apache Druid. And if you want to check whether Apache Druid is the right fit, or would like hints on the functionality you should look at, book one of Imply's getting started with Druid meetings.
To use the "Learn Druid" Docker Compose, you need:
- Git or GitHub Desktop
- Docker Desktop with Docker Compose
- A machine with at least 6 GiB of RAM.
Of course, more power is better. The notebooks have been tested with the following resources available to Docker: 6 CPUs, 8 GB of RAM, and 1 GB of swap.
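To see how much memory Docker can actually use before you launch the lab, you can read the total that docker info reports. The sketch below is a hypothetical helper, not part of the repository; it assumes a Bash-compatible shell and uses the 6 GiB minimum above as its default threshold.

```shell
# Hypothetical helper (not part of the Learn Druid repository).
# Compares the memory available to Docker with a minimum in GiB (default 6).
# Returns 0 if there is enough, 1 if not, 2 if Docker can't be queried.
check_docker_memory() {
  local min_gib="${1:-6}"
  local min_bytes=$(( min_gib * 1024 * 1024 * 1024 ))
  local mem_bytes
  # The Docker daemon reports MemTotal in bytes.
  mem_bytes=$(docker info --format '{{.MemTotal}}' 2>/dev/null) || return 2
  # Treat empty or non-numeric output as "can't be queried".
  case $mem_bytes in ''|*[!0-9]*) return 2 ;; esac
  if [ "$mem_bytes" -lt "$min_bytes" ]; then
    echo "Docker has $(( mem_bytes / 1024 / 1024 )) MiB; at least ${min_gib} GiB is recommended." >&2
    return 1
  fi
  echo "Docker has $(( mem_bytes / 1024 / 1024 )) MiB available."
}
# Usage:
#   check_docker_memory      # checks against 6 GiB
#   check_docker_memory 8    # checks against 8 GiB, closer to the tested setup
```

On Docker Desktop, the memory limit is set in the Resources settings, so a failing check usually means raising that limit rather than adding RAM to the machine.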
To get started quickly:
- Clone the repository:
  git clone https://github.com/implydata/learn-druid
- Navigate to the directory:
  cd learn-druid
- Launch the environment:
  docker compose --profile druid-jupyter up -d
  The first time you launch the environment, it can take a while to start all the services.
- Navigate to Jupyter Lab in your browser at http://localhost:8889/lab.
  From there you can read the introduction or use Jupyter Lab to navigate the notebooks folder.
- When you're finished, stop all services:
  docker compose --profile druid-jupyter down
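Because the first launch can take a while, you may want to wait until the services respond before opening Jupyter Lab. This is a hypothetical sketch, not part of the repository. It assumes curl is installed, the default ports used in this guide (8888 for the Druid web console, 8889 for Jupyter Lab), and that Druid's /status/health endpoint answers on the web console port.

```shell
# Hypothetical helper (not part of the repository): poll a URL until it
# returns a successful HTTP status, or give up after a number of attempts.
# Usage: wait_for_url URL [attempts, default 60] [delay in seconds, default 5]
wait_for_url() {
  local url="$1" attempts="${2:-60}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: fail on HTTP errors, --max-time: cap each request at 5 s
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "Ready: $url"
      return 0
    fi
    i=$(( i + 1 ))
    # Don't sleep after the final attempt.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "Timed out after $attempts attempts: $url" >&2
  return 1
}
# Example, after "docker compose --profile druid-jupyter up -d":
#   wait_for_url http://localhost:8888/status/health && wait_for_url http://localhost:8889/lab
```

With the defaults, the helper waits up to about five minutes per URL, which leaves room for the slower first start.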
Once you have cloned the repository, get the latest version as follows:
git restore .
git pull
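Because git restore . discards uncommitted changes to tracked files, including any edits you made to the notebooks, you may prefer to set those edits aside before pulling. This hypothetical sketch uses only standard git commands (git status --porcelain, git stash push, git pull); the function name is made up for illustration.

```shell
# Hypothetical helper (not part of the repository): update the clone while
# keeping local edits in the stash instead of discarding them.
update_learn_druid() {
  # Any output from --porcelain means tracked files have uncommitted changes.
  if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
    git stash push -m "learn-druid local edits" || return 1
    echo "Saved local edits; restore them later with: git stash pop"
  fi
  git pull
}
```

Running git stash pop after the update reapplies your edits, but it can report conflicts if the same notebook changed upstream; in that case, git restore . followed by git pull is the simpler path.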
While using the notebooks, monitor ingestion tasks, compare query results, and more in the web console directly at http://localhost:8888.
Individual notebooks may state a specific compose profile that you need to use.
Specify the profile after the --profile parameter to the docker compose command. For example, to start with the all-services profile, use this command:
docker compose --profile all-services up -d
To stop all services:
docker compose --profile all-services down
To stop all services without keeping any data:
docker compose --profile all-services down -v
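If you switch between profiles often, small shell wrappers can keep the up, down, and down -v commands consistent. This is a hypothetical convenience sketch, not part of the repository: the ld_* function names and the LEARN_DRUID_PROFILE variable are made up for illustration, and the default profile is druid-jupyter from the quick start.

```shell
# Hypothetical wrappers (not part of the repository). The profile comes from
# the first argument, else $LEARN_DRUID_PROFILE, else druid-jupyter.
ld_up()    { docker compose --profile "${1:-${LEARN_DRUID_PROFILE:-druid-jupyter}}" up -d; }
ld_down()  { docker compose --profile "${1:-${LEARN_DRUID_PROFILE:-druid-jupyter}}" down; }
# Also removes the volumes, so any data is lost.
ld_reset() { docker compose --profile "${1:-${LEARN_DRUID_PROFILE:-druid-jupyter}}" down -v; }
# Usage:
#   ld_up all-services      # same as: docker compose --profile all-services up -d
#   ld_reset all-services   # same as: docker compose --profile all-services down -v
```

Stopping with the same profile you started with matters: docker compose down only acts on services from the profiles you pass.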
Run the notebooks against an existing Apache Druid database by setting the DRUID_HOST environment variable and using the jupyter profile.
DRUID_HOST=[host address] docker compose --profile jupyter up -d
When you have Druid running on the local machine, use host.docker.internal as the host address.
DRUID_HOST=host.docker.internal docker compose --profile jupyter up -d
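Before starting only the jupyter profile, it can help to confirm that your existing Druid responds. In this hypothetical sketch (not part of the repository), you pass the host address as the containers see it, plus an optional health-check URL as your machine sees it, because host.docker.internal may not resolve outside Docker. It assumes curl is available and that Druid's /status/health endpoint is served on port 8888, as in this guide's default setup; change the URL if your cluster uses a different port.

```shell
# Hypothetical helper (not part of the repository).
# Usage: jupyter_for_druid HOST [HEALTH_URL]
#   HOST        address the Jupyter container uses to reach Druid (becomes DRUID_HOST)
#   HEALTH_URL  how this machine reaches Druid; defaults to http://HOST:8888/status/health
jupyter_for_druid() {
  local druid_host="$1"
  if [ -z "$druid_host" ]; then
    echo "usage: jupyter_for_druid HOST [HEALTH_URL]" >&2
    return 2
  fi
  local health_url="${2:-http://$druid_host:8888/status/health}"
  # -s: silent, -f: fail on HTTP errors, --max-time: give up after 5 s
  if ! curl -sf --max-time 5 -o /dev/null "$health_url"; then
    echo "Druid did not respond at $health_url" >&2
    return 1
  fi
  DRUID_HOST="$druid_host" docker compose --profile jupyter up -d
}
# Druid on this machine: containers use host.docker.internal, the check uses localhost.
#   jupyter_for_druid host.docker.internal http://localhost:8888/status/health
```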
The Learn Druid environment includes the following services:
- Jupyter Lab: An interactive environment to run Jupyter Notebooks. The Jupyter image used in the environment contains Python and all the supporting libraries you need to run the notebooks.
- Apache Kafka: A streaming service that acts as a data source for Druid.
- Imply Data Generator: A tool to generate sample data for Druid. It can produce either batch or streaming data.
- Apache Druid: By default, the currently released version of Apache Druid.
This repository is not affiliated with, endorsed by, or otherwise associated with the Apache Software Foundation (ASF) or any of its projects. Apache, Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of ASF in the USA and other countries.