Data Streaming

The Djuno Data Stream offers a seamless cloud-based platform for users to stream, process, and integrate real-time data with ease.

Written by Djuno Support
Updated this week

Overview

The Djuno Data Stream offers a seamless cloud-based platform for users to stream, process, and integrate real-time data effortlessly. This setup is part of a comprehensive cloud management platform where users can create and manage various data streaming services. The Create a Service button is the gateway to setting up a new data stream, allowing users to quickly deploy, configure, and scale their data pipelines according to their needs.

How to create a data stream

When creating a new data stream service in Djuno Cloud, users are guided through a series of configuration steps to customize their data streaming setup:

1. Select the streaming service, such as Kafka, Kafka MirrorMaker, or Kafka Connect, each designed for specific data streaming and integration needs.

2. Pick a version, such as Kafka 3.8, to ensure compatibility with your system.

3. Choose a service plan, such as Business or Enterprise, based on requirements like RAM, vCores, storage, and node count.

4. Select a hosting region, such as Beauharnois, Frankfurt, Gravelines, Strasbourg, London, or Warsaw, to ensure optimal data processing performance and compliance with data residency requirements.

5. Select the node template, which includes choices for vCores, memory, storage, and estimated costs per hour or month.

6. Adjust the number of nodes to scale the data stream for higher availability or processing capacity.

Once all selections are made, click Order to deploy the data stream service. This flexibility lets the setup meet both technical and budgetary needs.

After creating a Kafka service, users will find additional options in the Kafka cluster overview, such as General Information, Edit, and Delete functionalities.

The Cluster Overview provides key details about the Kafka instance, including the cluster name, unique ID, and the current status, which shows whether the service is in the process of being created or is active. The service operates on a specified version, such as Kafka 3.8, and follows the chosen service plan (e.g., Business). Users can view and adjust configurations for CPU, RAM, and storage allocation, with options for upgrading these resources based on their requirements. The instance is hosted in a specific datacenter, such as BHS, and uses first-generation remote storage.

Login information provides the URI and host details for connecting to the Kafka cluster, along with SSL requirements for secure communication. Users can manage authorized IP addresses to control access to the service. The Kafka REST API allows users to interact with the Kafka cluster via HTTP requests, making integration with external systems easier. The Schema Registry section provides a REST service for storing and managing Apache Kafka schemas, using the Karapace project.
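As a quick illustration of the Kafka REST API mentioned above, the sketch below lists the topics on a cluster over HTTP. It assumes the REST endpoint and credentials shown in your Login information; the URL and credentials here are placeholders, and the endpoint follows the Confluent-compatible v2 API that Karapace-based REST services expose.

```python
# Minimal sketch: listing topics through the Kafka REST API from Login information.
# The URL and credentials are placeholders; substitute the values shown for your service.
import requests

REST_URL = "https://your-kafka-rest-endpoint:443"    # hypothetical endpoint
AUTH = ("service-username", "service-password")      # hypothetical credentials

# /topics returns the list of topic names on the cluster.
response = requests.get(f"{REST_URL}/topics", auth=AUTH, timeout=10)
response.raise_for_status()
print(response.json())   # e.g. ["orders", "clickstream"]
```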

In the configuration section, users can set maintenance schedules, manage network settings, and configure authorized IP addresses for security. The Kafka cluster is hosted on a public network, and backup and recovery options are available to ensure data protection.

Users

This tab allows administrators to manage data stream users by adding new users, and viewing details such as usernames, creation date, and account status. Each user’s entry includes their status (e.g., READY), providing an overview of their current access. The dropdown menu for each user offers options to delete the user, reset their password, view the certificate, or view the access key for Kafka, giving administrators quick controls for managing user access and ensuring the security of the data stream service.
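The certificate and access key available from a user's dropdown can be used by client applications to authenticate over SSL. Below is a minimal sketch using the confluent-kafka Python client; the host, port, and file paths are placeholders, and the configuration keys are standard librdkafka SSL settings rather than anything Djuno-specific.

```python
# Minimal sketch: connecting with a user's certificate and access key
# downloaded from the Users tab. Host, port, and file paths are placeholders.
from confluent_kafka import Producer

conf = {
    "bootstrap.servers": "your-cluster-host:9092",        # from Login information (placeholder)
    "security.protocol": "SSL",
    "ssl.ca.location": "ca.pem",                          # cluster CA certificate
    "ssl.certificate.location": "user-certificate.pem",   # "View certificate" for the user
    "ssl.key.location": "user-access-key.pem",            # "View access key" for the user
}

producer = Producer(conf)
producer.produce("example-topic", value=b"hello from an authenticated user")
producer.flush()
```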

Create user:

ACL

The ACL (Access Control List) section allows administrators to manage user permissions within the data stream service. By adding a new entry, administrators can specify a user, the corresponding Kafka topic, and the permission level (e.g., admin, read, write). This setup ensures that only authorized users are granted the appropriate permissions to access and interact with specific Kafka topics, enhancing security and enabling controlled access to the data stream. The ACL feature helps safeguard data and ensures that only the right users have the correct level of access to critical streaming data.


Create ACL:

Click the "Add a New Entry" button to open the "Add an Access" modal, where you can assign specific permissions to a user for Kafka topics. In this modal, you can specify the Username, Topic, and select the appropriate Permission level (e.g., read, write, or admin). After filling in the details, click "Add an access" to save the entry, or click "Cancel" to exit without making changes. This provides administrators with the ability to manage user access to Kafka topics efficiently.

Authorized IPs

In the Authorized IP tab, you can manage the IP addresses allowed to access the data stream. The table displays each IP address or mask along with a corresponding description. You can use the Edit IP Address option to modify the existing IP address or the Delete IP Address option to remove it from the allowed list.

Create IP:

Click Add an IP or IP block (CIDR) to add a new IP address to the allowed list.

Edit IP:

Click the Edit IP Address option in the dropdown to edit its description.

Logs

In the Logs tab, you can monitor and manage your data stream service by viewing the latest events (logs) in near real time. The retention period for these logs varies depending on your service plan, allowing you to track and analyze the stream's performance and activity over time.

Metrics

To help you track and manage your data stream service, you can view its main metrics and statistics below. The retention period for these metrics depends on your service plan, allowing you to monitor the performance and health of your data streams over time.

Topics

In Kafka, a topic is a named stream or category to which producers publish data (messages or records). It serves as the central abstraction in Kafka’s publish-subscribe model, enabling efficient message distribution. Data is written to topics by producers and can be consumed by consumers interested in that specific topic.

A topic is divided into partitions, which allows Kafka to distribute the data across multiple servers, providing parallel processing and increasing throughput. Each partition can be replicated across different nodes to ensure fault tolerance and high availability. The number of partitions and replicas can be configured according to the system's scalability and reliability needs.

Kafka topics are configured with retention policies, meaning data within a topic can either be kept for a specified period or until a certain size limit is reached. When the retention criteria are met, Kafka can automatically delete or compact old data, depending on the configured deletion policy.

In essence, topics provide an organized and scalable way of managing data streams, ensuring that Kafka can handle large volumes of real-time data with high availability and fault tolerance.
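To illustrate how partitions relate to message keys, the sketch below produces keyed records: Kafka hashes each key to choose a partition, so all records for the same key stay in the same partition and keep their order. Topic and connection values are placeholders; the SSL settings from the Users example above are assumed.

```python
# Illustration: messages with the same key always land in the same partition,
# which preserves per-key ordering. Connection settings are placeholders.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "your-cluster-host:9092",   # placeholder
    # plus the SSL settings shown in the Users example above
})

def report(err, msg):
    if err is None:
        print(f"key={msg.key()} -> partition {msg.partition()}")

# All records for "sensor-42" hash to the same partition; other keys may differ.
for i in range(5):
    producer.produce("telemetry", key="sensor-42", value=f"reading {i}", on_delivery=report)
    producer.produce("telemetry", key=f"sensor-{i}", value=f"reading {i}", on_delivery=report)

producer.flush()
```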

Create topic:

When creating a Kafka topic, you will need to provide a Topic name to identify the stream of data. There is also an option to access Advanced configuration for further customization of the topic. You can set the Replication to determine the number of replicas for the topic, which helps ensure fault tolerance and availability, such as setting it to 3 replicas. The Partitions setting defines how the data is distributed and processed in parallel, with the option to specify the number of partitions, like 1 partition.

Additionally, the Retention time (hours) allows you to specify how long the data will be retained in the topic before being deleted, with a value of -1 indicating unlimited retention. The Minimum in-sync replica setting defines the minimum number of replicas that must remain in sync for the topic to stay available, for example, 2 replicas. The Retention size (bytes) defines the maximum size of the data to be kept before older records are deleted, with -1 meaning no size limit. The Deletion Policy option lets you choose how old data is removed; in Kafka this is typically delete (remove expired records) or compact (keep only the latest record per key), with delete as the default.

Once you have configured these settings, you can click Create topic to finalize the creation of the topic, or click Cancel if you wish to discard the changes.
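For reference, the same settings map directly onto standard Kafka topic configuration, so topic creation can also be scripted. The sketch below uses the confluent-kafka AdminClient; connection details are placeholders, and the Djuno console achieves the same result through the form described above.

```python
# Sketch: the topic settings from the form expressed as Kafka topic configuration.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({
    "bootstrap.servers": "your-cluster-host:9092",   # placeholder
    # plus the SSL settings shown in the Users example above
})

topic = NewTopic(
    "example-topic",
    num_partitions=1,          # Partitions
    replication_factor=3,      # Replication
    config={
        "retention.ms": "-1",          # Retention time: -1 = unlimited
        "retention.bytes": "-1",       # Retention size: -1 = no size limit
        "min.insync.replicas": "2",    # Minimum in-sync replicas
        "cleanup.policy": "delete",    # Deletion policy
    },
)

for name, future in admin.create_topics([topic]).items():
    future.result()   # raises if creation failed
    print(f"Created topic {name}")
```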

Service integration

Service Integration in the context of data streaming refers to the process of linking various software services or applications to work together efficiently. It enables seamless communication, data sharing, and coordinated actions between different systems, tools, or platforms.

In the Service Integration tab, you can manage the connections between your data streaming solution and other services. The "Add an integration" option allows you to connect different services, making the integration process straightforward. Additionally, there are options available to Delete or manage existing service integrations, ensuring flexibility and control over the connections within the data stream ecosystem.

Create service integration:

In the "Add Service Integration" modal, you can start the process of adding a new integration by selecting the type of integration you wish to configure. You will need to choose both a source service and a target service from the available options. Additionally, you can specify the index prefix and set the maximum number of days for the index, starting from zero. These fields allow you to customize the integration to match your specific data streaming and retention requirements.

Advanced configuration

In the "Advanced Configuration" tab for data stream, you can modify several key In the "Advanced Configuration" tab for data stream, you can customize various key settings related to authentication and security methods. You can modify keys such as kafka_authentication_methods.certificate, where you can select between True or False to enable or disable certificate-based authentication. Similarly, the kafka_authentication_methods.sasl key lets you toggle True or False to enable or disable SASL (Simple Authentication and Security Layer) authentication.

The values for these keys are displayed in the configuration, allowing you to control the security mechanisms applied to your data stream. After selecting your desired settings, clicking the Update advanced configuration button will save and apply the changes, ensuring your service is configured according to your preferences.
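As an example of what enabling kafka_authentication_methods.sasl makes possible, clients can authenticate with a username and password over SASL_SSL instead of a client certificate. The mechanism, port, and credentials below are placeholders (SCRAM-SHA-256 is an assumption); check your service's connection details for the actual values.

```python
# Sketch: connecting with SASL username/password once SASL authentication is enabled.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "your-cluster-host:9093",   # SASL port (placeholder)
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-256",               # assumption: mechanism offered by the service
    "sasl.username": "service-username",             # placeholder
    "sasl.password": "service-password",             # placeholder
    "ssl.ca.location": "ca.pem",
})

producer.produce("example-topic", value=b"authenticated via SASL")
producer.flush()
```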

Connectors

The Connectors tab for Kafka Connect allows you to manage the connectors that link Kafka with external systems. In this tab, you can view all the connectors associated with your service, such as the Couchbase Source connector, which is used to stream data from a Couchbase database into Kafka.

Each connector displays key information, including its status, which indicates whether the connector is functioning correctly. The Tasks section shows the number of tasks the connector is handling (e.g., 0/0 means no tasks are active).

If a connector encounters issues, you have the option to Restart it directly from this tab. To add a new connector, simply click the Add a connector button, where you can provide the necessary configurations for connecting Kafka to other systems.

This tab ensures that you can easily manage and monitor your Kafka connectors, ensuring smooth data streaming between systems.
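If you prefer to automate these checks, the standard Kafka Connect REST API offers the same operations as this tab, assuming your Kafka Connect service exposes its REST endpoint. The URL, credentials, and connector name below are placeholders.

```python
# Sketch: standard Kafka Connect REST API calls matching the Connectors tab.
import requests

CONNECT_URL = "https://your-kafka-connect-endpoint:443"   # placeholder
AUTH = ("service-username", "service-password")           # placeholder

# List all connectors attached to the service.
print(requests.get(f"{CONNECT_URL}/connectors", auth=AUTH, timeout=10).json())

# Check a connector's status and how many tasks it is running.
status = requests.get(f"{CONNECT_URL}/connectors/couchbase-source/status",
                      auth=AUTH, timeout=10).json()
print(status["connector"]["state"], len(status["tasks"]))

# Restart a connector that has run into problems.
requests.post(f"{CONNECT_URL}/connectors/couchbase-source/restart", auth=AUTH, timeout=10)
```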

Create connector:

When you click Add a connector in the Connectors tab, a modal will appear allowing you to set up a new connector. In this modal, you can select a connector type from a list of available connectors, each designed to integrate Kafka with different external systems. Once you've chosen the appropriate connector, you can proceed with configuring it to establish the connection.

After selecting the desired connector, you can click Add a connector to finalize the creation and set it up in your Kafka Connect service.

After adding a connector, you will be taken to a configuration screen with several tabs, each containing specific settings for the connector. These tabs allow you to customize and manage the connector according to its type, such as the Couchbase Source connector.

The Connection tab lets you configure the connection settings, where you can specify details like the connection URL, credentials, and other necessary parameters for linking the external system (in this case, Couchbase) to Kafka.

In the Security tab, you can manage security configurations, including authentication methods, SSL/TLS settings, and access control, ensuring secure communication between Kafka and the external system.

The Logging tab enables you to adjust logging settings, such as log levels, formats, and storage locations. This helps in monitoring and troubleshooting the connector’s operation.

The Source Behavior tab gives you control over how the connector behaves when fetching data. You can configure the polling frequency, offset management, and error handling for data ingestion.

In the Database Change Protocol tab, you can specify how database changes should be tracked, including options for Change Data Capture (CDC) or similar mechanisms, ensuring that changes in the source system are captured and streamed.

The Common tab includes universal settings that apply to most connectors, such as timeouts, retries, and buffer configurations.

In the Transforms tab, you can define any transformations to apply to the data as it flows from the source to Kafka, allowing for modifications to the data structure or content before it reaches Kafka.

The Predicates tab allows you to set filtering conditions that control which records are processed based on specific criteria, such as values or other conditions in the data.

The Error Handling tab offers options for managing errors during the streaming process. You can define retry strategies, dead-letter queues, and error logging to ensure data integrity and resilience.

The Topic Creation tab lets you configure how new Kafka topics are created when data arrives, including naming conventions and additional topic settings.

In the Exactly Once Support tab, you can configure the connector to ensure exactly-once semantics, which prevents data duplication and ensures reliable data delivery.

The Offsets Topic tab allows you to define the Kafka topic where offsets will be stored, which track the progress of data consumption by the connector.

The Extra tab provides additional settings for the connector, allowing you to make any further customizations based on your specific needs.

Each of these tabs offers granular control over the connector's behavior, allowing you to configure it to work seamlessly with your data streaming requirements.
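To show how the settings from these tabs come together, here is a rough sketch of a connector configuration submitted through the standard Kafka Connect REST API. The connector class and the couchbase.* keys are illustrative only; the tabs above expose the authoritative fields for your connector type, and the URL and credentials are placeholders.

```python
# Rough sketch: a Couchbase Source style configuration sent to Kafka Connect.
# Keys marked "illustrative" are assumptions; consult the connector's own docs.
import requests

CONNECT_URL = "https://your-kafka-connect-endpoint:443"   # placeholder
AUTH = ("service-username", "service-password")           # placeholder

config = {
    "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",  # illustrative
    "tasks.max": "1",                          # Common tab
    "couchbase.seed.nodes": "db.example.com",  # Connection tab (illustrative key)
    "couchbase.bucket": "travel-sample",       # Connection tab (illustrative key)
    "couchbase.username": "db-user",           # Security tab (illustrative key)
    "couchbase.password": "db-password",       # Security tab (illustrative key)
    "errors.tolerance": "all",                 # Error Handling tab
    "errors.retry.timeout": "30000",           # Error Handling tab
}

# Create (or update) the connector with this configuration.
resp = requests.put(f"{CONNECT_URL}/connectors/couchbase-source/config",
                    json=config, auth=AUTH, timeout=10)
resp.raise_for_status()
print(resp.json())
```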

Replication flows

Replication flows allow you to synchronize data between services by setting up custom rules and intervals. Each flow defines a source service, a target service, and a replication policy. You can also configure options like sync intervals, group offsets, and heartbeat monitoring.

To get started, click “Add a replication”, fill in the required details, and create your replication flow.
