Apache Flume ships with a rich set of Sources, Channels, and Sinks for data ingestion and transfer. This post first gives a brief description of each component, which is helpful for anyone looking to understand what Flume offers, and then walks through configuration examples. Let's summarize the key points about these components:

Sinks:

  1. HDFS Sink: Writes Events to the Hadoop Distributed File System (HDFS).
  2. Logger Sink: Writes Events using the logger at INFO level.
  3. Avro Sink: Writes Events to Avro Source/Servers.
  4. Thrift Sink: Writes Flume Events as Thrift messages to a Thrift Source/Server.
  5. IRC Sink: Writes Events to configured Internet Relay Chat (IRC) destinations.
  6. File Roll Sink: Writes Events to the local file system and rolls files periodically.
  7. Null Sink: Discards all Events received from the Channel.
  8. HBase Sink: Writes Events to Apache HBase.
  9. Morphline Solr Sink: Writes Events into Apache Solr, applying transformations using morphlines.

Sources:

  1. Avro Source: Receives Events from external Avro Clients.
  2. Thrift Source: Receives Events from external Thrift Clients.
  3. Exec Source: Executes a Unix command (like tail) at startup and creates Events from the command output.
  4. JMS Source: Consumes/reads messages from Java Message Service (JMS) Queue/Topic, creating Events from the messages.
  5. Spooling Directory Source: Ingests data from files placed in a directory, creating Events from the data.
  6. Netcat Source: Listens on a port and creates an Event from each line of data received.
  7. Sequence Generator Source: Continuously generates a stream of Events using an incrementing counter, mainly for testing purposes.
  8. Syslog Source: Reads syslog data and generates Events from it.
  9. HTTP Source: Receives Events via HTTP GET/POST requests.
  10. Scribe Source: Ingests data from Scribe, Facebook's log aggregation system.

Channels:

  1. Memory Channel: Stores all Events in memory. Events are lost if the Flume agent process goes down. The maximum capacity is configurable.
  2. File Channel: Stores all Events on the file system. Designed for durability and reliability.
  3. JDBC Channel: Stores Events in an embedded database. Currently only the Derby database is supported.

This breakdown of Flume's components provides a clear understanding of how to ingest data from various sources, deliver it to destinations through different sinks, and buffer the data flow using different channel implementations. Depending on your use case and requirements, you can choose the appropriate combination of these components to build a robust data ingestion pipeline; a complete agent configuration that wires the three component types together is sketched below.
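
To make the wiring concrete, here is a minimal, self-contained agent configuration that connects a Netcat Source to a Logger Sink through a Memory Channel. It is a sketch rather than a production setup: the agent and component names (agent1, netcat-src, mem-ch, log-sink) and the port are illustrative.

```properties
# Name the components of this agent (names are illustrative)
agent1.sources = netcat-src
agent1.channels = mem-ch
agent1.sinks = log-sink

# Netcat source: listens on a TCP port and turns each received line into an Event
agent1.sources.netcat-src.type = netcat
agent1.sources.netcat-src.bind = 0.0.0.0
agent1.sources.netcat-src.port = 44444

# Memory channel: buffers Events in memory between the source and the sink
agent1.channels.mem-ch.type = memory
agent1.channels.mem-ch.capacity = 10000

# Logger sink: writes each Event to the log at INFO level
agent1.sinks.log-sink.type = logger

# Wire the source and the sink to the channel
agent1.sources.netcat-src.channels = mem-ch
agent1.sinks.log-sink.channel = mem-ch
```

Note that a source is bound to one or more channels through the plural channels property, while a sink drains exactly one channel through the singular channel property. The agent can then be started with the standard launcher, for example flume-ng agent --conf conf --conf-file agent1.conf --name agent1 (the file name is illustrative).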

Here's a closer look at the most commonly used sources, sinks, and channels in Apache Flume, along with a configuration example for each:

Sources:

  1. Avro Source:
    • Description: Receives Avro events over the network.
    • Example:

```properties
avro-source.sources = avro-source-1
avro-source.sources.avro-source-1.type = avro
avro-source.sources.avro-source-1.bind = 0.0.0.0
avro-source.sources.avro-source-1.port = 41414
```

  2. Thrift Source:
    • Description: Listens for Thrift-encoded events over the network.
    • Example:

```properties
thrift-source.sources = thrift-source-1
thrift-source.sources.thrift-source-1.type = thrift
thrift-source.sources.thrift-source-1.bind = 0.0.0.0
thrift-source.sources.thrift-source-1.port = 41415
```

  3. Exec Source:
    • Description: Executes a command to generate events.
    • Example:

```properties
exec-source.sources = exec-source-1
exec-source.sources.exec-source-1.type = exec
exec-source.sources.exec-source-1.command = tail -F /var/log/syslog
```

  4. JMS Source:
    • Description: Consumes messages from a JMS destination (queue or topic).
    • Example:

```properties
jms-source.sources = jms-source-1
jms-source.sources.jms-source-1.type = jms
jms-source.sources.jms-source-1.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
jms-source.sources.jms-source-1.providerURL = tcp://localhost:61616
jms-source.sources.jms-source-1.destinationName = queueName
jms-source.sources.jms-source-1.destinationType = QUEUE
```

  5. Spooling Directory Source:
    • Description: Monitors a directory for files and ingests the files' content.
    • Example:

```properties
spooling-source.sources = spooling-source-1
spooling-source.sources.spooling-source-1.type = spooldir
spooling-source.sources.spooling-source-1.spoolDir = /path/to/spool/directory
```
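
The examples above cover the most common sources; the others from the first list (Netcat, Syslog, HTTP, Sequence Generator, Scribe) follow the same configuration pattern. As one more illustration, here is a minimal sketch for the HTTP Source; the agent/component names and the port are hypothetical:

```properties
# HTTP source: accepts Events sent via HTTP POST/GET (names and port are illustrative)
http-source.sources = http-source-1
http-source.sources.http-source-1.type = http
http-source.sources.http-source-1.bind = 0.0.0.0
http-source.sources.http-source-1.port = 5140
```

By default the HTTP Source uses a JSON handler, so each request body is expected to contain a JSON array of events; a custom handler class can be plugged in via the handler property.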

Sinks:

  1. HDFS Sink:
    • Description: Writes events to HDFS (a variant of this example with explicit roll settings appears after this list).
    • Example:

```properties
hdfs-sink.sinks = hdfs-sink-1
hdfs-sink.sinks.hdfs-sink-1.type = hdfs
hdfs-sink.sinks.hdfs-sink-1.hdfs.path = hdfs://localhost:9000/flume/events
```

  2. Logger Sink:
    • Description: Logs events at INFO level using the configured logger; useful for debugging.
    • Example:

```properties
logger-sink.sinks = logger-sink-1
logger-sink.sinks.logger-sink-1.type = logger
```

  3. Avro Sink:
    • Description: Writes events to Avro endpoints.
    • Example:

```properties
avro-sink.sinks = avro-sink-1
avro-sink.sinks.avro-sink-1.type = avro
avro-sink.sinks.avro-sink-1.hostname = localhost
avro-sink.sinks.avro-sink-1.port = 41414
```

  4. Thrift Sink:
    • Description: Writes events as Thrift messages to a Thrift server.
    • Example:

```properties
thrift-sink.sinks = thrift-sink-1
thrift-sink.sinks.thrift-sink-1.type = thrift
thrift-sink.sinks.thrift-sink-1.hostname = localhost
thrift-sink.sinks.thrift-sink-1.port = 41415
```

  5. File Roll Sink:
    • Description: Writes events to local files and rolls them at a configurable time interval.
    • Example:

```properties
file-roll-sink.sinks = file-roll-sink-1
file-roll-sink.sinks.file-roll-sink-1.type = file_roll
file-roll-sink.sinks.file-roll-sink-1.sink.directory = /path/to/output
```
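
Returning to the HDFS Sink example above: by default it writes Hadoop SequenceFiles and rolls files very frequently, so in practice the file format and roll behaviour are usually set explicitly. A minimal sketch extending hdfs-sink-1; the property names are standard HDFS Sink settings, while the values are illustrative rather than recommendations:

```properties
# Write plain text instead of SequenceFiles
hdfs-sink.sinks.hdfs-sink-1.hdfs.fileType = DataStream
hdfs-sink.sinks.hdfs-sink-1.hdfs.writeFormat = Text
# Roll a new file every 10 minutes or at roughly 128 MB, whichever comes first,
# and disable event-count-based rolling
hdfs-sink.sinks.hdfs-sink-1.hdfs.rollInterval = 600
hdfs-sink.sinks.hdfs-sink-1.hdfs.rollSize = 134217728
hdfs-sink.sinks.hdfs-sink-1.hdfs.rollCount = 0
```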

Channels:

  1. Memory Channel:
    • Description: Stores events in memory; fast but not durable (a note on sizing follows after this list).
    • Example:

```properties
memory-channel.channels = memory-channel-1
memory-channel.channels.memory-channel-1.type = memory
memory-channel.channels.memory-channel-1.capacity = 10000
```

  2. File Channel:
    • Description: Stores events in local files for durability.
    • Example:

```properties
file-channel.channels = file-channel-1
file-channel.channels.file-channel-1.type = file
file-channel.channels.file-channel-1.checkpointDir = /path/to/checkpoint
file-channel.channels.file-channel-1.dataDirs = /path/to/data
```

  3. JDBC Channel:
    • Description: Stores events in an embedded relational database; currently only Derby is supported, so a basic setup needs little more than the channel type.
    • Example:

```properties
jdbc-channel.channels = jdbc-channel-1
# Uses an embedded Derby database; additional connection properties are optional
jdbc-channel.channels.jdbc-channel-1.type = jdbc
```
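
As noted under the Memory Channel above, its size is governed by two related settings: capacity (the maximum number of events held in the channel) and transactionCapacity (the maximum number of events taken from a source or given to a sink in one transaction), where transactionCapacity must not exceed capacity. A short sketch with illustrative values:

```properties
# Extends the memory-channel-1 example above (values are illustrative)
memory-channel.channels.memory-channel-1.capacity = 10000
memory-channel.channels.memory-channel-1.transactionCapacity = 1000
```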

These examples cover a wide range of the sources, sinks, and channels available in Apache Flume, along with their basic configuration. Use them as a reference to build data ingestion pipelines based on your specific requirements. Remember to replace the placeholders with actual values in your Flume configuration files, and to bind every source and sink to a channel, as shown in the complete agent example earlier.

 
