Discussion:
ConsumerCoordinator Offset commit failed on partition xxx at offset nnn
Ramin Farajollah (BLOOMBERG/ 731 LEX)
2018-04-23 21:24:11 UTC
Hi,

We use a custom Kafka spout in our Apache Storm topology.
When a machine in the Kafka cluster is bounced, we see the following error message in the logs:

ERROR ConsumerCoordinator [thread-iii] - [Consumer clientId=consumer-1, groupId=xxxxx] Offset commit failed on partition yyyy at offset nnn: The request timed out.

I understand Kafka's enable.auto.commit is not allowed in Storm 1.1.1.

Please help me understand the consequences of this timeout:
- Is a tuple lost or reprocessed?
- Is the tuple processed out of order (at a later time)?

Thank you
Ramin
Stig Rohde Døssing
2018-04-26 10:12:26 UTC
This is handled by the KafkaConsumer.

As far as I can tell, the consumer will just retry. See
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L748
and
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L610.
The request timeout error is retriable, so the consumer will try again. The
spout calls commit synchronously, so until the commit succeeds (or fails in
some non-retriable way), the spout will be blocked in the commitSync call.
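To make the blocking behavior concrete, here is a minimal sketch of a manual, synchronous commit against the plain KafkaConsumer API. The broker address, group id, and topic name are placeholders, and it assumes a reasonably recent kafka-clients version (older clients take poll(long) instead of a Duration); your custom spout's code will of course differ.

import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class SyncCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "example-group");              // placeholder group id
        props.put("enable.auto.commit", "false");            // offsets are committed manually, as in the spout
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // placeholder topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                // ... hand the record off for processing ...
                Map<TopicPartition, OffsetAndMetadata> toCommit = Collections.singletonMap(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
                // commitSync blocks the calling thread and retries retriable errors
                // (such as a request timeout) internally until the commit succeeds
                // or fails in a non-retriable way.
                consumer.commitSync(toCommit);
            }
        }
    }
}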

The consequence of the timeout should just be a delay while the consumer is
retrying. If the commit ends up succeeding, there will be no other
consequence. If the commit fails entirely (i.e. the commitSync method on
the KafkaConsumer throws a CommitFailedException), the worker running the
spout will crash, and the tuples will be reprocessed starting at the last
committed offset. If this happens you will be able to see it in the log,
since the worker will die.
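For completeness, a hedged sketch of the failure path: commitOrDie is a hypothetical helper (not part of any real spout) illustrating how a CommitFailedException escaping commitSync would bring the worker down, after which tuples are replayed from the last committed offset. Exactly what gets replayed still depends on how the custom spout tracks and commits offsets.

import java.util.Collections;
import java.util.Map;

import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitFailureSketch {

    // Hypothetical helper showing the failure path described above.
    static void commitOrDie(KafkaConsumer<String, String> consumer,
                            TopicPartition partition, long nextOffset) {
        Map<TopicPartition, OffsetAndMetadata> toCommit =
                Collections.singletonMap(partition, new OffsetAndMetadata(nextOffset));
        try {
            consumer.commitSync(toCommit);
        } catch (CommitFailedException e) {
            // Non-retriable failure: letting the exception escape the spout causes
            // the Storm worker to crash. On restart the consumer resumes from the
            // last successfully committed offset, so tuples after it are reprocessed.
            throw e;
        }
        // If commitSync returned normally, the offset is committed; a request
        // timeout along the way only showed up as a delay (and the ERROR log line).
    }
}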

Ramin Farajollah (BLOOMBERG/ 731 LEX)
2018-04-26 14:42:06 UTC
Stig,

Thanks for your clear explanation. It is quite helpful.

Regards,
Ramin