Discussion:
Fields grouping consistency for separate streams
b***@gmail.com
2018-11-23 08:06:05 UTC
Hi.

I have a question regarding fields grouping. The documentation says:
```
Fields grouping: The stream is partitioned by the fields specified in the
grouping. For example, if the stream is grouped by the "user-id" field,
tuples with the same "user-id" will always go to the same task, but tuples
with different "user-id"'s may go to different tasks.
```
This is clear for a single point-to-point connection, i.e. [A]
--stream--> [B]. But suppose a bolt consumes 2 streams from 2
different bolts and uses the same key for fields grouping on both
(the streams also have the same "format"). Something like this:

```
TopologyBuilder topology = new TopologyBuilder();

// ----

topology.setBolt("alpha", new AlphaBolt(), alphaScaleFactor);

topology.setBolt("beta", new BetaBolt(), betaScaleFactor);

Fields grouping = new Fields("key");
topology.setBolt("gamma", new GammaBolt(), gammaScaleFactor)
    .fieldsGrouping("alpha", "alpha-stream", grouping)
    .fieldsGrouping("beta", "beta-stream", grouping);

// ----
```

Is there any guarantee that both streams will be partitioned in the same
way, so that tuples with the same value in the "key" field in both streams
will be directed to the same instance/task of the "gamma" bolt?
Stig Rohde Døssing
2018-11-23 11:24:58 UTC
See
https://github.com/apache/storm/blob/21bb1388414d373572779289edc785c7e5aa52aa/storm-client/src/jvm/org/apache/storm/daemon/GrouperFactory.java#L174
and
https://github.com/apache/storm/blob/21bb1388414d373572779289edc785c7e5aa52aa/storm-client/src/jvm/org/apache/storm/utils/TupleUtils.java#L38

The "key" field value is hashed using Arrays.deepHashCode, and that hash
is used to pick the target task. So if you have two tuples t1 and t2 with
hashCode(t1["key"]) == hashCode(t2["key"]), then t1 and t2 will go to the
same task in "gamma", regardless of which of the two streams they arrived on.
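
To make the reasoning concrete, here is a minimal standalone sketch of the idea. It is not Storm's actual code. It assumes that the task is chosen by reducing the Arrays.deepHashCode of the grouping-field values modulo the number of target tasks. The class FieldsGroupingSketch, the method chooseTaskIndex, the key "user-42" and the task count 4 are all made up for illustration. The point is that the source stream is not an input to the calculation.

```java
import java.util.Arrays;
import java.util.List;

public class FieldsGroupingSketch {

    // Hypothetical helper (not a Storm API): maps the values of the grouping
    // fields to an index into the target bolt's task list. Only the field
    // values and the number of target tasks are inputs.
    public static int chooseTaskIndex(List<Object> groupingValues, int numTasks) {
        int hash = Arrays.deepHashCode(groupingValues.toArray());
        // floorMod keeps the index non-negative even for negative hashes.
        return Math.floorMod(hash, numTasks);
    }

    public static void main(String[] args) {
        int gammaTasks = 4; // assumed number of "gamma" tasks

        // Same "key" value, emitted on two different streams.
        List<Object> keyFromAlphaStream = Arrays.<Object>asList("user-42");
        List<Object> keyFromBetaStream = Arrays.<Object>asList("user-42");

        int taskForAlpha = chooseTaskIndex(keyFromAlphaStream, gammaTasks);
        int taskForBeta = chooseTaskIndex(keyFromBetaStream, gammaTasks);

        // Equal key values hash equally, so both tuples map to the same index.
        System.out.println(taskForAlpha == taskForBeta); // prints "true"
    }
}
```

One practical consequence under this assumption: both upstream bolts need to emit the key as values that hash identically (same type and same content), otherwise the same logical key may land on different "gamma" tasks.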
b***@gmail.com
2018-11-27 07:49:55 UTC
Thank you.

It would be nice if the documentation covered such scenarios more clearly. :)