Examples of tutorial.storm.trident.operations.Print

Package tutorial.storm.trident.operations

Examples of tutorial.storm.trident.operations.Print

tutorial.storm.trident.operations.Print
@author Enno Shioji (enno.shioji@peerindex.com)

        topology
                .newStream("tweets", spout)
                .each(new Fields("str"), new ParseTweet(), new Fields("text", "content", "user"))
                .each(new Fields("text", "content"), new TweetIdExtractor(), new Fields("tweetId"))
                .project(new Fields("tweetId", "text"))
                .each(new Fields("tweetId", "text"), new Print())
                .partitionPersist(new ElasticSearchStateFactory(), new Fields("tweetId", "text"), new ElasticSearchStateUpdater());


        /**
         * Now we need a DRPC stream to query the state where the tweets are stored.
         * To do that, as shown below, we need an implementation of {@link QueryFunction} to

View Full Code Here

        // Now, let's calculate the average age of actors seen
        topology
                .newDRPCStream("age_stats", drpc)
                .stateQuery(countState, new TupleCollectionGet(), new Fields("actor", "location"))
                .stateQuery(nameToAge, new Fields("actor"), new MapGet(), new Fields("age"))
                .each(new Fields("actor","location","age"), new Print())
                .groupBy(new Fields("location"))
                .chainedAgg()
                .aggregate(new Count(), new Fields("count"))
                .aggregate(new Fields("age"), new Sum(), new Fields("sum"))
                .chainEnd()

View Full Code Here

                .chainedAgg()
                .aggregate(new Count(), new Fields("count"))
                .aggregate(new Fields("age"), new Sum(), new Fields("age_sum"))
                .chainEnd()
                .each(new Fields("age_sum", "count"), new DivideAsDouble(), new Fields("mean_age"))
                .each(new Fields("city", "mean_age"), new Print())
        ;


        // What if we want to persist results of an aggregation, but want to further process these
        // results? You can use "newValuesStream" for that
        topology
                .newStream("further",spout)
                .groupBy(new Fields("city"))
                .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
                .newValuesStream()
                .each(new Fields("city", "count"), new Print());


        return topology.build();
    }

View Full Code Here

        // The "each" primitive allows us to apply either filters or functions to the stream
        // We always have to select the input fields.
        topology
                .newStream("filter", spout)
                .each(new Fields("actor"), new RegexFilter("pere"))
                .each(new Fields("text", "actor"), new Print());


        // Functions describe their output fields, which are always appended to the input fields.
        // As you see, Each operations can be chained.
        topology
                .newStream("function", spout)
                .each(new Fields("text"), new ToUpperCase(), new Fields("uppercased_text"))
                .each(new Fields("text", "uppercased_text"), new Print());


        // You can prune unnecessary fields using "project"
        topology
                .newStream("projection", spout)
                .each(new Fields("text"), new ToUpperCase(), new Fields("uppercased_text"))
                .project(new Fields("uppercased_text"))
                .each(new Fields("uppercased_text"), new Print());


        // Stream can be parallelized with "parallelismHint"
        // Parallelism hint is applied downwards until a partitioning operation (we will see this later).
        // This topology creates 5 spouts and 5 bolts:
        // Let's debug that with TridentOperationContext.partitionIndex !
        topology
                .newStream("parallel", spout)
                .each(new Fields("actor"), new RegexFilter("pere"))
                .parallelismHint(5)
                .each(new Fields("text", "actor"), new Print());


        // You can perform aggregations by grouping the stream and then applying an aggregation
        // Note how each actor appears more than once. We are aggregating inside small batches (aka micro batches)
        // This is useful for pre-processing before storing the result to databases
        topology
                .newStream("aggregation", spout)
                .groupBy(new Fields("actor"))
                .aggregate(new Count(),new Fields("count"))
                .each(new Fields("actor", "count"),new Print())
        ;


        // In order ot aggregate across batches, we need persistentAggregate.
        // This example is incrementing a count in the DB, using the result of these micro batch aggregations
        // (here we are simply using a hash map for the "database")

View Full Code Here

        topology
                .newStream("aggregation", spout)
                .partitionBy(new Fields("location"))
                .partitionAggregate(new Fields("location"), new StringCounter(), new Fields("count_map"))
                .each(new Fields("count_map"), new HasSpain())
                .each(new Fields("count_map"), new Print("AFTER-HAS-SPAIN"))
                .parallelismHint(3)
                .shuffle()
                .each(new Fields("count_map"), new TimesTen(), new Fields("count_map_times_ten"))
                .each(new Fields("count_map_times_ten"), new Print("AFTER-TIMES-TEN"))
                .parallelismHint(3)
        ;


        // Without the "shuffle" partitioning, only a single partition will be executing the "TimesTen" function,
        // i.e. the workload will not be distributed. With the "shuffle" partitioning, the skew is corrected and
        // the workload will be distributed again.
        // Note the need for two parallelismHints, as parallelismHints apply downwards up until a partitioning operation


        // There are several other partitioning operations.
        // Here is an example that uses the "global" partitining, which sends all tuples to the same partition
        // This means however, that the processing can't be distributed -- something you want to avoid
        topology
                .newStream("aggregation", spout)
                .global()
                .each(new Fields("actor"), new Print())
                .parallelismHint(3)
        ;


        //

View Full Code Here

TOP

Examples of tutorial.storm.trident.operations.Print

Related Classes of tutorial.storm.trident.operations.Print