Examples of com.google.common.hash.HashFunction

com.google.common.hash.HashFunction
A hash function is a collision-averse pure function that maps an arbitrary block of data to a number called a hash code.
Definition

Unpacking this definition:
- block of data: the input for a hash function is always, in concept, an ordered byte array. This hashing API accepts an arbitrary sequence of byte and multibyte values (via {@link Hasher}), but this is merely a convenience; these are always translated into raw byte sequences under the covers.
- hash code: each hash function always yields hash codes of the same fixed bit length (given by {@link #bits}). For example, {@link Hashing#sha1} produces a 160-bit number, while {@link Hashing#murmur3_32()} yields only 32 bits. Because a {@code long} value is clearly insufficient to hold all hash code values, this API represents a hash code as an instance of {@link HashCode}.
- pure function: the value produced must depend only on the input bytes, in the order they appear. Input data is never modified. {@link HashFunction} instances should always be stateless, and therefore thread-safe.
- collision-averse: while it can't be helped that a hash function will sometimes produce the same hash code for distinct inputs (a "collision"), every hash function strives to some degree to make this unlikely. (Without this condition, a function that always returns zero could be called a hash function. It is not.)
Summarizing the last two points: "equal yield equal always; unequal yield unequal often." This is the most important characteristic of all hash functions.
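The fixed bit length and the "equal yield equal always" rule can be sketched with the JDK alone. This is a minimal illustration that uses {@code java.security.MessageDigest} rather than this API, so it runs without Guava on the classpath; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Illustrative helper, not part of Guava: shows fixed output length and purity.
final class HashDefinitionSketch {

    // Hashes the UTF-8 bytes of the input with SHA-1.
    static byte[] sha1(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return md.digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Bit length of the hash code; the same for every input.
    static int bits(String input) {
        return sha1(input).length * 8;
    }

    // True when both inputs produce the same hash code.
    static boolean sameHash(String a, String b) {
        return Arrays.equals(sha1(a), sha1(b));
    }
}
```

Here {@code bits} returns 160 for any input, and {@code sameHash("abc", "abc")} is always true, while two distinct inputs will almost never match.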
Desirable properties

A high-quality hash function strives for some subset of the following virtues:
- collision-resistant: while the definition above requires making at least some token attempt, one measure of the quality of a hash function is how well it succeeds at this goal. Important note: it may be easy to achieve the theoretical minimum collision rate when using completely random sample input. The true test of a hash function is how it performs on representative real-world data, which tends to contain many hidden patterns and clumps. The goal of a good hash function is to stamp these patterns out as thoroughly as possible.
- bit-dispersing: masking out any single bit from a hash code should yield only the expected twofold increase to all collision rates. Informally, the "information" in the hash code should be as evenly "spread out" through the hash code's bits as possible. The result is that, for example, when choosing a bucket in a hash table of size 2^8, any eight bits could be consistently used.
- cryptographic: certain hash functions such as {@link Hashing#sha512} are designed to make it as infeasible as possible to reverse-engineer the input that produced a given hash code, or even to discover any two distinct inputs that yield the same result. These are called cryptographic hash functions. But, whenever it is learned that either of these feats has become computationally feasible, the function is deemed "broken" and should no longer be used for secure purposes. (This is the likely eventual fate of all cryptographic hashes.)
- fast: perhaps self-explanatory, but often the most important consideration. We have published microbenchmark results for many common hash functions.
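The bit-dispersing property can be made concrete with a table of size 2^8. The sketch below is an illustration only, not Guava code: it picks a bucket from the low eight bits, first from a raw key and then from a key passed through the 32-bit finalization mix of the public-domain MurmurHash3 algorithm. The class name and the choice of clumped keys (multiples of 256) are assumptions made for the demonstration.

```java
// Illustrative sketch, not Guava code: bucket selection in a table of size 2^8.
final class BitDispersionSketch {

    // The 32-bit finalization mix from the public-domain MurmurHash3 algorithm.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Counts how many of the 256 buckets are hit by the keys 0, 256, 512, ...
    static int bucketsUsed(boolean mixFirst) {
        boolean[] used = new boolean[256];
        int count = 0;
        for (int k = 0; k < 256; k++) {
            int key = k * 256;                    // clumped input: low eight bits are always zero
            int hash = mixFirst ? mix(key) : key;
            int bucket = hash & 0xFF;             // keep only the low eight bits
            if (!used[bucket]) {
                used[bucket] = true;
                count++;
            }
        }
        return count;
    }
}
```

Without mixing, every key lands in bucket 0, so {@code bucketsUsed(false)} is 1. With mixing, the keys no longer all share one bucket, because the information in the upper bits has been spread into the low ones.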
Providing input to a hash function

The primary way to provide the data that your hash function should act on is via a {@link Hasher}. Obtain a new hasher from the hash function using {@link #newHasher}, "push" the relevant data into it using methods like {@link Hasher#putBytes(byte[])}, and finally ask for the {@code HashCode} when finished using {@link Hasher#hash}. (See an {@linkplain #newHasher example} of this.)
If all you want to hash is a single byte array, string or {@code long} value, there are convenient shortcut methods defined directly on {@link HashFunction} to make this easier.
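The push-then-finish pattern described here has a close analogue in the JDK. As a hedged illustration (this is {@code java.security.MessageDigest}, not {@link Hasher}, and the class and method names are hypothetical), the sketch below pushes data in pieces and shows that the result equals hashing the same bytes in one call, while pushing the pieces in a different order does not.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Illustrative JDK analogue of the push-then-finish pattern; not Guava's Hasher.
final class IncrementalHashSketch {

    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Pushes the parts one at a time, then asks for the hash code.
    static byte[] hashInParts(String... parts) {
        MessageDigest md = sha256();
        for (String part : parts) {
            md.update(part.getBytes(StandardCharsets.UTF_8));
        }
        return md.digest();
    }

    // Shortcut: hashes the whole input in a single call.
    static byte[] hashWhole(String whole) {
        return sha256().digest(whole.getBytes(StandardCharsets.UTF_8));
    }

    // True when pushing the parts in order gives the same hash as the whole input.
    static boolean partsMatchWhole(String whole, String... parts) {
        return Arrays.equals(hashWhole(whole), hashInParts(parts));
    }
}
```

Only the resulting byte sequence matters: {@code partsMatchWhole("hello world", "hello", " ", "world")} is true, because the pieces concatenate to the same ordered bytes.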
Hasher accepts primitive data types, but can also accept any Object of type {@code T} provided that you implement a {@link Funnel Funnel} to specify how to "feed" data from that object into the function. (See {@linkplain Hasher#putObject an example} of this.)
Compatibility note: Throughout this API, multibyte values are always interpreted in little-endian order. That is, hashing the byte array {@code {0x01, 0x02, 0x03, 0x04}} is equivalent to hashing the {@code int} value {@code 0x04030201}. If this isn't what you need, methods such as {@link Integer#reverseBytes} and {@link Ints#toByteArray} will help.
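The byte-order equivalence in the compatibility note can be checked with the JDK's {@code ByteBuffer}. This is a minimal sketch with hypothetical class and method names; it demonstrates only the byte-to-int mapping, not the hashing itself.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch of the little-endian interpretation of multibyte values.
final class LittleEndianSketch {

    // Reads four bytes as an int, least significant byte first.
    static int littleEndianInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Reads four bytes as an int, most significant byte first.
    static int bigEndianInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt();
    }
}
```

For the array {@code {0x01, 0x02, 0x03, 0x04}}, {@code littleEndianInt} returns {@code 0x04030201} and {@code bigEndianInt} returns {@code 0x01020304}; {@code Integer.reverseBytes} converts one into the other.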
Relationship to {@link Object#hashCode}

Java's baked-in concept of hash codes is constrained to 32 bits, and provides no separation between hash algorithms and the data they act on, so alternate hash algorithms can't be easily substituted. Also, implementations of {@code hashCode} tend to be poor-quality, in part because they end up depending on other existing poor-quality {@code hashCode} implementations, including those in many JDK classes.
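One well-known instance in a JDK class is {@code String.hashCode}, which is specified as a base-31 polynomial over the characters and is therefore easy to collide. The sketch below is an illustration with a hypothetical class name; the example strings are chosen for the demonstration.

```java
// Illustrative sketch: distinct strings that share a String.hashCode value.
final class HashCodeCollisionSketch {

    // True when the two strings differ but have the same 32-bit hashCode.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }
}
```

Both {@code "Aa"} and {@code "BB"} hash to 2112 (65*31 + 97 and 66*31 + 66), and because the polynomial composes block by block, {@code "AaAa"} and {@code "BBBB"} collide as well.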
{@code Object.hashCode} implementations tend to be very fast, but have weak collision prevention and no expectation of bit dispersion. This leaves them perfectly suitable for use in hash tables, because extra collisions cause only a slight performance hit, while poor bit dispersion is easily corrected using a secondary hash function (which all reasonable hash table implementations in Java use). For the many uses of hash functions beyond data structures, however, {@code Object.hashCode} almost always falls short -- hence this library.
@author Kevin Bourrillion
@since 11.0

        }
    }


    @Override
    public int hashCode() {
        HashFunction hf = Hashing.goodFastHash(32);
        Hasher h = hf.newHasher();
        h.putInt(slots.size());
        for (int i=0; i<slots.size(); i++) {
            h.putInt(slots.get(i).size());
            for (int j=0; j<slots.get(i).size(); j++) {
                h.putBytes(slots.get(i).get(j).getLowerRange());

     * @throws FrontendException if signature can't be computed
     */
    public String getSignature() throws FrontendException {


        // Use a streaming hash function. We use a murmur_32 function with a constant seed, 0.
        HashFunction hf = Hashing.murmur3_32(0);
        HashOutputStream hos = new HashOutputStream(hf);
        PrintStream ps = new PrintStream(hos);


        LogicalPlanPrinter printer = new LogicalPlanPrinter(this, ps);
        printer.visit();


    /**
     * Default constructor.
     */
    ExceptionKey() {
        final HashFunction hashFunction = Hashing.md5();
        final HashCode hashCode = hashFunction.newHasher().putString(UUID.randomUUID().toString()).hash();
        this.key = hashCode.asInt();
        this.machineName = "UNKNOWN";
        try {
            InetAddress localMachine = java.net.InetAddress.getLocalHost();
            if (localMachine != null) {

        return v;
    }


    @Override
    public int hashCode() {
        HashFunction hf = Hashing.murmur3_32();
        Hasher hc = hf.newHasher();
        for (String key : fields.keySet()) {
            hc.putString(key);
        }
        return hc.hash().asInt();
    }

  public void testGroupByCaching() throws Exception
  {
    List<AggregatorFactory> aggsWithUniques = ImmutableList.<AggregatorFactory>builder().addAll(AGGS)
        .add(new HyperUniquesAggregatorFactory("uniques", "uniques")).build();


    final HashFunction hashFn = Hashing.murmur3_128();


    GroupByQuery.Builder builder = new GroupByQuery.Builder()
        .setDataSource(DATA_SOURCE)
        .setQuerySegmentSpec(SEG_SPEC)
        .setDimFilter(DIM_FILTER)
        .setGranularity(GRANULARITY)
        .setDimensions(Arrays.<DimensionSpec>asList(new DefaultDimensionSpec("a", "a")))
        .setAggregatorSpecs(aggsWithUniques)
        .setPostAggregatorSpecs(POST_AGGS)
        .setContext(CONTEXT);


    final HyperLogLogCollector collector = HyperLogLogCollector.makeLatestCollector();
    collector.add(hashFn.hashString("abc123", Charsets.UTF_8).asBytes());
    collector.add(hashFn.hashString("123abc", Charsets.UTF_8).asBytes());


    testQueryCaching(
        client,
        builder.build(),
        new Interval("2011-01-01/2011-01-02"),

      }
    }
  }


  private boolean haveSameContents(File file, final JarFile jar, final JarEntry entry) throws IOException {
    HashFunction hashFun = Hashing.md5();
    HashCode fileHash = Files.hash(file, hashFun);
    HashCode streamHash = ByteStreams.hash(new InputSupplier<InputStream>() {
      public InputStream getInput() throws IOException { return jar.getInputStream(entry); }
    }, hashFun);
    return fileHash.equals(streamHash);


  /**
   * Walk project references recursively, adding thrift files to the provided list.
   */
  List<File> getRecursiveThriftFiles(MavenProject project, String outputDirectory, List<File> files) throws IOException {
    HashFunction hashFun = Hashing.md5();
    if (dependencyIncludes.contains(project.getArtifactId())) {
      File dir = new File(new File(project.getFile().getParent(), "target"), outputDirectory);
      if (dir.exists()) {
        URI baseDir = getFileURI(dir);
        for (File f : findThriftFilesInDirectory(dir)) {

  // Provides a nice printout of error rates as a function of cardinality
  @Ignore
  @Test
  public void showErrorRate() throws Exception
  {
    HashFunction fn = Hashing.murmur3_128();
    Random random = new Random();


    double error = 0.0d;
    int count = 0;


    final int[] valsToCheck = {
        10, 20, 50, 100, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 1000000, 2000000, 10000000, Integer.MAX_VALUE
    };


    for (int numThings : valsToCheck) {
      long startTime = System.currentTimeMillis();
      HyperLogLogCollector collector = HyperLogLogCollector.makeLatestCollector();


      for (int i = 0; i < numThings; ++i) {
        if (i != 0 && i % 100000000 == 0) {
          ++count;
          error = computeError(error, count, i, startTime, collector);
        }
        collector.add(fn.hashLong(random.nextLong()).asBytes());
      }


      ++count;
      error = computeError(error, count, numThings, startTime, collector);
    }

Related Classes of com.google.common.hash.HashFunction

brickhouse.analytics.uniques.SketchSetTest

com.clearspring.analytics.stream.cardinality.TestHyperLogLog

com.clearspring.analytics.stream.cardinality.TestLogLog

com.facebook.util.digest.TestMurmurHash

com.mcmartins.reuse.core.exception.api.ExceptionKey

com.salesforce.phoenix.filter.SkipScanFilter

com.twitter.AbstractMavenScroogeMojo

io.druid.client.CachingClusteredClientTest

io.druid.query.aggregation.hyperloglog.HyperLogLogCollectorTest

net.octal.supinbank.servlet.AddAccountServlet
