Examples of edu.ucla.sspace.tri.FixedDurationTemporalRandomIndexing

edu.ucla.sspace.tri.FixedDurationTemporalRandomIndexing
// Initialize the following variables according to program semantics
int numThreads;
TimeSpan partitionDuration;
Iterator<TemporalDocument> documents;
FixedDurationTemporalRandomIndexing fdTRI;

// The starting time for the current semantic partition. This value is used
// to determine whether the next document falls outside the current
// partition's duration, in which case a new partition must be created.
final AtomicLong startTimeOfCurrentPartition = new AtomicLong();

// Before a thread blocks waiting for partition processing, it adds the time
// stamp of its next document (the one exceeding the duration) as a key in
// this map, which allows the processing thread (see partitionHook below) to
// determine the start time of the next partition.
final ConcurrentNavigableMap<Long,Object> futureStartTimes =
    new ConcurrentSkipListMap<Long,Object>();

// Create a custom Runnable that will handle processing the semantic space
// after the partition has been finished.
Runnable partitionHook = new Runnable() {
    public void run() {
        // Process the semantic space as necessary here...

        // Once processing has finished, notify the threads of the next
        // time stamp that will be processed. In the unlikely event that
        // the number of documents in a partition is less than the number
        // of threads, this ensures that a thread processing the partition
        // after the next one correctly waits.
        Long ssStart = futureStartTimes.firstKey();
        futureStartTimes.clear(); // reset for next partition
        // lastly, update the start time for the new partition
        startTimeOfCurrentPartition.set(ssStart);
    }
};

// Create the barrier that the threads will use to synchronize their
// processDocument() calls. Note that we use the partition hook here
// instead of attaching it via the addPartitionHook() method.
final CyclicBarrier exceededTimeSpanBarrier =
    new CyclicBarrier(numThreads, partitionHook);

// A required barrier for the initial case of setting the start time for the
// first partition.
final AtomicBoolean startBarrier = new AtomicBoolean(false);

// A counter for which document is being processed
final AtomicInteger docCounter = new AtomicInteger(0);

// Start all the threads
for (int i = 0; i < numThreads; ++i) {
    Thread processingThread = new Thread() {
        public void run() {
            // repeatedly try to process any remaining documents
            while (documents.hasNext()) {
                TemporalDocument doc = documents.next();
                long docTime = doc.timeStamp();
                int docNumber = docCounter.incrementAndGet();

                // special case for first document
                if (docNumber == 1) {
                    startTimeOfCurrentPartition.set(docTime);
                    startBarrier.set(true);
                }

                // Spin until the thread with the first document sets the
                // initial starting document time. Note that we spin here
                // instead of blocking, because it is expected that another
                // thread will set this immediately, so the wait will be a
                // quick no-op.
                while (startBarrier.get() == false)
                    ;

                // Check whether the time for this document would exceed the
                // maximum duration of the current partition. Loop to ensure
                // that if this thread does loop and another thread has an
                // earlier time that exceeds the time period, then this
                // thread will block until the earlier partition has finished
                // processing.
                while (!partitionDuration.insideRange(
                           startTimeOfCurrentPartition.get(), docTime)) {
                    try {
                        // Notify the barrier that this thread is now
                        // processing a document in the next time span. In
                        // addition, record the time for this document so
                        // the serialization thread can reset the correct
                        // s-space start time.
                        futureStartTimes.put(docTime, new Object());
                        exceededTimeSpanBarrier.await();
                    } catch (Exception ex) {
                        // Handle exception here
                    }
                }

                try {
                    fdTRI.processDocument(doc.reader());
                } catch (IOException ioe) {
                    throw new IOError(ioe); // rethrow
                }
            }
        }
    };
    // Start threads and wait for processing to finish...
}

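
The barrier pattern in the example above relies on one property of java.util.concurrent.CyclicBarrier: the Runnable passed to its constructor runs exactly once each time all parties arrive, and no thread is released until it has finished. The following minimal sketch isolates that behavior without any S-Space classes. The class name BarrierHookDemo and the method runRounds are hypothetical names chosen for illustration, and the hook only counts its invocations where the real hook would process the semantic space.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class BarrierHookDemo {

    /**
     * Starts numThreads workers that each wait at the barrier `rounds` times
     * and returns how many times the barrier action (the "hook") ran.
     */
    static int runRounds(int numThreads, final int rounds) {
        final AtomicInteger hookRuns = new AtomicInteger();

        // The barrier action stands in for the partition hook: it runs once
        // per barrier trip, before any waiting thread is released.
        final CyclicBarrier barrier = new CyclicBarrier(numThreads, new Runnable() {
            public void run() {
                hookRuns.incrementAndGet();
            }
        });

        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; ++i) {
            workers[i] = new Thread() {
                public void run() {
                    try {
                        for (int r = 0; r < rounds; ++r) {
                            barrier.await(); // block until all workers arrive
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            };
            workers[i].start();
        }

        // Wait for every worker to finish before reading the counter.
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return hookRuns.get();
    }

    public static void main(String[] args) {
        System.out.println("hook ran " + runRounds(4, 3) + " times");
    }
}
```

Because the barrier is cyclic, the same instance can be reused for every partition boundary, which is why the example creates it once outside the thread loop.
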
Note that the requirements of an {@code OrderedTemporalRandomIndexing} class stipulate that the documents be processed in order. For this class, the documents must be in order according to their semantic partition. In addition, the first document seen for a semantic partition should be the earliest for that partition. This behavior is most easily accomplished by sorting the documents by time stamp before processing them.

@author David Jurgens
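
To illustrate why sorting by time stamp satisfies the ordering requirement, here is a small single-threaded sketch that sorts a set of time stamps and computes where each fixed-duration partition would begin. It uses no S-Space classes: PartitionSketch and partitionStarts are hypothetical names, the duration is a plain millisecond count standing in for a TimeSpan, and treating a document exactly one duration after the start as belonging to the next partition is an assumption made for this sketch, not a statement about how TimeSpan.insideRange handles its boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionSketch {

    /**
     * Returns the start time of each partition, assuming a partition begins
     * at the time stamp of its first document and covers durationMillis
     * after that start (end exclusive, an assumption of this sketch).
     */
    static List<Long> partitionStarts(long[] timeStamps, long durationMillis) {
        long[] sorted = timeStamps.clone();
        Arrays.sort(sorted); // earliest document first

        List<Long> starts = new ArrayList<Long>();
        long currentStart = 0;
        for (long t : sorted) {
            // The very first document, or any document that falls outside
            // the current window, opens a new partition at its own time.
            if (starts.isEmpty() || t - currentStart >= durationMillis) {
                currentStart = t;
                starts.add(t);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        long[] stamps = { 2500L, 0L, 1200L, 400L, 1000L };
        System.out.println(partitionStarts(stamps, 1000L));
    }
}
```

With sorted input, the first document seen in each partition is necessarily the earliest one, so the partition start never has to be revised after documents have already been processed.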

        // use the System properties in case the user specified them as
        // -Dprop=<val> to the JVM directly.
        Properties props = setupProperties();

        FixedDurationTemporalRandomIndexing fdTri =
            new FixedDurationTemporalRandomIndexing(props);


        // The user may also specify a limit to the words for which semantics
        // are computed.  If so, set up Random Indexing to not keep semantics
        // for those words.
        if (argOptions.hasOption("semanticFilter")) {
            String fileName = argOptions.getStringOption("semanticFilter");
            BufferedReader br = new BufferedReader(new FileReader(fileName));
            Set<String> wordsToCompute = new HashSet<String>();
            try {
                for (String line = null; (line = br.readLine()) != null; ) {
                    for (String s : line.split("\\s+")) {
                        // skip the empty token that split() yields for
                        // blank lines and leading whitespace
                        if (s.length() > 0)
                            wordsToCompute.add(s);
                    }
                }
            } finally {
                br.close(); // release the file even if reading fails
            }
            LOGGER.info("computing semantics for only " + wordsToCompute.size()
                        + " words");

            fdTri.setSemanticFilter(wordsToCompute);
        }


        // Load the word-to-IndexVector mappings if they were specified.
        if (argOptions.hasOption("loadVectors")) {
            String fileName = argOptions.getStringOption("loadVectors");
            LOGGER.info("loading index vectors from " + fileName);
            Map<String,TernaryVector> wordToIndexVector = 
                IndexVectorUtil.load(new File(fileName));
            fdTri.setWordToIndexVector(wordToIndexVector);
        }
        
        String formatName = (argOptions.hasOption("outputFormat"))
            ? argOptions.getStringOption("outputFormat").toUpperCase()
            : "TEXT";

        // formatName is already upper case at this point
        format = SSpaceFormat.valueOf(formatName);

        parseDocumentsMultiThreaded(fdTri, docIter, timeSpan, numThreads);

        long startTime = System.currentTimeMillis();
        fdTri.processSpace(props);
        long endTime = System.currentTimeMillis();
        LOGGER.info(String.format("processed space in %.3f seconds%n",
                                   ((endTime - startTime) / 1000d)));
        
        // save the word-to-IndexVector mapping if specified to do so
        if (argOptions.hasOption("saveVectors")) {
            String fileName = argOptions.getStringOption("saveVectors");
            LOGGER.info("saving index vectors to " + fileName);
            IndexVectorUtil.save(fdTri.getWordToIndexVector(), 
                                 new File(fileName));
        }
    }
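
The semantic-filter step in the method above reads a file of whitespace-separated words into a set. A standalone sketch of that step follows, written against a java.io.Reader so it can be tried on an in-memory string. WordListReader and readWords are hypothetical names and are not part of the S-Space API. Unlike a bare split("\\s+") loop, the sketch skips the empty token that split produces for blank lines or leading whitespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

class WordListReader {

    /** Collects every distinct whitespace-separated token from the source. */
    static Set<String> readWords(Reader source) {
        Set<String> words = new HashSet<String>();
        // try-with-resources closes the reader even if readLine() fails
        try (BufferedReader br = new BufferedReader(source)) {
            for (String line; (line = br.readLine()) != null; ) {
                for (String token : line.split("\\s+")) {
                    if (!token.isEmpty()) {
                        words.add(token);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = readWords(new StringReader("cat dog\n\n  bird cat\n"));
        System.out.println(words.size() + " distinct words");
    }
}
```

A set of this kind is what the method above passes to setSemanticFilter.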


Related Classes of edu.ucla.sspace.tri.FixedDurationTemporalRandomIndexing

edu.ucla.sspace.mains.FixedDurationTemporalRandomIndexingMain

edu.ucla.sspace.util.TimeSpan
