opennlp.maxent
Class TwoPassDataIndexer

java.lang.Object
  extended by opennlp.maxent.AbstractDataIndexer
      extended by opennlp.maxent.TwoPassDataIndexer
All Implemented Interfaces:
DataIndexer

public class TwoPassDataIndexer
extends AbstractDataIndexer

Collects event and context counts by making two passes over the events. The first pass determines which contexts will be used by the model and writes the events to a temporary event file. The second pass reads that file and creates the events in memory, keeping only the contexts which will be used. This greatly reduces the amount of memory required for storing the events.
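The two-pass scheme described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the opennlp.maxent implementation: the class TwoPassSketch, its IndexedEvent type, the space-separated temporary file format, and the choice to drop events that keep no contexts are all assumptions made for the example. Input events are passed as an in-memory Iterable only to keep the example short.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of two-pass indexing; not the library's code.
final class TwoPassSketch {

    /** One indexed event: an outcome label plus the indices of its kept contexts. */
    static final class IndexedEvent {
        final String outcome;
        final int[] predicates;

        IndexedEvent(String outcome, int[] predicates) {
            this.outcome = outcome;
            this.predicates = predicates;
        }
    }

    /**
     * Each input event is { outcome, context1, context2, ... }.
     * Tokens are assumed to contain no whitespace.
     */
    static List<IndexedEvent> index(Iterable<String[]> events, int cutoff) throws IOException {
        Path tmp = Files.createTempFile("events", ".tmp");
        try {
            // Pass 1: count each context and spill the raw events to the temporary file.
            Map<String, Integer> counts = new LinkedHashMap<>();
            try (BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                for (String[] event : events) {
                    for (int i = 1; i < event.length; i++) {
                        counts.merge(event[i], 1, Integer::sum);
                    }
                    out.write(String.join(" ", event));
                    out.newLine();
                }
            }

            // Give an index to every context seen at least `cutoff` times.
            Map<String, Integer> predIndex = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() >= cutoff) {
                    predIndex.put(e.getKey(), predIndex.size());
                }
            }

            // Pass 2: re-read the events and keep only the indexed contexts.
            List<IndexedEvent> indexed = new ArrayList<>();
            try (BufferedReader in = Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] tokens = line.split(" ");
                    List<Integer> kept = new ArrayList<>();
                    for (int i = 1; i < tokens.length; i++) {
                        Integer idx = predIndex.get(tokens[i]);
                        if (idx != null) {
                            kept.add(idx);
                        }
                    }
                    // Assumption of this sketch: events with no surviving context are dropped.
                    if (!kept.isEmpty()) {
                        int[] preds = kept.stream().mapToInt(Integer::intValue).toArray();
                        indexed.add(new IndexedEvent(tokens[0], preds));
                    }
                }
            }
            return indexed;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

In this sketch only the context counts are held in memory during the first pass, and the second pass never stores a context string that fell below the cutoff, which is where the memory saving comes from.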


Field Summary
 
Fields inherited from class opennlp.maxent.AbstractDataIndexer
contexts, numTimesEventsSeen, outcomeLabels, outcomeList, predCounts, predLabels
 
Constructor Summary
TwoPassDataIndexer(EventStream eventStream)
          One-argument constructor for DataIndexer, which calls the two-argument constructor assuming no cutoff.
TwoPassDataIndexer(EventStream eventStream, int cutoff)
          Two-argument constructor for DataIndexer.
 
Method Summary
 
Methods inherited from class opennlp.maxent.AbstractDataIndexer
getContexts, getNumTimesEventsSeen, getOutcomeLabels, getOutcomeList, getPredCounts, getPredLabels, getValues, sortAndMerge, toIndexedStringArray, update
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TwoPassDataIndexer

public TwoPassDataIndexer(EventStream eventStream)
                   throws java.io.IOException
One-argument constructor for DataIndexer, which calls the two-argument constructor assuming no cutoff.

Parameters:
eventStream - An EventStream which supplies all the Events seen in the training data.
Throws:
java.io.IOException

TwoPassDataIndexer

public TwoPassDataIndexer(EventStream eventStream,
                          int cutoff)
                   throws java.io.IOException
Two-argument constructor for DataIndexer.

Parameters:
eventStream - An EventStream which supplies all the Events seen in the training data.
cutoff - The minimum number of times a predicate must have been observed in order to be included in the model.
Throws:
java.io.IOException


Copyright © 2005 Jason Baldridge, Gann Bierner, and Thomas Morton. All Rights Reserved.