opennlp.maxent
Class AbstractDataIndexer

java.lang.Object
  extended by opennlp.maxent.AbstractDataIndexer
All Implemented Interfaces:
DataIndexer
Direct Known Subclasses:
OnePassDataIndexer, TwoPassDataIndexer

public abstract class AbstractDataIndexer
extends java.lang.Object
implements DataIndexer

Abstract class for collecting event and context counts used in training.


Field Summary
protected  int[][] contexts
          The integer contexts associated with each unique event.
protected  int[] numTimesEventsSeen
          The number of times an event occurred in the training data.
protected  java.lang.String[] outcomeLabels
          The names of the outcomes.
protected  int[] outcomeList
          The integer outcome associated with each unique event.
protected  int[] predCounts
          The number of times each predicate occurred.
protected  java.lang.String[] predLabels
          The predicate/context names.
 
Constructor Summary
AbstractDataIndexer()
           
 
Method Summary
 int[][] getContexts()
          Returns the array of predicates seen in each event.
 int[] getNumTimesEventsSeen()
          Returns an array indicating the number of times a particular event was seen.
 java.lang.String[] getOutcomeLabels()
          Returns an array of outcome names.
 int[] getOutcomeList()
          Returns an array indicating the outcome index for each event.
 int[] getPredCounts()
          Returns an array of the count of each predicate in the events.
 java.lang.String[] getPredLabels()
          Returns an array of predicate/context names.
 float[][] getValues()
          Returns the values associated with each event context or null if integer values are to be used.
protected  int sortAndMerge(java.util.List eventsToCompare)
          Sorts and de-duplicates the list of comparable events and returns the number of unique events.
protected static java.lang.String[] toIndexedStringArray(gnu.trove.TObjectIntHashMap labelToIndexMap)
          Utility method for creating a String[] array from a map whose keys are labels (Strings) to be stored in the array and whose values are the indices (Integers) at which the corresponding labels should be inserted.
protected static void update(java.lang.String[] ec, java.util.Set predicateSet, gnu.trove.TObjectIntHashMap counter, int cutoff)
          Updates the set of predicates and the counter with the specified event contexts and cutoff.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

contexts

protected int[][] contexts
The integer contexts associated with each unique event.


outcomeList

protected int[] outcomeList
The integer outcome associated with each unique event.


numTimesEventsSeen

protected int[] numTimesEventsSeen
The number of times an event occurred in the training data.


predLabels

protected java.lang.String[] predLabels
The predicate/context names.


outcomeLabels

protected java.lang.String[] outcomeLabels
The names of the outcomes.


predCounts

protected int[] predCounts
The number of times each predicate occurred.

Constructor Detail

AbstractDataIndexer

public AbstractDataIndexer()
Method Detail

getContexts

public int[][] getContexts()
Description copied from interface: DataIndexer
Returns the array of predicates seen in each event.

Specified by:
getContexts in interface DataIndexer
Returns:
a 2-D array whose first dimension is the event index; the array at each index contains the contexts for that event.

getNumTimesEventsSeen

public int[] getNumTimesEventsSeen()
Description copied from interface: DataIndexer
Returns an array indicating the number of times a particular event was seen.

Specified by:
getNumTimesEventsSeen in interface DataIndexer
Returns:
an array indexed by the event index indicating the number of times a particular event was seen.

getOutcomeList

public int[] getOutcomeList()
Description copied from interface: DataIndexer
Returns an array indicating the outcome index for each event.

Specified by:
getOutcomeList in interface DataIndexer
Returns:
an array indicating the outcome index for each event.

getPredLabels

public java.lang.String[] getPredLabels()
Description copied from interface: DataIndexer
Returns an array of predicate/context names.

Specified by:
getPredLabels in interface DataIndexer
Returns:
an array of predicate/context names indexed by context index. These indices are the values stored in the array returned by getContexts.

getOutcomeLabels

public java.lang.String[] getOutcomeLabels()
Description copied from interface: DataIndexer
Returns an array of outcome names.

Specified by:
getOutcomeLabels in interface DataIndexer
Returns:
an array of outcome names indexed by outcome index.
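
The getters above return parallel, index-based arrays, and a small decoder can make the relationships concrete. The following is a minimal sketch only: the class name IndexedEventDecoder, the describe helper, and the sample data are hypothetical, and the arrays are built by hand rather than obtained from a real DataIndexer.

```java
class IndexedEventDecoder {
    // Hypothetical helper: turns one indexed event back into readable text.
    // contexts[event] holds indices into predLabels;
    // outcomeList[event] holds an index into outcomeLabels;
    // numTimesEventsSeen[event] holds how often that unique event occurred.
    static String describe(int event, int[][] contexts, int[] outcomeList,
                           int[] numTimesEventsSeen, String[] predLabels,
                           String[] outcomeLabels) {
        StringBuilder sb = new StringBuilder(outcomeLabels[outcomeList[event]]);
        sb.append(" x").append(numTimesEventsSeen[event]).append(" [");
        for (int j = 0; j < contexts[event].length; j++) {
            if (j > 0) {
                sb.append(' ');
            }
            sb.append(predLabels[contexts[event][j]]);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // Hand-made sample data shaped like the arrays the getters return.
        int[][] contexts = {{0, 2}, {1}};
        int[] outcomeList = {1, 0};
        int[] numTimesEventsSeen = {3, 1};
        String[] predLabels = {"prev=the", "prev=a", "suffix=s"};
        String[] outcomeLabels = {"VERB", "NOUN"};
        for (int i = 0; i < contexts.length; i++) {
            System.out.println(describe(i, contexts, outcomeList,
                    numTimesEventsSeen, predLabels, outcomeLabels));
        }
        // prints "NOUN x3 [prev=the suffix=s]" then "VERB x1 [prev=a]"
    }
}
```

With a real indexer, the same helper could be fed the results of getContexts(), getOutcomeList(), getNumTimesEventsSeen(), getPredLabels(), and getOutcomeLabels().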

getPredCounts

public int[] getPredCounts()
Description copied from interface: DataIndexer
Returns an array of the count of each predicate in the events.

Specified by:
getPredCounts in interface DataIndexer
Returns:
an array of the count of each predicate in the events.

sortAndMerge

protected int sortAndMerge(java.util.List eventsToCompare)
Sorts and de-duplicates the list of comparable events and returns the number of unique events. This method alters eventsToCompare: it performs an in-place sort followed by an in-place edit that removes duplicates.

Parameters:
eventsToCompare - a List of ComparableEvent values
Returns:
The number of unique events in the specified list.
Since:
maxent 1.2.6
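
The sort-then-merge behavior described above can be illustrated with a self-contained sketch. This is not the library's implementation: the Ev class is a hypothetical stand-in for ComparableEvent, and the way duplicates are counted (summing a seen field) and removed (truncating the list) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortAndMergeSketch {
    // Hypothetical stand-in for ComparableEvent: an outcome index,
    // a sorted array of predicate indices, and an occurrence count.
    static final class Ev implements Comparable<Ev> {
        final int outcome;
        final int[] predIndexes;
        int seen = 1;

        Ev(int outcome, int... predIndexes) {
            this.outcome = outcome;
            this.predIndexes = predIndexes.clone();
            Arrays.sort(this.predIndexes);
        }

        @Override
        public int compareTo(Ev other) {
            if (outcome != other.outcome) {
                return Integer.compare(outcome, other.outcome);
            }
            int shared = Math.min(predIndexes.length, other.predIndexes.length);
            for (int i = 0; i < shared; i++) {
                if (predIndexes[i] != other.predIndexes[i]) {
                    return Integer.compare(predIndexes[i], other.predIndexes[i]);
                }
            }
            return Integer.compare(predIndexes.length, other.predIndexes.length);
        }
    }

    // Sorts the list in place, folds equal neighbors into one entry while
    // summing their counts, drops the leftovers, and returns the unique count.
    static int sortAndMerge(List<Ev> events) {
        Collections.sort(events);
        int unique = 0;
        for (int i = 0; i < events.size(); i++) {
            Ev current = events.get(i);
            if (unique > 0 && events.get(unique - 1).compareTo(current) == 0) {
                events.get(unique - 1).seen += current.seen;
            } else {
                events.set(unique++, current);
            }
        }
        events.subList(unique, events.size()).clear();
        return unique;
    }

    public static void main(String[] args) {
        List<Ev> events = new ArrayList<>();
        events.add(new Ev(0, 1, 2));
        events.add(new Ev(1, 3));
        events.add(new Ev(0, 2, 1)); // same event as the first, different order
        System.out.println(sortAndMerge(events)); // prints 2
    }
}
```

Because the list is modified in place, callers that still need the original ordering would have to copy the list before calling a method like this.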

update

protected static void update(java.lang.String[] ec,
                             java.util.Set predicateSet,
                             gnu.trove.TObjectIntHashMap counter,
                             int cutoff)
Updates the set of predicates and the counter with the specified event contexts and cutoff.

Parameters:
ec - The contexts/features which occur in an event.
predicateSet - The set of predicates which will be used for model building.
counter - The predicate counters.
cutoff - The cutoff which determines whether a predicate is included.
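
The interaction between the counter and the cutoff can be sketched as follows. This is an illustrative approximation, not the library code: it substitutes java.util.Map for gnu.trove.TObjectIntHashMap so that it runs without extra dependencies, and it assumes a predicate is admitted once its count reaches the cutoff (count >= cutoff). The class name PredicateCutoffSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PredicateCutoffSketch {
    // Counts every context string of one event and admits a predicate to
    // predicateSet as soon as its running count reaches the cutoff.
    // Assumption: the comparison is inclusive (count >= cutoff).
    static void update(String[] ec, Set<String> predicateSet,
                       Map<String, Integer> counter, int cutoff) {
        for (String pred : ec) {
            int count = counter.merge(pred, 1, Integer::sum);
            if (count >= cutoff) {
                predicateSet.add(pred);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> predicateSet = new TreeSet<>();
        Map<String, Integer> counter = new HashMap<>();
        String[][] events = {{"a", "b"}, {"a", "c"}, {"a", "b"}};
        for (String[] ec : events) {
            update(ec, predicateSet, counter, 2);
        }
        System.out.println(predicateSet); // prints [a, b]
    }
}
```

Under this assumption, a cutoff of 1 admits every predicate on first sight, while larger cutoffs filter out rare predicates such as "c" in the example.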

toIndexedStringArray

protected static java.lang.String[] toIndexedStringArray(gnu.trove.TObjectIntHashMap labelToIndexMap)
Utility method for creating a String[] array from a map whose keys are labels (Strings) to be stored in the array and whose values are the indices (Integers) at which the corresponding labels should be inserted.

Parameters:
labelToIndexMap - a TObjectIntHashMap value
Returns:
a String[] value
Since:
maxent 1.2.6
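
The label-to-index inversion described above can be sketched without the Trove dependency. The version below uses java.util.Map in place of gnu.trove.TObjectIntHashMap and assumes the indices are dense, running from 0 to size - 1 with no gaps or duplicates; the class name IndexedLabelsSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class IndexedLabelsSketch {
    // Places each label at the array position given by its map value.
    // Assumption: the values cover exactly 0 .. size-1, one label per index.
    static String[] toIndexedStringArray(Map<String, Integer> labelToIndexMap) {
        String[] labels = new String[labelToIndexMap.size()];
        for (Map.Entry<String, Integer> entry : labelToIndexMap.entrySet()) {
            labels[entry.getValue()] = entry.getKey();
        }
        return labels;
    }

    public static void main(String[] args) {
        Map<String, Integer> labelToIndex = new HashMap<>();
        labelToIndex.put("NOUN", 1);
        labelToIndex.put("VERB", 0);
        labelToIndex.put("ADJ", 2);
        System.out.println(String.join(",", toIndexedStringArray(labelToIndex)));
        // prints VERB,NOUN,ADJ
    }
}
```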

getValues

public float[][] getValues()
Description copied from interface: DataIndexer
Returns the values associated with each event context or null if integer values are to be used.

Specified by:
getValues in interface DataIndexer
Returns:
the values associated with each event context.
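
Because getValues may return null, callers need a null check before indexing into the result. The helper below is a hypothetical sketch: it assumes that a null result (or a null row) means each context should be treated as having the value 1, which is an interpretation of "integer values are to be used" and is not stated by this page.

```java
class ContextValueSketch {
    // Hypothetical helper: returns the value for one context of one event,
    // falling back to 1.0f when no real-valued data is available.
    // Assumption: a null array or null row means "treat each context as 1".
    static float valueOf(float[][] values, int event, int context) {
        if (values == null || values[event] == null) {
            return 1.0f;
        }
        return values[event][context];
    }

    public static void main(String[] args) {
        float[][] values = {{0.5f, 2.0f}};
        System.out.println(valueOf(values, 0, 1)); // prints 2.0
        System.out.println(valueOf(null, 0, 1));   // prints 1.0
    }
}
```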


Copyright © 2005 Jason Baldridge, Gann Bierner, and Thomas Morton. All Rights Reserved.