Mahout 0.2 changed the framework to operate only in terms of numeric (long) ID values for users and items. This is not compatible with applications that used other key types, most commonly {@link String}. Implementations of this class support mapping Strings to longs and vice versa, in order to provide a smoother migration path for applications that must still use strings as IDs.
The mapping from strings to 64-bit numeric values is fixed here, to provide a standard implementation that is 'portable': it can easily be reproduced outside the framework. See {@link #toLongID(String)}.
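As an illustration of what such a fixed, reproducible mapping can look like, the following is a minimal sketch and not necessarily what {@link #toLongID(String)} does. It assumes the long is built from the first 8 bytes of the MD5 digest of the UTF-8 encoded string; the class name StringIdHashSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper, not part of the framework. */
final class StringIdHashSketch {

  private StringIdHashSketch() {}

  /**
   * Maps a string to a long deterministically.
   * Assumption: the first 8 bytes of the MD5 digest are packed big-endian into the result.
   */
  static long toLongID(String stringID) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] digest = md5.digest(stringID.getBytes(StandardCharsets.UTF_8));
      long hash = 0L;
      for (int i = 0; i < 8; i++) {
        // Shift in one byte at a time, masking to avoid sign extension.
        hash = (hash << 8) | (digest[i] & 0xFFL);
      }
      return hash;
    } catch (NoSuchAlgorithmException e) {
      // MD5 must be supported by every Java platform implementation.
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  public static void main(String[] args) {
    // The same input always yields the same value, so nothing needs to be stored.
    System.out.println(toLongID("alice") == toLongID("alice"));
  }
}
```

Because the function depends only on the input string, any other system that applies the same digest and byte order would arrive at the same long value.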
Because this mapping is deterministically computable, it does not need to be stored. The job of subclasses is instead to store the reverse mapping, from long back to string. There are infinitely many strings but only a finite number (2^64) of longs, so it is possible for two strings to map to the same value. Subclasses do not treat this as an error but rather retain only the most recent mapping, overwriting any previous one. The probability of a collision in a 64-bit space is quite small, but not zero. However, in the context of a collaborative filtering problem, the consequence of a collision is minor: at worst, perhaps one user receives another user's recommendations.
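The reverse mapping and its overwrite-on-collision behavior can be sketched as follows. This is an illustrative in-memory version only; the class name InMemoryReverseMappingSketch and its method names are assumptions made for this example, and a real subclass might persist the mapping elsewhere, for instance in a file or database.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store for the long-to-string direction. */
final class InMemoryReverseMappingSketch {

  private final Map<Long, String> longToString = new HashMap<>();

  /** Records the reverse mapping; a later call with the same long replaces the earlier string. */
  void storeMapping(long longID, String stringID) {
    longToString.put(longID, stringID);
  }

  /** Returns the most recently stored string for this long, or null if none was stored. */
  String toStringID(long longID) {
    return longToString.get(longID);
  }

  public static void main(String[] args) {
    InMemoryReverseMappingSketch migrator = new InMemoryReverseMappingSketch();
    // Simulate a collision: two different strings arriving with the same long ID.
    migrator.storeMapping(42L, "alice");
    migrator.storeMapping(42L, "bob");
    System.out.println(migrator.toStringID(42L)); // prints "bob"
  }
}
```

This sketch is not thread-safe; a shared instance would need a concurrent map or external synchronization.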
@since 0.2