net.sf.okapi.common.resource
Class RawDocument

java.lang.Object
  extended by net.sf.okapi.common.resource.RawDocument
All Implemented Interfaces:
IResource

public class RawDocument
extends java.lang.Object
implements IResource

Resource that carries all the information needed for a filter to open a given document, and also the resource associated with the event RAW_DOCUMENT. Documents are passed through the pipeline either as RawDocument objects or as filter events. Specialized steps convert one form to the other. The RawDocument object has one (and only one) of three input objects: a CharSequence, a URI, or an InputStream.


Field Summary
static java.lang.String UNKOWN_ENCODING
           
 
Fields inherited from interface net.sf.okapi.common.IResource
COPY_ALL, COPY_CONTENT, COPY_PROPERTIES, COPY_SEGMENTATION, COPY_SEGMENTED_CONTENT, CREATE_EMPTY
 
Constructor Summary
RawDocument(java.lang.CharSequence inputCharSequence, LocaleId sourceLocale)
          Creates a new RawDocument object with a given CharSequence and a source locale.
RawDocument(java.lang.CharSequence inputCharSequence, LocaleId sourceLocale, LocaleId targetLocale)
          Creates a new RawDocument object with a given CharSequence, a source locale and a target locale.
RawDocument(java.io.InputStream inputStream, java.lang.String defaultEncoding, LocaleId sourceLocale)
          Creates a new RawDocument object with a given InputStream, a default encoding and a source locale.
RawDocument(java.io.InputStream inputStream, java.lang.String defaultEncoding, LocaleId sourceLocale, LocaleId targetLocale)
          Creates a new RawDocument object with a given InputStream, a default encoding, a source locale and a target locale.
RawDocument(java.net.URI inputURI, java.lang.String defaultEncoding, LocaleId sourceLocale)
          Creates a new RawDocument object with a given URI, a default encoding and a source locale.
RawDocument(java.net.URI inputURI, java.lang.String defaultEncoding, LocaleId sourceLocale, LocaleId targetLocale)
          Creates a new RawDocument object with a given URI, a default encoding, a source locale and a target locale.
RawDocument(java.net.URI inputURI, java.lang.String defaultEncoding, LocaleId sourceLocale, LocaleId targetLocale, java.lang.String filterConfigId)
          Creates a new RawDocument object with a given URI, a default encoding, a source locale and a target locale, and the filter configuration id.
 
Method Summary
 void close()
          Closes the underlying stream of this RawDocument.
 java.io.File createOutputFile(java.net.URI outputURI)
          Creates a new output file object based on a given output URI and the URI of the raw document.
 void finalizeOutput()
          Finalizes the name for this output file.
<A extends IAnnotation>
A
getAnnotation(java.lang.Class<A> annotationType)
          Gets the annotation object for a given class for this resource.
 Annotations getAnnotations()
          Gets the iterable list of the annotations for this resource.
 java.lang.String getEncoding()
          Gets the default encoding associated to this resource.
 java.lang.String getFilterConfigId()
          Gets the identifier of the filter configuration to use with this document.
 java.lang.String getId()
          Gets the identifier of the resource.
 java.lang.CharSequence getInputCharSequence()
          Gets the CharSequence associated with this resource.
 java.net.URI getInputURI()
          Gets the URI object associated with this resource.
 java.io.Reader getReader()
          Returns a Reader based on the current Stream returned from getStream().
 ISkeleton getSkeleton()
          Always throws an exception as there is never a skeleton associated with a RawDocument.
 LocaleId getSourceLocale()
          Gets the source locale associated to this resource.
 java.io.InputStream getStream()
          Returns an InputStream based on the current input.
 LocaleId getTargetLocale()
          Gets the target locale associated to this resource.
 void setAnnotation(IAnnotation annotation)
          Sets an annotation object for this resource.
 void setEncoding(java.lang.String encoding)
          Sets the input encoding.
 void setFilterConfigId(java.lang.String filterConfigId)
          Sets the identifier of the filter configuration to use with this document.
 void setId(java.lang.String id)
          Sets the identifier of this resource.
 void setSkeleton(ISkeleton skeleton)
          This method has no effect as there is never a skeleton for a RawDocument.
 void setSourceLocale(LocaleId locId)
          Sets the source locale associated to this document.
 void setTargetLocale(LocaleId locId)
          Sets the target locale associated to this document.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

UNKOWN_ENCODING

public static final java.lang.String UNKOWN_ENCODING
See Also:
Constant Field Values
Constructor Detail

RawDocument

public RawDocument(java.lang.CharSequence inputCharSequence,
                   LocaleId sourceLocale)
Creates a new RawDocument object with a given CharSequence and a source locale.

Parameters:
inputCharSequence - the CharSequence for this RawDocument.
sourceLocale - the source locale for this RawDocument.

RawDocument

public RawDocument(java.lang.CharSequence inputCharSequence,
                   LocaleId sourceLocale,
                   LocaleId targetLocale)
Creates a new RawDocument object with a given CharSequence, a source locale and a target locale.

Parameters:
inputCharSequence - the CharSequence for this RawDocument.
sourceLocale - the source locale for this RawDocument.
targetLocale - the target locale for this RawDocument.

RawDocument

public RawDocument(java.net.URI inputURI,
                   java.lang.String defaultEncoding,
                   LocaleId sourceLocale)
Creates a new RawDocument object with a given URI, a default encoding and a source locale.

Parameters:
inputURI - the URI for this RawDocument.
defaultEncoding - the default encoding for this RawDocument.
sourceLocale - the source locale for this RawDocument.

RawDocument

public RawDocument(java.net.URI inputURI,
                   java.lang.String defaultEncoding,
                   LocaleId sourceLocale,
                   LocaleId targetLocale)
Creates a new RawDocument object with a given URI, a default encoding, a source locale and a target locale.

Parameters:
inputURI - the URI for this RawDocument.
defaultEncoding - the default encoding for this RawDocument.
sourceLocale - the source locale for this RawDocument.
targetLocale - the target locale for this RawDocument.

RawDocument

public RawDocument(java.io.InputStream inputStream,
                   java.lang.String defaultEncoding,
                   LocaleId sourceLocale)
Creates a new RawDocument object with a given InputStream, a default encoding and a source locale.

Parameters:
inputStream - the InputStream for this RawDocument.
defaultEncoding - the default encoding for this RawDocument.
sourceLocale - the source locale for this RawDocument.

RawDocument

public RawDocument(java.net.URI inputURI,
                   java.lang.String defaultEncoding,
                   LocaleId sourceLocale,
                   LocaleId targetLocale,
                   java.lang.String filterConfigId)
Creates a new RawDocument object with a given URI, a default encoding, a source locale and a target locale, and the filter configuration id.

Parameters:
inputURI - the URI for this RawDocument.
defaultEncoding - the default encoding for this RawDocument.
sourceLocale - the source locale for this RawDocument.
targetLocale - the target locale for this RawDocument.
filterConfigId - the filter configuration id.

RawDocument

public RawDocument(java.io.InputStream inputStream,
                   java.lang.String defaultEncoding,
                   LocaleId sourceLocale,
                   LocaleId targetLocale)
Creates a new RawDocument object with a given InputStream, a default encoding, a source locale and a target locale.

Parameters:
inputStream - the InputStream for this RawDocument.
defaultEncoding - the default encoding for this RawDocument.
sourceLocale - the source locale for this RawDocument.
targetLocale - the target locale for this RawDocument.
Method Detail

getReader

public java.io.Reader getReader()
Returns a Reader based on the current Stream returned from getStream().

WARNING:

For CharSequence and URI inputs the Reader returned will be recreated (and more importantly reset) for each call. For InputStream input the same Reader is returned for each call and it is the responsibility of the caller to reset it if needed.

Returns:
a Reader

getStream

public java.io.InputStream getStream()
Returns an InputStream based on the current input.

WARNING:

For CharSequence and URI inputs the stream returned will be recreated (and more importantly reset) for each call. For InputStream input the same stream is returned for each call and it is the responsibility of the caller to reset it if needed.

Returns:
the InputStream
Throws:
OkapiIOException
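
The difference between the two cases in the warning above can be illustrated with a minimal sketch that uses only JDK classes. The class InputSourceSketch and its methods are hypothetical stand-ins written for this illustration; they are not part of Okapi and do not reproduce how RawDocument is implemented. The sketch assumes only the behavior stated in the warning: text-based input yields a fresh stream on every call, while InputStream input yields the same object each time.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, NOT the Okapi RawDocument class. It only mimics the
// documented recreate-versus-reuse behavior with JDK types.
final class InputSourceSketch {
    private final CharSequence text;   // non-null for CharSequence-style input
    private final InputStream stream;  // non-null for InputStream-style input

    private InputSourceSketch(CharSequence text, InputStream stream) {
        this.text = text;
        this.stream = stream;
    }

    static InputSourceSketch fromText(CharSequence text) {
        return new InputSourceSketch(text, null);
    }

    static InputSourceSketch fromStream(InputStream stream) {
        return new InputSourceSketch(null, stream);
    }

    // Text input: a new stream positioned at the start on every call.
    // Stream input: the very same object, wherever the last reader left it.
    InputStream getStream() {
        if (text != null) {
            byte[] bytes = text.toString().getBytes(StandardCharsets.UTF_8);
            return new ByteArrayInputStream(bytes);
        }
        return stream;
    }

    // Reads one character from each of two consecutive getStream() calls.
    static String firstCharOfTwoCalls(InputSourceSketch source) {
        try {
            char first = (char) source.getStream().read();
            char second = (char) source.getStream().read();
            return "" + first + second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream("ab".getBytes(StandardCharsets.UTF_8));
        System.out.println(firstCharOfTwoCalls(fromText("ab")));   // prints "aa"
        System.out.println(firstCharOfTwoCalls(fromStream(raw)));  // prints "ab"
    }
}
```

In the text-based case the second call starts again at the first character. In the stream-based case the second read continues where the first one stopped, which is why the caller is responsible for resetting the stream (for example with mark and reset, where the stream supports them).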

getAnnotation

public <A extends IAnnotation> A getAnnotation(java.lang.Class<A> annotationType)
Description copied from interface: IResource
Gets the annotation object for a given class for this resource.

Specified by:
getAnnotation in interface IResource
Parameters:
annotationType - the class of the annotation object to retrieve.
Returns:
the annotation for the given class for this resource.

getId

public java.lang.String getId()
Description copied from interface: IResource
Gets the identifier of the resource. This identifier is unique per extracted document and by type of resource. This value is filter-specific. It may be different from one extraction of the same document to the next. It can be a sequential number or not, incremental or not, and it may not be a number at all. It has no correspondence in the source document ("IDs" coming from the source document are "names" and not available for all resources).

Specified by:
getId in interface IResource
Returns:
the identifier of this resource.

getSkeleton

public ISkeleton getSkeleton()
Always throws an exception as there is never a skeleton associated with a RawDocument.

Specified by:
getSkeleton in interface IResource
Returns:
never returns.
Throws:
OkapiNotImplementedException

setAnnotation

public void setAnnotation(IAnnotation annotation)
Description copied from interface: IResource
Sets an annotation object for this resource.

Specified by:
setAnnotation in interface IResource
Parameters:
annotation - the annotation object to set.

setId

public void setId(java.lang.String id)
Description copied from interface: IResource
Sets the identifier of this resource.

Specified by:
setId in interface IResource
Parameters:
id - the new identifier value.
See Also:
IResource.getId()

setSkeleton

public void setSkeleton(ISkeleton skeleton)
This method has no effect as there is never a skeleton for a RawDocument.

Specified by:
setSkeleton in interface IResource
Parameters:
skeleton - the skeleton object to set.
Throws:
OkapiNotImplementedException

getInputURI

public java.net.URI getInputURI()
Gets the URI object associated with this resource. It may be null if either the CharSequence or the InputStream input is not null.

Returns:
the URI object for this resource (may be null).

getInputCharSequence

public java.lang.CharSequence getInputCharSequence()
Gets the CharSequence associated with this resource. It may be null if either the URI or the InputStream input is not null.

Returns:
the CharSequence

getEncoding

public java.lang.String getEncoding()
Gets the default encoding associated to this resource.

Returns:
The default encoding associated to this resource.

getSourceLocale

public LocaleId getSourceLocale()
Gets the source locale associated to this resource.

Returns:
the source locale associated to this resource.

setSourceLocale

public void setSourceLocale(LocaleId locId)
Sets the source locale associated to this document.

Parameters:
locId - the locale to set.

getTargetLocale

public LocaleId getTargetLocale()
Gets the target locale associated to this resource.

Returns:
the target locale associated to this resource.

setTargetLocale

public void setTargetLocale(LocaleId locId)
Sets the target locale associated to this document.

Parameters:
locId - the locale to set.

setEncoding

public void setEncoding(java.lang.String encoding)
Sets the input encoding.

WARNING:

Any Reader previously obtained via getReader() becomes invalid. Call getReader() again after calling setEncoding. In some cases it may not be possible to create a new Reader, so it is best to set the encoding before any call to getReader.

Parameters:
encoding - the input encoding to set.
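
The reason an existing Reader cannot simply pick up a new encoding can be shown with plain JDK classes: a Reader is bound to the charset it was created with. The sketch below is an illustration only. The class EncodingSketch and its decode helper are hypothetical and are not part of Okapi; the example assumes nothing about how RawDocument creates its Reader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Okapi: decodes the same bytes with
// Readers created for different charsets.
final class EncodingSketch {

    // Creates a new Reader for the given charset and reads everything from it.
    static String decode(byte[] bytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // A Reader created for UTF-8 yields one character.
        System.out.println(decode(bytes, StandardCharsets.UTF_8).length());      // prints 1

        // A Reader created for ISO-8859-1 yields two characters from the same bytes.
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).length()); // prints 2
    }
}
```

Because the charset is fixed when the Reader is constructed, changing the encoding afterwards only takes effect for a Reader created later, which matches the advice to set the encoding before the first call to getReader.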

setFilterConfigId

public void setFilterConfigId(java.lang.String filterConfigId)
Sets the identifier of the filter configuration to use with this document.

Parameters:
filterConfigId - the filter configuration identifier to set.

getFilterConfigId

public java.lang.String getFilterConfigId()
Gets the identifier of the filter configuration to use with this document.

Returns:
the filter configuration identifier for this document, or null if none is set.

close

public void close()
Closes the underlying stream of this RawDocument. Calling getStream or getReader after calling close may still produce a valid stream, as long as the RawDocument is not based on a raw InputStream.


getAnnotations

public Annotations getAnnotations()
Description copied from interface: IResource
Gets the iterable list of the annotations for this resource.

Specified by:
getAnnotations in interface IResource
Returns:
the iterable list of the annotations for this resource.

createOutputFile

public java.io.File createOutputFile(java.net.URI outputURI)
Creates a new output file object based on a given output URI and the URI of the raw document.

If the path of the raw document is the same as the path of the output, a temporary file is created; otherwise the output URI is used directly. You must call finalizeOutput() when all writing is done and both the input file and output file are closed, to make sure the proper output file name is used.

If one or more directories of the output path do not exist, they are created automatically.

If the input of the raw document is a CharSequence or a Stream, the method assumes it can use the path of the output URI directly.

Parameters:
outputURI - the URI of the output file.
Throws:
OkapiIOException - if an error occurs when creating the work file or its directory.
See Also:
finalizeOutput()

finalizeOutput

public void finalizeOutput()
Finalizes the name for this output file. If a temporary file was used, this call deletes the existing file and then renames the temporary file to the name of the existing file. This method must always be called after both input and output files are closed.

Throws:
OkapiIOException - if the original input file cannot be deleted or if the work file cannot be renamed.
See Also:
createOutputFile(URI)
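
The work-file pattern described for createOutputFile and finalizeOutput can be sketched with java.nio.file alone. The class OutputFileSketch below is a hypothetical stand-in written for illustration: it takes Path arguments rather than URIs, its temporary file naming is an assumption, and it is not the Okapi implementation. It only follows the documented steps: create missing directories, use a temporary file when input and output paths are the same, then delete the original and rename the temporary file on finalization.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in, NOT the Okapi RawDocument class.
final class OutputFileSketch {
    private final Path input;  // null when the input is not a file
    private Path output;       // final destination
    private Path workFile;     // non-null only while a temporary file is in use

    OutputFileSketch(Path input) {
        this.input = input;
    }

    // Returns the path to write to: a temporary file if output equals input.
    Path createOutputFile(Path requested) {
        try {
            output = requested.toAbsolutePath().normalize();
            Path parent = output.getParent();
            if (parent != null) {
                Files.createDirectories(parent); // create missing directories
            }
            if (input != null && input.toAbsolutePath().normalize().equals(output)) {
                // Assumed naming; the temporary file sits next to the output.
                workFile = Files.createTempFile(parent, "work", ".tmp");
                return workFile;
            }
            workFile = null;
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Call only after both the input and the output have been closed.
    void finalizeOutput() {
        if (workFile == null) {
            return; // output was written directly, nothing to rename
        }
        try {
            Files.deleteIfExists(output);  // delete the existing file
            Files.move(workFile, output);  // rename the temporary file
            workFile = null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Overwrites a file "in place" and returns its final content.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("sketch");
            Path file = dir.resolve("doc.txt");
            Files.write(file, "old".getBytes(StandardCharsets.UTF_8));

            OutputFileSketch doc = new OutputFileSketch(file);
            Path target = doc.createOutputFile(file); // same path, so a work file
            Files.write(target, "new".getBytes(StandardCharsets.UTF_8));
            doc.finalizeOutput();

            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "new"
    }
}
```

Until finalizeOutput is called in this sketch, the original file keeps its old content, which is why the documentation requires the call once writing is complete and both files are closed.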