[ocrfeeder/user_documentation] Added the user documentation

From: Joaquim Manuel Pereira Rocha <jrocha src gnome org>
To: commits-list gnome org
Cc:
Subject: [ocrfeeder/user_documentation] Added the user documentation
Date: Wed, 15 Dec 2010 11:51:40 +0000 (UTC)
commit 377173487825897e00b2d7f22fedf2a63992fc7a
Author: Joaquim Rocha <jrocha igalia com>
Date:   Sat Dec 11 02:12:09 2010 +0100

    Added the user documentation

 help/C/addingfolder.page               |   17 ++++++
 help/C/addingimage.page                |   41 +++++++++++++++
 help/C/automaticrecognition.page       |   34 ++++++++++++
 help/C/deskewing.page                  |   29 +++++++++++
 help/C/documentgeneration.page         |   27 ++++++++++
 help/C/figures/areas-edition.png       |  Bin 0 -> 64634 bytes
 help/C/figures/content-areas.png       |  Bin 0 -> 75291 bytes
 help/C/finetuning.page                 |   54 ++++++++++++++++++++
 help/C/importingfromscanner.page       |   26 ++++++++++
 help/C/importingpdf.page               |   27 ++++++++++
 help/C/index.page                      |   46 +++++++++++++++++
 help/C/legal.xml                       |    9 +++
 help/C/manualeditionandcorrection.page |   81 +++++++++++++++++++++++++++++
 help/C/ocrconfiguration.page           |   87 ++++++++++++++++++++++++++++++++
 help/C/projects.page                   |   65 ++++++++++++++++++++++++
 help/C/unpaper.page                    |   45 ++++++++++++++++
 16 files changed, 588 insertions(+), 0 deletions(-)
---
diff --git a/help/C/addingfolder.page b/help/C/addingfolder.page
new file mode 100644
index 0000000..093ba34
--- /dev/null
+++ b/help/C/addingfolder.page
@@ -0,0 +1,17 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="addingfolder">
+
+<info>
+    <link type="guide" xref="index#images"/>
+    <link type="seealso" xref="addingimage"/>
+    <desc>Adding all the images from a folder</desc>
+</info>
+
+<title>Adding Folder</title>
+
+<p>Sometimes it is useful to add all the images from a given
+folder. <app>OCRFeeder</app> provides this functionality
+by choosing <guiseq><gui>File</gui><gui>Add Folder</gui></guiseq>.</p>
+
+</page>
diff --git a/help/C/addingimage.page b/help/C/addingimage.page
new file mode 100644
index 0000000..3ec5236
--- /dev/null
+++ b/help/C/addingimage.page
@@ -0,0 +1,41 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="addingimage">
+
+<info>
+    <link type="guide" xref="index#images"/>
+    <desc>Adding an image to be recognized</desc>
+</info>
+
+<title>Adding An Image</title>
+
+<p>Adding an image to <app>OCRFeeder</app> is usually the first step when
+converting a document.</p>
+
+<p>Each added image represents a page in the final document.
+A thumbnail of the image will be shown in the pages' area (the left
+pane of <app>OCRFeeder</app>).</p>
+
+<p>The pages in the final document follow the same order as the
+thumbnails in the pages' area, so pages can be reordered simply
+by dragging their thumbnails to a new position within the
+pages' area.</p>
+
+<p>You can add an image by clicking
+<guiseq><gui>File</gui><gui>Add Image</gui></guiseq>.</p>
+
+<p>To delete a page, click
+<guiseq><gui>Edit</gui><gui>Delete Page</gui></guiseq> or
+right-click the page's thumbnail and choose <gui>Delete</gui>.</p>
+
+<section>
+<title>Page Configuration</title>
+
+<p>To configure the page size, click
+<guiseq><gui>Edit</gui><gui>Edit Page</gui></guiseq>
+and choose either a custom size, entering the respective
+values, or a standard paper size from a list.</p>
+
+</section>
+
+</page>
diff --git a/help/C/automaticrecognition.page b/help/C/automaticrecognition.page
new file mode 100644
index 0000000..58f4435
--- /dev/null
+++ b/help/C/automaticrecognition.page
@@ -0,0 +1,34 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="automaticrecognition">
+
+<info>
+    <link type="guide" xref="index#recognition"/>
+    <link type="seealso" xref="addingimage"/>
+    <desc>Automatically recognizing an image</desc>
+</info>
+
+<title>Automatic Recognition</title>
+
+<p><app>OCRFeeder</app> tries to detect the contents of a
+document image and perform OCR on them, also distinguishing
+between graphics and text. For simplicity, this whole
+process is called recognition.</p>
+
+<p>After an image is added, it can be automatically recognized
+by clicking
+<guiseq><gui>Document</gui><gui>Recognize Document</gui></guiseq>.</p>
+
+<note style="important"><p>Since documents come in many different
+layouts, the automatic recognition, especially the page
+segmentation, may not be accurate for your document. In this
+case, some manual editing of the recognition results might be needed.
+</p></note>
+
+<note style="warning"><p>The automatic recognition performs some complex
+operations and may take some time depending on the size of the image
+and the complexity of the layout.</p>
+<p>The automatic recognition will replace all the content areas
+in the currently selected page.</p></note>
+
+</page>
diff --git a/help/C/deskewing.page b/help/C/deskewing.page
new file mode 100644
index 0000000..00fbe39
--- /dev/null
+++ b/help/C/deskewing.page
@@ -0,0 +1,29 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="deskewing">
+
+<info>
+    <link type="guide" xref="index#configuration"/>
+    <link type="seealso" xref="manualeditionandcorrection"/>
+    <desc>Correcting the skew in the images</desc>
+</info>
+
+<title>Deskewing</title>
+
+<p>Some images, especially those obtained from a scanner device,
+may be skewed, which makes them harder to recognize.</p>
+
+<p><app>OCRFeeder</app> offers a way to automatically deskew an
+image. To deskew a loaded image, click
+<guiseq><gui>Tools</gui><gui>Image Deskewer</gui></guiseq>.</p>
+
+<p>This operation can also be performed automatically every
+time an image is added. To enable it, open the
+<gui>Preferences</gui> dialog from
+<guiseq><gui>Edit</gui><gui>Preferences</gui></guiseq> and check
+<gui>Deskew images</gui> under the <gui>Tools</gui> tab.</p>
+
+<note style="warning"><p>Depending on the size and characteristics
+of the image, deskewing an image may take some time.</p></note>
+
+</page>
diff --git a/help/C/documentgeneration.page b/help/C/documentgeneration.page
new file mode 100644
index 0000000..bbd3abe
--- /dev/null
+++ b/help/C/documentgeneration.page
@@ -0,0 +1,27 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="documentgeneration">
+
+<info>
+    <link type="guide" xref="index#recognition"/>
+    <link type="seealso" xref="automaticrecognition"/>
+    <link type="seealso" xref="manualeditionandcorrection"/>
+    <desc>Creating an editable document</desc>
+</info>
+
+<title>Document Generation</title>
+
+<p><app>OCRFeeder</app> currently generates two document formats:
+<em>ODT</em> and <em>HTML</em>.</p>
+
+<p>After the recognition and any manual editing have been
+performed, a document can be generated by clicking
+<guiseq><gui>File</gui><gui>Export…</gui></guiseq> and choosing
+the desired document format.</p>
+
+<note style="tip"><p>The HTML export generates a folder in which
+each document page is represented by one HTML file. Each page
+contains links to the previous and next pages. Image content
+areas are stored in a subfolder called <em>images</em>.</p></note>
+
+</page>
diff --git a/help/C/figures/areas-edition.png b/help/C/figures/areas-edition.png
new file mode 100644
index 0000000..0ac82ab
Binary files /dev/null and b/help/C/figures/areas-edition.png differ
diff --git a/help/C/figures/content-areas.png b/help/C/figures/content-areas.png
new file mode 100644
index 0000000..cb3f471
Binary files /dev/null and b/help/C/figures/content-areas.png differ
diff --git a/help/C/finetuning.page b/help/C/finetuning.page
new file mode 100644
index 0000000..1a3d143
--- /dev/null
+++ b/help/C/finetuning.page
@@ -0,0 +1,54 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="finetuning">
+
+<info>
+    <link type="guide" xref="index#configuration"/>
+    <link type="seealso" xref="manualeditionandcorrection"/>
+    <desc>Advanced options for a better recognition</desc>
+</info>
+
+<title>Fine-tuning</title>
+
+<p><app>OCRFeeder</app> has some advanced options that can be
+used to improve the recognition. These options are available in
+the <guiseq><gui>Edit</gui><gui>Preferences</gui></guiseq>
+dialog, under its <gui>Recognition</gui> tab.</p>
+
+<p>The following list describes the mentioned options:</p>
+<list>
+    <item><p><gui>Fix line breaks and hyphenization</gui>: OCR engines
+    usually read the text line by line and separate each line with a
+    line break. Sometimes this is not what the user wants, because the
+    text might be broken in the middle of a sentence.</p>
+    <p>Checking this option will make <app>OCRFeeder</app> remove single
+    newline characters after the text is recognized by the engines.</p>
+    <p>Since just removing newlines in a hyphenated text would result
+    in wrongly separated words, hyphenation is also detected and removed
+    in this process.</p></item>
+    <item><p><gui>Window Size</gui>: <app>OCRFeeder</app>'s algorithm to
+    detect the contents in an image uses the concept of <em>window size</em>,
+    which is the division of the image into small windows. A smaller window
+    size is likely to detect more content areas, but a size that is too
+    small may split contents that should be part of a bigger area.
+    On the other hand, a bigger window size means fewer divisions
+    of the contents but may produce areas that should be subdivided.</p>
+    <p>A good window size should be slightly bigger than the text line spacing
+    in an image.</p><p>Users may want to set this value manually if the automatic
+    one doesn't produce any valid content areas, but normally it is easier to
+    use the automatic one and perform any needed corrections directly in
+    the content areas.</p></item>
+    <item><p><gui>Improve columns detection</gui>: Check this option if
+    <app>OCRFeeder</app> should try to divide the detected content areas
+    horizontally (creating more columns). The value used to check for
+    blank space within the contents can be set automatically, or manually
+    when the columns aren't detected correctly.</p></item>
+    <item><p><gui>Adjust content areas' bounds</gui>: The detected content
+    areas sometimes have a considerable margin between their contents and
+    the areas' edges. By checking this option, <app>OCRFeeder</app> will
+    minimize those margins, fitting the areas more closely to their contents.
+    Optionally, a manual value can be set to indicate the minimum size
+    of the adjusted margins.</p></item>
+</list>
+
+</page>
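The "Fix line breaks and hyphenization" option described in finetuning.page can be illustrated with a minimal Python sketch. It is not OCRFeeder's implementation: the function name `fix_line_breaks` and the exact rules (a single newline becomes a space, a hyphen at the end of a line joins the two word halves, blank lines are kept as paragraph breaks) are assumptions made for illustration.

```python
import re


def fix_line_breaks(text: str) -> str:
    """Hypothetical sketch: undo OCR line wrapping while keeping paragraphs.

    - A word split as "recog-\\nnition" is rejoined as "recognition".
    - A single newline inside a paragraph becomes a space.
    - Blank lines (two or more newlines in a row) are kept as paragraph breaks.
    """
    # Rejoin words hyphenated across a line break: word char, "-", newline, word char.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Replace a newline that is neither preceded nor followed by another newline.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text


if __name__ == "__main__":
    raw = "OCR engines read the text line by\nline and hyphen-\nate long words.\n\nNew paragraph."
    print(fix_line_breaks(raw))
```

This naive rule also merges genuinely hyphenated compounds that happen to be split at a line end (for example, "well-" followed by "known" on the next line becomes "wellknown"); a real implementation would need an extra heuristic, such as a dictionary lookup, to tell the two cases apart.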
diff --git a/help/C/importingfromscanner.page b/help/C/importingfromscanner.page
new file mode 100644
index 0000000..6acf508
--- /dev/null
+++ b/help/C/importingfromscanner.page
@@ -0,0 +1,26 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="importingfromscanner">
+
+<info>
+    <link type="guide" xref="index#images"/>
+    <link type="seealso" xref="addingimage"/>
+    <desc>Importing from a scanner device</desc>
+</info>
+
+<title>Importing From Scanner</title>
+
+<p>In order to help convert a printed document into
+an editable document, <app>OCRFeeder</app> offers a
+way to import images directly from a scanner device.</p>
+
+<p>To import an image from a scanner device, use the menu
+<guiseq><gui>File</gui><gui>Import Page From Scanner</gui></guiseq>
+or the keyboard shortcut
+<keyseq><key>Ctrl</key><key>Shift</key><key>I</key></keyseq>.</p>
+
+<p>The currently detected scanner device will be used to
+scan the page. If more than one scanner is found, a dialog
+will be shown so that the user can choose one of them.</p>
+
+</page>
diff --git a/help/C/importingpdf.page b/help/C/importingpdf.page
new file mode 100644
index 0000000..3067340
--- /dev/null
+++ b/help/C/importingpdf.page
@@ -0,0 +1,27 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="importingpdf">
+
+<info>
+    <link type="guide" xref="index#images"/>
+    <link type="seealso" xref="addingimage"/>
+    <desc>Importing PDF documents</desc>
+</info>
+
+<title>Importing PDF</title>
+
+<p>Some documents are nothing more than images placed in a
+PDF document. For cases like this, <app>OCRFeeder</app> can
+still import a PDF document so it can then be converted into
+an editable document.</p>
+
+<p>To import a PDF document, click
+<guiseq><gui>File</gui><gui>Import PDF</gui></guiseq>.</p>
+
+<p>Each PDF page will be converted to an image and placed
+in the pages' area.</p>
+
+<note style="warning"><p>The PDF conversion can be a demanding
+process and take some time for large PDF files.</p></note>
+
+</page>
diff --git a/help/C/index.page b/help/C/index.page
new file mode 100644
index 0000000..8150784
--- /dev/null
+++ b/help/C/index.page
@@ -0,0 +1,46 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="guide"
+      id="index">
+
+<info>
+    <desc>Help for the <app>OCRFeeder Document Conversion System</app>.</desc>
+    <title type='link'>OCRFeeder Document Conversion System</title>
+    <title type='text'>OCRFeeder Document Conversion System</title>
+    <credit type="author">
+      <name>Joaquim Rocha</name>
+      <email>jrocha@igalia.com</email>
+    </credit>
+
+    <include href="legal.xml" xmlns="http://www.w3.org/2001/XInclude"/>
+</info>
+
+<title>OCRFeeder Document Conversion System</title>
+<p>OCRFeeder is a document layout analysis and optical character recognition system.</p>
+
+<p>OCRFeeder was created to allow users to easily convert document images
+(for example, a PNG image with text) into editable documents (for example,
+an ODT version with that text).</p>
+
+<p>Given the images, it automatically outlines their contents, performs OCR and
+distinguishes between graphics and text. It can generate multiple formats, its
+main one being ODT.</p>
+
+<p>This guide explains how to configure and use <app>OCRFeeder</app>.</p>
+
+<section id="images" style="2column">
+    <title>Adding Images</title>
+</section>
+
+<section id="recognition" style="2column">
+    <title>Recognition</title>
+</section>
+
+<section id="configuration" style="2column">
+    <title>Configuration</title>
+</section>
+
+<section id="projects" style="2column">
+    <title>Projects</title>
+</section>
+
+</page>
diff --git a/help/C/legal.xml b/help/C/legal.xml
new file mode 100644
index 0000000..0e59883
--- /dev/null
+++ b/help/C/legal.xml
@@ -0,0 +1,9 @@
+<license xmlns="http://projectmallard.org/1.0/"
+ href="http://creativecommons.org/licenses/by-sa/3.0/us/">
+<p>This work is licensed under a
+<link href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative Commons
+Attribution-Share Alike 3.0 United States License</link>.</p>
+<p>As a special exception, the copyright holders give you permission to copy,
+modify, and distribute the example code contained in this document under the
+terms of your choosing, without restriction.</p>
+</license>
diff --git a/help/C/manualeditionandcorrection.page b/help/C/manualeditionandcorrection.page
new file mode 100644
index 0000000..3e2e5d3
--- /dev/null
+++ b/help/C/manualeditionandcorrection.page
@@ -0,0 +1,81 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="manualeditionandcorrection">
+
+<info>
+    <link type="guide" xref="index#recognition"/>
+    <link type="seealso" xref="addingimage"/>
+    <link type="seealso" xref="automaticrecognition"/>
+    <desc>Manual edition and correction of results</desc>
+</info>
+
+<title>Manual Edition</title>
+
+<p>One may want to manually select just a portion of an image to
+be recognized, or to correct the results of the automatic recognition.
+<app>OCRFeeder</app> lets its users easily edit every aspect of
+a document's contents by hand.</p>
+
+<section>
+
+<title>Content Areas</title>
+
+<p>A document's contents are represented by areas, as
+shown in the following image:
+<media type="image" mime="image/png" src="figures/content-areas.png">
+A picture of two content areas with one of them selected.
+</media>
+</p>
+
+<p>The attributes of a selected area are shown and can be changed in
+the right part of the main window, as shown in the following image:
+<media type="image" mime="image/png" src="figures/areas-edition.png" width="100px">A
+picture showing the areas' edition UI</media>
+</p>
+
+<p>The following list describes the content areas' attributes:</p>
+<list>
+    <item><p><em>Type</em>: Sets the area's type to either image or text.
+             The image type will clip the area from the original page and
+             place it in the generated document. The text type will use the
+             text assigned to the area and represent it as text in the generated
+             document. (In generated ODT documents, areas of the text type are
+             represented as text boxes.)</p></item>
+    <item><p><em>Clip</em>: Shows the part of the original image covered by the area. This makes
+             it easier for users to check exactly what's within the area.</p></item>
+    <item><p><em>Bounds</em>: Shows the point (X and Y) in the original image where the
+             top left corner of the area is placed, as well as the area's width
+             and height.</p></item>
+    <item><p><em>OCR Engine</em>: Lets the user choose an OCR engine and recognize the
+             area's text with it (by pressing the <gui>OCR</gui> button).</p>
+             <note style="warning"><p>Using the OCR engine to recognize the text
+             will directly assign that text to the area and replace the one
+             assigned before.</p></note></item>
+    <item><p><em>Text Area</em>: Represents the text assigned to that area and lets the
+             user edit it. This field is disabled when the area is of the image
+             type.</p></item>
+    <item><p><em>Style Tab</em>: Lets the user choose the font type and size, as well as
+             the text alignment, line and letter spacing.</p></item>
+</list>
+
+<p>The content areas can be selected by clicking on them or by using the menus
+<guiseq><gui>Document</gui><gui>Select Previous Area</gui></guiseq> and
+<guiseq><gui>Document</gui><gui>Select Next Area</gui></guiseq>. There are
+also keyboard shortcuts for these actions:
+<keyseq><key>Ctrl</key><key>Shift</key><key>P</key></keyseq> and
+<keyseq><key>Ctrl</key><key>Shift</key><key>N</key></keyseq>, respectively.</p>
+
+<p>Selecting all areas is also possible using
+<guiseq><gui>Document</gui><gui>Select All Areas</gui></guiseq> or
+<keyseq><key>Ctrl</key><key>Shift</key><key>A</key></keyseq>.</p>
+
+<p>When at least one content area is selected, it is possible to recognize
+their contents automatically or delete them. These actions can be accomplished
+by clicking <guiseq><gui>Document</gui><gui>Recognize Selected Areas</gui></guiseq>
+and <guiseq><gui>Document</gui><gui>Delete Selected Areas</gui></guiseq> (or
+<keyseq><key>Ctrl</key><key>Shift</key><key>Delete</key></keyseq>), respectively.
+</p>
+
+</section>
+
+</page>
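The Bounds and Clip attributes described in manualeditionandcorrection.page can be made concrete with a small Python sketch: an area is located by the X and Y of its top-left corner plus its width and height, and its clip is the matching rectangle of the original page. The `ContentArea` class, its `area_type` field and the representation of a page as a list of pixel rows are hypothetical, chosen only for illustration; OCRFeeder's internal data structures may differ.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical page representation: a list of rows, each a list of pixel values.
Page = List[List[int]]


@dataclass
class ContentArea:
    """Illustrative model of a content area's Type and Bounds attributes."""

    x: int  # X of the area's top-left corner in the original image
    y: int  # Y of the area's top-left corner in the original image
    width: int
    height: int
    area_type: str = "text"  # either "text" or "image"

    def clip(self, page: Page) -> Page:
        """Return the rectangle of the original page covered by this area."""
        rows = page[self.y:self.y + self.height]
        return [row[self.x:self.x + self.width] for row in rows]


if __name__ == "__main__":
    # A 3x4 page whose pixel values encode their own (row, column) position.
    page = [[10 * r + c for c in range(4)] for r in range(3)]
    area = ContentArea(x=1, y=1, width=2, height=2)
    print(area.clip(page))  # [[11, 12], [21, 22]]
```

Because Python slicing stops at the end of a list, an area that extends past the page border simply yields a smaller clip in this sketch instead of raising an error.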
diff --git a/help/C/ocrconfiguration.page b/help/C/ocrconfiguration.page
new file mode 100644
index 0000000..2cd059a
--- /dev/null
+++ b/help/C/ocrconfiguration.page
@@ -0,0 +1,87 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="ocrconfiguration">
+
+<info>
+    <link type="guide" xref="index#configuration"/>
+    <link type="seealso" xref="automaticrecognition"/>
+    <link type="seealso" xref="manualeditionandcorrection"/>
+    <desc>Configure the OCR engines to recognize the text</desc>
+</info>
+
+<title>OCR Engines Configuration</title>
+
+<p><app>OCRFeeder</app> uses system-wide OCR engines to extract
+the text from images. This means any OCR engine that can be
+used from the command line should also be usable in <app>OCRFeeder</app>.</p>
+
+<section>
+
+<title>Automatic Detection of OCR Engines</title>
+
+<p>The OCR engines <em>Tesseract</em>, <em>GOCR</em>, <em>Ocrad</em>
+and <em>Cuneiform</em> are automatically detected and configured
+on most systems the first time <app>OCRFeeder</app> is run.</p>
+
+<p>If an OCR engine is installed after <app>OCRFeeder</app> has already
+configured an engine, it will not be configured automatically. Depending on
+the engine, however, users can open the <gui>OCR Engines</gui> dialog, press
+<gui>Detect</gui> and choose it from the list of detected engines.</p>
+
+<note style="tip"><p>Already configured OCR engines might be detected again and it is
+up to the user to uncheck these engines if they shouldn't be added again.</p></note>
+
+</section>
+
+<section>
+
+<title>Manual Configuration</title>
+
+<p>The currently configured OCR engines are shown in the
+<gui>OCR Engines</gui> dialog which can be opened from
+<guiseq><gui>Tools</gui><gui>OCR Engines</gui></guiseq>.</p>
+
+<p>Besides listing the configured OCR engines, the <gui>OCR Engines</gui>
+dialog lets users add new engines, edit or delete the existing ones and
+detect engines installed on the system.</p>
+
+<p>When adding or editing an OCR engine (by pressing the <gui>Add</gui>
+or <gui>Edit</gui> buttons, respectively), a dialog is shown with the
+following fields:</p>
+
+<list>
+    <item><p><gui>Name</gui>: The engine's name. This name will be used
+    throughout the UI when referring to the engine.</p></item>
+    <item><p><gui>Image format</gui>: The image format that the engine
+    recognizes (for example, <em>TIF</em> in the case of
+    <em>Tesseract</em>).</p></item>
+    <item><p><gui>Failure string</gui>: The pre-defined character that some
+    engines use to replace unrecognized characters (for example,
+    <em>_</em> in the case of <em>GOCR</em>).</p></item>
+    <item><p><gui>Engine path</gui>: The path in the system to the
+    engine's executable (for example, <em>/usr/bin/tesseract</em>).</p></item>
+    <item><p><gui>Engine arguments</gui>: The arguments that feed an image
+    to the engine and make it output the recognized text to the standard
+    output. <app>OCRFeeder</app> runs the engine with these arguments
+    as if it were run from the command line and reads the recognized text
+    from the standard output. Some engines already behave like this, such as
+    <em>Ocrad</em> and <em>GOCR</em>, while others, like <em>Tesseract</em>,
+    write the text into a file.</p>
+    <p>Since the path of the image to be read is always needed, a special argument
+    <em>$IMAGE</em> is provided for this and will be replaced by the image path
+    when the engine is run. For the cases
+    where a file name is needed, like the one mentioned previously, a special
+    argument <em>$FILE</em> is provided and will be replaced by a temporary
+    file name.</p>
+    <p>So, in case of <em>Tesseract</em> (which writes the recognized text
+    into a file), the arguments would be <em>$IMAGE $FILE; cat $FILE.txt;
+    rm $FILE</em>.</p></item>
+
+</list>
+
+<note style="advanced"><p>The engines' configuration is stored in their own XML file
+in the user's home under <em>.ocrfeeder/engines/</em>.</p></note>
+
+</section>
+
+</page>
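The behavior of the Engine arguments field described in ocrconfiguration.page (run the engine as if from the command line, replace $IMAGE and $FILE, read the text from standard output) can be sketched in Python. This is an illustration under assumptions, not OCRFeeder's code: the function names `build_command` and `run_engine`, the use of `tempfile.mkstemp` for the $FILE name and the quoting of paths with `shlex.quote` are hypothetical choices.

```python
import os
import shlex
import subprocess
import tempfile


def build_command(engine_path: str, arguments: str, image_path: str, temp_file: str) -> str:
    """Expand the $IMAGE and $FILE placeholders into a complete shell command."""
    expanded = arguments.replace("$IMAGE", shlex.quote(image_path))
    expanded = expanded.replace("$FILE", shlex.quote(temp_file))
    return f"{shlex.quote(engine_path)} {expanded}"


def run_engine(engine_path: str, arguments: str, image_path: str) -> str:
    """Run the engine as if from the command line and return its standard output."""
    # Hypothetical choice: a fresh temporary file name stands in for $FILE.
    fd, temp_file = tempfile.mkstemp()
    os.close(fd)
    try:
        command = build_command(engine_path, arguments, image_path, temp_file)
        # shell=True is needed because the arguments may chain commands with ";".
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        return result.stdout
    finally:
        # Remove the temporary file in case the engine arguments did not.
        if os.path.exists(temp_file):
            os.remove(temp_file)


if __name__ == "__main__":
    tesseract_args = "$IMAGE $FILE; cat $FILE.txt; rm $FILE"
    print(build_command("/usr/bin/tesseract", tesseract_args, "/tmp/page.tif", "/tmp/ocr_out"))
```

With the Tesseract arguments from the text, the printed command is `/usr/bin/tesseract /tmp/page.tif /tmp/ocr_out; cat /tmp/ocr_out.txt; rm /tmp/ocr_out`, so the recognized text reaches standard output through `cat`. Quoting the substituted paths keeps file names containing spaces intact in this sketch; whether OCRFeeder itself quotes them is not stated in this guide.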
diff --git a/help/C/projects.page b/help/C/projects.page
new file mode 100644
index 0000000..7853f28
--- /dev/null
+++ b/help/C/projects.page
@@ -0,0 +1,65 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="projects">
+
+<info>
+    <link type="guide" xref="index#projects"/>
+    <link type="seealso" xref=""/>
+    <desc>Loading and saving projects</desc>
+</info>
+
+<title>Projects</title>
+
+<p>Sometimes a user may want to save the work done so far
+on an image and continue with it later. For this purpose,
+<app>OCRFeeder</app> lets users save and load
+projects.</p>
+
+<p>Projects are compressed files with the <em>ocrf</em> extension
+which hold information about pages (images) and content areas.</p>
+
+<section>
+<title>Saving A Project</title>
+
+<p>After some work has been done on an image, a project can be created
+by clicking <guiseq><gui>File</gui><gui>Save</gui></guiseq> or
+<guiseq><gui>File</gui><gui>Save As…</gui></guiseq>. Alternatively, the
+<keyseq><key>Ctrl</key><key>S</key></keyseq> or
+<keyseq><key>Ctrl</key><key>Shift</key><key>S</key></keyseq> keyboard
+shortcuts can be used. A file dialog will then be shown so that the
+project's name and location can be entered.</p>
+
+</section>
+
+<section>
+<title>Loading A Project</title>
+
+<p>An existing project can be loaded by clicking
+<guiseq><gui>File</gui><gui>Open</gui></guiseq> or pressing
+<keyseq><key>Ctrl</key><key>O</key></keyseq>.</p>
+
+</section>
+
+<section>
+<title>Appending A Project</title>
+
+<p>Sometimes it is useful to merge two or more projects in order to create
+a single document with the pages of several <app>OCRFeeder</app> projects.
+This can be accomplished by appending a project, which simply loads the pages
+from a chosen project into the current one. To do this, click
+<guiseq><gui>File</gui><gui>Append Project</gui></guiseq> and choose the
+desired project.</p>
+
+</section>
+
+<section>
+<title>Clearing A Project</title>
+
+<p>If all the information in a project should be deleted (for example,
+to start over again), choose
+<guiseq><gui>Edit</gui><gui>Clear Project</gui></guiseq>.</p>
+
+</section>
+
+
+</page>
diff --git a/help/C/unpaper.page b/help/C/unpaper.page
new file mode 100644
index 0000000..96189ce
--- /dev/null
+++ b/help/C/unpaper.page
@@ -0,0 +1,45 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="unpaper">
+
+<info>
+    <link type="guide" xref="index#configuration"/>
+    <link type="seealso" xref="manualeditionandcorrection"/>
+    <desc>Cleaning images before performing OCR</desc>
+</info>
+
+<title>Unpaper</title>
+
+<p><em>Unpaper</em> is a tool to clean images in order to make
+them easier to read on screen. It is aimed mainly at images
+obtained from scanned documents which usually show dust, black
+margins or other flaws.</p>
+
+<p><app>OCRFeeder</app> can use <em>Unpaper</em> to clean its
+images before processing them, which usually results in a better
+recognition.</p>
+
+<p><em>Unpaper</em> needs to be installed in order to be used.
+If it is not installed, <app>OCRFeeder</app> won't show its action
+in the interface.</p>
+
+<p>To use <em>Unpaper</em> on a loaded image, click
+<guiseq><gui>Tools</gui><gui>Unpaper</gui></guiseq>. The
+<gui>Unpaper Image Processor</gui> dialog will be shown with
+<em>Unpaper</em>'s options and an area to preview the changes before
+applying them to the loaded image. Depending on the size and
+characteristics of the image, using this tool might take some time.</p>
+
+<p><em>Unpaper</em> can be configured by opening
+<guiseq><gui>Edit</gui><gui>Preferences</gui></guiseq> and accessing
+the <gui>Tools</gui> tab. In this area one can enter the path to
+<em>Unpaper</em>'s executable (normally this is already configured if
+<em>Unpaper</em> was installed when <app>OCRFeeder</app> was first
+run). In the same area, under <gui>Image Pre-Processing</gui>, one can
+check <gui>Unpaper images</gui> to have images processed automatically
+by <em>Unpaper</em> after they are loaded into <app>OCRFeeder</app>.
+The options used by <em>Unpaper</em> when it is called automatically
+after adding an image can be configured by clicking the
+<gui>Unpaper Preferences</gui> button.</p>
+
+</page>