[ocrfeeder/user_documentation] Added the user documentation
- From: Joaquim Manuel Pereira Rocha <jrocha src gnome org>
- To: commits-list gnome org
- Cc:
- Subject: [ocrfeeder/user_documentation] Added the user documentation
- Date: Wed, 15 Dec 2010 11:51:40 +0000 (UTC)
commit 377173487825897e00b2d7f22fedf2a63992fc7a
Author: Joaquim Rocha <jrocha igalia com>
Date: Sat Dec 11 02:12:09 2010 +0100
Added the user documentation
help/C/addingfolder.page | 17 ++++++
help/C/addingimage.page | 41 +++++++++++++++
help/C/automaticrecognition.page | 34 ++++++++++++
help/C/deskewing.page | 29 +++++++++++
help/C/documentgeneration.page | 27 ++++++++++
help/C/figures/areas-edition.png | Bin 0 -> 64634 bytes
help/C/figures/content-areas.png | Bin 0 -> 75291 bytes
help/C/finetuning.page | 54 ++++++++++++++++++++
help/C/importingfromscanner.page | 26 ++++++++++
help/C/importingpdf.page | 27 ++++++++++
help/C/index.page | 46 +++++++++++++++++
help/C/legal.xml | 9 +++
help/C/manualeditionandcorrection.page | 81 +++++++++++++++++++++++++++++
help/C/ocrconfiguration.page | 87 ++++++++++++++++++++++++++++++++
help/C/projects.page | 65 ++++++++++++++++++++++++
help/C/unpaper.page | 45 ++++++++++++++++
16 files changed, 588 insertions(+), 0 deletions(-)
---
diff --git a/help/C/addingfolder.page b/help/C/addingfolder.page
new file mode 100644
index 0000000..093ba34
--- /dev/null
+++ b/help/C/addingfolder.page
@@ -0,0 +1,17 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="addingfolder">
+
+<info>
+ <link type="guide" xref="index#images"/>
+ <link type="seealso" xref="addingimage"/>
+ <desc>Adding all the images from a folder</desc>
+</info>
+
+<title>Adding Folder</title>
+
+<p>Sometimes it is useful to add all the images from a given
+folder. <app>OCRFeeder</app> provides this functionality
+by choosing <guiseq><gui>File</gui><gui>Add Folder</gui></guiseq>.</p>
+
+</page>
diff --git a/help/C/addingimage.page b/help/C/addingimage.page
new file mode 100644
index 0000000..3ec5236
--- /dev/null
+++ b/help/C/addingimage.page
@@ -0,0 +1,41 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="addingimage">
+
+<info>
+ <link type="guide" xref="index#images"/>
+ <desc>Adding an image to be recognized</desc>
+</info>
+
+<title>Adding An Image</title>
+
+<p>Adding an image to OCRFeeder is usually the first step when
+converting a document.</p>
+
+<p>Each added image represents a page in the final document.
+A thumbnail of the image will be shown in the pages area (left
+area of <app>OCRFeeder</app>).</p>
+
+<p>The order of the pages in the final document will be the
+same as the order of the images in the pages area. Pages can
+therefore be reordered by dragging their thumbnails within
+the pages area.</p>
+
+<p>You can add an image by clicking
+<guiseq><gui>File</gui><gui>Add Image</gui></guiseq>.</p>
+
+<p>To delete a page, click
+<guiseq><gui>Edit</gui><gui>Delete Page</gui></guiseq> or
+right-click the page's thumbnail and choose <gui>Delete</gui>.</p>
+
+<section>
+<title>Page Configuration</title>
+
+<p>To configure the page size, click
+<guiseq><gui>Edit</gui><gui>Edit Page</gui></guiseq>
+and choose either a custom size, providing the respective
+values, or a standard paper size from a list.</p>
+
+</section>
+
+</page>
diff --git a/help/C/automaticrecognition.page b/help/C/automaticrecognition.page
new file mode 100644
index 0000000..58f4435
--- /dev/null
+++ b/help/C/automaticrecognition.page
@@ -0,0 +1,34 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="automaticrecognition">
+
+<info>
+ <link type="guide" xref="index#recognition"/>
+ <link type="seealso" xref="addingimage"/>
+ <desc>Automatically recognizing an image</desc>
+</info>
+
+<title>Automatic Recognition</title>
+
+<p><app>OCRFeeder</app> tries to detect the contents in a
+document image and perform OCR over them, also distinguishing
+between what is graphics and what is text. To simplify this
+concept, we call it recognition.</p>
+
+<p>After an image is added it can be automatically recognized
+by clicking
+<guiseq><gui>Document</gui><gui>Recognize Document</gui></guiseq>.</p>
+
+<note style="important"><p>Since there are many different document
+layouts out there, the automatic recognition, mainly the page
+segmentation, may turn out not to be accurate for your document. In this
+case, some manual editing of the recognition results might be needed.
+</p></note>
+
+<note style="warning"><p>The automatic recognition performs some complex
+operations and may take some time depending on the size of the image
+and the complexity of the layout.</p>
+<p>The automatic recognition will replace all the content areas
+in the currently selected page.</p></note>
+
+</page>
diff --git a/help/C/deskewing.page b/help/C/deskewing.page
new file mode 100644
index 0000000..00fbe39
--- /dev/null
+++ b/help/C/deskewing.page
@@ -0,0 +1,29 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="deskewing">
+
+<info>
+ <link type="guide" xref="index#configuration"/>
+ <link type="seealso" xref="manualeditionandcorrection"/>
+ <desc>Correcting the skew in the images</desc>
+</info>
+
+<title>Deskewing</title>
+
+<p>Some images, especially if they were added from a scanner device,
+may be skewed and this makes it harder to recognize the image.</p>
+
+<p><app>OCRFeeder</app> offers a way to automatically deskew an
+image. To deskew a loaded image, click
+<guiseq><gui>Tools</gui><gui>Image Deskewer</gui></guiseq>.</p>
+
+<p>This operation can also be set to be performed automatically
+every time an image is added. To set it, simply open the
+<gui>Preferences</gui> dialog from
+<guiseq><gui>Edit</gui><gui>Preferences</gui></guiseq> and check
+<gui>Deskew images</gui> under the <gui>Tools</gui> tab.</p>
+
+<note style="warning"><p>Depending on the size and characteristics
+of the image, deskewing an image may take some time.</p></note>
+
+</page>
diff --git a/help/C/documentgeneration.page b/help/C/documentgeneration.page
new file mode 100644
index 0000000..bbd3abe
--- /dev/null
+++ b/help/C/documentgeneration.page
@@ -0,0 +1,27 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="documentgeneration">
+
+<info>
+ <link type="guide" xref="index#recognition"/>
+ <link type="seealso" xref="automaticrecognition"/>
+ <link type="seealso" xref="manualeditionandcorrection"/>
+ <desc>Creating an editable document</desc>
+</info>
+
+<title>Document Generation</title>
+
+<p><app>OCRFeeder</app> currently generates two document formats:
+<em>ODT</em> and <em>HTML</em>.</p>
+
+<p>After the recognition and any manual editing have been
+performed, a document can be generated by clicking
+<guiseq><gui>File</gui><gui>Export…</gui></guiseq> and choosing
+the desired document format.</p>
+
+<note style="tip"><p>The HTML export generates a folder in which
+each document page is represented by one HTML file. In each page
+there are links to go to the previous and next pages. Image content
+areas are stored in a subfolder called <em>images</em>.</p></note>
+
+</page>
diff --git a/help/C/figures/areas-edition.png b/help/C/figures/areas-edition.png
new file mode 100644
index 0000000..0ac82ab
Binary files /dev/null and b/help/C/figures/areas-edition.png differ
diff --git a/help/C/figures/content-areas.png b/help/C/figures/content-areas.png
new file mode 100644
index 0000000..cb3f471
Binary files /dev/null and b/help/C/figures/content-areas.png differ
diff --git a/help/C/finetuning.page b/help/C/finetuning.page
new file mode 100644
index 0000000..1a3d143
--- /dev/null
+++ b/help/C/finetuning.page
@@ -0,0 +1,54 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="finetuning">
+
+<info>
+ <link type="guide" xref="index#configuration"/>
+ <link type="seealso" xref="manualeditionandcorrection"/>
+ <desc>Advanced options for a better recognition</desc>
+</info>
+
+<title>Fine-tuning</title>
+
+<p><app>OCRFeeder</app> has some advanced options that can be
+used to perform a better recognition. These options can be
+chosen from the <guiseq><gui>Edit</gui><gui>Preferences</gui></guiseq>
+dialog, under its <gui>Recognition</gui> tab.</p>
+
+<p>The following list describes the mentioned options:</p>
+<list>
+ <item><p><gui>Fix line breaks and hyphenization</gui>: OCR engines
+ usually read the text line by line and separate each line with a
+ line break. Sometimes, this is not what the user wants because the
+ text might be broken in the middle of a sentence.</p>
+ <p>Checking this option will make <app>OCRFeeder</app> remove single
+ newline characters after the text is recognized by the engines.</p>
+ <p>Since just removing newlines in a hyphenated text would result
+ in wrongly separated words, hyphenation is also detected and removed
+ in this process.</p></item>
+ <item><p><gui>Window Size</gui>: <app>OCRFeeder</app>'s algorithm to
+ detect the contents in an image uses the concept of <em>window size</em>,
+ which is the division of the image into small windows. A smaller window
+ size means more content areas are likely to be detected, but a size that is
+ too small may split contents that should be part of a single, bigger area.
+ On the other hand, a bigger window size means fewer divisions
+ of contents but may merge contents that should be subdivided.</p>
+ <p>A good window size should be slightly bigger than the text line spacing
+ in an image.</p><p>Users may want to manually set this value if the automatic
+ one doesn't produce any valid content areas, but normally it is easier to
+ use the automatic one and perform any needed corrections directly in
+ the content areas.</p></item>
+ <item><p><gui>Improve columns detection</gui>: Check this option if
+ <app>OCRFeeder</app> should try to divide the detected content areas
+ horizontally (producing more columns). The value that is used to
+ check the existence of blank space within the contents may be automatic,
+ or set manually when the columns aren't detected correctly.</p></item>
+ <item><p><gui>Adjust content areas' bounds</gui>: The detected content
+ areas sometimes have a considerable margin between their contents and
+ the areas' edges. By checking this option, <app>OCRFeeder</app> will
+ minimize those margins, adjusting the areas to their contents better.
+ Optionally, a manual value can be set to indicate the minimum value
+ of the adjusted margins.</p></item>
+</list>
+
+</page>
diff --git a/help/C/importingfromscanner.page b/help/C/importingfromscanner.page
new file mode 100644
index 0000000..6acf508
--- /dev/null
+++ b/help/C/importingfromscanner.page
@@ -0,0 +1,26 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="importingfromscanner">
+
+<info>
+ <link type="guide" xref="index#images"/>
+ <link type="seealso" xref="addingimage"/>
+ <desc>Importing from a scanner device</desc>
+</info>
+
+<title>Importing From Scanner</title>
+
+<p>In order to help convert a printed document into
+an editable document, <app>OCRFeeder</app> offers a
+way to import images directly from a scanner device.</p>
+
+<p>To import an image from a scanner device, use the menu
+<guiseq><gui>File</gui><gui>Import Page From Scanner</gui></guiseq>
+or the keyboard shortcut
+<keyseq><key>Ctrl</key><key>Shift</key><key>I</key></keyseq>.</p>
+
+<p>The currently detected scanner device will be used to
+scan the page. If more than one scanner is found, then a dialog
+will be shown with the options to choose from.</p>
+
+</page>
diff --git a/help/C/importingpdf.page b/help/C/importingpdf.page
new file mode 100644
index 0000000..3067340
--- /dev/null
+++ b/help/C/importingpdf.page
@@ -0,0 +1,27 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="importingpdf">
+
+<info>
+ <link type="guide" xref="index#images"/>
+ <link type="seealso" xref="addingimage"/>
+ <desc>Importing PDF documents</desc>
+</info>
+
+<title>Importing PDF</title>
+
+<p>Some documents are nothing more than images placed in a
+PDF document. For cases like this, <app>OCRFeeder</app> can
+still import a PDF document so it can then be converted into
+an editable document.</p>
+
+<p>To import a PDF document, click
+<guiseq><gui>File</gui><gui>Import PDF</gui></guiseq>.</p>
+
+<p>Each PDF page will be converted to an image and placed
+in the pages' area.</p>
+
+<note style="warning"><p>The PDF conversion can be a demanding
+process and take some time for large PDF files.</p></note>
+
+</page>
diff --git a/help/C/index.page b/help/C/index.page
new file mode 100644
index 0000000..8150784
--- /dev/null
+++ b/help/C/index.page
@@ -0,0 +1,46 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="guide"
+ id="index">
+
+<info>
+ <desc>Help for the <app>OCRFeeder Document Conversion System</app>.</desc>
+ <title type='link'>OCRFeeder Document Conversion System</title>
+ <title type='text'>OCRFeeder Document Conversion System</title>
+ <credit type="author">
+ <name>Joaquim Rocha</name>
+ <email>jrocha igalia com</email>
+ </credit>
+
+ <include href="legal.xml" xmlns="http://www.w3.org/2001/XInclude" />
+</info>
+
+<title>OCRFeeder Document Conversion System</title>
+<p>OCRFeeder is a document layout analysis and optical character recognition system.</p>
+
+<p>OCRFeeder was created to allow users to easily convert document images
+(for example, a PNG image with text) into editable documents (for example,
+an ODT version with that text).</p>
+
+<p>Given the images, it will automatically outline their contents, perform OCR and
+distinguish between graphics and text. It generates multiple formats, the
+main one being ODT.</p>
+
+<p>This guide explains how to configure and use OCRFeeder.</p>
+
+<section id="images" style="2column">
+ <title>Adding Images</title>
+</section>
+
+<section id="recognition" style="2column">
+ <title>Recognition</title>
+</section>
+
+<section id="configuration" style="2column">
+ <title>Configuration</title>
+</section>
+
+<section id="projects" style="2column">
+ <title>Projects</title>
+</section>
+
+</page>
diff --git a/help/C/legal.xml b/help/C/legal.xml
new file mode 100644
index 0000000..0e59883
--- /dev/null
+++ b/help/C/legal.xml
@@ -0,0 +1,9 @@
+<license xmlns="http://projectmallard.org/1.0/"
+ href="http://creativecommons.org/licenses/by-sa/3.0/us/">
+<p>This work is licensed under a
+<link href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative Commons
+Attribution-Share Alike 3.0 United States License</link>.</p>
+<p>As a special exception, the copyright holders give you permission to copy,
+modify, and distribute the example code contained in this document under the
+terms of your choosing, without restriction.</p>
+</license>
diff --git a/help/C/manualeditionandcorrection.page b/help/C/manualeditionandcorrection.page
new file mode 100644
index 0000000..3e2e5d3
--- /dev/null
+++ b/help/C/manualeditionandcorrection.page
@@ -0,0 +1,81 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="manualeditionandcorrection">
+
+<info>
+ <link type="guide" xref="index#recognition"/>
+ <link type="seealso" xref="addingimage"/>
+ <link type="seealso" xref="automaticrecognition"/>
+ <desc>Manual edition and correction of results</desc>
+</info>
+
+<title>Manual Edition</title>
+
+<p>One may want to manually select just a portion of an image to
+be recognized or correct the results of the automatic recognition.
+<app>OCRFeeder</app> lets its users manually edit every aspect of
+a document's contents in an easy way.</p>
+
+<section>
+
+<title>Content Areas</title>
+
+<p>The document's contents are represented by areas, as
+shown in the following image:
+<media type="image" mime="image/png" src="figures/content-areas.png">
+A picture of two content areas with one of them selected.
+</media>
+</p>
+
+<p>The attributes of a selected area are shown and can be changed in
+the right part of the main window, as shown in the following image:
+<media type="image" mime="image/png" src="figures/areas-edition.png" width="100px">A
+picture showing the areas' edition UI</media>
+</p>
+
+<p>The following list describes the content areas' attributes:</p>
+<list>
+ <item><p><em>Type</em>: Sets the area to be either of the type image or text.
+ The image type will clip the area from the original page and
+ place it in the generated document. The text type will use the
+ text assigned to the area and represent it as text in the generated
+ document. (Generated ODT documents will have text boxes when an
+ area was marked as being of the type text.)</p></item>
+ <item><p><em>Clip</em>: Shows the current clip from the original area. This makes
+ it easier for users to check exactly what's within the area.</p></item>
+ <item><p><em>Bounds</em>: Shows the point (X and Y) in the original image where the
+ top left corner of the area is placed as well as the area's width
+ and height.</p></item>
+ <item><p><em>OCR Engine</em>: Lets the user choose an OCR engine and recognize the
+ area's text with it (by pressing the <gui>OCR</gui> button).</p>
+ <note style="warning"><p>Using the OCR engine to recognize the text
+ will directly assign that text to the area and replace the one
+ assigned before.</p></note></item>
+ <item><p><em>Text Area</em>: Represents the text assigned to that area and lets the
+ user edit it. This area is disabled when the area is of the type
+ image.</p></item>
+ <item><p><em>Style Tab</em>: Lets the user choose the font type and size, as well as
+ the text alignment, line and letter spacing.</p></item>
+</list>
+
+<p>The content areas can be selected by clicking on them or by using the menus
+<guiseq><gui>Document</gui><gui>Select Previous Area</gui></guiseq> and
+<guiseq><gui>Document</gui><gui>Select Next Area</gui></guiseq>. There are
+also keyboard shortcuts for these actions:
+<keyseq><key>Ctrl</key><key>Shift</key><key>P</key></keyseq> and
+<keyseq><key>Ctrl</key><key>Shift</key><key>N</key></keyseq>, respectively.</p>
+
+<p>Selecting all areas is also possible using
+<guiseq><gui>Document</gui><gui>Select All Areas</gui></guiseq> or
+<keyseq><key>Ctrl</key><key>Shift</key><key>A</key></keyseq>.</p>
+
+<p>When at least one content area is selected, it is possible to recognize
+their contents automatically or delete them. These actions can be accomplished
+by clicking <guiseq><gui>Document</gui><gui>Recognize Selected Areas</gui></guiseq>
+and <guiseq><gui>Document</gui><gui>Delete Selected Areas</gui></guiseq> (or
+<keyseq><key>Ctrl</key><key>Shift</key><key>Delete</key></keyseq>), respectively.
+</p>
+
+</section>
+
+</page>
diff --git a/help/C/ocrconfiguration.page b/help/C/ocrconfiguration.page
new file mode 100644
index 0000000..2cd059a
--- /dev/null
+++ b/help/C/ocrconfiguration.page
@@ -0,0 +1,87 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="ocrconfiguration">
+
+<info>
+ <link type="guide" xref="index#configuration"/>
+ <link type="seealso" xref="automaticrecognition"/>
+ <link type="seealso" xref="manualeditionandcorrection"/>
+ <desc>Configure the OCR engines to recognize the text</desc>
+</info>
+
+<title>OCR Engines Configuration</title>
+
+<p><app>OCRFeeder</app> uses system-wide OCR engines to extract
+the text from images. This means any OCR engine that can be
+used from the command line can also be used in <app>OCRFeeder</app>.</p>
+
+<section>
+
+<title>Automatic Detection of OCR Engines</title>
+
+<p>The OCR engines (<em>Tesseract</em>, <em>GOCR</em>, <em>Ocrad</em>
+and <em>Cuneiform</em>) are automatically detected and configured
+on most systems the first time <app>OCRFeeder</app> is run.</p>
+
+<p>If an OCR engine is installed after <app>OCRFeeder</app> has already
+configured an engine, it will not be automatically configured. Depending on
+the engine, users can go to the <gui>OCR Engines</gui> dialog and
+choose it from the list of detected engines after pressing <gui>Detect</gui>.</p>
+
+<note style="tip"><p>Already configured OCR engines might be detected again and it is
+up to the user to uncheck these engines if they shouldn't be added again.</p></note>
+
+</section>
+
+<section>
+
+<title>Manual Configuration</title>
+
+<p>The currently configured OCR engines are shown in the
+<gui>OCR Engines</gui> dialog which can be opened from
+<guiseq><gui>Tools</gui><gui>OCR Engines</gui></guiseq>.</p>
+
+<p>Besides seeing the configured OCR engines, the <gui>OCR Engines</gui>
+dialog allows adding new engines, editing or deleting the current ones and
+detecting engines installed in the system.</p>
+
+<p>When adding or editing an OCR engine (by pressing the <gui>Add</gui>
+or <gui>Edit</gui> buttons, respectively), a dialog is shown with the
+following fields:</p>
+
+<list>
+ <item><p><gui>Name</gui>: The engine's name. This name will be used
+ throughout the UI when referring to the engine.</p></item>
+ <item><p><gui>Image format</gui>: The image format that the engine
+ recognizes (for example, <em>TIF</em> in the case of
+ <em>Tesseract</em>).</p></item>
+ <item><p><gui>Failure string</gui>: Some engines replace unrecognized
+ characters with another, pre-defined character (for example,
+ <em>_</em> in the case of <em>GOCR</em>).</p></item>
+ <item><p><gui>Engine path</gui>: The path in the system to the
+ engine's executable (for example, <em>/usr/bin/tesseract</em>).</p></item>
+ <item><p><gui>Engine arguments</gui>: The arguments that feed an image
+ to the engine and make it output the recognized text to the standard
+ output. <app>OCRFeeder</app> runs the engine with these arguments
+ as if it were run from the command line and looks for the recognized text
+ in the standard output. Some engines already do this, like
+ <em>Ocrad</em> and <em>GOCR</em>, while others, like <em>Tesseract</em>,
+ write the text into a file.</p>
+ <p>Since the image's path to be read is always needed, a special argument
+ <em>$IMAGE</em> is provided for this and will be replaced by the image path
+ when the engine is run. For the cases
+ where a file name is needed, like the one mentioned previously, a special
+ argument <em>$FILE</em> is provided and will be replaced by a temporary
+ file name.</p>
+ <p>So, in case of <em>Tesseract</em> (which writes the recognized text
+ into a file), the arguments would be <em>$IMAGE $FILE; cat $FILE.txt;
+ rm $FILE</em>.</p></item>
+
+</list>
+
+<note style="advanced"><p>The engines' configuration is stored in their own XML file
+in the user's home under <em>.ocrfeeder/engines/</em>.</p></note>
+
+</section>
+
+</page>
diff --git a/help/C/projects.page b/help/C/projects.page
new file mode 100644
index 0000000..7853f28
--- /dev/null
+++ b/help/C/projects.page
@@ -0,0 +1,65 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="projects">
+
+<info>
+ <link type="guide" xref="index#projects"/>
+ <link type="seealso" xref=""/>
+ <desc>Loading and saving projects</desc>
+</info>
+
+<title>Projects</title>
+
+<p>Sometimes a user may want to save the progress of the work
+done so far in an image and continue with it later. For this case
+<app>OCRFeeder</app> offers the possibility to save and load
+projects.</p>
+
+<p>Projects are compressed files with the <em>ocrf</em> extension
+which hold information about pages (images) and content areas.</p>
+
+<section>
+<title>Saving A Project</title>
+
+<p>After having done some work in an image, a project can be created
+by clicking <guiseq><gui>File</gui><gui>Save</gui></guiseq> or
+<guiseq><gui>File</gui><gui>Save As…</gui></guiseq>. Optionally, the
+<keyseq><key>Control</key><key>S</key></keyseq> or
+<keyseq><key>Control</key><key>Shift</key><key>S</key></keyseq> keyboard
+shortcuts can be used. A file saving dialog will then be shown so the
+project's name and location can be entered.</p>
+
+</section>
+
+<section>
+<title>Loading A Project</title>
+
+<p>An existing project can be loaded simply by clicking
+<guiseq><gui>File</gui><gui>Open</gui></guiseq> or
+<keyseq><key>Control</key><key>O</key></keyseq>.</p>
+
+</section>
+
+<section>
+<title>Appending A Project</title>
+
+<p>Sometimes it is useful to merge two or more projects in order to create
+only one document with the pages of several <app>OCRFeeder</app> projects.
+This can be accomplished by appending a project, which simply loads the pages
+from a chosen project into the current one. To do this, click
+<guiseq><gui>File</gui><gui>Append Project</gui></guiseq> and choose the
+desired project.</p>
+
+</section>
+
+<section>
+<title>Clearing A Project</title>
+
+<p>If all the information in a project should be deleted (for example,
+to start over again), it can be done by choosing
+<guiseq><gui>Edit</gui><gui>Clear Project</gui></guiseq>.</p>
+
+</section>
+
+
+</page>
diff --git a/help/C/unpaper.page b/help/C/unpaper.page
new file mode 100644
index 0000000..96189ce
--- /dev/null
+++ b/help/C/unpaper.page
@@ -0,0 +1,45 @@
+<page xmlns="http://projectmallard.org/1.0/"
+ type="topic"
+ id="unpaper">
+
+<info>
+ <link type="guide" xref="index#configuration"/>
+ <link type="seealso" xref="manualeditionandcorrection"/>
+ <desc>Cleaning images before performing OCR</desc>
+</info>
+
+<title>Unpaper</title>
+
+<p><em>Unpaper</em> is a tool to clean images in order to make
+them easier to read on screen. It is aimed mainly at images
+obtained from scanned documents which usually show dust, black
+margins or other flaws.</p>
+
+<p><app>OCRFeeder</app> can use <em>Unpaper</em> to clean its
+images before processing them, which usually results in a better
+recognition.</p>
+
+<p><em>Unpaper</em> needs to be installed in order to be used.
+If it is not installed, <app>OCRFeeder</app> won't show its action
+in the interface.</p>
+
+<p>To use <em>Unpaper</em> on a loaded image, click
+<guiseq><gui>Tools</gui><gui>Unpaper</gui></guiseq>. The
+<gui>Unpaper Image Processor</gui> dialog will be shown with
+<em>Unpaper</em>'s options and an area to preview the changes before
+applying them to the loaded image. Depending on the size and
+characteristics of the image, using this tool might take some time.</p>
+
+<p><em>Unpaper</em> can be configured by opening
+<guiseq><gui>Edit</gui><gui>Preferences</gui></guiseq> and accessing
+the <gui>Tools</gui> tab. In this area one can enter the path to
+<em>Unpaper</em>'s executable (normally this is already configured if
+<em>Unpaper</em> was installed the first time <app>OCRFeeder</app> was
+run). In the same area, under <gui>Image Pre-Processing</gui>, one can
+check <gui>Unpaper images</gui> to have images processed automatically
+by <em>Unpaper</em> after they are loaded into <app>OCRFeeder</app>.
+The options taken by <em>Unpaper</em> when it's automatically
+called after adding an image can be configured by clicking the
+<gui>Unpaper Preferences</gui> button.</p>
+
+</page>