Difference between revisions of "User:GChriss/MediaDigitization"

From WikiConference North America
(add mini-description)
(sentence structure and all)
Line 4:
  ===hOCR Workflow Tools===
- The hOCR Workflow Tools project is a collection of tools to facilitate generation of text-searchable digital documents and is particularly useful in contexts where traditional OCR techniques would fare poorly (''e.g.'' handwritten notes) implemented ''via'' two [http://www.inkscape.org/en/ Inkscape] extensions:
+ The hOCR Workflow Tools project is a collection of tools to facilitate generation of text-searchable digital documents and is particularly useful in contexts where traditional OCR techniques would fare poorly (''e.g.'' handwritten notes). It's implemented ''via'' two [http://www.inkscape.org/en/ Inkscape] extensions:
  :[https://gitorious.org/hocr-workflow/inkscape-hocr Inkscape Extension: Export Image Overlay Text as hOCR]
  :[https://gitorious.org/hocr-workflow/inkscape-hocrpdf Inkscape Extension: Create Multi-Page PDF from hOCR HTML Directory]
+ Accurate text-searchable documents bring new life and layers of reader engagement to source materials.
  <br />

Revision as of 03:03, 1 June 2014


An informal, introductory session led by User:GChriss on the following novel digitization techniques:

hOCR Workflow Tools

The hOCR Workflow Tools project is a collection of tools to facilitate generation of text-searchable digital documents and is particularly useful in contexts where traditional OCR techniques would fare poorly (e.g. handwritten notes). It's implemented via two Inkscape extensions:

Inkscape Extension: Export Image Overlay Text as hOCR
Inkscape Extension: Create Multi-Page PDF from hOCR HTML Directory

Accurate text-searchable documents bring new life and layers of reader engagement to source materials.


The BookLiberator

See http://gchriss.tumblr.com/post/84946122863/bookliberator


High-Resolution Imaging

Using image sensors with a high pixel density (defined as the number of sensor pixels divided by total sensor size), combined with high-resolving-power lenses, it's possible to image arbitrary surfaces in much higher detail than document scanning or traditional macro photography techniques allow. For an example image created using this technique, please see:

http://media.openvideo.pro/u/gchriss/m/docuzoom-microscale-1-dollar-bill
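
The pixel-density definition above can be worked through with a short calculation. This is a sketch only: the function name `pixel_density` is hypothetical, "sensor size" is taken to mean the active sensor area, and the two sets of sensor figures are illustrative assumptions, not measured specifications of any camera mentioned here.

```python
def pixel_density(width_px, height_px, sensor_width_mm, sensor_height_mm):
    """Return sensor pixels per square millimetre of sensor area."""
    total_pixels = width_px * height_px
    sensor_area_mm2 = sensor_width_mm * sensor_height_mm
    return total_pixels / sensor_area_mm2


# Illustrative figures (assumptions): a small 5-megapixel sensor and a
# larger 24-megapixel sensor.
small = pixel_density(2592, 1944, 3.6, 2.7)
large = pixel_density(6000, 4000, 36.0, 24.0)

print(f"small sensor: {small:,.0f} px/mm^2")
print(f"large sensor: {large:,.0f} px/mm^2")
print(f"ratio: {small / large:.1f}x")
```

Under these assumed figures the smaller sensor has fewer pixels in total but far more per unit area, which is the property the technique relies on.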

Beyond an introduction to the novel technique and how it can be applied in historical research contexts, a working "works/doesn't yet work/future work" status update will be presented, with a particular focus on large-document automated scanning. An Elphel 353L camera as well as an A10-OLinuXino-LIME interfaced with an OV5642 image sensor via GPIO pins will be on display.


Open Video Reference Build

The ‘Open Video Reference Build’ is a set of tools designed to facilitate working with open video in multiple contexts such as software development, live-streaming, A/V conferencing, video editing, and machine recognition. It currently consists of three Bash scripts that create a series of well-defined software packages running in a libre, long-term-support operating system: Trisquel.

Video can be difficult to work with.  The Open Video Reference Build is designed to reduce as much complexity as possible without sacrificing build precision or extensibility.  See: https://gitorious.org/openvideo_reference_build