Estimate locations of text lines within text canvases.

None yet

getLines(canvas)

Arguments

canvas: An N-dimensional matrix (as returned by png::readPNG()) corresponding to a bitmap image containing rasterized text.

Value

A data frame containing 1 row for each line of text in the canvas, and the following columns:

line: Integer indicating text line number, counting from the top of the canvas.
top: Integer indicating upper extent of line in pixels.
baseline: Integer indicating location of line's baseline (estimated) in pixels. NOT YET IMPLEMENTED.
bottom: Integer indicating lower extent of line in pixels.
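
As a small illustration of how these columns might be used, the sketch below derives the pixel height of each line and the spacing between successive lines. The values are copied from the first three rows of the example output on this page. The data frame is built by hand here so that the snippet runs without the package, and the names `lines`, `height` and `pitch` are illustrative only.

```r
# Values copied from the first three rows of the example output on this page.
lines <- data.frame(
  line     = 1:3,
  top      = c(73L, 134L, 195L),
  baseline = NA_integer_,          # baseline is not yet implemented, so NA
  bottom   = c(89L, 150L, 211L)
)

# Vertical extent of each line, in pixels (bottom minus top).
lines$height <- lines$bottom - lines$top

# Distance from the top of one line to the top of the next, in pixels.
pitch <- diff(lines$top)

lines$height  # 16 16 16
pitch         # 61 61
```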

Details

This function and its companions, getChars() and getMargins(), are utilities designed to extract details of text position from a bitmap file, or "text canvas" as we'll call it here. A text canvas is simply a bitmap that contains one or more lines of text. The text is assumed to be organized as it would be on a printed page.

In our own lab we generate text canvases programmatically to use as stimulus items in experiment control software (ECS) like SRR's Experiment Builder or PST's E-Prime. The Python scripts we use for the purpose also generate region files (also known as area-of-interest files). Because we build the text canvases ourselves, we have precise control over the locations of words within them.

Another approach to displaying text in an experiment is to let the ECS build the text canvases from plain text representations of the stimuli. That's the standard way of presenting text (sentences, paragraphs, whatever) in SRR Experiment Builder (EB), for example. In that case, EB generates a text canvas for each trial based on the text provided to it. These are pre-compiled and stored in a specific directory under the deployed EB script. So, if your EB script is named "SentenceReading", then it will live in a directory with that same name. Any screen displayed in the execution of the script, including text canvases, will be in a sub-folder called "runtime/images".

Sadly, these files will have unhelpful names consisting of a long string of digits, like "307641742100813627.png". And, as noted, "runtime/images" will also contain png files for screens other than the text canvases used as experimental stimuli, with no hint in the file names as to which are which. So, the only way to separate out the text canvases of interest is to go through the files manually and sort them out. (To be clear, I think this problem is not specific to EB. I guess any experiment control software will have a similar issue.)
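
As a starting point for that manual sorting, the following sketch lists the png files under a deployed script's "runtime/images" folder along with their sizes and modification times. It uses base R only. The helper name `list_canvas_candidates` is hypothetical and not part of this package, and the sketch does nothing to identify which files are actually text canvases.

```r
# Hypothetical helper: list PNG files under <script_dir>/runtime/images.
# It only gathers file names, sizes and timestamps; deciding which files
# are stimulus canvases still has to be done by inspection.
list_canvas_candidates <- function(script_dir) {
  img_dir <- file.path(script_dir, "runtime", "images")
  files <- list.files(img_dir, pattern = "\\.png$", full.names = TRUE,
                      ignore.case = TRUE)
  info <- file.info(files)
  data.frame(
    file  = basename(files),
    bytes = info$size,
    mtime = info$mtime,
    stringsAsFactors = FALSE
  )
}

# Usage, assuming a script deployed to a directory named "SentenceReading":
# list_canvas_candidates("SentenceReading")
```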

But, once you have those stimulus files, you may want to parse out the locations of text in the files so you can do some reanalysis based on regions of interest other than those that were specified prior to data acquisition. Certainly, this could be done by making manual measurements of each text canvas, but the functions getLines(), getChars(), and getMargins() are designed to at least partially automate the process.
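
To give a sense of what automating this kind of measurement can look like, here is a minimal sketch of one common approach: scanning the rows of a flattened (2-D) canvas and grouping consecutive rows that contain any non-background pixel. This is not necessarily how getLines() works internally. The function name `find_line_extents`, the `bg` argument, and the assumption that the background is the largest pixel value are all illustrative.

```r
# Hypothetical sketch: find vertical extents of text lines in a 2-D matrix.
# Assumes `bg` is the background pixel value; any row containing a pixel
# that differs from `bg` is treated as part of a text line.
find_line_extents <- function(canvas, bg = max(canvas)) {
  has_ink <- apply(canvas != bg, 1, any)  # TRUE for rows with non-background pixels
  runs    <- rle(has_ink)                 # runs of consecutive inked / blank rows
  ends    <- cumsum(runs$lengths)         # last row index of each run
  starts  <- ends - runs$lengths + 1L     # first row index of each run
  keep    <- runs$values                  # keep only the inked runs
  data.frame(line   = seq_len(sum(keep)),
             top    = starts[keep],
             bottom = ends[keep])
}

# Toy canvas: 10 rows by 6 columns of white (1) with two bands of dark pixels.
toy <- matrix(1, nrow = 10, ncol = 6)
toy[2:3, 2:5] <- 0
toy[7:8, 1:4] <- 0
find_line_extents(toy)
```

On the toy canvas this reports two lines, spanning rows 2 to 3 and rows 7 to 8. Real canvases with anti-aliased text or non-uniform backgrounds would likely need a threshold rather than an exact comparison against `bg`.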

Note

TODO: Work on getting top and bottom bounds for each line, as well as estimated baseline.

TODO: Find out if there is a way to interpret the filenames of bitmaps located in the SRR EB script subdirectory "runtime/images" in a meaningful way. Most helpful would be to get a list of those files that SRR DataViewer knows to be used as background images for a specific EDF file.

Author

Dave Braze <davebraze@gmail.com>

Examples

    cnvs <- system.file("extdata/story01.png", package="FDBeye")
    cnvs <- png::readPNG(cnvs)
    fcnvs <- apply(cnvs, c(1,2), sum) # flatten to a single plane for convenience

    ## get lines
    getLines(fcnvs)
    #> Warning: the condition has length > 1 and only the first element will be used
    #>       line top baseline bottom
    #>  [1,]    1  73       NA     89
    #>  [2,]    2 134       NA    150
    #>  [3,]    3 195       NA    211
    #>  [4,]    4 256       NA    272
    #>  [5,]    5 319       NA    335
    #>  [6,]    6 384       NA    400
    #>  [7,]    7 445       NA    461
    #>  [8,]    8 506       NA    522
    #>  [9,]    9 567       NA    583
    #> [10,]   10 628       NA    644
    #> [11,]   11 689       NA    705
    #> [12,]   12 754       NA    770
    #> [13,]   13 815       NA    831