Just Great Software

Latest Issue of the Just Great Software Newsletter

January 2017

  1. Updated: PowerGREP 5.0.6
  2. Tips & Tricks: Display All Characters Correctly

Updated: PowerGREP 5.0.6

Both PowerGREP 5.0.5 and 5.0.6 were released this month. This newsletter entry describes what's new since PowerGREP 5.0.4.

PowerGREP 5.0.6 fixes a bunch of recently introduced bugs. Since version 5.0.0, saving a file stored inside an archive failed with an access violation if a backup copy needed to be created. Since version 5.0.3, searching through individually compressed files (.gz, .bz2, .lzma, and .xz) could cause PowerGREP to crash completely. Since version 5.0.4, actions using the "count matches" action type could not be previewed or executed, and actions using the "convert matched files to text" or "convert copies of matched files to text" target type could not be executed. Version 5.0.6 restores all this functionality to the way it worked prior to these versions.

PowerGREP 5.0.5 fixed a bug that could cause PowerGREP to freeze or crash completely when searching through certain DOCX files (documents saved by Microsoft Word 2007 or later). If you use PowerGREP 5.0.x to search through plain text conversions of DOCX files, then you should upgrade to version 5.0.5 immediately.

PowerGREP's built-in converter for DOC files (documents saved by Microsoft Word 2003 or prior) is now capable of converting DOCX files that have an incorrect .doc extension to plain text. Previously it would produce cryptic errors indicating the DOC file was unreadable. The reverse is not true. PowerGREP's DOCX converter will not open DOC files that have an incorrect .docx extension. It will say that the file is not a valid DOCX file. The reason is that PowerGREP's DOCX converter can both read and write DOCX files, while PowerGREP's converter for the legacy DOC format is read-only.
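The two formats can be told apart without trusting the file extension, because they start with different signature bytes: a DOCX file is a ZIP archive that begins with the bytes "PK" 03 04, while a legacy DOC file is an OLE compound file that begins with D0 CF 11 E0 A1 B1 1A E1. The Python sketch below only illustrates that idea. The function names are hypothetical, and nothing here is taken from PowerGREP's own converters.

```python
# Signature bytes at the very start of each container format.
ZIP_SIGNATURE = b"PK\x03\x04"                        # DOCX, XLSX and other ZIP-based files
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")    # legacy DOC, XLS and other OLE files


def sniff_container(header: bytes) -> str:
    """Classify a file by its first bytes instead of by its extension.

    Returns "zip" for a ZIP-based file (possibly DOCX), "ole" for an OLE
    compound file (possibly legacy DOC), or "unknown" for anything else.
    """
    if header.startswith(ZIP_SIGNATURE):
        return "zip"
    if header.startswith(OLE_SIGNATURE):
        return "ole"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to check the signature."""
    with open(path, "rb") as f:
        return sniff_container(f.read(8))
```

A file named report.doc for which sniff_file returns "zip" is most likely a DOCX file with the wrong extension. Note that the signature identifies only the container, so a "zip" result could also be an XLSX file or an ordinary ZIP archive.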

Many of the syntax coloring schemes used by PowerGREP's built-in editor highlight URLs and email addresses as links. Double-clicking such links now opens the link. Exactly what happens depends on the syntax coloring scheme. Links can open another file in PowerGREP's editor, open a file in its default application, or open a URL in a web browser. If you want to change how links are opened, you'll need to use the syntax coloring scheme editor to edit the syntax coloring scheme you're using. In the Syntax Coloring section in the Preferences you can specify which scheme should be used for which files.

The Excel converter was also improved, allowing PowerGREP to search through XLSX files that it previously failed to read.

Tips & Tricks: Display All Characters Correctly

This tips & tricks section is based on two conversations in EditPad Pro's user forum. It applies to EditPad Lite 7 and EditPad Pro 7. If you have any comments or questions, you're welcome to join the discussion on EditPad's forum. Look for the subjects "Issue displaying Unicode characters" and "Fonts and Symbols". These same tips can be used in AceText 3, PowerGREP 4 and 5, RegexBuddy 4, and RegexMagic 2. They all use the same text layout mechanism, which you can configure in their preferences. You can access the forum if you have a licensed copy of any of these products. If you access the forum from a product other than EditPad, select the option to show all groups on the Login screen so that the EditPad forum appears.

For EditPad, or any other editor, to display text correctly, it needs to use the correct encoding to know which characters to display, and it needs to use a font that can actually display those characters. If EditPad displays your file with the wrong characters, then you need to change the encoding. For example, an ASCII file misinterpreted as UTF-16 will show Chinese gibberish. To fix this, select Convert|Text Encoding in the menu, choose the "reinterpret" option, and then select the encoding that makes EditPad display the correct characters. EditPad supports all Unicode encodings and all legacy code pages that still have any relevance.
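The ASCII-as-UTF-16 mix-up is easy to reproduce outside any editor. The Python sketch below decodes the same bytes twice: once with the wrong encoding, which pairs up ASCII bytes into code points that land in the CJK ideograph range, and once with the right one. The reinterpret function is a hypothetical helper that mimics the idea of keeping the bytes and changing only how they are read. It is not EditPad's code.

```python
def reinterpret(text: str, wrong_encoding: str, right_encoding: str) -> str:
    """Recover the original bytes from wrongly decoded text, then decode them again."""
    return text.encode(wrong_encoding).decode(right_encoding)


raw = "encoding".encode("ascii")      # 8 bytes, one per character

# Wrong guess: UTF-16 (little endian) combines every two bytes into one character.
garbled = raw.decode("utf-16-le")
print([f"U+{ord(c):04X}" for c in garbled])
# ['U+6E65', 'U+6F63', 'U+6964', 'U+676E']  (all CJK ideographs)

# Right guess: same bytes, read as ASCII again.
fixed = reinterpret(garbled, "utf-16-le", "ascii")
print(fixed)                          # encoding
```

The bytes never change in this example. Only the interpretation does, which is why reinterpreting rather than converting is the right fix for a file that was opened with the wrong encoding.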

With the correct encoding selected, some characters may appear as hollow rectangles or as question marks inside a rectangle. These rectangles are placeholder symbols that indicate that the font cannot display these characters. If your file mixes different scripts then some scripts may be perfectly readable while others may look odd or be too hard to read. Some scripts like Thai that need more vertical space may look tiny next to English characters. This often happens when your font does not support one of the scripts and automatic font substitution is picking an inappropriate font.

Most fonts only support specific scripts. This allows the font to be optimized for those scripts. Styles that make a font beautiful or practical usually only work well for specific scripts. For example, monospaced fonts that programmers like only support alphabets that can be displayed nicely in a monospaced fashion. Universal fonts like Arial Unicode MS or Lucida Sans Unicode tend to be bland.

If your file only uses one script (in addition to basic Latin) or if you're happy with a universal font, then you can simply select Options|Font in EditPad's menu and pick a font that works for you. A Thai font like Leelawadee, for example, will perfectly display a Thai document with appropriately sized Thai and Latin (English) characters.

Windows 7 and later ship with a wide range of fonts dedicated to specific scripts. You may not be able to select some of the fonts for more obscure scripts in EditPad's Options|Font dialog because Windows is hiding them. To make a font visible (or invisible) in the font dialog, use Windows Explorer to navigate to the C:\Windows\Fonts folder. Right-click a font and select Show (or Hide) to toggle its visibility. Fonts that are hidden are still available to applications that specifically request them. Hiding fonts you don't really need makes it easier to pick the fonts that you do work with.

If your file uses multiple scripts or uses many special symbols then multiple fonts may be needed to display all characters correctly and readably. An application can achieve this in two ways.

One way is to rely on automatic font fallback performed by Windows. You select one font that works well for most of your file. Your application asks Windows to display all text with this font while allowing Windows to substitute different fonts for characters that your chosen font doesn't support. Unfortunately, this often doesn't work well or at all. Some characters may still be left as empty rectangles. Substituted fonts may use different sizing or spacing. Characters supported by both the original font and the substitute font, such as basic whitespace and punctuation, may sometimes be displayed with one font and sometimes with another depending on the font used for the surrounding characters.

The other way is for the application to select the correct fonts for all the characters that it wants to display. Word processors work this way as they allow the document to be formatted with as many different fonts as you like.

In EditPad Pro, the text layout configuration determines whether Windows or EditPad handles font fallback. You can configure this for the active file via the Options menu or as a default via the file type configuration. If you select a left-to-right only text layout configuration then EditPad lets you select a single font and allows Windows to perform automatic font fallback.

If you select a complex script text layout then EditPad lets you select a main font and as many fallback fonts as you like. EditPad displays all characters supported by the main font using that font. Other characters are displayed by the topmost font in your list of fallback fonts that supports them. So select a font for your file's main script as the main font. Add fonts specific to other scripts higher in the list of fallback fonts. Add universal fonts lower in the list. If your file contains a character not supported by any font then EditPad uses the main font to display it, which will result in an empty rectangle or a question mark in a rectangle. To avoid that, you could add the last resort font as the bottommost fallback font. This font contains generic glyphs for all characters, including invalid characters. The glyphs indicate Unicode block and script names, which makes it easier to figure out which script you're missing in your list of fallback fonts.
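The selection order described above can be sketched in a few lines of Python. This is only a model of the rule as stated here (main font first, then the topmost fallback font that supports the character, then the main font again as a last resort). The font names and their coverage ranges are made up for illustration, and the sketch makes no claim about how EditPad implements it.

```python
from typing import Dict, List

# Illustrative coverage only: each hypothetical font lists the code point ranges it supports.
COVERAGE: Dict[str, List[range]] = {
    "MainLatin": [range(0x0000, 0x0080)],        # Basic Latin
    "ThaiFont": [range(0x0E00, 0x0E80)],         # Thai
    "UniversalFont": [
        range(0x0000, 0x0080),                   # Basic Latin
        range(0x0400, 0x0500),                   # Cyrillic
        range(0x0E00, 0x0E80),                   # Thai
    ],
}


def supports(font: str, ch: str) -> bool:
    """True if the font's coverage includes this character."""
    return any(ord(ch) in r for r in COVERAGE[font])


def pick_font(ch: str, main_font: str, fallback_fonts: List[str]) -> str:
    """Main font wins; otherwise the topmost fallback font that supports ch."""
    if supports(main_font, ch):
        return main_font
    for font in fallback_fonts:                  # list order = top to bottom
        if supports(font, ch):
            return font
    return main_font                             # unsupported: shows a placeholder rectangle


# Script-specific font higher in the list, universal font lower.
fallbacks = ["ThaiFont", "UniversalFont"]
for ch in ["a", "\u0e01", "\u0436", "\u4e2d"]:
    print(f"U+{ord(ch):04X}", pick_font(ch, "MainLatin", fallbacks))
```

With this list, the Thai letter U+0E01 goes to ThaiFont even though UniversalFont also supports it, the Cyrillic letter U+0436 falls through to UniversalFont, and the ideograph U+4E2D ends up back at the main font because nothing in the list covers it.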

EditPad's font fallback works at the Unicode block level. EditPad assumes that if a font supports some characters in a given Unicode block then it must support all characters in that block. This assumption is generally true. It improves performance and avoids mixing too many different fonts. If you have a font that only supports some characters from a certain Unicode block then you need to place it below another font that does fully support that block in the list of fallback fonts to avoid empty rectangles.
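The effect of the block-level assumption on list order can be illustrated with a small Python model. It covers only the fallback list, not the main font. The two font names and their coverage are hypothetical, the block table is a two-entry excerpt of the Unicode block list, and the logic follows the description above rather than EditPad's actual code.

```python
from typing import Dict, List, Optional, Set

# Two real Unicode blocks, enough for the example.
BLOCKS: Dict[str, range] = {
    "Arrows": range(0x2190, 0x2200),
    "Mathematical Operators": range(0x2200, 0x2300),
}

# Hypothetical fonts: one covers two arrows only, the other the whole Arrows block.
FONTS: Dict[str, Set[int]] = {
    "PartialArrows": {0x2190, 0x2192},
    "FullArrows": set(range(0x2190, 0x2200)),
}


def block_of(cp: int) -> Optional[range]:
    """Return the block range containing cp, or None if it is not in the excerpt."""
    for block in BLOCKS.values():
        if cp in block:
            return block
    return None


def claims_block(font: str, block: range) -> bool:
    """Block-level test: any supported character in the block counts as full support."""
    return any(cp in block for cp in FONTS[font])


def pick_fallback(cp: int, fallback_fonts: List[str]) -> Optional[str]:
    """Topmost fallback font that claims the character's block."""
    block = block_of(cp)
    if block is None:
        return None
    for font in fallback_fonts:
        if claims_block(font, block):
            return font
    return None


def renders(cp: int, fallback_fonts: List[str]) -> bool:
    """True if the chosen font really has a glyph for cp."""
    font = pick_fallback(cp, fallback_fonts)
    return font is not None and cp in FONTS[font]


cp = 0x21D2  # RIGHTWARDS DOUBLE ARROW, which PartialArrows lacks
print(renders(cp, ["PartialArrows", "FullArrows"]))   # False: partial font claims the block
print(renders(cp, ["FullArrows", "PartialArrows"]))   # True: full font is found first
```

In the first order, the partial font is chosen for the whole Arrows block and the missing arrow becomes a rectangle. Moving it below the font that fully supports the block avoids that.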

 

That's it for this month. Thank you for using our software, and see you next month!

Kind regards,
Jan Goyvaerts
