GoldenDict++ OCR Setup (Scan Popup and Language Settings)

English is supported by default. To use OCR with other languages, you need an additional language pack for the OCR engine, and the data path that the engine uses to load language data must point to your language-pack directory. Download and unpack the package you need, open the GoldenDict++ Preferences dialog, and switch to the OCR Popup page. Select an OCR engine, set its data path by clicking the button next to the engine list, then select the language(s) for the engine.

All of the packages are available from Netdisk.

OCR and ScreenCapture Plugins

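Scan popup plugins are stored in the gdp folder under the program directory (files whose names begin with gdp.gsc or gdp.ocr). The default data for the Tesseract engine is stored in the tessdata folder under the program directory, and the default data for the Nicomsoft engine is stored in the nsocr folder under the program directory. The scan popup engines and their data are loaded on demand and are not required for GoldenDict++ to run: when scan popup is disabled, the program does not load any of the related modules, so they take up no extra memory or other hardware resources.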
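To see which plugins a given installation actually contains, the naming convention above can be checked with a few lines of Python. This is only a sketch: the helper name `list_scan_plugins` and the example install path are hypothetical, and the only facts taken from this page are the gdp folder name and the gdp.gsc / gdp.ocr file-name prefixes.

```python
from pathlib import Path

# Prefixes taken from the text above: screen-capture (gsc) and OCR (ocr) plugins.
PLUGIN_PREFIXES = ("gdp.gsc.", "gdp.ocr.")


def list_scan_plugins(app_dir):
    """Return {"gsc": [...], "ocr": [...]} with plugin file names found in <app_dir>/gdp."""
    gdp_dir = Path(app_dir) / "gdp"
    found = {"gsc": [], "ocr": []}
    if not gdp_dir.is_dir():
        return found  # no plugin folder, so nothing to report
    for entry in sorted(gdp_dir.iterdir()):
        if not entry.is_file():
            continue
        for prefix in PLUGIN_PREFIXES:
            if entry.name.startswith(prefix):
                kind = prefix.split(".")[1]  # "gsc" or "ocr"
                found[kind].append(entry.name)
    return found


if __name__ == "__main__":
    # Hypothetical install location; replace with your own GoldenDict++ directory.
    print(list_scan_plugins("C:/GoldenDict"))
```

An empty list for a given kind simply means no plugin of that type is present in the gdp folder.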

| Name | File Name | Platform | Ratings | Remarks |
| --- | --- | --- | --- | --- |
| MacVision | gdp.ocr.macvision.* | macOS 10.15~13.5 | ***** | Apple’s Vision framework; preferred and recommended on macOS Big Sur or Monterey |
| WinRT OCR | gdp.ocr.winrtocr.* | Windows 8/10/11 | ***** | Windows.Media.Ocr; preferred and recommended on Windows |
| Tesseract | gdp.ocr.tesseract.* | All | ***** | Powered by Tesseract, with hundreds of supported languages. Preferred and recommended |
| WeChatOCR | gdp.ocr.wechatocr.* | Windows x64 | **** | Automatically installed with WeChat x64 |
| Youdao OCR | gdp.ocr.youdao.* | All | ***** | ai.youdao.com/DOCSIRMA/html/ocr |
| Baidu OCR | gdp.ocr.baidu.* | All | ***** | ai.baidu.com/ai-doc/OCR |
| Tencent OCR | gdp.ocr.tencent.* | All | ***** | cloud.tencent.com/document/product/866 |
| Google OCR | gdp.ocr.google.* | All | *** | Not tested; developers.google.cn/codelabs |
| Nicomsoft | gdp.ocr.nicomsoft.* | Windows | *** | Nicomsoft OCR is no longer officially maintained or updated |
| winmask | gdp.gsc.winmask.* | Windows/Linux | ***** | Screen grabber that supports dynamic capture across multiple screens. Preferred on Windows and Linux |
| fromcliboard | gdp.gsc.fromcliboard.* | All | ***** | Screen grabber that uses external tools. Preferred on macOS and Linux |
| qtcamera | gdp.gsc.qtcamera.* | All | *** | Camera image capture using QCamera |

gdocr_config.png

Apple’s Vision OCR

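The languages supported by this engine ship with recent versions of Apple's macOS and iOS (they are built in, so no additional installation is required):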

macos_vision_ocr.png

WinRT OCR / Windows.Media.Ocr

The languages supported by this engine are provided by Windows. Additional languages can be installed from the system Settings:

gdocr_config.png

Tesseract OCR

By default, GD++ ships with tessdata language data for English, Chinese Simplified, and Chinese Traditional.
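A quick way to check which language data is present is to list the files in the tessdata folder. The sketch below assumes the folder holds standard Tesseract data files named `<code>.traineddata` (for example `eng`, `chi_sim`, `chi_tra`); the function name and the example path are hypothetical.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []  # folder missing: no language data available
    # "chi_sim.traineddata" -> "chi_sim"
    return sorted(p.stem for p in folder.glob("*.traineddata"))


if __name__ == "__main__":
    # Hypothetical location; point this at the tessdata folder of your own install.
    print(installed_languages("C:/GoldenDict/tessdata"))
```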

Follow these steps to install additional OCR languages:

  1. Download the data package for the OCR language you need.
  2. Open the downloaded “.zip” file with 7-Zip or similar decompression software.
  3. Drag all files contained in the zip file into the tessdata folder under the GD++ installation directory.
  4. In GD++, re-select the language(s) for the engine.
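Steps 2 and 3 above can also be scripted. The following is a minimal sketch using Python's standard `zipfile` module; the function name and the example paths are hypothetical, and flattening any sub-folders inside the archive is an assumption about how the language packs are laid out. Step 4 (re-selecting the languages) still has to be done in the GD++ Preferences dialog.

```python
import zipfile
from pathlib import Path


def install_language_pack(zip_path, tessdata_dir):
    """Copy every file in the archive into tessdata_dir, flattening sub-folders."""
    target = Path(tessdata_dir)
    target.mkdir(parents=True, exist_ok=True)
    installed = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # only files are copied
            # Keep just the file name, dropping any folder prefix inside the zip.
            name = Path(info.filename).name
            if not name:
                continue
            (target / name).write_bytes(archive.read(info))
            installed.append(name)
    return sorted(installed)


if __name__ == "__main__":
    import sys
    # Usage (hypothetical paths): python install_pack.py fra.zip C:/GoldenDict/tessdata
    if len(sys.argv) == 3:
        print(install_language_pack(sys.argv[1], sys.argv[2]))
```

Because only the base file name is used, existing files with the same name in tessdata are overwritten, which matches dragging the files over manually.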

The full tessdata package supports the following OCR languages:

| Chinese Simplified | Chinese-Simplified (vertical) | Chinese-Traditional |
| Afrikaans | Irish | Norwegian |
| Amharic | Galician | Occitan (post 1500) |
| Arabic | Greek, Ancient (to 1453) | Oriya |
| Assamese | Gujarati | Panjabi; Punjabi |
| Azerbaijani | Haitian; Haitian Creole | Polish |
| Azerbaijani-Cyrillic | Hebrew | Portuguese |
| Belarusian | Hindi | Pushto; Pashto |
| Bengali | Croatian | Quechua |
| Tibetan | Hungarian | Romanian; Moldavian; Moldovan |
| Bosnian | Armenian | Russian |
| Breton | Inuktitut | Sanskrit |
| Bulgarian | Indonesian | Sinhala; Sinhalese |
| Catalan; Valencian | Icelandic | Slovak |
| Cebuano | Italian | Slovak-Fraktur |
| Czech | Italian-Old | Slovenian |
| Javanese | Sindhi | Japanese (vertical) |
| Spanish; Castilian | Japanese | Spanish; Castilian-Old |
| Chinese-Traditional (vertical) | Kannada | Albanian |
| Cherokee | Georgian | Serbian |
| Corsican | Georgian-Old | Serbian-Latin |
| Welsh | Kazakh | Sundanese |
| Danish | Central Khmer | Swahili |
| Danish-Fraktur | Kirghiz; Kyrgyz | Swedish |
| German | Kurmanji (Kurdish - Latin Script) | Syriac |
| German-Fraktur | Korean | Tamil |
| Dhivehi; Divehi; Maldivian | Korean (vertical) | Tatar |
| Dzongkha | Kurdish (Arabic Script) | Telugu |
| Greek, Modern (1453-) | Kurdish (Arabic Script) | Tajik |
| English | Lao | Tagalog |
| English, Middle (1100-1500) | Latin | Thai |
| Esperanto | Latvian | Tigrinya |
| Math and equations | Lithuanian | Tonga |
| Estonian | Luxembourgish | Turkish |
| Basque | Malayalam | Uighur; Uyghur |
| Faroese | Marathi | Ukrainian |
| Persian | Macedonian | Urdu |
| Filipino; Pilipino | Maltese | Uzbek |
| Finnish | Mongolian | Uzbek-Cyrillic |
| French | Maori | Vietnamese |
| German-Fraktur | Malay | Yiddish |
| French, Middle (ca. 1400-1600) | Burmese | Yoruba |
| Western Frisian | Nepali | Dutch; Flemish |
| Scottish Gaelic; Gaelic | | |

Nicomsoft OCR

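The engine's preset configuration parameters are stored in the Config.dat file in the nsocr directory and can be edited with a text editor. For details on the parameters, see the official FAQ and help documentation.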

| Chinese Simplified | Chinese Traditional | English | Estonian |
| Bulgarian | Hungarian | Slovak | Finnish |
| Catalan | Indonesian | Slovenian | French |
| Croatian | Italian | Spanish | German |
| Czech | Latvian | Swedish | Romanian |
| Danish | Lithuanian | Turkish | Russian |
| Dutch | Norwegian | Arabic | Japanese |
| Polish | Portuguese | Korean | |

Advanced

To develop your own screen-capture OCR engine, see the article on the GoldenDict++ plugin interface definition.