GoldenDict++ OCR Setup: Screen Capture and Language Settings

English is supported by default. To recognize other languages, you need an additional language pack for the OCR engine, and the engine's data path (the directory from which it loads language data) must point to the language-pack directory. Download and unpack the required package, open GoldenDict's Preferences dialog, switch to the OCR Popup page, select an OCR engine, set the data path by clicking the button next to the engine list, then select the recognition language(s) for that engine.

All packages can be downloaded from Netdisk.

OCR and ScreenCapture Plugins

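The screen-capture and OCR plugins are stored in the gdp folder under the program's run directory (files whose names begin with gdp.gsc or gdp.ocr). The default data used by the Tesseract engine is stored in the tessdata folder under the run directory, and the default data used by the Nicomsoft engine is stored in the nsocr folder under the run directory. The engines and their data are loaded on demand and are not required for GoldenDict++ to run: when OCR popup is disabled, the program does not load any of the related modules, so they take up no extra memory or other hardware resources.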
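As a rough illustration of the folder layout described above, the following Python sketch lists the plugin files found in a gdp folder and groups them by prefix. The function name `list_plugins` and the labels "screen capture" and "OCR engine" are made up for this example; only the gdp folder name and the gdp.gsc / gdp.ocr prefixes come from the text.

```python
from pathlib import Path

# Prefixes taken from the text: gdp.gsc* (screen capture) and gdp.ocr* (OCR).
# The human-readable labels are illustrative only.
PLUGIN_PREFIXES = {"gdp.gsc": "screen capture", "gdp.ocr": "OCR engine"}


def list_plugins(run_dir):
    """Return {label: [file names]} for plugin files found in <run_dir>/gdp."""
    found = {label: [] for label in PLUGIN_PREFIXES.values()}
    gdp_dir = Path(run_dir) / "gdp"
    if not gdp_dir.is_dir():
        return found  # no plugin folder, so nothing to report
    for entry in sorted(gdp_dir.iterdir()):
        if not entry.is_file():
            continue
        for prefix, label in PLUGIN_PREFIXES.items():
            if entry.name.startswith(prefix):
                found[label].append(entry.name)
    return found
```

Calling `list_plugins(".")` from the program's run directory would return the plugin file names grouped under the two labels, or empty lists if no gdp folder exists there.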

Name | File Name | Platform | Rating | Remarks
MacVision | gdp.ocr.macvision.* | macOS ≥10.15 | ***** | Apple’s Vision framework. Preferred and recommended on macOS Big Sur or Monterey
WinRT | gdp.ocr.winrtocr.* | Windows ≥8 | ***** | Windows.Media.Ocr. Preferred and recommended on Windows
Tesseract | gdp.ocr.tesseract.* | All | ***** | Powered by Tesseract, with hundreds of languages supported. Preferred and recommended
WeChat | gdp.ocr.wechatocr.* | Windows/Linux x64 | **** | WeChat text recognition. Installed automatically with WeChat x64
TextIn | gdp.ocr.textin.* | All | ***** | TextIn (合合) intelligent text recognition. textin.com/document/display_type_2
XfYun | gdp.ocr.xfyun.* | All | ***** | iFLYTEK text recognition. xfyun.cn/doc/words/universal_character_recognition
Youdao | gdp.ocr.youdao.* | All | ***** | Youdao AI Cloud general text recognition. ai.youdao.com/DOCSIRMA/html/ocr
Baidu | gdp.ocr.baidu.* | All | ***** | Baidu AI general text recognition. ai.baidu.com/ai-doc/OCR
Tencent | gdp.ocr.tencent.* | All | ***** | Tencent Cloud text recognition. cloud.tencent.com/document/product/866
Google | gdp.ocr.google.* | All | *** | Not tested. developers.google.cn/codelabs
Nicomsoft | gdp.ocr.nicomsoft.* | Windows | *** | Nicomsoft OCR is no longer officially maintained or updated
WidgetMask | gdp.gsc.winmask.* | Windows/Linux | ***** | Full-featured grabber that supports dynamic capture across multiple screens. Preferred on Windows and Linux
ExternalTools | gdp.gsc.fromcliboard.* | All | ***** | Screen grabber that uses external tools. Preferred on macOS and Linux
QCamera | gdp.gsc.qtcamera.* | All | *** | Camera image capture using QCamera

gdocr_config.png

(TextIn | XfYun iFLYTEK | Youdao AI Cloud | Tencent Cloud | Baidu Cloud) OCR

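Cloud OCR requires an internet connection. Before using it, you must apply for OCR API credentials (application ID, access key, and so on) on each cloud platform and enter them in the corresponding *OCR.api file.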

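Note: using these OCR APIs may require prepaid or postpaid fees to the cloud provider, billed per API call. Each popup OCR request is a separate call that recognizes only a small amount of text, so weigh the cost before relying on these services.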

Apple’s Vision OCR

The languages supported by this engine ship with recent versions of Apple's macOS and iOS (built in, no additional installation required):

macos_vision_ocr.png

WinRT OCR / Windows.Media.Ocr

The languages supported by this engine are provided by Windows. Additional languages can be installed from the system Settings:

gdocr_config.png

Tesseract OCR

By default, the GD++ release package ships with tessdata language data for English, Chinese Simplified, and Chinese Traditional.
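To see which languages are currently installed, one option is to list the language data files in the tessdata folder. The sketch below assumes that the GD++ tessdata folder follows the standard Tesseract convention of one `<code>.traineddata` file per language (for example `eng`, `chi_sim`, `chi_tra`). The function name `installed_languages` is made up for this example.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file.

    Assumes the standard Tesseract layout: <tessdata_dir>/<code>.traineddata.
    """
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    # Path.stem strips the final ".traineddata" suffix, leaving the code.
    return sorted(p.stem for p in folder.glob("*.traineddata"))
```

If that assumption holds, a default installation would be expected to report the codes for English and the two Chinese variants.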

Follow these steps to install additional OCR languages:

  1. Download the data package for the OCR language you need.
  2. Open the downloaded “.zip” file with 7-Zip or similar decompression software.
  3. Drag all files contained in the zip file into the tessdata folder under the GD++ installation directory.
  4. In GD++, re-select the recognition language(s) for the engine.
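Steps 2 and 3 above can also be scripted. The following is a minimal sketch using only the Python standard library. It copies every file in the archive directly into the target folder, flattening any sub-folders, which mirrors dragging all files into tessdata. The function name `install_language_pack` and any paths passed to it are hypothetical. Step 4 still has to be done in the GD++ Preferences dialog.

```python
import shutil
import zipfile
from pathlib import Path


def install_language_pack(zip_path, tessdata_dir):
    """Copy every file in the archive directly into tessdata_dir.

    Sub-folders inside the archive are flattened. Existing files with the
    same name are overwritten. Returns the sorted list of installed names.
    """
    target = Path(tessdata_dir)
    target.mkdir(parents=True, exist_ok=True)
    installed = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue  # skip folder entries, only files are copied
            name = Path(member.filename).name  # drop any folder prefix
            if not name:
                continue
            with archive.open(member) as src, open(target / name, "wb") as dst:
                shutil.copyfileobj(src, dst)
            installed.append(name)
    return sorted(installed)
```

Writing only the base name of each entry also keeps archive entries with unusual paths from landing outside the target folder.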

The full tessdata package supports the following OCR languages:

Chinese Simplified | Chinese-Simplified (vertical) | Chinese-Traditional
Afrikaans | Irish | Norwegian
Amharic | Galician | Occitan (post 1500)
Arabic | Greek, Ancient (to 1453) | Oriya
Assamese | Gujarati | Panjabi; Punjabi
Azerbaijani | Haitian; Haitian Creole | Polish
Azerbaijani-Cyrillic | Hebrew | Portuguese
Belarusian | Hindi | Pushto; Pashto
Bengali | Croatian | Quechua
Tibetan | Hungarian | Romanian; Moldavian; Moldovan
Bosnian | Armenian | Russian
Breton | Inuktitut | Sanskrit
Bulgarian | Indonesian | Sinhala; Sinhalese
Catalan; Valencian | Icelandic | Slovak
Cebuano | Italian | Slovak-Fraktur
Czech | Italian-Old | Slovenian
Javanese | Sindhi | Japanese (vertical)
Spanish; Castilian | Japanese | Spanish; Castilian-Old
Chinese-Traditional (vertical) | Kannada | Albanian
Cherokee | Georgian | Serbian
Corsican | Georgian-Old | Serbian-Latin
Welsh | Kazakh | Sundanese
Danish | Central Khmer | Swahili
Danish-Fraktur | Kirghiz; Kyrgyz | Swedish
German | Kurmanji (Kurdish-Latin Script) | Syriac
German-Fraktur | Korean | Tamil
Dhivehi; Divehi; Maldivian | Korean (vertical) | Tatar
Dzongkha | Kurdish (Arabic Script) | Telugu
Greek, Modern (1453-) | Kurdish (Arabic Script) | Tajik
English | Lao | Tagalog
English, Middle (1100-1500) | Latin | Thai
Esperanto | Latvian | Tigrinya
Math and equations | Lithuanian | Tonga
Estonian | Luxembourgish | Turkish
Basque | Malayalam | Uighur; Uyghur
Faroese | Marathi | Ukrainian
Persian | Macedonian | Urdu
Filipino; Pilipino | Maltese | Uzbek
Finnish | Mongolian | Uzbek-Cyrillic
French | Maori | Vietnamese
German-Fraktur | Malay | Yiddish
French, Middle (ca. 1400-1600) | Burmese | Yoruba
Western Frisian | Nepali | Dutch; Flemish
Scottish Gaelic; Gaelic

Nicomsoft OCR

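This engine's default configuration parameters are stored in the Config.dat file in the nsocr directory and can be edited with a text editor. For details on the parameters, see the official FAQ and help documentation.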

Chinese Simplified | Chinese Traditional | English | Estonian
Bulgarian | Hungarian | Slovak | Finnish
Catalan | Indonesian | Slovenian | French
Croatian | Italian | Spanish | German
Czech | Latvian | Swedish | Romanian
Danish | Lithuanian | Turkish | Russian
Dutch | Norwegian | Arabic | Japanese
Polish | Portuguese | Korean

Advanced

To develop your own screen-capture OCR engine, refer to the article on the GoldenDict++ plugin interface definition.