Where is the decoded text in the pocketsphinx output?

mdam

I want to convert a .wav file to text on an Intel Edison board. I followed this thread and ran pocketsphinx_continuous -infile, as it suggests. The command produces a long CLI output, and I'm not sure how to extract the recognized text from it. Can anyone help?

```
root@edison:/# pocketsphinx_continuous -infile /usr/share/sounds/alsa/Front_Right.wav
INFO: cmd_ln.c(691): Parsing command line:
pocketsphinx_continuous \
    -infile /usr/share/sounds/alsa/Front_Right.wav

Current configuration:
[NAME] [DEFLT] [VALUE]
-adcdev
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm
-infile /usr/share/sounds/alsa/Front_Right.wav
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-time no no
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02

INFO: cmd_ln.c(691): Parsing command line:
\
    -nfilt 20 \
    -lowerf 1 \
    -upperf 4000 \
    -wlen 0.025 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -svspec 0-12/13-25/26-38 \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -cmninit 56,-3,1 \
    -varnorm no

Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 56,-3,1
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.000000e+00
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-remove_dc no yes
-round_filters yes no
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.500000e-02

INFO: acmod.c(246): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(513): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(903): Loading senones from dump file /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(927): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1022): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1296): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(317): Allocating 137543 * 20 bytes (2686 KiB) for word entries
INFO: dict.c(332): Reading main dictionary: /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic
INFO: dict.c(211): Allocated 1010 KiB for strings, 1664 KiB for phones
INFO: dict.c(335): 133436 words read
INFO: dict.c(341): Reading filler dictionary: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=5001, 2=436879, 3=418286
INFO: ngram_model_dmp.c(242): 5001 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288): 436879 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314): 418286 = LM.trigrams read
INFO: ngram_model_dmp.c(339): 37293 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359): 14370 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379): 36094 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407): 854 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463): 5001 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 788 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 13428
INFO: ngram_search_fwdtree.c(338): after: 457 root, 13300 non-root channels, 26 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(371): pocketsphinx_continuous COMPILED ON: May 11 2016, AT: 01:08:03

INFO: ngram_search.c(474): Resized backpointer table to 10000 entries
INFO: ngram_search.c(482): Resized score stack to 200000 entries
INFO: cmn_prior.c(121): cmn_prior_update: from < 56.00 -3.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 40.70 3.65 2.47 -0.18 1.26 0.52 0.85 0.40 -0.07 0.56 0.30 0.10 0.59 >
INFO: ngram_search_fwdtree.c(1549): 6629 words recognized (25/fr)
INFO: ngram_search_fwdtree.c(1551): 960065 senones evaluated (3609/fr)
INFO: ngram_search_fwdtree.c(1553): 1491379 channels searched (5606/fr), 119734 1st, 172330 last
INFO: ngram_search_fwdtree.c(1557): 12770 words for which last channels evaluated (48/fr)
INFO: ngram_search_fwdtree.c(1560): 165129 candidate words for entering last phone (620/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 4.05 CPU 1.523 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 4.10 wall 1.541 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 146 words
INFO: ngram_search_fwdflat.c(937): 3683 words recognized (14/fr)
INFO: ngram_search_fwdflat.c(939): 249390 senones evaluated (938/fr)
INFO: ngram_search_fwdflat.c(941): 324546 channels searched (1220/fr)
INFO: ngram_search_fwdflat.c(943): 16896 words searched (63/fr)
INFO: ngram_search_fwdflat.c(945): 9422 word transitions (35/fr)
INFO: ngram_search_fwdflat.c(948): fwdflat 0.55 CPU 0.207 xRT
INFO: ngram_search_fwdflat.c(951): fwdflat 0.56 wall 0.211 xRT
INFO: ngram_search.c(1214): </s> not found in last frame, using <sil>.264 instead
INFO: ngram_search.c(1266): lattice start node <s>.0 end node <sil>.236
INFO: ngram_search.c(1294): Eliminated 17 nodes before end node
INFO: ngram_search.c(1399): Lattice has 317 nodes, 715 links
INFO: ps_lattice.c(1365): Normalizer P(O) = alpha(<sil>:236:264) = -1833242
INFO: ps_lattice.c(1403): Joint P(O,S) = -1847205 P(S|O) = -13963
INFO: ngram_search.c(888): bestpath 0.05 CPU 0.019 xRT
INFO: ngram_search.c(891): bestpath 0.06 wall 0.021 xRT
000000000: who do
INFO: ngram_search_fwdtree.c(430): TOTAL fwdtree 4.05 CPU 1.528 xRT
INFO: ngram_search_fwdtree.c(433): TOTAL fwdtree 4.10 wall 1.547 xRT
INFO: ngram_search_fwdflat.c(174): TOTAL fwdflat 0.55 CPU 0.208 xRT
INFO: ngram_search_fwdflat.c(177): TOTAL fwdflat 0.56 wall 0.212 xRT
INFO: ngram_search.c(317): TOTAL bestpath 0.05 CPU 0.019 xRT
INFO: ngram_search.c(320): TOTAL bestpath 0.06 wall 0.021 xRT
```

1 answer

Nikolay Shmyrev

To disable pocketsphinx's debug output, add the option -logfn /dev/null. Pocketsphinx will then print only the decoded text. In your case, it will print:

 000000000: who do 
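If you want the text inside a program on the Edison, the following sketch runs the same command with -logfn /dev/null and keeps only the words after the ID prefix. It assumes Python 3.5+ (for subprocess.run) and that pocketsphinx_continuous is on the PATH. The helper names build_command, parse_text and transcribe are hypothetical and are not part of pocketsphinx.

```python
import re
import subprocess
import sys
from typing import List

# Assumed hypothesis line format, as in the output above: "000000000: who do"
HYP_LINE = re.compile(r"^\d+:\s*(.*?)\s*$")


def build_command(wav_path: str) -> List[str]:
    # -infile comes from the question and -logfn /dev/null from this answer;
    # sending the log to /dev/null should leave only the hypothesis on stdout.
    return ["pocketsphinx_continuous", "-infile", wav_path, "-logfn", "/dev/null"]


def parse_text(stdout: str) -> str:
    # Drop the numeric ID prefix and join the words of all utterances.
    words = [m.group(1) for m in map(HYP_LINE.match, stdout.splitlines()) if m]
    return " ".join(w for w in words if w)


def transcribe(wav_path: str) -> str:
    # Raises CalledProcessError if pocketsphinx_continuous exits with an error.
    result = subprocess.run(
        build_command(wav_path),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_text(result.stdout)


if __name__ == "__main__":
    print(transcribe(sys.argv[1]))
```

For example, running `python3 transcribe.py /usr/share/sounds/alsa/Front_Right.wav` (the script name is hypothetical) should print just the recognized words.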
