How can I convert DjVu to PDF and preserve the table of contents?

Question

user1198559 2014-08-23 at 06:36

I tried several online and offline tools, but the table of contents (TOC) information was not preserved during the conversion.

I would like to convert a 5000-page Finnish dictionary that is in DjVu format and has about 5000 TOC entries, structured hierarchically for fast word lookup.

Any idea how the TOC information can be preserved during DjVu-to-PDF conversion?

2 answers

user3124688 2015-05-25 at 16:27

Based on the very clear outline given by user @pyrocrasty (thanks!), I implemented a DJVU-to-PDF converter that preserves both the OCR text and the bookmark structure. You can find it here:

https://github.com/kcroker/dpsprep

Credit for the OCR data goes to @zetah on the Ubuntu forums!

Comment: I had a DJVU file with non-numeric text in the bookmark page-number fields, so the parser could not read them. I replaced `j.split('#')[1]` with `(int(re.findall(r'\d+', j.split('#')[1])[0]) + 1)` and it worked fine. On Debian Jessie I needed: `sudo apt-get install pdftk djvulibre-bin python-pip ruby ruby-dev libmagickwand-dev; sudo pip install sexpdata; sudo gem install iconv pdfbeads`

Accepted Answer · 2015-05-15 20:51:28

update: user3124688 has coded up this process in the script dpsprep.

I don't know of any tools that will do the conversion for you. You certainly should be able to do it, but it might take a little work. I'll outline the basic process. You'll need the open source command line utilities pdftk and djvused (part of DjVuLibre). These are available from your package manager (GNU/Linux) or their websites (Windows, OS X).

step 1: convert the file text

First, use any tool to convert the DJVU file to a PDF (without bookmarks).

Suppose the files are called filename.djvu and filename.pdf.
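
One option for this step, assuming DjVuLibre is installed, is its own `ddjvu` tool, which can write PDF via `-format=pdf`. As far as I know the resulting PDF contains page images only (no text layer), so take this as an illustration of the step rather than a recommendation. The helper names below are made up for the example.

```python
import subprocess


def ddjvu_pdf_command(djvu_path, pdf_path):
    """Build the argument list for a plain DjVu-to-PDF conversion with ddjvu."""
    return ["ddjvu", "-format=pdf", djvu_path, pdf_path]


def convert_djvu_to_pdf(djvu_path, pdf_path):
    """Run ddjvu; raises CalledProcessError if the conversion fails."""
    subprocess.run(ddjvu_pdf_command(djvu_path, pdf_path), check=True)
```

Calling `convert_djvu_to_pdf("filename.djvu", "filename.pdf")` would produce the bookmark-less PDF used in the remaining steps.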

step 2: extract DJVU outline

Next, output the DJVU outline data to a file, like this:

```
djvused "filename.djvu" -e 'print-outline' > bmarks.out
```

This file lists the DJVU document's bookmarks in a serialized tree format. In fact it's just a SEXPR (S-expression), and can be easily parsed. The format is as follows:

```
file     ::= (bookmarks <bookmark>*)
bookmark ::= (name page <bookmark>*)
name     ::= "<character>*"
page     ::= "#<digit>+"
```

For example:

```
(bookmarks
  ("bmark1" "#1")
  ("bmark2" "#5"
    ("bmark2subbmark1" "#6")
    ("bmark2subbmark2" "#7"))
  ("bmark3" "#9" ...))
```
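
To show how little parsing is needed, here is a minimal Python sketch of a reader for this format. It is not what dpsprep does (that script uses the sexpdata package); the function name `parse_sexpr` is made up, and it only understands parentheses, double-quoted strings with backslash-escaped characters, and bare symbols such as `bookmarks`. It does not decode octal escapes, which djvused may emit for non-ASCII text.

```python
def parse_sexpr(text):
    """Parse a minimal S-expression into nested Python lists of strings."""
    stack = [[]]  # stack[0] collects the top-level expressions
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "(":
            stack.append([])  # start a new nested list
            i += 1
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)  # attach finished list to its parent
            i += 1
        elif ch == '"':
            i += 1  # skip the opening quote
            chars = []
            while i < n and text[i] != '"':
                if text[i] == "\\" and i + 1 < n:
                    i += 1  # take the escaped character literally
                chars.append(text[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated string")
            i += 1  # skip the closing quote
            stack[-1].append("".join(chars))
        else:
            # Bare symbol: read up to whitespace, a parenthesis or a quote.
            j = i
            while j < n and not text[j].isspace() and text[j] not in '()"':
                j += 1
            stack[-1].append(text[i:j])
            i = j
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


sample = '(bookmarks ("bmark1" "#1") ("bmark2" "#5" ("bmark2subbmark1" "#6")))'
outline = parse_sexpr(sample)[0]  # first (and only) top-level expression
print(outline[0])   # the leading symbol
print(outline[1:])  # the list of top-level bookmarks
```

The parser uses an explicit stack instead of recursion, so deeply nested outlines should not be a problem.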

step 3: convert DJVU outline to PDF metadata format

Now we need to convert these bookmarks into the format pdftk uses for PDF metadata. The bookmark part of that file has the following format:

```
file  ::= <entry>*
entry ::= BookmarkBegin
          BookmarkTitle: <title>
          BookmarkLevel: <number>
          BookmarkPageNumber: <number>
title ::= <character>*
```

So our example would become:

```
BookmarkBegin
BookmarkTitle: bmark1
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: bmark2
BookmarkLevel: 1
BookmarkPageNumber: 5
BookmarkBegin
BookmarkTitle: bmark2subbmark1
BookmarkLevel: 2
BookmarkPageNumber: 6
BookmarkBegin
BookmarkTitle: bmark2subbmark2
BookmarkLevel: 2
BookmarkPageNumber: 7
BookmarkBegin
BookmarkTitle: bmark3
BookmarkLevel: 1
BookmarkPageNumber: 9
```

Basically, you just need to write a script to walk the SEXPR tree, keeping track of the level, and output the name, page number and level of each entry it comes to, in the correct format.
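
A minimal Python sketch of such a walker might look like this. It assumes the outline has already been parsed into nested lists of the form `[name, page, child, ...]`, and that every page field matches `#<digit>+` as in the grammar above; a non-numeric target would make `int()` raise a `ValueError`. The function name `outline_to_pdftk` is made up for the example.

```python
def outline_to_pdftk(bookmarks, level=1):
    """Return pdftk bookmark lines for a list of [name, page, child...] entries."""
    lines = []
    for entry in bookmarks:
        name, page, *children = entry
        lines.append("BookmarkBegin")
        lines.append(f"BookmarkTitle: {name}")
        lines.append(f"BookmarkLevel: {level}")
        # Strip the leading '#' and keep the page number as an integer.
        lines.append(f"BookmarkPageNumber: {int(page.lstrip('#'))}")
        # Children sit one level deeper and follow their parent directly.
        lines.extend(outline_to_pdftk(children, level + 1))
    return lines


# The example outline from step 2, without the elided part of bmark3.
example = [
    ["bmark1", "#1"],
    ["bmark2", "#5", ["bmark2subbmark1", "#6"], ["bmark2subbmark2", "#7"]],
    ["bmark3", "#9"],
]

print("\n".join(outline_to_pdftk(example)))
```

The recursion depth equals the nesting depth of the outline, not the number of entries, so a few thousand bookmarks should be fine.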

step 4: extract PDF metadata and splice in converted bookmarks

Once you've got the converted list, output the PDF metadata from your converted PDF file:
```
pdftk "filename.pdf" dump_data > pdfmetadata.out 
```
Now open the file and find the line that begins with `NumberOfPages:`. Insert the converted bookmarks after this line and save the result as pdfmetadata.in.
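
If you would rather not edit the file by hand, the splice can be sketched in Python as well. The function name `splice_bookmarks` and the sample metadata below are made up for illustration; real pdftk output contains more fields.

```python
def splice_bookmarks(metadata, bookmarks):
    """Return the metadata text with the bookmark lines inserted
    right after the first line starting with 'NumberOfPages:'."""
    out = []
    inserted = False
    for line in metadata.splitlines():
        out.append(line)
        if not inserted and line.startswith("NumberOfPages:"):
            out.extend(bookmarks.splitlines())
            inserted = True
    if not inserted:
        raise ValueError("no 'NumberOfPages:' line found")
    return "\n".join(out) + "\n"


# Made-up sample input, only to show where the bookmarks end up.
sample_metadata = (
    "InfoBegin\n"
    "InfoKey: Title\n"
    "InfoValue: Example\n"
    "NumberOfPages: 12\n"
    "PageMediaBegin\n"
)
sample_bookmarks = (
    "BookmarkBegin\n"
    "BookmarkTitle: bmark1\n"
    "BookmarkLevel: 1\n"
    "BookmarkPageNumber: 1\n"
)

print(splice_bookmarks(sample_metadata, sample_bookmarks), end="")
```

In practice you would read pdfmetadata.out and the converted bookmark list from disk, pass both strings to the function, and write the result to pdfmetadata.in.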

step 5: create PDF with bookmarks

Now we can create a new PDF file incorporating this metadata:
```
pdftk "filename.pdf" update_info "pdfmetadata.in" output out.pdf 
```
The file out.pdf should be a copy of your PDF with the bookmarks imported from the DJVU file.
