| Author |
Message |
wpd
Joined: 23 Aug 2007 Posts: 10
|
Posted: Tue Oct 23, 2007 1:24 pm Post subject: Column mode patch |
|
|
These patches add a new column mode, in addition to the page/continuous modes, in portrait view to ipdf (irex rev 25 / stock 2.11).
ipdf
ipdf-nologging
ipdf.patch
poppler.patch
ipdf-column.tgz
Please note that a PDF does not specify anything about words, lines, columns or text flows. The poppler library, which ipdf uses to render PDF files, has a text output device that uses a number of heuristics to determine words, lines and columns. This text output device is also used by the poppler tool pdftotext.
With a small interface adjustment in poppler (poppler.patch), ipdf can query poppler about rendered text columns (ipdf.patch).
Features:
- "column mode" icon is hidden under page/continuous mode when in portrait view
- Works best when zoomed in on a (set of) columns
- Every textbox with 2 lines or more, with the current rotation, is considered a column
- Short flipbar next/previous = move to next/previous column
- Long flipbar next/previous = move to next/previous page; first/last column
- Tap on column = move to tapped column
- Columns that are completely in view are skipped over when considering next/previous column. This allows you to read an entire portrait page before flipping, instead of flipping for each column individually.
- Current column is identified using a black rectangle; note that you can configure this away in the source. I left it in so people can see in what order columns are identified and processed.
- Extensive logging; try ipdf-nologging or configure the source to remove it.
- Column mode and current column are saved and loaded in manifests.
- Prerendering changes in column mode: only the previous and next page are prerendered. Pages are much larger at high zoom levels, and I experienced a lot of page-render thrashing when using the default algorithm. The default algorithm is still used when navigating in page/continuous mode.
- Pages in high demand are less likely to be freed when the rendering wants to free up memory for new pages.
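To illustrate the "columns that are completely in view are skipped" rule from the list above, here is a minimal sketch in Python (not the actual ipdf source, which is not shown in this post). The `Rect` type and the `next_column` helper are hypothetical names; columns are assumed to be given as rectangles in page coordinates, already in reading order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; (x0, y0) is top-left, (x1, y1) is bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies completely inside this rectangle."""
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)


def next_column(columns: List[Rect], current: int, viewport: Rect) -> Optional[int]:
    """Return the index of the next column that is not already fully visible.

    Returns None when no such column is left on the page, which a caller
    could treat as "flip to the next page".
    """
    for i in range(current + 1, len(columns)):
        if not viewport.contains(columns[i]):
            return i
    return None
```

With two side-by-side columns fully on screen and a third one below the fold, a short flip from the first column would jump straight to the third.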
Bugfixes since 20071023:
- getColumnCount/getColumnRect did not use the same line-count comparison, which made column-back to the previous page fail now and again (daudi)
- waitForPage calls renderPageNow, terminating current page render and making the switch to an unrendered next page quicker.
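The first bugfix above is the classic "two code paths, two slightly different tests" problem. A minimal Python sketch of the idea behind the fix, assuming each text box is represented only by its line count: both the counting and the lookup go through one shared predicate. The names `is_column`, `get_column_count` and `get_column_box` are hypothetical stand-ins, not the real getColumnCount/getColumnRect code.

```python
from typing import Sequence

# From the feature list: a text box with 2 lines or more counts as a column.
MIN_LINES = 2


def is_column(line_count: int) -> bool:
    """Single place where the line-count comparison lives."""
    return line_count >= MIN_LINES


def get_column_count(line_counts: Sequence[int]) -> int:
    """Number of text boxes on the page that qualify as columns."""
    return sum(1 for n in line_counts if is_column(n))


def get_column_box(line_counts: Sequence[int], k: int) -> int:
    """Index of the text box that is the k-th column (0-based)."""
    seen = 0
    for i, n in enumerate(line_counts):
        if is_column(n):
            if seen == k:
                return i
            seen += 1
    raise IndexError("column index out of range")
```

Because both functions share `is_column`, asking for column `get_column_count(...) - 1` (the last column, as when flipping back to a previous page) always resolves to an existing box.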
How to try:
Included are an ipdf and an ipdf-nologging. These ARM binaries have the modified poppler library linked in statically, i.e. you can give one of them a go by simply copying it to /usr/bin/ipdf.
When using ipdf, try
/sbin/syslogd -s 500 -b 0 -O log
and take a look at the file 'log' to see what it is doing.
PS: the static linking clobbers a security handler in poppler. My guess is that password-protected PDFs will fail with this ipdf. This goes away if you compile from the source (ipdf-column.tgz) and copy both the new poppler library and the new ipdf to your iliad.
Last edited by wpd on Sun Oct 28, 2007 10:25 pm; edited 1 time in total |
|
|
 |
jharker
Joined: 25 Apr 2007 Posts: 281 Location: Rochester, NY, USA
|
Posted: Tue Oct 23, 2007 5:07 pm Post subject: |
|
|
| Wow. This is phenomenal. Fantastic, fantastic work. I'll try it later today, and report back. |
|
|
 |
daudi
Joined: 12 Aug 2007 Posts: 237 Location: Newcastle upon Tyne,UK
|
Posted: Tue Oct 23, 2007 5:54 pm Post subject: |
|
|
OK, this is very, very nice! I have just had a quick play with it and I can't wait to get stuck into some heavy reading tomorrow.
I am now playing with 3 versions of ipdf, so I have modified the installer posted a while ago (the one that switched between two versions of ipdf). I now have a directory of installers and can switch between versions at will. I don't yet keep track of which version is installed, but it would not take much to add that (just a little time, which I don't have just now).
In case this is useful for anyone else I have posted it here:
http://davepublic.pbwiki.com/f/ipdf_installers.zip (500Kb)
_________________
My thoughts as I considered whether or not to buy an Iliad and things I've learned after getting one: http://davepublic.pbwiki.com/Great+expectations |
|
|
 |
daudi
Joined: 12 Aug 2007 Posts: 237 Location: Newcastle upon Tyne,UK
|
Posted: Tue Oct 23, 2007 8:44 pm Post subject: |
|
|
I made a mistake when I said this was nice. This is not just nice, this is wonderful! I have tried it out on several journal articles and the Guardian PDF newspaper. This is a major feature for me. Sometimes the flip bar takes me to the wrong place (e.g. if the left column starts with a graphic so that the right column starts higher up the page), but that is easy to deal with by using the stylus to select the correct column section.
Awesome. Totally awesome. Thank you very much wpd! |
|
|
 |
wpd
Joined: 23 Aug 2007 Posts: 10
|
Posted: Thu Oct 25, 2007 3:35 pm Post subject: |
|
|
@daudi - Glad you like it; from the forums I gathered that you and I had the same "column itch"
I think there is at least one bug in there still: sometimes ipdf does not close when I want it to. I guess it has something to do with waiting for a page to be rendered while the renderer has already gone away. A temporary workaround: hitting the up and/or system key a couple of times does make it go away eventually.
Anyway - any bugs + descriptions on how to hit them are much appreciated. |
|
|
 |
emkay
Joined: 03 Aug 2006 Posts: 74 Location: London UK
|
Posted: Thu Oct 25, 2007 4:55 pm Post subject: |
|
|
Great work - I've been spending quite a lot of time reformatting my two-page-on-view pdfs. I'll definitely be checking this out.
Thanks! |
|
|
 |
jharker
Joined: 25 Apr 2007 Posts: 281 Location: Rochester, NY, USA
|
Posted: Sat Oct 27, 2007 8:09 am Post subject: |
|
|
I finally had a chance to try this out on some journal articles, and I have to say, it's very very nice. Excellent work, very slick.
I have one bug and one idea for improvement: the bug is, if there's more than one box per column, it doesn't always get them in the right order. Sometimes a box in the second column occurs before a box in the first column. It seems like the way to fix that would be to simply order columns from left to right and top to bottom...? Or maybe it's more complicated than that?
Also, you could set it to auto-zoom to fit the width of the current column. Currently I have a problem with abstracts, since when I zoom to a good level for a column width, the abstract often becomes wider than the screen. Of course, if columns have comparable widths it could skip the zoom to save time.
Anyway, really excellent work! Thanks a lot! |
|
|
 |
daudi
Joined: 12 Aug 2007 Posts: 237 Location: Newcastle upon Tyne,UK
|
Posted: Sat Oct 27, 2007 9:04 am Post subject: |
|
|
I've been using this intensively for a few days now and if I had to choose between 20 hours of battery life or this feature I'd choose this in an instant.
I think it sometimes gets the first box on the page wrong when the first box in the right-hand column starts higher up the page than the first box in the left column. This is easy to fix using the stylus, assuming you have a bit of the adjacent column visible. Which brings me to:
| jharker wrote: | | Also, you could set it to auto-zoom to fit the width of the current column. Currently I have a problem with abstracts, since when I zoom to a good level for a column width, the abstract often becomes wider than the screen. Of course, if columns have comparable widths it could skip the zoom to save time. |
I am not so sure that always automatically resizing to the current column width would be what I'd want. Sometimes the column can be very narrow indeed (e.g. the Telegraph newspaper PDF that I read to test how well it handled 5 columns; it handled them very well). Full zoom would be silly in this case. Also, if it does get it wrong, having part of the neighbouring column visible means that it is easy to click in the correct column to fix things. And if you want to skip ahead, it would be a pain to have to use the flip bar several times to progress through multiple columns when you could use the pen to select an adjacent column. So I'd like to be able to tell it not to auto-zoom. Perhaps it could be a setting stored in the manifest?
I found a little bug that I noticed last night and need to test more to define it more precisely. I was reading the guardian top stories pdf and was on page 3. Using the flip bar to go back to page 2 took me to the first page instead. I'm not sure if it is something special about this particular PDF or any time I go back. I'll try to test it a little more to characterize it better. |
|
|
 |
tribble

Joined: 04 Aug 2006 Posts: 645 Location: Bonn, Germany
|
|
|
 |
wpd
Joined: 23 Aug 2007 Posts: 10
|
Posted: Sat Oct 27, 2007 1:00 pm Post subject: |
|
|
| jharker wrote: | | I have one bug and one idea for improvement: the bug is, if there's more than one box per column, it doesn't always get them in the right order. |
The problem is that there is no way to determine "the" correct column sequence using only the information that can be found in the pdf. For some pdfs we would want xy ordering, and for others yx (the latter being the default in poppler). We could allow users to specify column ordering using some kind of UI - any suggestions? (icon/gesture?)
| jharker wrote: | | Also, you could set it to auto-zoom to fit the width of the current column. |
Agreed, but this should be configurable too. I held off for now because re-rendering a page when the zoom changes can mean a long wait (some pages set me back more than a minute). I would like to tie this in with a renderer rewrite that allows multiple zooms of different pages to be cached. We could then analyse a new page for columns, determine the (two?) optimal zooms and schedule them for rendering. |
|
|
 |
jharker
Joined: 25 Apr 2007 Posts: 281 Location: Rochester, NY, USA
|
Posted: Sat Oct 27, 2007 11:37 pm Post subject: |
|
|
Regarding auto-zoom... Honestly, I was thinking we'd use the zoom bug as a feature: since there's an upper limit on the max zoom, zooming to fit a column width wouldn't actually zoom all the way: in most cases you'd see the stuff around the column too. But I agree that's kind of a hack. I think a better way would be to compare the widths of all the columns on a page and determine one or two zoom levels that would satisfy all the column widths. That way if two columns have similar widths, they would use the same zoom and this would save time.
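The "compare the widths of all the columns and settle on one or two zoom levels" idea could look roughly like the Python sketch below. This is only an illustration of the suggestion, not anything in the patch: `zoom_levels`, the 15% `tolerance` and the screen width used in the example are all assumptions. Columns whose widths are close to each other share the zoom that fits the widest of them, so no re-render is needed when moving between them.

```python
from typing import Dict, Iterable, Optional


def zoom_levels(widths: Iterable[float],
                screen_width: float,
                tolerance: float = 0.15) -> Dict[float, float]:
    """Map each column width to a fit-to-width zoom shared with similar widths.

    Widths are visited from widest to narrowest. A width stays in the current
    group while it is within `tolerance` of the group's widest column;
    otherwise it starts a new group. Every group zooms to fit its widest
    member, so all members fit on screen.
    """
    levels: Dict[float, float] = {}
    group_widest: Optional[float] = None
    for w in sorted(set(widths), reverse=True):
        if group_widest is None or w < group_widest * (1.0 - tolerance):
            group_widest = w  # too narrow for the current group: start a new one
        levels[w] = screen_width / group_widest
    return levels


# Example: one wide abstract (500) and three body columns of similar width,
# on an assumed 768-pixel-wide screen. Only two distinct zooms come out.
example = zoom_levels([240, 235, 500, 250], screen_width=768)
```

Nothing here caps the result at two levels; a page with very mixed widths would produce more, and a real implementation would probably want a limit.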
I agree, in the long run if we re-write the pre-rendering stuff, we can include things like rendering multiple zoom levels and not have to worry about zooming delays.
I'm having a hard time thinking of an example where xy ordering makes sense. At least in English and Western languages, I would think that yx would be the preferred order at least 99% of the time... am I missing something?
But in any case, what I meant to say was that, assuming it's set to do yx by default right now, it seems a bit buggy.
Since we're getting into bug discussions now, I want to repeat that I really REALLY like this mod. Any critiques are aimed at making it even better than it already is!  |
|
|
 |
wpd
Joined: 23 Aug 2007 Posts: 10
|
Posted: Sun Oct 28, 2007 10:29 pm Post subject: |
|
|
| daudi wrote: | | I found a little bug that I noticed last night and need to test more to define it more precisely. I was reading the guardian top stories pdf and was on page 3. Using the flip bar to go back to page 2 took me to the first page instead. I'm not sure if it is something special about this particular PDF or any time I go back. I'll try to test it a little more to characterize it better. |
Thanks for including the pdf in which this happened - turns out I was using different compares for linecount when counting columns vs traversing columns. Thread top has a new version. |
|
|
 |
wpd
Joined: 23 Aug 2007 Posts: 10
|
Posted: Sun Oct 28, 2007 10:54 pm Post subject: |
|
|
| jharker wrote: | | Regarding auto-zoom... Honestly, I was thinking we'd use the zoom bug as a feature: since there's an upper limit on the max zoom, zooming to fit a column width wouldn't actually zoom all the way: in most cases you'd see the stuff around the column too. |
Good idea. We won't be able to do anything about maxzoom anyway; it is there to protect us from zooming in to the point where a single page render fills up all of the RAM.
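As a rough back-of-the-envelope illustration of why a zoom cap is needed: the bitmap for a page grows with the square of the zoom. The Python sketch below assumes a page measured in points, a zoom of 1.0 meaning one pixel per point, and a given number of bytes per pixel; the function names and the numbers in the example are made up, and the real limit used by ipdf is not stated in this thread.

```python
import math


def render_bytes(page_w_pt: float, page_h_pt: float, zoom: float,
                 bytes_per_pixel: int = 1) -> int:
    """Approximate size of the rendered bitmap for one page at a given zoom."""
    width_px = math.ceil(page_w_pt * zoom)
    height_px = math.ceil(page_h_pt * zoom)
    return width_px * height_px * bytes_per_pixel


def max_zoom(page_w_pt: float, page_h_pt: float, budget_bytes: int,
             bytes_per_pixel: int = 1) -> float:
    """Largest zoom whose bitmap still fits in `budget_bytes` (ignoring rounding)."""
    return math.sqrt(budget_bytes / (page_w_pt * page_h_pt * bytes_per_pixel))


# Doubling the zoom quadruples the memory needed for a single page.
small = render_bytes(600, 800, 2.0)
large = render_bytes(600, 800, 4.0)
```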
| jharker wrote: | | I'm having a hard time thinking of an example where xy ordering makes sense. |
I'm having a hard time too, now that I think about it.
For 4 disjoint columns
a b
c d
xy would mean
1 3
| / |
2 4
yx would mean
1-2
/
3-4
If you're not taking other pdf objects (like horizontal bars separating different articles within, say a magazine) into account, both setups can be used, right?
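To make the two orderings in the diagrams above concrete, here is a small Python sketch that sorts column boxes by their top-left corner. It is a naive illustration only: poppler's own flow guessing is considerably more involved, and `order_columns` and the coordinates are made up for the example.

```python
from typing import Dict, List, Tuple


def order_columns(boxes: Dict[str, Tuple[float, float]], mode: str) -> List[str]:
    """Order column names by the (x, y) position of their top-left corner.

    "xy": sort by x first, then y (down the left side, then the right side).
    "yx": sort by y first, then x (across the top row, then the bottom row).
    """
    if mode == "xy":
        key = lambda name: (boxes[name][0], boxes[name][1])
    elif mode == "yx":
        key = lambda name: (boxes[name][1], boxes[name][0])
    else:
        raise ValueError("mode must be 'xy' or 'yx'")
    return sorted(boxes, key=key)


# The 2x2 layout from the diagram: a b on top, c d below.
layout = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1)}
xy_order = order_columns(layout, "xy")  # a, c, b, d
yx_order = order_columns(layout, "yx")  # a, b, c, d
```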
| jharker wrote: | | But in any case, what I meant to say was that, assuming it's set to do yx by default right now, it seems a bit buggy. |
I had grand plans for implementing all kinds of heuristics, but noticed that poppler already does text flow guessing. Right now, that is what we are using. We could build in some more column dumping to let us see why a particular column ends up in a particular place in a flow, and try to take it from there. |
|
|
 |
daudi
Joined: 12 Aug 2007 Posts: 237 Location: Newcastle upon Tyne,UK
|
Posted: Wed Oct 31, 2007 1:45 pm Post subject: |
|
|
Hi wpd,
I forgot to thank you for the update that made the back-flip work with the guardian, so thanks!
I now have another PDF that is misbehaving. I don't know if there is a problem with the PDF or if there is something about it that might help improve your algorithm. Are you able to access this pdf? Just looking at it, it looks like it should be straightforward enough to determine where the next text block is, but for some reason it gets confused when forward- or back-flipping in a number of places. For example, just go to page 2, view it at full size so you see the whole page, select column mode, and flip forward to begin your magical mystery tour of the article. |
|
|
 |
daudi
Joined: 12 Aug 2007 Posts: 237 Location: Newcastle upon Tyne,UK
|
Posted: Wed Oct 31, 2007 4:04 pm Post subject: |
|
|
Another article from the same journal (Diabetic Medicine) and again there are problems following the text flow. Entire blocks of text are skipped. It is not as bad as the one I linked to above, though.
Both articles seem to have been produced using Acrobat Distiller 5.0.5 for the Mac; one was PDF version 1.3, the other PDF version 1.5. I've no idea if these details are important. |
|
|
 |
|