iRex Forum

 
Column mode patch

wpd
Joined: 23 Aug 2007
Posts: 10

Posted: Tue Oct 23, 2007 1:24 pm    Post subject: Column mode patch

These patches add a new column mode, in addition to the page/continuous modes, in portrait view to ipdf irex rev 25 / stock 2.11.
ipdf
ipdf-nologging
ipdf.patch
poppler.patch
ipdf-column.tgz

Please note that a pdf does not specify anything regarding text words / lines / columns / flows. The poppler library, which ipdf uses to render pdf files, has a text output device that uses a number of heuristics to determine words, lines and columns. This text output device is also used in the poppler tool pdftotext.

With a small interface adjustment in poppler (poppler.patch), ipdf can query poppler about rendered text columns (ipdf.patch).

Features:
- "column mode" icon is hidden under page/continuous mode when in portrait view
- Works best when zoomed in on a (set of) columns
- Every textbox with 2 lines or more, with the current rotation, is considered a column
- Short flipbar next/previous = move to next/previous column
- Long flipbar next/previous = move to next/previous page; first/last column
- Tap on column = move to tapped column
- Columns that are completely in view are skipped over when considering next/previous column. This allows you to read an entire portrait page before flipping, instead of flipping for each column individually.
- Current column is identified using a black rectangle; note that you can configure this away in the source. I left it in to allow people to see in what order columns are identified and processed.
- Extensive logging; try ipdf-nologging or configure the source to remove it.
- Column mode and current column are saved and loaded in manifests.
- Prerendering changes in column mode; only the previous and next page are prerendered. Pages are much larger at high zoom levels and I experienced a lot of page-render thrashing when using the default algorithm. The default algorithm is still used when navigating in page/continuous mode.
- Pages in high demand are less likely to be freed when the rendering wants to free up memory for new pages.
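
The skip rule for columns that are completely in view could look roughly like the sketch below. It is only an illustration in Python, not the code from ipdf.patch: the `Rect` type, the `next_column` name and the sample coordinates are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page pixels (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Rect") -> bool:
        # True when `other` lies completely inside this rectangle.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)


def next_column(columns: List[Rect], current: int, viewport: Rect) -> Optional[int]:
    """Return the index of the next column that is not completely in view.

    Columns after `current` that already fit inside the viewport are skipped.
    None means there is no such column left on this page, so the caller
    would move on to the next page.
    """
    for i in range(current + 1, len(columns)):
        if not viewport.contains(columns[i]):
            return i
    return None


# Two columns side by side at the top of the page, one further down.
columns = [Rect(0, 0, 100, 400), Rect(120, 0, 220, 400), Rect(0, 420, 100, 800)]
viewport = Rect(0, 0, 230, 410)
print(next_column(columns, 0, viewport))
```

With this viewport the second column is already fully visible, so a short flip from the first column lands on the third one.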

Bugfixes since 20071023:
- getColumnCount/getColumnRect did not use the same line count comparison, which made stepping back a column to the previous page fail now and again (daudi)
- waitForPage calls renderPageNow, terminating current page render and making the switch to an unrendered next page quicker.
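
To illustrate the first fix: one way to keep counting and traversal in agreement is to route both through a single predicate. The Python below is a hypothetical sketch, not the patched C++; `is_column`, `column_count`, `column_box_index` and the per-box line counts are invented for the example, and only the 2-line threshold comes from the feature list above.

```python
from typing import List

MIN_LINES = 2  # "every textbox with 2 lines or more is considered a column"


def is_column(line_count: int) -> bool:
    """Single source of truth for what counts as a column."""
    return line_count >= MIN_LINES


def column_count(box_line_counts: List[int]) -> int:
    """Number of text boxes on the page that qualify as columns."""
    return sum(1 for n in box_line_counts if is_column(n))


def column_box_index(box_line_counts: List[int], column: int) -> int:
    """Map a column number (0-based) to the index of its text box."""
    seen = -1
    for box_index, n in enumerate(box_line_counts):
        if is_column(n):
            seen += 1
            if seen == column:
                return box_index
    raise IndexError("no such column on this page")


# Line counts of the text boxes on a page (sample data).
boxes = [1, 2, 5, 1, 3]
last = column_count(boxes) - 1
print(column_box_index(boxes, last))
```

Because both functions share `is_column`, the last column reported by the count can always be reached by the traversal, which is what stepping back to the previous page relies on.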

How to try:

Included are an ipdf and an ipdf-nologging binary. These ARM binaries have the modified poppler library linked in statically, i.e. you can give one of them a go by simply copying it to /usr/bin/ipdf.

When using ipdf, try

/sbin/syslogd -s 500 -b 0 -O log

and take a look at the file 'log' to see what it is doing.

PS: the static linking clobbers a security handler in poppler. My guess is that password-protected pdfs will fail with this ipdf. This goes away if you compile from the source (ipdf-column.tgz) and copy both the new poppler library and the new ipdf to your iLiad.


Last edited by wpd on Sun Oct 28, 2007 10:25 pm; edited 1 time in total

jharker
Joined: 25 Apr 2007
Posts: 281
Location: Rochester, NY, USA

Posted: Tue Oct 23, 2007 5:07 pm

Wow. This is phenomenal. Fantastic, fantastic work. I'll try it later today, and report back.

daudi
Joined: 12 Aug 2007
Posts: 237
Location: Newcastle upon Tyne, UK

Posted: Tue Oct 23, 2007 5:54 pm

OK, this is very, very nice! I have just had a quick play with it and I can't wait to get stuck into some heavy reading tomorrow.

I am now playing with three versions of ipdf, so I have modified the installer that was posted a while ago, which switched between two versions of ipdf. I now have a directory of installers and can switch between versions at will. I don't yet keep track of which version is installed, but it would not take much to add something (just a little time, which I don't have just now).

In case this is useful for anyone else I have posted it here:

http://davepublic.pbwiki.com/f/ipdf_installers.zip (500Kb)
_________________
My thoughts as I considered whether or not to buy an Iliad and things I've learned after getting one: http://davepublic.pbwiki.com/Great+expectations

daudi
Joined: 12 Aug 2007
Posts: 237
Location: Newcastle upon Tyne, UK

Posted: Tue Oct 23, 2007 8:44 pm

I made a mistake when I said this was nice. This is not just nice, this is wonderful! I have tried it out on several journal articles and the Guardian PDF newspaper. This is a major feature for me. Sometimes the flip bar takes me to the wrong place (e.g. if the left column starts with a graphic so that the right column starts higher up the page), but that is easy to deal with using the stylus to select the correct column section.

Awesome. Totally awesome. Thank you very much wpd!

wpd
Joined: 23 Aug 2007
Posts: 10

Posted: Thu Oct 25, 2007 3:35 pm

@daudi - Glad you like it; from the forums I gathered that you and I had the same "column itch" :)

I think there is at least one bug in there still: sometimes ipdf does not close when I want it to, and I guess it has something to do with waiting for a page to be rendered while the renderer has already gone away. A temporary workaround: hitting the up and/or system key a couple of times does make it go away eventually.

Anyway - any bugs + descriptions on how to hit them are much appreciated.

emkay
Joined: 03 Aug 2006
Posts: 74
Location: London UK

Posted: Thu Oct 25, 2007 4:55 pm

Great work - I've been spending quite a lot of time reformatting my two-page-on-view pdfs. I'll definitely be checking this out.
Thanks!

jharker
Joined: 25 Apr 2007
Posts: 281
Location: Rochester, NY, USA

Posted: Sat Oct 27, 2007 8:09 am

I finally had a chance to try this out on some journal articles, and I have to say, it's very very nice. Excellent work, very slick.

I have one bug and one idea for improvement: the bug is, if there's more than one box per column, it doesn't always get them in the right order. Sometimes a box in the second column occurs before a box in the first column. It seems like the way to fix that would be to simply order columns from left to right and top to bottom...? Or maybe it's more complicated than that?

Also, you could set it to auto-zoom to fit the width of the current column. Currently I have a problem with abstracts, since when I zoom to a good level for a column width, the abstract often becomes wider than the screen. Of course, if columns have comparable widths it could skip the zoom to save time.

Anyway, really excellent work! Thanks a lot!

daudi
Joined: 12 Aug 2007
Posts: 237
Location: Newcastle upon Tyne, UK

Posted: Sat Oct 27, 2007 9:04 am

I've been using this intensively for a few days now and if I had to choose between 20 hours of battery life or this feature I'd choose this in an instant.

I think it sometimes gets it wrong with respect to the first box on the page, when the first box in the right-hand column starts higher up the page than the first box in the left column. This is easy to fix using the stylus, assuming you have a bit of the adjacent column visible. Which brings me to:

jharker wrote:
Also, you could set it to auto-zoom to fit the width of the current column. Currently I have a problem with abstracts, since when I zoom to a good level for a column width, the abstract often becomes wider than the screen. Of course, if columns have comparable widths it could skip the zoom to save time.

I am not so sure that always automatically resizing to the current column width would be what I'd want. Sometimes the column can be very narrow indeed (e.g. the Telegraph newspaper pdf that I read to test how well it handled 5 columns; it handled them very well). Full zoom would be silly in this case. Also, if it does get it wrong, having part of the neighbouring column visible means that it is easy to click in the correct column to fix things. And if you want to skip ahead, it would be a pain to have to use the flip bar several times to progress through multiple columns when you could use the pen to select an adjacent column. So I'd like to be able to tell it not to auto-zoom. Perhaps it could be a setting stored in the manifest?

I noticed a little bug last night that I need to test more to define precisely. I was reading the Guardian top stories pdf and was on page 3. Using the flip bar to go back to page 2 took me to the first page instead. I'm not sure if it is something special about this particular PDF or if it happens any time I go back. I'll try to test it a little more to characterize it better.

tribble
Joined: 04 Aug 2006
Posts: 645
Location: Bonn, Germany

Posted: Sat Oct 27, 2007 10:20 am

This is really awesome. Thank you very much.
_________________

www.justread.de - The new way to read
official iRex reseller in Germany

wpd
Joined: 23 Aug 2007
Posts: 10

Posted: Sat Oct 27, 2007 1:00 pm

jharker wrote:
I have one bug and one idea for improvement: the bug is, if there's more than one box per column, it doesn't always get them in the right order.


The problem is that there is no way to determine "the" correct column sequence using only the information that can be found in the pdf. For some pdfs we would want xy ordering, and for others yx (the latter being the default in poppler). We could allow users to specify column ordering using some kind of UI - any suggestions? (icon/gesture?)

jharker wrote:
Also, you could set it to auto-zoom to fit the width of the current column.


Agreed, but this should be configurable too. I refrained from doing this for now because rerendering a page when the zoom changes can mean a long wait (some set me back more than a minute :)). I would like to tie this in with a renderer rewrite that allows multiple zooms of different pages to be cached. We could then analyse a new page for columns, determine the two(?) optimal zooms and schedule them for rendering.

jharker
Joined: 25 Apr 2007
Posts: 281
Location: Rochester, NY, USA

Posted: Sat Oct 27, 2007 11:37 pm

Regarding auto-zoom... Honestly, I was thinking we'd use the zoom bug as a feature: since there's an upper limit on the max zoom, zooming to fit a column width wouldn't actually zoom all the way: in most cases you'd see the stuff around the column too. But I agree that's kind of a hack. :) I think a better way would be to compare the widths of all the columns on a page and determine one or two zoom levels that would satisfy all the column widths. That way if two columns have similar widths, they would use the same zoom and this would save time.
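
A rough sketch of that idea, in Python for illustration only. Nothing here is taken from ipdf: `pick_zooms`, the 15% tolerance, the 768-pixel screen width and the maximum zoom of 4.0 are assumptions made for the example.

```python
from typing import List


def pick_zooms(column_widths: List[float], screen_width: float,
               max_zoom: float, tolerance: float = 1.15) -> List[float]:
    """Pick one or two zoom levels at which every column fits the screen.

    A column of width w (in pixels at zoom 1) fits as long as
    zoom <= screen_width / w, capped at max_zoom. If all fit-zooms are
    within `tolerance` of each other, one shared zoom is enough; otherwise
    the sorted fit-zooms are split at their largest relative gap and each
    group uses its smallest fit-zoom, so no column overflows.
    """
    fits = sorted(min(screen_width / w, max_zoom) for w in column_widths)
    if not fits:
        return []
    if fits[-1] / fits[0] <= tolerance:
        return [fits[0]]
    split = max(range(1, len(fits)), key=lambda i: fits[i] / fits[i - 1])
    return [fits[0], fits[split]]


# Two body columns of similar width plus one wide abstract (sample widths).
print(pick_zooms([250, 240, 500], screen_width=768, max_zoom=4.0))
```

Here the abstract gets its own lower zoom while the two body columns share one, so flipping between the body columns would not trigger a rerender.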

I agree, in the long run if we re-write the pre-rendering stuff, we can include things like rendering multiple zoom levels and not have to worry about zooming delays.

I'm having a hard time thinking of an example where xy ordering makes sense. At least in English and Western languages, I would think that yx would be the preferred order at least 99% of the time... am I missing something?

But in any case, what I meant to say was that, assuming it's set to do yx by default right now, it seems a bit buggy.

Since we're getting into bug discussions now, I want to repeat that I really REALLY like this mod. Any critiques are aimed at making it even better than it already is! :D

wpd
Joined: 23 Aug 2007
Posts: 10

Posted: Sun Oct 28, 2007 10:29 pm

daudi wrote:
I found a little bug that I noticed last night and need to test more to define it more precisely. I was reading the guardian top stories pdf and was on page 3. Using the flip bar to go back to page 2 took me to the first page instead. I'm not sure if it is something special about this particular PDF or any time I go back. I'll try to test it a little more to characterize it better.


Thanks for including the pdf in which this happened - turns out I was using different compares for linecount when counting columns vs traversing columns. Thread top has a new version.

wpd
Joined: 23 Aug 2007
Posts: 10

Posted: Sun Oct 28, 2007 10:54 pm

jharker wrote:
Regarding auto-zoom... Honestly, I was thinking we'd use the zoom bug as a feature: since there's an upper limit on the max zoom, zooming to fit a column width wouldn't actually zoom all the way: in most cases you'd see the stuff around the column too.


Good idea. We won't be able to do anything about max zoom anyway; it is there to protect us from zooming in to the point where a single page render fills up all of the RAM.
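
To give a feel for why a cap is needed: the bitmap for a rendered page grows with the square of the zoom. The Python below is back-of-the-envelope arithmetic only; the page size, the 1 byte per pixel and the memory budget are assumed numbers, not values taken from ipdf or the iLiad.

```python
import math


def render_bytes(width_px: int, height_px: int, zoom: float,
                 bytes_per_pixel: int = 1) -> int:
    """Approximate size of one rendered page bitmap at the given zoom."""
    return int(width_px * zoom) * int(height_px * zoom) * bytes_per_pixel


def max_zoom_for_budget(width_px: int, height_px: int, budget_bytes: int,
                        bytes_per_pixel: int = 1) -> float:
    """Largest zoom at which a single page bitmap still fits the budget."""
    return math.sqrt(budget_bytes / (width_px * height_px * bytes_per_pixel))


base = render_bytes(768, 1024, 1.0)      # page rendered at zoom 1
zoomed = render_bytes(768, 1024, 4.0)    # same page at zoom 4
print(base, zoomed, zoomed // base)      # zoom 4 costs 16 times the memory
print(max_zoom_for_budget(768, 1024, budget_bytes=zoomed))
```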

jharker wrote:
I'm having a hard time thinking of an example where xy ordering makes sense.


I'm having a hard time too now that I think about it :)

For 4 disjoint columns
a b
c d

xy would mean
1   3
| / |
2   4

yx would mean
1 - 2
  /
3 - 4

If you're not taking other pdf objects (like horizontal bars separating different articles within, say a magazine) into account, both setups can be used, right?
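
Spelled out as sort keys, the two orderings for the a/b/c/d example look like this. This is a Python illustration only: the coordinates are invented, y is assumed to grow down the page, and it says nothing about how poppler actually derives its flow order.

```python
# Top-left corner (x, y) of each column box; y grows downwards (assumption).
columns = {
    "a": (0, 0),
    "b": (100, 0),
    "c": (0, 200),
    "d": (100, 200),
}


def order_xy(cols):
    """Sort by x first, then y: finish the left-hand side, then move right."""
    return sorted(cols, key=lambda name: (cols[name][0], cols[name][1]))


def order_yx(cols):
    """Sort by y first, then x: finish the top row, then move down."""
    return sorted(cols, key=lambda name: (cols[name][1], cols[name][0]))


print(order_xy(columns))
print(order_yx(columns))
```

For a journal page where a and c are the two halves of the left column, the xy result is the reading order; for a page with two separate articles stacked on top of each other, the yx result is.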

jharker wrote:
But in any case, what I meant to say was that, assuming it's set to do yx by default right now, it seems a bit buggy.


I had grand plans for implementing all kinds of heuristics, but noticed that poppler already does text flow guessing. Right now, that is what we are using :). We could build in some more column dumping to let us see why a particular column gets stuck in a particular place in a flow and try to take it from there.

daudi
Joined: 12 Aug 2007
Posts: 237
Location: Newcastle upon Tyne, UK

Posted: Wed Oct 31, 2007 1:45 pm

Hi wpd,

I forgot to thank you for the update that made the back-flip work with the guardian, so thanks!

I now have another PDF that is misbehaving. I don't know if there is a problem with the PDF or if there is something about it that might help improve your algorithm. Are you able to access this pdf? Just looking at it, it seems like it should be straightforward enough to determine where the next text block is, but for some reason it gets confused when forward- or back-flipping in a number of places. For example, just go to page 2, view it at full size so you see the whole page, select column mode, and flip forward to begin your magical mystery tour of the article :)

daudi
Joined: 12 Aug 2007
Posts: 237
Location: Newcastle upon Tyne, UK

Posted: Wed Oct 31, 2007 4:04 pm

Another article from the same journal (Diabetic Medicine) and again there are problems following the text flow. Entire blocks of text are skipped. It is not as bad as the one I linked to above, though.

Both articles seem to have been produced using Acrobat Distiller 5.0.5 for the Mac; one was PDF version 1.3, the other PDF version 1.5. I've no idea if these details are important.

All times are GMT + 1 Hour
Page 1 of 2

 