Creating a pdf file from web pages

I'm trying to create a pdf file of online forum posts. Using the browser's print-to-pdf option works only up to a point: the saved pdf file includes all the hidden text in the online original, rather than just the text and images I see without clicking or hovering. Hope that makes sense.

How can I best save what I see?

Secondly, having hopefully achieved the above, can I combine pdf files? The forum I want to save consists of 30 threads, each with 3 or 4 pages. Saving as above means saving one page at a time which is OK, but would leave me with 100+ pdf files.

Any recommended prog or add on I could use? Ideally open source i.e. cheap or free?

Thanks!

Reply to
Graeme

Does the forum have a separate 'print view' mode? Or could you use the Stylus browser extension to rewrite the page style sheet to change the quoted text to 0pt white?

PDFSAM and others will concatenate or merge PDFs into one file

formatting link
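For anyone comfortable with a little scripting, the merge step could also be sketched in Python. This is only a rough sketch under some assumptions: it relies on the third-party pypdf package, and the file naming scheme (thread2_page9.pdf and so on) and the function names are made up for illustration. The sort key is there so that page 10 of a thread lands after page 9 rather than after page 1.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that orders 'thread2_page10.pdf' after 'thread2_page9.pdf'.

    The thread/page naming scheme is hypothetical; any names with embedded
    numbers are ordered by numeric value rather than character by character.
    """
    parts = re.split(r"(\d+)", Path(path).name)
    return [int(part) if part.isdigit() else part.lower() for part in parts]


def merge_pdfs(folder, output):
    """Concatenate every PDF in folder into output, in natural filename order."""
    # Imported here so the sorting helper works without the package installed.
    from pypdf import PdfWriter  # third-party: pip install pypdf

    writer = PdfWriter()
    for pdf in sorted(Path(folder).glob("*.pdf"), key=natural_key):
        writer.append(str(pdf))  # appends all pages of that file
    with open(output, "wb") as fh:
        writer.write(fh)
```

Calling merge_pdfs("saved_pages", "forum.pdf") would then produce one file. Writing the output somewhere other than the input folder avoids picking it up again on a second run.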

Owain

Reply to
spuorgelgoog

I've given up on "printing" from web pages; I copy/paste into a word processor, then print (to pdf/file/printer).

html is no longer html.

Reply to
AnthonyL

+1. Is this a forum with binaries? Collecting the text is relatively straightforward. I sometimes use Notepad as a first stage.
Reply to
newshound

Thanks for the suggestions. Yes, lots of large, full colour images. The forum software is phpBB, which seems to be popular and works well.

I think straight copy/paste of the text is the way forward, with images added where appropriate.

HTML is indeed no longer html, at least at my basic level. I note Owain's suggestion to rewrite the style sheet, but that is way beyond my experience. I do, though, like the idea of producing a pdf file for each thread, then concatenating or merging the PDFs into one file.

Reply to
Graeme

You can have a look at userstyles as it may have been done for you already, eg

formatting link
Owain

Reply to
spuorgelgoog

I used to copy and paste into Word, but now print direct to pdf, or save the whole page as HTML if appropriate.

For printing and using the web style sheet I use the addon "Print Edit WE". I have not looked back since. You can also de-select areas you don't want to output.

I suggest you take a look before giving up.

Reply to
Fredxx

LyX. Does an amazing job of merging PDFs. But it does take quite some getting your head around.

Free - tick.

Reply to
polygonum_on_google

Opera browser does a good job of saving web pages to pdf. It may not do the number of pages you want, but might be worth a try.

Reply to
Richard

If it's got to be accessible, i.e. not just a graphic of a page, I don't actually think there is any obvious way to do what you want, since Adobe say they want dosh when you try to do a conversion (grin). Brian

Reply to
Brian Gaff (Sofa)

You could try pasting the URLs into something like

formatting link
- having just quickly tried it, it seems to do a reasonable job of 'printing' what you see (although it does add a small banner at the bottom). There are browser extensions to make it a bit more '1-click' too.

Reply to
Mathew Newton

This is NOT a forum.

Reply to
AnthonyL

He wasn't talking about "here", he was talking about A N Other forum.

Reply to
Andy Burns

Funny how we read things. I read 'Is this a forum with binaries?' as referring to the online forum I mentioned, not 'Is this (uk.d-i-y) a forum with binaries?', not least because newshound would know that uk.d-i-y is not a binary group.

Thanks for all the comments. I have transferred the first two forum threads to pdf by copying and pasting the text and images required which gives a clean and satisfactory result, although somewhat laborious. Oh well, perhaps what lockdown was designed for?

Cheers,

Reply to
Graeme

You can create PDF files by hand, which would bring the question perilously close to the group charter.

The following file can be copied into Notepad and saved as "helloworld.pdf"; the extension may help the file's icon look like an Acrobat Reader icon.

The file is copied off the web, and I messed with it a bit and screwed up the checksums. (I added two sentences, used some matrix operators to set the start position of each following line, then corrected the stream length to 112 characters, which includes one line termination character per line.)
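The 112 figure can be checked with a few lines of Python. This is just a sketch: it assumes the stream is laid out with one operator group per line, which is my guess at the original layout, and it counts one terminator per line the way the post describes.

```python
# Content stream from the hand-edited file, assuming one operator group per line.
STREAM_LINES = [
    "BT",
    "70 50 TD",
    "/F1 12 Tf",
    "(Hello, world!) Tj",
    "1 0 0 1 70 40 Tm",
    "(We meet again.) Tj",
    "1 0 0 1 70 30 Tm",
    "(The end.) Tj",
    "ET",
]


def stream_length(lines, eol="\n"):
    """Count the stream bytes, with one line terminator after every line."""
    return len("".join(line + eol for line in lines).encode("latin-1"))


print(stream_length(STREAM_LINES))          # 112 with LF line endings
print(stream_length(STREAM_LINES, "\r\n"))  # 121 with CRLF line endings
```

The second figure matters if the editor saves the file with CRLF line endings: each of the nine lines then costs one extra byte and the declared length no longer matches.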

If you screw up the file enough, Acrobat tries to repair it internally before displaying it. This might cause a 20 second delay until it opens.

----------------- Do not copy this line ------------------
%PDF-1.7

1 0 obj  % entry point
<<
  /Type /Catalog
  /Pages 2 0 R
>>
endobj

2 0 obj
<<
  /Type /Pages
  /MediaBox [ 0 0 200 200 ]
  /Count 1
  /Kids [ 3 0 R ]
>>
endobj

3 0 obj
<<
  /Type /Page
  /Parent 2 0 R
  /Resources <<
    /Font <<
      /F1 4 0 R
    >>
  >>
  /Contents 5 0 R
>>
endobj

4 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /BaseFont /Times-Roman
>>
endobj

5 0 obj  % page content
<<
  /Length 112
>>
stream
BT
70 50 TD
/F1 12 Tf
(Hello, world!) Tj
1 0 0 1 70 40 Tm
(We meet again.) Tj
1 0 0 1 70 30 Tm
(The end.) Tj
ET
endstream
endobj

xref
0 6
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
trailer
<<
  /Size 6
  /Root 1 0 R
>>
startxref
492
%%EOF
----------------- Do not copy this line ------------------

It's a gnarly language, and barely feasible as a means for humans to package stuff by hand. Real files have a lot more baggage inside.
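What gets "screwed up" by hand edits is mostly the xref table, which stores the byte position of each object, plus the startxref value pointing at the table itself. A script can compute those instead of leaving the viewer to repair them. The following is a minimal sketch, not a general PDF writer: build_pdf is a made-up name, the objects are written one per line rather than in the layout of the hand-made file, and it assumes the text contains no parentheses or backslashes (those would need escaping).

```python
def build_pdf(text_lines):
    """Return the bytes of a one-page PDF showing text_lines in Times-Roman."""
    # First line is positioned with TD, later lines with an explicit Tm
    # matrix, stepping 10 units down each time as in the hand-edited file.
    ops = ["BT", "70 50 TD", "/F1 12 Tf"]
    for i, line in enumerate(text_lines):
        if i > 0:
            ops.append(f"1 0 0 1 70 {50 - 10 * i} Tm")
        ops.append(f"({line}) Tj")  # assumes no "(", ")" or "\" in line
    ops.append("ET")
    stream = "".join(op + "\n" for op in ops).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /MediaBox [ 0 0 200 200 ] /Count 1 /Kids [ 3 0 R ] >>",
        b"<< /Type /Page /Parent 2 0 R"
        b" /Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"endstream",
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)  # byte offset of the xref keyword
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is the head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Passing the three sentences from the file above and writing the result to helloworld.pdf in binary mode ("wb") should give a file whose table matches its contents. Binary mode matters, because any line-ending translation would shift the offsets again.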

And if you looked inside another PDF and your conclusion is "Paul, a PDF doesn't look like this!", of course not. PDF is available in binary and text format, and this is a human-readable example. What I don't understand about this sample file is that it's missing a short "binary string" that has appeared in some other so-called text ones. And the file still seems to work.

Many modern documents contain "embedded fonts", which would ruin a simple example like this. This sample file relies on the interpreter having a Times-Roman font. If you change the declaration to ComicSans, the document will likely not display (ComicSans is not part of the base set of fonts).

Refs:

Sample chit-chat:

formatting link
Where I got the sample file as my base file:

formatting link
Paul

Reply to
Paul

Yes, sorry, lost in translation.

Reply to
AnthonyL
