Archive – Herring's Fishbait

June 3, 2016 (updated June 10, 2016)

PowerShell: Archiving All The Web-Pages on A Site (Example: bbc.co.uk/food)

With the outcry caused by the BBC removing the food section from their website, and the rush of people trying to mirror it or download the data, I wondered: just how would you do that with PowerShell?

(Of course, if you’re not interested in reinventing the wheel, Wget does this much better.)

After a bit of thought I came up with the following requirements:

It would need to call itself recursively, going through all the links on each page.
The links should be filtered so I only get the pages matching a particular sub page (in the BBC example we only want the /food ones).
It should download the pages and try to keep a representation of the hierarchy (so /food/recipes/cucumber.html is saved to \food\recipes\cucumber.html on the disk).
I’m not interested in fixing the links yet; as long as we get a copy it should be fine.
We need some way to terminate the recursion so it doesn’t keep processing the same pages; each page should be processed only once.

So a vague loop would be to go to a web page, go to all the links matching the sub page we want, output them to disk and record the page name to a list to make sure we don’t visit it again. You’d end up with a nice folder full of file versions of the website.

Invoke-WebRequest is good for this; it gets the web page content but also puts all the links from the page in a handy property of the object. Easy to enumerate through!
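The loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script from the full post: the function names (ConvertTo-LocalPath, Select-SitePath, Save-SitePages) and parameters are hypothetical, a queue stands in for true recursion, and saving extension-less URLs as index.html is my own assumption. The only parts taken from the description are Invoke-WebRequest and its Links property.

```powershell
# Hypothetical helper: map a URL path such as /food/recipes/cucumber.html
# to a file path underneath $OutputRoot.
function ConvertTo-LocalPath {
    param([string]$UrlPath, [string]$OutputRoot)
    $relative = $UrlPath.Trim('/')
    # Assumption: a URL with no file extension is saved as <folder>/index.html.
    if ([string]::IsNullOrEmpty([System.IO.Path]::GetExtension($relative))) {
        $relative = ($relative + '/index.html').TrimStart('/')
    }
    $separator = [string][System.IO.Path]::DirectorySeparatorChar
    Join-Path $OutputRoot $relative.Replace('/', $separator)
}

# Hypothetical helper: resolve each href against the current page and keep
# only same-host links that sit under the wanted sub page (e.g. /food).
function Select-SitePath {
    param([uri]$PageUri, [string[]]$Hrefs, [string]$SubPath)
    foreach ($href in $Hrefs) {
        if ([string]::IsNullOrWhiteSpace($href)) { continue }
        $resolved = $null
        if (-not [uri]::TryCreate($PageUri, $href, [ref]$resolved)) { continue }
        if ($resolved.Host -ne $PageUri.Host) { continue }
        $path = $resolved.AbsolutePath
        if ($path -eq $SubPath -or
            $path.StartsWith("$SubPath/", [System.StringComparison]::OrdinalIgnoreCase)) {
            $path
        }
    }
}

# Hypothetical entry point: crawl from $StartUri and save each page once.
function Save-SitePages {
    param([uri]$StartUri, [string]$OutputRoot)
    $subPath = $StartUri.AbsolutePath.TrimEnd('/')
    $visited = New-Object 'System.Collections.Generic.HashSet[string]'
    $pending = New-Object 'System.Collections.Generic.Queue[string]'
    $pending.Enqueue($StartUri.AbsolutePath)

    while ($pending.Count -gt 0) {
        $path = $pending.Dequeue()
        if (-not $visited.Add($path)) { continue }   # already processed

        $pageUri = New-Object System.Uri($StartUri, $path)
        try {
            $page = Invoke-WebRequest -Uri $pageUri -UseBasicParsing
        } catch {
            Write-Warning "Skipping ${pageUri}: $_"
            continue
        }

        # Recreate the URL hierarchy on disk and write the page out.
        $file = ConvertTo-LocalPath -UrlPath $path -OutputRoot $OutputRoot
        New-Item -ItemType Directory -Path (Split-Path $file -Parent) -Force | Out-Null
        Set-Content -Path $file -Value $page.Content -Encoding UTF8

        # Queue every matching link found on this page.
        Select-SitePath -PageUri $pageUri -Hrefs $page.Links.href -SubPath $subPath |
            ForEach-Object { $pending.Enqueue($_) }
    }
}
```

Something like `Save-SitePages -StartUri 'http://www.bbc.co.uk/food' -OutputRoot 'C:\Archive'` would start it off (the output folder is just an example). The visited set is what terminates the crawl: a path that has already been added is skipped, so each page is fetched once. Links to non-HTML content would need separate handling, since this sketch writes everything out as text.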

The script and detail are after the break.

Edit: I had another pass at this script and optimised it a bit here.

August 27, 2014

Powershell Archiving Script, Part 8 (Final Part)

The final functions were finished in the last part, so now it’s just the main body of the script. This will bring all the code together and produce the output we want. The full script can be found here.


August 26, 2014 (updated August 27, 2014)

Powershell Archiving Script, Part 7

We’re on the home straight now. In the last part we dealt with the last major workhorse of the script (actually moving objects to and from the archive with Move-ArchiveObject) and in this part we deal with some of the formatting / presentation functions. The full script can be found here.


August 22, 2014 (updated August 26, 2014)

Powershell Archiving Script, Part 6

In the last part we looked at the Get-FolderInformation function, which returns an object describing a folder so the user can tell if they want to process it or not. This part is going to focus on the Move-ArchiveObject function which will actually perform the archive (or return from archive) process on a chosen folder. The full script can be found here.


August 20, 2014 (updated August 22, 2014)

Powershell Archiving Script, Part 5

Continuing on from the last part, where I defined the Get-FolderSize function, the next function to define is Get-FolderInformation. This gets the relevant information about all the folders that could be processed and outputs that information as objects. The full script can be found here.


August 18, 2014 (updated August 20, 2014)

Powershell Archiving Script, Part 4

In the last part we wrote the PowerShell to handle the parameters (including some crude validation) and the skeleton of the rest of the script. In the next few parts we’ll define the functions the script will use. The full script can be found here.


August 13, 2014 (updated August 19, 2014)

Powershell Archiving Script, Part 3

In this part we’ll start writing some PowerShell. Initially we’ll write about the function call and function structure (see below). The full script can be found here.

August 6, 2014 (updated August 11, 2014)

Powershell Archiving Script, Part 2

Or, A Cunning Plan for Archiving.

The first thing to do is a quick brain-dump of things I want the script to do. From that I should be able to get a better idea about how I want it to work.

So, in no particular order, here is a list of all the elements I want in the script:


August 1, 2014 (updated August 6, 2014)

Powershell Archiving Script, Part 1

A script to archive items between fast storage (SSD) and slow storage (mechanical hard-disk) using symbolic links to make the location of the files transparent to the OS. A solution to Steam games flooding my C:\ drive!

That should be pretty easy, right?
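As a rough illustration of the idea (not the script built in this series), the core move-and-link step might look like the sketch below. The function name Move-FolderToSlowStorage and its parameters are hypothetical. It also assumes PowerShell 5.0 or later, where New-Item can create symbolic links, and on Windows that usually means an elevated session or Developer Mode.

```powershell
# Hypothetical sketch: move a folder to slow storage and leave a symbolic
# link behind so the original path keeps working.
function Move-FolderToSlowStorage {
    param(
        [Parameter(Mandatory)] [string]$Path,      # folder on the fast disk
        [Parameter(Mandatory)] [string]$SlowRoot   # archive root on the slow disk
    )

    $source      = Get-Item -LiteralPath $Path
    $destination = Join-Path $SlowRoot $source.Name

    if (Test-Path -LiteralPath $destination) {
        throw "Destination already exists: $destination"
    }

    # Copy then delete, so the sketch also works across different volumes.
    Copy-Item -LiteralPath $source.FullName -Destination $destination -Recurse
    Remove-Item -LiteralPath $source.FullName -Recurse -Force

    # Put a link where the folder used to be, pointing at its new home.
    New-Item -ItemType SymbolicLink -Path $source.FullName -Target $destination | Out-Null
}
```

Returning a folder from the archive would be the reverse: remove the link, then copy the folder back to its original location. A real script would also want to verify the copy before deleting anything.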
