PowerShell: Optimising a Script’s Performance (Archiving Web-Pages Script)

Hi.  I wrote and ran a quick script to crawl a web-site and archive all its web-pages.  It worked, but I noticed some serious performance issues: its memory usage would occasionally balloon up to 19 GB (!), it would slow down from time to time, and it just generally seemed inefficient.

So I had a think and came up with a better script with a few simple changes.  It stays at about 500 MB of memory usage now and runs much better 🙂 Continue reading “PowerShell: Optimising a Script’s Performance (Archiving Web-Pages Script)”

PowerShell: Archiving All The Web-Pages on A Site (Example: bbc.co.uk/food)

With the outcry caused by the BBC removing the food section from their website, and the rush of people trying to mirror it or download the data, I thought: just how would you do that with PowerShell?

(Of course, if you’re not interested in re-inventing the wheel, Wget does this much better.)

After a bit of thought I came up with the following requirements:

  • It would need to call itself recursively, going through all the links on each page.
  • The links should be filtered so I only get the pages under a particular sub-page (in the BBC example we only want the /food ones).
  • It should download the pages and try to keep a representation of the hierarchy (so /food/recipes/cucumber.html is saved to \food\recipes\cucumber.html on the disk).
  • I’m not interested in fixing the links yet; as long as we get a copy it should be fine.
  • We need some way to terminate the recursion so it doesn’t keep processing the same pages: each page should be processed only once.

So a vague loop would be to go to a web page, go to all the links matching the sub page we want, output them to disk and record the page name to a list to make sure we don’t visit it again.  You’d end up with a nice folder full of file versions of the website.

Invoke-WebRequest is good for this; it gets the web page content but also puts all the links from the page in a handy property of the object.  Easy to enumerate through!
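As a rough illustration of that loop (not the script from the full post), here is a minimal sketch.  The function names Get-LocalPath and Save-SitePages and their parameters are made up for this example, it uses a queue rather than literal recursion, and it assumes every page is text and that pages without a file extension can be saved as index.html.

```powershell
# Hypothetical helper: map a page URL to a file under $OutputRoot,
# mirroring the URL hierarchy (/food/recipes/x.html -> food\recipes\x.html).
function Get-LocalPath {
    param(
        [Parameter(Mandatory)] [uri]    $Url,
        [Parameter(Mandatory)] [string] $OutputRoot
    )
    $relative = $Url.AbsolutePath.Trim('/')
    # Assumption: a URL with no file extension is stored as index.html in its own folder.
    if (-not [System.IO.Path]::HasExtension($relative)) {
        $relative = ($relative + '/index.html').TrimStart('/')
    }
    $separator = [string][System.IO.Path]::DirectorySeparatorChar
    Join-Path $OutputRoot $relative.Replace('/', $separator)
}

# Hypothetical crawler: visit each page under the start URL's path exactly once.
function Save-SitePages {
    param(
        [Parameter(Mandatory)] [uri]    $StartUrl,
        [Parameter(Mandatory)] [string] $OutputRoot
    )
    $prefix  = $StartUrl.AbsolutePath.TrimEnd('/')
    $visited = [System.Collections.Generic.HashSet[string]]::new()
    $queue   = [System.Collections.Generic.Queue[uri]]::new()
    $queue.Enqueue($StartUrl)

    while ($queue.Count -gt 0) {
        $url = $queue.Dequeue()
        # HashSet.Add returns $false if the page has already been seen.
        if (-not $visited.Add($url.GetLeftPart([System.UriPartial]::Path))) { continue }

        try   { $page = Invoke-WebRequest -Uri $url -UseBasicParsing }
        catch { Write-Warning "Skipping ${url}: $_"; continue }

        $file = Get-LocalPath -Url $url -OutputRoot $OutputRoot
        New-Item -ItemType Directory -Path (Split-Path $file -Parent) -Force | Out-Null
        Set-Content -LiteralPath $file -Value $page.Content

        foreach ($href in $page.Links.href) {
            if (-not $href) { continue }
            $link = $null
            # Resolve relative links against the current page; skip malformed ones.
            if (-not [uri]::TryCreate($url, [string]$href, [ref]$link)) { continue }
            $sameSite = $link.Host -eq $StartUrl.Host
            $inScope  = $link.AbsolutePath.StartsWith($prefix, [System.StringComparison]::OrdinalIgnoreCase)
            if ($sameSite -and $inScope) { $queue.Enqueue($link) }
        }
    }
}
```

Something like Save-SitePages -StartUrl 'https://www.bbc.co.uk/food' -OutputRoot 'C:\Archive' would then walk everything under /food.  The visited set is what stops the crawl from going round in circles.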

The script and detail are after the break.

Edit:  I had another pass at this script and optimised it a bit here. Continue reading “PowerShell: Archiving All The Web-Pages on A Site (Example: bbc.co.uk/food)”

PowerShell: Get Largest Mailboxes on an Exchange Server (One-Line Command)

Hi.  Last week a customer asked me to find out which mailboxes had eaten all the pies on a particular Exchange server: I needed a list of the largest mailboxes and whether they were in a disconnected state (already removed and awaiting purge).

To get an accurate picture I needed to take into account the deleted items in the mailbox as well.  It’s a small command but it’s got a few squirrelly bits I’ll go into as well after the line. Continue reading “PowerShell: Get Largest Mailboxes on an Exchange Server (One-Line Command)”

PowerShell: Setting Exchange Send-As Permissions without Using the Add-ADPermission cmdlet

The Send-As permission for objects in Exchange is set on the AD object (rather than the mailbox itself). Normally, the weapon of choice is the Add-ADPermission cmdlet, but interestingly that cmdlet is only available if you have some serious Exchange permissions: Organization Management. What you’re doing, though, only requires fairly low-level AD permissions; you’re just modifying some attributes on an object. So I did some investigation and came up with a function to set Send-As permissions without using Add-ADPermission. Continue reading “PowerShell: Setting Exchange Send-As Permissions without Using the Add-ADPermission cmdlet”

PowerShell: Synchronizing a Folder (and Sub-Folders) Part 5

I’ve made some more changes to the syncing script. The first was some corrections to how it deals with paths with ‘odd’ symbols in them (like “[“) and the second was to properly output objects listing all the changes it’s made (for logging or further processing).
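To show why “[“ is awkward, here is a small sketch (separate from the syncing script itself).  The -Path parameter treats square brackets as wildcard character sets, while -LiteralPath takes the string exactly as written.  The folder and file names are made up for the example, and the object at the end is just one possible shape for a change record.

```powershell
# Scratch area in the temp folder; the names with brackets are invented for this demo.
$root   = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$folder = Join-Path $root 'Photos [2016]'
$backup = Join-Path $root 'Backup'
[System.IO.Directory]::CreateDirectory($folder) | Out-Null
[System.IO.Directory]::CreateDirectory($backup) | Out-Null

$file = Join-Path $folder 'holiday[1].txt'
[System.IO.File]::WriteAllText($file, 'example')

# -Path reads [2016] and [1] as wildcard sets, so the real file is not matched.
$viaPath    = Test-Path -Path $file
# -LiteralPath does no wildcard expansion at all.
$viaLiteral = Test-Path -LiteralPath $file

# Copy using the literal path and emit an object describing the change.
Copy-Item -LiteralPath $file -Destination $backup
$change = [pscustomobject]@{
    Action      = 'Copied'
    Source      = $file
    Destination = $backup
}
$copied = Test-Path -LiteralPath (Join-Path $backup 'holiday[1].txt')

$change
Remove-Item -LiteralPath $root -Recurse -Force
```

Emitting objects like $change, rather than writing text to the console, means the results can be piped straight into Export-Csv or filtered with Where-Object later.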

Update: I’ve revisited this script a few times with new additions and modifications. The latest full version of the script is here.  That post also includes links covering the other revisions to the script.

Continue reading “PowerShell: Synchronizing a Folder (and Sub-Folders) Part 5”

PowerShell: Converting PDF Bank Statements to CSV

I wrote some very quick and dirty code to import American Express PDF statements to CSV here.  I could export the PDF to TXT and then process the text file with PowerShell.  I had to revisit it the other day as I had a raft of PDF statements to convert and import into YNAB (and not just from AMEX).

Of course, all the bank and credit card vendors use a standard PDF format for statements so it was easy.

Right?

No, actually pretty much everyone just does their own thing.  And by ‘own thing’ I mean PDFs that are not even consistent within themselves.

So:  the joy of regular expressions. Continue reading “PowerShell: Converting PDF Bank Statements to CSV”
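To give a flavour of the approach, here is a minimal sketch of pulling transactions out of exported statement text with a regular expression.  The sample lines and the date / description / amount layout are invented for the example and do not match any particular bank’s format; a real statement would need its own pattern.

```powershell
# Made-up lines standing in for text exported from a PDF statement.
$lines = @(
    'Statement period 01/05/2016 to 31/05/2016'
    '03/05/2016  COFFEE SHOP LONDON        4.50'
    '17/05/2016  ONLINE BOOKSTORE      1,234.99'
    '21/05/2016  REFUND ONLINE BOOKSTORE  -20.00'
)

# Assumed layout: date at the start, free-text description, amount at the end of the line.
$pattern = '^(?<Date>\d{2}/\d{2}/\d{4})\s+(?<Description>.+?)\s+(?<Amount>-?[\d,]+\.\d{2})$'

$transactions = foreach ($line in $lines) {
    # Lines that do not look like a transaction (headers, totals) are simply skipped.
    if ($line -match $pattern) {
        [pscustomobject]@{
            Date        = [datetime]::ParseExact($Matches.Date, 'dd/MM/yyyy', [cultureinfo]::InvariantCulture)
            Description = $Matches.Description.Trim()
            Amount      = [decimal]($Matches.Amount -replace ',', '')
        }
    }
}

# Turn the parsed objects into CSV text.
$transactions | ConvertTo-Csv -NoTypeInformation
```

Named groups keep the pattern readable, and swapping ConvertTo-Csv for Export-Csv would write the result to a file ready for import.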

PowerShell: Copy Directory Structure and a Random Sample of Files from Each Directory

I got my wife a new digital picture frame as a present.  It looks cool but the attached storage options (USB or card) aren’t big enough to take all our digital photos.

The decision about which pictures to include relies on either organisational skills OR an artistic eye, neither of which I have.

So what about making it strictly random?  Copying the entire directory structure but only a random sample of the files in each folder?

That I CAN do 🙂 Continue reading “PowerShell: Copy Directory Structure and a Random Sample of Files from Each Directory”
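The idea can be sketched roughly like this (an illustration rather than the script from the full post).  The function name Copy-RandomSample and its parameters are made up for the example, and it relies on Get-Random -Count to pick the files.

```powershell
# Hypothetical function: recreate the folder tree and copy a random sample of files per folder.
function Copy-RandomSample {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination,
        [int] $SampleSize = 5
    )
    $sourceRoot = (Resolve-Path -LiteralPath $Source).ProviderPath.TrimEnd([char[]]'\/')

    # The source root itself plus every folder beneath it.
    $folders = @(Get-Item -LiteralPath $sourceRoot) +
               @(Get-ChildItem -LiteralPath $sourceRoot -Directory -Recurse)

    foreach ($folder in $folders) {
        # Recreate the folder at the same relative position under the destination.
        $relative = $folder.FullName.Substring($sourceRoot.Length).TrimStart([char[]]'\/')
        $target   = if ($relative) { Join-Path $Destination $relative } else { $Destination }
        New-Item -ItemType Directory -Path $target -Force | Out-Null

        $files = @(Get-ChildItem -LiteralPath $folder.FullName -File)
        if ($files.Count -eq 0) { continue }

        # Get-Random -Count picks distinct files; folders with fewer files are copied in full.
        $files | Get-Random -Count $SampleSize | Copy-Item -Destination $target
    }
}
```

For example, Copy-RandomSample -Source 'D:\Photos' -Destination 'E:\Frame' -SampleSize 20 would copy up to 20 random pictures from each folder (both paths are placeholders).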