How to extract emails from a PST file using Bash on Centos
Today's post explains something I have been meaning to do for some time: extracting the email data from a PST file. If you have ever used Microsoft Outlook, you might be familiar with the famous (or infamous) PST data files your emails get stored in.
Nowadays most people have migrated their email to Gmail or another online platform for a variety of reasons, but probably one of the main ones is not losing all your emails after you reinstall your OS or have a problem with your disk. With an online service, at least your emails are stored in the cloud!
The data in an email file is not the prettiest of things, but with some more bash scripting (I will keep this in mind for a future article) you can pull the useful parts out.
So for today's post we will just do the basics: extract your emails while keeping the folders intact on CentOS.
After some searching around, I found a package called libpst. It is not installed by default, so to install it on CentOS 6 using yum, do this:
yum install libpst.x86_64
If you do a 'yum search libpst' first, you will find the package name for the 32-bit version if you need it.
The install will automatically install a dependency libpst-libs for you.
Once the install is complete, you have a very useful command: readpst.
My advice is to create a working directory where you want your emails to be put, like this:
mkdir -p /tmp/work
Put your pst file in /tmp/work just to keep things in the same place.
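The setup above can be folded into a small helper. This is only a sketch: extract_pst is a made-up function name, and it assumes you want the extracted folders written into a directory of your choice through readpst's -o option (the directory has to exist first) rather than into the current directory. The -S option tells readpst to write each email to its own file.

```bash
# Sketch: extract a PST file into a working directory with readpst.
# extract_pst is a hypothetical helper name, not part of libpst.
extract_pst() {
    pst=$1      # path to the PST file
    outdir=$2   # directory that will receive the extracted folders

    if [ ! -f "$pst" ]; then
        echo "PST file not found: $pst" >&2
        return 1
    fi
    if ! command -v readpst >/dev/null 2>&1; then
        echo "readpst is not installed (install the libpst package first)" >&2
        return 1
    fi

    # readpst expects the output directory to exist already.
    mkdir -p "$outdir" || return 1

    # -S writes each email to its own file; -o sets the output directory.
    readpst -S -o "$outdir" "$pst"
}
```

You would call it as extract_pst /tmp/work/outlook.pst /tmp/work. Checking for the file and the command up front just gives a clearer error than letting readpst fail halfway.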
There are a number of options to the readpst command that you can read up on in the man page, but to keep things simple here is what I did:
readpst -S outlook.pst
# readpst -S outlook.pst
Opening PST file and indexes...
Processing Folder "Junk E-mail"
Processing Folder "Sent"
"Sent" - 7 items done, 0 items skipped.
Processing Folder "Inbox"
Processing Folder "Deleted Items"
Processing Folder "Outbox"
Processing Folder "Sent Items"
Processing Folder "Calendar"
Processing Folder "Contacts"
Processing Folder "Journal"
Processing Folder "Notes"
Processing Folder "Tasks"
Processing Folder "Drafts"
Processing Folder "RSS Feeds"
Processing Folder "Junk E-mail1"
"Personal Folders" - 15 items done, 0 items skipped.
"Tasks" - 0 items done, 1 items skipped.
"Sent Items" - 0 items done, 1 items skipped.
"Journal" - 0 items done, 1 items skipped.
"Calendar" - 1 items done, 5 items skipped.
"Junk E-mail1" - 2 items done, 0 items skipped.
"Inbox" - 3491 items done, 5 items skipped.
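If you want a quick tally of what was extracted and what was skipped, the summary lines in that output are easy to parse with awk. The sketch below assumes you have saved the output to a file (readpst.log is just a name I picked) and that the summary lines keep the format shown above; summarise_readpst_log is a hypothetical helper name.

```bash
# Sketch: summarise the per-folder "items done / items skipped" lines
# from saved readpst output. Reads the log on standard input.
summarise_readpst_log() {
    awk -F'"' '
        /items done/ {
            # $2 is the quoted folder name; $3 looks like
            # " - 3491 items done, 5 items skipped."
            split($3, f, " ")
            printf "%s\t%d\t%d\n", $2, f[2], f[5]
            done_total += f[2]
            skipped_total += f[5]
        }
        END { printf "TOTAL\t%d\t%d\n", done_total, skipped_total }
    '
}
```

Run it as summarise_readpst_log < readpst.log. Each output line is the folder name, the items done, and the items skipped, separated by tabs, followed by a TOTAL line. Splitting on the double quote first means folder names with spaces, such as "Sent Items", stay in one piece.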
Display what you have just extracted with ls and you will see something like this:
Outlinfo.pst Personal Folders
# cd Personal\ Folders/
[Personal Folders]# ls
16 Calendar Inbox Journal Junk E-mail1 Sent Sent Items Tasks
[Personal Folders]# cd Inbox/
As you can see, your whole folder structure is kept, and each email is given a number (along with any embedded images as well). You can then cat a file to see its contents. Be aware that the layout will differ depending on whether the email was plain text or HTML formatted, which you can handle in bash by checking the Content-Type header in each file:
Content-Type: text/plain; charset="UTF-8"
Content-Type: text/html; charset="UTF-8"
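As a rough way to split the extracted files by type, you can let grep search for those header lines. This is a sketch with made-up function names (files_with_type and count_with_type), and it assumes a grep that supports -r, such as GNU grep on CentOS. It matches any line starting with the header, so a multipart email containing both a plain-text and an HTML part will show up under both types.

```bash
# Sketch: list the files under a directory that contain a Content-Type
# header for a given MIME type (matched case-insensitively).
#   $1 = directory to search, $2 = MIME type, e.g. text/html
files_with_type() {
    grep -rli "^Content-Type: $2" "$1"
}

# Count those files instead of listing them.
count_with_type() {
    files_with_type "$1" "$2" | wc -l | tr -d ' '
}
```

For example, files_with_type "Personal Folders/Inbox" text/html lists the HTML emails in the Inbox folder, and count_with_type with the same arguments tells you how many there are.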
As I say, I will get around to writing a follow-up article with some more examples of how you can extract specific information from your emails, but basically grep, sed, and awk will be the major tools you need to get what you want out!
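To give a flavour of that, here is a small awk sketch that pulls a single header such as From or Subject out of one extracted file. header_value is a hypothetical helper name. It stops at the first blank line so it never reads into the message body, and it does not handle long headers folded across several lines.

```bash
# Sketch: print the value of one header from an extracted email file.
#   $1 = header name (case-insensitive), $2 = path to the email file
header_value() {
    awk -v name="$1" '
        BEGIN { want = tolower(name) ":" }
        /^\r?$/ { exit }    # first blank line ends the header block
        tolower(substr($0, 1, length(want))) == want {
            value = substr($0, length(want) + 1)
            sub(/^[ \t]+/, "", value)   # drop leading whitespace
            sub(/\r$/, "", value)       # drop a trailing carriage return
            print value
            exit
        }
    ' "$2"
}
```

Calling header_value Subject on one of the numbered files in the Inbox folder prints that email's subject line, and you can loop over the files in a folder to build a quick index.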