A Website to Text Converter


wk2txt is a command line tool based on the Webkit Engine to get the text content of a websites as seen by current webbrowsers which includes the evaluation of javascript code present in many modern webpages.

wk2txt can process a whole list of URLs, which can be built by using the "--input" option multiple times or by passing the filename of a list of URLs to the "--urlfile". This file is expected to have 1 URL per line.

The contents of the webpages is written to one file per webpage. The files are stored in the directory specified by the "--output" option. The filename is derived from the URL, it is the SHA1 hash code for the URL.

By default, wk2txt dumps the contents and the meta data of the webpage in plain text format. If the "--xml" option is used, content and meta data is dumped in XML format.


wk2txt *options*

   --input URL               convert *URL* to text
   --urlfile file            convert URLs in *file* to text
   --output directory        save output in directory *directory*
   --xml                     dump in XML format
   --timeout inverval        timeout after *interval* seconds (NOT IMPLEMENTED)
   --debug                   print diagnostic information
   --help                    print help on options
   --version                 print version number


Christian Plessl <>

Roman Plessl <>


Download and get the source code from github

Last modified 13 years ago Last modified on Feb 7, 2010, 9:17:26 PM