WebGrabber is a special kind of web site copier. Besides being implemented in Java, which lets it run on many platforms, its unique characteristic is that it imports only the web files that you specify through Internet shortcuts. This principle is advantageous under the following circumstances:
- You can copy various web pages sharing a common topic but dispersed across many web sites and referencing each other
- You can copy only a subset that interests you, not the whole web site which might contain thousands of web pages
- It can cope with dynamically created pages and pages coming from databases
NO OTHER WEB SITE COPIER CAN DO IT.
WebGrabber is not, however, the best solution for all your web page copying needs.
- If you really want to copy a whole web site (or a sub-directory of the web site), use a classical web site copier. You can do it with WebGrabber but you might forget to specify some hyperlinks.
- WebGrabber is not a download manager (e.g. for downloading very large files).
WebGrabber fills 95% of my web page copying needs.
- Is implemented in Java
- Copies web files from Internet shortcuts
- Is topic-centric instead of web-site-centric
- Can import web pages for many topics with just one command
- Copies embedded web files (e.g. images, style sheets, etc) of web pages
- Adjusts the hyperlinks of imported web pages so that they work locally between each other
- Has drag-drop support (Not available on Linux)
- Has editable policies
- Is fully documented
- Includes a tutorial
- Has a context-sensitive help system (Java-based) with configurable usage strategy
- Can simultaneously import many web files (through a multi-thread technique)
- Simplifies the creation of sub-directories and Internet shortcuts (with drag-drop support)
- Displays chosen topic statistics
- Logs its own working condition to a terminal and/or a file
- Reports topic-related progress of its activities in a dialog display and in a file (one for each topic)
- Provides support for generating Joliet-compliant file names and reporting non-compliant ones
- Can remove segments of the imported HTML files that contain Web bugs, spying scripts, advertising and other tagged items that provoke network access.
- WebGrabber v. 1.1 is free
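The feature list says that web files are copied from Internet shortcuts. As a rough illustration of what reading such a shortcut can involve, the sketch below extracts the target address from the text of a Windows-style ".url" file, which is commonly laid out as an INI-style file with a URL key in an [InternetShortcut] section. This is not WebGrabber's own code: the class and method names are hypothetical, and it assumes a recent Java version rather than the Java 1.3 mentioned in the requirements.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;

/** Hypothetical helper, not part of WebGrabber: reads the URL out of an Internet shortcut. */
final class ShortcutSketch {

    /**
     * Returns the value of the URL key found inside the [InternetShortcut]
     * section of the given shortcut text, or empty if there is none.
     */
    static Optional<String> extractUrl(String shortcutText) {
        boolean inSection = false;
        try (BufferedReader reader = new BufferedReader(new StringReader(shortcutText))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.startsWith("[")) {
                    // A section header: remember whether it is the one we want.
                    inSection = trimmed.equalsIgnoreCase("[InternetShortcut]");
                } else if (inSection && trimmed.regionMatches(true, 0, "URL=", 0, 4)) {
                    // Everything after "URL=" is the target address.
                    return Optional.of(trimmed.substring(4).trim());
                }
            }
        } catch (IOException e) {
            // Reading from a String should not fail; treat it as "no URL found".
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String sample = "[InternetShortcut]\r\nURL=http://www.example.org/page.html\r\n";
        // prints http://www.example.org/page.html
        System.out.println(extractUrl(sample).orElse("(no URL)"));
    }
}
```

A topic-centric copier could apply such a function to every shortcut file found under a topic directory and download only the addresses it returns.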
Main Window (Half-Size)
Configuration Dialog (75% Size)
Progress Dialog (75% Size)
Utility Dialog to Create Sub-Directories and Shortcuts (75% Size)
- A Java virtual machine, version 1.3 or newer, installed on your operating system
- A screen resolution of 1024 by 768 pixels or more
- 128 MB of memory
- 2 MB of disk space
Note: WebGrabber should work on Win9x/ME/NT systems and on a Linux system with the Gnome 1.4 desktop environment. If you have tested it on Windows XP, please tell me.
- Create a directory that will receive the application files
- Unzip the webgrabber.zip file into your newly created directory
That's it!
Double-click the webgrabber.jar file to start the application.
20030412 Version 1.1 release.
- Add the ability to remove HTML sections of imported Web pages that may contain unwanted scripts that provoke undesired Internet communications. For this purpose, search expressions (a highly simplified version of regular expressions) are defined by the user.
- Add supporting functions for enforcing compliance of file names with the Joliet specification, or for reporting non-compliance. Useful if you plan to archive the imported Web pages onto a Joliet-formatted CD.
- The sub-directory of a web file may now be part of its file name. This reduces ambiguous names that would otherwise differ only by an arbitrary number.
- Add the ability to signal the end of a long operation by playing an audio file
- Add Compound-Import commands
- Accepts and correctly imports files whose names already had their spaces replaced by the "%20" character sequence. Usually, such files come from Windows-based servers.
- Corrects default directories of File Open/Save dialogs
- Corrects problems of double file extensions when the shortcut file name contains a file extension preceding the ".url" extension
- Corrects an annoying bug in the content of the REPORT.txt file produced by an "ALL" operation (e.g. File > Grab All): each topic report now contains the activity messages of its own topic only. Previously, messages from other topics were inserted.
- Corrects other minor bugs
- Note: The version 1.1 configuration file is not compatible with that of version 1.0.
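Two of the items above concern file names: restoring spaces that arrive as "%20", and checking names against Joliet rules. The sketch below shows one possible way to do both. It is only an illustration, not WebGrabber's implementation: the class and method names are hypothetical, and the Joliet limits used here (at most 64 characters, none of the characters * / : ; ? \ and no control characters) are the commonly cited ones, stated as an assumption.

```java
/** Hypothetical helper, not WebGrabber's code: file-name handling sketched from the release notes. */
final class FileNameSketch {

    // Assumed Joliet limits: 64 characters at most, none of * / : ; ? \ and no control characters.
    private static final int JOLIET_MAX_LENGTH = 64;
    private static final String JOLIET_FORBIDDEN = "*/:;?\\";

    /** Turns the "%20" sequences found in a URL file name back into spaces. */
    static String decodeSpaces(String urlFileName) {
        return urlFileName.replace("%20", " ");
    }

    private static boolean isForbidden(char c) {
        return c < 0x20 || JOLIET_FORBIDDEN.indexOf(c) >= 0;
    }

    /** Reports whether a file name satisfies the assumed Joliet limits. */
    static boolean isJolietCompliant(String fileName) {
        if (fileName.isEmpty() || fileName.length() > JOLIET_MAX_LENGTH) {
            return false;
        }
        for (int i = 0; i < fileName.length(); i++) {
            if (isForbidden(fileName.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Replaces forbidden characters with '_' and truncates long names, keeping a short extension. */
    static String enforceJoliet(String fileName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fileName.length(); i++) {
            char c = fileName.charAt(i);
            sb.append(isForbidden(c) ? '_' : c);
        }
        String cleaned = sb.toString();
        if (cleaned.length() <= JOLIET_MAX_LENGTH) {
            return cleaned;
        }
        // Keep the extension (up to 8 characters including the dot) when truncating.
        int dot = cleaned.lastIndexOf('.');
        String ext = (dot > 0 && cleaned.length() - dot <= 8) ? cleaned.substring(dot) : "";
        return cleaned.substring(0, JOLIET_MAX_LENGTH - ext.length()) + ext;
    }

    public static void main(String[] args) {
        String name = decodeSpaces("my%20page:draft?.html");
        System.out.println(isJolietCompliant(name)); // prints false
        System.out.println(enforceJoliet(name));     // prints my page_draft_.html
    }
}
```

Reporting non-compliance corresponds to calling isJolietCompliant on each imported file name, while enforcing compliance corresponds to renaming with enforceJoliet; a real tool would also have to resolve collisions between names that become identical after truncation.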
20021004 Version 1.0_02 maintenance release
- The optimization command now reports (on the Progress dialog and in a report file) the result of the command for each topic root directory:
- The number of files, before and after
- The disk space used, before and after
- webgrabber.jar can be used as a Help server and as a Web site copier using different OS processes. The wghelp.jar file is no longer needed.
20020927 Version 1.0_01 maintenance release
- A minor bug about capitalization handling has been corrected.
20020905 Version 1.0 release
- Refactoring of the control layer
- A minor bug related to saving its desktop layout under some circumstances has been corrected.
- The List > Optional Table Fields > Load Layout menu item has been added.
20020823 Version 1.0 beta1
- URL interpretation is now saved into the Shortcut list, so it is no longer required to complete the integration of imported Web files within the same session. Note: Beta1 Shortcut list files are not compatible with those of Alpha2.
- Revert to Named command is disabled if one shortcut item of the list has reached the Hyperlink-Adjusted state
- The Ignored flag setting of a redundant shortcut item is interlocked with those of other shortcut items sharing the same URL (this reduces user operation errors)
- Much improved user error tolerance
- Much improved Linux/Gnome 1.4 support
- Corrected an important bug on Linux; the List > Optimize menu command was not working properly and the resulting imported web files were corrupted.
20020807: version 1.0 alpha 2
Changes since last version:
- Added rudimentary support for Linux/Gnome 1.4 (sorry no support yet for KDE)
- Corrected a bug of the List > Optimize command that occurred under some circumstances (when shortcut files were added to a root directory whose earlier shortcut items had already been integrated up to a previous Optimize command).
- Improved the interpretation of some ambiguous URLs through an extra query from the web site.
20020721: version 1.0 alpha 1, first public release. This version is not yet robust and may generate uncaught exceptions (the Java term for application crashes), but it will not affect your system. WebGrabber alpha 1 is not ready for UNIX/Linux systems or for internationalization, and its documentation needs some extra work.
Bugs and Limitations
20020927 Version 1.0_02
Note: I would appreciate it if you reported any bug you find.
Note: WebGrabber is released without drag-drop support for Linux.
A software bug is assigned a priority level of:
- Urgent: Might result in non-WebGrabber data corruption or loss (e.g. a security hole)
- Serious: Might result in WebGrabber instability or loss of its own data (e.g. configuration file corruption)
- Important: Might result in data corruption of some of the imported web files during the session or in failure to fully accomplish its purpose
- Annoyance: Might result in non-important (usually reversible or correctable) data loss, annoying behavior, etc
- Documentation: Wording errors of its GUI elements, documentation errors, confusing messages, inconsistencies, etc
- Wish: Missing feature that one would expect to have; this is not really a bug.
Known Bugs on Windows
Known Bugs on Linux
- Annoyance: Drag-drop not working
- Annoyance: Font errors but usable (Not observed with Java 1.4.0)
- Annoyance: Internet shortcuts on KDE2 cause exceptions because the shortcut file name contains forward slash characters.
Workaround: Edit the file names so that they no longer contain such characters.
- Annoyance: Tiny Splash/About dialog display on KDE2 (Not observed with Java 1.4.0)
- Important: Not ready for KDE but workable under the Gnome 1.4 desktop environment.
Plans and Wishes
As with any product, there is room for improvement and supplemental features. Here are some for future versions of WebGrabber:
- Command-line arguments
- Configuration file
- Layout file
- Look and Feel
- Reduce the required number of mouse clicks (e.g. do not display the Progress dialog for an "Open Topic Roots" command or for a Name command; provide a single command for Hyperlink-Adjust AND Optimize)
- Reduce the executable size of the Help Server or of WebGrabber
- Ability to detect redirect pages
- Ability to re-order the table display
- Context menus (i.e. popup menus)
- Ability to highlight items of the table that share common characteristics
- User Scripting (Jython)
Some of them may not be implemented. Suggestions are welcome, but I am not committing to implementing any of them.
Marcel St-Amant firstname.lastname@example.org