As I continued to hack on mwcrawler over the last month, I found that it didn’t really meet my needs for various reasons: slowness, difficulty of maintaining and adding sources, repeated grabbing of the same URL, and lack of response from the original author. So I’ve rewritten it and released Maltrieve, which (as the name indicates) retrieves malware directly from the sources listed at a number of sites. Improvements listed in the README include:
- Proxy support
- Multithreading for improved performance
- Logging of source URLs
- Multiple user agent support
- Better error handling
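To give a rough idea of how several of these pieces can fit together, here is a minimal sketch. It is not Maltrieve's actual code. The function names (`fetch`, `retrieve_all`), the user-agent strings, and the default worker count are all made up for illustration. It assumes Python 3 and uses only the standard library: duplicate URLs are dropped before fetching, downloads run in a thread pool, each request picks a random user agent and may go through a proxy, and failures are logged rather than raised.

```python
from concurrent.futures import ThreadPoolExecutor
import logging
import random
import urllib.request

# Hypothetical user-agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
]


def fetch(url, proxy=None, timeout=20):
    """Download one URL with a random user agent, optionally via a proxy."""
    handlers = []
    if proxy:
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(
        url, headers={"User-Agent": random.choice(USER_AGENTS)}
    )
    with opener.open(request, timeout=timeout) as response:
        return response.read()


def retrieve_all(urls, fetcher=fetch, workers=8):
    """Fetch each distinct URL once, in parallel.

    Returns a dict mapping each URL to its bytes, or None if it failed.
    """
    unique = list(dict.fromkeys(urls))  # drop repeats, keep first-seen order

    def safe(url):
        # Log the source URL either way; one bad URL must not stop the rest.
        try:
            data = fetcher(url)
            logging.info("fetched %s (%d bytes)", url, len(data))
            return data
        except Exception as exc:
            logging.warning("failed %s: %s", url, exc)
            return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(unique, pool.map(safe, unique)))
```

To route everything through a proxy, you could wrap the fetcher, for example `retrieve_all(urls, fetcher=lambda u: fetch(u, proxy="http://127.0.0.1:8118"))`, where the proxy address is just a placeholder.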
Right now, Maltrieve only looks at four meta-sources because two of the six in mwcrawler appear offline. But I have at least four more on deck, and mwcrawler didn’t parse all of its meta-sources correctly in any case. I also know of a few bugs that I haven’t figured out how to squash yet, but the core functionality works and it needs a broader audience to bang on it. Thus, I’ve tagged this version “beta-1”. Don’t rely on this for serious production, please.
If you use it, please let me know just so I can bask in the warm glow of productivity. The project itself remains under the GPL, of course. Suggestions, bug reports, etc. would also make me happy, whether via issues and pull requests on GitHub, a message on Twitter, or comments here.