phpBB Scraper

Just as an FYI, I have a phpBB scraper ready to go, and I've tested it on cassieiswatching.myfreeforum.org, which I now have an XML dump of, should anything ever happen to it 😉 I'll work on making that XML dump usable (and searchable) in an effort to gather as much CiW-related information as possible in one place. Once it's usable, I'll make a download available as well. If someone wants the raw files in the meantime, just ask 🙂

9 Comments

  1. Wintermute April 12, 2018 7:21 pm 

    I don't really know how long it would take, as I've never benchmarked it. I know I've run it against a small forum, on a fairly fast server, but I honestly couldn't tell you how long it took. I do seem to recall that it worked, though, as I have a capture of that forum around somewhere.

  2. Simon April 12, 2018 7:21 pm 

    That's a good idea, thanks. I've launched the script on my server (which is in a datacenter, so it's running 24/7 anyway) and still no luck after ~15 hours. The phpBB forum has more than 8,000 topics, so I'm not sure whether that's normal.

    It's not very urgent, so I can wait…

  3. Wintermute April 12, 2018 7:21 pm 

    I'd suggest setting $start_topic and $end_topic about 10 apart (making sure at least one valid topic number falls between the two values) and doing a test run, just to confirm it completes without errors. Then raise $end_topic high enough to capture everything.

  4. Simon April 12, 2018 7:21 pm 

    Great, thanks!

    I did find it yesterday and ran it, but after around 4.5 hours it was still running (and whilst the destination folder was correctly created on my machine, it was empty). I guess scraping a whole forum takes time!

    Have a nice day

  5. Wintermute April 12, 2018 7:21 pm 

    I found the code 🙂 Rather than paste it in here, I'll point you to the original URL where I found it: https://github.com/indigolemon/phpBB-Scraper. I didn't modify it, other than setting the username, password, and URL of the site I was scraping.

  6. Wintermute April 12, 2018 7:21 pm 

    Sure… I'll see if I can locate it. It's someone else's code, and I don't even remember whether I tweaked it at all.

  7. Simon April 12, 2018 7:21 pm 

    Hey! The scraper, if possible 🙂

    Thanks!

  8. Wintermute April 12, 2018 7:21 pm 

    The scraper, or the raw files? I may have misplaced both, so let me know, and I'll see what I can locate 🙂

  9. Simon April 12, 2018 7:21 pm 

    Hey!

    Any chance you'd share it with me?

    Thanks
