We have successfully restored the site's primary domain (www.cdc.gov), and with that in place we plan to launch publicly. Several parts of the site are still works in progress:

  • All subdomains (e.g., XYZ.cdc.gov) are under reconstruction. This is taking longer because the archiving mechanism for the primary domain is not available for the subdomains.
  • Some links are broken as a result of the archival crawl. We hope to capture and repair these through our bug tracking system.
  • We have downloaded all of the datasets, but not necessarily their pages. Offering those data for individual download is on our to-do list.

Our Process

Primary www subdomain

The GitHub repository for this code is to be shared by March 10, 2025.

  1. Download the archived ZIM file from archive.org.
  2. Convert its contents to a LevelDB file (~95GB).
  3. Set up a Flask (Python) server with adjustments to add the banner and break logo image links.
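
To illustrate step 3, here is a minimal sketch of the serving side. It is not the project's actual code: it uses a plain WSGI callable from the standard library instead of Flask (Flask applications are themselves WSGI applications) so that it runs without extra packages. The in-memory STORE dictionary is a hypothetical stand-in for the LevelDB file, and the banner wording and the names inject_banner, app, and fetch are assumptions made for illustration.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical banner markup; the real banner wording is not given here.
BANNER = b'<div class="archive-banner">Archived copy, not an official site.</div>'

# Stand-in for the LevelDB store: URL path -> (content type, body bytes).
STORE = {
    "/index.html": ("text/html", b"<html><body><h1>Home</h1></body></html>"),
    "/data.csv": ("text/csv", b"a,b\n1,2\n"),
}


def inject_banner(body: bytes) -> bytes:
    """Insert the banner right after the opening <body> tag, if one exists."""
    lower = body.lower()
    start = lower.find(b"<body")
    if start == -1:
        return body
    end = lower.find(b">", start)
    if end == -1:
        return body
    return body[: end + 1] + BANNER + body[end + 1:]


def app(environ, start_response):
    """WSGI app: look the requested path up in the store and serve it."""
    path = environ.get("PATH_INFO", "/")
    if path.endswith("/"):
        path += "index.html"  # assumed default document name
    entry = STORE.get(path)
    if entry is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found in archive"]
    content_type, body = entry
    if content_type == "text/html":
        body = inject_banner(body)  # only HTML pages get the banner
    headers = [("Content-Type", content_type), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def fetch(path):
    """Call the app directly, without a network socket, for quick checks."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling fetch("/") returns the stored home page with the banner inserted after the body tag, while non-HTML entries such as /data.csv pass through unchanged. In a real deployment the dictionary lookup would be replaced by a read from the converted store.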

Other subdomains

  1. Identify the Wayback Machine WARC.
  2. Extract its contents.
  3. Process the HTML to add the banner and break logo image links.
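
The HTML processing in step 3 might look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: the banner markup is hypothetical, an image counts as a logo only if its src contains the string "logo" (a guess at the matching rule), and regular expressions are used as a simple heuristic where a full HTML parser would be more robust.

```python
import re

# Hypothetical banner markup; the actual wording is not specified here.
BANNER_HTML = '<div class="archive-banner">This is an archived copy.</div>'

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(r'(\bsrc\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)
BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def break_logo_links(html: str) -> str:
    """Blank the src of any <img> whose src contains 'logo' (assumed rule)."""

    def fix_img(match):
        tag = match.group(0)
        src = SRC_ATTR.search(tag)
        if src and "logo" in src.group(3).lower():
            # Keep the attribute and its quotes, but empty the value.
            return SRC_ATTR.sub(
                lambda m: m.group(1) + m.group(2) + m.group(2), tag, count=1
            )
        return tag

    return IMG_TAG.sub(fix_img, html)


def add_banner(html: str) -> str:
    """Insert the banner immediately after the first opening <body> tag."""
    return BODY_OPEN.sub(lambda m: m.group(0) + BANNER_HTML, html, count=1)


def process_html(html: str) -> str:
    """Apply both adjustments to one page of extracted HTML."""
    return add_banner(break_logo_links(html))
```

For example, a page containing an image at /img/cdc-logo.png and another at /img/chart.png would come back with the banner after the body tag, the first image's src emptied, and the second left untouched. The same function could be run over every extracted .html file.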

Data

  1. Download the torrent file from archive.org.
  2. Use a torrent client and, where possible, provide seeding for other downloads.
  3. Provide the data back to the public? <- we are here