The WARC (Web ARChive) file format is a successor to the ARC format. It specifies a method for combining multiple digital resources into an aggregate archival file together with related information.
Sites 17
- Web Data Commons The project extracts structured data from the Common Crawl and provides it for public download.
- Common Crawl data set Description of the data set.
- Github: example-warc-java Java and Clojure examples for processing Common Crawl WARC files.
- Github: webarchive-commons Common web archive utility code.
- WARC, Web ARChive file format Format description, ISO 28500:2009. Used by archival institutions to store content harvested by web crawls, for example via use of the Heritrix harvesting tool.
- Wget with WARC output About the development version of Wget, which can save WARC files.
- The WARC File Format (ISO 28500) Information, maintenance, drafts, hosted by the Bibliothèque nationale de France.
- Internetarchive/warc Python library for reading and writing WARC files and WARC headers.
- WARC File Format Specifications Collection of a number of drafts prepared as the WARC format has developed.
- Example ARC and WARC files Short examples of the ARC and WARC files that are generated by the Internet Archive's crawlers.
- Web Archive Transformation (WAT) Specification, Utilities, and Usage Overview Utilities to extract metadata from WARC files and create data analysis reports. Terminology, using WAT and Pig for data analysis.
- The WARC Ecosystem Wiki with resources about the WARC format and the tools that support it.
- International Internet Preservation Consortium: Tools and Software Perspectives on setting up a web archiving chain, with tools recommended and used by members of the IIPC.
- WSDK A lightweight Erlang library for writing web archiving software. Overview, requirements, quick start, tutorial, support services, bug reports, license and third-party libraries.
- WARC Implementation Guidelines v.1 Gathers advice and best practices to help institutions design and create WARC files for collection management, access, preservation, and interoperability with collections from different institutions.
- Github: pylibwarc A Python library for dealing with Web ARChive (WARC) files.
- Digital Preservation Coalition: Web-Archiving Report intended for those with an interest in, or responsibility for, setting up a web archive, particularly new practitioners or senior managers wishing to develop a holistic understanding of the issues and options available.