Welcome, and thank you for volunteering! It’s been amazing to see so many people come together to help archive Ukrainian cultural heritage sites.
If you haven’t filled out the volunteer form, that’s the place to start. We go through the responses to that form a couple of times a day and send people a link to our Slack, where we’re organizing everything.
We’re posting relevant information for newcomers in the #orientation channel, including things like times for upcoming Zoom sessions. You can drop by anytime in the Zoom session window to chat with an organizer who can help you get started, or troubleshoot tasks that you’re working on. Head to that channel first once you join Slack.
Our latest updates and handy tips and tricks will be pinned at the top of each channel. For example, Orientation’s pins include:
Other channels will have their own pin collections, and the pins get updated more frequently than these web pages do. If you’re confused about anything, just ask in the channel and we’ll give you the latest info we have.
We’ve organized our Slack channels primarily around the tasks people are doing:
#browsertrix is for people who are relatively tech-comfortable, and involves running the Webrecorder Browsertrix Docker container. If that doesn’t resonate with you, we have many other options!
#linkcollection is for people working on finding new links to add to our workflow by submitting them via the URL form or in bulk, directly to our working spreadsheet.
#manualwebrecorder is for people using the Fast ArchiveWeb interface or ArchiveWeb.page browser plugin to manually create web archives by navigating sites (especially complex sites with lots of JavaScript or human interaction). If you can browse a website, you can handle these tasks!
#waybackmachine is for people checking that websites are well captured by the Internet Archive’s Wayback Machine and submitting links (including large sets of links) to the Wayback Machine for archiving. You can do this manually or try to automate it with code from the #scraping channel.
#internetarchive is for people who are submitting files to the SUCHO collection on the Internet Archive. Many of those files are the result of people working in the #scraping channel.
#scraping is for people writing their own custom scrapers using code, often as a way to speed up tasks in other channels (like #waybackmachine) or to capture things that automated crawlers like Browsertrix or the Wayback Machine can’t capture (library catalogs, digital archives, etc.).
#translation is full of people who can read Russian and/or Ukrainian. If you’re confused about how to navigate a site you’re working on, or what something means, head over there and ask.
#qualitycontrol is for people who can more or less read Ukrainian and/or Russian, to check on the web archives we’ve created and make sure they’re actually complete.
#metadata is for people who are curating metadata for items uploaded into the SUCHO collection on the Internet Archive.
#wikimedia is for people working with Wikidata: getting information from it, and sending updated information back.
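For volunteers curious about the automation mentioned under #waybackmachine, here is a minimal Python sketch of the two steps involved: checking whether a site already has a Wayback Machine capture, and building a capture request for one that doesn’t. It is not the project’s own tooling (ask in #scraping for that). It assumes the Internet Archive’s public availability API and the Save Page Now URL pattern; the function names are made up for illustration, and the sketch only builds and parses requests rather than sending them.

```python
import json
from urllib.parse import urlencode

# Assumed public Internet Archive endpoints (not SUCHO-specific).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_query(url):
    """Build the availability-API request URL for one site."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url})}"


def closest_snapshot(response_text):
    """Return the closest snapshot URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def save_request(url):
    """Build the Save Page Now URL that asks the Wayback Machine for a capture."""
    return SAVE_ENDPOINT + url
```

To use it, fetch the URL returned by availability_query with any HTTP client, pass the response body to closest_snapshot, and open the save_request URL for sites that come back as None. Please pace bulk submissions so you don’t overload the service.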
There are other areas of Slack that don’t have to do with tasks:
#mentors is for finding a person who can help guide you to a task to get started with. The response time depends on how many people are online at a given time.
#pets is for sharing pictures of animals!
#random is where we share duck pictures, music playlists, funny jokes, and other such things.
#academic-outreach is where we’re discussing potential academic publications (e.g., a handbook of emergency web archiving) that document the work of the project and the challenges we’ve encountered.
Our work is coordinated through an enormous Google Sheet with many tabs:
If you look at the bottom of that screenshot, there are Browsertrix, Manual Webrecorder, and InternetArchive tabs visible; we have about 12 other tabs coordinating other sections of the work.
When we talk about doing things in “The Spreadsheet,” this is usually the spreadsheet we mean. (However, the Metadata team has a different spreadsheet for its own work, and Erica is maintaining a Situation Monitoring spreadsheet with information about specific locations of high interest.)
See the “Low Tech Helping” and “High Tech Helping” sections (below) if you’re looking for a type of task to focus on.
You can also ask in #general and/or #situation-monitoring if you want to focus on locations under active bombardment or with identified damage.
Once you’ve found your area of focus, claiming and completing tasks will be handled in the spreadsheet.
We have several teams working on projects that mostly need a web browser and enthusiasm (you don’t even need to read Cyrillic for most of these).
In addition to the Link Collection and Metadata volunteer groups above, there are a couple of teams where reading Ukrainian is a vital prerequisite:
Go to the tab that corresponds to the area you’re working in (for example, Browsertrix or Internet Archive or Link Collection).
Find a row that has a URL but doesn’t have a Status or Claimed By in it. (Often these will be toward the bottom of the sheet. You could also re-try things that were Skipped or Failed – some sites have been coming back online!)
Put your name down in the “claimed by” column when you decide to work on a task.
Change the status to “in progress”. We try to break up most tasks into “bite-sized pieces” so people can do just a little.
(If you have a larger amount of time available to work on this, feel free to claim multiple tasks. If you don’t get to them, you can remove your name and change the status back to blank.)
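If you prefer to look for open tasks offline, the row-picking rule above can be sketched in Python against a CSV export of one tab. This is only an illustration: the column headers "URL", "Status", and "Claimed By" are assumptions about how the tab is labeled, the function name is hypothetical, and claiming still has to happen in the live spreadsheet so that others can see it.

```python
import csv
import io


def open_tasks(csv_text, retry_statuses=("skipped", "failed")):
    """Split a tab's rows into unclaimed URLs and URLs worth retrying.

    Assumes the export has "URL", "Status", and "Claimed By" columns.
    """
    unclaimed, retryable = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get("URL") or "").strip()
        status = (row.get("Status") or "").strip().lower()
        claimed_by = (row.get("Claimed By") or "").strip()
        if not url:
            continue  # nothing to archive on this row
        if not status and not claimed_by:
            unclaimed.append(url)  # has a URL, no status, no claimant
        elif status in retry_statuses:
            retryable.append(url)  # previously skipped or failed
    return unclaimed, retryable
```

Calling open_tasks on the exported text returns two lists: URLs nobody has claimed yet, and URLs marked Skipped or Failed that may be worth another try now that some sites are back online.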
When you’ve claimed a URL, you’ll probably want to put it into a safety checker like https://sitecheck.sucuri.net/ to make sure it hasn’t been infected with malware. (You may also want to protect your computer and browser with the steps mentioned in the Safety First page.)
Visit the Tutorials Page for more detail on how the Internet Archive, Browsertrix Crawler, and other workflows go.
If you’ve got questions while working on a task, try asking in the corresponding channel (e.g. #manualwebrecorder for tasks in the Manual Webrecorder tab of the sheet).
If you’re not sure what to do with the data when you’re done with the task, and there aren’t more specific directions in the instructions for the task, check out our guide to data upload. Be sure to mark the task as done in the sheet when it’s completed, and fill in any info needed there (e.g., links to the spreadsheets you created full of Internet Archive links).