Automate download of PDF/PNG files
Mathematica notebook for use in automating download of PDF/PNG files.
Concatenating URLs in Excel or Sheets
If you are going to be grabbing more than just a few URLs to upload to the Internet Archive using the Google Sheets tool, then these hacks might be useful to you!
- When an item such as a journal has multiple PDF links, look at the filenames to see whether they follow a consistent pattern. If they do, you can generate all of the URLs in a spreadsheet instead of copying and pasting them individually.
- Take a look at this journal, for example: http://nbuv.gov.ua/j-tit/UZTNU_istor
- Select one of the years (2013) and then an issue (2013 Т. 26(65); 1)
- Hover over the first item in the table of contents (Титул) and you will see a filename: http://nbuv.gov.ua/j-pdf/UZTNU_istor_2013_26(65)_1_1.pdf
- Notice that the first part of the URL is shared by the items that follow the first one.
- When you get to the third article in the table of contents, you will see a slightly different URL: http://nbuv.gov.ua/UJRN/UZTNU_istor_2013_26(65)_1_3
- Let’s compare:
- http://nbuv.gov.ua/j-pdf/UZTNU_istor_2013_26(65)_1_1.pdf
- http://nbuv.gov.ua/UJRN/UZTNU_istor_2013_26(65)_1_3
- From article #3 onward, the URLs use /UJRN/ in place of /j-pdf/ and do not include the .pdf extension.
- So if you have 20 articles within a journal, such as Вчені записки Таврійського національного університету імені В. І. Вернадського, you can break up the file path across spreadsheet columns and use the CONCATENATE formula to populate the sheet with the URLs automatically, rather than copying and pasting them manually.
- Using the journal example above, you can break up the filename across columns like this (from left to right):
- Prefix: http://nbuv.gov.ua/j-pdf/UZTNU_istor
- Year_Vol_Iss: 2013_26(65)_1
- Article_Number: 1.pdf
- Full Link: http://nbuv.gov.ua/j-pdf/UZTNU_istor_2013_26(65)_1_1.pdf
- The CONCATENATE formula goes in the "Full Link" column, and the general formula is: =CONCATENATE(A2,"_",B2,"_",C2)
- Each letter and number combination (for example, A2) refers to a cell by its column letter and row number.
- After you enter the values for each column (Prefix, Year_Vol_Iss, Article_Number), fill each column down; the Full Link column then populates automatically.
- Here is a template that you can use for your needs.
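
For anyone who prefers a script to a spreadsheet, the same idea can be sketched in Python. This is only an illustration, not part of the notebook or the template: the function name `build_links` and its parameters are hypothetical. It assumes that the prefix has no trailing underscore, that the three parts are joined with underscores, and that article numbers run from 1 upward. It only covers the /j-pdf/ pattern with the .pdf extension. Whether every article in an issue is reachable at that pattern is an assumption you would need to check against the site.

```python
def build_links(prefix, year_vol_iss, count, suffix=".pdf"):
    """Join prefix, year/volume/issue, and article number with underscores.

    Hypothetical helper: it reproduces the spreadsheet concatenation
    for article numbers 1..count and appends the file extension.
    """
    return [f"{prefix}_{year_vol_iss}_{n}{suffix}" for n in range(1, count + 1)]


# Values taken from the journal example above; the count of 3 is arbitrary.
links = build_links("http://nbuv.gov.ua/j-pdf/UZTNU_istor", "2013_26(65)_1", 3)

for link in links:
    print(link)
```

Passing `suffix=""` and a prefix that uses /UJRN/ instead of /j-pdf/ would produce the second URL form shown in the comparison above.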