Commit 33fcf70: Merge pull request #1 from ehanson8/master ("Updates")
2 parents: 81c69fc + d50efd4

19 files changed: +958, -4 lines

.gitignore

Lines changed: 3 additions & 0 deletions

```diff
@@ -49,3 +49,6 @@ secrets.py
 *.pyc
 data/*
 !data/.keep
+.profile
+*.csv
+*.json
```

README.md

Lines changed: 48 additions & 1 deletion

```diff
@@ -10,6 +10,27 @@ All of these scripts require a secrets.py file in the same directory that must c
 
 ## Scripts
 
+#### [addBibNumbersAndPost.py](/addBibNumbersAndPost.py)
+Based on a specified CSV file with URIs and bib numbers, this script posts the specified bib number to the ['user_defined']['real_1'] field for the record specified by the URI.
+
+#### [dateCheck.py](/dateCheck.py)
+Retrieves 'begin,' 'end,' 'expression,' and 'date_type' for all dates associated with all resources in a repository.
+
+#### [eadToCsv.py](/eadToCsv.py)
+Based on a specified file name and file path, this script extracts selected elements from an EAD XML file and prints them to a CSV file.
+
+#### [getAccessionUDFs.py](/getAccessionUDFs.py)
+This GET script retrieves all of the user-defined fields from all of the accessions in the specified repository.
+
+#### [getAccessions.py](/getAccessions.py)
+This GET script retrieves all of the accessions from a particular repository into a JSON file.
+
+#### [getAllArchivalObjectTitles.py](/getAllArchivalObjectTitles.py)
+Retrieves titles from all archival objects in a repository. Upon running the script, you will be prompted to enter the resource ID (just the number, not the full URI).
+
+#### [getArchivalObjectCountByResource.py](/getArchivalObjectCountByResource.py)
+Retrieves a count of archival objects associated with a particular resource. Upon running the script, you will be prompted to enter the resource ID (just the number, not the full URI).
+
 #### [getArchivalObjectsByResource.py](/getArchivalObjectsByResource.py)
 A GET script to extract all of the archival objects associated with a particular resource. Upon running the script, you will be prompted to enter the resource ID (just the number, not the full URI).
 
@@ -18,17 +39,43 @@ This GET script retrieves specific properties, including properties that have a
 
 #### [getPropertiesFromAgentsPeopleCSV.py](/getPropertiesFromAgentsPeopleCSV.py)
 This GET script retrieves specific properties from the JSON of ArchivesSpace agent_people records into a CSV file, which is specified in variable 'f' on line 17. In this example, the script retrieves the 'uri,' 'sort_name,' 'authority_id,' and 'names' properties from the JSON records by iterating through them with 'for i in range (...)' on line 19. The f.writerow(...) call on line 20 specifies which properties are retrieved from the JSON, and the f.writerow(...) on line 18 specifies the header row of the CSV file.
+
 #### [getResources.py](/getResources.py)
 This GET script retrieves all of the resources from a particular repository into a JSON file, which is specified in variable 'f' on line 16. It can be adapted to other record types by editing the 'endpoint' variable on line 13 (e.g. 'repositories/[repo ID]/accessions' or 'agents/corporate_entities').
 
 #### [getSingleRecord.py](/getSingleRecord.py)
 This GET script retrieves a single ArchivesSpace record based on the record's 'uri,' which is specified in the 'endpoint' variable on line 13.
 
+#### [getTopContainerCountByResource.py](/getTopContainerCountByResource.py)
+Retrieves a count of top containers associated with archival objects associated with a particular resource. Upon running the script, you will be prompted to enter the resource ID (just the number, not the full URI).
+
+#### [getTopContainerCountByResourceNoAOs.py](/getTopContainerCountByResourceNoAOs.py)
+Retrieves a count of top containers directly associated (not through an archival object) with a particular resource. Upon running the script, you will be prompted to enter the resource ID (just the number, not the full URI).
+
+#### [getTopContainers.py](/getTopContainers.py)
+This GET script retrieves all of the top containers from a particular repository into a JSON file.
+
+#### [getUrisAndIds.py](getUrisAndIds.py)
+For the specified record type, this script retrieves the URI and the 'id_0,' 'id_1,' 'id_2,' and 'id_3' fields, plus a concatenated version of all the 'id' fields.
+
 #### [postContainersFromCSV.py](/postContainersFromCSV.py)
 This script creates instances (consisting of top_containers) from a separate CSV file. The CSV file should have two columns, indicator and barcode. The directory where this file is stored must match the directory in the filePath variable. The script will prompt you first for the exact name of the CSV file, and then for the exact resource or accession to attach the containers to.
 
 #### [postNew.py](/postNew.py)
 This POST script will post new records to a generic API endpoint based on the record type, 'agents/people' in this example. It can be modified to accommodate other record types (e.g. 'repositories/[repo ID]/resources' or 'agents/corporate_entities'). It requires a properly formatted JSON file (specified where [JSON File] appears in the 'records' variable on line 13) for the particular ArchivesSpace record type you are trying to post.
 
 #### [postOverwrite.py](/postOverwrite.py)
-This POST script will overwrite existing ArchivesSpace records based on the 'uri' and can be used with any ArchivesSpace record type (e.g. resource, accession, subject, agent_people, agent_corporate_entity, archival_object, etc.). It requires a properly formatted JSON file (specified where [JSON File] appears in the 'records' variable on line 13) for the particular ArchivesSpace record type you are trying to post.
+This POST script will overwrite existing ArchivesSpace records based on the 'uri' and can be used with any ArchivesSpace record type (e.g. resource, accession, subject, agent_people, agent_corporate_entity, archival_object, etc.). It requires a properly formatted JSON file (specified where [JSON File] appears in the 'records' variable on line 13) for the particular ArchivesSpace record type you are trying to post.
+
+#### [resourcesWithNoBibNum.py](/resourcesWithNoBibNum.py)
+Prints to a CSV file the URIs of all resources in a repository without a bib number stored in the ['user_defined']['real_1'] field.
+
+#### [searchForUnassociatedContainers.py](/searchForUnassociatedContainers.py)
+Prints to a CSV file the URIs of all top containers that are not associated with a resource or archival object.
+
+#### [unpublishArchivalObjectsByResource.py](/unpublishArchivalObjectsByResource.py)
+This script unpublishes all archival objects associated with the specified resource. Upon running the script, you will be prompted to enter the resource ID (just the number, not the full URI).
+
+#### [updateFindingAidData.py](/updateFindingAidData.py)
+
+#### [updateResourceWithCSV.py](/updateResourceWithCSV.py)
```
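The README's opening (referenced in the first hunk header) says every script expects a secrets.py alongside it. A minimal sketch of that file, under the assumption that only the three attributes the scripts import are needed; the values are placeholders (8089 is the default ArchivesSpace backend port, but your instance may differ):

```python
# Hypothetical secrets.py; the attribute names baseURL, user, and
# password are the ones the scripts import (secrets.baseURL, etc.).
# The values below are placeholders, not real credentials.
baseURL = 'http://localhost:8089'  # ArchivesSpace backend API, not the public UI
user = 'admin'
password = 'admin'
```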

addBibNumbersAndPost.py

Lines changed: 45 additions & 0 deletions

```python
import json
import requests
import secrets
import time
import csv

startTime = time.time()

baseURL = secrets.baseURL
user = secrets.user
password = secrets.password

auth = requests.post(baseURL + '/users/'+user+'/login?password='+password).json()
session = auth["session"]
headers = {'X-ArchivesSpace-Session': session, 'Content-Type': 'application/json'}

urisBibs = csv.DictReader(open(''))  # input CSV path left blank in the commit

f = csv.writer(open('bibNumberPush.csv', 'wb'))
f.writerow(['uri']+['existingValue']+['bibNum'])

for row in urisBibs:
    uri = row['asURI']
    bibNum = row['bibNum']
    print uri
    record = requests.get(baseURL + uri, headers=headers).json()
    try:
        # the record already has a user_defined object; update it in place
        print record['user_defined']
        record['user_defined']['real_1'] = bibNum
        existingValue = 'Y'
    except:
        # no usable user_defined object; create one
        value = {}
        value['real_1'] = row['bibNum']
        record['user_defined'] = value
        print value
        existingValue = 'N'
    record = json.dumps(record)
    post = requests.post(baseURL + uri, headers=headers, data=record)#.json()
    print post
    f.writerow([uri]+[existingValue]+[bibNum]+[post])

elapsedTime = time.time() - startTime
m, s = divmod(elapsedTime, 60)
h, m = divmod(m, 60)
print 'Total script run time: ', '%d:%02d:%02d' % (h, m, s)
```
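The input CSV path in the script above was committed blank, but the column names it reads (asURI and bibNum) are fixed. A sketch of a valid input file, written and read back with the stdlib csv module (the file name urisAndBibs.csv and the row values are made up for illustration; this sketch uses Python 3):

```python
import csv

# Hypothetical input file for addBibNumbersAndPost.py; only the
# asURI and bibNum column headers are required by the script.
with open('urisAndBibs.csv', 'w', newline='') as fh:
    writer = csv.DictWriter(fh, fieldnames=['asURI', 'bibNum'])
    writer.writeheader()
    writer.writerow({'asURI': '/repositories/3/resources/407',
                     'bibNum': '12345'})

# Read it back the same way the script does
rows = list(csv.DictReader(open('urisAndBibs.csv')))
```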

dateCheck.py

Lines changed: 53 additions & 0 deletions

```python
import json
import requests
import secrets
import time
import csv

startTime = time.time()

baseURL = secrets.baseURL
user = secrets.user
password = secrets.password

auth = requests.post(baseURL + '/users/'+user+'/login?password='+password).json()
session = auth["session"]
headers = {'X-ArchivesSpace-Session': session, 'Content-Type': 'application/json'}

endpoint = '/repositories/3/resources?all_ids=true'

ids = requests.get(baseURL + endpoint, headers=headers).json()

records = []
f = csv.writer(open('duplicateBeginEndDates.csv', 'wb'))
f2 = csv.writer(open('asDates.csv', 'wb'))
f.writerow(['uri']+['begin']+['end']+['expression']+['type'])
f2.writerow(['uri']+['begin']+['end']+['expression']+['type'])
counter = 0
for id in ids:
    endpoint = '/repositories/3/resources/'+str(id)
    output = requests.get(baseURL + endpoint, headers=headers).json()
    for date in output['dates']:
        counter = counter + 1
        print counter
        try:
            begin = date['begin']
        except:
            begin = ''
        try:
            end = date['end']
        except:
            end = ''
        try:
            expression = date['expression']
        except:
            expression = ''
        # flag dates where begin and end are both present and identical
        if begin == end and begin != '' and end != '':
            f.writerow([output['uri']]+[begin]+[end]+[expression]+[date['date_type']])
        else:
            f2.writerow([output['uri']]+[begin]+[end]+[expression]+[date['date_type']])

elapsedTime = time.time() - startTime
m, s = divmod(elapsedTime, 60)
h, m = divmod(m, 60)
print 'Total script run time: ', '%d:%02d:%02d' % (h, m, s)
```
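The harvesting-and-sorting rule in the loop above can be isolated as a small pure function (the function name is ours, not from the script): missing keys become '', and a date counts as a duplicate only when begin and end are both present and equal.

```python
def harvest_date(date):
    # Mirrors the try/except blocks in dateCheck.py: a missing key
    # becomes ''.  The third element is True when begin and end are
    # both non-empty and identical, i.e. the row would go to
    # duplicateBeginEndDates.csv rather than asDates.csv.
    begin = date.get('begin', '')
    end = date.get('end', '')
    return begin, end, (begin == end and begin != '' and end != '')
```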

eadToCsv.py

Lines changed: 98 additions & 0 deletions

```python
import csv
from bs4 import BeautifulSoup


def extractValuesFromComponentLevel(componentLevel):
    level = componentLevel.name
    componentLevelLabel = componentLevel['level']
    unittitle = componentLevel.find('did').find('unittitle').text.replace('\n', '').encode('utf-8')
    try:
        unitdate = componentLevel.find('did').find('unitdate').text.encode('utf-8')
    except:
        unitdate = ''
    try:
        scopecontentElement = componentLevel.find('scopecontent').find_all('p')
        scopecontent = ''
        for paragraph in scopecontentElement:
            # strip newlines and collapse runs of spaces left by the XML layout
            paragraphText = paragraph.text.replace('\n', '').replace('  ', ' ').replace('  ', ' ').encode('utf-8')
            scopecontent = scopecontent + paragraphText
    except:
        scopecontent = ''
    try:
        container1 = componentLevel.find('did').find_all('container')[0].text.encode('utf-8')
    except:
        container1 = ''
    try:
        containerId1 = componentLevel.find('did').find_all('container')[0]['id']
    except:
        containerId1 = ''
    try:
        containerType1 = componentLevel.find('did').find_all('container')[0]['type']
    except:
        containerType1 = ''
    try:
        container2 = componentLevel.find('did').find_all('container')[1].text.encode('utf-8')
    except:
        container2 = ''
    try:
        containerId2 = componentLevel.find('did').find_all('container')[1]['id']
    except:
        containerId2 = ''
    try:
        containerType2 = componentLevel.find('did').find_all('container')[1]['type']
    except:
        containerType2 = ''
    global sortOrder
    sortOrder += 1
    f.writerow([sortOrder]+[level]+[componentLevelLabel]+[unittitle]+[unitdate]+[scopecontent]+[containerType1]+[container1]+[containerId1]+[containerType2]+[container2]+[containerId2])


filepath = raw_input('Enter file path: ')
fileName = raw_input('Enter file name: ')
xml = open(filepath+fileName)

f = csv.writer(open(filepath+'eadFields.csv', 'wb'))
f.writerow(['sortOrder']+['<co?>']+['<co?> level']+['<unittitle>']+['<unitdate>']+['<scopecontent>']+['containerType1']+['container1']+['containerId1']+['containerType2']+['container2']+['containerId2'])
upperComponentLevels = BeautifulSoup(xml, 'lxml').find('dsc').find_all('c01')
sortOrder = 0
for upperComponentLevel in upperComponentLevels:
    componentLevelLabel = upperComponentLevel['level']
    unittitle = upperComponentLevel.find('did').find('unittitle').text.encode('utf-8')
    try:
        scopecontentElement = upperComponentLevel.find('scopecontent').find_all('p')
        scopecontent = ''
        for paragraph in scopecontentElement:
            paragraphText = paragraph.text.replace('\n', '').replace('  ', ' ').replace('  ', ' ').encode('utf-8')
            scopecontent = scopecontent + paragraphText
    except:
        scopecontent = ''
    sortOrder += 1
    f.writerow([sortOrder]+['c01']+[componentLevelLabel]+[unittitle]+['']+[scopecontent]+['']+['']+['']+['']+['']+[''])

    # descend level by level; each cN loop runs inside its parent component
    componentLevelArray = upperComponentLevel.find_all('c02')
    for componentLevel in componentLevelArray:
        extractValuesFromComponentLevel(componentLevel)
        componentLevelArray = componentLevel.find_all('c03')
        for componentLevel in componentLevelArray:
            extractValuesFromComponentLevel(componentLevel)
            componentLevelArray = componentLevel.find_all('c04')
            for componentLevel in componentLevelArray:
                extractValuesFromComponentLevel(componentLevel)
                componentLevelArray = componentLevel.find_all('c05')
                for componentLevel in componentLevelArray:
                    extractValuesFromComponentLevel(componentLevel)
                    componentLevelArray = componentLevel.find_all('c06')
                    for componentLevel in componentLevelArray:
                        extractValuesFromComponentLevel(componentLevel)
                        componentLevelArray = componentLevel.find_all('c07')
                        for componentLevel in componentLevelArray:
                            extractValuesFromComponentLevel(componentLevel)
                            componentLevelArray = componentLevel.find_all('c08')
                            for componentLevel in componentLevelArray:
                                extractValuesFromComponentLevel(componentLevel)
                                componentLevelArray = componentLevel.find_all('c09')
                                for componentLevel in componentLevelArray:
                                    extractValuesFromComponentLevel(componentLevel)
                                    componentLevelArray = componentLevel.find_all('c10')
                                    for componentLevel in componentLevelArray:
                                        extractValuesFromComponentLevel(componentLevel)
```
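The cascading c02..c10 loops above can be expressed as a single recursive walk. A sketch using stdlib xml.etree instead of BeautifulSoup so it stays self-contained; the function name, the tiny inline EAD fragment, and the (tag, title) row shape are ours for illustration:

```python
import xml.etree.ElementTree as ET

COMPONENT_TAGS = {'c%02d' % n for n in range(1, 11)}  # c01 .. c10

def walk_components(element, rows):
    # Depth-first walk: record each cN component, then search it for
    # deeper components, replacing the repeated per-level loops.
    for child in element:
        if child.tag in COMPONENT_TAGS:
            did = child.find('did')
            title = did.findtext('unittitle', '') if did is not None else ''
            rows.append((child.tag, title))
            walk_components(child, rows)

dsc = ET.fromstring(
    '<dsc><c01 level="series"><did><unittitle>Series 1</unittitle></did>'
    '<c02 level="file"><did><unittitle>Folder 1</unittitle></did></c02>'
    '</c01></dsc>')
rows = []
walk_components(dsc, rows)
```

This also removes the depth cap: components deeper than c10 would still be visited.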

getAccessionUDFs.py

Lines changed: 71 additions & 0 deletions

```python
import json
import requests
import secrets
import time
import csv

startTime = time.time()

def findKey(d, key):
    # recursively yield values for 'key' anywhere under nested 'children' lists
    # (defined here but not used below)
    if key in d:
        yield d[key]
    for k in d:
        if isinstance(d[k], list) and k == 'children':
            for i in d[k]:
                for j in findKey(i, key):
                    yield j

baseURL = secrets.baseURL
user = secrets.user
password = secrets.password

auth = requests.post(baseURL + '/users/'+user+'/login?password='+password).json()
session = auth["session"]
headers = {'X-ArchivesSpace-Session': session, 'Content-Type': 'application/json'}

endpoint = '/repositories/3/accessions?all_ids=true'

ids = requests.get(baseURL + endpoint, headers=headers).json()

# first pass: collect every user-defined field name in use
udfs = []
for id in ids:
    print id
    endpoint = '/repositories/3/accessions/'+str(id)
    output = requests.get(baseURL + endpoint, headers=headers).json()
    try:
        userDefined = output['user_defined']
        for k, v in userDefined.items():
            if k not in udfs:
                udfs.append(k)
    except:
        userDefined = ''
udfs.sort()
udfsHeader = ['title', 'uri'] + udfs
f = csv.writer(open('accessionsUdfs.csv', 'wb'))
f.writerow(udfsHeader)

# second pass: one row per accession, values aligned to the sorted header
for id in ids:
    print id
    endpoint = '/repositories/3/accessions/'+str(id)
    output = requests.get(baseURL + endpoint, headers=headers).json()
    title = output['title'].encode('utf-8')
    uri = output['uri']
    accessionUdfs = []
    for udf in udfs:
        try:
            keyValue = udf+'|'+output['user_defined'][udf].encode('utf-8')
        except:
            keyValue = udf+'|'
        accessionUdfs.append(keyValue)
    accessionUdfs.sort()
    accessionUdfsUpdated = []
    for accessionUdf in accessionUdfs:
        edited = accessionUdf[accessionUdf.index('|')+1:]
        accessionUdfsUpdated.append(edited)
    accessionUdfsRow = [title, uri] + accessionUdfsUpdated
    f.writerow(accessionUdfsRow)

elapsedTime = time.time() - startTime
m, s = divmod(elapsedTime, 60)
h, m = divmod(m, 60)
print 'Total script run time: ', '%d:%02d:%02d' % (h, m, s)
```
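The second pass above uses a 'name|value' tagging trick to keep each value in the same column as its sorted header entry, even when an accession is missing some fields. A condensed sketch of that trick as a pure function (the function name is ours):

```python
def align_udf_values(sorted_udfs, user_defined):
    # Tag each value with its field name, sort so the values fall in
    # the same order as the sorted header, then strip the 'name|' tag.
    # Missing fields become empty strings, as in getAccessionUDFs.py.
    tagged = [name + '|' + user_defined.get(name, '') for name in sorted_udfs]
    tagged.sort()
    return [t[t.index('|') + 1:] for t in tagged]
```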

getAccessions.py

Lines changed: 34 additions & 0 deletions

```python
import json
import requests
import secrets
import time

startTime = time.time()

baseURL = secrets.baseURL
user = secrets.user
password = secrets.password

auth = requests.post(baseURL + '/users/'+user+'/login?password='+password).json()
session = auth["session"]
headers = {'X-ArchivesSpace-Session': session, 'Content-Type': 'application/json'}
print 'authenticated'

endpoint = '/repositories/3/accessions?all_ids=true'

ids = requests.get(baseURL + endpoint, headers=headers).json()

records = []
for id in ids:
    endpoint = '/repositories/3/accessions/'+str(id)
    output = requests.get(baseURL + endpoint, headers=headers).json()
    records.append(output)

f = open('accessions.json', 'w')
json.dump(records, f)
f.close()

elapsedTime = time.time() - startTime
m, s = divmod(elapsedTime, 60)
h, m = divmod(m, 60)
print 'Total script run time: ', '%d:%02d:%02d' % (h, m, s)
```
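Several scripts in this commit share the same two-step fetch pattern: '?all_ids=true' returns a list of integer ids, then each record is fetched with its own GET. A sketch of that pattern with the HTTP call abstracted away (the function name and the stub store are ours; get_json stands in for something like lambda e: requests.get(baseURL + e, headers=headers).json()):

```python
def fetch_all(get_json, collection_endpoint):
    # Step 1: list of integer ids; step 2: one GET per record.
    ids = get_json(collection_endpoint + '?all_ids=true')
    return [get_json(collection_endpoint + '/' + str(i)) for i in ids]

# Stub store in place of a live ArchivesSpace backend:
store = {
    '/repositories/3/accessions?all_ids=true': [1, 2],
    '/repositories/3/accessions/1': {'uri': '/repositories/3/accessions/1'},
    '/repositories/3/accessions/2': {'uri': '/repositories/3/accessions/2'},
}
records = fetch_all(store.__getitem__, '/repositories/3/accessions')
```

Abstracting the fetch this way also makes the loop testable without a server.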
