Commit 5ecfb7a

new CSV scripts added
1 parent 20924c9 commit 5ecfb7a

3 files changed: 73 additions, 0 deletions

README.md

Lines changed: 6 additions & 0 deletions
@@ -1,6 +1,9 @@
 # text-audio-image-processing
 Scripts for processing audio and images

+#### [combineCSVs.py](combineCSVs.py)
+Based on user input, combines several CSV files that share the specified base file name followed by sequential numbers (e.g. 'report1.csv', 'report2.csv', 'report3.csv'). A file name suffix may be specified if one exists after the file number (e.g. 'report1edited.csv', 'report2edited.csv', 'report3edited.csv'). Used to combine files split by [splitCSV.py](splitCSV.py).
+
 #### [dhashImageComparison.py](dhashImageComparison.py)
 Based on user input, creates dhashes for all of the image files in the specified directory, compares them using a BK-tree, and creates a CSV file of all dhash matches that are below the specified threshold (e.g. '40' means the dhashes are 40% different and 60% similar).

@@ -13,6 +16,9 @@ Based on a specified file and a specified threshold (e.g. '90' means the strings
 #### [stringComparisonFromCSVOldAndNew.py](stringComparisonFromCSVOldAndNew.py)
 Based on specified files of new and old strings and a specified threshold (e.g. '90' means the strings are 90% similar and 10% different), compares each string against every other string in the new strings file, identifies all strings with a similarity above the specified threshold, and prints them to a new CSV file. It also compares each string to a CSV file of old strings that have previously received an authorized form (e.g. adding new name headings to an existing authority file).

+#### [splitCSV.py](splitCSV.py)
+Based on user input, splits the specified CSV file into separate CSV files with the specified number of rows. The header row is repeated in each new file. The files can be combined later with [combineCSVs.py](combineCSVs.py).
+
 #### [transcribeAudioFile.py](transcribeAudioFile.py)
 Generates a rough, unformatted transcript of a specified MP3 using the free Wit ([https://wit.ai/](https://wit.ai/)) speech-to-text API. The script requires a secrets.py file in the same directory that must contain the following text:
 ```

combineCSVs.py

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
import csv
import argparse
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument('-f', '--baseFileName', help='the base file name for the files to be combined. optional - if not provided, the script will ask for input')
parser.add_argument('-s', '--suffix', help='the suffix that exists after the file number in the file name. optional - if not provided, the script will ask for input')
args = parser.parse_args()

if args.baseFileName:
    baseFileName = args.baseFileName
else:
    baseFileName = raw_input('Enter the base file name for the files to be combined: ')
if args.suffix:
    suffix = args.suffix
else:
    suffix = raw_input('Enter the suffix that exists after the file number in the file name: ')

f = csv.writer(open(baseFileName + 'Combined.csv', 'wb'))

fileNum = 1
file = baseFileName + str(fileNum) + suffix + '.csv'

while Path(file).is_file() == True:
    print file
    with open(file) as csvfile:
        reader = csv.reader(csvfile)
        if fileNum > 1:
            reader.next()
        for row in reader:
            f.writerow(row)
    fileNum += 1
    file = baseFileName + str(fileNum) + suffix + '.csv'
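
The script in this commit targets Python 2 (`raw_input`, the `print` statement, `reader.next()`). As a rough illustration of the same combining loop under Python 3, here is a minimal sketch. It is not the committed code: the function name `combine_csvs` and its parameters are hypothetical, and it assumes every numbered file starts with the same header row.

```python
import csv
from pathlib import Path


def combine_csvs(base_file_name, suffix=''):
    """Merge base1suffix.csv, base2suffix.csv, ... into baseCombined.csv.

    Hypothetical helper, not part of the commit. Stops at the first
    missing file number and keeps the header row of the first file only.
    Returns the output path and the number of files that were combined.
    """
    out_path = Path(base_file_name + 'Combined.csv')
    file_num = 1
    with out_path.open('w', newline='') as out_file:
        writer = csv.writer(out_file)
        while True:
            in_path = Path('{}{}{}.csv'.format(base_file_name, file_num, suffix))
            if not in_path.is_file():
                break
            with in_path.open(newline='') as in_file:
                reader = csv.reader(in_file)
                if file_num > 1:
                    next(reader, None)  # skip the repeated header row
                writer.writerows(reader)
            file_num += 1
    return out_path, file_num - 1
```

Calling `combine_csvs('report', 'edited')` would look for 'report1edited.csv', 'report2edited.csv', and so on, and write 'reportCombined.csv'. Opening files with `newline=''` follows the `csv` module's guidance for Python 3 and replaces the `'wb'` mode used in the Python 2 script.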

splitCSV.py

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
import csv
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-f', '--file', help='the CSV file to split. optional - if not provided, the script will ask for input')
parser.add_argument('-n', '--num', help='the number of rows to include in each file. optional - if not provided, the script will ask for input')
args = parser.parse_args()

if args.file:
    file = args.file
else:
    file = raw_input('Enter the CSV file to split: ')
if args.num:
    num = args.num
else:
    num = raw_input('Enter the number of rows to include in each file: ')

with open(file) as csvfile:
    reader = csv.DictReader(csvfile)
    header = reader.fieldnames

baseFileName = file.replace('.csv','')
num = int(num)

csvfile = open(file).readlines()
filenum = 1
for i in range(len(csvfile)):
    if i % num == 0:
        f = open(baseFileName + str(filenum) + '.csv', 'wb')
        if filenum != 1:
            f.write(str(header).replace('[','').replace(']','').replace('\'',''))
            f.write('\n')
        f.writelines(csvfile[i:i+num])
        filenum += 1
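
For comparison, the splitting step can be sketched with `csv.reader` and `csv.writer`, so the header goes through the CSV writer instead of being rebuilt from `str(header)`, which leaves a space after each comma. This is a minimal Python 3 sketch, not the committed script: `split_csv` and its parameters are hypothetical names, and it assumes the first row of the input is the header and that `rows_per_file` counts data rows only.

```python
import csv
from pathlib import Path


def split_csv(file_name, rows_per_file):
    """Split file_name into numbered CSV files of rows_per_file data rows.

    Hypothetical helper, not part of the commit. The header row is
    repeated at the top of every output file. Returns the output paths.
    """
    if rows_per_file < 1:
        raise ValueError('rows_per_file must be at least 1')
    src = Path(file_name)
    base = str(src.with_suffix(''))  # 'report.csv' -> 'report'
    out_paths = []
    out_file = None
    writer = None
    with src.open(newline='') as in_file:
        reader = csv.reader(in_file)
        header = next(reader, None)
        if header is None:
            return out_paths  # empty input: nothing to split
        try:
            for i, row in enumerate(reader):
                if i % rows_per_file == 0:
                    # Start the next numbered file and repeat the header.
                    if out_file is not None:
                        out_file.close()
                    out_path = Path('{}{}.csv'.format(base, len(out_paths) + 1))
                    out_file = out_path.open('w', newline='')
                    writer = csv.writer(out_file)
                    writer.writerow(header)
                    out_paths.append(out_path)
                writer.writerow(row)
        finally:
            if out_file is not None:
                out_file.close()
    return out_paths
```

With this sketch every output file holds the header plus up to `rows_per_file` data rows, whereas the committed loop counts the header line toward the first file's total. Reading through `csv.reader` rather than `readlines()` also keeps quoted fields that contain line breaks in one piece.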
