Call for programmers

The_Traveler

Completely Counter-dependent
Supporting Member
Joined
Dec 11, 2006
Messages
18,743
Reaction score
8,047
Location
Mid-Atlantic US
Website
www.lewlortonphoto.com
Can others edit my Photos
Photos NOT OK to edit
I have a small problem that I'd like to fix easily.

I download bulk metadata from Zenfolio in a cvs file in a slightly non-standard format.
The header row has no double quotes and the first column has no quotes.
I edit the file in a spreadsheet (in my case OpenOffice Calc) as suggested.
The output from Calc has quotes around the header row and ever text field output
I can clearly fix the header line with a paste but I'd like to be able to run this through a little application to fix everything easily without having to use a text editor (invariably I miss one or delete a comma I do need and it takes me forever to find the error)

What simple programming language would you suggest I can learn easily just to do this?

(I don't mind the learning, that would be fun compared to editing that metadata file with a text editor.)

TIA,
 
I am an old Unix guy. Ok, I am actually an old VMS guy, but Unix won twenty years ago, so, I adapted.

Windows? I'd install cygwin (free) and use either sed or awk. On a Mac I'd just open a Terminal window and do the same.
 
Nice to see I'm not the only "old VMS guy" around.....
 
I'm a linux guy and I think the learning curve on cygwin + awk or sed + a script is probably not worth the effort unless you already know awk or sed + scripting.

There might be a way to program this in open office basic... otherwise you may want to source edit open office. This may be possible in either java or if you're really unlucky C++ because then you'd need a windows compatible toolchain. If it can't be accomplished in OO basic maybe javascript or python would be better.

I can take a look at this over the weekend. Any chance you can post a link (dropbox or something) to a minimal working example of a file with this formatting?
 
I can send you an actual example, no problem.

To install unix to learn awk to fix this problem seems a bit like signing up for medical school and a surgery residency to take out a splinter.
It might appeal to some people; to me, not so much.
but thanks
 
Installing cygwin is not installing unix. It is installed a package of unix tools. It's no harder than installing any other application on Windows. And the tools I suggested are specifically designed for this sort of task.

All that "open the file, now read a line out of it, ..., and now write it out" stuff goes away with sed or awk, making the problem much simpler in many respects,
 
OK if understand you correctly you want to remove all the quotes in the file, if so have you tried using Find and replace? So in Calc click Edit--> Find & Replace next in the dialog box add " in the search for box next leave the replace with box empty and click Replace All. ;) Magic all the " are gone.
 
Installing cygwin is not installing unix. It is installed a package of unix tools. It's no harder than installing any other application on Windows. And the tools I suggested are specifically designed for this sort of task.

All that "open the file, now read a line out of it, ..., and now write it out" stuff goes away with sed or awk, making the problem much simpler in many respects,
Thanks for correcting me on cygwin. I don't use windows so i don't know much about that world. :)

if you are fluent in sed and awk of course it's worth your time. pretend not, how do you feel about it then?
 
Tough call, honestly. Either sed or awk can solve this problem in literally seconds. These are exactly the kinds of problems these tools solve.

Lew's problem is simply stated but weirdly difficult to solve without some sort of relatively sophisticated matching and replacing. Assuming I am understanding it right. The canonical solution here is regular expressions (worth checking to see if your text editor will do regular expression based find/replace, actually).

The first thing I would do, probably, is see if OpenOffice has some export options to not botch up the output. In the absence of that, there's gonna be some sort of learning curve. It really comes down to what's going to be more useful for you in the long run. If small programs to solve various and sundry little problems is in your future, might as well learn some Visual Basic, and figure out what things there are to help you crunch away of CSV files. If your future is more specific, and involves pretty specifically bulk editing/processing/crunching CSV files and similar sorts of line-by-line text processing, then there simply are no tools better than the cygwin ones.

If neither, well, it's just a matter of taste.
 
Python is not too bad to work with. I prefer COBOL, but there aren't many inexpensive compilers for Windows.
 
I would just copy all the data from column 1-7/8 and paste it in a new sheet do the Find and Replace recopy and paste the data with no quotes Back to the original sheet.
 
I can do that, I'm just trying to avoid hand work since I have to do this 20 or 30 times with files of differing lengths
My goal (or hope) is just to be able to feed the incorrectly formatted file into the waiting maw of a exe file and have that spit out the file I need.
The work will be tedious enough and I'd like to skip as much tedium as possible.
 
If I understand correctly, you need to remove the quotes from around the FIRST column only of the CSV file?

If so, and if the first column never contains any double quotes itself (if it does, you're in trouble, since OpenOffice does not seem to do the right thing, then:

sed 's/^"\([^"]*\)",/\1,/' < original.csv > new.csv

in a "Cygwin bash shell" (which is basically the same as a command prompt, but more rational about a few things) will do it.

If there ARE potentially double quotes in the first column, but never any commas:

sed 's/^"\([^,]*\)",/\1,/' < original.csv > new.csv

will work. If you have both double quotes AND commas in the first column, you're hooped, since Calc is going to do it wrong anyways.

This is a programming language, of sorts. Regular expressions are equivalent to state machines, which are a (large) subset of everything that can be computed.
 
I find an advanced text editor (ie UltraEdit, PFEdit, Notepad++[free] etc) are indispensable for quick text edits like this. Just record a macro (like in excel) by editing the first line properly, then re-run the macro x times to apply it to the remaining lines. Alternatively, most of the editors allow you to select your 'column' the only apply your find/replace to that selected text...which would probably work for the situation you describe.
 
Last edited:

Most reactions

New Topics

Back
Top