How can I parse a comma-delimited string into a list (caveat)?

I need to be able to take a string like:

'''foo, bar, "one, two", three four'''

and turn it into:

['foo', 'bar', 'one, two', 'three four']

I have a feeling (with hints from #python) that the solution is going to involve the shlex module.

Asked by: First Name818 | Posted: 01-10-2021

Answer 1

It depends on how complicated you want to get. Do you want to allow more than one type of quoting? What about escaped quotes?

Your syntax looks very much like the common CSV file format, which is supported by the Python standard library:

import csv
reader = csv.reader(['''foo, bar, "one, two", three four'''], skipinitialspace=True)
for r in reader:
  print(r)

Output:

['foo', 'bar', 'one, two', 'three four']
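
For reuse, the same approach can be wrapped in a small helper. This is only a minimal sketch for Python 3 and is not part of the original answer; the name parse_line is made up for illustration.

```python
import csv


def parse_line(line):
    """Split one comma-delimited line into a list of fields.

    parse_line is a hypothetical helper name, not a csv module function.
    """
    # csv.reader accepts any iterable of lines, so wrap the string in a list.
    # skipinitialspace=True drops the blank after each comma, which also lets
    # a quote that follows ", " be recognised as the start of a quoted field.
    reader = csv.reader([line], skipinitialspace=True)
    return next(reader)


print(parse_line('foo, bar, "one, two", three four'))
```

With the default dialect, a literal double quote inside a quoted field is written by doubling it, so '"say ""hi""", x' should come back as the two fields 'say "hi"' and 'x'.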


Answered by: Clark389 | Posted: 02-11-2021

Answer 2

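The shlex module solution handles escaped quotes, one kind of quote nested inside the other, and the rest of the quoting rules the shell supports. Because the first example below adds ',' to the default whitespace characters, it also splits on spaces, so 'three four' becomes two items.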

>>> import shlex
>>> my_splitter = shlex.shlex('''foo, bar, "one, two", three four''', posix=True)
>>> my_splitter.whitespace += ','
>>> my_splitter.whitespace_split = True
>>> print list(my_splitter)
['foo', 'bar', 'one, two', 'three', 'four']

Example with a double quote inside a single-quoted field (only ',' is treated as a separator here):

>>> my_splitter = shlex.shlex('''"test, a",'foo,bar",baz',bar \xc3\xa4 baz''',
...                           posix=True)
>>> my_splitter.whitespace = ','; my_splitter.whitespace_split = True
>>> print list(my_splitter)
['test, a', 'foo,bar",baz', 'bar \xc3\xa4 baz']
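
If 'three four' has to stay in one piece, as in the question, one option is to make the comma the only separator and strip the leftover spaces afterwards. The following is a hedged sketch for Python 3, not the answerer's code; split_commas is a hypothetical name, and clearing commenters is an assumption that '#' should be treated as ordinary data.

```python
import shlex


def split_commas(line):
    """Split on commas only, honouring shell-style quotes.

    split_commas is a hypothetical helper name used for illustration.
    """
    lexer = shlex.shlex(line, posix=True)
    lexer.whitespace = ','          # commas are the only separators
    lexer.whitespace_split = True   # split only on the separator characters
    lexer.commenters = ''           # assumption: '#' is data, not a comment
    # Spaces are no longer separators, so they stay attached to the tokens;
    # strip() removes them (including spaces at the edges of quoted text).
    return [token.strip() for token in lexer]


print(split_commas('foo, bar, "one, two", three four'))
```

Keep in mind that shlex still treats apostrophes as quote characters, so a field such as don't would need to be quoted or escaped in the input.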

Answered by: Vivian597 | Posted: 02-11-2021

Answer 3

You may also want to consider the csv module. I haven't tried it, but it looks like your input data is closer to CSV than to shell syntax (which is what shlex parses).

Answered by: Julia427 | Posted: 02-11-2021

Answer 4

You could do something like this:

>>> import re
>>> pattern = re.compile(r'\s*("[^"]*"|.*?)\s*,')
>>> def split(line):
...  return [x[1:-1] if x[:1] == x[-1:] == '"' else x
...          for x in pattern.findall(line.rstrip(',') + ',')]
>>> split("foo, bar, baz")
['foo', 'bar', 'baz']
>>> split('foo, bar, baz, "blub blah"')
['foo', 'bar', 'baz', 'blub blah']

Answered by: Kristian740 | Posted: 02-11-2021

Answer 5

I'd say a regular expression would be what you're looking for here, though I'm not terribly familiar with Python's Regex engine.

Assuming you use lazy matches, you can collect the matches found in the string into a list.
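
A rough sketch of that idea in Python 3 might look like the following. The pattern and the name split_fields are illustrative assumptions, not something the answer specified; the lazy ".*?" is used for the quoted part, and the sketch does not handle escaped quotes or empty fields reliably.

```python
import re

# Either a double-quoted field (lazy ".*?" stops at the first closing quote)
# or a run of characters up to the next comma.
FIELD = re.compile(r'\s*(?:"(.*?)"|([^,]+))')


def split_fields(line):
    """Return the fields of line; split_fields is a hypothetical name."""
    fields = []
    for quoted, bare in FIELD.findall(line):
        # The quoted group is filled for quoted fields; otherwise fall back
        # to the bare text with surrounding whitespace removed.
        fields.append(quoted if quoted else bare.strip())
    return fields


print(split_fields('foo, bar, "one, two", three four'))
```

For anything beyond simple input like this, the csv module is likely the safer choice, since it already implements the quoting rules.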

Answered by: Thomas756 | Posted: 02-11-2021

Answer 6

If it doesn't need to be pretty, this might get you on your way:

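def f(s, splitifeven):
    # Odd-numbered chunks were inside double quotes: keep them whole.
    if splitifeven & 1:
        return [s]
    # Even-numbered chunks are unquoted: split on commas, drop empty pieces.
    return [x.strip() for x in s.split(",") if x.strip() != '']

ss = 'foo, bar, "one, two", three four'

# Splitting on '"' alternates between unquoted and quoted chunks.
print(sum([f(s, sie) for sie, s in enumerate(ss.split('"'))], []))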

Answered by: Max108 | Posted: 02-11-2021
