Extracting unique items from a list of mappings

Here's an interesting problem in search of the most Pythonic solution. Suppose I have a list of mappings {'id': id, 'url': url}. Some ids in the list are duplicated, and I want to create a new list with all the duplicates removed. I came up with the following function:

def unique_mapping(map):
    d = {}
    for res in map:
        d[res['id']] = res['url']

    return [{'id': id, 'url': d[id]} for id in d]

I suppose it's quite efficient. But is there a "more Pythonic" way? Or perhaps a more efficient way?


Asked by: Maya194 | Posted: 05-10-2021






Answer 1

Your example can be rewritten slightly to build the first dictionary with a generator expression and to avoid constructing new mappings. Just reuse the old ones:

def unique_mapping(mappings):
    return list(dict((m['id'], m) for m in mappings).values())

Although this came out as a one-liner, I still think it's quite readable.

There are two things you have to keep in mind when using your original solution and mine:

  • the items are not guaranteed to be returned in their original order (before Python 3.7, dicts do not guarantee insertion order)
  • a later entry will overwrite earlier entries with the same id

If neither of these matters to you, I suggest the solution above. Otherwise, this function preserves order and gives priority to the first-encountered ids:

def unique_mapping(mappings):
    addedIds = set()
    for m in mappings:
        mId = m['id']
        if mId not in addedIds:
            addedIds.add(mId)
            yield m

You might need to call it as list(unique_mapping(mappings)) if you need a list and not a generator.
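
To make the two caveats concrete, here is a small sketch that runs both approaches on made-up sample data. The names unique_last_wins, unique_first_wins, and sample are illustrative and not from the original post, and the ordering shown for the dict-based version assumes Python 3.7 or later, where dicts keep insertion order.

```python
def unique_last_wins(mappings):
    # Dict-based approach: a later entry replaces an earlier one with the same id.
    return list(dict((m['id'], m) for m in mappings).values())


def unique_first_wins(mappings):
    # Set-based generator: the first entry seen for each id is kept.
    seen = set()
    for m in mappings:
        if m['id'] not in seen:
            seen.add(m['id'])
            yield m


# Hypothetical sample data with a duplicated id.
sample = [
    {'id': 1, 'url': 'http://first'},
    {'id': 2, 'url': 'http://other'},
    {'id': 1, 'url': 'http://second'},
]

last = unique_last_wins(sample)
first = list(unique_first_wins(sample))

print([m['url'] for m in last])   # ['http://second', 'http://other']
print([m['url'] for m in first])  # ['http://first', 'http://other']
```

Both versions return the original dictionaries rather than copies, so the choice between them comes down to which duplicate should survive.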

Answered by: Lydia488 | Posted: 06-11-2021



Answer 2

There are a couple of things you could improve.

  • You're performing two loops, one over the original list, and then another over the result dict. You could build up your results in one step instead.

  • You could change to use a generator, to avoid constructing the whole list up-front. (Use list(unique_mapping(items)) to convert to a full list if you need it.)

  • There's no need to store the value when just checking for duplicates; you can use a set instead.

  • You're recreating a dictionary for each element, rather than returning the original. This may actually be needed (e.g. you're modifying them, and don't want to touch the original), but if not, it's more efficient to use the dictionaries already created.

Here's an implementation:

def unique_mapping(items):
    s = set()
    for res in items:
        if res['id'] not in s:
            yield res
            s.add(res['id'])
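
If the results will be modified and the input must stay untouched, one possible variant yields shallow copies instead of the original dictionaries. This is only a sketch: unique_mapping_copies and the sample data are hypothetical names, not part of the implementation above.

```python
def unique_mapping_copies(items):
    seen = set()
    for res in items:
        if res['id'] not in seen:
            seen.add(res['id'])
            # dict(res) makes a shallow copy, so editing the result
            # does not change the dictionary in the input list.
            yield dict(res)


# Hypothetical sample data with a duplicated id.
items = [
    {'id': 'a', 'url': 'http://a'},
    {'id': 'a', 'url': 'http://duplicate'},
    {'id': 'b', 'url': 'http://b'},
]

unique = list(unique_mapping_copies(items))
unique[0]['url'] = 'http://changed'

print(len(unique))      # 2
print(items[0]['url'])  # http://a (the input is untouched)
```

The copy costs one extra dictionary per unique id, which is the trade-off the last point describes.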

Answered by: Catherine806 | Posted: 06-11-2021



Answer 3

I think this can be made simpler still. Dictionaries don't tolerate duplicate keys. Make your list of mappings into a dictionary of mappings. This will remove duplicates.

>>> someListOfDicts = [
...     {'url': 'http://a', 'id': 'a'},
...     {'url': 'http://b', 'id': 'b'},
...     {'url': 'http://c', 'id': 'a'}]

>>> dict([(x['id'], x) for x in someListOfDicts]).values()

[{'url': 'http://c', 'id': 'a'}, {'url': 'http://b', 'id': 'b'}]
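
On Python 3, .values() returns a view rather than a list, so the same idea might be written with a dict comprehension wrapped in list(). The snippet below is a sketch under that assumption; the variable names are illustrative, and the element order shown assumes Python 3.7 or later.

```python
some_list_of_dicts = [
    {'url': 'http://a', 'id': 'a'},
    {'url': 'http://b', 'id': 'b'},
    {'url': 'http://c', 'id': 'a'},
]

# Key each mapping by its id; a repeated id overwrites the earlier mapping.
by_id = {x['id']: x for x in some_list_of_dicts}

# Wrap the view in list() to get an actual list.
unique = list(by_id.values())
print(unique)  # [{'url': 'http://c', 'id': 'a'}, {'url': 'http://b', 'id': 'b'}]
```

Note that the entry for id 'a' keeps the position of its first occurrence but the contents of its last one.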

Answered by: Briony469 | Posted: 06-11-2021



