Monday, September 30, 2019

python - How do you remove duplicates from a list whilst preserving order?



Is there a built-in that removes duplicates from a list in Python, whilst preserving order? I know that I can use a set to remove duplicates, but that destroys the original order. I also know that I can roll my own like this:




def uniq(input):
    output = []
    for x in input:
        if x not in output:
            output.append(x)
    return output


(Thanks to unwind for that code sample.)




But I'd like to avail myself of a built-in or a more Pythonic idiom if possible.



Related question: In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?


Answer



Here you have some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark



Fastest one:



def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]


Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving the attribute seen.add on each iteration is more costly than reading a local variable. seen.add could have been rebound between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to look the attribute up on the object each time.
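To make the point about attribute lookup concrete, here is a minimal sketch that puts both variants side by side. The names dedupe_attribute_lookup, dedupe_local_binding and time_both are made up for this illustration and do not come from the linked benchmark. How large the difference is, and whether there is one at all, depends on the Python version and the data, so no timings are claimed here.

```python
import timeit


def dedupe_attribute_lookup(seq):
    # Resolves the attribute seen.add for every element that is not yet in seen.
    seen = set()
    return [x for x in seq if not (x in seen or seen.add(x))]


def dedupe_local_binding(seq):
    # Binds the method once; the comprehension then reads a local name.
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]


def time_both(number=200):
    # Hypothetical helper: returns (attribute_lookup_seconds, local_binding_seconds).
    data = list(range(1000)) * 5
    t_attr = timeit.timeit(lambda: dedupe_attribute_lookup(data), number=number)
    t_local = timeit.timeit(lambda: dedupe_local_binding(data), number=number)
    return t_attr, t_local


data = [3, 1, 3, 2, 1, 5, 2]
result_attr = dedupe_attribute_lookup(data)
result_local = dedupe_local_binding(data)
```

Both functions return the same list; only the way the add method is reached differs. Calling time_both() on your own interpreter is the way to see whether the local binding still pays off there.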



If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com/recipes/528878/



It offers O(1) insertion, deletion and membership checks per operation.
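As a rough idea of what an ordered set looks like, here is a minimal sketch. It is not the ActiveState recipe linked above. It assumes Python 3.7 or later, where a plain dict is guaranteed to keep insertion order, and the class name OrderedSet and its small method set are chosen just for this illustration.

```python
class OrderedSet:
    """Minimal ordered set backed by a dict (assumes Python 3.7+ dict ordering)."""

    def __init__(self, iterable=()):
        # dict.fromkeys keeps the first occurrence of each key, in order.
        self._items = dict.fromkeys(iterable)

    def add(self, item):
        # Average O(1); an item that is already present keeps its position.
        self._items[item] = None

    def discard(self, item):
        # Average O(1); silently ignores missing items.
        self._items.pop(item, None)

    def __contains__(self, item):
        # Average O(1) membership check.
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


s = OrderedSet([3, 1, 3, 2])
s.add(1)      # already present, order unchanged
s.add(5)      # appended at the end
s.discard(3)  # removed from the front
remaining = list(s)

# For a one-off deduplication the same dict property is enough on its own.
unique = list(dict.fromkeys([3, 1, 3, 2, 1, 5, 2]))
```

Like f7, this only works for hashable elements. For lists of unhashable items, the quadratic loop from the question still applies.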




(Small additional note: seen.add() always returns None, so the or above is there only as a way to attempt a set update, and not as an integral part of the logical test.)
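The note about the or can be checked by writing the comprehension out as an explicit loop. The sketch below uses the hypothetical name f7_explicit and is meant to behave the same as f7, with the membership test and the set update as separate statements.

```python
def f7_explicit(seq):
    seen = set()
    result = []
    for x in seq:
        if x in seen:
            # Duplicate: the `or` in f7 short-circuits here and never calls add.
            continue
        # First occurrence: add() returns None, so `not (False or None)` is True
        # in f7 and the element is kept.
        seen.add(x)
        result.append(x)
    return result


add_return_value = set().add(1)
letters = f7_explicit("abracadabra")
```

Here add_return_value is None and letters is ['a', 'b', 'r', 'c', 'd'], which is why the or in f7 only serves to trigger the update.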

