Difference between copy.copy and dataclasses.replace

To my understanding, dataclasses.replace(x) does the same thing as copy.copy(x), except that it only works if x is a dataclass and offers the ability to replace members. However, I've also noticed that it's about 3x faster. I'm now curious why copy.copy would be so much slower, and whether there are other differences between the two functions that should be considered.

import dataclasses
import time
import copy

@dataclasses.dataclass()
class X:
    x=1
    y=1
    z=1

x = X()

start = time.perf_counter()
for _ in range(100000):
    a = dataclasses.replace(x)
t1 = time.perf_counter() - start

start = time.perf_counter()
for _ in range(100000):
    a = copy.copy(x)
t2 = time.perf_counter() - start

print(t1)  # 0.4
print(t2)  # 1.2
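One detail worth checking in the benchmark above: x, y and z have no type annotations, so @dataclass treats them as ordinary class attributes rather than fields. X therefore has no fields, its generated __init__ takes no arguments, and every instance has an empty __dict__, so both calls in the loop are working on an unusually trivial object. The sketch below contrasts X with a hypothetical annotated variant, Y, added only for comparison:

```python
import dataclasses

@dataclasses.dataclass()
class X:
    # Unannotated: plain class attributes, not dataclass fields.
    x = 1
    y = 1
    z = 1

@dataclasses.dataclass()
class Y:
    # Annotated (hypothetical variant): each becomes a field and an __init__ parameter.
    x: int = 1
    y: int = 1
    z: int = 1

print([f.name for f in dataclasses.fields(X)])  # []
print([f.name for f in dataclasses.fields(Y)])  # ['x', 'y', 'z']
print(vars(X()))     # {}
print(vars(Y(x=5)))  # {'x': 5, 'y': 1, 'z': 1}
```

Timings taken with a class like Y may therefore differ from the ones reported above.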
Dax Fohl asked May 04 '26 08:05


1 Answer

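As Chris_Rands alluded to in their comment, copy.copy has quite a bit of extra logic needed to handle arbitrary Python objects, and this extra logic likely accounts for the difference in speed. Before it reaches the generic path, copy.copy checks for built-in types it can copy directly, looks for a __copy__ method and consults copyreg's dispatch table; it then calls __reduce_ex__(4) and hands the result to a separate _reconstruct helper. In contrast, dataclasses.replace can get away with only a few checks, since it only needs to work for dataclasses: it verifies that its argument is a dataclass instance, gathers the current value of each field accepted by __init__, and calls the class with those values (plus any replacements). You can see how much simpler dataclasses.replace is than copy.copy (and the functions it calls) in the source code for dataclasses.py and copy.py.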
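Beyond speed, the fact that dataclasses.replace builds the new object by calling the class, while copy.copy creates it via __new__ without calling __init__, leads to behavioural differences that may matter. The sketch below uses a hypothetical Point class to show two of them: replace re-runs __init__ and __post_init__ whereas copy.copy does not, and copy.copy carries over attributes stored in the instance __dict__ that are not fields, whereas replace only passes the fields that __init__ accepts.

```python
import copy
import dataclasses
from typing import ClassVar

@dataclasses.dataclass
class Point:
    x: int
    y: int
    # ClassVar is not a field; used here only to count __post_init__ calls.
    post_init_calls: ClassVar[int] = 0

    def __post_init__(self):
        Point.post_init_calls += 1

p = Point(1, 2)
p.note = "not a field"  # ad-hoc attribute stored in p.__dict__

via_copy = copy.copy(p)               # no __init__/__post_init__ call
via_replace = dataclasses.replace(p)  # effectively Point(x=1, y=2)

print(Point.post_init_calls)         # 2: once for p, once for replace
print(hasattr(via_copy, "note"))     # True: copy.copy copies the whole __dict__
print(hasattr(via_replace, "note"))  # False: replace only passes fields to __init__
print(via_copy == via_replace)       # True: the field values match
```

So if __post_init__ does validation or derived-field setup, replace will repeat it, while copy.copy will simply duplicate the existing state.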

If you look at the source for copy.copy, you will notice that the code that copies an X object boils down to this.

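def fastcopy(x):
    # (callable, args, state, ...) as returned by the pickle protocol
    red = x.__reduce_ex__(4)
    # Rebuild via cls.__new__(cls); red[2] (instance state) is ignored
    return red[0](*red[1])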

Without the extra checks, this fastcopy function seems to perform comparably to dataclasses.replace. The full code I tested with is below, along with the times I got.

import dataclasses
import time
import copy

def fastcopy(x):
    red = getattr(x,"__reduce_ex__")(4)
    return red[0](*red[1])

@dataclasses.dataclass()
class X:
    x=1
    y=1
    z=1

x = X()

start = time.perf_counter()
for _ in range(100000):
    a = dataclasses.replace(x)
t1 = time.perf_counter() - start

start = time.perf_counter()
for _ in range(100000):
    a = fastcopy(x)
t2 = time.perf_counter() - start

print(t1) # 0.1
print(t2) # 0.1
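The two-line fastcopy only matches copy.copy for this particular X because X has no fields and therefore no instance state. For a dataclass with real fields, the tuple returned by __reduce_ex__(4) also carries the instance __dict__ as its third item, which copy.copy restores after creating the new object; fastcopy drops it. Below is a sketch of a version that applies that state too. It assumes a plain dataclass without __slots__ or a custom __setstate__, and fastcopy_with_state and Point are hypothetical names used only for illustration, not a drop-in replacement for copy.copy.

```python
import copy
import dataclasses

def fastcopy(x):
    red = x.__reduce_ex__(4)
    return red[0](*red[1])

def fastcopy_with_state(x):
    # (callable, args, state, ...) as returned by the pickle protocol.
    red = x.__reduce_ex__(4)
    y = red[0](*red[1])  # cls.__new__(cls): __init__ is not called
    state = red[2] if len(red) > 2 else None
    if state:
        # Assumes state is the instance __dict__ (no __slots__, no __setstate__).
        y.__dict__.update(state)
    return y

@dataclasses.dataclass
class Point:
    x: int  # no default, so there is no class attribute to fall back on
    y: int

p = Point(3, 4)
print(hasattr(fastcopy(p), "x"))  # False: the field values were dropped
print(fastcopy_with_state(p))     # Point(x=3, y=4)
print(copy.copy(p))               # Point(x=3, y=4)
```

Restoring the state adds some work back, so the comparison with dataclasses.replace is worth re-running on a class that actually has fields.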
fakedad answered May 08 '26 21:05