Defined in header <cstring>
void* memcpy( void* dest, const void* src, std::size_t count );
Copies count bytes from the object pointed to by src to the object pointed to by dest. Both objects are reinterpreted as arrays of unsigned char.
If the objects overlap, the behavior is undefined.
If either dest or src is an invalid or null pointer, the behavior is undefined, even if count is zero.
If the objects are potentially-overlapping or not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined.
Parameters
dest | - | pointer to the memory location to copy to
src | - | pointer to the memory location to copy from
count | - | number of bytes to copy
Return value
dest
Notes
std::memcpy may be used to implicitly create objects in the destination buffer.
std::memcpy is meant to be the fastest library routine for memory-to-memory copy. It is usually more efficient than std::strcpy, which must scan the data it copies, or std::memmove, which must take precautions to handle overlapping inputs.
Several C++ compilers transform suitable memory-copying loops to std::memcpy calls.
Where strict aliasing prohibits examining the same memory as values of two different types, std::memcpy may be used to convert the values.
Example
See also
memmove | moves one buffer to another (function)
memset | fills a buffer with a character (function)
wmemcpy | copies a certain amount of wide characters between two non-overlapping arrays (function)
copy | copies characters (public member function of std::basic_string)
copy, copy_if (C++11) | copies a range of elements to a new location (function template)
copy_backward | copies a range of elements in backwards order (function template)
is_trivially_copyable (C++11) | checks if a type is trivially copyable (class template)
C documentation for memcpy
- This page was last modified on 25 October 2023, at 09:01.
Is it better to use std::memcpy() or std::copy() in terms of performance?
Is it better to use std::memcpy() as shown below, or is it better to use std::copy() in terms of performance? Why?
- Note that char can be signed or unsigned, depending on the implementation. If the number of bytes can be >= 128, then use unsigned char for your byte arrays. (The (int *) cast would be safer as (unsigned int *) , too.) – Dan Breslau Commented Jan 16, 2011 at 18:03
- 14 Why aren't you using std::vector<char> ? Or since you say bits , std::bitset ? – GManNickG Commented Jan 16, 2011 at 19:20
- 2 Actually, could you please explain to me what (int*) copyMe->bits[0] does? – user2138149 Commented Aug 25, 2015 at 21:38
- 4 not sure why something that seems like such a mess with so little vital context provided was at +81, but hey. @user3728501 my guess is that the start of the buffer holds an int dictating its size, but that seems like a recipe for implementation-defined disaster, like so many other things here. – underscore_d Commented Apr 12, 2016 at 19:22
- 3 In fact, that (int *) cast is just pure undefined behaviour, not implementation-defined. Trying to do type-punning via a cast violates strict aliasing rules and hence is totally undefined by the Standard. (Also, in C++ although not C, you can't type-pun via a union either.) Pretty much the only exception is if you're converting to a variant of char* , but the allowance is not symmetrical. – underscore_d Commented Mar 19, 2017 at 19:07
8 Answers
I'm going to go against the general wisdom here that std::copy will have a slight, almost imperceptible performance loss. I just did a test and found that to be untrue: I did notice a performance difference. However, the winner was std::copy .
I wrote a C++ SHA-2 implementation. In my test, I hash 5 strings using all four SHA-2 versions (224, 256, 384, 512), and I loop 300 times. I measure times using Boost.Timer. That 300-loop counter is enough to completely stabilize my results. I ran the test 5 times each, alternating between the memcpy version and the std::copy version. My code takes advantage of grabbing data in chunks as large as possible (many other implementations operate with char / char*, whereas I operate with T / T*, where T is the largest type in the user's implementation that has correct overflow behavior), so fast memory access on the largest types I can use is central to the performance of my algorithm. These are my results:
Time (in seconds) to complete run of SHA-2 tests
Total average increase in speed of std::copy over memcpy: 2.99%
My compiler is gcc 4.6.3 on Fedora 16 x86_64. My optimization flags are -Ofast -march=native -funsafe-loop-optimizations .
Code for my SHA-2 implementations.
I decided to run a test on my MD5 implementation as well. The results were much less stable, so I decided to do 10 runs. However, after my first few attempts, I got results that varied wildly from one run to the next, so I'm guessing there was some sort of OS activity going on. I decided to start over.
Same compiler settings and flags. There is only one version of MD5, and it's faster than SHA-2, so I did 3000 loops on a similar set of 5 test strings.
These are my final 10 results:
Time (in seconds) to complete run of MD5 tests
Total average decrease in speed of std::copy over memcpy: 0.11%
Code for my MD5 implementation
These results suggest that there is some optimization that std::copy used in my SHA-2 tests that std::copy could not use in my MD5 tests. In the SHA-2 tests, both arrays were created in the same function that called std::copy / memcpy . In my MD5 tests, one of the arrays was passed in to the function as a function parameter.
I did a little bit more testing to see what I could do to make std::copy faster again. The answer turned out to be simple: turn on link time optimization. These are my results with LTO turned on (option -flto in gcc):
Time (in seconds) to complete run of MD5 tests with -flto
Total average increase in speed of std::copy over memcpy: 0.72%
In summary, there does not appear to be a performance penalty for using std::copy . In fact, there appears to be a performance gain.
Explanation of results
So why might std::copy give a performance boost?
First, I would not expect it to be slower for any implementation, as long as the optimization of inlining is turned on. All compilers inline aggressively; it is possibly the most important optimization because it enables so many other optimizations. std::copy can (and I suspect all real world implementations do) detect that the arguments are trivially copyable and that memory is laid out sequentially. This means that in the worst case, when memcpy is legal, std::copy should perform no worse. The trivial implementation of std::copy that defers to memcpy should meet your compiler's criteria of "always inline this when optimizing for speed or size".
However, std::copy also keeps more of its information. When you call std::copy , the function keeps the types intact. memcpy operates on void * , which discards almost all useful information. For instance, if I pass in an array of std::uint64_t , the compiler or library implementer may be able to take advantage of 64-bit alignment with std::copy , but it may be more difficult to do so with memcpy . Many implementations of algorithms like this work by first working on the unaligned portion at the start of the range, then the aligned portion, then the unaligned portion at the end. If it is all guaranteed to be aligned, then the code becomes simpler and faster, and easier for the branch predictor in your processor to get correct.
Premature optimization?
std::copy is in an interesting position. I expect it to never be slower than memcpy and sometimes faster with any modern optimizing compiler. Moreover, anything that you can memcpy , you can std::copy . memcpy does not allow any overlap in the buffers, whereas std::copy supports overlap in one direction (with std::copy_backward for the other direction of overlap). memcpy only works on pointers, std::copy works on any iterators ( std::map , std::vector , std::deque , or my own custom type). In other words, you should just use std::copy when you need to copy chunks of data around.
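The interchangeability claim above can be sketched as follows (the helper copies_agree is illustrative, not from the answer):

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Copy the same source with memcpy and std::copy and report whether
// the two destinations hold identical values.
bool copies_agree()
{
    int src[4] = {1, 2, 3, 4};
    int via_memcpy[4];
    int via_copy[4];

    std::memcpy(via_memcpy, src, sizeof src);   // raw byte copy
    std::copy(src, src + 4, via_copy);          // element-wise copy

    // std::copy also accepts arbitrary iterators, which memcpy cannot:
    std::vector<int> v(4);
    std::copy(src, src + 4, v.begin());

    return std::equal(via_memcpy, via_memcpy + 4, via_copy) && v[3] == 4;
}
```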
- 53 I want to emphasize that this doesn't mean that std::copy is 2.99% or 0.72% or -0.11% faster than memcpy , these times are for the entire program to execute. However, I generally feel that benchmarks in real code are more useful than benchmarks in fake code. My entire program got that change in execution speed. The real effects of just the two copying schemes will have greater differences than shown here when taken in isolation, but this shows that they can have measurable differences in actual code. – David Stone Commented Apr 3, 2012 at 17:31
- 3 I want to disagree with your findings, but results are results :/. One question, though (I know it was a long time ago and you may not remember the research): did you look into the assembly code? – ST3 Commented Jan 6, 2015 at 9:17
- 2 In my opinion, memcpy and std::copy have different implementations, so in some cases the compiler optimizes the surrounding code and the actual memory-copy code as one integral piece of code. In other words, sometimes one is better than the other, and deciding which to use is premature optimization, because in every situation you have to do new research; what is more, programs are usually still being developed, so after some minor changes the advantage of one function over the other may be lost. – ST3 Commented Jan 6, 2015 at 9:18
- 4 @ST3: I would imagine that in the worst case, std::copy is a trivial inline function that just calls memcpy when it is legal. Basic inlining would eliminate any negative performance difference. I will update the post with a bit of an explanation of why std::copy might be faster. – David Stone Commented Jan 8, 2015 at 2:21
- 12 Very informative analysis. Re Total average decrease in speed of std::copy over memcpy: 0.11% , whilst the number is correct, the results aren't statistically significant. A 95% confidence interval for the difference in means is (-0.013s, 0.025), which includes zero. As you pointed out there was variation from other sources and with your data, you'd probably say the performance is the same. For reference, the other two results are statistically significant -- the chances you'd see a difference in times this extreme by chance are about 1 in 100 million (first) and 1 in 20,000 (last). – TooTone Commented Mar 11, 2016 at 13:24
All compilers I know will replace a simple std::copy with a memcpy when it is appropriate, or even better, vectorize the copy so that it would be even faster than a memcpy .
In any case: profile and find out yourself. Different compilers will do different things, and it's quite possible it won't do exactly what you ask.
See this presentation on compiler optimisations (pdf).
Here's what GCC does for a simple std::copy of a POD type.
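The answer's original snippet is not preserved in this copy; a minimal function of the kind it describes might be (copy_ints is an illustrative name):

```cpp
#include <algorithm>
#include <cstddef>

// A simple std::copy of a POD type. GCC typically lowers this loop to a
// memmove call, since it cannot prove that the two ranges do not overlap.
void copy_ints(int* dst, const int* src, std::size_t n)
{
    std::copy(src, src + n, dst);
}
```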
Here's the disassembly (with only -O optimisation), showing the call to memmove :
If you change the function signature to
then the memmove becomes a memcpy for a slight performance improvement. Note that memcpy itself will be heavily vectorised.
- 2 How can I do profiling. What tool to use (in windows and linux)? – user576670 Commented Jan 16, 2011 at 18:00
- 7 @Konrad, you're correct. But memmove shouldn't be faster - rather, it should be slightly slower because it has to take into account the possibility that the two data ranges overlap. I think std::copy permits overlapping data, and so it has to call memmove. – Charles Salvia Commented Jan 16, 2011 at 18:04
- 2 @Konrad: If memmove was always faster than memcpy, then memcpy would call memmove. What std::copy actually might dispatch to (if anything) is implementation-defined, so it's not useful to mention specifics without mentioning implementation. – Fred Nurk Commented Jan 16, 2011 at 18:04
- 1 Although, a simple program to reproduce this behavior, compiled with -O3 under GCC shows me a memcpy . It leads me to believe GCC checks whether there's memory overlap. – jweyrich Commented Jan 16, 2011 at 18:31
- 1 @Konrad: standard std::copy allows overlap in one direction but not the other. The beginning of the output can't lie within the input range, but the beginning of the input is allowed to lie within the output range. This is a little odd, because the order of assignments is defined, and a call might be UB even though the effect of those assignments, in that order, is defined. But I suppose the restriction allows vectorization optimizations. – Steve Jessop Commented Jan 16, 2011 at 20:58
Always use std::copy because memcpy is limited to only C-style POD structures, and the compiler will likely replace calls to std::copy with memcpy if the targets are in fact POD.
Plus, std::copy can be used with many iterator types, not just pointers. std::copy is more flexible for no performance loss and is the clear winner.
- Why should you wanna copy around iterators? – Atmocreations Commented Oct 14, 2011 at 15:41
- 3 You're not copying the iterators, but rather the range defined by two iterators. For instance, std::copy(container.begin(), container.end(), destination); will copy the contents of container (everything between begin and end ) into the buffer indicated by destination . std::copy doesn't require shenanigans like &*container.begin() or &container.back() + 1 . – David Stone Commented Apr 26, 2012 at 17:13
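The comment's snippet, made self-contained (the helper copy_list and the container contents are illustrative):

```cpp
#include <algorithm>
#include <list>
#include <vector>

// std::copy over a range defined by two iterators -- no &*begin()
// shenanigans, and the source need not even be contiguous in memory.
std::vector<int> copy_list(const std::list<int>& container)
{
    std::vector<int> destination(container.size());
    std::copy(container.begin(), container.end(), destination.begin());
    return destination;
}
```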
In theory, memcpy might have a slight, imperceptible, infinitesimal performance advantage, only because it doesn't have the same requirements as std::copy. From the man page of memcpy:
To avoid overflows, the size of the arrays pointed by both the destination and source parameters, shall be at least num bytes, and should not overlap (for overlapping memory blocks, memmove is a safer approach).
In other words, memcpy can ignore the possibility of overlapping data. (Passing overlapping arrays to memcpy is undefined behavior.) So memcpy doesn't need to explicitly check for this condition, whereas std::copy can be used as long as the OutputIterator parameter is not in the source range. Note this is not the same as saying that the source range and destination range can't overlap.
So since std::copy has somewhat different requirements, in theory it should be slightly (with an extreme emphasis on slightly ) slower, since it probably will check for overlapping C-arrays, or else delegate the copying of C-arrays to memmove , which needs to perform the check. But in practice, you (and most profilers) probably won't even detect any difference.
Of course, if you're not working with PODs , you can't use memcpy anyway.
- 7 This is true for std::copy<char> . But std::copy<int> can assume that its inputs are int-aligned. That will make a far bigger difference, because it affects every element. Overlap is a one-time check. – MSalters Commented Jan 17, 2011 at 8:39
- 2 @MSalters, true, but most implementations of memcpy I've seen check for alignment and attempt to copy words rather than byte by byte. – Charles Salvia Commented Apr 28, 2012 at 8:13
- 1 std::copy() can ignore overlapping memory, too. If you want to support overlapping memory, you have to write the logic yourself to call std::reverse_copy() in the appropriate situations. – Cygon Commented Jun 6, 2012 at 11:23
- 2 There is an opposite argument that can be made: when going through memcpy interface it loses the alignment information. Hence, memcpy has to do alignment checks at run-time to handle unaligned beginnings and ends. Those checks may be cheap but they are not free. Whereas std::copy can avoid these checks and vectorize. Also, the compiler may prove that source and destination arrays do not overlap and again vectorize without the user having to choose between memcpy and memmove . – Maxim Egorushkin Commented Jan 12, 2016 at 15:42
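The answer's point that memcpy is off-limits for non-POD types can be illustrated with a small sketch (copy_second is a hypothetical helper):

```cpp
#include <algorithm>
#include <string>

// Copying non-trivially-copyable objects must go through std::copy
// (or plain assignment); memcpy on std::string would be undefined behavior.
std::string copy_second(const std::string (&src)[2])
{
    std::string dst[2];
    // std::memcpy(dst, src, sizeof src);  // UB: std::string is not trivially copyable
    std::copy(src, src + 2, dst);          // invokes std::string's copy assignment
    return dst[1];
}
```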
My rule is simple. If you are using C++ prefer C++ libraries and not C :)
- 47 C++ was explicitly designed to allow using C libraries. This was not an accident. It is often better to use std::copy than memcpy in C++, but this has nothing to do with which one is C, and that kind of argument is usually the wrong approach. – Fred Nurk Commented Jan 16, 2011 at 18:06
- 4 @FredNurk Usually you want to avoid weak area of C where C++ provide a safer alternative. – Phil1970 Commented Apr 18, 2017 at 23:13
- @Phil1970 I'm not sure that C++ is much safer in this case. We still have to pass valid iterators that don't overrun, etc. I guess being able to use std::end(c_arr) instead of c_arr + i_hope_this_is_the_right_number_of elements is safer? and perhaps more importantly, clearer. And that'd be the point I emphasise in this specific case: std::copy() is more idiomatic, more maintainable if the types of the iterators changes later, leads to clearer syntax, etc. – underscore_d Commented Jan 2, 2018 at 18:08
- 2 @underscore_d std::copy is safer because it correctly copies the passed data in case they are not POD-types. memcpy will happily copy a std::string object to a new representation byte by byte. – Jens Commented Jan 17, 2019 at 9:05
Just a minor addition: The speed difference between memcpy() and std::copy() can vary quite a bit depending on if optimizations are enabled or disabled. With g++ 6.2.0 and without optimizations memcpy() clearly wins:
When optimizations are enabled ( -O3 ), everything looks pretty much the same again:
The bigger the array the less noticeable the effect gets, but even at N=1000 memcpy() is about twice as fast when optimizations aren't enabled.
Source code (requires Google Benchmark):
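The answer's Google Benchmark source is not reproduced in this copy; a self-contained sketch of the same comparison using <chrono> (buffer size and iteration count are illustrative, and wall-clock numbers will vary by machine and optimization level):

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <vector>

// Time memcpy vs std::copy over an n-byte buffer; returns true if both
// destinations end up identical to the source.
bool bench(std::size_t n)
{
    std::vector<char> src(n, 'x'), a(n), b(n);

    auto time_it = [](auto&& f) {
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < 100000; ++i) f();
        return std::chrono::duration_cast<std::chrono::microseconds>(
                   std::chrono::steady_clock::now() - t0).count();
    };

    auto us_memcpy = time_it([&] { std::memcpy(a.data(), src.data(), n); });
    auto us_copy   = time_it([&] { std::copy(src.begin(), src.end(), b.begin()); });

    std::cout << "memcpy:    " << us_memcpy << " us\n"
              << "std::copy: " << us_copy   << " us\n";
    return a == src && b == src;
}
```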
- 22 Measuring performance with optimizations disabled is... well... pretty much pointless... If you are interested in performance you won't compile without optimizations. – bolov Commented Oct 18, 2016 at 13:32
- 6 @bolov Not always. A relatively fast program under debug is in some cases important to have. – Acorn Commented Jul 11, 2019 at 15:25
- @bolov I used to think the same, but actually games running in debug mode can be heavily impacted by this. Well, maybe there are other solutions like inlining in debug mode... but that is a use case already. – Germán Diago Commented Mar 31, 2021 at 12:08
- Measuring speed without optimization? It is like trying to time muscle cars' speed while they are being pushed by hands. – stackoverblown Commented May 20 at 15:01
If you really need maximum copying performance (which you might not), use neither of them .
There's a lot that can be done to optimize memory copying - even more if you're willing to use multiple threads/cores for it. See, for example:
What's missing/sub-optimal in this memcpy implementation?
both the question and some of the answers have suggested implementations or links to implementations.
- 7 pedant mode: with the usual caveat that " use neither of them " means if you have proven that you have a highly specific situation/requirement for which neither Standard function provided by your implementation is fast enough ; otherwise, my usual concern is that people who haven't proven that get sidetracked on prematurely optimising copying code instead of the usually more useful parts of their program. – underscore_d Commented Jan 2, 2018 at 18:22
Profiling shows that the statement "std::copy() is always as fast as memcpy(), or faster" is false.
HP-Compaq-dx7500-Microtower 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux. gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
The code (language: c++):
g++ -O0 -o test_stdcopy test_stdcopy.cpp
  memcpy()    profile: main:21: now:1422969084:04859 elapsed:2650 us
  std::copy() profile: main:27: now:1422969084:04862 elapsed:2745 us
  memcpy()    elapsed 44 s
  std::copy() elapsed 45 s

g++ -O3 -o test_stdcopy test_stdcopy.cpp
  memcpy()    profile: main:21: now:1422969601:04939 elapsed:2385 us
  std::copy() profile: main:28: now:1422969601:04941 elapsed:2690 us
  memcpy()    elapsed 27 s
  std::copy() elapsed 43 s
Red Alert pointed out that the code uses memcpy from array to array and std::copy from array to vector. That could be a reason why memcpy appeared faster.
Since there is
v.reserve(sizeof(arr1));
there should be no difference between copying to a vector and copying to an array.
The code was fixed to use an array for both cases; memcpy is still faster:
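The corrected test is not reproduced in this copy; it would look roughly like this (names and sizes are illustrative):

```cpp
#include <algorithm>
#include <cstring>

// Array-to-array for both memcpy and std::copy, as in the corrected test.
bool arrays_match()
{
    static char arr1[1024], arr2[1024], arr3[1024];
    std::fill(arr1, arr1 + sizeof arr1, 'a');

    std::memcpy(arr2, arr1, sizeof arr1);       // memcpy: array to array
    std::copy(arr1, arr1 + sizeof arr1, arr3);  // std::copy: array to array

    return std::memcmp(arr2, arr3, sizeof arr1) == 0;
}
```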
- 2 wrong, your profiling shows that copying into an array is faster than copying into a vector. Off topic. – Red Alert Commented Feb 13, 2015 at 1:58
- I could be wrong, but in your corrected example, with memcpy, aren't you copying arr2 into arr1, while with std::copy, you are copying arr1 into arr2?... What you could do is to make multiple, alternating experiments (once a batch of memcpy, once a batch of std::copy, then back again with memcopy, etc., multiple times.). Then, I would use clock() instead of time(), because who knows what your PC could be doing in addition to that program. Just my two cents, though... :-) – paercebal Commented Apr 17, 2015 at 8:55
- 9 So, switching std::copy from a vector to an array somehow made memcpy take nearly twice as long? This data is highly suspect. I compiled your code using gcc with -O3, and the generated assembly is the same for both loops. So any difference in time you observe on your machine is only incidental. – Red Alert Commented May 6, 2015 at 0:46