Over the past several years, I have kept stumbling upon deployment systems, and concepts such as "sharding", whose raison d'être is the ability to scale across an arbitrary number of cheap, "commodity" (usually 1U) servers.
The implication is that "larger" servers either have a higher price per performance or are somehow more difficult to administer. I reject both suppositions.
The day of "big iron" is well past us. This isn't to say one can't still buy large machines, or even run Linux on an IBM z-series, but for most practical intents, there are only two classes of Linux server hardware.
The larger class is based on the quad-processor Xeon 7xxx series motherboards. These machines are, I admit, less bang for the buck, if one's "bang" is fungible processor power and/or memory.
Everything else, however, has either linear or even sub-linear pricing.
Let's look at the current pricing from Dell, whom I find to be the cheapest of the brand-name vendors:
CPU (cores@clock)                          Model  Mem slots  Price
X3430 (firstname.lastname@example.org)     R310    4         $1257
E5620 (email@example.com)                  R410    8         $1319
E5620 (firstname.lastname@example.org)     R510    8         $1418
E5620 (email@example.com)                  R610   12         $1762
E5620 (firstname.lastname@example.org)     T610   12         $1537
E5620 (email@example.com)                  R710   18         $1712
E5620 (firstname.lastname@example.org)     T710   18         $1498
1*E6510 (email@example.com)                R810   16         $3821
2*E7520 (firstname.lastname@example.org)   R810   32         $5531
2*E7520 (email@example.com)                R910   32         $5790
4*E7520 (firstname.lastname@example.org)   R910   64         $8855
These are all configured with rack rails with cable arms and as little memory as possible, assuming one would buy commodity memory. What's notable is that the "small" machines with 4 and 8 memory slots are under 10% cheaper than the next ones up and that the 18-slot models are cheaper than the 12-slotters.
If you are memory-bound, the best deal for the money is the 5U-tall T710. If you're fortunate enough to be in a facility with plenty of power but not plenty of space, then the 2U-tall R710 makes sense for the extra 15%. Either way, assembling that many memory slots out of the smaller 1Us is going to be more expensive, will consume more space and power, and will yield less usable memory, since each box carries its own OS overhead.
What I also find notable is that the higher-end servers, though over twice as expensive for the cheapest model, are still cheaper and smaller per memory slot than the equivalent number of 1Us. Even compared with the 2Us, the per-slot price premium is under 50% for the base system, and likely a good deal less once the memory itself is included.
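The per-slot comparison can be checked with a few lines of arithmetic. The sketch below uses only the slot counts and base prices from the table above; the "(1 CPU)", "(2 CPU)" and "(4 CPU)" labels are my own shorthand for the R810 and R910 rows, and memory cost, power and rack space are deliberately ignored.

```python
# (model, memory slots, base price in USD), copied from the pricing table.
# The CPU-count suffixes are labels added here to tell repeated models apart.
SERVERS = [
    ("R310", 4, 1257),
    ("R410", 8, 1319),
    ("R510", 8, 1418),
    ("R610", 12, 1762),
    ("T610", 12, 1537),
    ("R710", 18, 1712),
    ("T710", 18, 1498),
    ("R810 (1 CPU)", 16, 3821),
    ("R810 (2 CPU)", 32, 5531),
    ("R910 (2 CPU)", 32, 5790),
    ("R910 (4 CPU)", 64, 8855),
]


def price_per_slot(slots, price):
    """Base system price divided by the number of memory slots."""
    return price / slots


def ranked_by_slot_cost(servers=SERVERS):
    """Return (model, price per slot) pairs, cheapest per slot first."""
    pairs = [(model, price_per_slot(slots, price))
             for model, slots, price in servers]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for model, cost in ranked_by_slot_cost():
        print(f"{model:14s} ${cost:7.2f} per slot")
```

Run as a script, this prints the models from cheapest to most expensive per slot, which makes it easy to compare the 64-slot R910 against both the 8-slot 1U and the 18-slot 2U.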
Since memory density increases with Moore's law, if you have 3% monthly growth or less and you comfortably fit into one of the $1500 servers, there's no need to worry about "sharding" due to memory. Similarly, if you're at 10% monthly growth (roughly tripling every year), you have 2 years to grow into the then-current larger machines, assuming that the number of memory slots per server at the same cost doesn't increase.
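To make the growth argument concrete, here is a rough sketch of the arithmetic. It assumes that memory density per slot doubles every 24 months (my reading of "Moore's law" here, not a figure from the pricing data) and that memory demand compounds monthly; the function name and its parameters are made up for illustration.

```python
def months_until_outgrown(slots_used, slots_available, monthly_growth,
                          density_doubling_months=24, horizon=120):
    """First month in which demand no longer fits, or None within horizon.

    Demand is measured in "today's slots": it grows by monthly_growth each
    month, while each slot holds twice as much every
    density_doubling_months (an assumption, not a measured figure).
    """
    for month in range(1, horizon + 1):
        demand = slots_used * (1 + monthly_growth) ** month
        density_gain = 2 ** (month / density_doubling_months)
        if demand / density_gain > slots_available:
            return month
    return None


if __name__ == "__main__":
    # Filling 18 slots today, growing 10% a month, with a 64-slot ceiling.
    print(months_until_outgrown(18, 64, 0.10))
    # Half-filling an 18-slot box and growing 3% a month.
    print(months_until_outgrown(9, 18, 0.03))
```

At 3% monthly growth the density gain almost exactly cancels the demand, so the second call finds no crossover within the ten-year horizon. The starting occupancy matters a great deal for the first call, so treat the result as an order-of-magnitude estimate.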
For a startup, 2 years is a lot of engineering time that could be spent on actually driving the growth rather than focusing on how to handle it if it happens to appear.
For now, pricing of CPU "horsepower" across the different servers is left as an exercise to the reader who enjoys comparing benchmarks.
 The virtualization proponents seem to go both ways on this, the other way being the subdivision of larger servers into several smaller, virtual machines.
 Linux on x86 is the only one that counts these days, right?
 Often the case with modern languages such as Java and Python. The practice of using memcached or other in-memory databases similarly leads to memory scarcity.
 That is, without paying a huge premium for the highest density memory, which premium often only exists for a short period of time.
 Or, rather, per processor, unless we go back to a serial connection technology like FB-DIMM.