Is there any cross-room communication? Can you spawn a process per room? Scaling that tops out at 25% CPU on a 4 vCPU node (one core's worth) strongly suggests a locked section limiting you to effectively single-threaded performance. Multiple processes serving rooms should bypass that if you can't find it otherwise, but maybe there's something wrong in your load balancing etc.
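If rooms never talk to each other, one way to spread them over processes is to pin each room to a worker by hashing its id. The sketch below is only an illustration: `workerForRoom`, `workerCount`, and the room id format are hypothetical names, and it assumes something in front (load balancer or matchmaking) can route every player of a room to the same process.

```typescript
// Hypothetical sketch: pin each room to one of N worker processes.
// Nothing here comes from the poster's setup; names are illustrative.

/** FNV-1a 32-bit hash over UTF-16 code units: cheap and deterministic. */
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Math.imul keeps the multiplication in 32-bit integer space.
    hash = Math.imul(hash, 0x01000193); // FNV prime
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

/** Returns the index of the worker process that should own this room. */
function workerForRoom(roomId: string, workerCount: number): number {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError("workerCount must be a positive integer");
  }
  return fnv1a(roomId) % workerCount;
}
```

With 4 vCPUs you might call `workerForRoom(roomId, 4)` at matchmaking time and hand the player the address of that worker. The mapping changes if `workerCount` changes, so this only suits rooms that are short-lived or assigned once.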
Personally, I'd rather run with fewer layers, because then you don't have to debug the layers when you have perf issues. Do matchmaking wherever with whatever layers, and let your room servers run in the host os, no containers. But nobody likes my ideas. :P
Edit to add: your network load is tiny. This is almost certainly something with your software, or how you've set up your layers. Unless those vCPUs are ancient, you should be able to push a whole lot more packets.
There is no cross-room communication. I could spawn a process per room but I was trying to address this issue with my current Docker setup where I have multiple `game` containers that run a single node.js process and each process can host multiple rooms.
Not having to use Docker sounds simpler but that's where I'm at atm haha.
I agree that the network load feels very small. Maybe it's a socket.io-related issue where, when many broadcasts fire at once, a shared I/O step becomes the bottleneck?
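One way to test that hypothesis is to measure event loop lag: if a single Node.js process is saturated, timers fire late even while the machine as a whole looks mostly idle. The sketch below is a rough illustration with hypothetical helper names (`startLagSampler`, `summarizeLag`); Node's built-in `perf_hooks.monitorEventLoopDelay` is likely a more precise option.

```typescript
// Sketch: sample how late a repeating timer fires. Large values suggest
// the event loop is busy, regardless of total CPU on a multi-core box.

/** Summarize lag samples (milliseconds) into mean, p99, and max. */
function summarizeLag(samplesMs: number[]): { mean: number; p99: number; max: number } {
  if (samplesMs.length === 0) return { mean: 0, p99: 0, max: 0 };
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, v) => acc + v, 0);
  // Nearest-rank percentile: the smallest value covering 99% of samples.
  const rank = Math.ceil(0.99 * sorted.length);
  return {
    mean: sum / sorted.length,
    p99: sorted[rank - 1],
    max: sorted[sorted.length - 1],
  };
}

/** Starts sampling; the returned function stops it and returns a summary. */
function startLagSampler(intervalMs = 100) {
  const samples: number[] = [];
  let expected = Date.now() + intervalMs;
  const timer = setInterval(() => {
    const now = Date.now();
    samples.push(Math.max(0, now - expected)); // how late this tick fired
    expected = now + intervalMs;
  }, intervalMs);
  return () => {
    clearInterval(timer);
    return summarizeLag(samples);
  };
}
```

Calling `const stop = startLagSampler()` before a load test and logging `stop()` afterwards gives a rough picture. A p99 far above the sampling interval would point at the process itself, not the network.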
Here's my actual typing broadcast code. I was originally broadcasting from the socket event callback itself, but I found performance improved slightly by batching broadcasts per player in a `setInterval` loop. (Also note that only 1 player in a given room can be typing at once, so batching broadcasts per room shouldn't address the bottleneck.)

    type ConnectionId = `${UserId}:${PublicRoomId}`;

    /**
     * Used to handle very frequent typing events more gracefully to avoid overloading CPU
     */
    const TypingUsersMap = new Map<
      ConnectionId,
      {
        socketId: string | null; // doesn't exist for bots
        roomId: PublicRoomId;
        userId: UserId;
        currentInput: string;
      }
    >();

    // ! this should be same as client throttle interval
    const TYPING_BROADCAST_INTERVAL = 200;

    export let typingBroadcastInterval: NodeJS.Timeout | undefined = undefined;

    export const startTypingBroadcastJob = () => {
      typingBroadcastInterval = setInterval(() => {
        const freshTypingUsersMap = new Map(TypingUsersMap);
        TypingUsersMap.clear();
        if (freshTypingUsersMap.size === 0) return; // Nothing to do

        // Go through each user that has a pending update
        for (const [_connectionId, data] of freshTypingUsersMap.entries()) {
          const socket = data.socketId
            ? io.sockets.sockets.get(data.socketId)
            : undefined;

          // Use the data we stored to perform the broadcast
          if (socket) {
            // emit to other players
            socket
              .to(data.roomId)
              .volatile.emit(
                SOCKET_EVENT_NAMES.USER_TYPING_RES,
                data.userId,
                data.currentInput
              );
          } else {
            // bots emit to everyone
            io.to(data.roomId).volatile.emit(
              SOCKET_EVENT_NAMES.USER_TYPING_RES,
              data.userId,
              data.currentInput
            );
          }
        }
      }, TYPING_BROADCAST_INTERVAL);
    };

    export const stopTypingBroadcastJob = () => {
      if (typingBroadcastInterval) {
        clearInterval(typingBroadcastInterval);
        typingBroadcastInterval = undefined;
      }
    };

    // this is called from the USER_TYPING socket event callback. so effectively every throttled keystroke by the user gets queued.
    export const queueTypingEvent = ({
      socketId,
      roomId,
      userId,
      currentInput,
    }: {
      socketId: string | null;
      roomId: PublicRoomId;
      userId: UserId;
      currentInput: string;
    }) => {
      const connectionId: ConnectionId = `${userId}:${roomId}`;
      TypingUsersMap.set(connectionId, {
        socketId,
        roomId,
        userId,
        currentInput,
      });
    };

> This suggests to me that the bottleneck isn’t CPU or app logic, but something deeper in the stack
Just a word of caution: I have seen plenty of people speed towards e.g. "it must be a bug in the kernel" when 98% of the time it is the app or some config.
Try buffering the outgoing keystrokes to each client. Then, someone typing "hello world" in a server of 50 people will use 50 syscalls instead of 550 (11 keystrokes × 50 clients).
Think Nagle's algorithm.
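The arithmetic behind that suggestion can be sketched as a small latest-wins buffer. This is an illustration only, not socket.io's API: `CoalescingBuffer` and the `Send` callback are hypothetical names standing in for whatever actually writes to a client socket, and the flush would normally run on a timer.

```typescript
// Sketch: coalesce typing updates per recipient so that N keystrokes
// between flushes cost one send per recipient instead of N.

/** Hypothetical stand-in for the real socket write. */
type Send = (recipientId: string, payload: string) => void;

class CoalescingBuffer {
  // recipient -> (typist -> latest input). Keyed by typist so updates
  // from different typists do not overwrite each other.
  private pending = new Map<string, Map<string, string>>();
  private send: Send;

  constructor(send: Send) {
    this.send = send;
  }

  /** Record the newest input; any older pending input is replaced. */
  queue(recipientId: string, typistId: string, currentInput: string): void {
    let perTypist = this.pending.get(recipientId);
    if (!perTypist) {
      perTypist = new Map<string, string>();
      this.pending.set(recipientId, perTypist);
    }
    perTypist.set(typistId, currentInput);
  }

  /** Send one combined payload per recipient; returns the number of sends. */
  flush(): number {
    let sends = 0;
    this.pending.forEach((perTypist, recipientId) => {
      this.send(recipientId, JSON.stringify(Array.from(perTypist.entries())));
      sends++;
    });
    this.pending.clear();
    return sends;
  }
}
```

Queueing all 11 prefixes of "hello world" for 50 recipients and flushing once results in 50 sends, each carrying only the final text. The cost is latency: recipients see updates at the flush rate, not per keystroke.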
> I could increase this interval, but I'd like to keep it as short as I can afford, to keep that realtime feel (i.e. other players can see what the current turn player is typing).
Have you verified that is the case?
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
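For readers unfamiliar with these settings, here is a small sketch that decodes them. The helper names (`parseSysctl`, `toMiB`) are hypothetical. The `*_max` keys are single byte limits, while `tcp_wmem` and `tcp_rmem` each hold three values: minimum, default, and maximum buffer size in bytes.

```typescript
// Sketch: parse "key = v1 v2 v3" sysctl lines and convert bytes to MiB.

/** Split one sysctl line into its key and numeric values. */
function parseSysctl(line: string): { key: string; values: number[] } {
  const eq = line.indexOf("=");
  if (eq < 0) throw new Error("not a key = value line: " + line);
  const key = line.slice(0, eq).trim();
  const values = line.slice(eq + 1).trim().split(/\s+/).map(Number);
  return { key, values };
}

/** Convert a byte count to mebibytes. */
function toMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}

// tcp_wmem holds [min, default, max] in bytes.
const wmem = parseSysctl("net.ipv4.tcp_wmem = 4096 65536 16777216");
const [minBytes, defaultBytes, maxBytes] = wmem.values;
console.log(wmem.key, { minBytes, defaultBytes, maxMiB: toMiB(maxBytes) });
```

So the maxima above are 16 MiB per socket, with a 64 KiB default send buffer.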
> Perhaps the reality for low latency multiplayer games is to embrace horizontal scaling and not vertical scaling? Not sure.
Computation can sometimes scale well vertically but proprietary OS’s are more likely to be tuned for it…as a premium feature.