I'm currently attempting to get the best sending performance for an 802.11 frame, I am using libpcap but I wondered if I could speed it up using raw sockets (or any other possible method).
Consider this simple example code for libpcap with a device handle already created previously:
char ourPacket[60][50] = { {0x01, 0x02, ... , 0x50}, ... , {0x01, 0x02, ... , 0x50} };
for( ; ; )
    {
        for(int i; i = 0; i < 60; ++i)
        {
            pcap_sendpacket(deviceHandle, ourPacket[i], 50);
        }
    }
This code segment is done on a thread for each separate CPU core. Is there any faster way to do this for raw 802.11 frame/packets containing Radiotap headers that are stored in an array?
Looking at pcap's source code for pcap_inject (the same function but different return value), it doesn't seem to be using raw sockets to send packets? No clue.
I don't care about capturing performance, as a lot of the other questions have answered that. Is raw sockets even for sending layer 2 packets/frames?
As Gill Hamilton mentioned, the answer will depend on a lot of things. If you see super gains on one system, you may not see them on another, even if both are "running Linux". You're better off testing the code yourself. That being said, here is some info from what my team has found:
Note 1: all the gains were for code that did not just write frames/packets to sockets, but analyzed them and processed them, so it is likely that much or most of our gains were there rather than the read/write.
We are writing a direct raw socket implementation to send/receive Ethernet frames and IP packets. We're seeing about a 250%-450% performance gain on our measliest R&D system which is a MIPS 24K 5V system on a chip with a MT7530 integrated Ethernet NIC/Switch which can barely handle sustained 80 Mbit. On a very modest but much beefier test system with an Intel Celeron J1900 and I211 Gigabit controllers it drops to about 50%-100% vs c libpcap. In fact, we only saw about 80%-150% vs. Python dpkt/scapy implementation. We only saw maybe about a 20% gain on a generic i5 Linux dual-gigabit system vs. a c libpcap implementation. So based on our non-rigorous testing, we saw up to a 20x difference in performance gains of the code depending on the system.
Note 2: All of these gains were when using maximum optimizations and strictest compile parameters during compiling of the custom c code, but not necessarily for the c libpcap code (making all warnings errors on some of the above systems make the libpcap code not compile, and who wants to debug that?), so the differences may be less dramatic. We need to squeeze out every last ounce of performance to enable some sophisticated packet processing using no more than 5.0V and 1.5A, so we'll ultimately be going with a custom ASIC which may be FPGA. That being said, it's A LOT of work to get it working without bugs and we're likely going to be implementing significant portions of the Ethernet/IP/TCP/UPD stack, so I don't recommend it.
Last note: The CPU usage on the MIPS 24K system was about 1/10 for the custom code, but again, I would say that that vast majority of that gain was from the processing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With