Serial faster than SPI using SD card lbrary

Hey I have written some code that connects via I2C to a motion sensor (MPU 9250). The I2C frequency is max. (400kHz). I am using an Atmega 32u4 (Adafruit). I also adapted my sketch to speed up the SD card with

Data = SD.open(filename,  O_CREAT | O_WRITE);

I then get 49 samples per second, which is pretty bad. I admit that I am doing some floating point arithmetic, but is that really the bottleneck here? Is it maybe the I2C?

Here is the loop function with the SD:

if (counter != numSamples) {
    //digitalWrite(13, HIGH);
    readMPU9250Data(MPU9250Data); // INT cleared on any read
    //   readAccelData(accelCount);  // Read the x/y/z adc values

    // Now we'll calculate the accleration value into actual g's
    ax = (float)MPU9250Data[0] * aRes - accelBias[0]; // get actual g value, this depends on scale being set
    ay = (float)MPU9250Data[1] * aRes - accelBias[1];
    az = (float)MPU9250Data[2] * aRes - accelBias[2];

    //   readGyroData(gyroCount);  // Read the x/y/z adc values

    // Calculate the gyro value into actual degrees per second
    gx = (float)MPU9250Data[4] * gRes; // get actual gyro value, this depends on scale being set
    gy = (float)MPU9250Data[5] * gRes;
    gz = (float)MPU9250Data[6] * gRes;

    readMagData(magCount);  // Read the x/y/z adc values

    // Calculate the magnetometer values in milliGauss
    // Include factory calibration per data sheet and user environmental corrections
    //if (newMagData == true) {
    //newMagData = false; // reset newMagData flag
    mx = (float)magCount[0] * mRes * magCalibration[0] - magBias[0]; // get actual magnetometer value, this depends on scale being set
    my = (float)magCount[1] * mRes * magCalibration[1] - magBias[1];
    mz = (float)magCount[2] * mRes * magCalibration[2] - magBias[2];
    if (USE_CALIBRATION) {
      mx *= magScale[0];
      my *= magScale[1];
      mz *= magScale[2];
    }
    //}

    //}
    //For MATLAB
    //ax, ay, az, gx * PI / 180.0f, gy * PI / 180.0f, gz * PI / 180.0f,  my,  mx, mz
      swimData.print(ax, 5);
      swimData.print(" ");
      swimData.print(ay, 5);
      swimData.print(" ");
      swimData.print(az, 5);
      swimData.print(" ");
      swimData.print(gx, 5);
      swimData.print(" ");
      swimData.print(gy, 5);
      swimData.print(" ");
      swimData.print(gz, 5);
      swimData.print(" ");
      swimData.print(my / 1000.0f, 5); //In G
      swimData.print(" ");
      swimData.print(mx / 1000.0f, 5);
      swimData.print(" ");
      swimData.print(mz / 1000.0f, 5);
      swimData.print(" ");
      swimData.println(millis() / 1000.0f, 5);
    counter++;
  }
  else {
    swimData.close();
    digitalWrite(13, HIGH);
  }

If I compare it to the one, which prints it to serial:

    //digitalWrite(13, HIGH);
    readMPU9250Data(MPU9250Data); // INT cleared on any read
    //   readAccelData(accelCount);  // Read the x/y/z adc values

    // Now we'll calculate the accleration value into actual g's
    ax = (float)MPU9250Data[0] * aRes - accelBias[0]; // get actual g value, this depends on scale being set
    ay = (float)MPU9250Data[1] * aRes - accelBias[1];
    az = (float)MPU9250Data[2] * aRes - accelBias[2];

    //   readGyroData(gyroCount);  // Read the x/y/z adc values

    // Calculate the gyro value into actual degrees per second
    gx = (float)MPU9250Data[4] * gRes; // get actual gyro value, this depends on scale being set
    gy = (float)MPU9250Data[5] * gRes;
    gz = (float)MPU9250Data[6] * gRes;

    readMagData(magCount);  // Read the x/y/z adc values

    // Calculate the magnetometer values in milliGauss
    // Include factory calibration per data sheet and user environmental corrections
    //if (newMagData == true) {
    //newMagData = false; // reset newMagData flag
    mx = (float)magCount[0] * mRes * magCalibration[0] - magBias[0]; // get actual magnetometer value, this depends on scale being set
    my = (float)magCount[1] * mRes * magCalibration[1] - magBias[1];
    mz = (float)magCount[2] * mRes * magCalibration[2] - magBias[2];
    if (USE_CALIBRATION) {
      mx *= magScale[0];
      my *= magScale[1];
      mz *= magScale[2];
    }
    //}

    //}
    //For MATLAB
    //ax, ay, az, gx * PI / 180.0f, gy * PI / 180.0f, gz * PI / 180.0f,  my,  mx, mz
    Serial.print(ax, 5);
    Serial.print(" ");
    Serial.print(ay, 5);
    Serial.print(" ");
    Serial.print(az, 5);
    Serial.print(" ");
    Serial.print(gx, 5);
    Serial.print(" ");
    Serial.print(gy, 5);
    Serial.print(" ");
    Serial.print(gz, 5);
    Serial.print(" ");
    Serial.print(my / 1000.0f, 5); //In G
    Serial.print(" ");
    Serial.print(mx / 1000.0f, 5);
    Serial.print(" ");
    Serial.print(mz / 1000.0f, 5);
    Serial.print(" ");
    Serial.println(millis() / 1000.0f, 5);
}

I don't have that one comparison, but that is just one/two line/s in assembly so that overhead can be ignored.
Does anyone have a suggestion, why serial runs faster here? I am using a baud rate of 115200.
Also doe anyone have a suggestion on how to speed up things, other than outsourcing the floating point operations?

Thanks in advance!

If you choose to write raw data, each record will be 40 bytes. I guess this will double speed.
Each 256bytes (ca.6 records) there is a "time windows" due to page write. (which is 'slow').

It IS possible to obtain faster speeds; SDfat-lib / a buffer to hold logging data while writing happens