I get an error when concatenating two dataframes row-wise. In the previous version I used pd.concat([df1, df2], axis=0) without problems, but in pandas 2.1.0 it no longer works. Does anybody know how to solve this error?
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[47], line 2
      1 print(real_last.shape, real_exp.shape) #(59202, 34) (4583, 34)
----> 2 real_out = pd.concat([real_exp, real_last], axis=0)
      3 print(real_out.shape)
File c:\Users\sarud\anaconda3\envs\ETLupdate\Lib\site-packages\pandas\core\reshape\concat.py:393, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    378     copy = False
    380 op = _Concatenator(
    381     objs,
    382     axis=axis,
   (...)
    390     sort=sort,
    391 )
--> 393 return op.get_result()
File c:\Users\sarud\anaconda3\envs\ETLupdate\Lib\site-packages\pandas\core\reshape\concat.py:680, in _Concatenator.get_result(self)
    676             indexers[ax] = obj_labels.get_indexer(new_labels)
    678     mgrs_indexers.append((obj._mgr, indexers))
...
--> 230 return super()._concat_same_type(to_concat, axis=axis)
File arrays.pyx:190, in pandas._libs.arrays.NDArrayBacked._concat_same_type()
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 4583 and the array at index 1 has size 59202
These are the versions I am using:
print(sys.version, pd.__version__, np.__version__, sep='\n')
3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]
2.1.0
1.26.0
The dataframes have the same structure; here is a sample of each:
print(real_last.sample(2).T.to_markdown())
| | 43597 | 9338 |
|---|---|---|
| Orden | 006710000 | 006781111 | 
| Operacion | 0010 | 0020 | 
| Operacion.text | XXXXXX | YYYYYYY | 
| Cl.orden | NP | NP | 
| Cl.actividad | 030 | 035 | 
| Ubic.tecnica | XXXX-XX-LAS-DES-BAP19 | XXXX-XX-S13-MBA | 
| Status.sistema | CTEC NOTI IMOP KKMP PREC | LIB. IMOP KKMP PREC | 
| Status.sistema.op | NOTI CONT CTEC NLIQ | LIB. NLIQ | 
| Stat.Usuario | TRAT | TRAT | 
| Fe.Entrada | 2023-06-25 00:00:00 | 2023-07-23 00:00:00 | 
| Fe.Lib | 2023-07-06 00:00:00 | 2023-07-23 00:00:00 | 
| Fe.Ini.real.ot | 2023-07-01 00:00:00 | NaT | 
| Fe.Ini.real.op | 2023-07-01 00:00:00 | NaT | 
| Fe.Ini.temp | 2023-07-06 00:00:00 | 2023-07-23 00:00:00 | 
| Aviso | 00120100 | 11194911 | 
| Modif.por | XXXXX005 | XXXXX011 | 
| Fe.Modif | 2023-07-06 00:00:00 | 2023-07-23 00:00:00 | 
| Autor | XXXXX003 | XXXXX021 | 
| Grupo.planif | XXT | XX1 | 
| G.hojas.ruta | nan | nan | 
| CGH | nan | nan | 
| Plan.mant.prev | nan | nan | 
| Pos.PM | nan | nan | 
| Pto.tbjo.resp | XXXXXXXX | XXXXXXXX | 
| Pto.tbjo.op | XXXXXXXX | XXXXXXXX | 
| Cantidad | 1 | 0 | 
| Duracion.normal | 1.0 | 0.0 | 
| Trabajo | 1.0 | 0.0 | 
| Trabajo.real | 1.0 | 0.0 | 
| Costos tot.reales | 147.03 | 0.0 | 
| Sum.costo.plan | 147.03 | 479.96 | 
| Tot.plan.general | 147.03 | 479.96 | 
| Total.real.general | 147.03 | 0.0 | 
| Costo.dist | 0.0 | 0.0 | 
print(real_exp.sample(2).T.to_markdown())
| | 926 | 990 |
|---|---|---|
| Orden | 222212222 | 333323333 | 
| Operacion | 0120 | 0040 | 
| Operacion.text | XXXXXXXXXX | YYYYYYYYY | 
| Cl.orden | PL | PL | 
| Cl.actividad | 010 | 010 | 
| Ubic.tecnica | XXXX-XX-S07-ALI-CTR7B | XXXX-XX-SCA-AL2-AOG1C | 
| Status.sistema | CTEC NOTI IMPR FMAT IMOP MOVM NLIQ PREC* | LIB. NOTI IMPR DOCU IMOP KKMP NLIQ PREC* | 
| Status.sistema.op | NOTI CTEC IMPR NLIQ | NOTI CONT IMPR LIB. NLIQ PLAN | 
| Stat.Usuario | TBTR | TRAT | 
| Fe.Entrada | 2023-08-02 00:00:00 | 2023-08-02 00:00:00 | 
| Fe.Lib | 2023-08-23 00:00:00 | 2023-08-21 00:00:00 | 
| Fe.Ini.real.ot | 2023-09-04 00:00:00 | 2023-09-05 00:00:00 | 
| Fe.Ini.real.op | 2023-09-05 00:00:00 | 2023-09-06 00:00:00 | 
| Fe.Ini.temp | 2023-09-07 00:00:00 | 2023-09-04 00:00:00 | 
| Aviso | 33333333 | 44444444 | 
| Modif.por | XXXXX009 | XXXXX003 | 
| Fe.Modif | 2023-09-10 00:00:00 | 2023-09-07 00:00:00 | 
| Autor | XXXXXXXXXXXX | XXXXXXXXXXXX | 
| Grupo.planif | XX0 | XXC | 
| G.hojas.ruta | 1886 | 76326 | 
| CGH | 3 | 3 | 
| Plan.mant.prev | 8763 | 191111 | 
| Pos.PM | 95475 | 357140 | 
| Pto.tbjo.resp | XXXXXXXX | XXXXXXXX | 
| Pto.tbjo.op | XXXXXXXX | XXXXXXXX | 
| Cantidad | 4 | 2 | 
| Duracion.normal | 4.0 | 1.0 | 
| Trabajo | 16.0 | 2.0 | 
| Trabajo.real | 16.0 | 0.5 | 
| Costos tot.reales | 1627.5 | 0.04 | 
| Sum.costo.plan | 2336.45 | 0.09 | 
| Tot.plan.general | 2336.45 | 0.09 | 
| Total.real.general | 1627.5 | 0.04 | 
| Costo.dist | nan | nan | 
I can't reproduce the ValueError with the given examples but, since your dataframes hold datetime values, it may be due to a dtype and/or resolution mismatch, as in this Q/A. You can also check GH55067, which discusses a similar issue.
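One way to check whether the two frames really disagree on dtypes is to compare them column by column. The sketch below is only an illustration: the helper name dtype_mismatches and the toy frames are assumptions made up for the example, not part of the question's data.

```python
import pandas as pd


def dtype_mismatches(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """List the shared columns whose dtypes differ between two frames.

    Hypothetical helper, written for this example only.
    """
    shared = left.columns.intersection(right.columns)
    report = pd.DataFrame(
        {
            # Compare dtypes as strings so a resolution difference
            # (e.g. datetime64[ns] vs datetime64[s]) also shows up.
            "left": left.dtypes.loc[shared].astype(str),
            "right": right.dtypes.loc[shared].astype(str),
        }
    )
    return report[report["left"] != report["right"]]


# Toy stand-ins for real_exp and real_last (values are made up).
toy_exp = pd.DataFrame(
    {
        "Orden": ["222212222", "333323333"],
        "Fe.Lib": pd.to_datetime(["2023-08-23", "2023-08-21"]),
        "Cantidad": [4, 2],
    }
)
toy_last = pd.DataFrame(
    {
        "Orden": ["006710000"],
        "Fe.Lib": ["2023-07-06"],  # dates stored as text, not datetimes
        "Cantidad": [1],
    }
)

mismatches = dtype_mismatches(toy_exp, toy_last)
print(mismatches)
```

On the real data this would be called as dtype_mismatches(real_exp, real_last); an empty result would suggest the cause lies elsewhere.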
Try this:
real_out = pd.concat([real_exp, real_last.astype(real_exp.dtypes)], axis=0)
Output:
print(real_out)
           Orden Operacion  ... Total.real.general Costo.dist
926    222212222      0120  ...             1627.5        NaN
990    333323333      0040  ...               0.04        NaN
43597  006710000      0010  ...             147.03        0.0
9338   006781111      0020  ...                0.0        0.0
[4 rows x 34 columns]
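For anyone who wants to try the cast without the original data, here is a minimal self-contained sketch of the same idea. The frames small_exp and small_last are invented for the example, and passing dtypes.to_dict() instead of the dtypes Series is just a choice made here; both forms map column names to dtypes.

```python
import pandas as pd

# Invented miniature versions of the two frames; only three columns are kept.
small_exp = pd.DataFrame(
    {
        "Orden": ["222212222"],
        "Fe.Lib": pd.to_datetime(["2023-08-23"]),
        "Costo.dist": [float("nan")],
    },
    index=[926],
)
small_last = pd.DataFrame(
    {
        "Orden": ["006710000"],
        "Fe.Lib": ["2023-07-06"],  # text instead of datetime
        "Costo.dist": [0],         # integer instead of float
    },
    index=[43597],
)

# Cast every column of the second frame to the dtype used by the first one,
# then stack the rows.
aligned_last = small_last.astype(small_exp.dtypes.to_dict())
small_out = pd.concat([small_exp, aligned_last], axis=0)

print(small_out.dtypes)
print(small_out.shape)
```

Note that astype raises if a column cannot be converted to the target dtype, so a failure here would also point at the offending column.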