I need to convert pandas DataFrame object to a series of SQL statements that reproduce the object.
For example, suppose I have a DataFrame object:
>>> df = pd.DataFrame({'manufacturer': ['Audi', 'Volkswagen', 'BMW'],
'model': ['A3', 'Touareg', 'X5']})
>>> df
manufacturer model
0 Audi A3
1 Volkswagen Touareg
2 BMW X5
I need to convert it to the following SQL representation (not exactly the same):
CREATE TABLE "Auto" (
"index" INTEGER,
"manufacturer" TEXT,
"model" TEXT
);
INSERT INTO Auto (manufacturer, model) VALUES ('Audi', 'A3'), ('Volkswagen', 'Touareg'), ('BMW', 'X5');
Luckily, pandas DataFrame object has to_sql() method which allows dumping the whole DataFrame to a database through SQLAlchemy engine. I decided to use SQLite in-memory database for this:
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite://', echo=False) # Turning echo to True just logs SQL statements, I'd avoid parsing this logs
>>> df.to_sql(name='Auto', con=engine)
I'm stuck at this moment. I can't dump SQLite in-memory database to SQL statements either I can't find sqlalchemy driver that would dump SQL statements into a file instead of executing them.
Is there a way to dump all queries sent to SQLAlchemy engine as SQL statements to a file?
My not elegant solution so far:
>>> from sqlalchemy import MetaData
>>> meta = MetaData()
>>> meta.reflect(bind=engine)
>>> print(pd.io.sql.get_schema(df, name='Auto') + ';')
CREATE TABLE "Auto" (
"manufacturer" TEXT,
"model" TEXT
);
>>> print('INSERT INTO Auto ({}) VALUES\n{};'.format(', '.join([repr(c) for c in df.columns]), ',\n'.join([str(row[1:]) for row in engine.execute(meta.tables['Auto'].select())])))
INSERT INTO Auto ('manufacturer', 'model') VALUES
('Audi', 'A3'),
('Volkswagen', 'Touareg'),
('BMW', 'X5');
I would actually prefer a solution that does not require building the SQL statements manually.
SQLite actually allows one to dump the whole database to a series of SQL statements with dump command. This functionality is also available in python DB-API interface for SQLite: sqlite3, specifically, through connection object's iterdump() method. As far as I know, SQLAlchemy does not provide this functionality.
Thus, to dump pandas DataFrame to a series of SQL statements one needs to first dump it to in-memory SQLite database, and then dump this database using iterdump() method:
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
df.reset_index().to_sql(name=table_name, con=engine) # reset_index() is needed to preserve index column in dumped data
with engine.connect() as conn:
for line in conn.connection.iterdump():
stream.write(line)
stream.write('\n')
engine().connect().connection allows to get raw DBAPI connection.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With