Import XML into SQL database

I'm working with a 20 gig XML file that I would like to import into a SQL database (preferably MySQL, since that is what I am familiar with). This seems like it would be a common task, but after Googling around a bit I haven't been able to figure out how to do it. What is the best way to do this?

I know this ability is built into MySQL 6.0, but that is not an option right now because it is an alpha development release.

Also, if I have to do any scripting I would prefer to use Python because that's what I am most familiar with.

Thanks.


Asked by: Dominik356 | Posted: 30-11-2021






Answer 1

You can use the getiterator() function to iterate over the XML file without parsing the whole thing at once. You can do this with ElementTree, which is included in the standard library, or with lxml.

for record in root.getiterator('record'):
    add_element_to_database(record) # Depends on your database interface.
                                    # I recommend SQLAlchemy.

Answered by: Maddie868 | Posted: 01-01-2022



Answer 2

Take a look at the iterparse() function from ElementTree or cElementTree (I guess cElementTree would be best if you can use it)

This piece describes more or less what you need to do: http://effbot.org/zone/element-iterparse.htm#incremental-parsing

This will probably be the most efficient way to do it in Python. Make sure not to forget to call .clear() on the appropriate elements (you really don't want to build an in memory tree of a 20gig xml file: the .getiterator() method described in another answer is slightly simpler, but does require the whole tree first - I assume that the poster actually had iterparse() in mind as well)

Answered by: Wilson393 | Posted: 01-01-2022



Answer 3

I've done this several times with Python, but never with such a big XML file. ElementTree is an excellent XML library for Python that would be of assistance. If it was possible, I would divide the XML up into smaller files to make it easier to load into memory and parse.

Answered by: Lana638 | Posted: 01-01-2022



Answer 4

It may be a common task, but maybe 20GB isn't as common with MySQL as it is with SQL Server.

I've done this using SQL Server Integration Services and a bit of custom code. Whether you need either of those depends on what you need to do with 20GB of XML in a database. Is it going to be a single column of a single row of a table? One row per child element?

SQL Server has an XML datatype if you simply want to store the XML as XML. This type allows you to do queries using XQuery, allows you to create XML indexes over the XML, and allows the XML column to be "strongly-typed" by referring it to a set of XML schemas, which you store in the database.

Answered by: Daisy853 | Posted: 01-01-2022



Answer 5

The MySQL documentation does not seem to indicate that XML import is restricted to version 6. It apparently works with 5, too.

Answered by: Sam837 | Posted: 01-01-2022



Similar questions

python - How do I import csv files into a ms sql database?

I'm really out of my league here, but I hear that it can be done. I have a PBX phone server with raw phone data in thousands of different CSV files that are captured every so often (between 8 minutes and 2 hours) that are stored in dated folders. The only way I can connect to the server is through WinSCP, which just gives me the file structure (looks like Filezilla FTP). So 2 things: H...


python - Import and export sqlite database in iOS

I'm trying to synchronize two sqlite core data databases and putting them back on device (2 iPads). For that I'm writing a script in Python in order to: 1) copy/export 2 "store.data" databases corresponding to Core Data DB from each device. They derive and use the same model. 2) creating a new database after merging them --> I wrote a Python script that is working (with union sqlite3 function). I hav...


python - Import Error When Adding a Database to a Flask App

I'm following this tutorial and when running $ nosetests in Step 2, I'm getting the following import errors: EE ====================================================================== ERROR: Failure: ImportError (No module named flask_sqlalchemy) --------------------------------------------------...


python - Import .json to Database

I successfully downloaded tweets into a json file. Now I try to import it in a database with this function: def import_json(fi): logging.warning("Loading tweets from json file {0}".format(fi)) for line in open(fi, "rb"): data = json.loads(line.decode('utf-8')) database.create_tweet_from_dict(data) the json-file "keywords_BVBS04.json" lays in a folder called data which is ...


python - How to import or upload data into database model in Django using CSV file?

I created a wold map based web app, when user clicks on a particular country it return information related to a particular country: The model running behind the app is: Models.py from __future__ import unicode_literals from django.db import models class Country(models.Model): name = models.CharField(max_length=200) def __unicode__(self): return self.name class CountryDe...


python - What is the best way to import a small sized MySQL Database to Django?

Schematics Image I have created a simple 5 table SQL Database for my family company. It consists of tables: Companies, CompanyDetails, Contacts, Continents, Products. I am focusing on making a frontend interface for displaying the Companies using drop-down menus for Type, Category, Country and/or Continent. Since, I already ...


python - How to import CSV file data into my database using pandas and pyodbc?

import the package import pandas as pd import pyodbc building the connection conn = pyodbc.connect("Driver={ODBC Driver 17 for SQL Server}; Server=servername; Database=databasename; Trusted_Connection=Yes") cur = conn.cursor() save the SQL code that build the table string = "CREATE TABLE TESTDATA(Log int PRIMARY KEY, User varchar(15))" tab...


python - Import csv to mysql database

Here is my code to import my csv to mysql database: import csv import mysql.connector cnx = mysql.connector.connect(user='root', password='', host='localhost', database='jeremy_db') file = open('C:\\Users\\trendMICRO\\Desktop\\OJT\\test.csv', 'rb') # open the file in read binary mode csv_data = csv.reader(file) for row in csv_dat...


python - CSV import to database

I am getting the error 'sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 4, and there are 1 supplied.' The below code should be making a database and creating a table with the the titles listed below. Then take values from a csv. file and add it under the allotted headings. Any help would be would be appreciated! import const import sqlite3 SEP = ',' DATA_FI...


mysql - Python import the database from the .gz file

I work with a tool where I get the database dump for the MySQL and the Postgresql in the .gz compressed file. This is the file I have exported earlier seamlessly (using python). However, when I need to test for the importing, this doesn't work. The tool work inside the docker container. My python code is provided, def cmd_schema_import(self, args=None): self.require_configured(w...


python - Import database dump file to excel

I'm wondering if it is possible for a (database) dump file to be imported to excel. Eventually, I'll be analyzing the data with Python Jupyter Notebook. Also, can it also be imported onto Jupyter Notebook? How can I open the dump file with an excel to display the information of a table like below? Below is an example lets say from Information Table


python - I am getting import errors when trying to import my database into my own module

I have a module called crop.py in my flask application. Everything works great until I want to import database parts from models.py. I'm guessing it has something to do with initializing the module into the app and then trying to import models into it, but I'm not sure. Here is the error: flask.cli.NoAppException: While importing 'test', an ImportError was raised. Here is my basic app construction. -app model...


python - How to import a .sql file into DuckDB database?

I'm exploring DuckDB for one of my project. Here I have a sample Database file downloaded from https://www.wiley.com/en-us/SQL+for+Data+Scientists%3A+A+Beginner%27s+Guide+for+Building+Datasets+for+Analysis-p-9781119669364 I'm trying to import


python - Import images from Excell sheet and save them in our local database in django

Closed. This question needs to be more focused. It ...


python - Sometimes can't delete an Oracle database row using Django

I have a unit test which contains the following line of code Site.objects.get(name="UnitTest").delete() and this has worked just fine until now. However, that statement is currently hanging. It'll sit there forever trying to execute the delete. If I just say print Site.objects.get(name="UnitTest") then it works, so I know that it can retrieve the site. ...


python - How do I test a django database schema?

I want to write tests that can show whether or not the database is in sync with my models.py file. Actually I have already written them, only to find out that django creates a new database each time the tests are run based on the models.py file. Is there any way I can make the models.py test use the existing database schema? The one that's in mysql/postgresql, and not the one that's in /myapp/models.py ?


python - Using user input to find information in a Mysql database

I need to design a program using python that will ask the user for a barcode. Then, using this barcode, it will search a mysql to find its corresponding product. I am a bit stuck on how to get started. Does anyone have any tips for me?


python - Using 'old' database with django

I'm using a hand built (Postgres) database with Django. With "inspectdb" I was able to automatically create a model for it. The problem is that some tables have multiple primary keys (for many-to-many relations) and they are not accessible via Django. What's the best way to access these tables?


database - python orm

This is a newbie theory question - I'm just starting to use Python and looking into Django and orm. Question: If I develop my objects and through additional development modify the base object structures, inheritance, etc. - would Django's ORM solution modify the database automatically OR do I need to perform a conversion (if the app is live)? So, I start with a basic Phone app Object person: name, address, city, s...


python - Large Sqlite database search

How is it possible to implement an efficient large Sqlite db search (more than 90000 entries)? I'm using Python and SQLObject ORM: import re ... def search1(): cr = re.compile(ur'foo') for item in Item.select(): if cr.search(item.name) or cr.search(item.skim): print item.name This function runs in more than 30 seconds. How sh...


abap - Query SAP database from Python?

Closed. This question needs details or clarity. It ...


A database for python 3?

I'm coding a small piece of server software for the personal use of several users. Not hundreds, not thousands, but perhaps 3-10 at a time. Since it's a threaded server, SQLite doesn't work. It complains about threads like this: ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140735085562848 and this is thread id 43...


python - URLs stored in database for Django site

I've produced a few Django sites but up until now I have been mapping individual views and URLs in urls.py. Now I've tried to create a small custom CMS but I'm having trouble with the URLs. I have a database table (SQLite3) which contains code for the pages like a column for header, one for right menu, one for content.... so on, so on. I also have a column for the URL. How do I get Django to call the information in...


python - Using Django database layer outside of Django?

I've got a nice database I've created in Django, and I'd like to interface with through some python scripts outside of my website stuff, so I'm curious if it's possible to use the Django database API outside of a Django site, and if so does anyone have any info on how it can be done? Google hasn't yielded many hits for this.


database - Problem opening berkeley db in python

I have problems opening a berkeley db in python using bdtables. As bdtables is used by the library I am using to access the database, I need it to work. The problem seems to be that the db environment I am trying to open (I got a copy of the database to open), is version 4.4 while libdb is version 4.6. I get the following error using bsddb.dbtables.bsdTableDB([dbname],[folder]): (-30972, "DB_VERSION...


database - Store simple user settings in Python

I am programming a website in which users will have a number of settings, such as their choice of colour scheme, etc. I'm happy to store these as plain text files, and security is not an issue. The way I currently see it is: there is a dictionary, where all the keys are users and the values are dictionaries with the users' settings in them. For example, userdb["bob"]["colour_scheme"] would have the value "b...


python - Sometimes can't delete an Oracle database row using Django

I have a unit test which contains the following line of code Site.objects.get(name="UnitTest").delete() and this has worked just fine until now. However, that statement is currently hanging. It'll sit there forever trying to execute the delete. If I just say print Site.objects.get(name="UnitTest") then it works, so I know that it can retrieve the site. ...


python - How do I test a django database schema?

I want to write tests that can show whether or not the database is in sync with my models.py file. Actually I have already written them, only to find out that django creates a new database each time the tests are run based on the models.py file. Is there any way I can make the models.py test use the existing database schema? The one that's in mysql/postgresql, and not the one that's in /myapp/models.py ?


python - Using user input to find information in a Mysql database

I need to design a program using python that will ask the user for a barcode. Then, using this barcode, it will search a mysql to find its corresponding product. I am a bit stuck on how to get started. Does anyone have any tips for me?


python - Given an rpm package name, query the yum database for updates

I was imagining a 3-line Python script to do this but the yum Python API is impenetrable. Is this even possible? Is writing a wrapper for 'yum list package-name' the only way to do this?


python - Using 'old' database with django

I'm using a hand built (Postgres) database with Django. With "inspectdb" I was able to automatically create a model for it. The problem is that some tables have multiple primary keys (for many-to-many relations) and they are not accessible via Django. What's the best way to access these tables?


python - How do I notify a process of an SQLite database change done in a different process?

Let's say I have two or more processes dealing with an SQLite database - a "player" process and many "editor" processes. The "player" process reads the database and updates a view - in my case it would be a waveform being mixed to the soundcard depending on events stored in the database. An "editor" process is any editor for that database: it changes the database constantly. Now I want the player t...


database - python orm

This is a newbie theory question - I'm just starting to use Python and looking into Django and orm. Question: If I develop my objects and through additional development modify the base object structures, inheritance, etc. - would Django's ORM solution modify the database automatically OR do I need to perform a conversion (if the app is live)? So, I start with a basic Phone app Object person: name, address, city, s...


python - Large Sqlite database search

How is it possible to implement an efficient large Sqlite db search (more than 90000 entries)? I'm using Python and SQLObject ORM: import re ... def search1(): cr = re.compile(ur'foo') for item in Item.select(): if cr.search(item.name) or cr.search(item.skim): print item.name This function runs in more than 30 seconds. How sh...






Still can't find your answer? Check out these communities...



PySlackers | Full Stack Python | NHS Python | Pythonist Cafe | Hacker Earth | Discord Python



top