Then, follow these steps:
Download the files from the following address and put all of them in the same directory: https://datasets.imdbws.com/
Create a database. Use a collation like
Import the data using the s32cinemagoer.py script:
s32cinemagoer.py /path/to/the/tsv.gz/files/ URI
URI is the identifier used to access the SQL database. For example:
s32cinemagoer.py ~/Download/imdb-s3-dataset-2018-02-07/ postgresql://user:password@localhost/imdb
Please note that for some database engines (like MySQL and MariaDB) you may need to specify the charset in the URI, and sometimes also the dialect, with something like 'mysql+mysqldb://username:password@localhost/imdb?charset=utf8'
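The steps above can be sketched in Python as follows. This is only an illustration, not part of Cinemagoer: the helper names (dataset_urls, download_all, build_db_uri, import_command) are hypothetical, the list of file names is an assumption based on what the dataset page publishes at the time of writing, and the URI is assumed to follow the SQLAlchemy-style form shown above, where special characters in the credentials are percent-encoded.

```python
import os
from urllib.parse import quote_plus, urlencode
from urllib.request import urlretrieve

BASE_URL = "https://datasets.imdbws.com/"

# Assumption: file names as listed on the dataset page at the time of
# writing; check the page itself for the current list.
DATASET_FILES = [
    "name.basics.tsv.gz",
    "title.akas.tsv.gz",
    "title.basics.tsv.gz",
    "title.crew.tsv.gz",
    "title.episode.tsv.gz",
    "title.principals.tsv.gz",
    "title.ratings.tsv.gz",
]


def dataset_urls(base_url=BASE_URL, files=DATASET_FILES):
    """Return the full download URL of every dataset file."""
    return [base_url.rstrip("/") + "/" + name for name in files]


def download_all(target_dir):
    """Download every dataset file into target_dir (needs network access)."""
    os.makedirs(target_dir, exist_ok=True)
    for url in dataset_urls():
        file_name = url.rsplit("/", 1)[-1]
        urlretrieve(url, os.path.join(target_dir, file_name))


def build_db_uri(dialect, user, password, host, database, driver=None, **options):
    """Assemble a database URI, percent-encoding the credentials."""
    scheme = f"{dialect}+{driver}" if driver else dialect
    uri = f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"
    if options:
        # Extra keyword arguments become query parameters, e.g. charset=utf8.
        uri += "?" + urlencode(options)
    return uri


def import_command(tsv_dir, uri):
    """Return the argument list for running the import script."""
    return ["s32cinemagoer.py", tsv_dir, uri]


# Build (but do not run) the import command for a MySQL database.
uri = build_db_uri("mysql", "username", "password", "localhost", "imdb",
                   driver="mysqldb", charset="utf8")
print(" ".join(import_command("/path/to/the/tsv.gz/files/", uri)))
```

To actually perform the steps, one could call download_all() with the target directory and then pass the result of import_command() to subprocess.run(..., check=True); the database itself still has to be created beforehand with the tools of the chosen engine.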
Once the import is finished - which should take about an hour or less on a modern system - you will have a SQL database with all the information and you can use the normal Cinemagoer API:
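from imdb import Cinemagoer

# Open the locally imported database through the 's3' access system.
ia = Cinemagoer('s3', 'postgresql://user:password@localhost/imdb')

results = ia.search_movie('the matrix')
for result in results:
    print(result.movieID, result)

# Take the first match and load its full information.
matrix = results[0]
ia.update(matrix)
print(matrix.keys())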
Running the script again will drop the current tables and import the data again.
If the tqdm package is installed and the --verbose argument is used, a progress bar is shown while the database is populated.
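Before starting a long import, it can be useful to check whether tqdm is available. The following is a minimal sketch using only the standard library; the function name progress_bar_available is hypothetical and not part of Cinemagoer.

```python
from importlib.util import find_spec


def progress_bar_available():
    """Return True if the tqdm package can be imported."""
    return find_spec("tqdm") is not None


if progress_bar_available():
    print("tqdm found: the import can show a progress bar with --verbose")
else:
    print("tqdm missing: install it (for example with pip) to get a progress bar")
```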
Until the end of 2017, IMDb used to distribute a more comprehensive subset of its data in a different format. Cinemagoer can also import that data, but note that it is no longer being updated. For more information, see Old data files.