Python source distribution (sdist) - generated data files

During the build of my package I am generating data files.

I would like to create a source distribution (setup.py sdist) that contains them as if they had originally been in the source tree. However, I don't want to generate them in the source tree itself but somewhere else (preferably build/generated), so that I don't clutter my source tree and accidentally commit them.

For example, in the end I want to have data.txt under dist_root/generated/data.txt ("dist_root" is where setup.py resides).

I used the setuptools data_files option (not package_data, as this data does not belong to a package) and ran into the following problems:

  1. If I generate data.txt under build, it is pruned, because the process filters out every file under build_base.
  2. If I generate it under some temporary folder, say dist_root/temp/data.txt, the "temp" folder is chained into the resulting path.

So if I put data_files = [('generated', ['temp/data.txt'])], the distribution contains the chained path dist_root/generated/temp/data.txt.

It seems my only choice is to generate it under dist_root/generated/data.txt, but then, again, I'm cluttering my source tree and don't know how to clean it up, as the "generated" folder name is dynamic.

Any workarounds?

Lior Cohen asked Jan 16 '26 19:01


1 Answer

Preferred solution: write files to source dir, remove them after sdist finishes

You can override the sdist command to write files in the source dir and clean them up after the command finishes:

import os
from distutils import dir_util

from setuptools import setup
from setuptools.command.sdist import sdist as sdist_orig


class sdist(sdist_orig):

    def run(self):
        # generate data files
        genbase = os.path.join(os.path.dirname(__file__), 'temp')
        self.mkpath(genbase)
        with open(os.path.join(genbase, 'data.txt'), 'w') as fp:
            fp.write('hello distutils world')
        try:
            # run original sdist
            super().run()
        finally:
            # clean up generated data files, even if sdist fails
            dir_util.remove_tree(genbase, dry_run=self.dry_run)


setup(
    ...,
    data_files=[
        ('generated', ['temp/data.txt']),
    ],
    cmdclass={'sdist': sdist},
)
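
The generate, build, clean up sequence can also be packaged as a context manager, so that the cleanup step runs even when the build step raises. This is a minimal standard-library sketch; generated_dir and the simulated failure are hypothetical names for illustration and not part of setuptools:

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def generated_dir(genbase, files):
    """Create genbase, write files (name -> text) into it, remove it on exit."""
    # os.makedirs raises if genbase already exists, so a pre-existing
    # directory is never deleted by the cleanup below.
    os.makedirs(genbase)
    try:
        for name, text in files.items():
            with open(os.path.join(genbase, name), 'w') as fp:
                fp.write(text)
        yield genbase
    finally:
        shutil.rmtree(genbase, ignore_errors=True)


# Demonstration in a throwaway directory (stands in for the project root).
root = tempfile.mkdtemp()
genbase = os.path.join(root, 'temp')
try:
    with generated_dir(genbase, {'data.txt': 'hello distutils world'}):
        existed_during = os.path.isfile(os.path.join(genbase, 'data.txt'))
        raise RuntimeError('simulated sdist failure')
except RuntimeError:
    pass
exists_after = os.path.exists(genbase)
shutil.rmtree(root)
```

Inside a custom sdist command, the body of the with block would be the call to super().run().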

Generating data files without writing them to source dir

Adapting source metadata in the sdist temp

Although still somewhat dirty, the least hacky approach is to update the source metadata directly in the sdist directory. This way you still have valid egg metadata and don't have to deal with missing source files throughout the sdist process.

import os

from setuptools import setup
from setuptools.command.sdist import sdist as sdist_orig

genfiles = ['temp/data.txt']


class sdist(sdist_orig):

    def make_release_tree(self, base_dir, files):
        super().make_release_tree(base_dir, files)
        for path in genfiles:
            fullpath = os.path.join(base_dir, path)
            self.mkpath(os.path.dirname(fullpath))
            if not self.dry_run:
                with open(fullpath, 'w') as fp:
                    fp.write('hello distutils world')
            # also adapt source metadata file
            cmd_egg_info = self.get_finalized_command('egg_info')
            sourcemeta = os.path.join(base_dir,
                                      cmd_egg_info.egg_name + '.egg-info',
                                      'SOURCES.txt')
            with open(sourcemeta, 'a') as fp:
                fp.write('\n')
                fp.write(path)


setup(
    ...,
    data_files=[
        ('generated', genfiles),
    ],
    cmdclass={'sdist': sdist},
)

Basically, the generated data files are simply ignored until the actual copying of source files happens. Then the files are generated (as a substitute for copying existing files), and, since the source metadata would otherwise be incomplete, it is updated with the generated files.
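
The metadata update can be pulled out into a helper. The sketch below assumes SOURCES.txt is a plain list with one path per line; append_sources is a hypothetical helper, not a setuptools function. It skips paths that are already listed and copes with a file that lacks a trailing newline:

```python
import os
import tempfile


def append_sources(sourcemeta, paths):
    """Append each path to a SOURCES.txt-style file unless already listed.

    Returns the list of paths that were actually appended.
    """
    with open(sourcemeta) as fp:
        content = fp.read()
    listed = set(content.splitlines())
    missing = [p for p in paths if p not in listed]
    if not missing:
        return []
    with open(sourcemeta, 'a') as fp:
        # start on a fresh line if the file does not end with a newline
        if content and not content.endswith('\n'):
            fp.write('\n')
        for p in missing:
            fp.write(p + '\n')
    return missing


# Demonstration against a temporary stand-in for SOURCES.txt.
with tempfile.TemporaryDirectory() as tmp:
    meta = os.path.join(tmp, 'SOURCES.txt')
    with open(meta, 'w') as fp:
        fp.write('setup.py\nREADME')  # no trailing newline on purpose
    first = append_sources(meta, ['temp/data.txt'])
    second = append_sources(meta, ['temp/data.txt'])  # nothing left to add
    with open(meta) as fp:
        entries = fp.read().splitlines()
```

In make_release_tree, such a helper would be called once with the full genfiles list after the files have been written, instead of appending inside the loop.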

Writing non-existent files to the metadata

I would strongly recommend against doing that.

All other approaches are much dirtier, as they write non-existent files to the metadata and make distutils/setuptools ignore the missing files throughout the generation of the source distribution. But if you insist, here's a solution with the least possible monkeypatching:

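import os

import setuptools.command.egg_info
from setuptools import setup
from setuptools.command.sdist import sdist as sdist_orig

genfiles = ['temp/data.txt']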


class FileList(setuptools.command.egg_info.FileList):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.files += genfiles

    def _safe_path(self, path):
        return path in genfiles or super()._safe_path(path)


class sdist(sdist_orig):

    def run(self):
        # monkeypatch begin
        FileListOrig = setuptools.command.egg_info.FileList
        setuptools.command.egg_info.FileList = FileList
        # monkeypatch end
        try:
            super().run()
        finally:
            # restore the original class, even if sdist fails
            setuptools.command.egg_info.FileList = FileListOrig

    def make_release_tree(self, base_dir, files):
        super().make_release_tree(base_dir, files)
        for path in genfiles:
            fullpath = os.path.join(base_dir, path)
            self.mkpath(os.path.dirname(fullpath))
            if not self.dry_run:
                with open(fullpath, 'w') as fp:
                    fp.write('hello distutils world')


setup(
    ...,
    data_files=[
        ('generated', genfiles),
    ],
    cmdclass={'sdist': sdist},
)
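
The swap-and-restore around super().run() can be expressed as a small context manager that puts the original attribute back even if the wrapped code fails. This is a sketch with hypothetical names (swapped_attr, fake_module, PatchedFileList); fake_module stands in for setuptools.command.egg_info so the example runs without touching setuptools:

```python
import contextlib
import types


@contextlib.contextmanager
def swapped_attr(owner, name, replacement):
    """Temporarily set owner.<name> to replacement, restoring it on exit."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield original
    finally:
        setattr(owner, name, original)


# Hypothetical stand-in for the module whose class gets monkeypatched.
fake_module = types.SimpleNamespace(FileList=list)


class PatchedFileList(list):
    """Stand-in for the FileList subclass that knows about generated files."""


try:
    with swapped_attr(fake_module, 'FileList', PatchedFileList):
        seen_inside = fake_module.FileList
        raise RuntimeError('simulated sdist failure')
except RuntimeError:
    pass
seen_after = fake_module.FileList
```

The standard library's unittest.mock.patch.object offers the same kind of temporary replacement, if importing unittest.mock from setup.py is acceptable.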

hoefling answered Jan 19 '26 08:01


