I want to add a JSON file to my Dataflow (Apache Beam) package and use it inside the code.
I've seen several questions on Stack Overflow with different answers, and I tried the recommended approach of using a MANIFEST.in and adding data_files to the setup.py file. But nothing I tried works for me.
Here is my current setup.
(I have mapping.json in both the common folder and the root folder for testing purposes.)
MANIFEST.in:

    recursive-include common *.json
setup.py:

    import setuptools

    setuptools.setup(
        packages=setuptools.find_packages(),
        data_files=[
            ("common", ["mapping.json"])
        ],
        include_package_data=True,
        install_requires=[
            'apache-beam[gcp]==2.31.0',
            'python-dateutil==2.8.1'
        ],
    )
The module in the common folder that loads the file:

    import json
    from pathlib import Path


    def _load_category_theme_mapping(file_name):
        path = Path(__file__).parent / file_name
        with path.open('r', encoding='utf-8') as file:
            return json.load(file)


    mapping = _load_category_theme_mapping("mapping.json")
I'm using Flex Templates to run my Dataflow job, and I copy the common folder to the target image.
When I run the Dataflow job with this setup, it fails with the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.7/site-packages/common/category_theme_mapping.json'
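To narrow this down, a small check like the one below could be run inside the container to see which JSON files actually ended up next to the module. This is only a sketch: the helper name list_json_files is made up for illustration, and which directory to pass in is an assumption about where the package gets installed.

```python
from pathlib import Path


def list_json_files(directory):
    """Return the sorted names of the .json files directly inside `directory`."""
    directory = Path(directory)
    if not directory.is_dir():
        # The directory itself is missing, e.g. the package was never installed there.
        return []
    # Only look at the top level of the directory, not at subfolders.
    return sorted(p.name for p in directory.glob("*.json"))
```

Calling list_json_files(Path(__file__).parent) from the module in the common folder would show whether mapping.json was installed alongside the code at all. An empty list would suggest a packaging problem and not a wrong path in the loader.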
I tried moving the .json file outside of the common folder (into the root folder) and changed the code (and the Dockerfile) accordingly to read from the base folder. Then I changed the setup.py file to have (".", ["mapping.json"]) in data_files and MANIFEST.in to have include *.json, but it still fails.
I also tried without a MANIFEST.in, but then the launcher fails without any informative log.
Any idea what I'm doing wrong?