I want to add a JSON file to my Dataflow (Apache Beam) package and use it inside the code.
I've seen several questions on Stack Overflow with different answers, and I tried the recommended approach with a MANIFEST.in and adding data_files to the setup.py file, but nothing I tried works for me.
Here is my current setup. (I have mapping.json in both the common folder and the root folder for testing purposes.)

MANIFEST.in:

    recursive-include common *.json

setup.py:

    import setuptools

    setuptools.setup(
        packages=setuptools.find_packages(),
        data_files=[
            ("common", ["mapping.json"])
        ],
        include_package_data=True,
        install_requires=[
            'apache-beam[gcp]==2.31.0',
            'python-dateutil==2.8.1'
        ],
    )

The module inside the common folder that loads the file:

    import json
    from pathlib import Path


    def _load_category_theme_mapping(file_name):
        # Resolve the file relative to the directory of this module.
        path = Path(__file__).parent / file_name
        with path.open('r', encoding='utf-8') as file:
            return json.load(file)


    mapping = _load_category_theme_mapping("mapping.json")

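The same lookup can also be written against the package name instead of __file__, using pkgutil.get_data from the standard library. This is only a minimal sketch: it assumes the JSON file is installed inside an importable package, and the helper name load_packaged_json is made up for illustration. It does not make a missing file appear, but it fails with a message that names the package and resource.

```python
import json
import pkgutil


def load_packaged_json(package, resource):
    """Load a JSON resource that ships inside an importable package."""
    # pkgutil.get_data returns the raw bytes of the resource, or None when
    # the package cannot be located or its loader cannot provide data.
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise FileNotFoundError(
            f"{resource!r} is not available in package {package!r}"
        )
    return json.loads(data.decode("utf-8"))


# Hypothetical usage for the layout described above:
#   mapping = load_packaged_json("common", "mapping.json")
```

If the file was never installed next to the package, this still raises FileNotFoundError, so it is a way to phrase the lookup rather than a fix for the packaging itself.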
I'm using Flex Templates to run my Dataflow job and I copy the common folder to the target common folder.
When I run the Dataflow job with this setup, it fails with the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.7/site-packages/common/category_theme_mapping.json'
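One way to narrow this down is to log which JSON files actually ended up next to the installed module before trying to open one. A small sketch of such a check follows; the helper name list_json_files is hypothetical, and where to call it from is an assumption about the layout described above.

```python
from pathlib import Path


def list_json_files(directory):
    """Return the sorted names of the .json files directly inside directory."""
    return sorted(p.name for p in Path(directory).glob("*.json"))


# Hypothetical usage inside the module under common/, before loading:
#   print(list_json_files(Path(__file__).parent))
```

An empty list would point at the packaging step (the file was never installed), while a non-empty list with a different name would point at the file name passed to the loader.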
I tried moving the .json file out of the common folder (into the root folder) and changed the code (and the Dockerfile) accordingly to read from the base folder. Then I changed data_files in setup.py to (".", ["mapping.json"]) and MANIFEST.in to include *.json, but it still fails.
I also tried without a MANIFEST.in, but then the launcher fails without any informative log.
Any idea what I'm doing wrong?
