NOTICE: This software (or technical data) was produced for the U.S. Government under contract, and is subject to the Rights in Data-General Clause 52.227-14, Alt. IV (DEC 2007). Copyright 2023 The MITRE Corporation. All Rights Reserved.
Object Storage Overview
By default, OpenMPF writes markup files, JSON output objects, and extracted artifacts to directories in $MPF_HOME/share. For multi-node deployments, $MPF_HOME/share points to a directory on a network share. Most often, the share is served using the Network File System (NFS) protocol, although using NFS is not a requirement.
Alternatively, OpenMPF supports writing these files to an object storage server. That may be desirable in cloud deployments to better support integration between systems, and/or to consolidate file storage as a cost-saving measure.
When a file cannot be uploaded to the server, the Workflow Manager falls back to storing it in $MPF_HOME/share. When such a failure occurs, the JSON output object will contain a descriptive message in the jobWarnings field. If the job completes without other issues, its final status will be COMPLETE_WITH_WARNINGS.
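The fallback behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Workflow Manager's actual implementation: `store_artifact`, `JobResult`, the `upload` callable, and the way the local path is built from the object name are all hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class JobResult:
    """Hypothetical stand-in for the parts of the JSON output object used here."""
    status: str = "COMPLETE"
    job_warnings: list[str] = field(default_factory=list)


def store_artifact(data: bytes, name: str, upload: Callable[[bytes, str], str],
                   share_dir: Path, result: JobResult) -> str:
    """Upload to object storage; if that fails, write under share_dir instead.

    `upload` is assumed to have already used up its retries before raising.
    Returns the URI of wherever the file ended up.
    """
    try:
        return upload(data, name)  # the storage server's URI for the object
    except Exception as error:  # any upload failure triggers the fallback
        local_path = share_dir / name
        local_path.parent.mkdir(parents=True, exist_ok=True)
        local_path.write_bytes(data)
        result.job_warnings.append(
            f"Could not upload {name} to object storage ({error}); stored it at {local_path}.")
        # A job with no other issues ends as COMPLETE_WITH_WARNINGS.
        if result.status == "COMPLETE":
            result.status = "COMPLETE_WITH_WARNINGS"
        return local_path.resolve().as_uri()
```

Only a job that would otherwise be COMPLETE is downgraded here; a job that already has a worse status keeps it, which mirrors the "without other issues" condition above.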
Common Object Storage Properties
The following system properties are common to the various types of object storage solutions that OpenMPF supports:
http.object.storage.upload.retry.count
- The number of times OpenMPF will attempt to upload an object to the storage server after the first failed attempt.
- When using S3, the AWS SDK's default retry strategy is used.
- When using NGINX, exponential backoff is used between retry attempts. There is a 500 ms delay before the first retry, and the delay doubles for each subsequent retry.
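The NGINX retry schedule described above can be sketched like this. It is a minimal illustration, assuming a hypothetical zero-argument `upload` callable that raises on failure; it is not OpenMPF's own code, and it does not apply to S3, where the AWS SDK's default strategy is used instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def nginx_retry_delays_ms(retry_count: int) -> list[int]:
    """Delay before each retry: 500 ms before the first, doubling for each one after."""
    return [500 * 2 ** i for i in range(retry_count)]


def upload_with_backoff(upload: Callable[[], T], retry_count: int,
                        sleep: Callable[[float], None] = time.sleep) -> T:
    """Make one attempt plus up to retry_count retries, sleeping between attempts."""
    if retry_count < 0:
        raise ValueError("retry_count must not be negative")
    for delay_ms in nginx_retry_delays_ms(retry_count):
        try:
            return upload()
        except Exception:
            sleep(delay_ms / 1000)  # back off before the next attempt
    return upload()  # last attempt: a failure here propagates to the caller
```

With http.object.storage.upload.retry.count set to 3, the delays are 500, 1000, and 2000 ms, so a persistently failing upload spends about 3.5 seconds waiting across four attempts before the fallback applies.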
S3 Object Storage
OpenMPF supports downloading media from, and uploading results to, an S3-compatible server such as Ceph or MinIO. The use of S3 is controlled through the following job properties and system properties:
S3_ACCESS_KEY job property or s3.access.key system property
- The access key that will be used when downloading from and uploading to S3.
- When provided together with S3_SECRET_KEY, media will be downloaded using S3 authentication unless S3_UPLOAD_ONLY is true.
S3_SECRET_KEY job property or s3.secret.key system property
- The secret key that will be used when downloading from and uploading to S3.
S3_SESSION_TOKEN job property or s3.session.token system property
- Only required when the S3 bucket is configured to require a session token. This generally occurs when multi-factor authentication is required.
- OpenMPF does not handle generating the session token.
S3_USE_VIRTUAL_HOST job property or s3.use.virtual.host system property
- When false or not provided, path-style requests will be used.
- When true, virtual-hosted-style requests will be used, and S3_HOST must also be provided.
- The CNAME configuration described in the Amazon S3 virtual hosting documentation is not supported.
S3_HOST job property or s3.host system property
- The host of the S3 server, without the bucket name.
- For example, if S3_RESULTS_BUCKET=https://bucket.s3.amazonaws.com, S3_HOST should be set to s3.amazonaws.com.
- Only used when S3_USE_VIRTUAL_HOST=true.
S3_RESULTS_BUCKET job property or s3.results.bucket system property
- The URI of the bucket where result objects should be stored.
- To disable the upload of result objects, do not provide a value for this property.
- Example when S3_USE_VIRTUAL_HOST=false: https://s3host/results_bucket
- Example when S3_USE_VIRTUAL_HOST=true: https://results_bucket.s3host
S3_UPLOAD_ONLY job property or s3.upload.only system property
- When true, media will not be downloaded using S3 authentication. If S3_RESULTS_BUCKET is set, S3 authentication will still be used to upload result objects.
- When false or not provided, S3 authentication will be used to download remote media. S3 authentication will also be used to upload result objects if S3_RESULTS_BUCKET is set.
- If you want to run a job where some media is in S3 and some is hosted elsewhere, you can set S3_UPLOAD_ONLY to true as a media-specific property on the media that is hosted elsewhere.
S3_REGION job property or s3.region system property
- The S3 region to use when accessing S3. For example: us-east-1
- Some S3-compatible servers, like MinIO, ignore this.
S3_UPLOAD_OBJECT_KEY_PREFIX job property or s3.upload.object.key.prefix system property
- Specifies a prefix to prepend to object keys when uploading to S3.
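To make the interplay of the S3 properties above concrete, here is a minimal sketch of how they could be combined for one piece of media. Several parts are assumptions, not documented OpenMPF behavior: the lookup order (media-specific property, then job property, then system property), treating an empty value as "not provided", and deriving the endpoint and bucket name from S3_RESULTS_BUCKET this way. All function names are hypothetical.

```python
from collections.abc import Mapping
from typing import Optional
from urllib.parse import urlsplit

Props = Mapping[str, str]


def resolve(job_name: str, system_name: str,
            media: Props, job: Props, system: Props) -> Optional[str]:
    """Look up a setting. Assumed precedence: media property, job property, system property."""
    for props, key in ((media, job_name), (job, job_name), (system, system_name)):
        value = props.get(key)
        if value:  # an empty value is treated as "not provided" (assumption)
            return value
    return None


def split_results_bucket(bucket_uri: str, use_virtual_host: bool,
                         s3_host: Optional[str]) -> tuple[str, str]:
    """Split an S3_RESULTS_BUCKET URI into (endpoint, bucket name)."""
    parts = urlsplit(bucket_uri)
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port else ""
    if use_virtual_host:
        # Virtual-hosted style: https://<bucket>.<S3_HOST>
        if not s3_host:
            raise ValueError("S3_HOST is required when S3_USE_VIRTUAL_HOST=true")
        suffix = "." + s3_host.lower()
        bucket = host[: -len(suffix)] if host.endswith(suffix) else ""
        if not bucket:
            raise ValueError(f"{bucket_uri!r} has no bucket name in front of {s3_host!r}")
        return f"{parts.scheme}://{s3_host}{port}", bucket
    # Path style: https://<host>/<bucket>
    bucket = parts.path.strip("/").split("/")[0]
    if not bucket:
        raise ValueError(f"{bucket_uri!r} has no bucket name in its path")
    return f"{parts.scheme}://{host}{port}", bucket


def plan_s3(media: Props, job: Props, system: Props) -> dict:
    """Summarize how the S3 properties would apply to one piece of media."""
    def get(job_name: str, system_name: str) -> Optional[str]:
        return resolve(job_name, system_name, media, job, system)

    has_keys = bool(get("S3_ACCESS_KEY", "s3.access.key")
                    and get("S3_SECRET_KEY", "s3.secret.key"))
    upload_only = (get("S3_UPLOAD_ONLY", "s3.upload.only") or "").lower() == "true"
    virtual_host = (get("S3_USE_VIRTUAL_HOST", "s3.use.virtual.host") or "").lower() == "true"
    results_bucket = get("S3_RESULTS_BUCKET", "s3.results.bucket")

    plan = {
        "download_with_s3_auth": has_keys and not upload_only,
        "upload_results": results_bucket is not None,  # no bucket disables uploads
        # Object keys would be built as key_prefix + <object name>.
        "key_prefix": get("S3_UPLOAD_OBJECT_KEY_PREFIX", "s3.upload.object.key.prefix") or "",
    }
    if results_bucket:
        plan["endpoint"], plan["bucket"] = split_results_bucket(
            results_bucket, virtual_host, get("S3_HOST", "s3.host"))
    return plan
```

For example, with credentials and a results bucket set as system properties, passing {"S3_UPLOAD_ONLY": "true"} as the media properties turns off authenticated downloads for that media while leaving result uploads enabled, which is the mixed-media scenario described for S3_UPLOAD_ONLY.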
Custom NGINX HTTP Object Storage
OpenMPF supports a custom NGINX object storage server solution. If you're interested, please contact us. We can make the server-side code available upon request.
If you choose to run your own custom NGINX object storage server, configure OpenMPF by setting the http.object.storage.nginx.service.uri property to the URI of the NGINX server.
The following system properties are unique to the custom NGINX object storage solution:
http.object.storage.nginx.service.uri
- Enables the use of NGINX when provided.
- The URI of the custom NGINX object storage server. For example: https://somehost:12345/somepath
- You must provide a valid value.
http.object.storage.nginx.upload.thread.count
- The number of threads used to upload objects to the storage server.
- In general, the default value is sufficient.
http.object.storage.nginx.upload.segment.size
- The chunk size, in bytes, used to upload objects to the storage server.
- In general, the default value is sufficient.
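As a rough illustration of how a segment size and a thread count typically interact, the sketch below cuts an object into segments of at most segment_size bytes and sends them from a pool of thread_count workers. It does not describe the NGINX server's actual upload protocol (request format, how segments are reassembled); `send_segment` is a hypothetical callback.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def split_into_segments(data: bytes, segment_size: int) -> list[tuple[int, bytes]]:
    """Cut an object into (offset, chunk) pairs of at most segment_size bytes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [(offset, data[offset:offset + segment_size])
            for offset in range(0, len(data), segment_size)]


def upload_in_segments(data: bytes, segment_size: int, thread_count: int,
                       send_segment: Callable[[int, bytes], None]) -> int:
    """Send every segment using thread_count worker threads; return the segment count."""
    segments = split_into_segments(data, segment_size)
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # Iterating the results waits for all segments and re-raises any worker exception.
        list(pool.map(lambda seg: send_segment(*seg), segments))
    return len(segments)
```

Larger segments mean fewer requests per object; more threads mean more segments in flight at once. That trade-off is why the defaults are usually adequate unless objects are very large or the network is unusually slow.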
Once a file has been uploaded, the NGINX object storage server computes its SHA-256 hash, uses that hash to name the file, and returns the file's URI to OpenMPF.
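Because the server names each file after its SHA-256 hash, a client could compute the same digest locally, for example to check that a returned URI refers to the content that was sent. The sketch below assumes a lowercase hex encoding of the digest; the server's exact naming scheme and URI layout are not specified here, and `sha256_of_stream` is a hypothetical helper.

```python
import hashlib
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest, read in chunks so large media need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

A typical call is `with open(path, "rb") as f: name = sha256_of_stream(f)`. Reading in fixed-size chunks keeps memory use constant regardless of file size.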