Using YAML with Hydra and OmegaConf
Using environment variables
name: &name "Citrinet-512-8x-Stride"
model:
  sample_rate: &sample_rate 16000
  train_ds:
    manifest_filepath: "${oc.env:train_data_dir}/train_manifest.json"
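The ${oc.env:...} interpolation above reads an environment variable when the value is accessed. A minimal standard-library sketch of that idea follows. resolve_env is a hypothetical helper for illustration, not OmegaConf's implementation, and it only handles the ${oc.env:NAME} and ${oc.env:NAME,default} forms.

```python
import os
import re

# Matches ${oc.env:NAME} or ${oc.env:NAME,default}
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([^,}]+)(?:,([^}]*))?\}")


def resolve_env(value: str) -> str:
    """Replace each ${oc.env:NAME[,default]} with the environment value (hypothetical helper)."""

    def _sub(match):
        name, default = match.group(1), match.group(2)
        if name in os.environ:
            return os.environ[name]
        if default is not None:
            return default
        raise KeyError(f"Environment variable '{name}' not found")

    return _ENV_PATTERN.sub(_sub, value)


# Same variable name and path as in the YAML above
os.environ["train_data_dir"] = "/ntt/aldata"
print(resolve_env("${oc.env:train_data_dir}/train_manifest.json"))
```

If the variable is unset and no default is given, the sketch raises an error instead of silently producing a broken path.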
Viewing system environment variables
Run the following command in a terminal:
printenv
View the PATH environment variable:
printenv PATH
View shell variables:
set
Define a new shell variable on the command line:
TEST_VAR='Hello World!'
Check the variable that was just defined:
set | grep TEST_VAR
Check whether it also appears among the environment variables (it does not, because it was never exported):
printenv | grep TEST_VAR
Loading environment variables in a notebook
%load_ext dotenv
%dotenv -v
import os
print(os.environ.get('train_data_dir'))
Or point to a specific env file:
%dotenv /ntt/path/to/.env
Environment variables loaded by python-dotenv exist only inside the Python process; the shell cannot read them.
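The reason is that a process can change only its own environment, never its parent's. A small sketch of this, using a child Python process as a stand-in for the notebook kernel and the current process as a stand-in for the shell (the variable name qqqq is borrowed from the env.sh example in these notes):

```python
import os
import subprocess
import sys

# The child sets a variable in its own environment, prints it, then exits.
child_code = "import os; os.environ['qqqq'] = '123123'; print(os.environ['qqqq'])"

os.environ.pop("qqqq", None)  # make sure the parent starts without the variable
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

child_value = result.stdout.strip()   # what the child saw
parent_value = os.environ.get("qqqq")  # what the parent sees afterwards
print(child_value)
print(parent_value)
```

The child sees its own value, while the parent's environment is unchanged.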
To set environment variables in the shell, create env.sh with the following content:
export qqqq=123123
Then run source env.sh on the command line.
Path concatenation in a shell script:
echo "$train_data_dir/train_manifest.json"
Run a Python script with a specific environment variable:
train_data_dir=/ntt/aldata python app.py
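The VAR=value command form sets the variable only for that one process. A rough Python equivalent, assuming the script is launched through subprocess (the inline child code stands in for app.py):

```python
import os
import subprocess
import sys

# Stand-in for app.py: print the variable as the child process sees it.
child_code = "import os; print(os.environ.get('train_data_dir'))"

# Copy the current environment and add the variable for the child only.
env = dict(os.environ, train_data_dir="/ntt/aldata")
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

seen_by_child = result.stdout.strip()
print(seen_by_child)
```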
Hydra configuration framework: detailed usage
1
Create config.yaml
model:
  sample_rate: &sample_rate 16000
  train_ds:
    manifest_filepath: ???
    sample_rate: 16000
    batch_size: 32
    trim_silence: false
    max_duration: 16.7
    shuffle: true
    is_tarred: false
    tarred_audio_filepaths: null
    use_start_end_token: false
defaults:
  - _self_
  - foo: bar
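Create test.py in the same directory:
from omegaconf import DictConfig, OmegaConf
import hydra
from nemo.core.config import hydra_runner

# hydra_runner is NeMo's wrapper around hydra.main
@hydra_runner(config_path='.', config_name="config")
def my_app(cfg: DictConfig) -> None:
    # Print the fully composed configuration as YAML
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    my_app()
Run from the command line: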
python3 test.py model.train_ds.sample_rate=$sample_rate model.train_ds.manifest_filepath=/home/hydra/train_manifest.json +model.val_ds.sample_rate=44444 ++model.train_ds.batch_size=545 ~foo
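Explanation: + adds a key that does not exist in the YAML file; ~ deletes a key-value pair that exists in the YAML file; ++ overrides the key if it exists and adds it otherwise.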
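The rules can be sketched on a plain nested dict. This is a simplified illustration, not Hydra's override parser: apply_override and _parent are hypothetical helpers, values stay plain strings, and ~ is applied to a config key (in the command above, ~foo removes an entry from the defaults list).

```python
from typing import Any, Dict, Tuple


def _parent(cfg: Dict[str, Any], dotted: str, create: bool) -> Tuple[Dict[str, Any], str]:
    """Walk to the dict that holds the last component of a dotted key."""
    *parents, leaf = dotted.split(".")
    node = cfg
    for name in parents:
        if name not in node:
            if not create:
                raise KeyError(dotted)
            node[name] = {}
        node = node[name]
    return node, leaf


def apply_override(cfg: Dict[str, Any], override: str) -> None:
    """Apply one override in place; values are kept as plain strings."""
    if override.startswith("~"):
        parent, leaf = _parent(cfg, override[1:], create=False)
        del parent[leaf]  # KeyError if the key does not exist
        return
    key, _, value = override.partition("=")
    if key.startswith("++"):
        mode, key = "upsert", key[2:]
    elif key.startswith("+"):
        mode, key = "add", key[1:]
    else:
        mode, key = "set", key
    parent, leaf = _parent(cfg, key, create=(mode != "set"))
    if mode == "add" and leaf in parent:
        raise KeyError(f"{key} already exists; use ++ to override it")
    if mode == "set" and leaf not in parent:
        raise KeyError(f"{key} does not exist; use + to add it")
    parent[leaf] = value


cfg = {"model": {"train_ds": {"manifest_filepath": "???", "batch_size": "32"}}, "foo": "bar"}
for item in [
    "model.train_ds.manifest_filepath=/home/hydra/train_manifest.json",
    "+model.val_ds.sample_rate=44444",
    "++model.train_ds.batch_size=545",
    "~foo",
]:
    apply_override(cfg, item)
print(cfg)
```

A plain key=value on a missing key, or + on an existing key, raises an error in this sketch, mirroring the distinction the three prefixes exist to make.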
2
import os

import hydra
from hydra.utils import get_original_cwd, to_absolute_path
from omegaconf import DictConfig

@hydra.main()
def my_app(_cfg: DictConfig) -> None:
    print(f"Current working directory : {os.getcwd()}")
    print(f"Orig working directory : {get_original_cwd()}")
    print(f"to_absolute_path('foo') : {to_absolute_path('foo')}")
    print(f"to_absolute_path('/foo') : {to_absolute_path('/foo')}")

if __name__ == "__main__":
    my_app()
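The snippet above matters because Hydra can run the job from a per-run output directory, so relative paths no longer point at the launch directory. A standard-library sketch of the path rule, assuming relative paths are resolved against the original working directory and absolute paths are returned unchanged (to_absolute_sketch and the directory are hypothetical; posixpath keeps the result platform-independent):

```python
import posixpath


def to_absolute_sketch(path: str, original_cwd: str) -> str:
    """Resolve a relative path against the launch directory; keep absolute paths."""
    if posixpath.isabs(path):
        return path
    return posixpath.join(original_cwd, path)


original_cwd = "/home/user/project"  # hypothetical launch directory
print(to_absolute_sketch("foo", original_cwd))
print(to_absolute_sketch("/foo", original_cwd))
```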
3
On the command line, hydra.verbose=true prints DEBUG-level log messages; similarly, hydra/job_logging=disabled turns off log output.
import logging
from omegaconf import DictConfig
import hydra

# Module-level logger; Hydra configures the logging handlers for the job
log = logging.getLogger(__name__)

@hydra.main()
def my_app(_cfg: DictConfig) -> None:
    log.info("Info level message")
    log.debug("Debug level message")

if __name__ == "__main__":
    my_app()
python3 test.py hydra.verbose=true
python my_app.py hydra.verbose=[__main__,hydra]
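Passing a list to hydra.verbose enables DEBUG output for the named loggers only. The effect resembles the following plain-logging sketch; set_verbose and the logger names are hypothetical, and this is not how Hydra wires it internally.

```python
import logging


def set_verbose(logger_names):
    """Lower the threshold to DEBUG for the named loggers only."""
    for name in logger_names:
        logging.getLogger(name).setLevel(logging.DEBUG)


logging.getLogger().setLevel(logging.INFO)  # root logger stays at INFO
app_log = logging.getLogger("my_app")       # hypothetical logger names
other_log = logging.getLogger("other_lib")

set_verbose(["my_app"])
app_debug = app_log.isEnabledFor(logging.DEBUG)
other_debug = other_log.isEnabledFor(logging.DEBUG)
print(app_debug, other_debug)
```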
4
--cfg accepts job, hydra, or all
python my_app.py --cfg job
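Similarly, --package (or -p) shows a subset of the config, and the --info flag prints information about various aspects of Hydra and your application:
--info all
--info config
--info defaults
--info defaults-tree
--info plugins
5 Tab completion
There are many shells, such as bash, zsh, csh, ksh, sh, and tcsh. Zsh can reuse Hydra's bash completion through its bashcompinit compatibility layer.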
eval "$(python my_app.py -sc install=bash)"
6 ConfigStore
from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore

@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 3306

# Register the dataclass as the primary config named "config"
cs = ConfigStore.instance()
cs.store(name="config", node=MySQLConfig)

@hydra.main(config_path=None, config_name="config")
def my_app(cfg: MySQLConfig) -> None:
    # "pork" is a typo for "port": MySQLConfig has no such field, so a type
    # checker flags this line and the access fails at runtime.
    if cfg.pork == 80:
        print("Is this a webserver?!")

if __name__ == "__main__":
    my_app()
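7 ConfigStore config groups
from dataclasses import dataclass
from typing import Any

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf

@dataclass
class MySQLConfig:
    driver: str = "mysql"
    host: str = "localhost"
    port: int = 3306

@dataclass
class PostGreSQLConfig:
    driver: str = "postgresql"
    host: str = "localhost"
    port: int = 5432
    timeout: int = 10

@dataclass
class Config:
    db: Any

# Register the primary config and the two options of the "db" group
cs = ConfigStore.instance()
cs.store(name="config", node=Config)
cs.store(group="db", name="mysql", node=MySQLConfig)
cs.store(group="db", name="postgresql", node=PostGreSQLConfig)

# Select a group option at run time, e.g. python my_app.py +db=mysql
@hydra.main(config_path=None, config_name="config")
def my_app(cfg: Config) -> None:
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    my_app()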
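Conceptually, a config group is a named set of interchangeable options, and selecting one places it under the group's key. A standard-library sketch of that idea follows; STORE and compose_sketch are hypothetical stand-ins, not the ConfigStore API.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class MySQLConfig:
    driver: str = "mysql"
    host: str = "localhost"
    port: int = 3306


@dataclass
class PostGreSQLConfig:
    driver: str = "postgresql"
    host: str = "localhost"
    port: int = 5432
    timeout: int = 10


# group name -> option name -> config class (hypothetical registry)
STORE = {"db": {"mysql": MySQLConfig, "postgresql": PostGreSQLConfig}}


def compose_sketch(**selection: str) -> Dict[str, Any]:
    """Build a config dict by picking one option per group."""
    return {group: asdict(STORE[group][option]()) for group, option in selection.items()}


print(compose_sketch(db="postgresql"))
```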
8 Config inheritance
Assign MISSING to a field to indicate that it has no default value. This is equivalent to ??? in a YAML file.
from dataclasses import dataclass

from omegaconf import MISSING

@dataclass
class DBConfig:
    host: str = "localhost"
    port: int = MISSING
    driver: str = MISSING

@dataclass
class MySQLConfig(DBConfig):
    driver: str = "mysql"
    port: int = 3306

@dataclass
class PostGreSQLConfig(DBConfig):
    driver: str = "postgresql"
    port: int = 5432
    timeout: int = 10

@dataclass
class Config:
    db: DBConfig
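A standard-library sketch of how subclasses fill in the mandatory fields. It assumes a local MISSING sentinel equal to the string "???" (the same marker used in YAML); missing_fields is a hypothetical helper, and OmegaConf does this checking itself when a value is accessed.

```python
from dataclasses import dataclass, fields

MISSING = "???"  # local stand-in for omegaconf.MISSING


@dataclass
class DBConfig:
    host: str = "localhost"
    port: int = MISSING
    driver: str = MISSING


@dataclass
class MySQLConfig(DBConfig):
    driver: str = "mysql"
    port: int = 3306


def missing_fields(cfg) -> list:
    """Return the names of fields that still hold the MISSING marker."""
    return [f.name for f in fields(cfg) if getattr(cfg, f.name) == MISSING]


print(missing_fields(DBConfig()))
print(missing_fields(MySQLConfig()))
```

The base class still has two unset fields, while the subclass has none.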
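9 Read-only config
from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore

# frozen=True makes the resulting config read-only
@dataclass(frozen=True)
class SerialPort:
    baud_rate: int = 19200
    data_bits: int = 8
    stop_bits: int = 1

cs = ConfigStore.instance()
cs.store(name="config", node=SerialPort)

@hydra.main(config_name="config")
def my_app(cfg: SerialPort) -> None:
    print(cfg)

if __name__ == "__main__":
    my_app()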
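What frozen=True means for a plain dataclass can be checked without Hydra. This sketch shows only the standard dataclass behavior; inside a Hydra app the config object is a wrapper, so the exception type raised there may differ.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class SerialPort:
    baud_rate: int = 19200
    data_bits: int = 8
    stop_bits: int = 1


port = SerialPort()
try:
    port.baud_rate = 9600  # assignment to a frozen instance is rejected
    rejected = False
except FrozenInstanceError:
    rejected = True

print(rejected, port.baud_rate)
```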
Setting environment variables
hydra:
  job:
    env_copy:
      - AWS_KEY
    env_set:
      RANK: ${hydra:job.num}
defaults:
  - override hydra/job_logging: custom
Configuring the output directory
hydra:
  run:
    dir: ./outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
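The ${now:...} patterns use strftime-style directives, so the resulting directory layout can be previewed with plain datetime. run_dir is a hypothetical helper, and a fixed timestamp is used so the output is reproducible.

```python
from datetime import datetime


def run_dir(now: datetime) -> str:
    """Format a timestamp the same way as the run.dir pattern above."""
    return now.strftime("./outputs/%Y-%m-%d/%H-%M-%S")


print(run_dir(datetime(2024, 1, 2, 3, 4, 5)))
```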
11 Help
python my_app.py --help
python my_app.py --hydra-help
12 Plugins
Colored logs
pip install hydra_colorlog --upgrade
defaults:
  - override hydra/job_logging: colorlog
  - override hydra/hydra_logging: colorlog
Other plugins
13 Hydra terminology
1. Input Configs
2. Config files
3. Structured Config
4. Other configs
Primary Config
Output Config
5. Overrides
6. Defaults List
7. Config Group
8. Config Group Option
9. Package
10. Config Search Path (similar to PYTHONPATH)
11. Plugins
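14 Instantiating objects with Hydra
1. Define the class in my_app.py
class Optimizer:
    algo: str
    lr: float

    def __init__(self, algo: str, lr: float) -> None:
        self.algo = algo
        self.lr = lr
2. In the YAML file, set _target_ as follows
optimizer:
  _target_: my_app.Optimizer
  algo: SGD
  lr: 0.01
3. Instantiate
from hydra.utils import instantiate

opt = instantiate(cfg.optimizer)
print(opt)
# Keyword arguments override values from the config
opt = instantiate(cfg.optimizer, lr=0.2)
print(opt)
4. For recursive instantiation, see the Hydra documentation
5. Disable recursion with _recursive_=False
optimizer = instantiate(cfg.trainer, _recursive_=False)
print(optimizer)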
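The core of _target_ is a dotted import path plus keyword arguments. A simplified standard-library sketch of that mechanism follows; instantiate_sketch is hypothetical, handles neither recursion nor the other special keys, and uses fractions.Fraction as the target only so that the example is self-contained.

```python
import importlib
from typing import Any, Mapping


def instantiate_sketch(node: Mapping[str, Any], **overrides: Any) -> Any:
    """Import the class named by _target_ and call it with the remaining keys."""
    kwargs = {k: v for k, v in node.items() if k != "_target_"}
    kwargs.update(overrides)  # call-site keyword arguments win over config values
    module_name, _, attr = node["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr)
    return target(**kwargs)


node = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
frac = instantiate_sketch(node)
print(frac)
frac_override = instantiate_sketch(node, denominator=8)
print(frac_override)
```

The second call shows the same override pattern as instantiate(cfg.optimizer, lr=0.2).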
15 Compose API
from hydra import compose, initialize
from omegaconf import OmegaConf

if __name__ == "__main__":
    # Context initialization: Hydra is initialized only inside the with block
    with initialize(config_path="conf", job_name="test_app"):
        cfg = compose(config_name="config", overrides=["db=mysql", "db.user=me"])
        print(OmegaConf.to_yaml(cfg))

    # Global initialization: Hydra stays initialized for the rest of the process
    initialize(config_path="conf", job_name="test_app")
    cfg = compose(config_name="config", overrides=["db=mysql", "db.user=me"])
    print(OmegaConf.to_yaml(cfg))
16 Example: installing an app
$ python examples/advanced/hydra_app_example/hydra_app/main.py
dataset:
  name: imagenet
  path: /datasets/imagenet
$ pip install examples/advanced/hydra_app_example
...
Successfully installed hydra-app-0.1
$ hydra_app
dataset:
  name: imagenet
  path: /datasets/imagenet
17 Callbacks
from typing import Any

import hydra
from hydra.experimental.callback import Callback
from omegaconf import DictConfig, OmegaConf

class MyCallback(Callback):
    def __init__(self, bucket: str, file_path: str) -> None:
        self.bucket = bucket
        self.file_path = file_path

    # Called by Hydra after each job finishes
    def on_job_end(self, config: DictConfig, **kwargs: Any) -> None:
        print("Job ended, uploading...")

@hydra.main(config_path="conf", config_name="config")
def my_app(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    my_app()
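The hook pattern behind callbacks can be sketched without Hydra. Everything here is a hypothetical stand-in (the Callback base class, RecordingCallback, and run_job are not Hydra's classes), and how callbacks are registered in the Hydra config is not shown.

```python
from typing import Any, Callable, Dict, List


class Callback:
    """Stand-in base class with no-op hooks (not Hydra's Callback)."""

    def on_job_start(self, config: Dict[str, Any], **kwargs: Any) -> None:
        pass

    def on_job_end(self, config: Dict[str, Any], **kwargs: Any) -> None:
        pass


class RecordingCallback(Callback):
    """Records which hooks were called, in order."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_job_start(self, config: Dict[str, Any], **kwargs: Any) -> None:
        self.events.append("start")

    def on_job_end(self, config: Dict[str, Any], **kwargs: Any) -> None:
        self.events.append("end")


def run_job(task: Callable[[Dict[str, Any]], Any], config: Dict[str, Any],
            callbacks: List[Callback]) -> Any:
    """Run the task, calling the start and end hooks around it."""
    for cb in callbacks:
        cb.on_job_start(config=config)
    try:
        return task(config)
    finally:
        # End hooks run even if the task raises
        for cb in callbacks:
            cb.on_job_end(config=config)


recorder = RecordingCallback()
result = run_job(lambda cfg: cfg["x"] * 2, {"x": 21}, [recorder])
print(result, recorder.events)
```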