MarkTechPost@AI · August 25
A Guide to Multi-Model Workflows in GluonTS

This tutorial provides a practical guide to GluonTS, covering the generation and preprocessing of complex synthetic datasets and the application of multiple models in parallel. The focus is on handling several estimators within the same pipeline, managing missing dependencies gracefully, and still producing usable results. By integrating evaluation and visualization steps, the workflow trains, compares, and interprets models seamlessly. The tutorial also evaluates model performance in detail with metrics such as MASE, sMAPE, and weighted quantile loss, and offers advanced visualizations that compare forecasts, residuals, and uncertainty intervals across models. Even when a backend is unavailable, the entire workflow can still be demonstrated on synthetic data.

📊 **Data generation and preparation**: The tutorial shows in detail how to use tools such as `ComplexSeasonalTimeSeries` to create synthetic multivariate time series datasets that combine trend, seasonality, and noise. Using `PandasDataset` and the `split` function, the data is converted into a GluonTS-ready format and divided into training and test sets, laying the foundation for model training and evaluation.

⚙️ **Multi-model integration and training**: The article shows how to flexibly integrate and train multiple GluonTS estimators, including DeepAR for both PyTorch and MXNet as well as a FeedForward model. Conditional imports and error handling keep the pipeline robust even when some backends are unavailable. Each model is trained and produces probabilistic forecasts, preparing the ground for the comparative analysis that follows.

📈 **Model evaluation and visualization**: The tutorial stresses the importance of assessing model performance with the `Evaluator`, quantifying accuracy through metrics such as MASE, sMAPE, and weighted quantile loss. It also provides advanced visualizations, including multi-model forecast comparisons, predicted-versus-actual scatter plots, residual distributions, and bar charts of model performance, helping users understand and compare the models at a glance.

🛠️ **A robust development workflow**: The entire workflow is designed to be robust, handling different environment configurations (such as PyTorch/MXNet availability) and potential training or evaluation errors. With modular code and clear steps, users can easily reproduce, modify, and extend the pipeline and apply it to real datasets, accelerating research and applications in time series forecasting.

In this tutorial, we explore GluonTS from a practical perspective, where we generate complex synthetic datasets, prepare them, and apply multiple models in parallel. We focus on how to work with diverse estimators in the same pipeline, handle missing dependencies gracefully, and still produce usable results. By building in evaluation and visualization steps, we create a workflow that highlights how models can be trained, compared, and interpreted in a single, seamless process. Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import warnings

warnings.filterwarnings('ignore')

from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.split import split
from gluonts.evaluation import make_evaluation_predictions, Evaluator
from gluonts.dataset.artificial import ComplexSeasonalTimeSeries

try:
    from gluonts.torch import DeepAREstimator
    TORCH_AVAILABLE = True
except ImportError:
    TORCH_AVAILABLE = False

try:
    from gluonts.mx import DeepAREstimator as MXDeepAREstimator
    from gluonts.mx import SimpleFeedForwardEstimator
    MX_AVAILABLE = True
except ImportError:
    MX_AVAILABLE = False

We begin by importing the core libraries for data handling, visualization, and GluonTS utilities. We also set up conditional imports for PyTorch and MXNet estimators, allowing us to flexibly use whichever backend is available in our environment.
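
The TORCH_AVAILABLE and MX_AVAILABLE flags capture that choice for the rest of the pipeline. As an optional sanity check that is not part of the original tutorial, a small hypothetical helper like report_backends can summarize the environment before any training starts:

def report_backends():
    """Print which optional GluonTS backends import cleanly in this environment."""
    status = {}
    try:
        import gluonts.torch  # noqa: F401
        status['torch'] = True
    except ImportError:
        status['torch'] = False
    try:
        import gluonts.mx  # noqa: F401
        status['mxnet'] = True
    except ImportError:
        status['mxnet'] = False
    for backend, available in status.items():
        print(f"{backend}: {'available' if available else 'missing'}")
    return status


report_backends()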

def create_synthetic_dataset(num_series=50, length=365, prediction_length=30):
    """Generate synthetic multi-variate time series with trends, seasonality, and noise"""
    np.random.seed(42)
    series_list = []

    for i in range(num_series):
        trend = np.cumsum(np.random.normal(0.1 + i * 0.01, 0.1, length))
        daily_season = 10 * np.sin(2 * np.pi * np.arange(length) / 7)
        yearly_season = 20 * np.sin(2 * np.pi * np.arange(length) / 365.25)
        noise = np.random.normal(0, 5, length)
        values = np.maximum(trend + daily_season + yearly_season + noise + 100, 1)

        dates = pd.date_range(start='2020-01-01', periods=length, freq='D')
        series_list.append(pd.Series(values, index=dates, name=f'series_{i}'))

    return pd.concat(series_list, axis=1)

We create a synthetic dataset where each series combines trend, seasonality, and noise. We design it so every run produces consistent results, and we return a clean multi-series DataFrame ready for experimentation.
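
As an optional check that is not shown in the original post, we can generate a small frame with the function above and confirm its shape, date span, and positive values before scaling up:

sample_df = create_synthetic_dataset(num_series=3, length=60)
print(sample_df.shape)                           # (60, 3): 60 daily points for 3 series
print(sample_df.index[0], sample_df.index[-1])   # 2020-01-01 to 2020-02-29
print(sample_df.describe().loc[['min', 'mean', 'max']].round(1))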

print(" Creating synthetic multi-series dataset...")df = create_synthetic_dataset(num_series=10, length=200, prediction_length=30)dataset = PandasDataset(df, target=df.columns.tolist())training_data, test_gen = split(dataset, offset=-60)test_data = test_gen.generate_instances(prediction_length=30, windows=2)print(" Initializing forecasting models...")models = {}if TORCH_AVAILABLE:   try:       models['DeepAR_Torch'] = DeepAREstimator(           freq='D',           prediction_length=30       )       print(" PyTorch DeepAR loaded")   except Exception as e:       print(f" PyTorch DeepAR failed to load: {e}")if MX_AVAILABLE:   try:       models['DeepAR_MX'] = MXDeepAREstimator(           freq='D',           prediction_length=30,           trainer=dict(epochs=5)       )       print(" MXNet DeepAR loaded")   except Exception as e:       print(f" MXNet DeepAR failed to load: {e}")     try:       models['FeedForward'] = SimpleFeedForwardEstimator(           freq='D',           prediction_length=30,           trainer=dict(epochs=5)       )       print(" FeedForward model loaded")   except Exception as e:       print(f" FeedForward failed to load: {e}")if not models:   print(" Using artificial dataset with built-in models...")   artificial_ds = ComplexSeasonalTimeSeries(       num_series=10,       prediction_length=30,       freq='D',       length_low=150,       length_high=200   ).generate()     training_data, test_gen = split(artificial_ds, offset=-60)   test_data = test_gen.generate_instances(prediction_length=30, windows=2)

We generate a 10-series dataset, wrap it into a GluonTS PandasDataset, and split it into training and test windows. We then initialize multiple estimators (PyTorch DeepAR, MXNet DeepAR, and FeedForward) when available, and fall back to a built-in artificial dataset if no backends load.
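
To make the split concrete, here is an optional sketch, assuming the objects created above and the entry layout used later in the plotting code: offset=-60 withholds the last 60 observations from training, and windows=2 carves two 30-step evaluation windows per series out of that holdout, so 10 series yield 20 test instances.

n_instances = sum(1 for _ in test_data.input)
print(f"test instances: {n_instances}")      # expected: 10 series x 2 windows = 20

first_input = next(iter(test_data.input))
first_label = next(iter(test_data.label))
print(len(first_input['target']))            # history handed to the model for this window
print(len(first_label['target']))            # 30, the ground truth to be forecast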

trained_models = {}
all_forecasts = {}

if models:
    for name, estimator in models.items():
        print(f" Training {name} model...")
        try:
            predictor = estimator.train(training_data)
            trained_models[name] = predictor

            forecasts = list(predictor.predict(test_data.input))
            all_forecasts[name] = forecasts
            print(f" {name} training completed!")

        except Exception as e:
            print(f" {name} training failed: {e}")
            continue

print(" Evaluating model performance...")
evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9])
evaluation_results = {}

for name, forecasts in all_forecasts.items():
    if forecasts:
        try:
            agg_metrics, item_metrics = evaluator(test_data.label, forecasts)
            evaluation_results[name] = agg_metrics
            print(f"\n{name} Performance:")
            print(f"  MASE: {agg_metrics['MASE']:.4f}")
            print(f"  sMAPE: {agg_metrics['sMAPE']:.4f}")
            print(f"  Mean wQuantileLoss: {agg_metrics['mean_wQuantileLoss']:.4f}")
        except Exception as e:
            print(f" Evaluation failed for {name}: {e}")

We train each available estimator, collect probabilistic forecasts, and store the fitted predictors for reuse. We then evaluate results with MASE, sMAPE, and weighted quantile loss, giving us a consistent, comparative view of model performance.
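
For reference, the headline metrics reduce to simple formulas; the following are minimal NumPy sketches, simplified relative to GluonTS's Evaluator, which also handles seasonality defaults, missing values, and aggregation across series and quantiles:

import numpy as np


def smape(y_true, y_pred):
    # symmetric MAPE: mean of 2|y - yhat| / (|y| + |yhat|); lower is better
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(2.0 * np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))


def mase(y_true, y_pred, y_train, m=1):
    # forecast MAE scaled by the in-sample MAE of an m-step seasonal-naive forecast
    y_true, y_pred, y_train = (np.asarray(a, float) for a in (y_true, y_pred, y_train))
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / scale


def weighted_quantile_loss(y_true, y_q, q):
    # pinball loss at quantile q, doubled and normalized by sum(|y|);
    # mean_wQuantileLoss averages this over the requested quantiles (0.1, 0.5, 0.9 here)
    y_true, y_q = np.asarray(y_true, float), np.asarray(y_q, float)
    pinball = np.where(y_true >= y_q, q * (y_true - y_q), (1 - q) * (y_q - y_true))
    return 2.0 * np.sum(pinball) / np.sum(np.abs(y_true))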

def plot_advanced_forecasts(test_data, forecasts_dict, series_idx=0):
    """Advanced plotting with multiple models and uncertainty bands"""
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle('Advanced GluonTS Forecasting Results', fontsize=16, fontweight='bold')

    if not forecasts_dict:
        fig.text(0.5, 0.5, 'No successful forecasts to display',
                 ha='center', va='center', fontsize=20)
        return fig

    if series_idx < len(test_data.label):
        ts_label = test_data.label[series_idx]
        ts_input = test_data.input[series_idx]['target']

        colors = ['blue', 'red', 'green', 'purple', 'orange']

        ax1 = axes[0, 0]
        ax1.plot(range(len(ts_input)), ts_input, 'k-', label='Historical', alpha=0.8, linewidth=2)
        ax1.plot(range(len(ts_input), len(ts_input) + len(ts_label)),
                 ts_label, 'k--', label='True Future', alpha=0.8, linewidth=2)

        for i, (name, forecasts) in enumerate(forecasts_dict.items()):
            if series_idx < len(forecasts):
                forecast = forecasts[series_idx]
                forecast_range = range(len(ts_input), len(ts_input) + len(forecast.mean))

                color = colors[i % len(colors)]
                ax1.plot(forecast_range, forecast.mean,
                         color=color, label=f'{name} Mean', linewidth=2)

                try:
                    ax1.fill_between(forecast_range,
                                     forecast.quantile(0.1), forecast.quantile(0.9),
                                     alpha=0.2, color=color, label=f'{name} 80% CI')
                except Exception:
                    pass

        ax1.set_title('Multi-Model Forecasts Comparison', fontsize=12, fontweight='bold')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        ax1.set_xlabel('Time Steps')
        ax1.set_ylabel('Value')

        ax2 = axes[0, 1]
        if all_forecasts:
            first_model = list(all_forecasts.keys())[0]
            if series_idx < len(all_forecasts[first_model]):
                forecast = all_forecasts[first_model][series_idx]
                ax2.scatter(ts_label, forecast.mean, alpha=0.7, s=60)

                min_val = min(min(ts_label), min(forecast.mean))
                max_val = max(max(ts_label), max(forecast.mean))
                ax2.plot([min_val, max_val], [min_val, max_val], 'r--', alpha=0.8)

                ax2.set_title(f'Prediction vs Actual - {first_model}', fontsize=12, fontweight='bold')
                ax2.set_xlabel('Actual Values')
                ax2.set_ylabel('Predicted Values')
                ax2.grid(True, alpha=0.3)

        ax3 = axes[1, 0]
        if all_forecasts:
            first_model = list(all_forecasts.keys())[0]
            if series_idx < len(all_forecasts[first_model]):
                forecast = all_forecasts[first_model][series_idx]
                residuals = ts_label - forecast.mean
                ax3.hist(residuals, bins=15, alpha=0.7, color='skyblue', edgecolor='black')
                ax3.axvline(x=0, color='r', linestyle='--', linewidth=2)
                ax3.set_title(f'Residuals Distribution - {first_model}', fontsize=12, fontweight='bold')
                ax3.set_xlabel('Residuals')
                ax3.set_ylabel('Frequency')
                ax3.grid(True, alpha=0.3)

        ax4 = axes[1, 1]
        if evaluation_results:
            metrics = ['MASE', 'sMAPE']
            model_names = list(evaluation_results.keys())
            x = np.arange(len(metrics))
            width = 0.35

            for i, model_name in enumerate(model_names):
                values = [evaluation_results[model_name].get(metric, 0) for metric in metrics]
                ax4.bar(x + i * width, values, width,
                        label=model_name, color=colors[i % len(colors)], alpha=0.8)

            ax4.set_title('Model Performance Comparison', fontsize=12, fontweight='bold')
            ax4.set_xlabel('Metrics')
            ax4.set_ylabel('Value')
            ax4.set_xticks(x + width / 2 if len(model_names) > 1 else x)
            ax4.set_xticklabels(metrics)
            ax4.legend()
            ax4.grid(True, alpha=0.3)
        else:
            ax4.text(0.5, 0.5, 'No evaluation\nresults available',
                     ha='center', va='center', transform=ax4.transAxes, fontsize=14)

    plt.tight_layout()
    return fig


if all_forecasts and test_data.label:
    print(" Creating advanced visualizations...")
    fig = plot_advanced_forecasts(test_data, all_forecasts, series_idx=0)
    plt.show()

    print(f"\n Tutorial completed successfully!")
    print(f" Trained {len(trained_models)} model(s) on {len(df.columns) if 'df' in locals() else 10} time series")
    print(f" Prediction length: 30 days")

    if evaluation_results:
        best_model = min(evaluation_results.items(), key=lambda x: x[1]['MASE'])
        print(f" Best performing model: {best_model[0]} (MASE: {best_model[1]['MASE']:.4f})")

    print(f"\n Environment Status:")
    print(f"  PyTorch Support: {'available' if TORCH_AVAILABLE else 'unavailable'}")
    print(f"  MXNet Support: {'available' if MX_AVAILABLE else 'unavailable'}")
else:
    print(" Creating demonstration plot with synthetic data...")

    fig, ax = plt.subplots(1, 1, figsize=(12, 6))

    dates = pd.date_range('2020-01-01', periods=100, freq='D')
    ts = 100 + np.cumsum(np.random.normal(0, 2, 100)) + 20 * np.sin(np.arange(100) * 2 * np.pi / 30)

    ax.plot(dates[:70], ts[:70], 'b-', label='Historical Data', linewidth=2)
    ax.plot(dates[70:], ts[70:], 'r--', label='Future (Example)', linewidth=2)
    ax.fill_between(dates[70:], ts[70:] - 5, ts[70:] + 5, alpha=0.3, color='red')

    ax.set_title('GluonTS Probabilistic Forecasting Example', fontsize=14, fontweight='bold')
    ax.set_xlabel('Date')
    ax.set_ylabel('Value')
    ax.legend()
    ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    print("\n Tutorial demonstrates advanced GluonTS concepts:")
    print("  • Multi-series dataset generation")
    print("  • Probabilistic forecasting")
    print("  • Model evaluation and comparison")
    print("  • Advanced visualization techniques")
    print("  • Robust error handling")

We train each available model, generate probabilistic forecasts, and evaluate them with consistent metrics before visualizing comparisons, residuals, and uncertainty bands. If no models are available, we still demonstrate the workflow with a synthetic example so we can inspect plots and key concepts end to end.
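
Because the fitted predictors are kept in trained_models, they can also be persisted and reloaded without retraining. Here is a minimal sketch using GluonTS's serialize/deserialize round-trip; the exact import path can vary slightly across versions:

from pathlib import Path
from gluonts.model.predictor import Predictor

if trained_models:
    name, predictor = next(iter(trained_models.items()))
    out_dir = Path(f"{name}_predictor")
    out_dir.mkdir(exist_ok=True)
    predictor.serialize(out_dir)               # writes parameters and metadata to disk
    restored = Predictor.deserialize(out_dir)  # reload later without retraining
    reloaded_forecasts = list(restored.predict(test_data.input))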

In conclusion, we put together a robust setup that balances data creation, model experimentation, and performance analysis. Instead of relying on a single configuration, we see how to adapt flexibly, test multiple options, and visualize results in ways that make comparison intuitive. This gives us a stronger foundation for experimenting with GluonTS and applying the same principles to real datasets, while keeping the process modular and easy to extend.
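
As a starting point for real data, the same pipeline only needs a regular, datetime-indexed frame. Here is a minimal sketch in which sales.csv and its column names are placeholders rather than files from the tutorial:

import pandas as pd
from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.split import split

real_df = pd.read_csv("sales.csv", parse_dates=["date"]).set_index("date")
real_df = real_df.asfreq("D")                     # enforce a regular daily index
real_ds = PandasDataset(real_df, target=real_df.columns.tolist())

training_data, test_gen = split(real_ds, offset=-60)
test_data = test_gen.generate_instances(prediction_length=30, windows=2)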


Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.

The post A Coding Guide to Build Flexible Multi-Model Workflows in GluonTS with Synthetic Data, Evaluation, and Advanced Visualizations appeared first on MarkTechPost.

