tensortrade.environments.trading_environment module
-
class tensortrade.environments.trading_environment.TradingEnvironment(portfolio, action_scheme, reward_scheme, feed=None, window_size=1, use_internal=True, **kwargs)
Bases: gym.core.Env, tensortrade.base.core.TimeIndexed
A trading environment made for use with Gym-compatible reinforcement learning algorithms.
-
__init__(portfolio, action_scheme, reward_scheme, feed=None, window_size=1, use_internal=True, **kwargs)
Parameters:
- portfolio (Union[Portfolio, str]) – The portfolio of wallets from which orders are submitted and executed.
- action_scheme (Union[ActionScheme, str]) – The component for transforming an action into an Order at each timestep.
- reward_scheme (Union[RewardScheme, str]) – The component for determining the reward at each timestep.
- feed (optional) – The pipeline of features to pass the observations through.
- kwargs (optional) – Additional arguments for tuning the environment, logging, etc.
-
action_scheme
The component for transforming an action into an Order at each timestep.
Return type: ActionScheme
-
agent_id = None
-
compile()
Sets the observation space and the action space of the environment, creates the internal feed, and initializes the other components.
-
episode_id = None
-
episode_trades
A dictionary of trades made this episode, organized by order id.
Return type: Dict[str, ForwardRef]
-
reset()
Resets the state of the environment and returns an initial observation.
Return type: array
Returns: The episode’s initial observation.
-
reward_scheme
The component for determining the reward at each timestep.
Return type: RewardScheme
-
step(action)
Run one timestep within the environment based on the specified action.
Parameters: action (int) – The trade action provided by the agent for this timestep.
Return type: Tuple[array, float, bool, dict]
Returns:
- observation (pandas.DataFrame) – Provided by the environment’s exchange, often OHLCV or tick trade history data points.
- reward (float) – An amount corresponding to the benefit earned by the action taken this timestep.
- done (bool) – If True, the episode is complete and the environment should be reset.
- info (dict) – Any auxiliary, diagnostic, or debugging information to output.
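The reset() and step() contract described above can be illustrated with a minimal episode loop. This is only a sketch: StubTradingEnv, run_episode, and the toy reward rule are hypothetical stand-ins written for this example, not part of tensortrade. The stub mimics only the documented return shapes, so the loop runs without the library installed. A real TradingEnvironment would be constructed with a portfolio, an action scheme, and a reward scheme, and could be used in place of the stub.

```python
from typing import Callable, Dict, List, Tuple


class StubTradingEnv:
    """Hypothetical stand-in that mimics only the reset()/step() contract."""

    def __init__(self, n_steps: int = 5, n_actions: int = 3) -> None:
        self.n_steps = n_steps      # episode length (assumption for the sketch)
        self.n_actions = n_actions  # size of the discrete action space
        self._t = 0

    def reset(self) -> List[float]:
        # Start a new episode and return the initial observation.
        self._t = 0
        return [0.0]

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict]:
        # Advance one timestep and return (observation, reward, done, info).
        if not 0 <= action < self.n_actions:
            raise ValueError(f"invalid action: {action}")
        self._t += 1
        observation = [float(self._t)]
        reward = 1.0 if action == 1 else 0.0  # arbitrary toy reward rule
        done = self._t >= self.n_steps
        info = {"timestep": self._t}
        return observation, reward, done, info


def run_episode(env, policy: Callable[[List[float]], int]) -> Tuple[float, int]:
    """Run one episode; return (total reward, number of steps taken)."""
    observation = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


if __name__ == "__main__":
    env = StubTradingEnv(n_steps=5)
    total, steps = run_episode(env, lambda obs: 1)  # always choose action 1
    print(f"total reward: {total}, steps: {steps}")
```

Because the loop relies only on reset() and step(), the same run_episode function should work with any Gym-style environment that returns the four-element tuple documented for step().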
-