Mamba + Latent Space Models
Learning about these non-traditional transformers only further ingrained my belief that there must be a better way to reach actually intelligent systems than just continually pushing transformers around.
It seems to me that we are machines that have learned how to model the world and then act on that model. This allows us to act multimodally and gives us the ability to reason across distinct tasks. We can say, ‘there is a bear’ and run away from the bear because we have ingrained in us the ability to create, inside ourselves, a world of our perception. When a bear (or another threat) enters this world we immediately realize it and, without having to be told by an outside force, try to distance ourselves from the threat. This is reinforcement learning on a world model.
I’m imagining AGI as a ‘perception model’. I’m sure there is a better name for it, but this is what I’m calling it until I know that name. This is a model that can interact with the world in many different ways, and each time it receives data input, that input gets ‘embedded’ (or something analogous happens) into the same space. The model then takes this perception, turns it into an understanding, and transforms that understanding into action (a direct transformation from perception to action is unconscious behaviour, while understanding to action can be called conscious behaviour).
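To make that concrete, here is a minimal sketch (in PyTorch, with toy dimensions and made-up module names, so treat all of it as an assumption rather than a design) of what I mean by everything landing in the same space: each modality gets its own encoder into a shared embedding, an ‘understanding’ module works on that, and an action head sits at the end.

```python
import torch
import torch.nn as nn

EMBED_DIM = 256  # the shared perception space

class PerceptionModel(nn.Module):
    def __init__(self, text_dim=300, image_dim=1024, num_actions=10):
        super().__init__()
        # Each modality gets its own encoder, but they all land in the same space.
        self.text_encoder = nn.Linear(text_dim, EMBED_DIM)
        self.image_encoder = nn.Linear(image_dim, EMBED_DIM)
        # Perception -> understanding (the "conscious" path described above).
        self.understand = nn.Sequential(
            nn.Linear(EMBED_DIM, EMBED_DIM), nn.ReLU(), nn.Linear(EMBED_DIM, EMBED_DIM)
        )
        # Understanding -> action; a head straight from perception would be the
        # "unconscious" shortcut.
        self.act = nn.Linear(EMBED_DIM, num_actions)

    def forward(self, text=None, image=None):
        # Embed whatever modalities are present into the shared space and pool them.
        parts = []
        if text is not None:
            parts.append(self.text_encoder(text))
        if image is not None:
            parts.append(self.image_encoder(image))
        perception = torch.stack(parts).mean(dim=0)
        understanding = self.understand(perception)
        return self.act(understanding)

model = PerceptionModel()
logits = model(text=torch.randn(1, 300), image=torch.randn(1, 1024))
```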
We each have a world model that we create in our minds about what the environment is. We can think of constructing this using the two minds by building a space to embed multimodal sensory input of the same dimension (if we only have one modality this step trivializes, but with two or more we can hopefully help it develop a true understanding of the world). Then we can use manipulations to convert it from perception to understanding (we can use embedded thought for this; humans seem to embed our thoughts too, just in natural language). With something simple like ARC, we can build a data/text system and augment it with a vision system if we want multimodality, we can develop an embedded-thought program to most effectively apply a series of manipulations to the system, we can use the single examples as labels and guesses, and, as mentioned in a previous post, we can allow for strong changes to the functionality during testing (at least for the model on the edge that makes the final decision - the hand model). The models could be separated by…
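For ARC specifically, something like the following rough sketch is what I have in mind. The grid sizes, the number of manipulation steps, and the idea of only fine-tuning the ‘hand’ head at test time are all assumptions I’m making for illustration, not a worked solution.

```python
import torch
import torch.nn as nn

GRID_CELLS, DIM = 30 * 30, 256   # ARC grids padded to 30x30, 10 possible colours

class ArcSolver(nn.Module):
    def __init__(self, num_steps=4):
        super().__init__()
        self.embed = nn.Linear(GRID_CELLS, DIM)                     # grid -> shared space
        self.manipulate = nn.ModuleList(
            [nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU()) for _ in range(num_steps)]
        )                                                           # embedded-thought steps
        self.hand = nn.Linear(DIM, GRID_CELLS * 10)                 # final-decision head

    def forward(self, grid):                                        # grid: (batch, 900)
        x = self.embed(grid)
        for step in self.manipulate:
            x = x + step(x)                                         # residual manipulations
        return self.hand(x).view(-1, 10, GRID_CELLS)                # colour logits per cell

def adapt_hand(solver, examples, steps=20):
    """Test-time adaptation: only the hand head is updated on the task's example pairs."""
    opt = torch.optim.Adam(solver.hand.parameters(), lr=1e-3)
    for _ in range(steps):
        for inp, out in examples:                                   # out: (batch, 900) colour ids
            loss = nn.functional.cross_entropy(solver(inp), out)
            opt.zero_grad()
            loss.backward()
            opt.step()

solver = ArcSolver()
examples = [(torch.randn(1, GRID_CELLS), torch.randint(0, 10, (1, GRID_CELLS)))]
adapt_hand(solver, examples)
```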
There is also the two-brain idea (thinking fast and thinking slow) that we can use to build something that has a model of the world informed by its senses and augmented by memory, then processed by some understanding mechanism that combines thoughts to create not only chains of thought in embedding space but also unique thought manipulations that are common ways to arrange thoughts (you can think of these as final convolutional filters, if you wish, that discriminate between ways of manipulating a single thought into another; these can then be trained on certain tasks so that an AI can solve them using the right combination of these discriminators). First, we need a trusted world model; then we need a manipulation model (think of the understanding model we described before). Then we need to find a task to train these independent models to work together on.
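A rough sketch of those ‘discriminators’: a small bank of learned operators over a thought embedding, plus a selector that scores which manipulation to apply next. The sizes and names here are illustrative assumptions, nothing settled.

```python
import torch
import torch.nn as nn

class ThoughtManipulator(nn.Module):
    def __init__(self, dim=256, num_ops=8):
        super().__init__()
        # Each operator is one learned way of turning a thought into another thought.
        self.ops = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_ops)])
        # The selector plays the role of the discriminator: given the current
        # thought, it scores which manipulation is appropriate.
        self.selector = nn.Linear(dim, num_ops)

    def forward(self, thought):
        weights = torch.softmax(self.selector(thought), dim=-1)        # (batch, num_ops)
        candidates = torch.stack([op(thought) for op in self.ops], 1)  # (batch, num_ops, dim)
        # Soft combination of operators; a hard argmax would pick a single one.
        return (weights.unsqueeze(-1) * candidates).sum(dim=1)

# Chaining manipulations gives a chain of thought entirely in embedding space.
manip = ThoughtManipulator()
thought = torch.randn(4, 256)
for _ in range(3):
    thought = manip(thought)
```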
These are three very hard problems, but:
- First, we can at least think of an easier version of the last one.
- Then we can use our rudimentary (single-modality) dataset to create a world model in embeddings. That model gets stored rather than manipulated to make predictions; the predictions it makes are compared against future input, and the future input is given to the model to update its world model.
- Next, the framework to manipulate the world model is added to the process. This would manipulate a copy of the world model to augment the prediction, rather than allowing the machine to predict directly from the world model (a rough sketch of these two steps follows this list).
- Then we test this on the first task.
- Then we find an even harder, multimodal dataset.
- Finally, we test on this dataset.
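Here is a minimal sketch of steps two and three above: a stored world-model state that predicts the next observation, gets compared against what actually arrives, and is updated by the real input, plus a manipulation module that works on a copy of that state rather than predicting from it directly. The GRU world model, the loss, and the dimensions are all assumptions for illustration.

```python
import torch
import torch.nn as nn

OBS_DIM, STATE_DIM = 64, 128

class WorldModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.update_cell = nn.GRUCell(OBS_DIM, STATE_DIM)  # folds new input into the stored state
        self.predict_head = nn.Linear(STATE_DIM, OBS_DIM)  # predicts the next observation

class Manipulator(nn.Module):
    def __init__(self):
        super().__init__()
        self.transform = nn.Linear(STATE_DIM, STATE_DIM)   # operates on a copy of the state

world, manip = WorldModel(), Manipulator()
optimizer = torch.optim.Adam(list(world.parameters()) + list(manip.parameters()), lr=1e-3)

state = torch.zeros(1, STATE_DIM)                       # the stored world model
stream = [torch.randn(1, OBS_DIM) for _ in range(10)]   # stand-in sensory stream

for obs in stream:
    # Manipulate a copy of the world-model state to form the prediction (step three),
    # rather than predicting directly from the stored state.
    prediction = world.predict_head(manip.transform(state))
    loss = nn.functional.mse_loss(prediction, obs)      # compare against what actually arrived
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # The real input, not the prediction, is what updates the stored world model (step two).
    state = world.update_cell(obs, state.detach())
```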