We prefer running models locally: you stay in full control of your data and of the energy required to run everything. Large language models are trained on billions of parameters, mostly from text. The training data is divided into chunks called tokens (3 words contain approximately 4 tokens), relationships between tokens are established in a very large multidimensional vector space, and finally all that knowledge is accumulated and compressed into several gigabytes of data. Retrieving it requires resource- and time-consuming mathematical computations. The response you get, token by token, is just the single most probable direction through the pretrained space. As such, it depends heavily on the prompt you provide at the beginning, so be as detailed as possible for best results. The key takeaway, however, is that the model's behavior and responses are pre-trained, although their complexity makes them resemble intelligent behavior in response to your requests.
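The token-by-token process described above can be sketched in a few lines of Python. This is a toy illustration only: the hard-coded `TOY_LOGITS` table stands in for the billions of trained parameters a real model uses to score possible next tokens, and the decoding loop simply picks the single most probable continuation at each step.

```python
import math

# Toy "model": maps a context (tuple of tokens) to next-token scores (logits).
# A real LLM computes these scores from billions of trained parameters;
# here they are hard-coded purely to illustrate token-by-token decoding.
TOY_LOGITS = {
    ("The",): {"cat": 2.0, "dog": 1.5, "<end>": 0.1},
    ("The", "cat"): {"sat": 2.5, "ran": 1.0, "<end>": 0.2},
    ("The", "cat", "sat"): {"<end>": 3.0, "down": 1.0},
}

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

def generate(prompt_tokens, max_tokens=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        probs = softmax(TOY_LOGITS[tuple(tokens)])
        best = max(probs, key=probs.get)  # the one most probable direction
        if best == "<end>":
            break
        tokens.append(best)
    return tokens

print(generate(["The"]))  # → ['The', 'cat', 'sat']
```

Note how the output is entirely determined by the prompt and the pre-trained scores: change the prompt and the walk through the table takes a different path, which is why a detailed prompt steers the response so strongly.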